COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION


(REFEREED RESEARCH)

Hakan S. Sazak 1, *, Hülya Yılmaz 2
1 Ege University, Department of Statistics, İzmir, Turkey
2 Eskişehir Osmangazi University, Department of Biostatistics and Medical Informatics, Eskişehir, Turkey

Received: Accepted:

Abstract: Many robust estimators have been proposed because the best-known estimators of the location and scale parameters, the sample mean and the sample standard deviation, are not robust to deviations from normality. Several studies in the literature investigate the robustness of various methods through simulation, but they generally focus on the estimators of the location parameter. In this study we compared the performance of two types of Huber's M estimators (w24 and BS82), the modified maximum likelihood (MML) estimators, and the sample median and the scaled median absolute deviation (MAD) with respect to the sample mean and the sample standard deviation via simulation under the mixture and outlier models. Based on the simulation results, we suggest the general use of the Huber's M estimators for the location parameter. For the scale parameter, the MML estimator can be used unless both the sample size and the extremity of contamination (k) are large. In such situations the sample standard deviation should be preferred.

Keywords: Modified Maximum Likelihood; Robustness; M Estimators; Mixture Model; Outlier Model

1. INTRODUCTION
The best-known estimators of the location and scale parameters are the sample mean and the sample standard deviation, respectively. They have optimal properties under normality, but they are not robust: they lose a considerable amount of efficiency under deviations from normality or in the presence of outliers [1-3].
The assumption of normality may not be realistic for many real-life data sets [3]. Ignoring a violation of the normality assumption can result in inefficient estimation of the parameters, which may in turn lead to a wrong analysis and interpretation of the situation. Quite a few robust estimators have been proposed to alleviate this problem. Wilcox [3] gave the definitions and properties of a variety of estimators in detail. Wilcox [3], Özdemir [4], and Wilcox and Özdemir [5] performed simulation studies to compare the efficiencies of several estimators of the location parameter for different distributions and models. Their common finding is that no estimator of the location parameter is the best in all situations, but the sample mean is clearly the least efficient estimator unless the distribution is normal. This is an expected result, since the sample mean is known to be very sensitive to deviations from normality [1].

In this study, we first introduce the most popular estimators of the location and scale parameters and then compare their performance through a Monte Carlo simulation study under several situations. In detail, two types of Huber's M estimators (w24 and BS82), the modified maximum likelihood (MML) estimators, and the sample median and the scaled median absolute deviation (MAD) are compared with the sample mean and the sample standard deviation for normally and non-normally distributed data sets and for various sample sizes. The non-normal conditions are generated with different mixture and outlier models.

The paper is organized as follows. Section 2 describes the estimators of the location and scale parameters and the mixture and outlier models. Section 3 contains the results of the simulations performed to compare the efficiencies of the introduced methods. The final section includes some concluding remarks and suggestions.

* Corresponding Author: hakan.savas.sazak@ege.edu.tr

2. METHODOLOGY
The usual estimators of the location and scale parameters are the sample mean and the sample standard deviation, which are, respectively,

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}. \]

A. Huber's M Estimators
Let $y_1, y_2, \ldots, y_n$ be a random sample from a distribution of the type $(1/\sigma) f((y-\mu)/\sigma)$. Huber [6] assumed that f is unknown but long-tailed symmetric (kurtosis $\mu_4/\mu_2^2 > 3$), and then proposed a new method to estimate the location and scale parameters. Gross [7] investigated 25 estimators of $\mu$ and $\sigma$ out of the 65 estimators discussed by Andrews et al. [8] and recommended three of them, namely, the wave estimators w24, the bisquare estimators BS82 and the Hampel estimators H22 [1]. In this study, the w24 and BS82 estimators were used for comparison. Both pairs of estimators start from

\[ T_0 = \mathrm{median}(y_i), \quad S_0 = \mathrm{median}(|y_i - T_0|) \quad \text{and} \quad z_i = \frac{y_i - T_0}{h S_0} \quad (1 \le i \le n). \]

For w24,

\[ \hat{\mu}_{w24} = T_0 + (hS_0)\tan^{-1}\left[\frac{\sum_i \sin z_i}{\sum_i \cos z_i}\right] \quad \text{and} \quad \hat{\sigma}_{w24} = (hS_0)\,\frac{\left[n\sum_i \sin^2(z_i)\right]^{1/2}}{\left(\sum_i \cos(z_i)\right)}, \]

where h = 2.4.

For BS82,

\[ \hat{\mu}_{BS82} = T_0 + (hS_0)\,\frac{\sum_i \psi(z_i)}{\sum_i \psi'(z_i)} \quad \text{and} \quad \hat{\sigma}_{BS82} = (hS_0)\,\frac{\left[n\sum_i \psi^2(z_i)\right]^{1/2}}{\left(\sum_i \psi'(z_i)\right)}. \]

Here,

\[ \psi(z) = \begin{cases} z(1 - z^2)^2, & |z| \le 1 \\ 0, & |z| > 1 \end{cases} \quad \text{and} \quad \psi'(z_i) = 1 - 6z_i^2 + 5z_i^4, \]

where h = 8.2.

Remark: Gross [7] tried various h coefficients for the wave and bisquare estimators and, based on Monte Carlo simulations, recommended h = 2.4 and h = 8.2 for the w24 and BS82 estimators, respectively, since the estimators are both highly efficient and robust for these coefficients.
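The w24 and BS82 formulas of Section 2.A can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' MATLAB code: the function names are hypothetical, and restricting the w24 sums to |z_i| < π is an assumption (it is the usual convention for the wave estimator, but the text does not state it).

```python
import math
from statistics import median


def _standardize(y, h):
    """Return T0, S0 and z_i = (y_i - T0) / (h * S0)."""
    t0 = median(y)
    s0 = median([abs(v - t0) for v in y])
    if s0 == 0:
        raise ValueError("S0 is zero; at least half of the sample is tied")
    return t0, s0, [(v - t0) / (h * s0) for v in y]


def w24(y, h=2.4):
    """Wave estimators of location and scale (one-step, started at the median)."""
    t0, s0, z = _standardize(y, h)
    n = len(y)
    # Assumption: only observations with |z_i| < pi enter the sums.
    z = [v for v in z if abs(v) < math.pi]
    sin_sum = sum(math.sin(v) for v in z)
    cos_sum = sum(math.cos(v) for v in z)
    mu = t0 + h * s0 * math.atan(sin_sum / cos_sum)
    sigma = h * s0 * math.sqrt(n * sum(math.sin(v) ** 2 for v in z)) / cos_sum
    return mu, sigma


def bs82(y, h=8.2):
    """Bisquare estimators of location and scale (one-step, started at the median)."""
    t0, s0, z = _standardize(y, h)
    n = len(y)
    inside = [v for v in z if abs(v) <= 1]  # psi and psi' vanish for |z| > 1
    psi = [v * (1 - v * v) ** 2 for v in inside]
    dpsi = [1 - 6 * v * v + 5 * v ** 4 for v in inside]
    mu = t0 + h * s0 * sum(psi) / sum(dpsi)
    sigma = h * s0 * math.sqrt(n * sum(p * p for p in psi)) / sum(dpsi)
    return mu, sigma
```

For a sample that is symmetric about zero, such as [-2, -1, 0, 1, 2], both location estimates are zero up to rounding, and both pairs of estimates follow a change of location and scale of the data.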

B. Modified Maximum Likelihood (MML) Estimators
The normality assumption is too restrictive from an applications point of view; see, for example, Huber [9] and Tiku et al. [10]. Hampel et al. [11] pointed out that many real-life data sets can be approximated by Student's t distribution. Assuming Student's t distribution also provides more robust estimators [12]. For these reasons we assumed an underlying long-tailed symmetric (LTS) distribution, which is a Student's t distribution with 2p-1 degrees of freedom, scaled so that its variance is $\sigma^2$. Another advantage of the LTS distribution is that it covers the normal distribution, since it reduces to the normal distribution for $p = \infty$. Let X be a random variable following the LTS distribution:

\[ f(x; p) = \frac{1}{\sigma\sqrt{k}\,\beta\left(\tfrac{1}{2},\, p - \tfrac{1}{2}\right)} \left(1 + \frac{(x-\mu)^2}{k\sigma^2}\right)^{-p}, \quad -\infty < x < \infty, \]

where $k = 2p - 3$ ($p \ge 2$) and $\beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$.

To obtain the MML estimators of the location and scale parameters, which originated with Tiku [13], the maximum likelihood (ML) equations are first expressed in terms of the ordered variates $z_{(i)} = (x_{(i)} - \mu)/\sigma$, simply by replacing $z_i = (x_i - \mu)/\sigma$ by $z_{(i)}$ ($1 \le i \le n$). The intractable terms in the likelihood equations are linearized by using the first two terms of a Taylor series expansion, and the following estimators are obtained for a given value of p:

\[ \hat{\mu}_{MML} = \frac{\sum_{i=1}^{n} \beta_i x_{(i)}}{\sum_{i=1}^{n} \beta_i} \quad \text{and} \quad \hat{\sigma}_{MML} = \frac{B + \sqrt{B^2 + 4nC}}{2\sqrt{n(n-1)}}, \]

where

\[ B = \frac{2p}{k}\sum_{i=1}^{n} \alpha_i \left[x_{(i)} - \frac{\sum_{j=1}^{n} \beta_j x_{(j)}}{\sum_{j=1}^{n} \beta_j}\right], \quad C = \frac{2p}{k}\sum_{i=1}^{n} \beta_i \left[x_{(i)} - \frac{\sum_{j=1}^{n} \beta_j x_{(j)}}{\sum_{j=1}^{n} \beta_j}\right]^2, \]

\[ \alpha_i = \frac{(1/k)\,t_{(i)}^3}{\left[1 + (1/k)\,t_{(i)}^2\right]^2} \quad \text{and} \quad \beta_i = \frac{1}{\left[1 + (1/k)\,t_{(i)}^2\right]^2}. \]

Note that $t_{(i)} = E(z_{(i)})$, where $z_{(i)} = (x_{(i)} - \mu)/\sigma$. For $1 \le i \le n$, $t_{(i)}$ can be obtained from the equation

\[ \int_{-\infty}^{t_{(i)}} f(z)\,dz = \frac{i}{n+1}. \]

In real life, the parameter p of the LTS distribution is not known. In our study we used a calibration technique [14] to estimate p. The likelihood function of the LTS distribution is computed for several values of p with the corresponding MML estimates of $\mu$ and $\sigma$. The value of p that maximizes the likelihood function is then taken as the estimate of p.

C. Median and Median Absolute Deviation (MAD)
The median ($\tilde{\mu}$) is one of the most widely known robust estimators of the location parameter. Let $y_1, y_2, \ldots, y_n$ be a random sample. The median is the middle order statistic when n is odd. When n is even, the median is the average of the order statistics with ranks (n/2) and ((n/2)+1).

The median absolute deviation (MAD), $\mathrm{median}\,|y_i - \mathrm{median}(y_i)|$, is a simple measure of the variation of a data set. It was first used to estimate the unknown scale parameter. MAD was then scaled by dividing it by 0.6745 to make it an unbiased estimator of $\sigma$ for the normal distribution:

\[ \mathrm{MAD}^* = \frac{\mathrm{median}\,|y_i - \mathrm{median}(y_i)|}{0.6745}. \]

D. Mixture and Outlier Models of Normal Distribution
In the mixture model, a sample contains subsamples, and each of these subsamples comes from a different population with a specified probability.
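The MML estimators of Section 2.B, the calibration of p and the scaled MAD of Section 2.C can be sketched in plain Python as follows. This is a rough sketch under stated assumptions rather than the authors' implementation: the function names and the grid of p values are hypothetical, t_(i) is approximated by solving F(t_(i)) = i/(n+1) numerically (Simpson's rule plus bisection), and 0.6745 is taken as the usual normal-consistency constant for MAD.

```python
import math
from statistics import median


def lts_pdf(z, p):
    """Standardized LTS density: (1 + z^2/k)^(-p) / (sqrt(k) * B(1/2, p - 1/2))."""
    k = 2 * p - 3
    log_beta = math.lgamma(0.5) + math.lgamma(p - 0.5) - math.lgamma(p)
    return math.exp(-p * math.log1p(z * z / k) - 0.5 * math.log(k) - log_beta)


def lts_cdf(z, p, steps=400):
    """CDF by Simpson's rule on [0, |z|], using symmetry about zero."""
    a = abs(z)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = lts_pdf(0.0, p) + lts_pdf(a, p)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * lts_pdf(j * h, p)
    return 0.5 + math.copysign(total * h / 3, z)


def lts_quantile(q, p):
    """Solve F(t) = q by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if lts_cdf(mid, p) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mml_estimates(x, p):
    """MML estimates of mu and sigma for a given shape parameter p (p >= 2)."""
    n, k = len(x), 2 * p - 3
    xs = sorted(x)
    # Assumption: t_(i) is approximated via F(t_(i)) = i / (n + 1).
    t = [lts_quantile(i / (n + 1), p) for i in range(1, n + 1)]
    alpha = [(ti ** 3 / k) / (1 + ti * ti / k) ** 2 for ti in t]
    beta = [1 / (1 + ti * ti / k) ** 2 for ti in t]
    mu = sum(b * v for b, v in zip(beta, xs)) / sum(beta)
    B = (2 * p / k) * sum(a * (v - mu) for a, v in zip(alpha, xs))
    C = (2 * p / k) * sum(b * (v - mu) ** 2 for b, v in zip(beta, xs))
    sigma = (B + math.sqrt(B * B + 4 * n * C)) / (2 * math.sqrt(n * (n - 1)))
    return mu, sigma


def log_likelihood(x, mu, sigma, p):
    """Log-likelihood of the LTS model at (mu, sigma, p)."""
    return sum(math.log(lts_pdf((v - mu) / sigma, p) / sigma) for v in x)


def calibrate_p(x, grid=(2.0, 2.5, 3.5, 5.0, 10.0)):
    """Pick the p on a (hypothetical) grid that maximizes the LTS likelihood."""
    best = max(grid, key=lambda p: log_likelihood(x, *mml_estimates(x, p), p))
    return best, mml_estimates(x, best)


def scaled_mad(y):
    """MAD* = median|y_i - median(y)| / 0.6745."""
    m = median(y)
    return median([abs(v - m) for v in y]) / 0.6745
```

The numerical quantile step is deliberately simple; a production version would more likely take the quantiles from a Student's t routine, since the LTS variate is a rescaled t variate with 2p-1 degrees of freedom.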

In this study, for the mixture model, we assume that the sample contains two subsamples that come from normal distributions with mean zero but with different scale parameters, with probabilities $\pi$ and $(1-\pi)$, respectively. The mixture model is shown below:

observations with probability $\pi$ ~ $N(0, k^2)$ and observations with probability $(1-\pi)$ ~ $N(0, 1)$.

This model has mean 0 and variance $(1 - \pi + \pi k^2)$.

Consider a sample of n observations that contains outliers. If we want to model this sample with a theoretical distribution, the outliers and the regular observations must be modeled separately. The model that combines the distributions of the regular and the outlying observations is called an outlier model. In this study, for the outlier model, it is assumed that both regular and outlying observations come from a normal distribution with mean zero but with different scale parameters. The outlier model is shown below:

a observations ~ $N(0, k^2)$ and (n - a) observations ~ $N(0, 1)$.

This model has mean 0 and variance $(1 - (a/n) + (a/n) k^2)$. For both the mixture and outlier models, k can be considered as the extremity of contamination.

Remark: Under regularity conditions, the distributions of the sample mean and the sample standard deviation are approximately normal for large n (see Bain and Engelhardt [15] and Kenney and Keeping [16]). Likewise, under regularity conditions, M estimators are asymptotically normal (see Huber [2]). The MML estimators are also asymptotically normal (like the ML estimators) under very general regularity conditions, since they are asymptotically equivalent to the ML estimators (see Tiku and Akkaya [1] for details). Since the sample median is a central order statistic (or the average of two central order statistics for an even sample size), it is also asymptotically normal under certain conditions (see Bain and Engelhardt [15]).
Hall and Welsh [17] showed that MAD is asymptotically normal under only very mild smoothness conditions on the underlying distribution. It is possible to work out the exact distributions of the mentioned estimators in several situations by using approximation methods such as the Edgeworth expansion or saddlepoint techniques, but this can be very cumbersome [2].

3. SIMULATION STUDY
In this study, the performance of various estimators of the location and scale parameters is investigated through simulation under the standard normal distribution and under different cases of the mixture and outlier models of the normal distribution. (100,000/n) Monte Carlo runs were performed in MATLAB. The simulations were done for the sample sizes n=20, 50 and 100. For the mixture model, the probability that an observation comes from $N(0, k^2)$ is taken as $\pi$ = 0.05 and 0.1. For the outlier model, the proportion of outliers in a sample is taken as p=0.05 and 0.1. For both models, the extremity of contamination, k, is taken as 5, 10 and 20. After the data sets have been generated, they are standardized by the square root of the variance of the model. Thus, for all the data sets, the expected mean is zero and the expected standard deviation is 1. Then, the simulated means, biases, variances, mean square errors (mse) and relative efficiencies (eff) are calculated to investigate the efficiency of the mentioned estimators.

The estimators of the location parameter, $\hat{\mu}_{w24}$, $\hat{\mu}_{BS82}$, $\hat{\mu}_{MML}$ and $\tilde{\mu}$, are compared according to their relative efficiency with respect to the sample mean $\bar{x}$:

\[ \mathrm{eff}(\hat{\theta}_i \mid \bar{x}) = 100 \times \frac{\mathrm{mse}(\bar{x})}{\mathrm{mse}(\hat{\theta}_i)} \quad (i = 1, 2, 3, 4;\ \hat{\theta}_1 = \hat{\mu}_{w24},\ \hat{\theta}_2 = \hat{\mu}_{BS82},\ \hat{\theta}_3 = \hat{\mu}_{MML},\ \hat{\theta}_4 = \tilde{\mu}). \]

The estimators of the scale parameter, $\hat{\sigma}_{w24}$, $\hat{\sigma}_{BS82}$, $\hat{\sigma}_{MML}$ and MAD*, are compared according to their relative efficiency with respect to the sample standard deviation (s):

\[ \mathrm{eff}(\hat{\theta}_i \mid s) = 100 \times \frac{\mathrm{mse}(s)}{\mathrm{mse}(\hat{\theta}_i)} \quad (i = 1, 2, 3, 4;\ \hat{\theta}_1 = \hat{\sigma}_{w24},\ \hat{\theta}_2 = \hat{\sigma}_{BS82},\ \hat{\theta}_3 = \hat{\sigma}_{MML},\ \hat{\theta}_4 = \mathrm{MAD}^*). \]

The simulation results are given in Tables 1-7. The tables include simulated means, biases, variances, mse's and efficiency values. The values in Tables 1-6 are grouped by $\pi$ (or p) and k. Table 7 contains the results for the data sets from the standard normal distribution. In general, the two Huber's M estimators, w24 and BS82, produce very similar results. Thus, we comment on the Huber's M estimators without differentiating between them.

Table 1 shows the simulation results for the mixture model when the sample size is equal to 20. As the $\pi$ or k values increase, the efficiencies of the robust estimators of the location parameter increase. The Huber's M estimators are the most efficient estimators of the location parameter and the sample mean is the worst in this situation. The MML estimator of the location parameter is more efficient than the median for k=5, whereas it is worse than the median for k=10 and 20. For the scale parameter estimation, for the situation when π=0.05 and, the Huber's M estimators of the scale parameter are the best. The MML estimator of the scale parameter takes the second place, although there is only a marginal difference between the MML estimator and the Huber's M estimators. MAD* takes the third place and the sample standard deviation is the worst. In the other situations the MML estimator is the best. For the situation when π=0.05 and the Huber's M estimators are the second best and MAD* takes the third place, only marginally better than the sample standard deviation. In all other situations, the sample standard deviation is the second best after the MML estimator of the scale parameter. As the $\pi$ or k values increase, the bias of the Huber's M estimators of the scale parameter and of MAD* gets larger, and this makes them extremely inefficient in estimating the scale parameter.
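The data generation, standardization and relative-efficiency steps described at the start of Section 3 can be sketched as follows. This is a minimal sketch, not the authors' MATLAB program: the function names and the seed are hypothetical, only the location case with the sample median is shown, and the number of outliers a is passed directly rather than derived from p.

```python
import math
import random
from statistics import median


def sample_mixture(n, pi, k, rng):
    """Each observation is N(0, k^2) with probability pi, otherwise N(0, 1)."""
    scale = math.sqrt(1 - pi + pi * k * k)  # standard deviation of the model
    return [rng.gauss(0, k if rng.random() < pi else 1) / scale for _ in range(n)]


def sample_outlier(n, a, k, rng):
    """Exactly a observations are N(0, k^2); the other n - a are N(0, 1)."""
    scale = math.sqrt(1 - a / n + (a / n) * k * k)
    return [rng.gauss(0, k if i < a else 1) / scale for i in range(n)]


def relative_efficiency(mse_reference, mse_estimator):
    """eff = 100 * mse(reference estimator) / mse(estimator under study)."""
    return 100 * mse_reference / mse_estimator


def simulate_location(sampler, estimator, runs, true_value=0.0):
    """Return (mse of the sample mean, mse of the estimator) over the runs."""
    se_mean = se_est = 0.0
    for _ in range(runs):
        x = sampler()
        se_mean += (sum(x) / len(x) - true_value) ** 2
        se_est += (estimator(x) - true_value) ** 2
    return se_mean / runs, se_est / runs


if __name__ == "__main__":
    rng = random.Random(2024)  # hypothetical seed, for reproducibility only
    n = 20
    runs = 100_000 // n  # (100,000/n) runs, as in the text
    mse_mean, mse_med = simulate_location(
        lambda: sample_mixture(n, 0.1, 10, rng), median, runs
    )
    print("eff(median | mean) =", relative_efficiency(mse_mean, mse_med))
```

Because the samples are divided by the model standard deviation, the true location is 0 and the true scale is 1 in every configuration, so the same `true_value` can be used throughout; the scale case would follow the same pattern with `true_value=1.0` and the sample standard deviation as the reference.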
The simulation results of the mixture model for the sample size n=50 are given in Table 2. All the estimators of the location parameter give higher efficiencies than the sample mean and rank in the same order as in Table 1. The MML estimator of the scale parameter is the best for and for π = 0.05 and. As the $\pi$ or k values increase, the MML estimator of the scale parameter produces some bias and becomes inefficient with respect to the sample standard deviation. For high k values, the sample standard deviation is the best among all the scale parameter estimators. We should especially note that the Huber's M estimators of the scale parameter and MAD* produce huge bias as the $\pi$ or k values increase. For example for π = 0.1 and, their mean is around 0.17, making an approximate bias of -0.83, which leads to an efficiency around 21%.

The last simulation study for the mixture model is done for the sample size n=100 and is shown in Table 3. The results are very similar to those of Tables 1 and 2 for the estimation of the location parameter. In the estimation of the scale parameter, the sample standard deviation is the best estimator except for where the MML scale estimator is the best. The Huber's M estimators of the scale parameter and MAD* cannot be used in this situation because of their huge bias and extreme inefficiency.

Table 4 gives the simulation results of the outlier model for the sample size n=20. Again, the Huber's M estimators produce the most efficient estimation of the location parameter. They are followed by the MML estimator for low p and k values. For the situation when, the sample median is better than the MML estimator of the location parameter. It is also better than the MML estimator when p=0.1 and whereas it is worse than the MML estimator when p=0.05 and. The sample mean is the worst estimator of the location parameter, which is not a surprising result. The MML estimator of the scale parameter dominates this table. The sample standard deviation takes the second place except for p=0.05 and where the Huber's M estimators are better. Again, the Huber's M estimators of the scale parameter and MAD* produce huge bias as the values of p and k increase.

Table 5 shows the simulation results of the outlier model for the sample size n=50. The results are very similar to those of Table 4 for the location parameter. In the estimation of the scale parameter, the MML estimator of the scale parameter is the best for and for p=0.05 and. In other situations it takes the second place behind the sample standard deviation. For high k values, the MML estimator of the scale parameter produces some bias, but the bias produced by the Huber's M estimators of the scale parameter and MAD* is huge. For example for p=0.1 and, their mean is around 0.17 and their bias is approximately -0.83. This inevitably results in an extremely inefficient estimation of the scale parameter.
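The effect of such a bias on the efficiency can be made explicit with the usual decomposition of the mean square error. The following is only a rough check, assuming a simulated mean of about 0.17 for an estimator whose target is $\sigma = 1$ after standardization:

```latex
\[
\mathrm{mse}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{bias}(\hat{\theta})^2,
\qquad
\mathrm{bias}(\hat{\theta}) \approx 0.17 - 1 = -0.83,
\qquad
\mathrm{mse}(\hat{\theta}) \ge (-0.83)^2 \approx 0.69 .
\]
```

The squared bias alone therefore contributes about 0.69 to the mse, no matter how small the variance of the estimator is.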

Table 6 contains the simulation results of the outlier model for the sample size n=100. The results are again very similar to those of Tables 4 and 5 for the location parameter. The MML estimator of the scale parameter is the best for. In all other situations the sample standard deviation is the best estimator of the scale parameter and the MML estimator takes the second place. The Huber's M estimators of the scale parameter and MAD* have huge bias in all situations of Table 6, and the bias grows as p and k get larger. Clearly, they cannot be used in the estimation of the scale parameter in this case.

Finally, Table 7 gives the simulation results for the sample sizes n=20, 50 and 100 when the underlying distribution is standard normal. As expected, the sample mean and the sample standard deviation are the most efficient estimators in this situation. The MML estimators of the location and scale parameters take the second place, although there is only a marginal difference between them and the sample mean and the sample standard deviation. The Huber's M estimators of the location and scale parameters take the next place. There is no big difference between the Huber's M estimators and the MML estimator of the location parameter, but there is a significant difference between the Huber's M estimators and the MML estimator of the scale parameter. The sample median and MAD* have very poor efficiencies in this situation.

Table 1 Simulation results of the mixture model for n=20
[Columns: location parameter methods $\hat{\mu}_{w24}$, $\hat{\mu}_{BS82}$, $\hat{\mu}_{MML}$, $\tilde{\mu}$, $\bar{x}$; scale parameter methods $\hat{\sigma}_{w24}$, $\hat{\sigma}_{BS82}$, $\hat{\sigma}_{MML}$, MAD*, s. Rows: mean, bias, variance, mse and eff for each combination of π and k.]

Table 2 Simulation results of the mixture model for n=50
[Same columns and rows as Table 1.]

Table 3 Simulation results of the mixture model for n=100
[Same columns and rows as Table 1.]

Table 4 Simulation results of the outlier model for n=20
[Same columns as Table 1. Rows: mean, bias, variance, mse and eff for each combination of p and k.]

Table 5 Simulation results of the outlier model for n=50
[Same columns and rows as Table 4.]

Table 6 Simulation results of the outlier model for n=100
[Same columns and rows as Table 4.]

Table 7 Simulation results for standard normal distribution
[Columns: location parameter methods $\hat{\mu}_{w24}$, $\hat{\mu}_{BS82}$, $\hat{\mu}_{MML}$, $\tilde{\mu}$, $\bar{x}$; scale parameter methods $\hat{\sigma}_{w24}$, $\hat{\sigma}_{BS82}$, $\hat{\sigma}_{MML}$, MAD*, s. Rows: mean, bias, variance, mse and eff for n=20, n=50 and n=100.]

4. CONCLUSION
In this paper we carried out a simulation study to compare the performance of some well-known estimation methods for the location and scale parameters under several conditions, including the mixture and outlier models and the normal distribution.

For the mixture and outlier models, similar results are observed. In the estimation of the location parameter, the Huber's M estimators give the best results. For low k values the MML estimator takes the second place, whereas for high k values the sample median is the second best estimator of the location parameter. The worst estimator of the location parameter is the sample mean.

In the estimation of the scale parameter, for the sample size n=20, the MML estimator of the scale parameter is always the best estimator except the case when p or π=0.05 and where the Huber's M estimators are the best. In other situations the efficiencies of the Huber's M estimators and MAD* are very close to each other, but both are worse than the MML estimators and the sample standard deviation. As the values of $\pi$ or p and k get higher, the Huber's M estimators of the scale parameter and MAD* produce great bias and become very inefficient.

For the sample size n=50, the MML estimators are still the best for k=5 and 10 except the case when p or π=0.1 and where the sample standard deviation is the best. In other situations the MML estimator takes the second place after the sample standard deviation. Thus, the MML estimator of the scale parameter and the sample standard deviation take the first two places in estimating the scale parameter. Again, as the values of $\pi$ or p and k get higher, the Huber's M estimators of the scale parameter and MAD* produce great bias and become very inefficient.

For the sample size n=100, the MML estimator of the scale parameter is the best estimator just in the case when. The second best is the sample standard deviation. In all other cases the sample standard deviation is the best and the MML estimator is the second best estimator of the scale parameter. In this case both the Huber's M estimators of the scale parameter and MAD* are extremely inefficient with respect to the sample standard deviation, because they produce huge bias and the bias gets larger as $\pi$ or p and k get higher.

Finally, in the simulation for the standard normal distribution, as expected, the best results are produced by the sample mean and the sample standard deviation. The MML estimators of the location and scale parameters take the second place, but there is just a marginal difference between them and the sample mean and the sample standard deviation. The Huber's M estimators take the third place. The sample median and MAD* are the worst estimators of the location and the scale parameter, respectively, in this situation. They are extremely inefficient and cannot be used.

For the location parameter, we suggest using the Huber's M estimators. For the scale parameter, the MML estimator can be used unless both the sample size and the extremity of contamination (k) are large. In such situations the sample standard deviation should be preferred.

REFERENCES
[1] M.L. Tiku and A.D. Akkaya, Robust Estimation and Hypothesis Testing, 2004, New Delhi.
[2] P.J. Huber, Robust Statistics, Wiley, New York, 1981.
[3] R.R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, 2005, Elsevier Academic Press, Second Edition.
[4] A.F. Özdemir, Comparing measures of location when the underlying distribution has heavier tails than normal, İstatistikçiler Dergisi, 2010, 3, pp
[5] A.F. Özdemir and R. Wilcox, New results on the small-sample properties of some robust univariate estimators of location, Communications in Statistics - Simulation and Computation, 2012, 41(9), pp
[6] P.J. Huber, Robust estimation of a location parameter, Annals Math. Stat., 1964, 35, pp
[7] A.M. Gross, Confidence interval robustness with long-tailed symmetric distributions, J. Amer. Stat. Assoc., 1976, 71, pp
[8] D.F. Andrews, P.J. Bickel, F.R. Hampel, P.J. Huber, W.H. Rogers and J.W. Tukey, Robust Estimates of Location: Survey and Advances, 1972, Princeton, NJ: Princeton University Press.
[9] P.J. Huber, Robust Statistics, Wiley, New York,
[10] M.L. Tiku, W.Y. Tan and N. Balakrishnan, Robust Inference, 1986, Marcel Dekker, New York.
[11] F.R. Hampel, E.M. Ronchetti, and P.J. Rousseeuw, Robust Statistics, 1986, Wiley, New York.
[12] K.L. Lange, R.J.A. Little, J.M.G. Taylor, Robust statistical modeling using the t-distribution, Journal of the American Statistical Association, 1989, 84 (408), pp
[13] M.L. Tiku, Estimating the mean and standard deviation from a censored normal sample, Biometrika, 1967, 54, pp
[14] H. Yilmaz and H.S. Sazak, Double-looped maximum likelihood estimation for the parameters of the generalized gamma distribution, Mathematics and Computers in Simulation, 2014, 98, pp
[15] L.J. Bain and M. Engelhardt, Introduction to Probability and Mathematical Statistics, Second edition, PWS-Kent, Boston.
[16] J.F. Kenney and E.S. Keeping, Mathematics of Statistics, Part 2, Second edition, Princeton,
[17] P. Hall and A.H. Welsh, Limit theorems for the median deviation, Annals of the Institute of Statistical Mathematics, 1985, 37 (1), pp
