Weighted empirical likelihood estimates and their robustness properties


Computational Statistics & Data Analysis ( ) www.elsevier.com/locate/csda

Weighted empirical likelihood estimates and their robustness properties

N.L. Glenn a,*, Yichuan Zhao b

a Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
b Department of Mathematics & Statistics, Georgia State University, Atlanta, GA 30303, USA

Abstract

Maximum likelihood methods are by far the most popular methods for deriving statistical estimators. However, parametric likelihoods require distributional specifications. The empirical likelihood is a nonparametric likelihood function that does not require such distributional assumptions, but is otherwise analogous to its parametric counterpart. Both likelihoods assume that the random variables are independent with a common distribution. A nonparametric likelihood function for data that are independent, but not necessarily identically distributed, is introduced. The contaminated normal density is used to compare the robustness properties of weighted empirical likelihood estimators to those of empirical likelihood estimators. It is shown that as the contamination level of the sample increases, the root mean squared error of the empirical likelihood estimator for the mean increases. Conversely, the root mean squared error of the weighted empirical likelihood estimator for the mean remains closer to the theoretical root mean squared error. © 2006 Elsevier B.V. All rights reserved.

Keywords: Empirical likelihood; Weighted likelihood; Contaminated normal distribution

1. Introduction

Likelihood-based methods are effective in finding efficient estimators, constructing tests with good power properties, and quantifying uncertainty through confidence intervals and confidence regions (Owen, 2001). Nevertheless, when the distribution is misspecified, parametric likelihood-based estimates are inefficient. In this case, the empirical likelihood is preferred because it does not require such distributional assumptions.
Weighted empirical likelihood requires even fewer distributional assumptions, since it relaxes the identically distributed assumption. Hence, it can offset problems arising from auxiliary information, such as contamination, by modifying the constraints or the objective function. We develop weighted empirical likelihood for point estimators, which extends to confidence intervals, confidence regions, and other likelihood-based methods. Empirical likelihood methods extend classical maximum likelihood methods for random samples from a common distribution of known functional form to the situation where the form of the distribution is unknown. In many applications, the observations may be from different distributions having a common mean. For example, if two types of measuring instruments have different variabilities and these instruments are used to obtain data, a fraction of these data

* Corresponding author. Tel.: +1 803 7773788; fax: +1 803 7774048. E-mail address: nglenn@stat.sc.edu (N.L. Glenn).
0167-9473/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2006.07.032

comes from a more variable distribution. Classical robustness considered the problem of estimating the mean in one such case. In particular, Tukey (1960) introduced the contaminated normal family of densities CN(\gamma, \sigma^2):

f_{\gamma,\sigma}(x) = (1-\gamma)\frac{1}{\sqrt{2\pi}} e^{-x^2/2} + \gamma \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/(2\sigma^2)},   (1)

where \gamma, 0 \le \gamma \le 1, is the contamination parameter, and \sigma is the scale parameter. The robustness implications of contaminated data were studied by Lehmann (1983), Gastwirth and Cohen (1970), Andrews and Mallows (1974), and more recently by Taskinen et al. (2003). Contaminated data also arise in regression (Sinha and Wiens, 2002) and in financial data (Ellis et al., 2003).

We compare the properties of the weighted empirical likelihood estimate to those of the usual empirical likelihood, the trimmed mean, and the Winsorized mean in the context of robust estimation. Results indicate that the weighted empirical likelihood estimate is somewhat more robust to contamination than the usual empirical likelihood estimate. As the sample size increases, weighted empirical likelihood estimators are comparable to the trimmed mean and the Winsorized mean. The disadvantage of the trimmed and Winsorized means is that the amount of trim and the number of observations replaced are somewhat arbitrary.

In Section 2, we review the empirical likelihood approach and introduce the weighted empirical likelihood for the mean. The major difference between weighted empirical likelihood and empirical likelihood is that the former incorporates a weight vector that weighs each observation's contribution to the likelihood function. Section 3 details how to obtain the weight vector, presents the nonlinear programming problem, and then solves it to obtain the weighted empirical likelihood estimator for the mean. Section 4 explores the properties of the weighted empirical likelihood estimator when estimating the mean in the contaminated normal model.
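As a quick numerical companion to Eq. (1), the contaminated normal density and a sampler for it can be sketched as follows (a minimal illustration in Python; the function names are ours, not from the paper):

```python
import numpy as np

def contaminated_normal_pdf(x, gamma, sigma):
    """Density of Tukey's contaminated normal CN(gamma, sigma^2), Eq. (1):
    a (1 - gamma)/gamma mixture of N(0, 1) and N(0, sigma^2)."""
    def phi(z, s):  # N(0, s^2) density
        return np.exp(-z**2 / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)
    return (1.0 - gamma) * phi(x, 1.0) + gamma * phi(x, sigma)

def sample_contaminated_normal(n, gamma, sigma, seed=None):
    """Draw n observations: each is N(0, sigma^2) with prob. gamma, else N(0, 1)."""
    rng = np.random.default_rng(seed)
    scale = np.where(rng.random(n) < gamma, sigma, 1.0)
    return rng.standard_normal(n) * scale
```

With gamma = 0 the density reduces to the standard normal, which is a convenient sanity check.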
Concluding remarks are presented in Section 5.

2. Definitions

2.1. Empirical likelihood

For a random sample X = \{X_1, X_2, \ldots, X_n\} of size n, the parametric likelihood

L(\theta) = L(\theta; X_1, X_2, \ldots, X_n) = \prod_{i=1}^n f(X_i; \theta), \quad \theta \in \Theta, if \{X_1, X_2, \ldots, X_n\} are independent,   (2)

is a function of \theta, which takes values in the space \Theta. When the density function f(x; \theta) is unknown, one can use the empirical likelihood method (Owen, 1988, 2001).

Definition 2.1. Suppose \{X_1, X_2, \ldots, X_n\} is a random sample from an unknown distribution. Let the parameter \theta denote the mean. Suppose p_i is the probability mass placed on X_i, with \sum_{i=1}^n p_i = 1 and p_i \ge 0. Let t(p) = \sum_{i=1}^n p_i X_i denote the value t assumes at p. The empirical likelihood (Owen, 1988, 2001) for \theta is defined as

L(\theta) = \max_{p\,:\,t(p)=\theta} \prod_{i=1}^n p_i.   (3)

For all \theta in the convex hull of X,

\max_{p\,:\,t(p)=\theta} \prod_{i=1}^n p_i \le \left(\frac{1}{n}\right)^n = L(\hat\theta),   (4)

where \hat\theta = \sum_{i=1}^n X_i / n. Therefore, L(\hat\theta) = n^{-n} \ge \max_\theta L(\theta). The corresponding empirical log-likelihood is

l(\theta) = \log\left( L(\theta) / L(\hat\theta) \right).   (5)

To determine p_i, 1 \le i \le n, we consider the following nonlinear programming problem:

\max_p \sum_{i=1}^n \log(n p_i) subject to \sum_{i=1}^n p_i X_i = \theta, \sum_{i=1}^n p_i = 1, and p_i \ge 0.   (6)

Example 2.1 illustrates the definition.

Example 2.1. Let \theta equal the population mean \mu. Plot the empirical likelihood for the population mean, Eq. (3), for the small random sample X = \{1, 3, 7\} with sample mean \bar x = 3.67.

Solution: We use an equilateral triangle to represent the probability vector p = (p_1, p_2, p_3) that corresponds to a specific \mu. Since n = 3, p = (p_1, p_2, p_3) can be represented in barycentric coordinates. Each data point is associated with a vertex of an equilateral triangle. Without loss of generality, suppose that X_1 = 1 and X_2 = 3 are associated with the bottom left and right vertices, respectively, and that X_3 = 7 is associated with the top vertex. The p_i's, i = 1, 2, 3, represent probabilities, and p = (p_1, p_2, p_3) gives a multinomial distribution on the n = 3 points. Hence, the probability vectors that correspond to 1 and 3 are (1, 0, 0) and (0, 1, 0), respectively. For \mu values other than \{1, 3, 7\}, a convex combination determines corresponding probability vectors:

p_\gamma = \gamma p_i + (1 - \gamma) p_j,   (7)

where \gamma \in [0, 1] and i, j \in \{1, 2, 3\}, i \ne j. Since the empirical likelihood is supported on the data, values of \mu that lie outside the convex hull of the data are not considered. All probability vectors corresponding to specific mean values lie on lines that intersect distinct sides of the equilateral triangle. Fig. 1 is a plot of the product of the elements of p_\gamma for five possible values of \mu. The plot in Fig. 1 uses integer values for convenience, but real numbers could have been used. The maximum value of each curve is the value of the empirical likelihood for that particular value of the mean.
Hence, the empirical likelihood is a profile likelihood. The empirical likelihood ratio is the empirical likelihood divided by the largest possible value of the empirical likelihood, n^{-n}. Example 2.1 describes one approach for finding the empirical likelihood value for the mean. It is useful to summarize the steps involved in obtaining the empirical likelihood function for the mean:

1. Choose an interval of hypothesized mean values.
2. For each mean value \mu in the interval, find several corresponding probability vectors p_\gamma, \gamma \in [0, 1].
3. For each \mu, graph \gamma vs. the product of the elements of p_\gamma; graph this product for several values of p_\gamma; see Fig. 1.
4. Find the maximum of each curve; see Fig. 1.
5. Each optimal value found in the previous step is the empirical likelihood for the corresponding \mu.

The empirical likelihood ratio function is the empirical likelihood divided by the nonparametric maximum likelihood estimator of the cumulative distribution function; see Fig. 2.
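For readers who prefer a direct numerical route to the same profile values, the maximization in Eq. (6) has a well-known dual solution, p_i = 1/(n(1 + \lambda(X_i - \mu))) with \lambda the root of a monotone equation (Owen, 2001). A minimal Python sketch of this route (our own illustration, not the authors' code):

```python
def el_weights(x, mu, tol=1e-12):
    """Empirical likelihood probabilities maximizing prod(p_i) subject to
    sum(p_i) = 1 and sum(p_i * x_i) = mu, via the standard dual form
    p_i = 1 / (n * (1 + lam * (x_i - mu))) (Owen, 2001)."""
    n = len(x)
    if not (min(x) < mu < max(x)):
        raise ValueError("mu must lie inside the convex hull of the data")

    def g(lam):  # strictly decreasing in lam; its root gives the optimum
        return sum((xi - mu) / (1.0 + lam * (xi - mu)) for xi in x)

    # feasible lam keeps every factor 1 + lam*(x_i - mu) positive
    lo = max(-1.0 / (xi - mu) for xi in x if xi > mu) + 1e-10
    hi = min(-1.0 / (xi - mu) for xi in x if xi < mu) - 1e-10
    while hi - lo > tol:  # bisection: g(lo) > 0 > g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * (xi - mu))) for xi in x]
```

For the data of Example 2.1, `el_weights([1, 3, 7], 11/3)` returns the uniform vector (1/3, 1/3, 1/3), matching the fact that the profile peaks at the sample mean.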

Fig. 1. Plots of \gamma \in [0, 1] vs. the product of the elements of p_\gamma for X = \{1, 3, 7\}, with curves for \mu = 2, 3, 4, 5, 6. The indicated maximums (filled points) are the values of the empirical likelihood function for the corresponding mean values \mu.

Fig. 2. Empirical likelihood ratio curve for the parameter \mu for the data in Example 2.1. Note that the curve reaches its maximum value at the sample mean of 3.67.

Definition 2.1 assumes the X_i are independent and identically distributed. Often one has a sample where the X_i's are independent, but not identically distributed. Choi et al. (2000) use a weighted parametric likelihood in the nonidentical distribution case. Weighted empirical likelihood extends the parametric method of Choi et al. (2000) to the nonparametric setting by incorporating a weight vector that tilts the log-empirical likelihood function, allowing an assignment of differing weights to each term in the sum of the log empirical likelihood ratio objective function. This methodology generalizes Eq. (6).

2.2. Weighted empirical likelihood

To extend the procedure of Choi et al. (2000) to empirical likelihood, we weight the contribution of each data point to the log-empirical likelihood function. Weighting the data's contribution amounts to starting with a discrete uniform distribution on n points and tilting the distribution to achieve a desired empirical log-likelihood. The resulting tilted distribution is a multinomial distribution on n points. Notice that each term in the objective function Eq. (6) receives the

same weight. Each term in an analogous weighted empirical likelihood ratio function receives weight w_i, with \sum_{i=1}^n w_i = 1 and w_i > 0, as demonstrated in the weighted empirical likelihood definition for the mean:

Definition 2.2. Suppose X = \{X_1, X_2, \ldots, X_n\} come from distributions with a common mean \theta and different variances. Suppose p_i is the probability mass placed on X_i, with \sum_{i=1}^n p_i = 1 and p_i \ge 0. Let t(p) = \sum_{i=1}^n p_i X_i denote the value t takes on when p_i is the probability mass placed on X_i. Given the weight vector w, \sum_{i=1}^n w_i = 1, w_i > 0, the weighted empirical likelihood for the parameter \theta is defined as

WEL(\theta) = \max_{p\,:\,t(p)=\theta} \prod_{i=1}^n p_i^{n w_i}.   (8)

For all \theta in the convex hull of X,

\max_{p\,:\,t(p)=\theta} \prod_{i=1}^n p_i^{n w_i} \le \prod_{i=1}^n w_i^{n w_i}.   (9)

Therefore, \max_{p\,:\,t(p)=\theta} \prod_{i=1}^n p_i^{n w_i} \le \prod_{i=1}^n w_i^{n w_i} = WEL(\hat\theta), where \hat\theta = \sum_{i=1}^n w_i X_i. From this, we obtain the weighted empirical log-likelihood ratio for the mean:

wel(\theta) = \log\left( WEL(\theta) / WEL(\hat\theta) \right) = \max_{p\,:\,t(p)=\theta} \sum_{i=1}^n n w_i \log(p_i / w_i),   (10)

where

WEL(\hat\theta) = \prod_{i=1}^n w_i^{n w_i}.   (11)

Each term in the last equality of Eq. (10) receives weight w_i, scaled by n. When w_i = 1/n, (10) reduces to (6).

3. The weighted empirical likelihood estimator

3.1. The weight vector w

Consider the case where \sigma > 1. We use the data-labeling rule of Tietjen and Moore (1972) to detect the contamination. After detecting the contamination, we define the weight vector

w = w_b + c\,d,   (12)

where w_b = (1/n, \ldots, 1/n)^T, c is a scalar, and d is a vector of unit length.

Theorem 3.1. If X = \{X_1, X_2, \ldots, X_n\} is a sample of size n, and k is the number of points labeled as contamination, then d is the vector of unit length defined as

d = \frac{1}{\sqrt{kn/(n-k)}}\, y,   (13)

where y is the rank-ordered vector

y = \Big( \underbrace{\tfrac{k}{n-k}, \ldots, \tfrac{k}{n-k}}_{n-k}, \underbrace{-1, \ldots, -1}_{k} \Big)^T.

The rank-ordered vector y contains n - k elements equal to k/(n - k) and k elements equal to -1, the latter corresponding to the points labeled as contamination. See Appendix A for the proof.

The weight vector decreases the contribution of the contamination to the likelihood function. The scalar c is a positive constant that is bounded above by a function of the sample size.

Theorem 3.2. Suppose X = \{X_1, X_2, \ldots, X_n\} is a sample of size n. Without loss of generality, assume X is contaminated by one point. If w is defined as in Eq. (12), then

0 \le c < \frac{1}{n}\sqrt{\frac{n}{n-1}}.   (14)

See Appendix A for the proof.

3.2. Obtaining weighted empirical likelihood estimators using nonlinear programming

Suppose w is determined as in Section 3.1, p is defined as in Definition 2.2, and \mu is the population mean. Determining a weighted empirical likelihood estimator for \mu begins with solving the following nonlinear programming problem:

\max_p \sum_{i=1}^n n w_i \log(p_i / w_i) subject to \sum_{i=1}^n p_i X_i = \mu, \sum_{i=1}^n p_i = 1, and p_i \ge 0.   (15)

The above nonlinear programming problem yields optimal probability vectors p for weighted empirical likelihood. To formulate the objective function (15), assume that the weight vector w is given; it is computed as explained in Section 3.1. As with empirical likelihood, maximize the product of the elements of the weighted empirical likelihood probability vector, since the maximum indicates the most plausible parameter value. WEL(\theta) and WEL(\hat\theta) are defined as in Eqs. (8) and (11), and the maximum of the weighted empirical likelihood ratio is denoted

\max_p \frac{\prod_{i=1}^n (p_i)^{n w_i}}{\prod_{i=1}^n (w_i)^{n w_i}}.   (16)

As usual, we maximize the logarithm of Eq. (16), which yields

\max_p \sum_{i=1}^n n w_i \log(p_i / w_i).   (17)

The steps outlined in the above paragraphs yield the objective function in Eq. (15). Because the weighted empirical likelihood objective function is a strictly concave function on a convex set of probability vectors, the Karush-Kuhn-Tucker theorem implies that a unique global maximum exists (Nocedal and Wright, 1999). The form of this optimum is derived in Section 3.3.

3.3. Karush-Kuhn-Tucker theorem

Obtaining univariate weighted empirical likelihood point estimators starts with solving the nonlinear programming problem presented in Section 3.2. The first step is to determine the Lagrangian for (15), which is

L(p, \lambda_1, \lambda_2 \mid w, x, \mu) = \sum_{i=1}^n n w_i \log(p_i / w_i) + \lambda_1 \Big(1 - \sum_{i=1}^n p_i\Big) + \lambda_2 \Big(\mu - \sum_{i=1}^n X_i p_i\Big).

In the following paragraphs, we use the above equation to derive the form of the optimal weighted empirical likelihood probability vectors. The optimality conditions for the constrained optimization problem (15) are the Karush-Kuhn-Tucker conditions. Therefore, a first-order necessary condition is that

\nabla L(\hat p, \hat\lambda_1, \hat\lambda_2 \mid w, x, \mu) = 0,   (18)

where \hat p is the optimal probability vector for the weighted empirical likelihood, and \hat\lambda = (\hat\lambda_1, \hat\lambda_2) is a Lagrange multiplier vector such that the Karush-Kuhn-Tucker conditions are satisfied at (\hat p, \hat\lambda).

In order to derive the form of the optimal probability vector for weighted empirical likelihood, derive an expression for the Lagrange multiplier \lambda_1 as follows. Setting the gradient of the Lagrangian, Eq. (18), to zero gives

\frac{\partial L}{\partial p_i} = \frac{n w_i}{p_i} - \lambda_1 - \lambda_2 X_i = 0, \quad i = 1, 2, \ldots, n.   (19)

Multiplying Eq. (19) by p_i yields

n w_i - \lambda_1 p_i - \lambda_2 X_i p_i = 0.   (20)

Since both w and p satisfy the axioms of probability vectors, summing each term in (20) gives n \sum_{i=1}^n w_i - \lambda_1 - \lambda_2 \mu = 0. Therefore,

\lambda_1 = n \sum_{i=1}^n w_i - \lambda_2 \mu   (21)

is an expression for \lambda_1. Solving Eq. (20) for n w_i yields n w_i = \lambda_1 p_i + \lambda_2 X_i p_i.
Replacing \lambda_1 in this equation by the right-hand side of Eq. (21) yields n w_i = (n \sum_{j=1}^n w_j - \lambda_2 \mu) p_i + \lambda_2 X_i p_i. The final step is to solve for p_i. Therefore,

p_i = \frac{n w_i}{n \sum_{j=1}^n w_j - \lambda_2 (\mu - X_i)}   (22)

is the form of the ith element of the optimal probability vector for the weighted empirical likelihood.

To evaluate Eq. (22), we provide values for w_i, \mu, and \lambda_2. First consider w_i, the ith (1 \le i \le n) element of the weight vector w. We use scale estimation to find an appropriate w, as detailed in Section 3.1. A scale estimator is a statistic that is equivariant under scale transformations; that is, if the parameter is transformed, then the answer is also transformed (Glenn, 2002).
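Given the labeling step of Section 3.1, assembling w is mechanical. The sketch below (Python; the function name and index bookkeeping are ours) assumes the Tietjen and Moore (1972) rule has already flagged the contaminated observations, and uses the sign convention reconstructed above, in which flagged points get the negative entries of y and are therefore down-weighted:

```python
import math

def wel_weight_vector(n, contaminated_idx, c):
    """Weight vector w = w_b + c*d of Eq. (12): w_b is the uniform vector
    (1/n, ..., 1/n) and d = y/||y|| is the unit vector of Theorem 3.1,
    where y has n-k entries k/(n-k) (clean points) and k entries -1
    (points flagged as contamination). c must satisfy the bound of Eq. (14)."""
    k = len(contaminated_idx)
    y = [-1.0 if i in contaminated_idx else k / (n - k) for i in range(n)]
    norm_y = math.sqrt(k * n / (n - k))   # ||y|| = sqrt(kn/(n-k))
    return [1.0 / n + c * yi / norm_y for yi in y]
```

Because the entries of y sum to zero, w always sums to one, and any admissible c leaves the flagged points with weight strictly between 0 and 1/n.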

After determining w, we consider the parameter \mu. As the values of \mu must lie in the convex hull of the data, \mu must be in (X_{(1)}, X_{(n)}) in order to satisfy the constraint \sum_{i=1}^n p_i X_i = \mu, where p_i \ge 0 and \sum_{i=1}^n p_i = 1. The global maximum of the weighted empirical likelihood function is attained at \hat\mu. To find \hat\mu given w, note that

wel(\mu \mid w, x) = \sum_{i=1}^n n w_i \log\left( \frac{\hat p_i(\mu)}{w_i} \right),   (23)

where \hat p_i(\mu) is the ith element of \hat p; it satisfies the constraints \sum_{i=1}^n p_i = 1 and \sum_{i=1}^n p_i X_i = \mu for a fixed \mu. Therefore,

\hat\mu = \arg\max_\mu\, wel(\mu \mid w, x).   (24)

The Lagrange multiplier \lambda_2 is chosen such that the constraints

\sum_{i=1}^n p_i = 1   (25)

and

\sum_{i=1}^n p_i X_i = \mu   (26)

are satisfied. Suppose p_i is defined as in Eq. (22). Let \hat\lambda_2 be a Lagrange multiplier value that allows the constraints (25) and (26) to be satisfied. The first of three steps for choosing \hat\lambda_2 is to recall Eq. (22), the form of the elements of the weighted empirical likelihood's optimal probability vector. From Eqs. (22) and (25) we define the function

f(\lambda_2) = \sum_{i=1}^n \frac{n w_i}{n \sum_{j=1}^n w_j - \lambda_2 (\mu - X_i)} - 1.   (27)

The Lagrange multiplier \hat\lambda_2 is a root of f(\lambda_2) = 0 that satisfies the constraints (25) and (26); hence f(\hat\lambda_2) = 0. Notice that \lambda_2 = 0 is always a solution of Eq. (27). In order to find the root \hat\lambda_2 of f(\lambda_2), take the derivative

f'(\lambda_2) = \sum_{i=1}^n \frac{n w_i (\mu - X_i)}{\left\{ n \sum_{j=1}^n w_j - \lambda_2 (\mu - X_i) \right\}^2},   (28)

then evaluate it at \lambda_2 = 0. The process of finding the root \hat\lambda_2 of f(\lambda_2) separates into three cases. All three cases use the fact that \lambda_2 = 0 is always a solution; in Cases II and III the second root is negative or positive, respectively.

Case I: f'(0) = 0. In this case, Eq. (27) has one root. Therefore, \hat\lambda_2 = 0.
Case II: f'(0) > 0. This implies that \sum_{i=1}^n w_i (\mu - X_i) > 0. Therefore, \hat\lambda_2 < 0.
Case III: f'(0) < 0. This implies that \sum_{i=1}^n w_i (\mu - X_i) < 0. Therefore, \hat\lambda_2 > 0.

From the two previous steps, one can infer whether \hat\lambda_2 is positive, negative, or zero.
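The case analysis above, combined with the bisection step that the paper describes next, can be sketched as a single routine. The Python sketch below is our own illustration: it takes the search interval from the requirement that every denominator in Eq. (22) stay positive (rather than from the printed \lambda_\pm bounds), and exploits the fact that, on that interval, Eq. (27) defines a strictly convex function vanishing at zero, so the nonzero root lies on the side indicated by the sign of \sum_i w_i(\mu - X_i):

```python
def wel_weights(x, w, mu, tol=1e-10):
    """Optimal WEL probabilities, Eq. (22) with sum(w) = 1 and
    min(x) < mu < max(x): p_i = n*w_i / (n - lambda_2*(mu - x_i)).
    lambda_2 = 0 is always a root of Eq. (27), so we bisect for the
    nonzero root on the side given by Cases I-III of Section 3.3."""
    n, eps = len(x), 1e-9

    def g(lam):  # Eq. (27); strictly convex in lam, with g(0) = 0
        return sum(n * wi / (n - lam * (mu - xi)) for xi, wi in zip(x, w)) - 1.0

    s = sum(wi * (mu - xi) for xi, wi in zip(x, w))
    if abs(s) < eps:              # Case I: mu is the weighted mean
        lam = 0.0
    else:
        if s > 0:                 # Case II: nonzero root is negative
            a, b = n / (mu - max(x)) + eps, -eps
        else:                     # Case III: nonzero root is positive
            a, b = eps, n / (mu - min(x)) - eps
        a_neg = g(a) < 0          # keep the sign change inside [a, b]
        while b - a > tol:
            m = 0.5 * (a + b)
            if (g(m) < 0) == a_neg:
                a = m
            else:
                b = m
        lam = 0.5 * (a + b)
    return [n * wi / (n - lam * (mu - xi)) for xi, wi in zip(x, w)]
```

At the computed root, \sum_i p_i = 1 holds by construction, and (as the paper notes for Eqs. (25) and (26)) \sum_i p_i X_i = \mu follows automatically whenever \lambda_2 \ne 0.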
The third and final step for choosing \hat\lambda_2 is to find its exact value. We apply the bisection method to

g(\lambda_2) = \sum_{i=1}^n \frac{n w_i}{n \sum_{j=1}^n w_j - \lambda_2 (\mu - X_i)} - 1 = 0,   (29)

which uses the lower (\lambda_-) and upper (\lambda_+) bounds of \lambda_2. To derive the bounds, notice that

\mu > X_i implies \lambda_2 < 1/(\mu - X_i);
\mu < X_i implies \lambda_2 > 1/(\mu - X_i);
\mu = X_i leaves \lambda_2 unrestricted.

The bounds for \lambda_2 are \lambda_- = 1/(\mu - X_{(n)}) and \lambda_+ = 1/(\mu - X_{(1)}). Therefore, \hat\lambda_2 is the value of \lambda_2 that lies in the interval (\lambda_-, \lambda_+) and allows the constraints in Eqs. (25) and (26) to be satisfied. The above argument focuses on Eq. (25); however, when Eq. (25) is satisfied, Eq. (26) is implicitly satisfied.

4. Empirical results

The root mean squared error of an estimator, the square root of the average squared difference between the estimator and the parameter, is a good measure of the performance of an estimator. To compare the robustness properties of the weighted empirical likelihood estimator for the mean with those of the empirical likelihood estimator for the mean, the trimmed mean, and the Winsorized mean, we compute point estimators for the mean for the stated estimators for various sample sizes. We then compare the root mean squared errors of the mean in each case.

In the first case, the samples are not contaminated; see the first row of Tables 1-4. We simulate 1000 samples of sizes n = 10, 20, 50, 100 from a standard normal distribution. In the second case, the level of contamination is 10%: we simulate 1000 samples of sizes n = 10, 20, 50, 100 from a contaminated normal distribution in which an observation is from a N(0, 9) distribution with probability 0.10 and otherwise from a N(0, 1) distribution. In the third case, the level of contamination is 20%: we simulate 1000 samples of sizes n = 10, 20, 50, 100 from a contaminated normal distribution in which an observation is from a N(0, 9) distribution with probability 0.20 and otherwise from N(0, 1).
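The simulation design above is easy to replay. The sketch below is our own Python illustration rather than the authors' S-PLUS program, and it reduces the estimator set to the sample mean (which is the empirical likelihood point estimate of the mean) and a trimmed mean; the WEL estimator itself is omitted:

```python
import math
import random
import statistics

def rmse_of_estimators(n, gamma, sigma, reps=1000, trim=0.05, seed=1):
    """Monte Carlo RMSE (true mean 0) of the sample mean and a
    'trim'-proportion trimmed mean under the contaminated normal model
    of Section 4: each observation is N(0, sigma^2) w.p. gamma, else N(0, 1)."""
    rng = random.Random(seed)
    sq_mean = sq_trim = 0.0
    for _ in range(reps):
        x = sorted(rng.gauss(0.0, sigma if rng.random() < gamma else 1.0)
                   for _ in range(n))
        k = int(trim * n)                       # observations trimmed per tail
        core = x[k:n - k] if k > 0 else x
        sq_mean += statistics.fmean(x) ** 2
        sq_trim += statistics.fmean(core) ** 2
    return math.sqrt(sq_mean / reps), math.sqrt(sq_trim / reps)
```

The simulated RMSE of the sample mean can be checked against the theoretical value \sqrt{((1-\gamma) + \gamma\sigma^2)/n} from Eq. (30); for n = 10, \gamma = 0.10, \sigma^2 = 9 that value is 0.4242641, matching the second row of Table 1.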
For each level of contamination, Tables 1-4 contain the theoretical root mean squared error of the mean and the root mean squared errors for the weighted empirical likelihood, empirical likelihood, trimmed, and Winsorized estimators of the mean. The tables also contain the fraction of times the Shapiro-Wilk test fails to reject the null hypothesis of standard normality. Simulations were carried out with an S-PLUS program that is available from the authors.

When there is no contamination, the theoretical root mean squared error of the mean given n = 10 and \sigma^2 = 1 is \sigma_{\bar X} = \sqrt{\sigma^2/n} = \sqrt{1/10} = 0.3162278. In the case of contamination, the theoretical mean squared error of the mean is

MSE = \left( (1 - \gamma) \cdot 1 + \gamma \sigma^2 \right) / n.   (30)

The theoretical root mean squared error of the mean is the square root of Eq. (30). To simulate the root mean squared error of the weighted empirical likelihood, empirical likelihood, trimmed, and Winsorized means, we estimate the mean for each sample using each method. For each method we square the errors, sum the squared errors, then divide by the number of simulations; finally, we take the square root of this average of squared errors.

The simulation study indicates that as the contamination level increases, the empirical likelihood's root mean squared error increases. However, the weighted empirical likelihood's root mean squared error remains closer to the theoretical root mean squared error. As the sample size increases, the Winsorized mean is comparable to weighted empirical likelihood. In some cases involving larger sample sizes, the trimmed mean performs better than the weighted empirical likelihood estimator; however, the amount of trim is somewhat arbitrary. We chose 5% as it is typically used. When contamination is not present, weighted empirical likelihood reduces to empirical likelihood.
The Shapiro-Wilk test, which is commonly regarded as one of the most powerful tests of normality, has low power to differentiate a contaminated normal from a standard normal distribution in all cases. This indicates that the weighted empirical likelihood estimator for the mean is superior to the empirical likelihood estimator for the mean when the Shapiro-Wilk test is incapable of differentiating a contaminated normal from a standard normal distribution. Therefore, the weighted empirical likelihood is less sensitive to contamination than the empirical likelihood. However, the relative robustness of weighted empirical likelihood did not increase with the amount of contamination. In an unreported simulation we kept track of which observations came from the contaminant, and found that the robustness of the weighted empirical likelihood relative to that of the empirical likelihood increased with the level of contamination.

Table 1
Results from simulating 1000 samples of size n = 10 from a N(0, 1) distribution, and 1000 samples of size n = 10 from a contaminated normal distribution

Cont. (%)   RMSE        WEL (EL)                TM (WM)                 SW
0           0.3162278   0.3278945 (0.3278945)   0.3278945 (0.3278945)   0.042
10          0.4242641   0.3885378 (0.4211143)   0.4281171 (0.4281171)   0.165
20          0.5099020   0.4982628 (0.5228110)   0.5209232 (0.5209232)   0.196

The contamination is from a N(0, 9) distribution with probabilities 0.10 and 0.20. RMSE is the theoretical root mean squared error of the mean. WEL is the root mean squared error for the weighted empirical likelihood estimate of the mean; the empirical likelihood's root mean squared error is in parentheses. TM is the root mean squared error for the trimmed mean; the root mean squared error for the Winsorized mean is in parentheses. SW is the fraction of times the Shapiro-Wilk test rejects the null hypothesis of standard normality.

Table 2
Results from simulating 1000 samples of size n = 20 from a N(0, 1) distribution, and 1000 samples of size n = 20 from a contaminated normal distribution

Cont. (%)   RMSE        WEL (EL)                TM (WM)                 SW
0           0.2236067   0.2258830 (0.2258830)   0.2258830 (0.2258830)   0.046
10          0.3000000   0.2882647 (0.3116531)   0.2760253 (0.2899979)   0.264
20          0.3605551   0.3449832 (0.3625126)   0.3152415 (0.3417611)   0.347

The contamination is from a N(0, 9) distribution with probabilities 0.10 and 0.20. RMSE is the theoretical root mean squared error of the mean. WEL is the root mean squared error for the weighted empirical likelihood estimate of the mean; the empirical likelihood's root mean squared error is in parentheses. TM is the root mean squared error for the trimmed mean; the root mean squared error for the Winsorized mean is in parentheses. SW is the fraction of times the Shapiro-Wilk test rejects the null hypothesis of standard normality.
Table 3
Results from simulating 1000 samples of size n = 50 from a N(0, 1) distribution and 1000 samples of size n = 50 from a contaminated normal distribution.

Cont. (%)   RMSE        WEL (EL)                TM (WM)                 SW
 0          0.1414213   0.1427533 (0.1427533)   0.1427533 (0.1427533)   0.996
10          0.1897367   0.1703024 (0.1799723)   0.1595506 (0.1676465)   0.996
20          0.2280351   0.2240984 (0.2320960)   0.2026324 (0.2223487)   0.998

Note: The contamination and the column definitions are as in Table 1.

Table 4
Results from simulating 1000 samples of size n = 100 from a N(0, 1) distribution and 1000 samples of size n = 100 from a contaminated normal distribution.

Cont. (%)   RMSE        WEL (EL)                TM (WM)                 SW
 0          0.1000000   0.1008119 (0.1008119)   0.1008119 (0.1008119)   0.065
10          0.1341641   0.1275518 (0.1322816)   0.1156860 (0.1208304)   0.633
20          0.1612452   0.1591750 (0.1629989)   0.1384336 (0.1532082)   0.815

Note: The contamination and the column definitions are as in Table 1.

5. Discussion

We construct the weighted empirical likelihood function, a nonparametric likelihood function suitable for data that are independent but not necessarily identically distributed. We compare the robustness properties of the weighted empirical likelihood to those of the empirical likelihood, the trimmed mean, and the Winsorized mean by comparing the root mean

squared errors of the mean for each method. Using the contaminated normal distribution, we showed that as the level of contamination increases, the weighted empirical likelihood's root mean squared error for the mean remains closer to the theoretical root mean squared error for the mean than the empirical likelihood's does. The weighted empirical likelihood is comparable to the trimmed mean and the Winsorized mean as the sample size increases. We also conducted a Shapiro-Wilk test on the data. The Shapiro-Wilk test was often unable to distinguish a contaminated normal from a standard normal when the sample sizes were small. When the Shapiro-Wilk test did not detect nonnormality, the weighted empirical likelihood estimator was less sensitive to contamination than the empirical likelihood estimator. In future research, we will derive large-sample properties of the weighted empirical likelihood estimator.

Acknowledgments

The authors thank Professor Joseph Gastwirth of the Department of Statistics, George Washington University, for suggestions regarding robustness. The authors also thank Professors David W. Scott and Katherine Ensor of the Department of Statistics, Rice University, and Professor Yin Zhang of the Department of Computational and Applied Mathematics, Rice University.

Appendix A. Proofs

Proof of Theorem 3.1. By the definition of a unit vector, $d = (1/\|y\|)\,y$. Since $\|y\|^2 = kn/(n-k)$,
\[
d = \sqrt{\frac{n-k}{kn}}\left(\underbrace{\frac{k}{n-k},\ldots,\frac{k}{n-k}}_{n-k},\ \underbrace{1,\ldots,1}_{k}\right)^{\mathsf T},
\]
and hence
\[
\|d\|^2 = \frac{n-k}{kn}\left[(n-k)\,\frac{k^2}{(n-k)^2} + k\right] = \frac{n-k}{kn}\cdot\frac{kn}{n-k} = 1.
\]
This completes the proof.

Proof of Theorem 3.2. Without loss of generality, assume that $X_n$ is the contamination. From Eq. (12),
\[
w = \frac{1}{n}\,(1,\ldots,1)^{\mathsf T} + c\left(\frac{1}{n},\ldots,\frac{1}{n},\,-\frac{n-1}{n}\right)^{\mathsf T}. \tag{31}
\]
Therefore
\[
w[n] = \frac{1}{n} - c\,\frac{n-1}{n} \tag{32}
\]
is the element of $w$ that corresponds to $X_n$. In order for the contamination to receive a weight less than or equal to that of the other data, the following must hold:
\[
0 < w[n] \le \frac{1}{n}. \tag{33}
\]
Substituting Eq. (32) into Eq. (33) yields the following inequality:
\[
0 < \frac{1}{n} - c\,\frac{n-1}{n} \le \frac{1}{n}. \tag{34}
\]

Solving (34) for $c$ yields
\[
0 \le c < \frac{1}{n}\cdot\frac{n}{n-1} = \frac{1}{n-1}. \tag{35}
\]
This completes the proof.
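The constraint logic of Theorem 3.2 can be checked numerically with exact rational arithmetic. This is a sketch under the assumption (ours, reconstructed from the proof) that the perturbation in Eq. (31) has entries 1/n on the first n − 1 coordinates and −(n − 1)/n on the last; under that assumption the perturbation sums to zero, so the weights always sum to one, and the contaminated point's weight lies in (0, 1/n] exactly for 0 ≤ c < 1/(n − 1).

```python
from fractions import Fraction

def perturbed_weights(n, c):
    # w = (1/n) * 1 + c * v with v = (1/n, ..., 1/n, -(n-1)/n);
    # v sums to zero, so the weights always sum to one
    base = Fraction(1, n)
    v = [Fraction(1, n)] * (n - 1) + [-Fraction(n - 1, n)]
    return [base + c * vi for vi in v]

def contamination_weight_ok(n, c):
    # Eq. (33): the weight on the contaminated point X_n must lie in (0, 1/n]
    w = perturbed_weights(n, c)
    assert sum(w) == 1
    return 0 < w[-1] <= Fraction(1, n)

n = 10
print(contamination_weight_ok(n, Fraction(0)))         # c = 0: w[n] = 1/n      -> True
print(contamination_weight_ok(n, Fraction(1, n)))      # 0 < 1/n < 1/(n-1)      -> True
print(contamination_weight_ok(n, Fraction(1, n - 1)))  # c = 1/(n-1): w[n] = 0  -> False
```

The boundary behavior matches Eq. (35): the constraint fails exactly when c reaches 1/(n − 1).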