Handling Missing Data on Asymmetric Distribution

International Mathematical Forum, Vol. 8, 2013, no. 4, 53-65

Ahmad M. H. Al-Khazaleh
Department of Mathematics, Faculty of Science
Al-albayt University, Al-Mafraq-Jordan
amed_005k@yaoo.com

Abstract

The problem of imputing missing observations arises in many areas. Data often contain missing observations because of factors such as machine failures and human error. An incomplete dataset usually causes bias due to differences between observed and unobserved data. This paper proposes using the Neyman allocation method to estimate the asymmetric winsorized mean for handling missing observations when the data follow the exponential distribution. Different values of the exponential distribution parameter were used for illustration. A set of data from the exponential distribution was generated to compare the performance of existing methods, namely regression trend, average of the whole data, naive forecast and average bound of the holes, with the proposed Neyman allocation method. The goodness-of-fit criteria used were the mean absolute error (MAE) and the mean squared error (MSE). The proposed method gave the best fit, in the sense of having the smaller error, in particular for a large percentage of missing observations.

Keywords: Neyman Allocation, Missing Data, Single imputation methods, Winsorized mean.

1. INTRODUCTION

The presence of missing observations in statistical survey data is an important issue to deal with (Little and Rubin, 1987). Besides missing observations, time series data are possibly contaminated by outliers or are heterogeneous. Incomplete datasets may lead to results that are different from those that would have been obtained from a complete dataset (Hawthorne and Elliot, 2005).

This paper proposes a method of substituting these missing observations by using the asymmetric winsorized mean. In this method, we prepare the whole data by dividing it into groups, using stratified sampling to determine the boundaries among the strata. Stratification is the process of grouping members of the population into relatively homogeneous groups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum (Cyert and Davidson, 1962). One of the main objectives of stratified sampling is to reduce the variance of the estimator and to obtain more statistical precision than with simple random sampling (Cochran, 1977; Hanif, 2000; Hess et al., 1966; Sheikh and Ahmad, 2001). In this paper, the Neyman allocation method is used to determine the boundaries of the groups.

The missing observations problem is an old one in data analysis. The waste of data that can result from casewise deletion of missing values makes it necessary to propose alternative approaches. A problem frequently encountered in data collection is that observations are missing, or may be virtually impossible to obtain, because of either time or cost constraints. To replace those observations, several options are available to researchers (Pankratz, 1983; Patricia, 1994). First, replace the missing observations with the mean of the series. Second, replace the missing observations with the naive forecast. Third, replace the missing observations with a simple trend forecast. Finally, replace the missing observations with the average of the last two known observations that bound the missing observations.

2. NEYMAN ALLOCATION

Neyman allocation is a sample allocation method that may be used with stratified samples. We used Neyman allocation to determine the stratum boundaries; it should be used when the population is highly skewed (Cochran, 1977; Ahmad Mahir et al., 2007; Samiuddin et al., 1998).
The suffix $h$ denotes the stratum and $i$ the unit within the stratum. When a population of $N$ units is stratified into $L$ strata and the samples from each stratum are selected by simple random sampling, an unbiased estimate of the population mean for the estimation variable $x$ is $\bar{x}_{st}$ (st for stratified), where

$$\bar{x}_{st} = \frac{1}{N}\sum_{h=1}^{L} N_h \bar{x}_h = \sum_{h=1}^{L} W_h \bar{x}_h. \qquad (2.1)$$

For stratified random sampling, the variance of the estimate $\bar{x}_{st}$ is

$$V(\bar{x}_{st}) = \frac{1}{N^2}\sum_{h=1}^{L} \frac{N_h^2 S_h^2}{n_h} = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h}. \qquad (2.2)$$

Under Neyman allocation the variance reduces to

$$V_{Ney}(\bar{x}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_h\right)^2. \qquad (2.3)$$

Let $x_0$ and $x_L$ be the smallest and largest values of $x$ in the population. The problem is to find intermediate stratum boundaries $y_1, y_2, \ldots, y_{L-1}$ that minimize $V_{Ney}(\bar{x}_{st})$. We obtain the minimum by differentiating with respect to each stratum boundary $y_h$, since $y_h$ appears in the sum only in the terms $W_h S_h$ and $W_{h+1} S_{h+1}$ (Cochran, 1977). Hence we have the formula for finding the optimum stratum boundaries:

$$\frac{(y_h - \mu_h)^2 + S_h^2}{S_h} = \frac{(y_h - \mu_{h+1})^2 + S_{h+1}^2}{S_{h+1}}, \qquad h = 1, 2, 3, \ldots, L-1. \qquad (2.6)$$

Suppose that the distribution of $x$ is continuous with density function $f(x)$, $a < x < b$. To make $L$ strata (groups), the range of $x$ is cut at the points $y_1 < y_2 < y_3 < \ldots < y_{L-1}$. The relative frequency $W_h$, the population mean $\mu_h$ and the population variance $S_h^2$ of the $h$-th stratum are given by

$$W_h = \int_{y_{h-1}}^{y_h} f(x)\,dx \qquad (2.7)$$

$$\mu_h = \frac{1}{W_h}\int_{y_{h-1}}^{y_h} x f(x)\,dx \qquad (2.8)$$

$$S_h^2 = \frac{1}{W_h}\int_{y_{h-1}}^{y_h} x^2 f(x)\,dx - \mu_h^2 \qquad (2.9)$$

These equations are difficult to solve, since $\mu_h$ and $S_h$ depend on $y_h$. We must use an iterative method, implemented in a C++ computer program (Ahmad Mahir et al., 2007). We divide the whole data into groups by putting $(y_0, y_1)$ into group one, where $y_0 = x_0$, $(y_1, y_2)$ into group two, and so on.

3. WINSORIZED MEAN

The winsorized mean is a winsorized statistical measure of central tendency, like the mean and the median, and even more similar to the truncated mean (Mingxin and Yijun, 2009; Barnett and Lewis, 1994). The winsorized mean eliminates the outliers at both ends of an ordered set of observations. Unlike the trimmed mean, the winsorized mean replaces the outliers with observed values rather than discarding them. It involves calculating the mean after replacing given parts of a probability distribution at the high and low ends with the most extreme remaining values. The winsorized mean is defined by

$$\bar{x}_w = \frac{1}{n}\left[(r+1)\,x_{r+1} + \sum_{i=r+2}^{n-s-1} x_i + (s+1)\,x_{n-s}\right], \qquad (3.1)$$

which means that the winsorized mean is the average of the ordered observations in which the $r$ smallest values are replaced by the $(r+1)$-th smallest value $x_{r+1}$, and the $s$ largest values are replaced by the $(s+1)$-th largest value $x_{n-s}$ (Barnett and Lewis, 1994).

We determine a suitable number of groups (there must be more than two) for the whole data by using Equation (2.6), which gives the boundaries of the first and the last groups. We then count the number of observations inside these two groups: the number of observations in the first group is $r$ and the number of observations in the last group is $s$. The asymmetric winsorized mean can then be calculated using Equation (3.1).

The data were generated from the exponential distribution. The whole data were then ordered from the smallest to the largest; the smallest value is zero and the largest value is 7.. The data were assumed to have 5%, 10%, and 20% randomly missing observations in this study.

4. THE BOUNDARIES OF GROUPS IN EXPONENTIAL DISTRIBUTION

Suppose that the time series $x_1, x_2, \ldots, x_n$ of $n$ observations comes from an exponential distribution with probability density function

$$f(x) = \lambda e^{-\lambda x}, \qquad \lambda > 0,\ x > 0, \qquad (4.1)$$

where $\lambda$ is the parameter of the exponential distribution. To make $L$ strata (groups), the domain of $f(x)$ was truncated from $(0, \infty)$ to $(0, b]$, where $b$ is the largest value of $x$.
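The simulation setup described above (exponential data with a fraction of observations removed at random) could be sketched as follows. This is only an illustration: the sample size `n`, the seed and the function name `simulate_with_missing` are assumptions, since the source does not state how many observations were generated or how the random selection was done.

```python
import random

def simulate_with_missing(n, lam, missing_fraction, seed=0):
    """Draw n exponential(lam) values and mark a random subset as missing.

    Returns (full, observed); observed[i] is None where the value is missing.
    n and seed are illustrative choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    full = [rng.expovariate(lam) for _ in range(n)]
    n_missing = round(missing_fraction * n)
    missing_idx = set(rng.sample(range(n), n_missing))
    observed = [None if i in missing_idx else v for i, v in enumerate(full)]
    return full, observed

# Example: 5% missing observations with lambda = 0.2 (assumed n = 300).
full, observed = simulate_with_missing(n=300, lam=0.2, missing_fraction=0.05)
b = max(full)  # upper truncation point b used in Section 4
```

Keeping the complete series `full` alongside `observed` makes it possible to compare imputed values with the true ones later.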
From Equations (2.7), (2.8) and (2.9), the relative frequency $W_h$, mean $\mu_h$ and variance $S_h^2$ of the $h$-th stratum can be computed for the exponential distribution as follows:

$$W_h = e^{-\lambda y_{h-1}} - e^{-\lambda y_h} \qquad (4.2)$$

$$\mu_h = \frac{\left(y_{h-1} + \frac{1}{\lambda}\right) e^{-\lambda y_{h-1}} - \left(y_h + \frac{1}{\lambda}\right) e^{-\lambda y_h}}{e^{-\lambda y_{h-1}} - e^{-\lambda y_h}} \qquad (4.3)$$

$$S_h^2 = \frac{\left(y_{h-1}^2 + \frac{2 y_{h-1}}{\lambda} + \frac{2}{\lambda^2}\right) e^{-\lambda y_{h-1}} - \left(y_h^2 + \frac{2 y_h}{\lambda} + \frac{2}{\lambda^2}\right) e^{-\lambda y_h}}{e^{-\lambda y_{h-1}} - e^{-\lambda y_h}} - \mu_h^2 \qquad (4.4)$$

Table 1 Stratum boundaries for three to eight groups in the case of the exponential distribution when λ = 0.2

Boundaries  3G        4G        5G        6G        7G        8G
y1          3.95      .306      .8005     .4783     .539      .0887
y2          7.9365    5.303     3.9989    3.4       .688      .307
y3          -         9.699     6.834     5.365     4.3633    3.7033
y4          -         -         0.783     7.984     6.37757   5.39
y5          -         -         -         .6395     8.904     7.547
y6          -         -         -         -         .98       9.654
y7          -         -         -         -         -         .83

Table 2 Stratum boundaries for three to eight groups in the case of the exponential distribution when λ = 0.4

Boundaries  3G        4G        5G        6G        7G        8G
y1          .884      .3550     .0594     0.8699    0.7379    0.6407
y2          4.9603    3.33      .4064     .94       .604      .3759
y3          -         6.63      4.593     3.66      .659      .383
y4          -         -         7.567     5.0994    3.986     3.84
y5          -         -         -         8.0536    5.808     4.60
y6          -         -         -         -         8.730     6.4054
y7          -         -         -         -         -         9.74

Table 3 Stratum boundaries for three to eight groups in the case of the exponential distribution when λ = 0.6

Boundaries  3G        4G        5G        6G        7G        8G
y1          .73       0.969     0.776     0.5898    0.5007    0.435
y2          3.377     .886      .634      .307      .0903     0.9356
y3          -         4.87      .905      .33       .8074     .55
y4          -         -         5.008     3.4935    .73       .49
y5          -         -         -         5.5878    3.993     3.570
y6          -         -         -         -         6.084     4.453
y7          -         -         -         -         -         6.543
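The closed forms (4.2) to (4.4) and the boundary condition (2.6) can be checked numerically. The sketch below is not the author's C++ program: `solve_boundaries` is a hypothetical solver that updates one boundary at a time by bisection, and the truncation point `b = 17.0` in the usage line is an illustrative assumption.

```python
import math

def stratum_moments(lam, lo, hi):
    """Return (W_h, mu_h, S_h^2) of the stratum (lo, hi], Equations (4.2)-(4.4)."""
    e_lo, e_hi = math.exp(-lam * lo), math.exp(-lam * hi)
    w = e_lo - e_hi                                            # (4.2)
    mu = ((lo + 1 / lam) * e_lo - (hi + 1 / lam) * e_hi) / w   # (4.3)
    second = ((lo**2 + 2 * lo / lam + 2 / lam**2) * e_lo
              - (hi**2 + 2 * hi / lam + 2 / lam**2) * e_hi) / w
    return w, mu, second - mu**2                               # (4.4)

def residual(lam, left, y, right):
    """Left side minus right side of Equation (2.6) at boundary y."""
    _, mu1, v1 = stratum_moments(lam, left, y)
    _, mu2, v2 = stratum_moments(lam, y, right)
    return (((y - mu1)**2 + v1) / math.sqrt(v1)
            - ((y - mu2)**2 + v2) / math.sqrt(v2))

def solve_boundaries(lam, b, n_groups, sweeps=500, tol=1e-10):
    """Hypothetical solver: refine each interior boundary in turn by bisection."""
    cuts = [b * k / n_groups for k in range(n_groups + 1)]  # equal-width start
    for _ in range(sweeps):
        shift = 0.0
        for h in range(1, n_groups):
            left, right = cuts[h - 1], cuts[h + 1]
            pad = 1e-3 * (right - left)      # stay clear of zero-width strata
            lo, hi = left + pad, right - pad
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                # The residual is negative when the left stratum is too narrow.
                if residual(lam, left, mid, right) < 0:
                    lo = mid
                else:
                    hi = mid
            new = 0.5 * (lo + hi)
            shift = max(shift, abs(new - cuts[h]))
            cuts[h] = new
        if shift < tol:
            break
    return cuts[1:-1]

# Two interior boundaries for three groups (assumed b = 17.0).
boundaries = solve_boundaries(lam=0.2, b=17.0, n_groups=3)
```

Updating one boundary while holding its neighbours fixed keeps each step a one-dimensional root search; other iterative schemes would serve equally well.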

The number of groups or strata depends on the size of the population. If we want to divide the whole data into three groups, we need to calculate two boundaries, $y_1$ and $y_2$, since $x_0 = y_0 = 0$ and $y_3 = 7.$, which is the largest value. The boundary $y_1$ depends on $\mu_1$, $\mu_2$, $S_1$ and $S_2$; likewise, $y_2$ depends on $\mu_2$, $\mu_3$, $S_2$ and $S_3$. We calculated these values by applying Equation (2.6) together with Equations (2.8) and (2.9), and in the calculation of the boundaries we assumed the parameter of the exponential distribution to be λ = 0.2, 0.4, and 0.6. If we divide the whole data into four groups, we need to calculate three boundaries, $y_1$, $y_2$ and $y_3$. Similarly, we find the boundaries of the strata (groups) for five, six, seven and eight groups in the same way by using the C++ program. Tables 1, 2, and 3 show the boundaries for all groups for λ = 0.2, 0.4, and 0.6, respectively.

5. SUMMARY AND DISCUSSION

The two parameters $(r, s)$ of the asymmetric winsorized mean were determined as follows when the whole data were divided into three to eight groups. We assumed that the data had different percentages of missing observations, namely 5%, 10%, and 20%, selected at random. In the first case, when the data have 5% missing observations, the asymmetric winsorized mean was calculated as follows. Suppose the whole data, with 5% missing observations and λ = 0.2, are divided into three groups. The first group has the interval (0, 3.95), the second group has the interval (3.95, 7.9365) and the third group has the interval (7.9365, 7.) (see Table 1). Since we are interested in the first and the last groups, we count the number of observations in both. The number of observations in the first group was 9 and the number of observations in the third group was 55, as shown in Table 5a. Finally, the asymmetric winsorized mean was calculated by taking r = 9 and s = 55 in Equation (3.1), which gives 5.0037.
The same procedure was followed to compute the asymmetric winsorized mean for the other groups with 5% missing observations, as shown in Tables 5b and 5c.
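The procedure just described (count the observations in the first and last groups, then apply Equation (3.1)) can be sketched as below. The function names are hypothetical, and treating an observation that falls exactly on a boundary as belonging to the lower group is an assumption, since the source does not say how ties are handled.

```python
def asymmetric_winsorized_mean(values, r, s):
    """Equation (3.1): replace the r smallest and the s largest observations."""
    x = sorted(values)
    n = len(x)
    if r + s > n - 2:
        raise ValueError("need r + s <= n - 2 so that x_{r+1} and x_{n-s} differ")
    low, high = x[r], x[n - s - 1]     # x_{r+1} and x_{n-s} in 1-based notation
    middle = x[r + 1:n - s - 1]        # x_{r+2}, ..., x_{n-s-1}
    return ((r + 1) * low + sum(middle) + (s + 1) * high) / n

def winsorize_by_boundaries(observed, first_boundary, last_boundary):
    """Count r and s from the first and last group, then apply Equation (3.1).

    observed must contain only the available (non-missing) values.
    """
    r = sum(1 for v in observed if v <= first_boundary)   # first group
    s = sum(1 for v in observed if v > last_boundary)     # last group
    return r, s, asymmetric_winsorized_mean(observed, r, s)

# Small worked example: r = 2 and s = 2, so the two remaining central values
# 2 and 3 each count three times: (3*2 + 3*3) / 6 = 2.5.
r, s, w_mean = winsorize_by_boundaries([0.5, 1, 2, 3, 9, 10], 1.0, 8.0)
```

In a single-imputation setting, each missing observation would then be replaced by the value `w_mean`.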

Table 5a Values of the two parameters and the asymmetric winsorized mean when λ = 0.2 and 5% missing observations, for three to eight groups

r         9 65 46 3 5
s         55 5 3 9 8
W(r, s)   5.0037 4.977 4.9573 4.9303 4.948 4.949

Table 5b Values of the two parameters and the asymmetric winsorized mean when λ = 0.4 and 5% missing observations, for three to eight groups

r         48 8 7 3
s         70 09 76 5 4 3
W(r, s)   3.933 4.343 4.5547 4.675 4.749 4.803

Table 5c Values of the two parameters and the asymmetric winsorized mean when λ = 0.6 and 5% missing observations, for three to eight groups

r         6 8 9
s         67 08 69 39 6 00
W(r, s)   .93 3.530 3.866 4.085 4.45 4.3604

Moreover, for the data with 10% missing observations, when the whole data were divided into three groups with λ = 0.2, the number of observations in the first group was 9, which was the value of r, and the number of observations in the last group was 50, which was the value of s; the asymmetric winsorized mean was then equal to 4.9483. In the same way, the asymmetric winsorized mean was calculated for the other groups. Tables 6a, 6b, and 6c show the numbers of observations in the first and last groups for the different values of the exponential distribution parameter, together with the values of the asymmetric winsorized mean.

Table 6a Values of the two parameters and the asymmetric winsorized mean when λ = 0.2 and 10% missing observations, for three to eight groups

r         9 67 48 3 4 0
s         50 0 8 7
W(r, s)   4.9483 4.8934 4.8633 4.839 4.849 4.8555

Table 6b Values of the two parameters and the asymmetric winsorized mean when λ = 0.4 and 10% missing observations, for three to eight groups

r         49 7 0 6 3
s         56 00 69 47 38 9
W(r, s)   3.885 4.7 4.4784 4.594 4.6656 4.753

Table 6c Values of the two parameters and the asymmetric winsorized mean when λ = 0.6 and 10% missing observations, for three to eight groups

r         5 7 0 9
s         46 9 55 8 06 9
W(r, s)   .9366 3.4768 3.8089 4.070 4.78 4.903

Finally, for 20% missing observations, the data were divided into three groups. The number of observations in the first group was 03, which was the value of the parameter r, and the number of observations in the third group was 5, which was the value of the parameter s; the asymmetric winsorized mean was then found to be 5.0346, as shown in Table 7a. Similarly, the asymmetric winsorized mean was computed for the other groups with 20% missing observations, as given in Tables 7b and 7c.

Table 7a Values of the two parameters and the asymmetric winsorized mean when λ = 0.2 and 20% missing observations, for three to eight groups

r         03 59 4 9 3 9
s         5 5 3 9 8
W(r, s)   5.0346 5.0074 4.9975 4.969 4.989 4.9948

Table 7b Values of the two parameters and the asymmetric winsorized mean when λ = 0.4 and 20% missing observations, for three to eight groups

r         43 5 9 5
s         45 96 68 49 40 3
W(r, s)   3.9075 4.36 4.5480 4.6763 4.7635 4.85

Table 7c Values of the two parameters and the asymmetric winsorized mean when λ = 0.6 and 20% missing observations, for three to eight groups

r         3 7 0 9 8
s         4 74 44 0 89
W(r, s)   .9646 3.4945 3.834 4.0508 4.7 4.345

We compared the results of the asymmetric winsorized mean method for missing observations with other estimation methods. The first method is to replace the missing observations with the mean of the series. This mean can be calculated over the entire range of the sample. The second method is to replace the missing observations with the naive forecast. The naive model is the simplest form of a univariate forecast model: it uses the current time period's value for the next time period, that is, $\hat{X}_{t+1} = X_t$. Third, we can replace the missing observations with a simple trend forecast. This is accomplished by estimating a regression equation of the form $X_t = a + bt$ (where $t$ is the time) for the periods prior to the missing value, and then using the equation to fit the missing time periods. Finally, we can replace the missing observations with the average of the last two known observations that bound the missing observations (Patricia, 1994).

The accuracy of estimating the missing observations with the asymmetric winsorized mean depends on how close the estimated values are to the actual values. In practice, we define the difference between the actual and the estimated values as the error. If the estimation is doing a good job, the error will be relatively small. This means that the error for each time period is purely random fluctuation around the original value, so we should get a value equal or close to 0. We tested the asymmetric winsorized mean approach to estimating missing data points on time series data sets against the other methods. In this paper the following statistical measures were used as the estimators of accuracy (Patricia, 1994):

- The mean absolute error

$$MAE = \frac{1}{n}\sum_{t=1}^{n} |e_t| \qquad (5.1)$$

- The mean square error

$$MSE = \frac{1}{n}\sum_{t=1}^{n} e_t^2 \qquad (5.2)$$

where $n$ is the number of imputations and $e_t$ is the difference between the actual and the estimated values. To evaluate the accuracy of the estimation procedure for missing observations, we used the mean absolute error (MAE) of Equation (5.1) and the mean square error (MSE) of Equation (5.2). Tables 8a to 9c show the MAE and MSE for the estimation of the missing observations by the proposed method, and Tables 10 and 11 show them for the other methods.

The MAE values are shown in Tables 8a to 8c for the proposed methods and in Table 10 for the other methods. For 5% missing observations, the simple average method does better than the asymmetric winsorized methods when λ = 0.2 with three and four groups, but for the other values the proposed method is clearly better than the other methods. For 10% missing observations, the average bound method does better than the other methods, while the simple average method does better than the asymmetric method only when λ = 0.2 with three and four groups, λ = 0.4 with three groups, and λ = 0.6 with three, four and five groups. Finally, for 20% missing observations, there is only one case in which the simple average method does better than the asymmetric winsorized method: λ = 0.2 with three groups.

The second accuracy measure, the MSE, is shown in Tables 9a to 9c for the proposed methods and in Table 11 for the other methods. For 5% missing observations, the asymmetric winsorized mean methods generally do better than the others, but the simple average method does better than the proposed method when λ = 0.2 with three and four groups. For 10% missing observations, the asymmetric winsorized methods do better than the naive and trend methods, while the average bound method does better than the proposed method; the comparison between the proposed methods and the simple average method gives mixed results. Finally, for 20% missing observations, the simple average method does better than the proposed method in two cases: λ = 0.2 with three groups, and λ = 0.6 with three and four groups.
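The four comparison methods and the accuracy measures (5.1) and (5.2) can be sketched as follows. The function `impute` and its method labels are hypothetical names; the sketch also assumes that every missing point has at least one known value after it and at least two before it (needed for the trend fit), which the source does not discuss.

```python
def mae(actual, estimated):
    """Equation (5.1): mean absolute error over the imputed points."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

def mse(actual, estimated):
    """Equation (5.2): mean square error over the imputed points."""
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)

def impute(series, method):
    """Fill the None entries of series with one of the four baseline rules."""
    known = [(t, v) for t, v in enumerate(series) if v is not None]
    filled = list(series)
    for t, v in enumerate(series):
        if v is not None:
            continue
        before = [(u, w) for u, w in known if u < t]
        after = [(u, w) for u, w in known if u > t]
        if method == "mean":        # mean of the whole observed series
            filled[t] = sum(w for _, w in known) / len(known)
        elif method == "naive":     # last known value carried forward
            filled[t] = before[-1][1]
        elif method == "bound":     # average of the two bounding known values
            filled[t] = 0.5 * (before[-1][1] + after[0][1])
        elif method == "trend":     # least-squares line X_t = a + b t on prior points
            m = len(before)
            tbar = sum(u for u, _ in before) / m
            xbar = sum(w for _, w in before) / m
            sxx = sum((u - tbar) ** 2 for u, _ in before)
            slope = sum((u - tbar) * (w - xbar) for u, w in before) / sxx
            filled[t] = xbar + slope * (t - tbar)
        else:
            raise ValueError(f"unknown method: {method}")
    return filled

# Usage: the true value at the gap is 3.0; compare the naive estimate with it.
series = [1.0, 2.0, None, 4.0, 5.0]
naive_fill = impute(series, "naive")
naive_mae = mae([3.0], [naive_fill[2]])
```

Each rule uses only the originally observed values, so consecutive gaps are filled independently of one another.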
In this paper, 18 asymmetric winsorized mean methods were proposed to handle data with 5%, 10%, and 20% missing observations. The general results show that the proposed methods do better than the other methods, in particular when the data have 20% missing observations. We recommend using the asymmetric winsorized method when the data have more missing observations, and dividing the whole data into more than four groups.

Table 8a Values of the mean absolute error of the asymmetric winsorized mean for 5% missing data, λ = 0.2, 0.4, and 0.6, for different groups

MAE       3G        4G        5G        6G        7G        8G
λ = 0.2   .9554     .9403     .9335     .907      .96       .996
λ = 0.4   .4483     .64       .748      .7986     .8349     .8600
λ = 0.6   .037      .804      .449      .586      .596      .6507

Table 8b Values of the mean absolute error of the asymmetric winsorized mean for 10% missing data, λ = 0.2, 0.4, and 0.6, for different groups

MAE       3G        4G        5G        6G        7G        8G
λ = 0.2   .4763     .47       .468      .4659     .4669     .4675
λ = 0.4   .507      .463      .455      .45       .4546     .4569
λ = 0.6   .788      .5967     .567      .483      .4679     .465

Table 8c Values of the mean absolute error of the asymmetric winsorized mean for 20% missing data, λ = 0.2, 0.4, and 0.6, for different groups

MAE       3G        4G        5G        6G        7G        8G
λ = 0.2   .937      .949      .94       .954      .904      .98
λ = 0.4   .8533     .83       .8396     .8559     .87       .889
λ = 0.6   .0876     .9343     .865      .8384     .83       .83

Table 9a Values of the mean square error of the asymmetric winsorized mean for 5% missing data, λ = 0.2, 0.4, and 0.6, for different groups

MSE       3G        4G        5G        6G        7G        8G
λ = 0.2   0.575     0.455     0.40      0.30      0.344     0.37
λ = 0.4   7.6499    8.4974    9.0695    9.455     9.67      9.8500
λ = 0.6   6.99      7.339     7.5377    7.986     8.658     8.546

Table 9b Values of the mean square error of the asymmetric winsorized mean for 10% missing data, λ = 0.2, 0.4, and 0.6, for different groups

MSE       3G        4G        5G        6G        7G        8G
λ = 0.2   9.3746    9.3545    9.3460    9.3405    9.346     9.344
λ = 0.4   0.06      9.5468    9.3974    9.355     9.3354    9.3307
λ = 0.6   .5735     0.998     0.95      9.8494    9.649     9.5303

Table 9c Values of the mean square error of the asymmetric winsorized mean for 20% missing data, λ = 0.2, 0.4, and 0.6, for different groups

MSE       3G        4G        5G        6G        7G        8G
λ = 0.2   5.56      5.478     5.4645    5.470     5.4534    5.4609
λ = 0.4   5.749     5.0050    5.057     5.3       5.00      5.578
λ = 0.6   6.84      5.6858    5.44      5.0773    5.044     5.0055

Table 10 Values of the mean absolute error for the other methods for 5, 10 and 20% missing observations

MAE   Simple Bound Trend Naive Average Average Regression
5%    .93538 4.03374 3.96569 3.4393
10%   .46873 3.34 .956 .98584
20%   .965 .5009 .36689 4.033494

Table 11 Values of the mean square error for the other methods for 5, 10 and 20% missing observations

MSE   Simple Bound Trend Naive Average Average Regression
5%    0.465 4.90699 5.0595 5.4698
10%   9.347366 6.8883 8.39393 5.04977
20%   5.486735 0.637 9.0379 8.30867

REFERENCES

[1] A.K. Sheikh and M. Ahmad, Statistical Models of Accelerated Life Testing. Pak. J. Statist. 17 (): (2001) 75-0.
[2] A. Pankratz, Forecasting with Univariate Box-Jenkins Models. John Wiley, USA, 1983.
[3] E. G. Patricia, Introduction to Time Series Modeling and Forecasting in Business and Economics. McGraw-Hill, Inc., New York, 1994.
[4] G. Hawthorne and P. Elliot, Imputing Cross-Sectional Missing Data: Comparison of Common Techniques. Australian and New Zealand Journal of Psychiatry. 39: (2005) 583-590.
[5] I. Hess, V.K. Sethi and T.R. Balakrishnan, Stratification: a practical investigation. J. Amer. Statist. Assoc. 61, (1966) 74-90.
[6] M. Hanif, Design and Model-based sampling inference. Pak. J. Statist. 16(3): (2000) 9-46.
[7] M. Samiuddin, M. Hanif and A.K.A Kattan, An optimum form for the design based ratio estimator. Pak. J. Statist. 14 () (1998) 8-96.
[8] R. Ahmad Mahir, A.M.H. Al-Khazaleh and M.M.T. Al-Kassab, Approximation Method in Finding Optimum Stratum Depending on Neyman Allocation Applied on Beta Distribution. Proceedings of the 12th WSEAS International Conference on Applied Mathematics. Cairo, Egypt. WSEAS Press. (2007) 34-345.

[9] R.M. Cyert and H.J. Davidson, Statistical Sampling for Accounting Information. Prentice-Hall, Englewood Cliffs, NJ, (1962) pp. 6 7.
[10] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data. John Wiley, New York, 1987.
[11] V. Barnett and T. Lewis, Outliers in Statistical Data. 3rd Edition, John Wiley, New York, USA, 1994.
[12] W.G. Cochran, Sampling Techniques. 3rd Edition. John Wiley, New York, USA, 1977.
[13] W. Mingxin and Z. Yijun, Trimmed and winsorised means based on a scaled deviation. J. of Stat. Planning and Inference. 139: (2009) 350 365.

Received: September, 0