Method of Conditional Moments Based on Incomplete Data


ISSN 0974-570X (Online), ISSN 0974-5718 (Print), Vol. 20, Issue No. 3, Year 2013, Copyright 2013 by CESER Publications

Method of Conditional Moments Based on Incomplete Data

Yan Lu (1) and Naisheng Wang (2)

(1) Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM, USA, 87131-0001, luyan@math.unm.edu
(2) China Securities Index Co., Ltd, Shanghai, China, 200135, wangnaisheng@yahoo.com.cn

ABSTRACT

This paper extends the traditional method of moments to incomplete samples. The method is termed the method of conditional moments, since it is obtained by conditioning on the observed data. The convergence and asymptotic normality of the conditional moment estimator are established under certain conditions. An iterative algorithm is proposed to solve the conditional moment equations. Examples show that the proposed method performs well for incomplete data.

Keywords: Asymptotic normality, Censored sample, Conditional moment estimator, Convergence, Grouped sample.

2012 Mathematics Subject Classification: 62.

1 Introduction

The method of moments is one of the oldest and most extensively used methods for parameter estimation. Let X = (X_1, X_2, ..., X_n) be independent and identically distributed random variables, each with distribution function F(x, θ) for some fixed θ ∈ Θ ⊂ R^d. If the moments exist, the moment estimators for θ = (θ_1, ..., θ_d) are the solutions of the moment equations:

    μ_k(θ) = m_k(X),  k = 1, ..., d,  (1.1)

where μ_k(θ) = E(X_1^k) is the k-th non-central population moment, which depends on the unknown parameter vector θ, whereas m_k(X) = (1/n) Σ_{i=1}^n X_i^k is the k-th non-central sample moment, which does not depend on any unknown parameter. Thus, the method of moments equates the population moments to the corresponding non-central sample moments and solves for θ (if a solution exists). Under appropriate conditions, properties such as consistency and asymptotic normality can be derived for the moment estimators. The method of moments is also popular because of its computational simplicity.
However, an underlying assumption for the method of moments is the

completeness of the data set, which is not satisfied in many practical settings; the method of moments fails in those situations. The difficulty in applying the method of moments to incomplete data lies in the construction of the sample moments m_k(X). This issue has been investigated by Wang (1992) and Mao and Wang (1997) in some special cases, such as the Type-II censored Weibull distribution. In this paper we propose the method of conditional moments for parameter estimation when the sample is incomplete. The proposed method equates the population moments to the corresponding conditional sample moments given the observed data. One can apply this method in both complete and incomplete sample cases.

This paper is organized as follows. Section 2 describes the proposed method of conditional moments. Section 3 discusses properties of the conditional moment estimators, such as consistency and asymptotic normality. Section 4 proposes an efficient iterative algorithm for the method of conditional moments. Illustrative examples are presented in Section 5. Concluding remarks are given in Section 6.

2 Method of Conditional Moments

2.1 Definition of the proposed method

Define (X_1, ..., X_n), θ, μ_k(θ) and m_k(X) as in Section 1. If the sample is incomplete, m_k(X) cannot be computed and therefore the regular method of moments fails. Let Y be the observed sample and a_k(θ, Y) the conditional expectation of m_k(X) given the observed Y, i.e.

    a_k(θ, Y) = E{m_k(X) | Y}.  (2.1)

We call a_k(θ, Y) the k-th non-central conditional sample moment. It is clear that a_k(θ, Y) depends on both Y and θ. By the properties of conditional expectation, we have E{a_k(θ, Y)} = μ_k(θ). Given the observed incomplete sample Y, the conditional expectation a_k(θ, Y) is the optimal predictor of m_k(X) under the mean squared error criterion. Therefore, a_k(θ, Y) is expected to be close to m_k(X). Thus, we may replace m_k(X) in (1.1) with a_k(θ, Y) to estimate the parameter θ.
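As a concrete illustration (ours, not the paper's), consider exponential lifetimes with mean 1/λ, Type-I censored at a fixed time c. By lack of memory, E[X | X > c] = c + 1/λ, so a_1(λ, Y) = (1/n){Σ observed z_i + n_cens·(1/λ)} once the censored entries (recorded as c) are included in the sum; equating a_1 to μ_1(λ) = 1/λ yields the closed form λ̂ = n_uncensored / Σ z_i. A minimal sketch:

```python
def lambda_hat_censored_exp(z, delta):
    """Conditional moment estimate of lam for Exp(lam) data Type-I censored at c.

    z     : observed values min(X_i, c); censored entries already equal c
    delta : 1 if uncensored (X_i <= c), 0 if censored
    """
    # a_1(lam, Y) = (sum(z) + n_cens/lam)/n since E[X - c | X > c] = 1/lam;
    # setting a_1 = mu_1 = 1/lam and solving gives lam = n_uncensored / sum(z).
    return sum(delta) / sum(z)

# toy data censored at c = 2.0
z = [0.5, 1.2, 2.0, 0.8, 2.0, 1.7]
delta = [1, 1, 0, 1, 0, 1]
lam_hat = lambda_hat_censored_exp(z, delta)
```

Here the conditional moment equation happens to have an explicit solution; in general it must be solved numerically, as discussed in Section 4.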
Let a(θ, Y) = (a_1(θ, Y), ..., a_d(θ, Y)) and μ(θ) = (μ_1(θ), ..., μ_d(θ)). Then (1.1) becomes:

    μ(θ) = a(θ, Y).  (2.2)

We call (2.2) the conditional moment equation, since a(θ, Y) is the non-central conditional sample moment of the complete sample X. An estimator θ̂ is called the conditional moment estimator of the parameter θ if it lies in the parameter space Θ and satisfies (2.2). In this paper, we assume that θ is identifiable. The conditional sample moments reduce to the ordinary sample moments if the data are complete.

3 Properties of the Method of Conditional Moments

In this section, we investigate the properties of the proposed method. In Section 3.1, we discuss convergence and asymptotic normality of the conditional moment estimator. In Section

3.2, we show that some frequently used incomplete samples, such as randomly censored samples, doubly censored samples and grouped samples, all satisfy the conditions for the convergence and asymptotic normality of the conditional moment estimator. The notation a^(n)(θ, Y), a_k^(n)(θ, Y) and θ̂_n is used in this section; the superscript and subscript n stand for the size of the unobserved complete sample X.

3.1 Convergence and Asymptotic Normality

Theorem 3.1. Suppose the following conditions are satisfied:
(1) the parameter space Θ is a compact subset of R^d;
(2) μ(θ) is continuous with respect to the parameter θ;
(3) a^(n)(θ, Y) is differentiable with respect to θ, and ‖∂a^(n)(θ, Y)/∂θ‖ < K < ∞ holds for each n and θ, where ‖·‖ is the Euclidean norm and K is some positive number;
(4) there exists θ_0 ∈ Θ such that a^(n)(θ, Y) → μ(θ) a.s. (as n → ∞) if and only if θ = θ_0.
Under the above conditions, we have θ̂_n → θ_0 a.s., as n → ∞.

Proof. By the compactness of the parameter space Θ, every subsequence of {θ̂_n} has a further subsequence {θ̂_{n_k}} and a point θ̃ ∈ Θ such that

    θ̂_{n_k} → θ̃ a.s., as n_k → ∞.

Thus, by the continuity of μ(θ),

    μ(θ̂_{n_k}) → μ(θ̃), as n_k → ∞.

By the definition of θ̂_n, we have a^(n)(θ̂_n, Y) = μ(θ̂_n), so

    a^(n_k)(θ̂_{n_k}, Y) → μ(θ̃), as n_k → ∞.

Expand a^(n_k)(θ̂_{n_k}, Y) at the point θ̃:

    a^(n_k)(θ̂_{n_k}, Y) = a^(n_k)(θ̃, Y) + (θ̂_{n_k} − θ̃)' ∂a^(n_k)(θ*_{n_k}, Y)/∂θ,

where θ*_{n_k} lies between θ̂_{n_k} and θ̃. Since ‖∂a^(n_k)(θ*_{n_k}, Y)/∂θ‖ < K, taking limits on both sides of the above equation, we get

    a^(n_k)(θ̃, Y) → μ(θ̃), as n_k → ∞.

Under condition (4), θ̃ must equal θ_0. Since this holds for every limit point of every subsequence, i.e., all convergent subsequences converge to the same point, the result follows immediately.

Theorem 3.1 establishes the strong convergence of the conditional moment estimator θ̂_n. The asymptotic normality of θ̂_n is presented in the following theorem.

Theorem 3.2. Suppose the following conditions are satisfied:
(1) the conditional sample moment a^(n)(θ, Y) is asymptotically normal:

    √n {a^(n)(θ, Y) − μ(θ)} →_L N(0, V_θ),

where V_θ is a positive definite matrix;
(2) a^(n)(θ, Y) is twice differentiable with respect to θ, and ∂²a^(n)(θ, Y)/∂θ∂θ' is bounded for each n and each θ in N_{θ_0}, a neighborhood of θ_0;
(3) lim_{n→∞} ∂a^(n)(θ, Y)/∂θ exists, and the matrix M_θ = lim_{n→∞} ∂{a^(n)(θ, Y) − μ(θ)}/∂θ is of full rank.
Suppose θ̂_n is a consistent solution of the conditional moment equation (2.2). Then

    √n Σ^{−1/2} (θ̂_n − θ_0) →_L N(0, I_d),

where Σ^{1/2} = M(θ_0)^{−1} V(θ_0)^{1/2} and I_d is the d × d identity matrix.

Proof. By the consistency of θ̂_n, we can focus on the case θ̂_n ∈ N_{θ_0}. Expand a^(n)(θ̂_n, Y) − μ(θ̂_n) at the point θ_0:

    a^(n)(θ̂_n, Y) − μ(θ̂_n) = a^(n)(θ_0, Y) − μ(θ_0) + (θ̂_n − θ_0)' ∂{a^(n)(θ, Y) − μ(θ)}/∂θ |_{θ=θ*_n},

where θ*_n lies between θ_0 and θ̂_n. Note that a^(n)(θ̂_n, Y) − μ(θ̂_n) = 0, so

    √n (θ̂_n − θ_0)' ∂{a^(n)(θ, Y) − μ(θ)}/∂θ |_{θ=θ*_n} = −√n {a^(n)(θ_0, Y) − μ(θ_0)}.

Since ∂²a^(n)(θ, Y)/∂θ∂θ' is bounded for each n and each θ in N_{θ_0}, arguments similar to those in Theorem 3.1 show that

    ∂{a^(n)(θ, Y) − μ(θ)}/∂θ |_{θ=θ*_n} →_P M(θ_0), as n → ∞,

where →_P denotes convergence in probability. The result then follows immediately from the asymptotic normality of a^(n)(θ, Y).

The conditions in Theorem 3.1 and Theorem 3.2 may seem somewhat strict. However, some of the most commonly used incomplete samples, such as randomly censored samples, doubly censored samples and grouped samples, satisfy these conditions. This is presented in Section 3.2.

3.2 Conditional Sample Moments of Censored and Grouped Samples

In this section, we investigate the conditional sample moments of three kinds of frequently used incomplete samples: randomly censored samples, doubly Type-II censored samples and grouped samples.
Although these three kinds of incomplete samples could be treated in a unified framework as in Dempster, Laird and Rubin (1977), we do not do so, for the sake of explicit forms of the conditional sample moments. The convergence and asymptotic normality are displayed in detail. By Theorem 3.1 and Theorem 3.2, estimators obtained by the method of conditional moments have strong consistency and asymptotic normality. These results can be easily extended to more complicated settings such as multiply censored data.

3.2.1 Random Censoring

In many prospective studies, survival data are subject to random censoring. This typically comes up in the analysis of lifetime data. Under random censoring, rather than the complete sample of interest X = (X_1, ..., X_n), one observes

    Z_i = min(X_i, C_i) and δ_i = 1_(X_i ≤ C_i), i = 1, ..., n,

where 1_(A) is the indicator function of A and {C_i} is another i.i.d. random sequence, independent of the {X_i} sequence. Obviously, random censoring reduces to the usual Type-I censoring if all the C_i's equal a single specified real number. Let

    Y = ( Z_1 Z_2 ... Z_n ; δ_1 δ_2 ... δ_n ).

Then Y is the observed incomplete sample. By the definition of the conditional sample moments, the k-th non-central conditional sample moment of X is given by

    a_k^(n)(θ, Y) = (1/n) Σ_{i=1}^n { δ_i Z_i^k + (1 − δ_i)/(1 − F(Z_i, θ)) ∫_{Z_i}^∞ x^k dF(x, θ) }.  (3.1)

From the law of large numbers and the central limit theorem, the following proposition follows immediately.

Proposition 1. Assume ∫ 1/(1 − F(x, θ)) dG(x) < ∞, where G(x) is the common distribution function of the C_i's. Then
(1) if ∫ |x|^k dF(x, θ) < ∞, then a_k^(n)(θ, Y) → μ_k(θ) a.s. (as n → ∞);
(2) if ∫ x^{2k} dF(x, θ) < ∞, then √n (a_k^(n)(θ, Y) − μ_k(θ)) →_L N(0, σ_k²), where

    σ_k² = ∫ x^{2k} Ḡ(x) dF(x, θ) + ∫ 1/(1 − F(z, θ)) { ∫_z^∞ x^k dF(x, θ) }² dG(z) − μ_k²(θ),

with Ḡ = 1 − G.

3.2.2 Doubly Type-II Censoring

Let X = (X_1, ..., X_n) be a random sample of size n from a distribution F(x, θ) with density function f(x, θ). Censor the r − 1 smallest and the s largest observations; the remaining observations Y = (X_{r:n}, ..., X_{n−s:n}) constitute a doubly Type-II censored sample, where X_{i:n} is the i-th smallest observation in X. Doubly Type-II censoring frequently occurs in survival analysis and other research fields. Tiku, Tan and Balakrishnan (1986) gave a detailed discussion of the application of doubly Type-II censoring in robust analysis.
From (2.1), the k-th non-central conditional sample moment of X is given as follows:

    a_k^(n)(θ, Y) = (1/n) [ E{ Σ_{i=1}^{r−1} X_{i:n}^k | Y } + Σ_{i=r}^{n−s} X_{i:n}^k + E{ Σ_{i=n−s+1}^n X_{i:n}^k | Y } ]
                  = (1/n) { Σ_{i=r}^{n−s} X_{i:n}^k + (r − 1)/F(X_{r:n}, θ) ∫_{−∞}^{X_{r:n}} x^k dF(x, θ)
                            + s/(1 − F(X_{n−s:n}, θ)) ∫_{X_{n−s:n}}^∞ x^k dF(x, θ) }.  (3.2)

Assume that the equation F(x, θ) = p has a unique solution ξ_p(θ) for each p ∈ (0, 1). Let

    σ_i = √(p_i (1 − p_i)) / f(ξ_{p_i}, θ), i = 1, 2,

    ρ = √( p_1 (1 − p_2) / (p_2 (1 − p_1)) ),

    Σ = ( σ_1²  ρσ_1σ_2 ; ρσ_1σ_2  σ_2² ).

By the convergence and asymptotic normality of sample quantiles, the convergence and asymptotic normality of a_k^(n)(θ, Y) can also be proved.

Proposition 2. Assume
(1) ∫ x^{2k} dF(x, θ) < ∞;
(2) there are 0 < p_1 < p_2 < 1 such that r_n/n = p_1 + o(1/√n) and (n − s_n)/n = p_2 + o(1/√n);
(3) f(x, θ) is continuous at the points x = ξ_{p_i}(θ) and f(ξ_{p_i}, θ) is positive.
Then

    a_k^(n)(θ, Y) → μ_k(θ) a.s., as n → ∞,
    √n (a_k^(n)(θ, Y) − μ_k(θ)) →_L N(0, σ_k²),

where

    σ_k² = ∫_{ξ_{p_1}}^{ξ_{p_2}} u^{2k} f(u, θ) du − (1/(p_2 − p_1)) { ∫_{ξ_{p_1}}^{ξ_{p_2}} u^k f(u, θ) du }².

The proof is trivial.

3.2.3 Grouping

In grouped (or interval censored) data, each X_i is known only to lie between two known constants a_j and a_{j+1}, where −∞ < a_0 ≤ a_1 ≤ a_2 ≤ ... ≤ a_l ≤ a_{l+1} < ∞. Grouped data are common in various fields of applied statistics. Denote the interval (a_j, a_{j+1}) by I_j, and let r_j be the number of observations falling in interval I_j. Then the observed data are

    Y = ( I_0 I_1 ... I_l ; r_0 r_1 ... r_l ).

By definition, the k-th non-central conditional sample moment of X is

    a_k^(n)(θ, Y) = (1/n) Σ_{j=0}^l r_j / {F(a_{j+1}, θ) − F(a_j, θ)} ∫_{a_j}^{a_{j+1}} x^k dF(x, θ).  (3.3)

We can rewrite this expression as follows:

    a_k^(n)(θ, Y) = (1/n) Σ_{i=1}^n Σ_{j=0}^l δ_ij [ 1/{F(a_{j+1}, θ) − F(a_j, θ)} ∫_{a_j}^{a_{j+1}} x^k dF(x, θ) ],

where δ_ij = 1_(X_i ∈ I_j). Thus, the convergence and asymptotic normality of a_k^(n)(θ, Y) can be obtained from the law of large numbers and the central limit theorem.

Proposition 3. Assume F(a_{j+1}, θ) − F(a_j, θ) > 0 for each j. Then
(1) if ∫ |x|^k dF(x, θ) < ∞, then a_k^(n)(θ, Y) → μ_k(θ) a.s., as n → ∞;
(2) if ∫ x^{2k} dF(x, θ) < ∞, then √n (a_k^(n)(θ, Y) − μ_k(θ)) →_L N(0, σ_k²) as n → ∞, where

    σ_k² = Σ_{j=0}^l { ∫_{a_j}^{a_{j+1}} x^k dF(x, θ) }² / {F(a_{j+1}, θ) − F(a_j, θ)} − μ_k²(θ).
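To make (3.3) concrete, the following sketch (ours, not from the paper; the cutpoints and counts are made up) evaluates the first conditional sample moment for grouped data from an Exp(λ) population, where the cell integral ∫_{a_j}^{a_{j+1}} x dF has a closed form:

```python
import math

def trunc_mean_exp(lam, a, b):
    """E[X | a < X <= b] for X ~ Exp(lam); b may be math.inf."""
    Sa = math.exp(-lam * a)                                  # P(X > a)
    Sb = 0.0 if b == math.inf else math.exp(-lam * b)        # P(X > b)
    # integral of x*lam*exp(-lam*x) over (a, b], by integration by parts
    num = a * Sa - (0.0 if b == math.inf else b * Sb) + (Sa - Sb) / lam
    return num / (Sa - Sb)

def a1_grouped(lam, cuts, counts):
    """First conditional sample moment (3.3): each cell (cuts[j], cuts[j+1]]
    with count r_j contributes r_j * E[X | X in that cell]."""
    n = sum(counts)
    total = sum(r * trunc_mean_exp(lam, cuts[j], cuts[j + 1])
                for j, r in enumerate(counts))
    return total / n

cuts = [0.0, 1.0, 2.0, math.inf]
counts = [5, 3, 2]
val = a1_grouped(0.7, cuts, counts)
```

A useful sanity check: with a single cell covering the whole support, the conditional sample moment reduces to the population mean 1/λ, as the identity E{a_k(θ, Y)} = μ_k(θ) suggests it should.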

4 Iterative Algorithm

The method proposed in Section 2 can be applied to almost all kinds of complicated incomplete samples as long as the required moments of the underlying population exist. Unfortunately, it is only in relatively simple situations that (2.2) has an explicit solution. The intractability of (2.2) results from F(x, θ | Y), the conditional joint distribution function of X given Y, which often has a complicated form. Therefore, numerical iterative methods are required to solve the equation for θ̂. General purpose root-finding algorithms such as the Newton-Raphson and quasi-Newton algorithms are available for this. The Newton-Raphson algorithm converges quickly, but analytical expressions for the derivatives of a(θ, Y) and μ(θ) with respect to θ may not be easy to obtain in complicated situations. The deficiency of the quasi-Newton algorithm in obtaining θ̂ has been noted in some applications (Srivastava and Keen, 1998). In this section, we propose a new iterative procedure for solving (2.2) to obtain θ̂. Suppose θ^(p) denotes the current value of θ after p cycles of the algorithm. The next value of θ is obtained by solving the following equation:

    a(θ^(p), Y) = μ(θ).  (4.1)

This equation has the same form as the moment equation (1.1), as if the whole sample X were observed. Hence θ^(p+1) can be easily obtained. Each cycle of this iterative procedure decomposes into two steps:

Step 1: Calculate a(θ^(p), Y), the conditional expectation of the sample moments of the unobserved complete sample X.

Step 2: Define θ^(p+1) as the solution of (4.1).

These two steps are very similar to those of the EM algorithm, except that in the latter the left side of (4.1) is replaced by the expected log-likelihood. The following theorem establishes the convergence of the proposed iterative procedure. Illustrative examples are given in Section 5.

Theorem 4.1. The sequence {θ^(p)} converges to θ̂ if the following conditions are satisfied:
(1) (4.1) has a unique solution;
(2) a(θ, Y) and μ(θ) are continuous with respect to θ.

Proof. Let {θ^(p_k)} be any subsequence of {θ^(p)}. Since Θ is compact, there exists a further subsequence {θ^(p_{k_l})} of {θ^(p_k)} such that θ^(p_{k_l}) → θ̃, where θ̃ ∈ Θ. Since a(θ, Y) is continuous in θ,

    a(θ^(p_{k_l}), Y) → a(θ̃, Y), as l → ∞.

By (4.1) and the continuity of μ(θ), we have

    μ(θ^(p_{k_l}+1)) → a(θ̃, Y) and μ(θ^(p_{k_l}+1)) → μ(θ̃), as l → ∞,

so a(θ̃, Y) = μ(θ̃), i.e., θ̃ equals the unique solution θ̂ of (2.2). Since the subsequence {θ^(p_k)} was arbitrary, the result follows.
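The two-step cycle above can be sketched for a scalar parameter as a fixed-point iteration: Step 1 evaluates a(θ^(p), Y), and Step 2 applies the inverse of μ. This is an illustrative sketch of ours, with made-up Type-I censored exponential data (censoring time c = 2.0, μ_1(λ) = 1/λ, and each censored observation contributing c + 1/λ in expectation, so its excess over the recorded value c is 1/λ):

```python
def iterate_cond_moments(cond_moment, mu_inv, theta0, tol=1e-12, max_iter=1000):
    """Iterate theta_{p+1} = mu_inv(cond_moment(theta_p)) to a fixed point."""
    theta = theta0
    for _ in range(max_iter):
        theta_new = mu_inv(cond_moment(theta))   # Step 1, then Step 2
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Usage: censored entries of z are recorded as c = 2.0; two are censored.
z = [0.3, 1.1, 2.0, 0.9, 2.0, 1.5, 0.6]
n_cens = 2
a1 = lambda lam: (sum(z) + n_cens / lam) / len(z)   # conditional sample mean
lam_hat = iterate_cond_moments(a1, lambda a: 1.0 / a, theta0=1.0)
```

For this toy problem the fixed point has the closed form λ̂ = n_uncensored / Σ z_i, which the iteration reaches in a few dozen cycles (the iteration map is a contraction here, with factor roughly n_cens/n).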

5 Illustrative Examples

In this section, we present three examples to illustrate the use of the proposed method.

Example 1. Doubly censored Pareto distribution. The Pareto distribution is widely used in economic research. It has been found that insurance claims can be fitted well by the Pareto distribution, and recently many researchers have used the Pareto distribution to fit the tail of the return rate of investments. The distribution function and density function of the Pareto distribution are given by

    F(x, θ) = 1 − (c/x)^θ and f(x, θ) = θ c^θ / x^{θ+1}, x > c > 0, θ > 0,

where c is a known positive constant and θ is an unknown parameter to be estimated. By the definition of a heavy tailed distribution in Resnick (1997), the Pareto distribution is typically heavy tailed, and θ is the so-called tail index; a smaller θ implies a heavier tail. Other forms of the Pareto distribution can be found in Johnson, Kotz and Balakrishnan (1994). The mean of the Pareto distribution is

    μ_1(θ) = cθ/(θ − 1) (θ > 1).

Suppose that the observed sample has the form x_{r+1:n} < ... < x_{n−s:n}, that is, doubly censored. According to (3.2), the conditional sample mean of X is given by

    a_1(θ, Y) = (1/n) { (rcθ/(θ − 1)) [1 − (c/x_{r+1:n})^{θ−1}] / [1 − (c/x_{r+1:n})^θ] + Σ_{i=r+1}^{n−s} x_{i:n} + s x_{n−s:n} θ/(θ − 1) }.

Then the conditional moment estimator θ̂ of θ is obtained by solving the conditional moment equation a_1(θ, Y) = μ_1(θ). The maximum likelihood estimator θ̃ is given by maximizing the likelihood function

    L(θ) ∝ {F(x_{r+1:n}, θ)}^r Π_{i=r+1}^{n−s} f(x_{i:n}, θ) {1 − F(x_{n−s:n}, θ)}^s.

We compare the conditional moment estimator and the maximum likelihood estimator in a simulation study, with fixed c = 1 and several settings of (n, r, s). For each (n, r, s), the results are obtained from 5000 simulations, using the iterative algorithm proposed in the previous section. Table 1 shows a negligible difference between the conditional moment estimator and the maximum likelihood estimator.
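The conditional moment equation a_1(θ, Y) = μ_1(θ) for this example is easy to solve numerically. The following sketch is ours, not the paper's simulation code: the "observed" middle order statistics are made-up values, and we use plain bisection on g(θ) = a_1(θ, Y) − μ_1(θ) rather than the iterative algorithm of Section 4:

```python
def a1_pareto(theta, c, x_obs, r, s):
    """Conditional sample mean (3.2) specialized to Pareto(c, theta);
    x_obs holds the observed order statistics x_{r+1:n}, ..., x_{n-s:n}."""
    n = r + len(x_obs) + s
    x_lo, x_hi = x_obs[0], x_obs[-1]
    lower = r * (c * theta / (theta - 1)) * \
        (1 - (c / x_lo) ** (theta - 1)) / (1 - (c / x_lo) ** theta)
    upper = s * x_hi * theta / (theta - 1)   # E[X | X > t] = t*theta/(theta-1)
    return (lower + sum(x_obs) + upper) / n

def solve_theta(c, x_obs, r, s, lo=1.05, hi=60.0, tol=1e-10):
    """Bisection on g(theta) = a_1(theta, Y) - mu_1(theta)."""
    g = lambda t: a1_pareto(t, c, x_obs, r, s) - c * t / (t - 1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# hypothetical doubly censored sample: c = 1, n = 10, r = s = 2
x_obs = [1.1547, 1.2403, 1.3484, 1.4907, 1.6903, 2.0]
theta_hat = solve_theta(1.0, x_obs, r=2, s=2)
```

The bracket (1.05, 60) works for these data because a_1 − μ_1 tends to −∞ as θ → 1+ and is positive for large θ; a different sample may need a different bracket.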
This negligible difference implies that the conditional moment estimator is almost as efficient as the maximum likelihood estimator.

Example 2. Grouped normal data. The second example concerns grouped normal data, which has been discussed by several authors, including Swan (1969) and Wolynetz (1979). Here we use the conditional moment method to estimate the unknown mean μ and standard deviation σ. Suppose the real line is divided into k + 1 intervals (a_0, a_1), [a_1, a_2), ..., [a_k, a_{k+1}), where a_0 =

Table 1: Simulated comparison between the proposed method and the maximum likelihood method for the doubly censored Pareto distribution. θ̂ is the estimator from the proposed method; θ̃ is the maximum likelihood estimator. Rbias is the relative bias and Rmse the relative mean square error of the simulated estimator.

    θ   n    r    s    Rbias(θ̂)  Rbias(θ̃)  Rmse(θ̂)  Rmse(θ̃)
    5   30   5    5    .0519      .0487      .0515     .0509
    5   30   5    10   .0580      .0552      .0667     .0663
    5   30   10   5    .0504      .0468      .0501     .0494
    5   30   10   10   .0508      .0478      .0604     .0600
    5   50   5    5    .0302      .0282      .0234     .0234
    5   50   5    10   .0297      .0279      .0281     .0279
    5   50   10   5    .0246      .0221      .0262     .0260
    5   50   10   10   .0367      .0349      .0282     .0281
    8   30   5    5    .0522      .0476      .0510     .0502
    8   30   5    10   .0576      .0530      .0639     .0634
    8   30   10   5    .0403      .0347      .0466     .0461
    8   30   10   10   .0737      .0693      .0732     .0721
    8   50   5    5    .0216      .0186      .0239     .0235
    8   50   5    10   .0330      .0300      .0282     .0281
    8   50   10   5    .0328      .0276      .0261     .0254
    8   50   10   10   .0253      .0221      .0290     .0288

−∞ and a_{k+1} = +∞. Let r_i be the number of observations falling in the interval (a_i, a_{i+1}). Writing α_i = (a_i − μ)/σ, by (3.3) the first two conditional sample moments of X are

    a_1(μ, σ, Y) = μ + (σ/n) Σ_{i=0}^k r_i [φ(α_i) − φ(α_{i+1})] / [Φ(α_{i+1}) − Φ(α_i)],

    a_2(μ, σ, Y) = μ² + σ² + (σ²/n) Σ_{i=0}^k r_i [α_i φ(α_i) − α_{i+1} φ(α_{i+1})] / [Φ(α_{i+1}) − Φ(α_i)]
                   + (2μσ/n) Σ_{i=0}^k r_i [φ(α_i) − φ(α_{i+1})] / [Φ(α_{i+1}) − Φ(α_i)],

where φ and Φ are the standard normal density and distribution functions. One can see that these equations are equivalent to the likelihood equations. Therefore the conditional moment estimators of μ and σ equal the maximum likelihood estimators, and the proposed iterative algorithm is equivalent to the EM algorithm in this case.

Example 3. Bivariate normal data with non-response in one variable. The purpose of this example is to estimate the mean vector μ and the covariance matrix Σ of a bivariate normal (X_1, X_2). The sample consists of m complete pairs of observations (x_11, x_12), ..., (x_m1, x_m2) and n − m observed values x_{m+1,1}, ..., x_{n,1} of X_1. Let μ = (μ_1, μ_2), ψ = (σ_11, σ_12, σ_22) and θ = (μ, ψ). The likelihood estimation for the parameter θ has no closed form. This example was first discussed by Little and Rubin (2002). This is a typical missing data problem, and the estimation method proposed in this paper can be applied here. Denote by X the complete data of interest and by Y the observed incomplete data. From the properties of the multivariate normal distribution (Rao, 2001), the conditional distribution of X_2 given X_1 is

    N( μ_2 + (σ_12/σ_11)(X_1 − μ_1), σ_22 − σ_12²/σ_11 ).

Hence the first- and second-order conditional sample moments of X given Y are

    E{ (1/n) Σ_{i=1}^n X_i1 | Y } = (1/n) Σ_{i=1}^n x_i1,
    E{ (1/n) Σ_{i=1}^n X_i1² | Y } = (1/n) Σ_{i=1}^n x_i1²,
    E{ (1/n) Σ_{i=1}^n X_i2 | Y } = (1/n) [ Σ_{i=1}^m x_i2 + Σ_{i=m+1}^n { μ_2 + (σ_12/σ_11)(x_i1 − μ_1) } ],
    E{ (1/n) Σ_{i=1}^n X_i2² | Y } = (1/n) [ Σ_{i=1}^m x_i2² + Σ_{i=m+1}^n { (σ_22 − σ_12²/σ_11) + (μ_2 + (σ_12/σ_11)(x_i1 − μ_1))² } ],
    E{ (1/n) Σ_{i=1}^n X_i1 X_i2 | Y } = (1/n) [ Σ_{i=1}^m x_i1 x_i2 + Σ_{i=m+1}^n x_i1 { μ_2 + (σ_12/σ_11)(x_i1 − μ_1) } ].
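These five conditional sample moments are straightforward to evaluate at a given parameter value. A sketch of ours (the data and parameter values are made up, and the function name is illustrative):

```python
def cond_moments_bvn(x1, x2, mu1, mu2, s11, s12, s22):
    """Conditional sample moments when X2 is missing for cases m+1, ..., n.

    x1: all n observations of X1; x2: the first m observations of X2.
    """
    n, m = len(x1), len(x2)
    slope = s12 / s11                      # regression slope of X2 on X1
    resid = s22 - s12 ** 2 / s11           # Var(X2 | X1)
    pred = [mu2 + slope * (v - mu1) for v in x1[m:]]   # E[X2 | X1 = v]
    return {
        "a1_X1": sum(x1) / n,
        "a2_X1": sum(v * v for v in x1) / n,
        "a1_X2": (sum(x2) + sum(pred)) / n,
        "a2_X2": (sum(v * v for v in x2)
                  + sum(resid + p * p for p in pred)) / n,
        "a_X1X2": (sum(u * v for u, v in zip(x1, x2))
                   + sum(u * p for u, p in zip(x1[m:], pred))) / n,
    }

x1 = [1.0, 2.0, 3.0, 4.0]   # X2 observed only for the first two cases
x2 = [1.5, 2.5]
a = cond_moments_bvn(x1, x2, mu1=2.0, mu2=2.0, s11=1.0, s12=0.5, s22=1.0)
```

When no data are missing (m = n), the predicted terms vanish and each entry reduces to the ordinary sample moment, as it should.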

Thus we can construct conditional moment equations by equating the above conditional sample moments to the corresponding population moments, and the conditional moment estimators for θ are given as follows:

    μ̂_1 = x̄_1^(n),  σ̂_11 = S_1^(n),  σ̂_12 = ρ^(m) S_1^(n) / S_1^(m),
    μ̂_2 = x̄_2^(m) − (ρ^(m)/S_1^(m)) (x̄_1^(m) − x̄_1^(n)),
    σ̂_22 = S_2^(m) + (S_1^(n) − S_1^(m)) (ρ^(m)/S_1^(m))²,

where

    x̄_j^(h) = (1/h) Σ_{i=1}^h x_ij,  S_j^(h) = (1/h) Σ_{i=1}^h (x_ij − x̄_j^(h))²,

h is either n or m, and

    ρ^(m) = (1/m) Σ_{i=1}^m x_i1 x_i2 − x̄_1^(m) x̄_2^(m).

Thus the estimators of the mean vector μ and covariance matrix Σ are derived. The likelihood function of Y is

    f(Y; μ, Σ) ∝ σ_11^{−(n−m)/2} |Σ|^{−m/2} exp{ −(1/2) Σ_{i=1}^m (x_i1 − μ_1, x_i2 − μ_2) Σ^{−1} (x_i1 − μ_1, x_i2 − μ_2)'
                  − (1/(2σ_11)) Σ_{i=m+1}^n (x_i1 − μ_1)² }.

This is equivalent to the following expression:

    f(Y; μ, Σ) ∝ σ_11^{−(n−m)/2} |Σ|^{−m/2} exp{ −(m/2) tr[ Σ^{−1} ( S_1^(m) ρ^(m) ; ρ^(m) S_2^(m) ) ]
                  − (m/2) (x̄_1^(m) − μ_1, x̄_2^(m) − μ_2) Σ^{−1} (x̄_1^(m) − μ_1, x̄_2^(m) − μ_2)'
                  − (1/(2σ_11)) [ n S_1^(n) − m S_1^(m) + n(x̄_1^(n) − μ_1)² − m(x̄_1^(m) − μ_1)² ] },

where tr denotes the trace of a matrix. By the factorization theorem, (x̄_1^(n), x̄_1^(m), x̄_2^(m), S_1^(n), S_2^(m), ρ^(m)) are sufficient statistics. Compared to the maximum likelihood estimators, the conditional moment estimators have simple explicit forms and are functions of the sufficient statistics.

6 Concluding Remarks

In this paper, we proposed the method of conditional moments as an extension of the traditional method of moments. The method is appealing since it can be used to obtain estimators in various complicated settings with incomplete data. The conditional moment estimator might not be unique, since different systems of conditional sample moments may be used, and there is no optimal rule for the choice of the conditional sample moment system. We prefer the simplicity of the conditional moment equation (2.2) as a criterion; the resulting estimators should be functions of sufficient statistics if possible. The proposed method can be generalized to incomplete time series, and the large sample properties can be investigated similarly.

References

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39: 1-38.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, John Wiley & Sons, New York.

Little, R. J. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, Wiley, New York.

Mao, S. and Wang, L. (1997). Accelerated Life Testing, Science Press of China.

Rao, C. R. (2001). Linear Statistical Inference and Its Applications, John Wiley, New York.

Resnick, S. I. (1997). Heavy tail modeling and teletraffic data, Annals of Statistics 25: 1805-1866.

Srivastava, M. S. and Keen, K. J. (1998). Estimation of the interclass correlation coefficient, Biometrika 75: 731-739.

Swan, A. S. (1969). Algorithm AS 16: Maximum likelihood estimation from grouped and censored normal data, Applied Statistics 18: 110-114.

Tiku, M. L., Tan, W. Y. and Balakrishnan, N. (1986). Robust Inference, Marcel Dekker, Inc.

Wang, B. (1992). Statistical inference for Weibull distribution, Chinese Journal of Applied Probability and Statistics 8: 257-364.

Wolynetz, M. S. (1979). Algorithm AS 139: Maximum likelihood estimation from confined and censored normal data, Applied Statistics 28: 195-206.