PARAMETER ESTIMATION BASED ON CUMULATIVE KULLBACK-LEIBLER DIVERGENCE

Authors:
Yaser Mehrali, Department of Statistics, University of Isfahan, 81744 Isfahan, Iran (y.mehrali@sci.ui.ac.ir)
Majid Asadi, Department of Statistics, University of Isfahan, 81744 Isfahan, Iran (m.asadi@sci.ui.ac.ir)

Abstract: In this paper, we propose some estimators for the parameters of a statistical model based on the Kullback-Leibler divergence of the survival function in the continuous setting and apply them to type I censored data. We prove that the proposed estimators form a subclass of generalized estimating equations estimators. The asymptotic properties of the estimators, such as consistency and asymptotic normality, are investigated. Some illustrative examples are also provided. In particular, in estimating the shape parameter of the generalized Pareto distribution, we show that our procedure dominates some existing methods in the sense of bias and mean squared error.

Key-Words: Estimation; Generalized Estimating Equations; Information Measures; Generalized Pareto distribution; Censoring.

AMS Subject Classification: 62B10, 94A15, 94A17, 62G30, 62E20, 62F10, 62F12, 62N02.

The opinions expressed in this text are those of the authors and do not necessarily reflect the views of any organization.


1. INTRODUCTION

The Kullback-Leibler (KL) divergence (also known as relative entropy) is a measure of discrimination between two probability distributions. If the random variables $X$ and $Y$ have probability density functions $f$ and $g$, respectively, the KL divergence of $f$ relative to $g$ is defined as

$$D(f\|g) = \int_{\mathbb{R}} f(x)\log\frac{f(x)}{g(x)}\,dx,$$

for $x$ such that $g(x) \neq 0$. The function $D(f\|g)$ is always nonnegative and it is zero if and only if $f = g$ a.s.

Let $f_\theta$ belong to a parametric family with $p$-dimensional parameter vector $\theta \in \Theta \subset \mathbb{R}^p$ and let $f_n$ be a kernel density estimator of $f_\theta$ based on $n$ random variables $\{X_1, \ldots, X_n\}$ with the distribution of $X$. Basu and Lindsay [3] used the KL divergence of $f_n$ relative to $f_\theta$,

$$D(f_n\|f_\theta) = \int f_n(x)\log\frac{f_n(x)}{f(x;\theta)}\,dx, \tag{1.1}$$

and defined the minimum KL divergence estimator of $\theta$ as

$$\hat\theta = \arg\inf_{\theta\in\Theta} D(f_n\|f_\theta).$$

Lindsay [19] proposed a version of (1.1) in the discrete setting. In recent years, many authors such as Morales et al. [21], Jiménez and Shao [17], Broniatowski and Keziou [6], Broniatowski [5] and Cherfi [7, 8, 9] have studied the properties of minimum divergence estimators under different conditions. Basu et al. [4] discuss statistical inference with the minimum distance approach in their book.

Although the method of estimation based on $D(f_n\|f_\theta)$ has very interesting properties, the definition is based on the density $f$ which, in general, may not exist. Let $X$ be a random variable with cumulative distribution function (c.d.f.) $F(x) = P(X \le x)$ and survival function (s.f.) $\bar F(x) = 1 - F(x)$. Based on $n$ observations $\{x_1, \ldots, x_n\}$ from the distribution $F$, define the empirical cumulative distribution and survival functions, respectively, by

$$F_n(x) = \sum_{i=0}^{n} \frac{i}{n}\, I_{[x_{(i)},\,x_{(i+1)})}(x), \tag{1.2}$$

and

$$\bar F_n(x) = \sum_{i=0}^{n} \Big(1 - \frac{i}{n}\Big) I_{[x_{(i)},\,x_{(i+1)})}(x), \tag{1.3}$$

where $I$ is the indicator function and $(-\infty =)\ x_{(0)} \le x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \le x_{(n+1)}\ (= \infty)$ are the ordered observations corresponding to the sample. The function $F_n$ ($\bar F_n$) is known in the literature as the empirical estimator of $F$ ($\bar F$).

In the case when $X$ and $Y$ are continuous nonnegative random variables with s.f.'s $\bar F$ and $\bar G$, respectively, a version of the KL divergence in terms of $\bar F$ and $\bar G$ can be given as follows:

$$KLS(\bar F\|\bar G) = \int_0^\infty \bar F(x)\log\frac{\bar F(x)}{\bar G(x)}\,dx - [E(X) - E(Y)].$$

The properties of this divergence measure are studied by some authors such as Liu [20] and Baratpour and Habibi Rad [1].

In order to estimate the parameters of a statistical model $F_\theta$, Liu [20] proposed the cumulative KL divergence between the empirical survival function $\bar F_n$ and the survival function $\bar F_\theta$ (we call it $CKL(\bar F_n\|\bar F_\theta)$) as

$$CKL(\bar F_n\|\bar F_\theta) = \int_0^\infty \Big\{\bar F_n(x)\log\frac{\bar F_n(x)}{\bar F(x;\theta)} - \big[\bar F_n(x) - \bar F(x;\theta)\big]\Big\}\,dx$$
$$= \int_0^\infty \bar F_n(x)\log\bar F_n(x)\,dx - \int_0^\infty \bar F_n(x)\log\bar F(x;\theta)\,dx - [\bar x - E_\theta(X)],$$

where $\bar x$ is the observed sample mean. The cited author defined the minimum CKL divergence estimator (MCKLE) of $\theta$ as

$$\hat\theta = \arg\inf_{\theta\in\Theta} CKL(\bar F_n\|\bar F_\theta).$$

If we consider the parts of $CKL(\bar F_n\|\bar F_\theta)$ that depend on $\theta$ and define

$$g(\theta) = E_\theta(X) - \int_0^\infty \bar F_n(x)\log\bar F(x;\theta)\,dx, \tag{1.4}$$

then the MCKLE of $\theta$ can equivalently be defined by

$$\hat\theta = \arg\inf_{\theta\in\Theta} g(\theta).$$

Two important advantages of this estimator are that one does not need to have the density function and that for large values of $n$ the empirical estimator $F_n$ tends to the distribution function $F$. Liu [20] applied this estimator to uniform and exponential models, and Yari and Saghafi [35] and Yari et al. [34] used it for estimating the parameters of the Weibull distribution; see also Park et al. [26] and Hwang and Park [16]. Yari et al. [34] found a simple form of (1.4) as

$$g(\theta) = E_\theta(X) - \frac{1}{n}\sum_{i=1}^{n} h(x_i) = E_\theta(X) - \overline{h(x)}, \tag{1.5}$$

where $\overline{h(x)} = \frac{1}{n}\sum_{i=1}^{n} h(x_i)$, and

$$h(x) = \int_0^x \log\bar F(y;\theta)\,dy. \tag{1.6}$$

They also proved that

$$E(h(X)) = \int_0^\infty \bar F(x;\theta)\log\bar F(x;\theta)\,dx,$$

which shows that if $n$ tends to infinity, then $CKL(\bar F_n\|\bar F_\theta)$ converges to zero.

The aim of the present paper is to extend the definition of the MCKLE to the case that the random variable of interest has support on the whole real line. In the process of doing so we also investigate asymptotic properties of the MCKLE and provide some examples.

Recently Park et al. [24] extended the cumulative Kullback-Leibler information to the whole real line as

$$CKL(F : G) = \int_{-\infty}^{\infty} \bar F(x)\log\frac{\bar F(x)}{\bar G(x)}\,dx - [E(X) - E(Y)],$$

and

$$CRKL(F : G) = \int_{-\infty}^{\infty} F(x)\log\frac{F(x)}{G(x)}\,dx - [E(Y) - E(X)].$$

They proposed a general cumulative Kullback-Leibler information as

$$GCKL_\alpha(F : G) = \alpha\, CKL(F : G) + (1-\alpha)\, CRKL(F : G), \quad 0 \le \alpha \le 1,$$

and studied its application to a test for normality in comparison with some competing test statistics based on the empirical distribution function.

The rest of the paper is organized as follows. In Section 2, we propose an extension of the MCKLE to the case when the support of the distribution is the real line and present some illustrative examples. In Section 3, we show that the proposed estimator belongs to the class of generalized estimating equations (GEE). Asymptotic properties of the MCKLE, such as consistency and normality, are investigated in that section, and several examples are given. We show, among other examples, that when the underlying distribution is generalized Pareto one can employ the MCKLE to estimate the shape parameter of the model for a subset of the parameter space on which the MLE does not exist. In Section 4, we extend the results to type I censored data.

2. AN EXTENSION OF MCKLE

In this section, we propose an extension of the MCKLE for the case when $X$ is assumed to be a continuous random variable with support $\mathbb{R}$. It is known

that [3]

$$E_\theta|X| = \int_{-\infty}^{0} F(x)\,dx + \int_0^\infty \bar F(x)\,dx.$$

We first give an extension of the CKL divergence for the case that the random variables are distributed over the real line $\mathbb{R}$.

Definition 2.1. Let $X$ and $Y$ be random variables on $\mathbb{R}$ with c.d.f.'s $F$ and $G$, s.f.'s $\bar F$ and $\bar G$ and finite means $E(X)$ and $E(Y)$, respectively. The CKL divergence of $\bar F$ relative to $\bar G$ is defined as

$$CKL(\bar F\|\bar G) = \int_{-\infty}^{0}\Big\{F(x)\log\frac{F(x)}{G(x)} - [F(x) - G(x)]\Big\}\,dx + \int_0^\infty \Big\{\bar F(x)\log\frac{\bar F(x)}{\bar G(x)} - [\bar F(x) - \bar G(x)]\Big\}\,dx$$
$$= \int_{-\infty}^{0} F(x)\log\frac{F(x)}{G(x)}\,dx + \int_0^\infty \bar F(x)\log\frac{\bar F(x)}{\bar G(x)}\,dx - [E|X| - E|Y|].$$

An application of the log-sum inequality and the fact that, for all $x, y > 0$, $x\log\frac{x}{y} \ge x - y$ (equality holds if and only if $x = y$) show that the CKL is nonnegative. Using the fact that in the log-sum inequality equality holds if and only if $F = G$ a.s., one gets that $CKL(\bar F\|\bar G) = 0$ if and only if $F = G$ a.s.

Let $F_\theta$ be the population c.d.f. with unknown parameter $\theta \in \Theta \subset \mathbb{R}^p$ and $F_n$ be the empirical c.d.f. based on a random sample $X_1, X_2, \ldots, X_n$ from $F_\theta$. Based on the above definition, the CKL divergence of $\bar F_n$ relative to $\bar F_\theta$ is defined as

$$CKL(\bar F_n\|\bar F_\theta) = \int_{-\infty}^{0} F_n(x)\log\frac{F_n(x)}{F(x;\theta)}\,dx + \int_0^\infty \bar F_n(x)\log\frac{\bar F_n(x)}{\bar F(x;\theta)}\,dx - \big[\overline{|x|} - E_\theta|X|\big],$$

where $\overline{|x|}$ is the mean of the absolute values of the observations. Let us also define

$$g(\theta) = E_\theta|X| - \int_{-\infty}^{0} F_n(x)\log F(x;\theta)\,dx - \int_0^\infty \bar F_n(x)\log\bar F(x;\theta)\,dx. \tag{2.1}$$

Now, we have the following definition, which is an extension of the CKL estimator in Liu's approach:

Definition 2.2. Assume that $E_\theta|X| < \infty$ and $g''(\theta)$ is positive definite. Then, provided it exists, we define the MCKLE of $\theta$ to be a value in the parameter space $\Theta$ which minimizes $g(\theta)$.

If $X$ is nonnegative, then $g(\theta)$ in (2.1) reduces to (1.4). So the results of Liu [20], Yari and Saghafi [35], Yari et al. [34], Park et al. [26] and Hwang and Park

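[16] follow as special cases. It should be noted that by the law of large numbers $F_n$ converges to $F_\theta$ and $\bar F_n$ converges to $\bar F_\theta$ as $n$ tends to infinity. Consequently $CKL(\bar F_n\|\bar F_\theta)$ converges to zero as $n$ tends to infinity.

In order to study the properties of the estimator, we first find a simple form of (2.1). Let us introduce the following notation:

$$u(x) = \int_x^0 \log F(y;\theta)\,dy, \quad \text{for } x < 0,$$

and

$$s(x) = I_{(-\infty,0)}(x)\,u(x) + I_{[0,\infty)}(x)\,h(x), \quad \text{for } x \in \mathbb{R}, \tag{2.2}$$

where $h$ is defined in (1.6). Assuming that $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ denote the ordered observed values of the sample and that $x_{(k)} < 0 \le x_{(k+1)}$ for some value of $k$, $k = 0, \ldots, n$ ($x_{(0)} = -\infty$), then by (1.2) and (1.3) we have

$$\int_{-\infty}^{0} F_n(x)\log F(x;\theta)\,dx = \sum_{i=1}^{k-1}\frac{i}{n}\int_{x_{(i)}}^{x_{(i+1)}}\log F(x;\theta)\,dx + \frac{k}{n}\int_{x_{(k)}}^{0}\log F(x;\theta)\,dx$$
$$= \frac{1}{n}\sum_{i=1}^{k-1} i\big[u(x_{(i)}) - u(x_{(i+1)})\big] + \frac{k}{n}\,u(x_{(k)}) = \frac{1}{n}\sum_{i=1}^{k} u(x_{(i)}).$$

Using the same steps, we have

$$\int_0^\infty \bar F_n(x)\log\bar F(x;\theta)\,dx = \frac{1}{n}\sum_{i=k+1}^{n} h(x_{(i)}).$$

So, $g(\theta)$ in (2.1) takes the simple form

$$g(\theta) = E_\theta|X| - \frac{1}{n}\sum_{i=1}^{k} u(x_{(i)}) - \frac{1}{n}\sum_{i=k+1}^{n} h(x_{(i)}) = E_\theta|X| - \frac{1}{n}\sum_{i=1}^{n} s(x_i) = E_\theta|X| - \overline{s(x)}. \tag{2.3}$$

If $k = 0$ (i.e., $X$ is nonnegative), then $g(\theta)$ in (2.3) reduces to (1.5). It can easily be seen that

$$E(s(X)) = \int_{-\infty}^{0} F(x;\theta)\log F(x;\theta)\,dx + \int_0^\infty \bar F(x;\theta)\log\bar F(x;\theta)\,dx.$$

In the following, we give some examples.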
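The reduction of the two integrals in (2.1) to the sample average of $s(x_i)$ in (2.3) can be checked numerically. The following is a minimal sketch, assuming a normal model and a small made-up sample; the data, the parameter values, and the helper names `simpson`, `s` and `integrals_direct` are illustrative assumptions, not part of the paper.

```python
import math


def norm_cdf(x, mu, sigma):
    """Normal c.d.f. F(x; mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    if a == b:
        return 0.0
    step = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * step)
    return total * step / 3.0


def s(x, mu, sigma):
    """s(x) of (2.2): u(x) for x < 0 and h(x) for x >= 0."""
    if x < 0:
        return simpson(lambda y: math.log(norm_cdf(y, mu, sigma)), x, 0.0)
    return simpson(lambda y: math.log(1.0 - norm_cdf(y, mu, sigma)), 0.0, x)


def integrals_direct(xs, mu, sigma):
    """Sum of the two integrals in (2.1), using the step functions (1.2)-(1.3)."""
    n = len(xs)
    pts = sorted(set(xs) | {0.0})  # order statistics plus the origin
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        fn = sum(1 for x in xs if x <= a) / n  # value of F_n on [a, b)
        if b <= 0:
            total += fn * simpson(
                lambda y: math.log(norm_cdf(y, mu, sigma)), a, b)
        else:
            total += (1.0 - fn) * simpson(
                lambda y: math.log(1.0 - norm_cdf(y, mu, sigma)), a, b)
    return total


sample = [-1.3, -0.4, 0.2, 0.9, 2.1]  # hypothetical observations
mu, sigma = 0.3, 1.2                  # hypothetical parameter values

direct = integrals_direct(sample, mu, sigma)
via_sum = sum(s(x, mu, sigma) for x in sample) / len(sample)
print(direct, via_sum)  # the two values agree up to quadrature error
```

Subtracting either quantity from $E_\theta|X|$ gives $g(\theta)$. The sum form is the more convenient one in practice because each observation contributes a single one-dimensional integral.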

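Example 2.1. Let $\{X_1, \ldots, X_n\}$ be i.i.d. normal random variables with probability density function

$$\phi(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big\{-\frac{1}{2}\Big(\frac{x-\mu}{\sigma}\Big)^2\Big\}, \quad x \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma > 0.$$

In this case

$$E|X| = \mu\Big[2\Phi\Big(\frac{\mu}{\sigma}\Big) - 1\Big] + 2\sigma\,\phi\Big(\frac{\mu}{\sigma}\Big),$$

where $\Phi$ and $\phi$ (with a single argument) denote the distribution and density functions of the standard normal. For this distribution, $h(x)$, $u(x)$ and $g(\mu,\sigma)$ do not have closed forms. Setting the gradient of $g(\mu,\sigma)$ with respect to $\mu$ and $\sigma$ to zero gives, respectively,

$$n\Big[2\Phi\Big(\frac{\mu}{\sigma}\Big) - 1\Big] + \sum_{x_i \ge 0}\log\Phi\Big(\frac{\mu - x_i}{\sigma}\Big) - \sum_{x_i < 0}\log\Phi\Big(\frac{x_i - \mu}{\sigma}\Big) + k\log\Phi\Big(-\frac{\mu}{\sigma}\Big) - (n-k)\log\Phi\Big(\frac{\mu}{\sigma}\Big) = 0,$$

and

$$2n\,\phi\Big(\frac{\mu}{\sigma}\Big) + \sum_{x_i < 0}\int_{(x_i-\mu)/\sigma}^{-\mu/\sigma}\frac{z\phi(z)}{\Phi(z)}\,dz - \sum_{x_i \ge 0}\int_{-\mu/\sigma}^{(x_i-\mu)/\sigma}\frac{z\phi(z)}{1-\Phi(z)}\,dz = 0, \tag{2.4}$$

where $k$ is the number of negative observations. To obtain our estimators, we need to solve these equations numerically. For computational purposes, the following equivalent equation can be solved instead of (2.4):

$$2\phi\Big(\frac{\mu}{\sigma}\Big) + \int_{(x_{(1)}-\mu)/\sigma}^{-\mu/\sigma} F_n(\mu+\sigma z)\,\frac{z\phi(z)}{\Phi(z)}\,dz - \int_{-\mu/\sigma}^{(x_{(n)}-\mu)/\sigma} \bar F_n(\mu+\sigma z)\,\frac{z\phi(z)}{1-\Phi(z)}\,dz = 0.$$

Figure 1 compares these estimators with the corresponding MLEs. In order to compare our estimators and the MLEs we made a simulation study in which we used samples of sizes 10 to 55 in steps of 5, with $\sigma_{true} = 3$ ($\mu_{true}$ and the number of repeats are illegible in the source). It is evident from the plots that the MCKLE approximately coincides with the MLE in both cases.

Example 2.2. Let $\{X_1, \ldots, X_n\}$ be i.i.d. Laplace random variables with probability density function

$$f(x;\theta) = \frac{1}{2\theta}\exp\Big(-\frac{|x|}{\theta}\Big), \quad x \in \mathbb{R},\ \theta > 0.$$

We simply have the MCKLE of $\theta$ as

$$\hat\theta = \sqrt{\frac{\overline{X^2}}{2}}.$$

This is exactly the moment estimator of $\theta$.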
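The closed form in Example 2.2 can be checked against the objective (2.3). For the Laplace model, (2.2) gives $s(x) = -|x|\log 2 - x^2/(2\theta)$ and $E_\theta|X| = \theta$, so $g(\theta) = \theta + \overline{|x|}\log 2 + \overline{x^2}/(2\theta)$; this expression is worked out here from the definitions and is not quoted from the paper. The sketch below, with an assumed true value and simulated data, confirms that $\sqrt{\overline{x^2}/2}$ minimizes it.

```python
import math
import random


def g_laplace(theta, xs):
    """g(theta) of (2.3) for the Laplace model (derived form, see lead-in)."""
    n = len(xs)
    mean_abs = sum(abs(x) for x in xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return theta + mean_abs * math.log(2.0) + mean_sq / (2.0 * theta)


def mckle_laplace(xs):
    """Closed-form minimizer of g_laplace: sqrt(mean(x^2) / 2)."""
    return math.sqrt(sum(x * x for x in xs) / (2.0 * len(xs)))


random.seed(0)
true_theta = 1.5  # hypothetical true scale
# A Laplace draw is an exponential magnitude (mean theta) with a random sign.
xs = [random.choice((-1, 1)) * random.expovariate(1.0 / true_theta)
      for _ in range(5000)]

theta_hat = mckle_laplace(xs)
print(theta_hat, g_laplace(theta_hat, xs))
```

Evaluating `g_laplace` slightly to either side of `theta_hat` gives larger values, as expected of a minimizer.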

Figure 1: $\hat\mu/\mu_{true}$, $S(\hat\mu)$, $\hat\sigma/\sigma_{true}$ and $S(\hat\sigma)$ as functions of sample size (four panels, each showing the MLE and the MCKLE).

3. ASYMPTOTIC PROPERTIES OF ESTIMATORS

In this section we study asymptotic properties of MCKLEs. For this purpose, we first give a brief review of GEE. Some related references on GEE are Huber [13], Serfling [31], Qin and Lawless [29], van der Vaart [33], Pawitan [28], Shao [32], Huber and Ronchetti [15] and Hampel et al. [12]. Throughout this section, we use the terminology from Shao [32].

We assume that $X_1, \ldots, X_n$ represent independent random vectors, in which the dimension of $X_i$ is $d_i$, $i = 1, \ldots, n$ ($\sup_i d_i < \infty$). We also assume that in the population model the vector $\theta$ is a $p$-vector of unknown parameters. The GEE method is a general method in statistical inference for deriving point estimators. Let $\Theta \subset \mathbb{R}^p$ be the range of $\theta$, let $\psi_i$ be a Borel function from $\mathbb{R}^{d_i} \times \Theta$ to $\mathbb{R}^p$, $i = 1, \ldots, n$, and let

$$s_n(\gamma) = \sum_{i=1}^{n}\psi_i(X_i, \gamma), \quad \gamma \in \Theta.$$

If $\hat\theta \in \Theta$ is an estimator of $\theta$ which satisfies $s_n(\hat\theta) = 0$, then $\hat\theta$ is called a GEE estimator. The equation $s_n(\gamma) = 0$ is called a GEE. Most of the estimation methods such as likelihood estimators, moment estimators and M-estimators are

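special cases of GEE estimators. Usually GEEs are chosen such that

$$E[s_n(\theta)] = \sum_{i=1}^{n} E[\psi_i(X_i, \theta)] = 0. \tag{3.1}$$

If the exact expectation does not exist, then the expectation $E$ may be replaced by an asymptotic expectation. The consistency and asymptotic normality of the GEE are studied under different conditions (see, for example, Shao [32]).

3.1. Consistency and asymptotic normality of the MCKLE

Let $\hat\theta_n$ be the MCKLE which minimizes $g$ in (2.3) with $s$ as defined in (2.2). Here, we show that MCKLEs are special cases of GEE. Using this, we show the consistency and asymptotic normality of MCKLEs.

Theorem 3.1. MCKLEs, obtained by minimizing $g$ in (2.3), are special cases of GEE estimators.

Proof: In order to minimize $g$ in (2.3), we set the derivative of $g$, under the assumption that it exists, equal to zero:

$$\frac{\partial}{\partial\theta} g(\theta) = \frac{\partial}{\partial\theta} E_\theta|X| - \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta} s(x_i) = 0,$$

which is equivalent to the GEE $s_n(\theta) = 0$, where

$$s_n(\theta) = \sum_{i=1}^{n}\Big[\frac{\partial}{\partial\theta} E_\theta|X| - \frac{\partial}{\partial\theta} s(x_i)\Big] = \sum_{i=1}^{n}\psi(x_i, \theta), \tag{3.2}$$

with

$$\psi(x, \theta) = \frac{\partial}{\partial\theta} E_\theta|X| - \frac{\partial}{\partial\theta} s(x). \tag{3.3}$$

Now $E[s_n(\theta)] = 0$, since

$$E\Big[\frac{\partial}{\partial\theta} s(X)\Big] = \frac{\partial}{\partial\theta} E_\theta|X|, \tag{3.4}$$

which can be proven by some simple algebra. This proves the result.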
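The unbiasedness property $E[s_n(\theta)] = 0$ used in the proof can be illustrated by simulation. The sketch below assumes an exponential model with rate $\lambda$, for which $s(x) = h(x) = -\lambda x^2/2$ by (1.6) and $E_\lambda(X) = 1/\lambda$, so that (3.3) becomes $\psi(x,\lambda) = -1/\lambda^2 + x^2/2$. The choice of model, the true rate and the sample size are assumptions made for illustration only.

```python
import math
import random


def psi(x, lam):
    """psi of (3.3) for the exponential model: d/dlam E(X) - d/dlam h(x)."""
    return -1.0 / lam ** 2 + 0.5 * x * x


def s_n(xs, lam):
    """Estimating function (3.2): the sum of psi over the sample."""
    return sum(psi(x, lam) for x in xs)


random.seed(1)
lam_true = 2.0  # hypothetical true rate
xs = [random.expovariate(lam_true) for _ in range(200000)]

# At the true parameter the average of psi should be close to zero, cf. (3.1).
mean_psi_at_truth = s_n(xs, lam_true) / len(xs)

# The root of s_n(lam) = 0 is available in closed form for this model.
lam_hat = math.sqrt(2.0 / (sum(x * x for x in xs) / len(xs)))

print(mean_psi_at_truth, lam_hat, s_n(xs, lam_hat) / len(xs))
```

The average of `psi` at the true rate is only approximately zero because of sampling noise, whereas `s_n` vanishes at `lam_hat` up to rounding, since `lam_hat` is defined as its root.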

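Corollary 3.1. In the special case when the support of $X$ is $\mathbb{R}^+$, the MCKLE is a special case of GEE estimators, where

$$s_n(\theta) = \sum_{i=1}^{n}\Big[\frac{\partial}{\partial\theta} E_\theta(X) - \frac{\partial}{\partial\theta} h(x_i)\Big] = \sum_{i=1}^{n}\psi(x_i, \theta), \tag{3.5}$$

with

$$\psi(x, \theta) = \frac{\partial}{\partial\theta} E_\theta(X) - \frac{\partial}{\partial\theta} h(x). \tag{3.6}$$

MCKLEs are consistent estimators under mild conditions. To see this, for each $n$ let $\hat\theta_n$ be a MCKLE or equivalently a GEE estimator, i.e., $s_n(\hat\theta_n) = 0$, where $s_n$ is defined as in (3.2) or (3.5). Suppose that $\psi$ defined in (3.3) or (3.6) is a bounded and continuous function of $\theta$. Let also $\Psi(\theta) = E[\psi(X, \theta)]$, where we assume that $\Psi'(\theta)$ exists and is of full rank. Then, from Proposition 5.2 of Shao [32] and using the fact that (3.1) holds, $\hat\theta_n \to_p \theta$.

Asymptotic normality of a consistent sequence of MCKLEs can be established under some conditions. We first consider the special case where $\theta$ is scalar and $X_1, \ldots, X_n$ are i.i.d.

Theorem 3.2. Let $\hat\theta_n$ be a consistent MCKLE of $\theta$. Then

$$\sqrt{n}\,(\hat\theta_n - \theta) \to_d N(0, \sigma_F^2),$$

where $\sigma_F^2 = A/B^2$, with

$$A = E\Big[\frac{\partial}{\partial\theta} s(X)\Big]^2 - \Big[\frac{\partial}{\partial\theta} E_\theta|X|\Big]^2,$$

and

$$B = \int_{-\infty}^{0}\frac{\big[\frac{\partial}{\partial\theta} F(x;\theta)\big]^2}{F(x;\theta)}\,dx + \int_0^\infty \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^2}{\bar F(x;\theta)}\,dx.$$

Proof: Using Theorem 3.1 we have $E[\psi(X, \theta)] = 0$. So if we consider $\psi$ defined in (3.3), we have

$$E[\psi(X,\theta)]^2 = Var[\psi(X,\theta)] = Var\Big[\frac{\partial}{\partial\theta} E_\theta|X| - \frac{\partial}{\partial\theta} s(X)\Big] = Var\Big[\frac{\partial}{\partial\theta} s(X)\Big] = E\Big[\frac{\partial}{\partial\theta} s(X)\Big]^2 - \Big[\frac{\partial}{\partial\theta} E_\theta|X|\Big]^2,$$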

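where the last equality follows from (3.4). On the other hand,

$$\Psi'(\theta) = \frac{\partial^2}{\partial\theta^2} E_\theta|X| - E\Big[\frac{\partial^2}{\partial\theta^2} s(X)\Big],$$

and

$$E\Big[\frac{\partial^2}{\partial\theta^2} s(X)\Big] = \int_{-\infty}^{0}\int_x^0 \frac{\partial^2}{\partial\theta^2}\log F(y;\theta)\,dy\, f(x;\theta)\,dx + \int_0^\infty\int_0^x \frac{\partial^2}{\partial\theta^2}\log\bar F(y;\theta)\,dy\, f(x;\theta)\,dx$$
$$= \int_{-\infty}^{0}\frac{F(y;\theta)\frac{\partial^2}{\partial\theta^2}F(y;\theta) - \big[\frac{\partial}{\partial\theta}F(y;\theta)\big]^2}{F^2(y;\theta)}\,F(y;\theta)\,dy + \int_0^\infty \frac{\bar F(y;\theta)\frac{\partial^2}{\partial\theta^2}\bar F(y;\theta) - \big[\frac{\partial}{\partial\theta}\bar F(y;\theta)\big]^2}{\bar F^2(y;\theta)}\,\bar F(y;\theta)\,dy$$
$$= \frac{\partial^2}{\partial\theta^2} E_\theta|X| - \int_{-\infty}^{0}\frac{\big[\frac{\partial}{\partial\theta}F(x;\theta)\big]^2}{F(x;\theta)}\,dx - \int_0^\infty \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^2}{\bar F(x;\theta)}\,dx.$$

So

$$\Psi'(\theta) = \int_{-\infty}^{0}\frac{\big[\frac{\partial}{\partial\theta}F(x;\theta)\big]^2}{F(x;\theta)}\,dx + \int_0^\infty \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^2}{\bar F(x;\theta)}\,dx.$$

Now, using Theorem 5.13 of Shao [32], $\sigma_F^2$ is given as

$$\sigma_F^2 = \frac{E[\psi(X,\theta)]^2}{[\Psi'(\theta)]^2}.$$

Similarly to Theorem 3.2, it can be shown that in the case where $\theta \in \Theta \subset \mathbb{R}^p$ is a vector and $X_1, \ldots, X_n$ are i.i.d., under the conditions of Theorem 5.14 of Shao [32],

$$V_n^{-1/2}(\hat\theta_n - \theta) \to_d N_p(0, I_p),$$

where $V_n = \frac{1}{n}B^{-1}AB^{-1}$ with

$$A = E\Big\{\Big[\frac{\partial}{\partial\theta} s(X)\Big]\Big[\frac{\partial}{\partial\theta} s(X)\Big]^T\Big\} - \Big[\frac{\partial}{\partial\theta} E_\theta|X|\Big]\Big[\frac{\partial}{\partial\theta} E_\theta|X|\Big]^T,$$

and

$$B = \int_{-\infty}^{0}\frac{\big[\frac{\partial}{\partial\theta}F(x;\theta)\big]\big[\frac{\partial}{\partial\theta}F(x;\theta)\big]^T}{F(x;\theta)}\,dx + \int_0^\infty \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^T}{\bar F(x;\theta)}\,dx,$$

provided that $B$ is an invertible matrix.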
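As a sanity check on the variance formula $\sigma_F^2 = A/B^2$, the two ingredients can be evaluated by numerical integration for a simple model. The sketch below assumes an exponential model with rate $\lambda$ (nonnegative support, so only the integrals over $[0,\infty)$ contribute and $s = h$), uses a hand-written Simpson rule, and truncates the infinite range at a point chosen for illustration; all names and numerical settings are assumptions.

```python
import math


def simpson(f, a, b, m=20000):
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * step)
    return total * step / 3.0


lam = 2.0           # hypothetical rate parameter
upper = 60.0 / lam  # truncation point; the neglected tail is negligible


def pdf(x):
    return lam * math.exp(-lam * x)


def sf(x):
    return math.exp(-lam * x)


def d_sf(x):
    """Derivative of the survival function with respect to lam."""
    return -x * math.exp(-lam * x)


def d_h(x):
    """Derivative with respect to lam of h(x) = -lam * x**2 / 2."""
    return -0.5 * x * x


d_mean = -1.0 / lam ** 2  # derivative of E(X) = 1 / lam

A = simpson(lambda x: d_h(x) ** 2 * pdf(x), 0.0, upper) - d_mean ** 2
B = simpson(lambda x: d_sf(x) ** 2 / sf(x), 0.0, upper)
sigma2 = A / B ** 2

print(sigma2, 1.25 * lam ** 2)  # numerical value next to 5 * lam^2 / 4
```

For this model the integrals are also available in closed form, $A = 5/\lambda^4$ and $B = 2/\lambda^3$, which gives $\sigma_F^2 = 5\lambda^2/4$.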

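Remark 3.1. In Theorem 3.2 (and the result stated just after it for a $p$-dimensional parameter), if we assume that the support of $X$ is nonnegative, $A$ and $B$ are given, respectively, by

$$A = E\Big[\frac{\partial}{\partial\theta} h(X)\Big]^2 - \Big[\frac{\partial}{\partial\theta} E_\theta(X)\Big]^2, \qquad B = \int_0^\infty \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^2}{\bar F(x;\theta)}\,dx, \tag{3.7}$$

and

$$A = E\Big\{\Big[\frac{\partial}{\partial\theta} h(X)\Big]\Big[\frac{\partial}{\partial\theta} h(X)\Big]^T\Big\} - \Big[\frac{\partial}{\partial\theta} E_\theta(X)\Big]\Big[\frac{\partial}{\partial\theta} E_\theta(X)\Big]^T, \qquad B = \int_0^\infty \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^T}{\bar F(x;\theta)}\,dx. \tag{3.8}$$

Now, following Pawitan [28], we can find a sample version of the variance formula for the MCKLE as follows. Given $x_1, \ldots, x_n$, let

$$J = \hat E\big[\psi(X,\theta)\psi^T(X,\theta)\big] = \frac{1}{n}\sum_{i=1}^{n}\psi(x_i,\hat\theta)\,\psi^T(x_i,\hat\theta) = \overline{\Big\{\frac{\partial}{\partial\theta}s(x)\Big\}\Big\{\frac{\partial}{\partial\theta}s(x)\Big\}^T}\,\Big|_{\theta=\hat\theta} - \Big\{\overline{\frac{\partial}{\partial\theta}s(x)}\Big\}\Big\{\overline{\frac{\partial}{\partial\theta}s(x)}\Big\}^T\Big|_{\theta=\hat\theta}, \tag{3.9}$$

and

$$I = -\hat E\,\frac{\partial\psi(X,\theta)}{\partial\theta} = -\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\psi(x_i,\hat\theta) = -\frac{\partial^2}{\partial\theta^2}E_\theta|X|\,\Big|_{\theta=\hat\theta} + \overline{\frac{\partial^2}{\partial\theta^2}s(x)}\,\Big|_{\theta=\hat\theta}. \tag{3.10}$$

Using the notation defined in (3.9) and (3.10) we have

$$\hat V_n^{-1/2}(\hat\theta_n - \theta) \to_d N_p(0, I_p),$$

where

$$\hat V_n = \frac{1}{n}\,I^{-1} J I^{-1}, \tag{3.11}$$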

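provided that $I$ is an invertible matrix, or equivalently $g(\theta)$ has an infimum on the parameter space $\Theta$. In particular, when the support of $X$ is $\mathbb{R}^+$, $J$ and $I$ are given, respectively, by

$$J = \overline{\Big\{\frac{\partial}{\partial\theta}h(x)\Big\}\Big\{\frac{\partial}{\partial\theta}h(x)\Big\}^T}\,\Big|_{\theta=\hat\theta} - \Big\{\overline{\frac{\partial}{\partial\theta}h(x)}\Big\}\Big\{\overline{\frac{\partial}{\partial\theta}h(x)}\Big\}^T\Big|_{\theta=\hat\theta}, \tag{3.12}$$

and

$$I = -\frac{\partial^2}{\partial\theta^2}E_\theta(X)\,\Big|_{\theta=\hat\theta} + \overline{\frac{\partial^2}{\partial\theta^2}h(x)}\,\Big|_{\theta=\hat\theta}. \tag{3.13}$$

In Theorem 3.2, the estimator $\hat V_n$ is a sample version of $V_n$; see also Basu and Lindsay [3]. It is also known that the sample variance (3.11) is a robust estimator which is known as the sandwich estimator, with $I^{-1}$ as the bread and $J$ as the filling [14]. In the likelihood approach, the quantity $I$ is the usual observed Fisher information.

Example 3.1. Let $\{X_1, \ldots, X_n\}$ be i.i.d. exponential random variables with probability density function

$$f(x;\lambda) = \lambda e^{-\lambda x}, \quad x > 0,\ \lambda > 0.$$

We simply have the MCKLE of $\lambda$ as

$$\hat\lambda = \sqrt{\frac{2}{\overline{X^2}}}.$$

This estimator is a function of a linear combination of the $X_i^2$'s, and so by the strong law of large numbers (SLLN), $\hat\lambda$ is strongly consistent for $\lambda$. Now, using the central limit theorem (CLT) and the delta method, or using Theorem 3.2, one can show that

$$\sqrt{n}\,(\hat\lambda - \lambda) \to_d N\Big(0, \frac{5\lambda^2}{4}\Big),$$

and the asymptotic bias of $\hat\lambda$ is of order $1/n$:

$$E(\hat\lambda) - \lambda \approx \frac{15\lambda}{8n}.$$

It is well known that the MLE of $\lambda$ is $\hat\lambda_m = 1/\bar X$ with asymptotic distribution

$$\sqrt{n}\,(\hat\lambda_m - \lambda) \to_d N(0, \lambda^2),$$

and the asymptotic bias of $\hat\lambda_m$ is of order $1/n$:

$$E(\hat\lambda_m) - \lambda \approx \frac{\lambda}{n}.$$

Notice that, using the asymptotic bias of $\hat\lambda$, we can find some unbiasing factors to improve our estimator. Since the MLE has inverse Gamma distribution,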

the unbiased estimator of $\lambda$ is $\hat\lambda_{um} = (n-1)/(n\bar X)$ [10]. In Liu's approach an approximately unbiased estimator of $\lambda$ is

$$\hat\lambda_u = \frac{8n}{8n+15}\sqrt{\frac{2}{\overline{X^2}}}. \tag{3.14}$$

Figure 2 compares these estimators. In order to compare our estimator and the MLE, we made a simulation study in which we used samples of sizes 10 to 55 in steps of 5 (the number of repeats is illegible in the source), where we assumed that the true value of the model parameter is $\lambda_{true} = 5$. The plots in Figure 2 show that the MCKLE has more bias than the MLE. It is evident from the plots that the MCKLE in (3.14), which is approximately unbiased, is very close to the unbiased MLE in the sense of bias and variance.

Figure 2: $\hat\lambda/\lambda_{true}$ and $S(\hat\lambda)$ as functions of sample size (two panels, each showing the MLE, MCKLE, UMLE and UMCKLE).

Remark 3.2. In Example 2.2, note that $|X|$ has an exponential distribution. So, using Example 3.1, one can easily find the asymptotic properties of $\hat\theta$ in the Laplace distribution.

Example 3.2. Let $\{X_1, \ldots, X_n\}$ be i.i.d. two-parameter exponential random variables with probability density function

$$f(x;\mu,\sigma) = \frac{1}{\sigma}\,e^{-(x-\mu)/\sigma}, \quad x \ge \mu,\ \mu \in \mathbb{R},\ \sigma > 0.$$

If $\mu \ge 0$, then we have

$$g(\mu,\sigma) = \mu + \sigma + \frac{1}{2\sigma n}\sum_{i=1}^{n}(x_i - \mu)^2$$

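and the MCKLEs of $\mu$ and $\sigma$ are, respectively,

$$\hat\mu = \bar X - \sqrt{\overline{X^2} - \bar X^2}, \qquad \hat\sigma = \sqrt{\overline{X^2} - \bar X^2},$$

which are also the moment estimators (MEs) of $(\mu,\sigma)$. These estimators are functions of linear combinations of the $X_i$'s and $X_i^2$'s, and hence by the SLLN, $(\hat\mu,\hat\sigma)$ are strongly consistent for $(\mu,\sigma)$. Now, by the CLT and the delta method, or using Theorem 3.2, one can show that

$$V_n^{-1/2}\Big(\begin{pmatrix}\hat\mu\\ \hat\sigma\end{pmatrix} - \begin{pmatrix}\mu\\ \sigma\end{pmatrix}\Big) \to_d N_2(0, I_2),$$

where

$$V_n = \frac{\sigma^2}{n}\begin{bmatrix} 1 & -1\\ -1 & 2\end{bmatrix}.$$

On the other hand, if $\mu < 0$, then we get

$$g(\mu,\sigma) = 2\sigma\exp\Big(\frac{\mu}{\sigma}\Big) - \mu - \sigma + \frac{1}{2\sigma n}\sum_{i=k+1}^{n} x_i(x_i - 2\mu) + \frac{\sigma}{n}\sum_{i=1}^{k} Li_2\Big(\exp\Big(-\frac{x_i - \mu}{\sigma}\Big)\Big) - \frac{k\sigma}{n}\,Li_2\Big(\exp\Big(\frac{\mu}{\sigma}\Big)\Big),$$

where the first sum runs over the observations $x_i \ge 0$, the second over the $k$ observations $x_i < 0$, and $Li_2(\cdot)$ is the dilogarithm function. In this case, the MCKLEs of $\mu$ and $\sigma$ can be found numerically.

In the following example, we show that in the generalized Pareto distribution, where the MLE of the shape parameter of the model does not exist, one can use the MCKLE to estimate the shape parameter.

Example 3.3. Suppose that $\{X_1, \ldots, X_n\}$ are i.i.d. from the generalized Pareto distribution (GPD) with c.d.f.

$$F(x;\sigma,k) = \begin{cases} 1 - (1 - kx/\sigma)^{1/k}, & \text{if } k \neq 0,\\ 1 - e^{-x/\sigma}, & \text{if } k = 0,\end{cases}$$

where $\sigma > 0$, $k \in \mathbb{R}$, $0 \le x < \infty$ for $k \le 0$ and $0 \le x \le \sigma/k$ for $k > 0$. For this distribution the MLE of the shape parameter $k$ does not exist for $k \in (1, \infty)$ [11]. Let $\sigma$ be fixed. After some algebra we get

$$g(k) = \frac{\sigma}{k+1} - \frac{1}{n}\sum_{i=1}^{n} h(x_i), \quad -1 < k \le \sigma/x_{(n)},$$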

where

$$h(x) = \begin{cases} -\dfrac{\sigma}{k^2}\Big[\dfrac{kx}{\sigma} + \Big(1 - \dfrac{kx}{\sigma}\Big)\log\Big(1 - \dfrac{kx}{\sigma}\Big)\Big], & k \neq 0,\ x \neq \sigma/k,\\[2mm] -\dfrac{x^2}{2\sigma}, & k = 0,\\[2mm] -\dfrac{x^2}{\sigma}, & k = \sigma/x,\end{cases}$$

and the MCKLE $\hat k$ can be found numerically. It should be noted that in this case, for $k \le -1$, $\hat k$ does not exist.

Recently Zhang [37] considered estimation of the GPD parameters based on the likelihood method and the empirical Bayesian method [36], [38]. Denoting Zhang's estimator by $\hat k_{Zhang}$, the cited author shows that the performance of $\hat k_{Zhang}$ is better than other existing methods for $-6 \le k \le 1/2$. In order to compare our estimator ($\hat k_{MCKLE}$) and Zhang's estimator $\hat k_{Zhang}$, we evaluated them using simulated samples of sizes $n$ = 15, 20, 50, 100, 200, 500 and 1000 (the number of replicates is illegible in the source), considering different true values of the population parameter: $k$ = −0.75, −0.5, −0.25, 0, 0.25, 0.5, 1, 3, 5 and 7. Tables 1 and 2 compare the bias and root mean squared error (RMSE) of the estimators, respectively. It is evident from Table 1 that for all values $k > 0.25$, $\hat k_{MCKLE}$ has less bias than $\hat k_{Zhang}$. Also for $k = 0.25$ and $n$ = 15, 20, 50, 100, the performance of our estimator is better than that of Zhang's estimator. On the other hand, it is seen from Table 2 that, except for $k = -0.75$ with $n$ = 100, 200, 500, 1000 and $k = -0.5$ with $n$ = 500, 1000, for all values of $k$, $\hat k_{MCKLE}$ has less RMSE than $\hat k_{Zhang}$.

Table 1: Biases of $\hat k_{MCKLE}$ and $\hat k_{Zhang}$ for the GPD. Rows: $n$ = 15, 20, 50, 100, 200, 500, 1000; columns: Zhang and MCKLE for each of $k$ = −0.75, −0.5, −0.25, 0, 0.25, 0.5, 1, 3, 5, 7. [Numerical entries are illegible in the source.]

4. AN EXTENSION OF MCKLE TO THE TYPE I CENSORED DATA

In this section, we extend the MCKLE to the case when the data are collected under a type I censoring scheme, in the continuous case. Some authors such as Lim and Park [18], Cherfi [8], Baratpour and Habibi Rad [2], Park and Shin [27], Park et

al. [22], Park and Lim [23] and Park and Pakyari [25] studied some forms of KL divergences in different censored data cases.

Table 2: RMSEs of $\hat k_{MCKLE}$ and $\hat k_{Zhang}$ for the GPD. Rows: $n$ = 15, 20, 50, 100, 200, 500, 1000; columns: Zhang and MCKLE for each of $k$ = −0.75, −0.5, −0.25, 0, 0.25, 0.5, 1, 3, 5, 7. [Numerical entries are illegible in the source.]

Let $T_1, \ldots, T_n$ be i.i.d. nonnegative continuous random variables from a c.d.f. $F$, p.d.f. $f$ and survival function $\bar F$. In a variety of applications in biostatistics and life testing, we are only able to observe $X = \min(T, C)$, where $C$ is the constant censoring point. The density function of $X$ can be written as

$$f_C(x) = \begin{cases} f(x), & 0 < x < C,\\ \bar F(C), & x = C,\\ 0, & \text{otherwise}.\end{cases}$$

It is known that

$$E_\theta(X) = \int_0^C \bar F(x)\,dx. \tag{4.1}$$

The authors in Lim and Park [18] and Park and Shin [27] presented two censored versions of the KL divergence of the density $g_C$ relative to $f_C$, respectively, by

$$I(g, f : C) = \int_0^C g(x)\log\frac{g(x)}{f(x)}\,dx + F(C) - G(C),$$

and

$$K_{(0,C)}(g : f) = \int_0^C g(x)\log\frac{g(x)}{f(x)}\,dx + (1 - G(C))\log\frac{1 - G(C)}{1 - F(C)},$$

which is nonnegative and is monotone in $C$. Park and Lim [23] defined the CKL for censored data as

$$CKL_C(\bar G\|\bar F) = \int_0^C \Big\{\bar G(x)\log\frac{\bar G(x)}{\bar F(x)} - \big[\bar G(x) - \bar F(x)\big]\Big\}\,dx.$$

They also defined the $CKL_C$ of $\bar F_n$ relative to $\bar F_\theta$ as

$$CKL_C(\bar F_n\|\bar F_\theta) = \int_0^C \Big\{\bar F_n(x)\log\frac{\bar F_n(x)}{\bar F(x;\theta)} - \big[\bar F_n(x) - \bar F(x;\theta)\big]\Big\}\,dx$$
$$= \int_0^C \bar F_n(x)\log\bar F_n(x)\,dx - \int_0^C \bar F_n(x)\log\bar F(x;\theta)\,dx + \int_0^C \bar F(x;\theta)\,dx - \int_0^C \bar F_n(x)\,dx,$$

and considered it under type II censorship. Here we apply $CKL_C$ to type I censored data. Using (4.1) we get

$$CKL_C(\bar F_n\|\bar F_\theta) = \int_0^C \bar F_n(x)\log\bar F_n(x)\,dx - \int_0^C \bar F_n(x)\log\bar F(x;\theta)\,dx + E_\theta(X) - \bar x.$$

Consider the parts of $CKL_C(\bar F_n\|\bar F_\theta)$ that depend on $\theta$ and define

$$g(\theta) = E_\theta(X) - \int_0^C \bar F_n(x)\log\bar F(x;\theta)\,dx. \tag{4.2}$$

Then the MCKLE of $\theta$ is defined as

$$\hat\theta = \arg\inf_{\theta\in\Theta} CKL_C(\bar F_n\|\bar F_\theta) = \arg\inf_{\theta\in\Theta} g(\theta),$$

provided that $E_\theta(X) < \infty$ and $g''(\theta)$ is positive definite; see also Park and Lim [23]. If $C \to \infty$, then $g(\theta)$ in (4.2) reduces to (1.4) and the results for the non-censored case follow as a special case. In order to study the properties of the estimator, following the non-censored case, we have the simple form of $g(\theta)$ as in (1.5), with $h$ as in (1.6).

Let $\hat\theta_n$ be the MCKLE in the censored case, obtained by minimizing $g$ in (4.2). Here, the MCKLE is also a special case of GEE with $\psi(x,\theta)$ as in (3.6), and under the conditions given in the non-censored case the MCKLE in the censored case is also consistent.

Asymptotic normality of a consistent sequence of MCKLEs can be established under the conditions imposed in the non-censored case. We first consider the special case where $\theta$ is scalar and $X_1, \ldots, X_n$ are i.i.d. continuous random variables.

Theorem 4.1. For each $n$, let $\hat\theta_n$ be a MCKLE or equivalently a GEE estimator. Then

$$\sqrt{n}\,(\hat\theta_n - \theta) \to_d N(0, \sigma_F^2),$$

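where $\sigma_F^2 = A/B^2$, with $A$ as in (3.7) and

$$B = \int_0^C \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^2}{\bar F(x;\theta)}\,dx.$$

Proof: The proof is similar to the non-censored case.

The next theorem shows the asymptotic normality of the MCKLE when $\theta \in \Theta \subset \mathbb{R}^p$ is a vector and $X_1, \ldots, X_n$ are i.i.d. and continuous.

Theorem 4.2. Under the conditions of Theorem 5.14 of Shao [32],

$$V_n^{-1/2}(\hat\theta_n - \theta) \to_d N_p(0, I_p),$$

where $V_n = \frac{1}{n}B^{-1}AB^{-1}$, with $A$ as in (3.8) and

$$B = \int_0^C \frac{\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]\big[\frac{\partial}{\partial\theta}\bar F(x;\theta)\big]^T}{\bar F(x;\theta)}\,dx,$$

provided that $B$ is an invertible matrix.

Proof: The proof is similar to the non-censored case and hence it is omitted.

Remark 4.1. In Theorems 4.1 and 4.2, if $C \to \infty$ (no censoring), then the results for the non-censored case follow as special cases.

Now, following Pawitan [28], similarly to the non-censored case, the sample version of the variance formula for the MCKLE in the censored case is as in (3.11), with $J$ and $I$ as in (3.12) and (3.13).

Example 4.1. Let $\{X_1, \ldots, X_n\}$ be i.i.d. type I censored exponential random variables with probability density function

$$f_C(x) = \begin{cases} \lambda e^{-\lambda x}, & 0 < x < C,\\ e^{-\lambda C}, & x = C,\\ 0, & \text{otherwise},\end{cases}$$

where $\lambda > 0$. After some algebra, we have

$$g(\lambda) = \frac{1}{\lambda}\big(1 - e^{-\lambda C}\big) + \frac{\lambda}{2}\,\overline{x^2} = \frac{1}{\lambda}\big(1 - e^{-\lambda C}\big) + \frac{\lambda}{2n}\Big((n - r)\,C^2 + \sum_{i=1}^{r} x_{(i)}^2\Big),$$

where $r$ denotes the number of uncensored observations,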

The minimizer $\hat\lambda_n$ can be found numerically as a decreasing function of $\overline{x^2}$, and hence, by the strong law of large numbers (SLLN), it is strongly consistent. Figure 3 shows $\hat\lambda_n$ as a decreasing function of $\overline{x^2}$.

[Figure 3: $\hat\lambda_n$ as a decreasing function of $\overline{x^2}$, for $C = 0.2, 0.4, 0.6, 0.8, 1.0$.]

Now, using Theorem 4.1, one can show that

$$
\sqrt{n}\,(\hat\lambda_n - \lambda) \xrightarrow{d} N(0, \sigma_F^2),
$$

where

$$
\sigma_F^2 = \frac{\lambda^2 \left( 5 e^{2\lambda C} - (\lambda C + 1)^2 - e^{\lambda C} \left( \lambda^3 C^3 + 3\lambda^2 C^2 + 4\lambda C + 4 \right) \right)}{\left( 2 e^{\lambda C} - \left( \lambda^2 C^2 + 2\lambda C + 2 \right) \right)^2}.
$$

If $C \to \infty$ (no censoring), then we obtain the results of the non-censored case.

ACKNOWLEDGMENTS

The authors would like to thank the Editor and an anonymous reviewer for useful comments which led to improving the exposition of this paper.

REFERENCES

[1] Baratpour, S. and Habibi Rad, A. (2012). Testing goodness-of-fit for exponential distribution based on cumulative residual entropy. Communications in Statistics-Theory & Methods, 41, 8, 1387-1396.
[2] Baratpour, S. and Habibi Rad, A. (2016). Exponentiality test based on the progressive type II censoring via cumulative entropy. Communications in Statistics-Simulation & Computation, 45, 7, 2625-2637.
[3] Basu, A. and Lindsay, B. G. (1994). Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Annals of the Institute of Statistical Mathematics, 46, 4, 683-705.
[4] Basu, A.; Shioya, H. and Park, C. (2011). Statistical Inference: the Minimum Distance Approach. CRC Press.
[5] Broniatowski, M. (2014). Minimum divergence estimators, maximum likelihood and exponential families. Statistics & Probability Letters, 93, 27-33.
[6] Broniatowski, M. and Keziou, A. (2009). Parametric estimation and tests through divergences and the duality technique. Journal of Multivariate Analysis, 100, 1, 16-36.
[7] Cherfi, M. (2011). Dual ϕ-divergences estimation in normal models. arXiv preprint arXiv:1108.2999.
[8] Cherfi, M. (2012). Dual divergences estimation for censored survival data. Journal of Statistical Planning and Inference, 142, 7, 1746-1756.
[9] Cherfi, M. (2014). On Bayesian estimation via divergences. Comptes Rendus Mathematique, 352, 9, 749-754.
[10] Forbes, C.; Evans, M.; Hastings, N. and Peacock, B. (2011). Statistical Distributions. John Wiley & Sons.
[11] Grimshaw, S. D. (1993). Computing maximum likelihood estimates for the generalized Pareto distribution. Technometrics, 35, 2, 185-191.
[12] Hampel, F. R.; Ronchetti, E. M.; Rousseeuw, P. J. and Stahel, W. A. (2011). Robust Statistics: the Approach Based on Influence Functions. John Wiley & Sons.
[13] Huber, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 1, 73-101.
[14] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the fifth Berkeley Symposium on Mathematical Statistics & Probability, 221-233.
[15] Huber, P. and Ronchetti, E. (2009). Robust Statistics. Wiley, New York.
[16] Hwang, I. and Park, S. (2013). On scaled cumulative residual Kullback-Leibler information. Journal of the Korean Data & Information Science Society, 24, 6, 1497-1501.
[17] Jiménez, R. and Shao, Y. (2001). On robustness and efficiency of minimum divergence estimators. Test, 10, 2, 241-248.
[18] Lim, J. and Park, S. (2007). Censored Kullback-Leibler information and goodness-of-fit test with type II censored data. Journal of Applied Statistics, 34, 9, 1051-1064.
[19] Lindsay, B. G. (1994). Efficiency versus robustness: the case for minimum Hellinger distance and related methods. The Annals of Statistics, 22, 2, 1081-1114.
[20] Liu, J. (2007). Information Theoretic Content and Probability. Ph.D. Thesis, University of Florida.
[21] Morales, D.; Pardo, L. and Vajda, I. (1995). Asymptotic divergence of estimates of discrete distributions. Journal of Statistical Planning & Inference, 48, 3, 347-369.
[22] Park, S.; Choi, D. and Jung, S. (2014). Kullback-Leibler information of the equilibrium distribution function and its application to goodness of fit test. Communications for Statistical Applications & Methods, 21, 2, 125-134.
[23] Park, S. and Lim, J. (2015). On censored cumulative residual Kullback-Leibler information and goodness-of-fit test with type II censored data. Statistical Papers, 56, 1, 247-256.
[24] Park, S.; Noughabi, H. A. and Kim, I. (2018). General cumulative Kullback-Leibler information. Communications in Statistics-Theory & Methods, 47, 7, 1551-1560.
[25] Park, S. and Pakyari, R. (2015). Cumulative residual Kullback-Leibler information with the progressively Type-II censored data. Statistics & Probability Letters, 106, 287-294.
[26] Park, S.; Rao, M. and Shin, D. W. (2012). On cumulative residual Kullback-Leibler information. Statistics & Probability Letters, 82, 11, 2025-2032.
[27] Park, S. and Shin, M. (2014). Kullback-Leibler information of a censored variable and its applications. Statistics, 48, 4, 756-765.
[28] Pawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.
[29] Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 1, 300-325.
[30] Rohatgi, V. K. and Ehsanes Saleh, A. K. Md. (2015). An Introduction to Probability and Statistics (2 ed.). John Wiley, New York.
[31] Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, New York.
[32] Shao, J. (2003). Mathematical Statistics (2 ed.). Springer, New York, USA.
[33] van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.
[34] Yari, G.; Mirhabibi, A. and Saghafi, A. (2013). Estimation of the Weibull parameters by Kullback-Leibler divergence of Survival functions. Appl. Math. Inf. Sci., 7, 1, 187-192.
[35] Yari, G. and Saghafi, A. (2012). Unbiased Weibull Modulus Estimation Using Differential Cumulative Entropy. Communications in Statistics-Simulation & Computation, 41, 8, 1372-1378.
[36] Zhang, J. (2007). Likelihood moment estimation for the generalized Pareto distribution. Australian & New Zealand Journal of Statistics, 49, 1, 69-77.
[37] Zhang, J. (2010). Improving on Estimation for the Generalized Pareto Distribution. Technometrics, 52, 3, 335-339.
[38] Zhang, J. and Stephens, M. A. (2009). A new and efficient estimation method for the Generalized Pareto Distribution. Technometrics, 51, 3, 316-325.