arxiv: v4 [stat.co] 18 Mar 2018

Similar documents
arxiv: v3 [stat.co] 22 Jun 2017

Bayesian analysis of three parameter absolute. continuous Marshall-Olkin bivariate Pareto distribution

Discriminating Between the Bivariate Generalized Exponential and Bivariate Weibull Distributions

On Sarhan-Balakrishnan Bivariate Distribution

Bivariate Geometric (Maximum) Generalized Exponential Distribution

Estimation of the Bivariate Generalized. Lomax Distribution Parameters. Based on Censored Samples

On Weighted Exponential Distribution and its Length Biased Version

Hybrid Censoring; An Introduction 2

A New Two Sample Type-II Progressive Censoring Scheme

Generalized Exponential Geometric Extreme Distribution

Weighted Marshall-Olkin Bivariate Exponential Distribution

Overview of Extreme Value Theory. Dr. Sawsan Hilal space

Some conditional extremes of a Markov chain

A Convenient Way of Generating Gamma Random Variables Using Generalized Exponential Distribution

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

On Five Parameter Beta Lomax Distribution

Estimation Under Multivariate Inverse Weibull Distribution

Bivariate Sinh-Normal Distribution and A Related Model

Univariate and Bivariate Geometric Discrete Generalized Exponential Distributions

A New Class of Positively Quadrant Dependent Bivariate Distributions with Pareto

COM336: Neural Computing

The Expectation-Maximization Algorithm

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Type-II Progressively Hybrid Censored Data

Parametric Techniques

An Extension of the Freund s Bivariate Distribution to Model Load Sharing Systems

On Bivariate Birnbaum-Saunders Distribution

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Parametric Techniques Lecture 3

Bivariate Weibull-power series class of distributions

On Bivariate Discrete Weibull Distribution

An Extension of the Generalized Exponential Distribution

New Classes of Multivariate Survival Functions

Stat 5101 Lecture Notes

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Lecture 18: Learning probabilistic models

An Introduction to Expectation-Maximization

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

An Extended Weighted Exponential Distribution

Marginal Specifications and a Gaussian Copula Estimation

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Extreme Value Analysis and Spatial Extremes

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

inferences on stress-strength reliability from lindley distributions

Bayesian Inference for Clustered Extremes

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Geometric Skew-Normal Distribution

Burr Type X Distribution: Revisited

Analysis of Middle Censored Data with Exponential Lifetime Distributions

Bayesian Inference and Life Testing Plan for the Weibull Distribution in Presence of Progressive Censoring

Bayes Estimation and Prediction of the Two-Parameter Gamma Distribution

Quantitative Biology II Lecture 4: Variational Methods

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Financial Econometrics and Volatility Models Copulas

The Instability of Correlations: Measurement and the Implications for Market Risk

Point and Interval Estimation of Weibull Parameters Based on Joint Progressively Censored Data

Statistical Data Analysis Stat 3: p-values, parameter estimation

Tail Dependence of Multivariate Pareto Distributions

A Class of Weighted Weibull Distributions and Its Properties. Mervat Mahdy Ramadan [a],*

Does k-th Moment Exist?

Lecture 4: Probabilistic Learning

Random Numbers and Simulation

Modelling Dropouts by Conditional Distribution, a Copula-Based Approach

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan

A Conditional Approach to Modeling Multivariate Extremes

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan

CSC321 Lecture 18: Learning Probabilistic Models

A simple analysis of the exact probability matching prior in the location-scale model

Exam C Solutions Spring 2005

The Bayes classifier

Practice Problems Section Problems

Copulas. Mathematisches Seminar (Prof. Dr. D. Filipovic) Di Uhr in E

Subject CS1 Actuarial Statistics 1 Core Principles

Generalized Exponential Distribution: Existing Results and Some Recent Developments

STA 4273H: Statistical Machine Learning

Machine Learning, Fall 2012 Homework 2

Introduction of Shape/Skewness Parameter(s) in a Probability Distribution

Multivariate Birnbaum-Saunders Distribution Based on Multivariate Skew Normal Distribution

Modeling Spatial Extreme Events using Markov Random Field Priors

Multivariate Survival Data With Censoring.

MODEL BASED CLUSTERING FOR COUNT DATA

Some Recent Results on Stochastic Comparisons and Dependence among Order Statistics in the Case of PHR Model

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3 September 1

Bayesian Methods for Machine Learning

Estimation of parametric functions in Downton s bivariate exponential distribution

POLI 8501 Introduction to Maximum Likelihood Estimation

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

A measure of radial asymmetry for bivariate copulas based on Sobolev norm

Estimating the parameters of hidden binomial trials by the EM algorithm

Exam 2. Jeremy Morris. March 23, 2006

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

BAYESIAN DECISION THEORY

9 Multi-Model State Estimation

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Transcription:

An EM algorithm for absolute continuous bivariate Pareto distribution Arabin Kumar Dey, Biplab Paul and Debasis Kundu arxiv:1608.02199v4 [stat.co] 18 Mar 2018 Abstract: Recently [3] used EM algorithm to estimate singular Marshall-Olkin bivariate Pareto distribution. We describe absolute continuous version of this distribution. We study estimation of the parameters by EM algorithm both in presence and without presence of location and scale parameters. Some innovative solutions are provided for different problems arised during implementation of EM algorithm. We also address issues related to bayesian analysis and proposed methods to compute bayes estimate of the parameters. A real-life data analysis is also shown for illustrative purpose. Keywords and phrases: Joint probability density function ; Absolute Continuous Distribution ; Bivariate pareto distribution ; EM algorithm. 1. Introduction Recently [3] used EM algorithm to estimate singular Marshall-Olkin bivariate Pareto distribution. There is no paper on formulation and estimation of absolute continuous version of this distribution. Few recent works ([11], [12], [14], [18]) include Marshall-Olkin bivariate distribution. Estimation through EM algorithm can be seen there. But estimation of absolute continuous version of bivariate pareto through EM algorithm is not straight forward. The problem becomes more complicated when we consider location and scale parameters in our problem. Usual gradient descent does not work as bivariate likelihood is a discontinuous function with respect to location and scale parameters. We suggest a novel way to handle all related computational problems in most efficient way. The distribution can be used as an alternative model for the data transformed via peak over threshold method. Bivariate Pareto has wide application in modeling data related to finance, insurance, environmental sciences and internet network. The dependence structure of absolute continuous version of bivariate Pareto can also be described by well-known Marshall-Olkin copula [[19], [15]]. The methodology proposed in this paper works for moderately large sample. [5] proposed an absolute continuous bivariate exponential distribution. Later [13] introduced an absolute continuous bivariate generalized exponential distribution, whose marginals are not generalized exponential distributions. Many papers related to multivariate extreme value theory talk about absolute continuous/ singular bivariate and multivariate Pareto distribution [[26], [27], [23], [20]]. Development of multivariate Pareto is an asymptotic process in extreme value Corresponding Author 1

Dey et al./a sample document 2 theory. None of the paper mentioned specifically anything about Marshal- Olkin form of the distribution or its estimation through EM algorithm in presence of location and scale parameters. A novel EM cum Gradient descent is the major contribution of this paper. However we address few modifications in implementation of EM even in case of three parameter set up. The paper contains mix of several issues and implementational issues which together provides our final proposal algorithm. We show that our proposed algorithm works quite well for moderately large sample size. We arrange the paper in the following way. In section 2 we keep the formulation of Marshall-Olkin bivariate Pareto and its absolute continuous version. In section 3, we describe the EM algorithm for singular case. Different extension of EM algorithm for absolute continuous version is available in section 4. Some simulation results show the performance of the algorithm in section 5. In section 6 we show data analysis. Finally we conclude the paper in section 7. 2. Formulation of Marshal-Olkin bivariate pareto A random variable X is said to have Pareto of second kind, i.e. X P a(ii)(µ, σ, α) if it has the survival function F X (x; µ, σ, α) = P (X > x) = (1 + x µ σ ) α and the probability density function (pdf) f(x; µ, σ, α) = α σ (1 + x µ σ ) α 1 with x > µ R, σ > 0 and α > 0. Let U 0, U 1 and U 2 be three independent univariate pareto distributions P a(ii)(0, 1, α 0 ), P a(ii)(µ 1,, α 1 ) and P a(ii)(µ 2, σ 2, α 2 ). We define X 1 = min{ U 0 + µ 1, U 1 } and X 2 = min{σ 2 U 0 + µ 2, U 2 }. We can show that (X 1, X 2 ) jointly follow Marshall-Olkin bivariate Pareto distribution of second kind and we denote it as BV P A(µ 1,, µ 2, σ 2, α 0, α 1, α 2 ). The joint distribution can be given by where f 1 (x 1, x 2 ) f(x 1, x 2 ) = f 2 (x 1, x 2 ) f 0 (x) if x1 µ1 if x1 µ1 if x1 µ1 < x2 µ2 σ 2 > x2 µ2 σ 2 = x2 µ2 σ 2 = x f 1 (x 1, x 2 ) = α 1 (α 0 + α 2 )(1 + (x 2 µ 2 ) ) (α0+α2+1) (1 + (x 1 µ 1 ) σ 2 σ 2 ) (α1+1) f 2 (x 1, x 2 ) = α 2 (α 0 + α 1 )(1 + x 1 µ 1 ) (α0+α1+1) (1 + x 2 µ 2 σ 2 σ 2 ) (α2+1)

Dey et al./a sample document 3 (a) ξ 1 (b) ξ 2 (c) ξ 3 (d) ξ 4 Fig 1. Density plots of BVPAC for different sets of parameters f 0 (x) = α 0 (1 + x µ 1 ) (α1+α2+α0+1) We denote absolute continuous part as BVPAC. Surface and contour plots of the absolute continuous part of the pdf are shown in Figure-1 and Figure-2 respectively. The following four different sets of parameters provide four subfigures in each figure. ξ 1 : µ 1 = 0, µ 2 = 0, = 1, σ 2 = 0.5, α 0 = 1, α 1 = 0.3, α 2 = 1.4; ξ 2 : µ 1 = 1, µ 2 = 2, = 0.4, σ 2 = 0.5, α 0 = 2, α 1 = 1.2, α 2 = 1.4; ξ 3 : µ 1 = 0, µ 2 = 0, = 1.4, σ 2 = 0.5, α 0 = 1, α 1 = 1, α 2 = 1.4; ξ 4 : µ 1 = 0, µ 2 = 0, = 1.4, σ 2 = 0.5, α 0 = 2, α 1 = 0.4, α 2 = 0.5.

Dey et al./a sample document 4 (a) ξ 1 (b) ξ 2 (c) ξ 3 (d) ξ 4 Fig 2. Contour plots of the density of BVPAC for different sets of parameters

2.1. Absolute Continuous Extension Dey et al./a sample document 5 We assume that the random vector (Z 1, Z 2 ) follows an absolute continuous bivariate pareto distribution (as described in [12]). The joint pdf of BVPAC can be written as = f BV P AC (z 1, z 2 ) cf 1 (z 1, z 2 ) = cf P A (z 1 ; µ 1,, α 1 )f P A (z 2 ; µ 2, σ 2, α 0 + α 2 ) if (z1 µ1) cf 2 (z 1, z 2 ) = cf P A (z 1 ; µ 1,, α 0 + α 1 )f P A (z 2 ; µ 2, σ 2, α 2 ) if (z1 µ1) < (z2 µ2) σ 2 > (z2 µ2) σ 2 where c is a normalizing constant and c = α0+α1+α2 α 1+α 2 and f P A ( ) denotes the pdf of univariate distribution. Likelihood function for the parameters of the BVPAC can be written as L(µ 1, µ 2,, σ 2, α 0, α 1, α 2 ) = n ln(α 0 + α 1 + α 2 ) n ln(α 1 + α 2 ) + n 1 ln α 1 + n 1 ln(α 0 + α 2 ) n 1 ln n 1 ln σ 2 (α 0 + α 2 + 1) i I 1 ln(1 + z 2i µ 2 σ 2 ) (α 1 + 1) i I 1 ln(1 + z 1i µ 1 ) n 2 n 2 ln σ 2 + n 2 ln α 2 + n 2 ln(α 0 + α 1 ) (α 0 + α 1 + 1) i I 2 ln(1 + z 1i µ 1 ) (α 2 + 1) i I 2 ln(1 + z 2i µ 2 σ 2 ) The three parameter or likelihood function taking µ 1 = µ 2 = 0 and =

Dey et al./a sample document 6 σ 2 = 1 corresponding to this pdf can be given by L(α 0, α 1, α 2 ) = (n 1 + n 2 ) ln(α 0 + α 1 + α 2 ) (n 1 + n 2 ) ln(α 1 + α 2 ) + ln f P A (z 1i ; 0, 1, α 1 ) + ln f P A (z 2i ; 0, 1, α 0 + α 2 ) i I 1 i I 1 + i I 2 ln f P A (z 1i ; 0, 1, α 0 + α 1 ) + i I 2 ln f P A (z 2i ; 0, 1, α 2 ) = (n 1 + n 2 ) ln(α 0 + α 1 + α 2 ) (n 1 + n 2 ) ln(α 1 + α 2 ) + n 1 ln α 1 (α 1 + 1) i I 1 ln(1 + z 1i ) + n 1 ln(α 0 + α 2 ) (α 0 + α 2 + 1) i I 1 ln(1 + z 2i ) + n 2 ln(α 0 + α 1 ) (α 0 + α 1 + 1) i I 2 ln(1 + z 1i ) + n 2 ln α 2 (α 2 + 1) i I 2 ln(1 + z 2i ) Marginal distribution of Z 1 and Z 2 can be obtained by routine calculation. Expressions are as follows : f Z1 (z 1 ; µ 1,, α 0 + α 1 ) = cf P A (z 1 ; µ 1,, α 0 + α 1 ) c (α 0 + α 1 + α 2 ) f P A(z 1 ; µ 1,, α 0 + α 1 + α 2 ) f Z2 (z 2 ; µ 2, σ 2, α 0 + α 2 ) = cf P A (z 2 ; µ 2, σ 2, α 0 + α 2 ) c (α 0 + α 1 + α 2 ) f P A(z 2 ; µ 2, σ 2, α 0 + α 1 + α 2 ) respectively, where c = (α0+α1+α2) (α 1+α 2). α 0 α 0 3. EM-algorithm for singular three parameter BVPA distribution Now we divide our data into three parts :- I 0 = {(x 1i, x 2i ) : x 1i = x 2i },I 1 = {(x 1i, x 2i ) : x 1i < x 2i }, I 2 = {(x 1i, x 2i ) : x 1i > x 2i } We do not know if X 1 is U 0 or U 1. We also do not know if X 2 is U 0 or U 2. So we introduce two new random variables ( 1 ; 2 ) as 1 = { 0 if X 1 = U 0 and 2 = 1 if X 1 = U 1 { 0 if X 2 = U 0 2 if X 2 = U 2 The 1 and 2 are the missing values of the E-M algorithm. To calculate the E-step we need the conditional distribution of 1 and 2.

Dey et al./a sample document 7 Using the definition of X 1, X 2 and 1, 2 we have : For group I 0, both 1 and 2 are known, 1 = 2 = 0 For group I 1, 1 is known, 2 is unknown, 1 = 1, 2 = 0 or 2 Therefore we need to find out u 1 = P ( 2 = 0 I 1 )and u 2 = P ( 2 = 2 I 1 ) For group I 2, 2 is known, 1 is unknown, 1 = 0 or 1, 2 = 2 Moreover we need w 1 = P ( 1 = 0 I 2 ) and w 2 = P ( 1 = 1 I 2 ) Since, each posterior probability corresponds to one of the ordering from Table-3, we calculate u 1, u 2, w 1, w 2 using the probability of appropriate ordering. Ordering (X 1, X 2 ) Group U 0 < U 1 < U 2 (U 0, U 0 ) I 0 U 0 < U 2 < U 1 (U 0, U 0 ) I 0 U 1 < U 0 < U 2 (U 1, U 0 ) I 1 U 1 < U 2 < U 0 (U 1, U 2 ) I 1 U 2 < U 0 < U 1 (U 0, U 2 ) I 2 U 2 < U 1 < U 0 (U 1, U 2 ) I 2 We have the following expressions for u 1, u 2, w 1, w 2 Now we have u 1 = u 2 = w 1 = w 2 = P (U 1 < U 0 < U 2 ) = P (U 1 < U 0 < U 2 ) P (U 1 < U 0 < U 2 ) + P (U 1 < U 2 < U 0 ) P (U 1 < U 2 < U 0 ) P (U 1 < U 0 < U 2 ) + P (U 1 < U 2 < U 0 ) P (U 2 < U 0 < U 1 ) P (U 2 < U 0 < U 1 ) + P (U 2 < U 1 < U 0 ) P (U 2 < U 1 < U 0 ) P (U 2 < U 0 < U 1 ) + P (U 2 < U 1 < U 0 ) 0 [1 (1 + x) α1 ]α 0 (1 + x) (α0+1) (1 + x) (α2) = α 0 (1 + x) (α0+α2+1) (1 + x) (α0+α1+α2+1) = 0 α 0 α 1 (α 0 + α 2 + 1)(α 0 + α 1 + α 2 + 1) Using this above result we evaluate the other probabilites to get values of u 1, u 2, w 1, w 2 as : u 1 = α0 α 0+α 2 and u 2 = α2 α 0+α 2 w 1 = α0 α 0+α 1 and w 2 = α1 α 0+α 1

3.1. Pseudo-likelihood expression Dey et al./a sample document 8 We define n 0, n 1, n 2 as : n 0 = I 0, n 1 = I 1, n 2 = I 2. where I j for j = 0, 1, 2 denotes the number of elements in the set I j. Now the pseudo log-likelihood can be written down as L(α 0, α 1, α 2 ) = α 0 ( i I 0 ln(1 + x i ) + i I 2 ln(1 + x 1i ) + i I 1 ln(1 + x 2i )) + (n 0 + u 1 n 1 + w 1 n 2 ) ln α 0 α 1 ( ln(1 + x i ) + ln(1 + x 1i )) i I 0 i I 1 I 2 + (n 1 + w 2 n 2 ) ln α 1 α 2 ( ln(1 + x i ) + ln(1 + x 2i )) i I 0 i I 1 I 2 + (n 2 + u 2 n 1 ) ln α 2 (1) Therefore M-step involves maximizing (1) with respect to α 0, α 1, α 2 at 0 = n 0 + u (t) 1 n 1 + w (t) 1 n 2 i I 0 ln(1 + x i ) + i I 2 ln(1 + x 1i ) + i I 1 ln(1 + x 2i ). (2) 1 = (n 1 + w (t) 2 n 2) i I 0 ln(1 + x i ) + i (I 1 I 2) ln(1 + x 1i) n 2 + u (t) 2 2 = n 1 ( i I 0 ln(1 + x i ) + i (I ln(1 + x 1 I 2) 2i)) Therefore the algorithm can be given as (3) (4) Algorithm 1 EM procedure for bivariate Pareto distribution 1: while Q/Q < tol do 2: Compute u (i) 1, u(i) 2, w(i) 1, w(i) 2 from α (i) 0, α(i) 1, α(i) 2. 3: Update α (i+1) 0, α (i+1) 1, α (i+1) 2 using Equation (2), (3) and (4). 4: Calculate Q for the new iterate. 5: end while 4. EM-algorithm for absolute continuous case We adopt similar existing methods described in [12], we replace the missing observations n 0 falling within I 0 and each observation U 0 by its estimate ñ 0 and E(U 0 U 0 < min{u 1, U 2 }). Note that in this case, n 0 is a random variable which α has negative binomial with parameters (n 1 + n 2 ) and 1+α 2 α 0+α 1+α 2. α Clearly ñ 0 = (n 1 + n 2 ) 0 1 α 1+α 2, E(U 0 U 0 < min{u 1, U 2 }) = (α 0+α 1+α 2 1) An important restriction for this approximation is that we have to ensure α 0 + α 1 + α 2 > 1. The restriction will ensure the existence of the above expectation.

Dey et al./a sample document 9 Therefore under above mentioned restriction, we write the pseudo likelihood function by replacing the missing observations with its expected value. L(α 0, α 1, α 2 ) = α 0 (ñ 0 ln(1 + a) + i I 2 ln(1 + x 1i ) + i I 1 ln(1 + x 2i )) + (ñ 0 + u 1 n 1 + w 1 n 2 ) ln α 0 α 1 (ñ 0 ln(1 + a) + ln(1 + x 1i )) i I 1 I 2 + (n 1 + w 2 n 2 ) ln α 1 α 2 (ñ 0 ln(1 + a) + ln(1 + x 2i )) i I 1 I 2 + (n 2 + u 2 n 1 ) ln α 2 (5) Therefore M-step involves maximizing (5) with respect to α 0, α 1, α 2 at ñ 0 + u (t) 1 0 = n 1 + w (t) 1 n 2 ñ 0 ln(1 + a) + i I 2 ln(1 + x 1i ) + i I 1 ln(1 + x 2i ). (6) (n 1 + w (t) 2 1 = n 2) ñ 0 ln(1 + a) + i (I ln(1 + x 1 I 2) 1i) (7) 2. n 2 + u (t) 2 2 = n 1 (ñ 0 ln(1 + a) + i (I ln(1 + x 1 I 2) 2i)) Therefore the modified EM algorithm steps would be according to Algorithm- (8) Algorithm 2 Modified EM procedure for three parameter absolute continuous bivariate Pareto distribution 1: while Q/Q < tol do 2: Compute u (i) 1, u(i) 2, w(i) 1, w(i) 2, ñ 0, a from α (i) 0, α(i) 1, α(i) 2. 3: Update α (i+1) 0, α (i+1) 1, α (i+1) 2 using Equation (6), (7) and (8). 4: Calculate Q for the new iterate. 5: end while Important Remark : The above method fails when α 0 + α 1 + α 2 > 1. 4.1. Proposed Modification : To make the algorithm valid for any range of parameter, instead of estimating U 0 we estimate ln(1 + U 0 ) conditional on U 0 < min{u 1, U 2 }. Now since ln(x) is an increasing function in x, we condition on the equivalent event ln(1 + U 0 ) < min{ln(1 + U 1 ), ln(1 + U 2 )}. Therefore we replace the unknown missing information n 0 and ln(1 + U 0 ) by α 0 ñ 0 = (n 1 + n 2 ) α 1 + α 2

and Dey et al./a sample document 10 a = E(ln(1 + U 0 ) ln(1 + U 0 ) < min{ln(1 + U 1 ), ln(1 + U 2 )}) = 1 (α 0 + α 1 + α 2 ) respectively. Therefore M-step involves maximizing (5) with respect to α 0, α 1, α 2 at ñ 0 0 = + u(t) 1 n 1 + w (t) 1 n 2 ñ 0a + i I 2 ln(1 + x 1i ) + i I 1 ln(1 + x 2i ). (9) (n 1 + w (t) 2 1 = n 2) ñ 0a + i (I ln(1 + x 1 I 2) 1i) (10) n 2 + u (t) 2 2 = n 1 (ñ 0a + i (I ln(1 + x 1 I 2) 2i)) (11) Therefore the final EM steps for three parameters can be kept according to Algorithm-3. Algorithm 3 Final modified EM procedure for three parameter absolute continuous bivariate Pareto distribution 1: while Q/Q < tol do 2: Compute u (i) 1, u(i) 2, w(i) 1, w(i) 2, n 0, a from α (i) 0, α(i) 1, α(i) 2. 3: Update α (i+1) 0, α (i+1) 1, α (i+1) 2 using Equation (9), (10) and (11). 4: Calculate Q for the new iterate. 5: end while 4.2. Algorithm for seven parameter absolute continuous Pareto distribution For seven parameter case we first estimate the location parameters from the marginal distributions and keep it fixed. Estimates of location parameters can be replaced by minimum of the marginals of the data. Our EM algorithm update scale parameters and α 0, α 1, α 2 at every iteration. We use one step ahead gradient descend of and σ 2 based on likelihood of its marginal density. We can not use the bivariate likelihood function in gradient descend updates of and σ 2 as the likelihood function may not be differentiable with respect to and σ 2. Once we get an update of the scale parameters, we first fix I 1 and I 2 and try to mimic the EM steps for α 0, α 1 and α 2 discussed in previous section. We define (x1i ˆµ1) (x2i ˆµ2) z 1i = ˆ and z 2i = ˆσ 2. If estimates of location and scale parameters are correct (Z 1i, Z 2i ) follow a bivariate Pareto distribution with parameter sets (0, 0, 1, 1, α 0, α 1, α 2 ). But one step ahead scale parameter won t be a correct estimate of scale parameters, however we assume approximate distribution of (Z 1i, Z 2i ) would be Pareto with parameter sets (0, 0, 1, 1, α 0, α 1, α 2 ).

Dey et al./a sample document 11 Therefore M-step expression would be same as before : 0 = n 0 + u (t) 1 n 1 + w (t) 1 n 2 i I 0 ln(1 + z 1i ) + i I 2 ln(1 + z 1i ) + i I 1 ln(1 + z 2i ). (12) 1 = (n 1 + w (t) 2 n 2) i I 0 ln(1 + z 1i ) + i (I 1 I 2) ln(1 + z 1i) n 2 + u (t) 2 2 = n 1 ( i I 0 ln(1 + z 1i ) + i (I ln(1 + z 1 I 2) 2i)) (13) (14) We make similar modifications of the above expressions. As usual observations within I 0 and cardinality of I 0 would be unknown in this case. We estimate i I 0 ln(1 + z i ) by n 0a 0, where a 0 = E(log(1 + U 0 ) log(1 + U 0 ) < 1 min{log(1 + U 1 ), log(1 + U 2 )}) = (α and 0+α 1+α 2) n 0 by ñ 0 = (n α 1 + n 2 ) 0 α 1+α 2. Therefore final M-step expression would be same as before : ñ 0 0 = + u(t) 1 n 1 + w (t) 1 n 2 ñ 0 a 0 + i I 2 ln(1 + z 1i ) + i I 1 ln(1 + z 2i ). (15) (n 1 + w (t) 2 1 = n 2) ñ 0 a 0 + i (I ln(1 + z 1 I 2) 1i) (16) n 2 + u (t) 2 2 = n 1 (ñ 0 a 0 + i (I ln(1 + z 1 I 2) 2i)) (17) The key intuition of this algorithm is very similar to stochastic gradient descent. As we increase the number of iteration, EM steps try to force the three parameters α 0, α 1, α 2 to pick up the right direction starting from any value and gradient descent steps of and σ 2 gradually ensure them to roam around the actual values. One of the drawback with this approach is that it considers too many approximations. However this algorithm works even for moderately large sample sizes. Sometimes it takes lot of time to converge or roam around the actual value for some really bad sample. Probability of such events are very low. We stop the calculation after 2000 iteration in such cases. Our numerical result in next section shows that this does not make any significant effect in the calculation of mean square error. Therefore algorithmic steps of our proposed algorithm for seven parameter absolute continuous bivariate Pareto distribution can be kept according to Algorithm-4. 5. Numerical Results We use package R 3.2.2 to perform the estimation procedure. All the programs will be available to author on request. We generate samples of different sizes

Dey et al./a sample document 12 Algorithm 4 EM procedure for seven parameter absolute continuous bivariate Pareto distribution 1: Take minimum of the marginals as estimates of the location parameters. Initialize other parameters. 2: while Q/Q < tol do 3: Compute one step ahead gradient descent update of scale parameters based on likelihood function of the marginals. 4: Fix I 1 and I 2 based on the estimated location and scale parameters. 5: Compute u (i) 1, u(i) 2, w(i) 1, w(i) 2, n 0, a 0 from α(i) 0, α(i) 1, α(i) 2. 6: Update α (i+1) 0, α (i+1) 1, α (i+1) 2 using Equation (15), (16) and (17). 7: Calculate Q for the new iteration. 8: end while to calculate the average estimates based on 1000 simulations. We also calculate mean square error and parametric bootstrap confidence interval for all parameters. The result shown in Table-1 deals only with three unknown parameters following Algorithm-3. We start EM algorithm taking initial guesses for the parameters as α 0 = 2, α 1 = 2.2, α 2 = 2.4. We try other initial guesses, but average estimates and MSEs are same. Estimates are calculated based on sample sizes n = 50, 150, 250, 350, 450 respectively. We use stopping criteria as absolute value of likelihood changes with respect to previous likelihood at each iteration. In stopping criteria we use tolerance value as 0.00001. The results shown here are based on modified algorithm which works for any choice of parameters within its usual range. The average estimates (AE), mean squared errors (MSE) are obtained based on 1000 replications. Table-2 provides the estimates in presence of location and scale parameters following Algorithm-4. In this case, we take sample size n = 450, 550, 1000, 1500. However the algorithm works even for smaller sample sizes, although mean square errors are little higher. Since EM algorithm starts after plug-in the estimates of location parameters as minimum of the maginals, we take initial values of other parameters as α 0 = 2, α 1 = 1.2, α 2 = 1.4, = 0.6, σ 2 = 0.7. We find average estimates are closer to true values of the parameters. However parametric bootstrap confidence interval does not work for location and scale parameters as it never contains the true parameters.

Dey et al./a sample document 13 parameters α 0 α 1 α 2 n 50 (AE) 1.634 0.678 0.807 MSE 0.594 0.278 0.370 Parametric Bootstrap [0.561, 2.733] [0.0003, 1.479] [0.0004, 1.759] Confidence Interval (CI) 150 (AE) 1.865 0.4944abs 0.624 MSE 0.233 0.089 0.125 Parametric Bootstrap [1.023, 2.711] [0.018, 1.066] [0.021, 1.267] Confidence Interval (CI) 250 (AE) 1.9227 0.467 0.577 MSE 0.125 0.054 0.074 Parametric Bootstrap [1.186, 2.630] [0.044, 0.948] [0.047, 1.150] Confidence Interval (CI) 350 (AE) 1.935 0.428 0.543 MSE 0.09 0.033 0.050 Parametric Bootstrap [1.371, 2.595] [0.064, 0.800] [0.087, 0.976] Confidence Interval (CI) 450 (AE) 1.966 0.427 0.528 MSE 0.06 0.027 0.039 Parametric Bootstrap [1.505, 2.486] [0.121, 0.736] [0.138, 0.898] Confidence Interval (CI) Table 1 The average estimates (AE), the Mean Square Error (MSE) and parametric bootstrap confidence interval for α 0 = 2, α 1 = 0.4 and α 2 = 0.5 parameters µ 1 µ 2 n 450 AE 0.101 0.101 0.753 MSE 0.0000038 0.0000023 0.01203 450 σ 2 α 0 α 1 α 2 AE 0.804 1.820 0.458 0.626 MSE 0.0159 0.09738 0.02546 0.0716 550 parameters µ 1 µ 2 AE 0.10117 0.10091 0.762 MSE 0.0000026 0.0000016 0.012 550 σ 2 α 0 α 1 α 2 AE 0.807 1.820 0.466 0.632 MSE 0.014 0.095 0.0261 0.0716 1000 µ 1 µ 2 AE 0.1006574 0.1004984 0.817 MSE 0.000000848 0.00000050 0.0085 1000 σ 2 α 0 α 1 α 2 AE 0.793 1.855 0.481 0.631 MSE 0.012 0.0655 0.0260 0.06122 1500 parameters µ 1 µ 2 AE 0.1004 0.1003 0.8095 MSE 0.00000039 0.000000215 0.008423629 1500 σ 2 α 0 α 1 α 2 AE 0.8213 1.8892 0.4790 0.6136 MSE 0.008 0.038 0.033 0.0302 Table 2 The average estimates (AE), the Mean Square Error (MSE) for µ 1 = 0.1, µ 2 = 0.1, = 0.8, σ 2 = 0.8, α 0 = 2, α 1 = 0.4 and α 2 = 0.5 5.1. Data Set 1 : The data set is taken from https://archive.ics.uci.edu/ml/machine-learningdatabases. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope. The data set contains related measurements. We extract a part of the data for bivariate modeling. We consider only measurements related to female population

Dey et al./a sample document 14 (a) ξ 1 (b) ξ 2 Fig 3. Survival plots for two marginals of the transformed dataset Fig 4. Two dimensional density plots for the transformed dataset where one of the variable is Length as Longest shell measurement and other variable is Diameter which is perpendicular to length. We use peak over threshold method on this data set. From [7], we know that peak over threshold method on random variable U provides polynomial generalized Pareto distribution for any x 0 with 1 + log(g(x 0 )) (0, 1) i.e. P (U > tx 0 U > x 0 ) = t α, t 1 where G( ) is the distribution function of U. We choose appropriate t and x 0 so that data should behave more like Pareto distribution. The transformed data set does not have any singular component. Therefore one possible assumption can be absolute continuous Marshall Olkin bivariate Pareto. We also verify our assumption by plotting empirical two dimensional density plot in Figure-4 which resembles closer to the surface of Marshall-Olkin bivariate Pareto distribution. The estimated parameters can be given as µ 1 = 10.855, µ 2 = 8.632, = 2.124, σ 2 = 1.711, α 0 = 3.124, α 1 = 1.743 and α 2 = 1.602. We can also calculate the mean square error by parametric bootstrap method. We calculate the MSEs

Dey et al./a sample document 15 based on 1000 bootstrap sample. The corresponding MSE s are 8.129 10 6, 5.759 10 6, 0.632, 0.380, 0.759, 0.598, 0.492 respectively. 6. Conclusion: We observe successful implementation of EM algorithm for absolute continuous bivariate Pareto distribution. The approximation process runs in multiple stages. Therefore the algorithm works better for larger sample size. Estimation procedure even in case of three parameters is not straightforward. The paper also shows some innovative approach to handle the estimation in case of location and scale parameters. But that s not a problem as it will roam around closer to the actual value. Since the bivariate likelihood function is discontinuous with respect to location and scale parameters, construction of asymptotic confidence interval becomes difficult. However parametric bootstrap confidence interval provides useful solution for this problem. The work of this paper can be further used for discrimination of several models. The work is in progress. References [1] Arnold, B. C. (1967). A note on multivariate distributions with specified marginals. Journal of the American Statistical Association 62 (320), 1460 1461. [2] Asimit, A. V., E. Furman, and R. Vernic (2010). On a multivariate pareto distribution. Insurance: Mathematics and Economics 46 (2), 308 316. [3] Asimit, A. V., E. Furman, and R. Vernic (2016). Statistical inference for a new class of multivariate pareto distributions. Communications in Statistics- Simulation and Computation 45 (2), 456 471. [4] Balakrishnan, N., R. C. Gupta, D. Kundu, V. Leiva, and A. Sanhueza (2011). On some mixture models based on the birnbaum saunders distribution and associated inference. Journal of Statistical Planning and Inference 141 (7), 2175 2190. [5] Block, H. W. and A. Basu (1974). A continuous, bivariate exponential extension. Journal of the American Statistical Association 69 (348), 1031 1037. [6] Dey, A. K. and D. Kundu (2012). Discriminating between bivariate generalized exponential and bivariate weibull distributions. Chilean Journal of Statistics 3 (1), 93 110. [7] Falk, M. and A. Guillou (2008). Peaks-over-threshold stability of multivariate generalized pareto distributions. Journal of Multivariate Analysis 99 (4), 715 734. [8] Gupta, R. D. and D. Kundu (1999). Theory & methods: Generalized exponential distributions. Australian & New Zealand Journal of Statistics 41 (2), 173 188. [9] Khosravi, M., D. Kundu, and A. Jamalizadeh (2015). On bivariate and a mixture of bivariate birnbaum saunders distributions. Statistical Methodology 23, 1 17.

Dey et al./a sample document 16 [10] Kundu, D. (2012). On sarhan-balakrishnan bivariate distribution. Journal of Statistics Applications & Probability 1 (3), 163. [11] Kundu, D. and R. D. Gupta (2009). Bivariate generalized exponential distribution. Journal of Multivariate Analysis 100 (4), 581 593. [12] Kundu, D. and R. D. Gupta (2010). A class of absolutely continuous bivariate distributions. Statistical Methodology 7 (4), 464 477. [13] Kundu, D. and R. D. Gupta (2011). Absolute continuous bivariate generalized exponential distribution. AStA Advances in Statistical Analysis 95 (2), 169 185. [14] Kundu, D., A. Kumar, and A. K. Gupta (2015). Absolute continuous multivariate generalized exponential distribution. Sankhya B 77 (2), 175 206. [15] Marshall, A. W. (1996). Copulas, marginals, and joint distributions. Lecture Notes-Monograph Series, 213 222. [16] Marshall, A. W. and I. Olkin (1967). A multivariate exponential distribution. Journal of the American Statistical Association 62 (317), 30 44. [17] McLachlan, G. and D. Peel (2004). Finite mixture models. John Wiley & Sons. [18] Mirhosseini, S. M., M. Amini, D. Kundu, and A. Dolati (2015). On a new absolutely continuous bivariate generalized exponential distribution. Statistical Methods & Applications 24 (1), 61 83. [19] Nelsen, R. B. (2007). An introduction to copulas. Springer Science & Business Media. [20] Rakonczai, P. and A. Zempléni (2012). Bivariate generalized pareto distribution in practice: models and estimation. Environmetrics 23 (3), 219 227. [21] Ristić, M. M. and D. Kundu (2015). Marshall-olkin generalized exponential distribution. Metron 73 (3), 317 333. [22] Sarhan, A. M. and N. Balakrishnan (2007). A new class of bivariate distributions and its mixture. Journal of Multivariate Analysis 98 (7), 1508 1527. [23] Smith, R. L. (1994). Multivariate threshold methods. In Extreme Value Theory and Applications, pp. 225 248. Springer. [24] Smith, R. L., J. A. Tawn, and S. G. Coles (1997). Markov chain models for threshold exceedances. Biometrika 84 (2), 249 268. [25] Tsay, R. S. (2014). An introduction to analysis of financial data with R. John Wiley & Sons. [26] Yeh, H.-C. (2000). Two multivariate pareto distributions and their related inferences. Bulletin of the Institute of Mathematics, Academia Sinica. 28 (2), 71 86. [27] Yeh, H.-C. (2004). Some properties and characterizations for generalized multivariate pareto distributions. Journal of Multivariate Analysis 88 (1), 47 60. Department of Mathematics, IIT Guwahati e-mail: arabin.k.dey@gmail.com Department of Mathematics, IIT Guwahati e-mail: biplab.paul@iitg.ac.in

Dey et al./a sample document 17 Department of Mathematics and Statistics IIT Kanpur Kanpur, India e-mail: kundu@iitk.ac.in