Estimation of the Correlation Coefficient for a Bivariate Normal Distribution with Missing Data

Kasetsart J. (Nat. Sci.) 45 : 736 - 742 (2011)

Juthaporn Sinsomboonthong*

Department of Statistics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand.
* Corresponding author, e-mail: fscijps@ku.ac.th
Received date : [illegible]  Accepted date : [illegible]

ABSTRACT

This study proposes an estimator of the correlation coefficient for a bivariate normal distribution with missing data, via the complete observation analysis method. A simulation study compared the proposed estimator ($\hat{\rho}_J$) with the Pearson correlation coefficient ($\hat{\rho}_P$). With higher percentages of missing data and larger sample sizes, the absolute bias of $\hat{\rho}_J$ was less than that of $\hat{\rho}_P$ when the population correlation coefficient ($\rho$) was not close to zero. In addition, the mean square error of $\hat{\rho}_J$ did not differ from that of $\hat{\rho}_P$ in any of the situations studied.

Keywords: bivariate normal distribution, correlation coefficient, bias, missing data, mean square error

INTRODUCTION

Missing data occur in almost all research studies and can arise for many reasons. They usually introduce bias and inefficiency into parameter estimation (Little and Rubin, 2002; Norazian et al., 2008), so well-designed data analysis is especially necessary. Incomplete data can be analyzed with standard statistical methods that ignore the missing values. An important limitation, however, is that such methods are appropriate only for studies that contain small amounts of missing data. Moreover, standard approaches reduce the amount of usable data when incomplete cases are discarded. This loss of data causes imprecision and increased bias (Rao et al., 1999; Little and Rubin, 2002). Acock (2005) noted that it may further reduce or exaggerate statistical power. Likewise, Rotnitzky and Wypij (1994), Roth et al. (1996), Gorelick (2006) and Fitzmaurice (2008) noted that it may lead to invalid conclusions. The degree of bias and the loss of precision depend not only on the fraction of complete cases and the pattern of missing data. They also depend on the differences between the complete and incomplete cases and on the parameters of interest.

Recently, several authors have investigated the estimation of the population correlation coefficient, $\rho$, from samples of a bivariate normal distribution. Dahiya and Korwar (1980) proposed the maximum likelihood estimator of $\rho$ for a bivariate normal distribution with equal variances and an incomplete dataset. Garren (1998) examined maximum likelihood estimation of the correlation coefficient and its asymptotic properties in a bivariate normal model with missing data. Mudelsee (2003) studied the Pearson correlation coefficient with bootstrap confidence intervals for bivariate climate time series with unknown data distributions. In addition, Huson et al. (2007) further investigated the performance of the Pearson and Spearman correlation coefficients. They found that, for most practical purposes, the Pearson correlation coefficient is an adequate choice for investigating correlation. Moreover, the Pearson estimate performed better than the much more widely known Spearman estimate. However, Neter et al. (1996) and Zimmerman et al. (2003) noted that the Pearson correlation coefficient is a biased estimator of the population correlation coefficient for bivariate normal populations. This bias decreases as the sample size increases, and it is zero when the population correlation coefficient is zero or one.

Furthermore, Efron and Tibshirani (1993) and Smith and Pontius (2006) have applied the jackknife method of bias reduction (Quenouille, 1949, 1956; Tukey, 1958) to parameter estimation. The basic idea of the jackknife is to recompute a statistic systematically on samples that each leave out one observation from the sample set. An estimate is then calculated from each of these reduced samples.

The current study examined incomplete data from a bivariate normal distribution. The proposed estimator of the population correlation coefficient modifies the Pearson correlation coefficient by applying the jackknife method for bias reduction.

MATERIALS AND METHODS

The Pearson correlation coefficient

Consider an incomplete sample from a bivariate normal distribution with mean vector $(\mu_1, \mu_2)$, variance-covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

and correlation coefficient $\rho$. Assume that $m$ pairs of $(X_1, X_2)$ are completely observed. The remaining $n - m$ observations of $X_2$ are lost, so only $m$ observations ($m < n$) of $X_2$ are collected (see Figure 1). All data pairs are independent and identically distributed, and the data are assumed to be missing completely at random (Little and Rubin, 2002).
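The sampling setup above ($n$ pairs, $X_2$ missing for $n - m$ cases, missingness completely at random) can be sketched in Python with only the standard library. This is a minimal illustration, not the author's simulation code. The function names are hypothetical, and so are the parameter values in the usage line. The rule of deleting $X_2$ from a uniformly random subset of cases is an assumption: the paper states only the MCAR assumption and the percentage of missing data.

```python
import math
import random


def bivariate_normal_sample(n, mu1, mu2, sigma1, sigma2, rho, rng):
    """Draw n i.i.d. pairs (x1, x2) from a bivariate normal distribution."""
    c = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Mixing two independent N(0, 1) draws this way gives corr(x1, x2) = rho.
        pairs.append((mu1 + sigma1 * z1, mu2 + sigma2 * (rho * z1 + c * z2)))
    return pairs


def make_monotone_mcar(pairs, n_missing, rng):
    """Delete X2 in n_missing cases chosen uniformly at random (MCAR, an
    assumed mechanism), then list the m = n - n_missing complete pairs first,
    matching the monotone pattern of Figure 1."""
    missing = set(rng.sample(range(len(pairs)), n_missing))
    complete = [p for i, p in enumerate(pairs) if i not in missing]
    incomplete = [(p[0], None) for i, p in enumerate(pairs) if i in missing]
    return complete + incomplete


def complete_cases(data):
    """Complete observation analysis: keep only the pairs with X2 observed."""
    return [(x1, x2) for x1, x2 in data if x2 is not None]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical values: n = 30 pairs, 30% of X2 missing (9 cases), rho = 0.5.
    sample = bivariate_normal_sample(30, 0.0, 3.0, 2.0, 3.0, 0.5, rng)
    data = make_monotone_mcar(sample, 9, rng)
    print(len(complete_cases(data)))  # 21 complete pairs
```

Under MCAR the complete pairs are themselves an i.i.d. bivariate normal sample of size $m$. For that reason, the complete-case estimators discussed next use only those $m$ pairs.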
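Missing completely at random means that whether or not data are missing is independent of both the observed and the unobserved values of $X_1$ and $X_2$. Based on the $m$ complete pairs, the well-known maximum likelihood estimator of $\rho$, denoted by $\hat{\rho}_P$, is

$$\hat{\rho}_P = \frac{\sum_{j=1}^{m}(x_{1j}-\bar{x}_1)(x_{2j}-\bar{x}_2)}{\sqrt{\sum_{j=1}^{m}(x_{1j}-\bar{x}_1)^2 \, \sum_{j=1}^{m}(x_{2j}-\bar{x}_2)^2}} \qquad (1)$$

where $\bar{x}_1 = \frac{1}{m}\sum_{j=1}^{m} x_{1j}$ and $\bar{x}_2 = \frac{1}{m}\sum_{j=1}^{m} x_{2j}$ (Neter et al., 1996; Anderson, 2003). This estimator is often called the Pearson correlation coefficient. It is a biased estimator of $\rho$ (unless $\rho = 0$ or 1), although the bias is usually small when the sample size is large (Neter et al., 1996; Zimmerman et al., 2003).

$$\begin{array}{lccccccc}
\text{Observations:} & 1 & 2 & \cdots & m & m+1 & \cdots & n \\
\text{Variable 1:} & x_{11} & x_{12} & \cdots & x_{1m} & x_{1,m+1} & \cdots & x_{1n} \\
\text{Variable 2:} & x_{21} & x_{22} & \cdots & x_{2m} & & &
\end{array}$$

Figure 1 Monotone missing data pattern for a bivariate normal distribution.

The jackknife method of bias reduction

This section proposes an estimator of $\rho$ that applies the jackknife method to reduce the bias of $\hat{\rho}_P$. The steps are as follows:

1. Given a sample $\mathbf{X} = (x_{11}, x_{12}, \ldots, x_{1m}, x_{21}, x_{22}, \ldots, x_{2m})$, take the estimator $\delta(\mathbf{X}) = \hat{\rho}_P$ given by equation (1).

2. The $i$th jackknife sample, $\mathbf{X}_{(i)}$, consists of the data set with the $i$th observation removed:
$$\mathbf{X}_{(i)} = (x_{11}, \ldots, x_{1(i-1)}, x_{1(i+1)}, \ldots, x_{1m}, x_{21}, \ldots, x_{2(i-1)}, x_{2(i+1)}, \ldots, x_{2m}), \quad i = 1, 2, \ldots, m.$$

3. $\delta(\mathbf{X}_{(i)})$ is the $i$th jackknife replication of $\delta(\mathbf{X})$:
$$\delta(\mathbf{X}_{(i)}) = \hat{\rho}_{P(i)} = \frac{\sum_{j \ne i}(x_{1j}-\bar{x}_{1(i)})(x_{2j}-\bar{x}_{2(i)})}{\sqrt{\sum_{j \ne i}(x_{1j}-\bar{x}_{1(i)})^2 \, \sum_{j \ne i}(x_{2j}-\bar{x}_{2(i)})^2}} \qquad (2)$$
where $\bar{x}_{1(i)} = \frac{1}{m-1}\sum_{j \ne i} x_{1j}$ and $\bar{x}_{2(i)} = \frac{1}{m-1}\sum_{j \ne i} x_{2j}$, for $i = 1, 2, \ldots, m$.

4. Calculate the pseudo-values
$$J_i = m\,\delta(\mathbf{X}) - (m-1)\,\delta(\mathbf{X}_{(i)}) = m\hat{\rho}_P - (m-1)\hat{\rho}_{P(i)}, \quad i = 1, 2, \ldots, m.$$

5. The proposed estimator of $\rho$ is the mean of the pseudo-values:
$$\hat{\rho}_J = \frac{1}{m}\sum_{i=1}^{m} J_i = \frac{1}{m}\sum_{i=1}^{m}\left[m\hat{\rho}_P - (m-1)\hat{\rho}_{P(i)}\right] = m\hat{\rho}_P - \frac{m-1}{m}\sum_{i=1}^{m}\hat{\rho}_{P(i)}$$
where $\hat{\rho}_P$ and $\hat{\rho}_{P(i)}$ are given by equations (1) and (2), respectively.

RESULTS

A simulation study was conducted to evaluate the validity and reliability of the proposed estimator empirically. Populations of $(X_1, X_2)$ of size $N =$ [illegible] were generated from a bivariate normal distribution with $\mu_1 =$ [illegible], $\mu_2 = 3$, $\sigma_1^2 = 4$ and $\sigma_2^2 = 9$. The population correlation coefficients of $(X_1, X_2)$ were $-0.9, -0.8, \ldots, 0.8, 0.9$. Samples of size $n =$ [illegible], 30 and 60 were drawn by simple random sampling with replacement, with [illegible] repetitions. Missing data were set at 10, 20 and 30 percent of the total cases, giving 171 situations ($19 \times 3 \times 3$) for the simulation study. The absolute bias and mean square error (MSE) of $\hat{\rho}_J$ and $\hat{\rho}_P$ were then compared empirically. Figure 2 presents the absolute biases of $\hat{\rho}_J$ and $\hat{\rho}_P$ from the simulation.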
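A standard expansion argument shows why averaging the pseudo-values in step 5 reduces bias. This derivation is not given in the paper. It assumes that the expected value of $\hat{\rho}_P$ based on $m$ pairs admits an expansion in powers of $1/m$, with coefficients $a$ and $b$ that do not depend on $m$. For the Pearson coefficient, the classical first-order coefficient is $a = -\rho(1-\rho^2)/2$. The $m$ leave-one-out replications are identically distributed because the pairs are i.i.d., so $\sum_i E[\hat{\rho}_{P(i)}] = m\,E[\hat{\rho}_{P(1)}]$.

```latex
\begin{aligned}
E[\hat\rho_P] &= \rho + \frac{a}{m} + \frac{b}{m^2} + O(m^{-3}), \qquad
E[\hat\rho_{P(i)}] = \rho + \frac{a}{m-1} + \frac{b}{(m-1)^2} + O(m^{-3}) \\
E[\hat\rho_J] &= m\,E[\hat\rho_P] - (m-1)\,E[\hat\rho_{P(1)}] \\
&= m\rho + a + \frac{b}{m} - (m-1)\rho - a - \frac{b}{m-1} + O(m^{-2}) \\
&= \rho - \frac{b}{m(m-1)} + O(m^{-2}) \;=\; \rho + O(m^{-2}).
\end{aligned}
```

The $O(1/m)$ bias term cancels exactly. For $\hat{\rho}_P$ this term is approximately $-\rho(1-\rho^2)/(2m)$. It vanishes at $\rho = 0$ and $\rho = \pm 1$ and is largest in magnitude near $|\rho| = 1/\sqrt{3} \approx 0.58$. This suggests why the jackknife gains least near $\rho = 0$: there $\hat{\rho}_P$ is already nearly unbiased.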
When the sample size and the percentage of missing data were [illegible] and 10%, respectively, the absolute bias of $\hat{\rho}_J$ was less than that of $\hat{\rho}_P$ for population correlation coefficients ($\rho$) between [illegible] and [illegible]. For 20% missing data and a sample size of [illegible], the absolute bias of $\hat{\rho}_J$ was less than that of $\hat{\rho}_P$ when the population correlation coefficient was not close to zero.
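The two estimators, and the comparison of absolute bias and MSE, can be sketched as a small Monte Carlo experiment in Python (standard library only). This is a hedged illustration, not the paper's code, and all function names are hypothetical. It differs from the study in three assumed ways:

- The $m$ complete pairs are drawn directly from a bivariate normal distribution rather than by resampling a finite population of size $N$. Under MCAR this gives approximately the same distribution for the complete cases.
- The means are set to 0 and the variances to 1. This changes neither estimator, because correlation is invariant under location shifts and positive rescaling.
- The repetition count is arbitrary.

```python
import math
import random


def pearson(x1, x2):
    """Pearson correlation of paired lists, as in equations (1) and (2)."""
    m = len(x1)
    xbar1 = sum(x1) / m
    xbar2 = sum(x2) / m
    sxy = sum((a - xbar1) * (b - xbar2) for a, b in zip(x1, x2))
    sxx = sum((a - xbar1) ** 2 for a in x1)
    syy = sum((b - xbar2) ** 2 for b in x2)
    return sxy / math.sqrt(sxx * syy)


def jackknife_rho(x1, x2):
    """Jackknife estimator rho_J computed from the m complete pairs."""
    m = len(x1)
    rho_p = pearson(x1, x2)  # step 1
    pseudo = []
    for i in range(m):
        # Steps 2-3: drop the i-th pair and recompute the correlation.
        rho_i = pearson(x1[:i] + x1[i + 1:], x2[:i] + x2[i + 1:])
        # Step 4: pseudo-value J_i.
        pseudo.append(m * rho_p - (m - 1) * rho_i)
    return sum(pseudo) / m  # step 5: mean of the pseudo-values


def simulate(rho, m, reps, seed=0):
    """Monte Carlo absolute bias and MSE of rho_P and rho_J for m complete pairs."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    err_p = err_j = sq_p = sq_j = 0.0
    for _ in range(reps):
        x1, x2 = [], []
        for _ in range(m):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x1.append(z1)
            x2.append(rho * z1 + c * z2)  # corr(x1, x2) = rho
        d_p = pearson(x1, x2) - rho
        d_j = jackknife_rho(x1, x2) - rho
        err_p += d_p
        err_j += d_j
        sq_p += d_p * d_p
        sq_j += d_j * d_j
    return {
        "abs_bias_P": abs(err_p / reps),
        "abs_bias_J": abs(err_j / reps),
        "mse_P": sq_p / reps,
        "mse_J": sq_j / reps,
    }


if __name__ == "__main__":
    # Illustrative setting: n = 30 with 30% missing leaves m = 21 complete pairs.
    print(simulate(rho=0.5, m=21, reps=2000, seed=1))
```

Nothing in step 5 constrains $\hat{\rho}_J$ to $[-1, 1]$, so individual estimates can fall slightly outside that range. The sketch does not truncate them, because the paper does not describe any truncation.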

Figure 2 Comparison of absolute biases for $\hat{\rho}_J$ and $\hat{\rho}_P$ when $n =$ [illegible], 30 and 60. (Nine panels, one for each sample size and each percentage of missing data (10%, 20%, 30%); each panel compares the Pearson estimate with the proposed estimate.)

When the percentage of missing data was 30% and the sample size was [illegible], the absolute bias of $\hat{\rho}_J$ was less than that of $\hat{\rho}_P$ for population correlation coefficients not between $-0.5$ and [illegible]. The absolute bias of $\hat{\rho}_J$ was also less than that of $\hat{\rho}_P$ under three joint conditions: the sample size was greater than [illegible], the percentage of missing data was greater than [illegible]%, and the population correlation coefficient was not close to zero. For sample sizes of 30 and 60, the absolute biases of $\hat{\rho}_J$ and $\hat{\rho}_P$ were less than [illegible] and [illegible], respectively. Thus, the absolute bias of $\hat{\rho}_P$ tended to be greater than that of $\hat{\rho}_J$ when the sample size was 30 or 60. With a data loss of [illegible]% and $n = 30$, the absolute bias of $\hat{\rho}_J$ was less than that of $\hat{\rho}_P$ at all levels of the population correlation coefficient. For $n = 60$, it was less only when the population correlation coefficient was positive. The absolute biases of both $\hat{\rho}_J$ and $\hat{\rho}_P$ tended to decrease as the sample size increased.

Figure 3 indicates that the mean square error of $\hat{\rho}_J$ showed no apparent difference from that of $\hat{\rho}_P$ in any situation in this study. Furthermore, the mean square errors of both $\hat{\rho}_J$ and $\hat{\rho}_P$ tended to decrease as the sample size increased, whatever the percentage of missing data. The ranges of the mean square errors of $\hat{\rho}_J$ and $\hat{\rho}_P$ were narrower when the sample size was large, with [illegible]% and 30% missing data. Overall, $\hat{\rho}_J$ performed better than $\hat{\rho}_P$ in terms of absolute bias for sample sizes of 30 and 60, provided that the percentage of missing data was higher and the population correlation coefficient was not close to zero.

Figure 3 Mean square errors of $\hat{\rho}_J$ and $\hat{\rho}_P$ when $n =$ [illegible], 30 and 60. (Nine panels laid out as in Figure 2; vertical axis: mean square error.)

DISCUSSION

The simulation results indicated that $\hat{\rho}_P$ is a biased estimator, as Neter et al. (1996) and Zimmerman et al. (2003) noted. As previously reported (Efron and Tibshirani, 1993; Smith and Pontius, 2006), this bias can be reduced by the jackknife method. Moreover, the bias of the proposed estimator decreased toward zero as the sample size increased. These findings can be applied in research in education, psychology, medicine and other fields. The jackknife method can be used to reduce bias in correlation coefficient estimation for incomplete samples from bivariate normal populations. In addition, the proposed estimator is easy to calculate with a computer program.

CONCLUSION

This paper proposed an estimator of the correlation coefficient for a bivariate normal distribution when observations are missing from one of the variables. The proposed estimator ($\hat{\rho}_J$) was derived from the Pearson correlation coefficient ($\hat{\rho}_P$) and is based on the analysis of complete cases. In the simulation study, the absolute bias of $\hat{\rho}_J$ was less than that of $\hat{\rho}_P$ for large sample sizes with higher percentages of missing data, when the population correlation coefficient was not close to zero. Furthermore, for sample sizes of 30 and 60, the absolute bias of $\hat{\rho}_J$ was less than [illegible] at every percentage of missing data. In addition, the mean square error of $\hat{\rho}_J$ appeared no different from that of $\hat{\rho}_P$ in any situation in this simulation study.

ACKNOWLEDGEMENTS

The author would like to thank the Department of Statistics, Faculty of Science, Kasetsart University for financial support and the necessary facilities during the research.

LITERATURE CITED

Acock, A.C. 2005. Working with missing values. Journal of Marriage and Family 67: 1012-1028.
Anderson, T.W. 2003. An Introduction to Multivariate Statistical Analysis. 3rd ed. Wiley. New Jersey. [illegible] pp.
Dahiya, R.C. and R.M. Korwar. 1980. Maximum likelihood estimates for a bivariate normal distribution with missing data. The Annals of Statistics 8: 687-692.
Efron, B. and R.J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman & Hall/CRC. USA. [illegible] pp.
Fitzmaurice, G. 2008. Missing data: Implications for analysis. Nutrition 24: 200-202.
Garren, S.T. 1998. Maximum likelihood estimation of the correlation coefficient in a bivariate normal model with missing data. Statistics & Probability Letters 38: 281-288.
Gorelick, M.H. 2006. Bias arising from missing data in predictive models. Journal of Clinical Epidemiology 59: 1115-1123.
Huson, L.W., Biostatistics Group and F.H. La-Roche. 2007. Performance of some correlation coefficients when applied to zero-clustered data. Journal of Modern Applied Statistical Methods 6: 530-536.
Little, R.J.A. and D.B. Rubin. 2002. Statistical Analysis with Missing Data. Wiley. New Jersey. [illegible] pp.
Mudelsee, M. 2003. Estimating Pearson's correlation coefficient with bootstrap confidence interval from serially dependent time series. Math. Geol. 35: 651-665.
Neter, J., M.H. Kutner, C.J. Nachtsheim and W. Wasserman. 1996. Applied Linear Statistical Models. 4th ed. Irwin. Chicago. [illegible] pp.
Norazian, M.N., Y.A. Shukri, R.N. Azam and A.M.M. Al Bakri. 2008. Estimation of missing values in air pollution data using single imputation techniques. ScienceAsia 34: 341-345.
Quenouille, M.H. 1949. Approximate test of correlation in time-series. Journal of the Royal Statistical Society. Series B (Methodological) 11: 68-84.
Quenouille, M.H. 1956. Notes on bias in estimation. Biometrika 43: 353-360.
Rao, C.R., H. Toutenburg and A. Fieger. 1999. Linear Models: Least Squares and Alternatives. 2nd ed. Springer-Verlag. New York. [illegible] pp.
Roth, P.L., J.E. Campion and S.D. Jones. 1996. The impact of four missing data techniques on validity estimates in human resource management. Journal of Business and Psychology 11: 101-112.
Rotnitzky, A. and D. Wypij. 1994. A note on the bias of estimators with missing data. Biometrics 50: 1163-1170.
Smith, C.D. and J.S. Pontius. 2006. Jackknife estimator of species richness with S-PLUS. Journal of Statistical Software 15: [illegible].
Tukey, J.W. 1958. Bias and confidence in not-quite large samples. Annals of Mathematical Statistics 29: 614-[illegible].
Zimmerman, D.W., B.D. Zumbo and R.H. Williams. 2003. Bias in estimation and hypothesis testing of correlation. Psicológica 24: 133-158.