An Empirical Characteristic Function Approach to Selecting a Transformation to Normality


Communications for Statistical Applications and Methods
2014, Vol. 21, No. 3, 213-224
DOI: http://dx.doi.org/10.5351/csam.2014.21.3.213
ISSN 2287-7843

An Empirical Characteristic Function Approach to Selecting a Transformation to Normality

In-Kwon Yeo (a), Richard A. Johnson (b), XinWei Deng (c)

(a) Department of Statistics, Sookmyung Women's University, Korea
(b) Department of Statistics, University of Wisconsin-Madison, USA
(c) Department of Statistics, Virginia Tech, USA

Abstract

In this paper, we study the problem of transforming to normality. We propose to estimate the transformation parameter by minimizing a weighted squared distance between the empirical characteristic function of the transformed data and the characteristic function of the normal distribution. Our approach also allows for other symmetric target characteristic functions. Asymptotics are established for a random sample selected from an unknown distribution. The proofs show that the weight function |t|^{-2} needs to be modified to have thinner tails. We also propose a method to compute the influence function for M-equations taking the form of U-statistics. The influence function calculations and a small Monte Carlo simulation show that our estimates are less sensitive to a few outliers than the maximum likelihood estimates.

Keywords: Box-Cox transformation, influence function, Yeo-Johnson transformation.

This research was supported by the Sookmyung Women's University Research Grants 2013. Corresponding author: Department of Statistics, Sookmyung Women's University, Seoul 140-742, Korea. E-mail: inkwon@sookmyung.ac.kr. Published online 31 May 2014 / journal homepage: http://csam.or.kr. (c) 2014 The Korean Statistical Society, and Korean International Statistical Society. All rights reserved.

1. Introduction

Transformation of data is a useful tool that permits the use of an assumption such as normality when the observed data seriously violate this condition. It is well known that, under the normality assumption, the maximum likelihood estimator of the Box-Cox transformation parameter is very sensitive to outliers; see Andrews (1971). There are two ways to circumvent this problem. The first approach is to construct diagnostics which attempt to identify influential observations, and then to remove these observations before estimating the transformation parameter. Cook and Wang (1983), Hinkley and Wang (1988), Tsai and Wu (1990), and Kim et al. (1996) studied case deletion diagnostics for the Box-Cox transformation. However, for multiple outliers, these diagnostic procedures are quite complicated and require extensive computation. The second approach is to perform a robust estimation procedure that is not strongly affected by outliers. Carroll (1980) proposed a robust method for selecting a power transformation to achieve approximate normality in a linear model. Hinkley (1975) and Taylor (1985) suggested methods to estimate the transformation parameter in the Box-Cox transformation when the goal is to obtain approximate symmetry rather than normality. Yeo and Johnson (2001) and Yeo (2001) introduced an M-estimator obtained by minimizing the integrated square of the imaginary part of the empirical characteristic function of Yeo-Johnson transformed data.

In this paper, we propose a robust method to estimate a transformation parameter as well as the mean and the variance of the target distribution. The estimators are obtained by minimizing a squared distance between the empirical characteristic function of the transformed data and the target characteristic function, which is often that of a normal distribution.

Specifically, we minimize the integral of the squared modulus of the difference of the two characteristic functions, multiplied by a weight function. It is assumed that, on some interval about zero, the weight function equals |t|^{-2}, the weight used to compute the distance covariance by Szekely et al. (2007). Many authors, such as Koutrouvelis (1980), Koutrouvelis and Kellermeier (1981), Fan (1997), Klar and Meintanis (2005), and Jimenez-Gamero et al. (2009), have proposed goodness-of-fit test statistics based on measuring differences between the empirical characteristic function and the characteristic function under the null hypothesis. Conversely, we employ such a statistic as the criterion for estimating the transformation parameter as well as the mean and the variance. Our estimation procedure can be viewed as solving estimating equations based on a U-statistic; see Lee (1990). In order to calculate the influence function, we take the approach of calculating this function in terms of an asymptotically equivalent statistic that is a sum of independent and identically distributed terms. The resulting expressions show that our procedure is more robust than the maximum likelihood procedure.

2. Estimation

Let ψ(x, λ) be a family of transformations indexed by the transformation parameter λ; see, for instance, Box and Cox (1964), John and Draper (1980), and Yeo and Johnson (2000). Generally, a main goal of transforming data is to enhance the normality of the data. According to the definition of relative skewness by van Zwet (1964), a convex transformation reduces left-skewness and a concave transformation reduces right-skewness. The Box-Cox transformation and the Yeo-Johnson transformation are either convex or concave and can be applied to skewed data to improve symmetry. By contrast, the modulus transformation of John and Draper (1980) is convex-concave and can be applied to symmetric data to reduce kurtosis.

Let X_1, ..., X_n be independent and identically distributed random variables with distribution function F(.). It is assumed that there exists a transformation parameter λ for which ψ(X, λ) is normally distributed with mean µ and variance σ². This assumption can be extended to any symmetric distribution with location and scale parameters µ and σ. Let ϕ(t) be the characteristic function of the standard normal distribution, that is, ϕ(t) = e^{-t²/2}, and let ϕ_n(θ, t) be the empirical characteristic function of the standardized transformed variables Z_j(θ) = {ψ(X_j, λ) - µ}/σ, j = 1, ..., n, where θ = (θ_1, θ_2, θ_3)ᵀ = (λ, µ, σ²)ᵀ. Then

ϕ_n(θ, t) = n^{-1} Σ_{j=1}^n exp(itZ_j(θ)) = ϕ_cn(θ, t) + i ϕ_sn(θ, t),

where ϕ_cn(θ, t) = n^{-1} Σ_{j=1}^n cos(tZ_j(θ)) and ϕ_sn(θ, t) = n^{-1} Σ_{j=1}^n sin(tZ_j(θ)). Here, the parameter space Θ is assumed to be a compact set of the form

Θ = {θ : a_i ≤ θ_i ≤ b_i, 0 < a_3, |a_i|, |b_i| < ∞, for i = 1, 2, 3}.   (2.1)
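As an illustration of the quantities above, the following minimal Python sketch computes ϕ_n(θ, t) for the Yeo-Johnson family. It is an illustrative rendering, not code from the paper, and it parameterizes θ by σ rather than σ² for convenience.

import numpy as np

def yeo_johnson(x, lam):
    # Yeo-Johnson (2000) transformation psi(x, lambda), applied elementwise.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-8:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-8:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

def ecf(theta, t, x):
    # Empirical characteristic function phi_n(theta, t) of the standardized
    # transformed data Z_j(theta) = (psi(X_j, lambda) - mu) / sigma.
    lam, mu, sigma = theta
    z = (yeo_johnson(x, lam) - mu) / sigma
    tz = np.outer(np.atleast_1d(t), z)      # shape (len(t), n)
    return np.cos(tz).mean(axis=1) + 1j * np.sin(tz).mean(axis=1)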

Goodness-of-fit test statistics based on measuring differences between the empirical characteristic function and the characteristic function under the null hypothesis have been investigated extensively in the literature. For example, Jimenez-Gamero et al. (2009) consider the statistic

T_{n,G}(ˆθ) = n ∫ |ϕ_n(t) - ϕ(t; ˆθ)|² dG(t),

where ϕ_n(t) is the empirical characteristic function, ϕ(t; θ) is the characteristic function of the target distribution, ˆθ is a consistent estimator, and G is a distribution function.

In this paper, we propose to estimate θ by minimizing an integrated weighted version of the distance between the empirical characteristic function of Z(θ) and e^{-t²/2},

φ_n(θ) = ||ϕ_n(θ) - ϕ||²_w = ∫ {ϕ_n(θ, t) - e^{-t²/2}} conj{ϕ_n(θ, t) - e^{-t²/2}} w(t) dt,

where conj denotes the complex conjugate and w(t) is a non-negative real-valued weight function. Therefore,

φ_n(θ) = ∫ ϕ_n(θ, t) conj{ϕ_n(θ, t)} w(t) dt - ∫ {ϕ_n(θ, t) + conj{ϕ_n(θ, t)}} e^{-t²/2} w(t) dt + ∫ e^{-t²} w(t) dt.

Here

∫ ϕ_n(θ, t) conj{ϕ_n(θ, t)} w(t) dt = 1/n + (2/n²) Σ_{j<k} ∫ cos(t{Z_j(θ) - Z_k(θ)}) w(t) dt,

∫ {ϕ_n(θ, t) + conj{ϕ_n(θ, t)}} e^{-t²/2} w(t) dt = (2/n) Σ_{j=1}^n ∫ cos(tZ_j(θ)) e^{-t²/2} w(t) dt.

Let ϕ(θ, t) = E[exp(itZ(θ))] denote the characteristic function of the standardized transformed variable Z(θ) and

φ(θ) = ||ϕ(θ) - ϕ||²_w = ∫ {ϕ(θ, t) - e^{-t²/2}} conj{ϕ(θ, t) - e^{-t²/2}} w(t) dt.

The distribution of Z(θ) is the standard normal distribution if and only if φ(θ) is zero. Hence, a reasonable approach to estimation is to select the value ˆθ = (ˆλ, ˆµ, ˆσ²)ᵀ which minimizes φ_n(θ).

In order to compute φ_n(θ), we have to specify the weight function. If the normal density with mean 0 and variance δ² is employed as the weight function, then ∫ cos(tz) w(t) dt equals the characteristic function of the N(0, δ²) distribution evaluated at z, so that

∫ cos(tz) w(t) dt = exp{-δ²z²/2} and ∫ cos(tz) e^{-t²/2} w(t) dt = (1 + δ²)^{-1/2} exp{-δ²z²/(2(1 + δ²))}.

Therefore, estimates are obtained by minimizing

φ_n(θ) = 1/n + (2/n²) Σ_{j<k} exp{-δ²(Z_j(θ) - Z_k(θ))²/2} - (2/n)(1 + δ²)^{-1/2} Σ_{j=1}^n exp{-δ²Z_j(θ)²/(2(1 + δ²))} + (1 + 2δ²)^{-1/2}.

Here, following Epps and Pulley (1983), δ must be a small value: the behavior of a characteristic function in a neighborhood of zero is what matters most, and they recommend that w(t) assign high weight in some interval around the origin.

As in Szekely et al. (2007), an alternative weight function is w(t) = |t|^{-2} on some interval containing zero, where we define the integrals as principal values: the integral over (0, ∞) is the limit, as ε → 0, of the integral over (ε, ε^{-1}). However, when we later consider properties of the estimators, including the influence function, the upper and lower tails of this weight function are too heavy, and we find it necessary to impose a moment condition on w(t). In this paper, we therefore consider the truncated weight function w(t) = |t|^{-2} if t ∈ (-δ, δ), for a finite δ, and 0 otherwise. The estimation procedure with this weight function involves some difficult numerical integrations, and the proof of the asymptotic results is somewhat cumbersome; however, a simulation study shows that this weight function gives better results. The asymptotic results below are derived with this weight function.
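Because the normal-weight criterion is available in the closed form just displayed, the estimation step is easy to sketch numerically. The following Python fragment (continuing the previous sketch; the value δ = 0.5, the starting values, the bounds, and the optimizer are arbitrary illustrative choices) estimates (λ, µ, σ). With the truncated |t|^{-2} weight, one would instead approximate the integral defining φ_n(θ) by quadrature over (-δ, δ).

from scipy.optimize import minimize

def phi_n(theta, x, delta=0.5):
    # phi_n(theta) in closed form for the N(0, delta^2) weight function:
    # pairwise term - 2 * single-sum term / sqrt(1 + delta^2) + constant.
    lam, mu, sigma = theta
    z = (yeo_johnson(x, lam) - mu) / sigma
    d2 = delta ** 2
    pair = np.exp(-d2 * (z[:, None] - z[None, :]) ** 2 / 2.0).mean()
    single = np.exp(-d2 * z ** 2 / (2.0 * (1.0 + d2))).mean()
    return pair - 2.0 * single / np.sqrt(1.0 + d2) + 1.0 / np.sqrt(1.0 + 2.0 * d2)

rng = np.random.default_rng(0)
x = np.exp(rng.normal(size=100))            # a right-skewed sample
z0 = yeo_johnson(x, 1.0)
res = minimize(phi_n, x0=np.array([1.0, z0.mean(), z0.std()]), args=(x,),
               method="L-BFGS-B",
               bounds=[(-2.0, 4.0), (-20.0, 20.0), (0.05, 20.0)])
lam_hat, mu_hat, sigma_hat = res.x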

3. Asymptotic Theory

Before stating our results, we introduce some notation. For any function f(θ),

∇f(θ*) = ∂f(θ)/∂θ |_{θ=θ*}   and   ∇²f(θ*) = ∂²f(θ)/∂θ∂θᵀ |_{θ=θ*}

are the gradient and the Hessian of f evaluated at θ*. We also write, for j, k = 1, 2, 3,

∇_j f(θ*) = ∂f(θ)/∂θ_j |_{θ=θ*}   and   ∇_{jk} f(θ*) = ∂²f(θ)/∂θ_j ∂θ_k |_{θ=θ*},

and ψ_k(x, λ) = ∂^k ψ(x, λ)/∂λ^k, with ψ_0(x, λ) = ψ(x, λ).

Theorem 1. Let the parameter space Θ be given by (2.1) and let w(t) = |t|^{-2} if t ∈ (-δ, δ) and 0 otherwise. Assume that, for k = 0, 1, 2, ψ_k(x, λ) is continuous in (x, λ) and that there exists a function h_k(x) that satisfies |ψ_k(x, λ)| ≤ h_k(x) on Θ and E[h_k(X)²] < ∞. Suppose φ(θ) has a unique global minimum at θ_0 = (λ_0, µ_0, σ_0²)ᵀ, where θ_0 is an interior point of Θ. Then,

(1) φ_n(θ) converges to φ(θ) almost surely, uniformly in θ ∈ Θ, and φ(θ) is continuous in θ; that is,

lim_{n→∞} sup_{θ∈Θ} |φ_n(θ) - φ(θ)| = 0 with probability one.

(2) ˆθ is a strongly consistent estimator of θ_0.

(3) n^{1/2} ∇φ_n(θ_0) is asymptotically distributed as N(0, Σ(θ_0)), where Σ(θ_0) is specified in the proof.

(4) n^{1/2}(ˆθ - θ_0) is asymptotically distributed as N(0, V(θ_0) Σ(θ_0) V(θ_0)ᵀ), where V(θ_0) = (∇²φ(θ_0))^{-1}.

Proof: (1) Note that

φ_n(θ) = ∫ {(ϕ_cn(θ, t) - e^{-t²/2})² + ϕ_sn(θ, t)²} w(t) dt.

To verify the finiteness of φ_n(θ), we first bound the integrand by adding and subtracting 1.

Since |(cos(tz) - 1)/t| ≤ |z| and |sin(tz)/t| ≤ |z|,

φ_n(θ) ≤ ∫ {2(ϕ_cn(θ, t) - 1)² + 2(1 - e^{-t²/2})² + ϕ_sn(θ, t)²} w(t) dt ≤ (6δ/n) Σ_{j=1}^n Z_j(θ)² + 2c,

where c = ∫ (1 - e^{-t²/2})² w(t) dt is a finite number. Thus it is clear that φ_n(θ) is finite.

Rather than work with the V-statistics in brackets, we take a U-statistic approach. Let C(t, z, θ) = cos(tz(θ)) - e^{-t²/2} and S(t, z, θ) = sin(tz(θ)). We define

η(z_1, z_2; θ) = ∫ {C(t, z_1, θ) C(t, z_2, θ) + S(t, z_1, θ) S(t, z_2, θ)} w(t) dt   (3.1)

and then have

φ_n(θ) = n^{-2} Σ_{j=1}^n Σ_{k=1}^n η(Z_j, Z_k; θ) = ((n-1)/n) U_n(θ) + n^{-2} Σ_{j=1}^n η(Z_j, Z_j; θ),   (3.2)

where U_n(θ) = (n(n-1)/2)^{-1} Σ_{j<k} η(Z_j, Z_k; θ). Since η(z_1, z_2; θ) is bounded and continuous in (z_1, z_2, θ) ∈ Ω_M = S_M × S_M × Θ, where S_M = [-M, M], the uniform strong law of large numbers for U-statistics in Yeo and Johnson (2001) ensures that

U_n(θ) → E[∫ {C(t, Z_1, θ) C(t, Z_2, θ) + S(t, Z_1, θ) S(t, Z_2, θ)} w(t) dt] ≡ η̄(θ)

almost surely, uniformly in θ ∈ Θ, and η̄(θ) is continuous in θ ∈ Θ. Note that, since Z_1 and Z_2 are independent and identically distributed,

η̄(θ) = ∫ {E[C(t, Z_1, θ)]² + E[S(t, Z_1, θ)]²} w(t) dt = φ(θ).   (3.3)

Furthermore, by the uniform strong law of large numbers in Rubin (1956),

n^{-1} Σ_{j=1}^n η(Z_j, Z_j; θ) → E[η(Z_1, Z_1; θ)]   (3.4)

almost surely, uniformly in θ ∈ Θ, and the limit function in (3.4) is continuous in θ ∈ Θ. We finally conclude that φ_n(θ) converges to φ(θ) almost surely, uniformly in θ ∈ Θ, and that the limit is continuous in θ ∈ Θ.

(2) Since φ_n(θ) converges to φ(θ) almost surely and uniformly in θ, φ(θ) is continuous in θ, and, by assumption, θ_0 is the unique minimizer of φ(θ), the lemma in Yeo and Johnson (2001) allows us to conclude that ˆθ → θ_0 almost surely.

(3) Since ψ(x, λ) and ψ_1(x, λ) are bounded by integrable functions and Θ is compact, each entry of ∇z_1(θ) is bounded and dominated by an integrable function. We verify that ∇η(z_1, z_2; θ) can be obtained by differentiating under the integral sign in (3.1). The result is

∇η(z_1, z_2; θ) = ∫ {T_1(t, z_1, z_2, θ) + T_2(t, z_1, z_2, θ)} t w(t) dt + ∫ {T_1(t, z_2, z_1, θ) + T_2(t, z_2, z_1, θ)} t w(t) dt,   (3.5)

where

T_1(t, z_1, z_2, θ) = -sin(tz_1(θ)) {cos(tz_2(θ)) - e^{-t²/2}} ∇z_1(θ),
T_2(t, z_1, z_2, θ) = cos(tz_1(θ)) sin(tz_2(θ)) ∇z_1(θ).

Using |sin(tz)/t| ≤ |z| again, it is easy to show that the entries of ∇φ_n(θ) are finite. From (3.2), we see that

∇φ_n(θ) = ((n-1)/n) ∇U_n(θ) + n^{-2} Σ_{j=1}^n ∇η(Z_j, Z_j; θ),   (3.6)

where ∇U_n(θ) = (n(n-1)/2)^{-1} Σ_{j<k} ∇η(Z_j, Z_k; θ). Again, by the uniform strong law of large numbers, the second term in (3.6) can be neglected. Note that η(z_1, z_2; θ) is a symmetric kernel, so ∇U_n(θ) is also a U-statistic. Thus, the multivariate central limit theorems for random samples and for U-statistics ensure the asymptotic normality of ∇φ_n(θ_0), with mean vector ∇φ(θ_0) = 0 and covariance matrix W_n(θ_0) whose (j, k)-th element is

W_n^{(j,k)}(θ_0) = ((n-1)/n)² (n(n-1)/2)^{-1} {2(n-2) E[∇_j η(Z_1, Z_2; θ_0) ∇_k η(Z_1, Z_3; θ_0)] + E[∇_j η(Z_1, Z_2; θ_0) ∇_k η(Z_1, Z_2; θ_0)]}.

Therefore, n^{1/2} ∇φ_n(θ_0) is asymptotically normally distributed as N(0, Σ(θ_0)), where the (j, k)-th element of Σ(θ_0) is

Σ^{(j,k)}(θ_0) = 4 E[∇_j η(Z_1, Z_2; θ_0) ∇_k η(Z_1, Z_3; θ_0)].

(4) Expanding n^{1/2} ∇φ_n(ˆθ) about θ_0, we obtain

n^{1/2} ∇φ_n(ˆθ) = n^{1/2} ∇φ_n(θ_0) + ∇²φ_n(θ̄) n^{1/2}(ˆθ - θ_0),

where θ̄ = α_n ˆθ + (1 - α_n) θ_0 for some α_n ∈ [0, 1]. Since ∇φ_n(ˆθ) = 0 at the minimum when ˆθ lies in the interior of Θ, n^{1/2} ∇φ_n(θ_0) + ∇²φ_n(θ̄) n^{1/2}(ˆθ - θ_0) converges in probability to 0. From (3.5), differentiating under the integral sign once more, we have

∇²η(z_1, z_2; θ) = ∫ {S_1(t, z_1, z_2, θ) + t C_1(t, z_1, z_2, θ)} e^{-t²/2} t w(t) dt + ∫ {S_2(t, z_1, z_2, θ) - t C_2(t, z_1, z_2, θ)} t w(t) dt,   (3.7)

where

S_1(t, z_1, z_2, θ) = sin(tz_1(θ)) ∇²z_1(θ) + sin(tz_2(θ)) ∇²z_2(θ),
C_1(t, z_1, z_2, θ) = cos(tz_1(θ)) ∇z_1(θ) ∇z_1(θ)ᵀ + cos(tz_2(θ)) ∇z_2(θ) ∇z_2(θ)ᵀ,
S_2(t, z_1, z_2, θ) = sin(t(z_1(θ) - z_2(θ))) ∇²(z_2(θ) - z_1(θ)),
C_2(t, z_1, z_2, θ) = cos(t(z_1(θ) - z_2(θ))) ∇(z_2(θ) - z_1(θ)) ∇(z_2(θ) - z_1(θ))ᵀ,

and ∇²φ_n can be written as

∇²φ_n(θ) = ((n-1)/n) ∇²U_n(θ) + n^{-2} Σ_{j=1}^n ∇²η(Z_j, Z_j; θ),   (3.8)

where ∇²U_n(θ) = (n(n-1)/2)^{-1} Σ_{j<k} ∇²η(Z_j, Z_k; θ). Note that, since ψ(x, λ), ψ_1(x, λ), and ψ_2(x, λ) are bounded by integrable functions and Θ is compact, each entry of ∇²z_1(θ) and of ∇z_1(θ) ∇z_2(θ)ᵀ is bounded. Hence, by the uniform strong law of large numbers, the second term in (3.8) can be neglected. Applying the uniform strong law of large numbers for U-statistics to ∇²U_n(θ), we conclude that ∇²φ_n(θ) converges almost surely to ∇²φ(θ), uniformly in θ ∈ Θ, where ∇²φ(θ) is continuous in θ. Hence, using the uniform convergence of ∇²φ_n and the continuity of ∇²φ together with the almost sure convergence of ˆθ to θ_0, it is easy to show that

∇²φ_n(θ̄) → ∇²φ(θ_0) almost surely.   (3.9)

Slutsky's theorem, along with the asymptotic normality of n^{1/2} ∇φ_n(θ_0) and (3.9), ensures that n^{1/2}(ˆθ - θ_0) is asymptotically distributed as N(0, V(θ_0) Σ(θ_0) V(θ_0)ᵀ), where V(θ_0) = (∇²φ(θ_0))^{-1}.

Remark 1. Note that, for a_1 ≤ λ ≤ b_1, the Box-Cox transformation, the Yeo-Johnson transformation, and the modulus transformation satisfy the following inequalities:

|ψ(x, λ)| ≤ |ψ(x, a_1)| + |ψ(x, b_1)| = h(x),
|ψ_1(x, λ)| ≤ |ψ_1(x, a_1)| + |ψ_1(x, b_1)| = h_1(x),
|ψ_2(x, λ)| ≤ |ψ_2(x, a_1)| + |ψ_2(x, b_1)| = h_2(x).

It can also be shown that ψ(x, λ), ψ_1(x, λ), and ψ_2(x, λ) are continuous in (x, λ).
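The first domination inequality in Remark 1 is easy to check numerically. The sketch below does so on a grid for the Yeo-Johnson family, reusing yeo_johnson from the earlier sketch; the endpoints a_1 = -2 and b_1 = 4 are arbitrary choices for illustration, not values from the paper.

# Grid check of |psi(x, lam)| <= |psi(x, a1)| + |psi(x, b1)| for a1 <= lam <= b1.
a1, b1 = -2.0, 4.0                      # illustrative interval endpoints
xs = np.linspace(-5.0, 5.0, 201)
bound = np.abs(yeo_johnson(xs, a1)) + np.abs(yeo_johnson(xs, b1))
worst = max(np.max(np.abs(yeo_johnson(xs, lam)) - bound)
            for lam in np.linspace(a1, b1, 61))
print(worst <= 0.0)                     # True: the domination holds on the grid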

4. Influence Function

The influence function is a basic tool to measure the robustness of estimators. Whereas the influence function of M-estimators based on independent random variables has been extensively studied in the literature, there is no literature that addresses the influence function for U-statistics. Hence we adopt the following approach. Instead of working with the statistic U_n(θ_0) in (3.2), we use the asymptotically equivalent statistic n^{-1} Σ_{j=1}^n η̃(Z_j; θ_0), where

η̃(z_1; θ_0) = E[η(z_1, Z_2; θ_0)] = ∫ C(t, z_1, θ_0) E[C(t, Z_2, θ_0)] w(t) dt + ∫ S(t, z_1, θ_0) E[S(t, Z_2, θ_0)] w(t) dt.   (4.1)

Since the η̃(Z_j; θ_0) are independent and identically distributed, we can use the usual M-equation theory of robustness based on the estimating equations 0 = n^{-1} Σ_{j=1}^n ∇η̃(Z_j; θ_0), where the derivative is

∇η̃(z_1; θ_0) = E[∇η(z_1, Z_2; θ_0)] = ∫ {E[T_1(t, z_1, Z_2, θ_0)] + E[T_2(t, z_1, Z_2, θ_0)]} t w(t) dt + ∫ {E[T_1(t, Z_2, z_1, θ_0)] + E[T_2(t, Z_2, z_1, θ_0)]} t w(t) dt.   (4.2)

Assume that the conditions of Theorem 1 hold. Then the influence function is given by

IF(x_0, F) = -M^{-1} ∇η̃(z_0; θ_0),   (4.3)

where z_0 denotes the standardized transformed value of x_0 and, because Z_1 and Z_2 are independent and identically distributed,

M = E[∇²η̃(Z_1; θ_0)]
  = -2 ∫ E[cos(tZ_1(θ_0))] E[cos(tZ_1(θ_0)) ∇Z_1(θ_0) ∇Z_1(θ_0)ᵀ] t² w(t) dt
    - 2 ∫ E[sin(tZ_1(θ_0))] E[sin(tZ_1(θ_0)) ∇Z_1(θ_0) ∇Z_1(θ_0)ᵀ] t² w(t) dt
    + 2 ∫ E[cos(tZ_1(θ_0)) ∇Z_1(θ_0)] E[cos(tZ_1(θ_0)) ∇Z_1(θ_0)]ᵀ t² w(t) dt
    + 2 ∫ E[sin(tZ_1(θ_0)) ∇Z_1(θ_0)] E[sin(tZ_1(θ_0)) ∇Z_1(θ_0)]ᵀ t² w(t) dt
    + 2 ∫ {E[cos(tZ_1(θ_0)) ∇²Z_1(θ_0)] E[sin(tZ_1(θ_0))] - E[sin(tZ_1(θ_0)) ∇²Z_1(θ_0)] E[cos(tZ_1(θ_0))]} t w(t) dt
    + ∫ {E[S_1(t, Z_1, Z_1, θ_0)] + t E[C_1(t, Z_1, Z_1, θ_0)]} e^{-t²/2} t w(t) dt.

The influence function (4.3) is the standard result for an M-estimator in an independent and identically distributed setting. Under our assumptions, all integrals and expectations in (4.1), (4.2), and (4.3) are finite.

For the Box-Cox transformation, the Yeo-Johnson transformation, and the modulus transformation, λ_0 = 1 implies that no transformation is necessary to improve normality. For λ_0 = 1, the influence function of the maximum likelihood estimator under normality is proportional to x² log(x), for x > 0, when the Box-Cox transformation is applied. Under our characteristic function approach, the influence function of the proposed estimator is proportional to x log(x). This calculation shows that the proposed estimator of λ is still sensitive to an outlier, but less sensitive than the normal-theory maximum likelihood estimate. It can also easily be shown that the influence of an outlier on the estimate of µ is bounded. In the case µ = 0 and σ² = 1, Carroll (1980) proposed an estimator based on robustness arguments that also has influence function proportional to x log(x) when the estimation of λ alone is considered.
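A finite-sample analogue of the influence function is the empirical sensitivity curve: append a single outlier x_0 to a clean sample and scale the resulting change in ˆλ by the sample size. The Python sketch below (continuing the earlier sketches, with the normal weight so that the criterion stays in closed form) computes it for the proposed estimator; the sample and the contamination points are illustrative choices. Per the calculation above, the corresponding curve for the Box-Cox MLE grows at the faster x² log(x) rate.

def lam_hat(data):
    # Transformation-parameter estimate minimizing the closed-form criterion.
    z0 = yeo_johnson(data, 1.0)
    res = minimize(phi_n, x0=np.array([1.0, z0.mean(), z0.std()]), args=(data,),
                   method="L-BFGS-B",
                   bounds=[(-2.0, 4.0), (-30.0, 30.0), (0.05, 30.0)])
    return res.x[0]

rng = np.random.default_rng(1)
clean = rng.normal(5.0, 1.0, size=100)     # lambda_0 = 1: no transformation needed
base = lam_hat(clean)
for x0 in (8.0, 12.0, 20.0):               # increasingly extreme outliers
    sens = (clean.size + 1) * (lam_hat(np.append(clean, x0)) - base)
    print(x0, round(sens, 3))              # sensitivity at contamination point x0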

5. Comparison with MLE

In this section, we present a small simulation that supports the influence function calculation by showing that our empirical characteristic function approach provides an estimator of λ that is more robust than the maximum likelihood estimator. Moreover, our proposed estimator appears to be slightly better than Carroll's robust estimator under the mixed alternatives considered below. Our proposed method to estimate λ is compared with the maximum likelihood method and Carroll's method.

When the Box-Cox transformation is employed, the maximum likelihood method estimates λ by maximizing the profile log-likelihood function

l(λ; x) = -(n/2) log(ˆσ²(λ)) + (λ - 1) Σ_{i=1}^n log(x_i),

where ˆσ²(λ) = n^{-1} Σ_{j=1}^n (ψ(x_j, λ) - ˆµ(λ))² and ˆµ(λ) = n^{-1} Σ_{j=1}^n ψ(x_j, λ).

Carroll follows Huber (1964) with the choice of loss function

ρ(x) = x²/2 for |x| ≤ k, and ρ(x) = k(|x| - k/2) for |x| > k.

He expresses the likelihood in terms of a density with a normal center and exponential tails, as follows:

L(θ; x) = σ^{-n} Π_{i=1}^n exp{-ρ((ψ(x_i, λ) - µ)/σ)} Π_{i=1}^n x_i^{λ-1}.

Thus the log-likelihood is

l(θ; x) = -n log(σ) - Σ_{i=1}^n ρ((ψ(x_i, λ) - µ)/σ) + (λ - 1) Σ_{i=1}^n log(x_i).

The estimation of λ consists of the following steps. For a fixed λ, take a starting value of σ and then estimate µ by solving

Σ_{i=1}^n τ((ψ(x_i, λ) - µ)/σ) = 0,

where τ is the derivative of ρ. The iterative procedure then estimates σ by solving

(n - 1)^{-1} Σ_{i=1}^n τ((ψ(x_i, λ) - µ)/σ)² = E_Φ[τ(Z)²],

where the last expectation is taken with respect to a standard normal variable Z. To find the value of λ, maximize the likelihood in λ:

ˆλ = argmax_λ L(ˆµ(λ), ˆσ(λ), λ).

As in Carroll (1980), we take k = 2 in the definition of ρ(x).
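For reference, the Box-Cox profile log-likelihood just displayed can be maximized over a grid in a few lines; the Python sketch below is one such rendering, and scipy.stats.boxcox performs the equivalent maximization internally. Carroll's estimator would replace the mean and variance steps with the two Huber-type estimating equations above.

def boxcox(x, lam):
    # Box-Cox transformation psi(x, lambda), defined for x > 0.
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam

def profile_loglik(lam, x):
    # l(lambda; x) = -(n/2) log(sigma_hat^2(lambda)) + (lambda - 1) sum(log x_i).
    y = boxcox(x, lam)
    return -0.5 * x.size * np.log(y.var()) + (lam - 1.0) * np.log(x).sum()

rng = np.random.default_rng(2)
x_pos = np.exp(rng.normal(size=100))        # lognormal sample; true lambda is 0
grid = np.linspace(-1.0, 2.0, 301)
lam_mle = grid[np.argmax([profile_loglik(lam, x_pos) for lam in grid])]
# scipy.stats.boxcox(x_pos)[1] recovers the same MLE by numerical optimization.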

Table 1: Simulation results for the estimation of λ with sample size n = 100. The weight function for our method is a truncated version of 1/t². The mean, MSE and MAE are calculated from 1000 runs.

Distribution   λ      Summary   Proposed   MLE      Carroll
(1)            0      Bias      0.0017     0.0009   0.0017
                      SD        0.0014     0.001    0.001
                      MSE       0.0018     0.0015   0.0016
                      MAE       0.0343     0.0303   0.0305
(2)            0      Bias      0.0006     0.0005   0.0007
                      SD        0.0015     0.0016   0.0016
                      MSE       0.001      0.006    0.006
                      MAE       0.0366     0.040    0.0401
(3)            0.5    Bias      0.0173     0.0098   0.0459
                      SD        0.0079     0.006    0.0064
                      MSE       0.060      0.0385   0.047
                      MAE       0.198      0.1558   0.161
(4)            0.5    Bias      0.165      0.159    0.391
                      SD        0.008      0.0074   0.0074
                      MSE       0.0838     0.1008   0.11
                      MAE       0.301      0.579    0.753
(5)            1      Bias      0.0398     0.0578   0.1164
                      SD        0.0131     0.0115   0.005
                      MSE       0.171      0.1348   0.1655
                      MAE       0.3444     0.961    0.331
(6)            1      Bias      0.649      0.4374   0.4838
                      SD        0.015      0.0144   0.0147
                      MSE       0.3001     0.3999   0.4495
                      MAE       0.4410     0.5107   0.5468

To evaluate the performance of the three methods of estimating λ, we generate pseudo-random numbers from six different distributions. We write N(µ, 1) for a normal random variable having mean µ and variance 1; for instance, the log-normal random variable whose logarithm is N(0, 1) is written as exp(N(0, 1)).

(1) log-normal distribution: exp(N(0, 1)).
(2) mixture of log-normal distributions: 0.98 exp(N(0, 1)) + 0.02 exp(5 N(0, 1)).
(3) normal squared distribution: N(5, 1)².
(4) mixture of normal squared distributions: 0.98 N(5, 1)² + 0.02 N(8, 1)².
(5) normal distribution: N(5, 1).
(6) mixture of normal distributions: 0.98 N(5, 1) + 0.02 N(8, 1).

Note that for situations (2), (4) and (6), the generated data contain about 2% outliers. For practical purposes, we truncated w(t) = |t|^{-2} at |t| = 50. Initially, we also tried some weight functions that place less mass near 0. For each situation, we generate a sample of size n = 100 and obtain the estimate ˆλ from each estimation method; this procedure is repeated N = 1000 times. The data-generating step is sketched below.
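The following Python sketch draws from the six situations above (illustrative only; the paper's simulation code is not published):

def sample(case, n=100, rng=None):
    # Draw n observations from one of the six simulation distributions.
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.normal(size=n)
    outlier = rng.random(n) < 0.02          # roughly 2% contamination on average
    if case == 1:
        return np.exp(z)                                         # exp(N(0,1))
    if case == 2:
        return np.where(outlier, np.exp(5.0 * z), np.exp(z))
    if case == 3:
        return (z + 5.0) ** 2                                    # N(5,1)^2
    if case == 4:
        return np.where(outlier, (z + 8.0) ** 2, (z + 5.0) ** 2)
    if case == 5:
        return z + 5.0                                           # N(5,1)
    if case == 6:
        return np.where(outlier, z + 8.0, z + 5.0)
    raise ValueError("case must be in 1..6")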

For each estimation method, we summarize performance by calculating the mean and the standard deviation of ˆλ, the mean squared error (MSE) of ˆλ, and the mean absolute error (MAE) of ˆλ. Specifically, we calculate

MSE = N^{-1} Σ_{i=1}^N (ˆλ_i - λ*)² and MAE = N^{-1} Σ_{i=1}^N |ˆλ_i - λ*|,

where ˆλ_i is the estimate of λ from the i-th replication and λ* denotes the true value of λ.

Table 1 reports the bias and the standard error of ˆλ, the MSE, and the MAE for the three methods. Carroll (1980) also gives a simulation in which his estimator and the Box-Cox estimator have nearly the same behavior under a log-normal distribution. Based on these summary statistics, the proposed method shows worse performance when the underlying distribution is not a mixture. However, in all three cases in which the underlying distribution is a mixture, our proposed method based on the empirical characteristic function has smaller bias, MSE, and MAE than the Box-Cox estimator and Carroll's estimator. This implies that the proposed method is less sensitive to outliers.

References

Andrews, D. F. (1971). A note on the selection of data transformations, Biometrika, 58, 249-254.
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations, Journal of the Royal Statistical Society B, 26, 211-252.
Carroll, R. J. (1980). A robust method for testing transformations to achieve approximate normality, Journal of the Royal Statistical Society B, 42, 71-78.
Cook, R. D. and Wang, P. C. (1983). Transformations and influential cases in regression, Technometrics, 25, 337-345.
Epps, T. W. and Pulley, L. B. (1983). A test for normality based on the empirical characteristic function, Biometrika, 70, 723-726.
Fan, Y. (1997). Goodness-of-fit tests for a multivariate distribution by the empirical characteristic function, Journal of Multivariate Analysis, 62, 36-63.
Hinkley, D. V. (1975). On power transformations to symmetry, Biometrika, 62, 101-111.
Hinkley, D. V. and Wang, S. (1988). More about transformations and influential cases in regression, Technometrics, 30, 435-440.
Huber, P. J. (1964). Robust estimation of a location parameter, Annals of Mathematical Statistics, 35, 73-101.
Jimenez-Gamero, M. D., Alba-Fernandez, V., Munoz-Garcia, J. and Chalco-Cano, Y. (2009). Goodness-of-fit tests based on empirical characteristic functions, Computational Statistics and Data Analysis, 53, 3957-3971.
John, J. A. and Draper, N. R. (1980). An alternative family of transformations, Applied Statistics, 29, 190-197.
Kim, C., Storer, B. E. and Jeong, M. (1996). A note on Box-Cox transformation diagnostics, Technometrics, 38, 178-180.
Klar, B. and Meintanis, S. G. (2005). Tests for normal mixtures based on the empirical characteristic function, Computational Statistics and Data Analysis, 49, 227-242.
Koutrouvelis, I. A. (1980). A goodness-of-fit test of simple hypotheses based on the empirical characteristic function, Biometrika, 67, 238-240.
Koutrouvelis, I. A. and Kellermeier, J. (1981). A goodness-of-fit test based on the empirical characteristic function when parameters must be estimated, Journal of the Royal Statistical Society B, 43, 173-176.
Lee, A. J. (1990). U-Statistics: Theory and Practice, Marcel Dekker, New York.

Rubin, H. (1956). Uniform convergence of random functions with applications to statistics, Annals of Mathematical Statistics, 27, 200-203.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances, Annals of Statistics, 35, 2769-2794.
Taylor, J. M. G. (1985). Power transformations to symmetry, Biometrika, 72, 145-152.
Tsai, C. L. and Wu, X. (1990). Diagnostics in transformation and weighted regression, Technometrics, 32, 315-322.
van Zwet, W. R. (1964). Convex Transformations of Random Variables, Mathematisch Centrum, Amsterdam.
Yeo, I. K. (2001). Selecting a transformation to reduce skewness, Journal of the Korean Statistical Society, 30, 563-571.
Yeo, I. K. and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry, Biometrika, 87, 954-959.
Yeo, I. K. and Johnson, R. A. (2001). A uniform strong law of large numbers for U-statistics with application to transforming to near symmetry, Statistics and Probability Letters, 51, 63-69.

Received November 9, 2013; Revised March 7, 2014; Accepted April 9, 2014.