Contributions to the Theory of Robust Inference

Contributions to the Theory of Robust Inference

by

Matías Salibián-Barrera
Licenciado en Matemáticas, Universidad de Buenos Aires, Argentina, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Statistics)

We accept this thesis as conforming to the required standard

The University of British Columbia
July 2000
© Matías Salibián-Barrera, 2000

Abstract

We study the problem of performing statistical inference based on robust estimates when the distribution of the data is only assumed to belong to a contamination neighbourhood of a known central distribution. We start by determining the asymptotic properties of some robust estimates when the data are not generated by the central distribution of the contamination neighbourhood. Under certain regularity conditions the considered estimates are consistent and asymptotically normal. For the location model, and with additional regularity conditions, we show that the convergence is uniform on the contamination neighbourhood. We determine that a class of robust estimates satisfies these requirements for certain proportions of contamination, and that there is a trade-off between the robustness of the estimates and the extent to which the uniformity of their asymptotic properties holds. When the distribution of the data is not the central distribution of the neighbourhood, the asymptotic variance of these estimates is involved and difficult to estimate. This problem affects the performance of inference methods based on the empirical estimates of the asymptotic variance. We present a new re-sampling method based on Efron's bootstrap (Efron, 1979) to estimate the sampling distribution of MM-location and regression estimates.

This method overcomes the main drawbacks of the use of the bootstrap with robust estimates on large and potentially contaminated data sets. We show that our proposal is computationally simple and that it provides stable estimates when the data contain outliers. This new method extends naturally to the linear regression model.

Contents

Abstract  ii
Contents  iv
List of Tables  vii
List of Figures  ix
Acknowledgements  xii
Dedication  xiv

1 Introduction  1
  1.1 Robust estimates and data screening  3
  1.2 The variability caused by cleaning the data  7
  1.3 Inference based on robust estimates  12
  1.4 Bootstrapping robust estimates  15
  1.5 A new computer intensive method  17
  1.6 Thesis outline  25

2 Global asymptotic properties of robust estimates for the location-scale model  27
  2.1 Definitions  29
  2.2 Robustness properties  36
  2.3 Asymptotic properties  40
    2.3.1 Uniform consistency of the S-scale estimate  41
    2.3.2 Consistency of the S-location estimate  43
    2.3.3 Uniform consistency of the S-location estimate  51
    2.3.4 Consistency of the MM-location estimate  61
    2.3.5 Uniform consistency of the MM-location estimate  64
    2.3.6 Asymptotic distribution of the MM-location estimate  68
    2.3.7 Uniform asymptotic distribution for MM-location estimates  74

3 Robust bootstrap for the location-scale model  84
  3.1 Definitions  91
  3.2 Examples  95
    3.2.1 One sample location-scale: Blood pressure  95
    3.2.2 Two-sample location-scale: Seeded clouds  96
  3.3 Asymptotic properties  101
  3.4 Robustness properties  111
  3.5 Studentizing the robust bootstrap  115
  3.6 Inference  119
    3.6.1 Asymptotic variance estimation  121
    3.6.2 Coverage and lengths of confidence intervals  125

4 Global asymptotic properties of robust estimates for the linear regression model  138
  4.1 Definitions  139
  4.2 Robustness properties  144
  4.3 Asymptotic properties  146
    4.3.1 Consistency of the S-scale estimate  147
    4.3.2 Consistency of the S- and MM-regression estimates  161
    4.3.3 Asymptotic distribution of the MM-regression estimate  164

5 Robust bootstrap for the linear regression model  166
  5.1 Definitions  169
  5.2 Examples  172
    5.2.1 Body and Brain Weights  172
    5.2.2 Belgium International Phone Calls  174
  5.3 Asymptotic properties  180
  5.4 Robustness properties  188
  5.5 Inference  195
    5.5.1 Empirical coverage levels of confidence intervals  195

6 Conclusion  208

7 Appendix - Auxiliary results  212

Bibliography  233

List of Tables

1.1 Comparison of the estimated SDs of the linear regression estimates for the Stack Loss data  9
1.2 Comparison of actual and estimated standard deviations using the HRR method to clean the data  11
2.1 Numerical parameters for Tukey's family of functions ρ_d  60
2.2 Numerical evaluation of regularity conditions required for uniform consistency of S-location estimates with Tukey's family of functions ρ_d  60
3.1 Comparison of breakdown points of classical and robust bootstrap quantile estimates for MM-location estimators  115
3.2 Comparison of asymptotic variance estimates - quadratic measure  125
3.3 Comparison of asymptotic variance estimates - logarithmic measure  126
3.4 Coverage and length of 95% confidence intervals for the location-scale model  129
3.5 Coverage and length of 99% confidence intervals for the location-scale model  129

5.1 Belgium International Calls - Bootstrap and robust bootstrap 95% confidence intervals  177
5.2 Comparison of quantile upper breakdown points for linear regression  194
5.3 Average coverage and length of 5,000 95% confidence intervals for the linear regression model with p = 2  198
5.4 Coverage and length of 5,000 99% confidence intervals for the linear regression model with p = 2  199
5.5 Coverage and length of 5,000 95% confidence intervals for the linear regression model with p = 5  200
5.6 Coverage and length of 5,000 99% confidence intervals for the linear regression model with p = 5  202

List of Figures

1.1 Least Squares residuals for the Stack Loss data  5
1.2 Robust Regression residuals for the Stack Loss data  6
1.3 QQ-plots of the re-sampled Intercept coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data  21
1.4 QQ-plots of the re-sampled Air Flow coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data  22
1.5 QQ-plots of the re-sampled Water Temperature coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data  23
1.6 QQ-plots of the re-sampled Acid Concentration coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data  24
2.1 Plots of γ(F_ε, t, σ(F_ε)) for ε = 0.15 and 0.20  48
2.2 Plots of γ(F_ε, t, σ(F_ε)) for ε = 0.25 and 0.30  49
2.3 Plots of γ(F_ε, t, σ(F_ε)) for ε = 0.40 and 0.45  50
2.4 Plots to determine regularity conditions on ρ_d  59

3.1 Boxplots of 3,000 re-calculated MM-location estimates with the classical and robust bootstrap  88
3.2 Boxplots of 3,000 re-calculated MM-location estimates with the classical and robust bootstrap on an artificial data set with 35% outliers  90
3.3 Comparison of the classical and robust bootstrap distribution estimates for the modified blood pressure data  97
3.4 Precipitation data  98
3.5 Comparison of asymptotic variance estimates - quadratic measure  130
3.6 Comparison of asymptotic variance estimates - logarithmic measure  131
3.7 Location-scale model 95% confidence intervals for n = 20  132
3.8 Location-scale model 95% confidence intervals for n = 30  133
3.9 Location-scale model 95% confidence intervals for n = 50  134
3.10 Location-scale model 99% confidence intervals for n = 20  135
3.11 Location-scale model 99% confidence intervals for n = 30  136
3.12 Location-scale model 99% confidence intervals for n = 50  137
5.1 Least squares and robust regression fits to the Brain and Body Weight data  175
5.2 Classical and robust bootstrap distribution estimates for the Brain and Body Weight data  176
5.3 Least squares and robust regression fits to the Belgium International Phone Calls data  178
5.4 Comparison of classical and robust bootstrap distribution estimates for the Belgium International Phone Calls data  179

5.5 Average coverage of 95% confidence intervals for the linear regression model with p = 2  204
5.6 Average coverage of 99% confidence intervals for the linear regression model with p = 2  205
5.7 Average coverage of 95% confidence intervals for the linear regression model with p = 5  206
5.8 Average coverage of 99% confidence intervals for the linear regression model with p = 5  207

Acknowledgements

Many people helped me during the years I spent at the Department. No line of this thesis could have been written without my wife Verónica D'Angelo by my side and our son Lucas constantly teaching me what is really important. Financial support was provided by the University of British Columbia through a University Graduate Fellowship; by the Department of Statistics via part-time appointments as Teaching Assistant; by Dr. Ruben Zamar with appointments as Research Assistant during many summers; and by the Statistical Consulting and Research Laboratory with a part-time position. I owe much more than what I can express here to our families back home (Alfredo, Marisa and Florencia Salibián, Oscar and Noemí D'Angelo), who always believed in us and unconditionally supported us all these years; to our friend Sonia Mazzi, who made our settling in this remote land a smooth and pleasant adventure; to the Palejkos (George, Gill, Nancy, Ingrid, and Roger), who became our family in Canada; and to our good friends Daniel and Iris Brendle-Moczuk, Geneviève Gagné, Stephane Lemieux (and Eugenie!), Rémi Gilbert, Caroline Grondin (Victor and Zacharias), the Hollmans (Jorge, Sandra and Rocío), Jorge Adrover, Héctor Palacio, Luis Suero, and Carlos Di Bella.

Many people at the Department also helped in different ways: Jarek Harezlak, being a good friend and colleague; Rick White, who was always ready for a chat; Christine Graham (what would we do without her?); Dr. Nancy Heckman and Dr. John Petkau, who on many occasions spent some time with the students at the pub; and last but not least: everybody at The L. Gang. Finally, I would like to thank Dr. Ruben Zamar for his encouragement, constant availability, patience, guidance and support; Dr. Nancy Heckman and Dr. John Petkau for their helpful comments, suggestions and advice; and Dr. Jim Zidek for his advice.

Matías Salibián-Barrera
The University of British Columbia
July 2000

For Lucas and Vero

Chapter 1

Introduction

In this chapter we introduce and illustrate the problems addressed in the rest of this thesis. In the first section we use a real-life example to show how an analysis based on a robust regression estimate compares with previous analyses of these data. In those analyses the data were carefully screened, suspicious observations were deleted, and least squares methods were used on the remaining data. Unfortunately there are no proposals in the literature to consistently estimate the variability of the estimates obtained after deleting potential outliers. The second section of this chapter explores this problem. An alternative method to perform statistical inference when the data are contaminated is to use robust estimates. Most attention in the robust literature has been paid to the case of errors with symmetric distributions. Section 1.3 briefly reviews some of the published studies for asymmetric distributions.

In the same section we discuss our results regarding the asymptotic properties of some robust estimates for more general error distributions. In particular we study their consistency and asymptotic distribution. Empirical estimates of these asymptotic variances provide consistent estimates of the variability of these robust estimates. Unfortunately, simulation experiments suggest that they can be numerically unstable and hence yield poor estimates. We also consider computer-intensive inference methods, in particular Efron's bootstrap (Efron, 1979). In Section 1.4 we discuss two drawbacks of the use of Efron's bootstrap with robust estimates. Both the presence of outliers in the data and the computational complexity of robust estimates are important challenges for this method. In Section 1.5 we introduce a new computer-intensive method that overcomes these limitations and hence can be used with large data sets that might contain outliers. In this section we present the basic idea for the simple location model with known scale. Details of the application of this method to the location-scale and linear regression models are presented in Chapters 3 and 5 respectively. Finally, Section 1.6 outlines the rest of this thesis.

1.1 Robust estimates and data screening

Consider the Stack Loss data, first published by Brownlee (1965, page 454). These data have been extensively studied in the literature (see Daniel and Wood, 1980, Chapters 5 and 7; Atkinson, 1985, pp. 129-136, 267-8; and Venables and Ripley, 1997, pp. 262-264). They consist of 21 daily observations measured in a plant for the oxidation of ammonia to nitric acid. The response variable is ten times the percentage of ammonia lost. This is an indirect measure of the efficiency of the plant. There are 3 explanatory variables: air flow, temperature of the cooling water and acid concentration. The linear model used in the literature is

Ammonia Lost (%) = β_0 + β_1 Air flow + β_2 Water temperature + β_3 Acid concentration + ε,   (1.1)

where the ε are independent identically distributed normal errors. The residuals of the least squares fit of model (1.1) presented some features worth further consideration. After a very careful analysis of the listing of the data, Daniel and Wood (1971, Chapter 5, page 81) noticed a different behaviour in the response variable every time the water temperature was above 60. They concluded that the plant seemed to have needed a period of one day to stabilize after the water temperature reached 60. Hence they decided that the observations obtained with water temperature above 60 (cases 1, 3, 4 and 21) required special attention, and these cases were removed from the analysis. Figure 1.1 contains the plot of residuals versus fitted values for the least squares fit.

The dotted lines correspond to twice the estimated standard deviation of the errors in (1.1). Note that observation number 21 appears to have a residual considerably larger than the others. Three other cases are somewhat outlying, but within 2 estimated standard deviations from zero. Classical outlier detection methods, such as the externally Studentized residuals test (Weisberg, 1985, pages 115-116), only detect observation 21 as an outlier. We also estimated the coefficients of model (1.1) using an MM-regression estimate with 50% breakdown point, 95% efficiency and Tukey's loss functions (see Sections 4.1 and 4.2 for the corresponding definitions). We worked with the complete data set. The plot of residuals versus fitted values is shown in Figure 1.2. The dotted lines correspond to twice the estimated standard deviation of the errors. With this robust estimate, cases 1, 3, 4 and 21 are clearly identified as outliers. This example illustrates the potential of robust estimates. Daniel and Wood (1980) had to rely on a careful analysis of the listing of the data until some pattern seemed apparent. The additional complications and limitations of this method when the data have either more explanatory variables or more cases are obvious. In this example the analysis based on a robust regression estimate yields the same conclusion as Daniel and Wood (1980, Chapter 5), namely that observations 1, 3, 4 and 21 seem to follow a different model from the rest of the data. Note that we did not require a detailed case-by-case analysis as Daniel and Wood (1980, Chapter 5) did. From the discussion above one might conclude that the main role of robust estimates is to help to identify outliers or suspicious observations.

Figure 1.1: Residuals of the least squares fit for the Stack Loss data. The dotted lines correspond to twice the estimated standard deviation of the errors.

Figure 1.2: Residuals of a robust fit for the Stack Loss data. The dotted lines correspond to twice the estimated standard deviation of the errors.
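The two fits behind Figures 1.1 and 1.2 can be reproduced approximately with a short script. The computations in this thesis were done with the author's own C routines called from S-Plus; the sketch below is not that code. It uses R (the open-source descendant of S-Plus), its built-in stackloss data set, and MASS::rlm with method = "MM", which gives a 50% breakdown point MM-regression estimate with a bisquare loss that is close to, but not necessarily identical to, the estimate defined in Sections 4.1 and 4.2. The object names (ls.fit, mm.fit) are local to this sketch.

    # Approximate re-creation of the least squares and robust fits (sketch only).
    library(MASS)

    ls.fit <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)
    mm.fit <- rlm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                  data = stackloss, method = "MM")

    # Residuals versus fitted values, with dotted lines at +/- 2 estimated error SDs
    plot(fitted(ls.fit), resid(ls.fit), ylim = c(-10, 10),
         xlab = "FITTED VALUES", ylab = "RESIDUALS")
    abline(h = c(-2, 2) * summary(ls.fit)$sigma, lty = 3)

    plot(fitted(mm.fit), resid(mm.fit), ylim = c(-10, 10),
         xlab = "FITTED VALUES", ylab = "RESIDUALS")
    abline(h = c(-2, 2) * mm.fit$s, lty = 3)   # mm.fit$s: robust residual scale estimate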

These cases could then be discarded and classical methods applied to the clean data set. In the next section we discuss some drawbacks of this approach.

1.2 The variability caused by cleaning the data

There are two classes of methods to detect outliers: subjective and objective procedures. In this section we will focus on outlier detection methods applied to linear regression analyses. Subjective methods rely on the judgment of data analysts. They normally use a classical fit followed by an analysis of the residuals. Using plots and other devices the researcher identifies outliers or suspicious observations. These observations are then removed and classical methods applied to the remaining data. A formal study of the variability introduced into the final least squares estimate by these data-cleaning methods seems impossible with the mathematical tools available today (but see Relles and Rogers (1977) for a Monte Carlo experiment on subjective outlier rejection rules). On the other hand, objective methods are based on a well defined rule, such as: discard all observations with a residual larger than 2 standard deviations, or reject all observations with associated Cook's distance larger than 1. Because it is expected that if there are outliers in the data the classical fit will be misleading, another set of objective rules is based on the residuals from a robust fit, as follows:

1. Fit a robust estimate.
2. Calculate a robust estimate of the standard deviation of the residuals, σ̂.
3. Fix a number c > 0 and drop any observation with a residual larger (in absolute value) than c σ̂ (typically 2 ≤ c ≤ 3).
4. Apply classical methods to the remaining data.

We will refer to this last family of methods as hard rejection rules (HRR). See Hampel et al. (1986, page 31) for a Monte Carlo study of objective rejection rules for the location model. If we apply steps 1-4 to the Stack Loss data with the same MM-regression estimate we used before, and set c = 2 in step 3, we find that observations 1, 3, 4 and 21 should be removed. The least squares fit of the remaining 17 data points yields regression estimates that are indistinguishable from the MM-regression fit. However, the estimates of the standard deviations of the regression estimates given by the least squares analysis are consistently smaller than those reported by the robust procedure (see Table 1.1). It is important to note that the standard errors of the estimates reported by the least squares analysis of the cleaned data do not take into account the variability introduced by the cleaning step. In other words, the column of estimated standard deviations in the computer output may not reflect the actual variability of the reported point estimates.
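A minimal sketch of steps 1-4, under the same assumptions as the previous snippet (in particular it reuses the hypothetical mm.fit object and the helper name hrr.fit is mine); setting c = 2 reproduces the Stack Loss cleaning described above.

    # Hard rejection rule: drop observations whose robust residual exceeds c * sigma-hat,
    # then refit by least squares on the cleaned data.
    hrr.fit <- function(formula, data, robust.fit, c.cut = 2) {
      keep <- abs(resid(robust.fit)) <= c.cut * robust.fit$s   # step 3
      lm(formula, data = data[keep, ])                         # step 4
    }

    cleaned <- hrr.fit(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                       stackloss, mm.fit, c.cut = 2)
    summary(cleaned)$coefficients   # reported standard errors ignore the cleaning step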

                   Estimated Standard Deviations
  Coefficient      LS on the cleaned data    Robust fit
  Intercept        4.732                     5.003
  Air flow         0.067                     0.071
  Water temp.      0.166                     0.176
  Acid conc.       0.062                     0.065
  Residuals        1.253                     1.837

Table 1.1: Comparison of the estimated standard deviations of the linear regression estimates for the Stack Loss data

To illustrate this problem we performed a small Monte Carlo experiment (also see Dupuis and Hamilton (2000) for a theoretical assessment of this inference procedure). The objective of the experiment is to show that the estimates of the standard deviations of the regression estimates calculated by the HRR method consistently underestimate the actual standard deviations of those regression estimates. In order to do so, we first estimated the actual variability of the point estimates obtained by using a HRR. We considered a linear model of the form

y_i = β_0 + β_1 x_{1i} + ... + β_p x_{pi} + ε_i,   i = 1, ..., n,   (1.2)

where the ε_i are independent standard normal random variables. Note that in the above model there are no outliers in the data. We used all the combinations with n = 20, n = 50, p = 1 and p = 3. The robust estimate used in the HRR procedure was a 95% efficient MM-regression estimate with 50% breakdown point and scale calculated with Tukey's loss function (see Sections 4.1 and 4.2 for the corresponding definitions). For each combination of sample size and number of predictors we generated 100,000 samples following model (1.2).

With each sample we followed steps 1 to 4 above. In step 3 we used c = 2.5 and the robust estimate of the standard deviation of the errors associated with the MM-regression estimate. Our estimate of the actual variability of these estimates is the Monte Carlo standard deviation of these 100,000 coefficient estimates. In Table 1.2, the column labeled "Monte Carlo estimate of the standard deviation" contains this estimated standard deviation for each coefficient in the different models. The next step in the experiment is to show that the estimates of those standard deviations as reported by the HRR analysis are consistently smaller than the estimates obtained in the first part of our study. With the same design matrices we generated 100,000 new samples following model (1.2). We applied steps 1 to 4 as before to each of these new samples, and recorded the estimates of the standard errors of each coefficient as reported by the least squares analysis in step 4. The column "HRR estimates of the standard deviation" in Table 1.2 contains the mean and standard error of these 100,000 estimated standard deviations. From Table 1.2 it is clear that the estimates of the standard deviations reported by the least squares fit after cleaning the data consistently underestimate the actual variability of this estimation procedure. Hence we might obtain optimistic confidence intervals and p-values smaller than their actual values. The researcher should be concerned that this difference can affect the validity of his or her conclusions. An alternative method of performing inference that can deal with outliers in the data is to use robust estimates.

           n     β      HRR estimates of the     Monte Carlo estimate of
                        standard deviation       the standard deviation
  p = 2    20    β_0    0.205 (0.046)            0.256
                 β_1    0.227 (0.051)            0.283
           50    β_0    0.133 (0.017)            0.152
                 β_1    0.138 (0.017)            0.156
  p = 4    20    β_0    0.182 (0.057)            0.322
                 β_1    0.164 (0.051)            0.410
                 β_2    0.173 (0.054)            0.478
                 β_3    0.177 (0.056)            0.295
           50    β_0    0.135 (0.018)            0.159
                 β_1    0.144 (0.019)            0.170
                 β_2    0.145 (0.019)            0.171
                 β_3    0.132 (0.018)            0.157

Table 1.2: Comparison of actual and estimated standard deviations using the HRR method to clean the data. The first column contains the Monte Carlo mean of the HRR estimates of the standard deviations and the corresponding Monte Carlo standard deviation within parentheses. The second column contains the estimate of the standard deviations obtained from a separate simulation experiment. These last values are the actual standard deviations.

These methods naturally incorporate the variability of the down-weighting step into the estimated standard deviations. In the next section we discuss some limitations of the existing asymptotic theory for robust estimates.

1.3 Inference based on robust estimates

The finite sample distribution of robust estimates is unknown and hence inference must be based on their asymptotic distribution (see Hampel et al., 1986, Chapter 3; Ronchetti, 1982; Markatou and Hettmansperger, 1990; among others). The asymptotic distribution of robust regression estimates is well known when the distribution of the errors is symmetric (Huber, 1967; Maronna and Yohai, 1981; Davies, 1993). In this case the estimates of the regression coefficients and of the scale of the errors are asymptotically independent. Because outliers need not be balanced on both sides of the regression line, many data sets with outliers do not satisfy this symmetry assumption. If one relaxes this condition, the calculation of the asymptotic variance of robust location and regression estimates becomes very involved. The main difficulty seems to be that the scale estimate is no longer asymptotically independent of the estimate of the location or regression parameters. This problem has received little attention in the literature. Carroll (1979), Huber (1981) and Rocke and Downs (1981) are among the few who studied it. Carroll (1979) compared several variance estimates of both location and simple linear regression robust estimates. He showed that the asymptotic variance derived under the symmetry assumption underestimates the true variance.

In the location case, this effect can be ameliorated by jackknifing. However, this technique does not seem to work for the intercept of the simple linear regression model. Huber (1981, page 140) gave a formula to compute the influence functions of location and scale estimates when they are calculated simultaneously. Rocke and Downs (1981) also studied variance estimation for robust location estimates when the distribution of the data is asymmetric. Their simulation study concluded that estimating the variance of robust location estimates in this situation is very difficult. In particular, for symmetric distributions the empirical estimate of the asymptotic variance worked better than the bootstrap, but for asymmetric distributions the performances reversed. Their numerical results do not show a variance estimation method that yields good estimates for both symmetric and asymmetric distributions. In Sections 2.3 and 4.3 we study the consistency and asymptotic distribution of the S-scale, S- and MM-location and regression estimators (see Sections 2.1 and 4.1 for the corresponding definitions). We assume that the distribution of the errors belongs to a contamination neighbourhood of a symmetric central distribution and show that these estimates are consistent for any distribution in this neighbourhood. For the location-scale model, with further regularity conditions we show that these results hold uniformly on the neighbourhood. That is, the speed of the convergence does not depend on the particular distribution F in the contamination neighbourhood H_ε (see Section 2.2). Formally, the uniformity result we obtain is as follows. Let µ̂_n be the robust location estimate calculated on a sample of size n generated by a distribution F ∈ H_ε. Let µ be the almost sure asymptotic value of µ̂_n as n → ∞.

Let δ > 0 be arbitrary; then

\lim_{m \to \infty} \, \sup_{F \in H_\epsilon} \, P_F \Big\{ \sup_{n \ge m} |\hat\mu_n - \mu| > \delta \Big\} = 0.

We also find that under certain regularity conditions the MM-location estimates are asymptotically normal and we derive an explicit formula for their asymptotic variance. For the location model it has the form

V(\mu, \sigma, F) = \sigma^2 \, a^2 \, E_F\{ [U - b\,W]^2 \},   (1.3)

where U and W are certain random variables (see equation 2.70 on page 71), and the constants a and b are given by

a = 1 / E_F\{ \psi'((X - \mu)/\sigma) \},
b = E_F\{ \psi'((X - \mu)/\sigma)\,(X - \mu)/\sigma \} \, / \, E_F\{ \rho'((X - \mu)/\sigma)\,(X - \mu)/\sigma \},

where µ is the almost sure asymptotic value of the S-location estimate µ̃_n associated with the MM-location estimate µ̂_n. The functions ψ and ρ are bounded and continuously differentiable (see Definition 2.9). Another result we derive for the location-scale model is that the asymptotic normality of these estimates holds uniformly on the distribution generating the data. That is, we have

\lim_{n \to \infty} \, \sup_{F \in H_\epsilon} \, \sup_{x \in \mathbb{R}} \Big| P_F\big\{ \sqrt{n}\,(\hat\mu_n - \mu)/\sqrt{V} < x \big\} - \Phi(x) \Big| = 0,

where Φ denotes the standard normal cumulative distribution function and V = V(µ, σ, F) is the asymptotic variance given by (1.3).

In general, consistent estimates for the asymptotic standard deviations of robust estimates can be obtained from the corresponding empirical asymptotic variances. For example, to estimate V above we can use V̂ = V(µ̂_n, σ̂_n, F_n), where F_n is the empirical distribution function of the sample x_1, ..., x_n. However, for the case of asymmetric error distributions, we found some numerical problems that seem to arise from the involved form of V in (1.3). In particular, the denominators in a and b can become small for asymmetric distributions F. In Section 3.6.1 we describe a Monte Carlo experiment that illustrates the extent of this instability.

1.4 Bootstrapping robust estimates

Another approach to estimate the variability of estimates is given by the bootstrap (Efron, 1979). This method has been extensively studied for diverse models. In particular, the theory for the bootstrap distribution of robust estimates has been studied by Shorack (1982), Parr (1985) and Yang (1985), among others. Two problems of practical relevance arise when bootstrapping robust regression estimates. First, the proportion of outliers in the bootstrap samples may be higher than that in the original data set, causing the bootstrap quantiles to be very inaccurate. Intuitively the reasoning is as follows. Both outlying and non-outlying observations have the same chance of being in any bootstrap sample. With a certain positive probability, the proportion of outliers in a bootstrap sample will be larger than the fraction of contamination the robust estimate tolerates.

In other words, a certain proportion of the re-calculated values of the robust estimate will be heavily influenced by the outliers in the data. Thus, the tails of the bootstrap distribution can be heavily influenced by the outliers. This lack of robustness of the classical bootstrap was noted by Ghosh et al. (1984) and Shao (1990, 1992) in the context of estimating asymptotic variances, and by Singh (1998) for quantile estimates. Ghosh et al. (1984) showed that a condition is needed on the tails of the distribution of the data for the bootstrap variance estimate of the median to converge. Note that no matter how robust the estimate being bootstrapped is, a tail condition is still needed. Shao (1990) proved that if one truncates the tails of the bootstrap distribution (with the truncation limit going to infinity as the sample size increases, so that asymptotically there are no discarded bootstrapped estimates) then the bootstrap variance converges to the asymptotic variance of the estimate of interest. Unfortunately it is not clear how to implement this method in a finite sample setting. Singh (1998) quantified this robustness problem for the estimates of the quantiles of the asymptotic distribution of location estimates. He defined the breakdown point for bootstrap quantiles and showed that it is disappointingly low even for highly robust location estimates. He proposed to Winsorize the observations using the robust location and scale estimates and then to re-sample from these Winsorized observations. He showed that the quantiles obtained from this method have a much higher breakdown point and that they converge to the quantiles of the asymptotic distribution of the estimate.

The second difficulty is caused by the heavy computational requirements of the bootstrap, which are compounded with robust estimates. Robust regression estimates are generally determined by the solution of a non-linear optimization problem in several dimensions. In the particular case of MM-estimates (Yohai, 1987), for each sample we have to solve two such problems. Moreover, one of them is only implicitly defined as the solution of a non-linear equation. We see that bootstrapping MM-estimates involves repeatedly solving two non-linear optimization problems in several dimensions. We have also found additional computational issues that needed special attention. For example, a bootstrap sample may not be in general position (see Definition 5.1 in Section 5.4), and this has consequences in determining the scale of the residuals. This large number of non-linear optimization problems may render the method infeasible for high dimensional problems. As an example of the computational times that can be expected, the evaluation of 5,000 bootstrap re-calculations of an MM-regression estimate on a simulated data set with 200 observations and 10 explanatory variables took 9120 CPU seconds (approximately 2.5 hours) on a Sun Sparc Ultra workstation. The same number of re-calculations performed with the robust bootstrap we introduce in the next section took 416 CPU seconds (approximately 7 minutes) under the same conditions.

1.5 A new computer intensive method

The basic ideas are best presented using the simple location model. Let x_1, ..., x_n be a random sample satisfying

x_i = µ + ε_i,   i = 1, ..., n,   (1.4)

where the ε_i are independent and identically distributed random variables with known variance.

Let ψ : R → R be odd, bounded, and non-decreasing. The associated M-location estimate for µ is defined as the solution µ̂_n of

\sum_{i=1}^n \psi(x_i - \hat\mu_n) = 0.   (1.5)

We are interested in estimating the standard deviation of µ̂_n. For this purpose we present the following computer intensive method to generate a large number of re-calculated estimates µ̂*_n. We will use the variability observed in these re-calculated estimates to assess the variance of µ̂_n. It is easy to see that µ̂_n can also be expressed as a weighted average of the observations:

\hat\mu_n = \frac{\sum_{i=1}^n [\psi(x_i - \hat\mu_n)/(x_i - \hat\mu_n)]\, x_i}{\sum_{i=1}^n \psi(x_i - \hat\mu_n)/(x_i - \hat\mu_n)} = \frac{\sum_{i=1}^n \omega_i\, x_i}{\sum_{i=1}^n \omega_i},   (1.6)

where ω_i = ψ(x_i − µ̂_n)/(x_i − µ̂_n). This representation of µ̂_n cannot be used directly to calculate µ̂_n because the weights on the right hand side depend on the estimate. Note that commonly used functions ψ (such as Huber's family ψ_c, see equation 2.5) yield weights ω(u) = ψ(u)/u that are decreasing functions of |u|. In this case, outlying observations, which typically have a large residual |x_i − µ̂_n|, will have a small weight in (1.6). Let x*_1, ..., x*_n be a bootstrap sample of the data (i.e. a random sample taken from x_1, ..., x_n with replacement). Recalculate µ̂_n using equation (1.6):

\hat\mu_n^* = \frac{\sum_{i=1}^n \omega_i^*\, x_i^*}{\sum_{i=1}^n \omega_i^*},   (1.7)

with ω*_i = ψ(x*_i − µ̂_n)/(x*_i − µ̂_n).

We have seen above that observations that are far from the bulk of the data will typically come into the bootstrap samples associated with small weights. Hence the influence of outliers on the bootstrapped estimate is bounded. Also note that we are not fully recalculating the estimate from each bootstrap sample. The re-calculated µ̂*_n's in (1.7) may not reflect the actual variability of µ̂_n. Intuitively this happens because the weights are not re-computed with each bootstrap sample. Instead, we are using the weights obtained with the estimate µ̂_n as calculated with the original data. To remedy this loss of variability in the µ̂*_n's we use an estimable correction factor. One way to derive this correction is to think of (1.6) as a fixed-point equation of the form µ̂_n = f(µ̂_n). The first-order Taylor expansion of f around the limit µ of µ̂_n suggests that we should multiply the re-weighted µ̂*_n's by [1 − f'(µ)]^{-1}. With this notation, the correction factor we use is [1 − f'(µ̂_n)]^{-1}. Theorem 3.1 in Section 3.3 shows that the corrected µ̂*_n's have the same asymptotic distribution as the estimates µ̂_n. Our method yields quantile estimates with a high breakdown point as defined by Singh (1998) (see Sections 3.4 and 5.4). This property means that a high proportion of outliers is needed to push the robust bootstrap quantile estimates above any bound. Classical bootstrap quantile estimates have a disappointingly low breakdown point, in spite of the robustness of the estimate being re-calculated (Singh, 1998). In Chapter 3 we study the robust bootstrap for the location model with unknown scale. This new bootstrap method, which we call the robust bootstrap, is also computationally simple.

In the linear regression context studied in Chapter 5, this property is very desirable. As opposed to the classical bootstrap, which would need to solve a full multivariate optimization problem with each re-calculation, robust bootstrap evaluations only require solving a linear system of equations. To compare the performance of our method with the classical bootstrap we ran 5,000 robust bootstrap iterations on the same artificial data set we used to illustrate the computational demands of the classical bootstrap (see page 17). Our method took 416 CPU seconds (approximately 7 minutes) to finish, while the classical bootstrap used 2.5 CPU hours. Both programs were written in C and called within Splus 3.4 for Unix. To illustrate the stability of the distribution estimates obtained with the robust bootstrap, we applied our method to the MM-regression estimate for the Stack Loss data (see Chapter 4 for the definitions). We used both re-sampling methods to estimate the distribution of the 4-dimensional vector √n [(β̂_0, β̂_1, β̂_2, β̂_3) − (β_0, β_1, β_2, β_3)], where (β̂_0, β̂_1, β̂_2, β̂_3) is the MM-regression estimate. Figures 1.3 to 1.6 display the QQ-plots of the estimates of the distribution of the projections (β̂_i − β_i), i = 1, ..., 4, obtained with each method. Note that in all cases the distribution estimates yielded by our method are closer to the limiting normal distribution and have lighter tails than the re-calculated estimates using the classical bootstrap.
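For the simple location model, the scheme of equations (1.5)-(1.7) fits in a few lines of code. The sketch below is not the thesis implementation: it assumes a known scale of 1, uses Huber's ψ_c of equation (2.5) with c = 1.345, approximates the correction factor [1 − f'(µ̂_n)]^{-1} by a numerical derivative of the fixed-point map, and, following the Taylor-expansion argument above, applies the correction to the deviations µ̂*_n − µ̂_n rather than to the raw re-weighted values. All function names are local to the sketch.

    # Robust bootstrap for the location model with known scale (sketch only).
    psi.huber <- function(u, c = 1.345) pmax(pmin(u / c, 1), -1)        # equation (2.5)
    omega     <- function(u, c = 1.345)                                 # weights psi(u)/u
      ifelse(abs(u) < 1e-8, 1 / c, psi.huber(u, c) / u)

    m.location <- function(x)                                           # solves (1.5)
      uniroot(function(t) sum(psi.huber(x - t)), range(x))$root

    f.map <- function(t, x) { w <- omega(x - t); sum(w * x) / sum(w) }  # fixed-point map (1.6)

    robust.boot <- function(x, B = 2000, h = 1e-4) {
      mu.hat <- m.location(x)
      corr   <- 1 / (1 - (f.map(mu.hat + h, x) - f.map(mu.hat - h, x)) / (2 * h))
      stars  <- replicate(B, {
        xs <- sample(x, replace = TRUE)     # bootstrap sample
        ws <- omega(xs - mu.hat)            # weights based on the ORIGINAL estimate
        sum(ws * xs) / sum(ws)              # re-weighted estimate, equation (1.7)
      })
      mu.hat + corr * (stars - mu.hat)      # corrected re-calculated estimates
    }

    set.seed(1)
    x <- c(rnorm(45), rnorm(5, mean = 10))  # 10% of the observations are outliers
    sd(robust.boot(x))                      # bootstrap estimate of the SD of mu-hat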

Figure 1.3: QQ-plots of the re-sampled Intercept coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. (a) Classical bootstrap; (b) Robust bootstrap.

Figure 1.4: QQ-plots of the re-sampled Air Flow coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. (a) Classical bootstrap; (b) Robust bootstrap.

Figure 1.5: QQ-plots of the re-sampled Water Temperature coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. (a) Classical bootstrap; (b) Robust bootstrap.

Figure 1.6: QQ-plots of the re-sampled Acid Concentration coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. (a) Classical bootstrap; (b) Robust bootstrap.

1.6 Thesis outline

The rest of this thesis is organized as follows. Chapter 2 studies the asymptotic properties (consistency and asymptotic distribution) of some robust scale and location estimates. We introduce the location-scale model and the classes of S-scale, S-location and MM-location estimates. We study the asymptotic behaviour of these estimates when the distribution of the errors belongs to a contamination neighbourhood of the standard normal. We present consistency and asymptotic normality results that, under additional regularity conditions, hold uniformly on the distribution of the errors (see Davies, 1998). As a side result we derive a technique to determine the maximum asymptotic bias of M-location estimates with re-descending score functions. In Chapter 3 we present a new computer intensive inference method for the location-scale model and we study its asymptotic properties. In particular we show that the resulting bootstrap distribution converges to the asymptotic distribution of the estimates of interest and that the derived quantile estimates have satisfactory robustness properties. Finally, we report the results of two Monte Carlo studies that compare the performance of this new method with other proposals in the literature. The first study compares several asymptotic variance estimates, while the second compares the mean length and empirical coverage of confidence intervals for the parameters of interest in the model. In Chapter 4 we extend the results of Chapter 2 to the linear regression model. Section 4.1 presents the model and defines the MM-regression estimates.

Section 4.2 presents the contamination neighbourhood and the robustness properties of the MM-regression estimates. Section 4.3 studies the asymptotic properties of these estimates. Chapter 5 extends the inference method presented in Chapter 3 to the linear regression model. We illustrate its use with two examples and we study the consistency of the distribution estimate and the robustness of the corresponding quantile estimates. Section 5.5 contains the results of a simulation study that investigates the finite sample size behaviour of the confidence intervals based on the new method introduced here. Chapter 6 contains a brief list of the results obtained in this thesis, the challenges that remain to be solved and the directions we foresee for future work. The appendix in Chapter 7 contains most of the auxiliary results needed in the proofs. Proofs are presented for those results that could not be found in the literature.

Chapter 2

Global asymptotic properties of robust estimates for the location-scale model

In this chapter we study the asymptotic properties (consistency and asymptotic distribution) of some robust estimates of a location parameter when the observations may have an asymmetric distribution. First we define the classes of M-location estimates with general scale, S-location, S-scale estimates, and MM-location estimates. Most attention in the robustness literature has been paid to the asymptotic properties of robust estimates (in particular to their consistency and asymptotic distribution) when the data follow the non-contaminated model. In this chapter we study the properties of S- and MM-estimates in the full contamination neighbourhood.

We show that the S- and MM-estimates are consistent and asymptotically normal for any distribution in the gross-error neighbourhood of a symmetric distribution. We also discuss conditions that ensure these results hold uniformly over the contamination neighbourhood. As discussed by Davies (1998), uniformity is a reasonable property to expect in this context. Robust estimates have been proposed to deal with uncertainty in the model that generates the data; hence we expect their properties not to depend on a specific distribution in the neighbourhood. For example, the speed of convergence can depend on the distribution that generated the data. Our results guarantee that this is not the case with the estimates we consider in this chapter. This chapter is organized as follows. Section 2.1 defines the classes of M-, S- and MM-location and scale estimates. Section 2.2 introduces the contamination neighbourhood H_ε and briefly discusses the robustness properties of these estimates. Section 2.3 studies the asymptotic properties of S-location, S-scale and MM-location estimates when the distribution of the data belongs to H_ε. We provide conditions to obtain consistency and asymptotic normality of these estimates. We also obtain regularity conditions that ensure that the consistency and asymptotic normality results hold uniformly on the contamination neighbourhood. We show that a certain family of robust estimates satisfies these conditions.

2.1 Definitions

In this chapter we consider the following location-scale model. Let x_1, ..., x_n be n observations on the real line satisfying

x_i = µ + σ ε_i,   i = 1, ..., n,   (2.1)

where ε_i, i = 1, ..., n, are independent and identically distributed (i.i.d.) observations with variance equal to 1. The interest is in estimating µ. The scale σ is a nuisance parameter. Huber (1964) introduced the class of M-estimates. Suppose that x_1, ..., x_n are i.i.d. observations with density function f(x, θ), θ ∈ Θ, with Θ some parameter space. The M-estimate of θ is

\hat\mu_n = \hat\mu_n(x_1, ..., x_n) = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \rho(x_i, \theta),   (2.2)

where ρ is a loss function. When ρ(x, θ) = −log f(x, θ), µ̂_n is the maximum likelihood estimate of θ. Under regularity conditions on ρ and Θ the estimate µ̂_n also satisfies

\sum_{i=1}^n \psi(x_i, \hat\mu_n) = 0,   (2.3)

where ψ = ∂ρ/∂θ. If the data follow model (2.1) and θ = µ in (2.2), it is natural to choose the loss function ρ to be a function of the residuals, ρ(x, θ) = ρ(x − θ). Then (2.3) becomes

\sum_{i=1}^n \psi(x_i - \hat\mu_n) = 0.   (2.4)

In the following we will assume that the function ψ : R → R satisfies:

P.1 ψ(−u) = −ψ(u), u ≥ 0, and ψ is bounded;
P.2 ψ is non-decreasing and lim_{u→∞} ψ(u) > 0;
P.3 ψ is continuous.

Without loss of generality, if ψ satisfies P.1 we can assume that |ψ(u)| ≤ 1 for all u ∈ R.

Definition 2.1 - M-location estimates (with known scale): Let x_1, ..., x_n be a random sample following model (2.1). Let ψ : R → R satisfy P.1 to P.3 above. The solution µ̂_n of (2.4) is called an M-location estimate.

A widely used family of functions ψ_c was proposed by Huber (1964). Its members are given by

\psi_c(u) = \begin{cases} u/c & \text{if } |u| < c \\ \mathrm{sgn}(u) & \text{if } |u| \ge c, \end{cases}   (2.5)

where c ∈ R+ is a user-chosen constant and sgn(u) is the sign function. The constant c determines the asymptotic properties of the sequence µ̂_n (see Definition 2.13 on page 39). One corresponding function ρ_c is given by

\rho_c(u) = \begin{cases} u^2/(2c) & \text{if } |u| \le c \\ u - c/2 & \text{if } u > c \\ -u - c/2 & \text{if } u < -c. \end{cases}

Under certain regularity conditions the corresponding M-location estimates are asymptotically normal (Huber, 1967). The choice c = 1.345 yields an asymptotic efficiency of 95% when ε_i ∼ N(0, 1).

For some asymptotic results we will need the function ψ_c to be twice continuously differentiable. We can easily construct functions that satisfy the regularity conditions of Definition 2.1 and that are twice continuously differentiable. For example, for a given c > 0 we can find constants a, b, d and e such that

f_c(u) = \begin{cases} a u^7 + b u^5 + d u^3 + e u & \text{if } |u| \le c \\ \mathrm{sgn}(u) & \text{if } |u| > c \end{cases}   (2.6)

is twice continuously differentiable with f_c(±c) = ±1, f_c'(±c) = 0, f_c'(0) = 1 and f_c''(±c) = 0. Beaton and Tukey (1974) proposed another family of functions ψ_d,

\psi_d(u) = \begin{cases} (u/d)\,(1 - (u/d)^2)^2 & \text{if } |u| < d \\ 0 & \text{if } |u| \ge d. \end{cases}   (2.7)

The constant d determines the asymptotic properties of these estimates. The associated family of functions ρ_d is given by

\rho_d(u) = \begin{cases} 3(u/d)^2 - 3(u/d)^4 + (u/d)^6 & \text{if } |u| \le d \\ 1 & \text{if } |u| > d. \end{cases}   (2.8)

This family of functions ψ_d differs from Huber's in that its members vanish for large values of x. In terms of the estimate, this feature means that outlying points will be ignored instead of down-weighted. In the robustness literature these functions are called re-descending. A property that is natural to expect from an estimate of the location parameter in (2.1) is that it be equivariant under shifts in the center of the data.
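The contrast between a monotone and a re-descending score function is easy to see numerically. The following sketch codes ψ_c of (2.5) and ψ_d, ρ_d of (2.7)-(2.8); the tuning constants c = 1.345 and d = 4.685 are only illustrative choices and are not the constants derived in the thesis (see Table 2.1 for those).

    # Huber (monotone) and Beaton-Tukey (re-descending) score functions (sketch).
    psi.huber <- function(u, c = 1.345) pmax(pmin(u / c, 1), -1)               # (2.5)
    psi.tukey <- function(u, d = 4.685)
      ifelse(abs(u) >= d, 0, (u / d) * (1 - (u / d)^2)^2)                      # (2.7)
    rho.tukey <- function(u, d = 4.685)
      ifelse(abs(u) > d, 1, 3 * (u / d)^2 - 3 * (u / d)^4 + (u / d)^6)         # (2.8)

    u <- seq(-8, 8, length = 401)
    matplot(u, cbind(psi.huber(u), psi.tukey(u)), type = "l", lty = 1:2,
            ylab = "psi(u)")   # the Tukey score returns to 0 for |u| >= d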

Definition 2.2 - Translation equivariance: We will say that an estimate µ̂_n = µ̂_n(x_1, ..., x_n) is translation equivariant if for any sample x_1, ..., x_n and real number a we have µ̂_n(x_1 + a, ..., x_n + a) = µ̂_n(x_1, ..., x_n) + a.

It is easy to verify that estimates µ̂_n that satisfy (2.4) are translation equivariant. Equivariance with respect to change of scale is also of interest.

Definition 2.3 - Scale equivariance: We will say that an estimate µ̂_n = µ̂_n(x_1, ..., x_n) is scale equivariant if for any sample x_1, ..., x_n and real number a we have

µ̂_n(a x_1, ..., a x_n) = a µ̂_n(x_1, ..., x_n).   (2.9)

The estimates µ̂_n defined by equation (2.4) are not generally scale equivariant. To obtain equivariant estimates we introduce scale estimates.

Definition 2.4 - Scale estimate: Let x_1, ..., x_n be a sample of n real numbers. An estimate σ̂_n = σ̂_n(x_1, ..., x_n) such that

σ̂_n(a x_1 + b, ..., a x_n + b) = |a| σ̂_n(x_1, ..., x_n)   for all a, b ∈ R,   (2.10)

will be called a scale estimate.

Equation (2.4) can incorporate the scale estimate σ̂_n as follows.

Definition 2.5 - M-location estimates with general scale: Let ψ : R → R satisfy P.1 to P.3. Let x_1, ..., x_n be a random sample of real numbers and let σ̂_n be a scale estimate.

The M-location estimate with general scale is the solution µ̂_n of

\frac{1}{n} \sum_{i=1}^n \psi((x_i - \hat\mu_n)/\hat\sigma_n) = 0.   (2.11)

Let ρ_ψ be a real function such that ρ_ψ' = ψ; then µ̂_n can also be defined by

\hat\mu_n = \arg\min_{t \in R} \frac{1}{n} \sum_{i=1}^n \rho_\psi((x_i - t)/\hat\sigma_n).   (2.12)

If ψ is not continuous in the above definition, then the solution of (2.11) may not exist. We can still define the M-location estimate in this situation as

\hat\mu_n = \inf \Big\{ t \in R : \sum_{i=1}^n \psi((x_i - t)/\hat\sigma_n) \le 0 \Big\},

where inf A denotes the infimum of the set A (see Huber, 1981, page 46). It is easy to verify that the M-location estimates with general scale as defined in Definition 2.5 are translation and scale equivariant. Different scale estimates σ̂_n generate different classes of M-location estimates. Definitions 2.6 and 2.7 consider two particular classes: the M-scale and S-scale estimates respectively. In the following we will assume that the real function ρ : R → R+ satisfies ρ(0) = 0 and:

R.1 ρ(−u) = ρ(u), u ≥ 0, and sup_{u ∈ R} ρ(u) = 1;
R.2 ρ(u) is non-decreasing for u ≥ 0;
R.3 ρ is continuous.

Note that, without loss of generality, any symmetric and bounded function ρ that is not constantly equal to 0 can be adjusted to satisfy R.1 above. For an arbitrary measurable function f and a random variable X with distribution function F, let E_F f(X) denote the expected value of the random variable f(X) when X has distribution F, if this expectation exists.

Definition 2.6 - M-scale estimates (Huber, 1964): Let ρ : R → R satisfy R.1 to R.3 above. Let b ∈ (0, 1/2]. Let x_1, ..., x_n be a random sample and let µ̂_n be a scale- and translation-equivariant estimate. Define the residuals r_1 = x_1 − µ̂_n, ..., r_n = x_n − µ̂_n. The M-scale σ̂_n is implicitly defined by

\frac{1}{n} \sum_{i=1}^n \rho(r_i / \hat\sigma_n) = b.   (2.13)

The choices of the function ρ and the constant b in (2.13) determine the properties of the resulting scale estimate. For example, to ensure consistency of σ̂_n when the residuals r_i in (2.13) are standard normal random variables we choose b = E_Φ ρ(Z), where Z ∼ N(0, 1). The constant b will also characterize the robustness properties of the sequence σ̂_n (see Section 2.2). A widely used family of ρ functions is given by

\rho_k(u) = \begin{cases} (u/k)^2 & \text{if } |u| \le k \\ 1 & \text{if } |u| > k, \end{cases}   (2.14)

where k is a user-chosen constant. For a given b ∈ (0, 1/2] we can choose k to obtain E_Φ ρ_k(Z) = b. The value k = 1.04086 satisfies E_Φ ρ_k(Z) = 1/2.
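Although the M-scale of equation (2.13) is only defined implicitly, for a given ρ and b it reduces to a one-dimensional root-finding problem. The minimal sketch below uses the ρ_k family of (2.14); the last line checks numerically that k = 1.04086 gives E_Φ ρ_k(Z) ≈ 1/2, so that b = 1/2 makes σ̂_n consistent when the residuals are standard normal. Function names are local to the sketch.

    # M-scale estimate: solve (1/n) sum rho(r_i / s) = b for s (sketch).
    rho.k <- function(u, k = 1.04086) pmin((u / k)^2, 1)           # equation (2.14)

    m.scale <- function(r, b = 0.5, k = 1.04086) {
      g <- function(s) mean(rho.k(r / s, k)) - b                   # equation (2.13)
      uniroot(g, c(1e-6, 10) * max(abs(r)))$root
    }

    set.seed(1)
    r <- rnorm(500)
    m.scale(r - median(r))                                         # close to 1

    integrate(function(z) rho.k(z) * dnorm(z), -Inf, Inf)$value    # approximately 0.5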

Another family of scale estimates is that of the S-scales (Rousseeuw and Yohai, 1984).

Definition 2.7 - S-scale estimates: Let ρ : R → R+ and b ∈ R be as in Definition 2.6. Let x_1, ..., x_n be a random sample. For every t ∈ R consider the residuals x_1 − t, ..., x_n − t and their M-scale s_n(t) satisfying

\frac{1}{n} \sum_{i=1}^n \rho((x_i - t)/ s_n(t)) = b.   (2.15)

The S-scale σ̂_n is defined by

\hat\sigma_n(x_1, ..., x_n) = \inf_{t \in R} s_n(t).   (2.16)

Naturally associated with this family are the S-location estimates.

Definition 2.8 - S-location estimates: Let x_1, ..., x_n be a random sample, and for each t ∈ R let s_n(t) be as in (2.15). The S-location estimate µ̃_n is

\tilde\mu_n(x_1, ..., x_n) = \arg\inf_{t \in R} s_n(t).   (2.17)

It is easy to see that if the function ρ is continuously differentiable, the pair (µ̃_n, σ̂_n) in (2.16) and (2.17) satisfies the following system of equations

\frac{1}{n} \sum_{i=1}^n \rho((x_i - \tilde\mu_n)/\hat\sigma_n) = b,   (2.18)
\frac{1}{n} \sum_{i=1}^n \rho'((x_i - \tilde\mu_n)/\hat\sigma_n) = 0,   (2.19)

where ρ' denotes the derivative of ρ. In analogy with Yohai (1987) we will refer to the M-location estimates calculated with an S-scale as MM-location estimates.
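Definitions 2.7 and 2.8 suggest a direct, if naive, way of computing the S-location and S-scale in the one-dimensional case: evaluate the M-scale s_n(t) of the residuals x_i − t over candidate centres t and minimise over t. The sketch below does exactly that with optimize(); it reuses the truncated-quadratic ρ_k of the previous snippet for brevity, whereas a smooth bounded ρ such as (2.8) would be the more usual choice, and it is not the algorithm used in the thesis.

    # S-location and S-scale by direct minimisation of s_n(t) (sketch only).
    rho.k   <- function(u, k = 1.04086) pmin((u / k)^2, 1)
    m.scale <- function(r, b = 0.5)
      uniroot(function(s) mean(rho.k(r / s)) - b, c(1e-6, 10) * max(abs(r)))$root

    s.n <- function(t, x, b = 0.5) m.scale(x - t, b)               # equation (2.15)

    s.estimates <- function(x, b = 0.5) {
      opt <- optimize(s.n, range(x), x = x, b = b)                 # equations (2.16)-(2.17)
      list(location = opt$minimum, scale = opt$objective)
    }

    set.seed(1)
    x <- c(rnorm(45), rnorm(5, mean = 10))
    s.estimates(x)                                                 # resistant to the outliers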

Definition 2.9 - MM-location estimates: Let x_1, ..., x_n be a random sample following model (2.1). Let ψ : R → R satisfy P.1 to P.3. Let σ̂_n be an S-scale estimate as in (2.16). The solution µ̂_n of

\sum_{i=1}^n \psi((x_i - \hat\mu_n)/\hat\sigma_n) = 0   (2.20)

will be called the MM-location estimate of x_1, ..., x_n.

Definition 2.10 - Simultaneous M-location and scale estimates (Huber, 1964): Let ψ : R → R satisfy P.1 to P.3, and let ρ : R → R+ satisfy R.1 to R.3. Let x_1, ..., x_n be a random sample and let b = E_Φ ρ(Z). The simultaneous M-location and scale estimates µ̂_n and σ̂_n are given by the solution of the following system of equations

\frac{1}{n} \sum_{i=1}^n \psi((x_i - \hat\mu_n)/\hat\sigma_n) = 0,
\frac{1}{n} \sum_{i=1}^n \rho((x_i - \hat\mu_n)/\hat\sigma_n) = b.   (2.21)

2.2 Robustness properties

The asymptotic properties of the M-location estimates given by (2.12) are well known when the distribution of the errors is symmetric (Huber, 1964, 1967, 1981; Boos and Serfling, 1980; Clarke, 1983, 1984). We will assume that the distribution of the errors belongs to the following gross-error neighbourhood

H_\epsilon = \{ F \in D : F(x) = (1 - \epsilon)\, F_0((x - \mu)/\sigma) + \epsilon\, H(x) \}.   (2.22)
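A distribution in the gross-error neighbourhood (2.22) is a mixture: with probability 1 − ε an observation comes from the (shifted and scaled) central distribution F_0, and with probability ε from an arbitrary contaminating distribution H. The sketch below draws such a sample with F_0 standard normal and, as a deliberately unpleasant illustrative choice, H a point mass at 10; the function name and defaults are mine, not the thesis's.

    # Sampling from a member of the contamination neighbourhood H_eps (sketch).
    r.contam <- function(n, eps = 0.1, mu = 0, sigma = 1,
                         rH = function(m) rep(10, m)) {
      from.H <- runif(n) < eps             # which observations come from H
      out <- mu + sigma * rnorm(n)         # draws from the central distribution F_0
      out[from.H] <- rH(sum(from.H))       # overwrite the contaminated ones
      out
    }

    set.seed(1)
    x <- r.contam(200, eps = 0.2)
    c(mean = mean(x), median = median(x))  # the mean is dragged towards the contamination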