
Robust Inference Although the statistical functions we have considered have intuitive interpretations, the question remains as to which distributional measures are most useful for describing a given distribution. In a simple case such as a normal distribution, the choices are obvious. For skewed distributions, or for distributions that arise from mixtures of simpler distributions, the choices of useful distributional measures are not so obvious. A central concern in robust statistics is how a functional of a CDF behaves as the distribution is perturbed. If a functional is rather sensitive to small changes in the distribution, then one has more to worry about if the observations from the process of interest are contaminated with observations from some other process.

Sensitivity of Statistical Functions to Perturbations in the Distribution One of the most interesting things about a function (or a functional) is how its value varies as the argument is perturbed. Two key properties are continuity and differentiability. For the case in which the arguments are functions, the cardinality of the possible perturbations is greater than that of the continuum. We can be precise in discussions of continuity and differentiability of a functional $\Upsilon$ at a point (function) $F$ in a domain $\mathcal{F}$ by defining another set $\mathcal{D}$ consisting of difference functions over $\mathcal{F}$; that is, the set of functions $D = F_1 - F_2$ for $F_1, F_2 \in \mathcal{F}$.

Derivatives of Functionals The concept of differentiability for functionals is necessarily more complicated than for functions over real domains. For a functional $\Upsilon$ over the domain $\mathcal{F}$, we define three levels of differentiability at the function $F \in \mathcal{F}$. All definitions are in terms of a domain $\mathcal{D}$ of difference functions over $\mathcal{F}$, and a linear functional $\Lambda_F$ defined over $\mathcal{D}$ in a neighborhood of $F$. The first type of derivative is very general. The other two types depend on a metric $\rho$ on $\mathcal{F} \times \mathcal{F}$ induced by a norm on $\mathcal{F}$.

Derivatives of Functionals Gâteaux differentiable. $\Upsilon$ is Gâteaux differentiable at $F$ iff there exists a linear functional $\Lambda_F(D)$ over $\mathcal{D}$ such that for $t \in \mathbb{R}$ for which $F + tD \in \mathcal{F}$,
$$\lim_{t \to 0} \left( \frac{\Upsilon(F + tD) - \Upsilon(F)}{t} - \Lambda_F(D) \right) = 0.$$
ρ-Hadamard differentiable. $\Upsilon$ is ρ-Hadamard differentiable at $F$ iff there exists a linear functional $\Lambda_F(D)$ over $\mathcal{D}$ such that for any sequence $t_j \to 0$ in $\mathbb{R}$ and any sequence $D_j \in \mathcal{D}$ such that $\rho(D_j, D) \to 0$ and $F + t_j D_j \in \mathcal{F}$,
$$\lim_{j \to \infty} \left( \frac{\Upsilon(F + t_j D_j) - \Upsilon(F)}{t_j} - \Lambda_F(D_j) \right) = 0.$$

ρ-Fréchet differentiable. $\Upsilon$ is ρ-Fréchet differentiable at $F$ iff there exists a linear functional $\Lambda_F(D)$ over $\mathcal{D}$ such that for any sequence $F_j \in \mathcal{F}$ for which $\rho(F_j, F) \to 0$,
$$\lim_{j \to \infty} \frac{\Upsilon(F_j) - \Upsilon(F) - \Lambda_F(F_j - F)}{\rho(F_j, F)} = 0.$$

Differentials of Functionals The linear functional $\Lambda_F$ is called the Gâteaux (respectively, ρ-Hadamard or ρ-Fréchet) differential of $\Upsilon$ at $F$.
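
To make the Gâteaux definition concrete, here is a minimal numerical sketch (not part of the original notes; the grid, the N(0,1) reference CDF, the point $x = 1.25$, and the step sizes $t$ are all assumptions chosen for illustration). It evaluates the Gâteaux difference quotient for the mean functional $M(F) = \int y \, \mathrm{d}F(y)$ in the direction $D = I_{[x,\infty)} - F$. Because $M$ is linear, the quotient is constant in $t$ and equals $\Lambda_F(D) = x - \mu$.

```python
# Numerical sketch of the Gateaux difference quotient for the mean
# functional at F = N(0,1) in the direction D = I_[x,inf) - F.
# All numerical choices here are illustrative assumptions.
import numpy as np
from scipy.stats import norm

y = np.linspace(-10, 10, 200001)   # fine grid covering the support
F = norm.cdf(y)                    # reference CDF on the grid
x = 1.25
G = (y >= x).astype(float)         # CDF of a point mass at x
D = G - F                          # a difference function in the set D

def mean_functional(H):
    """M(H) = integral of y dH(y), approximated on the grid."""
    ymid = 0.5 * (y[:-1] + y[1:])
    return np.sum(ymid * np.diff(H))

mu = mean_functional(F)
for t in [0.5, 0.1, 0.01]:
    quotient = (mean_functional(F + t * D) - mean_functional(F)) / t
    print(t, quotient)   # constant in t; equals Lambda_F(D) = x - mu = 1.25
```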

Perturbations In statistical applications using functionals defined on the CDF, we often consider a simple type of function in the neighborhood of the CDF. These are CDFs formed by adding a single mass point to the given distribution. For a given CDF $P(y)$, we can define a simple perturbation as
$$P_{x,\epsilon}(y) = (1 - \epsilon)P(y) + \epsilon I_{[x,\infty)}(y), \qquad (1)$$
where $0 \le \epsilon \le 1$. We will refer to this distribution as an ε-mixture distribution. The distribution with CDF $P$ is the reference distribution. (This, of course, is the distribution of interest, so I often refer to it without any qualification.)

Perturbations A simple interpretation of the perturbation in equation (1) is that it is the CDF of a mixture of a distribution with CDF $P$ and a degenerate distribution with a single mass point at $x$, which may or may not be in the support of the distribution. The extent of the perturbation depends on ε; if $\epsilon = 0$, the distribution is the reference distribution. If the distribution with CDF $P$ is continuous with PDF $p$, the PDF of the mixture is
$$\mathrm{d}P_{x,\epsilon}(y)/\mathrm{d}y = (1 - \epsilon)p(y) + \epsilon\,\delta(y - x),$$
where $\delta(\cdot)$ is the Dirac delta function. If the distribution is discrete, the probability mass function has nonzero probabilities (scaled by $(1 - \epsilon)$) at each of the mass points associated with $P$, together with a mass point at $x$ with probability ε.

PDFs and the CDF of the ε-mixture Distribution [Figure: the left-hand graph shows the PDF $p(y)$ of a continuous reference distribution (solid line) and the PDF of the ε-mixture distribution, $(1-\epsilon)p(y)$ together with a mass point of size ε at $x$ (dotted line); the right-hand graph shows the corresponding CDF $P_{x,\epsilon}(y)$.]

Perturbations A statistical function evaluated at $P_{x,\epsilon}$ compared to the function evaluated at $P$ allows us to determine the effect of the perturbation on the statistical function. For example, we can determine the mean of the distribution with CDF $P_{x,\epsilon}$ in terms of the mean µ of the reference distribution to be $(1-\epsilon)\mu + \epsilon x$. This is easily seen by thinking of the distribution as a mixture. For example, for the M functional we have
$$M(P_{x,\epsilon}) = \int y \,\mathrm{d}\big((1-\epsilon)P(y) + \epsilon I_{[x,\infty)}(y)\big) = (1-\epsilon)\int y \,\mathrm{d}P(y) + \epsilon \int y\,\delta(y - x)\,\mathrm{d}y = (1-\epsilon)\mu + \epsilon x. \qquad (2)$$
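
As a quick check of equation (2), the following Monte Carlo sketch (illustrative only; the N(0,1) reference distribution, $x = 1.25$, $\epsilon = 0.1$, and the sample size are assumptions) samples from the ε-mixture by drawing the mass point $x$ with probability ε and a reference-distribution variate otherwise.

```python
# Monte Carlo check that the mean of the eps-mixture is (1-eps)*mu + eps*x.
import numpy as np

rng = np.random.default_rng(42)
eps, x, n = 0.1, 1.25, 1_000_000

# with probability eps take the mass point x, otherwise draw from N(0,1)
u = rng.random(n)
sample = np.where(u < eps, x, rng.standard_normal(n))

print(sample.mean())               # Monte Carlo estimate
print((1 - eps) * 0.0 + eps * x)   # closed form from equation (2): 0.125
```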

Perturbations For a discrete distribution we would follow the same steps using summations (instead of an integral of y times a Dirac delta function, we just have a point mass of 1 at x), and would get the same result.

Quantiles under Perturbations The π quantile of the mixture distribution, $\Xi_\pi(P_{x,\epsilon}) = P_{x,\epsilon}^{-1}(\pi)$, is somewhat more difficult to work out. This quantile, which we will call $q$, is shown relative to the π quantile of the continuous reference distribution, $y_\pi$, for two cases in the figure below.

Quantiles under Perturbations For example, if the reference distribution is a standard normal, $\pi = 0.7$, so $y_\pi = 0.52$, and $\epsilon = 0.1$, we have the graphs below. [Figure: two graphs, each showing the scaled PDF $(1-\epsilon)p(y)$ with a mass point of size ε. In the left-hand graph the mass point is at $x_1 = -1.25$, below $y_\pi$, and the mixture quantile $q$ lies below $y_\pi$; in the right-hand graph the mass point is at $x_2 = 1.25$, above $y_\pi$, and $q$ lies above $y_\pi$.]

Quantiles under Perturbations We see that in the case of a continuous reference distribution (implying $P$ is strictly increasing),
$$
P_{x,\epsilon}^{-1}(\pi) =
\begin{cases}
P^{-1}\left(\frac{\pi - \epsilon}{1 - \epsilon}\right), & \text{for } (1-\epsilon)P(x) + \epsilon < \pi, \\[4pt]
x, & \text{for } (1-\epsilon)P(x) \le \pi \le (1-\epsilon)P(x) + \epsilon, \\[4pt]
P^{-1}\left(\frac{\pi}{1 - \epsilon}\right), & \text{for } \pi < (1-\epsilon)P(x).
\end{cases} \qquad (3)
$$
The conditions in equation (3) can also be expressed in terms of $x$ and quantiles of the reference distribution. For example, the first condition is equivalent to $x < y_{\frac{\pi - \epsilon}{1 - \epsilon}}$.
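
The three cases in equation (3) can be checked directly. The sketch below (assumptions: standard normal reference, $\pi = 0.7$, $\epsilon = 0.1$, and the mass points $x_1 = -1.25$ and $x_2 = 1.25$ from the example above) implements equation (3) and compares the result with a brute-force inversion of the mixture CDF on a grid.

```python
# Quantiles of the eps-mixture by the three cases of equation (3),
# checked against brute-force inversion of the mixture CDF.
import numpy as np
from scipy.stats import norm

def mixture_cdf(y, x, eps):
    return (1 - eps) * norm.cdf(y) + eps * (y >= x)

def mixture_quantile(pi, x, eps):
    """P_{x,eps}^{-1}(pi) via the three cases of equation (3)."""
    Px = norm.cdf(x)
    if (1 - eps) * Px + eps < pi:
        return norm.ppf((pi - eps) / (1 - eps))
    if pi < (1 - eps) * Px:
        return norm.ppf(pi / (1 - eps))
    return x

pi, eps = 0.7, 0.1
grid = np.linspace(-5, 5, 1_000_001)
for x in (-1.25, 1.25):
    q = mixture_quantile(pi, x, eps)
    # smallest grid point at which the mixture CDF reaches pi
    q_brute = grid[np.searchsorted(mixture_cdf(grid, x, eps), pi)]
    print(x, q, q_brute)   # the two values agree to grid precision
```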

The Influence Function The extent of the perturbation depends on ε, and so we are interested in the relative effect; in particular, the relative effect as ε approaches zero. The influence function for the functional $\Upsilon$ and the CDF $P$, defined at $x$ as
$$\phi_{\Upsilon,P}(x) = \lim_{\epsilon \downarrow 0} \frac{\Upsilon(P_{x,\epsilon}) - \Upsilon(P)}{\epsilon} \qquad (4)$$
if the limit exists, is a measure of the sensitivity of the distributional measure defined by $\Upsilon$ to a perturbation of the distribution at the point $x$. The influence function is also called the influence curve, and denoted by IC. The limit is the right-hand Gâteaux derivative of the functional $\Upsilon$ at $P$ and $x$.

The Influence Function The influence function can also be expressed as the limit of the derivative of $\Upsilon(P_{x,\epsilon})$ with respect to ε:
$$\phi_{\Upsilon,P}(x) = \lim_{\epsilon \downarrow 0} \frac{\partial}{\partial \epsilon} \Upsilon(P_{x,\epsilon}). \qquad (5)$$
This form is often more convenient for evaluating the influence function.

The Influence Function for the M Functional Some influence functions are easy to work out; for example, the influence function for the M functional that defines the mean of a distribution, which we denote by µ. The influence function for this functional operating on the CDF $P$ at $x$ is
$$\phi_{\mu,P}(x) = \lim_{\epsilon \downarrow 0} \frac{M(P_{x,\epsilon}) - M(P)}{\epsilon} = \lim_{\epsilon \downarrow 0} \frac{(1-\epsilon)\mu + \epsilon x - \mu}{\epsilon} = x - \mu. \qquad (6)$$
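
The finite-ε quotient in equation (4) can also be evaluated numerically rather than from the closed form. In the sketch below (an assumed setup: N(0,1) reference, grid-based integration, and the arbitrary point $x = 2$), the quotient for the mean functional reproduces $x - \mu$, in agreement with equation (6).

```python
# Finite-eps influence quotient for the mean functional, computed by
# numerically integrating the eps-mixture rather than using equation (2).
import numpy as np
from scipy.stats import norm

y = np.linspace(-12, 12, 400001)

def mixture_mean(x, eps):
    """M(P_{x,eps}) = integral of y dP_{x,eps}(y), on a grid."""
    F = (1 - eps) * norm.cdf(y) + eps * (y >= x)
    ymid = 0.5 * (y[:-1] + y[1:])
    return np.sum(ymid * np.diff(F))

x, mu = 2.0, 0.0
for eps in [0.1, 0.01, 0.001]:
    print(eps, (mixture_mean(x, eps) - mu) / eps)   # -> x - mu = 2.0
```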

The Influence Function We note that the influence function of a functional is a type of derivative of the functional, $\partial M(P_{x,\epsilon})/\partial \epsilon$. The influence function for other moments can be computed in the same way. Note that the influence function for the mean is unbounded in $x$; that is, it increases or decreases without bound as $x$ increases or decreases without bound. Note also that this result is the same for multivariate or univariate distributions.

The Influence Function for Quantiles The influence function for a quantile is more difficult to work out. The problem arises from the difficulty in evaluating the quantile. As I informally described the distribution with CDF $P_{x,\epsilon}$, it is a mixture of some given distribution and a degenerate discrete distribution. Even if the reference distribution is continuous, the CDF of the mixture, $P_{x,\epsilon}$, does not have an inverse over the full support (although for quantiles we will write $P_{x,\epsilon}^{-1}$). Let us consider a simple instance: a univariate continuous reference distribution, and assume $p(y_\pi) > 0$. We approach the problem by considering the PDF, or the probability mass function.

The Influence Function for Quantiles In the left-hand graph of the second figure, the total probability mass up to the point $y_\pi$ is $(1-\epsilon)$ times the area under the curve, that is, $(1-\epsilon)\pi$, plus the mass at $x_1$, that is, ε. Assuming ε is small enough, the π quantile of the ε-mixture distribution is the $\frac{\pi - \epsilon}{1 - \epsilon}$ quantile of the reference distribution, or $P^{-1}\left(\frac{\pi - \epsilon}{1 - \epsilon}\right)$. It is also the corresponding quantile of the scaled reference distribution; that is, it is the value at which the function $(1-\epsilon)p(x)$ has accumulated the probability $\pi - \epsilon$, which is the proportion $\frac{\pi - \epsilon}{1 - \epsilon}$ of the total probability $(1-\epsilon)$ of that component. Use of the definitions is somewhat messy. It is more straightforward to differentiate $P_{x_1,\epsilon}^{-1}$ and take the limit.

The Influence Function for Quantiles For fixed $x < y_\pi$, we have
$$\frac{\partial}{\partial \epsilon} P^{-1}\left(\frac{\pi - \epsilon}{1 - \epsilon}\right) = \frac{1}{p\left(P^{-1}\left(\frac{\pi - \epsilon}{1 - \epsilon}\right)\right)} \, \frac{\pi - 1}{(1 - \epsilon)^2}.$$
Likewise, we take the derivatives for the other cases, and then take limits. We get
$$\phi_{\Xi_\pi,P}(x) =
\begin{cases}
\dfrac{\pi - 1}{p(y_\pi)}, & \text{for } x < y_\pi, \\[6pt]
0, & \text{for } x = y_\pi, \\[6pt]
\dfrac{\pi}{p(y_\pi)}, & \text{for } x > y_\pi.
\end{cases}$$
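
A small-ε quotient likewise recovers the influence function for a quantile. The sketch below (assumptions: standard normal reference, $\pi = 0.7$, $\epsilon = 10^{-6}$, and the two mass points from the earlier example) compares the quotient against $(\pi - 1)/p(y_\pi)$ and $\pi/p(y_\pi)$.

```python
# Finite-eps check of the influence function for the pi quantile.
import numpy as np
from scipy.stats import norm

pi, eps = 0.7, 1e-6
y_pi = norm.ppf(pi)

def mixture_quantile(pi, x, eps):
    """P_{x,eps}^{-1}(pi) via the three cases of equation (3)."""
    Px = norm.cdf(x)
    if (1 - eps) * Px + eps < pi:
        return norm.ppf((pi - eps) / (1 - eps))
    if pi < (1 - eps) * Px:
        return norm.ppf(pi / (1 - eps))
    return x

for x in (-1.25, 1.25):
    quotient = (mixture_quantile(pi, x, eps) - y_pi) / eps
    theory = (pi - 1) / norm.pdf(y_pi) if x < y_pi else pi / norm.pdf(y_pi)
    print(x, quotient, theory)   # about -0.863 for x < y_pi, 2.013 for x > y_pi
```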

The Influence Function for Quantiles Notice that the actual value of $x$ does not appear in the influence function; all that matters is whether $x$ is less than, equal to, or greater than the quantile. Notice also that, unlike the influence function for the mean, the influence function for a quantile is bounded; hence, a quantile is less sensitive than the mean to perturbations of the distribution. Likewise, quantile-based measures of scale and skewness are less sensitive than the moment-based measures to perturbations of the distribution. The $L_J$ and $M_\rho$ functionals, depending on $J$ or $\rho$, can also be very insensitive to perturbations of the distribution.

The mean and variance of the influence function at a random point are of interest; in particular, we may wish to restrict the functional so that
$$E(\phi_{\Upsilon,P}(X)) = 0 \quad \text{and} \quad E\big((\phi_{\Upsilon,P}(X))^2\big) < \infty.$$

Sensitivity of Estimators Based on Statistical Functions If a distributional measure of interest is defined on the CDF as $\Upsilon(P)$, we are interested in the performance of the plug-in estimator $\Upsilon(P_n)$; specifically, we are interested in $\Upsilon(P_n) - \Upsilon(P)$. This turns out to depend crucially on the differentiability of $\Upsilon$. If we assume Gâteaux differentiability, we can write
$$\sqrt{n}\,\big(\Upsilon(P_n) - \Upsilon(P)\big) = \Lambda_P\big(\sqrt{n}(P_n - P)\big) + R_n = \frac{1}{\sqrt{n}} \sum_i \phi_{\Upsilon,P}(Y_i) + R_n,$$
where the remainder $R_n \to 0$.

Convergence of Estimators We are interested in the stochastic convergence. First, we assume $E(\phi_{\Upsilon,P}(X)) = 0$ and $E\big((\phi_{\Upsilon,P}(X))^2\big) < \infty$. Then the question is the stochastic convergence of $R_n$. Gâteaux differentiability does not guarantee that $R_n$ converges fast enough. However, ρ-Hadamard differentiability does imply that $R_n$ is $o_P(1)$, because it implies that norms of functionals (with or without random arguments) go to 0. We can also get that $R_n$ is $o_P(1)$ by assuming $\Upsilon$ is ρ-Fréchet differentiable and that $\sqrt{n}\,\rho(P_n, P)$ is $O_P(1)$.

Convergence of Estimators Assuming either ρ-Hadamard or ρ-Fréchet differentiability, given the moment properties of $\phi_{\Upsilon,P}(X)$ and that $R_n$ is $o_P(1)$, we have by Slutsky's theorem
$$\sqrt{n}\,\big(\Upsilon(P_n) - \Upsilon(P)\big) \xrightarrow{d} N\big(0, \sigma^2_{\Upsilon,P}\big), \quad \text{where } \sigma^2_{\Upsilon,P} = E\big((\phi_{\Upsilon,P}(X))^2\big).$$
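
A Monte Carlo sketch of this result for a quantile (the setup, a standard normal with $\pi = 0.7$, is an assumption for illustration): from the influence function derived above, $\sigma^2_{\Xi_\pi,P} = E\big((\phi_{\Xi_\pi,P}(X))^2\big) = \pi(1-\pi)/p(y_\pi)^2$, so the simulated variance of $\sqrt{n}\,\big(\Xi_\pi(P_n) - y_\pi\big)$ should be close to that value.

```python
# Simulated variance of sqrt(n)(sample pi-quantile - y_pi) versus E(phi^2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
pi, n, reps = 0.7, 2000, 5000
y_pi = norm.ppf(pi)

est = np.array([np.quantile(rng.standard_normal(n), pi) for _ in range(reps)])
print(n * est.var())                        # simulated variance
print(pi * (1 - pi) / norm.pdf(y_pi) ** 2)  # E(phi^2), about 1.74
```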

Asymptotic Variance of Estimators For a given plug-in estimator based on the statistical function $\Upsilon$, knowing $E\big((\phi_{\Upsilon,P}(X))^2\big)$ (and assuming $E(\phi_{\Upsilon,P}(X)) = 0$) provides us an estimator of the asymptotic variance of the estimator.

Robust Estimators The influence function is very important in leading us to estimators that are robust; that is, estimators that are relatively insensitive to departures from the underlying assumptions about the distribution. As mentioned above, the functionals $L_J$ and $M_\rho$, depending on $J$ or $\rho$, can be very insensitive to perturbations of the distribution; therefore estimators based on them, called L-estimators and M-estimators, can be robust. A class of L-estimators that is particularly useful consists of linear combinations of the order statistics. Because of the sufficiency and completeness of the order statistics in many cases of interest, such estimators can be expected to exhibit good statistical properties.
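
As an illustration (an example of my own, not from the notes): the 10% trimmed mean is an L-estimator, a linear combination of order statistics with zero weight in the tails. Under ε-contamination at a distant point it stays near the reference mean, while the sample mean is dragged by roughly $\epsilon x$.

```python
# Sample mean versus 10% trimmed mean (an L-estimator) under
# eps-contamination of N(0,1) data at the point x = 50 (assumed values).
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(1)
n, eps, x = 1000, 0.05, 50.0

clean = rng.standard_normal(n)
contaminated = np.where(rng.random(n) < eps, x, clean)

print(contaminated.mean())           # dragged toward x, roughly eps * x = 2.5
print(trim_mean(contaminated, 0.1))  # trimming discards the mass at x; near 0
```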

Robust Estimators Another class of estimators similar to the L-estimators are those based on ranks, which are simpler than order statistics. These are not sufficient (the data values have been converted to their ranks); nevertheless, they preserve a lot of the information. The fact that they lose some information can actually work in their favor; they can be robust to extreme values of the data. A functional to define even a simple linear combination of ranks is rather complicated. As with the $L_J$ functional, we begin with a function $J$, which in this case we require to be strictly increasing; and, in order to ensure uniqueness, we also require that the CDF $P$ be strictly increasing.

$R_J$ Estimators The $R_J$ functional is defined as the solution to the equation
$$\int J\left(\frac{P(y) + 1 - P(2R_J(P) - y)}{2}\right) \mathrm{d}P(y) = 0. \qquad (7)$$
A functional defined as the solution to this equation is called an $R_J$ functional, and an estimator based on applying it to an ECDF is called an $R_J$ estimator, or just an R-estimator.
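
Here is a rough sketch of equation (7) in action (assumptions of mine: the Wilcoxon-type score $J(u) = u - 1/2$, the ECDF plugged in for $P$, and bisection on the monotone left-hand side). With this $J$, the resulting location R-estimator is known to be close to the Hodges-Lehmann estimator, the median of the pairwise averages $(x_i + x_j)/2$, which the sketch uses as a check.

```python
# R-estimator of location from equation (7) with J(u) = u - 1/2 and the
# ECDF plugged in for P, solved by bisection; compared to Hodges-Lehmann.
import numpy as np

rng = np.random.default_rng(2)
xs = rng.standard_normal(51)
xs_sorted = np.sort(xs)

def ecdf(t):
    return np.searchsorted(xs_sorted, t, side="right") / xs.size

def lhs(theta, J=lambda u: u - 0.5):
    """Left-hand side of equation (7) with P replaced by the ECDF."""
    return J((ecdf(xs) + 1 - ecdf(2 * theta - xs)) / 2).mean()

# lhs is nonincreasing in theta, so locate its zero crossing by bisection.
lo, hi = xs.min(), xs.max()
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if lhs(mid) > 0 else (lo, mid)

pairs = (xs[:, None] + xs[None, :]) / 2.0
print(0.5 * (lo + hi), np.median(pairs))   # R-estimate vs Hodges-Lehmann
```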