Statistical Depth Function

Machine Learning Journal Club, Gatsby Unit June 27, 2016

Outline. L-statistics, order statistics, ranking. Instead of moments: median, dispersion, scale, skewness, ... New visualization tools: depth contours, sunburst plot. Testing: symmetry. Depth function properties.

L-statistics, ranking

L-statistics. Given: samples x_1, ..., x_n in R. Order statistics: x_(1) <= ... <= x_(n). An L-statistic is a linear combination of order statistics. Rank of x_i: its position in the ordering.

L-statistics: examples. Dispersion: (1) range: x_(n) - x_(1); (2) interquartile range: x_(3n/4) - x_(n/4) = Q_3 - Q_1. Location: median: x_(n/2); alpha-trimmed mean = mean of the middle (1 - 2 alpha) fraction of the observations, (1 / ((1 - 2 alpha) n)) sum_{i = alpha n + 1}^{(1 - alpha) n} x_(i).
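The univariate examples above can be sketched in a few lines of NumPy. This is an illustrative helper of my own, not code from the talk; the `alpha` trimming convention (drop floor(alpha*n) points from each end) is one common choice:

```python
import numpy as np

def l_statistics(x, alpha=0.1):
    """Classical univariate L-statistics: range, IQR, median, trimmed mean."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    rng = xs[-1] - xs[0]                      # range: x_(n) - x_(1)
    q1, q3 = np.quantile(xs, [0.25, 0.75])
    iqr = q3 - q1                             # interquartile range Q3 - Q1
    med = np.median(xs)                       # median: middle order statistic
    k = int(np.floor(alpha * n))              # trim k observations per tail
    trimmed = xs[k:n - k].mean()              # alpha-trimmed mean
    return rng, iqr, med, trimmed
```

For example, with the sample [1, 2, 3, 4, 100] and alpha = 0.2, the trimmed mean averages only [2, 3, 4], so the outlier 100 has no influence.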

L-statistics. Typical application: robust estimation. Question: similar notions in R^d? Critical ingredient: the ordering x_(1) <= ... <= x_(n), which has no natural analogue in R^d. Idea: center-outward ordering.

Median

Motivation: median. F: continuous c.d.f. on R. Median: the point splitting the probability mass into halves, 1/2 - 1/2; the solution of F(m) = 1/2. Alternative view: m := argmax_{x in R} D(x), where D(x) := 2 F(x) [1 - F(x)] = P_{X_1, X_2 ~ F}(x in [X_1, X_2]). Moving away from m, D(x) decreases monotonically to 0.
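The alternative view is easy to check empirically with the e.c.d.f. in place of F. A minimal sketch (the function name is mine, not from the talk):

```python
import numpy as np

def d_curve(x, sample):
    """Empirical version of D(x) = 2 F(x) [1 - F(x)], with F the e.c.d.f."""
    F = np.mean(np.asarray(sample, dtype=float) <= x)
    return 2.0 * F * (1.0 - F)
```

Evaluating `d_curve` on a grid and taking the argmax recovers (approximately) the sample median: D peaks at the middle of the sample and decays toward 0 in the tails.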

Let us extend the median to R^d! Simplicial depth [Liu, 1990]: SD(x; F) := P_F(x in simplex(X_1, ..., X_{d+1})), the probability that x falls in the random simplex spanned by d+1 i.i.d. draws from F. Half-space depth [Tukey, 1975]: HD(x; F) := inf_{H: x in H} P_F(X in H), the minimum probability of any closed halfspace H in R^d containing x. Median: m = argmax_{x in R^d} D(x; F).
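A half-space depth sketch for d = 2: scan many unit directions and, for each, count the probability mass in the closed halfspace on either side of x. This is an assumption-laden approximation of my own (exact algorithms only need finitely many data-driven directions); scanning a direction grid yields an upper bound on the true infimum:

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=360):
    """Approximate Tukey half-space depth of x w.r.t. a 2-D sample.
    Scans n_dirs directions on the half-circle; approximation from above."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_dirs, endpoint=False)
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    proj = (data - x) @ dirs.T                 # shape (n, n_dirs)
    # fraction of points in each closed halfspace through x
    left = (proj <= 0).mean(axis=0)
    right = (proj >= 0).mean(axis=0)
    return float(np.minimum(left, right).min())
```

For the four points (±1, 0), (0, ±1), the origin gets depth 1/2 (every halfspace through it holds at least half the sample), while a far-away point gets depth 0.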

Side note: these quantities are also computable. depth R-package: simplicial, half-space, Oja's depth. Ref. (half-space depth): [Dyckerhoff and Mozharovskyi, 2016]. Simplicial depth reduces to membership tests: solve x = sum_i alpha_i x_i with sum_i alpha_i = 1 over the simplex vertices x_i, a linear system; if additionally alpha_i >= 0 for all i, then yes, x lies in the simplex.
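The membership test above is just barycentric coordinates. Here is a minimal sketch, plus a Monte Carlo estimate of the empirical simplicial depth built on top of it (my own illustration, not the exact O(n log n) algorithms implemented in the R packages):

```python
import numpy as np

def in_simplex(x, vertices):
    """Is x in the simplex spanned by d+1 vertices in R^d?
    Solve sum_i alpha_i v_i = x, sum_i alpha_i = 1; accept if alpha >= 0."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(vertices, dtype=float)      # shape (d+1, d)
    A = np.vstack([V.T, np.ones(len(V))])      # (d+1) x (d+1) linear system
    b = np.append(x, 1.0)
    try:
        alpha = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:              # degenerate simplex
        return False
    return bool(np.all(alpha >= -1e-12))       # tolerance for boundary cases

def simplicial_depth(x, data, n_draws=2000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of SD(x; F_n): fraction of random (d+1)-tuples
    of data points whose simplex contains x."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    hits = sum(in_simplex(x, data[rng.choice(n, d + 1, replace=False)])
               for _ in range(n_draws))
    return hits / n_draws
```

The barycentric solve is what makes simplicial depth "computable": each membership query is one (d+1)x(d+1) linear system.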

Example: SD depth-induced contours (panels: normal, exponential). The contours give a center-outward ordering: high depth = central, low depth = outlying. The expansion of the contours reflects scale (dispersion), skewness, and kurtosis.

Location Median/center: deepest point.

Location Median/center: deepest point. Other robust location statistics: idea: assign smaller weights to more outlier data. given: w : [0,1] R 0 nonincreasing weight function L w = n i=1 x [i]w(i/n) n j=1 w(j/n).

Location Median/center: deepest point. Other robust location statistics: idea: assign smaller weights to more outlier data. given: w : [0,1] R 0 nonincreasing weight function L w = n i=1 x [i]w(i/n) n j=1 w(j/n). Specifically: w(t) = I(t 1/n): sample median. w(t) = I(t 1 α): 100α%-trimmed mean.

Location Median/center: deepest point. Other robust location statistics: idea: assign smaller weights to more outlier data. given: w : [0,1] R 0 nonincreasing weight function L w = n i=1 x [i]w(i/n) n j=1 w(j/n). Specifically: w(t) = I(t 1/n): sample median. w(t) = I(t 1 α): 100α%-trimmed mean. No expectation requirement!
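The depth-weighted L-estimator L_w can be sketched directly from its formula. The depth values are taken as given (any depth function would do); the function name is mine:

```python
import numpy as np

def depth_weighted_location(data, depths, w):
    """Depth-based L-estimator L_w: order points center-outward by
    decreasing depth, then average with nonincreasing weights w(i/n)."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(-np.asarray(depths))    # deepest point first
    n = len(data)
    weights = np.array([w((i + 1) / n) for i in range(n)])
    return (weights[:, None] * data[order]).sum(axis=0) / weights.sum()
```

With w(t) = I(t <= 1/n) only the deepest point survives (the depth median); with constant w it reduces to the ordinary sample mean.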

Dispersion, scale. Idea 1: analogously to location, the dispersion matrix S = sum_{i=1}^n (x_(i) - x_(1))(x_(i) - x_(1))^T w(i/n) / sum_{j=1}^n w(j/n), with x_(1) the deepest point. Scale: det(S). Equivariance under affine transformations: y_i = A x_i + b implies S_y = A S_x A^T. No moment constraints!
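The scatter matrix S and its affine equivariance can be checked numerically. A sketch of my own; it assumes the depth ordering is unchanged by the affine map, which holds for affine-invariant depth functions (property 1 below):

```python
import numpy as np

def depth_weighted_scatter(data, depths, w):
    """Depth-weighted scatter S = sum_i w(i/n) d_i d_i^T / sum_j w(j/n),
    where d_i = x_(i) - x_(1) and x_(1) is the deepest point."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(-np.asarray(depths))    # center-outward ordering
    xs = data[order]
    n = len(xs)
    weights = np.array([w((i + 1) / n) for i in range(n)])
    diffs = xs - xs[0]                         # deviations from deepest point
    S = (weights[:, None, None] * diffs[:, :, None] * diffs[:, None, :]).sum(0)
    return S / weights.sum()
```

Applying y_i = A x_i + b with the same depths reproduces S_y = A S_x A^T exactly, which is the equivariance claim on the slide.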

Dispersion: idea 2. Dispersion := the speed at which the data depth decreases. p-th (empirical) central region: C_{n,p} := conv{x_(1), ..., x_(np)}. Scale curve: S_n(p) := vol(C_{n,p}). Faster growth of p -> S_n(p) = larger scale of the distribution.
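For d = 2, vol(C_{n,p}) is a convex hull area, so the scale curve can be sketched with Andrew's monotone chain and the shoelace formula (my own self-contained illustration, avoiding external geometry libraries):

```python
import numpy as np

def _cross(o, a, b):
    """2-D cross product of vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_area(points):
    """Area of the 2-D convex hull (monotone chain + shoelace formula)."""
    pts = sorted(map(tuple, points))
    if len(pts) < 3:
        return 0.0
    def half_hull(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    hull = half_hull(pts)[:-1] + half_hull(pts[::-1])[:-1]
    return 0.5 * abs(sum(hull[i][0] * hull[i - 1][1] - hull[i - 1][0] * hull[i][1]
                         for i in range(len(hull))))

def scale_curve(data, depths, ps):
    """S_n(p) = vol(conv{deepest ceil(np) points}) for each p in ps."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(-np.asarray(depths))    # deepest first
    n = len(data)
    return [convex_hull_area(data[order[:max(3, int(np.ceil(n * p)))]])
            for p in ps]
```

By construction S_n(p) is nondecreasing in p; the faster it grows, the more dispersed the sample.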

Scale curve example

Skewness: departure from spherical symmetry. Spherical symmetry around c: X - c =_d U(X - c) for every orthogonal U. Strategy: take the smallest sphere containing C_{n,p}, determine the fraction of the data within this sphere, and plot it as a function of p (for all p). Under spherical symmetry the curve from (0,0) to (1,1) is linear, i.e. the area of the gap is 0.

Spherically symmetric / skewed: demo. Panels: spherical (area = 0.08), asymmetric (area = 0.2).

Sunburst plot

Sunburst plot = 2-D boxplot. For a given depth function: identify the center; the central 50% of the points (analogue of the IQR); and rays from the non-central points toward the center (analogue of the whiskers). (Panels: normal, exponential.)

Testing

Symmetries X R d is centrally symmetric around c: X c distr = c X.

Symmetries X R d is centrally symmetric around c: X c distr = c X. X c angularly symmetric around c: X c is C-sym. around 0. 2

Symmetries X R d is centrally symmetric around c: X c distr = c X. X c angularly symmetric around c: X c is C-sym. around 0. 2 half-space symmetric around c: P(X H) 1 2, for H c, closed half-space.

Symmetries X R d is centrally symmetric around c: X c distr = c X. X c angularly symmetric around c: X c is C-sym. around 0. 2 half-space symmetric around c: P(X H) 1 2, for H c, closed half-space. Note: C-symmetry A-symmetry H-symmetry. A-symmetry H-symmetry: in the discrete case, continuous:. d = 1: all = standard symmetry.

Testing: A-symmetry idea SD is maximized at the center of symmetry, = 1. 2 d F: absolutely continuous, SD(c ;F) 1, = F: A-sym. 2 d around c.

Testing: A-symmetry idea SD is maximized at the center of symmetry, = 1. 2 d F: absolutely continuous, SD(c ;F) 1, = F: A-sym. 2 d around c. c: hypothesized center. Decision: if 1 2 d SD(c;F n ) is large reject H 0.

Testing: A-symmetry idea SD is maximized at the center of symmetry, = 1. 2 d F: absolutely continuous, SD(c ;F) 1, = F: A-sym. 2 d around c. c: hypothesized center. 1 Decision: if SD(c;F 2 d n ) is large reject H 0. Under the hood: 1 SD(c;F 2 d n ): degenerate U-statistic, n [ 1 SD(c;F 2 d n ) ] -sum of χ 2 -s.
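The test statistic can be computed exactly for small 2-D samples by enumerating all triangles. A brute-force O(n^3) sketch of my own (the asymptotic chi^2 calibration is not implemented here, only the statistic itself):

```python
import numpy as np
from itertools import combinations

def sd_at_point(c, data):
    """Exact empirical simplicial depth SD(c; F_n) in R^2:
    fraction of data triangles containing c (boundary counts as inside)."""
    def contains(tri, p):
        # signed areas of p against each directed triangle edge
        sign = lambda a, b: (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        s = [sign(tri[i], tri[(i + 1) % 3]) for i in range(3)]
        return all(v >= 0 for v in s) or all(v <= 0 for v in s)
    tris = list(combinations(map(tuple, np.asarray(data, dtype=float)), 3))
    return sum(contains(t, c) for t in tris) / len(tris)

def asym_test_statistic(c, data):
    """Statistic 1/2^d - SD(c; F_n); large values speak against
    angular symmetry around the hypothesized center c."""
    d = np.asarray(data).shape[1]
    return 2.0 ** (-d) - sd_at_point(c, data)
```

For a sample symmetric around the origin, the statistic at c = (0, 0) is small (here even negative, since boundary triangles are counted), while a wrongly hypothesized center yields the maximal value 1/2^d.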

Depth function properties

Desirable properties F:=Borel probability measures on R d. D( ; ) : R d F R 0 bounded, 1 affine invariance: D(Ax+b;F AX+b ) = D(x;F X ) A : invertible,b,x.

Desirable properties F:=Borel probability measures on R d. D( ; ) : R d F R 0 bounded, 1 affine invariance: D(Ax+b;F AX+b ) = D(x;F X ) A : invertible,b,x. 2 maximality at center: For F with center c, D(c;F) = sup x R d D(x;F).

Desirable properties F:=Borel probability measures on R d. D( ; ) : R d F R 0 bounded, 1 affine invariance: D(Ax+b;F AX+b ) = D(x;F X ) A : invertible,b,x. 2 maximality at center: For F with center c, D(c;F) = sup x R d D(x;F). 3 monotonicity relative to the deepest point: For F with deepest point c, D(x;F) D(αx+(1 α)c;f), α [0,1].

Desirable properties F:=Borel probability measures on R d. D( ; ) : R d F R 0 bounded, 1 affine invariance: D(Ax+b;F AX+b ) = D(x;F X ) A : invertible,b,x. 2 maximality at center: For F with center c, D(c;F) = sup x R d D(x;F). 3 monotonicity relative to the deepest point: For F with deepest point c, 4 vanishing at infinity: D(x;F) D(αx+(1 α)c;f), α [0,1]. D(x;F) x 0, F.

Categorization of depth functions [Zuo and Serfling, 2000]. Vanishing at infinity: useful for establishing sup_x |D(x; F_n) - D(x; F)| -> 0 as n -> infinity (F-a.s.).

Categorization of depth functions [Zuo and Serfling, 2000]. Properties of HD and SD: half-space (HD): 1-4 (with H-symmetric center); simplicial (SD): for continuous distributions, 1-4 (with A-symmetric center); for discrete distributions, properties 2 and 3 may fail.

Types. D_A(x; F) = E_F[h_A(x; X_1, ..., X_r)] (example: SD); D_B(x; F) = 1 / (1 + E_F[h_B(x; X_1, ..., X_r)]); D_C(x; F) = 1 / (1 + O(x; F)); D_D(x; F) = inf_{C in C, x in C} P_F(X in C) (example: HD). Here h_A(x; x_1, ..., x_r) is bounded and measures the closeness of x to the x_i's; h_B(x; x_1, ..., x_r) is unbounded and measures the distance of x from the x_i's; O(x; F) is an unbounded outlyingness function; C is a class of closed sets. Note: D_A(x; F_n) is a U/V-statistic.

Type B examples Let Σ = cov(f). In simplicial volume depth (D SVD α), D L2: ( ) α vol[ (x,x 1,...,x d )] h(x;x 1,...,x d ) =,α > 0, det(σ) Properties: h(x;x 1 ) = x x 1 Σ 1, z M = z T Mz. D SVD α: 1-4 (C-sym.), D L2: 1-4 (A-sym.).

Type C example. Projection depth: worst-case outlyingness w.r.t. the 1-d median, over all 1-d projections: O(x; F) = sup_{||u||_2 = 1} |u^T x - med(u^T X)| / MAD(u^T X), X ~ F, where MAD(Y) = med(|Y - med(Y)|). Properties: 1-4 (H-sym.).
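Projection depth is easy to approximate by replacing the supremum over all unit directions with a maximum over random directions. A Monte Carlo sketch of my own (a lower bound on the true outlyingness, hence an upper bound on the depth):

```python
import numpy as np

def projection_depth(x, data, n_dirs=500, rng=np.random.default_rng(0)):
    """Approximate projection depth 1/(1 + O(x)), with O(x) the worst-case
    outlyingness |u'x - med(u'X)| / MAD(u'X) over random unit directions u."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    u = rng.normal(size=(n_dirs, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # random unit directions
    proj = data @ u.T                               # shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)     # per-direction MAD scale
    out = np.max(np.abs(x @ u.T - med) / np.where(mad > 0, mad, np.inf))
    return 1.0 / (1.0 + out)
```

Because median and MAD are used instead of mean and standard deviation, a single gross outlier in the data barely changes the depth of central points.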

Summary. Depth functions: simplicial, half-space, ... They define: a center-outward ordering, ranks, L-statistics, symmetry tests; location (median, ...), dispersion, scale, skewness, kurtosis (without moment assumptions); sunburst plot.

Thank you for your attention!

Dyckerhoff, R. and Mozharovskyi, P. (2016). Exact computation of the halfspace depth. Technical report, University of Cologne. http://arxiv.org/abs/1411.6927.

Liu, R. (1990). On a notion of data depth based on random simplices. The Annals of Statistics, 18:405-414.

Tukey, J. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, pages 523-531.

Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 28:461-482.