Multivariate quantiles and conditional depth
M. Hallin (a,b), Z. Lu (c), D. Paindaveine (a), and M. Šiman (d) — (a) Université libre de Bruxelles, Belgium; (b) Princeton University, USA; (c) University of Adelaide, Australia; (d) Institute of Information Theory and Automation of the ASCR, Czech Republic. Paris, February 2013.
Part 1: Depth and multivariate quantiles
Outline
1. Statistical depth: halfspace depth, more depths, possible applications
2. Multivariate quantiles
3. Serfling's D-O-Q(-R) paradigm, Chaudhuri (1996), Kong and Mizera (2012), HPŠ quantiles
Halfspace depth. Let C_n = {y_1, ..., y_n}, y_i ∈ R^d. A statistical depth measures the centrality of any y ∈ R^d w.r.t. C_n:

D(·, C_n) : R^d → [0, 1], y ↦ D(y, C_n).

The larger D(y, C_n), the more central y is w.r.t. C_n. The (Tukey 1975) halfspace depth is

D_H(y, C_n) := min_{u ∈ S^{d−1}} (1/n) #{ y_i ∈ C_n : u'(y_i − y) ≥ 0 },

where S^{d−1} := {u ∈ R^d : ||u|| = 1} is the unit sphere in R^d.
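As a numerical illustration (ours, not part of the talk), the halfspace depth just defined can be approximated by minimizing over a finite set of random directions; the function name and all parameter choices below are our own:

```python
import numpy as np

def halfspace_depth(y, Y, n_dirs=1000, seed=0):
    """Approximate Tukey halfspace depth:
    D_H(y, C_n) = min_u (1/n) #{ y_i : u'(y_i - y) >= 0 },
    with the minimum taken over n_dirs random unit directions u."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dirs, Y.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions in S^{d-1}
    counts = ((Y - y) @ U.T >= 0).mean(axis=0)      # fraction of points "above" y, per direction
    return counts.min()

rng = np.random.default_rng(1)
Y = rng.normal(size=(500, 2))
halfspace_depth(Y.mean(axis=0), Y)        # close to 1/2 at a central point
halfspace_depth(np.array([4.0, 4.0]), Y)  # close to 0 far outside the cloud
```

Since only a sample of directions is scanned, this is an upper bound on the exact depth that tightens as n_dirs grows; exact algorithms exist in low dimension.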
Depth regions:

R_τ(C_n) := { y ∈ R^d : D(y, C_n) ≥ τ }, for τ ≥ 0.

The R_τ's are nested, convex, and (for τ > 0) compact.
From C_n = {y_1, ..., y_n} to an arbitrary probability measure P. Writing P^(n) = (1/n) Σ_{i=1}^n δ_{y_i} for the empirical measure associated with C_n = {y_1, ..., y_n},

D_H(y, C_n) := min_{u ∈ S^{d−1}} (1/n) #{ y_i ∈ C_n : u'(y_i − y) ≥ 0 } = min_{u ∈ S^{d−1}} P^(n)[u'(Y − y) ≥ 0].

Extension to an arbitrary P:

D_H(y, P) = inf_{u ∈ S^{d−1}} P[u'(Y − y) ≥ 0].

This "population halfspace depth" was studied in Rousseeuw and Ruts (1999).
This "population halfspace depth" was studied in Rousseeuw and Ruts (1999):
- Affine-invariance: D(Ay + b, P_{AY+b}) = D(y, P_Y) for any d×d invertible matrix A, any d-vector b, and any distribution P_Y.
- For any P, max_{y ∈ R^d} D_H(y, P) ≥ max( 1/(d+1), max_{y ∈ R^d} P[{y}] ).
- D_H(y_0, P) = (1 + P[{y_0}])/2 if and only if P is angularly symmetric about y_0.
- R_τ(P) = ∩ { closed halfspaces H : P[H] > 1 − τ }...

Halfspace depth characterizes
- empirical measures; see Rousseeuw and Struyf (1999),
- compactly supported absolutely continuous distributions; see Koshevoy (2003),
- distributions with smooth depth contours; see Kong and Zuo (2010).
A general characterization remains an open question.
Some depth regions R_τ(P), with P = uniform on [0, 1]² (depth ≠ density).
Some depth regions R_τ(P), with P having independent Cauchy marginals (depth ≠ density).
The (Liu 1990, AoS) simplicial depth:

D_S(y, C_n) := (n choose d+1)^{-1} Σ_{i_1 < i_2 < ... < i_{d+1}} I[ y ∈ Simpl(y_{i_1}, y_{i_2}, ..., y_{i_{d+1}}) ],

where Simpl(y_1, y_2, ..., y_{d+1}) denotes the simplex with vertices y_1, y_2, ..., y_{d+1}.
The (Vardi and Zhang 2000) spatial depth:

D_Sp(y, C_n) := 1 − O_Sp(y, C_n), with O_Sp(y, C_n) = || (1/n) Σ_{i=1}^n (y − y_i)/||y − y_i|| ||.

The quantity O_Sp(y, C_n) is a measure of outlyingness of y w.r.t. C_n. The deepest point is the y_0 such that O_Sp(y_0, C_n) = 0; it is called the spatial median and generalizes the univariate sample median. Milasevic and Ducharme (1987, AoS) prove uniqueness of the spatial median in dimension d > 1. Note that this depth is not affine-invariant.
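As a sketch of ours (not from the talk), the spatial median can be computed by the classical Weiszfeld iteration, driving the outlyingness O_Sp to zero; the iteration count and the guard tolerance are arbitrary choices:

```python
import numpy as np

def spatial_median(Y, n_iter=200):
    """Weiszfeld-type iteration for the spatial (L1) median:
    the point y0 at which the average of the unit vectors
    (y0 - y_i)/||y0 - y_i|| vanishes."""
    y = Y.mean(axis=0)
    for _ in range(n_iter):
        d = np.linalg.norm(Y - y, axis=1)
        d = np.where(d < 1e-12, 1e-12, d)   # guard against landing exactly on a data point
        w = 1.0 / d
        y = (w[:, None] * Y).sum(axis=0) / w.sum()
    return y

rng = np.random.default_rng(3)
Y = rng.standard_t(df=1, size=(400, 2))     # heavy tails: the mean is useless, the median is not
y0 = spatial_median(Y)
unit = (Y - y0) / np.linalg.norm(Y - y0, axis=1)[:, None]
outlyingness = np.linalg.norm(unit.mean(axis=0))   # O_Sp(y0, C_n), should be close to 0
```

The heavy-tailed t_1 sample illustrates the robustness of this L_1 functional: the iteration stays near the center of the cloud despite extreme observations.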
Spatial depth vs. halfspace depth: some depth regions R_τ(P), with P = uniform on [0, 1]².
The Mahalanobis depth

D_M(y, C_n) := 1/(1 + d²), with d² := (y − ȳ_n)' S_n^{-1} (y − ȳ_n),

where ȳ_n = (1/n) Σ_{i=1}^n y_i and S_n = (1/n) Σ_{i=1}^n (y_i − ȳ_n)(y_i − ȳ_n)'. (More generally, any affine-equivariant location μ̂ and scatter Σ̂ may be used.)
Axiomatic approach (Zuo and Serfling 2000, AoS): a depth function should satisfy the following properties:
(P1) affine-invariance: D(Ay + b, P_{AY+b}) = D(y, P_Y) for any d×d invertible matrix A, any d-vector b, and any distribution P_Y;
(P2) maximality at center: D(θ, P) = sup_{y ∈ R^d} D(y, P) holds for any P symmetric about θ (central, angular, or halfspace symmetry);
(P3) monotonicity relative to the deepest point: for any P having deepest point θ and any u ∈ R^d, r ↦ D(θ + ru, P) is monotone nonincreasing in r ≥ 0;
(P4) vanishing at infinity: for any P, D(y, P) → 0 as ||y|| → ∞.
Again: depth ≠ density.
(i) A quite informative descriptive tool
Bai and He (1999, AoS) showed that, for P = P_Y angularly symmetric about θ,

√n ( argmax D_H(·, P^(n)) − θ ) →_L argmax_{x ∈ R^d} inf_{u ∈ S^{d−1}} { Z(u) − f_{u'Y}(0) u'x }

as n → ∞, where, letting H_{0,u} = {y ∈ R^d : u'y ≥ 0}, Z(u) is a centered Gaussian process on S^{d−1} with covariance function

E[Z(u)Z(v)] = P[H_{0,u} ∩ H_{0,v}] − P[H_{0,u}]P[H_{0,v}] = P[H_{0,u} ∩ H_{0,v}] − 1/4.

Massé (2002) extended this result to the possibly asymmetric case. Still under angular symmetry, the simplicial deepest point was shown to be root-n consistent and asymptotically normal in Arcones, Chen and Giné (1994, AoS); obviously, this made use of U-process theory...
(ii) Symmetry testing. From the results above, it follows that, for any absolutely continuous P,

max_{y ∈ R^d} D_H(y, P) = 1/2 ⟺ P is angularly symmetric about the point achieving depth 1/2.

Hence, a universally consistent test for angular symmetry about y_0 (Rousseeuw and Struyf 2002) may reject the null for large values of T^(n) = 1/2 − D_H(y_0, P^(n)) (quite remarkably, T^(n) is distribution-free under the null). Dutta, Ghosh and Chaudhuri (2011, Bernoulli) independently provide the corresponding test about an unspecified (angular) symmetry center.
(iii) Classification. Consider (X, Y), with X ∈ R^d and Y ∈ {0, 1}. Given the realization (x, y) of which only x is observed, we have to make a "prediction" m(x) for the corresponding y. The rule that minimizes P[Y ≠ m(X)] (probability of misclassification) is Bayes' rule:

m(x) = I[ η(x) > 1/2 ], with η(x) = P[Y = 1 | X = x].

If π_j = P[Y = j] and X | (Y = j) ∼ N_d(μ_j, Σ_j) (j = 0, 1), this boils down to

m(x) = I[ d_{Σ_1}(x, μ_1) < d_{Σ_0}(x, μ_0) + C ], with C = C_{Σ_0,Σ_1,π_0,π_1} (QDA).
The red point is closer to the blue μ than to the orange μ ⇒ classification into blue.
Under ellipticity assumptions (f_j ∈ E(μ_j, Σ_j, g_j)), QDA may be rephrased as

m(x) = I[ d_{Σ_1}(x, μ_1) < d_{Σ_0}(x, μ_0) + C ] = I[ D_M(x, P_1) > D_M(x, P_0) + C̃ ],

since affine-equivariance implies that μ(P_j) = μ_j and Σ(P_j) ∝ Σ_j under ellipticity. In line with this, Ghosh and Chaudhuri (2005) propose

m(x) = I[ D(x, P_1) > D(x, P_0) ],

based on an arbitrary depth concept D. This is the so-called max-depth approach.
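To make the max-depth rule concrete, here is a small sketch of ours (function names hypothetical) using Mahalanobis depth, with the sample mean and covariance playing the roles of μ̂ and Σ̂:

```python
import numpy as np

def mahalanobis_depth(x, X):
    """D_M(x) = 1/(1 + d^2), with d^2 the squared Mahalanobis
    distance of x to the sample mean of X."""
    mu = X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = (x - mu) @ Sinv @ (x - mu)
    return 1.0 / (1.0 + d2)

def max_depth_classify(x, X0, X1):
    # assign x to the population in which it is deeper
    return int(mahalanobis_depth(x, X1) > mahalanobis_depth(x, X0))

rng = np.random.default_rng(4)
X0 = rng.normal(loc=[0, 0], size=(300, 2))   # training sample from population 0
X1 = rng.normal(loc=[3, 0], size=(300, 2))   # training sample from population 1
max_depth_classify(np.array([0.2, 0.1]), X0, X1)   # deeper in population 0
max_depth_classify(np.array([2.9, -0.2]), X0, X1)  # deeper in population 1
```

With Mahalanobis depth this reduces to comparing Mahalanobis distances, i.e., a linear/quadratic rule; any other depth can be plugged into the same comparison.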
The red point is deeper in the blue population than in the orange one ⇒ classification into blue.
Li, Cuesta-Albertos, and Liu (JASA, 2012) refine the max-depth approach. They identify more appropriate separating lines in the DD-plot, reporting the points with coordinates

( D(x_i, P^(n)_1), D(x_i, P^(n)_0) ), i = 1, ..., n

(axes: Pop 1 depth vs. Pop 0 depth).
Further recent approaches:
- Dutta and Ghosh (2012a) consider max-depth classification based on projection depth. Achieving consistency at the elliptical model requires a modification of the procedure that involves estimating densities.
- Hubert and Van der Veeken (2012) adopt a parallel approach, but based on a modified version of projection depth that can deal with skewed data (standard outlyingness is replaced with "adjusted outlyingness").
- Dutta and Ghosh (2012b) assume that the parent distributions are (affine transformations of) L_p-spherically symmetric distributions. They propose a max-depth classifier based on an affine-invariant L_p depth with an adaptive choice of p (≥ 1); implementation again requires estimating densities.
- Dutta, Chaudhuri, and Ghosh (2012) define a depth-based classifier that aggregates several posterior probabilities, each of which is based on spatial depth with a fixed level of localization.
- Paindaveine and Van Bever (2012) define kNN depth-based classifiers.
Unless a desperately parametric Mahalanobis-type depth is used, the procedure fails if x lies outside the supports of P_0 and P_1 (since then D(x, P_0) = D(x, P_1) = 0).
Consider the univariate case d = 1. If P is an absolutely continuous distribution with cdf F, then

D_H(y, P) = min(F(y), 1 − F(y)), so that R_τ(P) = [ F^{-1}(τ), F^{-1}(1 − τ) ], τ ∈ (0, 1/2].

Hence, for d = 1:
- there is a strong connection between depth and quantiles;
- the τ-depth region is associated with a pair of quantiles (orders τ and 1 − τ);
- the most central region is the singleton R_{1/2}(P) = {F^{-1}(1/2)} = {Med(P)};
- the R_τ(P)'s are L_1-, hence robust, functionals.

Does such a connection exist for d > 1?
Note that the end points of R_τ(P) = [ F^{-1}(τ), F^{-1}(1 − τ) ] may be regarded as
- the quantile of order τ in direction u = 1, and
- the quantile of order τ in direction u = −1.

This suggests directional quantiles, indexed by an order τ ∈ (0, 1) and a unit vector u ∈ S^0 = {−1, 1}.
Many multivariate quantiles are indexed by an order p ∈ [0, 1) and a unit vector u ∈ S^{d−1} = {y ∈ R^d : ||y|| = 1}, or equivalently, by p = pu ∈ B_1(0) = {y ∈ R^d : ||y|| < 1}. In particular, a d-dimensional quantile function, in Serfling (2006), is a mapping of the form

Q(·, P) : B_1(0) → R^d, p ↦ Q(p, P),

where p is an "outlyingness" parameter and u indeed represents a direction in some sense (e.g., the direction from M(P) = Med(P) = Q(0, P) to y = Q(p, P)).
Example of a quantile function: the spatial quantile function, defined through

Q(p, P) = argmin_{y ∈ R^d} E_P[ ||Y − y|| + p'(Y − y) ],

which coincides with the solution (in y) of

p = E_P[ (y − Y)/||y − Y|| ];

see Dudley and Koltchinskii (1992), Chaudhuri (1996, JASA), and Koltchinskii (1997, AoS).
For d = 1, Q(pu, P) = F^{-1}(τ), with τ = (pu + 1)/2.
Orthogonal-equivariance holds, but not affine-equivariance; an affine-equivariant version was defined in Serfling (2010).
For p = 0, Q(p, P) is the spatial median, i.e., the deepest point for the spatial depth. More generally,

D_Sp(y, P) = 1 − p(y, P) = 1 − ||Q^{-1}(y, P)||

(a connection between spatial depth and spatial quantiles).
Depth-induced quantile functions (Serfling 2006): let D(·, P) be a depth, with depth regions R_τ(P) and deepest point M(P), say. Denote by R^p(P) the depth region with P-probability p. One can then associate with D(·, P) the quantile function defined by

Q(p, P) = Q(pu, P) = the point of {M(P) + λu : λ > 0} ∩ ∂R^p(P) if p ≠ 0, and M(P) if p = 0.

Parallel to spatial quantiles, these depth-induced quantiles are points in R^d. We argue that (directional) quantiles should not be points, but surfaces, such as hyperplanes... critical values.
For d = 1, the depth region R_τ(P) rewrites

R_τ(P) = [ F^{-1}(τ), F^{-1}(1 − τ) ], τ ∈ (0, 1/2]
       = [ F^{-1}(τ), ∞) ∩ (−∞, F^{-1}(1 − τ) ] =: H^+_{τ,1}(P) ∩ H^+_{τ,−1}(P),

hence is the intersection of two directional τ-upper quantile halfspaces: the quantile (halfspace) at level τ = .25 in direction u = 1, and the quantile (halfspace) at level τ = .25 in direction u = −1.
Let Y be a random d-vector, and let τ = τu, where τ ∈ (0, 1) and u ∈ S^{d−1}. Kong and Mizera (arXiv 2008; Statist. Sinica 2012) define the τ-quantile of Y as the point

q_{KM,τ} := F^{-1}_{u'Y}(τ) u.

Equivalently, one may consider the quantile hyperplane

π_{KM,τ} := { y ∈ R^d : u'(y − q_{KM,τ}) = 0 }.

Note that π_{KM,τ} is orthogonal to u at q_{KM,τ}.
π^(n)_{KM,τu}, with u = (0, 1)', τ = .1 (n = 500)
π^(n)_{KM,τu}, with u = (0, 1)', τ ∈ {.1, .2, .3, .4} (n = 500)
π^(n)_{KM,τu}, with u ∈ {(0, 1)', (1, 0)'}, τ ∈ {.1, .2, .3, .4} (n = 500)
π^(n)_{KM,τu}, with u ∈ {(0, 1)', (1, 0)', (1/√2, 1/√2)'}, τ ∈ {.1, .2, .3, .4} (n = 500)
π^(n)_{KM,τu}, with u ∈ {(0, ±1)', (±1, 0)', (±1/√2, ±1/√2)'}, τ = .1 (n = 500)
π^(n)_{KM,τu}, with 512 u's in S^1, τ = .1 (n = 500)
Defining H^{(n)+}_{KM,τ} := { y ∈ R^d : u'(y − q^{(n)}_{KM,τ}) ≥ 0 }, we have that, for any τ > 0,

∩_{u ∈ S^{d−1}} H^{(n)+}_{KM,τu} = R_τ(P^(n)).

This establishes a connection, in the general multivariate case, between halfspace depth regions and "directional upper quantile halfspaces".
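This identity is easy to probe numerically, since each Kong–Mizera halfspace only requires a univariate quantile of the projected data. The following is our own sketch with a finite grid of directions, so the equivalence holds only approximately:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(500, 2))
tau = 0.1

# A finite grid of directions on the unit circle.
thetas = np.linspace(0, 2 * np.pi, 512, endpoint=False)
U = np.column_stack([np.cos(thetas), np.sin(thetas)])

def halfspace_depth(y, Y, U):
    # D_H(y, C_n) = min over the sampled directions u of (1/n) #{ u'(y_i - y) >= 0 }
    return (((Y - y) @ U.T) >= 0).mean(axis=0).min()

def in_KM_intersection(y, Y, U, tau):
    # y lies in every upper quantile halfspace H+_{KM,tau u} iff
    # u'y >= F^{-1}_{u'Y}(tau) for all sampled u
    q = np.quantile(Y @ U.T, tau, axis=0)   # directional tau-quantiles of the projections
    return bool(np.all(U @ y >= q))

# A central point should satisfy both criteria; a far-away point neither.
y_center = Y.mean(axis=0)
y_far = np.array([5.0, 5.0])
halfspace_depth(y_center, Y, U) >= tau      # in R_tau(P^(n))
in_KM_intersection(y_center, Y, U, tau)     # in the KM intersection
```

With finitely many directions both sides are only outer approximations of the true sets, but they shrink to the same region as the direction grid refines.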
This connection, however, is quite an indirect one:
- the depth regions R_τ(P^(n)) cannot be constructed from these quantile hyperplanes, as this would involve an infinite number of u values;
- it does not allow for importing to depth the analytical and computational tools related to the L_1 nature of quantiles (Bahadur representations and L_1 simplex algorithms).
... the L_1 nature of the quantiles of a random variable Y. Under second-order moment assumptions, it is well known that

E[Y] = argmin_{a ∈ R} E[(Y − a)²].

Replacing L_2 losses with L_1 losses yields

Med[Y] = F^{-1}(1/2) = argmin_{a ∈ R} E[|Y − a|].

More generally, for any τ ∈ (0, 1),

F^{-1}(τ) = argmin_{a ∈ R} E[ρ_τ(Y − a)], with ρ_τ(z) := (1 − τ)|z| if z < 0 and τ|z| if z ≥ 0.
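The check-function characterization can be verified directly; a minimal numpy sketch of ours, minimizing the empirical loss over a grid:

```python
import numpy as np

def check_loss(a, y, tau):
    """Empirical expected check (pinball) loss E[rho_tau(Y - a)]."""
    z = y - a
    return np.mean(np.where(z < 0, (1 - tau) * np.abs(z), tau * np.abs(z)))

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
tau = 0.25

# Minimize over a fine grid: the minimizer approximates F^{-1}(tau).
grid = np.linspace(y.min(), y.max(), 2001)
losses = [check_loss(a, y, tau) for a in grid]
a_hat = grid[int(np.argmin(losses))]

a_hat, np.quantile(y, tau)   # the two values should nearly coincide
```

The grid search is only for transparency; the same minimization is what quantile-regression solvers carry out exactly, by linear programming.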
Often covariates are available, and it makes sense to consider the quantiles of Y conditionally on the covariates in X = (X_1, ..., X_p)' = (1, W')'.
n = 1000 observations from Y = (1 + W) + ε, with W ∼ U([−3, 2]) and ε ∼ N(0, 1).
Koenker and Bassett (1978) define the regression τ-quantile of Y given X as a_τ'X, with

a_τ := argmin_{a ∈ R^p} E[ρ_τ(Y − a'X)].

If X = (1, W')' = 1, then a_τ'X = a_τ reduces to the (unconditional) τ-quantile F^{-1}_Y(τ) of Y.
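The Koenker–Bassett τ-quantile is computable by linear programming (the L_1 simplex connection exploited later in the talk); below is a sketch of ours using the standard LP reformulation, with the scipy solver as one possible implementation choice:

```python
import numpy as np
from scipy.optimize import linprog

def koenker_bassett(X, y, tau):
    """Regression tau-quantile via the standard LP reformulation:
    minimize sum(tau*u_i + (1-tau)*v_i)
    subject to y_i = x_i'a + u_i - v_i, u_i, v_i >= 0,
    which is exactly the minimization of sum rho_tau(y_i - x_i'a)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # columns: a (free), u, v
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(1)
w = rng.uniform(-3, 2, size=500)
y = (1 + w) + rng.normal(size=500)
X = np.column_stack([np.ones(500), w])
a_tau = koenker_bassett(X, y, 0.5)   # median regression: intercept and slope near 1
```

For τ ≠ 1/2 the same call returns the fitted conditional τ-quantile line, which is what the next figures display.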
n = 1000 observations from Y = (1 + W) + ε, with W ∼ U([−3, 2]) and ε ∼ N(0, 1).
n = 1000 observations from Y = (1 + W) + (W + 4)ε, with W ∼ U([−3, 2]) and ε ∼ N(0, 1).
By now, nonlinear or even nonparametric quantile regression has become standard.
Hallin, Paindaveine and Šiman (2010, AoS) propose an alternative concept of directional quantiles. For any τ ∈ (0, 1) and any direction u ∈ S^{d−1}, we define the (τu)-quantile hyperplane π_{τu} (π^(n)_{τu} in the empirical case) as the Koenker and Bassett regression τ-quantile hyperplane once u has been chosen as the (oriented) "vertical axis" in the computation of L_1 deviations.
More specifically, decompose

y = (u'y)u + Γ_u(Γ_u'y), y ∈ R^d,

where Γ_u is such that the columns of (u, Γ_u) form an orthogonal basis of R^d. Then π_{τu} and π^(n)_{τu} are the hyperplanes with equations

u'y − c_{τu}'Γ_u'y − a_{τu} = 0 and u'y − (c^(n)_{τu})'Γ_u'y − a^(n)_{τu} = 0,

minimizing, w.r.t. c ∈ R^{d−1} and a ∈ R,

E[ρ_τ(u'Y − c'Γ_u'Y − a)] and Σ_{i=1}^n ρ_τ(u'Y_i − c'Γ_u'Y_i − a),

respectively.
(τu)-quantile hyperplanes π^(n)_{τu} with τ = .1, .2, .3, for two fixed directions u.
For the π^(n)_{τu}'s, we establish consistency, a Bahadur-type representation, and asymptotic normality.
Let

ξ_{i,τ}(a, c) := ( τ − I[u'Y_i − c'Γ_u'Y_i − a < 0] ) (1, Y_i')'

and

H_τ := ∫_{R^{m−1}} [ 1, z' ; z, zz' ] f((a_τ + c_τ'z)u + Γ_u z) dz,

V_τ := J_u' [ τ(1−τ), τ(1−τ) E[Y'] ; τ(1−τ) E[Y], Var[(τ − I[Y ∈ R^m \ H_τ]) Y] ] J_u,

where J_u := diag(1, Γ_u) and the bracketed arrays are written row by row.

Theorem. Under mild regularity assumptions, we have, as n → ∞,

√n ( a^(n)_τ − a_τ, (c^(n)_τ − c_τ)' )' = n^{-1/2} H_τ^{-1} J_u' Σ_{i=1}^n ξ_{i,τ}(a_τ, c_τ) + o_P(1) →_L N_m(0, H_τ^{-1} V_τ H_τ^{-1}).
We also provide efficient linear programming methods for the construction of the π^(n)_{τu}'s and the corresponding inner regions...
(τu)-quantile hyperplanes π^(n)_{τu} at τ = .1: for a single direction u; for the 8 equispaced directions u ∈ {(0, ±1)', (±1, 0)', (±1/√2, ±1/√2)'}; and for increasing numbers of equispaced directions.
Denoting by H^{(n)+}_τ the halfspace "above" π^(n)_τ, we have that, for any τ > 0,

∩_{u ∈ S^{d−1}} H^{(n)+}_{τu} = R_τ(P^(n)).

This establishes another connection, in the general multivariate case, between halfspace depth regions and "directional upper quantile halfspaces". Unlike the previous connection, however, this intersection is over finitely many u's.
π^(n)_{KM,τu}, with 512 u's in S^1, τ = .1 (n = 500)
π^(n)_{τu}, with 512 u's in S^1, τ = .1 (n = 500)
For n = 15 data points only:
π^(n)_{τu}, with 512 u's in S^1, τ = .1 (n = 15)
π^(n)_{KM,τu}, with 512 u's in S^1, τ = .1 (n = 15)
We showed that this bridge between the quantile and depth worlds makes it possible to import linear programming computability to depth.
We showed that the breaks in the process u ↦ H^{(n)+}_{τu} can be identified efficiently (by means of parametric linear programming). Hence, we provide efficient and exact depth contour computations.
Some halfspace depth contours from n = 449 i.i.d. bivariate observations with independent U([−.5, .5]), N(0, 1), and t_1 marginals.
Some depth regions R_τ(P), for P = uniform on [0, 1]² and for P with independent t_1 marginals.
π^(n)_{τu}, with u ∈ {(±1, 0, 0)', (0, ±1, 0)', (0, 0, ±1)'}, τ = .1, and the resulting halfspace depth contour, from n = 49 i.i.d. observations with independent U([−.5, .5]) marginals.
Some halfspace depth contours from n = 449 i.i.d. trivariate observations with independent U([−.5, .5]), N(0, 1), and t_1 marginals.
This provides a (directional) quantile concept
- that is easy to compute,
- that is affine-equivariant,
- from which (halfspace) depth regions can be computed, and
- that characterizes the distribution of Y.

What about the regression case?
References

Bai, Z.-D., and He, X. (1999). Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist.
Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. J. Amer. Statist. Assoc.
Dudley, R. M., and Koltchinskii, V. I. (1992). The spatial quantiles. Unpublished manuscript.
Dutta, S., Ghosh, A. K., and Chaudhuri, P. (2011). Some intriguing properties of Tukey's halfspace depth. Bernoulli.
Dutta, S., and Ghosh, A. K. (2012a). On robust classification using projection depth. Ann. Inst. Statist. Math.
Dutta, S., and Ghosh, A. K. (2012b). On classification based on Lp-depth with an adaptive choice of p. Submitted.
Ghosh, A. K., and Chaudhuri, P. (2005). On maximum depth and related classifiers. Scand. J. Statist.
Hallin, M., Paindaveine, D., and Šiman, M. (2010). Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth. Ann. Statist. 38, 635-669 (with discussion).
Hallin, M., Lu, Z., Paindaveine, D., and Šiman, M. (2013). Local constant and local bilinear multiple-output quantile regression. Submitted.
Hubert, M., and Van der Veeken, S. (2012). Robust classification for skewed data. Advances in Data Analysis and Classification, to appear.
Koenker, R., and Bassett, G. J. (1978). Regression quantiles. Econometrica.
Koltchinskii, V. (1997). M-estimation, convexity and quantiles. Ann. Statist.
Kong, L., and Mizera, I. (2012). Quantile tomography: using quantiles with multivariate data. Statistica Sinica 22, 1589-1610.
Kong, L., and Zuo, Y. (2010). Smooth depth contours characterize the underlying distribution. J. Multivariate Anal.
Koshevoy, G. A. (2003). Lift-zonoid and multivariate depths. In Developments in Robust Statistics: International Conference on Robust Statistics 2001 (R. Dutter, P. Filzmoser, U. Gather, P. J. Rousseeuw, eds.), Springer-Physica, Heidelberg.
Li, J., Cuesta-Albertos, J. A., and Liu, R. Y. (2012). DD-classifier: nonparametric classification procedures based on DD-plots. J. Amer. Statist. Assoc.
Liu, R. Y. (1990). On a notion of data depth based on random simplices. Ann. Statist.
Liu, R. Y., Parelius, J. M., and Singh, K. (1999). Multivariate analysis by data depth: descriptive statistics, graphics and inference (with discussion). Ann. Statist.
Massé, J.-C. (2002). Asymptotics for the Tukey median. J. Multivariate Anal.
Milasevic, P., and Ducharme, G. R. (1987). Uniqueness of the spatial median. Ann. Statist.
Paindaveine, D., and Šiman, M. (2012). Computing multiple-output regression quantile regions. Computational Statistics and Data Analysis 56.
Paindaveine, D., and Van Bever, G. (2012). Nonparametrically consistent depth-based classifiers. ECARES working paper.
Rousseeuw, P. J., and Ruts, I. (1999). The depth function of a population distribution. Metrika.
Rousseeuw, P. J., and Struyf, A. (2002). A depth test for symmetry. In Goodness-of-Fit Tests and Model Validity (Paris, 2000), Stat. Ind. Technol., Birkhäuser, Boston, MA.
Rousseeuw, P. J., and Struyf, A. (2004). Characterizing angular symmetry and regression symmetry. J. Statist. Plann. Inference.
Serfling, R. (2006). Depth functions in nonparametric multivariate inference. In Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications (R. Y. Liu, R. Serfling, D. L. Souvaine, eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 72, American Mathematical Society.
Serfling, R. (2010). Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardization. J. Nonparametr. Stat. 22.
Struyf, A., and Rousseeuw, P. J. (2005). Halfspace depth and regression depth characterize the empirical distribution. J. Multivariate Anal.
Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B.C., 1974), Vol. 2, Canad. Math. Congress, Quebec.
Vardi, Y., and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proceedings of the National Academy of Sciences USA.
Zuo, Y., and Serfling, R. (2000). General notions of statistical depth function. Ann. Statist.
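As an aside (not part of the original slides): the empirical halfspace depth D_H(y, C_n) = (1/n) min over unit directions u of #{y_i : u'(y_i - y) >= 0}, recalled earlier in the deck, admits a simple Monte Carlo approximation in which the minimum over the unit sphere is replaced by a minimum over random directions. A minimal NumPy sketch, with the function name and direction-sampling scheme chosen here for illustration:

```python
import numpy as np

def halfspace_depth(y, cloud, n_dirs=1000, seed=0):
    """Approximate the Tukey (1975) halfspace depth of y w.r.t. the cloud C_n,
    minimizing over n_dirs random unit directions rather than all of S^{d-1};
    the result is therefore an upper bound on the exact depth."""
    rng = np.random.default_rng(seed)
    n, d = cloud.shape
    u = rng.normal(size=(n_dirs, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # normalized Gaussians: directions on S^{d-1}
    proj = (cloud - y) @ u.T                       # entry (i, j) = u_j'(y_i - y)
    counts = (proj >= 0).sum(axis=0)               # closed-halfspace count for each direction
    return counts.min() / n
```

For instance, with the four vertices of the unit square as data cloud, the center (0.5, 0.5) receives depth 0.5 (every halfspace through it contains two of the four points), while a point far outside the convex hull receives depth 0.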