Geodesic Convexity and Regularized Scatter Estimation

1 Geodesic Convexity and Regularized Scatter Estimation
Lutz Duembgen (Bern), David Tyler (Rutgers)
Klaus Nordhausen (Turku/Vienna), Heike Schuhmacher (Bern)
Markus Pauly (Ulm), Thomas Schweizer (Bern)
Düsseldorf, July 22, 2017

2 I. Geometry of Scatter Matrices
II. Geodesic Convexity and Coercivity
III. M-Functionals of Scatter
IV. Regularization

3 I. Geometry of Scatter Matrices
$\mathbb{R}^{q\times q}_{\mathrm{sym}} := \{ A \in \mathbb{R}^{q\times q} : A = A^\top \}$
$\mathbb{R}^{q\times q}_{\mathrm{sym},+} := \{ A \in \mathbb{R}^{q\times q}_{\mathrm{sym}} : A \text{ positive definite} \}$ (open convex cone in $\mathbb{R}^{q\times q}_{\mathrm{sym}}$)
$\langle A, B \rangle := \mathrm{tr}(AB) = \sum_{i,j} A_{ij} B_{ij}$, $\quad \|A\|_F := \langle A, A \rangle^{1/2}$

4 [Figure: axes x, y, z]

5 $A = \begin{bmatrix} z + x & y \\ y & z - x \end{bmatrix} = \begin{bmatrix} x & y \\ y & -x \end{bmatrix} + z\,I_2$
$\|A\|_F^2 = 2\,(x^2 + y^2 + z^2)$
$A$ positive definite $\iff z > \sqrt{x^2 + y^2}$

6 $\mu \in \mathbb{R}^q$, $\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}$, $\hat\Sigma$ = sample covariance matrix of $X_1, X_2, \ldots, X_n$ i.i.d. $\sim N_q(\mu, \Sigma)$.

7 m = 50 samples of size n = 100:

8 m = 50 samples of size n = 500:

9 Suitable Geometry
Write $\hat\Sigma = \Sigma^{1/2}\, W\, \Sigma^{1/2}$, where $W$ has a universal distribution on symmetric matrices depending only on $(q, n)$, and $W \to_p I_q$ as $n \to \infty$.
Local distance measure at $\Sigma$:
$d_\Sigma(\Sigma, \hat\Sigma) := \|W - I_q\|_F$, $\quad d_\Sigma(\Sigma_0, \Sigma_1) := \bigl\|\Sigma^{-1/2}(\Sigma_0 - \Sigma_1)\Sigma^{-1/2}\bigr\|_F$

10 Global distance measure (geodesic distance):
$D_g(\Sigma_0, \Sigma_1) := \min \int_0^1 d_{\Sigma_t}\bigl(\Sigma_t, \Sigma_{t+dt}\bigr) = \min \int_0^1 \bigl\|\Sigma_t^{-1/2}\,\dot\Sigma_t\,\Sigma_t^{-1/2}\bigr\|_F\, dt,$
the minimum being over all smooth paths $[0,1] \ni t \mapsto \Sigma_t$ connecting $\Sigma_0$ and $\Sigma_1$.

11 Explicit solution:
$A = \log\bigl(\Sigma_0^{-1/2}\,\Sigma_1\,\Sigma_0^{-1/2}\bigr)$, $\quad \Sigma_t = \Sigma_0^{1/2}\exp(tA)\,\Sigma_0^{1/2}$, $\quad D_g(\Sigma_0, \Sigma_1) = \|A\|_F$
Note: $\exp(A) = \sum_{k=0}^\infty A^k/k!$, $\quad \exp\bigl(U\,\mathrm{diag}(\lambda)\,U^\top\bigr) = U\,\mathrm{diag}(e^\lambda)\,U^\top$, $\quad \log\bigl(U\,\mathrm{diag}(\lambda)\,U^\top\bigr) = U\,\mathrm{diag}(\log\lambda)\,U^\top$
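These formulae translate directly into a few lines of numerical linear algebra. A minimal NumPy sketch (the function names are mine, not the authors' code), using the spectral-decomposition identities from the note above:

```python
import numpy as np

def sym_fun(S, fun):
    # Apply a scalar function to a symmetric matrix via its spectral decomposition,
    # i.e. fun(U diag(lam) U') = U diag(fun(lam)) U', as in the note above.
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T

def geodesic_distance(S0, S1):
    # D_g(S0, S1) = || log(S0^{-1/2} S1 S0^{-1/2}) ||_F
    S0_isqrt = sym_fun(S0, lambda l: l ** -0.5)
    A = sym_fun(S0_isqrt @ S1 @ S0_isqrt, np.log)
    return np.linalg.norm(A, "fro")

def geodesic_point(S0, S1, t):
    # Sigma_t = S0^{1/2} exp(t A) S0^{1/2}, the point at "time" t on the geodesic.
    S0_sqrt = sym_fun(S0, np.sqrt)
    S0_isqrt = sym_fun(S0, lambda l: l ** -0.5)
    A = sym_fun(S0_isqrt @ S1 @ S0_isqrt, np.log)
    return S0_sqrt @ sym_fun(A, lambda a: np.exp(t * a)) @ S0_sqrt

# Sanity checks: geodesic_distance(S, S) is 0, geodesic_point(S0, S1, 1.0)
# recovers S1 up to rounding error.
```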

12–13 [Figures]

14 Local and global parametrizations of $\mathbb{R}^{q\times q}_{\mathrm{sym},+}$:
$\Sigma = BB^\top$ with nonsingular $B \in \mathbb{R}^{q\times q}$
$\mathbb{R}^{q\times q}_{\mathrm{sym},+} = \bigl\{ B\exp(A)B^\top : A \in \mathbb{R}^{q\times q}_{\mathrm{sym}} \bigr\}$
$\bigl\{ \Gamma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+} : \det(\Gamma) = \det(\Sigma) \bigr\} = \bigl\{ B\exp(A)B^\top : A \in \mathbb{R}^{q\times q}_{\mathrm{sym}},\ \mathrm{tr}(A) = 0 \bigr\}.$

15 Note that for $q \ge 2$, the map
$(\mathbb{R}^{q\times q}_{\mathrm{sym},+}, D_g) \ni \Sigma \mapsto \log(\Sigma) \in (\mathbb{R}^{q\times q}_{\mathrm{sym}}, \|\cdot\|_F)$
is not an isometry. [Figure: axes x, y]

16 II. Geodesic Convexity and Coercivity
Geodesic Convexity: A function $f : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ is (strictly) geodesically convex if for all nonsingular $B \in \mathbb{R}^{q\times q}$ and nonzero $A \in \mathbb{R}^{q\times q}_{\mathrm{sym}}$,
$f\bigl(B\exp(tA)B^\top\bigr)$ is (strictly) convex in $t \in \mathbb{R}$.
Equivalently: for all nonsingular $B \in \mathbb{R}^{q\times q}$,
$f\bigl(B\,\mathrm{diag}(e^x)B^\top\bigr)$ is (strictly) convex in $x \in \mathbb{R}^q$.
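A quick numerical illustration of this definition (a NumPy sketch, assuming a user-supplied function f; it checks convexity along one geodesic and is of course not a proof):

```python
import numpy as np

def along_geodesic(f, B, A, ts):
    # Evaluate t -> f(B exp(tA) B') on a grid of t values.  By the definition on
    # this slide, f is g-convex iff every such curve is convex in t.
    lam, U = np.linalg.eigh(A)
    return np.array([f(B @ ((U * np.exp(t * lam)) @ U.T) @ B.T) for t in ts])

# Illustration: log of the largest eigenvalue (g-convex, see a later slide):
# rng = np.random.default_rng(0); q = 4
# B = rng.standard_normal((q, q))
# A = rng.standard_normal((q, q)); A = (A + A.T) / 2
# y = along_geodesic(lambda S: np.log(np.linalg.eigvalsh(S)[-1]), B, A, np.linspace(-2, 2, 41))
# np.all(np.diff(y, 2) >= -1e-9)   # nonnegative second differences on the grid
```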

17 Example: The function $f(\Sigma) := \log\det(\Sigma)$ is geodesically linear:
$\log\det\bigl(B\exp(A)B^\top\bigr) = \log\det(BB^\top) + \mathrm{tr}(A)$
(since $\det(\exp(A)) = e^{\mathrm{tr}(A)}$).

18 Verifying g-convexity for smooth functions (V2)
For any nonsingular $B \in \mathbb{R}^{q\times q}$ and $x \in \mathbb{R}^q$,
$f\bigl(B\,\mathrm{diag}(e^x)B^\top\bigr) = f(BB^\top) + g_B^\top x + \tfrac12\,x^\top H_B\,x + o(\|x\|^2)$ as $x \to 0$.
$f$ is g-convex iff $H_B \succeq 0$ for all $B$.
$f$ is strictly g-convex iff $H_B \succ 0$ for all $B$.

19 Example: For nonzero $v \in \mathbb{R}^q$, $f(\Sigma) := \log\bigl(v^\top\Sigma v\bigr)$ is g-convex.
For nonsingular $B \in \mathbb{R}^{q\times q}$ and $w := B^\top v$,
$f\bigl(B\,\mathrm{diag}(e^x)B^\top\bigr) = \log\bigl(w^\top\mathrm{diag}(e^x)w\bigr) = f(BB^\top) + g_B^\top x + \tfrac12\,x^\top H_B\,x + o(\|x\|^2)$
with
$g_B := \bigl(w_i^2/\|w\|^2\bigr)_{i=1}^q$, $\quad H_B := \mathrm{diag}(g_B) - g_B g_B^\top.$
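To see where $g_B$ and $H_B$ come from (a short derivation, not on the slide), write $p_i := w_i^2/\|w\|^2$; then
$$\log\bigl(w^\top\mathrm{diag}(e^x)w\bigr) = \log\|w\|^2 + \log\Bigl(\sum_{i=1}^q p_i\, e^{x_i}\Bigr),$$
a log-sum-exp function of $x$ whose gradient at $x = 0$ is $p = g_B$ and whose Hessian at $x = 0$ is $\mathrm{diag}(p) - pp^\top = H_B$; convexity of log-sum-exp gives $H_B \succeq 0$.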

20 Remarks
- $\Sigma \mapsto f(\Sigma)$ g-convex $\iff$ $\Sigma \mapsto f(\Sigma^{-1})$ g-convex.
- Sums and pointwise suprema of g-convex functions are g-convex.
- Both $\log\lambda_{\max}(\Sigma)$ and $\log\lambda_{\max}(\Sigma^{-1}) = -\log\lambda_{\min}(\Sigma)$ are g-convex.
- $f(\Sigma)$ g-convex and $h : \mathbb{R} \to \mathbb{R}$ convex and increasing $\implies$ $h(f(\Sigma))$ is g-convex.
- A local minimizer of a g-convex function is also a global minimizer.
- The only g-affine functions are $f(\Sigma) = c_1 + c_2\log\det(\Sigma)$ with $c_1, c_2 \in \mathbb{R}$.

21 Geodesic Coercivity
Let $f : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ be g-convex / strictly g-convex. Then
$\arg\min_\Sigma f(\Sigma)$ is compact / a singleton
iff $f$ is g-coercive, i.e.
$f(\Sigma) \to \infty$ as $\|\log(\Sigma)\|_F \to \infty$.
Criterion: If $f$ is differentiable, it is g-coercive iff
$\lim_{t\to\infty} \tfrac{d}{dt} f\bigl(\exp(tA)\bigr) > 0$
for any nonzero $A \in \mathbb{R}^{q\times q}_{\mathrm{sym}}$.

22 III. M-Functionals of Scatter
True/empirical distribution: $P$ on $\mathbb{R}^q$ with center $0 \in \mathbb{R}^q$.
Working model/caricature for $P$:
$f_\Sigma(x) = C\,\det(\Sigma)^{-1/2}\exp\Bigl(-\frac{\rho(x^\top\Sigma^{-1}x)}{2}\Bigr)$
with $\rho(s)$ increasing in $s > 0$ and $s\rho'(s)$ increasing in $s > 0$; in other words, $\rho(e^x)$ is increasing and convex in $x \in \mathbb{R}$.

23 Target function (log-likelihood times $-2/n$):
$L(\Sigma, P) := -2\int \log\bigl[f_\Sigma/f_{I}\bigr]\,dP = \int\bigl[\rho(x^\top\Sigma^{-1}x) - \rho(x^\top x)\bigr]\,P(dx) + \log\det(\Sigma)$
M-functional of scatter: $\Sigma(P) := \arg\min_{\Sigma\in\mathbb{R}^{q\times q}_{\mathrm{sym},+}} L(\Sigma, P)$
M-estimator of scatter: $\hat P$ = empirical distribution of $X_1, X_2, \ldots, X_n$ i.i.d. $\sim P$; $\Sigma(\hat P)$ estimates $\Sigma(P)$.
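For the empirical distribution $\hat P$ the target function is a plain average; a minimal NumPy sketch (the function name and interface are mine, for illustration only):

```python
import numpy as np

def scatter_loss(Sigma, X, rho):
    # Empirical target function
    #   L(Sigma, P_hat) = mean_i [ rho(x_i' Sigma^{-1} x_i) - rho(x_i' x_i) ] + log det(Sigma)
    # X: (n, q) data matrix with center 0; rho: vectorized scalar function.
    Sinv = np.linalg.inv(Sigma)
    q_sigma = np.einsum("ij,jk,ik->i", X, Sinv, X)   # x_i' Sigma^{-1} x_i
    q_ident = np.einsum("ij,ij->i", X, X)            # x_i' x_i
    return np.mean(rho(q_sigma) - rho(q_ident)) + np.linalg.slogdet(Sigma)[1]

# With rho(s) = s the minimizer is the matrix of second moments about 0
# (= Var(P_hat) for centered data), cf. the next slide:
# X = np.random.default_rng(1).standard_normal((200, 3))
# scatter_loss(X.T @ X / len(X), X, lambda s: s)
```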

24 $L(\Sigma, P) = \int\bigl[\rho(x^\top\Sigma^{-1}x) - \rho(x^\top x)\bigr]\,P(dx) + \log\det(\Sigma)$, $\quad \Sigma(P) = \arg\min_{\Sigma\in\mathbb{R}^{q\times q}_{\mathrm{sym},+}} L(\Sigma, P)$
- $\rho(s) = s$: $\Sigma(P) = \mathrm{Var}(P)$
- $s\rho'(s)$ bounded in $s \ge 0$: $\Sigma(\cdot)$ is moderately robust
- $P$ elliptically symmetric with center $0$ and scatter $\Sigma$: $\Sigma(P) = c\,\Sigma$

25 Good news: In general, $L(\cdot, P)$ is geodesically convex. Under mild regularity conditions on $P$ and $\rho$, $L(\cdot, P)$ is geodesically strictly convex and coercive.

26 Taylor expansion:
$L\bigl(B\,\mathrm{diag}(e^x)B^\top, P\bigr) \approx L(BB^\top, P) + g_B^\top x + \tfrac12\,x^\top H_B\,x$
with
$g_B := 1_q - \psi_B$, $\quad \psi_B := \int \rho'(\|x\|^2)\,\bigl(x_i^2\bigr)_{i=1}^q\; P_B(dx)$,
$H_B := \mathrm{diag}(\psi_B) + \int \rho''(\|x\|^2)\,\bigl(x_i^2\,x_j^2\bigr)_{i,j=1}^q\; P_B(dx)$,
$P_B := \mathcal{L}(B^{-1}X)$, $X \sim P$.
Existence, continuity and weak differentiability of $\Sigma(\cdot)$ ...
Fast algorithms for computation of $\Sigma(\hat P)$ via a partial Newton method ...
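Setting the gradient of $L(\cdot, \hat P)$ to zero gives the estimating equation $\Sigma = \int \rho'(x^\top\Sigma^{-1}x)\,xx^\top\,\hat P(dx)$. The sketch below is the classical fixed-point iteration for that equation, not the partial Newton method referred to on the slide; it is included only to make the computation concrete:

```python
import numpy as np

def m_scatter_fixed_point(X, rho_prime, n_iter=500, tol=1e-9):
    # Classical fixed-point iteration
    #   Sigma <- (1/n) sum_i rho'(x_i' Sigma^{-1} x_i) x_i x_i'
    # for the estimating equation of Sigma(P_hat).  This is the textbook scheme,
    # NOT the partial Newton method of the slide, and it converges only under the
    # usual existence/uniqueness conditions.
    n, q = X.shape
    Sigma = X.T @ X / n + 1e-8 * np.eye(q)      # crude, strictly p.d. starting value
    for _ in range(n_iter):
        w = rho_prime(np.einsum("ij,jk,ik->i", X, np.linalg.inv(Sigma), X))
        Sigma_new = (X * w[:, None]).T @ X / n
        if np.linalg.norm(Sigma_new - Sigma, "fro") <= tol * np.linalg.norm(Sigma, "fro"):
            return Sigma_new
        Sigma = Sigma_new
    return Sigma

# For Tyler's functional, rho(s) = q log(s) and rho'(s) = q/s; its solution is only
# defined up to scale, so one usually renormalizes (e.g. to trace q) after each step.
```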

27 Symmetrization
Replace $\Sigma(P)$ with $\Sigma^s(P) := \Sigma(P \ominus P)$, where $P \ominus P := \mathcal{L}(X - X')$ with $X, X'$ i.i.d. $\sim P$.
The estimator uses
$\hat P \ominus \hat P := \binom{n}{2}^{-1}\sum_{1\le i<j\le n}\delta_{X_j - X_i}$
or
$\hat P \ominus \hat P := \frac{1}{nk}\sum_{i=1}^{n}\sum_{j=i+1}^{i+k}\delta_{X_j - X_i}$
with $1 \le k < n$.
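A sketch of how the two symmetrized empirical distributions can be materialized (NumPy; the function names are mine, and the modulo-$n$ wrap-around in the incomplete version is an assumption, the slide only shows the double sum):

```python
import numpy as np

def pairwise_differences(X):
    # All differences X_j - X_i for 1 <= i < j <= n  (complete version, n*(n-1)/2 rows).
    i, j = np.triu_indices(X.shape[0], k=1)
    return X[j] - X[i]

def incomplete_differences(X, k):
    # Differences X_{i+j} - X_i for i = 1..n, j = 1..k  (n*k rows).  The wrap-around
    # of the index i+j modulo n is my assumption.
    return np.vstack([np.roll(X, -j, axis=0) - X for j in range(1, k + 1)])

# The symmetrized M-estimator applies the scatter functional to these differences,
# e.g. Sigma_s_hat = m_scatter_fixed_point(pairwise_differences(X), rho_prime).
```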

28 No need to estimate the center of $P$.
- $P$ elliptically symmetric around $\mu$ with scatter $\Sigma$: $\Sigma^s(P) = c\,\Sigma$
- Block independence property: $P = \mathcal{L}\Bigl(B\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Bigr)$ with independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma^s(P) = B\begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix}B^\top.$

29 IV. Regularization
In high-dimensional settings replace $\Sigma(P)$ with
$\arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}} \bigl(L(\Sigma, P) + \alpha\,\mathrm{Pen}(\Sigma)\bigr), \quad \alpha > 0,$
where $\mathrm{Pen} : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ satisfies
$\mathrm{Pen}(c\Sigma) = \mathrm{Pen}(\Sigma)$ (scale invariance) and
$\mathrm{Pen}(\Sigma) \to \infty$ as $\lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma) \to \infty$.

30 Examples of penalties (with $\lambda_1, \ldots, \lambda_q$ the eigenvalues of $\Sigma$ and $\lambda_{\min}$ the smallest of them):
$\mathrm{Pen}_0(\Sigma) = \log\mathrm{tr}(\Sigma) + \log\mathrm{tr}(\Sigma^{-1}) = \log\Bigl(\sum_{i=1}^q \lambda_i\Bigr) + \log\Bigl(\sum_{i=1}^q \lambda_i^{-1}\Bigr)$
$\mathrm{Pen}_1(\Sigma) = q^{-1}\log\det(\Sigma) + \log\mathrm{tr}(\Sigma^{-1}) = q^{-1}\sum_{i=1}^q \log\lambda_i + \log\Bigl(\sum_{i=1}^q \lambda_i^{-1}\Bigr)$
$\mathrm{Pen}_2(\Sigma) = \log\det(\Sigma) + q\log\lambda_{\max}(\Sigma^{-1}) = \sum_{i=1}^q \log(\lambda_i/\lambda_{\min})$
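All three penalties are functions of the eigenvalues alone; a small NumPy sketch (Pen_2 is coded in the eigenvalue form $\sum_i \log(\lambda_i/\lambda_{\min})$ given above):

```python
import numpy as np

def penalties(Sigma):
    # Pen_0, Pen_1, Pen_2 of the slide, computed from the eigenvalues of Sigma.
    lam = np.linalg.eigvalsh(Sigma)
    pen0 = np.log(lam.sum()) + np.log((1.0 / lam).sum())
    pen1 = np.mean(np.log(lam)) + np.log((1.0 / lam).sum())
    pen2 = np.sum(np.log(lam / lam.min()))
    return pen0, pen1, pen2

# All three are unchanged under Sigma -> c * Sigma and grow without bound as the
# condition number lambda_max / lambda_min does; they are minimal at multiples of I_q.
```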

31 These penalties $\mathrm{Pen}_j(\Sigma)$ are
- scale invariant,
- g-convex,
- g-coercive on $\{\Sigma : \det(\Sigma) = c\}$,
- strictly g-convex on $\{\Sigma : \det(\Sigma) = c\}$ ($\mathrm{Pen}_0$, $\mathrm{Pen}_1$),
with $\arg\min_\Sigma \mathrm{Pen}_j(\Sigma) = \{c\,I_q : c > 0\}$.

32 Example: Regularized version of Tyler's (1987) M-functional:
$f(\Sigma) = L(\Sigma, P) + \alpha\,\mathrm{Pen}(\Sigma)$
with $\rho(s) = q\log s$ and
$\mathrm{Pen}(\Sigma) = \begin{cases} \mathrm{Pen}_1(\Sigma) & \text{(Case 1)} \\ h(\mathrm{Pen}_1(\Sigma)) & \text{(Case 2)} \end{cases}$
On $\{\Sigma : \det(\Sigma) = 1\}$, $f$ is
- strictly g-convex,
- g-coercive in Case 1 if $P(\mathbb{V}) < \bigl(1 + \tfrac{\alpha}{q}\bigr)\tfrac{\dim(\mathbb{V})}{q}$ whenever $1 \le \dim(\mathbb{V}) < q$ (for linear subspaces $\mathbb{V} \subset \mathbb{R}^q$),
- g-coercive in Case 2 if $\lim_{s\to\infty} h(s)/s = \infty$.
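For concreteness, the objective of this example can be written down directly; a self-contained NumPy sketch (evaluation only, with no claim about how the authors minimize it over the slice $\{\det(\Sigma) = 1\}$):

```python
import numpy as np

def regularized_tyler_objective(Sigma, X, alpha, h=lambda s: s):
    # f(Sigma) = L(Sigma, P_hat) + alpha * h(Pen_1(Sigma)) with rho(s) = q log s.
    # h(s) = s is Case 1, a superlinear h gives Case 2.
    n, q = X.shape
    Sinv = np.linalg.inv(Sigma)
    qf = np.einsum("ij,jk,ik->i", X, Sinv, X)
    qf0 = np.einsum("ij,ij->i", X, X)
    loss = np.mean(q * (np.log(qf) - np.log(qf0))) + np.linalg.slogdet(Sigma)[1]
    lam = np.linalg.eigvalsh(Sigma)
    pen1 = np.mean(np.log(lam)) + np.log((1.0 / lam).sum())
    return loss + alpha * h(pen1)
```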

33 Numerical experiment
For $q = 50$ and $n = 30$ consider $X_1, X_2, \ldots, X_n$ i.i.d. $\sim \mathrm{Elliptic}_q(0, \Sigma^*)$ with
$\Sigma^* = \mathrm{diag}(10, 5, 3, 2, 1, \ldots, 1)^2.$
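One way to generate such data (the slide only says $\mathrm{Elliptic}_q(0, \Sigma^*)$; the multivariate t below is just one possible choice of elliptical distribution, and the function is mine):

```python
import numpy as np

def simulate_elliptical_t(n, Sigma, df=3.0, seed=None):
    # Elliptical sample with center 0 and scatter proportional to Sigma, here a
    # multivariate t with df degrees of freedom.
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, Sigma.shape[0])) @ np.linalg.cholesky(Sigma).T
    return Z / np.sqrt(rng.chisquare(df, size=n) / df)[:, None]

# Setup of the experiment: q = 50, n = 30,
# Sigma_star = np.diag([10, 5, 3, 2] + [1] * 46) ** 2
# X = simulate_elliptical_t(30, Sigma_star, seed=0)
```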

34 Compute
$\hat\Sigma_\alpha := \arg\min_\Sigma \bigl(L(\Sigma, \hat P) + \alpha\,h(\mathrm{Pen}_1(\Sigma))\bigr)$
and $\hat\Sigma := \hat\Sigma_{\hat\alpha}$ with $\hat\alpha := \arg\min_{\alpha \in 2^{\mathbb{Z}}} \mathrm{CV}(\alpha)$, where
$\mathrm{CV}(\alpha) := \sum_{i=1}^n \bigl\{\rho\bigl(X_i^\top \hat\Sigma_{\alpha,-i}^{-1} X_i\bigr) + \log\det\bigl(\hat\Sigma_{\alpha,-i}\bigr)\bigr\}$
and $\hat\Sigma_{\alpha,-i}$ is computed from the sample without $X_i$ (leave-one-out).
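A sketch of the cross-validation search over $\alpha \in 2^{\mathbb{Z}}$; `fit_regularized_scatter` is a hypothetical placeholder for the regularized estimator, not a function from the talk or from the fastM package:

```python
import numpy as np

def cv_score(X, alpha, fit_regularized_scatter, rho):
    # CV(alpha) = sum_i { rho(x_i' S_{alpha,-i}^{-1} x_i) + log det S_{alpha,-i} },
    # where S_{alpha,-i} is fitted without observation i (leave-one-out).
    total = 0.0
    for i in range(X.shape[0]):
        S = fit_regularized_scatter(np.delete(X, i, axis=0), alpha)
        total += rho(X[i] @ np.linalg.inv(S) @ X[i]) + np.linalg.slogdet(S)[1]
    return total

def select_alpha(X, fit_regularized_scatter, rho, k_grid=range(-10, 11)):
    # alpha_hat = argmin over the grid alpha = 2^k of CV(alpha), as in the experiment.
    scores = {k: cv_score(X, 2.0 ** k, fit_regularized_scatter, rho) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return 2.0 ** best_k, scores
```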

35 $\log\lambda(\Sigma^*)$ and $\log\lambda(\hat\Sigma)$ $\;(\hat\alpha = 2^{7})$

36 Cross-validation: $\mathrm{CV}(2^k)$ versus $k$

37 First eigenvectors: $\hat u_1^\top u_1$ versus $k$

38 Eigenvalues: $\log\lambda(\hat\Sigma) - \log\lambda(\Sigma^*)$ versus $k$

39 Shape matrices: $D_g(\hat\Sigma, \Sigma^*)$ versus $k$

40 Symmetrization and orthogonally invariant penalties
$f(\Sigma) = L(\Sigma, P \ominus P) + \alpha\,\mathrm{Pen}(\Sigma)$, $\quad \mathrm{Pen}(U^\top\Sigma U) = \mathrm{Pen}(\Sigma)$ for orthogonal $U \in \mathbb{R}^{q\times q}$
Restricted block independence property: $P = \mathcal{L}\Bigl(U\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Bigr)$ with $U \in \mathbb{R}^{q\times q}$ orthogonal and independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma^s(P) = U\begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix}U^\top.$

41 Open questions and ongoing work
- Symmetrized M-estimators: balanced incomplete versus complete U-statistics
- Asymptotics for regularized scatter estimators
- Algorithms for non-smooth g-convex penalties
- Using regularized scatter estimators in other contexts (classification, ICS, ICA, multivariate regression, ...)
- ...

42 References
- Auderset, Mazza & Ruh: Angular Gaussian and Cauchy estimation. (JMVA 2005)
- Bhatia: Positive Definite Matrices. (Princeton University Press 2007)
- Wiesel: Geodesic convexity and covariance estimation. (IEEE Trans. Signal Process. 2012)
- D., Pauly & Schweizer: M-functionals of multivariate scatter. (Statistics Surveys 2015)
- D., Nordhausen & Schuhmacher: New algorithms for M-estimation of multivariate scatter and location. (JMVA 2016); R package fastM. (CRAN 2014/2015)
- D. & Tyler: Geodesic convexity and regularized scatter estimators. (arXiv)
