Geodesic Convexity and Regularized Scatter Estimation


Geodesic Convexity and Regularized Scatter Estimation
Lutz Duembgen (Bern), David Tyler (Rutgers)
Klaus Nordhausen (Turku/Vienna), Heike Schuhmacher (Bern), Markus Pauly (Ulm), Thomas Schweizer (Bern)
Düsseldorf, July 22, 2017

I. Geometry of Scatter Matrices
II. Geodesic Convexity and Coercivity
III. M-Functionals of Scatter
IV. Regularization

I. Geometry of Scatter Matrices

$\mathbb{R}^{q\times q}_{\mathrm{sym}} := \{A \in \mathbb{R}^{q\times q} : A = A^\top\}$
$\mathbb{R}^{q\times q}_{\mathrm{sym},+} := \{A \in \mathbb{R}^{q\times q}_{\mathrm{sym}} : A \text{ positive definite}\}$ (an open convex cone in $\mathbb{R}^{q\times q}_{\mathrm{sym}}$)
$\langle A, B \rangle := \mathrm{tr}(AB) = \sum_{i,j} A_{ij} B_{ij}$, $\quad \|A\|_F := \sqrt{\langle A, A \rangle}$

[Figure: the cone $\{z > \sqrt{x^2 + y^2}\}$ in coordinates $(x, y, z)$]

For $q = 2$, write
$A = \begin{bmatrix} z + x & y \\ y & z - x \end{bmatrix} = \begin{bmatrix} x & y \\ y & -x \end{bmatrix} + z I_2$.
Then $\|A\|_F^2 = 2(x^2 + y^2 + z^2)$, and $A$ is positive definite iff $z > \sqrt{x^2 + y^2}$.

$\mu \in \mathbb{R}^q$, $\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}$, and $\hat\Sigma$ = sample covariance matrix of $X_1, X_2, \dots, X_n$ i.i.d. $\sim N_q(\mu, \Sigma)$.

[Figure: m = 50 samples of size n = 100]

[Figure: m = 50 samples of size n = 500]

Suitable Geometry
Write $\hat\Sigma = \Sigma^{1/2} W \Sigma^{1/2}$; then $W$ is symmetric, has a universal distribution depending only on $(q, n)$, and $W \to_p I_q$ as $n \to \infty$.
Local distance measure at $\Sigma$:
$d_\Sigma(\Sigma, \hat\Sigma) := \|W - I_q\|_F$, i.e. $d_\Sigma(\Sigma_0, \Sigma_1) := \|\Sigma^{-1/2}(\Sigma_0 - \Sigma_1)\Sigma^{-1/2}\|_F$.

Global distance measure (geodesic distance)
$D_g(\Sigma_0, \Sigma_1) := \min \int_0^1 d_{\Sigma_t}(\Sigma_t, \Sigma_{t+dt}) = \min \int_0^1 \bigl\| \Sigma_t^{-1/2} \, \dot\Sigma_t \, \Sigma_t^{-1/2} \bigr\|_F \, dt$,
the minimum being taken over all smooth paths $[0,1] \ni t \mapsto \Sigma_t$ connecting $\Sigma_0$ and $\Sigma_1$.

Explicit solution
$A = \log\bigl(\Sigma_0^{-1/2} \Sigma_1 \Sigma_0^{-1/2}\bigr), \quad \Sigma_t = \Sigma_0^{1/2} \exp(tA)\, \Sigma_0^{1/2}, \quad D_g(\Sigma_0, \Sigma_1) = \|A\|_F.$
Note: $\exp(A) = \sum_{k=0}^\infty A^k/k!$, $\ \exp\bigl(U \mathrm{diag}(\lambda) U^\top\bigr) = U \mathrm{diag}(e^\lambda) U^\top$, $\ \log\bigl(U \mathrm{diag}(\lambda) U^\top\bigr) = U \mathrm{diag}(\log \lambda) U^\top$.
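These formulas are easy to evaluate numerically. The following is a minimal sketch (not part of the talk; numpy only, function names are mine) that computes the geodesic point $\Sigma_t$ and the distance $D_g$ via the spectral formulas in the note above, and checks two standard properties of this metric: the geodesic midpoint is equidistant from both endpoints, and $D_g$ is invariant under congruences $\Sigma \mapsto B\Sigma B^\top$.

```python
import numpy as np

def spd_power(S, p):
    """S^p for a symmetric positive definite S, via the spectral decomposition."""
    lam, U = np.linalg.eigh(S)
    return U @ np.diag(lam ** p) @ U.T

def spd_log(S):
    lam, U = np.linalg.eigh(S)
    return U @ np.diag(np.log(lam)) @ U.T

def spd_exp(A):
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.exp(lam)) @ U.T

def geodesic_point(Sigma0, Sigma1, t):
    """Sigma_t = Sigma0^{1/2} exp(t A) Sigma0^{1/2} with A = log(Sigma0^{-1/2} Sigma1 Sigma0^{-1/2})."""
    S0h, S0hi = spd_power(Sigma0, 0.5), spd_power(Sigma0, -0.5)
    A = spd_log(S0hi @ Sigma1 @ S0hi)
    return S0h @ spd_exp(t * A) @ S0h

def geodesic_distance(Sigma0, Sigma1):
    """D_g(Sigma0, Sigma1) = ||log(Sigma0^{-1/2} Sigma1 Sigma0^{-1/2})||_F."""
    S0hi = spd_power(Sigma0, -0.5)
    return np.linalg.norm(spd_log(S0hi @ Sigma1 @ S0hi), 'fro')

rng = np.random.default_rng(0)
q = 4
M = rng.standard_normal((q, q)); Sigma0 = M @ M.T + np.eye(q)
M = rng.standard_normal((q, q)); Sigma1 = M @ M.T + np.eye(q)

# Midpoint property: the geodesic midpoint is equidistant from both endpoints.
mid = geodesic_point(Sigma0, Sigma1, 0.5)
print(geodesic_distance(Sigma0, Sigma1))
print(2 * geodesic_distance(Sigma0, mid), 2 * geodesic_distance(mid, Sigma1))  # same value

# Congruence invariance: D_g(B Sigma0 B', B Sigma1 B') = D_g(Sigma0, Sigma1).
B = rng.standard_normal((q, q))
print(geodesic_distance(B @ Sigma0 @ B.T, B @ Sigma1 @ B.T))
```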


Local-global parametrizations of $\mathbb{R}^{q\times q}_{\mathrm{sym},+}$: with $\Sigma = BB^\top$ for a nonsingular $B \in \mathbb{R}^{q\times q}$,
$\mathbb{R}^{q\times q}_{\mathrm{sym},+} = \bigl\{ B \exp(A) B^\top : A \in \mathbb{R}^{q\times q}_{\mathrm{sym}} \bigr\}$,
$\bigl\{ \Gamma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+} : \det(\Gamma) = \det(\Sigma) \bigr\} = \bigl\{ B \exp(A) B^\top : A \in \mathbb{R}^{q\times q}_{\mathrm{sym}},\ \mathrm{tr}(A) = 0 \bigr\}$.

Note that for $q \ge 2$, the map $(\mathbb{R}^{q\times q}_{\mathrm{sym},+}, D_g) \ni \Sigma \mapsto \log(\Sigma) \in (\mathbb{R}^{q\times q}_{\mathrm{sym}}, \|\cdot\|_F)$ is not an isometry.

II. Geodesic Convexity and Coercivity

Geodesic Convexity: A function $f : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ is (strictly) geodesically convex if for every nonsingular $B \in \mathbb{R}^{q\times q}$ and nonzero $A \in \mathbb{R}^{q\times q}_{\mathrm{sym}}$, $f\bigl(B \exp(tA) B^\top\bigr)$ is (strictly) convex in $t \in \mathbb{R}$.
Equivalently: for every nonsingular $B \in \mathbb{R}^{q\times q}$, $f\bigl(B \,\mathrm{diag}(e^x)\, B^\top\bigr)$ is (strictly) convex in $x \in \mathbb{R}^q$.

Example: The function $f(\Sigma) := \log\det(\Sigma)$ is geodesically linear:
$\log\det\bigl(B \exp(A) B^\top\bigr) = \log\det(BB^\top) + \mathrm{tr}(A)$.

Verifying g-convexity for smooth functions (via the second characterization): for any nonsingular $B \in \mathbb{R}^{q\times q}$ and $x \in \mathbb{R}^q$,
$f\bigl(B \,\mathrm{diag}(e^x)\, B^\top\bigr) = f(BB^\top) + g_B^\top x + \tfrac{1}{2} x^\top H_B x + o(\|x\|^2)$ as $x \to 0$.
Then $f$ is g-convex iff $H_B \succeq 0$ for all $B$, and $f$ is strictly g-convex iff $H_B \succ 0$ for all $B$.

Example: For nonzero $v \in \mathbb{R}^q$, $f(\Sigma) := \log v^\top \Sigma v$ is g-convex. Indeed, for nonsingular $B \in \mathbb{R}^{q\times q}$ and $w := B^\top v$,
$f\bigl(B \,\mathrm{diag}(e^x)\, B^\top\bigr) = \log\bigl(w^\top \mathrm{diag}(e^x)\, w\bigr) = f(BB^\top) + g_B^\top x + \tfrac{1}{2} x^\top H_B x + o(\|x\|^2)$
with $g_B := \bigl(w_i^2/\|w\|^2\bigr)_{i=1}^q$ and $H_B := \mathrm{diag}(g_B) - g_B g_B^\top \succeq 0$.
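As a quick numerical sanity check (mine, not from the slides), one can trace $f\bigl(B \exp(tA) B^\top\bigr)$ for $f(\Sigma) = \log v^\top \Sigma v$ along a random geodesic direction and confirm that its second differences are nonnegative:

```python
import numpy as np

def spd_exp(A):
    """exp of a symmetric matrix via its spectral decomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.exp(lam)) @ U.T

rng = np.random.default_rng(1)
q = 5
v = rng.standard_normal(q)
B = rng.standard_normal((q, q))                       # nonsingular with probability 1
A = rng.standard_normal((q, q)); A = (A + A.T) / 2    # symmetric geodesic direction

def f_along(t):
    Sigma_t = B @ spd_exp(t * A) @ B.T
    return np.log(v @ Sigma_t @ v)

ts = np.linspace(-2.0, 2.0, 401)
vals = np.array([f_along(t) for t in ts])
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]
print(second_diff.min() >= -1e-10)                    # True: the profile is convex in t
```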

Remarks
- $\Sigma \mapsto f(\Sigma)$ is g-convex iff $\Sigma \mapsto f(\Sigma^{-1})$ is g-convex.
- Sums and pointwise suprema of g-convex functions are g-convex.
- Both $\log \lambda_{\max}(\Sigma)$ and $\log \lambda_{\max}(\Sigma^{-1}) = -\log \lambda_{\min}(\Sigma)$ are g-convex.
- If $f(\Sigma)$ is g-convex and $h : \mathbb{R} \to \mathbb{R}$ is convex and increasing, then $h(f(\Sigma))$ is g-convex.
- A local minimizer of a g-convex function is also a global minimizer.
- The only g-affine functions are $f(\Sigma) = c_1 + c_2 \log\det(\Sigma)$ with $c_1, c_2 \in \mathbb{R}$.

Geodesic Coercivity: Call $f$ g-coercive if $f(\Sigma) \to \infty$ as $\|\log(\Sigma)\|_F \to \infty$. Let $f : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ be g-convex / strictly g-convex. Then $\arg\min_\Sigma f(\Sigma)$ is nonempty and compact / a singleton iff $f$ is g-coercive.
Criterion: If $f$ is differentiable, it is g-coercive iff
$\lim_{t \to \infty} \frac{d}{dt} f\bigl(\exp(tA)\bigr) > 0$
for every nonzero $A \in \mathbb{R}^{q\times q}_{\mathrm{sym}}$.

III. M-Functionals of Scatter

True/empirical distribution: $P$ on $\mathbb{R}^q$ with center $0 \in \mathbb{R}^q$.
Working model/caricature for $P$:
$f_\Sigma(x) = C \det(\Sigma)^{-1/2} \exp\Bigl( -\frac{\rho(x^\top \Sigma^{-1} x)}{2} \Bigr)$
with $\rho(s)$ increasing in $s > 0$ and $s\rho'(s)$ increasing in $s > 0$; in other words, $\rho(e^x)$ is increasing and convex in $x \in \mathbb{R}$.

Target function (log-likelihood times $-2/n$):
$L(\Sigma, P) := -2 \int \log\bigl[f_\Sigma / f_I\bigr] \, dP = \int \bigl[\rho(x^\top \Sigma^{-1} x) - \rho(x^\top x)\bigr] \, P(dx) + \log\det(\Sigma)$
M-Functional of scatter: $\Sigma(P) := \arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}} L(\Sigma, P)$
M-estimator of scatter: with $\hat P$ the empirical distribution of $X_1, X_2, \dots, X_n$ i.i.d. $\sim P$, the estimator $\Sigma(\hat P)$ estimates $\Sigma(P)$.
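For the empirical distribution $\hat P$, the target function is a finite sum and is straightforward to evaluate. The sketch below (assumptions mine) uses the multivariate-t type function $\rho(s) = (\nu + q)\log(\nu + s)$, which satisfies the monotonicity conditions of the previous slide:

```python
import numpy as np

def L_empirical(Sigma, X, rho):
    """L(Sigma, P_hat) = mean of [rho(x' Sigma^{-1} x) - rho(x' x)] + log det(Sigma)."""
    Sigma_inv = np.linalg.inv(Sigma)
    d_Sigma = np.einsum('ij,jk,ik->i', X, Sigma_inv, X)   # x_i' Sigma^{-1} x_i
    d_I = np.einsum('ij,ij->i', X, X)                     # x_i' x_i
    _, logdet = np.linalg.slogdet(Sigma)
    return np.mean(rho(d_Sigma) - rho(d_I)) + logdet

nu, q = 3.0, 4
rho_t = lambda s: (nu + q) * np.log(nu + s)   # multivariate-t type rho

rng = np.random.default_rng(2)
X = rng.standard_normal((200, q))
print(L_empirical(np.eye(q), X, rho_t))        # exactly 0 by construction
print(L_empirical(2.0 * np.eye(q), X, rho_t))  # some other value; Sigma(P_hat) minimizes this over Sigma
```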

Recall
$L(\Sigma, P) = \int \bigl[\rho(x^\top \Sigma^{-1} x) - \rho(x^\top x)\bigr] \, P(dx) + \log\det(\Sigma), \qquad \Sigma(P) = \arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}} L(\Sigma, P).$
- $\rho(s) = s$: $\Sigma(P) = \mathrm{Var}(P)$.
- $s\rho'(s)$ bounded in $s \ge 0$: $\Sigma(\cdot)$ is moderately robust.
- $P$ elliptically symmetric with center $0$ and scatter $\Sigma$: $\Sigma(P) = c\,\Sigma$.

Good news: In general, $L(\cdot, P)$ is geodesically convex. Under mild regularity conditions on $P$ and $\rho$, $L(\cdot, P)$ is geodesically strictly convex and coercive.

Taylor expansion:
$L\bigl(B \,\mathrm{diag}(e^x)\, B^\top, P\bigr) = L(BB^\top, P) + g_B^\top x + \tfrac{1}{2} x^\top H_B x + o(\|x\|^2)$
with
$g_B := 1_q - \psi_B$, where $\psi_B := \int \rho'(\|x\|^2)\,(x_i^2)_{i=1}^q \, P_B(dx)$,
$H_B := \mathrm{diag}(\psi_B) + \int \rho''(\|x\|^2)\, \tilde{x}\tilde{x}^\top \, P_B(dx)$, where $\tilde{x} := (x_i^2)_{i=1}^q$,
$P_B := \mathcal{L}(B^{-1} X)$, $X \sim P$.
Consequences: existence, continuity and weak differentiability of $\Sigma(\cdot)$; fast algorithms for the computation of $\Sigma(\hat P)$ via a partial Newton method.
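The talk's partial Newton algorithm is not reproduced here. As a simpler, classical alternative (the standard reweighting/fixed-point scheme from the M-estimation literature, not the method above), $\Sigma(\hat P)$ can be approximated by iterating the estimating equation $\Sigma = n^{-1}\sum_i \rho'(x_i^\top \Sigma^{-1} x_i)\, x_i x_i^\top$; a sketch:

```python
import numpy as np

def scatter_m_estimator(X, rho_prime, n_iter=200, tol=1e-9):
    """Fixed-point iteration Sigma <- (1/n) sum_i rho'(x_i' Sigma^{-1} x_i) x_i x_i',
    for data assumed centered at 0."""
    n, q = X.shape
    Sigma = np.cov(X, rowvar=False)                          # starting value
    for _ in range(n_iter):
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        Sigma_new = (X * rho_prime(d)[:, None]).T @ X / n
        if np.linalg.norm(Sigma_new - Sigma, 'fro') <= tol * np.linalg.norm(Sigma, 'fro'):
            return Sigma_new
        Sigma = Sigma_new
    return Sigma

# Multivariate-t weights: rho(s) = (nu + q) log(nu + s), so rho'(s) = (nu + q) / (nu + s).
nu, q = 3.0, 4
rho_prime_t = lambda s: (nu + q) / (nu + s)

rng = np.random.default_rng(3)
X = rng.standard_normal((500, q)) @ np.diag([3.0, 2.0, 1.0, 1.0])
print(np.round(scatter_m_estimator(X, rho_prime_t), 2))      # roughly proportional to diag(9, 4, 1, 1)
```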

Symmetrization: Replace $\Sigma(P)$ with $\Sigma_s(P) := \Sigma(P \ominus P)$, where $P \ominus P := \mathcal{L}(X - X')$ with $X, X'$ i.i.d. $\sim P$.
The estimator uses
$\widehat{P \ominus P} := \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \delta_{X_j - X_i}$
or, with $1 \le k \le n$,
$\widehat{P \ominus P} := \frac{1}{nk} \sum_{i=1}^{n} \sum_{j=i+1}^{i+k} \delta_{X_j - X_i}$.
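A small sketch (mine) of how the symmetrized data sets behind these two empirical measures can be formed; for the incomplete version I assume the indices $j = i+1, \dots, i+k$ are taken modulo $n$:

```python
import numpy as np

def pairwise_differences_complete(X):
    """All n(n-1)/2 differences X_j - X_i with i < j."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return X[j] - X[i]

def pairwise_differences_incomplete(X, k):
    """Only k 'forward neighbour' differences per observation (indices modulo n)."""
    diffs = [np.roll(X, -s, axis=0) - X for s in range(1, k + 1)]   # X_{i+s} - X_i
    return np.vstack(diffs)

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3))
print(pairwise_differences_complete(X).shape)       # (4950, 3)
print(pairwise_differences_incomplete(X, 5).shape)  # (500, 3)
```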

No need to estimate the center of $P$.
$P$ elliptically symmetric around $\mu$ with scatter $\Sigma$: $\Sigma_s(P) = c\,\Sigma$.
Block independence property: $P = \mathcal{L}\Bigl(B \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Bigr)$ with independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma_s(P) = B \begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix} B^\top.$

IV. Regularization

In high-dimensional settings replace $\Sigma(P)$ with
$\arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}} \bigl( L(\Sigma, P) + \alpha \,\mathrm{Pen}(\Sigma) \bigr), \qquad \alpha > 0,$
where $\mathrm{Pen} : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ satisfies
- $\mathrm{Pen}(c\Sigma) = \mathrm{Pen}(\Sigma)$ (scale invariance),
- $\mathrm{Pen}(\Sigma) \to \infty$ as $\lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma) \to \infty$.

Examples of penalties (with eigenvalues $\lambda_1, \dots, \lambda_q$ of $\Sigma$):
$\mathrm{Pen}_0(\Sigma) = \log \mathrm{tr}(\Sigma) + \log \mathrm{tr}(\Sigma^{-1}) = \log\Bigl(\sum_{i=1}^q \lambda_i\Bigr) + \log\Bigl(\sum_{i=1}^q \lambda_i^{-1}\Bigr)$
$\mathrm{Pen}_1(\Sigma) = q^{-1} \log\det(\Sigma) + \log \mathrm{tr}(\Sigma^{-1}) = q^{-1} \sum_{i=1}^q \log \lambda_i + \log\Bigl(\sum_{i=1}^q \lambda_i^{-1}\Bigr)$
$\mathrm{Pen}_2(\Sigma) = \log\det(\Sigma) + q \log \lambda_{\max}(\Sigma^{-1}) = \sum_{i=1}^q \log\bigl(\lambda_i / \lambda_{\min}(\Sigma)\bigr)$
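All three penalties depend on $\Sigma$ only through its eigenvalues, so they are cheap to compute. A small sketch (mine), which also illustrates their scale invariance:

```python
import numpy as np

def penalties(Sigma):
    """Pen_0, Pen_1, Pen_2 computed from the eigenvalues of Sigma."""
    lam = np.linalg.eigvalsh(Sigma)
    q = lam.size
    pen0 = np.log(lam.sum()) + np.log((1.0 / lam).sum())
    pen1 = np.mean(np.log(lam)) + np.log((1.0 / lam).sum())
    pen2 = np.sum(np.log(lam)) - q * np.log(lam.min())        # = sum_i log(lam_i / lam_min)
    return pen0, pen1, pen2

Sigma = np.diag([4.0, 1.0, 1.0])
print(penalties(Sigma))
print(penalties(3.0 * Sigma))   # essentially identical values: all three penalties are scale invariant
print(penalties(np.eye(3)))     # (2 log 3, log 3, 0): minimal at multiples of the identity
```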

These penalties $\mathrm{Pen}_j(\Sigma)$ are
- scale invariant,
- g-convex,
- g-coercive on $\{\Sigma : \det(\Sigma) = c\}$,
- strictly g-convex on $\{\Sigma : \det(\Sigma) = c\}$ ($\mathrm{Pen}_0$, $\mathrm{Pen}_1$),
with $\arg\min_\Sigma \mathrm{Pen}_j(\Sigma) = \{c I_q : c > 0\}$.

Example: Regularized version of Tyler's (1987) M-functional,
$f(\Sigma) = L(\Sigma, P) + \alpha \,\mathrm{Pen}(\Sigma)$ with $\rho(s) = q \log s$ and
$\mathrm{Pen}(\Sigma) = \begin{cases} \mathrm{Pen}_1(\Sigma) & \text{(Case 1)} \\ h(\mathrm{Pen}_1(\Sigma)) & \text{(Case 2)} \end{cases}$
On $\{\Sigma : \det(\Sigma) = 1\}$, $f$ is
- strictly g-convex,
- g-coercive in Case 1 if $P(V) < \bigl(1 + \tfrac{\alpha}{q}\bigr) \tfrac{\dim(V)}{q}$ whenever $V$ is a linear subspace with $1 \le \dim(V) < q$,
- g-coercive in Case 2 if $\lim_{s \to \infty} h(s)/s = \infty$.

Numerical experiment: For $q = 50$ and $n = 30$ consider $X_1, X_2, \dots, X_n$ i.i.d. $\sim \mathrm{Elliptic}_q(0, \Sigma)$ with $\Sigma = \mathrm{diag}(10, 5, 3, 2, 1, \dots, 1)^2$.

Compute
$\hat\Sigma_\alpha := \arg\min_\Sigma \bigl( L(\Sigma, \hat P) + \alpha\, h(\mathrm{Pen}_1(\Sigma)) \bigr)$
and $\hat\Sigma := \hat\Sigma_{\hat\alpha}$ with $\hat\alpha := \arg\min_{\alpha \in 2^{\mathbb{Z}}} \mathrm{CV}(\alpha)$, where
$\mathrm{CV}(\alpha) := \sum_{i=1}^n \bigl\{ \rho\bigl(x_i^\top \hat\Sigma_{\alpha,-i}^{-1} x_i\bigr) + \log\det\bigl(\hat\Sigma_{\alpha,-i}\bigr) \bigr\}$
and $\hat\Sigma_{\alpha,-i}$ is computed from the data without $x_i$ (leave-one-out cross-validation).
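Structurally, the cross-validation is a grid search over $\alpha = 2^k$. The sketch below (mine) spells this out; `fit_scatter` is a placeholder for any routine returning $\hat\Sigma_\alpha$ on a given data set, not the talk's algorithm:

```python
import numpy as np

def cv_score(X, alpha, rho, fit_scatter):
    """CV(alpha) = sum_i rho(x_i' Sigma_{alpha,-i}^{-1} x_i) + log det(Sigma_{alpha,-i})."""
    score = 0.0
    for i in range(X.shape[0]):
        Sigma_i = fit_scatter(np.delete(X, i, axis=0), alpha)   # leave observation i out
        d = X[i] @ np.linalg.inv(Sigma_i) @ X[i]
        score += rho(d) + np.linalg.slogdet(Sigma_i)[1]
    return score

def select_alpha(X, rho, fit_scatter, k_grid=range(1, 16)):
    """hat alpha = argmin over alpha = 2^k of CV(alpha)."""
    alphas = [2.0 ** k for k in k_grid]
    scores = [cv_score(X, a, rho, fit_scatter) for a in alphas]
    return alphas[int(np.argmin(scores))]

# Toy usage with a crude ridge-type stand-in (NOT the regularized M-estimator of the talk):
rng = np.random.default_rng(5)
X = rng.standard_normal((30, 10))
crude = lambda Y, a: np.cov(Y, rowvar=False) + a * np.eye(Y.shape[1])
print(select_alpha(X, lambda s: s, crude, k_grid=range(-6, 7)))
```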

[Figure: $\log\lambda(\Sigma)$ and $\log\lambda(\hat\Sigma)$ for $\hat\alpha = 2^7$]

[Figure: cross-validation, $\mathrm{CV}(2^k)$ versus $k$]

[Figure: first eigenvectors, $\|\hat u_1 - u_1\|$ versus $k$]

Eigenvalues: log λ(ˆσ ) log λ(σ ) versus k 5 10 15 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

[Figure: shape matrices, $D_g(\hat\Sigma, \Sigma)$ versus $k$]

Symmetrization and orthogonally invariant penalties:
$f(\Sigma) = L(\Sigma, P \ominus P) + \alpha\,\mathrm{Pen}(\Sigma)$ with $\mathrm{Pen}(U^\top \Sigma U) = \mathrm{Pen}(\Sigma)$ for all orthogonal $U \in \mathbb{R}^{q\times q}$.
Restricted block independence property: $P = \mathcal{L}\Bigl(U \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Bigr)$ with orthogonal $U \in \mathbb{R}^{q\times q}$ and independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma_s(P) = U \begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix} U^\top.$

Open questions and ongoing work
- Symmetrized M-estimators: balanced incomplete versus complete U-statistics
- Asymptotics for regularized scatter estimators
- Algorithms for non-smooth g-convex penalties
- Using regularized scatter estimators in other contexts (classification, ICS/ICA, multivariate regression, ...)

References
- Auderset, Mazza & Ruh: Angular Gaussian and Cauchy estimation. JMVA (2005).
- Bhatia: Positive Definite Matrices. Princeton University Press (2007).
- Wiesel: Geodesic convexity and covariance estimation. IEEE Trans. Signal Process. (2012).
- Duembgen, Pauly & Schweizer: M-functionals of multivariate scatter. Statistics Surveys (2015).
- Duembgen, Nordhausen & Schuhmacher: New algorithms for M-estimation of multivariate scatter and location. JMVA (2016).
- R package fastM. CRAN (2014/2015).
- Duembgen & Tyler: Geodesic convexity and regularized scatter estimators. arXiv:1607.05455.