1 PIOTR GRACZYK, LAREMA, Université d'Angers, France
Covariance estimators for graphical models
Coauthors: H. Ishi (Nagoya), B. Kołodziejek (Warsaw), S. Mamane (Johannesburg), H. Ochiai (Fukuoka)
XLV Conference on Applications of Mathematics, Zakopane-Kościelisko, 9/9/2016
2 References:
[GI] P. Graczyk, H. Ishi, Riesz measures and Wishart laws associated to quadratic maps, J. Math. Soc. Japan 66 (2014).
[GIM] P. Graczyk, H. Ishi, S. Mamane, Wishart exponential families on cones related to A_n graphs, submitted.
[GIK] P. Graczyk, H. Ishi, B. Kołodziejek, Wishart exponential families and variance function on homogeneous cones, submitted.
[GIMO] P. Graczyk, H. Ishi, S. Mamane, H. Ochiai, On the Letac-Massam conjecture on cones Q_{A_n}, submitted.
3 Objective: modern statistics and analysis of χ² laws on matrix cones. χ² laws on matrices are called Wishart laws.
Let X be a Gaussian random vector N(m, Σ) on R^r, with m ∈ R^r and Σ ∈ Sym⁺_r:
f_X(x) = (det 2πΣ)^{−1/2} exp(−(1/2) (x−m)^T Σ^{−1} (x−m)).
X has unknown mean m and covariance Σ.
4 We have at our disposal an observed sample X^(1), X^(2), ..., X^(s). We want to estimate m and Σ using the observed results X^(1), X^(2), ..., X^(s).
5 The sample mean
X̄_s = (X^(1) + X^(2) + ... + X^(s)) / s
is an estimator m̂ of m = ∫ X dP, and the sample covariance
Σ̂ = (1/s) Σ_{i=1}^s (X^(i) − X̄_s)(X^(i) − X̄_s)^T
is an estimator of Σ = ∫ (X−m)(X−m)^T dP.
Σ̂ is a quadratic map R^r × ... × R^r → Sym⁺(r, R).
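As a quick numerical illustration of the two estimators, a minimal sketch with NumPy; the dimension r = 3, the sample size s = 500 and the covariance matrix are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: r = 3, s = 500 observations of N(m, Sigma)
m = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3.0 * np.eye(3)          # a positive definite covariance

X = rng.multivariate_normal(m, Sigma, size=500)

m_hat = X.mean(axis=0)                     # sample mean \bar X_s
centered = X - m_hat
Sigma_hat = centered.T @ centered / X.shape[0]   # MLE normalisation 1/s, not 1/(s-1)

print(m_hat)
print(Sigma_hat)
```

Note the 1/s normalisation: the MLE Σ̂ is the biased sample covariance, not the 1/(s−1) unbiased version.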
6 Proposition. The estimators m̂ and Σ̂ are maximum likelihood estimators (MLE), i.e. they maximize the sample density treated as a function of the parameters (the likelihood function):
(m, Σ) ↦ Π_{i=1}^s f_X(x^(i)) = l(x^(1), ..., x^(s); m, Σ).
Proof: maximize ln l(x^(1), ..., x^(s); m, Σ) in (m, Σ).
7 Modern research in this area? Does everything seem known and done? NO!
Problems. Study the MLE of Σ when:
1. Some entries of the vector X are known to be conditionally independent with respect to the other ones. It follows that Σ is subject to restrictive conditions: the range of Σ is no longer the whole cone Sym⁺(r, R) but a subcone P ⊂ Sym⁺(r, R) of matrices with obligatory zeros, or its dual cone Q (explanation follows).
8 2. Some data are missing in the observations X^(i).
The simplest example: we have s_1 full observations of X and s_2 observations of the first k entries of X. How do we estimate Σ from such a sample?
9 Conditional independence. Example: Simpson's paradox.
A university has 48 000 students: half boys (24 000), half girls (24 000). At the final exams, boys and girls fail in very different proportions. Feminist organizations threaten to close the university; girl students want to lynch the president! However, the president of the university proves that the results R of the exams are independent of the sex S of a student, knowing the department D.
10 Three departments: A (sciences), B (literature, history, languages), C (law), with the same number of students each.
[Success/Fail counts by sex and department lost in transcription.]
Actually (R ⊥ S) | (D = d) for d = A, B, C.
11 Conditional independence in the absolutely continuous case.
X = (X_1, X_2, X_3): random vector; f_{X_1,X_2,X_3}(x_1, x_2, x_3): density function.
X_1 and X_3 are conditionally independent knowing X_2
⇔ f_{X_1,X_3 | X_2=x_2} = f_{X_1 | X_2=x_2} f_{X_3 | X_2=x_2}
⇔ f_{X_1,X_2,X_3}(x_1, x_2, x_3) = F(x_1, x_2) G(x_2, x_3).
12 Example 1. X ~ N(0, Σ), Σ ∈ Sym⁺_3,
f_{X_1,X_2,X_3}(x_1, x_2, x_3) = (det 2πΣ)^{−1/2} exp(−x^T Σ^{−1} x / 2).
Put σ := Σ^{−1}. Mixing of x_1 and x_3 is avoided exactly when σ_13 = 0:
f_{X_1,X_2,X_3}(x_1, x_2, x_3) = (2π)^{−3/2} (det σ)^{1/2} exp(−(σ_11 x_1² + 2σ_12 x_1 x_2 + σ_22 x_2²)/2) · exp(−(2σ_23 x_2 x_3 + σ_33 x_3²)/2).
Therefore (X_1 ⊥ X_3) | X_2 ⇔ σ_13 = 0.
The matrix σ = Σ^{−1} has obligatory zeros σ_13 = σ_31 = 0.
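The equivalence (X_1 ⊥ X_3) | X_2 ⇔ σ_13 = 0 can also be seen through the Schur complement: the conditional covariance of (X_1, X_3) given X_2 is diagonal exactly when the precision entry σ_13 vanishes. A sketch with a hypothetical precision matrix:

```python
import numpy as np

# Hypothetical precision matrix sigma with the obligatory zero sigma_13 = 0
sig = np.array([[2.0, 0.7, 0.0],
                [0.7, 3.0, 0.5],
                [0.0, 0.5, 1.5]])
Sigma = np.linalg.inv(sig)

# Conditional covariance of (X1, X3) given X2: Schur complement in Sigma
idx, cond = [0, 2], [1]
S_aa = Sigma[np.ix_(idx, idx)]
S_ab = Sigma[np.ix_(idx, cond)]
S_bb = Sigma[np.ix_(cond, cond)]
cond_cov = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ab.T

print(cond_cov)   # the off-diagonal entry vanishes: X1 and X3 decouple given X2
```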
13 The positions of zeros in Σ^{−1} are encoded by a graph.
G = (V, E): undirected graph; V = {1, ..., r}: the set of vertices; E ⊂ V × V: the set of edges; i ∼ j ⇔ (i, j) ∈ E.
Z_G := { x ∈ Sym(r, R) | x_ij = 0 if i ≠ j and i ≁ j },
P_G := Z_G ∩ Sym⁺_r, a subcone of Sym⁺_r.
If X ~ N(0, Σ) with Σ^{−1} ∈ P_G, then X_i and X_j are conditionally independent knowing all other components whenever i ≠ j and i ≁ j.
Example 1: (X_1 ⊥ X_3) | X_2 corresponds to the graph G: 1 − 2 − 3.
14 Example 1. Graph G = A_3: 1 − 2 − 3.
Z_G := { [[x_11, x_12, 0], [x_12, x_22, x_23], [0, x_23, x_33]] : x_ij ∈ R },
P_G := Z_G ∩ Sym⁺_3.
This cone is homogeneous (GL(P_G) acts transitively on P_G).
15 On Z_G we use the dual coordinates ξ = [[ξ_11, ξ_12, ·], [ξ_12, ξ_22, ξ_23], [·, ξ_23, ξ_33]], ξ_ij ∈ R.
The dual cone of P_G is
P_G* = Q_G := { ξ ∈ Z_G | tr(xξ) > 0 for all x ∈ P̄_G \ {0} }
= { ξ ∈ Z_G | ξ_11 ξ_22 − ξ_12² > 0, ξ_22 ξ_33 − ξ_23² > 0, ξ_33 > 0 }.
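The two descriptions of Q_{A_3} can be compared numerically: for any x ∈ P_G, the projection ξ = π(x^{−1}) satisfies the three inequalities. A sketch with a hypothetical x:

```python
import numpy as np

# Hypothetical x in P_{A_3}: positive definite with x_13 = 0
x = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.8, 0.4],
              [0.0, 0.4, 1.2]])

xi = np.linalg.inv(x)
xi[0, 2] = xi[2, 0] = 0.0      # the projection pi onto Z_G drops the (1,3) entry

# Q_{A_3} membership: the two banded 2x2 minors and the last diagonal entry
m1 = xi[0, 0] * xi[1, 1] - xi[0, 1] ** 2
m2 = xi[1, 1] * xi[2, 2] - xi[1, 2] ** 2
print(m1 > 0, m2 > 0, xi[2, 2] > 0)
```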
16 Example 2. Graph G = A_4: 1 − 2 − 3 − 4.
Z_G := { [[x_11, x_12, 0, 0], [x_12, x_22, x_23, 0], [0, x_23, x_33, x_34], [0, 0, x_34, x_44]] : x_11, ..., x_44 ∈ R },
P_G := Z_G ∩ S⁺_4. This cone is non-homogeneous.
P_G* = Q_G := { ξ ∈ Z_G | tr(xξ) > 0 for all x ∈ P̄_G \ {0} }
= { ξ ∈ Z_G | ξ_11 ξ_22 − ξ_12² > 0, ξ_22 ξ_33 − ξ_23² > 0, ξ_33 ξ_44 − ξ_34² > 0, ξ_44 > 0 }.
17 The theory of graphical models, started in 1976 by Lauritzen and Speed, is developed for decomposable graphs G:
G is decomposable ⇔ G has no cycle of length ≥ 4 as an induced subgraph.
Example: A_4 = 1 − 2 − 3 − 4 from Example 2.
Ω_G ⊂ Z_G is homogeneous if and only if G is decomposable and A_4-free (Letac-Massam, Ishi).
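Decomposability (chordality) is easy to test on small graphs with maximum cardinality search; the helper below is a generic sketch (not from the talk) and confirms that the path A_4 is decomposable while the induced 4-cycle is not:

```python
def is_chordal(n, edges):
    """Maximum cardinality search; the graph is chordal iff, in the MCS
    order, every vertex's already-visited neighbours form a clique."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    weight = {v: 0 for v in range(n)}
    order, visited = [], set()
    for _ in range(n):
        v = max((u for u in range(n) if u not in visited), key=lambda u: weight[u])
        order.append(v)
        visited.add(v)
        for u in adj[v] - visited:
            weight[u] += 1

    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        for i, a in enumerate(earlier):
            for b in earlier[i + 1:]:
                if b not in adj[a]:
                    return False
    return True

path_A4 = [(0, 1), (1, 2), (2, 3)]            # the graph A_4
cycle_C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]   # induced 4-cycle
print(is_chordal(4, path_A4), is_chordal(4, cycle_C4))
```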
18 Back to the estimation of the covariance matrix of a normal vector: the conditional independence case.
Simplification: X = (X_1, ..., X_n)^T is a random vector obeying N(0, Σ), with known mean 0 and unknown covariance matrix Σ ∈ S⁺_n such that Σ^{−1} ∈ P_G = Z_G ∩ S⁺_n.
Sample X^(1), ..., X^(s) ∈ R^n. The MLE of Σ on the whole cone S⁺_n is (when s ≥ n)
Σ̃ = (1/s) { X^(1) X^(1)T + X^(2) X^(2)T + ... + X^(s) X^(s)T } ∈ Sym(n, R).
Let us look for an estimator when Σ^{−1} ∈ P_G.
19 Example 1: G = 1 − 2 − 3, σ_13 = 0. We shall estimate σ = Σ^{−1} ∈ P_G and Σ = σ^{−1}.
The density function of the sample X^(1), ..., X^(s) ∈ R³ is
f(x^(1), ..., x^(s); σ) = Π_{k=1}^s { (2π)^{−3/2} (det σ)^{1/2} exp(−x^(k)T σ x^(k) / 2) }
= (2π)^{−3s/2} (det σ)^{s/2} exp(−Σ_{k=1}^s x^(k)T σ x^(k) / 2).
Note that Σ_{k=1}^s x^(k)T σ x^(k) = tr( (Σ_{k=1}^s x^(k) x^(k)T) σ ) = ⟨sΣ̃, σ⟩ = ⟨π(sΣ̃), σ⟩,
where π is the projection on Z_G (since σ_13 = 0).
20 f(x^(1), ..., x^(s); σ) = (2π)^{−3s/2} (det σ)^{s/2} exp(−(1/2) ⟨π(sΣ̃), σ⟩).
Which σ ∈ P_G is most likely? Maximum likelihood estimation: it is σ = σ̂ for which f(x^(1), ..., x^(s); σ̂) is maximal.
We study log f(x^(1), ..., x^(s); σ) as a function of σ ∈ Z_G. The variable σ_13 does not appear; using grad log det x = x^{−1},
grad log f(x^(1), ..., x^(s); σ) = (s/2) (π(σ^{−1}) − π(Σ̃)).
If s ≥ 3, then π(Σ̃) belongs to Q_G.
21 We must find σ̂ ∈ P_G such that π(σ̂^{−1}) = π(Σ̃) ∈ Q_G.
The inverse map Ψ of x ∈ P_G ↦ π(x^{−1}) ∈ Q_G is needed. Only when π = Id is this trivial (Ψ(y) = y^{−1}).
Then σ̂ = Ψ(π(Σ̃)) ∈ P_G is the MLE of σ and, consequently, Σ̂ = σ̂^{−1} = (Ψ(π(Σ̃)))^{−1} is the MLE of Σ.
π(Σ̃) is the MLE of π(Σ), which determines Σ.
The MLE argument is valid for any graphical cone.
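For decomposable G the map Ψ is given by Lauritzen's clique-separator formula; for G = A_3 (cliques {1,2} and {2,3}, separator {2}) the MLE can be written down directly. A numerical sketch with a hypothetical sample, checking both σ̂ ∈ P_G and π(σ̂^{−1}) = π(Σ̃):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))      # hypothetical sample, mean 0
S = X.T @ X / X.shape[0]              # full-cone MLE Sigma_tilde

# Lauritzen formula for G = A_3: sum of [inverse of clique block]^0
# minus [inverse of separator block]^0
K = np.zeros((3, 3))
K[:2, :2] += np.linalg.inv(S[:2, :2])     # clique {1, 2}
K[1:, 1:] += np.linalg.inv(S[1:, 1:])     # clique {2, 3}
K[1, 1] -= 1.0 / S[1, 1]                  # separator {2}

Sigma_hat = np.linalg.inv(K)              # MLE of Sigma under the A_3 model

print(K[0, 2])                            # structural zero: sigma_hat lies in P_G
print(np.allclose(Sigma_hat[:2, :2], S[:2, :2]),
      np.allclose(Sigma_hat[1:, 1:], S[1:, 1:]))
```

σ̂ = K agrees with Σ̃ on all entries kept by π (the clique blocks), which is exactly the equation π(σ̂^{−1}) = π(Σ̃).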
22 CRUCIAL POINTS:
1. the law of π(Σ̃) on Q_G;
2. knowledge of the inverse map Ψ.
23 The law of π(Σ̃) on Q_G.
π(Σ̃) is a quadratic map of the normal sample, with values in Q_G:
Σ̂ = π(Σ̃) = (1/s) π( X^(1) X^(1)T + X^(2) X^(2)T + ... + X^(s) X^(s)T ).
π(Σ̃) ∈ Q_G obeys a Wishart law on Q_G.
24 Missing data. The simplest example: X = (X_1, X_2)^T ∈ R², X ~ N(µ, Σ).
We have at our disposal s_1 full observations X^(1), ..., X^(s_1) and s_2 incomplete observations X̃^(1), ..., X̃^(s_2), with X̃ = (X̃_1, ·)^T: the second entry X̃_2 is not observed.
It is natural to model X̃_1 by the conditional law of X_1 given (X_2 = µ_2), which is N(µ_1, 1/σ_11), where σ = Σ^{−1}.
25 f(x^(1), ..., x^(s_1), x̃^(1), ..., x̃^(s_2); σ) = (2π)^{−(2s_1+s_2)/2} (det σ)^{s_1/2} σ_11^{s_2/2} exp{ −(1/2) [ Σ_i x^(i)T σ x^(i) + Σ_j σ_11 (x̃^(j))² ] },
where x and x̃ now denote the centred observations x − µ and x̃_1 − µ_1.
The power function Δ_{((s_1+s_2)/2, s_1/2)}(σ) = σ_11^{s_2/2} (det σ)^{s_1/2} appears.
The MLE equation grad_σ log f = 0 is
grad log Δ_{((s_1+s_2)/2, s_1/2)}(σ) = π( q̄(x^(1), ..., x^(s_1), x̃^(1), ..., x̃^(s_2)) ),
with the quadratic form q̄ equal on the sample to
(1/2) ( Σ_{i=1}^{s_1} x^(i) x^(i)T + Σ_{j=1}^{s_2} [[(x̃^(j))², 0], [0, 0]] ) = q^{1,2}_{s_1}(x) + q^{1}_{s_2}(x̃).
26 The inverse map Ψ_s of σ ∈ P_G ↦ grad log Δ_{((s_1+s_2)/2, s_1/2)}(σ) ∈ Q_G is needed. We show its existence and write it explicitly ([GIM]).
Thus σ̂ = Ψ_s(π(q̄(x))) ∈ P_G is the MLE of σ and, consequently, Σ̂ = σ̂^{−1} is the MLE of Σ.
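In the bivariate example the inverse map can be bypassed: factorizing the likelihood as f(x_1) f(x_2 | x_1) splits the maximization into closed-form pieces (the classical monotone missing-data computation). A sketch with hypothetical data and, for simplicity, known mean µ = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
full = rng.multivariate_normal(np.zeros(2), Sigma, size=200)         # s1 full obs
extra = rng.multivariate_normal(np.zeros(2), Sigma, size=100)[:, 0]  # s2 obs of X1 only

# f(x1, x2) = f(x1) f(x2 | x1): each factor is maximized separately.
x1_all = np.concatenate([full[:, 0], extra])
s11 = np.mean(x1_all ** 2)                                    # MLE of Sigma_11
beta = (full[:, 0] @ full[:, 1]) / (full[:, 0] @ full[:, 0])  # regression slope
tau = np.mean((full[:, 1] - beta * full[:, 0]) ** 2)          # residual variance

Sigma_hat = np.array([[s11, beta * s11],
                      [beta * s11, tau + beta ** 2 * s11]])
print(Sigma_hat)
```

Since det Σ̂ = s11 · tau > 0, the estimate is always positive definite.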
27 The equation π(y^{−1}) = ξ ∈ Q_G, y ∈ P_G, was solved by Lauritzen (1991) for decomposable graphs G, in terms of graph theory (cliques, separators, ...); the inverse map is called the Lauritzen map.
We give such maps in the monotonous missing-data set-up:
Ψ_s = (π ∘ grad log Δ_{(s_1,...,s_n)})^{−1} = grad log δ_{(s_1,...,s_n)},
where δ_{(s_1,...,s_n)} is another power function (definitions later).
28 From now on, G = A_n: 1 − 2 − ... − n.
Q_{A_n} and P_{A_n} are important non-homogeneous (for n ≥ 4) cones appearing in the statistical theory of graphical models.
They correspond to the practical model of nearest-neighbour interactions: in the Gaussian vector (X_1, X_2, ..., X_n), non-neighbours X_i, X_j, |i − j| > 1, are conditionally independent with respect to the other variables.
29 The Wishart laws on Q_G and P_G were first studied by Letac and Massam:
G. Letac and H. Massam, Wishart distributions for decomposable graphs, Ann. Statist. 35 (2007).
Motivations:
- MLE theory: S. L. Lauritzen (1996), Graphical Models, Oxford University Press.
- Bayesian statistics, searching conjugate prior distributions for π(Σ) ∈ Q_G and σ ∈ P_G: P. Diaconis and D. Ylvisaker (1979), Conjugate priors for exponential families, Ann. Statist. 7(2).
30 Letac and Massam, following the Lauritzen graph-theoretic methods, define power functions H on Q_G and P_G. For G = A_n: 1 − ... − n they are defined on Q_{A_n} by
H(α, β, η) = Π_{i=1}^{n−1} |η_{i:i+1}|^{α_i} / Π_{i=2}^{n−1} η_ii^{β_i}
(|η_{i:i+1}| denotes the determinant of the 2 × 2 submatrix on rows and columns i, i+1), and on P_{A_n} by H(α, β, π(y^{−1})).
31 Our results:
1. We introduce natural power functions δ_s^{(M)}(η) on Q_{A_n} and Δ_s^{(M)}(y) on P_{A_n}, M = 1, ..., n, which contain (strictly) the Letac-Massam functions H. These functions are densities of Riesz measures on the cones Q_{A_n} and P_{A_n} respectively.
2. We construct all Wishart families on Q_{A_n} and P_{A_n} generated by δ_s^{(M)}(η) and Δ_s^{(M)}(y). They contain (strictly) the Letac-Massam Wishart families γ^Q_{(α,β,y)} and γ^P_{(α,β,η)}. We give their properties (density, moments).
32 3. We give an essential extension and simplification of the Letac-Massam theory.
4. We prove the Letac-Massam conjecture on Q_{A_n}, on the Laplace transform property of H(α, β, η).
5. We find the MLE of σ (and hence of Σ) in the monotonous missing-data case: we construct an infinity of Lauritzen-type inverse maps Ψ_s : Q_{A_n} → P_{A_n}.
33 6. We determine the variance function V(m) for the Wishart families on Q_{A_n}, where the mean m = m_s(y) is the expectation of γ_{s,y}. In [GIK], V(m) is determined for homogeneous cones.
This is done thanks to an explicit form of the inverse mean map ψ_s. The maps Ψ_s and ψ_s are closely related.
34 Density of q(X), X = a normal sample.
In order to find MLEs, one needs quadratic forms q(X^(1), ..., X^(s)) and the Wishart law γ_{q,σ}.
The simplest case: q(x) = x x^T; µ_q = q(Leb_{R^r}) is a Riesz measure,
L_{µ_q}(η) = ∫_{R^r} e^{−⟨η, x x^T⟩} dx = ∫_{R^r} e^{−x^T η x} dx = π^{r/2} (det η)^{−1/2}.
For an s-sample, L_{µ_q^{*s}}(η) = π^{rs/2} (det η)^{−s/2}.
For more general q, L_{µ_q}(η) = c · Δ_s(η), where
Δ_s(x) := Π_{k=1}^r (det x_{{1,...,k}})^{s_k − s_{k+1}} = x_11^{s_1} (det x_{{1,2}} / x_11)^{s_2} ⋯ (det x / det x_{{1,...,r−1}})^{s_r}
(convention s_{r+1} = 0).
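The power function Δ_s is a short computation with leading principal minors; a sketch with a hypothetical matrix, together with the sanity check that a constant parameter s = (c, ..., c) gives (det x)^c:

```python
import numpy as np

def Delta_s(x, s):
    """Delta_s(x) = prod_k det(x_{1..k})^(s_k - s_{k+1}), with s_{r+1} = 0."""
    r = x.shape[0]
    val = 1.0
    for k in range(1, r + 1):
        minor = np.linalg.det(x[:k, :k])   # leading principal minor of order k
        s_next = s[k] if k < r else 0.0
        val *= minor ** (s[k - 1] - s_next)
    return val

# Hypothetical positive definite matrix
x = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.1]])
c = 0.7
print(Delta_s(x, [c, c, c]), np.linalg.det(x) ** c)
```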
35 If µ is a measure on a cone Ω ⊂ V = R^n, then the family of probability measures
γ_y(dx) = e^{−⟨x,y⟩} µ(dx) / L(µ)(y)
is called the exponential family generated by µ. The density of γ_{s,σ} is e^{−⟨y,σ⟩} µ_s(dy) / L_{µ_s}(σ).
Thus the density of the Riesz measure µ_s is crucial.
36 When Ω = Sym⁺_n = S⁺_n, the density of the Riesz measure is given by the multiparameter Siegel integral: for s ∈ C^n with Re s_k > (k−1)/2 (k = 1, ..., n),
∫_{S⁺_n} e^{−tr(xξ)} Δ_s(x) (det x)^{−(n+1)/2} dx = Γ_{S⁺_n}(s) δ_{−s}(ξ) (ξ ∈ S⁺_n),
where
δ_s(x) = (det x / det x_{{2,...,n}})^{s_1} ⋯ (det x_{{n−1,n}} / x_nn)^{s_{n−1}} x_nn^{s_n},
Γ_{S⁺_n}(s) := π^{n(n−1)/4} Π_{k=1}^n Γ(s_k − (k−1)/2).
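For n = 1 the Siegel integral reduces to the Gamma integral ∫_0^∞ e^{−xξ} x^{s−1} dx = Γ(s) ξ^{−s}; a quick quadrature check with hypothetical values s = 1.5, ξ = 2:

```python
import math
import numpy as np

s, xi = 1.5, 2.0
x = np.linspace(1e-9, 40.0, 400_001)        # the tail beyond 40 is negligible
f = np.exp(-xi * x) * x ** (s - 1.0)

h = x[1] - x[0]
lhs = h * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule
rhs = math.gamma(s) * xi ** (-s)
print(lhs, rhs)
```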
37 POWER FUNCTIONS AND THEIR LAPLACE TRANSFORMS for G = A_n.
Eliminating orders of vertices.
There are many (but not all) orders of the vertices 1, 2, ..., n that we should consider in order to have a harmonious theory of Riesz and Wishart distributions on the cones related to A_n graphs. These orders are called eliminating orders of vertices.
Let v⁺ be the set of future (with respect to the order) neighbours (with respect to the graph) of v.
38 An eliminating order of the vertices of G is a permutation {v_1, ..., v_n} of V such that for all v, the set v⁺ is complete.
Example. For the graph A_3: 1 − 2 − 3, the orders (1, 2, 3), (1, 3, 2), (3, 1, 2) and (3, 2, 1) are eliminating orders; (2, 1, 3) and (2, 3, 1) are not eliminating.
Proposition. All eliminating orders on A_n are obtained by an intertwining of the two sequences 1 < 2 < ... < M−1 < M and n > n−1 > ... > M+1 > M, for some M ∈ V.
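The definition and the proposition are easy to check by brute force for A_3: enumerating all six permutations and testing whether each vertex's future neighbours form a complete set leaves exactly the four intertwinings. A sketch:

```python
from itertools import permutations

# A_3 path graph 1 - 2 - 3
adj = {1: {2}, 2: {1, 3}, 3: {2}}

def is_eliminating(order):
    """An order is eliminating if every vertex's future neighbours are complete."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        future = [u for u in adj[v] if pos[u] > pos[v]]
        for i, a in enumerate(future):
            for b in future[i + 1:]:
                if b not in adj[a]:
                    return False
    return True

elim = [p for p in permutations([1, 2, 3]) if is_eliminating(p)]
print(elim)   # the four orders that do not start at the middle vertex 2
```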
39 Definition 1. For s ∈ C^n, we define functions Δ_s : P_G → C and δ_s : Q_G → C by
Δ_s(y) := Π_{v∈V} ( det y_{{v}∪v⁻} / det y_{v⁻} )^{s_v} (y ∈ P_G),
δ_s(η) := Π_{v∈V} ( det η_{{v}∪v⁺} / det η_{v⁺} )^{s_v} (η ∈ Q_G),
where v⁻ is the set of predecessors of v in the order, v⁺ its set of future neighbours, and det y_∅ = 1 = det η_∅, V = {1, ..., n}.
Property. On the cones Q_{A_n} and P_{A_n}, the power functions Δ_s(y) and δ_s(η) for eliminating orders depend only on the maximal element M of the order. We write Δ_s(y) = Δ_s^{(M)}(y), δ_s(η) = δ_s^{(M)}(η).
40 Proposition 2. Let M ∈ {1, ..., n}. The M-power functions Δ_s^{(M)}(y) on P_G and δ_s^{(M)}(η) on Q_G are given by
Δ_s^{(M)}(y) = y_11^{s_1−s_2} ⋯ |y_{1:M−1}|^{s_{M−1}−s_M} · |y|^{s_M} · |y_{M+1:n}|^{s_{M+1}−s_M} ⋯ y_nn^{s_n−s_{n−1}},
δ_s^{(M)}(η) = [ Π_{i=1}^{M−1} |η_{i:i+1}|^{s_i} · Π_{i=M+1}^{n} |η_{i−1:i}|^{s_i} ] / [ Π_{i=2}^{M−1} η_ii^{s_{i−1}} · η_MM^{s_{M−1}−s_M+s_{M+1}} · Π_{i=M+1}^{n−1} η_ii^{s_{i+1}} ]
(|y_{a:b}| is the determinant of the principal submatrix on rows and columns a, ..., b; conventions s_0 = s_{n+1} = 0, empty products equal 1).
Theorem 3. For all y ∈ P_{A_n}, δ_s^{(M)}(π(y^{−1})) = Δ_{−s}^{(M)}(y).
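Theorem 3 can be spot-checked numerically for n = 3, M = 2 from the explicit formulas of Proposition 2 (in the sign convention δ_s^{(M)}(π(y^{−1})) = Δ_{−s}^{(M)}(y)); the matrix y and the parameter s below are hypothetical:

```python
import numpy as np

# Hypothetical y in P_{A_3}: positive definite with y_13 = 0
y = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.5, 0.3],
              [0.0, 0.3, 1.2]])
s = np.array([0.9, 1.4, 0.7])
D = np.linalg.det(y)

eta = np.linalg.inv(y)   # pi drops eta_13, which the formulas below never use

# delta_s^{(2)}(eta) = |eta_{1:2}|^{s1} |eta_{2:3}|^{s3} / eta_22^{s1 - s2 + s3}
m12 = eta[0, 0] * eta[1, 1] - eta[0, 1] ** 2
m23 = eta[1, 1] * eta[2, 2] - eta[1, 2] ** 2
delta = m12 ** s[0] * m23 ** s[2] / eta[1, 1] ** (s[0] - s[1] + s[2])

# Delta_{-s}^{(2)}(y) = y_11^{s2 - s1} (det y)^{-s2} y_33^{s2 - s3}
Delta_neg = y[0, 0] ** (s[1] - s[0]) * D ** (-s[1]) * y[2, 2] ** (s[1] - s[2])

print(delta, Delta_neg)
```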
41 Characteristic function of a cone:
φ_Ω(x) = ∫_{Ω*} e^{−⟨x,y⟩} dy = L(Leb_{Ω*})(x).
φ_Ω(x) dx is the invariant measure of the cone Ω: ∫_Ω f(gx) φ_Ω(x) dx = ∫_Ω f(x) φ_Ω(x) dx for g ∈ GL(Ω).
For n ≥ 2 define φ_n : Q_{A_n} → R₊ by
φ_n(η) = Π_{i=1}^{n−1} |η_{i,i+1}|^{−3/2} · Π_{i≠1,n} η_ii.
We will see that φ_n is the characteristic function of the cone Q_{A_n}.
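A consistency check on φ_n: the characteristic function of a cone is homogeneous of degree −dim Ω, and dim Z_{A_n} = 2n − 1. For n = 3, with a hypothetical point of Q_{A_3}:

```python
import numpy as np

def phi3(eta):
    """phi_3(eta) = |eta_{1:2}|^{-3/2} |eta_{2:3}|^{-3/2} eta_22."""
    m12 = eta[0, 0] * eta[1, 1] - eta[0, 1] ** 2
    m23 = eta[1, 1] * eta[2, 2] - eta[1, 2] ** 2
    return m12 ** -1.5 * m23 ** -1.5 * eta[1, 1]

eta = np.array([[1.0, 0.4, 0.0],
                [0.4, 2.0, 0.3],
                [0.0, 0.3, 1.5]])
t = 2.7
print(phi3(t * eta) / phi3(eta), t ** -5.0)   # degree -(2*3 - 1) = -5
```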
42 Theorem 4. For all n ≥ 1, all 1 ≤ M ≤ n and all y ∈ P_{A_n}, the integral
∫_{Q_{A_n}} e^{−tr(yη)} δ_s^{(M)}(η) φ_n(η) dη
converges if and only if s_i > 1/2 for all i ≠ M and s_M > 0. In this case, we have
∫_{Q_{A_n}} e^{−tr(yη)} δ_s^{(M)}(η) φ_n(η) dη = π^{(n−1)/2} { Π_{i≠M} Γ(s_i − 1/2) } Γ(s_M) Δ_{−s}^{(M)}(y).
43 Theorem 5. For all n ≥ 1, all 1 ≤ M ≤ n and all η ∈ Q_{A_n}, the integral
∫_{P_{A_n}} e^{−tr(yη)} Δ_s^{(M)}(y) dy
converges if and only if s_i > −3/2 for all i ≠ M and s_M > −1. And in this case, we have
∫_{P_{A_n}} e^{−tr(yη)} Δ_s^{(M)}(y) dy = π^{(n−1)/2} { Π_{i≠M} Γ(s_i + 3/2) } Γ(s_M + 1) δ_{−s}^{(M)}(η) φ_n(η).
44 LETAC-MASSAM LAPLACE INTEGRALS.
Recall the Letac-Massam power functions on Q_{A_n}:
H(α, β, η) = Π_{i=1}^{n−1} |η_{i:i+1}|^{α_i} / Π_{i=2}^{n−1} η_ii^{β_i}.
The Laplace transform formula (y ∈ P_{A_n})
∫_{Q_{A_n}} e^{−tr(yη)} H(α, β, η) φ_{Q_{A_n}}(η) dη = C_{α,β} H(α, β, π(y^{−1}))
will be referred to as the Letac-Massam formula on Q_{A_n}.
45 Define r_i = α_i − β_{i+1} for all 1 ≤ i ≤ n−3, and p_i = α_i − β_i for all 3 ≤ i ≤ n−1. We have
H(α, β, η) = δ_s^{(M)}(η) · Π_{i=2}^{M−1} η_ii^{r_{i−1}} · Π_{i=M+1}^{n−1} η_ii^{p_i},
where s_i = α_i for all 1 ≤ i ≤ M−1, s_i = α_{i−1} for all M+1 ≤ i ≤ n, and β_M = s_{M−1} − s_M + s_{M+1}.
We have proved:
Theorem. The Letac-Massam formula on Q_{A_n} holds if and only if H(α, β, η) = δ_s^{(M)}(η) for some M = 2, ..., n−1
(a new formulation of the Letac-Massam conjecture).
46 Methods. Change of variables.
Let Φ_n : R₊ × R × P_{A_{n−1}} → P_{A_n}, (a, b, z) ↦ y, with
y = [[1, 0], [b e_1, I_{n−1}]] · [[a, 0], [0, z]] · [[1, b e_1^T], [0, I_{n−1}]].
Let Ψ_n : R₊ × R × Q_{A_{n−1}} → Q_{A_n}, (α, β, x) ↦ η, with
η = π( [[1, β e_1^T], [0, I_{n−1}]] · [[α, 0], [0, x]] · [[1, 0], [β e_1, I_{n−1}]] ).
The maps Φ_n and Ψ_n are bijections.
47 Back to the MLE of σ and Σ.
Theorem. The map grad log Δ_s^{(M)} : P_{A_n} → Q_{A_n} has the inverse map grad log δ_s^{(M)} : Q_{A_n} → P_{A_n}
(proved on homogeneous cones by Kai-Nomura (2005)).
This gives the solution of the MLE equation grad log Δ_s(σ̂) = π( q^{1,2}_{s_1}(x) + q^{1}_{s_2}(x̃) ).
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationSolution to Assignment 3
The Chinese University of Hong Kong ENGG3D: Probability and Statistics for Engineers 5-6 Term Solution to Assignment 3 Hongyang Li, Francis Due: 3:pm, March Release Date: March 8, 6 Dear students, The
More informationAverage theorem, Restriction theorem and Strichartz estimates
Average theorem, Restriction theorem and trichartz estimates 2 August 27 Abstract We provide the details of the proof of the average theorem and the restriction theorem. Emphasis has been placed on the
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationMultivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
More informationMath Exam 2, October 14, 2008
Math 96 - Exam 2, October 4, 28 Name: Problem (5 points Find all solutions to the following system of linear equations, check your work: x + x 2 x 3 2x 2 2x 3 2 x x 2 + x 3 2 Solution Let s perform Gaussian
More informationLearning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University
Learning from Sensor Data: Set II Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University 1 6. Data Representation The approach for learning from data Probabilistic
More informationGraphical Models with Symmetry
Wald Lecture, World Meeting on Probability and Statistics Istanbul 2012 Sparse graphical models with few parameters can describe complex phenomena. Introduce symmetry to obtain further parsimony so models
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationMath 172 Problem Set 5 Solutions
Math 172 Problem Set 5 Solutions 2.4 Let E = {(t, x : < x b, x t b}. To prove integrability of g, first observe that b b b f(t b b g(x dx = dt t dx f(t t dtdx. x Next note that f(t/t χ E is a measurable
More informationAn Algebraic and Geometric Perspective on Exponential Families
An Algebraic and Geometric Perspective on Exponential Families Caroline Uhler (IST Austria) Based on two papers: with Mateusz Micha lek, Bernd Sturmfels, and Piotr Zwiernik, and with Liam Solus and Ruriko
More informationMarginal density. If the unknown is of the form x = (x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density
Marginal density If the unknown is of the form x = x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density πx 1 y) = πx 1, x 2 y)dx 2 = πx 2 )πx 1 y, x 2 )dx 2 needs to be
More informationErrata for Robot Vision
Errata for Robot Vision This is a list of known nontrivial bugs in Robot Vision 1986) by B.K.P. Horn, MIT Press, Cambridge, MA ISBN 0-6-08159-8 and McGraw-Hill, New York, NY ISBN 0-07-030349-5. Thanks
More informationSUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)
SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems
More information4. Distributions of Functions of Random Variables
4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n
More informationNAVARRO VERTICES AND NORMAL SUBGROUPS IN GROUPS OF ODD ORDER
NAVARRO VERTICES AND NORMAL SUBGROUPS IN GROUPS OF ODD ORDER JAMES P. COSSEY Abstract. Let p be a prime and suppose G is a finite solvable group and χ is an ordinary irreducible character of G. Navarro
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationRandom Variables and Their Distributions
Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital
More informationGaussian vectors and central limit theorem
Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables
More informationPCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities
PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets
More informationMeixner matrix ensembles
Meixner matrix ensembles W lodek Bryc 1 Cincinnati April 12, 2011 1 Based on joint work with Gerard Letac W lodek Bryc (Cincinnati) Meixner matrix ensembles April 12, 2011 1 / 29 Outline of talk Random
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationA Bivariate Distribution whose Marginal Laws are Gamma and Macdonald
International Journal of Mathematical Analysis Vol. 1, 216, no. 1, 455-467 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/ijma.216.6219 A Bivariate Distribution whose Marginal Laws are Gamma Macdonald
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationTail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection
Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection Satoshi Kuriki and Xiaoling Dou (Inst. Statist. Math., Tokyo) ISM Cooperative
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More informationProblems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B
Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2
More informationLagrangian submanifolds and generating functions
Chapter 4 Lagrangian submanifolds and generating functions Motivated by theorem 3.9 we will now study properties of the manifold Λ φ X (R n \{0}) for a clean phase function φ. As shown in section 3.3 Λ
More informationA6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading Chapter 5 (continued) Lecture 8 Key points in probability CLT CLT examples Prior vs Likelihood Box & Tiao
More informationVariational Mixture of Gaussians. Sargur Srihari
Variational Mixture of Gaussians Sargur srihari@cedar.buffalo.edu 1 Objective Apply variational inference machinery to Gaussian Mixture Models Demonstrates how Bayesian treatment elegantly resolves difficulties
More informationRuijsenaars type deformation of hyperbolic BC n Sutherland m
Ruijsenaars type deformation of hyperbolic BC n Sutherland model March 2015 History 1. Olshanetsky and Perelomov discovered the hyperbolic BC n Sutherland model by a reduction/projection procedure, but
More information