1 On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis
Bing Li, Hyunho Chun & Hongyu Zhao
Presented by Kim Youngrae, SNU Stat. Multivariate Lab, Nov 25, 2016
2 Outline
1 Introduction: GGM, GCGM
2 Additive conditional independence: graphoid, semigraphoid; conditional independence vs. additive conditional independence; relation with previous models
3 Estimation: ACCO, APO; calculation; tuning parameters
4 Simulation results
3 1. Introduction
Additive semigraphoid model
Gaussian copula graphical model
Additively conditional independent
Additive conditional covariance operator
Additive precision operator
4 1.1 GGM, GCGM
Gaussian graphical model. Let X = (X^1, ..., X^p)^T and suppose X ~ N(μ, Σ) with Σ positive definite. Let Θ = Σ^{-1}, with Θ_ij the (i, j)th element of Θ. Then X follows a GGM with respect to the graph G = (Γ, E) iff
X^i ⫫ X^j | X^{-(i,j)} whenever (i, j) ∉ E.
This requires the multivariate normal assumption. In a Gaussian graphical model, X^i ⫫ X^j | X^{-(i,j)} iff Θ_ij = 0.
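The zero-precision characterization above is easy to check numerically. A minimal sketch (the toy chain graph, sample size, and all names are ours, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precision matrix of a 3-node chain X^1 - X^2 - X^3:
# Theta[0, 2] == 0 encodes X^1 _||_ X^3 | X^2.
Theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
Sigma = np.linalg.inv(Theta)

# Sample and re-estimate: the empirical precision entry (0, 2) should be
# near zero, while the chain edge entry (0, 1) stays clearly nonzero.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=50000)
Theta_hat = np.linalg.inv(np.cov(X, rowvar=False))
print(Theta_hat.round(2))
```

With a large sample, the estimated precision matrix recovers the zero pattern of the graph, which is exactly the edge-reading rule stated above.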
5 1.1 GGM, GCGM (Cont.)
Copula. A copula is a multivariate probability distribution whose marginals are all uniform; it captures the dependence structure among the variables. C : [0, 1]^d → [0, 1] is a d-dimensional copula if C is the cumulative distribution function of a d-dimensional random vector on the unit cube.
6 1.1 GGM, GCGM (Cont.)
Gaussian copula graphical model. Let X = (X^1, ..., X^p)^T and suppose there exist unknown injections f_1, ..., f_p such that (f_1(X^1), ..., f_p(X^p))^T is multivariate Gaussian:
(f_1(X^1), ..., f_p(X^p))^T ~ N(μ, Σ).
Each X^i need not itself be Gaussian, so all GGM procedures can be applied to non-Gaussian X^i's. But the Gaussian copula assumption can be violated even for commonly used interactions; e.g., marginal normality does not imply joint normality.
7 2.1 Graphoid, semigraphoid
Graphoid axioms. A three-way relation R is called a graphoid if it satisfies the following conditions.
Symmetry: (A, C, B) ∈ R ⇒ (B, C, A) ∈ R
Decomposition: (A, C, B ∪ D) ∈ R ⇒ (A, C, B) ∈ R
Weak union: (A, C, B ∪ D) ∈ R ⇒ (A, C ∪ B, D) ∈ R
Contraction: (A, C ∪ B, D) ∈ R and (A, C, B) ∈ R ⇒ (A, C, B ∪ D) ∈ R
Intersection: (A, C ∪ D, B) ∈ R and (A, C ∪ B, D) ∈ R ⇒ (A, C, B ∪ D) ∈ R
Conditional independence satisfies these axioms.
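The axioms are mechanical enough to check by program. A toy sketch (the set encoding and helper names are ours, not the slides') that tests symmetry and decomposition for a small relation:

```python
from itertools import combinations

def subsets(s):
    """Proper nonempty subsets of a frozenset."""
    s = list(s)
    return (frozenset(c) for r in range(1, len(s))
            for c in combinations(s, r))

def is_symmetric(R):
    # Symmetry: (A, C, B) in R  =>  (B, C, A) in R.
    return all((B, C, A) in R for (A, C, B) in R)

def satisfies_decomposition(R):
    # Decomposition: (A, C, B u D) in R  =>  (A, C, B) in R.
    return all((A, C, B) in R for (A, C, BD) in R for B in subsets(BD))

# Example relation built to be closed under both axioms.
A, C, B, D = (frozenset(x) for x in ("a", "c", "b", "d"))
R = {(A, C, B | D), (A, C, B), (A, C, D),
     (B | D, C, A), (B, C, A), (D, C, A)}
print(is_symmetric(R), satisfies_decomposition(R))
```

Dropping the triple (A, C, B) from R breaks decomposition, which is how such a checker detects a non-(semi)graphoid relation.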
8 2.1 Graphoid, semigraphoid (Cont.)
Semigraphoid axioms. A three-way relation R is called a semigraphoid if it satisfies the graphoid axioms except the intersection condition:
Symmetry: (A, C, B) ∈ R ⇒ (B, C, A) ∈ R
Decomposition: (A, C, B ∪ D) ∈ R ⇒ (A, C, B) ∈ R
Weak union: (A, C, B ∪ D) ∈ R ⇒ (A, C ∪ B, D) ∈ R
Contraction: (A, C ∪ B, D) ∈ R and (A, C, B) ∈ R ⇒ (A, C, B ∪ D) ∈ R
Using this semigraphoid relation in place of probabilistic conditional independence frees us from some awkward restrictions.
9 2.2 Conditional independence, additive conditional independence
Notations. Let X = (X^1, ..., X^p)^T be a random vector. For a subvector U = (U^1, ..., U^r)^T of X, let L_2(P_U) be the class of functions f of U such that E f(U) = 0 and E f^2(U) < ∞. For each U^i, let A_{U^i} denote a subset of L_2(P_{U^i}). Let A_U denote the additive family
A_U = A_{U^1} + ... + A_{U^r} = {f_1 + ... + f_r : f_1 ∈ A_{U^1}, ..., f_r ∈ A_{U^r}}.
In this paper, the inner product ⟨·, ·⟩ denotes the L_2(P_X) inner product:
⟨f, g⟩ = ∫ f(x) g(x) dμ(x).
10 2.2 Conditional independence, additive conditional independence (Cont.)
Additively conditional independent. We say that U and V are additively conditionally independent (ACI) given W iff
(A_U + A_W) ⊖ A_W ⊥ (A_V + A_W) ⊖ A_W,
where ⊖ denotes the orthogonal complement of the second space within the first. We write this relation as U ⫫_A V | W. Additive conditional independence is a three-way relation: U ⫫_A V | W corresponds to the triple (A_U, A_W, A_V).
11 2.2 Conditional independence, additive conditional independence (Cont.)
Theorem. The additive conditional independence relation is a semigraphoid.
Additive semigraphoid model (ASG model). A random vector X follows an ASG model with respect to a graph G = (Γ, E) iff
X^i ⫫_A X^j | X^{-(i,j)} whenever (i, j) ∉ E.
We write this condition as X ~ ASG(G).
12 2.3 Relation with previous models
We now investigate the relation between the new concept, additive conditional independence, and ordinary conditional independence.
Theorem. Suppose (a) X has a Gaussian copula distribution with copula functions f_1, ..., f_p; (b) U, V, and W are subvectors of X; (c) A_{X^i} = span{f_i}, i = 1, ..., p. Then U ⫫_A V | W iff U ⫫ V | W.
Under the Gaussian copula assumption with A_{X^i} = span{f_i}, additive conditional independence is equivalent to conditional independence.
13 2.3 Relation with previous models (Cont.)
Theorem. Suppose (a) X has a Gaussian copula distribution with copula functions f_1, ..., f_p; (b) U, V, and W are subvectors of X; (c) A_{X^i} = L_2(P_{X^i}), i = 1, ..., p. Then U ⫫_A V | W implies U ⫫ V | W.
In this case the two notions are not equivalent; however, equivalence holds approximately.
14 2.3 Relation with previous models (Cont.)
The following proposition suggests that the implication from U ⫫ V | W to U ⫫_A V | W holds approximately.
Proposition. Suppose the Gaussian copula assumption holds and, WLOG, E[f_i(X^i)] = 0 and Var[f_i(X^i)] = 1. Let U = f_i(X^i), V = f_j(X^j), W = {f_k(X^k) : k ≠ i, j}, and let R_VW = cov(V, W), R_WW = Var(W), R_WU = cov(W, U), ρ_UV = cor(U, V). Then
ρ_{UV·W} ≤ max{ |ρ_UV^α − R_VW^α (R_WW^α)^{-1} R_WU^α| : α = 1, 2, ... },
where A^α denotes the α-fold Hadamard product of A.
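The bound is straightforward to evaluate numerically. A sketch (the correlation values and function names are ours, not the paper's simulation settings) that makes the α = 1 term vanish by construction, so the bound reflects only higher-order Hadamard terms:

```python
import numpy as np

def hadamard_bound(rho_uv, R_vw, R_ww, R_wu, max_alpha=10):
    """tau = max over alpha of |rho_UV^a - R_VW^(a) (R_WW^(a))^-1 R_WU^(a)|,
    where M^(a) is the entrywise (Hadamard) alpha-fold power."""
    vals = []
    for a in range(1, max_alpha + 1):
        term = (R_vw ** a) @ np.linalg.inv(R_ww ** a) @ (R_wu ** a)
        vals.append(abs(rho_uv ** a - term.item()))
    return max(vals)

# Scalar U, V and a 2-dimensional W; pick rho_UV so that the partial
# correlation rho_{UV.W} is exactly 0 (i.e., U _||_ V | W at alpha = 1).
R_ww = np.array([[1.0, 0.2], [0.2, 1.0]])
R_wu = np.array([[0.5], [0.3]])
R_vw = np.array([[0.4, 0.1]])
rho_uv = (R_vw @ np.linalg.inv(R_ww) @ R_wu).item()
print(hadamard_bound(rho_uv, R_vw, R_ww, R_wu))
```

The resulting bound is very small, illustrating the slide's point that conditional independence implies additive conditional independence approximately.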
15 2.3 Relation with previous models (Cont.)
Let τ_{UV·W} denote the upper bound on ρ_{UV·W} from the previous proposition. These values are calculated under the assumptions of the proposition plus U ⫫ V | W. The small values of τ_{UV·W} in the table indicate that ρ_{UV·W} is approximately 0; so U ⫫ V | W implies U ⫫_A V | W approximately.
16 3.1 ACCO, APO
In the GGM, X^i ⫫ X^j | X^{-(i,j)} iff (i, j) ∉ E, and with the precision matrix Θ we can cut edges easily. To build the graph in the ASG model, we introduce two linear operators.
Additive conditional covariance operator. The ACCO is the operator from A_V to A_U given by
Σ_{UV·W} = Σ_UV − Σ_UW Σ_WV,
where Σ_UV : A_V → A_U is defined by ⟨f, Σ_UV g⟩ = E[f(U) g(V)]. We call this operator the ACCO from V to U given W.
17 3.1 ACCO, APO (Cont.)
Two important results follow from this operator.
Lemma. Let P_{A_W} : A_X → A_W be the projection onto A_W. Then P_{A_W}|_{A_U} = Σ_WU.
Theorem. U ⫫_A V | W iff Σ_{UV·W} = 0.
18 3.1 ACCO, APO (Cont.)
Calculating the ACCO is difficult because whenever W changes, everything must be recomputed. As in the GGM, we would rather use a precision operator than conditional covariances. To define such an operator we use a different inner product. Let ⊕A_X be the Hilbert space consisting of the same set of functions as A_X, but with the inner product
⟨f_1 + ... + f_p, g_1 + ... + g_p⟩_{⊕A_X} = Σ_{i=1}^p ⟨f_i, g_i⟩_{A_{X^i}}.
19 3.1 ACCO, APO (Cont.)
Additive precision operator. The operator Υ : ⊕A_X → ⊕A_X defined by the matrix of operators {Σ_{X^i X^j} : i ∈ Γ, j ∈ Γ} is called the additive covariance operator. If it is invertible, its inverse Θ is called the APO. Here Υ = {Σ_{X^i X^j} : i ∈ Γ, j ∈ Γ} means that when Υ f_i = h_1 + ... + h_p, we have Σ_{X^j X^i}(f_i) = h_j. One can easily check that for any f, g ∈ ⊕A_X,
⟨f, Υ g⟩_{⊕A_X} = cov[f(X), g(X)] = ⟨f, g⟩.
20 3.1 ACCO, APO (Cont.)
Question: is the APO well defined?
Proposition. Suppose Υ is invertible and, for each i ≠ j, Υ_ij is compact. Then Υ^{-1} is bounded.
Lemma. Suppose T ∈ B(⊕A_X, ⊕A_X) is a self-adjoint, positive definite operator. Then the operators T_UU, T_VV, T_UU − T_UV T_VV^{-1} T_VU, and T_VV − T_VU T_UU^{-1} T_UV are bounded, self-adjoint, and positive definite, and the following identities hold:
(T^{-1})_UU = (T_UU − T_UV T_VV^{-1} T_VU)^{-1}  (1)
(T^{-1})_VV = (T_VV − T_VU T_UU^{-1} T_UV)^{-1}  (2)
(T^{-1})_UV = −(T^{-1})_UU T_UV T_VV^{-1} = −T_UU^{-1} T_UV (T^{-1})_VV  (3)
(T^{-1})_VU = −(T^{-1})_VV T_VU T_UU^{-1} = −T_VV^{-1} T_VU (T^{-1})_UU  (4)
21 3.1 ACCO, APO (Cont.)
The above lemma shows that if Υ has a bounded inverse, Θ is well defined. Finally we arrive at the following equivalence.
Theorem. Suppose (U, V, W) is a partition of X and Υ is invertible. Then U ⫫_A V | W iff Θ_UV = 0.
Corollary. A random vector X follows an ASG(G) model if and only if Θ_{X^i X^j} = 0 whenever (i, j) ∉ E.
22 3.2 Calculation
First, how do we build a graph with the ACCO? If the sample estimate of Σ_{X^i X^j · X^{-(i,j)}} is small enough, we decide there is no edge between X^i and X^j. We must therefore derive the finite-sample matrix representation of the operator.
23 3.2 Calculation (Cont.)
Notations. For each i = 1, ..., p, let κ_i : Ω_{X^i} × Ω_{X^i} → R be a positive kernel. Let
A_{X^i} = span{κ_i(·, X^i_1) − E_n κ_i(X^i, X^i_1), ..., κ_i(·, X^i_n) − E_n κ_i(X^i, X^i_n)},
where E_n κ_i(X^i, X^i_k) = n^{-1} Σ_{l=1}^n κ_i(X^i_l, X^i_k). Let Q = I_n − 1_n 1_n^T / n, a projection matrix. Let κ_i also denote the vector-valued function
x^i ↦ (κ_i(x^i, X^i_1) − E_n κ_i(X^i, X^i_1), ..., κ_i(x^i, X^i_n) − E_n κ_i(X^i, X^i_n))^T.
With this κ_i, any f ∈ A_{X^i} can be expressed as f = κ_i^T [f] for a coefficient vector [f] ∈ R^n.
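The finite-sample building blocks are easy to form numerically. A sketch (the Gaussian kernel is our choice; the slides only require a positive kernel):

```python
import numpy as np

def gram(x, gamma=1.0):
    """Gram matrix K[k, l] = exp(-gamma * (x_k - x_l)^2) for a scalar sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

n = 8
rng = np.random.default_rng(1)
x = rng.normal(size=n)
K = gram(x)
Q = np.eye(n) - np.ones((n, n)) / n

# Q is idempotent (a projection), and Q @ K centers each column of K,
# which is why products like K_i Q K_i appear in the representations below.
print(np.allclose(Q @ Q, Q), np.allclose((Q @ K).sum(axis=0), 0.0))
```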
24 3.2 Calculation (Cont.)
With these notations we obtain
(f(X^i_1), ..., f(X^i_n))^T = Q K_i [f],  (5)
⟨f, g⟩_{A_{X^i}} = n^{-1} (Q K_i [f])^T (Q K_i [g]) = n^{-1} [f]^T K_i Q K_i [g],  (6)
where K_i = {κ_i(X^i_k, X^i_l) : k, l = 1, ..., n} and f, g ∈ A_{X^i}. Let K_{-(i,j)} denote the n(p−2) × n matrix obtained by removing the ith and jth blocks of (K_1, ..., K_p)^T.
Lemma.
K_i Q K_i [Σ_{X^i X^j}] = K_i Q K_j  (7)
K_i Q K_i [Σ_{X^i X^{-(i,j)}}] = K_i Q K_{-(i,j)}^T  (8)
K_{-(i,j)} Q K_{-(i,j)}^T [Σ_{X^{-(i,j)} X^i}] = K_{-(i,j)} Q K_i  (9)
25 3.2 Calculation (Cont.)
K_i Q K_i and K_{-(i,j)} Q K_{-(i,j)}^T are singular, which we handle by Tikhonov regularization:
[Σ_{X^i X^j}] = (K_i Q K_i + ε_1 I_n)^{-1} K_i Q K_j  (10)
[Σ_{X^i X^{-(i,j)}}] = (K_i Q K_i + ε_1 I_n)^{-1} K_i Q K_{-(i,j)}^T  (11)
[Σ_{X^{-(i,j)} X^i}] = (K_{-(i,j)} Q K_{-(i,j)}^T + ε_2 I_{n(p−2)})^{-1} K_{-(i,j)} Q K_i  (12)
We can then represent
[Σ_{X^j X^i · X^{-(i,j)}}] = [Σ_{X^j X^i}] − [Σ_{X^j X^{-(i,j)}}] [Σ_{X^{-(i,j)} X^i}].
26 3.2 Calculation (Cont.)
So we get
[Σ_{X^j X^i · X^{-(i,j)}}] = [Σ_{X^j X^i}] − [Σ_{X^j X^{-(i,j)}}] [Σ_{X^{-(i,j)} X^i}]
= (K_j Q K_j + ε_1 I_n)^{-1} K_j Q K_i − (K_j Q K_j + ε_1 I_n)^{-1} K_j Q K_{-(i,j)}^T (K_{-(i,j)} Q K_{-(i,j)}^T + ε_2 I_{n(p−2)})^{-1} K_{-(i,j)} Q K_i
= (K_j Q K_j + ε_1 I_n)^{-1} K_j Q (I_n − K_{-(i,j)}^T (K_{-(i,j)} Q K_{-(i,j)}^T + ε_2 I_{n(p−2)})^{-1} K_{-(i,j)}) Q K_i.
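The final expression can be assembled directly from Gram matrices. A sketch (toy data; the Gaussian kernel and ε values are our choices) computing the coordinate matrix [Σ_{X^j X^i · X^{-(i,j)}}]:

```python
import numpy as np

def gram(x, gamma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

rng = np.random.default_rng(2)
n, p = 20, 5
X = rng.normal(size=(n, p))
i, j = 0, 1
eps1, eps2 = 1e-2, 1e-2

Q = np.eye(n) - np.ones((n, n)) / n
Ki, Kj = gram(X[:, i]), gram(X[:, j])
# K_{-(i,j)}: the n(p-2) x n matrix stacking the Gram matrices of the
# remaining variables.
K_rest = np.vstack([gram(X[:, k]) for k in range(p) if k not in (i, j)])

# Inner factor I_n - K^T (K Q K^T + eps2 I)^{-1} K from the last display.
inner = np.eye(n) - K_rest.T @ np.linalg.solve(
    K_rest @ Q @ K_rest.T + eps2 * np.eye(n * (p - 2)), K_rest)
Sigma_ji = np.linalg.solve(Kj @ Q @ Kj + eps1 * np.eye(n),
                           Kj @ Q @ inner @ Q @ Ki)
print(Sigma_ji.shape)
```

Thresholding a norm of this matrix is what decides whether the edge (i, j) is kept.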
27 3.2 Calculation (Cont.)
By definition, ‖Σ_{X^j X^i · X^{-(i,j)}}‖² is the maximum of
⟨Σ_{X^j X^i · X^{-(i,j)}} f, Σ_{X^j X^i · X^{-(i,j)}} f⟩_{A_{X^j}} = n^{-1} [f]^T [Σ_{X^j X^i · X^{-(i,j)}}]^T K_j Q K_j [Σ_{X^j X^i · X^{-(i,j)}}] [f]
subject to ⟨f, f⟩_{A_{X^i}} = 1. Let v = (K_i Q K_i + ε_1 I_n)^{1/2} [f], so that [f] = (K_i Q K_i + ε_1 I_n)^{-1/2} v. The right-hand side above becomes
v^T (K_i Q K_i + ε_1 I_n)^{-1/2} [Σ_{X^j X^i · X^{-(i,j)}}]^T K_j Q K_j [Σ_{X^j X^i · X^{-(i,j)}}] (K_i Q K_i + ε_1 I_n)^{-1/2} v
subject to v^T v = 1, which implies that ‖Σ_{X^j X^i · X^{-(i,j)}}‖² is the largest eigenvalue of Λ_{X^i X^j}. (Cont.)
28 3.2 Calculation (Cont.)
Λ_{X^i X^j} = (K_i Q K_i + ε_1 I_n)^{-1/2} [Σ_{X^j X^i · X^{-(i,j)}}]^T K_j Q K_j [Σ_{X^j X^i · X^{-(i,j)}}] (K_i Q K_i + ε_1 I_n)^{-1/2}.
Plugging in [Σ_{X^j X^i · X^{-(i,j)}}], we get Λ_{X^i X^j} = Δ_{X^i X^j}^T Δ_{X^i X^j}, where
Δ_{X^i X^j} = (K_j Q K_j + ε_1 I_n)^{-1/2} K_j Q (I_n − K_{-(i,j)}^T (K_{-(i,j)} Q K_{-(i,j)}^T + ε_2 I_{n(p−2)})^{-1} K_{-(i,j)}) Q K_i (K_i Q K_i + ε_1 I_n)^{-1/2}.
So we need the largest singular value of Δ_{X^i X^j}.
29 3.2 Calculation (Cont.)
The matrix K_{-(i,j)} Q K_{-(i,j)}^T + ε_2 I_{n(p−2)} is large and must be inverted, but the following proposition shows that the actual amount of computation is not large.
Proposition. Let V ∈ R^{s×t} with s > t, with SVD V = (L_1, L_0) diag(D, 0) (R_1, R_0)^T, where D ∈ R^{u×u}. Then
(V V^T + ε I_s)^{-1} = V R_1 D^{-1} ((D² + ε I_u)^{-1} − ε^{-1} I_u) D^{-1} R_1^T V^T + ε^{-1} I_s,
V^T (V V^T + ε I_s)^{-1} V = R_1 D² (D² + ε I_u)^{-1} R_1^T.
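Both identities are easy to verify numerically on a random matrix; a sketch (the dimensions and ε are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
s, t, eps = 12, 4, 0.1
V = rng.normal(size=(s, t))

# Thin SVD: V = U_ diag(d) Rt, so R1 = Rt.T and D = diag(d) as in the
# proposition (L_0, R_0 blocks are not needed for the identities).
U_, d, Rt = np.linalg.svd(V, full_matrices=False)
R1, D = Rt.T, np.diag(d)
u = len(d)

lhs1 = np.linalg.inv(V @ V.T + eps * np.eye(s))
rhs1 = (V @ R1 @ np.linalg.inv(D)
        @ (np.linalg.inv(D @ D + eps * np.eye(u)) - np.eye(u) / eps)
        @ np.linalg.inv(D) @ R1.T @ V.T + np.eye(s) / eps)

lhs2 = V.T @ lhs1 @ V
rhs2 = R1 @ D @ D @ np.linalg.inv(D @ D + eps * np.eye(u)) @ R1.T

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2))
```

Only the u × u matrix D² + εI_u is ever inverted, which is the computational point of the proposition.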
30 3.2 Calculation (Cont.)
Second, we calculate Θ_{X^i X^j}. Similarly, if the sample estimate of Θ_{X^i X^j} is small enough, we decide there is no edge between X^i and X^j.
Theorem. Let K_{1:p} = (K_1, ..., K_p)^T. Then the matrix representation of Υ satisfies
diag(K_1 Q K_1, ..., K_p Q K_p) [Υ] = K_{1:p} Q K_{1:p}^T,
so we can obtain [Θ] as
[Θ] = (K_{1:p} Q K_{1:p}^T + ε_3 I_{np})^{-1} diag(K_1 Q K_1, ..., K_p Q K_p).
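Forming [Θ] from the theorem is direct. A sketch (toy data; the Gaussian kernel and ε_3 are our choices):

```python
import numpy as np

def gram(x, gamma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

rng = np.random.default_rng(5)
n, p, eps3 = 15, 4, 1e-2
X = rng.normal(size=(n, p))
Q = np.eye(n) - np.ones((n, n)) / n

Ks = [gram(X[:, k]) for k in range(p)]
K_stack = np.vstack(Ks)                 # K_{1:p}: an np x n matrix

# Block-diagonal right-hand side diag(K_1 Q K_1, ..., K_p Q K_p).
D = np.zeros((n * p, n * p))
for k, K in enumerate(Ks):
    D[k * n:(k + 1) * n, k * n:(k + 1) * n] = K @ Q @ K

# [Theta] = (K_{1:p} Q K_{1:p}^T + eps3 I_{np})^{-1} diag(...); its
# (j, i) n x n block represents Theta_{X^j X^i}.
Theta = np.linalg.solve(K_stack @ Q @ K_stack.T + eps3 * np.eye(n * p), D)
print(Theta.shape)
```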
31 3.2 Calculation (Cont.)
To find ‖Θ_{X^j X^i}‖, we maximize
⟨Θ_{X^j X^i} f_i, Θ_{X^j X^i} f_i⟩_{A_{X^j}} = n^{-1} [f_i]^T [Θ_{X^j X^i}]^T K_j Q K_j [Θ_{X^j X^i}] [f_i]
subject to ⟨f_i, f_i⟩_{A_{X^i}} = 1. As before, ‖Θ_{X^j X^i}‖² is the largest eigenvalue of
(K_i Q K_i + ε_1 I_n)^{-1/2} [Θ_{X^j X^i}]^T K_j Q K_j [Θ_{X^j X^i}] (K_i Q K_i + ε_1 I_n)^{-1/2},
i.e. ‖Θ_{X^j X^i}‖ is the largest singular value of
(K_i Q K_i + ε_1 I_n)^{-1/2} [Θ_{X^j X^i}]^T (K_j Q K_j + ε_1 I_n)^{1/2}.
32 3.3 Tuning parameters
Because of the Tikhonov regularization, we used three tuning parameters ε_1, ε_2, ε_3 to invert K_i Q K_i, K_{-(i,j)} Q K_{-(i,j)}^T, and K_{1:p} Q K_{1:p}^T. The paper proposes a cross-validation procedure to select these parameters.
Notations. Let U and V be random vectors and W = (U^T, V^T)^T. Let F = {f_0, f_1, ..., f_r} be a set of functions of U and, similarly, G = {g_0, g_1, ..., g_s} a set of functions of V; f_0 and g_0 denote the constant function 1. For an iid sample {W_1, ..., W_n}, let A = {a_1, ..., a_m} ⊂ {1, 2, ..., n} and B = A^c. Let {W_a : a ∈ A} be the training set and {W_b : b ∈ B} the testing set. (Cont.)
33 3.3 Tuning parameters (Cont.)
Suppose Ê^{(ε)}_{V|U} : G → F is a training-set-based operator such that, for each g ∈ G, Ê^{(ε)}_{V|U} g is the best prediction of g(V) based on functions in F. Then the total error is evaluated as
Σ_{b∈B} Σ_{μ=0}^s [g_μ(V_b) − (Ê^{(ε)}_{V|U} g_μ)(U_b)]².
Let L_U be the matrix whose rows are the evaluations of f_0, ..., f_r at the training points {U_a : a ∈ A}, and L_V the analogous matrix for g_0, ..., g_s. Let ℓ_U and ℓ_V be the vector-valued functions (f_0, ..., f_r)^T and (g_0, ..., g_s)^T. Then the total error can be expressed as
Σ_{b∈B} ‖ℓ_V(V_b) − L_V L_U^T (L_U L_U^T + ε I_{r+1})^{-1} ℓ_U(U_b)‖².
34 3.3 Tuning parameters (Cont.)
If G_U is the matrix whose bth column is ℓ_U(U_b), and G_V the matrix whose bth column is ℓ_V(V_b), then the total error simplifies further to
‖G_V − L_V L_U^T (L_U L_U^T + ε I_{r+1})^{-1} G_U‖².
We now apply this cross-validation procedure to choose our parameters:
CV_1^ν(ε_1) = Σ_{i=1}^p Σ_{b∈B} ‖ℓ_i(X^i_b) − L_i L_i^T (L_i L_i^T + ε_1 I_{m+1})^{-1} ℓ_i(X^i_b)‖² = Σ_{i=1}^p ‖G_i − L_i L_i^T (L_i L_i^T + ε_1 I_{m+1})^{-1} G_i‖²,
CV_1(ε_1) = Σ_{ν=1}^k CV_1^ν(ε_1),
where ν indexes the k folds of k-fold CV. Choose ε_1 to minimize CV_1(ε_1).
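The CV_1 criterion reduces to matrix algebra over a grid of candidate ε_1 values. A sketch (random feature matrices stand in for the real L_i and G_i; the grid is our own):

```python
import numpy as np

def cv_score(L, G, eps):
    """||G - L L^T (L L^T + eps I)^{-1} G||^2 for one variable/fold."""
    m1 = L.shape[0]  # number of basis functions
    P = L @ L.T @ np.linalg.solve(L @ L.T + eps * np.eye(m1), G)
    return float(np.sum((G - P) ** 2))

rng = np.random.default_rng(4)
p, m1, nb = 3, 6, 5
# Hypothetical per-variable matrices: L_i built from the training set,
# G_i from the held-out points.
Ls = [rng.normal(size=(m1, m1)) for _ in range(p)]
Gs = [rng.normal(size=(m1, nb)) for _ in range(p)]

grid = [1e-3, 1e-2, 1e-1, 1.0]
scores = [sum(cv_score(L, G, e) for L, G in zip(Ls, Gs)) for e in grid]
best = grid[int(np.argmin(scores))]
print(best, [round(s, 2) for s in scores])
```

In practice the sum would also run over the k folds, and the minimizing ε_1 on the grid is the selected tuning parameter.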
35 3.3 Tuning parameters (Cont.)
CV_2^ν(ε_2) = Σ_{i>j} ‖G_{-(i,j)} − L_{-(i,j)} L_{-(i,j)}^T (L_{-(i,j)} L_{-(i,j)}^T + ε_2 I_{m(p−2)+1})^{-1} G_{-(i,j)}‖²,
CV_2(ε_2) = Σ_{ν=1}^k CV_2^ν(ε_2).
Choose ε_2 to minimize CV_2(ε_2). For ε_3, the paper recommends taking ε_3 equal to ε_2, because K_{-(i,j)} Q K_{-(i,j)}^T and K_{1:p} Q K_{1:p}^T have similar forms and dimensions.
36 4. Simulation results
The authors present several comparisons to assess their method; we show some of them.
Comparison 1: when the Gaussian copula assumption is violated.
37 4. Simulation results (Cont.)
Comparison 2: when the Gaussian assumption and the Gaussian copula assumption are satisfied.
38 4. Simulation results (Cont.)
Comparison 3: when the true model is a non-additive model.
More informationIntroduction to Machine Learning (67577) Lecture 3
Introduction to Machine Learning (67577) Lecture 3 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem General Learning Model and Bias-Complexity tradeoff Shai Shalev-Shwartz
More informationGaussians. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
Gaussians Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Outline Univariate Gaussian Multivariate Gaussian Law of Total Probability Conditioning
More informationTAMS39 Lecture 2 Multivariate normal distribution
TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationChapter 4 Euclid Space
Chapter 4 Euclid Space Inner Product Spaces Definition.. Let V be a real vector space over IR. A real inner product on V is a real valued function on V V, denoted by (, ), which satisfies () (x, y) = (y,
More informationKernel Principal Component Analysis
Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationOn the inverse matrix of the Laplacian and all ones matrix
On the inverse matrix of the Laplacian and all ones matrix Sho Suda (Joint work with Michio Seto and Tetsuji Taniguchi) International Christian University JSPS Research Fellow PD November 21, 2012 Sho
More informationLecture 7: Positive Semidefinite Matrices
Lecture 7: Positive Semidefinite Matrices Rajat Mittal IIT Kanpur The main aim of this lecture note is to prepare your background for semidefinite programming. We have already seen some linear algebra.
More informationLecture 15: Multivariate normal distributions
Lecture 15: Multivariate normal distributions Normal distributions with singular covariance matrices Consider an n-dimensional X N(µ,Σ) with a positive definite Σ and a fixed k n matrix A that is not of
More informationConsistent Bivariate Distribution
A Characterization of the Normal Conditional Distributions MATSUNO 79 Therefore, the function ( ) = G( : a/(1 b2)) = N(0, a/(1 b2)) is a solu- tion for the integral equation (10). The constant times of
More informationCSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization
CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization April 6, 2018 1 / 34 This material is covered in the textbook, Chapters 9 and 10. Some of the materials are taken from it. Some of
More informationLinear algebra for computational statistics
University of Seoul May 3, 2018 Vector and Matrix Notation Denote 2-dimensional data array (n p matrix) by X. Denote the element in the ith row and the jth column of X by x ij or (X) ij. Denote by X j
More informationNext tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2
Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution Defn: Z R 1 N(0,1) iff f Z (z) = 1 2π e z2 /2 Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) (a column
More informationDirect Limits. Mathematics 683, Fall 2013
Direct Limits Mathematics 683, Fall 2013 In this note we define direct limits and prove their basic properties. This notion is important in various places in algebra. In particular, in algebraic geometry
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationwhich is a group homomorphism, such that if W V U, then
4. Sheaves Definition 4.1. Let X be a topological space. A presheaf of groups F on X is a a function which assigns to every open set U X a group F(U) and to every inclusion V U a restriction map, ρ UV
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationStat 206: Linear algebra
Stat 206: Linear algebra James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Vectors We have already been working with vectors, but let s review a few more concepts. The inner product of two
More informationECON 5111 Mathematical Economics
Test 1 October 1, 2010 1. Construct a truth table for the following statement: [p (p q)] q. 2. A prime number is a natural number that is divisible by 1 and itself only. Let P be the set of all prime numbers
More informationMaths for Signals and Systems Linear Algebra in Engineering
Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE
More informationAppendices to the paper "Detecting Big Structural Breaks in Large Factor Models" (2013) by Chen, Dolado and Gonzalo.
Appendices to the paper "Detecting Big Structural Breaks in Large Factor Models" 203 by Chen, Dolado and Gonzalo. A.: Proof of Propositions and 2 he proof proceeds by showing that the errors, factors and
More informationMAT 445/ INTRODUCTION TO REPRESENTATION THEORY
MAT 445/1196 - INTRODUCTION TO REPRESENTATION THEORY CHAPTER 1 Representation Theory of Groups - Algebraic Foundations 1.1 Basic definitions, Schur s Lemma 1.2 Tensor products 1.3 Unitary representations
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More information4.4 Noetherian Rings
4.4 Noetherian Rings Recall that a ring A is Noetherian if it satisfies the following three equivalent conditions: (1) Every nonempty set of ideals of A has a maximal element (the maximal condition); (2)
More informationStable Process. 2. Multivariate Stable Distributions. July, 2006
Stable Process 2. Multivariate Stable Distributions July, 2006 1. Stable random vectors. 2. Characteristic functions. 3. Strictly stable and symmetric stable random vectors. 4. Sub-Gaussian random vectors.
More informationElementary linear algebra
Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The
More informationP (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n
JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More informationSummary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff
Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff David Gerard Department of Statistics University of Washington gerard2@uw.edu May 2, 2013 David Gerard (UW)
More informationMathematical Analysis Outline. William G. Faris
Mathematical Analysis Outline William G. Faris January 8, 2007 2 Chapter 1 Metric spaces and continuous maps 1.1 Metric spaces A metric space is a set X together with a real distance function (x, x ) d(x,
More informationELEMENTARY SUBALGEBRAS OF RESTRICTED LIE ALGEBRAS
ELEMENTARY SUBALGEBRAS OF RESTRICTED LIE ALGEBRAS J. WARNER SUMMARY OF A PAPER BY J. CARLSON, E. FRIEDLANDER, AND J. PEVTSOVA, AND FURTHER OBSERVATIONS 1. The Nullcone and Restricted Nullcone We will need
More informationFilters in Analysis and Topology
Filters in Analysis and Topology David MacIver July 1, 2004 Abstract The study of filters is a very natural way to talk about convergence in an arbitrary topological space, and carries over nicely into
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More informationIndependence for Full Conditional Measures, Graphoids and Bayesian Networks
Independence for Full Conditional Measures, Graphoids and Bayesian Networks Fabio G. Cozman Universidade de Sao Paulo Teddy Seidenfeld Carnegie Mellon University February 28, 2007 Abstract This paper examines
More informationA Magiv CV Theory for Large-Margin Classifiers
A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector
More informationSTAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song
STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April
More information6-1 Study Guide and Intervention Multivariable Linear Systems and Row Operations
6-1 Study Guide and Intervention Multivariable Linear Systems and Row Operations Gaussian Elimination You can solve a system of linear equations using matrices. Solving a system by transforming it into
More informationMean-field dual of cooperative reproduction
The mean-field dual of systems with cooperative reproduction joint with Tibor Mach (Prague) A. Sturm (Göttingen) Friday, July 6th, 2018 Poisson construction of Markov processes Let (X t ) t 0 be a continuous-time
More informationSlide05 Haykin Chapter 5: Radial-Basis Function Networks
Slide5 Haykin Chapter 5: Radial-Basis Function Networks CPSC 636-6 Instructor: Yoonsuck Choe Spring Learning in MLP Supervised learning in multilayer perceptrons: Recursive technique of stochastic approximation,
More informationSection 8.1. Vector Notation
Section 8.1 Vector Notation Definition 8.1 Random Vector A random vector is a column vector X = [ X 1 ]. X n Each Xi is a random variable. Definition 8.2 Vector Sample Value A sample value of a random
More informationSample Geometry. Edps/Soc 584, Psych 594. Carolyn J. Anderson
Sample Geometry Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois Spring
More informationPartitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12
Partitioned Covariance Matrices and Partial Correlations Proposition 1 Let the (p + q (p + q covariance matrix C > 0 be partitioned as ( C11 C C = 12 C 21 C 22 Then the symmetric matrix C > 0 has the following
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationMeasure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond
Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More information