Bayesian model selection in graphs by using BDgraph package A. Mohammadi and E. Wit March 26, 2013
MOTIVATION Flow cytometry data with 11 proteins from Sachs et al. (2005)
RESULT FOR CELL SIGNALING DATA
Gaussian graphical model Respect to graph G = (V, E) as M G = { } N p (0, Σ) K = Σ 1 is positive definite based on G Pairwise Markov property X i X j X V\{i,j} k ij = 0,
BIRTH-DEATH PROCESS Spacial birth-death process: Preston (1976) Birth-death MCMC: Stephen (2000) in mixture models
BIRTH-DEATH MCMC DESIGN General birth-death process Continuous Markov process Birth and death events are independent Poisson processes Time of birth or death event is exponentially distributed Birth-death process in GGM Adding new edge in birth and deleting edge in death time
SIMPLE CASE
SIMPLE CASE
SIMPLE CASE
BALANCE CONDITION Preston (1976): Backward Kolmogorov Under Balance condition, process converges to unique stationary distribution. Mohammadi and Wit (2013): BDMCMC in GGM Stationary distribution = Posterior distribution of (G,K) So, relative sojourn time in graph G = posterior probability of G
PROPOSED BDMCMC ALGORITHM Step 1: (a). Calculate birth and death rates β ξ (K) = λ b, new link ξ = (i, j) δ ξ (K) = b ξ(k ξ )p(g ξ, K ξ x) λ b, existing link ξ = (i, j) p(g, K x) (b). Calculate waiting time, (c). Simulate type of jump, birth or death Step 2: Sampling new precision matrix: K + ξ or K ξ
PROPOSED PRIOR DISTRIBUTIONS Prior for graph Discrete Uniform Truncated Poisson according to number of links Prior for precision matrix G-Wishart: W G (b, D) p(k G) K (b 2)/2 exp { 1 } 2 tr(dk) I G (b, D) = K (b 2)/2 exp { 1 } P G 2 tr(dk) dk
SPECIFIC ELEMENT OF BDMCMC Sampling from G-Wishart distribution Block Gibbs sampler Edgewise block Gibbs sampler According to maximum cliques Metropolis-Hastings algorithm Computing death rates δ ξ (K) = p(g ξ, K ξ x) p(g, K x) γ bb ξ (k ξ ) = I G (b, D) ( K ξ (b, D) K I G ξ ) (b 2)/2 { exp 1 } 2 tr(d (K ξ K)) γ b
Ratio of normalizing constant I G (b, D) I G ξ(b, D) = 2 Γ πt ii t jj Γ ( ) b+νi 2 ( b+νi 1 2 ) E G [f T (ψ ν )] E G ξ [f T (ψ ν )] Plot for ratio of normalizing constants ratio of expectation 1.00 1.05 1.10 1.15 1.20 0 50 100 150 200 250 300 number of nodes
Death rates for high-dimensional cases δ ξ (K) = 2 πt ii t jj Γ ( ) b+νi 2 ( ) b+νi 1 2 Γ { exp 1 2 tr(d (K ξ K)) ) (b 2)/2 ( K ξ K } γ b b ξ (k ξ ) BDgraph package bdmcmc.high : for high-dimensional graphs bdmcmc.low : for low-dimensional graphs
SIMULATION: 8 NODES } M G = {N 8 (0, Σ) K = Σ 1 P G 1.5 0 0 0 0 0.4 1.5 0 0 0 0 0 1.5 0 0 0 0 K = 1.5 0 0 0 1.5 0 0 1.5 0 1.5 1
SOME RESULT Effect of Sample size Number of data 20 30 40 60 80 100 p(true graph data) 0.018 0.067 0.121 0.2 0.22 0.35 false positive 0 0 0 0 0 0 false negative 1 0 0 0 0 0
SIMULATION: 120 NODES M G = {N 120 (0, Σ) K = Σ 1 P G }, n = 2000 7260 Priors: K W G (3, I 120 ) and G TU(all possible graphs) 10000 iterations and 5000 iterations as burn-in Result Time 4 hours p(true graph data) = 0.09 which is most probable graph
SUMMARY
Thanks for your attention References MOHAMMADI, A. AND E. C. WIT (2013) Gaussian graphical model determination based on birth-death MCMC inference, arxiv preprint arxiv:1210.5371v4 WANG, H. AND S. LI (2012) Efficient Gaussian graphical model determination under G-Wishart prior distributions. Electronic Journal of Statistics, 6:168-198 ATAY-KAYIS, A. AND H. MASSAM (2005) A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika Trust, 92(2):317-335 PRESTON, C. J. (1976) Special birth-and-death processes. Bull. Inst. Internat. Statist., 34:1436-1462