Deciphering and modeling heterogeneity in interaction networks
Deciphering and modeling heterogeneity in interaction networks (using variational approximations)
S. Robin, INRA / AgroParisTech
Mathematical Modeling of Complex Systems, December 2013, Ecole Centrale de Paris
Outline
- State-space models for networks (incl. SBM)
- Variational inference (incl. SBM)
- (Variational) Bayesian model averaging
- Towards W-graphs
Heterogeneity in interaction networks
Understanding network structure
Networks describe interactions between entities. Observed networks display heterogeneous topologies that one would like to decipher and better understand.
Examples: dolphin social network; hyperlink network. [Newman and Girvan (2004)]
Modelling network heterogeneity
Latent variable models make it possible to capture the underlying structure of a network.
General setting for binary graphs [Bollobás et al. (2007)]:
- A latent (unobserved) variable Z_i is associated with each node: {Z_i} iid ~ π.
- Edges Y_ij = I{i ~ j} are independent conditionally on the Z_i's: {Y_ij} independent | {Z_i}, with Pr{Y_ij = 1} = γ(Z_i, Z_j).
We focus here on model-based approaches, in contrast with, e.g., graph clustering [Girvan and Newman (2002)], [Newman (2004)] or spectral clustering [von Luxburg et al. (2008)].
Latent space models
State-space model: principle.
- Consider n nodes (i = 1..n).
- Z_i = unobserved position of node i, e.g. {Z_i} iid ~ N(0, I).
- Edges {Y_ij} independent given {Z_i}, e.g. Pr{Y_ij = 1} = γ(Z_i, Z_j) with γ(z, z′) = f(‖z − z′‖).
[Figure: simulated latent positions Z and the resulting adjacency matrix Y.]
A variety of state-space models
Continuous latent position models:
- [Hoff et al. (2002)]: Z_i ∈ R^d, logit[γ(z, z′)] = a − ‖z − z′‖.
- [Handcock et al. (2007)]: Z_i ~ Σ_k p_k N_d(µ_k, σ_k² I).
- [Lovász and Szegedy (2006)]: Z_i ~ U[0,1], γ(z, z′): [0,1]² → [0,1] =: the graphon function.
- [Daudin et al. (2010)]: Z_i ∈ S_K (the simplex), γ(z, z′) = Σ_{k,l} z_k z′_l γ_kl.
Stochastic Block Model (SBM)
A mixture model for random graphs [Nowicki and Snijders (2001)]:
- Consider n nodes (i = 1..n).
- Z_i = unobserved label of node i: {Z_i} iid ~ M(1; π), with π = (π_1, ..., π_K).
- Edge Y_ij depends on the labels: {Y_ij} independent given {Z_i}, with Pr{Y_ij = 1} = γ(Z_i, Z_j).
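To make this generative model concrete, here is a minimal simulation sketch (not part of the original slides; the parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sbm(n, pi, gamma, rng):
    """Draw labels Z_i ~ M(1; pi), then edges Y_ij ~ Bern(gamma[Z_i, Z_j])."""
    Z = rng.choice(len(pi), size=n, p=pi)        # latent labels
    P = gamma[np.ix_(Z, Z)]                      # edge probabilities gamma(Z_i, Z_j)
    Y = (rng.random((n, n)) < P).astype(int)     # edges, independent given Z
    Y = np.triu(Y, 1)                            # keep i < j ...
    return Z, Y + Y.T                            # ... then symmetrize (undirected, no self-loops)

pi = np.array([0.5, 0.3, 0.2])                   # made-up mixing proportions
gamma = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.6, 0.1],
                  [0.1, 0.1, 0.4]])              # made-up connectivity matrix
Z, Y = simulate_sbm(200, pi, gamma, rng)
```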
Variational inference
Joint work with J.-J. Daudin, S. Gazal, F. Picard. [Daudin et al. (2008)], [Gazal et al. (2012)]
Incomplete data models
Aim. Based on the observed network Y = (Y_ij), we want to infer
- the parameters θ = (π, γ),
- the hidden states Z = (Z_i).
State-space models belong to the class of incomplete data models, as the edges (Y_ij) are observed but the latent positions (or statuses) (Z_i) are not: the usual issue in unsupervised classification.
Frequentist maximum likelihood inference
Likelihood. The (log-)likelihood log P(Y; θ) = log Σ_Z P(Y, Z; θ) cannot be computed.
EM trick. But it can be decomposed as
    log P(Y; θ) = log P(Y, Z; θ) − log P(Z | Y; θ),
the conditional expectation of which (given Y) yields
    log P(Y; θ) = E[log P(Y, Z; θ) | Y] − E[log P(Z | Y; θ) | Y]
                = E[log P(Y, Z; θ) | Y] + H[P(Z | Y; θ)],
where H stands for the entropy.
EM algorithm
Aims at maximizing the log-likelihood log P(Y; θ) through the alternation of two steps [Dempster et al. (1977)]:
- M-step: maximize E[log P(Y, Z; θ) | Y] with respect to θ; generally similar to standard MLE.
- E-step: calculate P(Z | Y) (at least some of its moments), which is
  - sometimes straightforward (independent mixture models: Bayes' formula; see the sketch after this list),
  - sometimes tricky but doable (HMMs: forward-backward recursion),
  - sometimes impossible (SBM: ...).
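For the straightforward case, the E-step is literally Bayes' formula. A hedged toy sketch for a two-component 1-D Gaussian mixture (all numbers made up):

```python
import numpy as np
from scipy.stats import norm

y = np.array([-2.1, -1.8, 0.2, 1.9, 2.3])       # toy observations
pi = np.array([0.5, 0.5])                       # current mixing proportions
mu, sigma = np.array([-2.0, 2.0]), 1.0          # current component parameters

# E-step (Bayes' formula): tau_ik = P(Z_i = k | y_i), normalized over k
tau = pi * norm.pdf(y[:, None], loc=mu, scale=sigma)
tau /= tau.sum(axis=1, keepdims=True)

# M-step (sketch): weighted versions of the standard MLEs
pi_new = tau.mean(axis=0)
mu_new = (tau * y[:, None]).sum(axis=0) / tau.sum(axis=0)
```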
Conditional distribution of Z
EM works provided that we can calculate P(Z | Y; θ)... but we cannot for state-space models for graphs.
Conditional distribution. The dependency graph of Z given Y is a clique:
- no factorization can be hoped for (unlike for HMMs);
- P(Z | Y; θ) cannot be computed (efficiently).
Variational techniques may help, as they provide Q(Z) ≈ P(Z | Y).
Variational inference
Lower bound of the log-likelihood. For any distribution Q(Z) [Jaakkola (2000)], [Wainwright and Jordan (2008)],
    log P(Y) ≥ log P(Y) − KL[Q(Z), P(Z | Y)]
             = ∫ Q(Z) log P(Y, Z) dZ − ∫ Q(Z) log Q(Z) dZ
             = E_Q[log P(Y, Z)] + H[Q(Z)].
Link with EM. This is similar to log P(Y) = E[log P(Y, Z) | Y] + H[P(Z | Y)], replacing P(Z | Y) with Q(Z).
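A small numeric sanity check of this bound on a toy discrete model (the joint probabilities are made up): for any Q, E_Q[log P(Y, Z)] + H[Q] lower-bounds log P(Y), with equality at Q = P(Z | Y).

```python
import numpy as np

p_yz = np.array([0.10, 0.25, 0.05])     # made-up joint P(Y = y, Z = z), 3 states of Z
log_evidence = np.log(p_yz.sum())       # log P(Y = y)

def elbo(q):
    return np.sum(q * np.log(p_yz)) - np.sum(q * np.log(q))

q_arbitrary = np.array([0.3, 0.3, 0.4])
q_exact = p_yz / p_yz.sum()             # the exact conditional P(Z | Y)

assert elbo(q_arbitrary) <= log_evidence
assert abs(elbo(q_exact) - log_evidence) < 1e-9   # equality at the exact posterior
```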
Variational EM algorithm
Variational EM:
- M-step: compute θ̂ = arg max_θ E_Q[log P(Y, Z; θ)].
- E-step: replace the calculation of P(Z | Y) with the search for Q*(Z) = arg min_{Q ∈ Q} KL[Q(Z), P(Z | Y)].
Taking Q = {all possible distributions} gives Q*(Z) = P(Z | Y)... like EM does.
Variational approximations rely on the choice of a set Q of good and tractable distributions.
Variational EM for SBM
Distribution class. Q = the set of factorisable distributions:
    Q = {Q : Q(Z) = Π_i Q_i(Z_i)},   Q_i(Z_i) = Π_k τ_ik^{Z_ik}.
The approximate joint distribution is Q(Z_i, Z_j) = Q_i(Z_i) Q_j(Z_j). The optimal approximation within this class satisfies a fixed-point relation,
    τ_ik ∝ π_k Π_{j≠i} Π_l [f_kl(Y_ij)]^{τ_jl},
also known as the mean-field approximation in physics [Parisi (1988)].
Variational estimates. There is no general statistical guarantee for variational estimates. SBM is a very specific case for which the variational approximation is asymptotically exact [Celisse et al. (2012)], [Mariadassou and Matias (2012)].
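A hedged NumPy sketch of this fixed-point update for a binary SBM, taking f_kl(y) = γ_kl^y (1 − γ_kl)^{1−y} (the Bernoulli emission; the iteration count and initialization are illustrative, not prescribed by the slides):

```python
import numpy as np

def mean_field_e_step(Y, pi, gamma, tau, n_iter=50):
    """Iterate tau_ik ∝ pi_k prod_{j≠i} prod_l f_kl(Y_ij)^tau_jl, in log scale."""
    log_g, log_1mg = np.log(gamma), np.log(1.0 - gamma)
    for _ in range(n_iter):
        log_tau = (np.log(pi)
                   + Y @ tau @ log_g.T             # sum_j,l Y_ij tau_jl log gamma_kl
                   + (1.0 - Y) @ tau @ log_1mg.T   # sum_j,l (1 - Y_ij) tau_jl log(1 - gamma_kl)
                   - tau @ log_1mg.T)              # drop the j = i term (Y_ii = 0)
        log_tau -= log_tau.max(axis=1, keepdims=True)   # stabilize before exponentiating
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)
    return tau
```

The M-step would then update π_k as the average of τ_·k and γ_kl as a τ-weighted edge frequency.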
Variational Bayes inference
Bayesian perspective. θ = (π, γ) is random. Model: P(θ), P(Z | π), P(Y | γ, Z).
Bayesian inference. The aim is then to get the joint conditional distribution of the parameters and of the hidden variables: P(θ, Z | Y).
Variational Bayes algorithm
Variational Bayes. As P(θ, Z | Y) is intractable, one looks for
    Q*(θ, Z) = arg min_{Q ∈ Q} KL[Q(θ, Z), P(θ, Z | Y)],
taking, e.g., Q = {Q(θ, Z) = Q_θ(θ) Q_Z(Z)}.
Variational Bayes EM (VBEM). When P(Z, Y | θ) belongs to the exponential family and P(θ) is the corresponding conjugate prior, Q* can be obtained iteratively as [Beal and Ghahramani (2003)]
    log Q_θ^h(θ) ∝ E_{Q_Z^{h−1}}[log P(Z, Y, θ)],   log Q_Z^h(Z) ∝ E_{Q_θ^h}[log P(Z, Y, θ)].
Application to SBM: [Latouche et al. (2012)].
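As an illustration of what the conjugate updates could look like for a binary SBM with Dirichlet(π) and Beta(γ_kl) priors, here is a hedged sketch of the hyperparameter step (directed-graph counting with no self-loops; the names and prior values are illustrative, not taken from the slides):

```python
import numpy as np

def vb_hyper_updates(Y, tau, alpha0=1.0, eta0=1.0, zeta0=1.0):
    """Update Dirichlet/Beta hyperparameters given the variational posteriors tau."""
    alpha = alpha0 + tau.sum(axis=0)                   # Dirichlet parameters for pi
    n_k = tau.sum(axis=0)
    edges = tau.T @ Y @ tau                            # expected edge counts between blocks
    pairs = n_k[:, None] * n_k[None, :] - tau.T @ tau  # expected ordered pairs, i != j
    eta = eta0 + edges                                 # Beta "success" parameters for gamma_kl
    zeta = zeta0 + pairs - edges                       # Beta "failure" parameters
    return alpha, eta, zeta
```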
VBEM: simulation study
[Figure: width of the posterior credibility intervals for π_1, γ_11, γ_12 and γ_22.]
SBM analysis of E. coli operon networks
Meta-graph representation and parameter estimates, K = 5. [Picard et al. (2009)]
(Variational) Bayesian model averaging
Joint work with M.-L. Martin-Magniette, S. Volant. [Volant et al. (2012)]
Model choice
Model selection. The number of classes K generally needs to be estimated.
- In the frequentist setting, an approximate ICL criterion can be derived:
    ICL = E[log P(Y, Z) | Y] − (1/2) { K(K+1)/2 log[n(n−1)/2] + (K−1) log n }.
- In the Bayesian setting, exact versions of the BIC and ICL criteria can be calculated (up to the variational approximation) as log P(Y, K) and log P(Y, Z, K) [Latouche et al. (2012)].
But, in some applications, selecting a single K may be useless or meaningless, and model averaging may be preferred.
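For concreteness, the penalty term of the approximate ICL reconstructed above can be computed as follows (a sketch based on that transcription, counting K(K+1)/2 connectivity parameters and K−1 free proportions):

```python
import numpy as np

def icl_penalty(K, n):
    """Half the parameter counts times the log of the matching sample sizes."""
    n_pairs = n * (n - 1) / 2                    # number of possible (undirected) edges
    return 0.5 * (K * (K + 1) / 2 * np.log(n_pairs) + (K - 1) * np.log(n))
```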
Bayesian model averaging (BMA)
General principle [Hoeting et al. (1999)]. Let Δ be a parameter that can be defined under a series of different models {M_K}_K, and denote P_K(Δ | Y) its posterior distribution under model M_K. The posterior distribution of Δ can then be averaged over all models as
    P(Δ | Y) = Σ_K P(M_K | Y) P_K(Δ | Y).
Remarks.
- w_K = P(M_K | Y) is the weight given to model M_K for the estimation of Δ.
- Calculating w_K is not easy, but the variational approximation may help.
Variational Bayesian model averaging (VBMA)
Variational Bayes formulation. M_K can be viewed as one more hidden layer. Variational Bayes then aims at finding
    Q*(K, θ, Z) = arg min_{Q ∈ Q} KL[Q(K, θ, Z), P(K, θ, Z | Y)]
with Q = {Q(θ, Z, K) = Q_θ(θ | K) Q_Z(Z | K) Q_K(K)}, which requires no additional approximation w.r.t. the regular VBEM.
Optimal variational weights:
    Q_K(K) ∝ P(K) exp{log P(Y | K) − KL[Q*(Z, θ | K), P(Z, θ | Y, K)]}
           ∝ P(K | Y) e^{−KL[Q*(Z, θ | K), P(Z, θ | Y, K)]}.
VBMA: the recipe
For K = 1, ..., K_max:
- use regular VBEM to compute the approximate conditional posterior Q*_{Z,θ|K}(Z, θ | K) = Q*_{Z|K}(Z | K) Q*_{θ|K}(θ | K);
- compute the conditional lower bound of the log-likelihood L_K = log P(Y | K) − KL[Q*(Z, θ | K), P(Z, θ | Y, K)];
- compute the approximate posterior of model M_K: w_K := Q_K(K) ∝ P(K) exp(L_K).
Then deduce the approximate posterior of Δ: P(Δ | Y) ≈ Σ_K w_K Q*_{θ|K}(Δ | K). (A sketch of the weighting step follows.)
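A minimal sketch of the weighting step (the lower bounds L_K below are placeholder numbers, and a uniform prior P(K) is assumed; log-sum-exp normalization avoids underflow):

```python
import numpy as np

L = np.array([-1250.3, -1180.7, -1175.2, -1179.9])   # made-up lower bounds, K = 1..4
log_prior = -np.log(L.size) * np.ones(L.size)        # uniform prior P(K)

log_w = log_prior + L
log_w -= np.logaddexp.reduce(log_w)                  # normalize in log scale
w = np.exp(log_w)                                    # approximate posterior Q_K(K)
```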
Towards W-graphs
Joint work with P. Latouche. [Latouche and Robin (2013)]
W-graph model
Latent variables: (Z_i) iid ~ U[0,1].
Graphon function: γ(z, z′): [0,1]² → [0,1].
Edges: Pr{Y_ij = 1} = γ(Z_i, Z_j).
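A minimal simulation sketch of this model (the graphon below follows the ρλ²(uv)^{λ−1} form used in the simulations later in this talk, with made-up ρ and λ chosen so that ρλ² ≤ 1 keeps γ within [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(1)

def graphon(u, v, rho=0.2, lam=2.0):
    """Example graphon gamma(u, v) = rho * lam^2 * (u v)^(lam - 1)."""
    return rho * lam**2 * (u * v) ** (lam - 1.0)

n = 300
Z = rng.random(n)                               # Z_i ~ U[0, 1]
P = graphon(Z[:, None], Z[None, :])             # gamma(Z_i, Z_j)
Y = (rng.random((n, n)) < P).astype(int)        # Y_ij ~ Bern(gamma(Z_i, Z_j))
Y = np.triu(Y, 1)
Y = Y + Y.T                                     # undirected, no self-loops
```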
Inference of the graphon function
Probabilistic point of view. W-graphs have mostly been studied in the probability literature: [Lovász and Szegedy (2006)], [Diaconis and Janson (2008)].
- Motif (sub-graph) frequencies are invariant characteristics of a W-graph.
- The intrinsic un-identifiability of the graphon function γ is often overcome by imposing that u ↦ ∫ γ(u, v) dv be monotone increasing.
Statistical point of view. Not much attention had been paid to its inference until very recently: [Chatterjee (2012)], [Airoldi et al. (2013)], [Wolfe and Olhede (2013)], ... The latter two also use SBM as a proxy for the W-graph.
SBM as a W-graph model
Latent variables: (U_i) iid ~ U[0, 1], Z_ik = I{σ_{k−1} ≤ U_i < σ_k}, where σ_k = Σ_{l=1}^k π_l.
Blockwise-constant graphon: γ(z, z′) = γ_kl for z in block k and z′ in block l.
Edges: Pr{Y_ij = 1} = γ(Z_i, Z_j).
Variational Bayes estimation of γ(z, z′)
Posterior mean of γ_K^SBM(z, z′). VBEM inference provides the approximate posteriors
    (π | Y) ≈ Dir(π̃),   (γ_kl | Y) ≈ Beta(η̃_kl, ζ̃_kl).
Estimate of γ(u, v):
    γ̂_K^SBM(u, v) = Ẽ(γ_{C(u),C(v)} | Y), where C(u) = 1 + Σ_k I{σ_k ≤ u}. [Gouda and Szántai (2010)]
Model averaging
There is no true K in the W-graph model, so apply the VBMA recipe:
- for K = 1..K_max, fit an SBM via VBEM and compute γ̂_K^SBM(z, z′) = E_Q[γ_{C(z),C(z′)}];
- perform model averaging as
    γ̂(z, z′) = Σ_K w_K γ̂_K^SBM(z, z′),
where the w_K are the variational weights arising from variational Bayes inference. (A sketch of this averaging step follows.)
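A hedged sketch of how the averaged estimate could be assembled from per-K fits (pi_hat and gamma_hat denote posterior-mean estimates, and w the weights from the VBMA recipe above; all names are illustrative):

```python
import numpy as np

def sbm_graphon(u, v, pi_hat, gamma_hat):
    """Blockwise-constant graphon of one fitted SBM; C(u) indexes the block of u."""
    sigma = np.cumsum(pi_hat)                                    # block boundaries sigma_k
    cu = np.minimum(np.searchsorted(sigma, u), len(pi_hat) - 1)  # guard against u = 1
    cv = np.minimum(np.searchsorted(sigma, v), len(pi_hat) - 1)
    return gamma_hat[cu, cv]

def averaged_graphon(u, v, fits, w):
    """fits: list of (pi_hat, gamma_hat), one per K; w: variational weights Q_K(K)."""
    return sum(w_K * sbm_graphon(u, v, p, g) for w_K, (p, g) in zip(w, fits))
```

Evaluating averaged_graphon on a meshgrid over [0, 1]² then gives a heatmap of γ̂.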
Some simulations
Design. Symmetric graphon γ(u, v) = ρλ²(uv)^{λ−1}:
- larger λ: more imbalanced graph;
- larger ρ: denser graph.
Results. [Figure: variational posterior Q*(K) for K.]
- More complex models are selected as n and λ increase.
- The posterior is fairly concentrated.
French political blogosphere
Website network. French political blogs: 196 nodes, 1432 edges.
French political blogosphere
Inferred graphon: Ŵ(u, v) = Ê(γ(u, v) | Y).
Motif probabilities can be estimated as well: µ̂(m) = Ê(µ(m) | Y).
Conclusion
- Real networks are heterogeneous. State-space models (including SBM) make it possible to capture such heterogeneity... but raise inference problems.
- Variational approximations help to deal with complex (conditional) dependency structures.
- SBM is a special (rare?) case for which guarantees exist for the performance of the variational approximation.
- SBM can be used as a proxy for smoother models, such as W-graphs.
References
Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013). Stochastic blockmodel approximation of a graphon: theory and consistent estimation. ArXiv e-prints.
Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayesian Statistics.
Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Rand. Struct. Algo. 31 (1).
Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Statist.
Chatterjee, S. (2012). Matrix estimation by universal singular value thresholding. ArXiv e-prints.
Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Stat. Comput. 18 (2).
Daudin, J.-J., Pierre, L. and Vacher, C. (2010). Model for heterogeneous random networks using continuous latent variables and an application to a tree fungus network. Biometrics 66 (4).
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B.
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. 7 (28).
Gazal, S., Daudin, J.-J. and Robin, S. (2012). Accuracy of variational estimates for random graph mixture models. Journal of Statistical Computation and Simulation 82 (6).
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 (12).
Gouda, A. and Szántai, T. (2010). On numerical calculation of probabilities according to Dirichlet distribution. Ann. Oper. Res.
Handcock, M., Raftery, A. and Tantrum, J. (2007). Model-based clustering for social networks. JRSSA 170 (2).
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999). Bayesian model averaging: a tutorial. Statistical Science 14 (4).
Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 (460).
Jaakkola, T. (2000). Tutorial on variational approximation methods. In Advanced Mean Field Methods: Theory and Practice. MIT Press.
Latouche, P., Birmelé, E. and Ambroise, C. (2012). Variational Bayesian inference and complexity control for stochastic block models. Statis. Model. 12 (1).
Latouche, P. and Robin, S. (2013). Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a W-graph model. ArXiv e-prints.
Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B 96 (6).
von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Stat. 36 (2).
Mariadassou, M. and Matias, C. (2012). Convergence of the groups posterior distribution in latent or stochastic block models. ArXiv e-prints.
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E.
Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E 69.
Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic block-structures. J. Amer. Statist. Assoc.
Parisi, G. (1988). Statistical Field Theory. Addison-Wesley, New York.
Picard, F., Miele, V., Daudin, J.-J., Cottret, L. and Robin, S. (2009). Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics, Suppl 6, S17.
Volant, S., Martin-Magniette, M.-L. and Robin, S. (2012). Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency. Comput. Statis. & Data Analysis 56 (8).
Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1 (1-2).
Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. ArXiv e-prints.
SBM for a binary social network
Zachary data. Binary social network of friendships within a sports club.
[Table: estimated between-group connection probabilities γ̂_kl (%) and group proportions π̂_l.]
Kullback-Leibler divergence
Definition.
    KL[Q(·), P(·)] = ∫ Q(Z) log[Q(Z) / P(Z)] dZ = ∫ Q(Z) log Q(Z) dZ − ∫ Q(Z) log P(Z) dZ.
Some properties:
- always positive;
- null iff P = Q;
- a contrast consistent with maximum-likelihood inference;
- not a distance, only a divergence.
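A small numeric illustration of these properties on made-up discrete distributions:

```python
import numpy as np

def kl(q, p):
    """KL divergence between two discrete distributions q and p."""
    return float(np.sum(q * np.log(q / p)))

q = np.array([0.7, 0.2, 0.1])
p = np.array([0.5, 0.3, 0.2])

assert kl(q, p) >= 0.0            # always positive
assert abs(kl(q, q)) < 1e-12      # null iff P = Q
assert kl(q, p) != kl(p, q)       # asymmetric: a divergence, not a distance
```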
Concentration of the degree distribution
[Figures: the degree distribution concentrates as the number of nodes grows: n = 100, n = 1000, ...]
More information