Deciphering and modeling heterogeneity in interaction networks


1 Deciphering and modeling heterogeneity in interaction networks (using variational approximations)
S. Robin, INRA / AgroParisTech
Mathematical Modeling of Complex Systems, December 2013, Ecole Centrale de Paris

2 Outline
State-space models for networks (including SBM)
Variational inference (including SBM)
(Variational) Bayesian model averaging
Towards W-graphs

3 Heterogeneity in interaction networks

5 Understanding network structure
Networks describe interactions between entities. Observed networks display heterogeneous topologies that one would like to decipher and better understand.
(Figures: dolphin social network; hyperlink network. [Newman and Girvan (2004)])

6 Modelling network heterogeneity
Latent variable models allow one to capture the underlying structure of a network.
General setting for binary graphs [Bollobás et al. (2007)]:
A latent (unobserved) variable Z_i is associated with each node: {Z_i} iid ~ π.
Edges Y_ij = I{i ~ j} are independent conditionally on the Z_i's: {Y_ij} independent given {Z_i}, with Pr{Y_ij = 1} = γ(Z_i, Z_j).
We focus here on model-based approaches, in contrast with, e.g., graph clustering [Girvan and Newman (2002)], [Newman (2004)] or spectral clustering [von Luxburg et al. (2008)].

13 Latent space models
State-space model: principle.
Consider n nodes (i = 1..n).
Z_i = unobserved position of node i, e.g. {Z_i} iid ~ N(0, I).
Edges {Y_ij} independent given {Z_i}, e.g. Pr{Y_ij = 1} = γ(Z_i, Z_j), with γ(z, z') = f(z, z'), e.g. a function of the distance between z and z'.
(Figure: adjacency matrix Y.)

14 A variety of state-space models
Continuous latent position models:
[Hoff et al. (2002)]: Z_i ∈ R^d, logit[γ(z, z')] = a − ‖z − z'‖
[Handcock et al. (2007)]: Z_i ~ Σ_k p_k N_d(µ_k, σ_k² I)
[Lovász and Szegedy (2006)]: Z_i ~ U[0,1], γ(z, z'): [0,1]² → [0,1] =: the graphon function
[Daudin et al. (2010)]: Z_i ∈ S_K (the simplex), γ(z, z') = Σ_{k,l} z_k z'_l γ_kl
A simulation sketch of the first model is given below.
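
As an illustration of the continuous case, here is a minimal simulation sketch of a latent position model in the spirit of [Hoff et al. (2002)]; the intercept a, the latent dimension and the sample size are arbitrary illustrative choices, not values from the talk.

import numpy as np

rng = np.random.default_rng(1)

n, d, a = 100, 2, 1.0                        # nodes, latent dimension, intercept (arbitrary)
Z = rng.normal(size=(n, d))                  # latent positions Z_i ~ N(0, I)

# Pairwise distances ||z_i - z_j|| and edge probabilities logit^{-1}(a - distance)
dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
prob = 1.0 / (1.0 + np.exp(-(a - dist)))

# Draw a symmetric adjacency matrix with no self-loops
Y = (rng.random((n, n)) < prob).astype(int)
Y = np.triu(Y, 1)
Y = Y + Y.T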

19 Stochastic Block Model (SBM)
A mixture model for random graphs [Nowicki and Snijders (2001)].
Consider n nodes (i = 1..n).
Z_i = unobserved label of node i: {Z_i} iid ~ M(1; π), with π = (π_1, ..., π_K).
Edge Y_ij depends on the labels: {Y_ij} independent given {Z_i}, with Pr{Y_ij = 1} = γ(Z_i, Z_j).
A simulation sketch of this generative model is given below.
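
A minimal sketch of the SBM generative mechanism for an undirected binary graph; the group proportions and connection probabilities are hypothetical values chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)

n = 50                                   # number of nodes
pi = np.array([0.5, 0.3, 0.2])           # group proportions (hypothetical)
gamma = np.array([[0.80, 0.10, 0.05],    # connection probabilities gamma_kl (hypothetical)
                  [0.10, 0.60, 0.05],
                  [0.05, 0.05, 0.40]])

# Draw latent labels Z_i ~ M(1; pi)
Z = rng.choice(len(pi), size=n, p=pi)

# Draw edges Y_ij | Z independently, with Pr{Y_ij = 1} = gamma[Z_i, Z_j]
Y = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        Y[i, j] = Y[j, i] = rng.binomial(1, gamma[Z[i], Z[j]])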

20 Variational inference
Joint work with J.-J. Daudin, S. Gazal, F. Picard. [Daudin et al. (2008)], [Gazal et al. (2012)]

21 Incomplete data models
Aim. Based on the observed network Y = (Y_ij), we want to infer
the parameters θ = (π, γ),
the hidden states Z = (Z_i).
State-space models belong to the class of incomplete data models: the edges (Y_ij) are observed, but the latent positions (or statuses) (Z_i) are not. This is the usual issue in unsupervised classification.

23 Frequentist maximum likelihood inference
Likelihood. The (log-)likelihood log P(Y; θ) = log Σ_Z P(Y, Z; θ) cannot be computed.
EM trick. But it can be decomposed as
log P(Y; θ) = log P(Y, Z; θ) − log P(Z | Y; θ),
the conditional expectation of which gives
E[log P(Y; θ) | Y] = E[log P(Y, Z; θ) | Y] − E[log P(Z | Y; θ) | Y],
i.e. log P(Y; θ) = E[log P(Y, Z; θ) | Y] + H[P(Z | Y; θ)],
where H stands for the entropy.

27 EM algorithm
Aims at maximizing the log-likelihood log P(Y; θ) through the alternation of two steps [Dempster et al. (1977)]:
M-step: maximize E[log P(Y, Z; θ) | Y] with respect to θ; generally similar to standard MLE.
E-step: calculate P(Z | Y) (or at least some of its moments);
sometimes straightforward (independent mixture models: Bayes formula; a minimal sketch is given below),
sometimes tricky but doable (HMMs: forward-backward recursion),
sometimes impossible (SBM: ...).
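
For the "straightforward" case, here is a minimal sketch of the E-step for an independent mixture model, where the posterior class probabilities follow from the Bayes formula. The Bernoulli emission and the parameter shapes are illustrative choices, not taken from the talk.

import numpy as np

def e_step(Y, pi, theta):
    """Posterior responsibilities P(Z_i = k | Y_i) for an independent Bernoulli mixture.

    Y: (n, d) binary data, pi: (K,) mixture weights, theta: (K, d) Bernoulli means.
    """
    # log P(Y_i | Z_i = k), using conditional independence of the d coordinates
    log_lik = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T   # shape (n, K)
    log_post = np.log(pi) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)                 # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)                   # Bayes formula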

38 Conditional distribution of Z
EM works provided that we can calculate P(Z | Y; θ)... but we cannot for state-space models for graphs.
Conditional distribution. The dependency graph of Z given Y is a clique:
no factorization can be hoped for (unlike for HMMs);
P(Z | Y; θ) cannot be computed (efficiently).
Variational techniques may help, as they provide Q(Z) ≈ P(Z | Y).

39 Variational inference
Lower bound of the log-likelihood. For any distribution Q(Z) [Jaakkola (2000), Wainwright and Jordan (2008)],
log P(Y) ≥ log P(Y) − KL[Q(Z), P(Z | Y)]
= ∫ Q(Z) log P(Y, Z) dZ − ∫ Q(Z) log Q(Z) dZ
= E_Q[log P(Y, Z)] + H[Q(Z)],
the inequality holding because KL ≥ 0.
Link with EM. This is similar to
log P(Y) = E[log P(Y, Z) | Y] + H[P(Z | Y)],
replacing P(Z | Y) with Q(Z).

43 Variational EM algorithm
Variational EM.
M-step: compute θ̂ = argmax_θ E_Q[log P(Y, Z; θ)].
E-step: replace the calculation of P(Z | Y) with the search for
Q*(Z) = argmin_{Q ∈ Q} KL[Q(Z), P(Z | Y)].
Taking Q = {all possible distributions} gives Q*(Z) = P(Z | Y)... like EM does.
Variational approximations rely on the choice of a set Q of good and tractable distributions.

46 Variational EM for SBM
Distribution class. Q = set of factorisable distributions:
Q = {Q : Q(Z) = Π_i Q_i(Z_i)}, with Q_i(Z_i) = Π_k τ_ik^{Z_ik}.
The approximate joint distribution is Q(Z_i, Z_j) = Q_i(Z_i) Q_j(Z_j).
The optimal approximation within this class satisfies the fixed-point relation
τ_ik ∝ π_k Π_{j ≠ i} Π_l f_kl(Y_ij)^{τ_jl},
also known as the mean-field approximation in physics [Parisi (1988)]. A sketch of one update pass is given below.
Variational estimates. There is no general statistical guarantee for variational estimates, but SBM is a very specific case for which the variational approximation is asymptotically exact [Celisse et al. (2012), Mariadassou and Matias (2012)].
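
A minimal sketch of one pass of this fixed-point update for the binary SBM, taking f_kl(y) = γ_kl^y (1 − γ_kl)^{1−y} as above; initialisation, the outer iteration loop and convergence checks are left out, and the plain loops are kept for readability rather than speed.

import numpy as np

def update_tau(Y, tau, pi, gamma):
    """One fixed-point update of the variational parameters tau, shape (n, K)."""
    n, K = tau.shape
    log_tau = np.zeros((n, K))
    for i in range(n):
        for k in range(K):
            s = np.log(pi[k])
            for j in range(n):
                if j == i:
                    continue
                # sum_l tau_jl * log f_kl(Y_ij)
                log_f = Y[i, j] * np.log(gamma[k]) + (1 - Y[i, j]) * np.log(1 - gamma[k])
                s += tau[j] @ log_f
            log_tau[i, k] = s
    log_tau -= log_tau.max(axis=1, keepdims=True)        # stabilise before exponentiating
    tau_new = np.exp(log_tau)
    return tau_new / tau_new.sum(axis=1, keepdims=True)  # normalise so each row sums to 1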

53 Variational Bayes inference
Bayesian perspective. θ = (π, γ) is random. Model: P(θ), P(Z | π), P(Y | γ, Z).
Bayesian inference. The aim is then to get the joint conditional distribution of the parameters and of the hidden variables: P(θ, Z | Y).

54 Variational Bayes algorithm
Variational Bayes. As P(θ, Z | Y) is intractable, one looks for
Q*(θ, Z) = argmin_{Q ∈ Q} KL[Q(θ, Z), P(θ, Z | Y)],
taking, e.g., Q = {Q(θ, Z) = Q_θ(θ) Q_Z(Z)}.
Variational Bayes EM (VBEM). When P(Z, Y | θ) belongs to the exponential family and P(θ) is the corresponding conjugate prior, Q* can be obtained iteratively as [Beal and Ghahramani (2003)]
log Q_θ^h(θ) = E_{Q_Z^{h-1}}[log P(Z, Y, θ)] + const,
log Q_Z^h(Z) = E_{Q_θ^h}[log P(Z, Y, θ)] + const.
Application to SBM: [Latouche et al. (2012)].

57 VBEM: simulation study
(Figures: width of the posterior credibility intervals for π_1, γ_11, γ_12, γ_22.)

61 SBM analysis of E. coli operon networks [Picard et al. (2009)]
(Figures: meta-graph representation and parameter estimates, K = 5.)

62 (Variational) Bayesian model averaging
Joint work with M.-L. Martin-Magniette, S. Volant. [Volant et al. (2012)]

63 Model choice
Model selection. The number of classes K generally needs to be estimated.
In the frequentist setting, an approximate ICL criterion can be derived:
ICL = E[log P(Y, Z) | Y] − ½ { [K(K+1)/2] log[n(n−1)/2] + (K−1) log n }.
In the Bayesian setting, exact versions of the BIC and ICL criteria can be computed (up to the variational approximation) as log P(Y, K) and log P(Y, Z, K) [Latouche et al. (2012)].
But, in some applications, selecting a single K may be useless or meaningless, and model averaging may be preferred. (A sketch of the ICL penalty term is given below.)
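
A one-line sketch of the penalty term of the approximate ICL criterion exactly as written above (for an undirected graph with n nodes and K classes); this mirrors the displayed formula and is not taken from any package.

import numpy as np

def icl_penalty(n, K):
    """Penalty term of the approximate ICL criterion displayed above."""
    return 0.5 * ((K * (K + 1) / 2) * np.log(n * (n - 1) / 2) + (K - 1) * np.log(n))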

65 Bayesian model averaging (BMA)
General principle [Hoeting et al. (1999)]. Let Δ be a quantity of interest that can be defined under a series of different models {M_K}_K, and denote P_K(Δ | Y) its posterior distribution under model M_K. The posterior distribution of Δ can be averaged over all models as
P(Δ | Y) = Σ_K P(M_K | Y) P_K(Δ | Y).
Remarks.
w_K = P(M_K | Y) is the weight given to model M_K for the estimation of Δ.
Calculating w_K is not easy, but the variational approximation may help.

67 Variational Bayesian model averaging (VBMA)
Variational Bayes formulation. M_K can be viewed as one more hidden layer. Variational Bayes then aims at finding
Q*(K, θ, Z) = argmin_{Q ∈ Q} KL[Q(K, θ, Z), P(K, θ, Z | Y)]
with Q = {Q(θ, Z, K) = Q_θ(θ | K) Q_Z(Z | K) Q_K(K)}. (1)
Optimal variational weights:
Q*_K(K) ∝ P(K) exp{ log P(Y | K) − KL[Q*(Z, θ | K); P(Z, θ | Y, K)] }
∝ P(K | Y) exp{ −KL[Q*(Z, θ | K); P(Z, θ | Y, K)] }.
(1) No additional approximation w.r.t. the regular VBEM.

69 VBMA: the recipe
For K = 1 ... K_max:
use regular VBEM to compute the approximate conditional posterior Q*_{Z,θ|K}(Z, θ | K) = Q*_{Z|K}(Z | K) Q*_{θ|K}(θ | K);
compute the conditional lower bound of the log-likelihood L_K = log P(Y | K) − KL[Q*(Z, θ | K); P(Z, θ | Y, K)];
compute the approximate posterior of model M_K: w_K := Q*_K(K) ∝ P(K) exp(L_K).
Deduce the approximate posterior of Δ:
P(Δ | Y) ≈ Σ_K w_K Q*_{θ|K}(Δ | K).
A sketch of the weight computation is given below.
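
A minimal sketch of the weight step, turning the per-K lower bounds L_K into normalised approximate posterior weights w_K ∝ P(K) exp(L_K); a uniform prior over K is assumed when no log-prior is supplied.

import numpy as np

def model_weights(lower_bounds, log_prior=None):
    """Approximate posterior weights w_K ∝ P(K) exp(L_K), computed stably in log scale.

    lower_bounds: array of variational lower bounds L_K, one per candidate K.
    log_prior: optional array of log P(K); uniform prior if None.
    """
    log_w = np.asarray(lower_bounds, dtype=float)
    if log_prior is not None:
        log_w = log_w + np.asarray(log_prior, dtype=float)
    log_w -= log_w.max()              # log-sum-exp trick to avoid overflow
    w = np.exp(log_w)
    return w / w.sum()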

72 Towards W-graphs
Joint work with P. Latouche. [Latouche and Robin (2013)]

73 W-graph model
Latent variables: (Z_i) iid ~ U[0,1].
Graphon function γ: γ(z, z'): [0,1]² → [0,1].
Edges: Pr{Y_ij = 1} = γ(Z_i, Z_j).
(Figure: an example graphon function γ(z, z').)

74 Inference of the graphon function
Probabilistic point of view. W-graphs have mostly been studied in the probability literature: [Lovász and Szegedy (2006)], [Diaconis and Janson (2008)].
Motif (sub-graph) frequencies are invariant characteristics of a W-graph.
The intrinsic un-identifiability of the graphon function γ is often overcome by imposing that u ↦ ∫ γ(u, v) dv is monotone increasing.
Statistical point of view. Not much attention had been paid to its inference until very recently: [Chatterjee (2012)], [Airoldi et al. (2013)], [Wolfe and Olhede (2013)], ...
The two latter also use SBM as a proxy for the W-graph.

76 SBM as a W-graph model
Latent variables: (U_i) iid ~ U[0,1], with Z_ik = I{σ_{k−1} ≤ U_i < σ_k}, where σ_k = Σ_{l=1}^k π_l.
Blockwise constant graphon: γ(z, z') = γ_kl for σ_{k−1} ≤ z < σ_k and σ_{l−1} ≤ z' < σ_l.
Edges: Pr{Y_ij = 1} = γ(Z_i, Z_j).
(Figure: the blockwise constant graphon γ_K^SBM(z, z').)

77 Variational Bayes estimation of γ(z, z')
VBEM inference provides the approximate posteriors
(π | Y) ≈ Dir(π*), (γ_kl | Y) ≈ Beta(γ⁰_kl, γ¹_kl), for variational hyperparameters π*, γ⁰_kl, γ¹_kl.
Estimate of γ(u, v):
γ̂_K^SBM(u, v) = Ẽ(γ_{C(u),C(v)} | Y), where C(u) = 1 + Σ_k I{σ_k ≤ u}. [Gouda and Szántai (2010)]
(Figure: posterior mean of γ_K^SBM(z, z').)
A sketch of this blockwise construction is given below.
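
A minimal sketch of the blockwise constant graphon estimate evaluated on a grid of [0,1]², built from plug-in SBM estimates of the proportions π and the connectivity matrix γ; the numerical values are illustrative only.

import numpy as np

def blockwise_graphon(pi, gamma, grid_size=200):
    """Evaluate the SBM graphon gamma_{C(u),C(v)} on a regular grid of [0,1]^2."""
    sigma = np.cumsum(pi)                               # block boundaries sigma_k
    u = (np.arange(grid_size) + 0.5) / grid_size        # grid points in (0, 1)
    C = np.searchsorted(sigma, u, side="right")         # C(u) - 1: index of the block containing u
    return gamma[np.ix_(C, C)]                          # (grid_size, grid_size) graphon values

# Example with illustrative SBM estimates
pi_hat = np.array([0.6, 0.4])
gamma_hat = np.array([[0.7, 0.1],
                      [0.1, 0.5]])
W_hat = blockwise_graphon(pi_hat, gamma_hat)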

78 Model averaging
Model averaging: there is no true K in the W-graph model, so apply the VBMA recipe.
For K = 1..K_max, fit an SBM via VBEM and compute γ̂_K^SBM(z, z') = E_Q*[γ_{C(z),C(z')}].
Perform model averaging as
γ̂(z, z') = Σ_K w_K γ̂_K^SBM(z, z'),
where the w_K are the variational weights arising from variational Bayes inference. (A sketch combining the two previous steps is given below.)
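
Continuing the two sketches above (the hypothetical helper blockwise_graphon defined earlier), the averaged graphon estimate could be assembled as follows; fits and lower_bounds are assumed to come from VBEM runs for K = 1..K_max.

import numpy as np

def averaged_graphon(fits, lower_bounds, grid_size=200):
    """Average blockwise graphon estimates over K with weights w_K ∝ exp(L_K).

    fits: list of (pi_hat, gamma_hat) pairs, one per candidate K.
    lower_bounds: corresponding variational lower bounds L_K.
    """
    log_w = np.asarray(lower_bounds, dtype=float)
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()                                        # w_K ∝ exp(L_K), uniform prior on K
    W = np.zeros((grid_size, grid_size))
    for w_K, (pi_hat, gamma_hat) in zip(w, fits):
        W += w_K * blockwise_graphon(pi_hat, gamma_hat, grid_size)
    return W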

80 Some simulations
Design. Symmetric graphon: γ(u, v) = ρ λ² (uv)^{λ−1}; larger λ gives a more imbalanced graph, larger ρ a denser graph.
Results. More complex models are selected as n and λ increase; the posterior is fairly concentrated.
(Figures: variational posterior for K, Q*(K), across simulation settings.)

81 French political blogosphere Website network. French political blogs: 196 nodes, 1432 edges.

82 French political blogosphere
Inferred graphon: Ŵ(u, v) = E(γ(u, v) | Y).
Motif probabilities can be estimated as well: µ̂(m) = E(µ(m) | Y).
(Figures: inferred graphon and estimated motif probabilities.)

84 Conclusion
Real networks are heterogeneous. State-space models (including SBM) allow one to capture such heterogeneity... but raise inference problems.
Variational approximations help to deal with complex (conditional) dependency structures.
SBM is a special (rare?) case for which guarantees exist for the performance of the variational approximation.
SBM can be used as a proxy for smoother models, such as W-graphs.

89 References
Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013). Stochastic blockmodel approximation of a graphon: theory and consistent estimation. ArXiv e-prints.
Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayes. Statist.
Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Rand. Struct. Algo. 31 (1).
Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Statis.
Chatterjee, S. (2012). Matrix estimation by universal singular value thresholding. ArXiv e-prints.
Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Stat. Comput. 18 (2).
Daudin, J.-J., Pierre, L. and Vacher, C. (2010). Model for heterogeneous random networks using continuous latent variables and an application to a tree fungus network. Biometrics 66 (4).
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B.
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. 7 (28).
Gazal, S., Daudin, J.-J. and Robin, S. (2012). Accuracy of variational estimates for random graph mixture models. Journal of Statistical Computation and Simulation 82 (6).
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 (12).
Gouda, A. and Szántai, T. (2010). On numerical calculation of probabilities according to Dirichlet distribution. Ann. Oper. Res.
Handcock, M., Raftery, A. and Tantrum, J. (2007). Model-based clustering for social networks. J. R. Statist. Soc. A 170 (2).
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999). Bayesian model averaging: a tutorial. Statistical Science 14 (4).
Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 (460).
Jaakkola, T. (2000). Tutorial on variational approximation methods. In Advanced Mean Field Methods: Theory and Practice. MIT Press.
Latouche, P., Birmelé, E. and Ambroise, C. (2012). Variational Bayesian inference and complexity control for stochastic block models. Statis. Model. 12 (1).
Latouche, P. and Robin, S. (2013). Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a W-graph model. ArXiv e-prints.
Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B 96 (6).
von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Stat. 36 (2).
Mariadassou, M. and Matias, C. (2012). Convergence of the groups posterior distribution in latent or stochastic block models. ArXiv e-prints.
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E.
Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E 69.
Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic block-structures. J. Amer. Statist. Assoc.
Parisi, G. (1988). Statistical Field Theory. Addison Wesley, New York.
Picard, F., Miele, V., Daudin, J.-J., Cottret, L. and Robin, S. (2009). Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics, Suppl 6, S17.
Volant, S., Martin-Magniette, M.-L. and Robin, S. (2012). Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency. Comput. Statis. & Data Analysis 56 (8).
Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1 (1-2).
Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. ArXiv e-prints.

92 SBM for a binary social network
Zachary data: binary social network of friendships within a sports club.
(Table: estimated connection probabilities γ_kl (%) and group proportions π_l.)

93 Kullback-Leibler divergence
Definition.
KL[Q(·), P(·)] = ∫ Q(Z) log [Q(Z) / P(Z)] dZ = ∫ Q(Z) log Q(Z) dZ − ∫ Q(Z) log P(Z) dZ.
Some properties. Always non-negative, null iff P = Q; contrast consistent with maximum-likelihood inference; not a distance, only a divergence.

94 Concentration of the degree distribution
(Figures: degree distributions for n = 100, n = 1000, and larger n.)

Latent Variable Models Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Variational Bayes. A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M

Variational Bayes. A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M PD M = PD θ, MPθ Mdθ Lecture 14 : Variational Bayes where θ are the parameters of the model and Pθ M is

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

A brief introduction to Conditional Random Fields

A brief introduction to Conditional Random Fields A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood

More information

Stochastic Variational Inference

Stochastic Variational Inference Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call

More information

Learning Gaussian Graphical Models with Unknown Group Sparsity

Learning Gaussian Graphical Models with Unknown Group Sparsity Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013

Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General

More information

Gibbs Sampling Methods for Multiple Sequence Alignment

Gibbs Sampling Methods for Multiple Sequence Alignment Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical

More information

Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information