Latent SVD Models for Relational Data

Peter Hoff
Statistics, Biostatistics, and the Center for Statistics and the Social Sciences
University of Washington
Outline

1. Relational data
2. Exchangeability for arrays
3. Latent variable models
4. Subtalk: Rank selection for the SVD model
   (a) Prior distributions on the Stiefel manifold
   (b) A Gibbs sampling scheme for rank estimation
   (c) A simulation study
5. Examples:
   (a) Modeling international conflict
   (b) Finding protein-protein interactions
Relational data

Relational data consist of a set of units or nodes $A$, and a set of variables $Y = \{y_{i,j}\}$ specific to pairs of nodes $(i,j) \in A \times A$.

Examples:
- Needle-sharing network: Let $A$ index a set of IV drug users; let $y_{i,j}$ indicate needle-sharing activity between $i$ and $j$.
- International relations: Let $A$ index the set of countries; let $y_{i,j}$ be the number of military disputes initiated by $i$ with target $j$.
- Protein-protein interactions: Let $A$ index a set of proteins; let $y_{i,j}$ measure the interaction between $i$ and $j$.
- Gene-protein interactions: Let $A = \{A_1, A_2\}$, with $A_1$ indexing a set of genes and $A_2$ indexing a set of proteins; let $y_{i,j}$ measure the relationship between $i \in A_1$ and $j \in A_2$.
Inferential goals in the regression framework

Let $y_{i,j}$ be the measurement on the pair $i \to j$, and let $x_{i,j}$ be a vector of explanatory variables. If $y_{i,j}$ is binary, then maybe we have

$$Y = \begin{pmatrix} - & 1 & 0 & \text{NA} & 0 \\ 0 & - & 0 & 0 & 1 \\ 0 & \text{NA} & - & 1 & \text{NA} \\ 0 & 1 & 0 & - & 0 \\ \cdot & \cdot & \cdot & \cdot & - \end{pmatrix}, \qquad X = \begin{pmatrix} - & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} \\ x_{2,1} & - & x_{2,3} & x_{2,4} & x_{2,5} \\ x_{3,1} & x_{3,2} & - & x_{3,4} & x_{3,5} \\ x_{4,1} & x_{4,2} & x_{4,3} & - & x_{4,5} \\ x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & - \end{pmatrix}$$

(the last row of $Y$ was lost in extraction and is shown as dots). Consider a basic generalized linear model:

$$\log \mathrm{odds}(y_{i,j} = 1) = \beta' x_{i,j}$$

A model can provide:
- a measure of the association between $X$ and $Y$: $\hat\beta$, $\mathrm{se}(\hat\beta)$;
- predictions of missing or future observations: $\Pr(y_{1,4} = 1 \mid Y, X)$.
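A minimal sketch of this baseline analysis (not the talk's code; the data and coefficients below are hypothetical): stack the dyadic covariate vectors into a design matrix and fit an ordinary logistic regression, ignoring any network dependence.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
m, p = 20, 3                                  # hypothetical: 20 nodes, 3 covariates
pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
X = rng.normal(size=(len(pairs), p))          # x_{i,j} vectors, one row per ordered pair
beta = np.array([1.0, -0.5, 0.25])            # hypothetical truth
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(fit.params)                             # beta-hat
print(fit.bse)                                # se(beta-hat)
```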
How might node heterogeneity affect network structure?
Latent variable models

Deviations from ordinary logistic regression can be represented with

$$\log \mathrm{odds}(y_{i,j}=1) = \beta' x_{i,j} + \gamma_{i,j}$$

Perhaps the first latent variable model was similar to a $p_1$-type model:

$$\gamma_{i,j} = a_i + b_j, \qquad \log \mathrm{odds}(y_{i,j}=1) = \beta' x_{i,j} + a_i + b_j$$

- $a_i$ and $b_j$ induce across-node heterogeneity that is additive on the log-odds scale.
- Inclusion of these effects in the model can dramatically improve
  - within-sample model fit (measured by $R^2$, likelihood ratio, BIC, etc.);
  - out-of-sample predictive performance (measured by cross-validation).
- But this model only captures heterogeneity of outdegree/indegree, and cannot represent more complicated structure, such as clustering, transitivity, etc.; a small simulation after this list illustrates the kind of structure it does capture.
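A sketch (with hypothetical standard normal node effects) of how additive effects $a_i + b_j$ generate degree heterogeneity, but nothing like transitivity or clustering:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50
a = rng.normal(size=m)             # sender effects
b = rng.normal(size=m)             # receiver effects
logodds = a[:, None] + b[None, :]  # beta'x omitted for clarity
P = 1 / (1 + np.exp(-logodds))
np.fill_diagonal(P, 0)             # no self-ties
Y = rng.binomial(1, P)

print(Y.sum(axis=1)[:5])           # outdegrees vary with a_i
print(Y.sum(axis=0)[:5])           # indegrees vary with b_j
```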
Exchangeability for arrays

For relational data in general we might be willing to assume something like

$$\{\gamma_{i,j}\} \stackrel{d}{=} \{\gamma_{g(i),h(j)}\} \quad \text{for all permutations } g \text{ and } h.$$

This is a type of exchangeability for arrays, sometimes called weak exchangeability.

Theorem (Aldous, Hoover): Let $\{\gamma_{i,j}\}$ be a weakly exchangeable array. Then $\{\gamma_{i,j}\} \stackrel{d}{=} \{\tilde\gamma_{i,j}\}$, where

$$\tilde\gamma_{i,j} = f(\mu, u_i, v_j, \epsilon_{i,j})$$

and $\mu$, $\{u_i\}$, $\{v_j\}$, $\{\epsilon_{i,j}\}$ are all independent uniform random variables on $[0,1]$.

There are variants of this theorem for symmetric sociomatrices and other exchangeability assumptions.
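An illustrative sketch of the representation: draw the independent uniforms and push them through some measurable $f$ (the particular $f$ below is a hypothetical choice; any measurable $f$ yields a weakly exchangeable array).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 8
mu  = rng.uniform()                # array-level effect
u   = rng.uniform(size=m)          # row effects
v   = rng.uniform(size=n)          # column effects
eps = rng.uniform(size=(m, n))     # dyad-level noise

def f(mu, ui, vj, eij):
    # hypothetical f; its form is not restricted by the theorem
    return np.sin(2 * np.pi * (ui + vj)) + mu + (eij - 0.5)

gamma = f(mu, u[:, None], v[None, :], eps)   # (m, n) exchangeable array
```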
Matrix representation techniques

Exchangeability suggests random effects models of the form $\gamma_{i,j} = f(\mu, u_i, v_j, \epsilon_{i,j})$. What should $f$ be?

Model:

$$\log \mathrm{odds}(y_{i,j}=1) = \mu + u_i' D v_j + \epsilon_{i,j}$$

Interpretation: think of $\{u_1,\dots,u_m\}$, $\{v_1,\dots,v_m\}$ as vectors of latent nodal attributes, with

$$u_i' D v_j = \sum_{k=1}^{K} d_k\, u_{i,k} v_{j,k}.$$
Model extension: latent SVD model

General model for relational data: let $Y$ be an $m \times n$ matrix of relational data, and $X$ an $m \times n \times p$ array of explanatory variables. A latent variable model relating $X$ to $Y$ is

$$g(E[y_{i,j}]) = \beta' x_{i,j} + u_i' D v_j + \epsilon_{i,j}.$$

For example, some potential models are:
- if $y_{i,j}$ is binary: $\log \mathrm{odds}(y_{i,j}=1) = \beta' x_{i,j} + u_i' D v_j + \epsilon_{i,j}$;
- if $y_{i,j}$ is count data: $\log E[y_{i,j}] = \beta' x_{i,j} + u_i' D v_j + \epsilon_{i,j}$;
- if $y_{i,j}$ is continuous: $E[y_{i,j}] = \beta' x_{i,j} + u_i' D v_j + \epsilon_{i,j}$

(similar to the generalized bilinear models of Gabriel 1998).

Estimation and dimension selection for $u_i' D v_j$ relate to a more basic SVD model; a simulation sketch of the count-data version follows.
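A sketch of generating data from the count-data version of the model (all dimensions and values below are hypothetical; QR of a Gaussian matrix is used as a convenient, approximately uniform source of orthonormal factors):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, K, p = 30, 20, 2, 2
U, _ = np.linalg.qr(rng.normal(size=(m, K)))   # m x K, orthonormal columns
V, _ = np.linalg.qr(rng.normal(size=(n, K)))   # n x K, orthonormal columns
D = np.diag([4.0, 2.0])                        # singular values
X = rng.normal(size=(m, n, p))                 # dyadic covariates
beta = np.array([0.5, -0.5])
eps = 0.1 * rng.normal(size=(m, n))

log_mean = X @ beta + U @ D @ V.T + eps        # beta'x_ij + u_i' D v_j + eps_ij
Y = rng.poisson(np.exp(log_mean))              # count relational data
```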
Beginning of subtalk: the singular value decomposition

Every $m \times n$ matrix $M$ has a representation of the form $M = UDV'$ where, in the case $m \geq n$:
- $U$ is an $m \times n$ matrix with orthonormal columns;
- $V$ is an $n \times n$ matrix with orthonormal columns;
- $D$ is an $n \times n$ diagonal matrix, with diagonal elements $\{d_1,\dots,d_n\}$ typically taken to be a decreasing sequence of non-negative numbers.

Recall:
- the squared elements of the diagonal of $D$ are the eigenvalues of $M'M$, and the columns of $V$ are the corresponding eigenvectors;
- the matrix $U$ can be obtained from the first $n$ eigenvectors of $MM'$;
- the number of non-zero elements of $D$ is the rank of $M$.
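A quick numpy check of these facts (a sketch on random data, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
M = rng.normal(size=(m, n))
U, d, Vt = np.linalg.svd(M, full_matrices=False)    # U: m x n, Vt: n x n

assert np.allclose(U @ np.diag(d) @ Vt, M)          # M = U D V'
eigvals = np.linalg.eigvalsh(M.T @ M)[::-1]         # eigenvalues of M'M, descending
assert np.allclose(d**2, eigvals)                   # d_k^2 are eigenvalues of M'M
print(np.linalg.matrix_rank(M), np.sum(d > 1e-10))  # rank = number of nonzero d_k
```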
Data analysis with the singular value decomposition

Many data analysis procedures for matrix-valued data $Y$ are related to the SVD. Given a model of the form $Y = M + E$, the SVD provides:

- Interpretation: $y_{i,j} = u_i' D v_j + \epsilon_{i,j}$, where $u_i$ and $v_j$ are the $i$th and $j$th rows of $U$ and $V$.
- Estimation: $\hat M_K = \hat U_{[,1:K]} \hat D_{[1:K,1:K]} \hat V_{[,1:K]}'$ if $M$ is assumed to be of rank $K$.
- Applications:
  - biplots (Gabriel 1971; Gower and Hand 1996);
  - reduced-rank interaction models (Gabriel 1978, 1998);
  - analysis of relational data (Harshman et al. 1982);
  - factor analysis, image processing, data reduction.

But: How to select $K$? Given $K$, is $\hat M_K$ a good estimator? (Note that $E[Y'Y] = M'M + m\sigma^2 I$.)

Bayesian work on factor analysis models: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
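A sketch of the rank-$K$ truncation estimate $\hat M_K$ (the Eckart-Young least-squares approximation), applied to synthetic data with known rank:

```python
import numpy as np

def svd_truncate(Y, K):
    """Least-squares rank-K approximation of Y."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]

rng = np.random.default_rng(5)
M = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 30))   # true rank-3 signal
Y = M + rng.normal(size=M.shape)                          # Y = M + E
print(np.mean((svd_truncate(Y, 3) - M) ** 2))             # MSE of M-hat_3
```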
A toy example

$$Y_{40 \times 30} = UDV' + E = \sum_k d_k U_{[,k]} V_{[,k]}' + E$$

[Figure, shown over three slides: the singular values of $Y$, with successive rank-one terms $d_k U_{[,k]} V_{[,k]}'$ highlighted.]
A toy example, continued

[Figure: the posterior distribution $p(K \mid Y)$ over ranks $K = 1,\dots,15$, and images of the truth and the estimate. MSE(Bayes) = 0.20; MSE(LSE) = 0.42.]
SVD model

Let $Y$ be an $m \times n$ data matrix (suppose $m \geq n$). Consider a model of the form

$$Y = UDV' + E$$

where
- $K \in \{0,\dots,n\}$;
- $U \in V_{K,m}$, $V \in V_{K,n}$;
- $D = \mathrm{diag}\{d_1,\dots,d_K\}$;
- $E$ is a matrix of independent normal noise.

Given a prior distribution on $\{U, D, V\}$ we would like to calculate

$$p(\{u_1,\dots,u_K\}, \{v_1,\dots,v_K\}, \{d_1,\dots,d_K\} \mid Y, K) \quad \text{and} \quad p(K \mid Y).$$
Prior distribution for U

1. $z_1 \sim$ uniform[$m$-sphere]; set $U_{[,1]} = z_1$.
2. $z_2 \sim$ uniform[$(m-1)$-sphere]; set $U_{[,2]} = \mathrm{Null}(U_{[,1]})\, z_2$.
   ...
K. $z_K \sim$ uniform[$(m-K+1)$-sphere]; set $U_{[,K]} = \mathrm{Null}(U_{[,1]},\dots,U_{[,K-1]})\, z_K$.

The resulting distribution for $U$ is the uniform (invariant) distribution on the Stiefel manifold, and is exchangeable under row and column permutations.

Prior conditional distributions:

$$(U_{[,j]} \mid U_{[,-j]}) \stackrel{d}{=} \mathrm{Null}(U_{[,1]},\dots,U_{[,j-1]},U_{[,j+1]},\dots,U_{[,K]})\, z_j, \qquad z_j \sim \text{uniform}[(m-K+1)\text{-sphere}].$$

A code sketch of this construction follows.
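A minimal numpy/scipy sketch (not the talk's code) of the sequential construction: each new column is a uniform draw from the sphere in the null space of the previous columns.

```python
import numpy as np
from scipy.linalg import null_space

def runif_sphere(d, rng):
    """Uniform draw from the unit sphere in R^d."""
    z = rng.normal(size=d)
    return z / np.linalg.norm(z)

def runif_stiefel(m, K, rng):
    """Uniform draw from the Stiefel manifold V_{K,m}, column by column."""
    U = np.zeros((m, K))
    U[:, 0] = runif_sphere(m, rng)
    for j in range(1, K):
        N = null_space(U[:, :j].T)          # m x (m-j) orthonormal basis
        U[:, j] = N @ runif_sphere(m - j, rng)
    return U

U = runif_stiefel(10, 3, np.random.default_rng(6))
print(np.round(U.T @ U, 10))                # identity: columns are orthonormal
```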
Posterior distribution for U

Recall $UDV' = \sum_{k=1}^{K} d_k U_{[,k]} V_{[,k]}'$, so

$$Y - UDV' = \bigl(Y - U_{[,-j]} D_{[-j,-j]} V_{[,-j]}'\bigr) - d_j U_{[,j]} V_{[,j]}' \equiv E_j - d_j U_{[,j]} V_{[,j]}',$$

and therefore

$$\|Y - UDV'\|^2 = \|E_j\|^2 - 2 d_j U_{[,j]}' E_j V_{[,j]} + d_j^2 \|U_{[,j]} V_{[,j]}'\|^2 = \|E_j\|^2 - 2 d_j U_{[,j]}' E_j V_{[,j]} + d_j^2.$$

It follows that

$$p(Y \mid U, V, D, \phi) = (\phi/2\pi)^{mn/2} \exp\{-\tfrac12 \phi \|E_j\|^2 + \phi d_j U_{[,j]}' E_j V_{[,j]} - \tfrac12 \phi d_j^2\}$$

$$p(U_{[,j]} \mid Y, U_{[,-j]}, V, D) \propto p(U_{[,j]} \mid U_{[,-j]}) \exp\{\phi d_j U_{[,j]}' E_j V_{[,j]}\}$$

$$(U_{[,j]} \mid U_{[,-j]}, Y, V, D) \stackrel{d}{=} N_j z_j, \quad N_j = \mathrm{Null}(U_{[,1]},\dots,U_{[,j-1]},U_{[,j+1]},\dots,U_{[,K]}),$$
$$p(z_j) \propto \exp\{z_j' \mu\}, \qquad \mu = \phi d_j N_j' E_j V_{[,j]},$$

i.e., $z_j$ has a von Mises-Fisher distribution on the sphere. A Gibbs-style update sketch follows.
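A sketch of the update for one column of $U$. The exact full conditional is von Mises-Fisher in $z_j$ with natural parameter $\mu = \phi d_j N_j' E_j V_{[,j]}$; here a simple Metropolis step on the sphere (with a symmetric projected-Gaussian proposal) stands in for an exact vMF draw. This substitution is my assumption, not the talk's sampler.

```python
import numpy as np
from scipy.linalg import null_space

def update_column(U, V, D, Y, j, phi, rng, step=0.1):
    """Metropolis-within-Gibbs update for U[:, j]; D is the vector of d_k."""
    K = U.shape[1]
    others = [k for k in range(K) if k != j]
    Ej = Y - U[:, others] @ np.diag(D[others]) @ V[:, others].T  # residual matrix
    Nj = null_space(U[:, others].T)            # basis orthogonal to the other columns
    mu = phi * D[j] * Nj.T @ Ej @ V[:, j]      # vMF natural parameter
    z = Nj.T @ U[:, j]                         # current point on the (m-K+1)-sphere
    prop = z + step * rng.normal(size=z.size)  # isotropic (hence symmetric) proposal
    prop /= np.linalg.norm(prop)
    if np.log(rng.uniform()) < prop @ mu - z @ mu:   # accept with density ratio
        z = prop
    U[:, j] = Nj @ z
    return U
```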
Evaluating the dimension

It can all be done by comparing

$$M_1: Y = d\, u v' + E \qquad \text{vs} \qquad M_0: Y = E.$$

To decide whether or not to add a dimension, we need to calculate

$$\frac{p(M_1 \mid Y)}{p(M_0 \mid Y)} = \frac{p(M_1)}{p(M_0)} \cdot \frac{p(Y \mid M_1)}{p(Y \mid M_0)}.$$

The numerator of the Bayes factor is the hard part: we need to integrate the following with respect to the prior distribution:

$$p(Y \mid M_1, u, v, d) = (\phi/2\pi)^{nm/2} \exp\{-\phi \|Y\|^2/2 + \phi d\, u' Y v - \phi d^2/2\} = p(Y \mid M_0)\, \exp\{-\phi d^2/2\}\, \exp\{u' [d\phi Y] v\}.$$

Computing $E[e^{u'Av}]$ over $u \sim$ uniform($S_m$) and $v \sim$ uniform($S_n$) is difficult but doable; a crude Monte Carlo sketch follows.
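A self-contained Monte Carlo sketch of the hard integral $E[e^{u'Av}]$, averaging over uniform draws on the two spheres (the talk notes the computation is "difficult but doable"; this brute-force estimate is my stand-in, not the talk's method):

```python
import numpy as np

def mc_expect_exp(A, nsamp=100_000, rng=None):
    """Monte Carlo estimate of E[exp(u'Av)], u ~ unif(S_m), v ~ unif(S_n)."""
    rng = rng or np.random.default_rng(7)
    m, n = A.shape
    u = rng.normal(size=(nsamp, m))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform on S_m
    v = rng.normal(size=(nsamp, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform on S_n
    return np.mean(np.exp(np.einsum('si,ij,sj->s', u, A, v)))

A = np.random.default_rng(8).normal(size=(10, 10))
print(mc_expect_exp(A))
```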
Simulation study

For each of $(m,n) \in \{(10,10), (100,10), (100,100)\}$, 100 datasets were generated as follows:
- $U \sim$ uniform($V_{5,m}$), $V \sim$ uniform($V_{5,n}$);
- $D = \mathrm{diag}\{d_1,\dots,d_5\}$, with $\{d_1,\dots,d_5\}$ iid uniform($\tfrac12 \mu_{mn}, \tfrac32 \mu_{mn}$), where $\mu_{mn} = \sqrt{m + n + 2\sqrt{mn}}$;
- $Y = UDV' + E$, where $E$ is an $m \times n$ matrix of standard normal noise.

This choice was not arbitrary: the largest eigenvalue of $E'E$ is approximately $\lambda_{\max} \approx m + n + 2\sqrt{mn}$ (Edelman, 1988), so the $d_k$ are on the scale of the largest singular value of the noise. A generation sketch follows.
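A sketch reproducing the simulation recipe for one dataset (QR of a Gaussian matrix is used as a convenient, approximately uniform Stiefel draw; note $\sqrt{m+n+2\sqrt{mn}} = \sqrt{m} + \sqrt{n}$):

```python
import numpy as np

def simulate(m, n, K=5, rng=None):
    """One dataset from the simulation study design."""
    rng = rng or np.random.default_rng(9)
    U, _ = np.linalg.qr(rng.normal(size=(m, K)))   # approx. uniform on V_{K,m}
    V, _ = np.linalg.qr(rng.normal(size=(n, K)))
    mu_mn = np.sqrt(m) + np.sqrt(n)                # approx. largest sing. value of E
    d = rng.uniform(0.5 * mu_mn, 1.5 * mu_mn, size=K)
    E = rng.normal(size=(m, n))
    return U @ np.diag(d) @ V.T + E

Y = simulate(100, 10)
```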
Simulation study, continued

[Figure: for $(m,n) = (10,10)$, $(100,10)$, and $(100,100)$: the average posterior $E[p(K \mid Y)]$ over $K = 0,\dots,10$, the distribution of $\hat K$, and densities of the average squared error for the Bayes estimate (ASE) versus the least-squares estimate (ASE_LS).]
International conflict network, 1990-2000

Eleven years of international relations data (thanks to Mike Ward and Xun Cao):
- $y_{i,j}$ = number of militarized disputes initiated by $i$ with target $j$;
- $x_{i,j}$, an 11-dimensional covariate vector containing:
  1. log distance
  2. log imports
  3. log exports
  4. number of shared intergovernmental organizations
  5. population of initiator
  6. population of target
  7. log GDP of initiator
  8. log GDP of target
  9. polity score of initiator
  10. polity score of target
  11. polity score of initiator x polity score of target

Model: the $y_{i,j}$ are independent Poisson random variables with log-mean

$$\log E[y_{i,j}] = \beta' x_{i,j} + u_i' D v_j + \epsilon_{i,j}.$$

A sketch of the fixed-effects part of this model follows.
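A hedged sketch of the fixed-effects part only: a Poisson regression of dispute counts on 11 dyadic covariates, on synthetic data (the latent $u_i' D v_j$ term would be handled by MCMC, which is not shown here).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
ndyads, p = 5000, 11
X = rng.normal(size=(ndyads, p))           # stand-in for the 11 covariates
beta = rng.normal(scale=0.2, size=p)       # hypothetical truth
y = rng.poisson(np.exp(-2 + X @ beta))     # sparse counts, as in the data

fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(fit.params[:3])                      # first few coefficient estimates
```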
International conflict network, 1990-2000: parameter estimates

[Figure: left, the posterior $p(K \mid Y)$ over ranks $K = 0,\dots,10$; right, standardized coefficient estimates for the 11 covariates (log distance, polity of initiator, shared intergovernmental organizations, polity of target, polity interaction, log imports, log GDP of target, log exports, log GDP of initiator, log population of target, log population of initiator).]
[Figure: estimated latent vectors for the countries in the conflict network, labeled by country abbreviation (AFG, ANG, ARG, ..., ZAM, ZIM).]
Protein-protein interaction data

Experiments are run on pairs of proteins to see if they bind:

$$y_{i,j} = \begin{cases} 1 & \text{if proteins } i, j \text{ bind} \\ 0 & \text{otherwise.} \end{cases}$$

There are many proteins, and hence many possible experiments ($\binom{n}{2}$ for $n$ proteins), only a subset of which are usually performed. Can we predict interactions from a subset of the experiments?

Missing data experiment:
- E. coli protein interaction network (Butland et al. 2005);
- all interactions checked on a 270 x 270 protein network;
- a very sparse network: $\bar y = 0.01$.

A sketch of the hold-out experiment follows.
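A sketch of the missing-data experiment on synthetic stand-in data: hold out 25% of the dyads, fit on the rest, and rank the held-out pairs by predicted score. Here a rank-2 SVD of the zero-filled training matrix stands in for the full Bayesian model fit; that substitution is my assumption.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 270
Y = rng.binomial(1, 0.01 * np.ones((n, n)))    # stand-in for the E. coli data
mask = rng.uniform(size=(n, n)) < 0.75         # 75% of entries observed (training)

Ytrain = np.where(mask, Y, 0.0)                # crude zero fill of test entries
U, d, Vt = np.linalg.svd(Ytrain)
score = U[:, :2] @ np.diag(d[:2]) @ Vt[:2, :]  # rank-2 reconstruction
order = np.argsort(-score[~mask])              # most promising experiments first
found = Y[~mask][order]                        # held-out outcomes, best-scored first
print(found[:150].sum(), "interactions found in 150 additional experiments")
```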
Finding missing links

[Figure: estimated latent vectors $u$ and $v$ from the full dataset and from the 75% training data; for the 25% test set, the number of new observed interactions found as a function of the number of additional experiments performed (0-150).]
Summary and future directions

Summary:
- SVD models are a natural way to represent patterns in relational data;
- the SVD structure can be incorporated into a variety of model forms;
- BMA on model rank can be done.

Future directions:
- extensions to higher-dimensional arrays;
- dynamic network inference (time series models for Y, U, D, V);
- restricting latent characteristics to be orthogonal to design vectors.