Latent SVD Models for Relational Data


1 Latent SVD Models for Relational Data
Peter Hoff
Statistics, Biostatistics and Center for Statistics and the Social Sciences, University of Washington

2 Outline
Part 1: Multiplicative factor models for network data
  1. Relational data
  2. Models via matrix representations
  3. Models via exchangeability
  4. Examples: (a) international conflict data; (b) protein-protein interaction data
Part 2: Rank selection for the SVD model
  1. Prior distributions on the Stiefel manifold
  2. A Gibbs sampling scheme for rank estimation
  3. A simulation study
  4. Example: global service firm locations

3 Relational data
Relational data consist of a set of units or nodes A, and a set of variables Y = {y_{i,j}} specific to pairs of nodes (i, j) ∈ A × A.
Examples:
- International relations: let A index the set of countries, and let y_{i,j} be the number of military disputes initiated by i with target j.
- Needle-sharing network: let A index a set of IV drug users, and let y_{i,j} indicate needle-sharing activity between i and j.
- Protein-protein interactions: let A index a set of proteins, and let y_{i,j} measure the interaction between i and j.
- Business locations: let A_1 be a set of banks and A_2 a set of cities; for (i, j) ∈ A_1 × A_2, let y_{i,j} indicate the presence of an office of bank i in city j.

4 Inferential goals in the regression framework
Let y_{i,j} be the measurement on the pair i → j, and let x_{i,j} be a vector of explanatory variables. If y_{i,j} is binary, the data might look like
- Y: a sociomatrix with binary entries, an undefined diagonal, and some entries missing (NA);
- X: the corresponding array of covariate vectors {x_{i,j} : i ≠ j}.
Consider a basic generalized linear model:
log odds(y_{i,j} = 1) = β'x_{i,j}.
A model can provide
- a measure of the association between X and Y: β̂, se(β̂);
- predictions of missing or future observations: Pr(y_{1,4} = 1 | Y, X).
(A fitting sketch follows below.)
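The model and prediction goals above can be made concrete in a few lines. This is a minimal sketch with hypothetical simulated covariates, using scikit-learn's logistic regression; the talk does not prescribe a particular fitting routine:

```python
# Minimal sketch (hypothetical data): fit the basic logistic regression on
# observed dyads, then predict a missing entry such as Pr(y_{1,4} = 1 | Y, X).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.normal(size=(n, n, p))       # one covariate vector per ordered pair
beta = np.array([1.0, -0.5, 0.25])   # "true" coefficients for the simulation
prob = 1 / (1 + np.exp(-(X @ beta)))
Y = rng.binomial(1, prob).astype(float)
np.fill_diagonal(Y, np.nan)          # the diagonal is undefined
Y[0, 3] = np.nan                     # pretend y_{1,4} is missing

obs = ~np.isnan(Y)                   # fit only on observed dyads
fit = LogisticRegression().fit(X[obs], Y[obs])
print(fit.coef_)                                        # beta-hat
print(fit.predict_proba(X[0, 3].reshape(1, -1))[:, 1])  # Pr(y_{1,4} = 1)
```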

5 How might node heterogeneity affect network structure?

6 Latent variable models
Deviations from ordinary logistic regression can be represented with
log odds(y_{i,j} = 1) = β'x_{i,j} + z_{i,j}.
A simple latent variable model might include row and column effects:
z_{i,j} = a_i + b_j, giving log odds(y_{i,j} = 1) = β'x_{i,j} + a_i + b_j.
The a_i and b_j induce across-node heterogeneity that is additive on the log-odds scale. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by R², likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).
But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc. (The sketch below illustrates this limitation.)
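To see what additive effects can and cannot do, here is a small simulation; a minimal sketch under assumed parameter values, not taken from the talk:

```python
# Additive row/column effects capture outdegree/indegree heterogeneity,
# but induce no clustering or transitivity beyond what the degrees imply.
import numpy as np

rng = np.random.default_rng(1)
n = 50
a = rng.normal(size=n)     # sender effects (drive outdegrees)
b = rng.normal(size=n)     # receiver effects (drive indegrees)
beta0 = -1.0               # intercept; no covariates in this sketch

logit = beta0 + a[:, None] + b[None, :]     # log odds(y_{i,j} = 1)
p = 1 / (1 + np.exp(-logit))
np.fill_diagonal(p, 0)                      # no self-ties
Y = rng.binomial(1, p)

# Outdegrees track a_i and indegrees track b_j ...
print(np.corrcoef(Y.sum(axis=1), a)[0, 1])  # strongly positive
print(np.corrcoef(Y.sum(axis=0), b)[0, 1])  # strongly positive
# ... but given {a_i, b_j} the entries are independent, so any apparent
# clustering is only what the degree sequence alone implies.
```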

7 The singular value decomposition
Write Z = M + E, where M represents systematic patterns in the effects and E represents noise. Every M has a representation of the form M = UDV' where, in the case m ≥ n,
- U is an m × n matrix with orthonormal columns;
- V is an n × n matrix with orthonormal columns;
- D is an n × n diagonal matrix, with diagonal elements {d_1, ..., d_n} typically taken to be a decreasing sequence of non-negative numbers.
Recall:
- the squared elements of the diagonal of D are the eigenvalues of M'M, and the columns of V are the corresponding eigenvectors;
- the matrix U can be obtained from the first n eigenvectors of MM';
- the number of non-zero elements of D is the rank of M.
Writing the row vectors as {u_1, ..., u_m} and {v_1, ..., v_n}, we have m_{i,j} = u_i' D v_j. (A numerical check follows below.)
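These SVD facts are easy to verify numerically. An illustrative check (not from the talk), with an arbitrary low-rank matrix:

```python
# Quick numerical check of the SVD facts above: M = U D V', the d_k^2 are
# eigenvalues of M'M, and the rank is the number of nonzero d_k.
import numpy as np

rng = np.random.default_rng(2)
m, n, K = 8, 5, 3
M = rng.normal(size=(m, K)) @ rng.normal(size=(K, n))   # a rank-3 matrix

U, d, Vt = np.linalg.svd(M, full_matrices=False)        # U: m x n, d: length n
assert np.allclose(U @ np.diag(d) @ Vt, M)              # M = U D V'

eigvals = np.linalg.eigvalsh(M.T @ M)                   # eigenvalues of M'M
assert np.allclose(np.sort(d**2), np.sort(eigvals.clip(min=0)))

print(np.sum(d > 1e-10))            # numerical rank: 3
i, j = 2, 4                          # entrywise: m_{i,j} = u_i' D v_j
assert np.isclose(M[i, j], U[i] @ np.diag(d) @ Vt[:, j])
```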

8 Multiplicative effects
log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j}
Interpretation: think of {u_1, ..., u_m} and {v_1, ..., v_n} as vectors of latent nodal attributes:
u_i' D v_j = Σ_{k=1}^{K} d_k u_{i,k} v_{j,k}.
In general, a latent variable model relating X to Y is
g(E[y_{i,j}]) = β'x_{i,j} + u_i' D v_j + ε_{i,j}.
For example, some potential models are:
- if y_{i,j} is binary: log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is count data: log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is continuous: E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}.
Estimation: given D and V, the predictor is linear in {u_1, ..., u_m}. This bilinear structure can be exploited (EM, Gibbs sampling, variational methods). (A least-squares sketch follows below.)
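As a concrete illustration of the bilinear structure, consider the continuous case with no covariates: fixing (D, V), each u_i solves a linear least-squares problem, and symmetrically for each v_j. The alternating scheme below is a hedged sketch of this idea; the talk itself points to EM, Gibbs sampling, and variational methods:

```python
# Hedged sketch: alternating least squares for E[y_{i,j}] = u_i' D v_j.
# D is absorbed into the factor A = U D for brevity.
import numpy as np

rng = np.random.default_rng(3)
m, n, K = 30, 20, 2
U0 = rng.normal(size=(m, K))
V0 = rng.normal(size=(n, K))
Y = U0 @ V0.T + rng.normal(scale=0.1, size=(m, n))

A = rng.normal(size=(m, K))    # current guess for U D
B = rng.normal(size=(n, K))    # current guess for V
for _ in range(50):
    # Given B, each row of A solves min_a ||Y_i - B a||^2
    A = Y @ B @ np.linalg.inv(B.T @ B)
    # Given A, each row of B solves the transposed problem
    B = Y.T @ A @ np.linalg.inv(A.T @ A)

print(np.linalg.norm(Y - A @ B.T) / np.linalg.norm(Y))  # small residual
```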

9 Alternative justification via exchangeability
X represents known information about the nodes; Z represents any additional patterns. In this case we might be willing to use a model in which
{z_{i,j}} =_d {z_{g(i),h(j)}} for all permutations g and h.
This is a type of exchangeability for arrays, sometimes called weak exchangeability.
Theorem (Aldous, Hoover): Let {z_{i,j}} be a weakly exchangeable array. Then {z_{i,j}} =_d {z̃_{i,j}}, where
z̃_{i,j} = f(μ, u_i, v_j, ε_{i,j})
and μ, {u_i}, {v_j}, {ε_{i,j}} are all independent random variables.

10 International Conflict Network
Years of international relations data (thanks to Mike Ward and Xun Cao).
- y_{i,j} = indicator of a militarized dispute initiated by i with target j;
- x_{i,j} = an 8-dimensional covariate vector containing an intercept and
  1. population of initiator
  2. population of target
  3. polity score of initiator
  4. polity score of target
  5. polity score of initiator × polity score of target
  6. log distance
  7. number of shared intergovernmental organizations
Model: the y_{i,j} are independent binary random variables with log-mean
log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}.

11 International conflict network: parameter estimates
[Figure: regression coefficient estimates for log distance, polity of initiator, intergovernmental organizations, polity of target, the polity interaction, log population of target, and log population of initiator.]

12 [Figure: estimated latent vectors for the conflict network, with nodes labeled by three-letter country codes (AFG, ARG, ..., UKG, USA).]

13 Analyzing undirected graphs
In many applications, y_{i,j} = y_{j,i} by design:
log odds(y_{i,j} = y_{j,i} = 1) = β'x_{i,j} + z_{i,j}.
As before, write Z = M + E, with all matrices symmetric. All such M have an eigenvalue decomposition
M = UΛU', with m_{i,j} = u_i' Λ u_j.
This suggests a model of the form
log odds(y_{i,j} = y_{j,i} = 1) = β'x_{i,j} + u_i' Λ u_j + ε_{i,j}.
Justification via exchangeability: let Z be a symmetric array (with an undefined diagonal), such that {z_{i,j}} =_d {z_{g(i),g(j)}}. Then {z_{i,j}} =_d {z̃_{i,j}}, where
- z̃_{i,j} = f(μ, u_i, u_j, ε_{i,j});
- μ, {u_i}, {ε_{i,j}} are independent random variables, and f(·, u_i, u_j, ·) = f(·, u_j, u_i, ·).
(An eigendecomposition sketch follows below.)
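The eigendecomposition version can be checked numerically just like the SVD. An illustrative sketch (not from the talk):

```python
# Illustrative check: a symmetric M has M = U Λ U', so m_{i,j} = u_i' Λ u_j.
import numpy as np

rng = np.random.default_rng(4)
n, K = 10, 2
U0 = rng.normal(size=(n, K))
M = U0 @ np.diag([3.0, -1.5]) @ U0.T      # symmetric, rank 2

lam, U = np.linalg.eigh(M)                # eigenvalues (ascending), U orthonormal
assert np.allclose(U @ np.diag(lam) @ U.T, M)

i, j = 1, 7
assert np.isclose(M[i, j], U[i] @ np.diag(lam) @ U[j])
# Unlike the SVD's D, the eigenvalues in Λ can be negative, so u_i' Λ u_j
# can represent both homophily (+) and anti-homophily (-) in the latent space.
```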

14 Protein-protein interaction network
Interactions among 270 proteins in E. coli (Butland et al., 2005).
Data: binary interactions {y_{i,j} : i < j}, with Σ_{i<j} y_{i,j} / (270 choose 2) ≈ .001.
Model: log odds(y_{i,j} = 1) = μ + u_i' Λ u_j + ε_{i,j} (analysis by Ryan Bressler).

15 [Figure: estimated latent positions for the protein interaction network, split into positive- and negative-eigenvalue directions; nodes are labeled by E. coli gene names (acca, clpp, dnak, rpoa, ..., ssb, topa).]

16 RNA polymerase
[Figure: detail of the RNA polymerase cluster, with genes (rpoa, rpod, rpos, rpon, rpoz, nusa, nusg, grea, pola, hepa, ...) spread along the positive and negative latent directions.]

17 Prediction experiment for international conflict data
[Figure: number of links found vs. number of pairs checked for K = 0, 1, 2, 3, and the fraction predicted to be links (p̂) among true links and true non-links.]

18 Prediction experiment for protein interaction data
[Figure: interactions found across trials, for latent dimension 2.]

19 Data analysis with the singular value decomposition
Many data analysis procedures for matrix-valued data Y are related to the SVD. Given a model of the form Y = M + E, the SVD provides:
- Interpretation: y_{i,j} = u_i' D v_j + ε_{i,j}, where u_i and v_j are the ith and jth rows of U and V;
- Estimation: M̂_K = Û_{[,1:K]} D̂_{[1:K,1:K]} V̂'_{[,1:K]}, if M is assumed to be of rank K.
Applications: biplots (Gabriel 1971; Gower and Hand 1996); reduced-rank interaction models (Gabriel 1978, 1998); analysis of relational data (Harshman et al. 1982); factor analysis, image processing, data reduction, ...
But: How to select K? Given K, is M̂_K a good estimator? (Note E[Y'Y] = M'M + mσ²I.)
Bayesian work on factor analysis models: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
(A truncation sketch follows below.)
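The estimator M̂_K and the difficulty of choosing K can both be seen in a few lines. A minimal sketch with simulated data; the error behavior shown is illustrative, not a result from the talk:

```python
# Rank-K SVD estimate M_hat_K = U[:, :K] D[:K, :K] V[:, :K]'.
# Choosing K is delicate: E[Y'Y] = M'M + m sigma^2 I, so noise
# inflates every singular value of Y.
import numpy as np

rng = np.random.default_rng(5)
m, n, K_true = 100, 40, 3
M = rng.normal(size=(m, K_true)) @ rng.normal(size=(K_true, n))
Y = M + rng.normal(size=(m, n))            # sigma = 1 noise

U, d, Vt = np.linalg.svd(Y, full_matrices=False)

def svd_truncate(K):
    """Rank-K reconstruction M_hat_K from the SVD of Y."""
    return U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]

for K in range(6):
    err = np.linalg.norm(svd_truncate(K) - M)
    print(K, round(err, 1))   # error typically dips near K_true, then rises
```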

20-21 A toy example
Y = UDV' + E = Σ_k d_k U_{[,k]} V'_{[,k]} + E
[Figures: singular values of Y for the toy example.]

22 SVD model
Let Y be an m × n data matrix (suppose m ≥ n). Consider a model of the form Y = UDV' + E, where
- K ∈ {0, ..., n};
- U ∈ V_{K,m} and V ∈ V_{K,n} (Stiefel manifolds);
- D = diag{d_1, ..., d_K};
- E is a matrix of independent normal noise.
Given a prior distribution on {U, D, V}, we would like to calculate
p({u_1, ..., u_K}, {v_1, ..., v_K}, {d_1, ..., d_K} | Y, K) and p(K | Y).

23 Prior distribution for U
1. z_1 ~ uniform[m-sphere]; set U_{[,1]} = z_1;
2. z_2 ~ uniform[(m-1)-sphere]; set U_{[,2]} = Null(U_{[,1]}) z_2;
...
K. z_K ~ uniform[(m-K+1)-sphere]; set U_{[,K]} = Null(U_{[,1]}, ..., U_{[,K-1]}) z_K.
The resulting distribution for U is the uniform (invariant) distribution on the Stiefel manifold, and is exchangeable under row and column permutations.
Prior conditional distributions:
(U_{[,j]} | U_{[,-j]}) =_d Null(U_{[,1]}, ..., U_{[,j-1]}, U_{[,j+1]}, ..., U_{[,K]}) z_j, with z_j ~ uniform[(m-K+1)-sphere].
(A sampling sketch follows below.)
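A direct translation of the sequential construction into code, as a hedged sketch; the function names are mine, and scipy's null_space stands in for the Null(·) operator:

```python
# Uniform draw from the Stiefel manifold V_{K,m} via the sequential
# sphere construction described above.
import numpy as np
from scipy.linalg import null_space

def runif_sphere(d, rng):
    """Uniform draw from the unit sphere in R^d."""
    z = rng.normal(size=d)
    return z / np.linalg.norm(z)

def runif_stiefel(m, K, rng):
    """z_1 on the m-sphere, then each z_j mapped through the null space
    of the columns drawn so far."""
    U = runif_sphere(m, rng).reshape(m, 1)
    for j in range(1, K):
        N = null_space(U.T)              # m x (m - j) orthonormal basis
        z = runif_sphere(m - j, rng)
        U = np.hstack([U, (N @ z).reshape(m, 1)])
    return U

rng = np.random.default_rng(6)
U = runif_stiefel(10, 3, rng)
print(np.round(U.T @ U, 10))             # identity: orthonormal columns
```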

24 Posterior distribution for U
Recall UDV' = Σ_{k=1}^{K} d_k U_{[,k]} V'_{[,k]}, so
Y − UDV' = (Y − U_{[,-j]} D_{[-j,-j]} V'_{[,-j]}) − d_j U_{[,j]} V'_{[,j]} = E_j − d_j U_{[,j]} V'_{[,j]},
so
||Y − UDV'||² = ||E_j − d_j U_{[,j]} V'_{[,j]}||²
             = ||E_j||² − 2 d_j U'_{[,j]} E_j V_{[,j]} + d_j² ||U_{[,j]} V'_{[,j]}||²
             = ||E_j||² − 2 d_j U'_{[,j]} E_j V_{[,j]} + d_j².
Hence
p(Y | U, V, D, φ) = (2π/φ)^{−mn/2} exp{ −φ||E_j||²/2 + φ d_j U'_{[,j]} E_j V_{[,j]} − φ d_j²/2 },
so
p(U_{[,j]} | Y, U_{[,-j]}, V, D) ∝ p(U_{[,j]} | U_{[,-j]}) exp{ φ d_j U'_{[,j]} E_j V_{[,j]} }.
Equivalently,
(U_{[,j]} | U_{[,-j]}, Y, V, D) =_d N_j z_j, with N_j = Null(U_{[,1]}, ..., U_{[,j-1]}, U_{[,j+1]}, ..., U_{[,K]}) and
p(z_j) ∝ exp{z_j'μ}, where μ = φ d_j N_j' E_j V_{[,j]}.
(A sketch computing these conditional parameters follows below.)
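The full conditional above puts a von Mises-Fisher distribution on z_j. The sketch below computes its parameters (N_j, μ) following the algebra; actually drawing z_j from p(z_j) ∝ exp(z_j'μ) requires a vMF sampler, which is omitted here. Function and variable names are mine:

```python
# Hedged sketch: compute (N_j, mu) for the full conditional of U[:, j].
# E_j is the residual matrix with the jth multiplicative term removed.
import numpy as np
from scipy.linalg import null_space

def u_column_conditional(Y, U, V, d, phi, j):
    """U[:, j] | rest  =_d  N_j z_j, with p(z_j) proportional to exp(z_j' mu)."""
    K = U.shape[1]
    others = [k for k in range(K) if k != j]
    E_j = Y - U[:, others] @ np.diag(d[others]) @ V[:, others].T
    N_j = null_space(U[:, others].T)           # basis orthogonal to other columns
    mu = phi * d[j] * N_j.T @ E_j @ V[:, j]    # vMF natural parameter
    return N_j, mu

rng = np.random.default_rng(7)
m, n, K = 12, 8, 2
U, _ = np.linalg.qr(rng.normal(size=(m, K)))   # hypothetical current state
V, _ = np.linalg.qr(rng.normal(size=(n, K)))
d, phi = np.array([5.0, 2.0]), 1.0
Y = U @ np.diag(d) @ V.T + rng.normal(size=(m, n))
N_j, mu = u_column_conditional(Y, U, V, d, phi, j=0)
print(N_j.shape, mu.shape)                     # (12, 11), (11,)
```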

25 Evaluating the dimension
It can all be done by comparing
M_1: Y = d uv' + E versus M_0: Y = E.
To decide whether or not to add a dimension, we need to calculate
p(M_1 | Y) / p(M_0 | Y) = [p(M_1) / p(M_0)] × [p(Y | M_1) / p(Y | M_0)].
The numerator of the Bayes factor is the hard part: we need to integrate the following with respect to the prior distribution:
p(Y | M_1, u, v, d) = (2π/φ)^{−nm/2} exp{ −φ||Y||²/2 + φ d u'Yv − φ d²/2 } = p(Y | M_0) exp{−φ d²/2} exp{u'[dφY]v}.
Computing E[e^{u'Av}] over u ∈ S_m and v ∈ S_n is difficult but doable. (A Monte Carlo sketch follows below.)
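The key expectation E[e^{u'Av}] can at least be approximated by brute force. A naive Monte Carlo sketch, only to make the quantity concrete; the talk alludes to more careful exact computation:

```python
# Crude Monte Carlo estimate of E[exp(u' A v)] with u, v independent and
# uniform on their respective unit spheres.
import numpy as np

def mc_expect_exp_bilinear(A, n_samples=100_000, seed=8):
    """Estimate E[exp(u' A v)], u ~ unif(S_m), v ~ unif(S_n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    u = rng.normal(size=(n_samples, m))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v = rng.normal(size=(n_samples, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.exp(np.einsum('si,ij,sj->s', u, A, v)).mean()

A = np.diag([2.0, 1.0, 0.5])            # toy 3 x 3 example
print(mc_expect_exp_bilinear(A))
```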

26 Simulation study
For each of (m, n) ∈ {(10, 10), (100, 10), (100, 100)}, 100 datasets were generated as follows:
- U ~ uniform(V_{5,m}), V ~ uniform(V_{5,n});
- D = diag{d_1, ..., d_5}, with d_1, ..., d_5 iid uniform(μ_{mn}/2, 3μ_{mn}/2);
- Y = UDV' + E, where E is an m × n matrix of standard normal noise,
where μ_{mn} = m + n + 2√(mn). This choice was not arbitrary: the largest eigenvalue of E'E is approximately λ_max ≈ m + n + 2√(mn) (Edelman, 1988). (A data-generation sketch follows below.)
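A hedged sketch of generating one dataset under this recipe, using the standard QR trick for a uniform Stiefel draw (equivalent in distribution to the sequential sphere construction from slide 23):

```python
# Generate one simulated dataset: Y = U D V' + E with U, V uniform on
# their Stiefel manifolds and d_k ~ uniform(mu/2, 3mu/2).
import numpy as np

def runif_stiefel_qr(m, K, rng):
    """Uniform draw from V_{K,m}: QR of a Gaussian matrix, signs fixed."""
    Q, R = np.linalg.qr(rng.normal(size=(m, K)))
    return Q * np.sign(np.diag(R))     # make diag(R) > 0 for uniformity

rng = np.random.default_rng(9)
m, n, K = 100, 10, 5
mu_mn = m + n + 2 * np.sqrt(m * n)     # approx. largest eigenvalue of E'E

U = runif_stiefel_qr(m, K, rng)
V = runif_stiefel_qr(n, K, rng)
d = rng.uniform(0.5 * mu_mn, 1.5 * mu_mn, size=K)
Y = U @ np.diag(d) @ V.T + rng.normal(size=(m, n))
```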

27 Simulation study: results
[Figure: for each of (m, n) = (10, 10), (100, 10), (100, 100), the average posterior E[p(K | Y)] as a function of K, the density of K̂, and average squared error (ASE) comparisons against least squares (LS).]

28 Final example
Data on 46 global service firms and 55 cities, obtained from the Globalization and World Cities study group (GaWC, http://www.lboro.ac.uk/gawc).
y_{i,j} = 1 if firm j has an office in city i, and y_{i,j} = 0 otherwise.
log odds(y_{i,j} = 1) = β + u_i' D v_j + ε_{i,j}
[Figure: posterior p(K | Y) and a scan of log p(Y | θ̂) across K.]

29 City effects
[Figure: estimated city effects plotted as û_2 against û_1, with points labeled by city (Kuala Lumpur, Taipei, ..., New York, London).]

30 Summary and future directions
Summary:
- SVD models are a natural way to represent patterns in relational data;
- the SVD structure can be incorporated into a variety of model forms;
- BMA on model rank can be done.
Future directions:
- extensions to higher-dimensional arrays;
- dynamic network inference (time series models for Y, U, D, V);
- restricting latent characteristics to be orthogonal to design vectors.
