Latent Factor Models for Relational Data


Peter Hoff
Statistics, Biostatistics and Center for Statistics and the Social Sciences, University of Washington

Outline

Part 1: Multiplicative factor models for network data
- Relational data
- Eigenvalue decomposition model
- Example: Adolescent social network
- Singular value decomposition model
- Example: International conflicts

Part 2: Extension to multiway data
- Multiway data
- Multiway latent factor models
- Example: Cold war cooperation and conflict

Summary and future work

Relational data

Relational data consist of a set of units or nodes $A$, and a set of measurements $Y = \{y_{i,j}\}$ specific to pairs of nodes $(i,j) \in A \times A$.

Examples:
- International relations: $A$ = countries, $y_{i,j}$ = indicator of a dispute initiated by $i$ with target $j$
- Needle-sharing network: $A$ = IV drug users, $y_{i,j}$ = needle-sharing activity between $i$ and $j$
- Protein-protein interactions: $A$ = proteins, $y_{i,j}$ = the interaction between $i$ and $j$
- Document analysis: $A_1$ = words, $A_2$ = documents, $y_{i,j}$ = count of word $i$ in document $j$

Adolescent health social network

Data on 82 12th graders from a single high school: 54 boys, 28 girls.

$\hat{\Pr}(y_{i,j} = 1 \mid \text{same sex}) = 0.077$
$\hat{\Pr}(y_{i,j} = 1 \mid \text{opposite sex}) = 0.056$

Inferential goals in the regression framework

Here $y_{i,j}$ measures $i \to j$, and $x_{i,j}$ is a vector of explanatory variables:

$$Y = \begin{pmatrix}
y_{1,1} & y_{1,2} & y_{1,3} & \text{NA} & y_{1,5} & \cdots \\
y_{2,1} & y_{2,2} & y_{2,3} & y_{2,4} & y_{2,5} & \cdots \\
y_{3,1} & \text{NA} & y_{3,3} & y_{3,4} & \text{NA} & \cdots \\
y_{4,1} & y_{4,2} & y_{4,3} & y_{4,4} & y_{4,5} & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}
\qquad
X = \begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} & \cdots \\
x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} & \cdots \\
x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} & \cdots \\
x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}$$

Consider a basic (generalized) linear model:

$$y_{i,j} \approx \beta' x_{i,j} + e_{i,j}$$

A model can provide:
- a measure of the association between $X$ and $Y$: $\hat\beta$, $\mathrm{se}(\hat\beta)$
- imputations of missing observations: $p(y_{1,4} \mid Y, X)$
- a probabilistic description of network features: $g(\tilde{Y})$, $\tilde{Y} \sim p(\tilde{Y} \mid Y, X)$

Model fit

glm(formula = y ~ x, family = binomial(link = "logit"))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.8332     0.1123  -25.24   <2e-16 ***
x             0.3471     0.1428    2.43   0.0151 *

This result says that a model with preferential association is a better description of the data than an i.i.d. binary model.

[Figure: two panels comparing distributions of the log odds ratio]

Nodal heterogeneity and independence assumptions

Model lack of fit

Neither of these models does well in terms of representing other features of the data, for example transitivity:

$$t(y) = \sum_{i<j<k} y_{i,j}\, y_{j,k}\, y_{k,i}$$

[Figure: observed number of transitive triples versus the distributions implied by the two models]
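For a symmetric binary adjacency matrix with a zero diagonal, this statistic has a simple closed form: each triangle contributes six ordered triples to $\mathrm{tr}(Y^3)$. A minimal sketch in R (the function name is illustrative):

# t(Y) = sum_{i<j<k} y_ij * y_jk * y_ki for a symmetric binary
# adjacency matrix Y with zero diagonal; trace(Y^3) counts each
# triangle 6 times (3 starting nodes x 2 directions).
transitive_triples <- function(Y) {
  sum(diag(Y %*% Y %*% Y)) / 6
}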

Latent variable models

Deviations from ordinary regression models can be represented as

$$y_{i,j} \approx \beta' x_{i,j} + e_{i,j}$$

A simple latent variable model might include additive node effects:

$$e_{i,j} = u_i + u_j + \epsilon_{i,j}, \qquad y_{i,j} \approx \beta' x_{i,j} + u_i + u_j + \epsilon_{i,j}$$

Here $\{u_1, \ldots, u_n\}$ represent across-node heterogeneity that is additive on the scale of the regressors. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by $R^2$, likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).

But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc.
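One way to fit this additive-effects logistic model is with standard mixed-model software. The sketch below uses simulated data and lme4's glmer(); separate sender and receiver random intercepts stand in for the single per-node effect $u_i$ above, so this is an approximation of the model on the slide, not the talk's exact estimator:

library(lme4)

set.seed(1)
n <- 30
u <- rnorm(n, sd = 0.8)                  # latent additive node effects
d <- expand.grid(i = 1:n, j = 1:n)
d <- d[d$i != d$j, ]                     # drop self-ties
d$x <- rnorm(nrow(d))                    # one dyadic covariate
d$y <- rbinom(nrow(d), 1, plogis(-2 + 0.4 * d$x + u[d$i] + u[d$j]))
d$i <- factor(d$i); d$j <- factor(d$j)

# Additive-effects logistic regression: y ~ x plus node intercepts.
fit <- glmer(y ~ x + (1 | i) + (1 | j), family = binomial, data = d)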

Fit of additive effects model

[Figure: distributions of the log odds ratio and the number of transitive triples under the additive effects model]

Latent variable models via exchangeability

$X$ represents known information about the nodes; $E$ represents deviations from the regression model. In this case we might be willing to use a model for $E$ in which

$$\{e_{i,j}\} \stackrel{d}{=} \{e_{g(i),g(j)}\} \text{ for all permutations } g.$$

This is a type of exchangeability for arrays, sometimes called weak exchangeability.

Theorem (Aldous, Hoover): Let $\{e_{i,j}\}$ be a weakly exchangeable array. Then $\{e_{i,j}\} \stackrel{d}{=} \{\tilde{e}_{i,j}\}$, where $\tilde{e}_{i,j} = f(u_i, u_j, \epsilon_{i,j})$ and $\{u_i\}, \{\epsilon_{i,j}\}$ are all independent random variables.

The eigenvalue decomposition model

$$Y = M + E$$

$M$ represents systematic patterns and $E$ represents noise. Every symmetric $M$ has a representation of the form

$$M = U \Lambda U'$$

where
- $U$ is an $n \times n$ matrix with orthonormal columns;
- $\Lambda$ is an $n \times n$ diagonal matrix, with elements $\{\lambda_1, \ldots, \lambda_n\}$.

Many data analysis procedures for symmetric matrix-valued data $Y$ are related to this decomposition. Given a model of the form $Y = M + E$ where $E$ is independent noise, the eigenvalue decomposition provides

- Interpretation: $y_{i,j} = u_i' \Lambda u_j + \epsilon_{i,j}$, where $u_i$ and $u_j$ are the $i$th and $j$th rows of $U$.
- Estimation: $\hat{M}_R = \hat{U}_{[,1:R]} \hat{\Lambda}_{[1:R,1:R]} \hat{U}_{[,1:R]}'$ if $M$ is assumed to be of rank $R$.
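A minimal base-R sketch of the rank-$R$ estimate $\hat{M}_R$ from a noisy symmetric matrix (all dimensions and values are illustrative):

set.seed(1)
n <- 20; R <- 3
U <- qr.Q(qr(matrix(rnorm(n * R), n, R)))    # orthonormal factors
lambda <- c(4, -3, 2)                        # eigenvalues may have mixed signs
M <- U %*% diag(lambda, R) %*% t(U)
Y <- M + matrix(rnorm(n^2, sd = 0.1), n, n)
Y <- (Y + t(Y)) / 2                          # symmetrize the noise

ed  <- eigen(Y, symmetric = TRUE)
top <- order(abs(ed$values), decreasing = TRUE)[1:R]   # R largest |eigenvalues|
M_hat <- ed$vectors[, top] %*% diag(ed$values[top], R) %*% t(ed$vectors[, top])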

Generalized bilinear regression

$$y_{i,j} \approx \beta' x_{i,j} + u_i' \Lambda u_j + \epsilon_{i,j}$$

Interpretation: think of $\{u_1, \ldots, u_n\}$ as vectors of latent nodal attributes:

$$u_i' \Lambda u_j = \sum_{r=1}^R \lambda_r u_{i,r} u_{j,r}$$

In general, a latent variable model relating $X$ to $Y$ is

$$g(y_{i,j}) = \beta' x_{i,j} + u_i' \Lambda v_j + \epsilon_{i,j}$$

and the parameters can be estimated using a rank likelihood or multinomial probit. Alternatively, parametric models include:
- if $y_{i,j}$ is binary: $\text{log odds}(y_{i,j} = 1) = \beta' x_{i,j} + u_i' \Lambda u_j + \epsilon_{i,j}$
- if $y_{i,j}$ is count data: $\log E[y_{i,j}] = \beta' x_{i,j} + u_i' \Lambda u_j + \epsilon_{i,j}$
- if $y_{i,j}$ is continuous: $E[y_{i,j}] = \beta' x_{i,j} + u_i' \Lambda u_j + \epsilon_{i,j}$

Estimation: given $\Lambda$, the predictor is linear in $U$. This bilinear structure can be exploited (EM, Gibbs sampling).

Eigenmodel fit

This model can be fit with the eigenmodel package in R:

eigenmodel_mcmc(Y, X, R = 3)

[Figure: distributions of the log odds ratio and the number of transitive triples under the eigenmodel]

The latent factors are able to represent the network transitivity.
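For reference, a fuller call might look like the sketch below, on simulated data. Everything beyond the Y, X, R arguments shown on the slide reflects my reading of the eigenmodel package documentation (S = number of MCMC scans, burn = burn-in) and should be checked against ?eigenmodel_mcmc; Y is the symmetric sociomatrix with an NA diagonal and X is an n x n x p array of dyadic covariates:

library(eigenmodel)

set.seed(1)
n <- 40
X <- array(0, dim = c(n, n, 1))
X[, , 1] <- matrix(rnorm(n^2), n, n)
X[, , 1] <- (X[, , 1] + t(X[, , 1])) / 2     # symmetric dyadic covariate
U <- matrix(rnorm(n * 2), n, 2)
eta <- -1 + X[, , 1] + U %*% diag(c(1.5, -1), 2) %*% t(U)
E <- matrix(rnorm(n^2), n, n); E <- (E + t(E)) / 2
Y <- 1 * (eta + E > 0)
diag(Y) <- NA

fit <- eigenmodel_mcmc(Y, X, R = 3, S = 5000, burn = 500)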

Underlying structure

Missing variables

The eigenmodel, without having explicit race information, captures a large degree of the racial homophily in friendship:

[Figure: distributions of the log odds ratio under the additive model versus the eigenmodel]

Multiway data

Data on pairs is sometimes called two-way data. More generally, data on triples, quadruples, etc. is called multiway data: $A_1, A_2, \ldots, A_p$ represent classes of objects, and $y_{i_1, i_2, \ldots, i_p}$ is the measurement specific to $i_1 \in A_1, \ldots, i_p \in A_p$.

Examples:
- International relations: $A_1 = A_2$ = countries, $A_3$ = time, $y_{i,j,t}$ = indicator of a dispute between $i$ and $j$ in year $t$.
- Social networks: $A_1 = A_2$ = individuals, $A_3$ = information sources, $y_{i,j,k}$ = source $k$'s report of the relationship between $i$ and $j$.
- Document analysis: $A_1 = A_2$ = words, $A_3$ = documents, $y_{i,j,k}$ = co-occurrence of words $i$ and $j$ in document $k$.

Factor models for multiway data

Recall the decomposition of a two-way array of rank $R$:

$$m_{i,j} = u_i' \Lambda u_j = \sum_{r=1}^R u_{i,r} u_{j,r} \lambda_r$$

Now generalize to a three-way array:

$$m_{i,j,k} = \sum_{r=1}^R u_{i,r} u_{j,r} w_{k,r} \lambda_r$$

Here $\{u_1, \ldots, u_n\}$ represents variation among the nodes, and $\{w_1, \ldots, w_m\}$ represents variation across the networks. Consider the $k$th slab of $M$, which is an $n \times n$ matrix:

$$m_{i,j,k} = \sum_{r=1}^R u_{i,r} u_{j,r} w_{k,r} \lambda_r = \sum_{r=1}^R u_{i,r} u_{j,r} \lambda_{k,r} = u_i' \Lambda_k u_j$$

where $\lambda_{k,r} = w_{k,r} \lambda_r$.
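A small base-R sketch of this construction, building each slab as $M_{[,,k]} = U \Lambda_k U'$ with $\Lambda_k = \mathrm{diag}(w_k \cdot \lambda)$ (dimensions and values are illustrative):

set.seed(1)
n <- 10; m <- 4; R <- 2                     # nodes, networks, rank
U <- matrix(rnorm(n * R), n, R)             # node factors
W <- matrix(rnorm(m * R), m, R)             # network factors
lambda <- c(2, 1)

M <- array(0, dim = c(n, n, m))
for (k in 1:m) {
  lambda_k <- W[k, ] * lambda               # lambda_{k,r} = w_{k,r} * lambda_r
  M[, , k] <- U %*% diag(lambda_k, R) %*% t(U)
}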

Cold war cooperation and conflict data

[Figure: time series of the regression coefficients β_trade, β_gdp, β_distance and the eigenvalues λ1, λ2, λ3 over 1950-1980]

Cold war cooperation and conflict data

[Figure: estimated latent factor positions (u1, u2, u3) of countries in 1950, 1960, 1970, and 1985, labeled by country code]

Summary and future directions

Summary:
- Latent factor models are a natural way to represent patterns in relational or array-structured data.
- The latent factor structure can be incorporated into a variety of model forms.
- Model-based methods give parameter estimates, accommodate missing data, provide predictions, and are easy to extend.

Future directions:
- Dynamic network inference
- Generalization to multi-level models

The SVD model for asymmetric data

$$Y = M + E$$

$M$ represents systematic patterns and $E$ represents noise. Recall that every $M$ has a representation of the form

$$M = UDV'$$

where, in the case $m \geq n$:
- $U$ is an $m \times n$ matrix with orthonormal columns;
- $V$ is an $n \times n$ matrix with orthonormal columns;
- $D$ is an $n \times n$ diagonal matrix, with diagonal elements $\{d_1, \ldots, d_n\}$ typically taken to be a decreasing sequence of non-negative numbers.

The squared elements of the diagonal of $D$ are the eigenvalues of $M'M$, and the columns of $V$ are the corresponding eigenvectors. The matrix $U$ can be obtained from the first $n$ eigenvectors of $MM'$. The number of non-zero elements of $D$ is the rank of $M$. Writing the row vectors as $\{u_1, \ldots, u_m\}$ and $\{v_1, \ldots, v_n\}$,

$$m_{i,j} = u_i' D v_j.$$
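A minimal base-R illustration of the rank-$R$ SVD estimate $\hat{M}_R$ discussed on the next slide (all dimensions and values are illustrative):

set.seed(1)
m <- 15; n <- 10; R <- 2
M <- matrix(rnorm(m * R), m, R) %*% t(matrix(rnorm(n * R), n, R))  # rank-R signal
Y <- M + matrix(rnorm(m * n, sd = 0.1), m, n)

s <- svd(Y)
M_hat <- s$u[, 1:R] %*% diag(s$d[1:R], R) %*% t(s$v[, 1:R])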

Data analysis with the singular value decomposition

Many data analysis procedures for matrix-valued data $Y$ are related to the SVD. Given a model of the form $Y = M + E$ where $E$ is independent noise, the SVD provides

- Interpretation: $y_{i,j} = u_i' D v_j + \epsilon_{i,j}$, where $u_i$ and $v_j$ are the $i$th and $j$th rows of $U$ and $V$.
- Estimation: $\hat{M}_R = \hat{U}_{[,1:R]} \hat{D}_{[1:R,1:R]} \hat{V}_{[,1:R]}'$ if $M$ is assumed to be of rank $R$.

Applications:
- biplots (Gabriel 1971, Gower and Hand 1996)
- reduced-rank interaction models (Gabriel 1978, 1998)
- analysis of relational data (Harshman et al., 1982)
- factor analysis, image processing, data reduction, ...

Notes:
- How to select $R$?
- Given $R$, is $\hat{M}_R$ a good estimator? ($E[Y'Y] = M'M + m\sigma^2 I$)
- Work on model-based factor analysis: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).

International conflict network, 1990-2000

11 years of international relations data (Mike Ward and Xun Cao):
- $y_{i,j}$ = indicator of a militarized dispute initiated by $i$ with target $j$;
- $x_{i,j}$ = an 8-dimensional covariate vector containing an intercept and
  1. population of initiator
  2. population of target
  3. polity score of initiator
  4. polity score of target
  5. polity score of initiator × polity score of target
  6. log distance
  7. number of shared intergovernmental organizations

Model: the $y_{i,j}$ are independent binary random variables with log-odds

$$\text{log odds}(y_{i,j} = 1) = \beta' x_{i,j} + u_i' D v_j + \epsilon_{i,j}$$
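One way to fit this kind of binary model with multiplicative sender-by-receiver effects is Hoff's amen package. The sketch below uses simulated data and assumes the ame() interface as I understand it from the package documentation (Xdyad = dyadic covariate array, R = multiplicative-effect rank, model = "bin"; ame adds an intercept by default); it is illustrative, not the talk's exact code:

library(amen)

set.seed(1)
n <- 30
Xd <- array(rnorm(n * n * 2), dim = c(n, n, 2))     # two dyadic covariates
eta <- -2 + Xd[, , 1] - 0.5 * Xd[, , 2]
Y <- 1 * (eta + matrix(rnorm(n^2), n, n) > 0)
diag(Y) <- NA

# Rank-2 multiplicative (u_i' D v_j) effects with a binary outcome model.
fit <- ame(Y, Xdyad = Xd, R = 2, model = "bin",
           nscan = 5000, burn = 500, odens = 10,
           plot = FALSE, print = FALSE)
summary(fit)   # posterior summaries of beta and the multiplicative effects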

International conflict network: parameter estimates

[Figure: estimated regression coefficients with uncertainty intervals for log distance, polity initiator, intergov org, polity target, polity interaction, log pop target, and log pop initiator]

International conflict network: description of latent variation

[Figure: two panels of countries positioned by their estimated latent factors, labeled by country code]

International conflict network: prediction experiment

[Figure: number of links found versus number of pairs checked (0-100), for models with K = 0, 1, 2, 3 latent dimensions]

Factor models for multiway data

Recall the decomposition of a two-way array of rank $R$:

$$m_{i,j} = u_i' D v_j = \sum_{r=1}^R u_{i,r} v_{j,r} d_r$$

Now generalize to a three-way array:

$$m_{i,j,k} = \sum_{r=1}^R u_{i,r} v_{j,r} w_{k,r} d_r$$

Here
- $\{u_1, \ldots, u_{n_1}\}$ represents variation along the 1st dimension,
- $\{v_1, \ldots, v_{n_2}\}$ represents variation along the 2nd dimension,
- $\{w_1, \ldots, w_{n_3}\}$ represents variation along the 3rd dimension.

Consider the $k$th slab of $M$, which is an $n_1 \times n_2$ matrix:

$$m_{i,j,k} = \sum_{r=1}^R u_{i,r} v_{j,r} w_{k,r} d_r = \sum_{r=1}^R u_{i,r} v_{j,r} d_{k,r} = u_i' D_k v_j$$

where $d_{k,r} = w_{k,r} d_r$.