Latent Factor Models for Relational Data


1 Latent Factor Models for Relational Data

Peter Hoff
Statistics, Biostatistics, and the Center for Statistics and the Social Sciences
University of Washington

2 Outline

Part 1: Multiplicative factor models for network data
- Relational data
- Models via exchangeability
- Models via matrix representations
- Examples: conflict data, protein interaction data

Part 2: Extension to multiway data
- Multiway data
- Multiway latent factor models
- Example: Cold war cooperation and conflict

Summary and future work

3 Relational data

Relational data consist of a set of units or nodes A, and a set of measurements Y = {y_{i,j}} specific to pairs of nodes (i,j) ∈ A × A.

Examples:
- International relations: A = countries, y_{i,j} = indicator of a dispute initiated by i with target j.
- Needle-sharing network: A = IV drug users, y_{i,j} = needle-sharing activity between i and j.
- Protein-protein interactions: A = proteins, y_{i,j} = the interaction between i and j.
- Document analysis: A_1 = words, A_2 = documents, y_{i,j} = word count of word i in document j.

4 Inferential goals in the regression framework

Y = {y_{i,j}} measures the relationship i → j, with some entries missing:

    Y = [ y_{1,1} y_{1,2} y_{1,3}  NA     y_{1,5}
          y_{2,1} y_{2,2} y_{2,3} y_{2,4} y_{2,5}
          y_{3,1}  NA    y_{3,3} y_{3,4}   NA
          y_{4,1} y_{4,2} y_{4,3} y_{4,4} y_{4,5}
          ...                                    ]

and x_{i,j} is a vector of explanatory variables, collected in the array

    X = [ x_{1,1} x_{1,2} x_{1,3} x_{1,4} x_{1,5}
          x_{2,1} x_{2,2} x_{2,3} x_{2,4} x_{2,5}
          x_{3,1} x_{3,2} x_{3,3} x_{3,4} x_{3,5}
          x_{4,1} x_{4,2} x_{4,3} x_{4,4} x_{4,5}
          ...                                    ]

Consider a basic (generalized) linear model

    y_{i,j} ≈ β'x_{i,j} + e_{i,j}.

Such a model can provide
- a measure of the association between X and Y: β̂, se(β̂);
- predictions of missing or future observations: p(y_{1,4} | Y, X).

5 How might node heterogeneity affect network structure?

6 Latent variable models

Deviations from ordinary regression models can be represented as

    y_{i,j} ≈ β'x_{i,j} + e_{i,j}.

A simple latent variable model might include row and column effects:

    e_{i,j} = u_i + v_j + ε_{i,j}
    y_{i,j} ≈ β'x_{i,j} + u_i + v_j + ε_{i,j}.

The u_i and v_j induce across-node heterogeneity that is additive on the scale of the regressors. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by R², likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).

But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc.
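For a complete Gaussian array, the row-and-column-effects model above can be fit by centered row and column means. A minimal sketch on simulated data (all names and dimensions are hypothetical):

```python
import numpy as np

# Simulate y_{i,j} = mu + u_i + v_j + noise and recover the additive effects.
rng = np.random.default_rng(3)
m, n = 20, 20
mu = 1.0
u = rng.normal(size=(m, 1))   # row (sender) effects
v = rng.normal(size=(1, n))   # column (receiver) effects
Y = mu + u + v + rng.normal(scale=0.1, size=(m, n))

# Least-squares estimates: grand mean, then centered row and column means.
mu_hat = Y.mean()
u_hat = Y.mean(axis=1) - mu_hat
v_hat = Y.mean(axis=0) - mu_hat
fit = mu_hat + u_hat[:, None] + v_hat[None, :]
```

The fit captures degree heterogeneity exactly, but by construction it cannot represent clustering or transitivity, which motivates the multiplicative terms introduced below.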

7 Latent variable models via exchangeability

X represents known information about the nodes; E represents deviations from the regression model. In this case we might be willing to use a model for E in which

    {e_{i,j}} =_d {e_{g(i),h(j)}}  for all permutations g and h.

This is a type of exchangeability for arrays, sometimes called weak exchangeability.

Theorem (Aldous, Hoover): Let {e_{i,j}} be a weakly exchangeable array. Then {e_{i,j}} =_d {ẽ_{i,j}}, where

    ẽ_{i,j} = f(μ, u_i, v_j, ε_{i,j})

and μ, {u_i}, {v_j}, {ε_{i,j}} are all independent random variables.

8 The singular value decomposition model

    Y = M + E,

where M represents systematic patterns and E represents noise. Recall that every M has a representation of the form M = U D V', where, in the case m ≥ n,
- U is an m × n matrix with orthonormal columns;
- V is an n × n matrix with orthonormal columns;
- D is an n × n diagonal matrix, with diagonal elements {d_1, ..., d_n} typically taken to be a decreasing sequence of non-negative numbers.

The squared elements of the diagonal of D are the eigenvalues of M'M, and the columns of V are the corresponding eigenvectors. The matrix U can be obtained from the first n eigenvectors of MM'. The number of non-zero elements of D is the rank of M. Writing the row vectors of U and V as {u_1, ..., u_m} and {v_1, ..., v_n},

    m_{i,j} = u_i' D v_j.
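The facts above can be checked numerically. A short sketch with a simulated rank-2 matrix (dimensions and seed are arbitrary):

```python
import numpy as np

# Illustrate the decomposition M = U D V' and its properties.
rng = np.random.default_rng(0)
m, n, rank = 6, 4, 2

# Build a rank-2 matrix M as a product of thin factors.
M = rng.normal(size=(m, rank)) @ rng.normal(size=(rank, n))

# Thin SVD: U is m x n with orthonormal columns, d holds the singular
# values (non-negative, decreasing), Vt is n x n.
U, d, Vt = np.linalg.svd(M, full_matrices=False)

# The squared singular values are the eigenvalues of M'M.
eigvals = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
assert np.allclose(d**2, eigvals)

# The number of (numerically) non-zero singular values is the rank of M.
assert np.sum(d > 1e-10) == rank

# Reconstruction: m_{i,j} = u_i' D v_j.
assert np.allclose(M, U @ np.diag(d) @ Vt)
```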

9 Data analysis with the singular value decomposition

Many data analysis procedures for matrix-valued data Y are related to the SVD. Given a model of the form Y = M + E, where E is independent noise, the SVD provides:
- Interpretation: y_{i,j} = u_i' D v_j + ε_{i,j}, where u_i and v_j are the ith and jth rows of U and V;
- Estimation: M̂_K = Û_{[,1:K]} D̂_{[1:K,1:K]} V̂'_{[,1:K]}, if M is assumed to be of rank K.

Applications:
- biplots (Gabriel 1971; Gower and Hand 1996)
- reduced-rank interaction models (Gabriel 1978, 1998)
- analysis of relational data (Harshman et al., 1982)
- factor analysis, image processing, data reduction, ...

Notes:
- How to select K?
- Given K, is M̂_K a good estimator? (E[Y'Y] = M'M + mσ²I)
- Work on model-based factor analysis: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
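The rank-K estimator M̂_K can be sketched directly; here a simulated Y = M + E, with hypothetical dimensions, shows the truncated SVD denoising the observed array:

```python
import numpy as np

# Rank-K estimate M_hat_K = U[:, :K] D[1:K,1:K] V[:, :K]'  for  Y = M + E.
rng = np.random.default_rng(1)
m, n, K = 50, 40, 3

M = rng.normal(size=(m, K)) @ rng.normal(size=(K, n))   # true rank-K mean
Y = M + rng.normal(scale=0.5, size=(m, n))              # noisy observation

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
M_hat = U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]

# The truncated SVD estimate is much closer to M than the raw data are.
err_raw = np.linalg.norm(Y - M)
err_hat = np.linalg.norm(M_hat - M)
```

Choosing K here was assumed known; in practice it must be selected, which is exactly the model-selection question raised in the notes above.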

10 Generalized bilinear regression

    y_{i,j} ≈ β'x_{i,j} + u_i' D v_j + ε_{i,j}

Interpretation: think of {u_1, ..., u_m} and {v_1, ..., v_n} as vectors of latent nodal attributes:

    u_i' D v_j = Σ_{k=1}^K d_k u_{i,k} v_{j,k}.

In general, a latent variable model relating X to Y is

    g(y_{i,j}) = β'x_{i,j} + u_i' D v_j + ε_{i,j},

and the parameters can be estimated using a rank likelihood or multinomial probit. Alternatively, parametric models include:
- if y_{i,j} is binary:  log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is count data:  log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is continuous:  E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}.

Estimation: given D and V, the predictor is linear in U. This bilinear structure can be exploited (EM, Gibbs sampling, variational methods).
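The estimation remark can be illustrated with a bare-bones alternating least squares for the Gaussian case without covariates, absorbing D into the factors. This is only a sketch on simulated data (names, dimensions, and the iteration count are hypothetical), not the EM, Gibbs, or variational schemes mentioned above:

```python
import numpy as np

# Alternating least squares for y_{i,j} ≈ u_i'v_j + eps: with V fixed,
# the update for U is ordinary least squares, and vice versa.
rng = np.random.default_rng(2)
m, n, K = 30, 25, 2
U0 = rng.normal(size=(m, K))
V0 = rng.normal(size=(n, K))
Y = U0 @ V0.T + rng.normal(scale=0.1, size=(m, n))

U = rng.normal(size=(m, K))
V = rng.normal(size=(n, K))
for _ in range(50):
    U = Y @ V @ np.linalg.inv(V.T @ V)      # least-squares update of U
    V = Y.T @ U @ np.linalg.inv(U.T @ U)    # least-squares update of V

resid = np.linalg.norm(Y - U @ V.T) / np.linalg.norm(Y)
```

The same row-wise linearity is what makes Gibbs sampling of U given (D, V) a set of standard Bayesian regressions in the full model.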

11 International conflict network: years of international relations data (Mike Ward and Xun Cao)

y_{i,j} = indicator of a militarized dispute initiated by i with target j; x_{i,j} is an 8-dimensional covariate vector containing an intercept and
1. population of initiator
2. population of target
3. polity score of initiator
4. polity score of target
5. polity score of initiator × polity score of target
6. log distance
7. number of shared intergovernmental organizations

Model: the y_{i,j} are independent binary random variables with log-odds

    log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j}.

12 International conflict network: parameter estimates

[Figure: regression coefficient estimates for log distance, polity initiator, intergovernmental organizations, polity target, polity interaction, log population target, and log population initiator.]

13 International conflict network: description of latent variation

[Figure: estimated latent factors, plotted with points labeled by country codes.]

14 International conflict network: prediction experiment

[Figure: number of links found versus number of pairs checked, for models with K = 0, 1, 2, ...]

15 Analyzing undirected data

In many applications y_{i,j} = y_{j,i} by design, and so

    (y_{i,j} = y_{j,i}) ≈ β'x_{i,j} + e_{i,j}

and E = {e_{i,j}} is a symmetric array. How should E be modeled?

Modeling via exchangeability: let E be a symmetric array (with an undefined diagonal) such that {e_{i,j}} =_d {e_{g(i),g(j)}}. Then {e_{i,j}} =_d {ẽ_{i,j}}, where

    ẽ_{i,j} = f(μ, u_i, u_j, ε_{i,j}),

μ, {u_i}, {ε_{i,j}} are all independent, and f(·, u_i, u_j, ·) = f(·, u_j, u_i, ·).

Modeling via matrix decomposition: write E = M + E′, with all matrices symmetric. Every such M has an eigenvalue decomposition

    M = U Λ U',  i.e.  m_{i,j} = u_i' Λ u_j.

This suggests a model of the form

    (y_{i,j} = y_{j,i}) ≈ β'x_{i,j} + u_i' Λ u_j + ε_{i,j}.
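The symmetric decomposition can be sketched numerically. Unlike singular values, the eigenvalues in Λ may be negative, which gives the symmetric model extra flexibility; a small simulated example (arbitrary dimensions and seed):

```python
import numpy as np

# Any symmetric M has M = U Λ U', so m_{i,j} = u_i' Λ u_j,
# where u_i is the i-th row of U and Λ can have entries of either sign.
rng = np.random.default_rng(4)
n = 8
A = rng.normal(size=(n, n))
M = (A + A.T) / 2                      # a symmetric matrix

lam, U = np.linalg.eigh(M)             # eigenvalues (ascending) and orthonormal U
assert np.allclose(M, U @ np.diag(lam) @ U.T)

# Entry-wise form: m_{i,j} = u_i' Λ u_j.
i, j = 1, 5
assert np.isclose(M[i, j], U[i] @ np.diag(lam) @ U[j])
```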

16 Protein-protein interaction network

Interactions among 270 proteins in E. coli (Butland, 2005).

Data: binary indicators y_{i,j} for the pairs i < j.

Model: log odds(y_{i,j} = 1) = μ + u_i' Λ u_j + ε_{i,j}.

17 Protein-protein interaction network

[Figure: network of estimated positive and negative interactions, with nodes labeled by protein names.]

18 Protein-protein interaction network

[Figure: detail of the estimated network around the RNA polymerase cluster (rpoa, rpob, rpoc, rpod, rpoz, nusa, nusg, grea, greb, ...), with positive and negative interactions indicated.]

19 Protein-protein interaction network: prediction experiment

[Figure: number of links found versus number of pairs checked.]

20 Multiway data

Data on pairs is sometimes called two-way data. More generally, data on triples, quadruples, etc. are called multiway data. Let A_1, A_2, ..., A_p represent classes of objects; y_{i_1,i_2,...,i_p} is the measurement specific to i_1 ∈ A_1, ..., i_p ∈ A_p.

Examples:
- International relations: A_1 = A_2 = countries, A_3 = time, y_{i,j,t} = indicator of a dispute between i and j in year t.
- Social networks: A_1 = A_2 = individuals, A_3 = information sources, y_{i,j,k} = source k's report of the relationship between i and j.
- Document analysis: A_1 = A_2 = words, A_3 = documents, y_{i,j,k} = co-occurrence of words i and j in document k.

21 Factor models for multiway data

Recall the decomposition of a two-way array of rank R:

    m_{i,j} = u_i' D v_j = Σ_{r=1}^R u_{i,r} v_{j,r} d_r.

Now generalize to a three-way array:

    m_{i,j,k} = Σ_{r=1}^R u_{i,r} v_{j,r} w_{k,r} d_r,

where
- {u_1, ..., u_{n_1}} represents variation along the 1st dimension;
- {v_1, ..., v_{n_2}} represents variation along the 2nd dimension;
- {w_1, ..., w_{n_3}} represents variation along the 3rd dimension.

Consider the kth slab of M, which is an n_1 × n_2 matrix:

    m_{i,j,k} = Σ_{r=1}^R u_{i,r} v_{j,r} w_{k,r} d_r = Σ_{r=1}^R u_{i,r} v_{j,r} d_{k,r} = u_i' D_k v_j,

where d_{k,r} = w_{k,r} d_r.
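The three-way decomposition and its slab form can be sketched with simulated factors (dimensions, rank, and names are hypothetical):

```python
import numpy as np

# Rank-R three-way factor model: m_{i,j,k} = sum_r u_{i,r} v_{j,r} w_{k,r} d_r.
rng = np.random.default_rng(5)
n1, n2, n3, R = 4, 5, 6, 2
U = rng.normal(size=(n1, R))
V = rng.normal(size=(n2, R))
W = rng.normal(size=(n3, R))
d = np.array([2.0, 1.0])

# Build the full n1 x n2 x n3 array by summing over the factor index r.
M = np.einsum('ir,jr,kr,r->ijk', U, V, W, d)

# The k-th slab is a bilinear form u_i' D_k v_j with d_{k,r} = w_{k,r} d_r.
k = 3
D_k = np.diag(W[k] * d)
assert np.allclose(M[:, :, k], U @ D_k @ V.T)
```

Each slab is thus a two-way factor model whose "singular values" are rescaled by the third-dimension factors, which is how the two-way machinery carries over.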

22 Cold war cooperation and conflict data

[Figure: time series of estimated regression coefficients β_trade, β_gdp, β_distance and eigenvalues λ.]

23 Cold war cooperation and conflict data

[Figure: estimated latent factor coordinates (u_1, u_2, u_3) in 1950, 1960, 1970, and 1985, with points labeled by country codes.]

24 Summary and future directions

Summary:
- Latent factor models are a natural way to represent patterns in relational or array-structured data.
- The latent factor structure can be incorporated into a variety of model forms.
- Model-based methods give parameter estimates; accommodate missing data; provide predictions; are easy to extend.

Future directions:
- Dynamic network inference
- Generalization to multi-level models


More information

Model averaging and dimension selection for the singular value decomposition

Model averaging and dimension selection for the singular value decomposition Model averaging and dimension selection for the singular value decomposition Peter D. Hoff Technical Report no. 494 Department of Statistics University of Washington January 10, 2006 Abstract Many multivariate

More information

Linear Algebra Review. Vectors

Linear Algebra Review. Vectors Linear Algebra Review 9/4/7 Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa (UCSD) Cogsci 8F Linear Algebra review Vectors

More information

Singular Value Decomposition and Principal Component Analysis (PCA) I

Singular Value Decomposition and Principal Component Analysis (PCA) I Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression

More information

Example Linear Algebra Competency Test

Example Linear Algebra Competency Test Example Linear Algebra Competency Test The 4 questions below are a combination of True or False, multiple choice, fill in the blank, and computations involving matrices and vectors. In the latter case,

More information

Strassen-like algorithms for symmetric tensor contractions

Strassen-like algorithms for symmetric tensor contractions Strassen-like algorithms for symmetric tensor contractions Edgar Solomonik Theory Seminar University of Illinois at Urbana-Champaign September 18, 2017 1 / 28 Fast symmetric tensor contractions Outline

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1 Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of

More information

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with

More information

Nonparametric Bayesian Matrix Factorization for Assortative Networks

Nonparametric Bayesian Matrix Factorization for Assortative Networks Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin

More information

For Adam Smith, the secret to the wealth of nations was related

For Adam Smith, the secret to the wealth of nations was related The building blocks of economic complexity César A. Hidalgo 1 and Ricardo Hausmann a Center for International Development and Harvard Kennedy School, Harvard University, Cambridge, MA 02138 Edited by Partha

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

Manning & Schuetze, FSNLP, (c)

Manning & Schuetze, FSNLP, (c) page 554 554 15 Topics in Information Retrieval co-occurrence Latent Semantic Indexing Term 1 Term 2 Term 3 Term 4 Query user interface Document 1 user interface HCI interaction Document 2 HCI interaction

More information

Linear Methods in Data Mining

Linear Methods in Data Mining Why Methods? linear methods are well understood, simple and elegant; algorithms based on linear methods are widespread: data mining, computer vision, graphics, pattern recognition; excellent general software

More information

= main diagonal, in the order in which their corresponding eigenvectors appear as columns of E.

= main diagonal, in the order in which their corresponding eigenvectors appear as columns of E. 3.3 Diagonalization Let A = 4. Then and are eigenvectors of A, with corresponding eigenvalues 2 and 6 respectively (check). This means 4 = 2, 4 = 6. 2 2 2 2 Thus 4 = 2 2 6 2 = 2 6 4 2 We have 4 = 2 0 0

More information

Appendix A: Matrices

Appendix A: Matrices Appendix A: Matrices A matrix is a rectangular array of numbers Such arrays have rows and columns The numbers of rows and columns are referred to as the dimensions of a matrix A matrix with, say, 5 rows

More information

CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery

CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery Tim Roughgarden & Gregory Valiant May 3, 2017 Last lecture discussed singular value decomposition (SVD), and we

More information

Practical Linear Algebra: A Geometry Toolbox

Practical Linear Algebra: A Geometry Toolbox Practical Linear Algebra: A Geometry Toolbox Third edition Chapter 12: Gauss for Linear Systems Gerald Farin & Dianne Hansford CRC Press, Taylor & Francis Group, An A K Peters Book www.farinhansford.com/books/pla

More information

Cluster Analysis of Protein-protein Interaction Network of Mycobacterium tuberculosis during Host infection

Cluster Analysis of Protein-protein Interaction Network of Mycobacterium tuberculosis during Host infection Advances in Bioresearch Adv. Biores., Vol 6 (5) September 2015: 38-46 2015 Society of Education, India Print ISSN 0976-4585; Online ISSN 2277-1573 Journal s URL:http://www.soeagra.com/abr.html CODEN: ABRDC3

More information

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University

More information

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can

More information

Review of Linear Algebra

Review of Linear Algebra Review of Linear Algebra Definitions An m n (read "m by n") matrix, is a rectangular array of entries, where m is the number of rows and n the number of columns. 2 Definitions (Con t) A is square if m=

More information

Correspondence Analysis

Correspondence Analysis Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,

More information

MP 472 Quantum Information and Computation

MP 472 Quantum Information and Computation MP 472 Quantum Information and Computation http://www.thphys.may.ie/staff/jvala/mp472.htm Outline Open quantum systems The density operator ensemble of quantum states general properties the reduced density

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008

Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008 Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008 Exam 2 will be held on Tuesday, April 8, 7-8pm in 117 MacMillan What will be covered The exam will cover material from the lectures

More information

BIOINFORMATICS Datamining #1

BIOINFORMATICS Datamining #1 BIOINFORMATICS Datamining #1 Mark Gerstein, Yale University gersteinlab.org/courses/452 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu Large-scale Datamining Gene Expression Representing Data in

More information

Functional Analysis Review

Functional Analysis Review Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all

More information

5 Linear Algebra and Inverse Problem

5 Linear Algebra and Inverse Problem 5 Linear Algebra and Inverse Problem 5.1 Introduction Direct problem ( Forward problem) is to find field quantities satisfying Governing equations, Boundary conditions, Initial conditions. The direct problem

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Roots of Autocracy. January 26, 2017

Roots of Autocracy. January 26, 2017 Roots of Autocracy Oded Galor Marc Klemp January 26, 2017 Abstract This research explores the origins of the variation in the prevalence and nature of political institutions across the globe. It advances

More information

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice 3. Eigenvalues and Eigenvectors, Spectral Representation 3.. Eigenvalues and Eigenvectors A vector ' is eigenvector of a matrix K, if K' is parallel to ' and ' 6, i.e., K' k' k is the eigenvalue. If is

More information

Modeling heterogeneity in random graphs

Modeling heterogeneity in random graphs Modeling heterogeneity in random graphs Catherine MATIAS CNRS, Laboratoire Statistique & Génome, Évry (Soon: Laboratoire de Probabilités et Modèles Aléatoires, Paris) http://stat.genopole.cnrs.fr/ cmatias

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

The System of Linear Equations. Direct Methods. Xiaozhou Li.

The System of Linear Equations. Direct Methods. Xiaozhou Li. 1/16 The Direct Methods xiaozhouli@uestc.edu.cn http://xiaozhouli.com School of Mathematical Sciences University of Electronic Science and Technology of China Chengdu, China Does the LU factorization always

More information

BIOINFORMATICS. An automated comparative analysis of 17 complete microbial genomes. Arvind K. Bansal

BIOINFORMATICS. An automated comparative analysis of 17 complete microbial genomes. Arvind K. Bansal BIOINFORMATICS Electronic edition http://www.bioinformatics.oupjournals.org VOLUME 15 NUMBER 11 NOVEMBER 1999 PAGES 900 908 An automated comparative analysis of 17 complete microbial genomes Arvind K.

More information

Faloutsos, Tong ICDE, 2009

Faloutsos, Tong ICDE, 2009 Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity

More information

DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular

DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular form) Given: matrix C = (c i,j ) n,m i,j=1 ODE and num math: Linear algebra (N) [lectures] c phabala 2016 DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix

More information