Latent Factor Models for Relational Data


1 Latent Factor Models for Relational Data
Peter Hoff
Statistics, Biostatistics and Center for Statistics and the Social Sciences, University of Washington

2 Outline
Part 1: Multiplicative factor models for network data
1. Relational data
2. Models via exchangeability
3. Models via matrix representations
4. Examples:
   (a) International conflict data
   (b) Protein-protein interaction data
Part 2: Rank selection with the SVD model
1. A Gibbs sampling scheme for rank estimation
2. A simulation study

3 Relational data
Relational data consist of a set of units or nodes A, and a set of measurements Y = {y_{i,j}} specific to pairs of nodes (i, j) ∈ A × A.
Examples:
- International relations: A = countries, y_{i,j} = indicator of a dispute initiated by i with target j.
- Needle-sharing network: A = IV drug users, y_{i,j} = needle-sharing activity between i and j.
- Protein-protein interactions: A = proteins, y_{i,j} = the interaction between i and j.
- Business locations: A_1 = banks, A_2 = cities, y_{i,j} = presence of an office of bank i in city j.

4 Inferential goals in the regression framework
y_{i,j} measures i → j, and x_{i,j} is a vector of explanatory variables:

    Y = ( y_{1,1} y_{1,2} y_{1,3}   NA    y_{1,5}
          y_{2,1} y_{2,2} y_{2,3} y_{2,4} y_{2,5}
          y_{3,1}   NA    y_{3,3} y_{3,4}   NA
          y_{4,1} y_{4,2} y_{4,3} y_{4,4} y_{4,5} )

    X = ( x_{1,1} x_{1,2} x_{1,3} x_{1,4} x_{1,5}
          x_{2,1} x_{2,2} x_{2,3} x_{2,4} x_{2,5}
          x_{3,1} x_{3,2} x_{3,3} x_{3,4} x_{3,5}
          x_{4,1} x_{4,2} x_{4,3} x_{4,4} x_{4,5} )

Consider a basic (generalized) linear model
    y_{i,j} ≈ β'x_{i,j} + ε_{i,j}
A model can provide
- a measure of the association between X and Y: β̂, se(β̂);
- predictions of missing or future observations: p(y_{1,4} | Y, X).
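As a concrete illustration of this framework, the sketch below (toy data, hypothetical names) vectorizes Y and X over the observed dyads, drops the diagonal and missing entries, and estimates β by least squares.

```python
# Minimal sketch of the regression framework on dyadic data (illustrative only).
import numpy as np

m, p = 5, 3                                # 5 nodes, 3 covariates (toy sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(m, m, p))             # x_{i,j} stored as X[i, j, :]
beta_true = np.array([1.0, -0.5, 0.25])
Y = X @ beta_true + rng.normal(size=(m, m))
Y[np.arange(m), np.arange(m)] = np.nan     # undefined diagonal
Y[0, 3] = np.nan                           # a missing dyad, as in the matrix above

obs = ~np.isnan(Y)                         # mask of observed pairs
beta_hat, *_ = np.linalg.lstsq(X[obs], Y[obs], rcond=None)
print(beta_hat)                            # estimated association between X and Y
```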

5 How might node heterogeneity affect inference?

6 Latent variable models
Deviations from ordinary regression models can be represented as
    y_{i,j} ≈ β'x_{i,j} + z_{i,j}
A simple latent variable model might include row and column effects:
    z_{i,j} = u_i + v_j + ε_{i,j}
    y_{i,j} ≈ β'x_{i,j} + u_i + v_j + ε_{i,j}
u_i and v_j induce across-node heterogeneity that is additive on the scale of the regressors. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by R², likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).
But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc.
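The additive row-and-column effects model can be fit by ordinary least squares with indicator design vectors for the u_i and v_j; the sketch below (toy data, illustrative names) does exactly that. Note the effects are only identified up to an additive constant.

```python
# Sketch: fit z_ij = u_i + v_j + eps_ij by least squares with row/column indicators.
import numpy as np

m = 6
rng = np.random.default_rng(1)
u, v = rng.normal(size=m), rng.normal(size=m)
Z = u[:, None] + v[None, :] + 0.1 * rng.normal(size=(m, m))

rows, cols = np.indices((m, m))
D = np.zeros((m * m, 2 * m))
D[np.arange(m * m), rows.ravel()] = 1.0          # row-effect indicators
D[np.arange(m * m), m + cols.ravel()] = 1.0      # column-effect indicators
coef, *_ = np.linalg.lstsq(D, Z.ravel(), rcond=None)
u_hat, v_hat = coef[:m], coef[m:]
# u_hat and v_hat capture outdegree/indegree heterogeneity (up to an additive
# constant), but cannot represent clustering or transitivity.
```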

7 Latent variable models via exchangeability
X represents known information about the nodes; Z represents any additional patterns. In this case we might be willing to use a model in which
    {z_{i,j}} =_d {z_{g(i),h(j)}}  for all permutations g and h.
This is a type of exchangeability for arrays, sometimes called weak exchangeability.
Theorem (Aldous, Hoover): Let {z_{i,j}} be a weakly exchangeable array. Then {z_{i,j}} =_d {z*_{i,j}}, where
    z*_{i,j} = f(µ, u_i, v_j, ε_{i,j})
and µ, {u_i}, {v_j}, {ε_{i,j}} are all independent random variables.

8 The singular value decomposition model
    Z = M + E
M represents systematic patterns in the effects and E represents noise. Every M has a representation of the form M = UDV', where, in the case m ≥ n,
- U is an m × n matrix with orthonormal columns;
- V is an n × n matrix with orthonormal columns;
- D is an n × n diagonal matrix, with elements {d_1, ..., d_n} typically taken to be a decreasing sequence of non-negative numbers.
Recall:
- The squared elements of the diagonal of D are the eigenvalues of M'M, and the columns of V are the corresponding eigenvectors.
- The matrix U can be obtained from the first n eigenvectors of MM'.
- The number of non-zero elements of D is the rank of M.
Writing the row vectors as {u_1, ..., u_m}, {v_1, ..., v_n},
    m_{i,j} = u_i' D v_j.
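The SVD facts above can be checked numerically in a few lines; the sketch below is purely illustrative.

```python
# Numerical check of the SVD representation M = U D V' and its eigen connections.
import numpy as np

m, n = 7, 4
rng = np.random.default_rng(2)
M = rng.normal(size=(m, n))

U, d, Vt = np.linalg.svd(M, full_matrices=False)   # U: m x n, d: length n, Vt: n x n
assert np.allclose(M, U @ np.diag(d) @ Vt)         # M = U D V'
assert np.allclose(U.T @ U, np.eye(n))             # U has orthonormal columns
evals = np.linalg.eigvalsh(M.T @ M)[::-1]          # eigenvalues of M'M, descending
assert np.allclose(evals, d**2)                    # squared singular values
print(np.linalg.matrix_rank(M))                    # number of non-zero d_k
```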

9 Multiplicative effects
    y_{i,j} ≈ β'x_{i,j} + u_i' D v_j + ε_{i,j}
Interpretation: think of {u_1, ..., u_m}, {v_1, ..., v_n} as vectors of latent nodal attributes:
    u_i' D v_j = Σ_{k=1}^K d_k u_{i,k} v_{j,k}
In general, a latent variable model relating X to Y is
    g(E[y_{i,j}]) = β'x_{i,j} + u_i' D v_j + ε_{i,j}
For example, some potential models are
- if y_{i,j} is binary:     log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j}
- if y_{i,j} is count data: log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}
- if y_{i,j} is continuous: E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}
Estimation: given D and V, the predictor is linear in U. This bilinear structure can be exploited (EM, Gibbs sampling, variational methods).
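For the binary case, the sketch below (simulated placeholder quantities) computes the log-odds matrix β'x_{i,j} + u_i' D v_j and generates relational data from it.

```python
# Sketch of the multiplicative-effects predictor for binary relational data.
import numpy as np

m, n, K, p = 8, 8, 2, 3
rng = np.random.default_rng(3)
U, V = rng.normal(size=(m, K)), rng.normal(size=(n, K))
d = np.array([3.0, 1.5])
beta = rng.normal(size=p)
X = rng.normal(size=(m, n, p))

log_odds = X @ beta + U @ np.diag(d) @ V.T         # beta'x_{i,j} + u_i' D v_j
prob = 1.0 / (1.0 + np.exp(-log_odds))             # P(y_{i,j} = 1)
Y = rng.binomial(1, prob)                          # simulated binary relational data
```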

10 International conflict network
Years of international relations data (thanks to Mike Ward and Xun Cao).
y_{i,j} = indicator of a militarized dispute initiated by i with target j;
x_{i,j} is an 8-dimensional covariate vector containing an intercept and
1. population of initiator
2. population of target
3. polity score of initiator
4. polity score of target
5. polity score of initiator × polity score of target
6. log distance
7. number of shared intergovernmental organizations
Model: the y_{i,j} are independent binary random variables with log-odds
    log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j}

11 International conflict network: parameter estimates
[Figure: regression coefficient estimates for log distance, polity initiator, intergovernmental organizations, polity target, polity interaction, log population target, and log population initiator.]

12 International conflict network: description of latent variation
[Figure: two panels of estimated latent factors, with countries labeled by three-letter codes.]

13 International conflict network: prediction experiment
[Figure: number of links found versus number of pairs checked, for models with K = 0, 1, 2, ... latent dimensions.]

14 Model details: prior distribution for U
Parameters to estimate include the diagonal matrix D and
    U ∈ V_{K,m} = {m × K matrices with orthonormal columns}
    V ∈ V_{K,n} = {n × K matrices with orthonormal columns}
A constructive definition for p(U):
1. z_1 ~ uniform[m-sphere]; set U_{[,1]} = z_1;
2. z_2 ~ uniform[(m-1)-sphere]; set U_{[,2]} = Null(U_{[,1]}) z_2;
   ...
K. z_K ~ uniform[(m-K+1)-sphere]; set U_{[,K]} = Null(U_{[,1]}, ..., U_{[,K-1]}) z_K.
The resulting distribution for U is the uniform (invariant) distribution on the Stiefel manifold, and is exchangeable under row and column permutations.
Prior conditional distributions:
    (U_{[,j]} | U_{[,-j]}) =_d Null(U_{[,1]}, ..., U_{[,j-1]}, U_{[,j+1]}, ..., U_{[,K]}) z_j,   z_j ~ uniform[(m-K+1)-sphere]
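The constructive definition translates directly into code: draw each column uniformly from a sphere of decreasing dimension and map it into the null space of the previous columns. The sketch below (helper names are illustrative) generates a uniform draw from V_{K,m} this way.

```python
# Sketch of the constructive uniform (invariant) prior on the Stiefel manifold V_{K,m}.
import numpy as np

def null_space(A):
    """Orthonormal basis for the orthogonal complement of the column space of A."""
    Q, _, _ = np.linalg.svd(A, full_matrices=True)   # A is m x j with j < m
    return Q[:, A.shape[1]:]                         # last m - j left singular vectors

def runif_sphere(d, rng):
    z = rng.normal(size=d)
    return z / np.linalg.norm(z)                     # uniform on the sphere in R^d

def runif_stiefel(m, K, rng):
    U = np.zeros((m, K))
    U[:, 0] = runif_sphere(m, rng)                   # z_1 ~ uniform[m-sphere]
    for j in range(1, K):
        N = null_space(U[:, :j])                     # m x (m - j)
        U[:, j] = N @ runif_sphere(m - j, rng)       # z_j ~ uniform[(m - j)-sphere]
    return U

rng = np.random.default_rng(4)
U = runif_stiefel(10, 3, rng)
print(np.round(U.T @ U, 10))                         # identity: orthonormal columns
```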

15 Estimation details: posterior distribution for U
    UDV' = d_1 U_{[,1]} V_{[,1]}' + ... + d_K U_{[,K]} V_{[,K]}',
so for any column j,
    Z - UDV' = (Z - U_{[,-j]} D_{[-j,-j]} V_{[,-j]}') - d_j U_{[,j]} V_{[,j]}'  ≡  R_j - d_j U_{[,j]} V_{[,j]}'
    ||Z - UDV'||² = ||R_j - d_j U_{[,j]} V_{[,j]}'||²
                  = ||R_j||² - 2 d_j U_{[,j]}' R_j V_{[,j]} + d_j² ||U_{[,j]} V_{[,j]}'||²
                  = ||R_j||² - 2 d_j U_{[,j]}' R_j V_{[,j]} + d_j²
Recall Z = UDV' + E, where E is normally distributed noise with variance 1/φ:
    p(Z | U, V, D, φ) = (2π/φ)^{-mn/2} exp{ -½ φ ||R_j||² + φ d_j U_{[,j]}' R_j V_{[,j]} - ½ φ d_j² }
    p(U_{[,j]} | Z, U_{[,-j]}, V, D) ∝ p(U_{[,j]} | U_{[,-j]}) exp{ φ d_j U_{[,j]}' R_j V_{[,j]} }
    (U_{[,j]} | U_{[,-j]}, Z, V, D) =_d Null(U_{[,1]}, ..., U_{[,j-1]}, U_{[,j+1]}, ..., U_{[,K]}) z_j ≡ N_j z_j,
    p(z_j) ∝ exp{ z_j' µ },   µ = φ d_j N_j' R_j V_{[,j]}
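This full conditional can be sampled directly: project the residual R_j into the null space of the other columns and draw z_j from a von Mises-Fisher distribution with parameter µ = φ d_j N_j' R_j V_{[,j]}. The sketch below is one way to code this step; it assumes scipy >= 1.11 for scipy.stats.vonmises_fisher, and the helper and variable names are illustrative rather than taken from any particular implementation.

```python
# Sketch of one Gibbs update for a single column of U.
import numpy as np
from scipy.stats import vonmises_fisher   # requires scipy >= 1.11

def null_basis(A):
    """Orthonormal basis for the orthogonal complement of the column space of A."""
    Q, _, _ = np.linalg.svd(A, full_matrices=True)
    return Q[:, A.shape[1]:]

def gibbs_update_U_column(j, Z, U, V, d, phi, rng):
    others = [k for k in range(U.shape[1]) if k != j]
    R_j = Z - U[:, others] @ np.diag(d[others]) @ V[:, others].T  # R_j from the slide
    N_j = null_basis(U[:, others])                                # basis for Null(U_{[,-j]})
    mu = phi * d[j] * (N_j.T @ R_j @ V[:, j])                     # vMF parameter for z_j
    kappa = np.linalg.norm(mu)
    z_j = vonmises_fisher(mu / kappa, kappa).rvs(1).reshape(-1)   # z_j ~ vMF(mu)
    U = U.copy()
    U[:, j] = N_j @ z_j                                           # U_{[,j]} = N_j z_j
    return U
```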

16 The von Mises-Fisher distribution
    p(z | µ) = c(κ) exp{ z'µ },   where κ = ||µ||
[Figure: von Mises-Fisher densities for κ = 0, 2, 4.]

17 Analyzing undirected data
In many applications y_{i,j} = y_{j,i} by design, and so
    (y_{i,j} = y_{j,i}) ≈ β'x_{i,j} + z_{i,j},
and Z = {z_{i,j}} is a symmetric array. How should Z be modeled?
Modeling via exchangeability: Let Z be a symmetric array (with an undefined diagonal), such that {z_{i,j}} =_d {z_{g(i),g(j)}}. Then {z_{i,j}} =_d {z*_{i,j}}, where
    z*_{i,j} = f(µ, u_i, u_j, ε_{i,j}),
µ, {u_i}, {ε_{i,j}} are independent random variables, and f(·, u_i, u_j, ·) = f(·, u_j, u_i, ·).
Modeling via matrix decomposition: Write Z = M + E, with all matrices symmetric. All such M have an eigenvalue decomposition
    M = UΛU',   m_{i,j} = u_i' Λ u_j.
This suggests a model of the form
    (y_{i,j} = y_{j,i}) ≈ β'x_{i,j} + u_i' Λ u_j + ε_{i,j}
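For symmetric data, the relevant decomposition is an eigendecomposition rather than an SVD. The sketch below (toy data) forms a rank-K approximation M̂ = U_K Λ U_K', keeping the eigenvalues largest in magnitude, since Λ may contain negative entries.

```python
# Sketch of the eigen-representation M = U Lambda U' for symmetric relational effects.
import numpy as np

n, K = 12, 2
rng = np.random.default_rng(6)
A = rng.normal(size=(n, n))
Z = (A + A.T) / 2                               # a symmetric array of effects

lam, U = np.linalg.eigh(Z)                      # Z = U diag(lam) U'
order = np.argsort(-np.abs(lam))[:K]            # eigenvalues may be negative, so
Lam, U_K = lam[order], U[:, order]              # keep the K largest in magnitude
M_hat = U_K @ np.diag(Lam) @ U_K.T              # m_{i,j} = u_i' Lambda u_j
```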

18 Protein-protein interaction network
Interactions among 270 proteins in E. coli (Butland, 2005).
Data: the network is sparse, with (270 choose 2)^{-1} Σ_{i<j} y_{i,j} ≈ 0.01.
Model:
    log odds(y_{i,j} = 1) = µ + u_i' Λ u_j + ε_{i,j}
(analysis by Ryan Bressler)

19 Protein-protein interaction network
[Figure: estimated latent structure of the protein interaction network, with proteins labeled by gene name and grouped by the sign (positive or negative) of the associated eigenvalues.]

20 Protein-protein interaction network
[Figure: detail of the RNA polymerase cluster (rpoa, rpob, rpoc, rpod, rpoz, rpon, rpos, nusa, nusg, grea, greb, hepa, pola, yacl, usg, infb, manx, b1731, b2372), shown relative to the positive and negative latent directions.]

21 Protein-protein interaction network: prediction experiment
[Figure: number of links found versus number of pairs checked.]

22 Data analysis with the singular value decomposition
Many data analysis procedures for matrix-valued data Y are related to the SVD. Given a model of the form Y = M + E, where E is independent normal noise, the SVD provides
- Interpretation: y_{i,j} = u_i' D v_j + ε_{i,j}, where u_i and v_j are the ith and jth rows of U and V;
- Estimation: M̂_K = Û_{[,1:K]} D̂_{[1:K,1:K]} V̂_{[,1:K]}', if M is assumed to be of rank K.
Applications:
- biplots (Gabriel 1971; Gower and Hand 1996)
- reduced-rank interaction models (Gabriel 1978, 1998)
- analysis of relational data (Harshman et al., 1982)
- factor analysis, image processing, data reduction, ...
But:
- How to select K?
- Given K, is M̂_K a good estimator? (E[Y'Y] = M'M + mσ²I)
Bayesian work on factor analysis models: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
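A small sketch of the hard-rank estimator M̂_K on simulated data, showing how the reconstruction error behaves as K varies (toy sizes; the true rank here is 3).

```python
# Sketch: rank-K truncated-SVD estimate of M from Y = M + E.
import numpy as np

m, n, K_true = 30, 20, 3
rng = np.random.default_rng(7)
M = rng.normal(size=(m, K_true)) @ rng.normal(size=(K_true, n)) * 2.0
Y = M + rng.normal(size=(m, n))

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
for K in range(1, 7):
    M_K = U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]        # \hat{M}_K
    print(K, round(np.mean((M_K - M) ** 2), 3))        # error typically bottoms out near K_true
```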

23 A toy example
    Y = UDV' + E = Σ_k d_k U_{[,k]} V_{[,k]}' + E
[Figure: singular values of Y.]

24 A toy example (continued)
    Y = UDV' + E = Σ_k d_k U_{[,k]} V_{[,k]}' + E
[Figure: singular values of Y.]

25 Evaluating the dimension
It can all be done by comparing
    M_1: Y = d u v' + E     versus     M_0: Y = E.
To decide whether or not to add a dimension, we need to calculate
    p(M_1 | Y) / p(M_0 | Y) = [p(M_1) / p(M_0)] × [p(Y | M_1) / p(Y | M_0)].
The numerator of the Bayes factor is the hard part. We need to integrate the following with respect to the prior distribution:
    p(Y | M_1, u, v, d) = (2π/φ)^{-nm/2} exp{ -½ φ ||Y||² + φ d u'Yv - ½ φ d² }
                        = p(Y | M_0) exp{ -½ φ d² } exp{ u'[dφY]v }.
Computing E[e^{u'Av}] over u ∈ S_m and v ∈ S_n is difficult but doable. It involves the integral of a Bessel function and a weird hypergeometric function.
These results provide a description of a joint distribution to represent bivariate dependence among points on spheres:
    p(u, v | A) = c(A) exp{ u'Av }
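Using the identity above, the Bayes factor equals the prior expectation of exp{φ d u'Yv − ½ φ d²}. The slides evaluate this integral analytically; the sketch below instead uses a crude Monte Carlo average, assuming φ = 1 and a uniform(0, d_max) prior on d, purely to illustrate what is being computed (all names are hypothetical).

```python
# Crude Monte Carlo estimate of the Bayes factor p(Y|M_1)/p(Y|M_0).
import numpy as np

def runif_sphere(dim, rng):
    z = rng.normal(size=dim)
    return z / np.linalg.norm(z)

def bayes_factor_mc(Y, d_max=10.0, phi=1.0, nsim=100_000, seed=8):
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    logw = np.empty(nsim)
    for s in range(nsim):
        u, v = runif_sphere(m, rng), runif_sphere(n, rng)
        d = rng.uniform(0.0, d_max)
        logw[s] = phi * d * (u @ Y @ v) - 0.5 * phi * d**2
    return np.exp(logw).mean()                   # estimate of p(Y|M_1) / p(Y|M_0)

rng = np.random.default_rng(9)
Y0 = rng.normal(size=(10, 8))                                    # pure noise
Y1 = 6.0 * np.outer(runif_sphere(10, rng), runif_sphere(8, rng)) + rng.normal(size=(10, 8))
print(bayes_factor_mc(Y0), bayes_factor_mc(Y1))                  # second should be much larger
```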

26 Simulation study
100 datasets were generated for each of (m, n) ∈ {(10, 10), (100, 10), (100, 100)}:
- U ~ uniform(V_{5,m}), V ~ uniform(V_{5,n});
- D = diag{d_1, ..., d_5}, with {d_1, ..., d_5} i.i.d. uniform(½ µ_mn, 3/2 µ_mn), where µ²_mn = m + n + 2√(mn);
- Y = UDV' + E, where E is an m × n matrix of standard normal noise.
The choice of µ_mn is not arbitrary: the largest squared singular value of E is approximately λ²_max ≈ m + n + 2√(mn) (Edelman, 1988).
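A sketch of one replicate of this simulation design. Uniform draws from the Stiefel manifold are generated here by a QR factorization of a Gaussian matrix (with a sign correction), which is equivalent to the constructive definition given earlier; all names are illustrative.

```python
# Sketch: simulate Y = U D V' + E under the design described above.
import numpy as np

def runif_stiefel(m, K, rng):
    # QR of a Gaussian matrix (sign-corrected) gives a uniform draw from V_{K,m}
    Q, R = np.linalg.qr(rng.normal(size=(m, K)))
    return Q * np.sign(np.diag(R))

def simulate_dataset(m, n, K=5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    mu_mn = np.sqrt(m + n + 2 * np.sqrt(m * n))          # mu_mn^2 = m + n + 2*sqrt(mn)
    U = runif_stiefel(m, K, rng)                         # U ~ uniform(V_{K,m})
    V = runif_stiefel(n, K, rng)                         # V ~ uniform(V_{K,n})
    d = rng.uniform(0.5 * mu_mn, 1.5 * mu_mn, size=K)    # d_k ~ uniform(mu/2, 3mu/2)
    E = rng.normal(size=(m, n))                          # standard normal noise
    return U @ np.diag(d) @ V.T + E

Y = simulate_dataset(100, 10, rng=np.random.default_rng(10))
```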

27 Simulation study
[Figure: posterior mode estimates of the rank, for (m, n) = (10, 10), (100, 10), and (100, 100).]

28 Simulation study
[Figure: posterior mode and maximum-eigengap estimates of the rank, for (m, n) = (10, 10), (100, 10), and (100, 100).]

29 Simulation study
[Figure: posterior mode, maximum-eigengap, and eigenvalue-of-correlation estimates of the rank, for (m, n) = (10, 10), (100, 10), and (100, 100).]

30 Simulation study
[Figure: for each of (m, n) = (10, 10), (100, 10), and (100, 100), the average posterior distribution E[p(K | Y)] and the sampling distribution of K̂, with average squared errors ASE_B and ASE_LS.]

31 Summary and future directions
Summary:
- Relevance disclaimer: if your goal is to represent large components of variation in the data, then maybe you don't need a model. If you believe Y ≈ M + E, and you want to represent large components of variation in M, then a model can be helpful.
- SVD models are a natural way to represent patterns in relational data.
- The SVD structure can be incorporated into a variety of model forms.
- BMA on model rank can be done.
Potentially straightforward extensions:
- Representation of higher-dimensional arrays: m_{i,j,k} = Σ_{l=1}^L d_l u_{i,l} v_{j,l} w_{k,l};
- Dynamic network inference (time series models for Y, U, D, V);
- Restrict latent characteristics to be orthogonal to design vectors.
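For the higher-dimensional (three-way) extension listed above, the multiplicative effects array can be formed in one line; the sketch below uses einsum on simulated factors (toy sizes, illustrative names).

```python
# Sketch of the rank-L three-way extension m_{i,j,k} = sum_l d_l u_{i,l} v_{j,l} w_{k,l}.
import numpy as np

m, n, p, L = 5, 4, 3, 2
rng = np.random.default_rng(11)
U, V, W = rng.normal(size=(m, L)), rng.normal(size=(n, L)), rng.normal(size=(p, L))
d = np.array([2.0, 1.0])
M = np.einsum('l,il,jl,kl->ijk', d, U, V, W)    # m x n x p array of multiplicative effects
```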
