Latent SVD Models for Relational Data


1 Latent SVD Models for Relational Data
Peter Hoff
Statistics, Biostatistics and Center for Statistics and the Social Sciences, University of Washington

2 Outline
Part 1: Multiplicative factor models for network data
  1. Relational data
  2. Models via matrix representations
  3. Models via exchangeability
  4. Examples: (a) international conflict data; (b) protein-protein interaction data
Part 2: Rank selection for the SVD model
  1. Prior distributions on the Stiefel manifold
  2. A Gibbs sampling scheme for rank estimation
  3. A simulation study
  4. Example: global service firm locations

3 Relational data
Relational data consist of a set of units or nodes A, and a set of variables Y = {y_{i,j}} specific to pairs of nodes (i, j) ∈ A × A.
Examples:
- International relations: let A index the set of countries, and let y_{i,j} be the number of military disputes initiated by i with target j.
- Needle-sharing network: let A index a set of IV drug users, and let y_{i,j} indicate needle-sharing activity between i and j.
- Protein-protein interactions: let A index a set of proteins, and let y_{i,j} measure the interaction between i and j.
- Business locations: let A_1 be a set of banks and A_2 a set of cities; for (i, j) ∈ A_1 × A_2, let y_{i,j} indicate the presence of an office of bank i in city j.

4 Inferential goals in the regression framework
Let y_{i,j} be the measurement on the pair i → j, and let x_{i,j} be a vector of explanatory variables. If y_{i,j} is binary, the data might look like
- Y: a sociomatrix with binary entries, an undefined diagonal, and some entries missing (NA);
- X: the corresponding array of covariate vectors {x_{i,j} : i ≠ j}.
Consider a basic generalized linear model:
log odds(y_{i,j} = 1) = β'x_{i,j}.
A model can provide
- a measure of the association between X and Y: β̂, se(β̂);
- predictions of missing or future observations: Pr(y_{1,4} = 1 | Y, X).
(A fitting sketch follows below.)
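The model and prediction goals above can be made concrete in a few lines. This is a minimal sketch with hypothetical simulated covariates, using scikit-learn's logistic regression; the talk does not prescribe a particular fitting routine:

```python
# Minimal sketch (hypothetical data): fit the basic logistic regression on
# observed dyads, then predict a missing entry such as Pr(y_{1,4} = 1 | Y, X).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.normal(size=(n, n, p))       # one covariate vector per ordered pair
beta = np.array([1.0, -0.5, 0.25])   # "true" coefficients for the simulation
prob = 1 / (1 + np.exp(-(X @ beta)))
Y = rng.binomial(1, prob).astype(float)
np.fill_diagonal(Y, np.nan)          # the diagonal is undefined
Y[0, 3] = np.nan                     # pretend y_{1,4} is missing

obs = ~np.isnan(Y)                   # fit only on observed dyads
fit = LogisticRegression().fit(X[obs], Y[obs])
print(fit.coef_)                                        # beta-hat
print(fit.predict_proba(X[0, 3].reshape(1, -1))[:, 1])  # Pr(y_{1,4} = 1)
```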

5 How might node heterogeneity affect network structure?

6 Latent variable models
Deviations from ordinary logistic regression can be represented with
log odds(y_{i,j} = 1) = β'x_{i,j} + z_{i,j}.
A simple latent variable model might include row and column effects:
z_{i,j} = a_i + b_j, giving log odds(y_{i,j} = 1) = β'x_{i,j} + a_i + b_j.
The a_i and b_j induce across-node heterogeneity that is additive on the log-odds scale. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by R², likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).
But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc. (The sketch below illustrates this limitation.)
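To see what additive effects can and cannot do, here is a small simulation; a minimal sketch under assumed parameter values, not taken from the talk:

```python
# Additive row/column effects capture outdegree/indegree heterogeneity,
# but induce no clustering or transitivity beyond what the degrees imply.
import numpy as np

rng = np.random.default_rng(1)
n = 50
a = rng.normal(size=n)     # sender effects (drive outdegrees)
b = rng.normal(size=n)     # receiver effects (drive indegrees)
beta0 = -1.0               # intercept; no covariates in this sketch

logit = beta0 + a[:, None] + b[None, :]     # log odds(y_{i,j} = 1)
p = 1 / (1 + np.exp(-logit))
np.fill_diagonal(p, 0)                      # no self-ties
Y = rng.binomial(1, p)

# Outdegrees track a_i and indegrees track b_j ...
print(np.corrcoef(Y.sum(axis=1), a)[0, 1])  # strongly positive
print(np.corrcoef(Y.sum(axis=0), b)[0, 1])  # strongly positive
# ... but given {a_i, b_j} the entries are independent, so any apparent
# clustering is only what the degree sequence alone implies.
```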

7 The singular value decomposition
Write Z = M + E, where M represents systematic patterns in the effects and E represents noise. Every M has a representation of the form M = UDV' where, in the case m ≥ n,
- U is an m × n matrix with orthonormal columns;
- V is an n × n matrix with orthonormal columns;
- D is an n × n diagonal matrix, with diagonal elements {d_1, ..., d_n} typically taken to be a decreasing sequence of non-negative numbers.
Recall:
- the squared elements of the diagonal of D are the eigenvalues of M'M, and the columns of V are the corresponding eigenvectors;
- the matrix U can be obtained from the first n eigenvectors of MM';
- the number of non-zero elements of D is the rank of M.
Writing the row vectors as {u_1, ..., u_m} and {v_1, ..., v_n}, we have m_{i,j} = u_i' D v_j. (A numerical check follows below.)
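These SVD facts are easy to verify numerically. An illustrative check (not from the talk), with an arbitrary low-rank matrix:

```python
# Quick numerical check of the SVD facts above: M = U D V', the d_k^2 are
# eigenvalues of M'M, and the rank is the number of nonzero d_k.
import numpy as np

rng = np.random.default_rng(2)
m, n, K = 8, 5, 3
M = rng.normal(size=(m, K)) @ rng.normal(size=(K, n))   # a rank-3 matrix

U, d, Vt = np.linalg.svd(M, full_matrices=False)        # U: m x n, d: length n
assert np.allclose(U @ np.diag(d) @ Vt, M)              # M = U D V'

eigvals = np.linalg.eigvalsh(M.T @ M)                   # eigenvalues of M'M
assert np.allclose(np.sort(d**2), np.sort(eigvals.clip(min=0)))

print(np.sum(d > 1e-10))            # numerical rank: 3
i, j = 2, 4                          # entrywise: m_{i,j} = u_i' D v_j
assert np.isclose(M[i, j], U[i] @ np.diag(d) @ Vt[:, j])
```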

8 Multiplicative effects
log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j}
Interpretation: think of {u_1, ..., u_m} and {v_1, ..., v_n} as vectors of latent nodal attributes:
u_i' D v_j = Σ_{k=1}^{K} d_k u_{i,k} v_{j,k}.
In general, a latent variable model relating X to Y is
g(E[y_{i,j}]) = β'x_{i,j} + u_i' D v_j + ε_{i,j}.
For example, some potential models are:
- if y_{i,j} is binary: log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is count data: log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is continuous: E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}.
Estimation: given D and V, the predictor is linear in {u_1, ..., u_m}. This bilinear structure can be exploited (EM, Gibbs sampling, variational methods). (A least-squares sketch follows below.)
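As a concrete illustration of the bilinear structure, consider the continuous case with no covariates: fixing (D, V), each u_i solves a linear least-squares problem, and symmetrically for each v_j. The alternating scheme below is a hedged sketch of this idea; the talk itself points to EM, Gibbs sampling, and variational methods:

```python
# Hedged sketch: alternating least squares for E[y_{i,j}] = u_i' D v_j.
# D is absorbed into the factor A = U D for brevity.
import numpy as np

rng = np.random.default_rng(3)
m, n, K = 30, 20, 2
U0 = rng.normal(size=(m, K))
V0 = rng.normal(size=(n, K))
Y = U0 @ V0.T + rng.normal(scale=0.1, size=(m, n))

A = rng.normal(size=(m, K))    # current guess for U D
B = rng.normal(size=(n, K))    # current guess for V
for _ in range(50):
    # Given B, each row of A solves min_a ||Y_i - B a||^2
    A = Y @ B @ np.linalg.inv(B.T @ B)
    # Given A, each row of B solves the transposed problem
    B = Y.T @ A @ np.linalg.inv(A.T @ A)

print(np.linalg.norm(Y - A @ B.T) / np.linalg.norm(Y))  # small residual
```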

9 Alternative justification via exchangeability
X represents known information about the nodes; Z represents any additional patterns. In this case we might be willing to use a model in which
{z_{i,j}} =_d {z_{g(i),h(j)}} for all permutations g and h.
This is a type of exchangeability for arrays, sometimes called weak exchangeability.
Theorem (Aldous, Hoover): Let {z_{i,j}} be a weakly exchangeable array. Then {z_{i,j}} =_d {z̃_{i,j}}, where
z̃_{i,j} = f(μ, u_i, v_j, ε_{i,j})
and μ, {u_i}, {v_j}, {ε_{i,j}} are all independent random variables.

10 International Conflict Network
Years of international relations data (thanks to Mike Ward and Xun Cao).
- y_{i,j} = indicator of a militarized dispute initiated by i with target j;
- x_{i,j} = an 8-dimensional covariate vector containing an intercept and
  1. population of initiator
  2. population of target
  3. polity score of initiator
  4. polity score of target
  5. polity score of initiator × polity score of target
  6. log distance
  7. number of shared intergovernmental organizations
Model: the y_{i,j} are independent binary random variables with log-mean
log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}.

11 International conflict network: parameter estimates
[Figure: regression coefficient estimates for log distance, polity of initiator, intergovernmental organizations, polity of target, the polity interaction, log population of target, and log population of initiator.]

12 [Figure: estimated latent vectors for the conflict network, with nodes labeled by three-letter country codes (AFG, ARG, ..., UKG, USA).]

13 Analyzing undirected graphs
In many applications, y_{i,j} = y_{j,i} by design:
log odds(y_{i,j} = y_{j,i} = 1) = β'x_{i,j} + z_{i,j}.
As before, write Z = M + E, with all matrices symmetric. All such M have an eigenvalue decomposition
M = UΛU', with m_{i,j} = u_i' Λ u_j.
This suggests a model of the form
log odds(y_{i,j} = y_{j,i} = 1) = β'x_{i,j} + u_i' Λ u_j + ε_{i,j}.
Justification via exchangeability: let Z be a symmetric array (with an undefined diagonal), such that {z_{i,j}} =_d {z_{g(i),g(j)}}. Then {z_{i,j}} =_d {z̃_{i,j}}, where
- z̃_{i,j} = f(μ, u_i, u_j, ε_{i,j});
- μ, {u_i}, {ε_{i,j}} are independent random variables, and f(·, u_i, u_j, ·) = f(·, u_j, u_i, ·).
(An eigendecomposition sketch follows below.)
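The eigendecomposition version can be checked numerically just like the SVD. An illustrative sketch (not from the talk):

```python
# Illustrative check: a symmetric M has M = U Λ U', so m_{i,j} = u_i' Λ u_j.
import numpy as np

rng = np.random.default_rng(4)
n, K = 10, 2
U0 = rng.normal(size=(n, K))
M = U0 @ np.diag([3.0, -1.5]) @ U0.T      # symmetric, rank 2

lam, U = np.linalg.eigh(M)                # eigenvalues (ascending), U orthonormal
assert np.allclose(U @ np.diag(lam) @ U.T, M)

i, j = 1, 7
assert np.isclose(M[i, j], U[i] @ np.diag(lam) @ U[j])
# Unlike the SVD's D, the eigenvalues in Λ can be negative, so u_i' Λ u_j
# can represent both homophily (+) and anti-homophily (-) in the latent space.
```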

14 Protein-protein interaction network
Interactions among 270 proteins in E. coli (Butland et al., 2005).
Data: binary interactions {y_{i,j} : i < j}, with Σ_{i<j} y_{i,j} / (270 choose 2) ≈ .001.
Model: log odds(y_{i,j} = 1) = μ + u_i' Λ u_j + ε_{i,j} (analysis by Ryan Bressler).

15 [Figure: estimated latent positions for the protein interaction network, split into positive- and negative-eigenvalue directions; nodes are labeled by E. coli gene names (acca, clpp, dnak, rpoa, ..., ssb, topa).]

16 RNA polymerase
[Figure: detail of the RNA polymerase cluster, with genes (rpoa, rpod, rpos, rpon, rpoz, nusa, nusg, grea, pola, hepa, ...) spread along the positive and negative latent directions.]

17 Prediction experiment for international conflict data
[Figure: number of links found vs. number of pairs checked for K = 0, 1, 2, 3, and the fraction predicted to be links (p̂) among true links and true non-links.]

18 Prediction experiment for protein interaction data
[Figure: interactions found across trials, for latent dimension 2.]

19 Data analysis with the singular value decomposition
Many data analysis procedures for matrix-valued data Y are related to the SVD. Given a model of the form Y = M + E, the SVD provides:
- Interpretation: y_{i,j} = u_i' D v_j + ε_{i,j}, where u_i and v_j are the ith and jth rows of U and V;
- Estimation: M̂_K = Û_{[,1:K]} D̂_{[1:K,1:K]} V̂'_{[,1:K]}, if M is assumed to be of rank K.
Applications: biplots (Gabriel 1971; Gower and Hand 1996); reduced-rank interaction models (Gabriel 1978, 1998); analysis of relational data (Harshman et al. 1982); factor analysis, image processing, data reduction, ...
But: How to select K? Given K, is M̂_K a good estimator? (Note E[Y'Y] = M'M + mσ²I.)
Bayesian work on factor analysis models: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
(A truncation sketch follows below.)
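The estimator M̂_K and the difficulty of choosing K can both be seen in a few lines. A minimal sketch with simulated data; the error behavior shown is illustrative, not a result from the talk:

```python
# Rank-K SVD estimate M_hat_K = U[:, :K] D[:K, :K] V[:, :K]'.
# Choosing K is delicate: E[Y'Y] = M'M + m sigma^2 I, so noise
# inflates every singular value of Y.
import numpy as np

rng = np.random.default_rng(5)
m, n, K_true = 100, 40, 3
M = rng.normal(size=(m, K_true)) @ rng.normal(size=(K_true, n))
Y = M + rng.normal(size=(m, n))            # sigma = 1 noise

U, d, Vt = np.linalg.svd(Y, full_matrices=False)

def svd_truncate(K):
    """Rank-K reconstruction M_hat_K from the SVD of Y."""
    return U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]

for K in range(6):
    err = np.linalg.norm(svd_truncate(K) - M)
    print(K, round(err, 1))   # error typically dips near K_true, then rises
```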

20-21 A toy example
Y = UDV' + E = Σ_k d_k U_{[,k]} V'_{[,k]} + E
[Figures: singular values of Y for the toy example.]

22 SVD model
Let Y be an m × n data matrix (suppose m ≥ n). Consider a model of the form Y = UDV' + E, where
- K ∈ {0, ..., n};
- U ∈ V_{K,m} and V ∈ V_{K,n} (Stiefel manifolds);
- D = diag{d_1, ..., d_K};
- E is a matrix of independent normal noise.
Given a prior distribution on {U, D, V}, we would like to calculate
p({u_1, ..., u_K}, {v_1, ..., v_K}, {d_1, ..., d_K} | Y, K) and p(K | Y).

23 Prior distribution for U
1. z_1 ~ uniform[m-sphere]; set U_{[,1]} = z_1;
2. z_2 ~ uniform[(m-1)-sphere]; set U_{[,2]} = Null(U_{[,1]}) z_2;
...
K. z_K ~ uniform[(m-K+1)-sphere]; set U_{[,K]} = Null(U_{[,1]}, ..., U_{[,K-1]}) z_K.
The resulting distribution for U is the uniform (invariant) distribution on the Stiefel manifold, and is exchangeable under row and column permutations.
Prior conditional distributions:
(U_{[,j]} | U_{[,-j]}) =_d Null(U_{[,1]}, ..., U_{[,j-1]}, U_{[,j+1]}, ..., U_{[,K]}) z_j, with z_j ~ uniform[(m-K+1)-sphere].
(A sampling sketch follows below.)
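A direct translation of the sequential construction into code, as a hedged sketch; the function names are mine, and scipy's null_space stands in for the Null(·) operator:

```python
# Uniform draw from the Stiefel manifold V_{K,m} via the sequential
# sphere construction described above.
import numpy as np
from scipy.linalg import null_space

def runif_sphere(d, rng):
    """Uniform draw from the unit sphere in R^d."""
    z = rng.normal(size=d)
    return z / np.linalg.norm(z)

def runif_stiefel(m, K, rng):
    """z_1 on the m-sphere, then each z_j mapped through the null space
    of the columns drawn so far."""
    U = runif_sphere(m, rng).reshape(m, 1)
    for j in range(1, K):
        N = null_space(U.T)              # m x (m - j) orthonormal basis
        z = runif_sphere(m - j, rng)
        U = np.hstack([U, (N @ z).reshape(m, 1)])
    return U

rng = np.random.default_rng(6)
U = runif_stiefel(10, 3, rng)
print(np.round(U.T @ U, 10))             # identity: orthonormal columns
```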

24 Posterior distribution for U
Recall UDV' = Σ_{k=1}^{K} d_k U_{[,k]} V'_{[,k]}, so
Y − UDV' = (Y − U_{[,-j]} D_{[-j,-j]} V'_{[,-j]}) − d_j U_{[,j]} V'_{[,j]} = E_j − d_j U_{[,j]} V'_{[,j]},
so
||Y − UDV'||² = ||E_j − d_j U_{[,j]} V'_{[,j]}||²
             = ||E_j||² − 2 d_j U'_{[,j]} E_j V_{[,j]} + d_j² ||U_{[,j]} V'_{[,j]}||²
             = ||E_j||² − 2 d_j U'_{[,j]} E_j V_{[,j]} + d_j².
Hence
p(Y | U, V, D, φ) = (2π/φ)^{−mn/2} exp{ −φ||E_j||²/2 + φ d_j U'_{[,j]} E_j V_{[,j]} − φ d_j²/2 },
so
p(U_{[,j]} | Y, U_{[,-j]}, V, D) ∝ p(U_{[,j]} | U_{[,-j]}) exp{ φ d_j U'_{[,j]} E_j V_{[,j]} }.
Equivalently,
(U_{[,j]} | U_{[,-j]}, Y, V, D) =_d N_j z_j, with N_j = Null(U_{[,1]}, ..., U_{[,j-1]}, U_{[,j+1]}, ..., U_{[,K]}) and
p(z_j) ∝ exp{z_j'μ}, where μ = φ d_j N_j' E_j V_{[,j]}.
(A sketch computing these conditional parameters follows below.)
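The full conditional above puts a von Mises-Fisher distribution on z_j. The sketch below computes its parameters (N_j, μ) following the algebra; actually drawing z_j from p(z_j) ∝ exp(z_j'μ) requires a vMF sampler, which is omitted here. Function and variable names are mine:

```python
# Hedged sketch: compute (N_j, mu) for the full conditional of U[:, j].
# E_j is the residual matrix with the jth multiplicative term removed.
import numpy as np
from scipy.linalg import null_space

def u_column_conditional(Y, U, V, d, phi, j):
    """U[:, j] | rest  =_d  N_j z_j, with p(z_j) proportional to exp(z_j' mu)."""
    K = U.shape[1]
    others = [k for k in range(K) if k != j]
    E_j = Y - U[:, others] @ np.diag(d[others]) @ V[:, others].T
    N_j = null_space(U[:, others].T)           # basis orthogonal to other columns
    mu = phi * d[j] * N_j.T @ E_j @ V[:, j]    # vMF natural parameter
    return N_j, mu

rng = np.random.default_rng(7)
m, n, K = 12, 8, 2
U, _ = np.linalg.qr(rng.normal(size=(m, K)))   # hypothetical current state
V, _ = np.linalg.qr(rng.normal(size=(n, K)))
d, phi = np.array([5.0, 2.0]), 1.0
Y = U @ np.diag(d) @ V.T + rng.normal(size=(m, n))
N_j, mu = u_column_conditional(Y, U, V, d, phi, j=0)
print(N_j.shape, mu.shape)                     # (12, 11), (11,)
```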

25 Evaluating the dimension
It can all be done by comparing
M_1: Y = d uv' + E versus M_0: Y = E.
To decide whether or not to add a dimension, we need to calculate
p(M_1 | Y) / p(M_0 | Y) = [p(M_1) / p(M_0)] × [p(Y | M_1) / p(Y | M_0)].
The numerator of the Bayes factor is the hard part: we need to integrate the following with respect to the prior distribution:
p(Y | M_1, u, v, d) = (2π/φ)^{−nm/2} exp{ −φ||Y||²/2 + φ d u'Yv − φ d²/2 } = p(Y | M_0) exp{−φ d²/2} exp{u'[dφY]v}.
Computing E[e^{u'Av}] over u ∈ S_m and v ∈ S_n is difficult but doable. (A Monte Carlo sketch follows below.)
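The key expectation E[e^{u'Av}] can at least be approximated by brute force. A naive Monte Carlo sketch, only to make the quantity concrete; the talk alludes to more careful exact computation:

```python
# Crude Monte Carlo estimate of E[exp(u' A v)] with u, v independent and
# uniform on their respective unit spheres.
import numpy as np

def mc_expect_exp_bilinear(A, n_samples=100_000, seed=8):
    """Estimate E[exp(u' A v)], u ~ unif(S_m), v ~ unif(S_n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    u = rng.normal(size=(n_samples, m))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v = rng.normal(size=(n_samples, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.exp(np.einsum('si,ij,sj->s', u, A, v)).mean()

A = np.diag([2.0, 1.0, 0.5])            # toy 3 x 3 example
print(mc_expect_exp_bilinear(A))
```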

26 Simulation study
For each of (m, n) ∈ {(10, 10), (100, 10), (100, 100)}, 100 datasets were generated as follows:
- U ~ uniform(V_{5,m}), V ~ uniform(V_{5,n});
- D = diag{d_1, ..., d_5}, with d_1, ..., d_5 iid uniform(μ_{mn}/2, 3μ_{mn}/2);
- Y = UDV' + E, where E is an m × n matrix of standard normal noise,
where μ_{mn} = m + n + 2√(mn). This choice was not arbitrary: the largest eigenvalue of E'E is approximately λ_max ≈ m + n + 2√(mn) (Edelman, 1988). (A data-generation sketch follows below.)
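A hedged sketch of generating one dataset under this recipe, using the standard QR trick for a uniform Stiefel draw (equivalent in distribution to the sequential sphere construction from slide 23):

```python
# Generate one simulated dataset: Y = U D V' + E with U, V uniform on
# their Stiefel manifolds and d_k ~ uniform(mu/2, 3mu/2).
import numpy as np

def runif_stiefel_qr(m, K, rng):
    """Uniform draw from V_{K,m}: QR of a Gaussian matrix, signs fixed."""
    Q, R = np.linalg.qr(rng.normal(size=(m, K)))
    return Q * np.sign(np.diag(R))     # make diag(R) > 0 for uniformity

rng = np.random.default_rng(9)
m, n, K = 100, 10, 5
mu_mn = m + n + 2 * np.sqrt(m * n)     # approx. largest eigenvalue of E'E

U = runif_stiefel_qr(m, K, rng)
V = runif_stiefel_qr(n, K, rng)
d = rng.uniform(0.5 * mu_mn, 1.5 * mu_mn, size=K)
Y = U @ np.diag(d) @ V.T + rng.normal(size=(m, n))
```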

27 Simulation study: results
[Figure: for each of (m, n) = (10, 10), (100, 10), (100, 100), the average posterior E[p(K | Y)] as a function of K, the density of K̂, and average squared error (ASE) comparisons against least squares (LS).]

28 Final example
Data on 46 global service firms and 55 cities, obtained from the Globalization and World Cities study group (GaWC, http://www.lboro.ac.uk/gawc).
y_{i,j} = 1 if firm j has an office in city i, and y_{i,j} = 0 otherwise.
log odds(y_{i,j} = 1) = β + u_i' D v_j + ε_{i,j}
[Figure: posterior p(K | Y) and a scan of log p(Y | θ̂) across K.]

29 City effects
[Figure: estimated city effects plotted as û_2 against û_1, with points labeled by city (Kuala Lumpur, Taipei, ..., New York, London).]

30 Summary and future directions
Summary:
- SVD models are a natural way to represent patterns in relational data;
- the SVD structure can be incorporated into a variety of model forms;
- BMA on model rank can be done.
Future directions:
- extensions to higher-dimensional arrays;
- dynamic network inference (time series models for Y, U, D, V);
- restricting latent characteristics to be orthogonal to design vectors.
