Latent Factor Models for Relational Data
1 Latent Factor Models for Relational Data
Peter Hoff
Statistics, Biostatistics and Center for Statistics and the Social Sciences, University of Washington
2 Outline
Part 1: Multiplicative factor models for network data
1. Relational data
2. Models via exchangeability
3. Models via matrix representations
4. Examples: (a) international conflict data; (b) protein-protein interaction data
Part 2: Rank selection with the SVD model
1. A Gibbs sampling scheme for rank estimation
2. A simulation study
3 Relational data
Relational data consist of a set of units or nodes A, and a set of measurements Y = {y_{i,j}} specific to pairs of nodes (i, j) ∈ A × A.
Examples:
- International relations: A = countries, y_{i,j} = indicator of a dispute initiated by i with target j.
- Needle-sharing network: A = IV drug users, y_{i,j} = needle-sharing activity between i and j.
- Protein-protein interactions: A = proteins, y_{i,j} = the interaction between i and j.
- Business locations: A_1 = banks, A_2 = cities, y_{i,j} = presence of an office of bank i in city j.
4 Inferential goals in the regression framework
y_{i,j} measures i → j, and x_{i,j} is a vector of explanatory variables:

Y = [ y_{1,1} y_{1,2} y_{1,3} NA      y_{1,5}
      y_{2,1} y_{2,2} y_{2,3} y_{2,4} y_{2,5}
      y_{3,1} NA      y_{3,3} y_{3,4} NA
      y_{4,1} y_{4,2} y_{4,3} y_{4,4} y_{4,5} ]

X = [ x_{1,1} x_{1,2} x_{1,3} x_{1,4} x_{1,5}
      x_{2,1} x_{2,2} x_{2,3} x_{2,4} x_{2,5}
      x_{3,1} x_{3,2} x_{3,3} x_{3,4} x_{3,5}
      x_{4,1} x_{4,2} x_{4,3} x_{4,4} x_{4,5} ]

Consider a basic (generalized) linear model:
y_{i,j} ≈ β′x_{i,j} + ε_{i,j}
A model can provide
- a measure of the association between X and Y: β̂, se(β̂);
- predictions of missing or future observations: p(y_{1,4} | Y, X).
5 How might node heterogeneity affect inference?
6 Latent variable models
Deviations from ordinary regression models can be represented as
y_{i,j} ≈ β′x_{i,j} + z_{i,j}
A simple latent variable model might include row and column effects:
z_{i,j} = u_i + v_j + ε_{i,j}, so that y_{i,j} ≈ β′x_{i,j} + u_i + v_j + ε_{i,j}
u_i and v_j induce across-node heterogeneity that is additive on the scale of the regressors. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by R², likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).
But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc. (A least-squares sketch of the row-and-column-effects fit follows.)
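To make the row-and-column-effects model concrete, here is a small simulation-and-fit sketch (mine, not from the talk), assuming Gaussian errors and no covariates:

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 20
u = rng.normal(0, 1, m)              # row (sender) effects
v = rng.normal(0, 1, n)              # column (receiver) effects
Y = u[:, None] + v[None, :] + rng.normal(0, 0.5, (m, n))

# least-squares estimates (identified only up to additive constants):
mu_hat = Y.mean()
u_hat = Y.mean(axis=1) - mu_hat      # centered row means
v_hat = Y.mean(axis=0) - mu_hat      # centered column means
print(np.corrcoef(u - u.mean(), u_hat)[0, 1])   # close to 1
```

The row and column means recover u and v up to additive constants, which is the usual identifiability caveat for this model.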
7 Latent variable models via exchangeability
X represents known information about the nodes; Z represents any additional patterns. In this case we might be willing to use a model in which
{z_{i,j}} =_d {z_{g(i),h(j)}} for all permutations g and h.
This is a type of exchangeability for arrays, sometimes called weak exchangeability.
Theorem (Aldous, Hoover): Let {z_{i,j}} be a weakly exchangeable array. Then {z_{i,j}} =_d {z*_{i,j}}, where
z*_{i,j} = f(μ, u_i, v_j, ε_{i,j})
and μ, {u_i}, {v_j}, {ε_{i,j}} are all independent random variables.
8 The singular value decomposition model
Z = M + E
M represents systematic patterns in the effects and E represents noise. Recall, every M has a representation of the form M = UDV′ where, in the case m ≥ n,
- U is an m × n matrix with orthonormal columns;
- V is an n × n matrix with orthonormal columns;
- D is an n × n diagonal matrix, with elements {d_1, ..., d_n} typically taken to be a decreasing sequence of non-negative numbers.
The squared elements of the diagonal of D are the eigenvalues of M′M, and the columns of V are the corresponding eigenvectors. The matrix U can be obtained from the first n eigenvectors of MM′. The number of non-zero elements of D is the rank of M. Writing the row vectors as {u_1, ..., u_m} and {v_1, ..., v_n},
m_{i,j} = u_i′ D v_j.
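As a quick numerical companion (not part of the slides), the decomposition and its rank-K truncation can be checked with numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 30, 10, 3
M = rng.normal(size=(m, K)) @ rng.normal(size=(K, n))   # a rank-K matrix

U, d, Vt = np.linalg.svd(M, full_matrices=False)         # M = U @ diag(d) @ Vt
print(np.allclose(M, U @ np.diag(d) @ Vt))               # True
print(np.round(d, 6))                                    # only K nonzero singular values

# keeping the first K terms gives the best rank-K approximation:
M_K = U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]
print(np.allclose(M, M_K))                               # True, since rank(M) = K
```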
9 Multiplicative effects
y_{i,j} ≈ β′x_{i,j} + u_i′ D v_j + ε_{i,j}
Interpretation: think of {u_1, ..., u_m}, {v_1, ..., v_n} as vectors of latent nodal attributes:
u_i′ D v_j = Σ_{k=1}^K d_k u_{i,k} v_{j,k}
In general, a latent variable model relating X to Y is
g(E[y_{i,j}]) = β′x_{i,j} + u_i′ D v_j + ε_{i,j}
For example, some potential models are:
- if y_{i,j} is binary: log odds(y_{i,j} = 1) = β′x_{i,j} + u_i′ D v_j + ε_{i,j};
- if y_{i,j} is count data: log E[y_{i,j}] = β′x_{i,j} + u_i′ D v_j + ε_{i,j};
- if y_{i,j} is continuous: E[y_{i,j}] = β′x_{i,j} + u_i′ D v_j + ε_{i,j}.
Estimation: given D and V, the predictor is linear in U. This bilinear structure can be exploited (EM, Gibbs sampling, variational methods); a least-squares sketch of such an alternating scheme follows.
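The alternating scheme below is a hedged illustration of how the bilinear structure can be exploited; it is plain least squares for the Gaussian, no-covariate case, standing in for the EM/Gibbs/variational methods the talk actually uses:

```python
import numpy as np

def als_bilinear(Y, K, n_iter=100):
    """Fit Y ~ A @ B.T with A (m x K), B (n x K) by alternating least squares."""
    m, n = Y.shape
    rng = np.random.default_rng(2)
    A = rng.normal(size=(m, K))
    B = rng.normal(size=(n, K))
    for _ in range(n_iter):
        # given B, each row of A solves an ordinary least-squares problem
        A = Y @ B @ np.linalg.inv(B.T @ B)
        # given A, the same holds for each row of B
        B = Y.T @ A @ np.linalg.inv(A.T @ A)
    return A, B   # U, D, V are recoverable from the SVD of A @ B.T

Y = np.random.default_rng(3).normal(size=(40, 30))
A, B = als_bilinear(Y, K=2)
```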
10 International conflict network
Years of international relations data (thanks to Mike Ward and Xun Cao).
y_{i,j} = indicator of a militarized dispute initiated by i with target j;
x_{i,j} is an 8-dimensional covariate vector containing an intercept and
1. population of initiator
2. population of target
3. polity score of initiator
4. polity score of target
5. polity score of initiator × polity score of target
6. log distance
7. number of shared intergovernmental organizations
Model: the y_{i,j} are independent binary random variables with log-odds
log odds(y_{i,j} = 1) = β′x_{i,j} + u_i′ D v_j + ε_{i,j}
11 International conflict network: parameter estimates
[Figure: regression coefficient estimates for log distance, polity initiator, intergovernmental organizations, polity target, polity interaction, log population target, and log population initiator.]
12 International conflict network: description of latent variation
[Figure: two panels of estimated latent positions, with points labeled by three-letter country codes (AFG, ANG, ARG, ..., USA, VEN in the first panel; AFG, ALB, ..., ZAM, ZIM in the second).]
13 International conflict network: prediction experiment
[Figure: number of links found versus number of pairs checked, for models with K = 0, 1, 2, ...]
14 Model details: prior distribution for U
Parameters to estimate include the diagonal matrix D and
U ∈ V_{K,m} = {m × K matrices with orthonormal columns}
V ∈ V_{K,n} = {n × K matrices with orthonormal columns}
A constructive definition for p(U):
1. z_1 ~ uniform[m-sphere]; set U_{[,1]} = z_1;
2. z_2 ~ uniform[(m−1)-sphere]; set U_{[,2]} = Null(U_{[,1]}) z_2;
...
K. z_K ~ uniform[(m−K+1)-sphere]; set U_{[,K]} = Null(U_{[,1]}, ..., U_{[,K−1]}) z_K.
The resulting distribution for U is the uniform (invariant) distribution on the Stiefel manifold, and is exchangeable under row and column permutations.
Prior conditional distributions:
(U_{[,j]} | U_{[,−j]}) =_d Null(U_{[,1]}, ..., U_{[,j−1]}, U_{[,j+1]}, ..., U_{[,K]}) z_j, with z_j ~ uniform[(m−K+1)-sphere].
A numpy sketch of this constructive sampler follows.
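Here is that sketch (illustrative; the helper names are mine), using a complete QR factorization to obtain the null-space basis at each step:

```python
import numpy as np

def runif_sphere(p, rng):
    """Uniform draw from the unit sphere in R^p (normalized Gaussian)."""
    z = rng.standard_normal(p)
    return z / np.linalg.norm(z)

def runif_stiefel(m, K, rng):
    """Column-by-column uniform draw from the Stiefel manifold V_{K,m}."""
    U = np.zeros((m, K))
    for j in range(K):
        if j == 0:
            N = np.eye(m)   # the null space of nothing is all of R^m
        else:
            # orthonormal basis for the orthogonal complement of span(U[:, :j])
            Q, _ = np.linalg.qr(U[:, :j], mode="complete")
            N = Q[:, j:]
        U[:, j] = N @ runif_sphere(m - j, rng)
    return U

U = runif_stiefel(10, 3, np.random.default_rng(4))
print(np.allclose(U.T @ U, np.eye(3)))   # orthonormal columns
```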
15 Estimation details: posterior distribution for U
UDV′ = d_1 U_{[,1]} V_{[,1]}′ + ... + d_K U_{[,K]} V_{[,K]}′, so for any column j
Z − UDV′ = (Z − U_{[,−j]} D_{[−j,−j]} V_{[,−j]}′) − d_j U_{[,j]} V_{[,j]}′ ≡ R_j − d_j U_{[,j]} V_{[,j]}′
||Z − UDV′||² = ||R_j − d_j U_{[,j]} V_{[,j]}′||²
             = ||R_j||² − 2 d_j U_{[,j]}′ R_j V_{[,j]} + d_j² ||U_{[,j]} V_{[,j]}′||²
             = ||R_j||² − 2 d_j U_{[,j]}′ R_j V_{[,j]} + d_j²
Recall Z = UDV′ + E, where E is normally distributed noise with variance 1/φ:
p(Z | U, V, D, φ) = (2π/φ)^{−mn/2} exp{−½ φ ||R_j||² + φ d_j U_{[,j]}′ R_j V_{[,j]} − ½ φ d_j²}
p(U_{[,j]} | Z, U_{[,−j]}, V, D) ∝ p(U_{[,j]} | U_{[,−j]}) exp{φ d_j U_{[,j]}′ R_j V_{[,j]}}
(U_{[,j]} | U_{[,−j]}, Z, V, D) =_d Null(U_{[,1]}, ..., U_{[,j−1]}, U_{[,j+1]}, ..., U_{[,K]}) z_j ≡ N_j z_j, where
p(z_j) ∝ exp{z_j′ μ}, with μ = φ d_j N_j′ R_j V_{[,j]}
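The quantities in this full conditional can be assembled as below (a hypothetical helper; drawing z_j from the resulting von Mises-Fisher distribution is sketched after the next slide):

```python
import numpy as np

def vmf_fullcond_params(Z, U, V, d, phi, j):
    """Parameters (N_j, mu) of the full conditional for column j of U."""
    K = U.shape[1]
    others = [k for k in range(K) if k != j]
    # residual after removing every term except the j-th
    R_j = Z - U[:, others] @ np.diag(d[others]) @ V[:, others].T
    # orthonormal basis N_j for the null space of the other columns of U
    Q, _ = np.linalg.qr(U[:, others], mode="complete")
    N_j = Q[:, K - 1:]
    # z_j has density proportional to exp(z' mu) on the (m-K+1)-sphere
    mu = phi * d[j] * N_j.T @ R_j @ V[:, j]
    return N_j, mu
```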
16 The von Mises-Fisher distribution
p(z | μ) = c(κ) exp{z′μ}, where κ = ||μ||
[Figure: von Mises-Fisher densities for κ = 0, 2, 4.]
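For completeness, here is a sketch of Wood's (1994) rejection sampler for this distribution, a standard algorithm but not one shown in the talk:

```python
import numpy as np

def rvmf(mu, rng):
    """One draw from the von Mises-Fisher density c(kappa) exp(z' mu) on the sphere."""
    kappa = np.linalg.norm(mu)
    p = mu.size
    if kappa == 0:                        # kappa = 0 is uniform on the sphere
        z = rng.standard_normal(p)
        return z / np.linalg.norm(z)
    # Wood (1994): rejection sampler for the cosine w = z' mu / kappa
    b = (-2 * kappa + np.sqrt(4 * kappa**2 + (p - 1) ** 2)) / (p - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (p - 1) * np.log(1 - x0**2)
    while True:
        beta = rng.beta((p - 1) / 2, (p - 1) / 2)
        w = (1 - (1 + b) * beta) / (1 - (1 - b) * beta)
        if kappa * w + (p - 1) * np.log(1 - x0 * w) - c >= np.log(rng.uniform()):
            break
    # uniform direction orthogonal to mu, then combine with the sampled cosine
    mu_hat = mu / kappa
    v = rng.standard_normal(p)
    v -= (v @ mu_hat) * mu_hat
    v /= np.linalg.norm(v)
    return w * mu_hat + np.sqrt(1 - w**2) * v
```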
17 Analyzing undirected data
In many applications y_{i,j} = y_{j,i} by design, so
(y_{i,j} = y_{j,i}) ≈ β′x_{i,j} + z_{i,j},
and Z = {z_{i,j}} is a symmetric array. How should Z be modeled?
Modeling via exchangeability: let Z be a symmetric array (with an undefined diagonal), such that {z_{i,j}} =_d {z_{g(i),g(j)}}. Then {z_{i,j}} =_d {z*_{i,j}}, where
z*_{i,j} = f(μ, u_i, u_j, ε_{i,j}),
μ, {u_i}, {ε_{i,j}} are independent random variables, and f(·, u_i, u_j, ·) = f(·, u_j, u_i, ·).
Modeling via matrix decomposition: write Z = M + E, with all matrices symmetric. All such M have an eigenvalue decomposition
M = UΛU′, with m_{i,j} = u_i′ Λ u_j.
This suggests a model of the form
(y_{i,j} = y_{j,i}) ≈ β′x_{i,j} + u_i′ Λ u_j + ε_{i,j}
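A small illustration (again mine, not the talk's) of the symmetric analogue: truncating the eigendecomposition of a noisy symmetric matrix. Note that Λ, unlike D in the SVD, may have negative entries:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 50, 2
U = rng.normal(size=(n, K))
M = U @ np.diag([3.0, -2.0]) @ U.T       # symmetric, rank 2; Lambda can be negative
Z = M + rng.normal(size=(n, n))
Z = (Z + Z.T) / 2                         # symmetrize the noise

lam, V = np.linalg.eigh(Z)                # eigendecomposition of a symmetric matrix
top = np.argsort(np.abs(lam))[-K:]        # keep the K largest-magnitude eigenvalues
M_hat = V[:, top] @ np.diag(lam[top]) @ V[:, top].T
```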
18 Protein-protein interaction network
Interactions among 270 proteins in E. coli (Butland, 2005).
Data: (270 choose 2)^{-1} Σ_{i<j} y_{i,j} ≈ 0.01, i.e., about 1% of pairs interact.
Model: log odds(y_{i,j} = 1) = μ + u_i′ Λ u_j + ε_{i,j} (analysis by Ryan Bressler)
19 Protein-protein interaction network
[Figure: graph of estimated positive and negative interactions, with nodes labeled by gene names (acca, accc, accd, acpp, ..., rpsl).]
20 Protein-protein interaction network
[Figure: detail of the RNA polymerase cluster, showing positive and negative interactions among rpoa, rpob, rpoc, rpod, rpoz, rpos, rpon, nusa, nusg, grea, greb, and others.]
21 Protein-protein interaction network: prediction experiment
[Figure: number of links found versus number of pairs checked.]
22 Data analysis with the singular value decomposition
Many data analysis procedures for matrix-valued data Y are related to the SVD. Given a model of the form Y = M + E, where E is independent normal noise, the SVD provides:
Interpretation: y_{i,j} = u_i′ D v_j + ε_{i,j}, where u_i and v_j are the ith and jth rows of U and V.
Estimation: M̂_K = Û_{[,1:K]} D̂_{[1:K,1:K]} V̂_{[,1:K]}′, if M is assumed to be of rank K.
Applications:
- biplots (Gabriel 1971; Gower and Hand 1996)
- reduced-rank interaction models (Gabriel 1978, 1998)
- analysis of relational data (Harshman et al. 1982)
- factor analysis, image processing, data reduction, ...
But: how should K be selected? And given K, is M̂_K a good estimator? (Note that E[Y′Y] = M′M + mσ²I; a numerical check of this identity follows.)
Bayesian work on factor analysis models: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
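The identity E[Y′Y] = M′M + mσ²I, which is one reason the naive truncated SVD can mislead, is easy to check by simulation (illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, sigma = 200, 5, 1.5
M = rng.normal(size=(m, n))

# average Y'Y over many noise draws and compare to M'M + m * sigma^2 * I
S = np.zeros((n, n))
reps = 2000
for _ in range(reps):
    Y = M + sigma * rng.normal(size=(m, n))
    S += Y.T @ Y
print(np.round(S / reps - (M.T @ M + m * sigma**2 * np.eye(n)), 1))  # near zero
```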
23 A toy example
Y = UDV′ + E = Σ_k d_k U_{[,k]} V_{[,k]}′ + E
[Figure: singular values of a simulated Y.]
24 A toy example, continued
[Figure: singular values of a simulated Y.]
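A hypothetical reconstruction of such a toy example: simulate Y = UDV′ + E with a rank-5 signal and inspect the singular values, which show the five signal values standing above a noise plateau near √m + √n:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, K = 100, 100, 5
U, _ = np.linalg.qr(rng.normal(size=(m, K)))   # orthonormal factors
V, _ = np.linalg.qr(rng.normal(size=(n, K)))
d = np.array([60, 50, 40, 35, 30.0])
Y = U @ np.diag(d) @ V.T + rng.normal(size=(m, n))

sv = np.linalg.svd(Y, compute_uv=False)
print(np.round(sv[:8], 1))   # 5 large values, then a plateau near sqrt(m) + sqrt(n)
```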
25 Evaluating the dimension
It can all be done by comparing
M_1: Y = d u v′ + E versus M_0: Y = E.
To decide whether or not to add a dimension, we need to calculate
p(M_1 | Y) / p(M_0 | Y) = [p(M_1) / p(M_0)] × [p(Y | M_1) / p(Y | M_0)]
The numerator of the Bayes factor is the hard part. We need to integrate the following with respect to the prior distribution:
p(Y | M_1, u, v, d) = (2π/φ)^{−nm/2} exp{−φ||Y||²/2 + φ d u′Yv − φ d²/2} = p(Y | M_0) exp{−φ d²/2} exp{u′[dφY]v}
Computing E[e^{u′Av}] over u ∈ S_m and v ∈ S_n is difficult but doable. It involves the integral of a Bessel function and a weird hypergeometric function. These results provide a description of a joint distribution representing bivariate dependence among points on spheres:
p(u, v | A) = c(A) exp{u′Av}
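As a crude alternative to the analytic integral, the Bayes factor term can be approximated by naive Monte Carlo over the priors; the sketch below assumes a particular uniform prior on d (an assumption, not from the talk) and will be high-variance in practice:

```python
import numpy as np

def mc_bayes_factor_term(Y, phi=1.0, n_mc=5000, rng=None):
    """Crude MC estimate of E[exp(-phi d^2/2) exp(u' (d phi Y) v)] over the prior."""
    rng = rng or np.random.default_rng(8)
    m, n = Y.shape
    total = 0.0
    for _ in range(n_mc):
        u = rng.standard_normal(m); u /= np.linalg.norm(u)   # uniform on S_m
        v = rng.standard_normal(n); v /= np.linalg.norm(v)   # uniform on S_n
        d = rng.uniform(0, 2 * (np.sqrt(m) + np.sqrt(n)))    # assumed prior on d
        total += np.exp(-phi * d**2 / 2 + phi * d * u @ Y @ v)
    return total / n_mc   # estimates p(Y | M1) / p(Y | M0)
```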
26 Simulation study
100 datasets were generated for each of (m, n) ∈ {(10, 10), (100, 10), (100, 100)}:
- U ~ uniform(V_{5,m}), V ~ uniform(V_{5,n});
- D = diag{d_1, ..., d_5}, with {d_1, ..., d_5} i.i.d. uniform(½ μ_mn, (3/2) μ_mn), where μ²_mn = m + n + 2√(mn);
- Y = UDV′ + E, where E is an m × n matrix of standard normal noise.
The choice of μ_mn is not arbitrary: the largest singular value of E is approximately (Edelman, 1988)
λ_max ≈ (m + n + 2√(mn))^{1/2}.
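This approximation is easy to verify numerically (illustrative); note that (m + n + 2√(mn))^{1/2} is exactly √m + √n:

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 100, 10
E = rng.standard_normal((m, n))
lam_max = np.linalg.svd(E, compute_uv=False)[0]
print(lam_max, np.sqrt(m + n + 2 * np.sqrt(m * n)))   # both close to sqrt(m) + sqrt(n)
```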
27 Simulation study
[Figure: distribution of the posterior-mode rank estimate, for (m, n) = (10, 10), (100, 10), (100, 100).]
28 Simulation study
[Figure: as above, adding the maximum-eigengap estimator.]
29 Simulation study
[Figure: as above, adding the eigenvalue-of-correlation estimator.]
30 Simulation study
[Figure: E[p(K | Y)] and the density of K̂ for (m, n) = (10, 10), (100, 10), (100, 100); panel labels include K, K̂, ASE_B, and ASE_LS.]
31 Summary and future directions
Summary:
- Relevance disclaimer: if your goal is to represent large components of variation in the data, then maybe you don't need a model. If you believe Y ≈ M + E, and you want to represent large components of variation in M, then a model can be helpful.
- SVD models are a natural way to represent patterns in relational data.
- The SVD structure can be incorporated into a variety of model forms.
- Bayesian model averaging (BMA) on the model rank can be done.
Potentially straightforward extensions:
- Representation of higher-dimensional arrays: m_{i,j,k} = Σ_{l=1}^L d_l u_{i,l} v_{j,l} w_{k,l} (a construction is sketched below).
- Dynamic network inference (time series models for Y, U, D, V).
- Restriction of latent characteristics to be orthogonal to design vectors.
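For concreteness, the three-way (rank-L, PARAFAC-style) decomposition in the first extension can be constructed with numpy's einsum (a sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, p, L = 8, 9, 10, 3
U = rng.normal(size=(m, L))
V = rng.normal(size=(n, L))
W = rng.normal(size=(p, L))
d = rng.uniform(1, 2, L)

# m_{i,j,k} = sum_l d_l * u_{i,l} * v_{j,l} * w_{k,l}
M = np.einsum("l,il,jl,kl->ijk", d, U, V, W)
print(M.shape)   # (8, 9, 10)
```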