Latent SVD Models for Relational Data

1 Latent SVD Models for Relational Data Peter Hoff Statistics, Biostatistics and Center for Statistics and the Social Sciences, University of Washington

2 Outline
1. Relational data
2. Exchangeability for arrays
3. Latent variable models
4. Subtalk: Rank selection for the SVD model
   (a) Prior distributions on the Stiefel manifold
   (b) A Gibbs sampling scheme for rank estimation
   (c) A simulation study
5. Examples:
   (a) Modeling international conflict
   (b) Finding protein-protein interactions

3 Relational data
Relational data consist of a set of units or nodes A, and a set of variables Y = {y_{i,j}} specific to pairs of nodes (i,j) ∈ A × A.
Examples:
- Needle-sharing network: let A index a set of IV drug users, and let y_{i,j} indicate needle-sharing activity between i and j.
- International relations: let A index the set of countries, and let y_{i,j} be the number of military disputes initiated by i with target j.
- Protein-protein interactions: let A index a set of proteins, and let y_{i,j} measure the interaction between i and j.
- Gene-protein interactions: let A = {A_1, A_2}, with A_1 indexing a set of genes and A_2 indexing a set of proteins, and let y_{i,j} measure the relationship between i ∈ A_1 and j ∈ A_2.

4 Inferential goals in the regression framework
Let y_{i,j} be the measurement on the pair i → j, and let x_{i,j} be a vector of explanatory variables. If y_{i,j} is binary, then we might have a partially observed 5 × 5 sociomatrix Y (binary entries with some NAs and an undefined diagonal), together with

X = ( .        x_{1,2}  x_{1,3}  x_{1,4}  x_{1,5}
      x_{2,1}  .        x_{2,3}  x_{2,4}  x_{2,5}
      x_{3,1}  x_{3,2}  .        x_{3,4}  x_{3,5}
      x_{4,1}  x_{4,2}  x_{4,3}  .        x_{4,5}
      x_{5,1}  x_{5,2}  x_{5,3}  x_{5,4}  .       )

Consider a basic generalized linear model: log odds(y_{i,j} = 1) = β'x_{i,j}. A model can provide
- a measure of the association between X and Y: β̂, se(β̂);
- predictions of missing or future observations: Pr(y_{1,4} = 1 | Y, X).
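A minimal sketch of this baseline dyadic GLM (the data, sizes, and names below are illustrative assumptions, not the talk's data): stack the off-diagonal pairs into a long design matrix and fit the logistic regression by maximum likelihood.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
m, p = 20, 3                          # number of nodes and dyadic covariates (illustrative)
X = rng.normal(size=(m, m, p))        # x_ij for every ordered pair (i, j)
beta_true = np.array([1.0, -0.5, 0.25])
Y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

off_diag = ~np.eye(m, dtype=bool)     # drop the undefined diagonal (and any NA dyads)
X_long = X[off_diag]                  # (m*(m-1)) x p design matrix
y_long = Y[off_diag]

fit = sm.Logit(y_long, X_long).fit(disp=0)
print("beta_hat:", fit.params, "se:", fit.bse)
print("Pr(y_14 = 1):", fit.predict(X[0, 3][None, :])[0])  # 0-based indices for the (1,4) dyad
```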

5 How might node heterogeneity affect network structure?

6 Latent variable models
Deviations from ordinary logistic regression can be represented with
log odds(y_{i,j} = 1) = β'x_{i,j} + γ_{i,j}.
Perhaps the first latent variable model was similar to a p1-type model:
γ_{i,j} = a_i + b_j, so that log odds(y_{i,j} = 1) = β'x_{i,j} + a_i + b_j.
The a_i and b_j induce across-node heterogeneity that is additive on the log-odds scale. Inclusion of these effects in the model can dramatically improve
- within-sample model fit (measured by R², likelihood ratio, BIC, etc.);
- out-of-sample predictive performance (measured by cross-validation).
But this model only captures heterogeneity of outdegree/indegree, and can't represent more complicated structure, such as clustering, transitivity, etc.

7 Exchangeability for arrays
For relational data in general we might be willing to assume something like
{γ_{i,j}} =_d {γ_{g(i),h(j)}} for all permutations g and h.
This is a type of exchangeability for arrays, sometimes called weak exchangeability.
Theorem (Aldous, Hoover): Let {γ_{i,j}} be a weakly exchangeable array. Then {γ_{i,j}} =_d {γ̃_{i,j}}, where
γ̃_{i,j} = f(µ, u_i, v_j, ε_{i,j})
and µ, {u_i}, {v_j}, {ε_{i,j}} are all independent uniform random variables on [0, 1].
There are variants of this theorem for symmetric sociomatrices and other exchangeability assumptions.

8 Matrix representation techniques
Exchangeability suggests random effects models of the form γ_{i,j} = f(µ, u_i, v_j, ε_{i,j}). What should f be?
Model: log odds(y_{i,j} = 1) = µ + u_i' D v_j + ε_{i,j}
Interpretation: think of {u_1, ..., u_m}, {v_1, ..., v_m} as vectors of latent nodal attributes, with
u_i' D v_j = Σ_{k=1}^K d_k u_{i,k} v_{j,k}.
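As a concrete illustration of the multiplicative effect (a sketch with made-up sizes, not part of the talk): the full matrix of latent effects is U D V', and each entry matches the sum above.

```python
import numpy as np

rng = np.random.default_rng(1)
m, K = 6, 2
U = rng.normal(size=(m, K))      # row i = latent "sender" attributes u_i
V = rng.normal(size=(m, K))      # row j = latent "receiver" attributes v_j
d = np.array([3.0, 1.0])         # weights d_1 >= ... >= d_K

Gamma = U @ np.diag(d) @ V.T     # Gamma[i, j] = u_i' D v_j
i, j = 2, 4                      # check the entry-wise formula from the slide
assert np.isclose(Gamma[i, j], sum(d[k] * U[i, k] * V[j, k] for k in range(K)))
```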

9 Model extension: latent SVD model
General model for relational data: let Y be an m × n matrix of relational data and X an m × n × p array of explanatory variables. A latent variable model relating X to Y is
g(E[y_{i,j}]) = β'x_{i,j} + u_i' D v_j + ε_{i,j}.
For example, some potential models are:
- if y_{i,j} is binary, log odds(y_{i,j} = 1) = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is count data, log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j};
- if y_{i,j} is continuous, E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}
(similar to the generalized bilinear models of Gabriel, 1998).
Estimation and dimension selection for u_i' D v_j relate to a more basic SVD model.

10 Beginning of subtalk: the singular value decomposition
Every m × n matrix M has a representation of the form M = UDV', where, in the case m ≥ n,
- U is an m × n matrix with orthonormal columns;
- V is an n × n matrix with orthonormal columns;
- D is an n × n diagonal matrix, with diagonal elements {d_1, ..., d_n} typically taken to be a decreasing sequence of non-negative numbers.
Recall:
- the squared elements of the diagonal of D are the eigenvalues of M'M, and the columns of V are the corresponding eigenvectors;
- the matrix U can be obtained from the first n eigenvectors of MM';
- the number of non-zero elements of D is the rank of M.
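These facts are easy to verify numerically; the 4 × 3 matrix in the sketch below is an arbitrary example, not from the talk.

```python
import numpy as np

M = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [2., 0., 1.],
              [1., 1., 1.]])                      # m x n with m >= n
U, d, Vt = np.linalg.svd(M, full_matrices=False)  # U: m x n, d: length n, Vt = V'
assert np.allclose(U @ np.diag(d) @ Vt, M)        # M = U D V'

# squared singular values = eigenvalues of M'M (columns of V are its eigenvectors)
evals = np.linalg.eigvalsh(M.T @ M)[::-1]
assert np.allclose(evals, d**2)

rank = np.sum(d > 1e-10)                          # number of non-zero singular values
print("rank(M) =", rank)
```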

11 Data analysis with the singular value decomposition
Many data analysis procedures for matrix-valued data Y are related to the SVD. Given a model of the form Y = M + E, the SVD provides
- Interpretation: y_{i,j} = u_i' D v_j + ε_{i,j}, where u_i and v_j are the ith and jth rows of U and V;
- Estimation: M̂_K = Û_[,1:K] D̂_[1:K,1:K] V̂'_[,1:K], if M is assumed to be of rank K.
Applications: biplots (Gabriel 1971, Gower and Hand 1996); reduced-rank interaction models (Gabriel 1978, 1998); analysis of relational data (Harshman et al., 1982); factor analysis, image processing, data reduction, ...
But: How to select K? Given K, is M̂_K a good estimator? (E[Y'Y] = M'M + mσ²I)
Bayesian work on factor analysis models: Rajan and Rayner (1997), Minka (2000), Lopes and West (2004).
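A sketch of the rank-K least-squares estimate M̂_K on simulated data (the sizes and the true rank are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, K = 30, 20, 3
M = rng.normal(size=(m, K)) @ rng.normal(size=(K, n))   # true rank-K mean matrix
Y = M + rng.normal(size=(m, n))                          # Y = M + E

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
M_hat_K = U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]          # keep the K leading terms

print("MSE of Y as an estimate of M:", np.mean((Y - M) ** 2))
print("MSE of M_hat_K:            ", np.mean((M_hat_K - M) ** 2))
```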

12-14 A toy example
Y = UDV' + E = Σ_k d_k U_[,k] V'_[,k] + E
[three figure slides: plots of the singular values of Y]

15 A toy example (continued)
[figure: p(K | Y) versus K, and the true versus estimated singular values; the MSE of the Bayes estimate is compared with that of the least-squares estimate]

16 SVD model
Let Y be an m × n data matrix (suppose m ≥ n). Consider a model of the form Y = UDV' + E, where
- K ∈ {0, ..., n};
- U ∈ V_{K,m}, V ∈ V_{K,n} (Stiefel manifolds of K orthonormal columns);
- D = diag{d_1, ..., d_K};
- E is a matrix of independent normal noise.
Given a prior distribution on {U, D, V}, we would like to calculate
p({u_1, ..., u_K}, {v_1, ..., v_K}, {d_1, ..., d_K} | Y, K) and p(K | Y).

17 Prior distribution for U
1. z_1 ~ uniform[m-sphere]; set U_[,1] = z_1;
2. z_2 ~ uniform[(m−1)-sphere]; set U_[,2] = Null(U_[,1]) z_2;
...
K. z_K ~ uniform[(m−K+1)-sphere]; set U_[,K] = Null(U_[,1], ..., U_[,K−1]) z_K.
The resulting distribution for U is the uniform (invariant) distribution on the Stiefel manifold, and is exchangeable under row and column permutations.
Prior conditional distributions:
(U_[,j] | U_[,−j]) =_d Null(U_[,1], ..., U_[,j−1], U_[,j+1], ..., U_[,K]) z_j, with z_j ~ uniform[(m−K+1)-sphere].
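The sequential construction above translates directly into code; the sketch below (assuming scipy's null_space for the null-space basis) draws a uniform U from V_{K,m}.

```python
import numpy as np
from scipy.linalg import null_space

def runif_stiefel(m, K, rng):
    """Uniform draw from V_{K,m} via the sequential sphere construction."""
    U = np.zeros((m, K))
    for j in range(K):
        N = np.eye(m) if j == 0 else null_space(U[:, :j].T)   # m x (m - j) orthonormal basis
        z = rng.normal(size=N.shape[1])
        z /= np.linalg.norm(z)                                # uniform on the (m - j)-sphere
        U[:, j] = N @ z                                       # map into the null space
    return U

rng = np.random.default_rng(3)
U = runif_stiefel(5, 3, rng)
print(np.round(U.T @ U, 10))                                  # ~ identity: orthonormal columns
```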

18 Posterior distribution for U
Recall UDV' = Σ_{k=1}^K d_k U_[,k] V'_[,k], so
Y − UDV' = (Y − U_[,−j] D_[−j,−j] V'_[,−j]) − d_j U_[,j] V'_[,j] ≡ E_j − d_j U_[,j] V'_[,j],
so
||Y − UDV'||² = ||E_j − d_j U_[,j] V'_[,j]||²
             = ||E_j||² − 2 d_j U'_[,j] E_j V_[,j] + d_j² ||U_[,j] V'_[,j]||²
             = ||E_j||² − 2 d_j U'_[,j] E_j V_[,j] + d_j².
Therefore
p(Y | U, V, D, φ) = (2π/φ)^{−mn/2} exp{ −φ||E_j||²/2 + φ d_j U'_[,j] E_j V_[,j] − φ d_j²/2 }
p(U_[,j] | Y, U_[,−j], V, D) ∝ p(U_[,j] | U_[,−j]) exp{ φ d_j U'_[,j] E_j V_[,j] }
(U_[,j] | U_[,−j], Y, V, D) =_d Null(U_[,1], ..., U_[,j−1], U_[,j+1], ..., U_[,K]) z_j ≡ N_j z_j, with p(z_j) ∝ exp{ z_j' µ }, µ = φ d_j N_j' E_j V_[,j].
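A hedged sketch of one such update: it forms E_j, N_j and µ as above, but replaces the exact von Mises-Fisher draw of z_j with a simple symmetric Metropolis step on the sphere (a stand-in, not the talk's sampler); U and V are assumed to be arrays with orthonormal columns and d a numpy vector of singular values.

```python
import numpy as np
from scipy.linalg import null_space

def gibbs_update_Uj(Y, U, V, d, phi, j, rng, n_mh=100, step=0.3):
    """One (approximate) update of column j of U, following the slide."""
    m, K = U.shape
    others = [k for k in range(K) if k != j]
    E_j = Y - U[:, others] @ np.diag(d[others]) @ V[:, others].T   # residual without term j
    N_j = null_space(U[:, others].T) if others else np.eye(m)
    mu = phi * d[j] * (N_j.T @ E_j @ V[:, j])       # von Mises-Fisher parameter of z_j

    z = rng.normal(size=N_j.shape[1]); z /= np.linalg.norm(z)
    for _ in range(n_mh):                           # symmetric Metropolis on the sphere
        prop = z + step * rng.normal(size=z.shape)
        prop /= np.linalg.norm(prop)
        if np.log(rng.uniform()) < mu @ (prop - z): # accept w.p. min(1, exp(mu'(prop - z)))
            z = prop
    U = U.copy()
    U[:, j] = N_j @ z
    return U
```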

19 Evaluating the dimension
It can all be done by comparing
M_1: Y = d u v' + E versus M_0: Y = E.
To decide whether or not to add a dimension, we need to calculate
p(M_1 | Y) / p(M_0 | Y) = [p(M_1) / p(M_0)] × [p(Y | M_1) / p(Y | M_0)].
The numerator of the Bayes factor is the hard part. We need to integrate the following with respect to the prior distribution:
p(Y | M_1, u, v, d) = (2π/φ)^{−nm/2} exp{ −φ||Y||²/2 + φ d u'Yv − φ d²/2 }
                    = p(Y | M_0) exp{−φ d²/2} exp{ u'[dφY] v }.
Computing E[e^{u'Av}] over u ∈ S_m and v ∈ S_n is difficult but doable.
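One crude way to approximate that expectation is plain Monte Carlo over uniform draws on the two spheres; the sketch below illustrates the quantity being computed, not the method actually used in the talk.

```python
import numpy as np

def mc_expected_exp_uAv(A, n_sims=100_000, seed=4):
    """Plain Monte Carlo estimate of E[exp(u'Av)], u ~ unif(S_m), v ~ unif(S_n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    u = rng.normal(size=(n_sims, m)); u /= np.linalg.norm(u, axis=1, keepdims=True)
    v = rng.normal(size=(n_sims, n)); v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.exp(np.einsum("si,ij,sj->s", u, A, v)).mean()

# e.g. for the one-dimensional model, A = d * phi * Y, so roughly
# p(Y | M1) ~= p(Y | M0) * E_d[ exp(-phi d^2 / 2) * mc_expected_exp_uAv(d * phi * Y) ]
```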

20 Simulation study
For each of (m, n) ∈ {(10, 10), (100, 10), (100, 100)}, 100 datasets were generated as follows:
- U ~ uniform(V_{5,m}), V ~ uniform(V_{5,n});
- D = diag{d_1, ..., d_5}, with {d_1, ..., d_5} iid uniform(µ_mn/2, 3µ_mn/2), where µ_mn = √(m + n + 2√(mn));
- Y = UDV' + E, where E is an m × n matrix of standard normal noise.
This choice was not arbitrary: the largest eigenvalue of E'E is approximately m + n + 2√(mn) (Edelman, 1988), so µ_mn is roughly the largest singular value of the noise.
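A sketch of generating one such dataset, under the (assumed) reading µ_mn = √(m + n + 2√(mn)) and using QR of a Gaussian matrix for the uniform Stiefel draws:

```python
import numpy as np

def simulate_dataset(m, n, K=5, rng=None):
    """One dataset under the stated design (sketch)."""
    rng = rng or np.random.default_rng(5)
    U, _ = np.linalg.qr(rng.normal(size=(m, K)))   # ~ uniform on V_{K,m} via Gaussian QR
    V, _ = np.linalg.qr(rng.normal(size=(n, K)))
    mu_mn = np.sqrt(m + n + 2 * np.sqrt(m * n))
    d = rng.uniform(0.5 * mu_mn, 1.5 * mu_mn, size=K)
    return U @ np.diag(d) @ V.T + rng.normal(size=(m, n))

Y = simulate_dataset(100, 10)
print(np.linalg.svd(Y, compute_uv=False).round(1))  # signal vs. noise singular values
```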

21 Simulation study: results
[figure: E[p(K | Y)] versus K and the distribution of K̂ for (m, n) = (10, 10), (100, 10), (100, 100), together with densities comparing the average squared error of the Bayes and least-squares estimates]

22 International conflict network
Years of international relations data (thanks to Mike Ward and Xun Cao).
y_{i,j} = number of militarized disputes initiated by i with target j;
x_{i,j} = an 11-dimensional covariate vector containing
1. log distance
2. log imports
3. log exports
4. number of shared intergovernmental organizations
5. population of initiator
6. population of target
7. log gdp of initiator
8. log gdp of target
9. polity score of initiator
10. polity score of target
11. polity score of initiator × polity score of target
Model: the y_{i,j} are independent Poisson random variables with log-mean
log E[y_{i,j}] = β'x_{i,j} + u_i' D v_j + ε_{i,j}.
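A sketch of fitting just the fixed-effects part (β'x_{i,j}) of this Poisson model with statsmodels, on made-up data of the same shape; the latent term u_i' D v_j and the actual conflict dataset are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_dyads, p = 500, 11                   # sizes are illustrative, not the real data
X = rng.normal(size=(n_dyads, p))      # standardized dyadic covariates
y = rng.poisson(np.exp(-1.0 + X @ rng.normal(scale=0.2, size=p)))

fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(fit.params.round(2))             # intercept followed by the 11 coefficients
print(fit.bse.round(2))                # their standard errors
```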

23 International conflict network: parameter estimates
[figure: p(K | Y) versus K, and standardized coefficients with intervals for log distance, polity initiator, intergovernmental organizations, polity target, polity interaction, log imports, log gdp target, log exports, log gdp initiator, log pop target, log pop initiator]

24 [figure: two panels of countries, labeled by three-letter codes, plotted according to their estimated latent factors]

25 Protein-protein interaction data
Experiments are run on pairs of proteins to see if they bind:
y_{i,j} = 1 if proteins i and j bind, 0 otherwise.
There are many proteins, and (many choose 2) possible experiments, only a subset of which are usually performed. Can we predict interactions from a subset of the experiments?
Missing data experiment:
- E. coli protein interaction network (Butland et al., 2005);
- all interactions checked on a protein network;
- a very sparse network: ȳ = 0.01.

26 Finding missing links
[figure: estimated u and v for the full dataset, for the training data, and for the 25% test data, plus the number of new observed interactions versus the number of additional experiments]

27 Summary and future directions
Summary:
- SVD models are a natural way to represent patterns in relational data;
- the SVD structure can be incorporated into a variety of model forms;
- BMA on model rank can be done.
Future directions:
- extensions to higher-dimensional arrays;
- dynamic network inference (time series models for Y, U, D, V);
- restrict latent characteristics to be orthogonal to design vectors.
