Hierarchical models for multiway data

Similar documents
Mean and covariance models for relational arrays

Probability models for multiway data

Latent Factor Models for Relational Data

Hierarchical multilinear models for multiway data

Latent SVD Models for Relational Data

Multilinear tensor regression for longitudinal relational data

Dyadic data analysis with amen

The International-Trade Network: Gravity Equations and Topological Properties

Latent Factor Models for Relational Data

Clustering and blockmodeling

Higher order patterns via factor models

Inferring Latent Preferences from Network Data

Equivariant and scale-free Tucker decomposition models

SCHOOL OF MATHEMATICS AND STATISTICS

Agriculture, Transportation and the Timing of Urbanization

Lecture 10 Optimal Growth Endogenous Growth. Noah Williams

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Random Effects Models for Network Data

Landlocked or Policy Locked?

Separable covariance arrays via the Tucker product, with applications to multivariate relational data

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

Part 6: Multivariate Normal and Linear Models

Nonparametric Bayes tensor factorizations for big data

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US

STAT 705 Chapter 16: One-way ANOVA

Regression I: Mean Squared Error and Measuring Quality of Fit

Decomposing a three-way dataset of TV-ratings when this is impossible. Alwin Stegeman

Statistical challenges in Disease Ecology

Self Organizing Maps

ECON 581. The Solow Growth Model, Continued. Instructor: Dmytro Hryshko

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Classification. Chapter Introduction. 6.2 The Bayes classifier

Biostatistics Advanced Methods in Biostatistics IV

Lecture Note 13 The Gains from International Trade: Empirical Evidence Using the Method of Instrumental Variables

Part 8: GLMs and Hierarchical LMs and GLMs

Economic Growth: Lecture 1, Questions and Evidence

Two Stage Modelling of Arms Trade: Applying Inferential Network Analysis on the Cold War Period

Multivariate Normal & Wishart

Network Analysis with Pajek

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Multilevel Analysis, with Extensions

Landlocked or Policy Locked?

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017

22s:152 Applied Linear Regression. Take random samples from each of m populations.

Linear Methods for Prediction

Stat 579: Generalized Linear Models and Extensions

Figure 36: Respiratory infection versus time for the first 49 children.

Computational statistics

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

1 Introduction. 2 Example

2 Statistical Estimation: Basic Concepts

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

International Investment Positions and Exchange Rate Dynamics: A Dynamic Panel Analysis

INSTITUTIONS AND THE LONG-RUN IMPACT OF EARLY DEVELOPMENT

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

Bayesian shrinkage approach in variable selection for mixed

An Introduction to Spectral Learning

Towards a Regression using Tensors

A primer on Bayesian statistics, with an application to mortality rate estimation

Multinomial Logistic Regression Models

Design of Text Mining Experiments. Matt Taddy, University of Chicago Booth School of Business faculty.chicagobooth.edu/matt.

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris.

Lecture 14: Shrinkage

!" #$$% & ' ' () ) * ) )) ' + ( ) + ) +( ), - ). & " '" ) / ) ' ' (' + 0 ) ' " ' ) () ( ( ' ) ' 1)

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Efficient Bayesian Multivariate Surface Regression

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Course topics (tentative) The role of random effects

Classical and Bayesian inference

Module 4: Bayesian Methods Lecture 5: Linear regression

Distribution-free ROC Analysis Using Binary Regression Techniques

A re examination of the Columbian exchange: Agriculture and Economic Development in the Long Run

Network Analysis Using a Local Structure Graph Model: Application to Alliance Formation

Generalized logit models for nominal multinomial responses. Local odds ratios

Lecture 9 Endogenous Growth Consumption and Savings. Noah Williams

Separable covariance arrays via the Tucker product - Final

Fundamentals of Unconstrained Optimization

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman

Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff

Latent Factor Regression Models for Grouped Outcomes

Learning gradients: prescriptive models

Vector Auto-Regressive Models

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

WU Weiterbildung. Linear Mixed Models

VAR Models and Applications

Nearest Neighbor Gaussian Processes for Large Spatial Data


Bayesian analysis of logistic regression

IDENTIFYING MULTILATERAL DEPENDENCIES IN THE WORLD TRADE NETWORK

I r j Binom(m j, p j ) I L(, ; y) / exp{ y j + (x j y j ) m j log(1 + e + x j. I (, y) / L(, ; y) (, )

spbayes: An R Package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models

Bayes methods for categorical data. April 25, 2017

Spatial Misalignment

STAT 135 Lab 10 Two-Way ANOVA, Randomized Block Design and Friedman s Test

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Parametric Techniques Lecture 3

Transcription:

Hierarchical models for multiway data Peter Hoff Statistics, Biostatistics and the CSSS University of Washington

Array-valued data y i,j,k = jth variable of ith subject under condition k (psychometrics). type-k relationship between i and j (relational data/network). sample mean of variable i for group j in state k (cross-classified data) y 123 y 124 y 125 y 122 y 121

Longitudinal network example USA Cold war cooperation and conflict 66 countries 8 years (1950,1955,..., 1980, 1985) y i,j,t =relation between i, j in year t UKG also have data on gdp, polity ROK AUL NEW PHI EGY TAW THI TUR IND BUL HUN RUM CAN JOR SPN LEB FRN HON BRA GDR COL DEN NEP GFR SAF ETH ARG VEN IRQ BEL OMA COS ALB HAI DOMSAL NOR ITA NTH IRE AFG PAN LBR CHL CZE SWD GUA SAU SRI INS CUB NIC POR PER AUS ECU IRN GRC YUG MYA USR ISR PRK CHN

Deep interaction example words 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 1 2 3 4 1 2 3 4 male female 0 1 2 3 tv 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 1 2 3 4 deg 1 2 3 4 age male female sex 0 1 2 3 child {y i : x i = x} iid multivariate normal(µx, Σ) n = 1116 survey participants 4 4 2 4 = 128 levels of x {β x } a 4 4 2 4 2 array > 1/2 levels have 5 samples

Reduced rank models Y = Θ + E Θ contains the main features we hope to recover, E is patternless. Matrix decomposition: If Θ is a rank-r matrix, then RX RX RX θ i,j = u i, v j = u i,r v j,r Θ = u r v T r = u r v r r=1 r=1 r=1 Array decomposition: If Θ is a rank-r array, then RX RX θ i,j,k = u i, v j, w k = u i,r v j,r w k,r Θ = u r v r w r r=1 r=1 (Harshman[1970], Kruskal[1976,1977], Harshman and Lundy[1984], Kruskal[1989] )

Some things you should know 1. Computing the rank matrix: easy to do array: no known algorithm 2. Possible rank matrix: R max = min(m 1, m 2 ) array: max(m 1, m 2, m 3 ) R max min(m 1 m 2, m 1 m 3, m 2 m 3 ) 3. Probable rank matrix: almost all matrices have full rank. array: a nonzero fraction (w.r.t. Lebesgue measure) have less than full rank. 4. Least squares approximation matrix: SVD of Y provides the rank R least-squares approximation to Θ. array: iterative least squares methods, but solution may not exist (de Silva and Lim[2008]) 5. Uniqueness matrix: The representation Θ = U, V = UV T is not unique. array: The representation Θ = U, V, W is essentially unique.

A model-based approach For a K-way array Y, u (k) 1,..., u(k) m k Y = Θ + E RX Θ = r=1 u (1) r u (K) r U (1),..., U (K) iid multivariate normal(µ k, Ψ k ), with {µ k, Ψ k, k = 1,..., K} to be estimated. Some motivation: shrinkage: Θ contains lots of parameters. hierarchical: covariance among columns of U (k) is identifiable. estimation: p(y U (1),..., U (K) ) multimodal, MCMC stochastic search adaptability: incorporate reduced rank arrays as a model component multilinear predictor in a GLM multilinear effects for regression parameters

Simulation study K = 3, R = 4, (m 1, m 2, m 3) = (10, 8, 6) 1. Generate M, a random array of roughly full rank 2. Set Θ = ALS 4(M) 3. Set Y = Θ + E, {e i,j,k } iid normal(0, v(θ)/4). For each of 100 such simulated datasets, we obtain ˆΘ LS and ˆΘ HB.

Simulation study: known rank 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 least squares MSE Bayesian MSE mode mean 0.12 0.14 0.16 0.18 0.20 0.12 0.14 0.16 0.18 0.20 least squares RSS Bayesian RSS

Simulation study: misspecified rank least squares hierarchical Bayes log RSS 2.5 1.5 0.5 2.5 1.5 0.5 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 log MSE 3 2 1 0 3 2 1 0 1 2 3 4 5 6 7 8 assumed rank 1 2 3 4 5 6 7 8 assumed rank

Simulation study: comments on rank selection A hierarchical model - try DIC: R true = 2 Pr(ˆR = r) = {0.10, 0.74, 0.07, 0.05, 0.02, 0.01, 0.01, 0.00} R true = 4 Pr(ˆR = r) = {0.08, 0.15, 0.27, 0.28, 0.06, 0.07, 0.04, 0.05} R true = 6 Pr(ˆR = r) = {0.07, 0.18, 0.19, 0.17, 0.10, 0.08, 0.09, 0.12} Keep in mind: A rank R < R true estimate might be better than the rank R true estimate.

Deep interaction example The 2008 General Social Survey includes data on the following six variables: y 1 (words): number of correct answers out of 10 on a vocabulary test; y 2 (tv): hours of television watched in a typical day; x 1 (deg) highest degree obtained: none, high school, Bachelor s, graduate; x 2 (age): 18-34, 35-47, 48-60, 61 and older; x 3 (sex): male or female; x 4 (child) number of children: 0, 1, 2, 3 or more. Nominal goal: Estimate E[y x] for each of the 128 possible x-vectors. 0 5 10 15 0 3 6 9 12 16 20 25 29 38 56 61 n(x)

Deep interaction example Sampling model: {y i : x i = x} iid multivariate normal(µ x, Σ) Mean model: µ x = β x + γ x {β x : x X } = B is of reduced rank {γ x : x X } This is a full model: µ x is unconstrained. iid multivariate normal(0, Ω) This is a hierarchical model: ˆµ x borrows information from other x-groups.

Deep interaction example 1.0 0.5 0.0 0.5 1.0 1.0 0.5 0.0 0.5 1.0 u 1 u2 deg.1 deg.2 deg.3 deg.4 age.1 age.2 age.3 age.4 sex.m sex.f child.0 child.1 child.2 child.3 words tv 0 10 20 30 40 50 60 3 2 1 0 1 sample size yx µx ^

Longitudinal network example Conflict and cooperation during the cold war y i,j,t { 5, 4,..., +1, +2}, the level of military conflict/cooperation x i,j,t,1 = log gdp i + log gdp j, the sum of the log gdps of the two countries; x i,j,t,2 = (log gdp i ) (log gdp j ), the product of the log gdps; x i,j,t,3 = polity i polity j, where polity i { 1, 0, +1}; x i,j,t,4 = (polity i > 0) (polity j > 0).

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Discussion Multiway models provide a parsimonious representation of mean structure. Bayesian estimation provides regularized estimates. A Hierarchical model provides adaptive regularization. Array structures can be incorporated into more complicated models.