Hierarchical models for multiway data

Hierarchical models for multiway data Peter Hoff Statistics, Biostatistics and the CSSS University of Washington

Array-valued data y i,j,k = jth variable of ith subject under condition k (psychometrics). type-k relationship between i and j (relational data/network). sample mean of variable i for group j in state k (cross-classified data) y 123 y 124 y 125 y 122 y 121

Longitudinal network example USA Cold war cooperation and conflict 66 countries 8 years (1950,1955,..., 1980, 1985) y i,j,t =relation between i, j in year t UKG also have data on gdp, polity ROK AUL NEW PHI EGY TAW THI TUR IND BUL HUN RUM CAN JOR SPN LEB FRN HON BRA GDR COL DEN NEP GFR SAF ETH ARG VEN IRQ BEL OMA COS ALB HAI DOMSAL NOR ITA NTH IRE AFG PAN LBR CHL CZE SWD GUA SAU SRI INS CUB NIC POR PER AUS ECU IRN GRC YUG MYA USR ISR PRK CHN

Deep interaction example words 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 1 2 3 4 1 2 3 4 male female 0 1 2 3 tv 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 1 2 3 4 deg 1 2 3 4 age male female sex 0 1 2 3 child {y i : x i = x} iid multivariate normal(µx, Σ) n = 1116 survey participants 4 4 2 4 = 128 levels of x {β x } a 4 4 2 4 2 array > 1/2 levels have 5 samples

Reduced rank models Y = Θ + E Θ contains the main features we hope to recover, E is patternless. Matrix decomposition: If Θ is a rank-r matrix, then RX RX RX θ i,j = u i, v j = u i,r v j,r Θ = u r v T r = u r v r r=1 r=1 r=1 Array decomposition: If Θ is a rank-r array, then RX RX θ i,j,k = u i, v j, w k = u i,r v j,r w k,r Θ = u r v r w r r=1 r=1 (Harshman[1970], Kruskal[1976,1977], Harshman and Lundy[1984], Kruskal[1989] )

Some things you should know 1. Computing the rank matrix: easy to do array: no known algorithm 2. Possible rank matrix: R max = min(m 1, m 2 ) array: max(m 1, m 2, m 3 ) R max min(m 1 m 2, m 1 m 3, m 2 m 3 ) 3. Probable rank matrix: almost all matrices have full rank. array: a nonzero fraction (w.r.t. Lebesgue measure) have less than full rank. 4. Least squares approximation matrix: SVD of Y provides the rank R least-squares approximation to Θ. array: iterative least squares methods, but solution may not exist (de Silva and Lim[2008]) 5. Uniqueness matrix: The representation Θ = U, V = UV T is not unique. array: The representation Θ = U, V, W is essentially unique.

A model-based approach For a K-way array Y, u (k) 1,..., u(k) m k Y = Θ + E RX Θ = r=1 u (1) r u (K) r U (1),..., U (K) iid multivariate normal(µ k, Ψ k ), with {µ k, Ψ k, k = 1,..., K} to be estimated. Some motivation: shrinkage: Θ contains lots of parameters. hierarchical: covariance among columns of U (k) is identifiable. estimation: p(y U (1),..., U (K) ) multimodal, MCMC stochastic search adaptability: incorporate reduced rank arrays as a model component multilinear predictor in a GLM multilinear effects for regression parameters

Simulation study K = 3, R = 4, (m 1, m 2, m 3) = (10, 8, 6) 1. Generate M, a random array of roughly full rank 2. Set Θ = ALS 4(M) 3. Set Y = Θ + E, {e i,j,k } iid normal(0, v(θ)/4). For each of 100 such simulated datasets, we obtain ˆΘ LS and ˆΘ HB.

Simulation study: known rank 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 least squares MSE Bayesian MSE mode mean 0.12 0.14 0.16 0.18 0.20 0.12 0.14 0.16 0.18 0.20 least squares RSS Bayesian RSS

Simulation study: misspecified rank least squares hierarchical Bayes log RSS 2.5 1.5 0.5 2.5 1.5 0.5 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 log MSE 3 2 1 0 3 2 1 0 1 2 3 4 5 6 7 8 assumed rank 1 2 3 4 5 6 7 8 assumed rank

Simulation study: comments on rank selection A hierarchical model - try DIC: R true = 2 Pr(ˆR = r) = {0.10, 0.74, 0.07, 0.05, 0.02, 0.01, 0.01, 0.00} R true = 4 Pr(ˆR = r) = {0.08, 0.15, 0.27, 0.28, 0.06, 0.07, 0.04, 0.05} R true = 6 Pr(ˆR = r) = {0.07, 0.18, 0.19, 0.17, 0.10, 0.08, 0.09, 0.12} Keep in mind: A rank R < R true estimate might be better than the rank R true estimate.

Deep interaction example The 2008 General Social Survey includes data on the following six variables: y 1 (words): number of correct answers out of 10 on a vocabulary test; y 2 (tv): hours of television watched in a typical day; x 1 (deg) highest degree obtained: none, high school, Bachelor s, graduate; x 2 (age): 18-34, 35-47, 48-60, 61 and older; x 3 (sex): male or female; x 4 (child) number of children: 0, 1, 2, 3 or more. Nominal goal: Estimate E[y x] for each of the 128 possible x-vectors. 0 5 10 15 0 3 6 9 12 16 20 25 29 38 56 61 n(x)

Deep interaction example Sampling model: {y i : x i = x} iid multivariate normal(µ x, Σ) Mean model: µ x = β x + γ x {β x : x X } = B is of reduced rank {γ x : x X } This is a full model: µ x is unconstrained. iid multivariate normal(0, Ω) This is a hierarchical model: ˆµ x borrows information from other x-groups.

Deep interaction example 1.0 0.5 0.0 0.5 1.0 1.0 0.5 0.0 0.5 1.0 u 1 u2 deg.1 deg.2 deg.3 deg.4 age.1 age.2 age.3 age.4 sex.m sex.f child.0 child.1 child.2 child.3 words tv 0 10 20 30 40 50 60 3 2 1 0 1 sample size yx µx ^

Longitudinal network example Conflict and cooperation during the cold war y i,j,t { 5, 4,..., +1, +2}, the level of military conflict/cooperation x i,j,t,1 = log gdp i + log gdp j, the sum of the log gdps of the two countries; x i,j,t,2 = (log gdp i ) (log gdp j ), the product of the log gdps; x i,j,t,3 = polity i polity j, where polity i { 1, 0, +1}; x i,j,t,4 = (polity i > 0) (polity j > 0).

Longitudinal network example u 2 USA UKG ROK AUL NEW PHI THI TUR CAN JOR EGY TAW IND BUL HUN RUM FRN SPN LEB HON COL ARG BEL BRA DEN GDR GFR SAF NEP ETH COS ALB DOM IRQ OMA VEN NOR ITA HAI SAL NTH AFGPAN LBR GUA IRE SWDCHL CZE SAU SRI INS POR CUB NIC PER AUS ECU IRN GRC YUG MYA ISR CHN PRK USR v 1 v 2 u 1

Discussion Multiway models provide a parsimonious representation of mean structure. Bayesian estimation provides regularized estimates. A Hierarchical model provides adaptive regularization. Array structures can be incorporated into more complicated models.