Mean and covariance models for relational arrays
Peter Hoff
Statistics, Biostatistics and the CSSS, University of Washington
Outline
- Introduction and examples
- Models for multiway mean structure
- Models for separable covariance arrays
Array-valued data
$y_{i,j,k}$ =
- $j$th measurement on $i$th subject under condition $k$ (psychometrics)
- type-$k$ relationship between $i$ and $j$ (relational data/network)
- relationship between $i$ and $j$ at time $t$ (dynamic network)
[Figure: entries $y_{121}, y_{122}, y_{123}, y_{124}, y_{125}$ of a three-way array]
Dynamic network example
Cold war cooperation and conflict
- 66 countries
- 8 years (1950, 1955, ..., 1980, 1985)
- $y_{i,j,t}$ = relation between $i$ and $j$ in year $t$
- also have data on gdp and polity
$Y$ is a $66 \times 66 \times 8$ data array.
Model for the mean structure of the data: $Y = M + E$
Task: estimate a low-rank $M$
[Figure: plot of the 66 countries labeled by country code (USA, UKG, ROK, AUL, ..., USR, ISR, PRK, CHN)]
Multivariate relational data example
Yearly change in log exports (2000 dollars): $Y = \{y_{i,j,k,l}\} \in \mathbb{R}^{30 \times 30 \times 6 \times 10}$
- $i \in \{1, \ldots, 30\}$ indexes exporting nation
- $j \in \{1, \ldots, 30\}$ indexes importing nation
- $k \in \{1, \ldots, 6\}$ indexes commodity
- $l \in \{1, \ldots, 10\}$ indexes year
Replications over time: $Y = \{Y_1, \ldots, Y_{10}\}$, $Y_t = M + E_t$
- $M \in \mathbb{R}^{30 \times 30 \times 6}$, constant over time;
- $E_t \in \mathbb{R}^{30 \times 30 \times 6}$, changing over time.
How should the covariance among $\{E_1, \ldots, E_{10}\}$ be described?
Reduced rank models for mean structure
$Y = \Theta + E$
$\Theta$ contains the main features we hope to recover, $E$ is patternless.
Matrix decomposition: if $\Theta$ is a rank-$R$ matrix, then
$\theta_{i,j} = \langle u_i, v_j \rangle = \sum_{r=1}^{R} u_{i,r} v_{j,r}$, $\Theta = \sum_{r=1}^{R} u_r v_r^T = \sum_{r=1}^{R} u_r \circ v_r$
Array decomposition: if $\Theta$ is a rank-$R$ array, then
$\theta_{i,j,k} = \langle u_i, v_j, w_k \rangle = \sum_{r=1}^{R} u_{i,r} v_{j,r} w_{k,r}$, $\Theta = \sum_{r=1}^{R} u_r \circ v_r \circ w_r$
(PARAFAC: Harshman [1970], Kruskal [1976, 1977], Harshman and Lundy [1984], Kruskal [1989])
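The rank-$R$ array decomposition above can be illustrated with a small NumPy sketch. The dimensions, the seed, and the helper name `cp_to_array` are illustrative assumptions, not part of the talk; the point is only that the entrywise sum and the sum of outer products give the same array.

```python
import numpy as np

def cp_to_array(U, V, W):
    """Rank-R array with entries theta[i,j,k] = sum_r U[i,r] * V[j,r] * W[k,r]."""
    return np.einsum("ir,jr,kr->ijk", U, V, W)

rng = np.random.default_rng(0)
R = 2                                  # illustrative rank
U = rng.normal(size=(4, R))            # columns are u_1, ..., u_R
V = rng.normal(size=(3, R))            # columns are v_1, ..., v_R
W = rng.normal(size=(5, R))            # columns are w_1, ..., w_R
Theta = cp_to_array(U, V, W)

# The same array built explicitly as a sum of R outer products u_r o v_r o w_r.
Theta_sum = sum(
    np.multiply.outer(np.multiply.outer(U[:, r], V[:, r]), W[:, r])
    for r in range(R)
)
print(np.allclose(Theta, Theta_sum))
```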
A model-based approach
For a $K$-way array $Y$,
$Y = \Theta + E$, $\Theta = \sum_{r=1}^{R} u_r^{(1)} \circ \cdots \circ u_r^{(K)}$
the rows $u_1^{(k)}, \ldots, u_{m_k}^{(k)}$ of $U^{(k)}$ are iid multivariate normal$(\mu_k, \Psi_k)$,
with $\{\mu_k, \Psi_k, k = 1, \ldots, K\}$ to be estimated.
Some motivation:
- shrinkage: $\Theta$ contains lots of parameters.
- hierarchical: covariance among columns of $U^{(k)}$ is identifiable.
- estimation: $p(Y \mid U^{(1)}, \ldots, U^{(K)})$ multimodal, MCMC stochastic search
- adaptability: incorporate reduced rank arrays as a model component
  - multilinear predictor in a GLM
  - multilinear effects for regression parameters
Simulation study
$K = 3$, $R = 4$, $(m_1, m_2, m_3) = (10, 8, 6)$
1. Generate $M$, a random array of roughly full rank
2. Set $\Theta = \mathrm{ALS}_4(M)$
3. Set $Y = \Theta + E$, $\{e_{i,j,k}\} \sim$ iid normal$(0, v(\Theta)/4)$.
For each of 100 such simulated datasets, we obtain $\hat\Theta_{LS}$ and $\hat\Theta_{HB}$.
Questions: How well do $\hat\Theta_{LS}$ and $\hat\Theta_{HB}$
- recover the truth $\Theta$?
- represent $Y$?
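One replicate of the simulation design above might be sketched as follows. This covers only the least-squares side: `cp_als` is a plain alternating least squares routine written for illustration, not the code used for the talk, and the hierarchical Bayes estimate is not implemented. Reading $\mathrm{ALS}_4(M)$ as a rank-4 least-squares fit and $v(\Theta)$ as the sample variance of the entries of $\Theta$ are assumptions, as are the iteration count and seeds.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: row (i, j) holds A[i, r] * B[j, r]."""
    R = A.shape[1]
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, R)

def unfold(X, mode):
    """Mode-k matricization, remaining modes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, R, n_iter=200, seed=0):
    """Rank-R fit of a 3-way array by alternating least squares (illustrative)."""
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(m, R)) for m in X.shape]
    for _ in range(n_iter):
        for k in range(3):
            others = [factors[j] for j in range(3) if j != k]
            KR = khatri_rao(others[0], others[1])
            # Least-squares update of the mode-k factor given the other two.
            factors[k] = np.linalg.lstsq(KR, unfold(X, k).T, rcond=None)[0].T
    return np.einsum("ir,jr,kr->ijk", *factors)

rng = np.random.default_rng(1)
m = (10, 8, 6)
M = rng.normal(size=m)                      # step 1: random, roughly full rank
Theta = cp_als(M, R=4)                      # step 2: rank-4 approximation of M
sd = np.sqrt(Theta.var() / 4)               # step 3: assumed meaning of v(Theta)/4
Y = Theta + rng.normal(scale=sd, size=m)

Theta_LS = cp_als(Y, R=4, seed=2)           # least-squares estimate from Y
mse = np.mean((Theta_LS - Theta) ** 2)      # how well the truth is recovered
rss = np.mean((Theta_LS - Y) ** 2)          # how well the data are represented
print(mse, rss)
```

Repeating the last block over 100 seeds would give the least-squares half of the comparison; ALS can converge to local optima, so several random starts per dataset may be advisable.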
Simulation study: known rank
[Figure: two scatterplots. Left: Bayesian MSE (posterior mode and mean) against least squares MSE. Right: Bayesian RSS against least squares RSS.]
Simulation study: misspecified rank
[Figure: log RSS and log MSE against assumed rank (1 to 8), for least squares and hierarchical Bayes.]
Mean structure for dynamic networks
$\{Y_1, \ldots, Y_T\} = Y = M + E$, $M = \sum_r u_r \circ v_r \circ w_r$
What if each $Y_t$ is symmetric?
$M = \sum_r u_r \circ u_r \circ w_r$, $u_r \in \mathbb{R}^n$, $w_r \in \mathbb{R}^T$
(INDSCAL (Carroll and Chang, 1970))
Under this model, $M = \{M_1, \ldots, M_T\}$ with $M_t = U \Lambda_t U^T$, $\Lambda_t = \mathrm{diag}(w_{1,t}, \ldots, w_{R,t})$.
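The equivalence between the symmetric three-way sum and the slice form $M_t = U \Lambda_t U^T$ can be checked numerically. The sizes and random factors below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, R = 5, 3, 2                  # illustrative sizes
U = rng.normal(size=(n, R))        # columns are u_1, ..., u_R
W = rng.normal(size=(T, R))        # columns are w_1, ..., w_R

# M[i, j, t] = sum_r U[i, r] * U[j, r] * W[t, r]
M = np.einsum("ir,jr,tr->ijt", U, U, W)

# Slice t written as U diag(w_{1,t}, ..., w_{R,t}) U^T.
M_slices = [U @ np.diag(W[t]) @ U.T for t in range(T)]

print(all(np.allclose(M[:, :, t], M_slices[t]) for t in range(T)))
```

Each slice is symmetric by construction, and only the $R$ diagonal weights in $\Lambda_t$ change over time.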
Dynamic network example
$y_{i,j,t} \in \{-5, -4, \ldots, +1, +2\}$, the level of military conflict/cooperation
- $x_{i,j,t,1} = \log \mathrm{gdp}_i + \log \mathrm{gdp}_j$, the sum of the log gdps of the two countries;
- $x_{i,j,t,2} = (\log \mathrm{gdp}_i) \times (\log \mathrm{gdp}_j)$, the product of the log gdps;
- $x_{i,j,t,3} = \mathrm{polity}_i \times \mathrm{polity}_j$, where $\mathrm{polity}_i \in \{-1, 0, +1\}$;
- $x_{i,j,t,4} = (\mathrm{polity}_i > 0) \times (\mathrm{polity}_j > 0)$.
Model:
$y_{i,j,t} = f(z_{i,j,t}; c_{-5}, \ldots, c_{+2}) = \max\{y : z_{i,j,t} > c_y\}$
$z_{i,j,t} = \beta^T x_{i,j,t} + u_i^T \Lambda_t u_j + \epsilon_{i,j,t}$
$Z = \{z_{i,j,t} - \beta^T x_{i,j,t}\} = \{U \Lambda_t U^T\} + E$
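The link $y = \max\{y : z > c_y\}$ maps a latent value to the highest level whose cutpoint it exceeds. A minimal sketch follows; the cutpoint values, the choice $c_{-5} = -\infty$, the standard normal errors, and all array sizes are hypothetical, chosen only to make the example run (the errors here are also not symmetrized across $i$ and $j$).

```python
import numpy as np

levels = np.arange(-5, 3)          # the ordinal levels -5, ..., +2
# Hypothetical increasing cutpoints c_{-5}, ..., c_{+2}; c_{-5} = -inf so that
# every latent value maps to some level.
cuts = np.array([-np.inf, -2.5, -1.5, -0.8, -0.2, 0.5, 1.2, 2.0])

def ordinal_response(z, cuts=cuts, levels=levels):
    """Elementwise max{y : z > c_y}."""
    # side="left" counts the cutpoints strictly below z.
    n_below = np.searchsorted(cuts, z, side="left")
    return levels[n_below - 1]

rng = np.random.default_rng(0)
n, T, R, p = 4, 2, 2, 4            # illustrative sizes
X = rng.normal(size=(n, n, T, p))  # stand-in covariates x_{i,j,t}
beta = rng.normal(size=p)
U = rng.normal(size=(n, R))
Lam = rng.normal(size=(T, R))      # row t holds the diagonal of Lambda_t

# z_{i,j,t} = beta^T x_{i,j,t} + u_i^T Lambda_t u_j + eps_{i,j,t}
Z = X @ beta + np.einsum("ir,tr,jr->ijt", U, Lam, U) + rng.normal(size=(n, n, T))
Y = ordinal_response(Z)
print(Y[:, :, 0])
```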
Longitudinal network example
[Figure: countries plotted by latent coordinates $(u_1, u_2)$, labeled by country code; panels of $v_1$ and $v_2$ against year.]
Covariance structure of multiple relational arrays
Yearly change in log exports (2000 dollars): $Y = \{y_{i,j,k,l}\} \in \mathbb{R}^{30 \times 30 \times 6 \times 10}$
- $i \in \{1, \ldots, 30\}$ indexes exporting nation
- $j \in \{1, \ldots, 30\}$ indexes importing nation
- $k \in \{1, \ldots, 6\}$ indexes commodity
- $l \in \{1, \ldots, 10\}$ indexes year
Replications over time: $Y = \{Y_1, \ldots, Y_{10}\}$, $Y_t = M + E_t$
- $M \in \mathbb{R}^{30 \times 30 \times 6}$, constant over time;
- $E_t \in \mathbb{R}^{30 \times 30 \times 6}$, changing over time.
How should the covariance among $\{E_1, \ldots, E_{10}\}$ be described?
Separable covariance via Tucker products
$Y = \Theta + E$
Decompose $\Theta$ using the Tucker decomposition (Tucker 1964, 1966):
$\theta_{i,j,k} = \sum_{r=1}^{R} \sum_{s=1}^{S} \sum_{t=1}^{T} z_{r,s,t} \, a_{i,r} b_{j,s} c_{k,t}$, $\Theta = Z \times \{A, B, C\}$
- $Z$ is the $R \times S \times T$ core array
- $A$, $B$, $C$ are $m_1 \times R$, $m_2 \times S$, $m_3 \times T$ matrices.
- $R$, $S$ and $T$ are the 1-rank, 2-rank and 3-rank of $\Theta$
- "$\times$" is array-matrix multiplication (De Lathauwer et al., 2000)
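The array-matrix product $Z \times \{A, B, C\}$ can be computed one mode at a time. The helper `tucker_product` below is an illustrative sketch (the name and the sizes are assumptions), checked against the triple sum written directly with `einsum`.

```python
import numpy as np

def tucker_product(Z, mats):
    """Array-matrix product Z x {A_1, ..., A_K}: multiply mode k of Z by A_k."""
    out = Z
    for k, A in enumerate(mats):
        # Contract A's columns with mode k, then move the new axis back to k.
        out = np.moveaxis(np.tensordot(A, out, axes=(1, k)), 0, k)
    return out

rng = np.random.default_rng(0)
Z = rng.normal(size=(2, 3, 4))     # core array, R x S x T
A = rng.normal(size=(5, 2))        # m_1 x R
B = rng.normal(size=(6, 3))        # m_2 x S
C = rng.normal(size=(7, 4))        # m_3 x T

Theta = tucker_product(Z, [A, B, C])
# theta[i,j,k] = sum_{r,s,t} z[r,s,t] * a[i,r] * b[j,s] * c[k,t]
Theta_direct = np.einsum("rst,ir,js,kt->ijk", Z, A, B, C)
print(Theta.shape, np.allclose(Theta, Theta_direct))
```

For a two-way core the same function reduces to the matrix product $A Z B^T$.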
Separable covariance via Tucker products
Multivariate normal model:
$z = \{z_j : j = 1, \ldots, m\} \sim$ iid normal$(0, 1)$
$y = \mu + Az \sim$ multivariate normal$(\mu, \Sigma = AA^T)$
Matrix normal model:
$Z = \{z_{i,j}\}_{i=1,j=1}^{m_1,m_2} \sim$ iid normal$(0, 1)$
$Y = M + AZB^T \sim$ matrix normal$(M, \Sigma_1 = AA^T, \Sigma_2 = BB^T)$
NOTE: $AZB^T = Z \times \{A, B\}$
Array normal model:
$Z = \{z_{i,j,k}\}_{i=1,j=1,k=1}^{m_1,m_2,m_3} \sim$ iid normal$(0, 1)$
$Y = M + Z \times \{A, B, C\} \sim$ array normal$(M, \Sigma_1 = AA^T, \Sigma_2 = BB^T, \Sigma_3 = CC^T)$
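The constructive definition above suggests a direct way to simulate from a three-way array normal distribution. In this sketch the function name `rarray_normal`, the use of Cholesky factors for $A$, $B$, $C$, and the example covariance matrices are all assumptions; any square roots with $A A^T = \Sigma_1$ and so on would serve.

```python
import numpy as np

def rarray_normal(M, Sigmas, rng):
    """One draw Y = M + Z x {A, B, C} with A A^T = Sigma_1, etc. (3-way case)."""
    A, B, C = (np.linalg.cholesky(S) for S in Sigmas)
    Z = rng.normal(size=M.shape)                 # iid normal(0, 1) entries
    return M + np.einsum("rst,ir,js,kt->ijk", Z, A, B, C)

def random_spd(m, rng):
    """Illustrative positive definite matrix G G^T + I."""
    G = rng.normal(size=(m, m))
    return G @ G.T + np.eye(m)

rng = np.random.default_rng(0)
dims = (3, 4, 2)
M = np.zeros(dims)
Sigmas = [random_spd(m, rng) for m in dims]
Y = rarray_normal(M, Sigmas, rng)
print(Y.shape)
```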
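Separable covariance structure
For the matrix normal model (moment identities stated for mean-zero $Y$):
- $\mathrm{Cov}[Y] = \Sigma_1 \circ \Sigma_2$
- $\mathrm{Cov}[\mathrm{vec}(Y)] = \Sigma_2 \otimes \Sigma_1$
- $E[YY^T] = \Sigma_1 \, \mathrm{tr}(\Sigma_2)$
- $E[Y^T Y] = \Sigma_2 \, \mathrm{tr}(\Sigma_1)$
For the array normal model:
- $\mathrm{Cov}[Y] = \Sigma_1 \circ \Sigma_2 \circ \Sigma_3$
- $\mathrm{Cov}[\mathrm{vec}(Y)] = \Sigma_K \otimes \cdots \otimes \Sigma_1$
- $E[Y_{(k)} Y_{(k)}^T] = \Sigma_k \prod_{j \neq k} \mathrm{tr}(\Sigma_j)$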
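The Kronecker form of $\mathrm{Cov}[\mathrm{vec}(Y)]$ follows from the identity $\mathrm{vec}(AZB^T) = (B \otimes A)\,\mathrm{vec}(Z)$ together with $\mathrm{Cov}[\mathrm{vec}(Z)] = I$. A small numerical check, with illustrative sizes and column-stacking `vec`:

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2 = 3, 4                       # illustrative sizes
A = rng.normal(size=(m1, m1))
B = rng.normal(size=(m2, m2))
Z = rng.normal(size=(m1, m2))

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

lhs = vec(A @ Z @ B.T)
rhs = np.kron(B, A) @ vec(Z)

Sigma1, Sigma2 = A @ A.T, B @ B.T
# Cov[vec(Y)] = (B kron A) Cov[vec(Z)] (B kron A)^T with Cov[vec(Z)] = I.
cov_vec = np.kron(B, A) @ np.kron(B, A).T

print(np.allclose(lhs, rhs), np.allclose(cov_vec, np.kron(Sigma2, Sigma1)))
```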
International trade example
Yearly change in log exports (2000 dollars): $Y = \{y_{i,j,k,l}\} \in \mathbb{R}^{30 \times 30 \times 6 \times 10}$
- $i \in \{1, \ldots, 30\}$ indexes exporting nation
- $j \in \{1, \ldots, 30\}$ indexes importing nation
- $k \in \{1, \ldots, 6\}$ indexes commodity
- $l \in \{1, \ldots, 10\}$ indexes year
Full cell means model: $y_{i,j,k,l} = \mu_{i,j,k} + e_{i,j,k,l}$
Let $E = \{e_{i,j,k,l}\}$
- iid error model: $E \sim$ array normal$(0, I, I, I, \sigma^2 I)$
- vector normal error model: $E \sim$ array normal$(0, I, I, \Sigma_3, I)$
- matrix normal error model: $E \sim$ array normal$(0, I, I, \Sigma_3, \Sigma_4)$
- array normal model: $E \sim$ array normal$(0, \Sigma_1, \Sigma_2, \Sigma_3, \Sigma_4)$
International trade example
Model comparison:
- reduced: array normal$(0, I, I, \Sigma_3, \Sigma_4)$
- full: array normal$(0, \Sigma_1, \Sigma_2, \Sigma_3, \Sigma_4)$
[Figure: density plots of three statistics $t_1(Y)$, $t_2(Y)$, $t_3(Y)$]
International trade example
[Figure: panels with points labeled by country code (IDN, MYS, THA, SIN, KOR, CHN, ..., AUT) and by commodity (machinery, mfg goods, food, textiles, chemicals, crude materials)]
Discussion
Scientific studies increasingly involve data with multiway array structure
- often this structure is unrecognized
- array structure may be present in data, latent data, or parameters
Model-based versions of array decompositions offer several benefits
- applicability of multiway methods is broadened
- parameter estimates have reduced MSE
Applicability to dynamic, multivariate relational data
- reduced-rank methods can capture mean structure
- array-matrix transformations can model covariance