Probability models for multiway data

Peter Hoff
Statistics, Biostatistics and the CSSS
University of Washington

Outline
- Introduction and examples
- Hierarchical models for multiway factors
- Deep interactions
- International conflict
- Separable covariance

Array-valued data
$y_{i,j,k}$ =
- jth measurement on ith subject under condition k (psychometrics)
- type-k relationship between i and j (relational data/network)
- sample mean of variable i for group j in state k (cross-classified data)
[Figure: a three-way array with entries labeled $y_{1,2,1}, y_{1,2,2}, \ldots, y_{1,2,5}$.]

Longitudinal network example
Cold war cooperation and conflict
- 66 countries
- 8 years (1950, 1955, ..., 1980, 1985)
- $y_{i,j,t}$ = relation between i and j in year t
- also have data on gdp and polity
- a $66 \times 66 \times 8$ data array
[Figure: network plot of the 66 countries, labeled by country code (USA, ROK, UKG, ..., PRK, CHN).]

Deep interaction example
[Figure: panels of words and tv (rows) against deg, age, sex and child (columns).]
- $\{y_i : x_i = x\} \sim$ iid multivariate normal$(\mu_x, \Sigma)$
- n = 1116 survey participants
- $4 \times 4 \times 2 \times 4 = 128$ levels of x
- $\{\mu_x\}$ is a $4 \times 4 \times 2 \times 4 \times 2$ array
- more than half of the levels have 5 or fewer samples

Data models and probability models
$Y = \Theta + E$
$\Theta$ contains the main features we hope to recover; $E$ is patternless.
Data model:
- $\Theta$ represents main features of the data
- $E$ represents residual features
- Goal is to compactly represent/summarize/describe the data
Probability model:
- $\Theta$ represents a fixed process or population parameter
- $E$ represents measurement error or sample-to-sample variation
- Goal is to estimate $\Theta$ and describe our estimation uncertainty

Reduced rank models
$Y = \Theta + E$
$\Theta$ contains the main features we hope to recover; $E$ is patternless.
Matrix decomposition: if $\Theta$ is a rank-$R$ matrix, then
$$\theta_{i,j} = \langle u_i, v_j \rangle = \sum_{r=1}^{R} u_{i,r} v_{j,r}, \qquad \Theta = \sum_{r=1}^{R} u_r v_r^T = \sum_{r=1}^{R} u_r \circ v_r$$
Array decomposition: if $\Theta$ is a rank-$R$ array, then
$$\theta_{i,j,k} = \langle u_i, v_j, w_k \rangle = \sum_{r=1}^{R} u_{i,r} v_{j,r} w_{k,r}, \qquad \Theta = \sum_{r=1}^{R} u_r \circ v_r \circ w_r$$
(PARAFAC: Harshman [1970], Kruskal [1976, 1977], Harshman and Lundy [1984], Kruskal [1989])
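The two decompositions can be illustrated numerically. The sketch below builds a rank-R matrix and a rank-R three-way array from factor matrices with NumPy; the sizes, the seed and the variable names are arbitrary choices made for illustration, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, m3, R = 5, 4, 3, 2  # hypothetical dimensions and rank

# Factor matrices: row i of U is u_i, column r of U is u_r.
U = rng.standard_normal((m1, R))
V = rng.standard_normal((m2, R))
W = rng.standard_normal((m3, R))

# Matrix case: theta_{i,j} = sum_r u_{i,r} v_{j,r}, i.e. Theta = U V^T.
theta_mat = np.einsum("ir,jr->ij", U, V)
# The same matrix written as a sum of R outer products u_r o v_r.
theta_mat_outer = sum(np.outer(U[:, r], V[:, r]) for r in range(R))

# Array case: theta_{i,j,k} = sum_r u_{i,r} v_{j,r} w_{k,r}.
theta_arr = np.einsum("ir,jr,kr->ijk", U, V, W)
# The same array written as a sum of R outer products u_r o v_r o w_r.
theta_arr_outer = sum(
    np.einsum("i,j,k->ijk", U[:, r], V[:, r], W[:, r]) for r in range(R)
)
```

The elementwise sum and the sum of outer products give the same object in both cases; the array version only adds one more factor per term.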

Some things to worry about
1. Computing the rank
   - matrix: easy to do
   - array: no known algorithm
2. Possible rank
   - matrix: $R_{\max} = \min(m_1, m_2)$
   - array: $\max(m_1, m_2, m_3) \le R_{\max} \le \min(m_1 m_2, m_1 m_3, m_2 m_3)$
3. Probable rank
   - matrix: almost all matrices have full rank.
   - array: a nonzero fraction (w.r.t. Lebesgue measure) have less than full rank.
4. Least squares approximation
   - matrix: the SVD of $Y$ provides the rank-$R$ least-squares approximation to $\Theta$.
   - array: iterative least squares methods, but a solution may not exist (de Silva and Lim [2008]).
5. Uniqueness
   - matrix: the representation $\Theta = \langle U, V \rangle = UV^T$ is not unique.
   - array: the representation $\Theta = \langle U, V, W \rangle$ is essentially unique.
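Items 4 and 5 can be checked numerically for the matrix case. The sketch below computes the rank-R least-squares approximation of a matrix by truncating its SVD, and then shows that two different factor pairs give the same product. The helper name `svd_rank_r`, the sizes and the matrix `G` are hypothetical choices for illustration.

```python
import numpy as np

def svd_rank_r(Y, R):
    """Rank-R least-squares approximation of Y via a truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :R] * s[:R]) @ Vt[:R, :]

rng = np.random.default_rng(1)
Y = rng.standard_normal((6, 5))
R = 2

theta_hat = svd_rank_r(Y, R)
# Residual sum of squares equals the sum of the squared discarded singular values.
s = np.linalg.svd(Y, compute_uv=False)
rss = np.sum((Y - theta_hat) ** 2)

# Non-uniqueness: for any invertible G, (U G, V G^{-T}) gives the same product as (U, V).
U = rng.standard_normal((6, R))
V = rng.standard_normal((5, R))
G = np.array([[2.0, 1.0], [0.0, 1.0]])  # an arbitrary invertible matrix
U2 = U @ G
V2 = V @ np.linalg.inv(G).T
same_product = np.allclose(U @ V.T, U2 @ V2.T)
```

No analogous one-step computation is available for arrays, which is why iterative least squares methods are used there.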

A model-based approach
For a K-way array $Y$,
$$Y = \Theta + E, \qquad \Theta = \sum_{r=1}^{R} u_r^{(1)} \circ \cdots \circ u_r^{(K)} \equiv \langle U^{(1)}, \ldots, U^{(K)} \rangle$$
$$u_1^{(k)}, \ldots, u_{m_k}^{(k)} \sim \text{iid multivariate normal}(\mu_k, \Psi_k),$$
where $u_1^{(k)}, \ldots, u_{m_k}^{(k)}$ are the rows of $U^{(k)}$, with $\{\mu_k, \Psi_k, k = 1, \ldots, K\}$ to be estimated.
Some motivation:
- shrinkage: $\Theta$ contains lots of parameters.
- hierarchical: covariance among columns of $U^{(k)}$ is identifiable.
- estimation: $p(Y \mid U^{(1)}, \ldots, U^{(K)})$ is multimodal; MCMC provides a stochastic search.
- adaptability: incorporate reduced rank arrays as a model component
  - multilinear predictor in a GLM
  - multilinear effects for regression parameters

Simulation study
$K = 3$, $R = 4$, $(m_1, m_2, m_3) = (10, 8, 6)$
1. Generate $M$, a random array of roughly full rank
2. Set $\Theta = \text{ALS}_4(M)$
3. Set $Y = \Theta + E$, $\{e_{i,j,k}\} \sim$ iid normal$(0, v(\Theta)/4)$.
For each of 100 such simulated datasets, we obtain $\hat\Theta_{LS}$ and $\hat\Theta_{HB}$.
Questions: How well do $\hat\Theta_{LS}$ and $\hat\Theta_{HB}$
- recover the truth $\Theta$?
- represent $Y$?
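One way to generate a single simulated dataset of this kind is sketched below, using a plain alternating least squares (ALS) fit of the rank-R decomposition. This is a minimal sketch and not the code used in the talk: the function names are hypothetical, M is assumed to have iid standard normal entries, and $v(\Theta)$ is assumed to mean the sample variance of the entries of $\Theta$. Only the least-squares estimate is computed; the hierarchical Bayes estimate is not attempted here.

```python
import numpy as np

def cp_reconstruct(U, V, W):
    """theta_{i,j,k} = sum_r U[i,r] V[j,r] W[k,r]."""
    return np.einsum("ir,jr,kr->ijk", U, V, W)

def als_cp(Y, R, n_iter=200, seed=0):
    """Rank-R ALS fit of a 3-way array; returns factors and the RSS after each sweep."""
    rng = np.random.default_rng(seed)
    m1, m2, m3 = Y.shape
    U = rng.standard_normal((m1, R))
    V = rng.standard_normal((m2, R))
    W = rng.standard_normal((m3, R))
    rss_path = []
    for _ in range(n_iter):
        # Each update is an ordinary least-squares problem with the other two factors fixed.
        X = np.einsum("jr,kr->jkr", V, W).reshape(m2 * m3, R)
        U = np.linalg.lstsq(X, Y.reshape(m1, m2 * m3).T, rcond=None)[0].T
        X = np.einsum("ir,kr->ikr", U, W).reshape(m1 * m3, R)
        V = np.linalg.lstsq(X, Y.transpose(1, 0, 2).reshape(m2, m1 * m3).T, rcond=None)[0].T
        X = np.einsum("ir,jr->ijr", U, V).reshape(m1 * m2, R)
        W = np.linalg.lstsq(X, Y.transpose(2, 0, 1).reshape(m3, m1 * m2).T, rcond=None)[0].T
        rss_path.append(np.sum((Y - cp_reconstruct(U, V, W)) ** 2))
    return U, V, W, np.array(rss_path)

rng = np.random.default_rng(10)
R, dims = 4, (10, 8, 6)

# Step 1: a random array of roughly full rank (assumed iid standard normal).
M = rng.standard_normal(dims)
# Step 2: Theta is the rank-4 ALS approximation of M.
U, V, W, rss_path = als_cp(M, R)
Theta = cp_reconstruct(U, V, W)
# Step 3: add noise with variance v(Theta)/4 (v assumed to be the sample variance).
E = rng.standard_normal(dims) * np.sqrt(Theta.var() / 4)
Y = Theta + E

# Least-squares estimate from the noisy data, with MSE against the truth and RSS against Y.
U_ls, V_ls, W_ls, _ = als_cp(Y, R, seed=1)
Theta_ls = cp_reconstruct(U_ls, V_ls, W_ls)
mse_ls = np.mean((Theta_ls - Theta) ** 2)
rss_ls = np.mean((Theta_ls - Y) ** 2)
```

Because each ALS update solves its least-squares subproblem exactly, the residual sum of squares cannot increase from one sweep to the next, although the iterations may converge to a local rather than a global minimum.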

Simulation study: known rank
[Figure: Bayesian MSE (mode and mean) plotted against least squares MSE, and Bayesian RSS plotted against least squares RSS.]

Simulation study: misspecified rank
[Figure: log RSS and log MSE plotted against assumed rank (1 to 8), for least squares and hierarchical Bayes.]

Simulation study: comments on rank selection
A hierarchical model, so try DIC. $\Pr(\hat R = r)$ for $r = 1, \ldots, 8$:

  R_true   r=1    r=2    r=3    r=4    r=5    r=6    r=7    r=8
  2        0.10   0.74   0.07   0.05   0.02   0.01   0.01   0.00
  4        0.08   0.15   0.27   0.28   0.06   0.07   0.04   0.05
  6        0.07   0.18   0.19   0.17   0.10   0.08   0.09   0.12

Keep in mind: a rank $R < R_{true}$ estimate might be better than the rank $R_{true}$ estimate.

Deep interaction example
The 2008 General Social Survey includes data on the following six variables:
- $y_1$ (words): number of correct answers out of 10 on a vocabulary test;
- $y_2$ (tv): hours of television watched in a typical day;
- $x_1$ (deg): highest degree obtained: none, high school, Bachelor's, graduate;
- $x_2$ (age): 18-34, 35-47, 48-60, 61 and older;
- $x_3$ (sex): male or female;
- $x_4$ (child): number of children: 0, 1, 2, 3 or more.
Nominal goal: estimate $E[y \mid x]$ for each of the 128 possible x-vectors.
[Figure: histogram of the group sample sizes $n(x)$.]

Deep interaction example
Sampling model: $\{y_i : x_i = x\} \sim$ iid multivariate normal$(\mu_x, \Sigma)$
Mean model: $\mu_x = \alpha_x + \gamma_x$
- $\{\alpha_x : x \in \mathcal{X}\} = A$ is of reduced rank
- $\{\gamma_x : x \in \mathcal{X}\} \sim$ iid multivariate normal$(0, \Omega)$
This is a full model: $\mu_x$ is unconstrained.
This is a hierarchical model: $\hat\mu_x$ borrows information from other x-groups.

Deep interaction example
[Figure: left, latent factors $(u_1, u_2)$ for the levels of deg, age, sex and child and for the outcomes words and tv; right, comparison of $\bar y_x$ and $\hat\mu_x$ plotted against sample size.]

Longitudinal network example
$y_{i,j,t} \in \{-5, -4, \ldots, +1, +2\}$, the level of military conflict/cooperation
- $x_{i,j,t,1} = \log \text{gdp}_i + \log \text{gdp}_j$, the sum of the log gdps of the two countries;
- $x_{i,j,t,2} = (\log \text{gdp}_i) \times (\log \text{gdp}_j)$, the product of the log gdps;
- $x_{i,j,t,3} = \text{polity}_i \times \text{polity}_j$, where $\text{polity}_i \in \{-1, 0, +1\}$;
- $x_{i,j,t,4} = (\text{polity}_i > 0) \times (\text{polity}_j > 0)$.
Model:
$$y_{i,j,t} = f(z_{i,j,t}; c_{-5}, \ldots, c_{+2}) = \max\{y : z_{i,j,t} > c_y\}$$
$$z_{i,j,t} = \beta^T x_{i,j,t} + u_i^T \Lambda_t u_j + \epsilon_{i,j,t}$$
$$Z = \{z_{i,j,t} - \beta^T x_{i,j,t}\} = \{U \Lambda_t U^T\} + E$$
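The link between the latent $z_{i,j,t}$ and the ordinal response can be sketched in a few lines. Everything numeric below is hypothetical: the sizes, the covariates, the cutpoints, and the choice $c_{-5} = -\infty$ (made so that the maximum is always defined). The sketch also assumes that $\Lambda_t$ is diagonal with diagonal entries stored in row t of a matrix V, and that the errors are standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, p, R = 6, 3, 4, 2  # hypothetical: n countries, T time points, p covariates, rank R

X = rng.standard_normal((n, n, T, p))  # stand-in covariates x_{i,j,t}
beta = rng.standard_normal(p)
U = rng.standard_normal((n, R))        # country factors u_i
V = rng.standard_normal((T, R))        # assumed: Lambda_t = diag(V[t])

# Multiplicative term u_i^T Lambda_t u_j for every (i, j, t).
bilinear = np.einsum("ir,tr,jr->ijt", U, V, U)
Z = X @ beta + bilinear + rng.standard_normal((n, n, T))

levels = np.arange(-5, 3)  # response values -5, ..., +2
# Hypothetical increasing cutpoints c_{-5}, ..., c_{+2}, with c_{-5} = -inf.
cuts = np.array([-np.inf, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5])

def ordinal_response(z, cuts, levels):
    """Return max{y : z > c_y} elementwise, for increasing cutpoints."""
    # searchsorted with side="left" counts the cutpoints strictly below z.
    return levels[np.searchsorted(cuts, z, side="left") - 1]

Y = ordinal_response(Z, cuts, levels)
```

The multiplicative term is symmetric in i and j for every t, so any asymmetry in the simulated responses comes from the covariates and the errors.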

Longitudinal network example
[Figure: country latent factors $(u_1, u_2)$, labeled by country code, and the weights $v_1$ and $v_2$ plotted against year (1950 to 1985).]

Separable covariance via Tucker products
$Y = \Theta + E$
Decompose $\Theta$ using the Tucker decomposition (Tucker 1964, 1966):
$$\theta_{i,j,k} = \sum_{r=1}^{R} \sum_{s=1}^{S} \sum_{t=1}^{T} z_{r,s,t}\, a_{i,r}\, b_{j,s}\, c_{k,t}, \qquad \Theta = Z \times \{A, B, C\}$$
- $Z$ is the $R \times S \times T$ core array
- $A$, $B$, $C$ are $m_1 \times R$, $m_2 \times S$, $m_3 \times T$ matrices.
- $R$, $S$ and $T$ are the 1-rank, 2-rank and 3-rank of $\Theta$
- "$\times$" is array-matrix multiplication (De Lathauwer et al., 2000)
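The Tucker product is a single contraction, which the sketch below writes with `numpy.einsum`. The function name `tucker_product`, the sizes and the seed are hypothetical. The unfolding used in the check is NumPy's row-major reshape, under which the mode-1 unfolding of $\Theta$ equals $A Z_{(1)} (B \otimes C)^T$; other unfolding conventions reorder the Kronecker factors.

```python
import numpy as np

def tucker_product(Z, A, B, C):
    """theta_{i,j,k} = sum_{r,s,t} z_{r,s,t} a_{i,r} b_{j,s} c_{k,t}."""
    return np.einsum("rst,ir,js,kt->ijk", Z, A, B, C)

rng = np.random.default_rng(3)
R, S, T = 2, 3, 2          # hypothetical core dimensions
m1, m2, m3 = 5, 4, 3       # hypothetical array dimensions

Z = rng.standard_normal((R, S, T))
A = rng.standard_normal((m1, R))
B = rng.standard_normal((m2, S))
C = rng.standard_normal((m3, T))

Theta = tucker_product(Z, A, B, C)

# Mode-1 unfolding identity under row-major reshaping.
Theta1 = Theta.reshape(m1, m2 * m3)
Theta1_check = A @ Z.reshape(R, S * T) @ np.kron(B, C).T

# The k-rank is the matrix rank of the mode-k unfolding.
k_ranks = tuple(
    np.linalg.matrix_rank(np.moveaxis(Theta, k, 0).reshape(Theta.shape[k], -1))
    for k in range(3)
)
```

With generic random inputs, the three unfolding ranks come out as (R, S, T), which is what distinguishes the Tucker form from the single-rank decomposition used earlier.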

Separable covariance via Tucker products
Multivariate normal model:
$$z = \{z_j : j = 1, \ldots, m\} \sim \text{iid normal}(0, 1)$$
$$y = \mu + Az \sim \text{multivariate normal}(\mu, \Sigma = AA^T)$$
Matrix normal model:
$$Z = \{z_{i,j}\}_{i=1,j=1}^{m_1,m_2} \sim \text{iid normal}(0, 1)$$
$$Y = M + AZB^T \sim \text{matrix normal}(M, \Sigma_1 = AA^T, \Sigma_2 = BB^T)$$
NOTE: $AZB^T = Z \times \{A, B\}$
Array normal model:
$$Z = \{z_{i,j,k}\}_{i=1,j=1,k=1}^{m_1,m_2,m_3} \sim \text{iid normal}(0, 1)$$
$$Y = M + Z \times \{A, B, C\} \sim \text{array normal}(M, \Sigma_1 = AA^T, \Sigma_2 = BB^T, \Sigma_3 = CC^T)$$
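The constructions above translate directly into a way of drawing samples. The sketch below draws one matrix normal and one array normal variate and checks the identity $AZB^T = Z \times \{A, B\}$; the sizes, the zero mean arrays and the square factors A, B, C are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m1, m2, m3 = 3, 4, 2  # hypothetical dimensions

# Square factors, so that Sigma_1 = A A^T, Sigma_2 = B B^T, Sigma_3 = C C^T.
A = rng.standard_normal((m1, m1))
B = rng.standard_normal((m2, m2))
C = rng.standard_normal((m3, m3))

# Matrix normal draw: Y = M + A Z B^T with Z iid normal(0, 1).
M2 = np.zeros((m1, m2))
Z2 = rng.standard_normal((m1, m2))
Y2 = M2 + A @ Z2 @ B.T
# The same draw written as the two-way Tucker product Z x {A, B}.
Y2_tucker = M2 + np.einsum("rs,ir,js->ij", Z2, A, B)

# Array normal draw: Y = M + Z x {A, B, C} with Z iid normal(0, 1).
M3 = np.zeros((m1, m2, m3))
Z3 = rng.standard_normal((m1, m2, m3))
Y3 = M3 + np.einsum("rst,ir,js,kt->ijk", Z3, A, B, C)

# Each slice Y3[:, :, k] is A (sum_t C[k, t] Z3[:, :, t]) B^T.
slice0 = A @ np.tensordot(Z3, C[0], axes=([2], [0])) @ B.T
```

The array case reduces to the matrix case slice by slice, with the third factor mixing the slices of Z.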

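Separable covariance structure
For the matrix normal model:
$$\text{Cov}[Y] = \Sigma_1 \circ \Sigma_2, \qquad \text{Cov}[\text{vec}(Y)] = \Sigma_2 \otimes \Sigma_1$$
$$E[YY^T] = \Sigma_1 \, \text{tr}(\Sigma_2), \qquad E[Y^T Y] = \Sigma_2 \, \text{tr}(\Sigma_1)$$
For the array normal model:
$$\text{Cov}[Y] = \Sigma_1 \circ \Sigma_2 \circ \Sigma_3, \qquad \text{Cov}[\text{vec}(Y)] = \Sigma_K \otimes \cdots \otimes \Sigma_1$$
$$E[Y_{(k)} Y_{(k)}^T] = \Sigma_k \prod_{j \ne k} \text{tr}(\Sigma_j)$$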
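These identities can be verified exactly, without simulation, by writing the Tucker product as a linear map on vec(Z). The sketch below does this for a three-way array with mean zero, taking vec to stack entries with the first index varying fastest (column-major order); the sizes and factor matrices are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
m1, m2, m3 = 3, 4, 2  # hypothetical dimensions

A = rng.standard_normal((m1, m1))
B = rng.standard_normal((m2, m2))
C = rng.standard_normal((m3, m3))
S1, S2, S3 = A @ A.T, B @ B.T, C @ C.T

# With column-major vec, vec(Z x {A, B, C}) = (C kron B kron A) vec(Z).
L = np.kron(C, np.kron(B, A))
Z = rng.standard_normal((m1, m2, m3))
Y = np.einsum("rst,ir,js,kt->ijk", Z, A, B, C)
vec_Y = Y.reshape(-1, order="F")
vec_Y_check = L @ Z.reshape(-1, order="F")

# Cov[vec(Z)] = I, so Cov[vec(Y)] = L L^T, which should equal S3 kron S2 kron S1.
cov_vec = L @ L.T
cov_kron = np.kron(S3, np.kron(S2, S1))

# E[Y_(1) Y_(1)^T]_{i,l} = sum_{j,k} Cov(y_{i,j,k}, y_{l,j,k}).
cov6 = cov_vec.reshape(m3, m2, m1, m3, m2, m1)
E_Y1Y1t = np.einsum("kjikjl->il", cov6)
E_Y1Y1t_formula = S1 * np.trace(S2) * np.trace(S3)
```

The reversed order of the Kronecker factors is a consequence of the column-major vec convention; a row-major vec would give $\Sigma_1 \otimes \Sigma_2 \otimes \Sigma_3$ instead.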

International trade example
Yearly change in log exports (2000 dollars): $Y = \{y_{i,j,k,l}\} \in \mathbb{R}^{30 \times 30 \times 6 \times 10}$
- $i \in \{1, \ldots, 30\}$ indexes exporting nation
- $j \in \{1, \ldots, 30\}$ indexes importing nation
- $k \in \{1, \ldots, 6\}$ indexes commodity
- $l \in \{1, \ldots, 10\}$ indexes year
Main effects ANOVA-type model:
$$y_{i,j,k,l} = \mu + a_i + b_j + c_k + d_l + e_{i,j,k,l}$$
Let $E = \{e_{i,j,k,l}\}$.
- ANOVA error model: $E \sim$ array normal$(0, I, I, I, \sigma^2 I)$
- MANOVA error model: $E \sim$ array normal$(0, I, I, \Sigma_3, I)$
- array normal model: $E \sim$ array normal$(0, \Sigma_1, \ldots, \Sigma_4)$
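For a complete, balanced array, the least-squares fit of the main effects model reduces to marginal means. The sketch below illustrates this on a small simulated four-way array standing in for the trade data; the sizes, the effect values and the noise level are hypothetical, and the effects are identified by the usual sum-to-zero constraints, which is an assumption not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(6)
shape = (5, 5, 3, 4)  # small stand-in for exporter x importer x commodity x year

def centered(m):
    """Random effects constrained to sum to zero."""
    e = rng.standard_normal(m)
    return e - e.mean()

mu = 0.5
a, b, c, d = (centered(m) for m in shape)
Y = (
    mu
    + a[:, None, None, None]
    + b[None, :, None, None]
    + c[None, None, :, None]
    + d[None, None, None, :]
    + 0.1 * rng.standard_normal(shape)
)

# Least-squares estimates under sum-to-zero constraints: marginal means minus the grand mean.
mu_hat = Y.mean()
a_hat = Y.mean(axis=(1, 2, 3)) - mu_hat
b_hat = Y.mean(axis=(0, 2, 3)) - mu_hat
c_hat = Y.mean(axis=(0, 1, 3)) - mu_hat
d_hat = Y.mean(axis=(0, 1, 2)) - mu_hat

fitted = (
    mu_hat
    + a_hat[:, None, None, None]
    + b_hat[None, :, None, None]
    + c_hat[None, None, :, None]
    + d_hat[None, None, None, :]
)
# The residual array plays the role of E, to which the error models are then applied.
resid = Y - fitted
```

The three error models differ only in what covariance structure they allow along each mode of this residual array.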

International trade example
[Figure: two-dimensional plots of the 30 nations (two panels) and of the six commodity types: Machinery and transport equipment, Finished goods, Unfinished goods, Chemicals, Food and animals, Crude materials.]

Discussion
- Scientific studies increasingly involve data with multiway array structure
  - Often this structure is unrecognized
- Array structure may be present in the data or model
  - Multiway data, latent data, or parameters
- Array decompositions can be incorporated into statistical models
  - This can broaden the applicability of multiway methods