Supplemental Material for TKDE-05-05-035
Kijung Shin, Lee Sael, and U Kang

1 PROPOSED METHODS

1.1 Proofs of Update Rules

In this section, we present the proofs of the update rules in Section 3.5 of the main paper. Specifically, we prove the CDTF update rule for L1 regularization (Theorem 7), the CDTF update rule for the nonnegativity constraint (Theorem 8), the SALS update rule for coupled tensor factorization (Theorem 9), and the update rule for the bias terms, which is used by both CDTF and SALS in the bias model (Theorem 10).

Lemma (Partial Derivative in CDTF). For a parameter $a^{(n)}_{i_n k}$, let
$$\hat{r}_{i_1 \cdots i_N} = x_{i_1 \cdots i_N} - \sum_{s \neq k} \prod_{l=1}^{N} a^{(l)}_{i_l s}, \quad g = \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} \hat{r}_{i_1 \cdots i_N} \prod_{l \neq n} a^{(l)}_{i_l k}, \quad d = \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} \prod_{l \neq n} \big(a^{(l)}_{i_l k}\big)^2,$$
as in the main paper. Then,
$$\frac{\partial}{\partial a^{(n)}_{i_n k}} \sum_{(i_1,\dots,i_N) \in \Omega} \Big(x_{i_1 \cdots i_N} - \sum_{s=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l s}\Big)^2 = 2\big(d\, a^{(n)}_{i_n k} - g\big).$$

Proof. Among the observed entries of $X$, only those whose indices belong to $\Omega^{(n)}_{i_n}$ depend on $a^{(n)}_{i_n k}$. Thus,
$$\frac{\partial}{\partial a^{(n)}_{i_n k}} \sum_{(i_1,\dots,i_N) \in \Omega} \Big(x_{i_1 \cdots i_N} - \sum_{s=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l s}\Big)^2 = -2 \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} \Big(x_{i_1 \cdots i_N} - \sum_{s=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l s}\Big) \prod_{l \neq n} a^{(l)}_{i_l k}$$
$$= -2 \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} \Big(\hat{r}_{i_1 \cdots i_N} - a^{(n)}_{i_n k} \prod_{l \neq n} a^{(l)}_{i_l k}\Big) \prod_{l \neq n} a^{(l)}_{i_l k} = -2\big(g - d\, a^{(n)}_{i_n k}\big) = 2\big(d\, a^{(n)}_{i_n k} - g\big).$$

Theorem 7 (Correctness of CDTF with L1 Regularization). The update rule in the main paper minimizes the loss function with respect to the updated parameter. That is, for an updated parameter $a^{(n)}_{i_n k}$, with $\hat{r}_{i_1 \cdots i_N}$, $g$, and $d$ defined as in the Lemma above,
$$a^{(n)}_{i_n k} \leftarrow \arg\min_{a^{(n)}_{i_n k}} L_{Lasso}(A^{(1)},\dots,A^{(N)}) = \begin{cases} (g-\lambda)/d & \text{if } g > \lambda \\ (g+\lambda)/d & \text{if } g < -\lambda \\ 0 & \text{otherwise,} \end{cases}$$
where
$$L_{Lasso}(A^{(1)},\dots,A^{(N)}) = \sum_{(i_1,\dots,i_N) \in \Omega} \Big(x_{i_1 \cdots i_N} - \sum_{s=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l s}\Big)^2 + \lambda \sum_{l=1}^{N} \|A^{(l)}\|_1.$$

Proof. By the Lemma above, for $a^{(n)}_{i_n k} \neq 0$,
$$\frac{\partial L_{Lasso}}{\partial a^{(n)}_{i_n k}} = \begin{cases} 2\big(d\, a^{(n)}_{i_n k} - g + \lambda\big) & \text{if } a^{(n)}_{i_n k} > 0 \\ 2\big(d\, a^{(n)}_{i_n k} - g - \lambda\big) & \text{if } a^{(n)}_{i_n k} < 0. \end{cases}$$

Case 1: If $g > \lambda > 0$, then $a^{(n)}_{i_n k}$ should be positive for $\partial L_{Lasso}/\partial a^{(n)}_{i_n k}$ to be zero, and $a^{(n)}_{i_n k} = (g-\lambda)/d > 0$ makes $\partial L_{Lasso}/\partial a^{(n)}_{i_n k}$ zero. Since $\partial^2 L_{Lasso}/\partial (a^{(n)}_{i_n k})^2 = 2d \geq 0$, $a^{(n)}_{i_n k} = (g-\lambda)/d$ minimizes $L_{Lasso}(A^{(1)},\dots,A^{(N)})$ with respect to $a^{(n)}_{i_n k}$.

Case 2: Likewise, if $g < -\lambda < 0$, then $a^{(n)}_{i_n k}$ should be negative for $\partial L_{Lasso}/\partial a^{(n)}_{i_n k}$ to be zero, and $a^{(n)}_{i_n k} = (g+\lambda)/d < 0$ makes $\partial L_{Lasso}/\partial a^{(n)}_{i_n k}$ zero. Since $\partial^2 L_{Lasso}/\partial (a^{(n)}_{i_n k})^2 = 2d \geq 0$, $a^{(n)}_{i_n k} = (g+\lambda)/d$ minimizes $L_{Lasso}(A^{(1)},\dots,A^{(N)})$ with respect to $a^{(n)}_{i_n k}$.

Case 3: On the other hand, if $-\lambda \leq g \leq \lambda$,
$$\frac{\partial L_{Lasso}}{\partial a^{(n)}_{i_n k}} = \begin{cases} 2\big(d\, a^{(n)}_{i_n k} - g + \lambda\big) \geq 2 d\, a^{(n)}_{i_n k} \geq 0 & \text{if } a^{(n)}_{i_n k} > 0 \\ 2\big(d\, a^{(n)}_{i_n k} - g - \lambda\big) \leq 2 d\, a^{(n)}_{i_n k} \leq 0 & \text{if } a^{(n)}_{i_n k} < 0. \end{cases}$$
That is, given the other parameters, $L_{Lasso}(A^{(1)},\dots,A^{(N)})$ does not increase as $a^{(n)}_{i_n k}$ increases toward zero from below and does not decrease as $a^{(n)}_{i_n k}$ increases beyond zero. Thus, $L_{Lasso}(A^{(1)},\dots,A^{(N)})$, which is a continuous function, is minimized with respect to $a^{(n)}_{i_n k}$ at $a^{(n)}_{i_n k} = 0$.

Combining the three cases gives
$$\arg\min_{a^{(n)}_{i_n k}} L_{Lasso}(A^{(1)},\dots,A^{(N)}) = \begin{cases} (g-\lambda)/d & \text{if } g > \lambda \\ (g+\lambda)/d & \text{if } g < -\lambda \\ 0 & \text{otherwise.} \end{cases}$$
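As a concrete illustration of this update rule (a minimal sketch, not the implementation used for the experiments in the paper), the following Python code computes $g$ and $d$ from the observed entries in $\Omega^{(n)}_{i_n}$ and applies the three cases of Theorem 7. The function name update_parameter_l1, the coordinate-list data layout, and the use of NumPy are assumptions made only for this sketch.

    import numpy as np

    def update_parameter_l1(entries, factors, n, k, lam):
        """One CDTF coordinate update with L1 regularization (Theorem 7).

        entries : iterable of ((i_1, ..., i_N), x) pairs belonging to Omega^{(n)}_{i_n},
                  i.e., the observed entries whose n-th index equals i_n.
        factors : list of factor matrices A^{(1)}, ..., A^{(N)} as NumPy arrays.
        lam     : regularization parameter lambda.
        Returns the new value of a^{(n)}_{i_n k}.
        """
        K = factors[0].shape[1]
        g, d = 0.0, 0.0
        for idx, x in entries:
            # r_hat = x - sum_{s != k} prod_l a^{(l)}_{i_l s} (residual without the k-th component)
            r_hat = x - sum(np.prod([factors[l][idx[l], s] for l in range(len(factors))])
                            for s in range(K) if s != k)
            # prod_{l != n} a^{(l)}_{i_l k}
            others = np.prod([factors[l][idx[l], k] for l in range(len(factors)) if l != n])
            g += r_hat * others
            d += others ** 2
        # The three cases of Theorem 7 (soft-thresholding).
        if g > lam:
            return (g - lam) / d
        if g < -lam:
            return (g + lam) / d
        return 0.0

Sweeping this single-parameter update over all rows and columns, in the order used by CDTF, yields the L1-regularized variant of the algorithm.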
Theorem 8 (Correctness of CDTF with the Nonnegativity Constraint). The update rule (14) in the main paper minimizes the loss function with respect to the updated parameter under the nonnegativity constraint. That is, for an updated parameter $a^{(n)}_{i_n k}$, with $\hat{r}_{i_1 \cdots i_N}$, $g$, and $d$ defined as in the Lemma above,
$$a^{(n)}_{i_n k} \leftarrow \arg\min_{a^{(n)}_{i_n k} \geq 0} L(A^{(1)},\dots,A^{(N)}) = \max\Big(\frac{g}{\lambda + d},\, 0\Big),$$
where
$$L(A^{(1)},\dots,A^{(N)}) = \sum_{(i_1,\dots,i_N) \in \Omega} \Big(x_{i_1 \cdots i_N} - \sum_{s=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l s}\Big)^2 + \lambda \sum_{l=1}^{N} \|A^{(l)}\|_F^2.$$

Proof. By the Lemma above,
$$\frac{\partial L}{\partial a^{(n)}_{i_n k}} = 2\big(d\, a^{(n)}_{i_n k} - g\big) + 2\lambda a^{(n)}_{i_n k} = 2\big((\lambda+d)\, a^{(n)}_{i_n k} - g\big) \;\begin{cases} > 0 & \text{if } a^{(n)}_{i_n k} > g/(\lambda+d) \\ = 0 & \text{if } a^{(n)}_{i_n k} = g/(\lambda+d) \\ < 0 & \text{otherwise.} \end{cases}$$

Case 1: If $g/(\lambda+d) \geq 0$, since $\partial^2 L/\partial (a^{(n)}_{i_n k})^2 = 2(\lambda+d) \geq 0$, $L(A^{(1)},\dots,A^{(N)})$ is minimized with respect to $a^{(n)}_{i_n k}$ at $a^{(n)}_{i_n k} = g/(\lambda+d)$, which satisfies the nonnegativity constraint.

Case 2: On the other hand, if $g/(\lambda+d) < 0$, then under the constraint $a^{(n)}_{i_n k} \geq 0$, $L(A^{(1)},\dots,A^{(N)})$ is minimized with respect to $a^{(n)}_{i_n k}$ at $a^{(n)}_{i_n k} = 0$. This is because $L(A^{(1)},\dots,A^{(N)})$, which is a continuous function, monotonically increases on the feasible region, i.e., $\partial L/\partial a^{(n)}_{i_n k} > 0$ for every $a^{(n)}_{i_n k} \geq 0 > g/(\lambda+d)$.

Combining the two cases gives
$$\arg\min_{a^{(n)}_{i_n k} \geq 0} L(A^{(1)},\dots,A^{(N)}) = \max\Big(\frac{g}{\lambda + d},\, 0\Big).$$

Theorem 9 (Correctness of SALS for Coupled Tensor Factorization). The update rule (16) in the main paper minimizes (15) with respect to the updated parameters. Let ${}_xR$ and ${}_yR$ be the residual tensors for $X$ and $Y$, respectively. For the $C$ updated parameters $a_{i_1 k_1}, \dots, a_{i_1 k_C}$ of the shared factor matrix $A^{(1)} = {}_xA^{(1)} = {}_yA^{(1)}$, let
$${}_x\hat{r}_{i_1 \cdots i_{N_x}} = x_{i_1 \cdots i_{N_x}} - \sum_{k=1}^{K} \prod_{n=1}^{N_x} {}_xa^{(n)}_{i_n k} + \sum_{c=1}^{C} \prod_{n=1}^{N_x} {}_xa^{(n)}_{i_n k_c}, \qquad {}_y\hat{r}_{i_1 \cdots i_{N_y}} = y_{i_1 \cdots i_{N_y}} - \sum_{k=1}^{K} \prod_{n=1}^{N_y} {}_ya^{(n)}_{i_n k} + \sum_{c=1}^{C} \prod_{n=1}^{N_y} {}_ya^{(n)}_{i_n k_c}.$$
Likewise, let ${}_x\Omega$ and ${}_y\Omega$ be the sets of indices of the observable entries of $X$ and $Y$, respectively, and let ${}_x\Omega^{(1)}_{i_1}$ and ${}_y\Omega^{(1)}_{i_1}$ be their subsets whose first index is $i_1$. Then,
$$[a_{i_1 k_1},\dots,a_{i_1 k_C}]^{T} \leftarrow \arg\min_{[a_{i_1 k_1},\dots,a_{i_1 k_C}]^{T}} L_{Coupled}({}_xA^{(1)},\dots,{}_xA^{(N_x)},{}_yA^{(1)},\dots,{}_yA^{(N_y)}) = \big({}_xB_{i_1} + {}_yB_{i_1} + \lambda I_C\big)^{-1}\big({}_x\mathbf{c}_{i_1} + {}_y\mathbf{c}_{i_1}\big),$$
where ${}_xB_{i_1}$ and ${}_yB_{i_1}$ are $C$-by-$C$ matrices whose entries are
$$[{}_xB_{i_1}]_{c_1 c_2} = \sum_{(i_1,\dots,i_{N_x}) \in {}_x\Omega^{(1)}_{i_1}} \Big(\prod_{n \neq 1} {}_xa^{(n)}_{i_n k_{c_1}}\Big)\Big(\prod_{n \neq 1} {}_xa^{(n)}_{i_n k_{c_2}}\Big), \qquad [{}_yB_{i_1}]_{c_1 c_2} = \sum_{(i_1,\dots,i_{N_y}) \in {}_y\Omega^{(1)}_{i_1}} \Big(\prod_{n \neq 1} {}_ya^{(n)}_{i_n k_{c_1}}\Big)\Big(\prod_{n \neq 1} {}_ya^{(n)}_{i_n k_{c_2}}\Big),$$
${}_x\mathbf{c}_{i_1}$ and ${}_y\mathbf{c}_{i_1}$ are length-$C$ vectors whose entries are
$$[{}_x\mathbf{c}_{i_1}]_{c} = \sum_{(i_1,\dots,i_{N_x}) \in {}_x\Omega^{(1)}_{i_1}} {}_x\hat{r}_{i_1 \cdots i_{N_x}} \prod_{n \neq 1} {}_xa^{(n)}_{i_n k_{c}}, \qquad [{}_y\mathbf{c}_{i_1}]_{c} = \sum_{(i_1,\dots,i_{N_y}) \in {}_y\Omega^{(1)}_{i_1}} {}_y\hat{r}_{i_1 \cdots i_{N_y}} \prod_{n \neq 1} {}_ya^{(n)}_{i_n k_{c}},$$
and $I_C$ is the $C$-by-$C$ identity matrix.

Proof. Among the terms of (15), only the squared errors of the entries in ${}_x\Omega^{(1)}_{i_1}$ and ${}_y\Omega^{(1)}_{i_1}$ and the regularization terms of $a_{i_1 k_1}, \dots, a_{i_1 k_C}$ depend on the updated parameters. For each $c$,
$$\frac{\partial L_{Coupled}}{\partial a_{i_1 k_c}} = -2 \sum_{(i_1,\dots,i_{N_x}) \in {}_x\Omega^{(1)}_{i_1}} \Big({}_x\hat{r}_{i_1 \cdots i_{N_x}} - \sum_{s=1}^{C} a_{i_1 k_s} \prod_{n \neq 1} {}_xa^{(n)}_{i_n k_s}\Big) \prod_{n \neq 1} {}_xa^{(n)}_{i_n k_c} - 2 \sum_{(i_1,\dots,i_{N_y}) \in {}_y\Omega^{(1)}_{i_1}} \Big({}_y\hat{r}_{i_1 \cdots i_{N_y}} - \sum_{s=1}^{C} a_{i_1 k_s} \prod_{n \neq 1} {}_ya^{(n)}_{i_n k_s}\Big) \prod_{n \neq 1} {}_ya^{(n)}_{i_n k_c} + 2\lambda a_{i_1 k_c}.$$
Since (15) is a quadratic function of the updated parameters whose Hessian $2({}_xB_{i_1} + {}_yB_{i_1} + \lambda I_C)$ is positive definite (for $\lambda > 0$), it is minimized at the point where all these partial derivatives are zero. Setting $\partial L_{Coupled}/\partial a_{i_1 k_c} = 0$ for all $c$ gives
$$\sum_{s=1}^{C} \big([{}_xB_{i_1}]_{c s} + [{}_yB_{i_1}]_{c s}\big)\, a_{i_1 k_s} + \lambda a_{i_1 k_c} = [{}_x\mathbf{c}_{i_1}]_{c} + [{}_y\mathbf{c}_{i_1}]_{c}, \qquad \forall c \in \{1,\dots,C\},$$
that is,
$$\big({}_xB_{i_1} + {}_yB_{i_1} + \lambda I_C\big)\,[a_{i_1 k_1},\dots,a_{i_1 k_C}]^{T} = {}_x\mathbf{c}_{i_1} + {}_y\mathbf{c}_{i_1}.$$
Therefore,
$$\arg\min_{[a_{i_1 k_1},\dots,a_{i_1 k_C}]^{T}} L_{Coupled}({}_xA^{(1)},\dots,{}_xA^{(N_x)},{}_yA^{(1)},\dots,{}_yA^{(N_y)}) = \big({}_xB_{i_1} + {}_yB_{i_1} + \lambda I_C\big)^{-1}\big({}_x\mathbf{c}_{i_1} + {}_y\mathbf{c}_{i_1}\big).$$
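For illustration (again, a minimal sketch rather than the Hadoop implementation used in the paper), the following Python code performs the row update of Theorem 9 for the shared factor matrix: it accumulates ${}_xB_{i_1} + {}_yB_{i_1} + \lambda I_C$ and ${}_x\mathbf{c}_{i_1} + {}_y\mathbf{c}_{i_1}$ from the observed entries of both tensors and solves the resulting $C \times C$ system. The function name, the assumption that the residuals $\hat{r}$ are supplied precomputed, and the use of NumPy are choices made only for this sketch.

    import numpy as np

    def coupled_row_update(x_entries, y_entries, x_factors, y_factors, ks, lam):
        """SALS row update for the shared factor matrix (Theorem 9).

        x_entries, y_entries : iterables of ((i_1, ..., i_N), r_hat) pairs, restricted to the
            entries whose first index equals the row being updated, where r_hat is the residual
            of the entry with the C chosen components added back.
        x_factors, y_factors : lists of factor matrices of X and Y (NumPy arrays);
            element 0 is the shared matrix A^(1) in both lists.
        ks  : the C column indices k_1, ..., k_C being updated.
        lam : regularization parameter lambda.
        Returns the new values [a_{i_1 k_1}, ..., a_{i_1 k_C}].
        """
        C = len(ks)
        B = lam * np.eye(C)          # accumulates xB + yB + lambda * I_C
        c = np.zeros(C)              # accumulates xc + yc
        for entries, factors in ((x_entries, x_factors), (y_entries, y_factors)):
            for idx, r_hat in entries:
                # v[c] = prod_{n != 1} a^{(n)}_{i_n k_c} for the current entry
                v = np.array([np.prod([factors[n][idx[n], k] for n in range(1, len(factors))])
                              for k in ks])
                B += np.outer(v, v)
                c += r_hat * v
        return np.linalg.solve(B, c)  # (xB + yB + lambda*I_C)^{-1} (xc + yc)

Because the $\lambda I_C$ term makes the system matrix positive definite, the linear system can be solved directly without forming an explicit inverse.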
Theorem 10 (Correctness of the Update Rule for the Bias Terms). The update rule in the main paper minimizes (17) with respect to the updated parameter. That is, for an updated parameter $b^{(n)}_{i_n}$, let
$$r_{i_1 \cdots i_N} = x_{i_1 \cdots i_N} - \sum_{k=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l k} - \sum_{l \neq n} b^{(l)}_{i_l} - \mu,$$
as in the main paper. Then,
$$b^{(n)}_{i_n} \leftarrow \arg\min_{b^{(n)}_{i_n}} L_{Bias}(A^{(1)},\dots,A^{(N)},\mathbf{b}^{(1)},\dots,\mathbf{b}^{(N)}) = \frac{\sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} r_{i_1 \cdots i_N}}{\lambda_b + |\Omega^{(n)}_{i_n}|},$$
where
$$L_{Bias}(A^{(1)},\dots,A^{(N)},\mathbf{b}^{(1)},\dots,\mathbf{b}^{(N)}) = \sum_{(i_1,\dots,i_N) \in \Omega} \Big(x_{i_1 \cdots i_N} - \mu - \sum_{l=1}^{N} b^{(l)}_{i_l} - \sum_{k=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l k}\Big)^2 + \lambda_A \sum_{l=1}^{N} \|A^{(l)}\|_F^2 + \lambda_b \sum_{l=1}^{N} \|\mathbf{b}^{(l)}\|_2^2.$$

Proof. Among the terms of $L_{Bias}$, only the squared errors of the entries in $\Omega^{(n)}_{i_n}$ and the regularization term of $b^{(n)}_{i_n}$ depend on $b^{(n)}_{i_n}$. Thus,
$$\frac{\partial L_{Bias}}{\partial b^{(n)}_{i_n}} = -2 \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} \Big(x_{i_1 \cdots i_N} - \mu - \sum_{l=1}^{N} b^{(l)}_{i_l} - \sum_{k=1}^{K} \prod_{l=1}^{N} a^{(l)}_{i_l k}\Big) + 2\lambda_b b^{(n)}_{i_n} = -2 \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} \big(r_{i_1 \cdots i_N} - b^{(n)}_{i_n}\big) + 2\lambda_b b^{(n)}_{i_n} = 2\big(\lambda_b + |\Omega^{(n)}_{i_n}|\big) b^{(n)}_{i_n} - 2 \sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} r_{i_1 \cdots i_N}.$$
Since $\partial^2 L_{Bias}/\partial (b^{(n)}_{i_n})^2 = 2(\lambda_b + |\Omega^{(n)}_{i_n}|) \geq 0$, $L_{Bias}(A^{(1)},\dots,A^{(N)},\mathbf{b}^{(1)},\dots,\mathbf{b}^{(N)})$ is minimized with respect to $b^{(n)}_{i_n}$ at the value that makes this partial derivative zero. Therefore,
$$\arg\min_{b^{(n)}_{i_n}} L_{Bias}(A^{(1)},\dots,A^{(N)},\mathbf{b}^{(1)},\dots,\mathbf{b}^{(N)}) = \frac{\sum_{(i_1,\dots,i_N) \in \Omega^{(n)}_{i_n}} r_{i_1 \cdots i_N}}{\lambda_b + |\Omega^{(n)}_{i_n}|}.$$
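As a small illustration of Theorem 10 (a sketch only, not part of the implementation used in the paper), the bias update reduces to averaging residuals with a regularized denominator. The function name and the assumption that the residuals are supplied directly are made only for this sketch.

    def update_bias(residuals, lam_b):
        """Bias-term update of Theorem 10.

        residuals : iterable of r_{i_1...i_N} over Omega^{(n)}_{i_n}, i.e., the observed
            entries whose n-th index equals i_n, with mu, the other bias terms, and the
            factor contributions already subtracted out.
        lam_b     : regularization parameter lambda_b.
        Returns the new value of b^{(n)}_{i_n}.
        """
        residuals = list(residuals)
        if not residuals:
            return 0.0
        return sum(residuals) / (lam_b + len(residuals))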
1.2 Pseudocodes

We present the pseudocodes of the SALS variants described in Section 3.5 of the main paper.

SALS for Coupled Tensor Factorization. Algorithm 6 describes SALS for coupled tensor factorization, where the two tensors, denoted by $X$ and $Y$, share their first mode without loss of generality. We denote the residual tensors for $X$ and $Y$ by ${}_xR$ and ${}_yR$, respectively. The lengths of the $n$-th modes of $X$ and $Y$ are denoted by ${}_xI_n$ and ${}_yI_n$, respectively.

Algorithm 6: SALS for Coupled Tensor Factorization
  Input: $X$, $Y$, $K$, $\lambda$
  Output: $A^{(1)}$, ${}_xA^{(n)}$ for $2 \leq n \leq N_x$, ${}_yA^{(n)}$ for $2 \leq n \leq N_y$
  initialize ${}_xR$, ${}_yR$, $A^{(1)}$, and ${}_xA^{(n)}$ and ${}_yA^{(n)}$ for all $n$
  for outer iter $= 1, \dots, T_{out}$ do
    for split iter $= 1, \dots, K/C$ do
      choose $k_1, \dots, k_C$ among the columns not updated yet
      compute ${}_x\hat{R}$ and ${}_y\hat{R}$
      for inner iter $= 1, \dots, T_{in}$ do
        for $i_1 = 1, \dots, {}_xI_1 (= {}_yI_1)$ do
          update $a_{i_1 k_1}, \dots, a_{i_1 k_C}$ using (16)
        for $n = 2, \dots, N_x$ do
          for $i_n = 1, \dots, {}_xI_n$ do
            update ${}_xa^{(n)}_{i_n k_1}, \dots, {}_xa^{(n)}_{i_n k_C}$
        for $n = 2, \dots, N_y$ do
          for $i_n = 1, \dots, {}_yI_n$ do
            update ${}_ya^{(n)}_{i_n k_1}, \dots, {}_ya^{(n)}_{i_n k_C}$
      update ${}_xR$ and ${}_yR$

SALS for Bias Model. SALS for the bias model is described in Algorithm 7, where each $(i_1, \dots, i_N)$-th entry of $R$ is
$$r_{i_1 \cdots i_N} = x_{i_1 \cdots i_N} - \mu - \sum_{n=1}^{N} b^{(n)}_{i_n} - \sum_{k=1}^{K} \prod_{n=1}^{N} a^{(n)}_{i_n k},$$
as explained in Section 3.5.5 of the main paper.

Algorithm 7: SALS for Bias Model
  Input: $X$, $K$, $\lambda_A$, $\lambda_b$
  Output: $A^{(n)}$ for all $n$, $\mathbf{b}^{(n)}$ for all $n$, $\mu$
  compute $\mu$, the mean of the observable entries of $X$
  initialize $R$, $A^{(n)}$ for all $n$, and $\mathbf{b}^{(n)}$ for all $n$
  for outer iter $= 1, \dots, T_{out}$ do
    for split iter $= 1, \dots, K/C$ do
      choose $k_1, \dots, k_C$ among the columns not updated yet
      compute $\hat{R}$
      for inner iter $= 1, \dots, T_{in}$ do
        for $n = 1, \dots, N$ do
          for $i_n = 1, \dots, I_n$ do
            update $a^{(n)}_{i_n k_1}, \dots, a^{(n)}_{i_n k_C}$
      update $R$
    for $n = 1, \dots, N$ do
      for $i_n = 1, \dots, I_n$ do
        update $b^{(n)}_{i_n}$
      update $R$

2 OPTIMIZATION ON MAPREDUCE

In this section, we present the details of the optimization techniques described in Section 4 of the main paper.

2.1 Local Disk Caching

As explained in Section 4 of the main paper, in our MAPREDUCE implementation of CDTF and SALS with local disk caching, the entries of $X$ are distributed across machines and cached in their local disks during the map and reduce stages. Algorithm 8 gives the details of the map and reduce stages. The rest of CDTF and SALS runs in the close stage (the cleanup stage in Hadoop) using the cached data.

Algorithm 8: Data distribution in CDTF and SALS with local disk caching
  Input: $X$, ${}_mS_n$ for all $m$ and $n$
  Output: the ${}_m\Omega^{(n)}$ entries of $R$ ($= X$) for all $m$ and $n$
  Map(key $k$, value $v$):
    $((i_1, \dots, i_N), x_{i_1 \cdots i_N}) \leftarrow v$
    for $n = 1, \dots, N$ do
      find the machine $m$ such that $i_n \in {}_mS_n$
      emit $\langle (m, n), ((i_1, \dots, i_N), x_{i_1 \cdots i_N}) \rangle$
  Partitioner(key $k$, value $v$):
    $(m, n) \leftarrow k$
    assign $\langle k, v \rangle$ to machine $m$
  Reduce(key $k$, values $v[1..|v|]$):
    $(m, n) \leftarrow k$
    create a file on the local disk to cache the ${}_m\Omega^{(n)}$ entries of $R$
    foreach $((i_1, \dots, i_N), x_{i_1 \cdots i_N}) \in v$ do
      write $((i_1, \dots, i_N), x_{i_1 \cdots i_N})$ to the file

2.2 Direct Communication

In the main paper, we introduce direct communication between reducers using the distributed file system to overcome the rigidity of the MAPREDUCE model. Algorithm 9 describes the implementation of the broadcast of updated parameters in CDTF (Algorithm 3 of the main paper) based on this communication method.

Algorithm 9: Parameter broadcast in CDTF
  Input: the parameters that machine $m$ broadcasts
  Output: the parameters received from the other machines
  create a data file ${}_mA$ on the distributed file system (DFS)
  write the parameters of machine $m$ to ${}_mA$
  create a dummy file ${}_mD$ on the DFS (signaling that ${}_mA$ has been completely written)
  while not all data files have been read do
    get the list of dummy files from the DFS
    foreach ${}_{m'}D$ in the list do
      if the parameters in ${}_{m'}A$ have not been read yet then
        read the parameters from ${}_{m'}A$

2.3 Greedy Row Assignment

Our MAPREDUCE implementations of CDTF and SALS use the greedy row assignment, which is explained in Section 3.4.3 of the main paper. In this section, we explain our MAPREDUCE implementation of the greedy row assignment, assuming that $X$ is stored on the distributed file system. At the first stage, $|\Omega^{(n)}_{i_n}|$ is computed for all $n$ and $i_n$. Specifically, the mappers output $\langle (n, i_n), 1 \rangle$ for all $n$ for each entry $x_{i_1 \cdots i_N}$, and the reducers output $\langle (n, i_n), |\Omega^{(n)}_{i_n}| \rangle$ for all $n$ and $i_n$ by counting the number of values for each key. At the second stage, the outputs are aggregated by a single reducer, which runs the rest of Algorithm 5 of the main paper.
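The first (counting) stage described above is essentially a word-count-style MapReduce job. The following Python sketch mimics its mapper and reducer logic with a small in-memory driver; the function names, the toy tensor, and the driver loop are illustrative assumptions and do not correspond to the Hadoop code used in the paper.

    from collections import defaultdict

    def map_entry(index, value):
        """Mapper of the first stage: emit <(n, i_n), 1> for every mode n of an observed entry."""
        for n, i_n in enumerate(index):
            yield (n, i_n), 1

    def reduce_counts(key, ones):
        """Reducer of the first stage: |Omega^{(n)}_{i_n}| is the number of values per key."""
        return key, sum(ones)

    # Minimal in-memory driver illustrating the data flow on a toy 3-order tensor.
    entries = {(0, 1, 2): 5.0, (0, 3, 2): 1.0, (4, 1, 0): 2.0}
    grouped = defaultdict(list)
    for idx, val in entries.items():
        for key, one in map_entry(idx, val):
            grouped[key].append(one)
    counts = dict(reduce_counts(k, v) for k, v in grouped.items())
    # counts[(n, i_n)] now holds |Omega^{(n)}_{i_n}|; in the second stage, a single reducer
    # aggregates these counts and runs the rest of the greedy assignment (Algorithm 5 of the main paper).
    print(counts)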
3 EXPERIMENTS

In this section, we design and conduct additional experiments to answer the following questions:

Q1. How do different numbers of inner iterations ($T_{in}$) affect the convergence of SALS?
Q2. How do different numbers of columns updated at a time ($C$) affect the running time of SALS?

3.1 Experimental Settings

We ran experiments on a Hadoop cluster in which each node had an Intel Xeon E3-1230 3.3GHz CPU. The other experimental settings, including the datasets and the parameter values ($\lambda$ and $K$), were the same as those in the main paper. We used the root mean square error (RMSE) on a held-out test set, which is commonly used in recommender systems, to measure accuracy, as in the main paper.

3.2 Effects of the Number of Inner Iterations ($T_{in}$) on the Convergence of SALS

[Fig. 4: Effects of $T_{in}$ (i.e., the number of inner iterations) on the convergence of SALS when $C$ (i.e., the number of columns updated at a time) has large enough values. Panels (a) and (b): Netflix; panels (c) and (d): Yahoo-music. The effects of $T_{in}$ on convergence speed and the quality of converged solutions are marginal.]

We compared the convergence properties of SALS with different $T_{in}$ values, focusing on cases where $C$ (i.e., the number of columns updated at a time) has large enough values. The effect of $T_{in}$ when $C$ is set to one, and thus SALS is equivalent to CDTF, can be found in the main paper. As seen in Figure 4, the effects of $T_{in}$ on convergence speed and on the quality of the converged solutions are neither distinct nor consistent. When $C$ is set to one, however, high $T_{in}$ values are preferred (see Section 5.7 of the main paper for detailed experimental results).

3.3 Effects of the Number of Columns Updated at a Time ($C$) on the Running Time of SALS

[Fig. 5: Effects of the number of columns updated at a time ($C$) on the running time of SALS. Panel (a): Netflix; panel (b): Yahoo-music. The running time per iteration decreased until an intermediate value of $C$ and then started to increase.]

We measured the running time per iteration of SALS as we increased $C$ from 1 to $K$. As seen in Figure 5, the running time per iteration decreased until $C$ reached an intermediate value and then started to increase. As $C$ increases, the amount of disk I/O declines since it depends on the number of times that the entries of $R$ (or $\hat{R}$) are streamed from disk, which is inversely proportional to $C$. Conversely, the computational cost increases quadratically with regard to $C$. At small $C$ values, the decrease in the amount of disk I/O dominates, leading to a downward trend in the running time per iteration; the opposite happens at large $C$ values.
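The two opposing trends described above can be summarized by a toy cost model. In the Python sketch below, the constants io_cost and comp_cost and the value of K are arbitrary placeholders (not measured values); the model only illustrates why the per-iteration running time is U-shaped in $C$ when disk I/O is proportional to $K/C$ and computation grows quadratically with $C$.

    # Illustrative only: a toy per-iteration cost model reflecting the two opposing trends.
    K = 100                        # hypothetical total number of columns
    io_cost, comp_cost = 50.0, 0.02  # arbitrary placeholder constants

    def per_iteration_cost(C):
        # Disk I/O scales with the number of times R (or R-hat) is streamed, i.e., with K / C;
        # computation grows quadratically with C.
        return io_cost * (K / C) + comp_cost * C ** 2

    costs = {C: per_iteration_cost(C) for C in range(1, K + 1)}
    best_C = min(costs, key=costs.get)
    print(best_C, costs[best_C])   # the U-shaped trade-off has an interior minimum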