Supplementary Information Reconstructing propagation networks with temporal similarity

Hao Liao and An Zeng

I. SI NOTES

A. Special range

The special range arises for two reasons: the similarity degeneracy and the degree penalty of the similarity metrics. To understand why the special range exists when the CN similarity is used, we take a detailed look at the similarity matrix under different infection rates. In our method, we take the E node pairs (E is the number of links in the true network) with the highest similarity and add a link between each node pair to obtain the reconstructed network. When the infection rate is in the special range, we find a serious similarity degeneracy problem when picking the top E node pairs: so many node pairs share the same similarity that a simple similarity threshold cannot yield exactly E node pairs. We sort the similarity values in descending order and consider two adjacent similarity values $S_{c1}$ and $S_{c2}$ such that $E_1 < E$ node pairs are obtained when $S_{c1}$ is used as the threshold and $E_2 > E$ node pairs are obtained when $S_{c2}$ is used as the threshold. In our method, $S_{c1}$ is used in this case and the remaining $E - E_1$ node pairs are randomly selected from the node pairs with similarity $S_{c2}$. We checked the value of $E_2 - E_1$ and found that it is usually very large (e.g. larger than 4 ). Therefore, these $E - E_1$ node pairs must be randomly selected from a large number of candidates, resulting in a low reconstruction precision. To show this, we plot the dependence of $E - E_1$ on the infection rate in Fig. S7. One can see that in SW, $E - E_1$ indeed increases suddenly in the special range. The increase of $E - E_1$ in BA is smaller, so we only observe a small drop of precision in the special range in this network in Fig. S7.

The similarity degeneracy in the special range is related to the spreading coverage under different infection rates. When the infection rate is small, the spreading can only propagate a very limited number of steps, and most node pairs have zero similarity.
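The top-E selection with random tie-breaking described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `select_top_pairs`, the dictionary input format and the fixed random seed are assumptions made for the example.

```python
import random


def select_top_pairs(similarity, E, rng=None):
    """Pick E node pairs with the highest similarity.

    similarity: dict mapping a node pair (i, j) to its similarity score
                (hypothetical input format chosen for this sketch).
    Returns (chosen_pairs, E1, E2): E1 pairs lie strictly above the
    degenerate similarity value and E2 pairs lie at or above it.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility

    # Group node pairs by their similarity value.
    by_value = {}
    for pair, score in similarity.items():
        by_value.setdefault(score, []).append(pair)

    chosen = []
    for score in sorted(by_value, reverse=True):
        tied = by_value[score]
        if len(chosen) + len(tied) <= E:
            # The whole group fits: no degeneracy at this value.
            chosen.extend(tied)
            if len(chosen) == E:
                return chosen, E, E
        else:
            # Degenerate cutoff: draw the remaining E - E1 pairs at random.
            E1 = len(chosen)
            E2 = E1 + len(tied)
            chosen.extend(rng.sample(tied, E - E1))
            return chosen, E1, E2
    # Fewer than E pairs exist in total.
    return chosen, len(chosen), len(chosen)


# Toy example: one pair with score 3, three pairs tied at score 2.
toy = {(0, 1): 3, (0, 2): 2, (1, 2): 2, (1, 3): 2, (2, 3): 0}
pairs, E1, E2 = select_top_pairs(toy, E=2)
```

In the toy example one pair must be drawn at random from three tied candidates, so at most one of the three possible draws can match a given true link. A larger tied group makes a correct draw less likely, which is the mechanism behind the drop in precision.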
As the infection rate increases, more node pairs have nonzero similarity and the similarity degeneracy becomes less serious. However, when the infection rate is around the critical infection rate, the spreading starts to propagate locally, so that many node pairs receive one or two common news, and the similarity degeneracy becomes serious again. As the infection rate increases further (beyond the special range), the spreading propagates globally and the similarities of node pairs become well separated. This problem in and is less serious because they effectively reduce the similarity degeneracy by some normalization (but we can still observe an increase of $E - E_1$ in the special range). However, the normalization in these methods may cause strong penalty on, contributing also to the drop of precision in the special range. To show this, we denote the number of news that node i received as $d_i$ and the degree of node i in the network as $k_i$. We first study the correlation between $d_i$ and $k_i$ under different infection rates. The result is shown in Fig. S8. In the BA network, one can see that the correlation first increases with the infection rate when the infection rate is small. After reaching its highest value, the correlation starts to decrease. The special range occurs when the correlation is very high. The behavior of the correlation between $d_i$ and $k_i$ is similar in SW. However, the special range there occurs when the correlation is around 0.25 (not at its highest value). Based on the results in Fig. S8, we conjecture that the special range in and forms as follows:

Smaller than the special range: When the infection rate is low, the spreading can only cover a small number of nodes. As the spreading information is so limited (the similarity matrix is very sparse), it is natural that the precision of the network reconstruction is low. As the infection rate increases, the spreading covers more and more nodes and the similarity matrix becomes denser. The network reconstruction precision gradually increases with the infection rate.

Within the special range: As the infection rate increases in BA networks, the correlation between d and k increases as well.
The metric computes the similarity between two nodes as the number of their common news divided by the product of the numbers of news they received (i.e. $d_i d_j$). When d becomes more strongly correlated with k, the will be punished and finally have a very small number of links in the reconstructed network. This effect is absent when the infection rate is small because the correlation between d and k is very low in that case (e.g. lower than 0.25). The punishment from $d_i d_j$ is not concentrated on. In SW networks, as the correlation between d and k is not very high and there is no hub with extremely large degree, the effect of the punishment is weaker and the drop of precision is less obvious in the special range.

Larger than the special range: When the infection rate becomes even larger, the spreading can reach the small degree nodes more frequently. In this case, we can obtain meaningful similarity relations between the and other nodes. Therefore, the number of correctly predicted links of the increases significantly with the infection rate once it is large enough. This could be the reason why the precision increases again after the special range.

To test the conjecture above, we carry out more simulations. We consider a group of consisting of all the nodes with the lowest degree in the network, and a group of consisting of all the nodes with the highest degree. As the degree distribution is homogeneous in SW, the group has a similar number of nodes to the large degree node group. However, in BA, there is only one node with the highest degree. Therefore, we consider the top- nodes with the highest degree in BA. We study the relation between the number of correctly predicted links of the different groups and the infection rate in Fig. S9. Our conjecture above is supported by Fig. S9. In Fig. S9(d) (i.e. is used in BA), the correctly predicted links (denoted as ) are mainly the links connecting to when the infection rate is small. When the infection rate is in the special range, the are punished and drop to 0 due to the high correlation between k and d. Since some are connecting to these, their are slightly punished as well. Therefore, we also observe a small drop of for the. When the infection rate is further increased, for large degree nodes stays zero, but for increases considerably, since the spreading can reach the more frequently and results in a denser and more meaningful similarity matrix for prediction. The final drop of the number of correctly predicted links (when the infection rate is very large) occurs because the spreading covers almost the whole network, so the similarity extracted from the spreading can no longer reflect meaningful structural information of the network. The results of in SW are similar (see Fig. S9(c)). As the degree is homogeneous in SW, the punishing effect is less obvious than in BA.
However, we can still observe that the start to be punished in the special range and start to have many correctly predicted links when the infection rate is larger than the special range. Since there is no penalty term in, there is no sudden drop of of the in both Fig. S9(a)(d). The results of are between and, as shown in Fig. S9(b)(e).

B. Other similarity metrics

We study the influence of different parameters (i.e. N, k, ) on the performance of different similarity metrics in network reconstruction, as shown in Fig. S10. We consider eight similarity metrics: the common neighbor index (CN), the Jaccard index, the Cosine index (Cos), the Hub depressed index (HDI), the Hub promoted index (HPI), the Sorensen index (SSI), the Resource allocation index (RA), and the Leicht-Holme-Newman index (LHN). For each method, we also study its temporal version. These methods are described as follows.

(i) Cosine Index (Cos). It is also named the Salton index [1], and is defined as

$$\frac{|R_i \cap R_j|}{\sqrt{|R_i|\,|R_j|}}. \tag{1}$$

(ii) Temporal Cosine Index (TCos). The Cos index can be improved by $T_i$ as

$$\frac{|R_i \cap R_j|\,(T_i\,T_j)}{\sqrt{|R_i|\,|R_j|}}. \tag{2}$$

(iii) Sorensen Index (SSI). This index is used mainly for ecological community data [2], and is defined as

$$\frac{2\,|R_i \cap R_j|}{|R_i| + |R_j|}. \tag{3}$$

(iv) Temporal Sorensen Index (TSSI). The improved Sorensen index is defined as

$$\frac{|R_i \cap R_j|\,(T_i\,T_j)}{|R_i| + |R_j|}. \tag{4}$$

(v) Hub Promoted Index (HPI). This index was proposed for quantifying the topological overlap of pairs of substrates in metabolic networks [3], and is defined as

$$\frac{|R_i \cap R_j|}{\min\{|R_i|, |R_j|\}}. \tag{5}$$

(vi) Temporal Hub Promoted Index (THPI). This index is defined as

$$\frac{|R_i \cap R_j|\,(T_i\,T_j)}{\min\{|R_i|, |R_j|\}}. \tag{6}$$

(vii) Hub Depressed Index (HDI). This is a measure with the opposite effect on hubs [4]. It is defined as

$$\frac{|R_i \cap R_j|}{\max\{|R_i|, |R_j|\}}. \tag{7}$$

(viii) Temporal Hub Depressed Index (THDI). It is defined as

$$\frac{|R_i \cap R_j|\,(T_i\,T_j)}{\max\{|R_i|, |R_j|\}}. \tag{8}$$

(ix) Preferential Attachment (PA). Based on the well-known preferential attachment mechanism [3], the likelihood score for two nodes to have a link can be calculated as

$$|R_i|\,|R_j|. \tag{9}$$

(x) Temporal Preferential Attachment (TPA). It can be expressed as

$$\frac{|R_i|\,|R_j|}{T_i\,T_j}. \tag{10}$$

(xi) Asymmetric Index (AS). This measure is for inferring the network topology in directed networks [5]. Mathematically, it can be expressed as

$$\frac{|R_i \cap R_j|}{|R_i|}. \tag{11}$$

(xii) Temporal Asymmetric Index (TAS). Similarly, it can be expressed as

$$\frac{|R_i \cap R_j|\,(T_i\,T_j)}{|R_i|}. \tag{12}$$

We find that the temporal similarity metrics can significantly outperform the corresponding traditional similarity metrics, especially when is large. This is consistent with the findings in the paper. When k increases, the precision of both the traditional and the temporal similarity metrics tends to increase. When N increases, the precision of both tends to decrease. However, as k and N increase, the temporal metrics consistently outperform the traditional metrics. Therefore, it is better to use the temporal similarity metrics to reconstruct networks. When different similarity metrics are compared, we find that and indices have a smaller drop of precision in the special range than the other similarity metrics such as, SSI, HPI, HDI, Cos and. This is because the latter group of metrics all have some form of punishment based on node degree. In, the drop of precision in the special range is most significant. The special range effect is much less obvious when the temporal similarity metrics are used. In, however, an observable drop of precision in the special range still exists.
This is because the degree punishment is most severe in. We then compare the results of the different metrics on SW and BA networks. In SW networks, all the temporal metrics can reach a very high precision (close to 1) when is large. However, the T method reaches its highest value later (i.e. a larger is needed) than the other methods. In BA networks, THPI reaches the highest precision. In summary, if the time information of the spreading is unknown, it is better to use and to reconstruct the network, as their precision is not affected much by the special range effect. If the time information of the spreading is available, it is better to use THPI to reconstruct the network, as it works similarly to the other metrics in heterogeneous networks while it works best in homogeneous networks.
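The non-temporal indices (1), (3), (5), (7), (9) and (11) can be computed directly from the news received by each node. The sketch below assumes that $R_i$ is the set of news received by node i; the function name `local_indices` and the convention of returning zeros for a node that received no news are choices made for this example. The temporal variants are omitted because they additionally need the receiving times.

```python
import math


def local_indices(R_i, R_j):
    """Compute the non-temporal similarity indices for one node pair.

    R_i, R_j: sets of news identifiers received by nodes i and j
              (assumed representation for this sketch).
    """
    common = len(R_i & R_j)      # number of common news
    d_i, d_j = len(R_i), len(R_j)

    if d_i == 0 or d_j == 0:
        # Convention chosen here: no received news means zero similarity.
        return {name: 0.0 for name in ("Cos", "SSI", "HPI", "HDI", "PA", "AS")}

    return {
        "Cos": common / math.sqrt(d_i * d_j),   # Eq. (1)
        "SSI": 2 * common / (d_i + d_j),        # Eq. (3)
        "HPI": common / min(d_i, d_j),          # Eq. (5)
        "HDI": common / max(d_i, d_j),          # Eq. (7)
        "PA": float(d_i * d_j),                 # Eq. (9)
        "AS": common / d_i,                     # Eq. (11), asymmetric in i and j
    }


# Toy example: node i received 4 news, node j received 3, with 2 in common.
scores = local_indices({1, 2, 3, 4}, {3, 4, 5})
```

All indices except PA divide the number of common news by a quantity that grows with the number of received news, which is the degree-based punishment discussed above.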

II. SI FIGURES

Fig. S1 The, and Degree correlation in the parameter space (, ) for (a,b,c) BA networks (N = 5, k = ) and (d,e,f) SW networks (N = 5, p =., k = ) by using method. The results are averaged over 5 independent realizations.

Fig. S2 The, and Degree correlation in the parameter space (, ) for (a,b,c) BA networks (N = 5, k = ) and (d,e,f) SW networks (N = 5, p =., k = ) by using method. The results are averaged over 5 independent realizations.

Fig. S3 The, and Degree correlation in the parameter space (, ) for (a,b,c) BA networks (N = 5, k = ) and (d,e,f) SW networks (N = 5, p =., k = ) by using method. The results are averaged over 5 independent realizations.

Fig. S4 The dependence of the, Precision and Degree correlation on with four different similarity methods for (a,b,c) BA networks (N = 5, k = ) and (d,e,f) SW networks (N = 5, p =., k = ). We use = / k here. The results are averaged over 5 independent realizations.

Fig. S5 The dependence of the on and separately with temporal-based similarity methods in (a,b) BA and (c,d) SW networks. We use = / k in (a) and (c), and =.5 in (b) and (d). The results are averaged over 5 independent realizations.

Fig. S6 The dependence of Degree correlation on and separately with temporal-based similarity methods in (a,b) BA and (c,d) SW networks. We use = / k in (a) and (c), and =.5 in (b) and (d). The results are averaged over 5 independent realizations.

Fig. S7 The dependence of $E - E_1$ on when different similarity methods are used in (a,b,c) SW and (d,e,f) BA networks. The network parameters are the same as those used in Fig. in the paper.

Fig. S8 The dependence of the correlation between d and k on (the blue curve), and the dependence of the precision on (the red curve), in (a) SW and (b) BA networks. The similarity used for reconstructing the network is the similarity. The network parameters are the same as the ones used in Fig. in the paper.

Fig. S9 The relation between the number of correctly predicted links (Ns) of different groups and the infection rate in (a,b,c) SW and (d,e,f) BA networks. The network parameters are the same as the ones used in Fig. in the paper.

Fig. S10 The influence of different parameters (i.e. N, k, ) on the precision of different similarity metrics in network reconstruction, for BA and SW networks. Each row of the subplots corresponds to one similarity metric and its temporal version. In the first column, k =, N = 5. In the second column, =., N = 5. In the third column, k =, =..

III. SI TABLES

Table S1. Basic properties of real undirected networks and the performance of the, T, and T methods on these networks. The parameters are set as = 2/ k and =.5. We select a relatively large because the performance difference between the traditional similarity metric and the temporal similarity metric becomes more significant under large, as shown in Fig. 4. The similarity method with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Network Degree correlation T T T T T T
Dolphins  .78±.  .96±.2  .83±.  .97±.  .33±.2  .66±.5  .38±.3  .74±.4  .67±.4  .77±.7  .69±.5  .83±.4
Word  .8±.  .92±.  .8±.  .93±.  .3±.  .53±.2  .3±.  .55±.2  .76±.2  .8±.2  .76±.2  .87±.2
Jazz  .79±.  .86±.  .79±.  .86±.  .4±.  .52±.  .42±.  .53±.  .85±.  .82±.  .85±.  .82±.
E. coli  .87±.  .94±.  .89±.  .97±.  .32±.  .52±.  .33±.  .53±.  .83±.2  .79±.  .83±.2  .79±.2
USAir  .9±.  .93±.  .9±.  .94±.  .52±.  .5±.  .5±.  .5±.  .82±.  .84±.  .82±.  .84±.
Netsci  .86±.  .98±.  .97±.  .99±.  .2±.2  .6±.2  .44±.2  .84±.  .5±.3  .63±.3  .64±.2  .88±.2
Email  .83±.  .92±.  .83±.  .93±.  .±.  .39±.  .±.  .4±.  .78±.  .85±.  .78±.  .85±.
TAP  .82±.  .93±.  .89±.  .99±.  .8±.  .55±.  .26±.  .58±.  .69±.  .76±.  .75±.  .78±.
PPI  .89±.  .94±.  .92±.  .97±.  .29±.  .34±.  .28±.  .35±.  .79±.  .75±.  .79±.  .75±.

Table S2. Basic properties of real undirected networks (n for number of nodes, E for number of links), and two network reconstruction metrics (note for and ) of two additional typical similarity definitions, which are the method and the method, and also these methods with temporal information (T and T for short). The parameters are set as = 2/ k and =.5. The similarity with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Networks Degree correlation T T T T T T
Dolphins  .76±.4  .94±.  .82±.3  .96±.  .25±.4  .48±.4  .38±.5  .68±.5  .7±.6  .44±.6  .57±.6  .63±.6
Word  .6±.  .87±.  .8±.  .92±.  .6±.  .35±.2  .3±.  .53±.3  -.58±.2  -.27±.5  .75±.3  .82±.3
Jazz  .59±.2  .8±.  .79±.  .86±.  .±.2  .38±.  .42±.  .52±.  -.69±.6  -.5±.9  .86±.  .83±.
E. coli  .68±.2  .93±.  .88±.  .95±.  .7±.  .9±.  .34±.  .55±.2  -.44±.  -.42±.  .82±.2  .8±.
USAir  .62±.  .74±.  .9±.  .93±.  .3±.  .7±.  .52±.  .5±.  -.43±.2  -.4±.  .82±.  .84±.
Netsci  .94±.  .99±.  .94±.  .99±.  .3±.3  .54±.2  .4±.4  .7±.3  -.5±.6  .4±.4  .56±.8  .74±.6
Email  .65±.  .97±.  .83±.  .92±.  .3±.  .8±.  .2±.  .4±.  -.45±.  -.46±.  .78±.  .85±.
TAP  .84±.  .99±.  .87±.  .96±.  .4±.  .3±.  .25±.4  .59±.  -.4±.  -.47±.  .73±.4  .78±.
PPI  .7±.  .96±.  .9±.  .96±.  .5±.  .±.  .29±.  .33±.  -.23±.  -.26±.  .78±.  .72±.

Table S3. Basic properties of real directed networks and the performance of the, T, and T methods on these networks. The parameters are set as = 2/ k and =.5. We select a relatively large because the performance difference between the traditional similarity metric and the temporal similarity metric becomes more significant under large, as shown in Fig. 4. The similarity method with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Degree correlation Networks T T T T T T
Prisoners  .72±.  .8±.2  .8±.2  .84±.3  .2±.2  .48±.3  .4±.3  .58±.2  .57±.4  .69±.2  .68±.4  .73±.4
SM-FW  .64±.  .67±.2  .63±.2  .66±.2  .25±.2  .29±.2  .23±.2  .28±.2  .66±.6  .64±.8  .33±.  .33±.9
Neural  .72±.  .79±.  .73±.  .8±.  .4±.  .25±.  .4±.  .29±.  .68±.  .6±.2  .55±.  .5±.2
Metabolic  .68±.  .7±.  .7±.  .72±.  .9±.  .4±.  .4±.  .23±.  .54±.2  .64±.2  .6±.  .7±.2
PB  .84±.  .86±.  .84±.  .86±.  .5±.  .25±.  .6±.  .25±.  .8±.  .8±.  .8±.  .8±.

Table S4. Basic properties of real directed networks (n for number of nodes, E for number of links), and two network reconstruction metrics (note for and ) of two additional typical similarity definitions, which are the method and the method, and also these methods with temporal information (T and T for short). The parameters are set as = 2/ k and =.5. The similarity with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Degree correlation Networks T T T T T T
Prisoners  .73±.2  .84±.2  .74±.  .8±.  .2±.5  .48±.3  .24±.2  .46±.2  -.23±.8  .24±.  .52±.9  .65±.4
SM-FW  .6±.2  .64±.2  .65±.2  .67±.3  .9±.2  .25±.3  .27±.  .3±.  -.2±.  .±.2  .7±.4  .7±.4
Neural  .66±.  .8±.  .73±.  .8±.  .7±.  .24±.  .7±.  .26±.  -.3±.  -.24±.2  .7±.2  .66±.2
Metabolic  .68±.  .73±.  .69±.  .7±.  .8±.  .±.  .±.  .3±.  -.3±.  -.±.  .5±.  .53±.2
PB  .76±.  .85±.  .85±.  .86±.  .±.  .5±.  .5±.  .25±.  -.22±.  -.8±.  .8±.  .8±.

Table S5. of other additional typical similarity definitions, and also these methods with temporal information. The parameters are set as = 2/ k and =.5. The similarity with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Network Cos TCos SSI TSSI HPI THPI HDI THDI PA TPA AS TAS
Prisoners  .8±.  .84±.2  .79±.  .83±.2  .76±.2  .85±.2  .78±.  .83±.2  .65±.  .76±.  .79±.2  .87±.2
SM-FW  .63±.2  .65±.2  .63±.2  .65±.2  .64±.2  .65±.2  .62±.  .64±.2  .6±.2  .65±.2  .69±.4  .7±.3
Neural  .74±.  .83±.  .73±.  .82±.  .74±.  .84±.  .72±.  .8±.  .7±.  .76±.  .78±.  .87±.
Metabolic  .7±.  .73±.  .7±.  .72±.  .73±.  .75±.  .7±.  .72±.  .64±.  .69±.  .75±.  .78±.
PB  .84±.  .87±.  .84±.  .87±.  .84±.  .88±.  .84±.  .86±.  .84±.  .85±.  .85±.  .89±.

Table S6. of other additional typical similarity definitions, and also these methods with temporal information. The parameters are set as = 2/ k and =.5. The similarity with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Network Cos TCos SSI TSSI HPI THPI HDI THDI PA TPA AS TAS
Prisoners  .4±.4  .57±.2  .4±.4  .57±.2  .±.3  .56±.3  .38±.4  .57±.2  .6±.2  .32±.3  .6±.2  .6±.3
SM-FW  .24±.2  .28±.2  .23±.2  .27±.2  .22±.2  .29±.3  .22±.  .27±.2  .25±.2  .27±.  .25±.2  .38±.3
Neural  .4±.  .35±.  .4±.  .34±.  .8±.  .33±.  .3±.  .29±.  .4±.  .2±.  .4±.  .29±.
Metabolic  .4±.  .24±.  .4±.  .23±.  .3±.  .3±.3  .2±.  .2±.  .7±.  .9±.  .7±.  .5±.
PB  .6±.  .26±.  .6±.  .26±.  .4±.  .9±.  .5±.  .25±.  .5±.  .24±.  .5±.  .8±.

Table S7. of other additional typical similarity definitions, and also these methods with temporal information. The parameters are set as = 2/ k and =.5. The similarity with the best performance in each network is highlighted in bold font. The results are averaged over 5 independent realizations.

Network Degree Cos TCos SSI TSSI HPI THPI HDI THDI PA TPA AS TAS
Prisoners  .67±.4  .7±.5  .67±.5  .7±.4  .34±.  .53±.8  .65±.5  .7±.4  .5±.7  .6±.4  .5±.7  .74±.5
SM-FW  .28±.2  .22±.8  .26±.3  .2±.5  .4±.9  .44±.6  .2±.  .9±.5  .75±.5  .7±.3  .75±.6  .7±.6
Neural  .56±.  .49±.3  .55±.  .49±.3  .6±.3  .45±.6  .52±.  .48±.2  .78±.  .68±.  .78±.  .6±.3
Metabolic  .62±.  .7±.2  .6±.3  .7±.2  .32±.4  .28±.9  .56±.2  .7±.2  .53±.  .56±.2  .53±.  .79±.3
PB  .8±.  .79±.  .8±.  .79±.  .74±.  .55±.4  .8±.  .8±.  .8±.  .8±.  .8±.  .77±.

[1] Salton, G. & McGill, M. J. Introduction to modern information retrieval (McGraw-Hill, Auckland, 1983).
[2] Sorensen, T. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biol. Skr. 5, 1-34 (1948).
[3] Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabasi, A. L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551-1555 (2002).
[4] Zhou, T., Lu, L. & Zhang, Y. C. Predicting missing links via local information. Eur. Phys. J. B 71, 623-630 (2009).
[5] Zeng, A. Inferring network topology via the propagation process. J. Stat. Mech. (2013).