Supplementary Information


Supplementary Information for the article "The organization of transcriptional activity in the yeast, S. cerevisiae" by I. J. Farkas, H. Jeong, T. Vicsek, A.-L. Barabási, and Z. N. Oltvai. For the Referees' convenience, we are attaching the supplementary material containing additional results (including numerical values) and details on the methodology that are necessary to reproduce our findings. Upon publication, this material will also be available on our designated website.

Web Note A - Additional Results

Table of Contents:

p. 4 Web Fig. A (Statistical analysis of data sets) The averages and standard deviations of the genes' and transcriptomes' expression level values in the control and the perturbation data sets.
p. 5 Web Fig. B (Statistical analysis of the control data set) Analysis of the distributions displayed by the genes' expression level values in the control data set.
p. 6 Web Note A1 (Statistical analysis of data sets) The average extent by which the different gene deletions alter the genomic expression program of S. cerevisiae cells.
p. 6 Web Fig. C (Control analysis to Figs. 1&2) Analysis of numerical artifacts in the stepwise correlation search method.
p. 7 Web Fig. D (Control analysis to Figs. 1&2) Analysis of the structural properties of the similarity graph obtained by the correlation search method when changing the empirical parameters in the algorithm.
p. 8 Web Fig. E (Control analysis to Figs. 1&2) Analysis of transcriptome similarities in the perturbation data set using a random reordering of genes.
p. 9 Web Fig. F (Control analysis to Figs. 1&2) Analysis of transcriptome similarities in the perturbation data set using the hierarchically clustered order of genes.
p. 10 Web Fig. G (Control analysis to Figs. 1&2) Analysis of transcriptome similarities in the perturbation data set when genes are listed in the descending order of the standard deviations of their expression level values.

p. 12 Web Fig. H (Control analysis to Figs. 1&2) Analysis of transcriptome similarities in the perturbation data set with a modified version of the correlation search algorithm containing overlapping gene regions.
p. 13 Web Fig. I (Control analysis to Figs. 1&2) The similarity graph predicted for the wild-type (control) data set by the correlation search algorithm.
p. 14 Web Fig. J (Additional analysis to Figs. 1&2) Analysis of the microarray data set published by Kim et al.1 reveals a scale-free structure of the transcriptome similarity graph in Caenorhabditis elegans.
p. 15 Web Fig. K (Control analysis to Fig. 3b) After randomly reordering the genes of the perturbation data set, the list of transcriptomes/deleted genes with the highest numbers of connections on the similarity graph is highly similar to the original result of Fig. 3b in the manuscript, where genes were listed in alphabetical order.

[Web Fig. A panels: (a, c) average expression level and (b, d) standard deviation of expression level for each gene in the control and perturbation data sets, with genes in alphabetical order (a, c) or sorted to obtain the descending order of values (b, d); (e-h) the same quantities for each transcriptome.]

Fig. A The averages and standard deviations of the genes' and the transcriptomes' expression level values in the control and the perturbation data sets. Averages ("offsets") of the genes' expression level values and the descending sequence of the standard deviations of the genes' expression levels in the control (a, b) and the perturbation (c, d) data sets. Observe that ~ genes display significantly higher standard deviations of expression level than the remaining ones (subfigure b); for the reader's convenience, an arrow indicates the dividing line. Using (b) as a control for subfigure (d), we find that the number of genes with strongly varying expression values in the perturbation data set is ~15, which is well below the size of the complete yeast transcriptome. The average expression level values and the descending sequence of expression level standard deviations in transcriptomes of the control (e, f) and the perturbation (g, h) data sets. Using (f) as a control for (h), we find that the amount by which the expression level in individual perturbation transcriptomes varies is highly different. These results indicate that in the perturbation experiments, transcriptional responses are localized, but the level of localization varies strongly. A detailed quantitative analysis is presented in Web Fig. C.

[Web Fig. B panels (a, b): χ²measured/χ²normal vs. the sorted order of genes, with the ratio = 1 line indicated.]

Fig. B Analysis of the distributions displayed by the genes' expression level values in the control data set. (a) Using a standard χ²-test with a confidence level of 95%, the hypothesis that a given gene's expression level values are normally distributed was rejected for 3.7% of all genes. Shown are the ratios of the measured vs. normal χ² values for the genes, with the χ² values in ascending order. (The χ²-test rejects the hypothesis if the χ² ratio for that gene is above 1.) (b) To remove suspected experimental errors, we have filtered the control data set. After filtering, the statistical test still rejected the hypothesis of a normal distribution of the steady-state expression level values for 15.7% of all genes. (During filtering, we first measured the average expression level of each gene. Next, for each gene, we removed points farther from the average expression level than 3 times the radius of the major population, 90%, of the points.)
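The test and filter of Fig. B can be sketched as follows. This is our own illustration, not the authors' code: the function names, the bin count, and the χ² critical value (95%, taken here for 7 degrees of freedom) are assumptions.

```python
import math
import numpy as np

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_normality_ratio(x, bins=8, crit=14.07):
    """Chi-squared goodness-of-fit ratio against a fitted normal.
    crit is an assumed 95% critical value (bins-1 = 7 d.o.f.);
    a ratio above 1 rejects the hypothesis of normality."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    # transform through the fitted normal CDF: normal data become uniform
    u = np.array([normal_cdf((v - mu) / sd) for v in x])
    obs, _ = np.histogram(u, bins=bins, range=(0.0, 1.0))
    expected = len(x) / bins                 # equal-probability bins
    chi2 = ((obs - expected) ** 2 / expected).sum()
    return chi2 / crit

def filter_gene(x, frac=0.90, k=3.0):
    """Drop points farther from the gene's average than k = 3 times the
    radius of the central 90% of the points, as described in Fig. B(b)."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x - x.mean())
    return x[d <= k * np.quantile(d, frac)]
```

With the expression vector of each gene in hand, one would compute the ratio for every gene before and after filtering and report the fraction of genes with a ratio above 1.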

Note A1 The average extent by which the different gene deletions alter the genomic expression program of S. cerevisiae cells. We used the wild-type average expression level (A_i^wt) and standard deviation (Sigma_i^wt) of gene i as obtained from the filtered data set analyzed on panel b of Web Fig. B. Next, we computed for all e_ij expression level values in both the complete control and perturbation data sets the normalized difference, e~_ij = (e_ij - A_i^wt) / Sigma_i^wt, of the expression level value from the gene's wild-type average. Considering a relative 3-fold up- or downregulation to be significant, we found that the average percentage of significantly up- or downregulated genes per transcriptome is 1.3 ± 1.87% in the control data set vs. 10.1 ± 11.4% in the perturbation data set. Thus, while the ratio of up- or downregulated genes per transcriptome in the perturbation data set is significantly higher than in the control set, on average transcriptional responses and genetic noise combined involve only about one tenth of all genes.

[Web Fig. C: similarity graph of the scrambled perturbation data set; the two labeled nodes are erg3 and ymr58c.]

Fig. C Analysis of numerical artifacts in the stepwise correlation search method. To test the effect of numerical artifacts on our results, we performed the stepwise correlation search method on a modified version of the perturbation data set, in which all transcriptomes were scrambled independently to remove all possible similar patterns among transcriptomes. In the resulting data matrix, the e_ij values of any row contained data points measured for different genes. The two strongest edges in the graph predicted for this case reached the similarity scores C=.59 and C=.5 (both colored in yellow). Comparing these results to the similarity graph displayed in the top layer of Fig. 1c of the paper, where 16 connections above C=.9 were found, we conclude that the effect of numerical artifacts on our results is negligible.
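The per-transcriptome statistic of Note A1 can be computed directly. The sketch below is ours (variable names are assumptions), reading the 3-fold criterion as |e~_ij| >= 3:

```python
import numpy as np

def regulated_percent(e, wt_mean, wt_std, thresh=3.0):
    """Percentage of significantly regulated genes per transcriptome.
    e: (N genes x m experiments) matrix of expression values;
    wt_mean, wt_std: per-gene wild-type averages A_i^wt and standard
    deviations Sigma_i^wt from the filtered control set.
    A gene counts as regulated in experiment j when the normalized
    difference |e~_ij| = |(e_ij - A_i^wt) / Sigma_i^wt| exceeds thresh."""
    e = np.asarray(e, dtype=float)
    a = np.asarray(wt_mean, dtype=float)[:, None]
    s = np.asarray(wt_std, dtype=float)[:, None]
    z = (e - a) / s                       # normalized differences e~_ij
    return 100.0 * (np.abs(z) >= thresh).mean(axis=0)
```

Averaging the returned per-column percentages over the control and perturbation matrices separately gives the two figures quoted in the note.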

[Web Fig. D panels (a-f): the fit measures X and Y vs. s (window size), u (u-th largest value picked), and C (similarity threshold).]

Fig. D Irrespective of the set of empirical parameters, the similarity graph supplied by the correlation search algorithm is most closely described by a scale-free graph. We compared the similarity graph obtained by the correlation search method to the three idealized test graphs using the spectral tests outlined below. Results for the three test graphs are given in the same colors as below: blue (random graph), green (small-world graph) and red (scale-free graph). For both quantities characterizing the quality of fit in the spectral comparison - X, testing the localization of the first eigenvector, and Y, testing the closeness of eigenvalues - smaller values indicate a better agreement between the structural properties of the similarity graph and the given test graph. The parameters used on Fig. 2 of the paper were s=3, u=1 and C=.7. Compared to the original parameter set, here only one parameter was changed for each column of subfigures. In the first column of subfigures (a, d) the size of the gene segment, s, was varied. For the analysis in the second column (b, e), u was changed. For the third column of subfigures (c, f), the similarity threshold, C, was changed. The best fit is given everywhere by the scale-free graph.
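The localization measure used in the spectral tests (the inverse participation ratio plotted in panels d-f of several figures below) can be sketched as follows; this is our own minimal version, assuming unit-normalized eigenvectors of the adjacency matrix:

```python
import numpy as np

def spectrum_and_ipr(adj):
    """Return the eigenvalues of a symmetric adjacency matrix and the
    inverse participation ratio I = sum_i v_i^4 of each unit-normalized
    eigenvector v.  I is near 1/N for an eigenvector spread evenly over
    the N vertices, and near 1 for one localized on a few vertices."""
    vals, vecs = np.linalg.eigh(np.asarray(adj, dtype=float))
    return vals, (vecs ** 4).sum(axis=0)   # eigh returns unit columns
```

For example, on the complete graph with three vertices the largest eigenvalue is 2 and its eigenvector is uniform, so its inverse participation ratio is 1/3.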

[Web Fig. E panels: (a) similarity graphs at C=.9, .84, .88 and .8; (b) enlarged graph at C=.8; (c) number of links vs. sorted order of vertices; (d-f) inverse participation ratio vs. eigenvalue for the random, small-world and scale-free test graphs; (g) number of occurrences vs. sorted order of gene windows.]

Fig. E Analysis of transcriptome similarities in the perturbation data set using a random reordering of genes. As an additional test we randomly reordered the genes in the perturbation set prior to the analysis. Note that the similarity graph obtained by the correlation search method on the reordered data is not altered significantly (compare to Figs. 1b and 1c of the paper). (a) Each node (vertex) of the graph represents a transcriptome, and two transcriptomes are connected if they were found to contain region(s) of similarity in their expression patterns. The similarity graph is obtained for increasing similarity score thresholds C=.8, C=.84, C=.88 and C=.9. (b) Enlarged view of the graph obtained for C=.8. The graph is rich in loops and strongly interconnected groups of experiments. The close similarity between this graph and the one shown on Fig. 1c indicates that the random reordering of genes has no significant effect on our results. (c) The degree sequence of the largest component of

the measured graph (black), and an idealized random graph (blue), small-world graph (green), or scale-free graph (red) are shown at C=.7. (d, e, f) Spectral comparison of the data graph and the three idealized test graphs. Each plot shows the inverse participation ratios of the graphs' eigenvectors vs. the corresponding eigenvalues of the graph. (g) The frequency at which a given segment shows similarity between two transcriptomes vs. the sorted order of gene windows (C=.7). The fitted line is a power-law with the exponent .37.

[Web Fig. F panels: (a) similarity graphs at C=.9, .84, .88 and .8; (b) the enlarged graph at C=.8, with nodes including pet111, msu1, cyt1, ymr93c, rml, imp, ecm18, erg4, ERG11, vps8, fus, sir3, sir, ymr141c, pfd, sir4, ste, yor51c, yor78w, rps7b and ymr14w.]

Fig. F Analysis of transcriptome similarities in the perturbation data set using the hierarchically clustered order of genes. To test whether grouping similar expression level values together in each experiment affects our results, we performed the correlation search algorithm on the perturbation data set after a hierarchical clustering of genes. Compared to any other ordering of the genes tested here, hierarchical clustering produces a very small and weak network structure. Since the correlation search method compares the expression level values of adjacent genes in the experiments, the smoothing of expression levels by ordering similar expression values close to each other makes it harder for the algorithm to detect characteristic changes. Shown are (a) the network at different similarity thresholds and (b) enlarged for C=.8. (Note that the network is too small for a meaningful structural analysis.) Observe also that the main hubs of the graph predicted previously (see Fig. 1c of the paper) are still present, suggesting that the only effect of the hierarchical clustering of genes on the correlation search technique was a shift in the value of the similarity threshold, C.

[Web Fig. G panels: (a) similarity graphs at C=.9, .84, .88 and .8; (b) enlarged graph at C=.8; (c) number of links vs. sorted order of vertices; (d-f) inverse participation ratio vs. eigenvalue for the random, small-world and scale-free test graphs; (g) number of occurrences vs. sorted order of gene windows.]

Fig. G Analysis of transcriptome similarities in the perturbation data set when genes are listed in the descending order of the standard deviations of their expression levels. To test whether the similarities detected by the correlation search method are due to a small number of genes strongly up- or downregulated on the given transcriptome segment, or are due to a broad range of similarly regulated genes, we listed the genes of the perturbation data set in the descending order of their expression levels' standard deviations measured across the 87 experiments. (a) The transcriptome similarity graph at different C similarity thresholds and (b) enlarged for C=.8. Note the strong similarity of the obtained graph to the one shown in the manuscript. Both the analysis of the (c) degree sequence and the spectral comparison (d, e, f) show that the closest description for the transcriptome network is given by the scale-free model. In addition, the frequencies at which individual transcriptome segments hold similar expression patterns (g) display a scale-free distribution. The fitted line is a power-law with the exponent .3.

[Web Fig. H panels: (a) similarity graphs at C=.9, .84, .88 and .8; (b) enlarged graph at C=.8; (c) number of links vs. sorted order of vertices; (d-f) inverse participation ratio vs. eigenvalue for the random, small-world and scale-free test graphs; (g) number of occurrences vs. sorted order of gene windows.]

Fig. H Analysis of transcriptome similarities in the perturbation data set with a modified version of the correlation search algorithm containing overlapping gene regions. As an additional test we increased the segment size and allowed the segments to overlap in the perturbation set prior to the analysis (see Web Note B for details). Note that the similarity graph obtained by this slightly altered correlation search method provides results that are comparable to those in Fig. 1 of the paper. (a) Each node (vertex) of the graph represents a transcriptome, and two transcriptomes are connected if they were found to contain region(s) of similarity in their expression patterns. The similarity graph is obtained for increasing similarity score thresholds C=.8, C=.84, C=.88 and C=.9. (b) Enlarged view of the graph obtained for C=.8. Note that the graph is rich in loops and strongly interconnected groups of experiments. Strongly connected experiments of the original graph (see Fig. 1c of the paper) are usually strongly connected here, too. (c) The degree sequence of the largest component of the measured graph (black), and an idealized random graph (blue), small-world graph (green), or scale-free graph (red) is shown at C=.7. (d, e, f) Spectral comparison of the data graph and the three idealized test graphs at C=.7. Each plot shows the inverse participation ratios of the eigenvectors vs. the corresponding eigenvalues of the graphs. (g) The frequency at which a given transcriptome segment shows similarity between two transcriptomes vs. the sorted order of the segments (C=.7). The fitted line is a power-law with the exponent .37.

[Web Fig. I: similarity graph of the control data set; the labeled nodes are EXP14 and EXP.]

Fig. I The similarity graph predicted for the control data set by the correlation search algorithm, obtained with the same parameters as Fig. 1c of the paper (s=3, u=1 and C=.8). Labels show the indices of wild-type transcriptomes. Observe that the number and strength of similarities is well below those predicted for the perturbation data set. On the other hand, finding a connection above the C=.8 threshold indicates that the strongest similarities in the control data set are still much stronger than numerical artifacts, which usually do not yield connections stronger than C=.5-.6 (see Web Fig. D).

[Web Fig. J panels: (a) similarity graphs at increasing thresholds; (b) enlarged graph; (c) number of links vs. sorted order of vertices; (d-f) inverse participation ratio vs. eigenvalue for the random, small-world and scale-free test graphs; (g) number of occurrences vs. sorted order of gene windows.]

Fig. J Analysis of the microarray data set published by Kim et al.1 reveals a scale-free structure of the transcriptome similarity graph of Caenorhabditis elegans. (a) Each node (vertex) of the graph represents a transcriptome, and two transcriptomes are connected if they were found to contain genes strongly up- or downregulated in both experiments. The similarity graph is obtained for increasing similarity score thresholds C=.9, C=.92, C=.94 and C=.96. (b) Enlarged view of the graph obtained for C=.9. Note that the graph is rich in loops and strongly interconnected groups of experiments. (c) The degree sequence of the largest component of the measured graph (black), and an idealized random graph (blue), small-world graph (green), or scale-free graph (red) is shown at C=.8. (d-f) Spectral comparison of the data graph and the three idealized test graphs at C=.8. Each plot shows the inverse participation ratios of the eigenvectors vs. the corresponding eigenvalues of the graphs. (g) The frequency at which a given gene shows similarity between two transcriptomes vs. the sorted order of genes (C=.8). The fitted line is a power-law with the exponent .1.

[Fig. K table: columns list the transcriptomes/deleted genes with the highest numbers of connections k, for the original and for the randomly reordered data sets; the leading entries include yel8w (36), gcn4 (34), sir (34), swi4 (34), jnm1 (33), yer83c (3), vps8 (31), ubr1 (31), ste4 (3), hda1 (9), ...]

Fig. K After randomly reordering the rows (genes) of the perturbation data set, the list of transcriptomes/deleted genes with the highest numbers of connections on the similarity graph is highly similar to the original result of Fig. 3b in the paper.

Web Note B - Detailed Methods

Table of Contents:

1. Data preparation
p. 16 Data source and construction of microarray matrices
2. Stepwise correlation search method
p. 18 Description of the stepwise correlation search algorithm
p. 20 Testing the stepwise correlation search technique by applying it to reordered versions of the perturbation data matrix
p. 21 Testing the stepwise correlation search technique by allowing overlapping transcriptome segments
3. Analysis of the similarity graph provided by the stepwise correlation search method
p. 22 Description of the random graph models used to test the structure of the predicted similarity graph
p. 22 Tools of the spectral analysis
p. 23 Quality of fit in the spectral comparison of the data graph and the test graphs
p. 23 Testing the structure of the similarity graph by using alternative parameter sets

1. Data preparation

Data source and construction of microarray matrices

We used the publicly available microarray data set of Friend and colleagues3; the files control_expts1-63_ratios.txt and data_expts1-3_ratios.txt were used from the downloaded data package. As a first step, we arranged the 63 control and the 87 perturbation microarray data sets into two separate matrices, with the expression level values of gene i listed in the ith row, and the expression level values of the jth measurement listed in the jth column. (The 87-experiment data set was obtained by keeping only the data of single-gene deletion mutant strains out of the 3 set.) In both cases, we ordered rows and columns as they were listed in the original data files. Genes (rows) were listed alphabetically; transcriptomes (columns) were listed in the temporal order of the experiments in the control data set and alphabetically in the perturbation set (see Web Fig. A). For the statistical characterization of the two matrices we use the following notation. In either case, the data matrix, e, has N rows (each containing the expression levels of one gene) and m columns (each containing the expression levels of all genes in one experiment, i.e., one measured transcriptome). The expression level of the ith gene in the jth array is e_ij, and the average expression level of this gene throughout the m arrays is

A_i = (1/m) sum_{j=1..m} e_ij.

The standard deviation of the expression level of the same gene is

Sigma_i = sqrt[ (1/m) sum_{j=1..m} (e_ij - A_i)^2 ].

The average expression level of genes in the jth array is

a_j = (1/N) sum_{i=1..N} e_ij,

and the standard deviation of the expression level values in the same array is

sigma_j = sqrt[ (1/N) sum_{i=1..N} (e_ij - a_j)^2 ].

The raw data files contain base 10 logarithmic values. A value of, e.g., .5 indicates the upregulation of a gene's expression level by a factor of 10^.5 = 3.16, and a value of -.7 means downregulation to the 10^-.7 = .2 part of the expression level.
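The conversion between the logged values and linear fold changes is simply a power of 10; a quick illustration (our own, not part of the data package):

```python
def fold_change(log10_ratio):
    """Linear fold change corresponding to a base-10 logged ratio."""
    return 10.0 ** log10_ratio

# a logged value of .5 is ~3.16-fold upregulation, and -.7 is
# downregulation to ~.2 of the original level
print(round(fold_change(0.5), 2), round(fold_change(-0.7), 2))  # prints: 3.16 0.2
```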
From the mathematical point of view, there are several possible scales that could be used for the analysis of these data sets. As an example, the linear data scale would mean using the value 3.16 instead of .5, and the value .2 instead of -.7. We decided to use the original, base 10 logarithmic data scale3 for three reasons. First, in most biological systems, where experimental data span several orders of magnitude, the data scale most readily applicable for description is the logarithmic scale. Second, in both raw data sets (the 63 control and the 87 perturbation subsets) the data are approximately centered around 0, which minimizes the accumulation rate of data error throughout the computational analysis shown below. Third, microarray

data are produced by measuring light intensities, for which the physically relevant data scale is again logarithmic.

It is important to note that the raw data matrices are not complete. First, in both downloaded data sets, fewer than m data points are available for many genes. Another discrepancy observed in the data files was that upregulation by more than a base 10 logarithmic ratio of 2 (i.e., more than 100-fold upregulation) is always indicated by +2, and downregulation by more than -2 (i.e., more than 100-fold downregulation) is always indicated by -2. In other words, the data set contains an experimental cutoff at +2 and -2. If a row contains a missing e_ij, an e_ij = +2 or an e_ij = -2 value, then the simplest approach is to remove this row (i.e., to remove this gene from the data set). An alternative approach is to treat each e_ij = +/-2 value as unknown, too, and to keep track of all unknown values throughout the analysis. For the 87-experiment perturbation data set, the first approach (i.e., the removal of genes for which fewer than 87 measured values are available) would discard more than one fourth of the complete transcriptome. As our aim was to carry out global analyses of transcriptomes, we decided to use the second approach (i.e., to keep all rows of the matrix and to keep track of missing values) throughout our analyses. However, in all cases, we removed repeated open reading frame (ORF) names and those ORF names that did not follow the common 'Y number' nomenclature used to designate individual yeast ORFs. Following this selection, the number of rows (i.e., individual genes) was 687 for the control data matrix, and 68 for the perturbation data set.

2. Stepwise correlation search method

Description of the stepwise correlation search algorithm

The stepwise correlation search technique in this paper uses full-transcriptome data sets as input, and searches the columns (i.e., the transcriptomes) of the data set for groups of expressed genes displaying similar expression patterns.
For each pair of compared transcriptomes, the groups of similarly expressed genes are allowed to be different. Since the suggested algorithm compares each pair of transcriptomes individually, the results are independent of the order in which the transcriptomes are listed. For the analyses shown in the paper (see Figs. 1-4), genes were listed in the alphabetical order of their open reading frame (ORF) names. We have also shown that the predicted similarity graph changes only slightly, and that the graph's statistical properties are identical, when the genes of the data set are randomly reordered. When searching for groups of similarly expressed genes in two transcriptomes, ideally, one should test all possible subsets of the N genes. Unfortunately, the number of all possible subsets of

size s in a set of N genes is the binomial coefficient (N choose s) = N! / (s!(N-s)!), which grows too rapidly with N to enable the testing of all subsets. However, in practice, the number of all co-regulated subsets of genes is usually far below this number. Here we introduce a method that reduces the computational time from the ideally necessary binomial to linear in N. The basic tool of the algorithm is a sliding transcriptome segment (i.e., a small group of sequentially listed genes) that is used to select a small number of genes and check whether they show a similar expression profile in the two transcriptomes being compared. To search for correlations among transcriptomes, we compare each pair of transcriptomes individually. For one transcriptome pair, we first find the list of genes with known expression level values in both transcriptomes (in the downloaded data files, we called a value known if it was not missing and was not +2 or -2). Next, we define a sliding segment with size s, and place this segment on the first s genes with known expression values in both transcriptomes. The two data sets to be compared are now the 1st, 2nd, ..., s-th gene expression level values of the first transcriptome and the 1st, 2nd, ..., s-th gene expression level values of the second transcriptome. We label these two sets (two vectors) by e_1 = {e_{1,1}, e_{1,2}, ..., e_{1,s}} and e_2 = {e_{2,1}, e_{2,2}, ..., e_{2,s}}, respectively. Next, we compute the mean values (m_1 and m_2) and standard deviations (sigma_1 and sigma_2) of these two vectors:

m_1 = (1/s) sum_{j=1..s} e_{1,j},    sigma_1 = sqrt[ (1/s) sum_{j=1..s} (e_{1,j} - m_1)^2 ],

and similarly for m_2 and sigma_2. For the measure of similarity, C_{1,2}, between the vectors e_1 and e_2 we used the absolute value of the (Pearson) correlation:

C_{1,2} = | (s sigma_1 sigma_2)^{-1} sum_{j=1..s} (e_{1,j} - m_1)(e_{2,j} - m_2) |.

We used the absolute value because two biological signals with the mathematical correlation -1 (changing in exactly the opposite way) are coupled with the same strength as two with the correlation +1 (changing in exactly the same way).
After saving the obtained value for C_{1,2}, we move the segment with a step size of s (in the paper, we used s=3); therefore, the second segment contained the (s+1)-th, (s+2)-th, ..., (2s)-th genes, the third segment contained the (2s+1)-th, (2s+2)-th, ..., (3s)-th genes, etc. Note that the segments cover the entire genome, but they do not overlap. We defined the similarity score of the two transcriptomes as the u-th (in the paper, u=1) largest C_{1,2} value obtained for the given two transcriptomes. Having computed the similarity score for each transcriptome pair (an m x m symmetrical matrix), we used a constant threshold, C, to decide which transcriptome pairs are coupled strongly enough (in the paper, C is varied between .8 and .9). If the similarity score for a given pair of transcriptomes was above C, then the two points of the graph corresponding to these two transcriptomes were connected. Fig. 1c in the paper shows the graph for the parameters s=3, u=1, C=.8. (Note that only transcriptomes with at least one connection are shown.)
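The scoring step described above can be condensed into a short routine. This is our own sketch, not the authors' code: NaN stands in for missing/cutoff values, and names and tie handling are assumptions.

```python
import numpy as np

def similarity_score(t1, t2, s=3, u=1):
    """Similarity score of two transcriptomes (expression vectors) by the
    stepwise correlation search: drop genes unknown in either vector,
    slide a non-overlapping window of s genes, compute the absolute
    Pearson correlation in each window, and return the u-th largest value."""
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)
    ok = ~np.isnan(t1) & ~np.isnan(t2)          # genes known in both
    x, y = t1[ok], t2[ok]
    scores = []
    for start in range(0, len(x) - s + 1, s):   # step size s: no overlap
        a, b = x[start:start + s], y[start:start + s]
        sa, sb = a.std(), b.std()
        if sa == 0.0 or sb == 0.0:              # flat window: skip
            continue
        c = abs(((a - a.mean()) * (b - b.mean())).mean() / (sa * sb))
        scores.append(c)
    scores.sort(reverse=True)
    return scores[u - 1] if len(scores) >= u else 0.0
```

Running this for every transcriptome pair fills the m x m score matrix, which is then thresholded at C to draw the similarity graph.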

The correlation search algorithm performs a subspace search in the space of transcriptome vectors independently for each pair of transcriptomes, with the aim of finding groups of similarly expressed genes in the two transcriptomes. Thus, we expect it to find response patterns displayed by a small number of genes under a small number of conditions more easily than methods targeted at the block-diagonalization of distance matrices derived from full-genome transcriptomes 4. Further, the correlation search algorithm is a direct subspace search method, as opposed to biclustering 5, which is an iterative algorithm that searches for local minima in the space of all possible submatrices by making short steps along the steepest descent. For data sets containing a high number of characteristic expression patterns extending over almost all genes used in the analysis, we expect the correlation search technique to be comparable in its results and speed to biclustering combined with a refined method for localizing minima, e.g., simulated annealing 6, 7. However, for the analysis of microarray data sets containing a small number of similarities displayed under a small number of conditions (e.g., localized transcriptional responses in large data sets), we expect the correlation search algorithm to be more suitable. In summary, the major strength of the stepwise correlation search method is that it compares each pair of transcriptomes individually and, for each transcriptome pair, it allows similar patterns to appear on different groups of expressed genes. Thus, it is able to detect a high variety of shared similarities among experiments.

Testing the stepwise correlation search technique by applying it to reordered versions of the perturbation data matrix

To analyze the effect of the order of genes on our results, we have performed four tests. In each test, the stepwise correlation search algorithm was applied to a modified version of the matrix, e, of the perturbation data set.
First, we intended to test the role of numerical artifacts in the predicted similarity graph shown in Fig. 1c of the paper, and scrambled the expression values in each transcriptome of the perturbation data set independently. In the resulting matrix, the e_ij values in any of the rows were expression values experimentally measured for different genes. Having performed the correlation search algorithm on this modified data set with unchanged parameter values (s=30, u=1 and C=0.8), we found that no pair of transcriptomes reached the similarity score C=0.8. The three highest similarity scores measured were C=0.59, C=0.50 and C=0.49 (see Web Fig. D). In comparison, the original data set yielded 16 pairs of transcriptomes with a similarity score above C=0.9. We conclude that the similarities detected by the correlation search method are not numerical artifacts, but genuine similarities between transcriptomes.
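The independent-scrambling control can be mimicked on a toy matrix (made-up data with rows as genes and columns as transcriptomes; not the actual perturbation data set):

```python
import numpy as np

# Toy matrix (made-up data): rows are genes, columns are transcriptomes.
rng = np.random.default_rng(1)
e = rng.normal(size=(300, 4))
e[:, 1] = e[:, 0] + 0.05 * rng.normal(size=300)   # two genuinely similar columns

scrambled = e.copy()
for j in range(scrambled.shape[1]):                     # scramble each transcriptome
    scrambled[:, j] = rng.permutation(scrambled[:, j])  # independently of the others

corr_real = abs(np.corrcoef(e[:, 0], e[:, 1])[0, 1])
corr_scr = abs(np.corrcoef(scrambled[:, 0], scrambled[:, 1])[0, 1])
assert corr_real > 0.9   # the genuine similarity is strong
assert corr_scr < 0.5    # scrambling reduces it to chance level
```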

In the second, third and fourth tests, we set out to analyze whether the results obtained by the stepwise correlation search method are influenced by the order of genes used in the data set. In the second test, we created another modified version of the perturbation data set by scrambling the expression values in each transcriptome identically. In the resulting matrix, each row still contained the expression values of a single gene; only the order of the genes was randomized. According to the analyses shown in Web Fig. F, the predicted similarity graph changes only slightly upon a random reordering of genes, and the statistical properties of the graph remain identical. In the third test (see Web Fig. G), we listed genes in the hierarchically clustered order. Since the hierarchical clustering technique groups similar expression values close to each other in each transcriptome, the significant differences between neighboring expression levels in each column of the microarray matrix are smoothed, and the resulting similarity scores are much weaker than before. Observe, however, that the main hubs of the graph predicted previously (see Fig. 1e of the paper) are still present, suggesting that the only effect of the hierarchical clustering of genes on the correlation search technique was a shift in the value of the similarity threshold, C. In the fourth test (see Web Fig. H), we listed genes in the descending order of their expression level variances. The similarity graph predicted for this case was again almost identical to the graph predicted for the original case (see Fig. 1e of the paper). In other words, the similarity graph found by the correlation search method represents not merely the effect of a few genes with large expression level changes, but rather the combined effect of genes with large, medium and small expression level variances forming a seamless continuum.

Testing the stepwise correlation search technique by allowing overlapping transcriptome segments

In Web Fig.
I, we used a slightly modified version of the correlation search technique, where transcriptome segments are allowed to overlap. When selecting two transcriptomes to be compared, the first segment is placed on the 1., 2., ..., s. genes, as before; however, the second segment contains the (t+1)., (t+2)., ..., (t+s). genes, the third segment contains the (2t+1)., (2t+2)., ..., (2t+s). genes, etc. Note that if t<s, then adjacent gene segments will overlap: the (t+1)., (t+2)., ..., s. genes will be contained by both the first and the second segments. In Web Fig. I we used s=60, t=15, u=1 and C=0.8, and found that transcriptomes strongly connected in the original graph (Fig. 1e of the paper) are usually strongly connected here, too. In addition, we have performed all the analyses shown in the paper and found that the statistical properties of the similarity graph remained identical.
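The only change relative to the non-overlapping version is that segment start positions advance by t instead of s. A minimal sketch of the window bookkeeping (illustrative helper window_starts; the values s=60 and t=15 follow the text as read from this copy):

```python
def window_starts(n_genes, s, t):
    """Start indices of length-s segments advancing by step t; with t < s,
    adjacent segments overlap in s - t genes."""
    return list(range(0, n_genes - s + 1, t))

starts = window_starts(100, s=60, t=15)
assert starts == [0, 15, 30]
# The first two windows, [0, 60) and [15, 75), share genes 15..59:
shared = set(range(0, 60)) & set(range(15, 75))
assert len(shared) == 60 - 15
```

Setting t = s recovers the non-overlapping scheme used in the main analysis.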

3. Analysis of the similarity graph provided by the stepwise correlation search method

Description of the random graph models used to test the structure of the predicted similarity graph

To test the structure of the similarity graph computed for a certain set of parameters, we compared it to three random graph models. In the uncorrelated random graph (also called the random graph 8), n_e edges connect randomly chosen pairs of the graph's n_v vertices. For the small-world graph 9, one starts with n_v vertices placed along the perimeter of a circle, connects each vertex to its z nearest neighbors, with z being the closest integer to 2 n_e / n_v, and then rewires a randomly chosen proportion, p_r, of all edges. (We used p_r = 0.1 everywhere.) For the scale-free graph 10, we performed the iteration mimicking the growth of the network for t time steps. During one time step, a new vertex was added with probability p_v, and a new edge was added with probability 1 - p_v. In the case of a new edge, the first vertex, i, to be connected was chosen randomly, and the probability, Π_j, of connecting vertex i to another vertex, j, was defined using the degree, k_j (i.e., the number of links), of vertex j as Π_j = k_j / Σ_l k_l, representing a linear preference for vertices with a higher number of connections. The number of edges and the number of vertices had to be identical to those in the data graph; thus, the two parameters of the model were t = n_e + n_v and p_v = n_v / (n_e + n_v).

Tools of the spectral analysis

Consider a simple graph with N vertices (nodes), i.e., a graph where none of the N vertices is connected to itself, and all connections are undirected and have the same weight, 1. The adjacency matrix of this graph is a symmetrical N x N square matrix, A, with A_ij = 1 if the ith and jth vertices of the graph are connected, and A_ij = 0 if they are not. The diagonal entries of the adjacency matrix are all zeroes: A_ii = 0 for each i = 1, 2, ..., N.
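A sketch of the scale-free growth model and of the adjacency-matrix representation just defined; the single seed edge used as the initial condition is an assumption, since the text does not specify one:

```python
import random
import numpy as np

def scale_free_graph(n_v, n_e, seed=0):
    """Grow a graph step by step: with probability p_v = n_v/(n_v + n_e) add
    a new vertex, otherwise add an edge i--j with i uniform and j drawn in
    proportion to its degree k_j (linear preferential attachment).
    Starts from a single seed edge (an assumed initial condition)."""
    rng = random.Random(seed)
    p_v = n_v / (n_v + n_e)
    vertices, edges, stubs = [0, 1], [(0, 1)], [0, 1]
    while len(vertices) + len(edges) < n_v + n_e:
        if rng.random() < p_v:
            vertices.append(len(vertices))   # new, initially isolated vertex
        else:
            i = rng.choice(vertices)
            j = rng.choice(stubs)            # probability proportional to degree
            while j == i:                    # keep the graph simple: no self-loops
                j = rng.choice(stubs)
            edges.append((i, j))
            stubs += [i, j]
    return vertices, edges

def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix with a zero diagonal."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

verts, edges = scale_free_graph(n_v=50, n_e=100)
A = adjacency_matrix(len(verts), edges)
assert len(verts) + len(edges) == 150   # the vertex + edge total is exact
assert np.all(A == A.T) and np.trace(A) == 0.0
```

In this sketch only the total number of vertices plus edges is matched exactly; the individual counts fluctuate around n_v and n_e, and any duplicate edges are collapsed by the 0/1 adjacency matrix.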
The spectrum of the graph is the set of eigenvalues of the graph's adjacency matrix. For a simple graph, the eigenvalues are all real numbers and the eigenvectors can all be chosen to have real components. According to a recent study 11, the eigenvalues of a graph and the inverse participation ratios of the graph's eigenvectors are well suited for the structural analysis of the graph. The eigenvalues and eigenvectors of a graph are the eigenvalues and eigenvectors of the graph's adjacency matrix, A. The largest eigenvalue of a graph is called the graph's first eigenvalue, and the first eigenvector is the eigenvector belonging to the first eigenvalue. The inverse participation ratio of a normalized eigenvector is the sum of the fourth powers of the eigenvector's components. If N is the number of

components in the eigenvector e_j, and e_{j,k} is the kth component of e_j, then the inverse participation ratio, I_j, of this eigenvector is

I_j = Σ_{k=1}^{N} (e_{j,k})^4.

Since each eigenvector is normalized, i.e., Σ_{k=1}^{N} (e_{j,k})^2 = 1, the inverse participation ratio can be used to measure the number of those components of e_j that are significantly different from 0. If N-1 components of e_j are zeroes and only one differs from zero (i.e., it is equal to 1), then I_j will be 1. On the other hand, if all components of e_j are different from zero, e.g., they are all equal to 1/sqrt(N), then I_j will be 1/N. Note that any eigenvector can be treated as a set of N numbers, with the ith component of the eigenvector being written on the ith vertex of the graph. Thus, the inverse participation ratio of a given eigenvector can be used to determine whether that eigenvector is localized on a small number of vertices of the graph or not localized at all. If the inverse participation ratio of an eigenvector is high (close to 1), then the eigenvector is localized; if it is low (close to 1/N), then the eigenvector is non-localized. If an eigenvector is localized on a small number of vertices, then only those few vertices are significant for the eigenvalue of that eigenvector. On the other hand, a non-localized eigenvector shows that all vertices of the graph have approximately the same significance in determining the eigenvalue of that eigenvector.

Quality of fit in the spectral comparison of the data graph and the test graphs

In Fig. 2b-d of the paper, we plot the inverse participation ratio as a function of the corresponding eigenvalue for the similarity graph and the three test graphs: the uncorrelated random graph 8, the small-world graph 9, and the scale-free graph 10. To analyze how well the inverse participation ratio vs. eigenvalue function computed for a test graph fits the same function obtained for the similarity graph, we used the following two quantities.
X = ln(i (data) (test 1 / I ) 1 ) compares the inverse participation ratios of the data graph s and the test graph s first eigenvectors, i.e., it compares the level of structural dominance of the most highly connected vertices in the two graphs. N { j=1[ ] } / N λ (data) j =1 j Y = λ j (data) λ j (test ) ( ) compares the test graph s eigenvalues to those of the data graph. For both quantities, the lowest scores indicating the best agreement between data and test is given by the scale-free model. With the parameters used for Fig. of the manuscript, the uncorrelated random test graph gives X=.8 and Y=15., for the small-world test graph X=1.9 and Y=16.14, and for the scale-free test graph X=.5 and Y=6.5. Testing the structure of the similarity graph by using alternative parameter sets To analyze how the empirical parameters s, u and C affect the structural changes of the similarity 3

graph, we have computed the similarity graph for a broad range of parameters. For each similarity graph, we used three test graphs (with the closest possible numbers of vertices and edges), and for each test graph we computed the values of X and Y characterizing the quality of fit in the spectral comparison. Consider a 3-dimensional parameter space with the coordinates s, u and C. Starting from the point s=30, u=1, C=0.8, we scanned the parameter space in all three directions. First (see Web Fig. E a, d), we constructed the similarity graph and its test graphs with s varied between 13 and 50, and u=1, C=0.8. Next (Web Fig. E b, e), the similarity graph and its three test graphs were analyzed with u varied between 3 and 20, and s=30, C=0.8. Finally (Web Fig. E c, f), we examined the results when the parameter C was varied between 0.6 and 0.95, while s=30 and u=1 were kept constant. For each investigated point of the parameter space, the spectral comparison of the three test graphs with the data graph is shown in Web Fig. E. All subfigures analyze the quality of fit (see above) of the inverse participation ratio vs. eigenvalue plot of the graph. The first and second rows of subfigures in Web Fig. E compare the quality of fit via the values of X and Y computed for the test graphs, respectively. In all cases, a lower score means closer agreement between the structural properties of the test graph and the data graph.
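The spectral quantities used in this comparison (the spectrum, the inverse participation ratios, and the fit scores X and Y) can be sketched as follows; the function names are illustrative, and Y is computed as the mean squared difference between the sorted eigenvalue lists, as read from this copy:

```python
import numpy as np

def spectrum_and_ipr(adj):
    """Eigenvalues of the symmetric adjacency matrix and the inverse
    participation ratio I_j = sum_k (e_{j,k})^4 of each eigenvector
    (numpy's eigh returns normalized eigenvectors as columns)."""
    vals, vecs = np.linalg.eigh(np.asarray(adj, dtype=float))
    return vals, np.sum(vecs ** 4, axis=0)

def fit_quality(ipr1_data, ipr1_test, vals_data, vals_test):
    """X = |ln(I_1_data / I_1_test)|; Y is taken as the mean squared
    difference between the sorted eigenvalue lists."""
    X = abs(np.log(ipr1_data / ipr1_test))
    diff = np.sort(vals_data) - np.sort(vals_test)
    return X, float(np.mean(diff ** 2))

# Complete graph on 4 vertices: the largest eigenvalue is 3, and its
# eigenvector has all components equal to 1/sqrt(4), i.e., it is fully
# delocalized with IPR = 1/N.
K4 = np.ones((4, 4)) - np.eye(4)
vals, ipr = spectrum_and_ipr(K4)
first = np.argmax(vals)
assert np.isclose(vals[first], 3.0) and np.isclose(ipr[first], 0.25)

# A graph compared against itself fits perfectly: X = Y = 0.
X, Y = fit_quality(ipr[first], ipr[first], vals, vals)
assert X == 0.0 and Y == 0.0
```

As in the text, lower values of X and Y indicate closer structural agreement between the test graph and the data graph.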


More information

Manifold Regularization

Manifold Regularization 9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,

More information

Application of random matrix theory to microarray data for discovering functional gene modules

Application of random matrix theory to microarray data for discovering functional gene modules Application of random matrix theory to microarray data for discovering functional gene modules Feng Luo, 1 Jianxin Zhong, 2,3, * Yunfeng Yang, 4 and Jizhong Zhou 4,5, 1 Department of Computer Science,

More information

Monte Carlo. Lecture 15 4/9/18. Harvard SEAS AP 275 Atomistic Modeling of Materials Boris Kozinsky

Monte Carlo. Lecture 15 4/9/18. Harvard SEAS AP 275 Atomistic Modeling of Materials Boris Kozinsky Monte Carlo Lecture 15 4/9/18 1 Sampling with dynamics In Molecular Dynamics we simulate evolution of a system over time according to Newton s equations, conserving energy Averages (thermodynamic properties)

More information

UNIVERSITY OF NORTH CAROLINA CHARLOTTE 1995 HIGH SCHOOL MATHEMATICS CONTEST March 13, 1995 (C) 10 3 (D) = 1011 (10 1) 9

UNIVERSITY OF NORTH CAROLINA CHARLOTTE 1995 HIGH SCHOOL MATHEMATICS CONTEST March 13, 1995 (C) 10 3 (D) = 1011 (10 1) 9 UNIVERSITY OF NORTH CAROLINA CHARLOTTE 5 HIGH SCHOOL MATHEMATICS CONTEST March, 5. 0 2 0 = (A) (B) 0 (C) 0 (D) 0 (E) 0 (E) 0 2 0 = 0 (0 ) = 0 2. If z = x, what are all the values of y for which (x + y)

More information

Detecting temporal protein complexes from dynamic protein-protein interaction networks

Detecting temporal protein complexes from dynamic protein-protein interaction networks Detecting temporal protein complexes from dynamic protein-protein interaction networks Le Ou-Yang, Dao-Qing Dai, Xiao-Li Li, Min Wu, Xiao-Fei Zhang and Peng Yang 1 Supplementary Table Table S1: Comparative

More information

networks in molecular biology Wolfgang Huber

networks in molecular biology Wolfgang Huber networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors

On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors Ali Jadbabaie Department of Electrical and Systems Engineering University of Pennsylvania Philadelphia, PA 19104 jadbabai@seas.upenn.edu

More information

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks Shen HW, Cheng XQ, Wang YZ et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(2): 341 357 Mar. 2012.

More information

9. Nuclear Magnetic Resonance

9. Nuclear Magnetic Resonance 9. Nuclear Magnetic Resonance Nuclear Magnetic Resonance (NMR) is a method that can be used to find structures of proteins. NMR spectroscopy is the observation of spins of atoms and electrons in a molecule

More information

Communities, Spectral Clustering, and Random Walks

Communities, Spectral Clustering, and Random Walks Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information

Supplementary Information

Supplementary Information Supplementary Information 1 List of Figures 1 Models of circular chromosomes. 2 Distribution of distances between core genes in Escherichia coli K12, arc based model. 3 Distribution of distances between

More information

Relational Nonlinear FIR Filters. Ronald K. Pearson

Relational Nonlinear FIR Filters. Ronald K. Pearson Relational Nonlinear FIR Filters Ronald K. Pearson Daniel Baugh Institute for Functional Genomics and Computational Biology Thomas Jefferson University Philadelphia, PA Moncef Gabbouj Institute of Signal

More information

Weighted gene co-expression analysis. Yuehua Cui June 7, 2013

Weighted gene co-expression analysis. Yuehua Cui June 7, 2013 Weighted gene co-expression analysis Yuehua Cui June 7, 2013 Weighted gene co-expression network (WGCNA) A type of scale-free network: A scale-free network is a network whose degree distribution follows

More information

Clustering compiled by Alvin Wan from Professor Benjamin Recht s lecture, Samaneh s discussion

Clustering compiled by Alvin Wan from Professor Benjamin Recht s lecture, Samaneh s discussion Clustering compiled by Alvin Wan from Professor Benjamin Recht s lecture, Samaneh s discussion 1 Overview With clustering, we have several key motivations: archetypes (factor analysis) segmentation hierarchy

More information

x y = 1, 2x y + z = 2, and 3w + x + y + 2z = 0

x y = 1, 2x y + z = 2, and 3w + x + y + 2z = 0 Section. Systems of Linear Equations The equations x + 3 y =, x y + z =, and 3w + x + y + z = 0 have a common feature: each describes a geometric shape that is linear. Upon rewriting the first equation

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms : Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer

More information

Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability. COMPSTAT 2010 Paris, August 23, 2010

Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability. COMPSTAT 2010 Paris, August 23, 2010 Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu COMPSTAT

More information

Singular value decomposition for genome-wide expression data processing and modeling. Presented by Jing Qiu

Singular value decomposition for genome-wide expression data processing and modeling. Presented by Jing Qiu Singular value decomposition for genome-wide expression data processing and modeling Presented by Jing Qiu April 23, 2002 Outline Biological Background Mathematical Framework:Singular Value Decomposition

More information

Correlation Networks

Correlation Networks QuickTime decompressor and a are needed to see this picture. Correlation Networks Analysis of Biological Networks April 24, 2010 Correlation Networks - Analysis of Biological Networks 1 Review We have

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Using DERIVE to Interpret an Algorithmic Method for Finding Hamiltonian Circuits (and Rooted Paths) in Network Graphs

Using DERIVE to Interpret an Algorithmic Method for Finding Hamiltonian Circuits (and Rooted Paths) in Network Graphs Liverpool John Moores University, July 12 15, 2000 Using DERIVE to Interpret an Algorithmic Method for Finding Hamiltonian Circuits (and Rooted Paths) in Network Graphs Introduction Peter Schofield Trinity

More information

A sequence of triangle-free pseudorandom graphs

A sequence of triangle-free pseudorandom graphs A sequence of triangle-free pseudorandom graphs David Conlon Abstract A construction of Alon yields a sequence of highly pseudorandom triangle-free graphs with edge density significantly higher than one

More information

EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST

EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST TIAN ZHENG, SHAW-HWA LO DEPARTMENT OF STATISTICS, COLUMBIA UNIVERSITY Abstract. In

More information

Principal Component Analysis, A Powerful Scoring Technique

Principal Component Analysis, A Powerful Scoring Technique Principal Component Analysis, A Powerful Scoring Technique George C. J. Fernandez, University of Nevada - Reno, Reno NV 89557 ABSTRACT Data mining is a collection of analytical techniques to uncover new

More information

Grouping of correlated feature vectors using treelets

Grouping of correlated feature vectors using treelets Grouping of correlated feature vectors using treelets Jing Xiang Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213 jingx@cs.cmu.edu Abstract In many applications, features

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Computational statistics

Computational statistics Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f

More information

7.06 Problem Set #4, Spring 2005

7.06 Problem Set #4, Spring 2005 7.06 Problem Set #4, Spring 2005 1. You re doing a mutant hunt in S. cerevisiae (budding yeast), looking for temperaturesensitive mutants that are defective in the cell cycle. You discover a mutant strain

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Machine Learning - MT Clustering

Machine Learning - MT Clustering Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:

More information

Interaction Network Analysis

Interaction Network Analysis CSI/BIF 5330 Interaction etwork Analsis Young-Rae Cho Associate Professor Department of Computer Science Balor Universit Biological etworks Definition Maps of biochemical reactions, interactions, regulations

More information

arxiv: v1 [q-bio.mn] 7 Nov 2018

arxiv: v1 [q-bio.mn] 7 Nov 2018 Role of self-loop in cell-cycle network of budding yeast Shu-ichi Kinoshita a, Hiroaki S. Yamada b a Department of Mathematical Engineering, Faculty of Engeneering, Musashino University, -- Ariake Koutou-ku,

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016 A Random Dot Product Model for Weighted Networks arxiv:1611.02530v1 [stat.ap] 8 Nov 2016 Daryl R. DeFord 1 Daniel N. Rockmore 1,2,3 1 Department of Mathematics, Dartmouth College, Hanover, NH, USA 03755

More information

Hotspots and Causal Inference For Yeast Data

Hotspots and Causal Inference For Yeast Data Hotspots and Causal Inference For Yeast Data Elias Chaibub Neto and Brian S Yandell October 24, 2012 Here we reproduce the analysis of the budding yeast genetical genomics data-set presented in Chaibub

More information

Towards Detecting Protein Complexes from Protein Interaction Data

Towards Detecting Protein Complexes from Protein Interaction Data Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,

More information

EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES

EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES Charless Fowlkes 1, Qun Shan 2, Serge Belongie 3, and Jitendra Malik 1 Departments of Computer Science 1 and Molecular Cell Biology 2, University

More information

Singular Value Decomposition and Principal Component Analysis (PCA) I

Singular Value Decomposition and Principal Component Analysis (PCA) I Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression

More information