Using networks to measure similarity between genes: association index selection

Juan I Fuxman Bass 1,2, Alos Diallo 1,2, Justin Nelson 3, Juan M Soto 1,2, Chad L Myers 3 & Albertha J M Walhout 1,2

© 2013 Nature America, Inc. All rights reserved.

Biological networks can be used to functionally annotate genes on the basis of interaction-profile similarities. Metrics known as association indices can be used to quantify interaction-profile similarity. We provide an overview of commonly used association indices, including the Jaccard index and the Pearson correlation coefficient, and compare their performance in different types of analyses of biological networks. We introduce the Guide for Association Index for Networks (GAIN), a web tool for calculating and comparing interaction-profile similarities and defining modules of genes with similar profiles.

Biological processes are orchestrated through complex interaction networks. Networks are modeled as graphs that depict interactions ('edges') between biological entities such as genes, tissues, proteins and metabolites ('nodes'; see Box 1). If only one type of node is involved, as in protein-protein (refs. 1,2) or genetic interaction networks (ref. 3), the graph is defined as monopartite. Bipartite graphs, by contrast, describe interactions between two different types of nodes (X-type and Y-type), with edges connecting only nodes of different types (Fig. 1a). Bipartite graphs include protein-DNA interaction networks (refs. 4-6), metabolic networks (refs. 7,8), phenotypic networks (ref. 9) and expression networks (refs. 10-14).

Networks are powerful tools for gene function annotation. For instance, the guilt-by-association principle postulates that if a node with an unknown function has an interaction profile similar to that of a node with a known function, its function may be similar as well (refs. 2,15). Additionally, network analysis can identify modules, that is, neighborhoods comprising nodes with similar interaction profiles, that can point to functional relationships between larger sets of genes (refs. 16,17).
Although seemingly intuitive, it is not trivial to know how to best capture interaction-profile similarity between nodes, as numerous metrics, or association indices, can be used, and because each index can provide different values and rank similarity between pairs of nodes in a different order. Here, we provide an overview of commonly used association indices. We discuss the differences and similarities between association indices and provide a set of guidelines and a web tool for their selection for different applications.

1 Program in Systems Biology, University of Massachusetts Medical School, Worcester, Massachusetts, USA. 2 Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA. 3 Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA. Correspondence should be addressed to J.I.F.B. (juan.fuxmanbass@umassmed.edu) or A.J.M.W. (marian.walhout@umassmed.edu).
Received 14 January; accepted 22 July; published online 26 November 2013; corrected after print 27 January 2014; doi:10.1038/nmeth.2728
Nature Methods, vol. 10, no. 12, December 2013

Box 1 | Glossary of terms
A graph is a pair G = (N, E) comprising a set N of nodes connected by a set E of edges.
The degree of a node A (|N(A)|) is defined as the number of nodes with which it interacts.
Hubs are nodes with a disproportionately high degree.
A module is a set of highly interconnected nodes.
A monopartite graph contains only one type of node.
A bipartite graph contains two types of nodes (X-type and Y-type nodes), and connections occur only between nodes of different type.
An association index is a measure that quantifies interaction-profile similarity.
An association network is a network in which two nodes of the same type (for example, only X-type nodes) are connected by an edge if their similarity exceeds a selected threshold.

Types of association indices
We focus here on bipartite networks that connect X-type nodes to Y-type nodes (Fig. 1a). In these networks, association indices can be used to measure shared Y-type nodes between two X-type nodes, or vice versa. An association index can measure interaction-profile similarity between X-type nodes A and B by calculating the shared partners (|N(A) ∩ N(B)|), in relation to their total number of interactions ('node degree'), defined as |N(A)| and |N(B)|, and the total number of Y-type nodes in the network (n_y) (Fig. 1a). There are three main types of indices, each of which uses the variables mentioned above in a different way (see Box 2).

Similarity indices reflect the proportion of overlap and consider only the number of shared interactions between two X-type nodes and the individual degrees of these nodes, but they do not take the total number of Y-type nodes in the network into account. There are many similarity indices, most of which scale interaction-profile similarity between 0 and 1 (ref. 18) (Supplementary Table 1). We will focus on four that are commonly used in genomics and systems biology (see Box 2). The Jaccard index calculates the proportion of Y-type nodes shared between two X-type nodes relative to the total number of Y-type nodes connected to either X-type node. The Simpson index (equal to the meet/min index (ref. 19) and similar to the topological overlap coefficient (ref. 16)) considers the number of shared Y-type nodes relative to the smallest degree of either X-type node. The geometric index calculates the square of the number of shared interactions between two X-type nodes, divided by the product of their individual degrees. Finally, the cosine index corresponds to the square root of the geometric index.

Unlike similarity indices, matching indices, such as the simple matching coefficient and the Hamann index (Supplementary Table 1), consider the proportion of shared Y-type nodes as well as Y-type nodes that are not connected to either of the two X-type nodes. Because biological networks are sparse, shared nonpartners can contribute more to the similarity between two nodes than shared partners. Therefore, matching indices are not appropriate for the analysis of most biological networks and will not be discussed further.
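The sparsity argument against matching indices can be illustrated with a small sketch. It assumes the usual definition of the simple matching coefficient (shared partners plus shared nonpartners, divided by n_y), which the text does not spell out, and a made-up network of 1,000 Y-type nodes; the function names are illustrative only.

```python
def simple_matching(a, b, n_y):
    """Simple matching coefficient (assumed definition):
    (shared partners + shared nonpartners) / total number of Y-type nodes."""
    shared_partners = len(a & b)
    shared_nonpartners = n_y - len(a | b)
    return (shared_partners + shared_nonpartners) / n_y


def jaccard(a, b):
    """Jaccard index: shared partners / partners of either node."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sparse network: 1,000 Y-type nodes, two X-type nodes with
# five partners each and no partner in common.
n_y = 1000
a = set(range(0, 5))
b = set(range(5, 10))

print(simple_matching(a, b, n_y))  # 0.99, driven entirely by shared nonpartners
print(jaccard(a, b))               # 0.0, no shared partners
```

Under these assumptions the two nodes look almost identical to a matching index although they share nothing, which is the behavior the paragraph above warns about.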
Figure 1 | Measuring interaction-profile similarity between two nodes using association indices. (a) Bipartite graphs connect two types of nodes: X-type (purple) and Y-type (yellow). The interaction-profile similarity between a pair of X-type nodes (A and B) is determined on the basis of the number of shared Y-type nodes, the total number of Y-type nodes connected to A and B, and the total number of Y-type nodes in the network. (b) Association index comparison. For each pair of X-type nodes, the Jaccard, Simpson, geometric, cosine and hypergeometric indices and PCC were calculated on the basis of their interactions with Y-type nodes. (c) CSI calculation between nodes A and B for a bipartite network involving six X-type nodes (purple) and seven Y-type nodes (yellow). For each pair of X-type nodes the PCC was calculated (blue, positive values; red, negative values). In the association network all the edges connected to A or B are highlighted. CSI_AB represents the fraction of X-type nodes connected to both A and B, with PCC < PCC_AB − 0.05. The CSI was also calculated between A and C.

Values shown in panel b:
Example 1: Jaccard = 0.333; Simpson = 1; geometric = 0.333; cosine = 0.577; hypergeometric = 0.146; PCC = 0.258
Example 2: Jaccard = 0.333; Simpson = 0.5; geometric = 0.25; cosine = 0.5; hypergeometric = 0.053; PCC = −0.167
Example 3: Jaccard = 0.667; Simpson = 1; geometric = 0.667; cosine = 0.816; hypergeometric = 0.845; PCC = 0.730
Values shown in panel c: CSI_AB = 3/6; CSI_AC = 1/6

Box 2 | Definitions of association indices

The Jaccard index is the proportion of shared nodes between A and B relative to the total number of nodes connected to A or B.

\[ J_{AB} = \frac{|N(A) \cap N(B)|}{|N(A) \cup N(B)|} \]

The Simpson index is the proportion of shared nodes relative to the degree of the least-connected node.

\[ S_{AB} = \frac{|N(A) \cap N(B)|}{\min(|N(A)|, |N(B)|)} \]

The geometric index corresponds to the product of the proportion of shared nodes between A and B.

\[ G_{AB} = \frac{|N(A) \cap N(B)|^{2}}{|N(A)| \cdot |N(B)|} \]

The cosine index is the geometric mean of the proportions of shared nodes between A and B.

\[ C_{AB} = \frac{|N(A) \cap N(B)|}{\sqrt{|N(A)| \cdot |N(B)|}} \]

The Pearson correlation coefficient is the correlation between the interaction profiles of A and B.

\[ PCC_{AB} = \frac{|N(A) \cap N(B)| \cdot n_y - |N(A)| \cdot |N(B)|}{\sqrt{|N(A)| \cdot |N(B)| \cdot (n_y - |N(A)|) \cdot (n_y - |N(B)|)}} \]

The hypergeometric index is the log-transformed probability of having an equal or greater interaction overlap than the one observed between A and B.

\[ H_{AB} = -\log \sum_{i = |N(A) \cap N(B)|}^{\min(|N(A)|, |N(B)|)} \frac{\binom{|N(A)|}{i} \binom{n_y - |N(A)|}{|N(B)| - i}}{\binom{n_y}{|N(B)|}} \]

The connection specificity index (CSI) is defined as the fraction of X-type nodes that have an interaction profile similarity with A and B that is lower than the interaction profile similarity between A and B itself.

\[ CSI_{AB} = 1 - \frac{\#\ \text{nodes connected to A or B with}\ PCC \ge PCC_{AB} - 0.05}{\#\ \text{of X-type nodes in the network}} = \frac{\#\ \text{nodes connected to A and B with}\ PCC < PCC_{AB} - 0.05}{\#\ \text{of X-type nodes in the network}} \]

Statistic-based indices employ probability distributions (such as chi-square and Fisher's exact test) to determine the likelihood of observing a certain overlap between the interaction profiles of two X-type nodes given their degree and the total number of Y-type nodes in the network (ref. 18) (Supplementary Table 1). We will discuss two of the most commonly used statistic-based indices. The Pearson correlation coefficient (PCC) was originally developed to measure the linear relationship between two continuous variables, such as protein and mRNA levels. This metric can also be applied to bipartite networks where interactions are either present or absent. The PCC provides a value between −1 and 1 that describes how well the interactions overlap. A PCC of 1 indicates perfect overlap, 0 corresponds to the number of shared interactions expected by chance and −1 depicts perfect anticorrelation.
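The four similarity indices and the PCC defined in Box 2 depend only on the two node degrees, the overlap and n_y, so they can be computed from four counts. The following is a minimal sketch under that reading; the function name and dictionary keys are illustrative, not part of GAIN. The example uses the counts given for nodes A and B in Figure 1c (degrees 3 and 1, one shared partner, seven Y-type nodes), for which the text reports a PCC of 0.47.

```python
from math import sqrt


def association_indices(deg_a, deg_b, shared, n_y):
    """Similarity indices and PCC from Box 2, computed from counts.

    Assumes both degrees are at least 1 and smaller than n_y; otherwise
    the Simpson, geometric, cosine or PCC denominators become zero.
    """
    union = deg_a + deg_b - shared
    product = deg_a * deg_b
    return {
        "jaccard": shared / union,
        "simpson": shared / min(deg_a, deg_b),
        "geometric": shared ** 2 / product,
        "cosine": shared / sqrt(product),
        "pcc": (shared * n_y - product)
        / sqrt(product * (n_y - deg_a) * (n_y - deg_b)),
    }


# Nodes A and B of Figure 1c: degrees 3 and 1, one shared partner, n_y = 7.
values = association_indices(deg_a=3, deg_b=1, shared=1, n_y=7)
print({name: round(value, 3) for name, value in values.items()})
# {'jaccard': 0.333, 'simpson': 1.0, 'geometric': 0.333, 'cosine': 0.577, 'pcc': 0.471}
```

Working from counts rather than from adjacency lists keeps the sketch short; with real data the counts would come from set intersections of each node's partner list.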
The hypergeometric index calculates the log-transformed probability of observing an equal or greater number of shared nodes by chance and, therefore, measures the significance rather than the magnitude of the overlap.

Figure 2 | GAIN web tool for the calculation and clustering of association indices. (a) Screenshot of GAIN's main window. (b,c) Visualization of a bipartite network as an interaction heat map (b) or as a graph (c). (d,e) Clustered association index displayed as a heat map (d) and an association network (e). (f) Density plot comparing the distribution of the association index values between a selected set of node pairs (blue) and all possible pairs of nodes (red).

Comparing association indices
Different association indices can provide different values of interaction-profile similarity. We illustrate this using three small example networks in which two X-type nodes, A and B, share different numbers of Y-type nodes, out of a total of seven (Fig. 1b). In each example, different indices can provide different values, ranging from perfect similarity (Simpson = 1 in example 1) to low similarity (hypergeometric = 0.146 in example 1). Further, different indices can rank the interaction-profile similarity between a pair of nodes in different orders. For instance, according to most indices the profiles of A and B are most similar in example 3, but the Simpson index ranks examples 1 and 3 as equally similar. Finally, even for pairs of nodes that have different overlap and/or node degree, an index may output identical values as it condenses four variables (overlap, degree of A, degree of B and n_y) into a single number. For instance, the Jaccard index cannot discriminate between examples 1 and 2 (Jaccard = 0.333), in which the total number of edges is differently distributed between A and B, whereas the other indices can.
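The hypergeometric index for the three examples can be reproduced with a short sketch. Two things here are assumptions rather than statements from the text: the logarithm is taken to base 10, and the degree and overlap counts for each example are one set of counts (with n_y = 7) that reproduces the six values reported for that example in Figure 1b. The function name is illustrative.

```python
from math import comb, log10


def hypergeometric_index(deg_a, deg_b, shared, n_y):
    """-log10 of the probability of an overlap at least as large as `shared`
    when deg_b partners are drawn from n_y Y-type nodes, deg_a of which
    are partners of A (upper tail of the hypergeometric distribution)."""
    tail = sum(
        comb(deg_a, i) * comb(n_y - deg_a, deg_b - i)
        for i in range(shared, min(deg_a, deg_b) + 1)
    )
    return -log10(tail / comb(n_y, deg_b))


# Assumed counts (degree of A, degree of B, shared partners), inferred from
# the reported index values rather than read from the figure.
examples = {"example 1": (6, 2, 2), "example 2": (4, 4, 2), "example 3": (3, 2, 2)}

for name, (deg_a, deg_b, shared) in examples.items():
    print(name, round(hypergeometric_index(deg_a, deg_b, shared, n_y=7), 3))
# example 1 0.146
# example 2 0.053
# example 3 0.845
```

With these counts, examples 1 and 2 share the same Jaccard value (2/6) but differ in the hypergeometric index, in line with the point that a single number condenses four variables.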
Nonspecific interactions can drive similarity
The indices mentioned above consider the similarity in interacting partners between two X-type nodes but not the interaction specificity. Two issues need to be considered. First, Y-type hubs may confer artificially high levels of interaction-profile similarity: if half of all X-type nodes bind a Y-type hub, this overlap is not very informative. Second, not all Y-type nodes are independent, which may also confer exaggerated levels of interaction-profile similarity. For instance, neurons can be classified into different categories on the basis of the tissues in which they are located. Different types of neurons express common genes. Thus, in a gene-to-tissue network where genes are connected to the tissues in which they are expressed, neuronal genes may be connected to many classes of neurons, artificially increasing their similarity.

The connection specificity index (CSI) provides a context-dependent measure that mitigates the effect of nonspecific interactions by ranking the significance of similarity between two X-type nodes according to the specificity of their shared interaction partners (ref. 9).
The CSI between two nodes A and B is defined as the fraction of X-type nodes that have an interaction-profile similarity with A and B that is lower than the interaction-profile similarity between A and B themselves (see Box 2). As originally defined, the CSI employs the PCC as a first-level association index to rank the similarity between nodes, and then uses a constant of 0.05 to define the lower boundary of interaction-profile similarity (ref. 9). When the constant is increased, the CSI provides a more stringent measure. Other association indices may also be used for first-level ranking of interaction-profile similarity.

Figure 1c illustrates an example in which the CSI reduces the influence of hubs. In this network, A and B interact with three and one Y-type nodes, respectively, and share one Y-type node, resulting in PCC_AB = 0.47 (Fig. 1c). A and C also share one interaction partner and therefore PCC_AC = 0.47 as well. However, many other X-type nodes interact with the Y-type node connected to both A and C, hence this shared interaction is less specific. Applying CSI to these networks alleviates this problem: when a constant of 0.05 is used, CSI_AB = 0.50, whereas CSI_AC = 0.17 (Fig. 1c).
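The CSI definition can be sketched in a few lines. This is one reading of Box 2, not the authors' implementation: every X-type node, including A and B themselves (whose self-PCC is 1), counts toward the denominator, and a node contributes to the numerator only if its PCC with A and its PCC with B both fall below PCC_AB minus the constant. The network below is made up for illustration and is not the one drawn in Figure 1c.

```python
from math import sqrt


def pcc(a, b, n_y):
    """PCC between two sets of Y-type partners (Box 2 form).
    Assumes each set has between 1 and n_y - 1 members."""
    shared = len(a & b)
    deg_a, deg_b = len(a), len(b)
    return (shared * n_y - deg_a * deg_b) / sqrt(
        deg_a * deg_b * (n_y - deg_a) * (n_y - deg_b)
    )


def csi(network, a, b, n_y, constant=0.05):
    """Fraction of X-type nodes whose PCC with both a and b is below
    PCC(a, b) - constant."""
    threshold = pcc(network[a], network[b], n_y) - constant
    below = sum(
        1
        for profile in network.values()
        if pcc(profile, network[a], n_y) < threshold
        and pcc(profile, network[b], n_y) < threshold
    )
    return below / len(network)


# Hypothetical network: six X-type nodes, seven Y-type nodes (1..7).
network = {"A": {1, 2}, "B": {1, 2}, "C": {3}, "D": {4}, "E": {5, 6}, "F": {7}}
n_y = 7

print(round(csi(network, "A", "B", n_y), 3))  # 0.667
print(csi(network, "A", "C", n_y))            # 0.0
```

Because A and B are always at least as similar to themselves as to each other, the largest value this version can return is (n − 2)/n for n X-type nodes, which matches the sixths seen in Figure 1c.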

GAIN: a web tool for association indices and clustering
We developed GAIN (http://csbio.cs.umn.edu/similarity_index/login.php) (Fig. 2 and Supplementary Methods), which allows a user to upload an interaction data set and perform several tasks. First, interactions can be visualized as a heat map (Fig. 2b) or a graph (Fig. 2c). Second, GAIN allows the user to find modules by calculating all pairwise values with a user-selected association index followed by hierarchical clustering and by displaying a heat map (Fig. 2d) or an association network (Fig. 2e). Association networks contain one node type connected by an edge only when their interaction-profile similarity exceeds a user-selected threshold. Finally, GAIN can display a density plot to determine whether an association index can discriminate the interaction-profile similarity of a particular set of node pairs selected by the user from all node pairs (Fig. 2f). For instance, in a gene regulatory network this can be used to determine whether pairs of highly homologous transcription factors have more similar interaction profiles than all possible pairs of transcription factors.

Figure 3 | Using association indices to identify modules in a gene-to-phenotype network. (a) Clustered association index heat maps for the C. elegans gene-to-phenotype network (top). The association index was calculated for each pair of genes according to shared phenotypic features and then clustered using hierarchical clustering. The distribution of index values for gene pairs that belong to the same module (intramodule, red) is plotted (bottom) against the values of gene pairs that belong to different modules (intermodule, black). The area under the receiver operating characteristic curve (AUC) measures the separation between the two distributions. (b) The association networks shown were assembled by linking genes that have a top 1% phenotypic profile similarity value using the indicated indices. A force-directed layout was generated by Cytoscape 2.8.1. The colors of the nodes represent manual partitioning of the genes in modules (ref. 9). (c) Pairwise comparison of the percentage of differences in the edges included in each of the association networks. Hyperg., hypergeometric.

Figure 4 | Comparing association indices in the C. elegans gene-to-phenotype network. (a,b) Pairwise association index values for all genes according to shared phenotypic features for the Simpson index (a) and CSI (b), plotted against the values determined with the other indices. (c,d) The interaction-profile similarity between pl-1 and let-6 (belonging to different modules) (c), pl-1 and perm-3 (belonging to different modules) and between C47G2.3 and F571.1 (belonging to the same module) (d) were determined for all the association indices. The ranking of interaction-profile similarity across the entire network (in the top x% values from most to least similar) is indicated (right). Yellow nodes indicate phenotypes. Phenotype hubs (connected to more than 4% of the genes) are indicated with a blue outline. Hyperg., hypergeometric.

Finding network modules
Network modules are groups of nodes with relatively high interaction-profile similarity and can point to shared biological function between nodes. To compare how different association indices perform in the identification of network modules, we used two bipartite networks. The first is a subset of a Caenorhabditis elegans gene-to-phenotype network that connects 52 essential genes to 94 phenotypic features (ref. 9). We used genes that belong to four modules manually determined by the authors of the original paper to benchmark the performance of the different indices.
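The pairwise step that precedes module finding, and the thresholding that defines an association network, can be sketched as follows. This is an illustration of the idea only, not GAIN's code: the gene and phenotype names are invented, the Jaccard index stands in for whichever index a user would select, and the hierarchical clustering step is left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index between two sets of Y-type partners."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def association_network(network, index, threshold):
    """Edges between X-type nodes whose index value exceeds the threshold."""
    edges = []
    for u, v in combinations(sorted(network), 2):
        value = index(network[u], network[v])
        if value > threshold:
            edges.append((u, v, value))
    return edges


# Hypothetical gene-to-phenotype network (names are placeholders).
genes = {
    "g1": {"p1", "p2"},
    "g2": {"p1", "p2"},
    "g3": {"p2", "p3"},
    "g4": {"p4"},
}

print(association_network(genes, jaccard, threshold=0.5))
# [('g1', 'g2', 1.0)]
```

Passing the index as a function makes it straightforward to rebuild the network with a different index and compare which edges change, which is the kind of comparison summarized in Figure 3c.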
Association indices were calculated for each pair of genes according to their shared phenotypic features and then clustered into heat maps (Fig. 3a). Visual inspection shows that the Simpson index is least suitable for the identification of the four modules, and CSI performs the best. Consistent with this observation, we found that CSI is best able to discriminate between the interaction-profile similarity between nodes that belong to the same module and that of nodes belonging to different modules (Fig. 3a).

Next, we asked which index performs best to delineate association networks for this gene-to-phenotype network. These networks connect nodes that have an interaction-profile similarity above a certain threshold.

Figure 5 | Predicting gene function. (a) A k-nearest-neighbor (kNN) algorithm was used to evaluate how well each index is able to assign genes to functional classes (F). To determine whether an uncharacterized gene X can be assigned a particular function, a kNN score was determined as the average of the top k association index (a.i.) values between X and genes with that function. The kNN values were then calculated for genes that have that function (blue and green) and for genes that do not (black curves). To assign a function to gene X, these values should be well separated (determined by calculating the AUC). The panel illustrates a case in which gene X can be assigned function 1 (F1, blue) but not function 2 (F2, green). The median AUC determined for all the functional classes was used as a measure of performance of the different association indices to predict gene function. (b) The median AUC calculated for the four functional classes in the C. elegans gene-to-phenotype network was determined for k values of 1 to 6 (number of nearest neighbors). (c) The median AUC calculated for the biological process GO slim terms in the yeast protein-DNA interaction network was determined for k values from 1 to 6.
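The kNN scoring described in the Figure 5 legend can be sketched as follows. The details are assumptions: a gene is excluded from its own neighbor list, the AUC is computed as the probability that a gene with the function outscores a gene without it (ties counting one half), and the similarity table is invented. The function names are illustrative.

```python
def knn_score(gene, members, similarity, k):
    """Average of the top k association index values between `gene`
    and the genes annotated with a function (excluding the gene itself)."""
    values = sorted(
        (similarity(gene, m) for m in members if m != gene), reverse=True
    )
    top = values[:k]
    return sum(top) / len(top)


def auc(positives, negatives):
    """Probability that a positive score exceeds a negative one (ties = 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def function_auc(genes, members, similarity, k):
    """Separation between kNN scores of genes with and without the function."""
    with_function = [knn_score(g, members, similarity, k) for g in genes if g in members]
    without = [knn_score(g, members, similarity, k) for g in genes if g not in members]
    return auc(with_function, without)


# Hypothetical pairwise association index values for five genes.
table = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.8, frozenset("bc"): 0.7,
    frozenset("ad"): 0.2, frozenset("bd"): 0.1, frozenset("cd"): 0.3,
    frozenset("ae"): 0.1, frozenset("be"): 0.2, frozenset("ce"): 0.1,
    frozenset("de"): 0.6,
}
similarity = lambda g, h: table[frozenset((g, h))]

genes = ["a", "b", "c", "d", "e"]
members = {"a", "b", "c"}  # genes assumed to share one function

print(function_auc(genes, members, similarity, k=2))  # 1.0
```

In this toy table every annotated gene scores higher than every unannotated one, so the two score distributions are fully separated; overlapping distributions would pull the AUC toward 0.5.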
Association networks therefore serve not only to delineate modules but also to identify nodes related to more than one module and nodes that are not related to any module. We used the top 1% of the values obtained with each index (Fig. 3b). CSI outperforms the other indices, as it (i) better demarcates the modules, (ii) leaves only two genes not assigned to any module and (iii) places only one gene into a module different from that assigned by manual classification (Fig. 3b). Generally, association networks obtained with different indices exhibit a large degree of overlap in the edges included, except for those obtained with the Simpson index and CSI (Fig. 3c).

Indeed, by determining all pairwise association index values for each pair of indices, i.e., by not limiting to the top 1%, comparisons involving Simpson or CSI were least correlated (Fig. 4a,b and Supplementary Fig. 1). This analysis of a real network further substantiates the notion that different indices can result in different values and ranking of interaction-profile similarity. Neither the Simpson index nor CSI is well correlated with any of the other indices, but the consequences in each case are quite different: for Simpson the poor correlation results in reduced module demarcation, whereas for CSI it results in precisely the opposite. The denominator in the Simpson index uses only the lower of the two overall node degrees, which can lead to artificially high levels of interaction-profile similarity, even for genes belonging to different modules (Fig. 4c). In the case of CSI, ranking similarity according to interaction specificity results in a higher value for gene pairs with shared specific phenotypes (C47G2.3 and F571.1), and a lower value for gene pairs with shared common phenotypes (pl-1 and perm-3) (Fig. 4d).

The second example network contains protein-DNA interactions between 12 yeast transcription factors and 542 promoters (ref. 4). Association indices were calculated for each pair of promoters according to their shared transcription factors, and values were clustered into heat maps (Supplementary Fig. 2). The heat maps are visually quite similar, and numerous modules can be detected. There were no previously benchmarked modules available. However, because genes with similar functions are frequently bound by the same transcription factor(s), we assessed the performance of the different indices by analyzing the biological process Gene Ontology (GO) enrichment in three different modules (Supplementary Fig. 2). Two modules were detected equally well by all association indices (Supplementary Fig. 2); however, for the third module significant enrichment (P <.1) for genes involved in the oxidation-reduction process was detected only by [...] and the [...], geometric and hypergeometric indices (Supplementary Fig. 2). Thus, association indices can perform differently in different types of networks and even within a network.

Table 1 | Association index performance for different applications

Application | Jaccard | Simpson | Geometric | Cosine | Hypergeometric | PCC | CSI
Identifying network modules | ** | * | ** | ** | ** | ** | ***
Predicting gene function | ** | * | ** | ** | ** | ** | ***
Comparing two sets of node pairs | ** | *** | ** | ** | * | *** | *
Determining significance of overlap | No | No | No | No | Yes | No | No

Asterisks indicate qualitative strengths, with a greater number indicating greater utility. Assessment depends on the biological question or objective.

Figure 6 | Application of association indices to network integration. (a) Using association indices, edges in one monopartite network can be compared to those in another, focusing either on a particular node (blue box) or on the entire network. (b) To determine whether interacting nodes in network 2 have a similar interaction profile in network 1, interaction-profile similarity values in network 1 can be compared between interacting and noninteracting nodes in network 2. (c) The interaction-profile similarity between X-type nodes in two networks can be compared to determine whether node pairs that have a similar interaction profile in network 1 also have a similar interaction profile in network 2. Edge width in the association network is proportional to the association index value. (d) The association index values of C. elegans bHLH transcription factors were determined according to the tissues in which they are expressed and were partitioned between proteins that physically interact (red) and those that do not (gray). Each box spans from the first to the third quartile, the horizontal lines inside the boxes indicate the median value and the whiskers indicate minimum and maximum values (b,d). (e) Association index values were determined for pairs of promoters in the yeast protein-DNA interaction network. The values for pairs of highly coexpressed genes (top 1%; red) and other gene pairs (bottom 99%; gray) are plotted.

Predicting function of individual genes
Biologists frequently identify single genes of unknown function, for instance in a genetic screen.
So far, we have discussed network modules as a starting point for functional annotation. However, for analysis of only a single gene, there is no need to first comprehensively identify network modules. Moreover, modules are not always suitable for annotating the function of every gene, as a gene may not belong to a clearly defined module and may have more than one function. An intuitive way to annotate gene function is to use the guilt-by-association principle, which postulates that two genes with similar functions have similar interaction profiles. One can assign functions to genes using a variety of different algorithms. Here, we use a k-nearest-neighbor algorithm that tests associations between genes and functions (Fig. 5). An unknown gene can be assigned to each function F depending on (i) the top k association index values between that gene and the gene(s) that are known to have that function, and (ii) the specificity of the distribution of those values. In the example shown in Figure 5, the highest score for the unknown gene (X) with genes with either known function 1 (F1, blue) or function 2 (F2, green) is similar (red lines). However, for function 1, the two distributions are largely separate, whereas for function 2 the two distributions overlap greatly. Thus, function 1 can be assigned to gene X with greater confidence than function 2. We assessed which association index best predicts function using the two networks described above. Again, CSI was best able to assign genes to functional classes, and the […] index performed the worst (Fig. 5). This result is consistent with the ability of CSI to consider interaction specificity.

Integrated networks
The integration of different types of networks enables the comparison of pairs of nodes across networks (refs. 13,20). Questions that can be answered include (i) whether directly interacting pairs of nodes in one network also tend to interact in another (Fig. 6a; note that this involves two monopartite networks), (ii) whether interacting nodes in one monopartite network have similar interaction profiles in a bipartite network (Fig. 6b) and (iii) whether pairs of nodes with similar interaction profiles in one bipartite network are also similar in another bipartite network (Fig. 6c). An example of the first type of question is whether the genes that encode physically interacting proteins also interact genetically. An example of the second type of question is whether proteins that physically interact tend to share phenotypes. Finally, an example of the third type of question is whether transcription factors that regulate a shared set of target genes are expressed in the same tissues and/or under the same conditions.

To determine whether interacting nodes in one monopartite network also interact in another network, the overlap between both sets of interactions can be determined, using association indices, on the basis of the number of shared edges between both networks, the number of edges in each network and the total number of node pairs (Fig. 6a). To determine the magnitude of similarity, […] and […] are most suitable, and the hypergeometric index can be used to determine significance. The same approach can be used to compare interacting node pairs between modules in one network to those in another. Such cross-network module preservation has been evaluated elsewhere (ref. 21).

To integrate monopartite and bipartite (Fig. 6b), or two bipartite, networks (Fig. 6c), the biological question should inform index selection. We illustrate this with two data sets. The first is a multiparameter, integrated C. elegans basic helix-loop-helix (bHLH) network comprising protein-protein interactions and gene-to-tissue expression patterns (ref. 13). Each index revealed that interacting bHLH proteins are more often coexpressed than noninteracting ones (Fig. 6d). However, the Simpson index outperformed the other indices (Fig. 6d and Supplementary Fig. 3). This is because a few bHLH proteins that bind many partners are broadly expressed, but each protein's partners are expressed in only a subset of tissues. This is best captured by the Simpson index, as it uses the minimum node degree in the denominator. The second data set is the yeast protein-DNA interaction network, integrated with a microarray coexpression network (ref. 22). All indices revealed that highly coexpressed genes have higher protein-DNA interaction-profile similarity than other gene pairs (Fig. 6e). The […] best separated the two categories, and the Simpson index was least efficient (Supplementary Fig. 3). The relatively poor performance of the Simpson index arises because it considers the degree of only the least connected promoter. As a consequence, a promoter bound by many transcription factors may be regarded as similar to a promoter bound by a few, some of which are shared. However, differences between transcription factors bound to promoters are also highly meaningful, as these may contribute to distinct gene-expression profiles.

Conclusions
Different association indices can be used to compare interaction-profile similarity within and across networks, and different indices have strengths and weaknesses for different applications (Table 1). CSI is most suitable for predicting gene function and identifying modules. However, CSI levels the similarities between modules, which is a disadvantage in comparing modules. When the main goal is to compare the similarity between node pairs, the biological question should drive index selection. For instance, the Simpson index may be used to avoid penalizing large differences in node degree. If one wants, conversely, to capture this difference, other indices are more appropriate. The hypergeometric index should be used with caution to determine the magnitude of similarity between interaction profiles, as it does not scale linearly with the proportion of overlap. However, only this index is able to calculate the statistical significance of interaction-profile overlap.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We thank members of A.J.M.W.'s lab, R. McCord and B. Lajoie for discussions and critical reading of the manuscript. We thank J.C. Bare (Institute of Systems Biology) for helpful advice in the development of GAIN. This work was supported by the US National Institutes of Health grants DK068429 and GM082971 to A.J.M.W. J.I.F.B.
is partially supported by a postdoctoral fellowship from the Pew Latin American Fellows Program. J.N. and C.L.M. are partially supported by grant DBI-0953881 from the US National Science Foundation.

AUTHOR CONTRIBUTIONS
J.I.F.B. and A.J.M.W. conceived the project; J.I.F.B. performed the data analysis with the assistance of A.D., J.N. and C.L.M.; A.D. and J.I.F.B. developed the GAIN web tool in collaboration with J.N., C.L.M. and J.M.S.; J.I.F.B. and A.J.M.W. wrote the paper.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Walhout, A.J.M. et al. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287, 116-122 (2000).
2. Schwikowski, B., Uetz, P. & Fields, S. A network of protein-protein interactions in yeast. Nat. Biotechnol. 18, 1257-1261 (2000).
3. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425-431 (2010).
4. Harbison, C.T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99-104 (2004).
5. Walhout, A.J.M. Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res. 16, 1445-1454 (2006).
6. Reece-Hoyes, J.S. et al. Extensive rewiring and complex evolutionary dynamics in a C. elegans multiparameter transcription factor network. Mol. Cell 51, 116-127 (2013).
7. Bordbar, A. & Palsson, B.O. Using the reconstructed genome-scale human metabolic network to study physiology and pathology. J. Intern. Med. 271, 131-141 (2012).
8. Watson, E., MacNeil, L.T., Arda, H.E., Zhu, L.J. & Walhout, A.J.M. Integration of metabolic and gene regulatory networks modulates the C. elegans dietary response. Cell 153, 253-266 (2013).
9. Green, R.A. et al. A high-resolution C. elegans essential gene network based on phenotypic profiling of a complex tissue. Cell 145, 470-482 (2011).
10. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062-6067 (2004).
11. Fowlkes, C.C. et al. A quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Cell 133, 364-374 (2008).
12. Martinez, N.J., Ow, M.C., Reece-Hoyes, J., Ambros, V. & Walhout, A.J. Genome-scale spatiotemporal analysis of Caenorhabditis elegans microRNA promoter activity. Genome Res. 18, 2005-2015 (2008).
13. Grove, C.A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314-327 (2009).
14. Ritter, A.D. et al. Complex expression dynamics and robustness in C. elegans insulin networks. Genome Res. 23, 954-965 (2013).
15. Lee, I., Blom, U.M., Wang, P.I., Shim, J.E. & Marcotte, E.M. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21, 1109-1121 (2011).
16. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. & Barabasi, A.L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551-1555 (2002).
17. Spirin, V. & Mirny, L.A. Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA 100, 12123-12128 (2003).
18. Hayek, L.-A.C. in Measuring and Monitoring Biological Diversity: Standard Methods for Amphibians. (ed. Heyer, W.R.) Ch. 9, 207-269 (Smithsonian Institution, Washington, DC, 1994).
19. Goldberg, D.S. & Roth, F.P. Assessing experimentally derived interactions in a small world. Proc. Natl. Acad. Sci. USA 100, 4372-4376 (2003).
20. Gunsalus, K.C. et al. Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 436, 861-865 (2005).
21. Langfelder, P., Luo, R., Oldham, M.C. & Horvath, S. Is my network module preserved and reproducible? PLoS Comput. Biol. 7, e1001057 (2011).
22. Huttenhower, C., Hibbs, M., Myers, C. & Troyanskaya, O.G. A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics 22, 2890-2897 (2006).

Corrigendum: Using networks to measure similarity between genes: association index selection
Juan I Fuxman Bass, Alos Diallo, Justin Nelson, Juan M Soto, Chad L Myers & Albertha J M Walhout
Nat. Methods 10, 1169-1176 (2013); published online 26 November 2013; corrected after print 27 January 2014

In the version of this article initially published, the formula describing the connection specificity index (CSI) in Box 2 was incorrect. The denominator in the fraction of the equation originally read "n_y"; the correct denominator is "# of X-type nodes in the network". The error has been corrected in the HTML and PDF versions of the article.

© 2014 Nature America, Inc. All rights reserved.
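To make the denominators discussed in the article and in this corrigendum concrete, the following is a minimal Python sketch of several association indices for a bipartite network. It is an illustration under stated assumptions, not the authors' implementation or the GAIN code. Interaction profiles are assumed to be sets of Y-type partners keyed by X-type node; all function names and the toy data are hypothetical. The Jaccard, Simpson, hypergeometric and Pearson (PCC) forms are the standard textbook definitions. The CSI function assumes one commonly described form (one minus the fraction of X-type nodes whose PCC with either member of the pair lies within a 0.05 margin of, or above, the pair's own PCC); the margin, the PCC base and the exclusion of the pair itself are assumptions, because Box 2 is not reproduced in this section. Only the denominator follows the corrigendum directly.

```python
from math import comb, sqrt


def jaccard(a, b):
    """Shared partners divided by the union of both partner sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def simpson(a, b):
    """Shared partners divided by the smaller of the two node degrees."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def hypergeometric_p(a, b, n_y):
    """P(overlap >= observed) if b's partners were drawn at random from n_y nodes."""
    shared, deg_a, deg_b = len(a & b), len(a), len(b)
    tail = sum(
        comb(deg_a, i) * comb(n_y - deg_a, deg_b - i)
        for i in range(shared, min(deg_a, deg_b) + 1)
    )
    return tail / comb(n_y, deg_b)


def pcc(a, b, n_y):
    """Pearson correlation of two binary interaction profiles of length n_y."""
    deg_a, deg_b = len(a), len(b)
    denom = sqrt(deg_a * deg_b * (n_y - deg_a) * (n_y - deg_b))
    return (n_y * len(a & b) - deg_a * deg_b) / denom if denom else 0.0


def csi(x1, x2, profiles, n_y, margin=0.05):
    """ASSUMED CSI form: 1 - (competing X-type nodes) / (# of X-type nodes)."""
    threshold = pcc(profiles[x1], profiles[x2], n_y) - margin
    competing = sum(
        1
        for other, partners in profiles.items()
        if other not in (x1, x2)  # assumption: the pair itself is not counted
        and max(pcc(profiles[x1], partners, n_y),
                pcc(profiles[x2], partners, n_y)) >= threshold
    )
    # Denominator per the corrigendum: number of X-type nodes in the network.
    return 1 - competing / len(profiles)


# Hypothetical toy network: four X-type nodes, ten possible Y-type partners.
profiles = {
    "A": {1, 2, 3},
    "B": {1, 2, 3, 4, 5, 6},
    "C": {7, 8},
    "D": {1, 2, 9},
}
N_Y = 10
```

In this toy network, A shares all three of its partners with B, so the Simpson index is 1.0 while the Jaccard index is 0.5, which illustrates how a denominator based on the minimum node degree does not penalize a large difference in degree. The hypergeometric function returns a P value (1/6 for A and B here) and is the only one of these quantities that speaks to significance of overlap.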