A Global Analysis of Synthetic Genetic Interactions & A Genetic Analysis of Muscle Arm Development in Caenorhabditis elegans.


by Alexandra Byrne
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Molecular Genetics
University of Toronto
Copyright by Alexandra Byrne 2009

Abstract

Understanding gene function and genetic relationships is elemental in our efforts to better understand biological systems. Here, I describe a reliable high-throughput approach, Systematic Genetic Interaction analysis (SGI), capable of revealing both weak and strong genetic interactions in the nematode Caenorhabditis elegans. I also present evidence that UNC-73 functions cell-autonomously in an UNC-40 pathway to direct muscle arm extension in C. elegans.

Previous efforts to systematically describe genetic interactions between redundant genes on a global scale either have focused on core biological processes in yeast or have surveyed catastrophic interactions in metazoans. I investigated synthetic genetic interactions between eleven query mutants in conserved signal transduction pathways and hundreds of target genes compromised by RNAi. A network of 1246 genetic interactions was uncovered through an unbiased global analysis of the interaction matrix, establishing the largest metazoan genetic interaction network to date. To investigate how genetic interactions connect genes on a systems-wide level, the SGI network was superimposed with existing networks of physical, genetic, phenotypic and co-expression interactions. Fifty-six putative functional modules were identified within the superimposed network, one of which regulates fat accumulation and is coordinated by bar-1(ga80)/β-catenin interactions. This led to the discovery that SGI interactions link distinct functional modules on a global scale, which is a previously unappreciated level of organization within metazoan systems. In addition, I present evidence that the properties of genetic networks are conserved between C. elegans and S. cerevisiae, but that the connectivity of the interactions within the current networks is not. Although the buffering between functional modules may differ between species, studying these differences may provide insight into the evolution of divergent form and function.

In C. elegans the postsynaptic membrane of the neuromuscular junction reaches its destination through an active process of guided cell extension. The worm has 95 body wall muscles (BWMs) that extend projections called 'muscle arms' to motor axons. The muscle arms harbour the postsynaptic elements of neuromuscular junctions. The stereotypical pattern of muscle arm extension was exploited in a forward genetic screen for new genes required for guided cell migration by looking for mutations that caused a reduction in the number of arms that extend to the motor axons. One of the resulting mutants was tr117, which extended half the number of arms compared to wild type animals. Genetic mapping, complementation tests, and sequencing revealed that tr117 was a mutation in unc-73/trio, which encodes a guanine nucleotide exchange factor. Expression of UNC-73 specifically in the BWMs rescued the muscle arm development defects of unc-73(e936) mutants, indicating that UNC-73 functions cell-autonomously to regulate muscle arm extension. UNC-73::CFP was localized to muscle arm termini in a pattern similar to that of UNC-40/DCC, which directs muscle arm extension. UNC-73 over-expression suppressed the Madd phenotype of unc-40 null worms and unc-73(e936) suppressed ectopic myopodia induced by UNC-40 over-expression. These results indicate that UNC-73 functions downstream of UNC-40 in a pathway that regulates muscle arm extension.

Note to Reader

This thesis is an account of two projects that I undertook to understand two aspects of C. elegans biology: a large-scale analysis of genetic interactions and a detailed genetic analysis of muscle arm extension. I have presented the projects separately because the two subjects are significantly different. An investigation of the network of genetic interactions that regulates animal development and physiology is presented in the first four chapters. This is followed by a detailed genetic investigation of a developmental process called muscle arm extension, which is introduced, described, and discussed in the fifth chapter.

Table of Contents

Abstract
Note to Reader
List of Figures
List of Tables
CHAPTER 1. AN INTRODUCTION TO SYNTHETIC GENETIC INTERACTION ANALYSIS IN C. ELEGANS
Synthetic Genetic Interactions Reveal Gene Function
Genetic Modifier Screens Ascribe Function to Genes
Epistasis Defines a Genetic Interaction
Interpreting Enhancing Genetic Interactions
Interpreting Suppressing Genetic Interactions
High-Throughput Approaches to Identify Genetic Interactions
Yeast Synthetic Genetic Array Analysis (SGA)
Modified-SGA Approaches to Identify Genetic Interactions in Yeast
Network Analysis Provides Biological Insight into Gene Function
Synthetic Genetic Interactions in Metazoans
C. elegans is an Ideal Model Organism to Study Genetic Interactions
RNAi is a Useful Tool to Reduce Gene Function
Previous Genetic Interaction Screens in C. elegans
Genetic Interactions will Contribute Significantly to the C. elegans Global Network of Functional Data
Superimposing Distinct Interaction Networks can Reveal New Insight into Gene Function
Superimposed Networks Facilitate the Interpretation of the Nature of a Genetic Interaction
Integrating Functional Data Predicts Genetic Interactions
Signal Transduction in C. elegans
Signal Transduction Pathways are Ideal for Investigating Genetic Interactions in Metazoans
Insulin Signal Transduction
FGF Signal Transduction
EGF Signal Transduction
Notch Signal Transduction
Wnt Signal Transduction
TGF-β Signal Transduction
DNA Damage Response Signal Transduction
Conclusion


The SGI Network Properties are Typical of Other Biological Networks
The Construction of a Superimposed Network
Creation of co-phenotype network
Genetic interactions are orthogonal to other interaction datasets
The Superimposed Network was Mined for Multiply-Supported Subnetworks
The Nature of Genetic Interactions in C. elegans
High Throughput Interactions are Likely Between-Pathway Interactions
Genetic Interactions Bridge Functional Modules
The connectivity of the current synthetic genetic networks is not conserved between worms and yeast
Discussion
Investigating the integration of genetic interactions into a superimposed network reveals a new level of organization
Functional Modules Represent Groups of Genes that Share Function
The connectivity of synthetic genetic networks may not be evolutionarily conserved
Conclusions
Materials and Methods
Testing the correlation of target hubs with RNAi phenotype
Comparing the Network Properties of the SGI and SGA Genetic Networks
Construction of the transposed SGA network and the interolog network
Construction of the co-phenotype network
Construction of permuted networks
Determination of the significance of the number of supported links
Identification of gene subnetworks
Characterization of multiply-supported subnetworks
Nile Red Analysis
Identification of significantly bridged subnetwork pairs
Estimation of the significance of the number of bridged subnetwork pairs
Determination of bridging propensities
Determination of the degree of subnetwork bridging conservation
CHAPTER 4. SUMMARY OF GLOBAL GENETIC INTERACTION ANALYSIS IN C. ELEGANS AND FUTURE DIRECTIONS
Summary
Future Directions
Genetic Interactions Serve as a Hypothesis Generating Resource to Infer Function for Many Genes
Towards a Global Network Graph in C. elegans
High-Throughput Screens for Alleviating Interactions may Reveal Within-Pathway Interactions and Conserved Yeast Genetic Interactions
Genetic Interaction Analyses in Specific Cells or Tissues May Reveal More Biologically Relevant Interactions

Investigation of Genetic Interactions Among Functionally Related Subsets of Genes will Create a Dense Network of Interactions
Advances in Functional Genomics will Improve the Global Network
Investigation of Worm Interactions with Interactions in Higher Organisms
CHAPTER 5. UNC-73 FUNCTIONS CELL-AUTONOMOUSLY IN THE UNC-40 PATHWAY TO REGULATE MUSCLE ARM EXTENSION
Abstract
Introduction to Muscle Arm Extension in C. elegans
Genetic Control of Guided Cell Migration is Plastic in Nature
Muscle Arm Extension in C. elegans is a Model for Guided Cell Migration
A Forward Genetic Screen Identifies 23 Muscle Arm Development Defective (Madd) Mutants
UNC-40/DCC Regulates Muscle Arm Extension in C. elegans
UNC-73 Regulates Axon Guidance in C. elegans
Results
Characterization of Seven Madd Mutants
unc-33(tr114) and unc-51(tr126) are Likely Madd as a Secondary Consequence of Neuronal Defects
tr121 is an Allele of unc
tr105, tr123 and tr98 Mutants Have Weak Madd Phenotypes
unc-73 Functions Cell-Autonomously to Regulate Muscle Arm Development
UNC-73 is Localized to Muscle Arm Termini
UNC-73 Functions Downstream of UNC-40 in Muscle Arm Development
The Localization of UNC-40 and MADD-2 are Disrupted in unc-51 Mutants
Proteins Required for Muscle Arm Extension are Localized to the Muscle Arm Termini
UNC-40 Functions at the Membrane to Direct Muscle Arm Extension
Discussion and Future Directions
unc-73 Functions Downstream of unc-40 in Muscle Arm Development
Possible Roles of unc-51 and unc-33 in Muscle Arm Development
UNC-40 Likely Functions at the Leading Edge of Migrating Cells
UNC-40 may Perform a Function in the Nucleus
Potential Identities and Roles of tr123 and tr
In vivo Visualization of Muscle Arm Extension will Provide Insight Into Guided Cell Migration
Materials and Methods
Strains
Microscopy
Mutant Identification
RNAi Experiments

Additional files
References

List of Figures

Figure 1-1. Two Classic Definitions of Epistasis
Figure 1-2. Synthetic Genetic Interactions
Figure 1-3. Suppressing Genetic Interactions
Figure 1-4. Network Properties Predict Gene Function
Figure 1-5. C. elegans Life Cycle
Figure 1-6. Notch, EGF, and Wnt Pathways Regulate Vulval Induction
Figure 1-7. The RAS/MAPK Pathway Regulates Diverse Processes Through Multiple Signal Transduction Pathways
Figure 2-1. Synthetic Genetic Interaction (SGI) Analysis in C. elegans
Figure 2-2. SGI Proof of principle Assay: Insulin Pathway
Figure 2-3. The SGI Network
Figure 2-4. The Precision and Recall of the SGI Network and of Other Functional Interaction Datasets
Figure 2-5. The SGI network Correlates with Existing Functional Annotation
Figure 2-6. Distribution of Interactions Among Query Genes
Figure 2-7. Global Patterns of Interactions within the SGI Network
Figure 3-1. Network Properties of the SGI Network
Figure 3-2. A Schematic of the Construction of a Superimposed Network
Figure 3-3. Genes in bar-1 module share a pale phenotype
Figure 3-4. The bar-1 Module Regulates Fat Storage and/or Metabolism
Figure 3-5. Analysis of the Overlap between Genetic Interactions and Other Modes of Interaction
Figure 3-6. SGI Interactions Bridge Subnetworks
Figure 3-7. A Schematic of the Approaches used to Investigate if Synthetic Genetic Network Connectivity is Conserved
Figure 4-1. Towards a Global Metazoan Network Graph
Figure 4-2. Synthetic Genetic Analysis of Muscle in C. elegans
Figure 5-1. Axon Guidance by the UNC-6/Netrin Pathway in C. elegans
Figure 5-2. Muscle Arm Extension in C. elegans
Figure 5-3. UNC-40/DCC Regulates Muscle Arm Extension in C. elegans
Figure 5-4. Muscle Arm Development Defects of Cloned Madd Mutants
Figure 5-5. Muscle Arm Development Defects of Uncloned Madd Mutants
Figure 5-6. tr117 is an Allele of unc
Figure 5-7. unc-73(tr117) Functions Cell-Autonomously in Muscle to Guide Muscle Arm Extension
Figure 5-8. UNC-40 and UNC-73 are Enriched at the Muscle Arm Termini
Figure 5-9. An Analysis of UNC-73's Role in UNC-40 Localization

Figure Quantification of UNC-73's Role in UNC-40 Localization
Figure UNC-73 Functions Downstream of UNC-40 in Muscle Cells
Figure MADD-2 Localization is Disrupted in an unc-51 Mutant Background
Figure GEX-2 and EVA-1 are Enriched at the Muscle Arm Termini
Figure UNC-40 Functions at the Membrane to Direct Muscle Arm Extension
Figure A Model for UNC-73 Function in Muscle Arm Extension

List of Tables

Table 2-1. A Summary of the Query Genes
Table 2-2. Phenotype Analysis
Table 2-3. A detailed assessment of the nature of the SGI interactions
Table 2-4. Comparison of SGI and Lehner Genetic Interactions
Table 2-5. Reciprocal Query-Query Interactions
Table 3-1. Composition of the C. elegans Superimposed Network
Table 3-2. Genetic Interactions Within the bar-1 Module
Table 5-1. The 23 Madd Mutants
Table 5-2. Characterization of Madd Mutants

Chapter 1. An Introduction to Synthetic Genetic Interaction Analysis in C. elegans

Genetic redundancy is a critical factor in development, disease, and evolution. Previous efforts to systematically describe genetic interactions between redundant genes on a global scale have focused on core biological processes in yeast or have surveyed catastrophic interactions in metazoans. In this thesis, I present a novel approach to define synthetic genetic interactions on a global scale using the nematode Caenorhabditis elegans as a model system. I call this approach Systematic Genetic Interaction analysis (SGI). In this introductory chapter, I present an overview of synthetic genetic interaction analysis. This includes a description of synthetic genetic interactions, their functional significance, and the high-throughput approaches that have been employed to uncover such interactions in yeast. I also review various methods to interpret and analyse genetic interaction data. Finally, I describe the characteristics that make C. elegans an ideal model organism to investigate synthetic genetic interactions in a high-throughput manner and the signal transduction pathways that I chose to investigate with SGI.

Synthetic Genetic Interactions Reveal Gene Function

Genetic Modifier Screens Ascribe Function to Genes

A basic premise of genetics is that the biological role of a gene can be inferred from observing the consequence of its disruption. For many genes, however, genetic disruption yields no detectable phenotype in a laboratory setting. For example, ~66% of genes deleted in Saccharomyces cerevisiae have no obvious growth phenotype in standard growth conditions

(Giaever et al., 2002). Disruption of a similar fraction of genes in Caenorhabditis elegans is expected to result in a wild type phenotype (Hodgkin, 2001; Kamath et al., 2003; Simmer et al., 2003). Elucidating the function of these genes therefore requires an alternative approach to single gene disruption.

One way to uncover biological roles for phenotypically silent genes is through genetic modifier screens. Genetic modifiers are traditionally identified through random mutagenesis of individuals harbouring one mutant gene, followed by a screen for second-site mutations that either enhance or suppress the primary phenotype (Forsburg, 2001; Guarente, 1993). Alternatively, a forward genetic screen that was not intended to identify genetic modifiers may do so when mutagenesis creates two separate mutations that are both required to elicit the desired phenotype (Horvitz and Sulston, 1980). Modifying genes identified in forward genetic screens participate in regulation of the process of interest, yet often have no detectable phenotype on their own (Colavita and Culotti, 1998; Davies et al., 1999; Ferguson and Horvitz, 1989; Forsburg, 2001; Gu et al., 1998; Hartman and Roth, 1973).

For example, a forward genetic screen for mutations that disrupt wild type C. elegans development generated a strain with multiple vulvas (Horvitz and Sulston, 1980). The multivulva phenotype in this strain was dependent on the presence of a mutation in each of the genes lin-8 and lin-9. A subsequent forward screen for mutants that induce the multivulva phenotype in a lin-8 or lin-9 background identified 18 extragenic modifier mutations among 8 genes that individually caused no defects in vulval development (Ferguson and Horvitz, 1989). Analysis of the genetic interactions among these ten genes led to the discovery that vulval development in C. elegans is inhibited by two redundant signalling pathways termed synmuv A (synthetic Multivulva) and synmuv B (Ferguson and Horvitz, 1989). Individually, worms with a disruption in either of the

synmuv pathways will develop normally (Ferguson et al., 1987; Sulston and Horvitz, 1981). However, if a gene in each of these pathways is mutated, a multivulva phenotype is manifested (Ferguson and Horvitz, 1989). Indeed, this was the first evidence that redundancy occurs at the level of pathways in addition to between individual genes. Thus, forward genetic modifier screens are useful approaches to ascribe function to genes that otherwise have no phenotype.

Epistasis Defines a Genetic Interaction

To identify genetic modifiers and to interpret their function, it is important to first define what constitutes a genetic interaction. Broadly, a genetic interaction is identified between two genes when the mutant phenotype of one gene is modified by concomitant mutation of another gene. This phenomenon was first described as epistasis. However, there are two different definitions of epistasis, both of which aid in functional interpretation of the relationship between two genes (reviewed in Boone et al., 2007).

William Bateson first used the term epistasis in 1909 to describe deviations from expected Mendelian ratios when perturbation of one gene masks the effects of an allele at another locus (Bateson, 1909). Describing such an epistatic relationship between two genes can provide insight into the nature of their functional relationship. Classical geneticists use the Bateson definition of epistasis to order two genes in a common pathway when concomitant mutation of two genes, A and B, produces the same phenotype as mutation A but not that of mutation B. A classic example of ordering genes using Bateson epistasis in C. elegans involves the sex-determination pathway genes tra-1 and her-1 (Figure 1-1A) (reviewed in Avery and

Figure 1-1. Two Classic Definitions of Epistasis. (A) Classical geneticists use Bateson's definition of epistasis when ordering genes in a common pathway. In C. elegans, her-1 mutants are hermaphrodites and tra-1 mutants are male, regardless of the number of X chromosomes that are present. A her-1; tra-1 mutant is male, suggesting that tra-1 is epistatic to her-1 and functions downstream of it in the sex determination pathway. (B) Fisher defined epistasis as a relationship between two genes whose mutual disruption results in an effect that is not equal to the multiplicative effects of the single mutations. As defined, the genes X and Y have an epistatic relationship and the genes U and V do not.

Wasserman, 1992). The ratio of sex chromosomes to autosomes signals whether a worm will develop as a male or as a hermaphrodite. Recessive mutations in tra-1 cause all homozygotes, regardless of the number of X chromosomes, to be male, while loss-of-function mutations in her-1 cause all homozygotes to be hermaphrodites. Double mutants are all male, indicating that tra-1 is epistatic to her-1. This relationship implies that tra-1 functions downstream of her-1, which normally acts to inhibit tra-1 from generating the male phenotype.

While Bateson's epistasis does define a functional relationship, it is limited to defining masking interactions between genes. In 1918, Sir Ronald Fisher used a statistical model to define epistasy as occurring between two genes that, when mutated in the same organism, have a double mutant phenotype different from the multiplicative effects of both single mutants (Figure 1-1B) (Fisher, 1918). For example, consider two genes X and Y that individually cause a 12% and 33% decrease in fitness, respectively. According to Fisher, anything other than a 41% decrease in fitness in the double xy mutant (0.88 × 0.67 ≈ 0.59 of wild type fitness remaining) would be called an epistatic interaction or a synthetic genetic interaction between genes X and Y. As with Bateson's epistasis, Fisher's epistasis can define a functional relationship between two genes. For example, if the double mutant xy has a 98% decrease in fitness, the products of genes X and Y may function in a similar process that can be carried out in the absence of either gene but not both. Alternatively, if a double mutant xy has a 41% decrease in fitness, genes X and Y are not considered epistatic and their combined effects likely result from disruption of functionally unrelated processes. As such, Fisher's broad definition of epistasis describes three classes of relationship: suppression, enhancement (or aggravation), and non-interaction.
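Fisher's multiplicative expectation and the three classes of relationship can be illustrated with a short sketch. The function names and the `tolerance` parameter are hypothetical and are not taken from the thesis; the text itself treats any deviation from the expectation as an interaction, whereas measured data would need some allowance for noise, which is why a tolerance is assumed here.

```python
def expected_double_fitness(fitness_x: float, fitness_y: float) -> float:
    """Multiplicative expectation for the double mutant.

    Each argument is the fitness remaining relative to wild type,
    e.g. a 12% decrease in fitness is passed as 0.88.
    """
    return fitness_x * fitness_y


def classify_interaction(fitness_x: float, fitness_y: float,
                         observed_xy: float, tolerance: float = 0.05) -> str:
    """Label the double mutant relative to the multiplicative expectation.

    `tolerance` is an assumed noise allowance, not a value from the text.
    """
    expected = expected_double_fitness(fitness_x, fitness_y)
    deviation = observed_xy - expected
    if abs(deviation) <= tolerance:
        return "non-interaction"
    # A double mutant less fit than expected is an enhancement;
    # one fitter than expected is a suppression.
    return "enhancement" if deviation < 0 else "suppression"


# Worked example from the text: 12% and 33% decreases in fitness.
expected = expected_double_fitness(0.88, 0.67)         # about 0.59 remaining
severe = classify_interaction(0.88, 0.67, 0.02)        # 98% decrease observed
as_expected = classify_interaction(0.88, 0.67, 0.59)   # 41% decrease observed
```

With these inputs the 98% decrease is labelled an enhancement and the 41% decrease a non-interaction, matching the two cases discussed in the text.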

Figure 1-2. Synthetic Genetic Interactions. Three classes of genetic enhancement are depicted. (A) Genetic interactions can occur between genes in parallel pathways or (B) between redundant genes in the same pathway. (C) In the case where a null (large X) is lethal, weak losses of function (small x) can identify buffering interactions within a pathway.

For the purposes of my work and this thesis, I define as enhancing those interactions that result in a phenotype more severe than that expected under the null hypothesis that two genes do not interact. Those interactions that fail to reach the degree of the expected phenotype will be considered suppressing. Only those interactions that involve mutations that affect the same process in opposite ways and mask one another when combined will be referred to as epistatic.

Interpreting Enhancing Genetic Interactions

An enhancing genetic interaction can describe various scenarios, including: mutations in two genes cause a synthetic phenotype not observed in either single mutant, two intragenic mutations cause a synthetic phenotype not observed in either single mutant, and mutations in two genes enhance a phenotype observed in either single mutant. The most drastic type of interaction is a synthetic lethal interaction, where mutation of two genes causes death, which does not result from either of the individual mutations. Although three or more genes can interact to carry out a given function (Tong et al., 2004), genetic interactions between two genes will be considered here for simplicity.

An enhancing genetic interaction can signify several possible relationships between two genes. First, disrupting a pair of genes that belong to parallel pathways that regulate the same process may reveal a between-pathway interaction (Figure 1-2A). Second, compromising a pair of genes that act at the same level of the pathway or are ancillary components at different levels of the pathway may reveal a within-pathway interaction (Figure 1-2B, C). Finally, each gene of an interacting pair may act in unrelated processes that collapse the system when compromised together through poorly understood mechanisms, revealing an

indirect interaction (Kelley and Ideker, 2005). As the cell may function by coordinating collections of gene products that work together as discrete units, called molecular machines or functional modules (Alberts, 1998; Hartwell et al., 1999), these indirect interactions may actually reveal redundancy between previously unrecognized functional modules.

To interpret the relationship between two synthetically interacting genes, characteristics of the interacting mutations must be considered. These characteristics include the nature of the mutations, the phenotypes of the individual mutants, the synthetic phenotype of the double mutant, and whether the genes are essential (reviewed in Forsburg, 2001; Guarente, 1993). First, consider synthetic interactions between two genes where both mutants are complete losses-of-function (nulls). Loss-of-function mutations in downstream or upstream genes are unlikely to exacerbate the effect of a null mutation in the same pathway. Therefore, the simplest interpretation of a synthetic lethal interaction between two null mutants that individually result in viability is that the genes act in parallel pathways that contribute to the same essential process. An enhancing interaction between a weak loss of function mutant and a null is also likely a between-pathway interaction if the synthetic phenotype exceeds that of the null. An alternative interpretation of a synthetic enhancement interaction between null mutations in non-essential genes is that the two genes may function completely redundantly at the same level of a single pathway.

Interpreting an enhancing interaction between incomplete loss-of-function mutations (hypomorphs) is less straightforward. Such an interaction could either imply that the two genes are in the same pathway or in parallel pathways. One way to distinguish between these two possibilities is to compare the synthetic phenotype to the individual null phenotypes of the interacting genes (Guarente, 1993). The genes are likely in the same pathway if one of the

individual null mutations causes a specific phenotype equal to that of the double mutant phenotype. For example, if the synthetic phenotype is equal to the null phenotype of at least one of the genes, the simplest interpretation is that it is a within-pathway interaction and that there is no parallel pathway buffering or contributing to the essential process. However, this interpretation could be false if the genes function in other processes or if the phenotype being measured, such as lethality, can result from distinct processes (Guarente, 1993). Alternatively, two genes whose hypomorphs interact are likely in parallel pathways if the synthetic phenotype is worse than the phenotype of either of the individual null mutants. In this case, the mutations could disrupt genes in two pathways that contribute to a shared function.

Interpreting Suppressing Genetic Interactions

Extragenic synthetic suppressing interactions, also called alleviating or positive interactions, can also imply different types of functional relationships between two genes (reviewed in Forsburg, 2001; Guarente, 1993). For example, suppression by epistasis, as described above for the C. elegans sex-determination pathway, occurs when mutation of a component of a linear pathway compensates for or masks loss of function of another component in the pathway (Figure 1-1A). Bypass suppression occurs when two genes encode a product with a similar function or function in parallel pathways (Figure 1-3A). In that case, up-regulation of gene B compensates for down-regulation of gene A. In both epistatic and bypass suppression, the mutation in gene B will suppress multiple mutations in gene A. In the case of interaction suppression, the interaction is between genes that encode physically interacting proteins. Here, mutation in gene A causes a conformational change in its product that would otherwise disrupt interaction with the product of gene B, which is compensated for by a conformational change in the product of gene B. The

Figure 1-3. Suppressing Genetic Interactions. Two types of extragenic suppression are depicted. (A) Bypass suppression can occur when loss of function (lf) of one gene (C) is compensated for by gain of function (gf) of another (D). (B) An example of an interaction suppression is depicted. In this example, an initial mutation (y) causes a conformational change that prevents physical interaction with its binding partner (X). Suppression of the resulting phenotype occurs when mutation of the binding partner (x) causes a conformational change that complements the original mutation.

suppression is allele- and gene-specific, in that a mutation of gene B will only suppress specific alleles of gene A (Figure 1-3B). Temperature-sensitive mutations can often be suppressed in this way, because they result in allele-specific conformational changes. Finally, informational suppression includes tRNA suppressors that can suppress specific alleles of multiple genes. Therefore, interpreting suppressing genetic interactions involves determining if the suppression is allele- and gene-specific.

In summary, genetic interactions occur when mutation of two genes results in: 1) a synthetic phenotype not observed in either single mutant; 2) an enhancement of the phenotype of one or both of the single mutants; or 3) an alleviation of the phenotype of one or both of the single mutants. These three types of interactions lead to specific functional interpretations of the relationship between the two genes in question. Such functional implications include whether the genes function in a given process in the same or in parallel pathways and whether they act upstream or downstream of each other.

High-Throughput Approaches to Identify Genetic Interactions

Yeast Synthetic Genetic Array Analysis (SGA)

While genetic interactions have a demonstrated value in revealing shared function between two genes, until recently, genetic interactions were identified relatively infrequently. This is due to the time-consuming and laborious nature of uncovering a genetic interaction using classical approaches (Colavita and Culotti, 1998; Davies et al., 1999; Ferguson and Horvitz, 1989; Forsburg, 2001; Gu et al., 1998; Hartman and Roth, 1973). An elegant approach called Synthetic Genetic Array (SGA) analysis was devised to systematically analyze the phenotypic consequences of double mutant combinations in Saccharomyces cerevisiae (Tong et al., 2001).
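The allele- and gene-specificity criteria used earlier in this chapter to distinguish classes of suppressor can be written as a small decision rule. This is only a sketch of that reasoning: the function names, the input layout (a mapping from a gene and allele pair to whether the suppressor rescued it), and the example gene names are all hypothetical.

```python
from collections import defaultdict


def specificity(results):
    """Derive (allele_specific, gene_specific) from suppression tests.

    `results` maps (gene, allele) -> True if the candidate suppressor
    alleviated that allele's phenotype. The layout is an assumption.
    """
    by_gene = defaultdict(list)
    for (gene, _allele), suppressed in results.items():
        by_gene[gene].append(suppressed)
    suppressed_genes = [g for g, flags in by_gene.items() if any(flags)]
    # Gene-specific: alleles of exactly one tested gene are suppressed.
    gene_specific = len(suppressed_genes) == 1
    # Allele-specific: some alleles of a suppressed gene are not rescued.
    allele_specific = any(not all(by_gene[g]) for g in suppressed_genes)
    return allele_specific, gene_specific


def suppressor_class(allele_specific: bool, gene_specific: bool) -> str:
    """Map the two specificity criteria onto the classes in the text."""
    if allele_specific and gene_specific:
        return "interaction suppression"
    if allele_specific:
        return "informational suppression"
    if gene_specific:
        return "epistatic or bypass suppression"
    return "unclassified"


# Hypothetical test panel: one allele of gene A is rescued, another is not.
panel = {("A", "a1"): True, ("A", "a2"): False, ("C", "c1"): False}
label = suppressor_class(*specificity(panel))
```

The rule cannot separate epistatic from bypass suppression, since the text describes both as suppressing multiple alleles of the same gene; that distinction needs information about pathway order.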

With SGA, a query deletion strain is mated to a comprehensive library of non-essential deletion strains (Giaever et al., 2002) through a mechanical pinning process. Resulting double mutant combinations typically have growth rates indistinguishable from single mutant controls. However, some deletion pairs produce a synthetic sick or lethal phenotype not shared by either single mutant, indicating a genetic interaction. In this seminal work, 132 query mutants were crossed with an array of approximately 4700 mutants representing non-essential genes (Tong et al., 2001; Tong et al., 2004). Approximately 4000 interactions among ~1000 genes were identified as a result. Importantly, the synthetic lethal genetic interactions uncovered by SGA are enriched between functionally related genes as measured by shared GO annotation. This functional enrichment demonstrates the validity of high-throughput synthetic genetic analysis in uncovering bona fide genetic interactions. Moreover, subsequent investigation of the yeast SGA genetic interactions, as discussed below, provided invaluable insight into the nature of genetic interactions. In particular, the revelation that most non-essential genes synthetically interact with several genes from different pathways is a major biological insight, as it suggests that many genes have multiple redundant functions and provides a satisfying explanation for the apparent lack of phenotype for the majority of gene disruptions (Tong et al., 2001; Tong et al., 2004). Moreover, this finding suggests that a high level of buffering exists in the system to compensate for gene disruption.

Modified-SGA Approaches to Identify Genetic Interactions in Yeast

SGA-related techniques have been developed to investigate other classes of genetic interactions and to mine the consequences of interactions in detail (Davierwala et al., 2005;

Pan et al., 2004; Schuldiner et al., 2005; Sopko et al., 2006). One such modified approach is the use of SGA to identify interactions among essential genes using conditional temperature-sensitive mutants or alleles under the control of tetracycline-repressible promoters (Davierwala et al., 2005; Mnaimneh et al., 2004). These studies revealed a five-fold increase in the number of genetic interactions, suggesting that essential genes are functionally related to more genes than are non-essential genes. As such, essential genes may be hubs in the genetic interaction network. Other modifications of SGA include the use of chemical perturbants to infer gene function by identifying interactions between genes and small molecules (Giaever et al., 2004; Parsons et al., 2006). Over-expression arrays have been used to identify synthetic dosage lethality (SDL) interactions (Sopko et al., 2006). SDL interactions include kinase-substrate interactions where loss-of-function of an inhibitory kinase is synthetically lethal with a gain-of-function of its downstream target (Sopko et al., 2006). The dSLAM (diploid Synthetic Lethal Analysis by Microarrays) approach takes advantage of a unique molecular barcode incorporated into each deletion allele in the yeast knockout library (Giaever et al., 2002; Pan et al., 2004). With this approach, double mutants are created en masse by transforming a construct that targets a query gene for deletion into the pooled library of heterozygous deletion strains. The resulting growth of each double mutant in the pool is measured by hybridization of each barcode to a microarray and is compared to the relative growth of single mutants. In addition to synthetic lethal genetic interactions, this approach also has the potential to identify sub-lethal enhancing and suppressing genetic interactions. A common question in the field of synthetic genetics is whether a given synthetic phenotype is simply a consequence of 'sick plus sick equals sicker'.
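The 'sick plus sick equals sicker' question can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from any of the cited studies: it assumes fitness is measured relative to wild type (wild type = 1.0), it assumes a multiplicative neutral expectation (one of several possible definitions of non-interaction), and the `tolerance` threshold is an arbitrary placeholder rather than a statistically derived cut-off.

```python
def interaction_score(w_a, w_b, w_ab):
    """Deviation of double-mutant fitness from the neutral expectation.

    Assumption: two non-interacting mutations combine multiplicatively,
    so the expected double-mutant fitness is w_a * w_b.
    """
    return w_ab - (w_a * w_b)


def classify(w_a, w_b, w_ab, tolerance=0.05):
    """Label a gene pair; `tolerance` is a hypothetical noise threshold."""
    score = interaction_score(w_a, w_b, w_ab)
    if score < -tolerance:
        return "enhancing"    # sicker than 'sick plus sick' predicts
    if score > tolerance:
        return "alleviating"  # healthier than the neutral expectation
    return "neutral"          # consistent with two unrelated defects


# Two mildly sick single mutants (0.8 and 0.7 of wild-type fitness):
print(classify(0.8, 0.7, 0.56))  # double mutant matches 0.8 * 0.7
print(classify(0.8, 0.7, 0.20))  # double mutant far sicker than expected
print(classify(0.8, 0.7, 0.75))  # double mutant healthier than expected
```

Under these assumptions, only the second pair would be scored as an enhancing interaction; the first is simply two independent defects combining.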
Various statistical models of epistasis, such as that proposed by Fisher, differentiate between an interaction involving two functionally related genes and the neutral effect of two non-functionally related genes (Mani et al., 2008). However, in order to apply a statistical model to the analysis of a given gene pair, the phenotype of each single mutant along with that of the double mutant must be quantifiable. SGA approaches including E-MAPs (Epistasis Miniarray Profiles) address this issue (Schuldiner et al., 2005). E-MAPs capture a broad range of genetic interactions by quantifying fitness (Schuldiner et al., 2005). The initial E-MAP study used both strains carrying deletions of non-essential genes and hypomorphic alleles of essential genes created with a strategy termed DAmP (Decreased Abundance by mRNA Perturbation) (Schuldiner et al., 2005). The use of both non-essential and essential genes, combined with statistical comparison of the colony size of double mutants to that of single mutants, resulted in the ability to detect alleviating and enhancing genetic interactions that were within- and between-pathway. Quantitative descriptions of the resulting interactions facilitated ordering the interacting genes within the early secretory pathway. Quantification of phenotype was also instrumental in a genetic interaction screen carried out by the Galitski group (Drees et al., 2005). The ability of yeast to invade agar was quantified and the resulting data were used to identify nine classes of relationships between the growth phenotypes of wild type (WT), mutant A, mutant B, and double mutant AB strains. These nine relationships describe types of interaction including enhancing, alleviating, non-interacting, and additive. For example, a synthetic enhancing interaction is described by the equation: AB<WT=A=B, where WT, AB, A, and B represent the amount of invasive growth of the respective strains. Alternatively, an alleviating or suppression interaction is described by

A=WT=AB<B. Therefore, quantifying the phenotypes of double mutants and controls facilitates interpretation of the nature of the genetic interactions. The examples outlined above are but a few of the approaches that have been developed using S. cerevisiae to identify genetic interactions on a high-throughput scale. As such, yeast has proven to be an ideal model organism not only to study genetic interactions themselves but also to develop high-throughput methodologies to uncover and interpret genetic interactions. The advances in yeast SGA should guide the design of new high-throughput interaction screens in other organisms. Noteworthy lessons include the use of: 1) mutations in both non-essential and essential genes, and 2) quantitative analysis. Together, these advances have resulted in greater sensitivity at detecting diverse types of interactions, thus providing a broader view of the genetic interaction spectrum.

Network Analysis Provides Biological Insight into Gene Function

Manipulating and interpreting the large amount of data generated by functional genomic screens is a burgeoning field of study. Network analysis has proven to be essential to the analysis of synthetic genetic interactions. Network graphs consist of nodes, which represent individual genes, connected by edges, which represent genetic interactions (Figure 1-4). The topology of these genetic interaction network graphs can reveal gene function and provide an understanding of how biological components function in the context of a whole organism (Tong et al., 2004). Much of what is known about network biology stems from the study of S. cerevisiae. For example, the network of yeast SGA interactions is scale-free (Figure 1-4) (Barabasi and Oltvai, 2004; Tong et al., 2004). Therefore, much like the internet, the nodes in the SGA network follow a power law distribution such that a few nodes are highly connected and many nodes are connected to few others. The scale-free organization can be contrasted to that of a random network where every node has a similar number of links. Scale-free networks have an extreme small-world property (Barabasi and Oltvai, 2004). The average distance between any two genes in a small-world network, as measured by the number of links (path length) between them, is short, implying that a large amount of genetic buffering or network robustness exists. Indeed, the yeast SGA network has an average path length of 3.3 and an average query gene exhibits a synthetic genetic interaction with 34 target genes, implying a high level of redundancy in yeast (Tong et al., 2004).

[Figure 1-4. Two panels: Scale-free; Random.]

Figure 1-4. Network Properties Predict Gene Function. The yeast genetic interaction network is scale-free rather than random. Thus, few genes are highly connected and many genes have few interactions. The shortest path length between the green and red nodes is 2. The high clustering coefficient of the yeast genetic interaction network results in the prediction of a genetic interaction between the yellow and purple nodes. Nodes represent genes, solid edges represent genetic interactions, and dashed edges represent predicted genetic interactions.

Analysis of the yeast SGA network demonstrated that genetic network topology predicts genetic interactions and gene function (Tong et al., 2004). The small-world characteristic of the yeast SGA network is due to dense local neighbourhoods of interactions. As genetic interactions link functionally related genes, the small-world characteristic implies that the yeast SGA network contains clusters of genes with similar function. Indeed, Tong et al. demonstrated that the function of a gene is likely to be similar to that of its closest neighbours by using network topology to predict genetic interactions. Of 2561 gene pairs where both genes interacted with SGS1, RAD27 or BIM1, a significantly large proportion (24, 18, and 18% respectively) were subsequently shown to also interact with each other. This is a roughly 20-fold enrichment of synthetic lethal interactions over the average interaction rate of 1% between all gene pairs tested in the original SGA screen. Therefore, two genes with a shared interacting gene are themselves likely to genetically interact, indicating the two genes share function (Tong et al., 2004).
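The network quantities discussed above (path length, clustering, and prediction of interactions from shared neighbours) can be illustrated on a toy graph. The sketch below is not the analysis performed by Tong et al.; the five-node graph and its node names are invented for illustration, and the fold-enrichment line simply averages the three percentages quoted in the text against the 1% background rate.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest path, found by breadth-first search."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two nodes are not connected


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return linked / (k * (k - 1) / 2)


def predicted_pairs(graph):
    """Unlinked pairs that share at least one interaction partner."""
    return {
        tuple(sorted((u, v)))
        for node in graph
        for u, v in combinations(graph[node], 2)
        if v not in graph[u]
    }


# Hypothetical toy network: one hub with three partners, plus a peripheral node.
TOY_EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]
toy = build_graph(TOY_EDGES)

print(shortest_path_length(toy, "a", "d"))   # a -> hub -> c -> d
print(clustering_coefficient(toy, "hub"))    # only a-b of three pairs is linked
print(sorted(predicted_pairs(toy)))          # candidate interactions

# Fold enrichment from the figures quoted in the text (24%, 18%, 18% vs 1%).
fold_enrichment = ((24 + 18 + 18) / 3) / 1
print(fold_enrichment)
```

In this toy graph the pairs a-c and b-c are predicted to interact because both members of each pair interact with the hub, which mirrors the shared-neighbour logic described for SGS1, RAD27 and BIM1.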

Groups of genes with similar function that appear in network graphs as clusters of genes with shared genetic interaction partners are called functional modules (Barabasi and Oltvai, 2004). Functional modules are chemically or spatially distinct entities within a cell, such as a signal transduction pathway or a macromolecular machine, respectively, that have emergent properties (Hartwell et al., 1999). Functional modules define a unit of higher organisation within the cell and can reveal new function for uncharacterized and previously characterized genes (Tanay et al., 2004). Specifically, one can predict the function of an uncharacterized gene to be similar to that of its neighbours in the functional module. Thus, by providing an understanding of how genes work in the context of a global system, network analysis is effective at determining gene function at the level of individual genes. Conservation of a scale-free organization among networks in many complex systems, such as the internet and yeast, has led to the hypothesis that there is a universality to network biology (Barabasi and Oltvai, 2004). If true, such universality would imply that metazoan genetic interaction networks are also likely to be scale-free, which would indicate that a large amount of genetic buffering exists in metazoans. This buffering or compensation is expected to modulate polygenic diseases such as cancer, Hirschsprung disease, and Bardet-Biedl syndrome (Badano and Katsanis, 2002). A point of contention in cancer research is which mutations are required to form a cancer. This lack of agreement is caused by confusion between driver mutations that are responsible for cancer development and passenger mutations that are simply coincident (Greenman et al., 2007). Understanding how genes work together in a system is essential to understanding the mutational load or the combinations of genetic variations that may result in complex disease or phenotype.
Therefore, the importance of studying network biology is evident when considering complex diseases such as cancer. Moreover, while a large amount of the current understanding of network biology has come from the study of unicellular organisms, the network biology of multicellular organisms must be considered to gain a better understanding of multifactorial diseases such as cancer. Insights could include how polygenic disease is initiated and progresses compared to monogenic disease and how modifier mutations may affect the pathology of certain diseases based on synthetic interaction with genomic variation.

Synthetic Genetic Interactions in Metazoans

C. elegans is an Ideal Model Organism to Study Genetic Interactions

While the SGA approach in yeast has made a large impact on our understanding of core biological processes, studying synthetic interactions in a metazoan allows for investigation of interactions within conserved processes and functions, such as signal transduction pathways, involved in both disease and development. Several features make the nematode worm Caenorhabditis elegans uniquely suited amongst animal model systems to systematically investigate genetic interactions in a high-throughput manner (Brenner, 1974). First, the worm has a short three-day life cycle, making genetic manipulation relatively less time-consuming than working with organisms such as mice. Second, animals can be easily cultured in multiwell-plate format, making the preparation of large numbers of samples economical. Third, ~99.8% of the individuals within a population are hermaphrodites. Strains therefore propagate during an experiment without the need for human intervention. Fourth, C. elegans genes can be specifically targeted for reduction-of-function through RNA interference (RNAi)-by-feeding (Timmons and Fire, 1998). Moreover, the invariant cell lineage of the worm's 959 somatic cells has been mapped (Sulston et al., 1983; Sulston and Horvitz, 1977). The extensive

characterization of worm development facilitates interpretation of mutant phenotypes. Another useful feature of the worm is the large collection of publicly available mutants representing most of the conserved pathways that control development in all animals (Ruvkun and Hobert, 1998). Together, these features make C. elegans a unique whole-animal model system to systematically probe genetic interactions in a high-throughput fashion. One potential obstacle to systematically investigating genetic interactions in any metazoan is the inability to cross strains carrying genetic mutations to create double loss-of-function mutants in a high-throughput manner as is done in yeast SGA. In metazoans, the most high-throughput way to disrupt the function of specific genes in whole animals is to use RNAi. Besides the C. elegans RNAi library, many genome-wide RNAi libraries exist that target genes in metazoans such as D. melanogaster and in mammalian cell lines (Boutros et al., 2004; Dietzl et al., 2007; Goshima et al., 2007; Kamath et al., 2003; Moffat et al., 2006). C. elegans is the most amenable to high-throughput gene loss-of-function by RNAi in a whole animal because RNAi can be fed to worms to create systemic disruption of specific genes (Timmons and Fire, 1998). C. elegans is the ideal model organism in which to systematically investigate genetic interactions because two genes can be disrupted simultaneously in a high-throughput manner by feeding RNAi-inducing bacterial strains to genetic mutants.

RNAi is a Useful Tool to Reduce Gene Function

A library of E. coli strains has been generated in which each strain expresses double-stranded (ds)RNA whose sequence corresponds to a particular worm gene (Fraser et al., 2000; Kamath et al., 2003). Upon ingestion of the E. coli, dsRNAs target a particular gene for reduction-of-function by RNAi (Fire et al., 1998). RNAi-inducing bacterial strains targeting over 80% of the 20,604 protein-coding genes of C. elegans are included in the library. The mechanism of dsRNA-induced RNAi is a series of molecular steps that translate a specific dsRNA signal into degradation of a sequence-matched mRNA transcript (reviewed in Grishok, 2005; Hammond, 2005; Meister and Tuschl, 2004; Mello and Conte, 2004). The dsRNAs are processed by the nuclease DICER into 21- to 23-nucleotide fragments called small interfering (si)RNAs (Bernstein et al., 2001; Grishok et al., 2001; Hutvagner et al., 2001; Ketting et al., 2001). The siRNAs target the RNA-Induced Silencing Complex (RISC) to mRNAs that are complementary to the siRNA sequence (reviewed in Filipowicz, 2005). A single strand of the siRNA is bound to the mRNA transcript and RISC then destroys the mRNA transcript by cleaving it into small fragments. Additional dsRNA is made from the siRNA-mRNA hybrid, resulting in amplification of the dsRNA signal. Furthermore, dsRNA is systemically distributed in order to target the particular mRNA for down-regulation throughout most of the animal (Fire et al., 1998; Sijen et al., 2001). The production of the RNAi library allowed an enormous advance in understanding the function of C. elegans genes. First, a functional characterization of each of the genes targeted by RNAi was carried out in both a wild-type strain and an RNAi-hypersensitive strain, rrf-3 (Fraser et al., 2000; Kamath et al., 2003; Simmer et al., 2003). The use of rrf-3 as a background mutation increased the number of genes that caused a visible phenotype when targeted by RNAi from 10% to 33%. The two aforementioned genome-wide screens revealed that the RNAi library produces specific down-regulation of individual genes in an incompletely penetrant manner. Fewer than 1% of genes that produced phenotypes when targeted by RNAi were false-positives

(Kamath et al., 2003). However, the penetrance of RNAi varies. For unknown reasons, more than 30% of genes that did not produce a phenotype were false-negatives, and tissues such as the nervous system are largely refractory to RNAi (Kamath et al., 2003). In addition, RNAi does not always produce a complete loss of gene function. While performing multiple repeats of RNAi experiments can circumvent false negatives caused by the variable penetrance of RNAi, more confidence can be placed in positive RNAi experiments than in negative experiments. Other methods to induce RNAi exist and new collections of RNAi-inducing constructs have been created (Maeda et al., 2001; Rual et al., 2004; Sonnichsen et al., 2005). Besides feeding dsRNA-producing E. coli to worms, systemic induction of RNAi by injecting dsRNA into the gonad of worms or by soaking worms in dsRNA has been described (Fire et al., 1998; Maeda et al., 2001; Tabara et al., 1998). The Vidal group constructed the ORFeome RNAi library, which consists of plasmids directing synthesis of dsRNA corresponding to confirmed open reading frames (Rual et al., 2004). The production of this RNAi library increased the number of C. elegans genes targeted by RNAi by 1736 genes, 1066 of which cause phenotypes. Despite these options, the RNAi feeding library is preferred in most cases because of its ease of use and its almost complete genome coverage. RNAi has facilitated numerous in vivo investigations of C. elegans gene function. These include detailed analyses of individual genes, genome-wide screens, and candidate screens. First, RNAi has been used to study the function of individual genes of interest in both embryonic and post-embryonic development (Kamath et al., 2001; Kuroyanagi et al., 2000). An advantage of using RNAi instead of a loss-of-function mutant in many of these investigations is the ability to titrate the amount of RNAi-producing E. coli that is fed to a worm.
RNAi titration creates varied loss-of-function of a specific gene akin to an allelic series and thus facilitates analysis of genes whose only existing mutants are lethal (Kamath et al., 2001). Second, RNAi has facilitated large-scale screens for genes involved in a particular process such as fat regulation, early embryogenesis, or cell division, and these screens have identified new roles for both characterized and uncharacterized genes in the aforementioned processes (Ashrafi et al., 2003; Gonczy et al., 2000; Sonnichsen et al., 2005). In these screens, the identity of a gene in question is immediately revealed, eliminating a large amount of time required to map and clone mutants isolated from a chemical mutagenesis screen. Third, RNAi has been used in reverse screens to identify phenotypes associated with a specific class of candidate genes. For example, RNAi was used in a phenotypic analysis of genes expressed in the ovary to identify 81 genes with essential roles in embryogenesis (Piano et al., 2000). In this case, many of the genes identified have maternal roles that would have prevented identification of their embryonic functions in a traditional forward genetic screen approach. However, the ability to temporally control down-regulation of genes by RNAi allowed the authors to distinguish between genes required for embryogenesis and genes required for production of eggs. Therefore, RNAi can be applied in a specific, temporally controlled, and titratable manner to down-regulate the function of the majority of the genome at multiple stages in development. These qualities have made RNAi enormously useful in identifying gene function for those genes in which phenotype is induced upon RNAi-induction. Moreover, the ease of manipulation and completeness of the RNAi feeding library make it ideal for high-throughput disruption of gene function. As such, it is a resource that can be taken advantage of in a high-throughput investigation of genetic interactions.
By facilitating the simultaneous disruption of two genes, the RNAi library will be useful in functionally annotating the large number of uncharacterized genes in the C. elegans genome that have no obvious phenotype when disrupted individually.

Previous Genetic Interaction Screens in C. elegans

Prior to the work presented in this thesis, there were only 2,620 reported genetic interactions in C. elegans (WormBase, Release WS170). These genetic interactions were identified in small-scale investigations that included: 1) classic forward enhancer and suppressor screens as presented above; 2) reverse enhancer and suppressor screens using RNAi as discussed below (Baugh et al., 2005; Tewari et al., 2004); and 3) plasmid screens as carried out in classic yeast experiments (Bender and Pringle, 1991; Davies et al., 1999). In an example of a plasmid screen, investigators looked for enhancers of compromised mec-8, which encodes a putative RNA splicing factor involved in alternative splicing (Davies et al., 1999). The investigators created a strain containing a viable mec-8 null mutation and a rescuing extrachromosomal transgenic array containing the wild-type mec-8 sequence. Extrachromosomal arrays are lost randomly during germ cell proliferation. Thus, a parent containing an extrachromosomal array will produce progeny that have and progeny that do not have the extrachromosomal array (Stinchcomb et al., 1985). The strain was subjected to chemical mutagenesis and then screened for mutants that were dependent on the mec-8-encoding extrachromosomal array for viability. Five synthetic lethal mec-8 interactors were identified in this manner (Davies et al., 1999). Enhancer and suppressor screens using RNAi have been exploited to further investigate the TGF-β signal transduction pathway, vulval development, cell polarity, and muscle development (Baugh et al., 2005; Cui et al., 2006; Labbe et al., 2006; Poulin et al., 2005; Tewari et al., 2004).

In all of these screens, a mutant strain was fed RNAi-inducing bacteria that targeted a second gene for down-regulation and the resulting double loss-of-function animals were screened for enhancement or suppression of a specific phenotype. For example, Baugh et al. performed an RNAi enhancer screen among candidate genes involved in posterior embryonic patterning. They tested 15 mutant strains for synthetic lethality with 22 genes targeted by RNAi and identified five interactions among five genes required for muscle development, three of which are conserved in vertebrates. In an example of an RNAi suppressor screen, O'Rourke et al. screened for suppression of conditional lethality caused by a temperature-sensitive mutation in the gene encoding dynein heavy chain (O'Rourke et al., 2007). Dynein is a motor protein that plays a role in multiple microtubule-mediated events such as centrosome segregation. The analysis revealed 20 genes that, when targeted by RNAi, specifically suppress dynein mutations: they suppress three different mutations in the dynein gene and do not suppress conditional mutants in other genes. Supporting evidence for a role of these suppressor genes in dynein-mediated events is that most of them localize in a similar pattern to that of the dynein heavy chain and four encode dynein subunits. The examples presented above serve as proof of principle that feeding RNAi-inducing bacteria to a sensitized strain can be useful in identifying genetic interactions. However, the studies each probed genetic interactions with one query gene or among a small subset of candidate genes. In S. cerevisiae, there are an estimated 199,370 interactions among the ~6,000 essential and non-essential genes in the unicellular organism (Davierwala et al., 2005; Tong et al., 2004). Considering a similar ratio of essential to non-essential genes in C. elegans (Kemphues, 2005), the equivalent minimum number of expected genetic interactions among the 20,604 C.
elegans genes is approximately 2,200,000. Therefore, the existing C. elegans genetic interactions revealed through studies of particular biological processes represent only 0.1% of the anticipated number of genetic interactions in C. elegans. Moreover, each of the previously reported genetic interaction screens investigated significantly distinct processes. Even if combined, the resulting network would likely be sparse and fail to provide insight into the architecture of the C. elegans genetic network. Hence, the development of high-throughput approaches that make use of the RNAi library to identify genetic interactions could yield significant insight into the principles of metazoan genetic networks. In 2006, an investigation of genetic interactions concurrent with that presented in this thesis was published (Lehner et al., 2006). In this approach, two genes are simultaneously disrupted by feeding mutant worms RNAi-inducing bacteria that target a second gene for down-regulation. Worms are grown in liquid culture and scored in a binary manner for growth versus lack of growth. The qualitative assessment of growth is used to compare double loss-of-function animals with corresponding single loss-of-function animals to identify catastrophic genetic interactions. Thirty-one genes were tested for genetic interaction with 1744 others, revealing a network of 350 genetic interactions among signal transduction pathways. One of the main conclusions from this work was that six chromatin remodelling genes were extremely promiscuous and likely mediate redundancy on a global scale. However, little analysis of the resulting network was presented. As such, detailed characterization of the metazoan genetic interaction network remained incomplete.
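The scaling argument made earlier in this section (from the yeast interaction estimate to an expected number of C. elegans interactions) can be checked with a few lines of arithmetic. The sketch below assumes that the density of interactions per gene pair is the same in both organisms, which is the simplest reading of the argument and may not be exactly how the quoted figure was derived. With exactly 6,000 yeast genes this gives about 2.35 million expected interactions, the same order as the approximately 2,200,000 quoted; the difference presumably reflects a slightly different gene count or rounding, which the text does not specify.

```python
from math import comb


def expected_interactions(known_interactions, known_genes, target_genes):
    """Scale an interaction count to another genome at constant pair density.

    Assumption: the fraction of gene pairs that interact is the same in
    both organisms.
    """
    density = known_interactions / comb(known_genes, 2)
    return density * comb(target_genes, 2)


# Figures quoted in the text.
YEAST_INTERACTIONS = 199_370
YEAST_GENES = 6_000            # "~6,000" in the text; exact value is an assumption
WORM_GENES = 20_604
REPORTED_WORM_INTERACTIONS = 2_620
QUOTED_WORM_ESTIMATE = 2_200_000

estimate = expected_interactions(YEAST_INTERACTIONS, YEAST_GENES, WORM_GENES)
coverage = REPORTED_WORM_INTERACTIONS / QUOTED_WORM_ESTIMATE

print(f"scaled estimate: {estimate:,.0f} interactions")
print(f"coverage of quoted estimate: {coverage:.2%}")
```

Either way, the previously reported interactions cover on the order of 0.1% of the anticipated total, which is the point the text makes.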

1.5. Genetic Interactions will Contribute Significantly to the C. elegans Global Network of Functional Data

Several high-throughput methodologies using C. elegans as a model system have recently been developed to systematically investigate a variety of interactions, including protein-protein interactions (interactome) (Li et al., 2004), shared microarray co-expression profiles (transcriptome) (Kim et al., 2001; Stuart et al., 2003), shared RNAi phenotypes (phenome) (Fraser et al., 2000; Kamath et al., 2003; Rual et al., 2004; Simmer et al., 2003), orthologous gene and protein interactions (DIP; Stark et al., 2006; Tong et al., 2004), and genetic interactions from small-throughput investigations (WormBase). The resulting datasets describe functional relationships between genes that physically interact, share temporal and spatial expression patterns, produce the same phenotype when down-regulated, or genetically interact. In addition to yielding valuable information for detailed genetic analyses, further biological insight can be gained by examining these large-scale datasets in the context of network graphs. Indeed, network analysis has demonstrated that network principles are conserved between yeast and worms (Gunsalus et al., 2005; Kim et al., 2001; Li et al., 2004; Stuart et al., 2003). Most notably, studies of protein networks in C. elegans revealed that network principles such as the scale-free nature and small-world property of the yeast genetic interaction network are conserved in C. elegans protein interaction networks (Barabasi and Oltvai, 2004; Li et al., 2004; Tong et al., 2004). However, there is a fundamental difference between genetic interaction networks and protein interaction networks. Protein interactions tend to occur within pathways whereas genetic interactions can occur both within and between pathways (Kelley and Ideker, 2005; Ye et al., 2005). Therefore, the connectivity and

organization of the metazoan genetic interaction network may be different from that of the metazoan protein interaction network. Specifically, at the time of this study, it was unknown whether a metazoan genetic interaction network would exhibit a small-world property that could be exploited to infer function for closely placed genes in the network, and whether a metazoan would have a similar level of genetic buffering to that of a unicellular organism. Moreover, it was unknown how a genetic interaction network would complement or overlap networks of previously identified functional data to add to the functional characterization of C. elegans genes.

Superimposing Distinct Interaction Networks can Reveal New Insight into Gene Function

Previously reported work suggests that a high-density map of functional links within a network best facilitates the identification of functional modules and the inference of gene function (Gunsalus et al., 2005; Kelley and Ideker, 2005; Tewari et al., 2004; Walhout et al., 2002; Zhong and Sternberg, 2006). To create a high-density map of C. elegans functional links, various combinations of the aforementioned datasets have been superimposed. When considered at the level of individual links, the superimposed networks provide both overlapping and disparate functional information, which yields further support for true positive interactions and reveals potential false negatives associated with any one dataset. Interactions between genes that are supported by more than one type of functional data, called multiply-supported links, are less likely to be false positives. Multiply-supported links from a superimposed network were identified by combining 42 protein-protein interactions, RNAi co-phenotype data, and spatiotemporal co-expression data from 553 gene-expression

experiments (Walhout et al., 2002). In all, 10 high-confidence, multiply-supported links were identified in a network of 600 genes expressed in the worm germline. An investigation of protein interactions involving TGF-β pathway components revealed 71 yeast-two-hybrid (Y2H) interactions among 59 proteins (Tewari et al., 2004). The investigators then carried out genetic interaction analysis between seven mutants and 46 RNAi clones that targeted genes identified in the yeast-two-hybrid screen. Thirteen genetic interactions were identified. These functional links were mapped on a network graph with physical interactions and the integrated network was used to identify genes with both physical and genetic links to known TGF-β genes. One such gene, W01G7.1, corresponds to a previously uncloned gene, daf-5, which has a characterized role in TGF-β signalling (Thomas et al., 1993), demonstrating the usefulness of an integrated network for predicting gene function. Interestingly, this was the first inclusion of C. elegans genetic interactions in a network. While the two studies described above used overlapping links to assign high confidence to functional inferences, they did not take advantage of the differences between the compared networks to identify false negatives that may have been missed by one investigation yet identified in another. Individually, each of the functional datasets is useful, but incomplete. For example, systematic yeast-two-hybrid approaches have revealed thousands of protein-protein interactions in C. elegans, D. melanogaster, and humans (Giot et al., 2003; Li et al., 2004; Rual et al., 2005; Stelzl et al., 2005). The resulting network of protein interactions predicts the function of many previously uncharacterized genes. One drawback to this approach, however, is that it results in many false negatives.
For example, yeast-two-hybrid tests do not reveal interactions that depend on post-translational modifications that are common in metazoan signalling modules, such as tyrosine phosphorylation, since the analysis is done in yeast (Colland and Daviet, 2004). Similarly, the RNAi phenotype data contain many false negatives due to the inefficiency of RNAi in tissues such as the nervous system (Timmons et al., 2001). Genetic interactions will also add distinct information to the functional map of C. elegans. Because inclusion of multiple types of functional data provides a more holistic view of the animal, the integration of a large network of genetic interactions into the functional map of C. elegans is essential to an understanding of metazoan biology.

Superimposed Networks Facilitate the Interpretation of the Nature of a Genetic Interaction

On a broader level, the superimposed network gives a systems-level view of C. elegans biology that provides insight into the nature of the relationships between genes, the robustness of the system, and its architecture. For example, one investigation of a superimposed network combined a network of genes linked by shared phenotypic profile, the interactome (Li et al., 2004), and the transcriptome (Kim et al., 2001; Stuart et al., 2003) to investigate new levels of organization that include functional modules (Gunsalus et al., 2005). It was demonstrated that the modules can reveal new functions for uncharacterized and previously characterized genes represented within them. Again, priority was given to multiply-supported links, as the authors identified functional modules that consisted mainly of links supported by two or more types of data. By restricting analyses to overlapping data characteristic of within-pathway interactions, such as protein interaction, co-expression, and co-phenotype data, the authors of these studies biased their investigations towards within-pathway interactions. Had they been available, between-pathway genetic interactions may have provided a different view of the system.
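The notion of a multiply-supported link can be illustrated with a minimal sketch. It assumes each dataset is available as a list of gene pairs; the function name, the dataset labels, and the gene names (geneA to geneD) are hypothetical placeholders, not data or methods from the studies cited above.

```python
from collections import defaultdict


def multiply_supported_links(networks, min_support=2):
    """Return links backed by at least `min_support` distinct data types.

    `networks` maps a data-type label to a list of (gene, gene) pairs.
    """
    support = defaultdict(set)
    for data_type, links in networks.items():
        for gene_a, gene_b in links:
            # Store each link as an unordered pair so (a, b) equals (b, a).
            support[frozenset((gene_a, gene_b))].add(data_type)
    return {
        link: types
        for link, types in support.items()
        if len(types) >= min_support
    }


# Hypothetical toy data: placeholder genes and links, for illustration only.
toy_networks = {
    "protein_interaction": [("geneA", "geneB"), ("geneB", "geneC")],
    "co_expression": [("geneB", "geneA"), ("geneC", "geneD")],
    "co_phenotype": [("geneA", "geneB"), ("geneC", "geneB")],
}

supported = multiply_supported_links(toy_networks)
for link, types in sorted(supported.items(), key=lambda item: sorted(item[0])):
    print(sorted(link), sorted(types))
```

In this toy example the geneC/geneD link is supported by a single data type and is therefore filtered out. Such singly supported links are not necessarily false positives; they may be true relationships that the other datasets missed.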

44 A superimposed network can also provide insight into the nature of genetic interactions. As described earlier (Figure 1-2), synthetic lethal interactions can reveal both between-pathway and within-pathway relationships between genes. To investigate which model best describes the yeast SGA interactions, physical interaction data has been superimposed onto synthetic genetic interaction networks (Kelley and Ideker, 2005; Tong et al., 2001; Tong et al., 2004; Ulitsky and Shamir, 2007). Gene products within a pathway or complex are expected to physically interact more often than the products of genes in parallel pathways. This type of analysis suggests that between-pathway models account for roughly three and a half times as many yeast synthetic lethal genetic interactions compared to within-pathway models. This outcome is likely a reflection of the use of non-essential genes to identify synthetic lethal interactions in the SGA experiments. Conversely, alleviating interactions are more likely to overlap with physical interactions because they are more likely to connect genes within the same pathway (Collins et al., 2007; Onge et al., 2007; Schuldiner et al., 2005). Therefore, a superimposed network can reveal the functional relationship between two genes in a given network Integrating Functional Data Predicts Genetic Interactions Zhong and Sternberg took a significantly different approach to combine multiple data types into a network (Zhong and Sternberg, 2006). They derived a genetic interaction prediction algorithm to incorporate multiple types of functional data from several species to provide probabilities of interaction for each pair of genes in the C. elegans genome. They predicted 18,183 functional interactions among approximately 2,254 genes and found that the predicted network had a modular nature such that it consisted of groups of interconnected genes that 32

45 belong to individual signal transduction pathways or protein complexes. For example, the EGF pathway clustered together on the network. Forty-nine of seventy predicted interactions with let-60/ras were experimentally tested revealing 12 novel interactions. An additional 17 of the predicted interactions were previously reported. Therefore, at least 29 of 87 predicted interactions were true positives. This study demonstrates that integrated functional data can be used to predict genetic interactions. By extension, because genetic interactions are indicative of gene function, integrated functional networks will be largely informative in predicting gene function. In summary, C. elegans -related datasets representing diverse functional relationships have been integrated to create superimposed networks. These networks aid in the characterization of gene function by strengthening true-positive relationships between genes, identifying groups of genes with shared function, distinguishing within-pathway from between-pathway relationships, and predicting genetic interactions. Possible improvements to the aforementioned analyses are the inclusion of genetic interactions and the consideration of non-supported links as potential true-positive relationships Signal Transduction in C. elegans Signal Transduction Pathways are Ideal for Investigating Genetic Interactions in Metazoans A signal transduction pathway consists of a series of biochemical reactions that transduce a signal into regulation of a biological process. Typically, a signal from an extracellular cue is transmitted through sequential activation and/or inhibition of downstream molecules that ultimately alters the development or physiology of a cell. Herein, I describe the seven signal 33

transduction pathways that I investigated to create a metazoan network of genetic interactions. These are the Insulin, Epidermal Growth Factor (EGF), Fibroblast Growth Factor (FGF), Notch, Transforming Growth Factor Beta (TGF-β), Wingless (Wnt), and DNA Damage Response (DDR) pathways. These are well-conserved, well-studied pathways that are involved in the development of metazoans and disease in humans. A common theme in many of these pathways is that different components of the pathways are used in a plastic manner to create diverse outcomes. Furthermore, extensive crosstalk exists within the global system of signal transduction to regulate the spatial and temporal control of developmental events. As such, identifying interactions among the genes in these pathways could lead to a better understanding of conserved biology. Specifically, investigation of genetic interactions among signal transduction pathways in C. elegans may: 1) identify and aid in interpretation of relationships between genes within and among signal transduction pathways; 2) reveal the function of previously uncharacterized genes in signal transduction; 3) uncover unidentified functions of previously characterized genes in signal transduction; 4) reveal new insight into animal development and disease; and 5) result in a densely connected network of genetic interactions with which genetic network properties can be studied.

Insulin Signal Transduction

In mammals, the insulin pathway regulates ageing, glucose homeostasis, lipid metabolism, cell growth, cell proliferation, and proteotoxicity (reviewed in Broughton and Partridge, 2009; Cohen and Dillin, 2008; Saltiel and Kahn, 2001). Disruption of the pathway has been implicated in obesity, diabetes, Alzheimer's disease, and neoplasia. In C. elegans, the insulin-like pathway regulates glucose homeostasis, lifespan, ageing, and the dauer life stage, which is described below (Nelson and Padgett, 2003). Under ideal conditions worms progress from embryogenesis through four sequential larval stages of development called L1, L2, L3 and L4, to adulthood (Figure 1-5). This process takes approximately 65 hours at 20 °C and the average worm lives for 18 days in ideal conditions (Brenner, 1974; Byerly et al., 1976; Kenyon et al., 1993). Dauer (German for "enduring") diapause is an alternative L3 stage that worms enter in response to stressful conditions such as lack of food, high temperature, and high population density (Cassada and Russell, 1975). Worms can exist as dauer larvae for three months; however, reintroduction of food to the worms or alleviation of stressful conditions at any time during dauer arrest will cause the worm to re-enter the life cycle as an L4 worm. The insulin-like pathway is activated when an extracellular insulin ligand, such as DAF-28, binds to the DAF-2 transmembrane receptor tyrosine kinase (reviewed in Fielenbach and Antebi, 2008; Nelson and Padgett, 2003). The ligand-bound DAF-2 receptor dimerizes and autophosphorylates to activate the PI3 kinase AGE-1. In turn, the PI3 kinase activates PDK-1, AKT-1, and AKT-2, which ultimately inhibit the forkhead transcription factor, DAF-16, from entering the nucleus to regulate transcription of effector genes. The insulin-like pathway regulates dauer arrest and lifespan. For example, daf-2 mutants are Dauer Formation Constitutive (Daf-c) and daf-16 mutants are Dauer Formation Defective (Daf-d). Daf-c worms are unable to escape dauer arrest in normally favourable environmental conditions and Daf-d worms are unable to enter dauer arrest in unfavourable conditions. Moreover, on average, daf-2 mutants live twice as long as wild type worms (Kenyon et al., 1993).

48 A family of 38 insulin-like ligands and only one insulin receptor (DAF-2) have been identified in C. elegans, suggesting extensive redundancy exists in the worm insulin-like pathway (Fielenbach and Antebi, 2008; Kimura et al., 1997; Li et al., 2003; Nelson and Padgett, 2003; Pierce et al., 2001). A systematic investigation of genetic interactions could provide insight into how the insulin-like pathway is regulated by these ligands. Moreover, the insulin ligands have been shown to act in a non-autonomous manner, therefore in vivo investigation of insulin signalling in the context of a whole animal could potentially reveal new genetic interactions that may not have been identified by other means, such as cell culture experiments (Apfeld and Kenyon, 1998; Nelson and Padgett, 2003). These characteristics, along with its roles in disease and ageing make the insulin-like pathway an ideal candidate for inclusion in a study of metazoan genetic interactions FGF Signal Transduction In developing mammals, the FGF pathway regulates guided cell migration, differentiation, and proliferation (Eswarakumar et al., 2005). As such, the FGF pathway has been demonstrated to play key roles in skeletal dysplasia and various human cancers (Eswarakumar et al., 2005). In C. elegans, the FGF pathway regulates multiple processes including fluid homeostasis, guided cell migration, axon outgrowth, muscle arm extension, and axon maintenance (Borland et al., 2001; Bülow et al., 2004; DeVore et al., 1995; Dixon et al., 2006; Huang and Stern, 2004; Roubin et al., 1999). The FGF pathway regulates these functions with only one receptor and two ligands. Therefore, it must coordinate multiple signalling events to regulate diverse processes in different spatio-temporal contexts. For example, the egl-15 (for egg laying defective) FGF receptor in C. elegans was first characterized for its role in guiding sex 36

Figure 1-5. C. elegans Life Cycle. The C. elegans life cycle takes approximately 3 days at 20 °C. Worms progress from embryogenesis through four larval stages into adulthood. Stressful conditions such as lack of nutrients or an overcrowded environment prompt worms to enter an alternative L3 stage called dauer diapause. Entry into and out of dauer diapause is regulated by the Insulin and TGF-β signal transduction pathways.

myoblasts to their final positions in response to one of the two FGF ligands, egl-17 (DeVore et al., 1995). The other known FGF ligand, let-756 (for lethal), functions to guide axon outgrowth, regulate fluid homeostasis, and guide muscle arm extension (Bülow et al., 2004; Dixon et al., 2006; Huang and Stern, 2004). In addition to their individual roles in cell guidance and axon outgrowth, a genetic interaction identified a shared role of the two FGF ligands in muscle proteolysis (Szewczyk and Jacobson, 2003). Investigation of FGF signalling in C. elegans has also revealed that it regulates diverse processes in coordination with other signal transduction pathways and with the use of alternate downstream components. For example, FGF signalling in C. elegans negatively regulates sex myoblast differentiation, which is positively regulated by phosphatidyl-inositol 3'-kinase (PI3 kinase) signalling (Sasson and Stern, 2004). The antagonistic relationship between these two pathways is conserved in the regulation of vertebrate myogenesis (Itoh et al., 1996; Jiang et al., 1998; Sasson and Stern, 2004). The canonical FGF signal transduction pathway is activated upon extracellular ligand binding to the FGF receptor tyrosine kinase (reviewed in Borland et al., 2001). Ligand-bound receptors dimerize and phosphorylate multiple tyrosine residues on the receptors themselves. These phosphorylation events are necessary for interaction with an adaptor protein such as SEM-5/GRB-2, which recruits the downstream guanine nucleotide exchange factor SOS-1/Son of Sevenless. SOS-1 then activates the small GTPase LET-60/RAS, which in turn initiates a mitogen-activated protein kinase (MAPK) mediated signal transduction cascade (Moghal and Sternberg, 2003). In addition, a non-canonical kinase-independent function of egl-15 in axon maintenance was found with a kinase-dead form of the EGL-15 receptor that has red fluorescent protein

sequence in the place of the kinase domain (Bülow et al., 2004). The authors of this study suggest that interaction with neural cell adhesion molecules (NCAMs) or an alternate FGF pathway may mediate axon adhesion. Therefore, an investigation of genetic interactions between FGF pathway components and other signal transduction pathways may reveal new mechanisms of FGF signalling that include alternate downstream molecules and coordination with other pathways.

EGF Signal Transduction

Similar to the FGF pathway, the EGF pathway is also implicated in many human cancers (Kalyankrishna and Grandis, 2006; Lo et al., 2006). In C. elegans, the EGF pathway controls many aspects of larval development including vulva development, uterine development, excretory cell development, and ovulation (reviewed in Sternberg and Han, 1998; Sternberg et al., 1995; Sundaram, 2006). The pathway is regulated by one EGF ligand (LIN-3) and one EGF receptor (LET-23); as such, the activity of the EGF pathway must be coordinated in a spatiotemporal context with the use of diverse signal transduction molecules (Aroian et al., 1990; Hill and Sternberg, 1992). For example, LIN-3 signals through LET-23 to regulate vulval and uterine development through activation of the RAS/MAPK pathway as described above for FGF signalling (Aroian et al., 1990; Chang et al., 1999; Han et al., 1990; Hill and Sternberg, 1992). In addition, LET-23 regulates ovulation in a let-60 independent manner through inositol trisphosphate (IP3) signalling (Clandinin et al., 1998). Interestingly, the existence of a third EGF pathway has been proposed because the lethality of lin-3 mutants is not rescued by overexpression of downstream LET-23 or LET-60 (Liu, 1999). Genetic interactions between EGF components and other signal transduction genes may shed light on this alternate pathway.

52 Notch Signal Transduction The Notch pathway mediates cell-cell interactions through a physical interaction between the membrane bound Notch ligand (Delta) and the transmembrane Notch receptor (reviewed in Lai, 2004). Physical interaction between the ligand and the receptor ultimately results in cleavage of the intracellular portion of the receptor, which is then free to enter the nucleus and repress inhibition of the CSL transcription factor. In this way, neighbouring cells coordinate cell fate decisions. In mammals, Notch has a role in the cell fate of cells in all three germ layers (Bolos et al., 2007). Disruption of the mammalian Notch pathway results in cancer and cardiovascular disease (Gridley, 2007; Koch and Radtke, 2007). Notch specifies diverse cell fate decisions by coordinated activity of Notch components and other signal transduction pathways. In C. elegans, there are two Notch receptors, GLP-1 and LIN-12, and ten Notch ligands, including LAG-2, APX-1, and DSL-1 (Chen and Greenwald, 2004; Lambie and Kimble, 1991). Genetic interactions have been identified between the three aforementioned ligands in vulval cell fate induction (Chen and Greenwald, 2004). Moreover, there is coordination between the EGF pathway and the Notch pathway to regulate vulval cell fate (Figure 1-6) (reviewed in Sternberg, 2005). There are six vulval precursor cells, named P3.p to P8.p that are equally capable of developing one of three vulval cell fates, termed primary, secondary, and tertiary. The anchor cell of the developing gonad sends an EGF signal to P6.p that activates LET-23-mediated signalling and primary cell fate. The EGF signalling cascade up-regulates the DSL-1 ligand and down-regulates the LIN-12 receptor. As such, the P5.p and P7.p neighbouring cells that express LIN-12 interact with the DSL-1 on P6.p, which initiates secondary cell fate in P5.p and P7.p and down-regulates EGF signalling. This example 40

highlights that much can be learned by investigating the function of genes that coordinate cell fate in a non-autonomous manner in the context of a whole animal.

Wnt Signal Transduction

The Wnt pathway is repeatedly employed throughout the development of most animals examined to date. In mammals, the Wnt pathway regulates a multitude of processes in embryogenesis, tissue homeostasis, and tumorigenesis (Reya and Clevers, 2005; Schlessinger et al., 2009). There are both canonical and non-canonical Wnt pathways in flies and vertebrates that are differentiated by the use of β-catenin. In the canonical Wnt pathway, Wnt ligand binding to the Frizzled receptor and LRP co-receptor leads to protection of β-catenin from degradation, thereby allowing β-catenin to enter the nucleus and interact with the Lymphoid Enhancer binding Factor/T-Cell-specific Factor (LEF/TCF) transcription factors to regulate transcription of downstream effector genes. When the pathway is inactivated, the kinases GSK3β and CKI target β-catenin for ubiquitin-mediated degradation. The diversity of Wnt regulated functions is reflected in the large number of Wnt ligands and receptors in the pathway. In C. elegans, there are five Wnt ligands, four frizzled receptors, and four β-catenins. To date, an LRP homolog has not been identified in C. elegans. The β-catenins, including bar-1, wrm-1, and hmp-2, regulate different processes (reviewed in Hardin and King, 2008; Korswagen, 2002). The Wnt/BAR-1 pathway is most similar to the canonical Wnt pathway and regulates cell migration, male tail patterning, and various cell fate decisions. For example, the Wnt/BAR-1 pathway regulates vulval precursor cell fate in coordination with the EGF, Notch, and non-canonical Wnt pathways (Figure 1-6) (reviewed in Sternberg, 2005). The ability of the six vulval precursor cells to adopt

Figure 1-6. Notch, EGF, and Wnt Pathways Regulate Vulval Induction. (A) Six equipotent epidermal precursor cells (P3.p to P8.p) are differentially regulated to induce 1°, 2°, or 3° cell fates. Wnt signalling upregulates LIN-39 in all six cells, which renders each cell capable of adopting any of the cell fates. The anchor cell (AC) sends an EGF-mediated signal to P6.p, which induces primary cell fate. Consequent upregulation of the Notch ligand DSL-1 and downregulation of the Notch receptor LIN-12 occurs in P6.p. Notch signalling mediates communication between P6.p and its immediate neighbours P5.p and P7.p. Binding of DSL-1 to LIN-12 stimulates 2° cell fate of P5.p and P7.p, which downregulates EGF signalling. Uninduced cells adopt the 3° cell fate. (B) Detailed representation of Notch signalling between P6.p and its closest neighbours in response to LET-23 signalling. See text for references.
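The fate assignment summarized in the caption above can be written out as a deliberately simplified sketch. It assumes that the cell receiving the EGF signal becomes primary, its immediate neighbours become secondary through lateral Notch signalling, and all remaining cells stay tertiary. The function name and its parameters are hypothetical, and the sketch ignores Wnt-dependent competence, graded signalling, and timing.

```python
def assign_vulval_fates(cells, egf_target="P6.p"):
    """Assign a cartoon fate to each vulval precursor cell.

    `cells` is the ordered list of precursor cells along the body axis;
    `egf_target` is the cell that receives the anchor cell signal.
    """
    # Uninduced cells default to the tertiary fate.
    fates = {cell: "tertiary" for cell in cells}
    index = cells.index(egf_target)
    # EGF signalling induces the primary fate in the target cell.
    fates[egf_target] = "primary"
    # Lateral Notch signalling induces the secondary fate in direct neighbours.
    for neighbour_index in (index - 1, index + 1):
        if 0 <= neighbour_index < len(cells):
            fates[cells[neighbour_index]] = "secondary"
    return fates


VPCS = ["P3.p", "P4.p", "P5.p", "P6.p", "P7.p", "P8.p"]
fates = assign_vulval_fates(VPCS)
for cell in VPCS:
    print(cell, fates[cell])
```

The point of the sketch is only that the outcome for any one cell depends on signals sent by its neighbours, which is why such non-autonomous decisions are best studied in the whole animal.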

one of the three cell fates is dependent on Wnt-mediated expression of the homeobox gene lin-39 (Eisenmann et al., 1998a). In turn, lin-39 expression is thought to be directly regulated by EGF signalling (Eisenmann et al., 1998b; Maloof and Kenyon, 1998). Moreover, the non-canonical Wnt signalling pathways play a role in specifying the fate and polarity of the posterior vulval precursor cells (Deshpande et al., 2005; Inoue et al., 2004). The existence of cross-talk between the Wnt, Notch, and EGF pathways assures that investigation of genetic interactions involving Wnt pathway components will add to the interconnectivity of the C. elegans genetic interaction network.

TGF-β Signal Transduction

In C. elegans, there are three TGF-β signal transduction pathways: 1) the Dauer pathway regulates the decision to enter dauer; 2) the Sma/Mab pathway regulates body morphology including body size and male-tail patterning; and 3) the unc-129 pathway regulates axon guidance in a non-canonical manner (reviewed in Savage-Dunn, 2005). Conventional TGF-β signalling is presented here with the Sma/Mab pathway as an example. The pathway is activated when the dbl-1 ligand binds two transmembrane serine/threonine kinases, sma-6 and daf-4. These are type I and type II receptors, respectively. Tetramerization of the receptors activates phosphorylation of the type I receptor by the type II receptor. The type I receptor then phosphorylates receptor-regulated (R) Smads that physically interact with Co-Smads to regulate gene transcription in the nucleus. Inhibitory (I) Smads that interfere with signal transduction are also a part of the TGF-β pathway. This pathway is a potentially valuable addition to the study of metazoan genetic interactions because it functions in a considerably different manner than the RTK pathways. Its study may

therefore result in a different view of signal transduction. Moreover, there is evidence for cross-talk between TGF-β and both the Notch and Wnt pathways, which could potentially add to the interconnectedness of a genetic interaction network (reviewed in Guo and Wang, 2009). For example, physical interactions have been identified between Smads and the Wnt LEF1/TCF transcription factors, which synergistically activate transcription of Xtwn, a homeobox gene that specifies the dorsal axis in Xenopus (Labbe et al., 2000; Laurent et al., 1997; Nishita et al., 2000).

DNA Damage Response Signal Transduction

Finally, the DNA damage response pathway (DDR) was included in this investigation as an example of a pathway that is not activated by an extracellular ligand. The DNA damage response pathway is essential in maintaining the integrity of the genome (reviewed in O'Neil, 2006; Zhou and Elledge, 2000). In response to DNA damage in germline cells, the DDR pathway can elicit repair, cell cycle arrest, and apoptosis. In C. elegans, the gene rad-5/clk-2 functions to regulate a DNA-damage checkpoint (Ahmed et al., 2001; Gartner et al., 2000). In response to DNA damage, rad-5 mediates a cell cycle arrest that allows the cell to either repair the DNA or undergo apoptosis. Disruption of the pathway has grave consequences for disease, development, and evolution as cells with DNA damage are allowed to live and proliferate. In summary, several lines of reasoning suggest that a genetic interaction network describing the aforementioned signal transduction pathways could provide significant insight into metazoan biology. First, the pathways are evolutionarily conserved and implicated in development and disease. Therefore, revealing gene function and interactions between

57 components of these conserved pathways will advance our understanding of animal biology. Second, the pathways share multiple downstream effectors. For example, three of the pathways are receptor tyrosine kinase (RTK) pathways that signal through a RAS/MAPK pathway (Figure 1-7) (Schlessinger, 2000). Therefore, their inclusion will likely lead to a highly interconnected network of genetic interactions. Third, many of the pathways coordinate to regulate the same processes such as muscle differentiation and vulval cell fate decisions (Berset et al., 2001; Chen and Greenwald, 2004; Sasson and Stern, 2004; Shaye and Greenwald, 2002; Yoo et al., 2004). Many processes are co-regulated by unknown signal transduction pathways whose identities could be revealed in a genetic interaction network. Fourth, all of the pathways, with the exception of the DDR pathway, are responsible for signal transduction from the plasma membrane. Therefore, they consist of extracellular signalling events such as non-autonomous signalling that could potentially be revealed by studying genetic interactions in the context of the whole animal. Finally, the DDR pathway does not transduce a signal from the plasma membrane. As such, it could reveal a different pattern of genetic interactions from the other signal transduction pathways Conclusion Genetic interactions reveal functional relationships between genes and pathways. Synthetic genetic interaction screens in yeast have been developed to identify genetic interactions that describe diverse types of functional relationships in a high-throughput manner. The analysis of these interactions has advanced our understanding of both individual gene function and systems biology. Moreover, yeast SGA has inspired the development of a similar high- 45

Figure 1-7. The RAS/MAPK Pathway Regulates Diverse Processes Through Multiple Signal Transduction Pathways. (A) Canonical receptor tyrosine kinase signal transduction pathways such as the FGF and EGF pathways signal through the MAP kinase pathway. Ligand binding to a transmembrane receptor tyrosine kinase stimulates receptor dimerization, phosphorylation and signal transduction through adaptor proteins to activate RAS/MAPK signalling. (B) The FGF, Wnt, Notch, EGF, and Insulin pathways all signal through RAS/MAPK to regulate multiple developmental processes. In some cases, these developmental processes are coordinately controlled by non-RAS-mediated signalling. For example, Insulin also regulates dauer formation and ageing through the PI3 kinase-mediated pathway.

[Figure panels: (A) schematic of a ligand-bound RTK dimer signalling through SEM-5, SOS-1, and LET-60 to MAPK signalling; (B) table of signal transduction pathways (FGF, Wnt, Notch, EGF, Insulin) and the functions mediated by RAS/MAPK signalling.]

59 throughput approach to investigate whether genetic network architecture is conserved in metazoans and the connectivity of processes conserved with higher organisms, such as signal transduction pathways implicated in multicellular development and disease. Finally, C. elegans possesses many qualities that make it the ideal animal model system for investigations requiring high-throughput analysis. These include the RNAi library, which targets 80% of the genes in the C. elegans genome for individual down-regulation and the existence of multiple conserved signal transduction pathways important to development and disease (Kamath et al., 2003). I set out to address four specific questions: 1) Can genetic interactions be identified in a metazoan on a large scale to add functional annotation to genes involved in signal transduction? 2) Can network topology be used to assign function to genes in a metazoan? 3) How do genetic interactions contribute to our understanding of biological systems? 4) Are genetic interactions conserved between S. cerevisiae and C. elegans? To address each of these aims, I developed a novel approach towards a global analysis of genetic interactions in the nematode Caenorhabditis elegans that I call Systematic Genetic Interaction analysis (SGI). SGI relies on targeting one gene by RNAi in a strain that carries a mutation in a second gene of interest. Using SGI analysis, I identified 1246 interactions between 461 genes, which is the largest metazoan genetic interaction network reported to date. I present several lines of evidence showing that the SGI network meets or exceeds the quality of other large-scale interaction datasets. Analysis of the SGI network reveals new functions for both uncharacterized and previously characterized genes, as well as new links between well-studied signal transduction pathways. I integrated the SGI network with other networks and found that synthetic genetic interactions are orthogonal to other types of 47

functional data and typically bridge different subnetworks, revealing redundancy between functional modules. Finally, I provide evidence that the properties of the C. elegans synthetic genetic network are conserved with S. cerevisiae, but the network connectivity of the interactions differs between the two systems. Thus, SGI analysis not only reveals novel gene function, but also contributes to our understanding of genetic interaction networks in an animal model system.
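What it means for an interaction to bridge subnetworks can be made concrete with a minimal sketch. It assumes that each gene has already been assigned to at most one functional module and tallies whether each genetic interaction falls within a module or between two modules. The function name, module labels, gene names, and links are hypothetical placeholders, not SGI data, and this is not the analysis pipeline used in this thesis.

```python
def classify_links(links, module_of):
    """Count genetic links that fall within or between functional modules.

    `links` is a list of (gene, gene) pairs; `module_of` maps a gene to its
    module label. Links touching a gene with no module are counted apart.
    """
    counts = {"within": 0, "between": 0, "unassigned": 0}
    for gene_a, gene_b in links:
        module_a = module_of.get(gene_a)
        module_b = module_of.get(gene_b)
        if module_a is None or module_b is None:
            counts["unassigned"] += 1
        elif module_a == module_b:
            counts["within"] += 1
        else:
            counts["between"] += 1
    return counts


# Hypothetical toy data: placeholder genes, modules, and interactions.
toy_modules = {"geneA": 1, "geneB": 1, "geneC": 2, "geneD": 2}
toy_links = [
    ("geneA", "geneC"),  # bridges module 1 and module 2
    ("geneB", "geneD"),  # bridges module 1 and module 2
    ("geneA", "geneB"),  # both genes in module 1
    ("geneA", "geneE"),  # geneE has no module assignment
]

counts = classify_links(toy_links, toy_modules)
print(counts)
```

A network in which the "between" count dominates would be consistent with genetic interactions linking distinct modules, whereas physical or co-expression links would be expected to fall mostly in the "within" class.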

Chapter 2

Constructing a Network of Synthetic Genetic Interactions in C. elegans

The work in this chapter was carried out by me except in the following cases. Jason Moffat and Scott Dixon collaborated on the pilot screens and initial SGI experiments. Victoria Wong is a technician who helped to carry out some of the repetitive tasks of gathering the SGI data under my supervision. The analysis of the SGI data was carried out in collaboration with Matthew Weirauch and Martina Koeva at the University of California, Santa Cruz, under the guidance of their supervisor Josh Stuart and mine (Peter Roy). Therefore, I use the plural pronoun "we" when referring to work done by the collaboration. I have included detailed methods of the data analysis in the materials and methods section, even though it was largely implemented by my bioinformatics collaborators. The work in this chapter has been published as: Byrne, A., Weirauch, M., Wong, V., Koeva, M., Dixon, S., Stuart, J. and Roy, P. (2007). A global analysis of genetic interactions in Caenorhabditis elegans. J Biol 6, 8.

Chapter 2. Constructing a Network of Synthetic Genetic Interactions in C. elegans

Abstract

Understanding gene function and genetic relationships is elemental in our efforts to better understand biological systems. As outlined in Chapter 1, previous studies to systematically describe genetic interactions on a global scale have either focused on core biological processes in yeast or surveyed catastrophic interactions in metazoans. I developed a reliable high-throughput approach capable of revealing both weak and strong genetic interactions in the nematode Caenorhabditis elegans. The approach is called Systematic Genetic Interaction (SGI) analysis and is presented herein.

Introduction

Before the C. elegans genome was sequenced, functional characterization of the worm was relatively sparse. However, the publication of a largely complete genome sequence at the end of 1998 provided the opportunity to rapidly augment the characterization of the estimated 19,099 genes (C. elegans Sequencing Consortium, 1998). The sequence opened up the "omic" era of C. elegans research. Immediately, it was noted that 42% of the predicted genes had some similarity to non-nematode genes (C. elegans Sequencing Consortium, 1998), providing functional hypotheses for many genes. Functional characterization was expanded from classical mutant analysis to analyses of genome-wide gene expression profiles (Kim et al., 2001), high-throughput yeast-two-hybrid studies (Li et al., 2004; Tewari et al., 2004), and perhaps most

importantly, genome-wide RNAi analysis (Kamath et al., 2003; Simmer et al., 2003). While these analyses provided a wealth of additional functional annotation, they did not directly address the question of how genes work together to carry out a developmental plan or to sustain life. Specifically, when I began work on my thesis project, a large-scale study of genetic interactions had not yet been carried out to directly address this question. In fact, of the 212,262,408 potential genetic interactions within the genome of 20,604 genes (20,604²/2), only 2,620 genetic interactions had been uncovered (Wormbase, Release WS170). The paucity of known genetic interactions prompted the first aim of my thesis: Can genetic interactions be identified in a metazoan on a large scale?

To complement the work done in yeast on core biological processes, I wanted to explore genetic interactions among processes conserved with higher order multicellular organisms. Specifically, I sought to create a network of genetic interactions in C. elegans focused on signal transduction pathways of universal importance to animal development and pathogenesis.

The efficiency of yeast SGA inspired the development of a similar systematic approach to identify genetic interactions in C. elegans (Tong et al., 2001; Tong et al., 2004). The tools that accompany S. cerevisiae as a model system, such as the deletion library, make it ideal for genome-wide analyses of genetic interactions in a single-celled organism. In contrast, there is no comprehensive collection of null mutants in any animal model system. For C. elegans, the Knockout Consortium (Moerman and Barstead, 2008) has aimed to create knockouts of all 20,604 (Wormbase, Release WS170) worm genes. At present, 7,000 deletions in 5,500 genes have been generated by the Knockout Consortium and individual researchers. The consortium expects to have deletion alleles of 10,000 genes by The knockout collection is an invaluable resource for the C. elegans community; however, there are still

disadvantages to using these alleles in a large-scale, high-throughput manner. First, although approximately one-quarter of worm genes are represented in the collection, it is unlikely that knockout alleles exist for every gene in a given list of genes of interest. Second, even if deletions were readily available for large numbers of genes within a group of interest, there is no straightforward way to create double mutants in a high-throughput manner. Therefore, it was clear that to simultaneously disrupt the activity of pairs of genes on a large scale, it would be more feasible to use the RNAi library described in Chapter 1 to feed individual RNAi-inducing bacterial strains to worms carrying a mutation in a second gene. Such an approach would allow for efficient disruption of the function of two genes simultaneously. Moreover, since RNAi does not completely disrupt the function of a gene, the use of RNAi to uncover synthetic genetic interactions between signal transduction genes could theoretically reveal functional interactions both between and within signal transduction pathways.

Results

Genes Chosen for Interaction Analyses Represent Multiple Signal Transduction Pathways

To better understand how genes interact on a systems-wide scale, I aimed to test genes for interaction with components of conserved signal transduction pathways in a high-throughput manner. To test for genetic interactions between gene pairs, worms harbouring a weak loss-of-function mutation in a query gene were fed dsRNA targeting a second (target) gene for reduction-of-function by RNAi (Fire et al., 1998; Kamath et al., 2003; Timmons and Fire, 1998). The approach is called worm Systematic Genetic Interaction analysis (SGI) and is presented in detail below (Figure 2-1).

Figure 2-1. Synthetic Genetic Interaction (SGI) Analysis in C. elegans. (A) Two scenarios that may result in synthetic interactions are presented. On the top, enhancing interactions may arise when hypomorphic loss-of-function worms (mutant), which have reduced but not eliminated function of a gene, are fed RNAi that targets another gene in the same essential pathway. On the bottom, synthetic interactions may arise when a hypomorph and a gene targeted by RNAi are in parallel pathways that regulate an essential process (X). (B) An outline of the SGI experimental approach. RNAi-inducing bacteria that target a specific C. elegans gene for knockdown (target gene A) are fed to a hypomorphic mutant (query gene B). In parallel, wildtype worms are fed the experimental RNAi-inducing bacteria (control 1), and the query mutant is fed mock RNAi-inducing bacteria (control 2). This is all done in 12-well plate format with at least three technical replicates. Over the course of several days, the number of progeny produced in each experimental and control well is estimated (see text and methods). Each population of worms is assigned a growth score from 0-6 (0, 2 parental worms; 1, 1-10 progeny; 2, progeny; 3, progeny; 4, progeny; 5, 200+ progeny; and 6, overgrown). (C) Ultimately, interacting gene pairs are inferred through a difference in the population growth scores between experimental and control wells. In the example shown, a global analysis of the experimental and control query-target combinations revealed that daf-2 interacts with ist-1, and sem-5 and sos-1 interact with let

Weak loss-of-function alleles were chosen based on three criteria: 1) the affected genes are components of conserved signal transduction pathways; 2) the mutants grow well at C for ease of manipulation; and 3) they are, preferably, weak hypomorphs of genes whose complete loss of function results in lethality. The latter criterion was inspired by the investigation of genetic interactions in yeast with conditional alleles of essential genes (Davierwala et al., 2005; Mnaimneh et al., 2004). That work demonstrated that essential genes partake in a higher proportion of genetic interactions than do non-essential genes. Moreover, by including hypomorphs, we sought to investigate both enhancing and synthetic lethal genetic interactions (Figure 2-1).

Eleven query genes were chosen for the SGI analysis (Table 2-1). Ten of the query genes belong to one of six signalling pathways specific to metazoans: the insulin, epidermal growth factor (EGF), fibroblast growth factor (FGF), wingless (Wnt), Notch, and transforming growth factor beta (TGF-β) pathways (Table 2-1). The eleventh query gene, clk-2, is a member of the DNA damage response (DDR) pathway. All of the query genes except clk-2 mediate signal transduction from the plasma membrane. Thus, clk-2 was included in my analysis as an example of a gene not involved in the transduction of a signal from the plasma membrane. This gene was hypothesized to interact with different types of signal transduction genes than would the other queries, thereby acting as a negative control for interactions with genes that signal from the plasma membrane.

To build a network of genetic interactions in C. elegans, I systematically tested genetic interactions between the 11 chosen query genes and 858 target genes (Additional Data File 1). The 858 target genes consist of 372 genes that are likely involved in signal transduction from the plasma membrane based on annotation in Proteome (Costanzo et al., 2000), and 486 genes

Table 2-1. A Summary of the Query Genes.

Query Gene | Ortholog (Pathway) a | Null/Strong LoF Phenotype(s) b | Hypomorphic Phenotype(s) c | Refs
let-756 | FGF (FGF) | early larval arrest (s2887) | scrawny, Slo (s2613)** | (Roubin et al., 1999)
egl-15 | FGF receptor (FGF) | early larval arrest (n1456) | scrawny, Egl (n1477)** | (Goodman et al., 2003)
let-23 | EGF receptor (EGF) | L1 arrest (mn23) | ts Vul, pleiotropic (n1045)** | (Moghal and Sternberg, 2003)
daf-2 | Insulin Growth Factor receptor (Insulin) | Emb (e979) | ts Daf-c (e1370)** | (Nanji et al., 2005)
sem-5 | GRB-2 (EGF, FGF, Insulin) | L1 arrest (leaky) (n1619) | Egl, Vul (n2019)* | (Horvitz and Sulston, 1980; Moghal and Sternberg, 2003)
sos-1 | Guanine nucleotide exchange factor (EGF, FGF) | Emb (s1031) | ts Egl, Vul (cs41)* | (Chang et al., 2000)
let-60 | RAS (EGF, FGF, Insulin, Wingless/Wnt) | mid-larval lethal (leaky) (s1124) | Egl, Vul (n2021)* | (Han et al., 1990; Han and Sternberg, 1990)
glp-1 | Notch receptor (Notch) | ts Emb (gp60) | ts Emb, Glp, Muv (or178)* | (Austin and Kimble, 1989)
bar-1 | β-catenin (Wingless/Wnt) | Mig, Vul, Pvl (ga80)** | Mig, Vul, Pvl (mu63) | (Eisenmann et al., 1998b)
sma-6 | Type I TGF-β receptor (TGF-β) | Sma, Mab (wk7) | Sma (e1482)* | (Savage-Dunn et al., 2003)
clk-2 | Tel-2p (DNA damage response) | unknown | Slo, Ste, ts Emb (mn159)** | (Ahmed and Hodgkin, 2000)

a The ortholog refers to the canonical ortholog, whether it be in yeast, flies, mice or humans. The pathway to which the ortholog belongs is in brackets.
b If known, the null or strong loss-of-function (LoF) phenotype is shown.
c Weak loss-of-function (hypomorphic) phenotypes are shown for representative alleles. The phenotypic acronyms are as follows: Emb, embryonic lethal; Daf-c, dauer formation constitutive; Slo, slow growth; Egl, egg-laying defective; Vul, vulvaless; Glp, germ-line proliferation defects; Muv, multivulva; Lin, lineage defects; Mig, cell and/or axon migration defects; Pvl, protruding vulva; Unc, uncoordinated movement; Sma, small body; Mab, male tail abnormal; Ste, sterile.
The alleles used in this study are marked with two asterisks if used as a query against both the signalling targets and the LGIII targets, or with a single asterisk if used only against the signalling targets.

from linkage group III from which new signalling genes might be identified. I will henceforth refer to these groups of genes as the signalling targets and the LGIII targets, respectively. An analysis of the LGIII set suggests that the 486 genes are random with respect to known functional categories (p>0.05) (Materials and Methods and Additional Data File 2). All of the queries were tested against the signalling targets, and six of the queries representing five pathways were tested against the LGIII targets (Table 2-1).

Construction of a Matrix of Synthetic Growth Scores

To systematically test for genetic interactions between query-target pairs, two P0 L4-staged worms harbouring a weak loss-of-function mutation in a query gene were placed in each well of a 12-well plate seeded with an RNAi-inducing bacterial strain that targeted a second (target) gene for loss-of-function (Figure 2-1 and Materials and Methods 2.4.2, 2.4.3). I estimated the number of progeny resulting from each query-target combination over the course of several days as the progeny matured, and assigned each well a score from zero to six (0, 2 parental worms; 1, 1-10 progeny; 2, progeny; 3, progeny; 4, progeny; 5, 200+ progeny; and 6, overgrown). Therefore, wells containing no progeny received a score of zero, whereas wells overgrown with progeny were given a score of six. The ranges of worm numbers represented by each score were chosen because they can easily be distinguished from one another. For example, the difference between two wells containing ~30 worms and ~80 worms is more identifiable than the difference between two wells containing 60 and 70 worms. Each query-target pair was tested at least in triplicate. I then compared these counts to controls. I expected that if the query and target interacted, the resulting number of progeny would be lower than that of wild-type (N2) worms fed the target RNAi (control 1)

and the query mutant worms fed mock RNAi (control 2). The ability of the SGI approach to identify genetic interactions was investigated by me and others in the lab with a pilot test that focused on the insulin pathway (100% success rate, Figure 2-2). Other pilot tests included the EGF (56% success rate), FGF (63% success rate), and Wingless (18% success rate) pathways, providing proof-of-principle that this approach can reveal genetic interactions.

High Throughput Digital Imaging

For many of the experiments, a high throughput digital imager (HiDI) was used to record the results (Elegenics Inc.). The HiDI collects pictures, numbers, and growth stages of the worms in each well of a multi-well plate. I worked with the creators of the HiDI, Ed Houston and Al Howard, to optimize its performance. Although the number of worms in each well was recorded for these experiments, experimental outcomes were nonetheless scored by eye to maintain consistency with the previous experiments that were scored at a microscope.

For each query-target pair documented without the HiDI, I recorded the visible phenotypes of the worms (Table 2-2). Visible phenotypes, in addition to slow growth and lethality, were recorded for a given query-target pair with the intent of facilitating an understanding of the function of the two interacting genes. The HiDI was also used to archive pictures of the worms so that visible phenotypes could be examined long after an experiment had been performed. While analysis of the recorded phenotypes was not carried out, this archive remains a resource that could provide further information on an interaction of interest.

Figure 2-2. SGI Proof of Principle Assay: Insulin Pathway. (A) The Insulin pathway. (B) Known components of the insulin pathway were targeted by RNAi in either an N2 or a daf-2 mutant background. The number of worms in each well was counted and compared to N2 fed the respective dsRNA and to daf-2 fed negative control dsRNA. The scale shows the number of worms, up to >250 and O/G (overgrown). daf-2 worms grew slower than N2 worms when fed each component of the Insulin pathway, revealing genetic interactions between daf-2 and each of these genes. (C) Representative pictures of an overgrown (healthy) plate and a plate containing few (unhealthy) worms. Panel labels: daf-2(e1370); ø(RNAi) and daf-2(e1370); daf-28(RNAi).

Table 2-2. Phenotype Analysis. Worms resulting from interaction analysis were scored for nine phenotypes. Plates that were contaminated were not scored for phenotype in order to avoid recording an observation that was a secondary consequence of the contamination.

Code | Phenotype
S | All sterile F1 adults
E | Many embryos dead
G | Any slow growth phenotype
L | Many dead larvae
D | Many worms in dauer
B | Low F1 brood size
A | Adult gonad malformed (egl, muv, pvl, rup)
O | Other visible phenotype (e.g. unc, dpy, vab)
C | Plate contaminated

Objectively Identified Interactions were Used to Construct the SGI Network

I developed an unsupervised computational method with the help of bioinformaticians Martina Koeva and Josh Stuart at the University of California, Santa Cruz, to objectively determine which query-target pairs genetically interact based on reproducibility and the nature of the population scores. First, the target genes plus control 1 (wild-type worms fed the target RNAi) were arrayed on one axis, and the query genes plus control 2 (query mutant worms fed mock RNAi) on the other axis, to create a matrix of 56,347 scores that included all experimental replicates over several days. I then identified six different attributes that could be mined to infer a unique

set of genetic interactions from the matrix. These attributes include the reproducibility of scores among technical replicates, the consistency of scores over each day of observation, and the difference in the scores between the experimental gene pair and controls (Materials and Methods 2.4.4). By varying selection parameters for each attribute, 51 unique variant sets of interactions, or networks, were identified (Figure 2-3A and Materials and Methods 2.4.5).

To identify the network variant that maximized the number of likely true positives but minimized the number of likely false positives, interacting pairs that share the same Gene Ontology (GO) biological process (Ashburner et al., 2000) were first identified (Materials and Methods 2.4.5). The recall (analogous to sensitivity) for each variant was calculated by dividing the number of co-classified interacting pairs by the number of all possible co-classified pairs within the variant. Similarly, precision (analogous to specificity) was calculated by dividing the number of co-classified interacting pairs by the total number of interacting pairs in the variant. A variant with high recall and low precision is likely to recover most of the possible co-classified genetic interactions, but its low stringency will result in a high number of false positives. On the other hand, a network with low recall and high precision will have a low number of false positives but may have a greater number of false negatives. As is evident from the recall and precision plot (Figure 2-3A), there are several network variants with high recall and precision values. The significance of the extent to which each variant network links genes in the same GO biological process was estimated using the hypergeometric distribution (Materials and Methods 2.4.5). Henceforth, I denote p-values calculated using the hypergeometric distribution with hg.
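The precision, recall and hypergeometric significance described above can be sketched as follows. This is a minimal illustration rather than the pipeline used for the thesis: the function names, argument names and the example counts (1000 tested pairs, 200 co-classified, 100 interacting, 40 both) are hypothetical and chosen only to show the arithmetic.

```python
from math import comb


def precision_recall(coclassified_interacting, interacting_total, coclassified_possible):
    """Precision and recall of a network variant, as defined in the text.

    precision = co-classified interacting pairs / all interacting pairs
    recall    = co-classified interacting pairs / all possible co-classified pairs
    """
    precision = coclassified_interacting / interacting_total
    recall = coclassified_interacting / coclassified_possible
    return precision, recall


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: all tested pairs; successes: co-classified pairs among them;
    draws: interacting pairs; k: interacting pairs that are co-classified.
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(40, 100, 200)
p_hg = hypergeom_sf(40, population=1000, successes=200, draws=100)
print(f"precision={precision:.2f} recall={recall:.2f} p_hg={p_hg:.2e}")
```

Computing the tail probability with exact integer binomial coefficients avoids any dependence on an external statistics library, at the cost of speed for very large matrices.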
The most significant variant contains 656 unique interactions among 253 genes (p_hg< ) and has a precision and recall of 42% and 16%, respectively. The next best variant (p_hg< ) contains nearly twice as many interactions (1246)

Figure 2-3. The SGI Network. (A) The precision (y axis) and recall (x axis) of the 51 unique network variants, as calculated with respect to GO Biological Process annotation (see Methods). The higher confidence variant (green) and the SGI variant (pink) are highlighted. (B) The SGI network contains 1,246 unique synthetic genetic interactions, of which 833 (67%) are between a query gene and a gene in the signalling set and 413 (33%) are between a query gene and a gene in the LGIII set. Visualization generated with Cytoscape (Shannon et al., 2003).

among 461 genes, and has 10% higher recall. I chose to restrict all further analysis to the latter network in order to capture more previously uncharacterized interactions. I refer to this variant as the SGI network (Figure 2-3B and Additional Data File 3). All 656 interactions within the smaller variant are contained within the SGI network and are hereafter referred to as higher confidence SGI interactions.

The SGI Network Contains 1246 Synthetic Interactions Among 461 Genes

The SGI network contains 833 interactions between query genes and signalling targets (67%), and another 421 between query genes and LGIII targets (33%). These 1246 interactions range in strength from weak to very strong (Additional Data File 4). The strength of an interaction is calculated as the difference between the growth scores and controls over all days that passed the seven criteria for identification of an interaction. All of the interactions within the SGI network are interconnected because each query gene shares interaction targets with at least one other query gene (Figure 2-3B).

By a conservative estimate, each of the 1246 gene pairs within the SGI network interacts synthetically, because the double gene-perturbation phenotypes exceed the phenotype expected in the absence of a synthetic genetic interaction under multiplicative models (Table 2-3 and Materials and Methods 2.5.6) (Schuldiner et al., 2005; Segre et al., 2005). One of the criteria of the chosen SGI network is that each gene pair must result in a score at least 2 less than either single gene-perturbation control. Calculations using my semi-quantitative scoring scheme show that the measured phenotype of the interacting gene pairs exceeds the product of the

Table 2-3. A detailed assessment of the nature of the SGI interactions. Both examples are hypothetical.

Columns, Example 1: A+B observed | A+B expected | A | B
Columns, Example 2: C+D observed | C+D expected | C | D
Rows, Method A: Growth Score (expected value: 2 or 3 in Example 1, note d) | Fraction of WT (X/6) (note a)
Rows, Method B: # of Worms (μ) (200 in Example 1, 150 in Example 2; notes b and e) | Fraction of WT (X/250) (note c)

a The experimental growth score divided by the known wild-type growth score (i.e. 6) is shown.
b The average number of worms for that bin is shown. For example, a score of 4 represents between 100 and 200 worms, the average of which is 150.
c The average number of worms for that bin for that experiment divided by the approximate number of wild-type worms in control wells (i.e. 250) is shown.
d The expected growth score is calculated from the expected fraction of wild-type worms multiplied by 6, the expected growth score of wild-type worms. In example 1, this is 0.42 × 6 = 2.5 (i.e. a growth score of 2 or 3).
e The expected number of worms is calculated from the expected fraction of wild-type worms multiplied by 250, the expected number of wild-type worms in control wells. In example 2, this is 0.36 × 250 = 90.
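The arithmetic in notes d and e of Table 2-3 follows a multiplicative model: the expected double-perturbation phenotype is the product of the two single-perturbation fractions of wild type, rescaled to wild-type units. A minimal sketch is given below. The function names are hypothetical, and the single-gene inputs (growth scores of 5 and 3, and 150 worms for each single perturbation) are assumptions chosen only because they reproduce the 0.42 × 6 and 0.36 × 250 calculations in the notes; they are not values recovered from the table.

```python
def expected_double(single_a, single_b, wild_type):
    """Expected double-perturbation phenotype under a multiplicative model.

    Each single perturbation is expressed as a fraction of wild type; the
    expected double phenotype is the product of the fractions, rescaled to
    the wild-type value.
    """
    return (single_a / wild_type) * (single_b / wild_type) * wild_type


def exceeds_expectation(observed, single_a, single_b, wild_type):
    """True when the observed double phenotype is worse (lower) than expected."""
    return observed < expected_double(single_a, single_b, wild_type)


# Method A (growth scores, wild type = 6); assumed single-gene scores 5 and 3.
expected_score = expected_double(5, 3, 6)          # roughly 0.42 x 6

# Method B (worm counts, wild type = 250); assumed 150 worms per single gene.
expected_worms = expected_double(150, 150, 250)    # roughly 0.36 x 250

print(round(expected_score, 2), round(expected_worms, 1))
```

Under this sketch, an observed growth score of 1 for the double perturbation in the first case would fall below the expected value and so would count as a synthetic interaction, whereas an observed score of 3 would not.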

measured phenotype of the single gene-perturbation controls and are therefore bona fide synthetic genetic interactions.

The SGI Network Correlates with Existing Functional Annotation

The Precision and Recall of the SGI Network are Similar to those of Other Functional Interaction Datasets

I next asked how the recall and precision of the SGI network compared to other large eukaryotic interaction networks, including: 1) a previously described C. elegans genetic interaction network (Lehner et al., 2006); 2) a C. elegans protein interaction network (Li et al., 2004); 3) a eukaryotic protein interaction network that augments the C. elegans protein interaction network with orthologous interactions from S. cerevisiae, D. melanogaster, and human protein interactions contained in BioGRID (Stark et al., 2006); 4) an mRNA co-expression network constructed from C. elegans, S. cerevisiae, D. melanogaster, and human expression data (Kim et al., 2001; Stuart et al., 2003); 5) a S. cerevisiae synthetic genetic interaction network (Tong et al., 2004); and 6) a network we created based on the similarity of C. elegans RNAi-induced phenotypes (Weirauch et al., 2008; Chapter , and Materials and Methods 3.4.3). I refer to these networks as the Lehner, Li, interolog, co-expression, Tong, and co-phenotype networks, respectively. In addition, a network of fine genetic interactions was included, which consists of genetic interactions identified from low-throughput experiments that were collected from the literature by Wormbase (Wormbase). The fine genetic network excludes interactions identified solely through high-throughput analysis. The SGI network has an average precision, but a higher recall than all other datasets examined (Figure 2-4). I asked whether the SGI network has a higher recall because of the pre-selection of signalling target genes, but found this not to be true.

Figure 2-4. The Precision and Recall of the SGI Network and of Other Functional Interaction Datasets. The precision and recall of interaction networks calculated with respect to GoProcess1000 (see Materials and Methods). Significance values (in brackets) were calculated using the hypergeometric distribution. The source of the networks is presented in the text, except for the SuperNet (superimposed network, see Materials and Methods). The orange dashed line indicates the precision of the fine genetic interactions extracted from WormBase. The lower dashed line indicates the precision of the interolog network (see Materials and Methods). The recall of these two datasets cannot be calculated as the number of genes that were tested cannot be ascertained.

The recall of the SGI network remains the highest of all networks examined when only the LGIII target genes are considered (recall=0.23). Together, these analyses suggest that the SGI approach is at least as proficient as other efforts that describe interactions on a large scale.

SGI Network Demonstrates High Concordance with a Parallel Study of Genetic Interactions

Next, the SGI interactions were compared to those found in the Lehner genetic interaction network (Table 2-4) (Lehner et al., 2006). Of the 6963 gene pairs tested for interaction by SGI, 1165 were also tested by Lehner et al. Of these, 78.5% do not interact in either study. Of the 28 pairs found to interact by Lehner et al., 18 also interact in the SGI network. There are no obvious differences in the phenotypes of the 18 interacting gene pairs found in both the Lehner and SGI sets, compared to the 10 pairs found only in the Lehner set (Kamath et al., 2003). Overall, SGI identifies 64.3% of Lehner interactions and there is 98.9% concordance of the negative calls (p<10^-27). Of the 1165 pairs tested by both screens, the SGI approach identified 222 additional interactions. The gene pairs that only interact in SGI are as likely to connect genes with shared GO annotation as are gene pairs that only interact in the Lehner network, as measured by precisions of 0.66 and 0.60, respectively. These observations suggest that both approaches can identify genetic interactions with equal precision, but that SGI captures more interactions.

The Sensitivity of Genetic Networks Correlates with Predicted Genetic Interactions

The comparison between the SGI and Lehner networks was extended by using previously computed prediction scores for C. elegans genetic interactions based on characterized physical

Table 2-4. Comparison of SGI and Lehner Genetic Interactions.

Type of link | Number of links a
Tested in SGI and Lehner analyses | 1165
Negative in SGI and Lehner analyses | 915 (78.5%)
Positive in SGI and Lehner analyses | 18 (1.5%)
Positive only in SGI analysis | 222 (19.1%)
Positive only in Lehner analysis | 10 (0.85%)

a Percentage of gene pairs tested in both SGI and Lehner analyses.

interactions, gene expression, phenotypes, and functional annotation from C. elegans, D. melanogaster, and S. cerevisiae (Zhong and Sternberg, 2006). The probability scores assigned by Zhong and Sternberg for all pairs of genes in the SGI network were divided into three categories: low probability of interaction, intermediate probability of interaction, and high probability of interaction. There are roughly twice as many SGI interactions as expected in the high probability category and fewer gene pairs than expected in the low probability category (p<10^-25) (Figure 2-5). The high confidence SGI interactions have more high probability scores than expected compared to the whole SGI network (see Figure 2-3A), and the SGI interactions with the greatest interaction strengths (>4.4, see Additional Data File 4) have more still. The Lehner genetic interactions have the greatest number of high probability interactions relative to that expected by chance. As Lehner et al. exclusively scored catastrophic interactions (Lehner et al., 2006), this analysis suggests that the Zhong and Sternberg probability score reflects not only the likelihood of interaction, but also the strength of that interaction. Together, the comparison of SGI interactions to other observed and predicted networks further supports confidence in SGI interactions.
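The overlap statistics quoted for the SGI and Lehner comparison can be re-derived from the four counts in Table 2-4. The sketch below does so; the function name and dictionary keys are hypothetical. Note one assumption: the text does not state the denominator behind the 98.9% concordance of negative calls, and the sketch uses the pairs that SGI scored as negative (915 + 10), which reproduces the reported value.

```python
def overlap_summary(both_negative, both_positive, sgi_only, lehner_only):
    """Summarize agreement between two screens over the jointly tested pairs."""
    tested = both_negative + both_positive + sgi_only + lehner_only
    lehner_positive = both_positive + lehner_only
    sgi_negative = both_negative + lehner_only  # assumed denominator, see lead-in
    return {
        "tested": tested,
        "pct_negative_in_both": 100 * both_negative / tested,
        "pct_sgi_only": 100 * sgi_only / tested,
        # Share of Lehner interactions that SGI also identifies.
        "pct_lehner_recovered_by_sgi": 100 * both_positive / lehner_positive,
        # Share of SGI-negative pairs that are also negative in Lehner.
        "pct_negative_concordance": 100 * both_negative / sgi_negative,
    }


# Counts taken from Table 2-4.
summary = overlap_summary(both_negative=915, both_positive=18, sgi_only=222, lehner_only=10)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```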

Figure 2-5. The SGI Network Correlates with Existing Functional Annotation. An independent test of the likelihood of true interactions among the Lehner and SGI genetic interaction datasets using the algorithm of Zhong & Sternberg (2006), which predicts a confidence level for a genetic interaction between any given gene pair in C. elegans. The 656 interactions of the high confidence SGI variant, along with the 229 interactions of the highest interaction strength within the SGI network, are also analyzed. Each experimentally derived interacting gene pair is binned according to the confidence level predicted by Zhong & Sternberg (x axis): Low, Moderate and High confidence predictions have interaction probabilities of 0-0.6, , and , respectively. The results are plotted as a ratio of the number of experimentally identified interacting gene pairs to the number of gene pairs expected to be in that bin by random chance (y axis). Expected counts were determined by assuming a uniform distribution across all bins for all tested gene pairs. Values within each bar show the number of observed gene pairs over the number expected by chance. The legend indicates the data source. Error bars indicate one standard error of the mean.

SGI Genetic Interactions are Reproducible and Reciprocal

I assessed the reproducibility of SGI interactions by analyzing reciprocal and technical replicates. Reciprocal reproducibility was measured by interchanging the method used to down-regulate each member of selected query-target gene pairs. Interacting query-target pairs were retested by targeting the query gene by RNAi in the background of a mutated target gene. Six of the queries in the matrix were also included as RNAi targets, providing 15 gene pairs to test for reciprocity. All 15 gene pairs interacted in one test, and six (40%) also interacted in the reciprocal test (Table 2-5). Complete (100%) reciprocity is not expected because mutations and RNAi experiments often differ in their effects on gene function (Fraser et al., 2000; Kamath and Ahringer, 2003; Kamath et al., 2003).

The technical reproducibility of the assay was also measured. For technical replicates, 15 of the target genes and six of the query genes were included in both the signalling and LGIII matrices, providing replicates for 90 query-target pairs. Of these, eight are positive and 67 are negative in both sets, yielding a technical reproducibility of 83% (75/90). Together, these results demonstrate that SGI interactions are reproducible.

A Functional Analysis of SGI Interactions

All of the query genes included in this study except clk-2 are required in signal transduction from the plasma membrane. clk-2 was included as a query gene in the screen to gauge the specificity of SGI interactions on a global scale. I expected that clk-2(mn159) would interact with fewer signalling targets compared to signalling queries. In addition, I expected that

Table 2-5. Reciprocal Query-Query Interactions. Hypomorphic query worms (X axis) were fed RNAi that targets query genes (Y axis) to measure the reciprocity of SGI. Average growth scores are indicated for each query(mutant); query(RNAi) interaction (E).

Columns (hypomorphic query): C54D1.6 bar-1 | F58A3.2 egl-15 | ZK792.6 let-60 | C05D11.4 let-756 | C14F5.5 sem-5 | T28F12.3 sos-1
Rows (RNAi gene), with the surviving cell markers in source order:
C54D1.6 bar-1: N/A (E) (E) (E) (E)
F58A3.2 egl-15: (E) N/A (E) (E) (E)
ZK792.6 let-60: (E) (E) N/A (E) (E) (E)
C05D11.4 let-756: (E) N/A
C14F5.5 sem-5: (E) (E) (E) N/A (E)
T28F12.3 sos-1: (E) (E) (E) N/A

clk-2(mn159) would interact with a similar number of signalling targets compared to LGIII targets, whereas the signalling queries would preferentially interact with other signalling genes. Indeed, clk-2(mn159) interacts with half as many signalling genes compared to the average signalling query (11.0% versus 21.5%, respectively) and interacts with the fewest signalling targets overall (Figure 2-6). In contrast, let-60/ras interacts with the most signalling targets (29.2%), likely because of the pleiotropic nature of RAS in signal transduction (Sternberg and Han, 1998). The fraction of LGIII targets that interact with signalling queries is 32% less than the fraction of signalling targets that interact with signalling queries (14.7% vs. 21.5%). In contrast, the fraction of clk-2 interactions with signalling or LGIII targets is nearly identical (11.0% versus

Figure 2-6. Distribution of Interactions Among Query Genes. The percentage of target interactions per query gene (y axis: % of target genes; x axis: query gene) in both the Signalling (n=372) and LGIII (n=486) subnetworks. The raw number of interacting target genes in each experiment (Signalling, LGIII) is shown below each bar: daf-2 (78,88); let-756 (101,87); bar-1 (85,78); egl-15 (71,75); clk-2 (41,53); let-23 (62,40); let-60 (109); sem-5 (92); sma-6 (81); glp-1 (76); sos-1 (46).

84 10.6%, respectively). These results further support the validity of the SGI approach. Next, I further exploited the graded scoring scheme used to collect SGI data to investigate patterns of interactions within the matrix of genetic interaction tests. The strength of interaction between each tested gene pair was calculated based on the average difference between the experimental growth scores and the controls. The strength of interaction for each gene pair was then clustered in two dimensions to group queries and targets based on similar growth patterns (Materials and Methods 2.5.7). Clusters of target genes were then examined for enrichment of shared functional annotation (see Additional Data File 5 and Materials and Methods 2.5.8). The resulting clustergram reflects the characterized roles of many genes and provides evidence supporting previously uncovered relationships (Figure 2-7A). For example, the first cluster of target genes is enriched for Notch receptor-processing annotation and is clustered based on shared slow growth in a glp-1/notch receptor background (cluster A, Figure 2-7A). Similarly, a cluster of genes enriched for the establishment of cell polarity predominantly interact with bar-1/β-catenin (cluster J, Figure 2-7A). Also, a cluster of genes whose disruption results in slow growth in the background of clk-2(mn159) are enriched for induction of apoptosis annotation (cluster C, Figure 2-7A). Interestingly, genes in this group also cause slow growth when disrupted in a sma-6/tgf-β receptor background. Although well characterized in other systems (Schuster and Krieglstein, 2002), this is the first reported evidence for a functional link between the TGF-β pathway and apoptosis in C. elegans. Finally, clusters of target genes with low growth scores in the background of many query mutants have general annotations such as reproduction and ageing. This may reflect the involvement of many signalling pathways in these processes. 
Within all of these clusters are previously uncharacterized genes that form the basis for numerous hypotheses.

[Figure 2-7: panels A and B. Edge legend for panel B: Co-expression; Lehner genetic interaction; Protein-protein interaction; Query interaction; Fine genetic interaction; SGI genetic interaction.]

Figure 2-7. Global Patterns of Interactions within the SGI Network. (A) Two-dimensional clustergram of SGI interactions based on average strength of interaction. RNAi-targeted genes are represented along the rows and the eleven query hypomorphs across the columns. The shades from black to yellow on the bottom scale indicate increasing interaction strength, and shades from black to light blue indicate increasing alleviating interaction strength. Alleviating interaction strengths indicate that the double reduction-of-function worms grow better than controls. (B) The query network. Query genes (nodes) are linked in this network if they share a significant number of interaction partners or if there is evidence of a functional interaction (see text). Edges are colored according to the type of supporting evidence (see text and Materials and Methods for more details). Graphic generated with Cytoscape (Shannon et al., 2003).

To explore the connectivity between the EGF, FGF, Notch, Insulin, Wingless, and TGF-β signalling pathways, I analyzed the SGI data in three ways. First, I examined the clusters of query genes on the clustergram and found some expected patterns, including the grouping of the FGF receptor, egl-15, with its ligand, let-756, as well as their downstream mediator let-60/ras (Figure 2-7A). As expected, clk-2 and glp-1 do not cluster with the receptor tyrosine kinases or their downstream mediators. In contrast, sma-6 and bar-1/β-catenin are closely linked, suggesting co-operation between TGF-β and Wnt pathways as previously reported in other organisms (Labbe et al., 2007). Second, the connectivity between the signalling pathways was investigated by creating a network of query genes (Figure 2-7B and Additional Data File 3). Because six of the query mutants were also included as RNAi targets within the SGI matrix, query pairs were tested directly for interactions and 25 interactions were found among 45 pairs. In addition, the pattern of interactions between each query gene and the entire set of RNAi targets was examined. Functionally related query genes are expected to interact with an overlapping set of target genes (Tong et al., 2001; Tong et al., 2004; Ye et al., 2005). Therefore, queries were connected within the query network with a congruent link if they shared interactions with the same targets more often than expected by chance (hypergeometric p < 10^-9; Materials and Methods 2.5.9). As expected, the proximity of query genes to each other in the clustergram is reflected in the congruent links. Finally, links derived from other datasets considered throughout this study were added to the query network. These included protein-protein interactions, co-expression links, phenotype links, and other genetic data, all of which are described in detail below. The resulting query network contains 11 nodes and 33 query-query interactions, 16 of which are supported by multiple sources.
Of the 24 SGI links within the query network, 8 are supported by other lines of evidence that include previously described

genetic interactions between genes within defined pathways. Therefore, 16 of the SGI links represent previously unreported interactions, 10 of which are also supported by congruent links. Many of the interaction patterns within the query network are expected. For example, the downstream mediators of receptor tyrosine kinase signalling (let-60/ras, sem-5/grb-2, and sos-1/sos-2) have the highest number of links within the query network (21, 21, and 18 respectively). This pattern is expected given that almost half of the pathways analyzed involve receptor tyrosine kinase signalling. Interestingly, let-60 and sem-5 each interact with every query gene except clk-2, suggesting that they are common mediators of signal transduction. As expected, clk-2 has the fewest links. There are also many multiply supported links between let-23, let-60, sem-5, and sos-1, which are previously characterized components of the EGF pathway (Chang et al., 2000; Sternberg and Han, 1998). Furthermore, previously characterized cross-talk between let-60 and bar-1/β-catenin (Eisenmann et al., 1998b), and between daf-2/insulin receptor and sem-5/grb2 (Nanji et al., 2005), is supported. The query network provides the first evidence of genetic interactions between let-756/fgf and downstream mediators of the FGF pathway including egl-15/fgfr, let-60, sem-5, and sos-1, affirming several previous lines of evidence (Borland et al., 2001). Furthermore, let-756 and egl-15 each interact with six query genes, five of which are shared between the two. Finally, the query network reveals novel interactions between bar-1/β-catenin and glp-1/notch-receptor, between bar-1 and sma-6/type 1 TGF-β receptor, and between bar-1 and multiple components of the FGF and EGF pathways. Further investigation will be required to elucidate the precise role of these interactions during development.

2.3. Discussion

SGI is a Robust Approach to Systematically Investigate Genetic Interactions

I developed Systematic Genetic Interaction analysis (SGI) to identify biologically relevant genetic interactions in a systematic and high-throughput manner. Through this approach, I was able to extract 3.5-fold more interactions than a previous study (Lehner et al., 2006), despite testing 9.2-fold fewer gene pairs for interaction. The resulting SGI network of 1246 interactions is the largest metazoan genetic network reported to date. Four lines of evidence support the validity of SGI interactions. First, replicates of 90 query-target pairs were included in both the Signalling and LGIII matrix, yielding a technical reproducibility of 83%. Second, six of the query genes were also included as RNAi targets, yielding a reciprocal reproducibility of 40%. Full reciprocity is not expected because of the varying degree of gene inactivation in the background of different alleles and RNAi conditions. Third, of the 1165 gene pairs examined both in this study and by Lehner et al., SGI identified 64% of the 28 interactions found by Lehner et al., and there is 98.9% agreement between the negative calls. Fourth, an independent method of assessing the likelihood of genetic interactions between gene pairs (Zhong and Sternberg, 2006) determined that the SGI network is enriched for interactions that are predicted to be true (p < 10^-25). Three lines of evidence suggest that the interactions uncovered by SGI are also biologically meaningful. First, query genes involved in signal transduction have dramatically more interactions with signalling targets than with random targets. In contrast, a query gene involved in an unrelated process (DNA-damage response) interacts with signalling and random targets with equal frequency. Second, the SGI network contains 26% of all gene pairs within the interaction test matrix that have similar GO annotation, suggesting that the SGI network is

greatly enriched for interactions between functionally-related genes (hypergeometric p < 10^-21). Third, a cluster analysis reveals many expected patterns within the query gene network, and between query and target genes. For example, a glp-1-interacting cluster is enriched for notch-receptor processing activity (Austin and Kimble, 1989; Yochem and Greenwald, 1989), a sem-5-interacting cluster is enriched for muscle-development activity (DeVore et al., 1995; Dixon et al., 2006), and a bar-1-interacting cluster is enriched for establishment of cell polarity activity. Thus, the dataset contains biologically meaningful relationships that can be mined for further insights.

The SGI Approach Reveals Interactions in an Unbiased Fashion

The SGI approach facilitates the discovery of interactions with a wide range of strength and reveals many network variants from which the most biologically relevant network can be extracted. Although the chosen SGI network is significantly enriched with known functional categories, a number of criteria can be modified to mine SGI data for more or less stringent interactions. For example, the SGI variant with the most significant precision and recall (Figure 2-3A) had greater overlap with predicted interactions than did the larger SGI network (Figure 2-4D). With the SGI approach, tailored sets of genetic interactions can be revealed that either facilitate detailed biological analysis by limiting false positives at the expense of some true positives, or facilitate global network analyses by increasing the capture rate of true positives at the expense of including more false positives. My chosen SGI network has good recall and precision when compared to other interaction datasets. As a quality benchmark of precision, I considered the network of fine genetic interactions, which is assembled from low-throughput biological analyses and likely contains

few false positive interactions. The SGI network has a precision similar to the network of fine genetic interactions, which suggests that SGI interactions do not simply represent the additive perturbation of functionally unrelated genes. Although much of the precision score of the SGI network is due to interactions among known signalling components, the precision of the LGIII network remains significant, suggesting that more uncharacterized interactions are uncovered within the LGIII network than the signalling network, as expected. Surprisingly, the SGI network has a higher recall than all of the other datasets examined. This is not due to the pre-selection of signalling targets, as a network created with random LGIII targets also has a higher recall than the other datasets. By comparison, the Lehner network, which is similar to the signalling network in that it derives from a matrix of pre-selected signalling genes, has much lower recall than all SGI-related networks. I suspect that the difference lies in the methodology of identifying interactions: the SGI approach detects interactions ranging from weak to strong, while Lehner et al. report only strong interactions. Restricting analyses to strong interactions evidently neglects a large proportion of meaningful interactions between genes known to function within the same biological process, and must therefore miss interactions between genes with no previously shared annotation as well. The SGI approach is similar in principle to that used by Fraser and colleagues (Lehner et al., 2006), but with four key differences. First, Fraser investigated interactions in liquid culture whereas I performed all experiments on the solid agar substrate commonly used by C. elegans geneticists. Second, rather than score population growth in a binary manner, I used a graded scoring scheme to measure population growth.
Third, rather than test all potential interactions in side-by-side duplicates (Lehner et al., 2006), I performed all experiments in at least three

independent replicates in a blind fashion. Finally, I employed a global analysis of our data to identify interacting gene pairs in an unbiased fashion.

The Large Number of Genetic Interactions Revealed by SGI is Not Unexpected

Approximately 18% of the 7008 gene pairs that I tested interact genetically. I rationalize this large fraction of interacting gene pairs uncovered by SGI in four ways. First, genes within the same local neighborhood on a network graph are more likely to interact with each other than with randomly selected targets. For example, in S. cerevisiae, 18-24% of genes linked to the same query gene interact with each other, compared to the interaction rate of 0.6% for the average query (Tong et al., 2001; Tong et al., 2004). Similarly, a majority of the SGI genetic tests are between genes known or predicted to be involved in signal transduction; a relatively high number of interactions may therefore be expected. Second, essential genes genetically interact with more genes than non-essential genes. For example, when conditional alleles of essential yeast genes are used as queries in SGA screens, the fraction of interactions identified is 5.5-fold greater than the fraction identified with non-essential queries (0.6%) (Davierwala et al., 2005). Of the 11 query genes investigated in this study, 9 are essential. Thus, by using hypomorphic alleles of genes that likely teeter on the brink of collapse and designing an approach that can reliably detect both strong and weak interactions, I have created a very sensitive system to detect genetic interactions. Third, multicellular organisms may have more vulnerabilities than unicellular organisms. Each cell type within an animal is likely to be governed by a system with a distinct set of genetic vulnerabilities that is different from other cell types.
Since compromising the development or physiology of any one of the major tissue types will likely kill the animal, the vulnerability of the entire system is greater than that of any

one cell type. This effect may be further compounded by a complex developmental program. Finally, the total number of anticipated genetic interactions in C. elegans as revealed by SGI is in the realm of expectation when compared to that of S. cerevisiae. Based on the fraction of genes that interacted in the LGIII network (14%), which represents a nearly random set of genes, there are an estimated ~61 million genetic interactions in C. elegans. The number of expected genetic interactions in C. elegans as revealed by SGI analysis is therefore ~120 times that of S. cerevisiae (Davierwala et al., 2005; Tong et al., 2001; Tong et al., 2004). By comparison, the number of all possible gene pairs in C. elegans is ~11-fold more than the number of all gene pairs in S. cerevisiae. Thus, the ratio of expected genetic interactions in worms compared to yeast is only ~11-fold more than the respective ratio of all possible gene pairs in both organisms. This difference likely reflects the increased complexity of nematodes compared to yeast. In contrast, Lehner et al. reported an interaction rate of 0.5%. This fraction would suggest that the ratio of the number of expected genetic interactions in worms compared to yeast is ~0.4-fold less than the ratio of all possible gene pairs in worms compared to yeast, which is inconsistent with expectations. I therefore conclude that the number of interactions revealed by SGI is not unexpectedly high.

Conclusion

I developed a novel, sensitive, and reproducible approach to systematically investigate genetic interactions in C. elegans called SGI. Using this approach, I identified a network of 1246 interactions among 461 genes. These interactions correlate with existing functional annotation, are reproducible, and reveal both expected and novel functional relationships among genes

involved in signal transduction. As such, the SGI network is a resource that provides functional annotation for many poorly characterized signal transduction genes.

Materials and Methods

Analysis of the Distribution of Functional Categories within the LGIII set

Within the LGIII set of genes, there are 203 genes annotated with at least one GO biological process. These genes represent 280 unique GO Process (<1000 genes) categories. [...] samples from the C. elegans genome of 203 genes with at least one GO biological process were then chosen randomly. The random set has a mean of [...] unique GO Process (<1000 genes) categories with a standard deviation of [...]. Compared to the random set, there is no significant difference in the number of unique GO Processes in the LGIII set (z-score = -1.298; p = 0.097 after Bonferroni correction). Furthermore, of the 280 unique GO biological processes in the LGIII set, only 18 are significantly enriched (p < 0.01) in the LGIII set, and all of these are represented by only 1 (12 processes), 2 (4 processes) or 3 (2 processes) genes. See Additional Data File [...].

RNAi feeding assay

Query-target gene pairs were tested for interaction by feeding target gene RNAi to worms with a mutation in the query gene. RNAi cultures were grown in LB with 100 µg/mL ampicillin overnight at 37 °C. 40 µL of culture was placed on each well of 12-well plates containing 3.5 mL NGM (Lewis and Fleming, 1995) supplemented with [...] µg/mL carbenicillin and 1 mM IPTG. Plates seeded with bacteria were dried overnight at room temperature and for 40 min in a flow hood. Two L3-

L4 worms [N2, egl-15(n1477), let-756(s2613), sos-1(cs41), sem-5(n2019), let-23(n1045), let-60(n2021), clk-2(mn159), daf-2(e1370), glp-1(or178), sma-6(e1482), bar-1(ga80)] were placed in each well of a 12-well plate using a COPAS BIOSORT worm sorter (Union Biometrica). Worms were grown at 20 °C [egl-15(n1477), let-756(s2613), sos-1(cs41), sem-5(n2019), let-60(n2021), sma-6(e1482), bar-1(ga80)] or at 16 °C [glp-1(or178), let-23(n1045), clk-2(mn159), daf-2(e1370)]. The following controls were grown in each experiment. As a positive control for RNAi efficiency, wild-type (N2) worms and the query mutants were fed pop-1(RNAi). As negative controls for background growth levels, N2 worms were fed target RNAi and query mutants were fed L4440 mock-RNAi. Typically, one person can prepare and process experiments with five worm strains fed 384 RNAi-inducing bacterial strains in triplicate over the course of two weeks. Overlapping sets of experiments of similar size can be prepared while the worms in the first experiment are growing, resulting in an average throughput of 1920 genetic tests per week per person.

Scoring query-target interactions

The number of progeny in each well that resulted from each query-target pair and control combination was counted and recorded as a growth score. A well with no progeny was given a growth score of zero, whereas a well overgrown with progeny was given a growth score of six. Growth scores one through five were assigned to wells with increasing numbers of worms (1, 1-10 progeny; 2, [...] progeny; 3, [...] progeny; 4, [...] progeny; 5, 200+ progeny). I found that worm populations can be quickly and reliably binned into these categories based on pilot experiments performed by two independent investigators. We took several counts of the same maturing population over the course of several days. Each query-target pair and its two

controls were tested in at least three rounds. Experiments suspected of contamination were flagged as suspect and repeated. Counts obtained in a round were annotated with confidence scores of 0, 1, or 2, reflecting whether they were suspect, not suspect, or resulted from a second attempt, respectively. A large fraction of all experiments were digitally archived using a high-throughput digital imager (Kwok et al., 2006; Burns et al., 2006).

Determination of interactions from growth scores

To determine whether two genes interact, the difference between experimental growth scores and control growth scores must first be calculated. Let $G(Q,T,i,j)$ be the growth score for the (Q,T) query-target pair on the $j$th day of round $i$. For each query-target pair, two growth score differences were calculated:

1) $D_{null}(i,j) = G(Q,null,i,j) - G(Q,T,i,j)$, the difference between the experimental population (query mutant; target RNAi) and the mock RNAi vector control (query mutant; L4440 RNAi); and
2) $D_{wt}(i,j) = G(wt,T,i,j) - G(Q,T,i,j)$, the difference between the experimental population and the wild-type control (N2; target RNAi).

The following sequential rules were used to call a (Q,T) pair an interaction:

1) For round $i$, its $j$th day's counts were called deviant if both $D_{wt}(i,j)$ and $D_{null}(i,j)$ were at least $d$.
2) A round's set of counts was labeled positive if at least $e$ of its days were found to be deviant ($e$ = 1 or 2) or a majority of its days were deviant ($e$ = 0).
3) A (Q,T) pair was then called an interaction if at least $s$ of its rounds were positive ($s$ = 1 or 2) or a majority of its rounds were positive ($s$ = 0).
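Rules 1 to 3 above can be sketched in Python as follows. This is an illustrative sketch only, not the analysis code used in this study: the function names and the data layout (each round as a list of per-day score triples) are assumptions, and "a majority" is assumed to mean strictly more than half.

```python
from typing import List, Tuple

# One day's growth scores (hypothetical layout):
# (G_QT, G_Q0, G_0T) = experimental, mock-RNAi control, wild-type control.
DayScores = Tuple[int, int, int]


def is_deviant(day: DayScores, d: int) -> bool:
    """Rule 1: both control-minus-experiment differences are at least d."""
    g_qt, g_q0, g_0t = day
    d_null = g_q0 - g_qt  # D_null(i, j)
    d_wt = g_0t - g_qt    # D_wt(i, j)
    return d_null >= d and d_wt >= d


def meets(count: int, total: int, threshold: int) -> bool:
    """Threshold 0 means 'a majority' (assumed: strictly more than half);
    any other value means 'at least threshold'."""
    if threshold == 0:
        return count > total / 2
    return count >= threshold


def is_positive_round(days: List[DayScores], d: int, e: int) -> bool:
    """Rule 2: enough deviant days within one round."""
    deviant = sum(is_deviant(day, d) for day in days)
    return meets(deviant, len(days), e)


def is_interaction(rounds: List[List[DayScores]], d: int, e: int, s: int) -> bool:
    """Rule 3: enough positive rounds for the (Q, T) pair."""
    positive = sum(is_positive_round(days, d, e) for days in rounds)
    return meets(positive, len(rounds), s)


# Made-up scores for three rounds of one query-target pair.
example_rounds = [[(1, 4, 5), (2, 5, 5)], [(0, 3, 4)], [(4, 5, 5)]]
print(is_interaction(example_rounds, d=3, e=1, s=2))
```

The suspect-round criteria (confidence threshold, repeated attempts, minimum number of rounds) are deliberately left out of this sketch; they would act as a filter on `rounds` before rule 3 is applied.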

Three additional criteria were used to determine how counts from suspect rounds were treated:

4) Suspect rounds were excluded from the analysis if the confidence score was less than a threshold c (c = 0, 1, or 2).
5) Counts derived from suspect rounds were removed if a second attempt was conducted as long as the parameter r was set; if r was not set, all counts were retained.
6) Suspect rounds were included to bring the total number of rounds to a minimum of m (m = 1 or 2).

Generation and comparison of network variants

We applied all combinations of the above criteria to generate 72 different network variants. All interacting pairs within a network variant were query-target pairs that had satisfied all of the criteria imposed by the variant. For example, in a variant with the following criteria: d=3, e=1, s=2, r=1, c=0, and m=2, all query-target pairs that were called interacting were found in at least 2 (s=2) positive rounds that had at least 1 (e=1) deviant day, for which the difference between the growth scores of the experimental population and the control populations was at least 3 (d=3). If any round was considered suspect and the experiment for that round had been repeated, only growth scores from the second attempt were used (r=1). Otherwise, rounds with all levels of confidence were used (c=0). If fewer than 2 rounds of data were available for a specific query-target pair, data from additional rounds were included so that at least 2 rounds of data were available starting from the most confident rounds (m=2). To compare network variants, we identified pairs of genes within each variant that share a Gene Ontology (GO) biological process classification (Ashburner et al., 2000). Only categories

with fewer than 1000 genes were considered. We calculated recall and precision for each variant V as:

$$\mathrm{Recall}(V) = \frac{\#\ \text{of co-classified interacting pairs in}\ V}{\#\ \text{of possible co-classified pairs}}$$

$$\mathrm{Precision}(V) = \frac{\#\ \text{of co-classified interacting pairs in}\ V}{\#\ \text{of interacting pairs in}\ V}$$

The significance of the degree to which each network linked genes in the same GO biological process category was estimated using the hypergeometric distribution. The hypergeometric distribution takes into account the number of co-classified interacting pairs in each variant relative to the size of the variant, the total number of all possible co-classified gene pairs, and the total number of gene pairs tested and is thus a measure of the significance of both the recall and precision of a variant.

A more detailed assessment of the nature of the SGI interactions

The SGI method for determining a genetic interaction identifies interactions that exceed phenotypic expectations for a synthetic genetic interaction based on multiplicative models (Schuldiner et al., 2005; Segre et al., 2005). This was demonstrated by my bioinformatic collaborators Matt Weirauch and Josh Stuart as illustrated below. The multiplicative model corresponds to calling an interaction whenever the following inequality is true:

$$1 < \frac{F_{Q0}\,F_{0T}}{F_{QT}},$$

where $F_{QT}$ is a measure of fitness for the Q-T mutant. $F_{Q0}$ and $F_{0T}$ denote the fitness of the hypomorph fed the control vector and of wild-type worms fed the target RNAi, respectively. The numerator is the product of the two fitness measures and is interpreted as the expected fitness under the model that genes Q and T do not interact. If we use the proportion of counted worms as the fitness measure (i.e. $F_{QT} = N_{QT}/W$, where $N_{QT}$ is the number of worms counted for the Q-T pair and $W$ is the total possible number of worms) we get the following multiplicative rule:

$$W < \frac{N_{Q0}\,N_{0T}}{N_{QT}}.$$

In our case, W=250 is assumed as worms were often found burrowing at counts this high, indicating the bacterial lawn was depleted when the population reached approximately 250 worms. In contrast, the rule described by SGI calls an interaction whenever the following inequality holds:

$$d \le \min(G_{Q0}, G_{0T}) - G_{QT},$$

where $d$ is the difference cut-off as discussed in 2.4.4, and $G_{QT}$ is the growth score of the Q-T query-target double-mutant. While this rule is additive in the growth scores, it is multiplicative in the original worm counts on a plate because our discretization of growth scores is roughly logarithmic (log base 3) with respect to the counts. Thus, the above rule can be rewritten as:

$$3^{d} \le \frac{\min(N_{Q0}, N_{0T})}{N_{QT}},$$

where $N_{QT}$ corresponds to the approximate count of worms for the Q-T mutant.
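The two calling rules can be compared side by side with a short Python sketch. It is illustrative only: the function names are hypothetical, the counts in the usage lines are made up, and the inequalities are multiplied through by the double-mutant count so that a count of zero does not cause a division error.

```python
W = 250  # maximum countable population, as assumed in the text


def multiplicative_call(n_qt: int, n_q0: int, n_0t: int, w: int = W) -> bool:
    """Multiplicative model: W < N_Q0 * N_0T / N_QT, written without division."""
    return w * n_qt < n_q0 * n_0t


def sgi_call_counts(n_qt: int, n_q0: int, n_0t: int, d: int) -> bool:
    """SGI rule in worm counts: 3**d <= min(N_Q0, N_0T) / N_QT, without division."""
    return (3 ** d) * n_qt <= min(n_q0, n_0t)


def sgi_call_scores(g_qt: int, g_q0: int, g_0t: int, d: int) -> bool:
    """SGI rule in growth scores: d <= min(G_Q0, G_0T) - G_QT."""
    return d <= min(g_q0, g_0t) - g_qt


# Made-up counts: both controls reach 100 worms.
print(multiplicative_call(10, 100, 100), sgi_call_counts(10, 100, 100, d=2))
print(multiplicative_call(20, 100, 100), sgi_call_counts(20, 100, 100, d=2))
```

With these made-up counts, a double mutant at 20 worms satisfies the multiplicative inequality but not the SGI inequality at d=2, which is the sense in which the SGI rule is the stricter of the two when control counts are high.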

Using the definitions of the two rules, we see that the interaction calling method used in SGI will be more conservative than the multiplicative rule whenever the ratio between the minimum and the product of the control counts obeys the following relation:

$$\frac{\min(N_{Q0}, N_{0T})}{N_{Q0}\,N_{0T}} < \frac{3^{d}}{W}.$$

Since min(x,y)/xy is maximum when x=y for non-negative integers x and y, we see that our method is conservative with respect to the multiplicative model (i.e. will call a subset of the interactions called by the multiplicative model) as long as the number of worms of the controls is greater than 25. Because this corresponds to a growth score of 2 and our delta parameter was equal to 2, all of the calls reported in the manuscript fall into this conservative range.

Clustering of interaction strengths

An interaction strength, IS, was calculated so that target and query genes could be clustered based on their interaction profiles. The IS measures the average difference between the experimental and control populations of worms. For interacting pairs, we averaged $D_{wt}(i,j)$ and $D_{null}(i,j)$ using only days and rounds passing criteria 3 through 6. For pairs considered noninteracting, all rounds that passed criteria 4 through 6 were included in the computation. The final interaction strength for a particular query-target pair was calculated as:

$$IS = \frac{1}{h}\sum_{i=1}^{n} 1(i)\,\frac{1}{n_i}\sum_{j=1}^{n_i}\left[\tfrac{1}{2}D_{wt}(i,j) + \tfrac{1}{2}D_{null}(i,j)\right],$$

where $1(i)$ was 1 if round $i$ passed the above criteria and was 0 otherwise, $h$ is the total number of rounds that passed the criteria, and $n_i$ is the number of days in round $i$. IS represents the average growth score difference for a query-target pair calculated over its valid data. Target and query genes were clustered based on their interaction strengths. Hierarchical agglomerative clustering was run using Cluster 3.0 (de Hoon et al., 2004; Eisen et al., 1998) on both the target and query dimensions using average linkage as the cluster similarity metric and uncentered Pearson correlation as the IS profile similarity metric. Individual target gene clusters were defined by cutting the hierarchical tree at a height of 0.4. The degree to which each cluster contained genes assigned to the same gene functional category was measured using the hypergeometric distribution and a significance cutoff of P < [...].

Gene Functional Categories

We searched for common functional annotation present in clusters of genes on the heatmap. To do so, we collected several datasets of gene functional categories described for C. elegans genes specifically as well as for predicted C. elegans orthologs from other organisms. We collected C. elegans gene categories from GO (Ashburner et al., 2000) (downloaded from [...] on January 17, 2007) and KEGG (Kanehisa et al., 2006) (downloaded from ftp://ftp.genome.ad.jp/pub/kegg/pathways/cel on June 13, 2005). We restricted to GO process categories containing 1,000 genes or fewer. Annotations implied by the is-a or part-of subsumption GO hierarchies were automatically added. We also collected S. cerevisiae gene pathways from MIPS (downloaded from [...] on May 12, 2002) and H. sapiens gene pathways from

BioCarta (downloaded from [...] on June 13, 2005). For the MIPS and BioCarta datasets, we found the predicted C. elegans ortholog for each gene in a pathway by identifying the reciprocal best match protein using the BLASTP program (Altschul et al., 1997). All of the categories with their associated genes can be found in Additional Data File [...].

Construction of the Query Network

Pairs of query genes found to interact with a significantly similar set of target genes were connected by congruent links as defined by Tong et al. and Ye et al. (Tong et al., 2004; Ye et al., 2005). The P-value of the overlap of k target genes of a query gene pair (A,B) was determined using the hypergeometric distribution:

$$P(X \ge k) = \sum_{i=k}^{n} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},$$

where K is the number of target genes linked to query gene A, n is the number of target genes linked to query gene B, and N is the number of tested target genes. A P-value cutoff of p < 10^-9 yielded a total of 16 congruent links.
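The hypergeometric tail probability above can be evaluated directly with exact integer arithmetic. The following Python sketch is illustrative rather than the code used in this study; the function name is hypothetical and the numbers in the usage line are made up.

```python
from math import comb


def congruence_p_value(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for the overlap between two queries' target sets.

    K: targets linked to query A, n: targets linked to query B,
    N: tested targets, k: observed number of shared targets.
    """
    upper = min(K, n)  # terms with i > K are zero, so stopping here is equivalent
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Made-up example: 40 shared targets between queries with 60 and 70 targets
# out of 600 tested; a link would be drawn if the value is below the cutoff.
p = congruence_p_value(k=40, K=60, n=70, N=600)
print(p, p < 1e-9)
```

Summing exact binomial coefficients avoids the loss of precision that can affect very small tail probabilities when they are computed as one minus a cumulative probability.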

Chapter 3
Analysis of the C. elegans Synthetic Genetic Interaction Network

The analysis of the SGI data was carried out in collaboration with Matthew Weirauch and Martina Koeva at the University of California, Santa Cruz, under the guidance of their supervisor Josh Stuart and mine (Peter Roy). Therefore, I use the plural pronoun "we" when referring to work done by the collaboration. I have included detailed methods of the data analysis in the materials and methods section, even though it was largely implemented by my bioinformatics collaborators. The work in this chapter has been published as: Byrne, A., Weirauch, M., Wong, V., Koeva, M., Dixon, S., Stuart, J. and Roy, P. (2007). A global analysis of genetic interactions in Caenorhabditis elegans. J Biol 6, 8.

Abstract

Uncovering genetic interactions on a high-throughput scale in a metazoan is a relatively new field. Therefore, not only is there much to learn about how to design a genetic interaction screen, but there is also much to learn about how to interpret the results to gain valuable biological information. In this chapter, I present the approaches that were taken to analyse the SGI network. Specifically, to understand the functional implications of genetic interactions, the network of synthetic interactions was overlapped with previously reported physical protein interactions, gene coexpression, and phenotypic correlation. Analysis of the SGI network of genetic interactions, the superimposed network, and the functional modules they contain reveals new functions for uncharacterized and previously characterized genes, uncovers crosstalk between signal transduction pathways, and provides insight into how genetic interactions relate to other types of functional data on a global scale. The majority of genetic interactions provide orthogonal information to other datasets. Moreover, genetic interactions bridge subnetworks on a global scale. Finally, a comparison of existing yeast and worm genetic interactions suggests that they are unlikely to be conserved.

3.1. Introduction

Deriving Functional Information with Network Analysis

Although the genomes of humans and common model organisms have been sequenced for several years, the challenge of defining gene function on a genome-wide scale still exists. Network analysis has been instrumental in the interpretation of large datasets to extract functional information. For example, analysis of network topology and identification of functional modules have been used to predict gene function (Tanay et al., 2004; Tong et al., 2004). Mapping networks of physical interactions onto orthologous proteins has identified orthologous protein interactions (Yu et al., 2004). In addition, overlapping physical interaction networks with genetic interaction networks has predicted whether genetically interacting genes function in the same or in parallel pathways (Kelley and Ideker, 2005; Tong et al., 2004; Tong et al., 2001; Ulitsky and Shamir, 2007). Moreover, overlapping networks containing different types of functional relationships and identifying multiply-supported links has provided evidence for putative functional relationships (Gunsalus et al., 2005; Tewari et al., 2004; Walhout et al., 2002). Prior to this study, a detailed characterization of a large-scale, experimentally derived metazoan genetic interaction network had not been carried out. As such, the following questions remained to be investigated. First, how much genetic buffering exists in a metazoan genome? Second, do genes cluster in groups according to their function within the genetic interaction network, thus enabling prediction of gene function? Finally, what other type of functional insight can be learned from the genetic interaction network?

How do genetic interactions contribute to the functional landscape of C. elegans?

A large genetic interaction network has not been added to an integrated metazoan network. Therefore, how genetic interactions incorporate into the topology of an integrated network has not been well understood. Previous efforts to analyze patterns within integrated networks have primarily relied on links supported by multiple data sources (Gunsalus et al., 2005; Tewari et al., 2004; Walhout et al., 2002). As such, these studies have been biased towards functional relationships between physically associating proteins. Since components within a functional module may not physically interact or have co-modulated expression, the addition of genetic interactions, whether supported by other functional information or not, could add significantly to the discovery of functional modules in an integrated network. Therefore, genetic interactions stand to provide much insight into the structure and implications of metazoan networks.

Are Genetic Interactions Conserved?

An important question in the field of model organism biology is whether the findings are applicable to human development and disease. At the time of this investigation, it was unknown whether genetic interactions are conserved among organisms. Yu and colleagues have shown that if two gene products physically interact in one organism, their homologs are likely to physically interact in another organism (Yu et al., 2004). Moreover, there appear to be similar levels of redundancy in C. elegans and S. cerevisiae based on the number of nonessential genes in each organism. However, gene function within the context of a whole animal may be different from gene function within a unicellular organism. The implications of this question are significant. Yeast is an ideal model organism with which to study genetic interactions. Since systematic investigation of metazoan genetic

interactions is unwieldy, it would be enormously informative if genetic interactions could be inferred from yeast. Furthermore, conservation of yeast and worm interactions might suggest a universality of genetic interactions, which would have implications for using a metazoan model organism to infer genetic interactions in mammals.

Rationale

In Chapter 2, I demonstrated that SGI is a valid high-throughput approach to identify synthetic interactions among signal transduction pathways. The biological relevance of the interactions within the SGI network is supported by the evidence that: 1) they are reproducible; 2) they correlate with genetic interactions found in other studies; and 3) they overlap with previously reported functional annotation. Next, I wanted to mine the SGI network to investigate its functional implications. I was particularly interested in three aspects of the network: 1) can the genetic interaction network be used to assign function to genes in a metazoan? 2) how do genetic interactions contribute to the functional landscape of C. elegans? and 3) are genetic interactions conserved? This chapter describes the approaches that I took to address each of these aims.

Results

The SGI Network Properties are Typical of Other Biological Networks

Using the 1246 genetic interactions of the SGI network, I asked if genetic network properties are conserved with those of yeast. First, the SGI network has properties similar to those of scale-free networks: most SGI target genes interact with few query genes and few target genes interact with many query genes (Figure 3-1A). Second, hubs within the SGI network are more likely to

Figure 3-1. Network Properties of the SGI Network. (A) A plot of the percentage of targets (y axis) that interact with a given number of query genes (x axis), illustrating that the SGI network has properties similar to those of scale-free networks. (B) A plot of the percentage of targets that yield a catastrophic phenotype when targeted by RNAi in a wild-type background (Kamath et al., 2003) (y axis) as a function of how many query genes they interact with (x axis).

result in a catastrophic phenotype (lethality, arrest, or sterility) when knocked down by RNAi in a wild-type background compared to less connected targets (p < 10^-47) (Figure 3-1B, Materials and Methods 3.4.1). Third, the average shortest path length ( ), clustering coefficient ( ) and average degree ( ) of the C. elegans genetic network are indistinguishable from those of the SGA synthetic genetic network, which has an average shortest path-length of , a clustering coefficient of , and an average degree of (Tong et al., 2001; Tong et al., 2004) (Materials and Methods 3.4.2). These results demonstrate that the network properties of the SGI network are conserved with those of the yeast SGA network.

The Construction of a Superimposed Network

To investigate how worm genetic interactions relate to other interaction types, we first created a superimposed network by combining published interaction data from numerous sources (Figure 3-2). The superimposed network was constructed from several large-scale interaction datasets, including the Li, interolog, Lehner, co-expression, co-phenotype, and fine genetic interaction networks (see Chapter 2). Additionally, data from the SGA network (Tong et al., 2004) were included by mapping the yeast network onto C. elegans orthologs. The resulting C. elegans network is referred to as the transposed SGA network (Materials and Methods 3.4.3). I will describe only the construction of the co-phenotype network in detail since all of the other networks consist of published links between genes.
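The three topological statistics compared earlier in this section (average shortest path length, clustering coefficient, and average degree) can be computed for any undirected network in a few lines. The following is a minimal sketch in Python on a toy edge list. The gene labels and function names are hypothetical, and the procedure actually used in this work is the one given in Materials and Methods 3.4.2, so the code is illustrative only.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map: gene -> set of neighbouring genes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes with fewer than
    two neighbours contribute zero."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges that exist between pairs of neighbours.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_shortest_path(adj):
    """Mean breadth-first-search distance over all reachable ordered pairs.
    Assumes the network contains at least one edge."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical toy network: two query genes (q1, q2) and three targets.
toy_edges = [("q1", "t1"), ("q1", "t2"), ("q1", "t3"), ("q2", "t3"), ("t1", "t2")]
adj = build_adjacency(toy_edges)
print(average_degree(adj), clustering_coefficient(adj), average_shortest_path(adj))
```

Note that in a query-by-target screen the network is close to bipartite, so how query-query and target-target pairs are treated can change these values; the sketch simply treats every link as an undirected edge.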

Figure 3-2. A Schematic of the Construction of a Superimposed Network. Networks collected or constructed from various data sources were combined to create the superimposed network. Nodes represent genes; edges are colored according to the data type they represent. Data sources are: a previously described C. elegans genetic interaction network (Lehner et al., 2006); a C. elegans protein interaction network (Li et al., 2004); a eukaryotic protein interaction network that augments the C. elegans protein interaction network with orthologous interactions from S. cerevisiae, D. melanogaster, and human protein interactions contained in BioGRID (Stark et al., 2006); an mRNA co-expression network constructed from C. elegans, S. cerevisiae, D. melanogaster, and human expression data (Kim et al., 2001; Stuart et al., 2003); a S. cerevisiae synthetic genetic interaction network (Tong et al., 2004); literature-curated C. elegans genetic interactions, called fine genetic interactions (Wormbase); and a network we created based on the similarity of C. elegans RNAi-induced phenotypes (Kamath et al., 2003; Rual et al., 2004; Simmer et al., 2003).
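The superimposition step can be pictured as taking the union of all links while remembering which data types support each one. The sketch below, in Python, is one simple way to do this under stated assumptions: links are undirected, each data type is supplied as a list of gene pairs, and the gene labels and link lists are hypothetical placeholders rather than data from this study.

```python
from collections import defaultdict

def superimpose(networks):
    """Combine several networks into one.

    networks: dict mapping a data-type name to an iterable of (gene_a, gene_b).
    Returns a dict mapping each undirected gene pair (a frozenset) to the set
    of data types that support it.
    """
    support = defaultdict(set)
    for data_type, edges in networks.items():
        for a, b in edges:
            if a == b:
                continue  # ignore self-links
            support[frozenset((a, b))].add(data_type)
    return support

def multiply_supported(support):
    """Keep only links supported by more than one data type."""
    return {pair: types for pair, types in support.items() if len(types) > 1}

# Hypothetical toy input; gene names are placeholders.
networks = {
    "SGI": [("geneA", "geneB"), ("geneA", "geneC")],
    "co-phenotype": [("geneB", "geneA"), ("geneC", "geneD")],
    "co-expression": [("geneC", "geneD")],
}
support = superimpose(networks)
multi = multiply_supported(support)
genes = set().union(*support.keys())
print(len(genes), "genes,", len(support), "links,", len(multi), "multiply supported")
```

Storing each pair as a frozenset makes the orientation of a link irrelevant, so a pair reported as (geneA, geneB) in one dataset and (geneB, geneA) in another is counted once with two supporting data types.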

Creation of co-phenotype network

Shared phenotypic profiles can be a strong indicator of shared function, depending on the level of specificity of the phenotype and the method used to judge similarity of profile. We created the co-phenotype network to link genes that have similar phenotype(s) when disrupted. As a result of the accessibility of the genome-wide RNAi-feeding library, many groups have carried out comprehensive RNAi screens, the results of which are collated on Wormbase (Wormbase, Release WS170). In the past, this dataset has been exploited to link genes with shared phenotypes (Gonczy et al., 2000; Gunsalus et al., 2005; Gunsalus et al., 2004; Piano et al.; Sonnichsen et al., 2005; Zou et al., 2008). For example, Gunsalus et al. (Gunsalus et al., 2005) used the uncentered Pearson correlation coefficient (phenotypic PCC) approach, which links genes if they share a similar phenotype when function is reduced by RNAi. However, the phenotypic PCC can produce false positive links between genes whose high correlation is based on a single (or even a few) shared common phenotype(s) when the two genes fail to produce any (or many) of the other phenotypes. Inspection of the compiled RNAi phenotype dataset reveals thousands of gene pairs that result in such spurious, yet perfect, correlation. Instead, we reasoned that when comparing two phenotypic profiles, each individual phenotype, whether shared or not, should be weighted by the frequency with which it appears in the population. A good measure of similarity should give more weight to rare phenotypes shared between genes than to common phenotypes, because infrequent phenotypes will co-occur less often in two genes by chance. For example, 10% of the genes in the genome are embryonic lethal (Emb) when disrupted because there are a multitude of unrelated deficiencies that result in embryonic lethality. Conversely, only 23 genes produce a rolling (Rol) phenotype when

targeted by RNAi (Wormbase, Release WS170). Therefore, two genes that share the Rol phenotype when disrupted are much more likely to share function than are two genes that share the Emb phenotype. Furthermore, the correlation between two genes should increase if neither produces a very common phenotype when targeted by RNAi. A loss-of-function agreement score, LOFA, that captures these ideas was calculated for two genes i and j (Materials and Methods 3.4.4). The co-phenotype network was created by linking genes with similar loss-of-function phenotypes detected in recently published high-throughput RNAi screens (Kamath et al., 2003; Rual et al., 2004; Simmer et al., 2003). I considered various RNAi screens to incorporate into the phenotype network. There are many genome-wide screens that have investigated a specific phenotype such as transposon silencing, genome stability, fat regulation, and apoptosis (Gonczy et al., 2000; Piano et al., 2000; Ashrafi et al., 2003). The previously reported screens vary in two main characteristics: whether they were genome-wide, and whether they report a range of phenotypes. For example, Maeda et al. screened only 2500 genes for very drastic phenotypes such as Emb, Ste, Let, Muv, Dpy, Unc, and larval arrest. On the other hand, Sonnichsen et al. screened 98% of the genome. However, they were only looking for genes required for the first two rounds of division in the embryo, and only reported dsRNA effects if embryonic lethality was seen. Therefore, I did not include results from such screens as they did not report a wide range of phenotypes. Instead, I focused on the screens that covered a large part of the genome and that reported a range of phenotypes. I assembled an RNAi phenotype compendium by compiling the results of three genome-wide RNAi studies: 31 phenotypes scored for 1,472 RNAi experiments from the Kamath et al. dataset (Kamath et al., 2003); 25 phenotypes scored for 1,486 RNAi experiments from the Simmer et al. dataset (Simmer et al., 2003); and 26 phenotypes

scored for 1,066 RNAi experiments from the Rual et al. dataset (Rual et al., 2004). While this last dataset was not a genome-wide screen, it is derived from an ORFeome-based RNAi library that was used to target 10,953 genes, which included 1,736 genes not targeted in the Kamath and Simmer screens. Furthermore, the phenotypes were scored in a manner similar to that used in the other two screens. Thus, this dataset was included in the construction of the co-phenotype network. Several phenotypic annotations in the datasets were converted to provide a uniform terminology that allowed the three datasets to be integrated. These conversions included labelling brood counts scored as 1-5 and 6-10 as "Ste"; relabelling "Prz" as "Prl"; relabelling "Lvl" as "Let"; and labelling any embryonic lethal percentages over 10% as "Emb". In total, 37 phenotypes scored across 2,327 unique RNAi experiments were collected from the three studies and recorded in a 2,327-by-37 RNAi phenotype matrix. Each entry in the matrix was set to 1 if RNAi against the respective gene produced a specific phenotype in one of the three studies and was set to 0 otherwise. Each row in the matrix is referred to as a gene's RNAi phenotype profile. In a subsequent investigation, Matt Weirauch and Josh Stuart compared the LOFA method to PCC and other methods (Weirauch et al., 2008). In their investigation, the LOFA score is referred to as the Agreement score (AGREE). The network created with AGREE outperformed the other methods at predicting functional congruency, as measured by GO annotation, between linked pairs of genes.
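The reasoning behind a frequency-weighted score can be illustrated on a small binary phenotype matrix of the kind described above. The sketch below, in Python, is not the LOFA formula (which is defined in Materials and Methods 3.4.4); it is a hypothetical stand-in that weights each shared state by its surprisal, so that a shared rare phenotype, or the shared absence of a common one, counts for more. The mismatch penalty, the toy profiles, and all function names are assumptions made for illustration.

```python
import math

def phenotype_frequencies(profiles):
    """Fraction of genes showing each phenotype (one value per column)."""
    n = len(profiles)
    width = len(next(iter(profiles.values())))
    return [sum(p[k] for p in profiles.values()) / n for k in range(width)]

def uncentered_pcc(x, y):
    """Uncentered Pearson correlation (cosine similarity) of two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def weighted_agreement(x, y, freqs):
    """Illustrative frequency-weighted agreement (NOT the thesis's LOFA score).

    Shared presence adds -ln(f); shared absence adds -ln(1 - f);
    a mismatch subtracts -ln(f) (an arbitrary penalty chosen for this sketch).
    Frequencies must come from the same matrix as the profiles, so that no
    observed state has probability zero.
    """
    score = 0.0
    for a, b, f in zip(x, y, freqs):
        if a == 1 and b == 1:
            score += -math.log(f)
        elif a == 0 and b == 0:
            score += -math.log(1.0 - f)
        else:
            score -= -math.log(f)
    return score

# Hypothetical toy matrix; columns are (Emb, Rol, Ste). Emb is common, Rol is rare.
profiles = {
    "g1": (1, 0, 0), "g2": (1, 0, 0), "g3": (0, 1, 0), "g4": (0, 1, 0),
    "g5": (1, 0, 0), "g6": (1, 0, 1), "g7": (1, 0, 0), "g8": (1, 0, 0),
}
freqs = phenotype_frequencies(profiles)

# Both pairs look "perfectly correlated" under the uncentered PCC...
pcc_common = uncentered_pcc(profiles["g1"], profiles["g2"])
pcc_rare = uncentered_pcc(profiles["g3"], profiles["g4"])
# ...but the weighted score separates the rare shared phenotype from the common one.
score_common = weighted_agreement(profiles["g1"], profiles["g2"], freqs)
score_rare = weighted_agreement(profiles["g3"], profiles["g4"], freqs)
print(pcc_common, pcc_rare, score_common, score_rare)
```

In this toy example the pair sharing only the common Emb phenotype and the pair sharing only the rare Rol phenotype receive the same uncentered PCC, whereas the weighted score ranks the Rol pair higher, which is the behaviour argued for in the text.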

Genetic interactions are orthogonal to other interaction datasets

The links from all of the aforementioned networks were combined with the SGI network to form a single superimposed network (Figure 3-2). Altogether, the superimposed network contains 7,825 genes connected by 75,283 links: 43,363 eukaryotic co-expression links, 2,620 previously reported C. elegans genetic interactions, 7,527 transposed synthetic genetic interactions from yeast, 12,796 non-worm eukaryotic protein-protein interactions, 3,967 C. elegans protein-protein interactions, 8,862 co-phenotype links, and 1,246 SGI links (Table 3-1 and Additional Data File 3). Only 1.2% of the interactions within the superimposed network are supported by multiple data types. Likewise, there is little overlap between any genetic interaction dataset and other modes of interaction, suggesting that genetic interactions typically reveal novel relationships between genes.

The Superimposed Network was Mined for Multiply-Supported Subnetworks

Next, we examined how SGI interactions contribute to the connectivity of multiply-supported subnetworks (MSSNs) within the superimposed network (Materials and Methods). MSSNs are highly connected subnetworks of genes composed of qualitatively different data types that do not necessarily overlap (see Figure 3-2). MSSNs may therefore be able to reveal functional modules that emerge from non-overlapping links. Initially, we found 68 MSSNs in the superimposed network that may reflect a higher level of organization of gene activity (Hartwell et al., 1999), as 82% are significantly enriched for genes with similar functional annotation (see Additional Data File 6). A second approach (Materials and Methods) identified an MSSN that I call the bar-1 module, which illustrates how genetic interactions can unite data from disparate sources to reveal coordinate function (Figure 3-3). bar-1 encodes a β-catenin ortholog that

Table 3-1. Composition of the C. elegans Superimposed Network.

Columns, in order: Network; Links; Nodes; Supported Links (a); Genetically-Supported Links (A) (b); Genetically-Supported Links (B) (c); Physically-Supported Links (d); Co-Exp.-Supported Links (e); Co-Phen.-Supported Links (f).

Superimposed network: 75,283 7, (7.2) na na na na na
SGI: 1, (2.0) 43 (1.6) 53 (1.8) 9 (5.6) 2 (9.0) 4 (5.9)*
Lehner: (5.5) 13 (10.8) 23 (7.3) 3 (22.7) 1 (17.9) 1 (30.3)
Fine genetic interactions: 2,279 1, (4.6) na 48 (1.7) 61 (27.8) 23 (36.1) 22 (20.2)
Transposed SGA: 7, (2.3) 5 (4.5) 5 (3.2)* 43 (2.2) 14 (3.0) 4 (1.3)*
Interolog: 12,796 4, (9.9) 61 (27.8) 110 (4.8) na 577 (14.6) 42 (3.9)
C. elegans protein interaction: 3,967 2, (3.7) 7 (10.6) 10 (4.2) na 13 (3.8) 5 (3.4)*
Eukaryotic co-expression: 43,363 5, (11.8) 23 (36.1) 40 (7.2) 577 (14.6) na 84 (6.1)
C. elegans co-phenotype: 8, (5.2) 22 (20.2) 30 (6.1) 42 (3.9) 84 (6.1) na

a Number of links supported by other data within the superimposed network. The fold enrichment over the average number obtained from 1000 randomly permuted superimposed networks (representation factor) is given in brackets. Unless noted with an asterisk, P values of the representation factor are <1e-04. na = not applicable.
b Number of links supported by fine genetic analysis reported in Wormbase (release 170).
c Number of links supported by genetic interactions reported in Wormbase (release 170), Lehner et al. (2006) or SGI.
d Number of links supported by eukaryotic physical interactions (interologs; see text for details).
e Number of links supported by eukaryotic mRNA co-expression analysis (see text for details).
f Number of links supported by C. elegans co-phenotype correlations (see text for details).

Figure 3-3. Genes in the bar-1 module share a pale phenotype. (A) The bar-1 module of 21 genes was identified by virtue of the interconnectedness of co-expression, co-phenotype, genetic, and protein interactions within the superimposed network. Edges are colored according to the type of supporting evidence (SGI, co-expression, worm phenotype, protein-protein, worm genetic, or multiply supported). Genes tested for interaction with bar-1 within the original SGI matrix are indicated (black dot). Visualization generated with Visant (Hu et al., 2007). (B-E) bar-1(ga80) worms (B) fed RNAi that targets prx-5 (C), lin-35 (D), or T20B12.7 (E). The pale phenotype of worms in (C-E) is evident when compared to the relatively wild-type colouration of bar-1(ga80) worms in (B).

transduces a Wnt signal (Eisenmann et al., 1998b). The 21 genes of the bar-1 module are linked by 7 SGI interactions to the bar-1 query gene, 11 fine genetic interactions, 36 co-phenotype links, 3 co-expression links, and 1 protein-protein interaction link. To further investigate this subnetwork, I targeted all of the genes within the subnetwork with RNAi in a bar-1(ga80) mutant background. Of the 9 interactions within the bar-1 module that were tested within the original SGI matrix, 8 (89%) retested similarly. An additional 7 new genetic interactions were found within the module (Table 3-2). In total, I found that 12 of the 20 RNAi targets (60%) interacted with bar-1(ga80), which is 3-fold more than expected compared to bar-1(ga80) interactions within the SGI matrix (p_hg < 10^-4). Genes within the bar-1 module linked by co-phenotype edges exhibit a pale and scrawny phenotype when targeted by RNAi (Kamath et al., 2003). I also found that lin-35(RNAi) and T20B12.7(RNAi) exhibit the same pale and scrawny phenotype in a bar-1(ga80) background (Figure 3-3). I hypothesized that the pale phenotype is due to decreased fat production or storage. A common method to examine fat accumulation in C. elegans is to incubate worms in Nile Red vital dye, which stains lipids and readily accumulates within the triglyceride deposits in the intestine (Greenspan et al., 1985). I therefore targeted each gene within the subnetwork by RNAi in the presence of Nile Red and measured the accumulation of Nile Red microscopically (Materials and Methods 3.4.9). Inhibition of 15 of the 20 genes caused a significant decrease in Nile Red accumulation in an N2 background (Figure 3-4). Five of the 9 genes that present the pale and scrawny phenotype also showed the decrease in Nile Red staining, suggesting that defects in fat metabolism and/or accumulation may account for the phenotypes observed with the transmitted light dissection microscope. Moreover, 10 of the 11 genes that did not present the pale phenotype also retained less Nile Red than controls. Together, these results suggest that

Table 3-2. Genetic Interactions Within the bar-1 Module.

Target Gene (a)    bar-1-linked in SGI network (b)    bar-1-linked on retest (c)
C27F2.10           Y                                  Y
lin-2              N                                  N
lin-7              Y                                  Y
lin-35             Y                                  Y
lin-39*            N                                  N
ogt-1              Y                                  W
prx-5              Y                                  Y
T20B12.7           Y                                  Y
ZC                 Y                                  N
bar-1              nd                                 N
B                  nd                                 Y
efl-1              nd                                 N
exo-3              nd                                 N
F29C12.4           nd                                 Y
F54C9.6            nd                                 Y
lin-23             nd                                 N
mrp-5              nd                                 Y
T01E8.6            nd                                 Y
T09A5.5            nd                                 Y
ubc-18             nd                                 N
Y48E1B.5           nd                                 Y

a The 21 genes of the bar-1 module, including the bar-1 query. The asterisk indicates that lin-39 was previously reported to interact with bar-1 (Eisenmann et al., 1998b).
b The 9 interactions between the targets and the bar-1 query within the bar-1 module that were tested in the original SGI matrix. Y, an interaction was inferred; N, no interaction was inferred; nd, gene pair not tested in SGI.
c All nodes within the bar-1 module were targeted by RNAi in the background of bar-1(ga80). ogt-1 interacted weakly (W) in the direct test, and also had weak interaction scores within the original SGI matrix. We therefore counted ogt-1 as a target that behaved similarly in both the SGI matrix and the detailed examination of the bar-1 module.

Figure 3-4. The bar-1 Module Regulates Fat Storage and/or Metabolism. (A) Fat accumulation and/or storage disruption in the bar-1 module. Genes in the bar-1 module were targeted by RNAi in an N2 background. The resulting worms were stained with Nile Red and staining was quantified in order to compare values to N2 worms fed negative control RNAi (see Materials and Methods). Normalised intensity (y axis) is plotted for each genotype (x axis), with bars labelled by the generation scored (F1 or F2). 15 of 20 genes show a reduction of Nile Red staining in an N2 background. Values have been normalized with N2 values for each experiment. Error bars represent standard error of the mean. (B,C) Visualization of Nile Red staining in N2 worms fed either negative control mock RNAi (B) or RNAi that targets T20B12.7 (C). The corresponding DIC pictures (D,E) are shown below the respective dark field pictures. Scale bar, 50 μm.

the bar-1 module may regulate fat production or storage. Furthermore, the analysis of the bar-1 module illustrates how SGI interactions can reveal coordinated activity between otherwise disparate genes within the superimposed network.

The Nature of Genetic Interactions in C. elegans

High Throughput Interactions are Likely Between-Pathway Interactions

We next investigated the overlap between genetic interactions and other types of data within the superimposed network. We found that fine genetic interactions are supported by far more physical interactions than are SGA interactions (Figure 3-5), consistent with the idea that fine genetic interactions are enriched for within-pathway interactions and that SGA interactions are enriched for between-pathway interactions (Kelley and Ideker, 2005; Tong et al., 2004; Ulitsky and Shamir, 2007). The fraction of SGI and Lehner genetic interactions supported by physical interactions is indistinguishable from the fraction of SGA links supported by physical interactions (Figure 3-5). Similar results were obtained when the analysis was repeated to measure the proportion of genetically interacting gene pairs that overlap with either the co-expression or co-phenotype networks. Importantly, for each comparison, the analysis was restricted to pairs of genes that were tested in each dataset. Hence, the SGI and Lehner genetic interactions are likely biased towards between-pathway interactions, similar to those revealed by SGA.

Genetic Interactions Bridge Functional Modules

The topology of the bar-1 module, along with the finding that SGI interactions are largely orthogonal to other types of functional links, raised the possibility that synthetic genetic

Figure 3-5. An Analysis of the Overlap between Genetic Interactions and Other Modes of Interaction. (A) A schematic outlining the expected propensity of yeast SGA and fine genetic interactions to be enriched for between-pathway and within-pathway interactions, respectively. Within-pathway interactions are more likely to overlap physical interactions than are between-pathway interactions. (B) The number of genetically interacting gene pairs from SGI, Lehner (Lehner et al., 2006), the transposed SGA dataset (Tong et al., 2004) and low-throughput fine genetic interactions (Wormbase) (see text and methods) that also interacted through direct protein-protein interactions (PPI) (Li et al., 2004), or were tightly co-expressed (co-expression) (Kim et al., 2001; Stuart et al., 2003), or had similar phenotypic profiles (co-phenotype) (Kamath et al., 2003; Rual et al., 2004; Simmer et al., 2003) (see Materials and Methods) was analyzed (x axis). Only gene pairs tested in both relevant datasets are considered here. To account for the differences and disparity of genes tested in the various screens, the results are represented as the number of interactions that overlap between the two datasets as a fraction of the number of identical or homologous gene pairs tested in both studies (y axis). Error bars indicate one unit of standard deviation assuming a binomial distribution.
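The quantity plotted in Figure 3-5B can be sketched as a proportion with a binomial error bar. The Python example below assumes that the plotted value is the number of overlapping interactions divided by the number of gene pairs tested in both datasets, and that the error bar is the standard deviation of that proportion, sqrt(p(1 - p)/n). The function name and the toy gene pairs are hypothetical.

```python
import math

def overlap_fraction(genetic_pairs, other_pairs, tested_in_both):
    """Fraction of pairs tested in both datasets that interact in both,
    with the binomial standard deviation of that proportion.

    All arguments are iterables of (gene_a, gene_b); pair order is ignored.
    """
    tested = {frozenset(p) for p in tested_in_both}
    genetic = {frozenset(p) for p in genetic_pairs} & tested
    other = {frozenset(p) for p in other_pairs} & tested
    n = len(tested)
    k = len(genetic & other)
    p = k / n
    sd = math.sqrt(p * (1.0 - p) / n)
    return p, sd

# Hypothetical toy input: four pairs tested in both datasets.
tested_pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
genetic_pairs = [("a", "b"), ("b", "c"), ("x", "y")]  # ("x", "y") was not tested in both
ppi_pairs = [("b", "a"), ("c", "d")]

fraction, sd = overlap_fraction(genetic_pairs, ppi_pairs, tested_pairs)
print(f"overlap = {fraction:.2f} +/- {sd:.3f}")
```

Restricting both interaction sets to the pairs tested in both datasets before counting mirrors the restriction stated in the caption, and keeps untested pairs from deflating the apparent overlap.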

interactions interconnect, or bridge, functional modules on a global scale. To investigate this possibility, we first identified subnetworks within the co-expression, co-phenotype, and interolog networks that contributed to the superimposed network (Materials and Methods). Of the 343 resulting subnetworks, 162 (47.2%) are enriched for shared functional annotation (see Additional Data File 7), suggesting that they are biologically relevant. We then asked if SGI interactions typically fall within or between subnetworks (Figure 3-6A). We found 33 subnetwork pairs significantly bridged by SGI links, which is 8-fold more than expected by chance (p < 10^-23) (Materials and Methods and Additional Data File 8). By contrast, SGI links are significantly under-represented within these subnetworks (p_hg < 0.001). An example of a pair of subnetworks bridged by SGI interactions is shown in Figure 3-6B, in which a regulation of body size subnetwork is linked to a formation of primary germline subnetwork, as defined by GO annotation. Interestingly, a negative regulation of body size subnetwork was found to be bridged to the same formation of primary germline subnetwork. Genes within these subnetworks are known to interact with one another in other systems and are discussed below. Therefore, this analysis reveals that SGI interactions do bridge functional modules on a global scale. To further investigate the propensity of SGI interactions to bridge subnetworks, we relaxed the stringency by which we identified subnetworks to create broad subnetworks that contain up to hundreds of genes (Materials and Methods and Additional Data File 7). We reasoned that broad subnetworks are likely to contain genes that belong to common pathways, complexes, and functional modules. Interactions that bridge broad subnetworks are therefore likely to reveal functional redundancy among these components. Consistent with the idea that broad subnetworks are enriched for functional modules, the protein (p_hg < 10^-4), co-expression (p_hg < 0),

Figure 3-6. SGI Interactions Bridge Subnetworks. (A) Three hypothetical subnetworks are depicted. We asked whether SGI interactions are more likely to bridge subnetworks (left) or fall within subnetworks (right). (B) An example of a bridged subnetwork pair is shown. A regulation of body size co-phenotype subnetwork (green links) is linked to a formation of primary germline co-expression subnetwork (blue links) via six SGI interactions (pink links). Visualization generated with Visant (Hu et al., 2007). (C) Broad subnetworks were identified separately within the co-expression (blue), co-phenotype (green), and interolog (purple) networks (see Materials and Methods). All broad subnetworks that are significantly bridged with at least one other broad subnetwork by SGI interactions (pink edges) are shown. Nodes represent individual genes and edges represent interactions. Visualization generated with Visant (Hu et al., 2007).
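The within-versus-between question posed in panel (A) amounts to classifying each genetic link by the subnetwork membership of its two genes. The Python sketch below counts links of each kind under simplifying assumptions: every gene belongs to at most one subnetwork, genes outside any subnetwork are ignored, and no significance test is applied (the enrichment statistics used in this work are described in Materials and Methods). Gene labels, subnetwork names, and the function name are hypothetical.

```python
def classify_links(links, membership):
    """Count links that fall within one subnetwork or bridge two subnetworks.

    links: iterable of (gene_a, gene_b).
    membership: dict mapping gene -> subnetwork identifier.
    Returns (within_count, bridging), where bridging maps a sorted pair of
    subnetwork identifiers to the number of links connecting them.
    """
    within = 0
    bridging = {}
    for a, b in links:
        sub_a, sub_b = membership.get(a), membership.get(b)
        if sub_a is None or sub_b is None:
            continue  # at least one gene is not in any subnetwork
        if sub_a == sub_b:
            within += 1
        else:
            key = tuple(sorted((sub_a, sub_b)))
            bridging[key] = bridging.get(key, 0) + 1
    return within, bridging

# Hypothetical toy input.
membership = {
    "g1": "body_size", "g2": "body_size", "g3": "body_size",
    "g4": "germline", "g5": "germline",
}
sgi_links = [("g1", "g4"), ("g2", "g5"), ("g1", "g2"), ("g3", "g6"), ("g5", "g3")]

within, bridging = classify_links(sgi_links, membership)
print(within, bridging)
```

A permutation or hypergeometric test on these counts would be needed to decide whether a given pair of subnetworks is bridged more often than expected; the sketch only produces the raw tallies.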

and co-phenotype (p_hg < 10^-26) networks are each significantly enriched for interactions within broad subnetworks (see Additional Data File 9). This was measured by testing the interactions within each network for overlap with broad subnetworks made up of the other types of functional data. By contrast, SGI interactions significantly bridge broad subnetworks (p_hg < 10^-6) (Figure 3-6C). 612 SGI interactions bridge subnetworks, compared to an expected based on random chance. These results demonstrate further that SGI interactions have the propensity to bridge distinct functional modules. Together, these results provide the first evidence that functional redundancy may extend beyond individual gene pairs to a higher level of organization within the system, the functional module.

The connectivity of the current synthetic genetic networks is not conserved between worms and yeast

An important question in systems biology is whether genetic interaction networks are evolutionarily conserved beyond network principles. We devised several approaches to investigate whether the connectivity of the current yeast and worm genetic interaction networks is conserved (Figure 3-7). First, a direct comparison of SGI interactions and SGA interactions revealed no overlap. However, there is very little overlap between the sets of genes tested in both screens; only 17% of the gene pairs tested for a genetic interaction in either system are orthologous. Therefore, the significance of the aforementioned result cannot be determined due to a lack of statistical power. Second, we compared a compendium of worm genetic interactions (SGI and Lehner et al. (Lehner et al., 2006) genetic interactions) to a compendium of yeast genetic interactions (genetic interactions in BioGRID (Stark et al., 2006) and SGA interactions (Tong et al., 2004)). This analysis was restricted to pairs of worm genes

Figure 3-7. A Schematic of the Approaches used to Investigate if Synthetic Genetic Network Connectivity is Conserved. In all panels, nodes represent genes and lines represent interactions. (A) Among pairs of homologous genes tested for interaction in both worm and yeast, we investigated if there was significant overlap between worm (pink) and yeast (blue) genetic interactions (left), or few overlapping interactions (right). (B) After identifying subnetworks (groups of highly interconnected nodes linked by green, purple or light blue links) within the superimposed network, we investigated if worm (pink) and yeast (blue) genetic interactions link the same (left) or different (right) subnetworks.

tested by SGI and the Lehner study that have homologs in yeast. We asked whether genes found to interact in worm were more likely to interact in yeast. Of the gene pairs that interact in worms, 4.7% (2/43) also interact in yeast. However, 4.4% (40/916) of all gene pairs tested in worms also interact in yeast. Thus, an interacting gene pair in C. elegans is no more likely than any of the tested gene pairs to interact in S. cerevisiae (chi-square test, p>0.05). Third, we investigated whether worm and transposed yeast genetic interactions bridge the same subnetworks. For each pair of subnetworks, we determined whether there is a concomitant enrichment of both yeast and worm genetic bridges over what is expected (Materials and Methods). We restricted this analysis to pairs of subnetworks such that one subnetwork contains genes that have been tested for interaction with genes in the other subnetwork in both worm and yeast analyses. Of the 274 subnetwork pairs, 27 are significantly bridged by worm links and 35 are bridged by at least one SGA link. Four of these pairs are bridged by both worm genetic interactions and SGA interactions, which is not a significant enrichment (chi-square test, p>0.05). Fourth, we repeated the aforementioned analysis using broad subnetworks (see above and Materials and Methods). We found 16 of the 181 possible pairs of broad subnetworks to be bridged by both worm and yeast genetic links, which is not significantly different from the 16.6 pairs expected to be bridged by both types of links by random chance (chi-square test, p>0.05). We therefore conclude that the connectivity of the current synthetic genetic interaction networks is not conserved between yeast and worms.
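The second comparison above (2 of 43 worm-interacting pairs versus 40 of 916 tested pairs that also interact in yeast) can be checked with a Pearson chi-square test on a 2-by-2 table. The Python sketch below is one way to set this up and is an assumption about the layout of the test, not a record of the original calculation: it treats the 43 interacting pairs as a subset of the 916 tested pairs, uses no continuity correction, and obtains the p value for one degree of freedom from the complementary error function.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with one degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts taken from the text; the split into four cells is an assumption.
worm_yes_yeast_yes = 2
worm_yes_yeast_no = 43 - 2
worm_no_yeast_yes = 40 - 2
worm_no_yeast_no = (916 - 43) - (40 - 2)

stat, p_value = chi_square_2x2(
    worm_yes_yeast_yes, worm_yes_yeast_no, worm_no_yeast_yes, worm_no_yeast_no
)
print(f"chi-square = {stat:.4f}, p = {p_value:.3f}")
```

With these counts the statistic falls far below the 3.841 critical value for p = 0.05 at one degree of freedom, in line with the reported lack of enrichment. With an expected count below 5 in one cell, an exact test might be preferred in practice.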

3.3. Discussion

Investigating the integration of genetic interactions into a superimposed network reveals a new level of organization

To explore how genetic interactions integrate into the biological system, we integrated the SGI interactions with other genetic interactions and data from the interactome, transcriptome, and phenome into a superimposed network. An investigation of the overlap between SGI and other contributing interactions within the superimposed network revealed little overlap. Given that only ~1% of the links in the superimposed network are multiply supported, this is not surprising. The lack of overlap cannot be attributed solely to the sparseness of available data in the superimposed network as both the co-expression and co-phenotype networks were created from nearly genome-scale datasets. In addition, the lack of overlap is unlikely to reflect poor quality data, as we have demonstrated that the interactions within the SGI network and other datasets contain significant numbers of functionally related gene pairs. This paradox may suggest that most high-throughput datasets generated to date have many false negatives. Alternatively, different interaction modes may have little real correspondence with one another and instead yield complementary information. In either case, a better understanding of biological systems may be achieved by investigating the entirety of superimposed networks and not just multiply-supported links.

Functional Modules Represent Groups of Genes that Share Function

Three lines of evidence suggest that multiply-supported subnetworks can help predict the function of uncharacterized genes. First, the subnetworks are significantly enriched for GO biological processes, suggesting that uncharacterized genes within the subnetworks may have

similar functions. Second, a detailed examination of the bar-1 module revealed new genetic interactions that were not tested within the SGI matrix. Third, a shared role in fat accumulation was discovered among the genes of the bar-1 module. Of note, the prx-5 gene of the bar-1 module is required for peroxisomal import; peroxisomes carry out β-oxidation of long-chain fatty acids, and prx-5 has previously been identified in a genome-wide screen for fat regulatory mutants (Ashrafi et al., 2003; Thieringer et al., 2003). In humans, peroxisomal misregulation results in deficient lipid metabolism, which is associated with diseases such as Zellweger's syndrome (Thieringer et al., 2003). How other components of the bar-1 module regulate fat will be an interesting avenue for further investigation. Regardless, our data show that the addition of SGI interactions to other data sets enhances the ability to predict gene function.

The general lack of overlap between contributing datasets of the superimposed network, along with the topology of the bar-1 module, led us to the finding that SGI interactions bridge across different subnetworks. Subnetworks enriched for particular functions likely work towards a common goal and may define a higher level of organization within the cell, such as molecular machines (Alberts, 1998) or functional modules (Hartwell et al., 1999). In one example, SGI interactions with sma-6 bridge a subnetwork enriched for regulation of body size genes and a subnetwork enriched for germ line development genes. SMA-6 is an ortholog of type I TGF-β receptors (Krishna et al., 1999; Yoshida et al., 2001). While sma-6 regulates body size, TGF-β signalling can also regulate germline proliferation in both C. elegans and Drosophila (Narbonne and Roy, 2006; Twombly et al., 1996; Xie and Spradling, 1998). Thus, interactions with sma-6 revealed a putative novel redundant function for the two modules. By overlaying SGI interactions onto a superimposed network, we have discovered significant redundancy between functional modules and revealed a new layer of interactions within a biological system.

The connectivity of synthetic genetic networks may not be evolutionarily conserved

It remains an open question whether the connectivity of genetic interactions is conserved, rather than just the principles of network biology. A comparison between two organisms in which genetic interactions have been systematically investigated, S. cerevisiae and C. elegans, suggests not. We have evidence against the conservation of genetic interactions at both the level of individual gene pairs and at the level of subnetwork connectivity. How can this be, given that individual genes, homologous physical interactions (interologs), the essentiality of hubs, and network principles are all clearly conserved (Kamath et al., 2003; Lehner et al., 2006; Li et al., 2004; C. elegans Sequencing Consortium, 1998; Yu et al., 2004; Zhong and Sternberg, 2006)?

There are at least three trivial explanations for the apparent lack of conservation in the connectivity of synthetic genetic networks. First, the different approaches used to uncover interactions may have led to an artificial difference in the genetic network connectivity within the two systems. Second, synthetic genetic interaction analysis in C. elegans has focused on signalling pathways that are largely absent from S. cerevisiae, hindering direct comparisons. Third, only a tiny fraction of the synthetic genetic network has been probed in either system. An expanded investigation of the networks may yield more commonalities.

Finally, a non-trivial explanation for the apparent lack of conservation may lie in the nature of synthetic genetic networks: synthetic genetic interaction networks overwhelmingly reveal redundancy between pathways and functional modules (Kelley and Ideker, 2005; Ulitsky and Shamir, 2007, this study). Thus, the connectivity between modules may change through random mutation of genes without phenotypic consequence. Over an evolutionary time scale, synthetic genetic relationships may therefore drift and/or be selected for or against to satisfy new constraints

during speciation (Davidson and Erwin, 2006; Hartwell et al., 1999; True and Haag, 2001). If one mode of evolution is the shuffling of relationships between functional modules, then there may be no reason to expect that the connectivity of genetic networks will be conserved. While model systems have repeatedly proven their utility for discovering and understanding basic biological processes and monogenic diseases, our results suggest that understanding the complex network of interactions that underlie polygenic diseases may require network analysis of systems more closely related to humans. Regardless, a study of the connectivity of synthetic genetic networks from different species may provide insight into the evolution of divergent form and function.

Since the work presented in this thesis was published, a study by Tischler et al. has shown that only 0.7% of 837 reported S. cerevisiae genetic interactions between genes with worm homologues were reproducible in C. elegans (Tischler et al., 2008). As such, this result discounts the possibility that the lack of overlap between yeast and worm genetic interactions is due to experimental design. Instead, both the orthogonality of genetic interactions and the propensity of SGI genetic interactions to bridge subnetworks are likely due to the between-pathway nature of SGI interactions. While the evidence presented here suggests that relationships between pathways or functional modules are shuffled throughout evolution, the possibility of conservation of within-pathway genetic interactions still remains and is discussed in greater detail in Chapter

Conclusions

I set out to address the questions of whether network topology can be used to assign function to genes in a metazoan, how genetic interactions contribute to our understanding of systems

biology, and whether genetic interactions are conserved. Through their integration into a superimposed network, I found that the SGI interactions do help reveal new putative functional modules. Because genetic links are largely orthogonal to other interaction modes, the SGI data make a significant contribution to the connectivity within the superimposed network. Furthermore, the SGI interactions link distinct functional modules on a global scale, revealing a new level of organization within the system. Finally, genetic network properties are conserved from yeast to worms, but the connectivity appears not to be. Together, these results indicate that a comprehensive investigation of genetic interactions is critical to our understanding of animal biology.

Materials and Methods

Testing the correlation of target hubs with RNAi phenotype

I asked whether targets with high degree (those linked to many query genes) have an increased tendency to produce a strong phenotype when targeted by RNAi compared to targets with low degree (those linked to few query genes). The phenotype data of Kamath et al. (Kamath et al., 2003) were used. We define a strong phenotype as any of the following: Emb (embryonic lethal), Ste (sterile), Let (lethal), Lva (larval arrest), Lvl (larval lethal), or Adl (adult lethal). The null hypothesis is that the degree of a target gene is not correlated with strong RNAi phenotypes. Under the null hypothesis, we expect to find an equal proportion of strong RNAi phenotypes among targets with any degree. We quantified the difference between the observed and expected number of target genes with a strong RNAi phenotype for each degree using a chi-square test with 10 degrees of freedom (one less than the number of query genes).
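As an illustration of the degree-versus-phenotype test just described, the following sketch computes the chi-square statistic across eleven degree classes (10 degrees of freedom). The per-degree counts are hypothetical toy values, not data from this study, and the function names are illustrative. The expected count for each degree is assumed to be the number of targets of that degree multiplied by the pooled proportion of strong phenotypes.

```python
import math


def chi_square_sf_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def degree_phenotype_test(counts):
    """counts maps degree -> (number of targets, number with a strong phenotype)."""
    total_targets = sum(n for n, _ in counts.values())
    total_strong = sum(s for _, s in counts.values())
    pooled = total_strong / total_targets  # proportion expected under the null
    stat = 0.0
    for n, strong in counts.values():
        expected = n * pooled
        stat += (strong - expected) ** 2 / expected
    df = len(counts) - 1
    return stat, df, chi_square_sf_even_df(stat, df)


# Hypothetical counts for degrees 1 to 11 (one class per possible number of queries).
toy_counts = {
    1: (200, 20), 2: (100, 12), 3: (60, 9), 4: (40, 8), 5: (30, 7), 6: (20, 6),
    7: (15, 5), 8: (12, 5), 9: (10, 5), 10: (8, 4), 11: (5, 3),
}
stat, df, p_value = degree_phenotype_test(toy_counts)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4g}")
```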

Comparing the Network Properties of the SGI and SGA Genetic Networks

To measure topological network properties of the SGI and yeast SGA genetic interaction networks, we used tYNA to analyze the variance of the SGI and yeast SGA network properties (Yip et al., 2006). The resulting standard errors of the mean for the SGI network parameters are reported in the text.

Construction of the transposed SGA network and the interolog network

We constructed the transposed SGA network of synthetic genetic interactions from those interactions described in (Tong et al., 2004) by mapping each yeast gene to its predicted worm ortholog(s). Maps were created containing all gene pairs with BlastP significance values of $p < 10^{-30}$ or better (Altschul et al., 1997). For interactions between yeast genes with multiple predicted worm orthologs, transposed interactions were created for all combinations of predicted orthologs. The interolog network was created from eukaryotic protein-protein interactions reported in BioGRID (Stark et al., 2006). All interactions assembled from organisms other than C. elegans were mapped to predicted worm ortholog pairs using BlastP with a significance cutoff of $p < 10^{-30}$ (Altschul et al., 1997).

Construction of the co-phenotype network

We calculated a loss-of-function agreement score, LOFA, for two genes i and j, defined as:

$$\mathrm{LOFA}(i,j) = -2 \sum_{v=1}^{37} \left[ K_{iv} K_{jv} \log(f_v) + (1 - K_{iv})(1 - K_{jv}) \log(1 - f_v) \right]$$

where $f_v$ is the frequency of phenotype v across the genome and $K_{iv}$ is the (i,v)th entry from the RNAi phenotype compendium matrix as described above. If RNAi against each of two genes produces phenotype v, the LOFA score is increased by $-\log(f_v)$. The boost is larger for more infrequent phenotypes. For example, a phenotype that occurs in 1 out of 100 genes will increase the score by 2 units, whereas a phenotype that occurs in 1 out of 10 genes will contribute only 1 unit of score. The second term of the LOFA score gives an analogous bonus to two genes when neither exhibits a given phenotype.

The LOFA and phenotypic PCC measures of similarity were compared by measuring their ability to predict genes of related function. For each score, we constructed networks induced by using a cutoff above which genes were considered to be functionally related. We first varied the LOFA score cutoff from high to low, producing 51 networks of increasing size. Similarly, 51 networks of increasing size were produced for phenotypic PCC by lowering the phenotypic PCC cutoff. The precision of each network was measured by calculating the fraction of linked genes found to be annotated with a common GO category. Precision levels were then plotted against the network size. LOFA was found to be superior to phenotypic PCC for connecting genes of related function as it produced substantially higher precision levels than phenotypic PCC for every network size (see Additional Data File 11).

A final co-phenotype network was constructed by linking genes exhibiting significant levels of agreement. The significance of the LOFA score was assessed by generating 3 million random LOFA values. We first constructed a random dataset in which the genes associated with loss of function phenotype v in the RNAi phenotype compendium were permuted. This was repeated

for each phenotype to produce one permuted dataset from which 100,000 random pairs were then picked and LOFA was calculated. We repeated this procedure for 30 different permuted datasets. We found that a cutoff of 7.0 was equivalent to an estimated significance level of 0.001, as approximately 100 of the 100,000 LOFAs computed from random datasets exceeded this value on average in each of the 30 permuted trials.

Construction of permuted networks

To gauge the significance of various network properties, 1,000 randomly permuted networks were constructed for each data type. Permuted SGI networks were created by combining permuted signalling and LGIII networks. A link in each of these networks associates one query gene with one RNAi target gene. The permuted SGI networks link each query gene to a random set of target genes by randomly picking genes from the entire set of target genes tested in the screen. The number of target genes linked to each query was held fixed in the permuted networks to preserve the degree distribution across target genes. We also created permuted Lehner et al. networks, yeast SGA networks, and protein interaction networks using this method. Permuted co-expression, co-phenotype, and fine genetic networks were created by randomly linking genes present in each network. Random superimposed networks were created by taking the union of all links from the permuted networks obtained from the separate data types.

Determination of the significance of the number of supported links

The significance of the number of supported links (gene pairs linked by more than one data type) in the superimposed network was estimated by comparing the observed number of

supported links to the number of supported links in 1000 randomly permuted superimposed networks. Significance was calculated with a standard Z-score transformation using the mean and standard deviation of the number of supported links across the random networks. The significance of the overlap of two data types was estimated in a similar manner.

Identification of gene subnetworks

We identified subnetworks, defined as small- to medium-sized groups of possibly overlapping genes, by searching for densely connected sets of genes in individual networks and in the superimposed network using MODES (Hu et al., 2005). We used MODES parameter settings such that a subnetwork must have at least 50% connectivity, cannot overlap any other subnetwork by more than half of its genes, and must contain a minimum of four genes. A connectivity significance score was assigned to each subnetwork based on the number of links connecting each of its members (supports). The connectivity significance score for a subnetwork containing n genes was calculated as a standard Z-score (l-m)/s where l is the observed number of links in the subnetwork and m and s are the mean and standard deviation of the number of links across 1,000 random collections of n genes.

As a post-processing step, any gene that was not grouped into a subnetwork by MODES was iteratively considered for addition to each subnetwork. To achieve this, a hierarchical clustering merge step was performed on all such genes across all subnetworks, using the connectivity score as the basis for a similarity metric. At each step in the clustering, the gene/subnetwork pair with the largest increase in connectivity score was combined. The connectivity score increase was calculated as the subnetwork's connectivity score upon addition of the gene minus its connectivity score prior to the addition of the gene.
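To make the loss-of-function agreement (LOFA) score from the co-phenotype network section concrete, here is a minimal sketch of the computation for one gene pair. It assumes base-10 logarithms (consistent with the worked example in which a 1-in-100 phenotype contributes a term of 2 and a 1-in-10 phenotype a term of 1) and applies the leading factor of 2 from the formula, so the returned values are twice those per-phenotype terms. The three-phenotype vectors and frequencies are hypothetical; the compendium described in the text has 37 phenotypes.

```python
import math


def lofa(k_i, k_j, freqs):
    """LOFA score for two binary phenotype vectors.

    k_i, k_j: sequences of 0/1 flags, one per phenotype.
    freqs: genome-wide frequency of each phenotype, strictly between 0 and 1.
    """
    total = 0.0
    for a, b, f in zip(k_i, k_j, freqs):
        shared_presence = a * b * math.log10(f)
        shared_absence = (1 - a) * (1 - b) * math.log10(1.0 - f)
        total += shared_presence + shared_absence
    return -2.0 * total


# Hypothetical example: three phenotypes with frequencies 1%, 10% and 50%.
freqs = [0.01, 0.10, 0.50]
gene_a = [1, 0, 0]
gene_b = [1, 0, 1]

score = lofa(gene_a, gene_b, freqs)
print(f"LOFA = {score:.4f}")
```

The pair shares the rare first phenotype and the absence of the second, while the third phenotype (present in only one gene) contributes nothing, which mirrors how the score rewards agreement on infrequent phenotypes most strongly.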

Broad subnetworks were identified in single datatype networks using the VxOrd algorithm (Kim et al., 2001). VxOrd clusters a network of genes on a two-dimensional surface using multidimensional scaling (Werner-Washburne et al., 2002). The links between genes are treated as spring constants and a configuration of the springs is sought that minimizes the total free energy of the system. The result is a collection of genes arranged on the X-Y plane. We partitioned the genes into clusters using the dense subregions obtained from two-dimensional density estimation over a grid superimposed on the X-Y plane. We formed clusters of genes in contiguous regions whose densities were at least 10% of the maximum density and matched a minimum area cutoff.

Characterization of multiply-supported subnetworks

Each subnetwork identified in the superimposed network was inspected to determine which types of data significantly link its gene members. For each subnetwork, the significance of the number of links of a specific data type which connected two genes within the subnetwork was calculated using the connectivity significance score (see previous section). Subnetworks were annotated as enriched for a data source if the connectivity score had an associated P-value of 0.01 or less. The bar-1 module was identified in a search for multiply-supported subnetworks within an earlier version of the superimposed network. The links within the subnetwork were updated using the same data as reported in the current subnetwork. This resulted in the addition of two links to the module: an interolog interaction between efl-1 and lin-35 and a Lehner interaction between ubc-18 and lin

Nile Red Analysis

L4 parental worms were placed on NGM plates seeded with RNAi or mock-RNAi bacteria and ug/ml Nile Red. L4 F1 and F2 progeny were analysed by fluorescence microscopy for Nile Red intensity. To quantify Nile Red intensity, Openlab software (Improvision Inc. Lexington, MA) was used to calculate mean fluorescence within a measured area as well as the length of the worm. Nile Red intensity was calculated as: (mean fluorescence × area) / length of worm.

Identification of significantly bridged subnetwork pairs

All pairs of subnetworks derived from the co-expression, co-phenotype, and interolog networks were inspected for significant bridging by SGI links. An SGI link is considered to bridge a pair of subnetworks if it connects a gene in one subnetwork to a gene in another subnetwork. The total number of bridges was counted for each pair of subnetworks. The significance of the number of bridges for each subnetwork pair was then determined with a standard Z-score transformation using the mean and standard deviation of the number of bridges between that subnetwork pair in 1,000 randomly permuted SGI networks (see Additional Data File 12 for evidence that a normal approximation in the Z-score transformation is valid). In addition to a cutoff of P< 0.01, a subnetwork pair was required to have at least three bridges to be considered significantly bridged.
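The bridging test described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the gene names, link list and subnetworks are hypothetical toy data, the permutation simply redraws each query's targets from the tested set while keeping its number of links fixed, and the P-value is a one-sided normal tail probability derived from the Z-score.

```python
import math
import random
import statistics


def count_bridges(links, sub_a, sub_b):
    """Count links that join a gene in sub_a to a gene in sub_b."""
    return sum(
        1
        for g, h in links
        if (g in sub_a and h in sub_b) or (g in sub_b and h in sub_a)
    )


def permute_links(links, tested_targets, rng):
    """Redraw each query's targets at random, keeping its number of links fixed."""
    links_per_query = {}
    for query, _target in links:
        links_per_query[query] = links_per_query.get(query, 0) + 1
    return [
        (query, target)
        for query, n_links in links_per_query.items()
        for target in rng.sample(tested_targets, n_links)
    ]


def bridge_significance(links, tested_targets, sub_a, sub_b, n_perm=1000, seed=0):
    """Return (observed bridges, Z-score, one-sided normal P-value)."""
    rng = random.Random(seed)
    observed = count_bridges(links, sub_a, sub_b)
    null = [
        count_bridges(permute_links(links, tested_targets, rng), sub_a, sub_b)
        for _ in range(n_perm)
    ]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    if sd == 0:
        return observed, None, None  # Z-score is undefined without variation
    z = (observed - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return observed, z, p


# Hypothetical toy data: two query genes and twenty tested RNAi targets.
tested_targets = [f"t{i}" for i in range(20)]
links = [("q1", "t0"), ("q1", "t1"), ("q1", "t2"),
         ("q2", "t0"), ("q2", "t3"), ("q2", "t10")]
sub_a = {"q1", "q2"}
sub_b = {"t0", "t1", "t2", "t3"}

observed, z, p = bridge_significance(links, tested_targets, sub_a, sub_b)
# Thresholds from the text: P < 0.01 and at least three bridges.
significant = z is not None and p < 0.01 and observed >= 3
print(observed, z, p, significant)
```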

Estimation of the significance of the number of bridged subnetwork pairs

We estimated the significance of the number of significantly bridged subnetwork pairs by comparing to the number of pairs significantly bridged by permuted SGI networks. Each of the 1,000 randomly permuted SGI networks was used to search for significantly bridged subnetwork pairs using the same method described above for the true SGI network. The mean and standard deviation of the number of significantly bridged subnetwork pairs were then calculated across all permuted networks. The number of subnetwork pairs significantly bridged by the SGI network was then compared to these values using a standard Z-score transformation to obtain a single significance value.

Determination of bridging propensities

To measure the propensity for a given datatype to bridge subnetworks more than expected by chance, we restricted our analysis to all subnetwork-to-subnetwork links (SSLs). We defined an SSL as a linked gene pair (A,B) in which both A and B were included in at least one broad subnetwork of any data type. Over all SSLs, we counted the number of supports, those links in which genes A and B occurred in the same subnetwork, as well as bridges, those links in which A and B occurred in separate subnetworks. Links that both bridge and support were counted as supports. The bridging fraction was then calculated as the total number of bridges divided by the total number of SSLs. The observed bridging fraction was calculated using all SSLs in the network. The expected bridging fraction was calculated using all SSLs tested in the dataset. To measure the tendency for a given datatype to link across versus within broad subnetworks, we calculated the bridging propensity as the observed bridging fraction divided by the expected

bridging fraction, minus one. Positive bridging propensities are indicative of a link type tending to bridge (as opposed to fall within) broad subnetworks more than expected by chance.

Determination of the degree of subnetwork bridging conservation

To determine if the same subnetwork pairs were bridged in worm and yeast, we identified significantly bridged subnetwork pairs separately in each species. We used a compendium of SGI and Lehner et al. interactions for worm, and transposed SGA links for yeast. We examined all pairs of subnetworks and broad subnetworks separately. We calculated the expected number of bridges as the number of possible (tested) gene pairs between the subnetworks times the probability of linking a gene pair for that data type. An estimate of the probability of a data type linking a gene pair was calculated as the number of links in its network divided by the number of possible (tested) links. This yielded an estimated background probability of for worm, and for yeast.

To determine the degree of subnetwork bridging conservation among all possible pairs of subnetworks, we created contingency tables containing the observed and expected number of subnetwork pairs significantly bridged only in worm, only in yeast, in both, and in neither. The expected number of pairs for each of these four categories was then calculated, assuming independence of worm and yeast bridging. We first calculated the worm bridging probability, $P_w$ ($P_y$ for yeast), as the number of bridged subnetwork pairs divided by the total number of pairs, $N$. The expected number of subnetwork pairs bridged only in worm was then calculated as $N P_w (1 - P_y)$. Likewise, the expected number of bridged pairs only in yeast was calculated as $N (1 - P_w) P_y$. The expected number of bridged pairs in both species was calculated as $N P_w P_y$.

Finally, the expected number of pairs bridged by neither was $N (1 - P_w)(1 - P_y)$. We used a chi-square test with 3 degrees of freedom to determine if the observed and expected counts for each of these categories were significantly different.
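The four-category comparison can be worked through with the counts reported in the Results for the 274 subnetwork pairs (27 bridged in worm, 35 in yeast, 4 in both). The sketch below follows the expected-count formulas above and the stated 3 degrees of freedom; the function names are illustrative, and the closed-form tail probability applies only to 3 degrees of freedom.

```python
import math


def chi_square_sf_3df(stat):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(stat / 2.0))
            + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))


def bridging_conservation(n_pairs, n_worm, n_yeast, n_both):
    """Compare observed and expected counts for the four bridging categories."""
    p_w = n_worm / n_pairs
    p_y = n_yeast / n_pairs
    observed = {
        "both": n_both,
        "worm_only": n_worm - n_both,
        "yeast_only": n_yeast - n_both,
        "neither": n_pairs - n_worm - n_yeast + n_both,
    }
    expected = {
        "both": n_pairs * p_w * p_y,
        "worm_only": n_pairs * p_w * (1 - p_y),
        "yeast_only": n_pairs * (1 - p_w) * p_y,
        "neither": n_pairs * (1 - p_w) * (1 - p_y),
    }
    stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
    return observed, expected, stat, chi_square_sf_3df(stat)


observed, expected, stat, p_value = bridging_conservation(274, 27, 35, 4)
print(f"expected pairs bridged in both = {expected['both']:.2f}")
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

With these counts roughly 3.4 pairs are expected to be bridged in both species by chance, close to the 4 observed, so the test gives no evidence of conserved bridging.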

Chapter 4. Summary of Global Genetic Interaction Analysis in C. elegans and Future Directions

4.1. Summary

In the preceding chapters, I presented an approach to investigate metazoan genetic interactions called Systematic Genetic Interaction (SGI) analysis. In C. elegans, approximately 80-90% of genes targeted by RNAi produce no phenotype, which is an indication that large amounts of genetic buffering exist within the worm genome. To investigate the function of these redundant genes in conserved signal transduction pathways, I fed RNAi-inducing bacteria to viable hypomorphic mutants in each of the Insulin, EGF, FGF, Wingless, Notch, DDR and TGF-β signalling pathways and looked for synthetic sick or lethal genetic interactions. Specifically, I fed eleven mutants in these seven pathways [daf-2(e1370)/INSR, let-23(n1045)/EGFR, let-756(s2613)/FGF, egl-15(n1477)/FGFR, let-60(n2021)/RAS, sem-5(n2019)/GRB2, glp-1(or178)/Notch receptor, rad-5(mn159)/DDR component, sos-1(cs41)/SOS1, bar-1(ga80)/Wnt component, and sma-6(e1482)/TGF-β receptor] RNAi-inducing bacteria that target 372 known or predicted signalling genes and 486 random genes from linkage group (LG) III. A quantitative scoring scheme facilitated comparison of the growth of mutant-RNAi combinations to the growth of controls. Genetic interactions were then inferred through an unbiased global analysis of the growth matrix. The resulting network contains 1246 synthetic sick or lethal interactions, representing the largest metazoan genetic interaction network to date. A significant number of interacting gene pairs have the same GO annotation, which suggests that these interactions are biologically relevant.

The SGI network was superimposed with previously reported data sets that include yeast two-hybrid physical interactions, microarray co-expression, RNAi phenotypes, genetic interactions, and orthologous gene interactions in an attempt to understand the functional implications of the genetic interactions. The resulting superimposed network contains 56 putative functional modules that consist of densely connected groups of genes with shared function. One of the functional modules regulates fat accumulation and is coordinated by genetic interactions with bar-1/β-catenin. This is but one example of how the analysis of the SGI network of genetic interactions, the superimposed network, and the functional modules they contain reveals new function for uncharacterized and previously characterized genes, as well as cross-talk between conserved signal transduction pathways.

The superimposed network also provides insight into how genetic interactions relate to other types of functional data on a global scale. The majority of genetic interactions are orthogonal to other types of functional relationships and bridge subnetworks on a global scale. In addition, comparison of current C. elegans and S. cerevisiae genetic interaction networks reveals shared network properties but a lack of conservation at the level of individual links. Therefore, synthetic genetic interactions may reveal modular redundancy in a metazoan, whereby the relationship between groups of functionally related genes is altered throughout evolution.

Future Directions

Genetic Interactions Serve as a Hypothesis Generating Resource to Infer Function for Many Genes

The SGI network, superimposed network, and functional modules that they contain can be exploited in many ways to reveal insight into metazoan biology. They each serve as a hypothesis generating resource to 1) reveal new function for several genes and 2) uncover connections between classic signalling pathways to reveal mechanisms underlying signalling plasticity. Understanding the biological implications of the specific functional hypotheses will require detailed investigation by researchers who specialize in specific pathways or processes. Furthermore, the networks will serve as new resources for developing bioinformatic approaches to biological discovery. Examples of the investigation of specific functional hypotheses that derive from the SGI data as well as examples of the inclusion of the SGI data in new analyses of systems biology are presented herein.

The SGI network contains genetic interactions with 133 uncharacterized genes from the set of LGIII targets. Functional hypotheses for these uncharacterized genes stem from shared interaction patterns with a specific cluster of genes in the heatmap (Chapter 2.2.8, Additional data file 5) or from interaction with a given query gene (Additional data file 3). For example, four uncharacterized genes share a similar interaction pattern with a group of genes annotated with Notch receptor processing. A detailed phenotypic characterization of these four mutants in the background of various Notch mutants may identify a specific role for these genes in cell fate determination.

Analysis of the superimposed network resulted in the identification of 56 functionally enriched multiply supported subnetworks (Additional data file 6). Many functional hypotheses can be derived from the links within these subnetworks. For example, an interaction between bar-1, a β-catenin with a characterized role in Wnt signalling, and ogt-1, a gene that encodes an O-linked N-acetylglucosamine transferase with a previously characterized role in macronutrient

storage and dauer formation (Hanover et al., 2005), was represented in the bar-1 subnetwork (Figure 3-3). The genetic interaction between ogt-1 and bar-1 and their inclusion in a subnetwork of genes that regulate lipid accumulation or storage inspired a detailed investigation of the relationship between the two genes by the Hanover lab (personal communication). The group found that bar-1 and ogt-1 cause a synthetic protruding vulva (Pvl) phenotype. Moreover, the double mutant has dramatically reduced fecundity. The Hanover lab is now investigating the role of ogt-1 in the bar-1/Wnt pathway that regulates vulval induction and the role of bar-1 in lipid metabolism. Therefore, the subnetworks have already provided valuable functional hypotheses to the C. elegans community.

The SGI and superimposed network can be integrated with new data as it arises to lend functional information to that process, as has been done with a network of predicted functional interactions (Zhong and Sternberg, 2006). Cram et al. carried out a screen for genes with a role in cell migration and found 99 genes required for distal tip cell migration (Cram et al., 2006). By superimposing this list of genes with the network of predicted functional interactions (Zhong and Sternberg, 2006), the authors were able to connect 59 of the 99 genes within the new integrated network. The resulting network describes the relationships between and organization of a group of genes required for cell migration. The SGI and superimposed networks are now another resource onto which data can be overlaid to gather functional information regarding a process of interest.

Paananen and Wong (2009) built FORG3D, a 3-dimensional graph editor for integration, manipulation, and visualization of genome scale data. The SGI data was used in a case study to show the utility of this software for integrating multiple data types and extracting biologically relevant information. Specifically, the SGI network was integrated with functional annotation

from WormBase and genome-wide gene expression data that reflects the consequences of a transgenic Parkinson's disease model (Vartiainen et al., 2006). The analysis linked daf-2 to multiple genes whose expression changed in the Parkinson's disease model compared to wild type, thereby suggesting that daf-2 is a regulator of gene expression in this model of Parkinson's disease. This finding supports multiple lines of evidence that suggest an association between the Insulin pathway and Parkinson's disease (Arnulfo Quesada, 2004; Cohen and Dillin, 2008; Craft and Stennis Watson, 2004; Hu et al., 2007; Offen et al., 2001; Takahashi et al., 1996).

In addition to its utility in the investigation of specific functional processes, the SGI network has also served as a resource for large-scale analyses of the general principles of network biology. For example, Chipman and Singh (2009) used the network to construct a genetic interaction prediction algorithm based on network topology. The method predicts genetic interactions with a true positive rate of 95% and a false positive rate of 7%. In addition, Hannay et al. (2008) used the SGI network in an investigation of the correlation between gene essentiality, gene duplication, and gene buffering in eleven organisms. The group investigated whether disruption of a duplicated gene is lethal and found that gene duplicates contribute weakly to buffering in nine of the eleven organisms, including E. coli, D. melanogaster, C. elegans, and S. cerevisiae. The SGI data were used to demonstrate that buffering gene duplicates share more function than non-buffering duplicates.

The examples outlined above highlight how future analysis of the SGI functional networks has the potential to assign function to uncharacterized genes, specify new function for previously characterized genes, and provide new insights into the amount of crosstalk that occurs between genetic pathways in an animal. Moreover, the functional SGI networks will serve as new resources for the investigation of network biology.

4.3. Towards a Global Network Graph in C. elegans

Although large for a C. elegans interaction network, the SGI network is small in the context of all of the potential genetic interactions in the animal. Creating a global network graph of C. elegans genetic interactions is a daunting task that would require screening all 20,604 genes (Wormbase, Release WS170) against all 20,604 genes, thereby testing 212,262,408 potential genetic interactions. The enormity of this task means it will take an immeasurable amount of time to complete. As we proceed to investigate as many of these interactions as possible, five developments in genetic interaction analysis will greatly improve our understanding of genetic interactions and gene function in a metazoan: 1) high-throughput screens for alleviating interactions; 2) high-throughput, sensitive assays for genetic interactions that result in specific phenotypes aside from population growth; 3) investigation of genetic interactions among functionally related subsets of genes; 4) improvement of functional datasets other than genetic interactions; and 5) investigation of genetic interactions in other animals (Figure 4-1). Each of these advances is discussed in detail below.

High-Throughput Screens for Alleviating Interactions may Reveal Within-Pathway Interactions and Conserved Yeast Genetic Interactions

Alleviating interactions typically reveal within-pathway interactions (Collins et al., 2007; Onge et al., 2007; Schuldiner et al., 2005). In contrast, enhancing and synthetic interactions typically reveal between-pathway interactions (Byrne et al., 2007; Kelley and Ideker, 2005; Tong et al., 2004; Ulitsky and Shamir, 2007). Therefore, alleviating genetic interactions are an important contribution to a comprehensive network of functional links. Although SGI was not designed to

[Figure 4-1 diagram labels: Alleviating and Enhancing Genetic Interactions; Functional Data From Other Organisms; Global Metazoan Network; High Throughput Assays for Specific Synthetic Phenotypes; Development of Other Functional Datasets; Directed Synthetic Screens]

Figure 4-1. Towards a Global Metazoan Network Graph. Several improvements to high-throughput functional investigation will be instrumental in creating a biologically informative global metazoan network.

uncover alleviating genetic interactions, it can easily be modified to do so. As is, the experiments are scored at approximately the time when the populations of worms on control plates reach saturation. A double mutant that grows more slowly than controls indicates a genetic interaction between the two disrupted genes; therefore, the screen is biased towards the identification of enhancing or synthetic lethal interactions. Modification of the SGI approach to score experiments at earlier time points would allow investigators to identify mutant combinations that grow more quickly than controls, thereby identifying alleviating interactions.

Beyond adding to the functional annotation of C. elegans genes, alleviating interactions are also integral to a thorough investigation of conservation between C. elegans and S. cerevisiae genetic interactions. In Chapter 3, I presented supporting evidence for the hypothesis that within-pathway interactions are conserved and between-pathway interactions are not. First, the majority of enhancing genetic interactions identified to date in C. elegans and S. cerevisiae, which are largely between-pathway interactions, are not conserved (Byrne et al., 2007; Tischler et al., 2008). Second, physical interactions, which overlap with within-pathway genetic interactions, are conserved between C. elegans and S. cerevisiae. It has been proposed that physical interactions antagonize evolutionary change within a complex (Roguev et al., 2008). Therefore, the possibility remains that within-pathway genetic interactions are conserved between worms and yeast. Third, since the publication of the SGI analysis, a study of the epistasis map of fission yeast revealed that alleviating interactions are more likely to be conserved between S. pombe and S. cerevisiae than are aggravating interactions (Roguev et al., 2008). It is unlikely that these alleviating interactions are conserved because of lack of evolutionary divergence between the two unicellular fungi. S. pombe and S. cerevisiae are as

evolutionarily divergent as are C. elegans and H. sapiens (Hedges, 2002). Therefore, conservation of alleviating interactions between S. pombe and S. cerevisiae supports the hypothesis that alleviating interactions may be conserved between S. cerevisiae and C. elegans.

Despite evidence indicating that within-pathway interactions are more conserved than between-pathway interactions, evidence is emerging that the latter are somewhat conserved as well. First, conserved synthetic enhancing and synthetic lethal interactions have also been identified in S. pombe and S. cerevisiae (Dixon et al., 2008; Roguev et al., 2008). However, Roguev et al. found that these aggravating interactions between functional modules were less conserved than alleviating interactions within functional modules. Second, a small analysis of enhancing genetic interactions among components of the C. elegans spindle assembly checkpoint (SAC) pathway revealed conservation of 9 of 21 synthetic lethal genetic interactions identified among S. cerevisiae SAC components (Tarailo et al., 2007). It is not clear whether the enhancing C. elegans interactions are within- or between-pathway interactions. Therefore, the hypothesis that within-pathway interactions are conserved and between-pathway interactions are not does not exclude conservation of some enhancing and synthetic genetic interactions.

In summary, modification of SGI to investigate alleviating genetic interactions will reveal within-pathway interactions that will contribute to a comprehensive interaction network that can be used to identify gene function, to study network biology, and to investigate conservation of interactions with other organisms. Conservation of alleviating genetic interactions would suggest that one mechanism of evolution is to modify the relationships between pathways, complexes, or functional modules while maintaining their inner connectivity. In turn, this would suggest that functional relationships in model organisms can be used to infer interactions

among genes within functional modules but cannot be used to infer functional relationships between functional modules.

Genetic Interaction Analyses in Specific Cells or Tissues May Reveal More Biologically Relevant Interactions

In order to provide a detailed view of genetic interactions in C. elegans, specific phenotypes that are less catastrophic than lethality must be investigated. Towards that end, high-throughput and specific assays need to be developed to identify genetic interactions that result in a specific phenotype. For example, the specificity of the SGI screen could be enhanced by assaying for interactions in muscle within the context of a living worm (Figure 4-2). Screening for interactions between muscle-expressed genes (Roy et al., 2002) would result in a network of interactions likely required for proper muscle development and function, with the ultimate goals of providing a more detailed view of the metazoan network, revealing gene function, and uncovering novel, evolutionarily conserved functional modules.

To do so, lin-15b; eri-1 worms, which are hypersensitive to RNAi (Wang et al., 2005), would be soaked in two strains of RNAi from a list of candidate muscle-expressed genes. Muscle-specific genetic interactions could be investigated by looking for paralyzed, arrested at the twofold embryo stage (Pat), and uncoordinated (Unc) phenotypes. For unknown reasons, muscle contraction is required for elongation of the worm; therefore, worms that have nonfunctional body wall muscle cells are Pat (Williams and Waterston, 1994). However, worms that have only partly compromised body wall muscles are Unc (Brenner, 1974). This presents a spectrum of phenotypes from which the level of gene function can be inferred (Baugh et al.,

[Figure 4-2 panel labels: (A) System-wide enhanced RNAi: lin-15b; eri-1. (B) Enhanced muscle-specific RNAi: lin-15b; eri-1; rde-1; myo-3::rde-1. (C) RNAi cultures and L3 worms in a 96-well plate; worms placed on an agar slab and scored as Unc, Wild Type, or Pat.]

Figure 4-2. Synthetic Genetic Analysis of Muscle in C. elegans. Two background strains will be used in the analysis of muscle-expressed genes. (A) lin-15b; eri-1 has system-wide enhanced RNAi. (B) lin-15b; eri-1; rde-1; myo-3::rde-1 has muscle-specific enhanced RNAi. The worms will be used in the protocol outlined in (C). Two cultures of RNAi that target specific muscle-expressed genes will be grown overnight in each well of a 96-well plate. Third larval-stage (L3) worms will be added to each well and incubated at 20°C for 3-4 days. Worms will be taken from the bottom of each well and placed on an agar slab to look for phenotypes associated with muscle impairment: Unc (Uncoordinated) and Pat (Paralyzed, arrested at twofold).

2005; Hresko et al., 1994), which would simplify the interpretation of results to make the assay higher throughput. Indeed, this new approach would improve on the current SGI approach in a number of ways. For example, assaying for a specific phenotype should lower the rate of false positive interactions. Defects in the nervous system or defects in attachment of the muscle to the extracellular matrix and hypodermis could also result in Unc or Pat phenotypes, respectively (Hresko et al., 1994). Incorporation of pan-neuronal and myo-3-driven body-wall muscle-specific fluorescent reporters into the background strain would facilitate identification of gross structural defects in either tissue. This would allow for discrimination between interactions in the different tissue types and isolate muscle-specific interactions.

The tissue-specificity of the screen could be further increased by disrupting gene function specifically in muscle cells. To do so, a triple mutant strain eri-1; lin-15b; rde-1 could be built. rde-1 mutants have system-wide compromised RNAi processing (Parrish and Fire, 2001), while lin-15b and eri-1 mutants enhance RNAi efficacy when rde-1 function is wild type (Wang et al., 2005). Incorporation of a transgene, hlh-1::rde-1, into this strain would drive muscle-specific expression of WT rde-1, allowing RNAi to work in an enhanced fashion in muscle cells specifically (Qadota et al., 2007). The RNAi feeding and screening protocols would be the same as described above. Since the interactions would be specific to muscle cells, the approach would uncover interactions between muscle-specific genes and genes involved in core biological processes.

The muscle-specific and system-wide RNAi approaches would be complementary, as the muscle-specific screen would uncover interactions between genes that may have a lethal phenotype in the systemic RNAi screen. Restricting the disruption of gene function to a specific

cell-type is comparable to screening conditional alleles of essential genes with yeast SGA (Davierwala et al., 2005). Conditional alleles of essential genes are tested at the permissive temperature such that function is reduced, not eliminated, allowing the yeast cell to survive. Similarly, interacting genes with roles in core biological processes would likely produce dead muscles in the muscle-specific RNAi screen and a lethal phenotype in the system-wide RNAi screen. Alternatively, neuronal, extracellular matrix or hypodermal interactions in the systemic RNAi screen would not be duplicated in the muscle-specific RNAi screen. Combined, the two assays would provide a network of genes in various tissues that play a role in muscle development and function.

As in the SGI analysis, the muscle data could be overlapped with the superimposed network, thereby comparing the interactions to other genetic interactions, yeast-two-hybrid physical interactions, microarray co-expression data, and phenotype correlations. Interactions that are supported by evidence from more than one data set would be considered the most likely to have biological significance; however, the superimposed network could also identify novel functional modules as those that do not overlap with data from other interaction screens. The function of the genes in these novel functional modules could be tested with specific functional assays based on the known function of the characterized genes in the module. The inclusion of orthologous human Y2H (Stelzl et al., 2005) and gene co-expression data (Lee et al., 2004; Prieto et al., 2008; Stuart et al., 2003) in the superimposed network would also allow for investigation of conservation of the functional modules in humans.
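The overlap step described above, in which new interactions are ranked by how many independent data sets support them, can be sketched in Python. This is a minimal illustration under stated assumptions and not the procedure used in the SGI analysis: the function names, the data set labels, and the gene identifiers (geneA, geneB, and so on) are hypothetical placeholders, and each data set is assumed to be a simple list of undirected gene pairs.

```python
from collections import defaultdict


def support_by_pair(networks):
    """Map each unordered gene pair to the set of data sets that contain it.

    `networks` is a dict of {data_set_name: iterable of (gene, gene) pairs}.
    Pairs are stored as frozensets so that (a, b) and (b, a) are the same edge.
    """
    support = defaultdict(set)
    for name, edges in networks.items():
        for a, b in edges:
            support[frozenset((a, b))].add(name)
    return support


def split_by_support(query_edges, networks, min_support=1):
    """Split new interactions into supported and unsupported (novel) lists.

    Each returned item is ((gene, gene), number_of_supporting_data_sets).
    Supported interactions are sorted so the best-supported come first.
    """
    support = support_by_pair(networks)
    supported, novel = [], []
    for a, b in query_edges:
        n = len(support.get(frozenset((a, b)), ()))
        (supported if n >= min_support else novel).append(((a, b), n))
    supported.sort(key=lambda item: -item[1])
    return supported, novel


# Hypothetical example data; none of these pairs are real results.
networks = {
    "y2h": [("geneA", "geneB")],
    "coexpression": [("geneB", "geneA"), ("geneC", "geneD")],
}
muscle_interactions = [
    ("geneA", "geneB"),
    ("geneC", "geneD"),
    ("geneE", "geneF"),
]
supported, novel = split_by_support(muscle_interactions, networks)
```

In this toy input, the geneA/geneB pair is backed by two data sets and would rank first, while the geneE/geneF pair has no support elsewhere and would be a candidate member of a novel module that needs a dedicated functional assay.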

Investigation of Genetic Interactions Among Functionally Related Subsets of Genes will Create a Dense Network of Interactions

Investigation of a subset of genes that are likely to be functionally related maximizes the number of genetic interactions that can be obtained with limited time and resources. For example, the SGI network of genetic interactions was enriched for interactions with predicted signal transduction targets. Far fewer interactions were identified with the randomly chosen LGIII target genes. In yeast, epistatic miniarrays consist of arrayed mutants that have a shared function such as chromatin regulation (Schuldiner et al., 2005). Genetic interaction analyses among miniarrays uncover a larger number of genetic interactions than screens where one mutant is tested for genetic interaction against the entire genome, even though the number of potential interactions is larger in the latter scenario (Schuldiner et al., 2005).

There is no question that non-hypothesis driven research is crucial to biological discovery. However, a dense network of genetic interactions can provide a strong framework with which to investigate the function of a candidate gene. This is done by exploiting the guilt-by-association principle of network biology, whereby a candidate gene's function is likely to be similar to that of its closest neighbours (Tong et al., 2004). Therefore, to build an informative global interaction network, both candidate and non-candidate screens for genetic interactions could be performed, as was done with SGI.

Advances in Functional Genomics will Improve the Global Network

Advances in identification of other types of functional relationships will provide better coverage of the genome and more accurate data to integrate and compare genetic interactions. Consider the approaches to identify functional data that were used to create the SGI superimposed

network. Specific phenotypic characterization of RNAi-targeted genes will improve the co-phenotype network. For example, RNAi screens for detailed embryonic phenotypes have characterized the function of multiple genes with previously unidentified roles in embryonic development (Piano et al., 2000). Moreover, the creation of a genome-wide knockout library will greatly enhance phenotypic characterization of genes that are refractory to RNAi. The inclusion of specific phenotypes for more genes will provide a finer view of the network.

The C. elegans yeast-two-hybrid data that were used in the superimposed network have relatively low precision and recall compared to other functional datasets (Chapter ). Since then, false positive and false negative rates of yeast-two-hybrid screening were improved with the use of empirical quality control measures (Simonis et al., 2009). Large-scale implementation of other approaches to identify physical interactions such as membrane-tagged Y2H to identify interactions with membrane proteins (Stagljar et al., 1998) and yeast one-hybrid approaches to identify transcriptional regulators (Deplancke et al., 2004; Deplancke et al., 2006) will add valuable functional data to the network. Moreover, orthologous mammalian interactions that were identified with approaches such as high-throughput mammalian Y2H (Fiebitz et al., 2008) and LUminescence-based Mammalian IntERactome technology (LUMIER) (Barrios-Rodiles et al., 2005) will all contribute to a dense network of functional relationships.

Finally, the co-expression network will be improved with the addition of new types of data that include analyses of spatio-temporal transcriptional promoter activity (Dupuy et al., 2007) and the genetic interaction network will be improved with additional genetic interactions identified in S. cerevisiae and C. elegans. Identification of genetic interactions will be aided by increased availability of mutants and development of screens that identify a range of alleviating and enhancing interactions by using quantitative scoring methods and specific assays for

interactions. As discussed in detail below, genetic interactions from other metazoans may also contribute to the global network graph.

Just as the annotation of a genome changes as we learn about the functions and intricacies of gene structure, the implications of the functional network will also evolve as more data is added to it. Therefore, continuous re-evaluation of the superimposed network ought to be carried out upon large additions to it.

Investigation of Worm Interactions with Interactions in Higher Organisms

Development of genetic interaction analysis in other metazoans will allow us to address whether genetic interactions are conserved among metazoans. If conserved, orthologous genetic interactions from multiple organisms could be combined to provide a global view of a metazoan system. Many aspects of the SGI methodology are applicable to the systematic investigation of genetic interactions in higher organisms. These include the use of RNAi libraries, imaging platforms, and unbiased determination of genetic interactions based on precision and recall measurements. Multiple genome-wide RNAi libraries have been developed that target genes in D. melanogaster and H. sapiens (Boutros et al., 2004; Dietzl et al., 2007; Goshima et al., 2007; Moffat et al., 2006). Moreover, cellular imaging technologies have been developed to analyse the phenotypic consequences of gene disruption in a high-throughput manner (for examples see Moffat et al., 2006; Sepp et al., 2008). As in C. elegans, there is no large set of experimentally verified genetic interactions with which to compare a new set of genetic interactions for true positives and false negatives. The use of precision and recall based on GO annotation to identify interactions of varying strengths in an unbiased manner could be instrumental to the identification of genetic interactions in metazoans.
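To make the precision and recall idea concrete, the sketch below scores a set of candidate interactions against a reference set of gene pairs that share a GO annotation. It is an illustration under explicit assumptions rather than the scoring scheme used in the SGI analysis: treating co-annotated pairs as the positive set is one plausible reading of the text, and the function name, the interaction scores, and the gene identifiers are all hypothetical.

```python
def precision_recall(scored_pairs, co_annotated, threshold):
    """Precision and recall of the pairs called at a given score threshold.

    scored_pairs : dict mapping frozenset({gene, gene}) -> interaction score
    co_annotated : set of frozenset({gene, gene}) pairs sharing a GO term
                   (assumed here to serve as the positive reference set)
    threshold    : pairs scoring at or above this value are called interactions
    """
    # Only tested pairs can be recovered, so restrict the reference set to them.
    positives = set(scored_pairs) & co_annotated
    called = {pair for pair, score in scored_pairs.items() if score >= threshold}
    true_positives = len(called & positives)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(positives) if positives else 0.0
    return precision, recall


# Hypothetical scores; these are not measurements from any screen.
scores = {
    frozenset(("g1", "g2")): 0.9,
    frozenset(("g1", "g3")): 0.6,
    frozenset(("g2", "g4")): 0.3,
}
shared_go = {frozenset(("g1", "g2")), frozenset(("g2", "g4"))}

strict = precision_recall(scores, shared_go, threshold=0.5)
lenient = precision_recall(scores, shared_go, threshold=0.2)
```

Sweeping the threshold in this way shows the trade-off that an unbiased cutoff has to balance: a lenient threshold recovers weaker interactions and raises recall, while a strict one keeps only the strongest calls.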

To date, high-throughput genetic interaction screens in other metazoans have only been carried out in cultured cells. For example, Bakal et al. (2008) screened 17,724 Drosophila gene pairs targeted for simultaneous downregulation by RNAi to identify regulators of JUN NH2-terminal kinase (JNK) phosphorylation. Fluorescence resonance energy transfer (FRET) was used to determine the phosphorylation state of JNK in cells transfected with various pairs of dsRNA. Use of the FRET reporter system provided a quantitative phenotypic readout that identified 55 enhancers and suppressors of JNK phosphorylation. While cell culture is convenient for large-scale RNAi screens, it does not reflect genetic interactions that occur in the context of a whole animal such as non-autonomous signalling.

In the future, genetic interaction screens in whole animals will greatly improve our understanding of the metazoan network. Drosophila is the organism of choice for large-scale interaction screens because it is extremely well-studied and its genome is highly conserved with the human genome. However, making in vivo genetic interaction screens high-throughput is technically difficult. The most efficient way to reduce gene function in a high-throughput manner is to use RNAi. Currently, there are two ways to reduce gene function in a whole fly by RNAi: injection of dsRNA into the fly embryo and the use of integrated transgenic dsRNA expression constructs. The genome-wide library of transgenic flies that express dsRNA will be instrumental to future large-scale genetic interaction investigations because it provides the ability to create gene-specific losses of function in a tissue-specific manner in vivo (Dietzl et al., 2007). While the transgenic RNAi library has been used for a genome-wide screen of genes involved in Notch signalling (Mummery-Widmer et al., 2009), it has not yet been used in a high-throughput screen for genetic interactions.

In summary, the development of various approaches to functional analysis and the accumulation of functional data, including SGI and the genetic networks that were identified with it, have created an important foundation on which to build a global metazoan interaction network. The construction of global networks has provided and will continue to provide a better understanding of the mechanisms that regulate development, disease, and evolution. The SGI analysis indicates genetic interactions will be a critical part of this global network as they link disparate types of functional data. Moreover, the inclusion of diverse types of genetic interactions, such as alleviating interactions, interactions from other organisms, and tissue-specific interactions will dramatically improve our understanding of the metazoan network.

Chapter 5

UNC-73 Functions Cell-Autonomously in the UNC-40 Pathway to Regulate Muscle Arm Extension

I performed the experiments presented in this chapter except for the following: Kevin Chan built the pprkc294[him-4p::unc-40::yfp] construct and Guillermo Selman built all of the other constructs presented herein. Robert Steven (University of Toledo, Ohio, USA) provided the UNC-73B cDNA. Mariam Alexander, Kevin Chan, and Ryan Mui (a rotation student) mapped tr114, tr117, tr121, and tr126 to large regions of the respective chromosomes, which I further refined.

Chapter 5. UNC-73 Functions Cell-Autonomously in the UNC-40 Pathway to Regulate Muscle Arm Extension

Abstract

In the nematode Caenorhabditis elegans, the postsynaptic membrane of the neuromuscular junction reaches its destination through an active process of guided cell extension. The worm has 95 body wall muscles (BWMs) that extend projections called 'muscle arms' to motor axons. The muscle arms harbour the postsynaptic elements of neuromuscular junctions. The stereotypical pattern of muscle arm extension was exploited in a forward genetic screen for new genes required for guided cell migration by looking for mutations that caused a reduction in the number of arms that extend to the motor axons. One of the resulting mutants was tr117, in which BWMs extend half as many arms as the corresponding muscles in wild type animals. Genetic mapping, complementation tests, and sequencing revealed that tr117 is a mutation in unc-73, which encodes a guanine nucleotide exchange factor orthologous to Trio in vertebrates. Expression of UNC-73 in the BWMs, but not in the nervous system, rescues the muscle arm development defect (Madd) phenotype of unc-73(e936) mutants, indicating that UNC-73 functions cell-autonomously to regulate muscle arm extension. UNC-73 expression in the BWMs is enriched at muscle arm termini in a pattern similar to that of UNC-40/Dcc, which directs muscle arm extension to the motor axons. UNC-73 over-expression suppresses the Madd phenotype of unc-40 null worms and unc-73(e936) suppresses ectopic myopodia induced by UNC-40 over-expression. These results indicate that UNC-73 functions downstream of UNC-40 in a pathway that regulates muscle arm extension.

5.1. Introduction to Muscle Arm Extension in C. elegans

Genetic Control of Guided Cell Migration is Plastic in Nature

Understanding guided cell migration is imperative because of its roles in both development and disease. In the developing embryo, neurons extend axons towards muscle cells to establish a synapse through which the two cell types communicate. The study of axon guidance has led to the discovery of several conserved chemotropic signalling pathways including the Netrin, Slit, Ephrin, and Semaphorin pathways (Yu and Bargmann, 2001). These pathways have either short- or long-range effects that repel or attract axonal growth cones along the dorsal-ventral and anterior-posterior axes (Killeen and Sybingco, 2008; Tessier-Lavigne, 1994; Yu and Bargmann, 2001). Disruption of the signal transduction pathways that regulate guided cell migration results in dramatic consequences such as neurological disorders and cancer (Delloye-Bourgeois et al., 2009; Marlow et al., 2008; Mazelin et al., 2004; Mehlen and Llambi, 2005; Stella et al., 2009; Yaron and Zheng, 2007). For example, the Slit pathway inhibits neoplastic invasive growth and Netrin regulates colorectal tumorigenesis by inhibiting apoptosis (Mazelin et al., 2004; Stella et al., 2009).

The study of components required for guided cell and growth cone migrations in multiple cell types reveals the plastic nature of the signal transduction pathways that regulate these events. For example, investigation of the role of Netrin signalling in guiding different cell migrations revealed that components of the Netrin pathway act in a plastic manner according to their context (Figure 5-1) (Hedgecock et al., 1990; Leung-Hagesteijn et al., 1992). In both C. elegans and mammals, UNC-6/Netrin is required to guide the dorsal-ventral migration of commissural axons (Hedgecock et al., 1990; Ishii et al., 1992; Serafini et al., 1994). Specifically, sensory axons of C. elegans that express the transmembrane receptor UNC-40/Dcc

[Figure 5-1 labels: UNC-40 + UNC-5; UNC-40]

Figure 5-1. Axon Guidance by the UNC-6/Netrin Pathway in C. elegans. A schematic of the cross section of a developing C. elegans. Muscles are shown in red. Axons that express unc-40/DCC (orange) extend towards sources of unc-6 while axons that express unc-40 and unc-5 (black) are repelled from sources of unc-6. Adapted from Roy,

are attracted by UNC-6 towards the ventral nerve cord (Hedgecock et al., 1990; Chan et al., 1996). Alternatively, motor axons that express both UNC-40 and its co-receptor UNC-5 are repelled from the ventral source of UNC-6 and migrate towards the dorsal nerve cord (Hedgecock et al., 1990; Leung-Hagesteijn et al., 1992; Chan et al., 1996).

Similarly, the Slit pathway differentially guides the migration of multiple axons in various axes in the developing worm by using diverse combinations of signal transduction components. For example, differential Slit pathway signalling results in posterior-directed migration of anterior lateral mechanosensory (ALM) neurons, ventral-directed migration of anterior ventral mechanosensory (AVM) axons, as well as inhibition of anterior-directed migration of circumferential nerve ring axons (Hao et al., 2001; Zallen et al., 1999). The Slit ligand (SLT-1) signals through the ROBO receptor (SAX-3) to direct the migration of ALM neurons and AVM axons. In contrast, sax-3 is thought to direct the positions of the nerve ring axons in a slt-1 independent manner (Hao et al., 2001). The Slit pathway is also modulated by various co-receptors to differentially guide axon migration. For example, further investigation of AVM sensory axon migration revealed that EVA-1 acts as a co-receptor for the Slit receptor SAX-3 to guide ventral-directed migration of the AVM axon (Fujisawa et al., 2007). Moreover, UNC-40 functions in a SAX-3 pathway, likely as a co-receptor, to direct AVM and nerve ring axon migration in a Netrin-independent manner (Yu et al., 2002).

The investigation of guided cell migration in different contexts has demonstrated that the signal transduction pathways that guide cell migration are differentially regulated by many strategies (Yu and Bargmann, 2001). These strategies include regulation of the spatio-temporal localization of guidance factors (Adler et al., 2006; Nash et al., 2000; Su et al., 2000; Yu and Bargmann, 2001) and the involvement of various modifiers such as co-receptors, phosphatases,

or additional signalling pathways (Fujisawa et al., 2007; MacNeil et al., 2009; Yu et al., 2002). Therefore, a comprehensive understanding of the signalling pathways that guide cell migration cannot be achieved without investigating the migration of multiple cell types.

Muscle Arm Extension in C. elegans is a Model for Guided Cell Migration

In nematodes, body wall muscles required for locomotion extend membrane projections called muscle arms to the motor axons in order to form the neuromuscular junction. In C. elegans, the 95 body wall muscles are arranged in four quadrants that run longitudinally along the dorsal and ventral lengths of the worm (Figure 5-2). Each quadrant contains two rows of body wall muscles that lie on either side of both the dorsal and ventral nerve cords (White et al., 1986). The anterior-most four muscles in each quadrant, called the head muscles, project muscle arms to the motor axons within the major neuropile of the worm, called the nerve ring. Posterior to the head muscles are four neck muscles in each quadrant. These muscles project arms to both the nerve ring and the motor axons that reside in the dorsal or ventral nerve cord, whichever is closest. The remaining 63 muscles, called body muscles, project muscle arms exclusively to the nearest nerve cord (Dixon and Roy, 2005; Hedgecock et al., 1990; White et al., 1986). Henceforth, the term body wall muscle (BWM) will be used to describe the 63 body muscles posterior to the head and neck muscles.

Scott Dixon and Peter Roy found that muscle arm development is a stereotypical process such that each distal BWM extends a characteristic number of muscle arms to the motor axons in the nearest nerve cord (Dixon and Roy, 2005). On average, BWMs in young adults extend 4.0 muscle arms per cell. The BWMs of L1 hatchling worms have an average of 1.7 muscle arms per cell; therefore, most muscle arms extend to the nearest nerve cord during larval development

[Figure 5-2 panel labels: Dorsal; Ventral; (D) unc-5 mutant schematic]

Figure 5-2. Muscle Arm Extension in C. elegans. (A) A schematic of an adult C. elegans hermaphrodite. (B) A cross section of C. elegans. Muscle arms are shown extending from the distal rows of body wall muscles (red). (C-C') Photomicroscopy of the 15th muscle in the dorsal right quadrant (DR15) and the 11th muscle in the ventral left quadrant (VL11) (white arrows). The distal rows of BWMs are expressing the integrated him-4p::membrane-anchored YFP. The muscle arms extending from each of these muscles are indicated (red arrows). Motor neurons are false coloured blue (yellow arrows). The scale bars represent 50 μm. (D) A schematic of an unc-5 mutant worm whose dorsal nerve cord is misplaced laterally. The muscle arms extend towards this misplaced nerve cord, suggesting muscle arms are guided by a chemotropic process. Adapted from Alexander et al

(Dixon and Roy, 2005; Hedgecock et al., 1990). In addition, several reconstructions of transmission electron microscope images of developing embryos show that, during embryogenesis, myoblasts migrate from the nerve ring and leave behind attachments that become muscle arms (C. R. Norris, I. A. Bazykina, E. M. Hedgecock, D. H. Hall, personal communication to S. J. Dixon and P. J. Roy).

A two-phase model of muscle arm development has been proposed to explain the previous observations (Dixon et al., 2006). In the first phase, myoblasts, which are originally juxtaposed with motor neurons, move to their final position and passively leave behind one or two muscle arms that connect each BWM with the motor axons (Dixon and Roy, 2005). In the second phase, it is hypothesized that development of 53 postembryonic motor neurons in the L1 and L2 stages triggers the BWMs to actively extend muscle arms towards the motor axons in response to a secreted guidance cue.

Two lines of evidence suggest that a chemotropic process guides muscle arms to the motor axons. First, muscle arms extend towards misguided motor axons. The cell bodies of the motor neurons that control body muscle contraction reside within the ventral nerve cord. To innervate the dorsal muscles, the ventral cord motor neurons extend commissural axons to the dorsal midline. Once there, the motor axons extend longitudinally along the dorsal nerve cord to innervate the dorsal body muscles (White et al., 1986). In unc-5 and unc-6 mutants, the commissures fail to reach the dorsal midline and instead extend subdorsally along the lateral body wall of the worm (Hedgecock et al., 1987). Muscle arms are still attracted to the misguided motor axons of unc-5 and unc-6 mutants, suggesting that motor axons express a chemotropic cue that guides muscle arm extension (Figure 5-2) (Hedgecock et al., 1990). The second line of evidence that supports chemotropic-guided extension of muscle arms is that muscle arms extend to sites of aberrant vesicle accumulation in the cell bodies of motor

neurons in unc-104 mutants (Hall and Hedgecock, 1991; Zhou et al., 2001). unc-104 encodes a kinesin-like motor protein that functions within an axon to transport synaptic vesicles to the axon termini. The errantly extended arms in unc-104 mutants suggest that the misplaced vesicles contain a chemoattractant that directs muscle arm extension.

Despite the identification of muscle arms over 200 years ago and their subsequent physical characterization, the signalling events that regulate muscle arm extension are unknown (Dixon and Roy, 2005; Hedgecock et al., 1990; Rudolphi, 1808; White et al., 1986). Therefore, studying muscle arm extension is a new avenue of investigation with the potential to reveal novel information about the signalling pathways that regulate guided cell migration, just as investigating the migration of different axons revealed diverse aspects of Netrin and Slit signalling.

A Forward Genetic Screen Identifies 23 Muscle Arm Development Defective (Madd) Mutants

The stereotypical pattern and chemotropic nature of muscle arm extension inspired a forward genetic screen for genes required to guide membrane extension (Alexander et al., 2009). Specifically, the rationale for the screen was that disruption of genes required to guide muscle arms to motor axons in the nearest nerve cord, such as a chemotropic cue or receptor, should result in a non-stereotypical pattern of arm extension. Therefore, the objective of the screen was to identify mutations that cause a reduction in the number of arms that extend to the motor axon targets. To visualize the muscle arms of the distal muscles in living worms on a dissection scope, worms harbouring the tris25 chromosomally integrated transgenic array that drives the expression of membrane-anchored YFP from the him-4 promoter were used to

screen for Muscle Arm Development Defective (Madd) mutants (Dixon and Roy, 2005). The tris25 array also contains a dominant rol-6 transgene that causes the animals to have a corkscrew-like twist, thereby enabling visualization of the dorsal and ventral midlines of the worm, and a clear view of the muscle arms, without physical manipulation.

To identify Madd mutants, tris25-containing worms were randomly mutagenized with ethyl methanesulfonate (EMS). An oligoclonal F2 screen was then carried out to identify homozygous recessive mutations in a high-throughput manner. This entailed mutagenizing parental (P₀) worms and plating 2-3 F1s in each well of a 12-well plate using the COPAS (Complex Object Parametric Analyzer and Sorter) Biosort (Union Biometrica). Adult F2s were screened approximately four days later to identify Madd mutants. In total, 23,000 haploid mutagenized genomes were screened and 23 Madd mutants were isolated and pursued in detail. From this screen, I characterised seven mutants: tr98, tr105, unc-33(tr114), unc-73(tr117), unc-40(tr121), tr123, and unc-51(tr126).

Many of the Madd mutants have mutations in genes with suspected roles in membrane extension. For example, at least one of the Madd mutants has a mutation in the gene encoding the actin cytoskeleton regulator UNC-60B/ADF/Cofilin and two others have mutations in unc-54, the gene that encodes myosin heavy chain B. A preceding screen of candidate genes required for muscle arm extension identified roles for unc-54 and unc-60b in muscle arm extension (Dixon and Roy, 2005). Specifically, unc-54 is suspected to be required for actin-generated tension and unc-60b regulates F-actin depolymerisation. Uncovering these two genes in the forward genetic screen suggested that the screen is efficient at identifying genes that regulate muscle arm extension. Another Madd mutant was identified as a mutation in unc-95 (LIM-domain protein). UNC-95 is required for the assembly of dense bodies, which are analogous to vertebrate focal adhesions (Broday et al., 2004; Zengel and Epstein, 1980).

Complementation Group | Allele a | LG | Map Position a | Failed to Complement b | Homolog | Mutation c
1 gex-2 tr116 IV < -16 ok1603 Sra-1/p140 R420Stop (c5009t)
2 d madd-2 tr103 V > MID1 R304*
3 tr96 tr103 A745T
4 tr101 tr103 I503N
5 tr64 tr103 L6667F
6 tr113 tr103 nt 8366<
tr129 tr103 Q947*
8 unc-33 tr114 IV -5 < +1 e204 ~CRMP-2 R502H (g6504a)
9 unc-40 tr63 I < 0.96 n324 DCC/Neogenin intron 6 splice donor (g4869a)
10 tr115 > n324 W1107Stop (g8867a)
11 tr < 0.98 n324 exon 8 splice donor/d426n (g5765a)
12 unc-51 tr126 V > e369 ULK2 I59T (t1240c)
13 unc-54 tr112 I LGI e190 MHC-B -
14 tr124 LGI e
unc-60b tr125 V LGV su158 Cofilin/ADF G44E (g2241a)
16 unc-73 tr117 I -4.5 < 0.96 e936 E1335K (g9486a) sd
17 unc-93 tr120 III -7.4 < -2 e1500 UNC-93 G388R (g2476a)
18 unc-95 tr61 I LGI su33 UNC-95 intron 1 splice acceptor (g1693a)
19 - tr50 V LGV
tr98 d I -6.15<
tr105 I LGI
tr119 I <
tr123 I <

Table 5-1. The 23 Madd Mutants Recovered in a Screen for Genes Required for Muscle Arm Extension. Mutants that I cloned and/or characterized are highlighted. Adapted from Alexander et al. (2009).
a The linkage group (LG) and map position are shown. Only the linkage group is shown if mapping did not proceed beyond bulk segregant analysis.
b The allele used in the complementation test is shown.
c The mutant residue is shown followed by the mutant nucleotide in brackets, which is relative to the adenine of the predicted start codon in the genomic sequence (WormBase release 187). For tr117 and tr125, the mutant nucleotides are with respect to F55C7.7b and C38C3.5c.1, respectively. Asterisks denote a nonsense codon.

Specifically, dense bodies are attachment sites between the cytoskeleton and the extracellular matrix that mediate muscle contraction and its translation into worm movement (Cox and Hardin, 2004). The role of dense bodies in muscle arm extension was confirmed with the subsequent discovery of muscle arm extension defects in mutants of two other dense body components: unc-97 (a Pinch ortholog) and the zinc-finger-encoding gene unc-98 (Alexander et al., 2009; Hobert et al., 1999; Mercer et al., 2003; Zengel and Epstein, 1980).

The screen also led to the finding that members of the WAVE complex function in muscle arm extension. These genes include gex-2/sra1, gex-3/nap1 and wve-1/wave. In mammals, WAVE regulates ARP2/3 complex-mediated actin polymerization (Eden et al., 2002; Kunda et al., 2003; Miki et al., 1998; Shakir et al., 2008).

Genes with previously uncharacterized roles in guided cell extension were also uncovered in the screen. These include unc-93, encoding a predicted ion-channel regulatory protein (de la Cruz et al., 2003), and the previously uncharacterized gene madd-2 (Muscle Arm Development Defective-2) (Alexander et al., in review). The usefulness of muscle arm extension as a model for guided cell migration is exemplified by the discovery of madd-2, which is a homolog of MID1, a gene implicated in human Opitz syndrome. Patients with Opitz syndrome have multiple symptoms including cleft palate, imperforate anus, hypospadias, and hypertelorism, all of which result from defects at the midline (Quaderi et al., 1997). In addition to directing muscle arm extension to the dorsal and ventral midlines of the worm, MADD-2 is also required for extension of contralateral vulval muscles, AVM neurons, PVM neurons, and HSN neurons towards the ventral midline (Alexander et al., in review). As such, mutation of either MID1 in human or madd-2 in worm results in ventral midline defects. Analyses of both MID1 and madd-2 suggest that they function in similar ways within the cell (Alexander et al., In

Review; Schweiger et al., 1999; Short et al., 2002; Trockenbacher et al., 2001). For example, both genes mediate protein-phosphatase 2A (PP2A) downregulation. These results suggest that the biological role of MID1 and madd-2 in regulating guidance to the midline is likely evolutionarily conserved, highlighting the importance of using muscle arm extension as a model to better understand the conserved mechanisms of guided cell extension and migration.

UNC-40/DCC Regulates Muscle Arm Extension in C. elegans

Another graduate student in the Roy lab, Kevin Chan, characterized a complementation group of two new unc-40 alleles (Table 5-1 and Figure 5-3). I subsequently identified one mutant from the screen as an additional allele of unc-40. Two lines of evidence demonstrate that UNC-40 plays a primary role in directing muscle arm extension in response to a chemotropic cue from the motor axons (Alexander et al., 2009). First, UNC-40 expression in the BWMs is sufficient to rescue the Madd phenotype of unc-40 null mutants. Conversely, pan-neuronal expression of UNC-40 in an unc-40 null background rescues motor axon guidance to the dorsal nerve cord but not muscle arm extension defects. Therefore, UNC-40 functions cell-autonomously in muscle to direct muscle arm extension. Second, unc-40 is required for errant lateral muscle arm extension to the misplaced motor axons of unc-5 and unc-6 mutants (Alexander et al., 2009). If UNC-40 instead directed muscle arm extension towards a cue from the dorsal midline rather than a cue from the motor axons, the muscle arms that extend to the errant lateral motor axons in unc-5 and unc-6 mutants would not depend on unc-40. The finding that UNC-40 directs muscle arm extension to the motor axon targets provides a third line of evidence that the motor axons likely secrete an attractive chemotropic cue that guides muscle arm extension.

[Figure panels. A: wild type. B: unc-40(tr63). C: # Arms/BWM. Labels: muscle cell, body wall muscles, neurons.]

Figure 5-3. UNC-40/DCC Regulates Muscle Arm Extension in C. elegans. (A) Photomicroscopy of a wild type muscle VL11 (white arrow). (B) unc-40(tr63) muscle VL11 is shown (white arrow). This muscle is extending one muscle arm (red arrow). The distal rows of BWMs are expressing integrated him-4p::membrane-anchored YFP. Motor neurons are false coloured blue. (C) The average number of muscle arms extending from VL11 (light grey) is shown for two unc-40 mutants isolated from the forward genetic screen for Madd mutants. The error bars represent standard error of the mean. Courtesy of Kevin Chan. Adapted from Alexander et al.

Kevin Chan's work suggests that unc-40 may be acting in a Netrin-independent manner to direct muscle arm extension because unc-6 mutants do not have a ventral or a lateral Madd phenotype (Alexander et al., 2009). The role of unc-6 in muscle arm extension was further investigated in worms over-expressing UNC-40::YFP (Alexander et al., 2009). Suppression of gain-of-function phenotypes caused by UNC-40 over-expression has previously been used to order axon guidance components in various unc-40 pathways (Chang et al., 2006; Gitai et al., 2003; Levy-Strumpf and Culotti, 2007). Over-expression of UNC-40::YFP in the BWMs induces the random extension of plasma membrane from the muscle that we call myopodia (Figure 5-11B). The formation of these myopodia is dependent on unc-60b, suggesting that the myopodia may be analogous to muscle arm extensions. unc-6(null) suppresses ectopic myopodia caused by UNC-40 over-expression, suggesting that another cue may act redundantly with unc-6 to direct muscle arm extension.

UNC-73 Regulates Axon Guidance in C. elegans

In this chapter, I describe my characterization of seven of the 23 Madd mutants (Table 5-1). Analysis of one of the mutants revealed a role for unc-73, an ortholog of the vertebrate gene Trio, in muscle arm extension. unc-73 encodes eight isoforms of a Rho-GTPase guanine nucleotide exchange factor (Rho-GEF) with known roles in axon guidance, neuronal function, and apoptotic cell engulfment (deBakker et al., 2004; Kishore and Sundaram, 2002; Kubiseski et al., 2003; Lundquist et al., 2001; Steven et al., 1998; Steven et al., 2005; Wu et al., 2002). Rho-GEFs facilitate conversion of the Rho family of small GTPases into their active GTP-bound form, while Rho-GAPs inactivate Rho-GTPases by triggering the latter's endogenous GTPase activity. unc-73 encodes two Rho-GEF domains, each placed in tandem

with a pleckstrin homology domain (Steven et al., 1998; Steven et al., 2005) (Figure 5-6D). The first Rho-GEF domain interacts with Rac-GTPases MIG-2 and CED-10 and is predicted to regulate the cytoskeletal rearrangements necessary in axon guidance (Kishore and Sundaram, 2002; Kubiseski et al., 2003; Lundquist et al., 2001; Steven et al., 1998; Wu et al., 2002). Accordingly, mutations that affect the first domain result in uncoordinated movement. The second Rho-GEF domain activates the Rho-GTPase RHO-1 to regulate pharyngeal pumping, egg-laying, neuronal function, speed of locomotion, and cell migration (Spencer et al., 2001; Steven et al., 2005).

Previous work showed that UNC-73 functions upstream of UNC-40 and SAX-3/Robo, yet downstream of VAB-8, a kinesin-like protein, to regulate ALM cell and axon extension (Levy-Strumpf and Culotti, 2007; Watari-Goshima et al., 2007). In this chapter, I present evidence that UNC-73 functions downstream of UNC-40 in directing muscle arm extension to the motor axon targets. Together with results showing that UNC-40 may function independently of UNC-6/Netrin, my work demonstrates how distinct components of the UNC-40 pathway are employed differently depending on the cell type and illustrates the plastic nature of guidance pathways.

Results

Characterization of Seven Madd Mutants

To characterize the Madd mutants (Table 5-1), I crossed each of them into the tris30 strain, which expresses him-4p::yfp in the BWMs and hmr-1b::dsred2; unc-129nsp::dsred2 pan-neuronally. Two aspects of tris30 make it preferable to tris25, the background used to isolate the Madd mutants, for characterizing Madd and axon guidance phenotypes. First, tris30 does not have the dominant rol-6 transgene that causes tris25 animals to roll, facilitating a

more comprehensive analysis of the muscle arms of single animals on the compound microscope. Second, axon guidance defects are more readily visualised in tris30, which has two neuronal markers with increased specificity for commissural motor axons.

To gauge the severity of the Madd phenotype in each mutant, I compared the number of muscle arms extending from specific BWMs to the number of muscle arms extending from the corresponding BWMs in control animals (Table 5-2). Specifically, the 15th muscle in the dorsal right quadrant (DR15) and the 11th muscle in the ventral left quadrant (VL11) were analysed. These specific muscles were chosen because they extend a relatively large number of arms in wild type worms and their shape and position make them easy to recognize. A Madd mutant with a dorsal but not a ventral Madd phenotype is likely an axon guidance mutant with a disrupted dorsal nerve cord but intact ventral nerve cord. Therefore, the number of motor axons that reached the right side of the dorsal nerve cord was also counted in order to determine whether the Madd phenotype might be a secondary consequence of misplaced motor axons, as previously described in unc-5 mutants (Table 5-2).

Next, to identify the gene affected in each mutant, I performed single nucleotide polymorphism (snip-SNP) mapping of each mutant as previously described (Wicks et al., 2001). From the resulting approximate physical interval, I chose candidate genes whose mutants share visible phenotypes with the Madd mutant in question and used them in complementation tests. Once noncomplementation was observed between a Madd mutant and a reference mutant, the candidate gene was sequenced in the Madd mutant to identify the mutated nucleotide.
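The comparison of arm counts between a mutant and control animals, reported throughout this chapter as a mean with standard error of the mean, can be illustrated with a minimal sketch. The counts below are hypothetical values invented for illustration, not data from this study, and the helper name mean_sem is an assumption; the statistical test behind the reported P values is not restated here, so none is implemented.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(counts):
    """Return (mean, standard error of the mean) for per-animal arm counts."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two animals to estimate the SEM")
    # SEM = sample standard deviation / sqrt(number of animals scored)
    return mean(counts), stdev(counts) / sqrt(n)


# Hypothetical arm counts for one muscle (for example VL11), one value per animal.
control_counts = [3, 4, 3, 4, 3, 3, 4, 3]
mutant_counts = [1, 2, 1, 0, 2, 1, 1, 2]

control_mean, control_sem = mean_sem(control_counts)
mutant_mean, mutant_sem = mean_sem(mutant_counts)

# Severity expressed as the fraction of the control arm number that remains.
fraction_of_control = mutant_mean / control_mean

print(f"control: {control_mean:.2f} ± {control_sem:.2f} (n={len(control_counts)})")
print(f"mutant:  {mutant_mean:.2f} ± {mutant_sem:.2f} (n={len(mutant_counts)})")
print(f"mutant extends {fraction_of_control:.0%} of the control arm number")
```

Scoring the same identified muscle in every animal is what makes the per-genotype means directly comparable; the number of animals scored should be reported alongside each mean.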

Genotype | Muscle Arm # (Dorsal Right) | Muscle Arm # (Ventral Left) | Dorsal Commissures | P Value
1 control (tris30) 3.7 ± ± ±
rrf-3(pk1426) 3.7 ± ± ± (1)
unc-33
3 tr ± 0.2 (n23) 3.0 ± 0.2 (n21) 15.7 ± (1)
4 tr114/+ 3.8 ± 0.1 (n27) 3.3 ± 0.2 (n28)
e204/+ 3.8 ± ±
tr114/e ± ± 0.2 (n8) (1)
7 mn ± ± ± 0.5 <0.001 (1)
8 mn407/+ 4.3 ± ±
tr114/mn ± 0.2(n=16) 2.8 ± 0.5 (n=4) (1)
10 unc-40(n324); unc-33(tr114) 0.33 ± ± ± (26)
unc
tr ± ± ± 0.5 <0.001 (1)
12 tr121/+ 3.5 ± ±
n ± ± ± 0.31 <0.001 (1)
14 n324/+ 3.6 ± 0.3 (n10) 2.8 ± 0.3 (n8)
tr121/n ± 0.1 (n15) 1.3 ± 0.2 (n10) - <0.001 (1)
16 tris34(him-4p::unc-40::yfp) 4.5 ± ± (1)
17 unc-40(n324); tris ± ± (1)
18 unc-40(n324); Ex[control]#1 0.4 ± ± ± (26)
19 unc-40(n324); Ex[control]#2 0.4 ± ± ± (26)
20 unc-40 (n324); Ex[him-4p::UNC-40::YFP]#1 3.6 ± ± <0.001 (26)
21 unc-40 (n324); Ex[him-4p::UNC-40::YFP]#2 3.3 ± ± <0.001 (26)
22 unc-40 (n324); Ex[unc-119p::UNC-40::GFP]#1 1.0 ± ± ± (26)
23 unc-40 (n324); Ex[unc-119p::UNC-40::GFP]#2 0.7 ± ± ± (26)
unc
tr ± ± 0.2 (n23) - <0.01 (1)
25 tr126/+ 4.6 ± ±
e369/+ 4.6 ± ± 0.2 (n14)
tr126/e ± ± 0.2 (n14) - <0.001 (1)
28 e ± ± ± (1)
29 e1189/+ 3.9 ± ± 0.1 (n15)

30 tr126/e ± ± 0.3 (n6) (1)
unc
tr ± ± ± 0.42 <0.001 (1)
32 tr117/+ 3.4 ± ±
e ± ± ± 2.1 <0.001 (1)
34 e936/+ 3.8 ± ±
tr117/e ± ± <0.001 (1)
36 tr117/n ± 0.2 (n14) 2.6 ± 0.3 (n13) - <0.01 (1)
37 tr117/tr ± 0.1 (n27) 2.7 ± 0.2 (n29) - <0.01 (1)
38 unc-73(e936); Ex[him-4p::UNC-73::CFP] line ± 0.6 (n7) 3.9 ± 0.5 (n7) (1)
39 unc-73(e936); Ex[him-4p::UNC-73::CFP] line ± 0.5 (n7) 4.1 ± 0.4 (n8) (1)
Mutants not yet cloned
40 tr ± ± 0.1 (n27) 20.4 ± 0.6 <0.001 (1)
41 tr ± ± ± (1)
42 tr ± ± ± 0.7 <0.001 (1)

Table 5-2. Characterization of Madd Mutants. The average number of muscle arms extending from DR15 and from VL11 is shown for each genotype. Dorsal commissures represent the number of commissures that meet the right side of the dorsal nerve cord. Standard error of the mean is indicated for all counts. Adapted from Alexander et al. (2009).

unc-33(tr114) and unc-51(tr126) are Likely Madd as a Secondary Consequence of Neuronal Defects

tr126 worms are severely uncoordinated and dumpy. I mapped tr126 to a 5.76 cM region on linkage group (LG) V that corresponds with the location of the gene unc-51 (Table 5-1). Complementation tests and sequencing confirmed that tr126 is indeed a novel allele of unc-51 (Table 5-1 and Figure 5-4). unc-51 encodes a serine/threonine kinase with homology to the

Figure 5-4. Muscle Arm Development Defects of Cloned Madd Mutants. (A-D') Representative pictures of the muscle arm defects in DR15 and VL11. The distal rows of BWMs are expressing integrated him-4p::membrane-anchored YFP. Individual muscle arms are indicated with red arrows and motor neurons are false coloured blue. (E) The average number of muscle arms extending from DR15 (dark grey) or VL11 (light grey) is shown for each Madd mutant and for wild type worms. Scale bars represent 50 μm. Adapted from Alexander et al.

yeast autophagy gene ATG1 and to mammalian ULK (unc-51-like kinase) genes (Ogura et al., 1994). Similarly, tr114 worms are uncoordinated and dumpy. I found that tr114 is a novel allele of the gene unc-33, which is homologous to Collapsin Response Mediator Protein-2 (CRMP-2). By homology to CRMP-2, unc-33 is thought to play a role in regulating tubulin transport, thereby facilitating axon outgrowth and branching (Fukata et al., 2002; Kimura et al., 2005; Tsuboi et al., 2005). Both unc-51(tr126) and unc-33(tr114) have strong dorsal Madd phenotypes and weak ventral Madd phenotypes (Table 5-2 and Figure 5-4). This result, along with the previously characterized roles of unc-51 and unc-33 in axon guidance (Fukata et al., 2002; Li et al., 1992; Ogura and Goshima, 2006; Ogura et al., 1994; Tsuboi et al., 2005), suggests that the dorsal Madd phenotype of these mutants may be a secondary consequence of neuronal defects. Further investigation may clarify the role of these genes in muscle arm extension (see discussion).

tr121 is an Allele of unc-40

tr121 mutants are short and uncoordinated with a strong Madd phenotype and few axon guidance defects (Table 5-2). I mapped tr121 to a region that corresponds with the location of unc-40 at the center of LG I (Table 5-1). I then performed a complementation test with unc-40(n324) that identified tr121 as an allele of unc-40 (Figure 5-4). Sequencing of the unc-40 locus in tr121 worms revealed a mutation in the splice donor sequence of intron 8 (Table 5-1).

As described above, there is a possibility that unc-40 directs muscle membrane extension downstream of unc-6(ev400) and an unidentified redundant ligand. In mammals, Neogenin (Neo1) is a paralog of Dcc. There is significant identity between unc-40 and Neo1 (30.7%, e-144), and between Neo1 and Dcc (48%, e0.0) (Costanzo et al., 2000). While Netrin can signal through Dcc and Neo, another ligand, the Repulsive Guidance Molecule (Rgm), only signals

through Neogenin to direct retinal axon guidance (Monnier et al., 2002; Rajagopalan et al., 2004; Wang et al., 1999). The C. elegans gene Y71G12B.16 is homologous to the RGM family of proteins (Costanzo et al., 2000). Mutations in Y71G12B.16 have yet to be reported; however, a bacterial strain expressing Y71G12B.16 dsRNA is available (Kamath et al., 2003; Simmer et al., 2003). To test whether C. elegans RGM might be the ligand that directs muscle arm extension, I tested whether Y71G12B.16 RNAi causes a Madd phenotype. Qualitative analysis revealed that neither unc-6(ev400) nor wild type worms display a Madd phenotype when fed Y71G12B.16 RNAi-inducing bacteria, suggesting that Y71G12B.16 may not be the ligand in question. Alternatively, it is possible that the reduced efficiency of RNAi in the nervous system produced a false-negative result (Kamath et al., 2003). A putative null mutant of Y71G12B.16 recently became available (National Bioresource Project, Tokyo Women's Medical University, Japan); analysis of this mutant may reveal a role for Y71G12B.16 in directing muscle arm extension.

tr105, tr123 and tr98 Mutants Have Weak Madd Phenotypes

I mapped tr105, tr123 and tr98 to LG I (Table 5-1). tr98 is a dominant allele that causes a weak ventral Madd phenotype. tr123 also causes a weak ventral Madd phenotype but in a recessive manner (Table 5-2 and Figure 5-5). The tr105 worms have strong axon guidance defects and very weak ventral Madd phenotypes, suggesting that the dorsal Madd phenotype is a secondary consequence of a disrupted nervous system (Table 5-2 and Figure 5-5). I did not pursue these mutants in any more detail as they have relatively weak Madd phenotypes.

Figure 5-5. Muscle Arm Development Defects of Uncloned Madd Mutants. (A-D') Representative pictures of the muscle arm defects in DR15 and VL11. The distal rows of BWMs are expressing integrated him-4p::membrane-anchored YFP. Individual muscle arms are indicated with red arrows and motor neurons are false coloured blue. (E) The average number of muscle arms extending from DR15 (dark grey) or VL11 (light grey) is shown for each Madd mutant and for wild type worms. Adapted from Alexander et al.

unc-73 Functions Cell-Autonomously to Regulate Muscle Arm Development

One of the alleles from the screen for Madd mutants was tr117, which extends fewer than 50% of the wild type number of muscle arms (Table 5-2 and Figure 5-6). The tr117 homozygotes are uncoordinated, short, and have misguided commissural motor axons. As mentioned in the introduction, madd-2 functions in other cell extension events including vulval muscle extension to the midline. This finding prompted me to analyse vulval muscle extension to the midline in tr117 worms expressing egl-15p::gfp in vulval muscles. I found that tr117 mutants have defective extension of hermaphrodite sex muscles, which supports a role for tr117 in guided membrane extension.

I mapped the tr117 mutation to LG I between -4.5 and 1.0 map units (Table 5-1 and Figure 5-6B). I tested several genes in this 5.5 cM region, including unc-73, for complementation of tr117. unc-57 (endophilin A), scd-3, kin-32 (focal adhesion kinase), sem-2, and unc-40 each complemented tr117. However, unc-73(e936) failed to complement the Madd phenotype of tr117 worms (Figure 5-6C). This result, along with a shared phenotypic profile, suggested that tr117 is likely an allele of unc-73. Sequencing tr117 revealed a glutamic acid to lysine missense mutation at position 1335 in the RhoGEF-1 domain encoded by unc-73 (Table 5-1 and Figure 5-6D). The canonical e936 mutation is also in the RhoGEF-1 domain.

The B isoform of UNC-73 has an intact RhoGEF-1 domain, but is missing the RhoGEF-2 domain. I found that expression of a CFP-tagged UNC-73B isoform (provided by Rob Steven, University of Toledo, Ohio, USA) in the BWMs rescues the muscle arm extension defect of unc-73(e936) mutants. Together, these results demonstrate that tr117 is allelic to unc-73 and that unc-73 functions cell-autonomously to regulate muscle arm extension (Figure 5-7).

[Figure panels. A: wild type. B: tr117. C: Madd F2 progeny of tr117 x CB4856 (LG I; x102 x1 x5 x1 x4). D: # Arms/BWM. E: e936, tr117(E1335K), CFP. Labels: body wall muscles, neurons.]

Figure 5-6. tr117 is an Allele of unc-73. (A-B) Dorsal aspect is shown. Anterior is to the right. The dorsal nerve cord is false coloured blue and red arrows indicate individual muscle arms extending from DR11 (white arrow) in representative wild type (A) and unc-73(tr117) (B) worms. Adapted from Alexander et al. (C) Snip-SNP mapping was performed to narrow the region of tr117 to the center of chromosome I. (D) unc-73(e936) did not complement tr117 when the two mutant alleles were placed in trans. The error bars represent standard error of the mean. (E) e936 and tr117 are mutations in the RhoGEF-1 domain of UNC-73. A construct of the UNC-73B isoform tagged with CFP is also shown.

[Figure panels. A: wild type. B: unc-73(e936). C: unc-73(e936); muscle-expressed UNC-73B::YFP. D: # Arms/BWM (Line 1, Line 2). Labels: body wall muscles, neurons.]

Figure 5-7. unc-73(tr117) Functions Cell-Autonomously in Muscle to Guide Muscle Arm Extension. (A-C) Ventral aspect is shown. Anterior is to the right. The ventral motor neurons are false coloured blue and red arrows indicate individual muscle arms extending from VR11 (white arrow). (A) A wild type worm. (B) An unc-73(e936) mutant. Muscle arm extension is rescued by him-4p-driven unc-73B::YFP expression (C). (D) The average number of muscle arms extending from VL11 is indicated for each genotype. The error bars represent standard error of the mean. Adapted from Alexander et al.

UNC-73 is Localized to Muscle Arm Termini

I found that functional UNC-73::CFP is enriched at the leading edge of extended muscle arms (Figure 5-8D-F). This localization pattern is similar to that of a functional UNC-40::YFP fusion protein. To investigate whether UNC-73 and UNC-40 reporters co-localize, I first integrated a transgenic reporter (tris34) that drives unc-40 expression specifically in muscle cells. This results in a line of enriched UNC-40 expression at the tips of the muscle arms along the ventral and dorsal cords (Figure 5-8A-C'). Occasionally, UNC-40::YFP is also localized to the muscle membrane proximal to the nearest nerve cord. The integrated construct produces functional UNC-40, as tris34 rescues the muscle arm phenotype of unc-40(n324) worms (p<0.01) (Figure 5-8G). Additionally, tris34 is sufficient to induce supernumerary (Sna) muscle arms in a wild type background (p<0.02) (Figure 5-8G). When expressed in the same cell, UNC-73::CFP and UNC-40::YFP co-localize at locations corresponding to the post-synaptic membrane, consistent with a role for unc-73 and unc-40 in regulating muscle arm extension.

UNC-73 Functions Downstream of UNC-40 in Muscle Arm Development

The co-localization of UNC-40::YFP and UNC-73::CFP to muscle arm termini suggests that these proteins function at the leading edge of muscle arms to direct muscle arm extension. Furthermore, the extension of myopodia in random directions from BWMs upon over-expression of MYR::unc-40::YFP at the plasma membrane supports the hypothesis that regulation of UNC-40 localization may be integral to its function in guiding muscle arm extension (Alexander et al., 2009). To identify the mechanism that localizes UNC-40 to the muscle arm termini, I carried out a systematic search for genes required to localize UNC-40::YFP to the muscle arm termini.

[Figure panels. A, A': muscle promoter::unc-40::YFP. B, B': muscle promoter::mb::cfp. C, C': UNC-40 (red), muscle (green). D: muscle promoter::unc-73::CFP. E: muscle promoter::unc-40::YFP. F: UNC-40::YFP; UNC-73::CFP. G: # Arms/BWM.]

Figure 5-8. UNC-40 and UNC-73 are Enriched at the Muscle Arm Termini. (A-C') Fluorescent micrographs show UNC-40::YFP localization at the tips of muscle arms (arrows). Muscles are also expressing a membrane-tagged CFP reporter that is used to visualize the entire muscle. The panels on the right are magnified views of the panels on the left. (D-F) UNC-73::CFP is enriched and colocalized with UNC-40::YFP at the muscle arm termini. (G) The average number of muscle arms extending from DR15 (dark grey) or VL11 (light grey) is shown for each genotype. tris34 is an integrated array harbouring him-4p::UNC-40::YFP and rol-6 transgenes. The error bars represent standard error of the mean.

Specifically, I investigated whether candidate genes that cause a Madd phenotype when disrupted regulate UNC-40 localization. I found that madd-2, unc-60b, unc-95, unc-54, unc-104/Kif1, and unc-14/spag16 are not required to localize UNC-40::YFP (Figure 5-9). I also found that unc-6 and unc-5 are dispensable for UNC-40::YFP localization. In contrast, UNC-40::YFP fails to localize properly to the muscle arm termini in unc-73(e936) and unc-51(tr126) mutant backgrounds (Figure 5-9). The disrupted sub-cellular localization of UNC-40 is not a secondary consequence of the reduced number of muscle arms in unc-73(e936) or unc-51(tr126) because the other Madd mutants listed above have wild type UNC-40 localization patterns. Moreover, the wild type UNC-40 localization patterns in the Madd mutants listed above are not likely to be caused by unc-40-mediated rescue of their Madd phenotypes because madd-2(tr103) tris30; tris34 worms have a similar number of muscle arms to madd-2(tr103); tris30 worms (Figure 5-9H). In a parallel experiment, I found that UNC-73::CFP is not mislocalized in an unc-40(n324) background, demonstrating that UNC-40 does not regulate UNC-73::CFP localization (Figure 5-9I,J).

Next, I investigated whether unc-73 is required directly or indirectly to regulate UNC-40 localization. To investigate whether UNC-40 localization is dependent on UNC-73 function in the muscle or in the nervous system, I tested whether UNC-40 localization is rescued by muscle- or neuronal-expressed UNC-73 in unc-73(e936) mutants. I found that the UNC-40 localization pattern was rescued in worms carrying neuronal-expressed UNC-73, but not in worms carrying muscle-expressed UNC-73 (Figure 5-10). However, the Madd phenotype was rescued in worms carrying muscle-expressed UNC-73, but not in worms carrying neuronal-expressed UNC-73 (Figure 5-7D). These results suggest that the disrupted UNC-40 localization pattern is secondary

[Figure panels. A: control (UNC-40::YFP). B: ~Paxillin unc-95(su33). C: unc-73(e936). D: Cofilin/ADF unc-60(su158). E: MHC-B unc-54(e190). F: S/T kinase unc-51(tr126). G: TRIM-9 madd-2(tr103). H: # Arms/BWM. I: UNC-73B::CFP. J: unc-40(n324); UNC-73B::CFP. All strains have tris34[him-4p::UNC-40::YFP; rol-6].]

Figure 5-9. An Analysis of UNC-73's Role in UNC-40 Localization. (A) UNC-40::YFP is localized to the muscle arm termini (red arrow) in wild type animals. The vulva serves as a mid-body reference point (yellow arrowhead). (B-G) UNC-40::YFP localization is disrupted in unc-73(e936) and unc-51(tr126) backgrounds. (H) The average number of muscle arms extending from DR15 (dark grey) or VL11 (light grey) is shown for each genotype. The error bars represent standard error of the mean. (I) UNC-73::CFP is localized to the muscle arm termini (blue arrow) in wild type and (J) unc-40(n324) animals.

Figure. Quantification of UNC-73's Role in UNC-40 Localization. [Plot: y-axis, # gaps in UNC-40 expression pattern; bars are shown anterior (A) and posterior (P) to the vulva for each background, including Lines 1-3 of each transgenic array; asterisks mark p<0.01 vs. control or vs. unc-73(e936).] The number of gaps in the UNC-40::YFP expression pattern was counted both anteriorly (A) and posteriorly (P) to the vulva in each of the indicated backgrounds. All counts were done in the background of tris34[him-4p::UNC-40::YFP]. Control represents the wild-type number of gaps in worms carrying the tris34 array. Multiple lines were counted for worms containing either of the extrachromosomal transgenic arrays: neuronal-specific unc-73::YFP or muscle-specific unc-73::YFP expression. The error bars represent standard error of the mean.
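
The error bars in these quantifications are described as the standard error of the mean. As a minimal sketch of that calculation, assuming per-animal gap counts are available as plain lists, the following Python computes a mean and SEM for each group. The function name `mean_and_sem`, the group labels, and the counts are hypothetical and invented purely for illustration; they are not data from this work, and the significance tests behind the asterisks are not reproduced here.

```python
import math
import statistics


def mean_and_sem(counts):
    """Return (mean, SEM) for a list of per-animal counts.

    SEM is the sample standard deviation divided by sqrt(n).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SEM")
    mean = statistics.fmean(counts)
    sem = statistics.stdev(counts) / math.sqrt(n)
    return mean, sem


# Hypothetical gap counts (invented for illustration, NOT thesis data),
# keyed by (background, side of vulva: "A" = anterior, "P" = posterior).
example_counts = {
    ("control", "A"): [0, 1, 0, 0, 1],
    ("control", "P"): [1, 0, 0, 1, 0],
    ("mutant", "A"): [3, 4, 2, 5, 3],
    ("mutant", "P"): [4, 3, 5, 4, 2],
}

for (background, side), counts in example_counts.items():
    mean, sem = mean_and_sem(counts)
    print(f"{background} {side}: {mean:.2f} ± {sem:.2f} (n={len(counts)})")
```

Dividing the sample standard deviation by the square root of n means that the error bar shrinks as more animals are scored, which is worth keeping in mind when comparing groups of different sizes.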

to unc-73 mutants' neuronal phenotypes. This possibility will be addressed in greater detail in the discussion section of this chapter. I further investigated the relationship between unc-73 and unc-40 by attempting to build a strain homozygous for both the unc-73(e936) and unc-40(n324) mutations. The cross resulted in several worms with synthetic withered tails (Wit), a phenotype characteristic of strong loss-of-function of unc-73 and associated with defective CAN cell migrations (Forrester et al., 1998). Since this withered-tail phenotype is not seen in unc-73(e936) mutants, its presence suggests that unc-73(e936) hypomorphs are sensitive to the gene dose of unc-40. However, I was unable to establish a self-propagating line of mutants, indicating that double mutants are subviable. The number of muscle arms extending from the worms with withered tails is indistinguishable from that of unc-73(e936) mutants. This suggests that the genotype of the withered-tail mutants that I analysed was likely unc-73 unc-40/unc-73 + or unc-73 +/ + unc-40. To distinguish between these two possibilities, I investigated the number of muscle arms in trans-heterozygous unc-73 +/ + unc-40 worms. The resulting Madd phenotype of these worms was less severe than that of unc-73(e936) mutants, suggesting that the worms with withered tails likely had the genotype unc-73 unc-40/unc-73 + (Figure 5-11A). The non-allelic non-complementation of the Madd phenotype in trans-heterozygous unc-73 +/ + unc-40 worms reveals a genetic interaction between unc-40 and unc-73 (Figure 5-11A). These results suggest that unc-40 and unc-73 function together to regulate muscle arm migration. Transgenic expression of two separately derived arrays containing a muscle-specific unc-73 construct partially alleviated the Madd phenotype of unc-40(n324) worms (Figure 5-11A), suggesting that unc-73 functions downstream of unc-40 in muscle arm extension.
In a concurrent investigation, Kevin Chan discovered that unc-73(e936) suppresses ectopic

Figure. UNC-73 Functions Downstream of UNC-40 in Muscle Cells. Panels: (A) # Arms/BWM; (B) O/E UNC-40::YFP; (C) O/E UNC-40::YFP; unc-73(e936). (A) The average number of muscle arms extending from DR15 (dark grey) or VL11 (light grey) is shown for each genotype. Mp represents the him-4 muscle-specific promoter. (B) Overexpression of UNC-40::YFP at a fifty-fold higher concentration than that used for rescue experiments induces fine membrane protrusions called myopodia (red arrowheads). (C,D) The number of myopodia induced by UNC-40::YFP overexpression is greatly reduced in unc-73 mutant backgrounds. The error bars represent standard error of the mean. (B-D) Courtesy of Kevin Chan. Adapted from Alexander et al.

myopodia that extend as a consequence of unc-40 overexpression (Figures 5-11B-D) (Alexander et al., 2009). Together, these results suggest that unc-73 functions downstream of unc-40 to facilitate muscle arm extension.

The Localization of UNC-40 and MADD-2 are Disrupted in unc-51 Mutants

UNC-40 localization is also disrupted in an unc-51(e369) background (Figure 5-9). Previous work showed that UNC-51 is required for the proper localization of UNC-5 and UNC-33 reporters within motor neurons (Ogura, 2006; Tsuboi, 2005). Together, these findings led to an investigation of whether unc-51 functions in muscle arm extension to localize the products of various Madd genes to muscle arm termini. This analysis demonstrated that the localization pattern of MADD-2 is also disrupted in an unc-51 background (Figure 5-12).

Proteins Required for Muscle Arm Extension are Localized to the Muscle Arm Termini

The subcellular localization of UNC-73 and UNC-40 to the muscle arm termini inspired an investigation of the subcellular localization of the products of other genes identified as, or predicted to be, components of muscle arm extension (Figure 5-13). These genes include: 1) the dense body components unc-95 and unc-97; 2) the WAVE complex regulator gex-2; and 3) the SLT-1 receptor eva-1. I examined the localization of the dense body components UNC-95 and UNC-97 (extrachromosomal lines from the Hobert lab) in L1-staged worms because they were not enriched at the muscle arm termini of young adults (Alexander et al., 2009). I found that both UNC-95::GFP and UNC-97::GFP were expressed in the body of the BWMs; however, neither UNC-95::GFP nor UNC-97::GFP was enriched at the muscle arm termini in young animals,

Figure. MADD-2 Localization is Disrupted in an unc-51 Mutant Background. Panels: (A) tris36[MADD-2::CFP], control and unc-51(e1189); (B) tris37[MADD-2::YFP], control and unc-51(e369). (A) MADD-2::YFP and (B) MADD-2::CFP are localized to the muscle arm termini (red arrow) in wild-type animals (depicted in left column). The vulva serves as a mid-body reference point (yellow arrowhead). MADD-2::YFP and MADD-2::CFP localization is disrupted in unc-51 mutant backgrounds (depicted in right column).

Figure. GEX-2 and EVA-1 are Enriched at the Muscle Arm Termini. Panels: (A) GEX-2::CFP; (A') UNC-40::YFP; (A'') merge; (B) EVA-1::CFP; (B') UNC-40::YFP; (B'') merge. Fluorescent micrographs show GEX-2::CFP (A-A'') and EVA-1::CFP (B-B'') co-localization with UNC-40::YFP at the tips of muscle arms (yellow arrows).

suggesting that dense body components may not play a primary role in muscle arm extension. Alternatively, small amounts of protein may be sufficient to regulate this process, which would preclude visualization of enhanced localization to muscle arm termini in young worms. In contrast, GEX-2::CFP expressed specifically in muscle cells co-localized with UNC-40::YFP at the muscle arm termini (Figure 5-13). Here, GEX-2::CFP localization was enriched relative to GEX-2::CFP localization to the stalks of the muscle arms. A similar pattern was observed in worms with muscle-specific expression of an EVA-1::CFP transgene (Figure 5-13). I investigated the localization of EVA-1 because it enhances the muscle arm phenotype of unc-40(null) worms (genotype: number of muscle arms extending from VL11 ± SEM, unc-40: 0.30±0.09, eva-1: 2.8±0.2, unc-40(n324); eva-1(ok1133): 0.13±0.06). Therefore, these experiments suggest that proteins involved in signal transduction are localized at the muscle arm termini, consistent with their roles in guiding muscle arm extension, while cytoskeletal components are not.

UNC-40 Functions at the Membrane to Direct Muscle Arm Extension

To further investigate UNC-40's function in muscle arm extension, Kevin Chan and I analysed the sub-cellular localization and function of various UNC-40 domains. As described above, muscle-expressed full-length UNC-40::YFP is enriched at the muscle arm termini and at the muscle membrane proximal to the nearest nerve cord (Figure 5-14B). I analyzed the localization of the extracellular UNC-40 domain and found that it localizes in a pattern similar to that of the full-length fusion protein (Figure 5-14C and 5-14F-F''). Next, I analyzed the localization of the cytoplasmic portion of UNC-40::YFP that has a myristoylation tag (MYR) at its N-terminus to tether it to the plasma membrane (Figure 5-14D). As expected, this fusion protein localizes to

Figure. UNC-40 Functions at the Membrane to Direct Muscle Arm Extension. Panels: (A) Rescue?; (B) him-4p::unc-40::YFP; (C) him-4p::unc-40(Δcyto)::YFP; (D) him-4p::MYR::unc-40(Δecto)::YFP; (E) him-4p::unc-40(Δecto)::YFP; (F) him-4p::unc-40(Δcyto)::YFP; (F') him-4p::mCherry; (F'') UNC-40 (green), muscle (red); (G) him-4p::UNC-40(Δecto)::YFP; (G') him-4p::mCherry; (G'') UNC-40 (green), muscle (red). (A) UNC-40 construct variants are illustrated along with whether each was capable of rescuing the Madd phenotype of unc-40(n324) worms. Illustration courtesy of Peter J. Roy. (B-E) Representative pictures of cytoplasmic and extracellular UNC-40 protein isoforms. Red arrows point to muscle arms and green arrows point to the nucleus. (F-F'') UNC-40(Δcyto)::YFP is localized to muscle arm termini and to the membrane proximal to the nearest nerve cord. (G-G'') UNC-40(Δecto)::YFP is dramatically enriched in the nucleus.

A complementation test would be done by crossing the haploid strains and scoring the phenotype in the diploids.

A complementation test would be done by crossing the haploid strains and scoring the phenotype in the diploids. Problem set H answers 1. To study DNA repair mechanisms, geneticists isolated yeast mutants that were sensitive to various types of radiation; for example, mutants that were more sensitive to UV light.

More information

Correspondence: Peter J Roy. Joshua M Stuart.

Correspondence: Peter J Roy.   Joshua M Stuart. BioMed Central Open Access Research article A global analysis of genetic interactions in Caenorhabditis elegans Alexandra B Byrne*, Matthew T Weirauch, Victoria Wong*, Martina Koeva, Scott J Dixon*, Joshua

More information

Bypass and interaction suppressors; pathway analysis

Bypass and interaction suppressors; pathway analysis Bypass and interaction suppressors; pathway analysis The isolation of extragenic suppressors is a powerful tool for identifying genes that encode proteins that function in the same process as a gene of

More information

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia From Wikipedia, the free encyclopedia Functional genomics..is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects)

More information

The geneticist s questions

The geneticist s questions The geneticist s questions a) What is consequence of reduced gene function? 1) gene knockout (deletion, RNAi) b) What is the consequence of increased gene function? 2) gene overexpression c) What does

More information

Bi Lecture 8 Genetic Pathways and Genetic Screens

Bi Lecture 8 Genetic Pathways and Genetic Screens Bi190-2013 Lecture 8 Genetic Pathways and Genetic Screens WT A 2X:2A her-1 tra-1 1X:2A her-1 tra-1 Female body Male body Female body Male body her-1(lf) B 2X:2A her-1(lf) tra-1 1X:2A her-1(lf) tra-1 Female

More information

The Worm, Ceanorhabditis elegans

The Worm, Ceanorhabditis elegans 1 1 Institute of Biology University of Iceland October, 2005 Lecture outline The problem of phenotype Dear Max Sidney Brenner A Nobel Prize in Medicine Genome sequence Some tools Gene structure Genomic

More information

Green Fluorescent Protein (GFP) Today s Nobel Prize in Chemistry

Green Fluorescent Protein (GFP) Today s Nobel Prize in Chemistry In the news: High-throughput sequencing using Solexa/Illumina technology The copy number of each fetal chromosome can be determined by direct sequencing of DNA in cell-free plasma from pregnant women Confession:

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Principles of Genetics

Principles of Genetics Principles of Genetics Snustad, D ISBN-13: 9780470903599 Table of Contents C H A P T E R 1 The Science of Genetics 1 An Invitation 2 Three Great Milestones in Genetics 2 DNA as the Genetic Material 6 Genetics

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary Discussion Rationale for using maternal ythdf2 -/- mutants as study subject To study the genetic basis of the embryonic developmental delay that we observed, we crossed fish with different

More information

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Evidence for dynamically organized modularity in the yeast protein-protein interaction network Evidence for dynamically organized modularity in the yeast protein-protein interaction network Sari Bombino Helsinki 27.3.2007 UNIVERSITY OF HELSINKI Department of Computer Science Seminar on Computational

More information

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype. Series 2: Cross Diagrams - Complementation There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome:

More information

Modelling genotype phenotype relationships and human disease with genetic interaction networks

Modelling genotype phenotype relationships and human disease with genetic interaction networks 1559 The Journal of Experimental Biology 210, 1559-1566 Published by The Company of Biologists 2007 doi:10.1242/jeb.002311 Modelling genotype phenotype relationships and human disease with genetic interaction

More information

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai Network Biology: Understanding the cell s functional organization Albert-László Barabási Zoltán N. Oltvai Outline: Evolutionary origin of scale-free networks Motifs, modules and hierarchical networks Network

More information

Types of biological networks. I. Intra-cellurar networks

Types of biological networks. I. Intra-cellurar networks Types of biological networks I. Intra-cellurar networks 1 Some intra-cellular networks: 1. Metabolic networks 2. Transcriptional regulation networks 3. Cell signalling networks 4. Protein-protein interaction

More information

networks in molecular biology Wolfgang Huber

networks in molecular biology Wolfgang Huber networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION med!1,2 Wild-type (N2) end!3 elt!2 5 1 15 Time (minutes) 5 1 15 Time (minutes) med!1,2 end!3 5 1 15 Time (minutes) elt!2 5 1 15 Time (minutes) Supplementary Figure 1: Number of med-1,2, end-3, end-1 and

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

GSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H*" ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION

GSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H* ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION FIFTH EDITION IV I ^HHk ^ttm IZTI/^Q i I II MPHBBMWBBIHB '-llwmpbi^hbwm^^pfc ' GSBHSRSBRSRRk LlML I I \l 1MB ^HP'^^MMMP" jflp^^^^^^^^st I Iv^O FROM GENES TO GENOMES %^MiM^PM^^MWi99Mi$9i0^^ ^^^^^^^^^^^^^V^^^fii^^t^i^^^^^

More information

2. Der Dissertation zugrunde liegende Publikationen und Manuskripte. 2.1 Fine scale mapping in the sex locus region of the honey bee (Apis mellifera)

2. Der Dissertation zugrunde liegende Publikationen und Manuskripte. 2.1 Fine scale mapping in the sex locus region of the honey bee (Apis mellifera) 2. Der Dissertation zugrunde liegende Publikationen und Manuskripte 2.1 Fine scale mapping in the sex locus region of the honey bee (Apis mellifera) M. Hasselmann 1, M. K. Fondrk², R. E. Page Jr.² und

More information

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007 Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007 The Computational Lens Exposes the computational nature of natural processes and provides a language for their description.

More information

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Overexpression of YFP::GPR-1 in the germline.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Overexpression of YFP::GPR-1 in the germline. Supplementary Figure 1 Overexpression of YFP::GPR-1 in the germline. The pie-1 promoter and 3 utr were used to express yfp::gpr-1 in the germline. Expression levels from the yfp::gpr-1(cai 1.0)-expressing

More information

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics Chapter 18 Lecture Concepts of Genetics Tenth Edition Developmental Genetics Chapter Contents 18.1 Differentiated States Develop from Coordinated Programs of Gene Expression 18.2 Evolutionary Conservation

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Upstream Elements Regulating mir-241 and mir-48 Abstract Introduction

Upstream Elements Regulating mir-241 and mir-48 Abstract Introduction Upstream Elements Regulating mir-241 and mir-48 Hanna Vollbrecht, Tamar Resnick, and Ann Rougvie University of Minnesota: Twin Cities Undergraduate Research Scholarship 2012-2013 Abstract Caenorhabditis

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

GACE Biology Assessment Test I (026) Curriculum Crosswalk

GACE Biology Assessment Test I (026) Curriculum Crosswalk Subarea I. Cell Biology: Cell Structure and Function (50%) Objective 1: Understands the basic biochemistry and metabolism of living organisms A. Understands the chemical structures and properties of biologically

More information

When one gene is wild type and the other mutant:

When one gene is wild type and the other mutant: Series 2: Cross Diagrams Linkage Analysis There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome:

More information

Molecular Developmental Physiology and Signal Transduction

Molecular Developmental Physiology and Signal Transduction Prof. Dr. J. Vanden Broeck (Animal Physiology and Neurobiology - Dept. of Biology - KU Leuven) Molecular Developmental Physiology and Signal Transduction My Research Team Insect species under study +

More information

BIS &003 Answers to Assigned Problems May 23, Week /18.6 How would you distinguish between an enhancer and a promoter?

BIS &003 Answers to Assigned Problems May 23, Week /18.6 How would you distinguish between an enhancer and a promoter? Week 9 Study Questions from the textbook: 6 th Edition: Chapter 19-19.6, 19.7, 19.15, 19.17 OR 7 th Edition: Chapter 18-18.6 18.7, 18.15, 18.17 19.6/18.6 How would you distinguish between an enhancer and

More information

Biology EOC Review Study Questions

Biology EOC Review Study Questions Biology EOC Review Study Questions Microscopes and Characteristics of Life 1. How do you calculate total magnification on a compound light microscope? 2. What is the basic building block of all living

More information

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype. Series 1: Cross Diagrams There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome: When both

More information

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling Lethality and centrality in protein networks Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling molecules, or building blocks of cells and microorganisms.

More information

Introduction to Molecular and Cell Biology

Introduction to Molecular and Cell Biology Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

with%dr.%van%buskirk%%%

with%dr.%van%buskirk%%% with%dr.%van%buskirk%%% How$to$do$well?$ Before$class:$read$the$corresponding$chapter$ Come$to$class$ready$to$par9cipate$in$Top$Hat$ Don t$miss$an$exam!!!!!!!!!!!!!!!!!!!!!!!!!!$ But$I m$not$good$with$science

More information

16 CONTROL OF GENE EXPRESSION

16 CONTROL OF GENE EXPRESSION 16 CONTROL OF GENE EXPRESSION Chapter Outline 16.1 REGULATION OF GENE EXPRESSION IN PROKARYOTES The operon is the unit of transcription in prokaryotes The lac operon for lactose metabolism is transcribed

More information

Allele interactions: Terms used to specify interactions between alleles of the same gene: Dominant/recessive incompletely dominant codominant

Allele interactions: Terms used to specify interactions between alleles of the same gene: Dominant/recessive incompletely dominant codominant Biol 321 Feb 3, 2010 Allele interactions: Terms used to specify interactions between alleles of the same gene: Dominant/recessive incompletely dominant codominant Gene interactions: the collaborative efforts

More information

Biology I Level - 2nd Semester Final Review

Biology I Level - 2nd Semester Final Review Biology I Level - 2nd Semester Final Review The 2 nd Semester Final encompasses all material that was discussed during second semester. It s important that you review ALL notes and worksheets from the

More information

Biology 112 Practice Midterm Questions

Biology 112 Practice Midterm Questions Biology 112 Practice Midterm Questions 1. Identify which statement is true or false I. Bacterial cell walls prevent osmotic lysis II. All bacterial cell walls contain an LPS layer III. In a Gram stain,

More information

Comparative Network Analysis

Comparative Network Analysis Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

Unit 3 - Molecular Biology & Genetics - Review Packet

Unit 3 - Molecular Biology & Genetics - Review Packet Name Date Hour Unit 3 - Molecular Biology & Genetics - Review Packet True / False Questions - Indicate True or False for the following statements. 1. Eye color, hair color and the shape of your ears can

More information

1. Draw, label and describe the structure of DNA and RNA including bonding mechanisms.

1. Draw, label and describe the structure of DNA and RNA including bonding mechanisms. Practicing Biology BIG IDEA 3.A 1. Draw, label and describe the structure of DNA and RNA including bonding mechanisms. 2. Using at least 2 well-known experiments, describe which features of DNA and RNA

More information

Bio 3411, Fall 2006, Lecture 19-Cell Death.

Bio 3411, Fall 2006, Lecture 19-Cell Death. Types of Cell Death Questions : Apoptosis (Programmed Cell Death) : Cell-Autonomous Stereotypic Rapid Clean (dead cells eaten) Necrosis : Not Self-Initiated Not Stereotypic Can Be Slow Messy (injury can

More information

Genetics 275 Notes Week 7

Genetics 275 Notes Week 7 Cytoplasmic Inheritance Genetics 275 Notes Week 7 Criteriafor recognition of cytoplasmic inheritance: 1. Reciprocal crosses give different results -mainly due to the fact that the female parent contributes

More information

56:198:582 Biological Networks Lecture 10

56:198:582 Biological Networks Lecture 10 56:198:582 Biological Networks Lecture 10 Temporal Programs and the Global Structure The single-input module (SIM) network motif The network motifs we have studied so far all had a defined number of nodes.

More information

Is Molecular Genetics Becoming Less Reductionistic?

Is Molecular Genetics Becoming Less Reductionistic? Is Molecular Genetics Becoming Less Reductionistic? Notes from recent case studies on mapping C. elegans and the discovery of microrna Richard M. Burian Virginia Tech rmburian@vt.edu Outline Introduction

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

The Role of Inorganic Carbon Transport and Accumulation in the CO 2 -Concentrating Mechanism and CO 2 Assimilation in Chlamydomonas

The Role of Inorganic Carbon Transport and Accumulation in the CO 2 -Concentrating Mechanism and CO 2 Assimilation in Chlamydomonas The Role of Inorganic Carbon Transport and Accumulation in the CO 2 -Concentrating Mechanism and CO 2 Assimilation in Chlamydomonas Is there a Role for the CCM in Increasing Biological CO 2 Capture? Generalized

More information

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

Map of AP-Aligned Bio-Rad Kits with Learning Objectives Map of AP-Aligned Bio-Rad Kits with Learning Objectives Cover more than one AP Biology Big Idea with these AP-aligned Bio-Rad kits. Big Idea 1 Big Idea 2 Big Idea 3 Big Idea 4 ThINQ! pglo Transformation

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression 13.4 Gene Regulation and Expression THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information
