Functional exploration of the C. elegans genome using DNA microarrays

Valerie Reinke
doi:10.1038/ng1039

Global changes in gene expression underlie developmental processes such as organogenesis, embryogenesis and aging in Caenorhabditis elegans. Recently developed methods allow gene expression profiles to be determined selectively for individual tissues and cell types. Results from both whole-animal and tissue-specific expression profiling have provided an unprecedented view into genome organization and gene function. Integration of these results with other types of functional genomics data gathered from RNA-mediated interference and yeast two-hybrid analyses will allow rapid identification and exploration of the complex functional gene networks that govern C. elegans development.

Gene expression profiling and related technologies have provided important insights into gene regulation in single-cell organisms such as Saccharomyces cerevisiae. The adaptation of these technologies for use with developmental genetic model organisms, such as C. elegans, Drosophila and mouse, presents two main challenges. First, the increasingly complex genome of metazoans makes gene prediction and manipulation more difficult. Second, working with organisms composed of more than one type of cell can complicate experimental design and data analysis. The nematode C. elegans provides an excellent system in which to begin to address these challenges.

The genome sequence of the nematode C. elegans was the first to be completed for a multicellular organism [1], providing a springboard into the realm of functional genomics for metazoans. Compared with mammalian genomes, the C. elegans genome has relatively low complexity. For instance, the human genome is 30 times the size of the C. elegans genome, but has only twice the number of genes. Individual C. elegans genes contain few and relatively short introns, with an average intergenic distance of only 5 kb.
It is also relatively straightforward to carry out genetic modifications in C. elegans, including mutation, deletion and transgenesis; therefore, functional predictions from genomic analyses can be tested rapidly. Furthermore, C. elegans has an invariant developmental program and the precise history and fate of each of its 959 somatic cells is known. This feature increases our ability to link gene functions to precisely defined developmental events within specific cells.

Only 7% of the genes in the nematode genome have been associated with a specific biological process or biochemical function by classical forward genetic or biochemical analysis [2]. To increase this percentage, several functional genomics platforms, including DNA microarrays, RNA-mediated interference (RNAi) and the yeast two-hybrid assay, are currently used. Gene expression profiling using DNA microarrays or related technologies provides a strong complement to classical genetic screens. For example, genes whose mutation does not lead to a detectable phenotype can be associated with specific biological processes on the basis of their co-regulation with known factors. Genome-wide expression profiles can describe biological processes in more molecular detail than was previously imaginable. In addition, such profiles can disclose global characteristics in the organism that cannot be revealed by single-gene studies.

This review highlights several recent applications of DNA microarrays to C. elegans research. Analysis of the data from such experiments extends beyond straightforward description of gene expression differences to the discovery of previously unsuspected characteristics of genome organization and gene function.

Tissue-specific profiling

Several gene expression profiling experiments in C. elegans have investigated the transcriptional programs underlying the development of distinct tissues.
The small size and internal pressure of the nematode generally prevent the effective dissection of distinct organs or tissues for use in DNA microarray experiments. As a result, several alternative methods have been developed, including approaches based on mutants, transgenics, tissue culture and global clustering analysis. Because the presence of many types of cells in any animal or tissue can complicate the interpretation of gene expression data, the methods developed by C. elegans researchers to avoid this difficulty are of general interest.

Mutants. Gene expression profiles can be compared between wild-type animals and mutants with an overabundance or absence of a specific tissue. This method of comparison has been used successfully to detect genes expressed in the pharynx and in the germline [3,4], some of which represent the downstream effectors of developmental cascades directed by tissue-specific transcription factors.

A set of gene expression profiling experiments (ref. 4, and V.R. and S. Ward, unpublished data) has examined the molecular basis of germline formation. These experiments have shown that 25% of the genes in the genome have significant germline-enriched gene expression. Expression profiling of germline mutants that affect oogenesis and spermatogenesis has then allowed germline-expressed genes to be categorized into three groups: sperm-enriched genes, oocyte-enriched genes and germline-intrinsic genes (germline-enriched genes whose expression is not affected significantly by the type of gamete being produced). The identification of this set of genes with known tissue-specific expression and known genomic location has facilitated the positional cloning of genes for which a mutant phenotype affecting germline development is available. For example, many genes in which mutations cause either spermatogenesis-defective or meiotic segregation phenotypes (syp-1) were cloned rapidly after the identification of sets of genes whose expression is enriched during spermatogenesis and in the germline (refs 5,6, and S. Ward, pers. comm.).

Author affiliation: Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520, USA (e-mail: valerie.reinke@yale.edu). Nature Genetics Supplement, volume 32, December 2002, 541.

Another set of microarray experiments has identified the molecular targets of a key regulatory factor, PHA-4, that is required for the differentiation of the pharynx [3,7]. Gaudet and Mango [3] used microarray analysis to identify 240 candidate pharynx-enriched transcripts by comparing embryos with excess pharyngeal development to those with no pharyngeal tissue. Independent verification of the expression of these genes showed that 82% are expressed primarily or solely in the pharynx and that they are activated at different times during pharyngeal development. The consensus sequence bound by PHA-4 and other FoxA proteins (RTTKRY) was found in the promoters of many of the pharynx-enriched genes. The authors made reporter constructs of green fluorescent protein (GFP) fused to eight selected promoters and found that the affinity of PHA-4 for the consensus binding site was correlated with the timing of induction of the relevant genes for all eight promoters. Distinct mutations of the site that either increased or decreased the affinity of PHA-4 binding altered the timing of expression. The amount of PHA-4 itself increases during pharynx development [7]. These data indicate that PHA-4 functions by continuously controlling pharynx development through the timed activation of differentiation genes, rather than simply initiating a cascade of gene expression.
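As an illustration of how a degenerate consensus such as RTTKRY can be located in promoter sequence, the following minimal Python sketch expands the standard IUPAC ambiguity codes (R = A or G, K = G or T, Y = C or T) into a regular expression and scans both strands. It is not the procedure used in ref. 3; the function names and the example sequence in the usage note are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_to_regex(motif):
    """Translate an IUPAC motif such as RTTKRY into a regex string."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(promoter, motif="RTTKRY"):
    """Return sorted (position, strand) pairs for every motif match.

    Positions are 0-based offsets on the forward strand. A lookahead
    is used so that overlapping matches are also reported.
    """
    promoter = promoter.upper()
    pattern = re.compile("(?=(%s))" % motif_to_regex(motif))
    hits = [(m.start(), "+") for m in pattern.finditer(promoter)]
    n, k = len(promoter), len(motif)
    for m in pattern.finditer(reverse_complement(promoter)):
        # Convert the reverse-strand offset back to forward coordinates.
        hits.append((n - m.start() - k, "-"))
    return sorted(hits)
```

For example, `find_sites("CCGTTGACCC")` reports one forward-strand site at offset 2 (GTTGAC). Counting such sites across a set of promoters, and comparing with a background set, would be the next step in testing for overrepresentation.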
New technologies for tissue-specific profiling. Tissues such as the germline and pharynx can be profiled using whole animals or embryos because each tissue comprises a significant proportion of the animal, and temperature-sensitive mutants are available that result in the loss of either tissue. But mutants as convenient as these are not necessarily available for other tissues. In addition, many tissues in C. elegans consist of no more than one or a few cells; genes expressed specifically in such a small handful of cells might not be detectable when mRNA from whole animals is isolated. Also, a tissue-specific change in the expression of a given gene might be masked by uniform expression in other tissues. Two technologies have now been developed that allow researchers to overcome these limitations and to examine gene expression in specific cell types (Fig. 1).

Fig. 1 Methods for measuring tissue-specific gene expression. a, mRNA tagging. Flag-PAB-1 is expressed in a specific tissue, where it binds to polyadenylated RNAs. Blue circles indicate Flag-tagged PAB-1; yellow circles indicate wild-type PAB-1. After crosslinking and lysis, the mRNAs bound by PAB-1 are isolated by immunoprecipitation (IP) with antibodies against Flag and identified by microarray analysis. RT, reverse transcription. b, Cell culture and FACS. Embryos are dissociated and cultured to induce differentiation. Pink circles indicate GFP-negative cells; green circles indicate GFP-positive cells (such as differentiating touch neurons). FACS is used to enrich for specific cells that express GFP. The gene expression differences between wild-type and mutant cells are identified by microarray analysis. Figure credit: Bob Crimi.
First, mRNA tagging [8] exploits the properties of an RNA-binding protein, PAB-1, which binds tightly to the poly(A) tails of almost all mRNAs [9]. A transgene encoding an epitope-tagged PAB-1 protein (Flag-PAB-1) is placed under the control of a promoter that drives expression in a tissue-specific manner (Fig. 1a). Any mRNAs expressed in that tissue will be associated with Flag-PAB-1 and will be co-immunoprecipitated by an antibody against Flag. The identities of these mRNAs are then mapped using DNA microarrays. mRNA tagging has been used successfully to define 1,364 genes that are significantly enriched in C. elegans muscle [8]. Many well-characterized promoters are available to drive expression of Flag-PAB-1 and allow the investigation of mRNAs expressed in diverse tissues, at different developmental times and in different growth conditions and genetic backgrounds. This technology is not limited to C. elegans, because the function of PAB-1 is highly conserved from yeast to humans.

Second, at least two types of cells, neuronal and muscle, can be cultured from dissociated embryos [10]. Using fluorescence-activated cell sorting (FACS), specific subtypes of either muscle or neuronal cells expressing GFP under the control of a tissue-specific promoter can be enriched from a mixed culture. This technique has been used to examine gene expression in a single type of sensory neuron in C. elegans, the touch receptor neuron, of which only six cells exist in a single animal (Fig. 1b) [11]. As the transcription factor MEC-3 controls the differentiation of touch receptor neurons [12], Zhang et al. [11] cultured wild-type and mec-3 mutant cells from embryos expressing GFP in presumptive touch neuron cells. In the original culture, GFP-positive touch receptor neurons accounted for less than 1% of the cell population; by using cell sorting, Zhang et al. increased this proportion to 40–60%.
Using microarrays to compare gene expression between wild-type and mec-3 mutant neurons, they identified 71 MEC-3-dependent genes, including many of the known targets of MEC-3, and were able to link one gene to an existing yet uncloned mutant, mec-17. They also examined promoter sequences for a known MEC-3/UNC-86 heterodimer consensus site (AAATT/GCAT) and found that this site was significantly overrepresented among the best candidate clones, along with a previously unknown sequence, TCATCA, located immediately 5' to the MEC-3/UNC-86 consensus site. Notably, several genes encoding chaperonin molecules have MEC-3-dependent expression in neurons, but are expressed widely in other tissues in a MEC-3-independent manner. These genes would not have been identified as MEC-3 target genes in whole-animal studies, illustrating the necessity for tissue-specific profiling.

Fig. 2 Genomic constraints on gene expression. a, Germline X-chromosome silencing. Antibodies against histone H3 methylated on lysine 4, which marks areas of competent gene expression, fail to stain a single chromosome, the X chromosome. Green areas indicate antibody staining; red areas indicate 4',6-diamidino-2-phenylindole dihydrochloride (DAPI) staining of DNA. Image provided by W.G. Kelly. b, Chromosomal clustering. Genes located within 10–25 kb of each other that do not share the same promoter are often expressed in the same tissue (such as muscle). Intervening genes do not always have the same expression profile. c, Operons. In an operon, adjacent genes (red and green) are expressed as a single transcript under control of one promoter and then trans-spliced. The first gene receives the SL1 leader sequence (blue); the downstream gene receives the SL2 leader sequence (orange). Figure credit: Bob Crimi.

Each of the two approaches described above has strengths and drawbacks. A concern with the mRNA tagging technique is that PAB-1 might have different binding affinities for different transcripts, and therefore some tissue-specific transcripts with low PAB-1 affinity might not be detected. For both mRNA tagging and cell culture techniques, a tissue-specific promoter is required to drive expression of either PAB-1 or GFP to distinguish the tissue of interest from all others, but a truly cell-type-specific promoter can be difficult to find.
For the current cell culturing technology, the tissue (and promoter) of interest must be embryonic, or at least capable of differentiating from cultured embryonic cells. In addition, the FACS sorting for the culturing method does not provide a completely homogeneous population of cells, which potentially reduces the identification of relevant transcripts. Overall, however, both of these approaches have the strength of effectively isolating the transcripts corresponding to a single tissue from those of the rest of the animal.

Global clustering. A strikingly different, integrated approach can be used to define genes that are expressed differentially in individual tissues. A global gene expression survey used microarray data gathered from over 500 diverse experiments carried out by different C. elegans researchers who examined many different experimental and mutant conditions [13]. The underlying premise behind this approach was that the co-regulation of previously unknown genes with known genes across diverse conditions would provide functional information about the unknown genes, because co-regulated genes are likely to encode proteins that function in related biological or biochemical processes. The correlations among gene expression profiles were determined using a hierarchical clustering algorithm, in combination with a visualization method that renders the relationships between the genes into a three-dimensional topological map [13]. In this map, genes with correlated expression were grouped in clusters. The overrepresentation of particular types of genes within a cluster resulted in a broad functional classification for 30 of the 44 clusters (and by extension, the unknown genes within). Whereas some clusters were enriched with genes that were likely to be expressed in a particular tissue, such as neurons or intestine, other clusters contained genes encoding proteins with similar biochemical functions such as heat shock genes or histones.
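The premise of grouping genes by correlated expression can be sketched in a few lines of Python. The example below is a toy average-linkage agglomerative clustering on Pearson correlation, using only the standard library; it is not the algorithm or visualization of ref. 13, and the gene names, expression values and the 0.8 correlation cut-off in the usage note are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster(profiles, min_corr=0.8):
    """Average-linkage agglomerative clustering of genes by correlation.

    profiles maps gene name -> list of expression values across experiments.
    The two most correlated clusters are merged repeatedly until no pair
    has a mean pairwise correlation of at least min_corr.
    """
    clusters = [[gene] for gene in profiles]

    def linkage(a, b):
        total = sum(pearson(profiles[g], profiles[h]) for g in a for h in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) < min_corr:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j is still valid
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical profiles: two known genes and two uncharacterized ones.
toy_profiles = {
    "known_muscle": [1.0, 2.0, 3.0, 4.0],
    "unknown_A": [1.1, 2.1, 2.9, 4.2],
    "known_hsp": [4.0, 3.0, 2.0, 1.0],
    "unknown_B": [3.9, 3.1, 1.8, 1.2],
}
toy_clusters = cluster(toy_profiles)
```

With these toy values, `unknown_A` ends up in the same cluster as `known_muscle` and `unknown_B` with `known_hsp`, which is the kind of guilt-by-association inference the global survey relies on. At the scale of a whole genome, a dedicated numerical library would be a more practical choice than this quadratic pure-Python loop.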
Genomic organization

Several expression profiling experiments in C. elegans, including two of the tissue-specific experiments described above, have identified sets of genes showing a non-random arrangement in the genome. These data provide important information about the constraints placed on genome organization by particular requirements for gene expression (Fig. 2).

Chromosome-based gene exclusion. The early microarray studies that identified germline-enriched transcripts [4] also showed that the corresponding genomic loci are markedly underrepresented on the X chromosome. Genes with sperm-enriched and germline-intrinsic expression are located almost exclusively on the five autosome pairs. In contrast, genes with oocyte-enriched expression are less restricted in their genomic location, although whole-genome microarray analysis of germline gene expression has also detected a slight bias against X-chromosome locations for oocyte-enriched genes (V.R. and S. Ward, unpublished data). This observation led to the hypothesis that the X chromosome is not able to activate gene expression appropriately in the germ cells of XO males, which only produce sperm. Subsequent studies examining histone modifications that correlate with active gene expression have shown that the X chromosome does not contain detectable quantities of these modifications [14]. It therefore seems likely that most genes on the X chromosome are indeed silenced in the male germline.

Unexpectedly, similar types of experiments have shown that the X chromosomes are also silenced in the hermaphrodite (XX) germline (Fig. 2a) [14]. The maternal-effect sterile (MES) proteins, which share sequence identity with the known chromatin-binding repressor Polycomb group proteins in Drosophila [15–17], are required for germ cell viability and germline gene silencing in hermaphrodites, although the target genes were unknown.
The germline microarray studies led to the demonstration that MES function is required for X-chromosome silencing [18]. Together these data provide the first demonstration of sex chromosome silencing in the germline of a homogametic (XX) species. The identification of silencing of a whole chromosome pair in a major tissue would not have been achieved easily by studies examining gene function or expression on a gene-by-gene basis, but instead required a global examination of gene expression.

Table 1 Websites with gene expression lists

Website | Address | Gene expression lists
Genomics publications by Kim lab | http://cmgm.stanford.edu/~kimlab/publications.htm | developmental timecourse [23]; aging timecourse [24]; mRNA tagging [8]; global clustering [13]
Genomic analysis of gene expression in C. elegans | http://www.sciencemag.org/feature/data/1053496.shl | developmental timecourse [22]
Nematode Expression Pattern Database | http://nematode.lab.nig.ac.jp | in situ hybridization of ESTs
WormBase | http://wormbase.org | developmental timecourses [22,23]; global clustering [13]; and promoter reporter studies [25]
Germline gene expression in C. elegans (web supplement to ref. 4) | http://cmgm.stanford.edu/~kimlab/germline | germline studies [4]
Regulation of organogenesis by the C. elegans FoxA protein PHA-4 | http://www.sciencemag.org/cgi/content/full/295/5556/821/dc1 | pharynx studies [3]

Localized clustering of co-expressed genes. Gene expression profiles of specific tissues have also identified regulatory domains containing small groups of co-expressed genes [8]. Analysis of 1,364 genes expressed in muscle, which were identified through mRNA tagging, indicated that over 30% of these genes have start positions located within 10 kb of another muscle gene (Fig. 2b). Genes that might share promoter-binding sites, such as genes in operons (see below) or tandemly duplicated genes, were excluded from this analysis. These gene clusters usually contain at least two co-expressed genes, with the largest cluster containing five genes. For many genes, an intervening gene with divergent or no expression was found to be located between co-expressed genes. The clustered genes encode proteins with many different predicted functions. This observation extends to other existing datasets of genes that are co-regulated by tissue type, including the germline, neurons and intestine.
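The statistic quoted above, the fraction of genes in a tissue set whose start lies within 10 kb of another gene in the same set, is straightforward to compute from a list of start coordinates. The Python sketch below shows one way to do so; it is not the analysis pipeline of ref. 8, it assumes that operon members and tandem duplicates have already been filtered out, and the gene names and coordinates in the usage note are made up.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def fraction_with_neighbor(genes, window=10_000):
    """Fraction of genes with another gene start within `window` bp.

    genes is a list of (name, chromosome, start) tuples for one
    co-expressed set, e.g. muscle-enriched genes. Only genes on the
    same chromosome count as neighbors.
    """
    starts = defaultdict(list)
    for _, chrom, start in genes:
        starts[chrom].append(start)
    for positions in starts.values():
        positions.sort()

    clustered = 0
    for _, chrom, start in genes:
        positions = starts[chrom]
        lo = bisect_left(positions, start - window)
        hi = bisect_right(positions, start + window)
        # The gene itself always falls in the window, so more than
        # one hit means at least one neighbor is present.
        if hi - lo > 1:
            clustered += 1
    return clustered / len(genes) if genes else 0.0


# Hypothetical coordinates: genes a and b are 7 kb apart on chromosome I.
toy_genes = [
    ("a", "I", 1_000),
    ("b", "I", 8_000),
    ("c", "I", 50_000),
    ("d", "X", 2_000),
]
toy_fraction = fraction_with_neighbor(toy_genes)
```

Here `toy_fraction` is 0.5, because only genes a and b have a neighbor within the window. To judge whether an observed fraction is higher than expected, the same function could be applied to randomly drawn gene sets of equal size.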
By contrast, genes grouped by biochemical function (for example, transcription factors, metabolism enzymes, lipid proteins) are not significantly co-localized along chromosomes. Possibly, co-expressed gene clusters may correspond to open chromatin domains, or shared tissue-specific enhancer elements might drive the expression of these neighboring genes.

Identification of operons. The C. elegans genome is unusual among metazoans because some of its genes are arranged in operons; that is, polycistronic messages are transcribed and then trans-spliced into monocistronic messages before translation [19]. The transcript of the first gene in the operon receives a small RNA leader sequence called SL1, and downstream transcripts receive a different leader, SL2 (Fig. 2c). Before the development of DNA microarrays, only a few operons had been identified experimentally. Recently, Blumenthal et al. [20] used whole-genome DNA microarrays to identify roughly 1,200 candidate downstream genes whose transcripts were spliced to an SL2 leader. The genomic structure of each candidate gene was evaluated for whether it had characteristics expected of downstream genes in operons, such as close proximity to an upstream gene. Of the 1,200 candidate genes, 86% had a genomic locus indicative of a downstream gene. By combining these data with expressed sequence tag (EST) data, Blumenthal et al. [20] concluded that the genome contains at least 1,068 operons, with 13–15% of all C. elegans genes existing in operons. The average operon contains 2.6 genes, and the longest contains 8 genes.

The origin of, and requirement for, these operons in C. elegans are not understood, but it has been speculated that the co-regulation of functionally related genes by arrangement into operons might have been selected because the limited space of the relatively compact C. elegans genome places constraints on the number of regulatory elements. Several operons do seem to encode functionally related proteins.
However, only 4% of C. elegans operons contain genes encoding functionally related proteins, as defined by Gene Ontology Consortium categories, compared to 36% of bacterial operons [21]. An alternative possibility is that the presence of trans-splicing machinery facilitated the formation of operons among neighboring genes, regardless of the function of the encoded proteins.

Together, the three examples of operon identification, X-chromosome silencing and chromosomal gene clusters indicate that the location of genes in the genome, both on chromosomes and in local gene neighborhoods, can be influenced by the requirements for expressing those genes within particular tissues or developmental environments.

Integration of genome-wide functional data

Gene expression profiling experiments provide a valuable resource to investigators. In addition to the tissue-specific experiments described above, several temporal studies have been carried out that focus on different aspects of C. elegans development, including aging and embryogenesis (refs 22–24 and L.R. Baugh et al., unpublished data). Because the data from all of these studies are accessible on the Internet (Table 1), the click of a button allows a researcher to identify unknown genes that share common expression profiles with known ones, or to examine the regulation of unknown genes across many experiments. By combining gene expression profiles with sequence data, a potential function can be tentatively assigned for many genes. Whether focused on distinct tissues or temporal surveys, microarray studies increase the efficiency with which mutants are cloned, they identify functionally redundant co-regulated genes, and they can pinpoint which member of a family of proteins is most likely to function in a particular process.

In the 4 years since the genomic sequence for C. elegans was completed, diverse approaches based on functional genomics have been used to analyze gene function on a global scale.
These approaches include methods for examining the localization of gene expression such as systematic reporter transgene assays [25] and in situ hybridization of ESTs (Nematode Expression Pattern Database; Table 1), which provide valuable complements to DNA microarrays. In addition, several groups are using reverse genetic techniques to cause loss-of-function phenotypes, including the creation of deletion mutants [26] and RNAi [27–31]. Proteomic approaches include the generation of a set of constructs containing the open reading frames of about 12,000 genes from start to stop (the ORFeome [32]), which have been used in systematic global yeast two-hybrid studies [33,34].

Global gene expression profiling can be integrated with these other functional genomics technologies in several ways. First, the information gained from DNA microarray analysis can be used to focus follow-up functional screens. Knowing that a set of genes is expressed in a particular tissue can narrow the search for genes with particular phenotypes expected to affect that tissue. For example, the identification of a set of germline-enriched genes [4] allowed several groups to concentrate on those genes in a series of RNAi screens looking for missegregation of meiotic chromosomes, germline protein mislocalization and defects in early embryogenesis (refs 5,35, G. Seydoux, pers. comm., and T. Evans, pers. comm.). Similar types of studies can be done with the sets of muscle, neuronal and pharyngeal genes described above [3,8,11]. Transgenic screens using reporter proteins fused to promoter regions [25] also promise to identify many more promoters that can be used for mRNA tagging.

RNAi can also be directly combined with microarray technology. For example, each component of a regulatory complex can be functionally depleted using RNAi, and the resulting changes in gene expression can be measured using DNA microarrays in the treated animals. Improved methods of linearly amplifying small amounts of RNA permit microarray analysis to be carried out with relatively few animals [36], and therefore any factor can be depleted, regardless of the severity of its effects. RNAi-based microarray experiments will allow a systematic investigation of several multiprotein regulatory complexes, as well as the combinatorial effects of different transcription factors on target genes.
The data from independent functional genomics experiments can also be integrated to better understand many aspects of gene function. Independent convergence of functional information on pairs or groups of genes strengthens the prediction that the encoded proteins are biologically related in some way. In S. cerevisiae, for example, several studies comparing global yeast two-hybrid data with gene expression profiles have shown that co-regulated genes encode interacting proteins more frequently than expected by chance 37–39. In C. elegans, such a comparison is still premature but is likely to be carried out in the near future. With the ability to rapidly test functional hypotheses obtained from gene expression profiling studies by examining existing phenotypes, expression patterns and protein interactions, we can make great inroads toward understanding many of the molecular mechanisms of development.

Independent functional data can also help to guide the selection of gene sets for DNA microarray analysis. A set of genes selected for study by RNAi-based microarray analysis can be refined using independent yeast two-hybrid studies to define components of protein complexes, or by using independent RNAi screens to select genes with similar reduction-of-function phenotypes. The possibility of using the mRNA tagging method to analyze gene expression specifically in the tissue where the RNAi phenotype occurs increases the sensitivity of this type of analysis even further.

Future directions

Future studies using DNA microarrays in C. elegans will include analysis of the regulatory sequences of the genome. The creation of microarrays containing noncoding regulatory sequences, such as promoters and introns, in addition to coding sequences (known as 'tiling path' arrays) will facilitate the identification of key regulatory elements that control gene expression throughout the genome.
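As a rough illustration of the tiling idea, the following Python sketch divides a genomic interval into overlapping windows, each of which would correspond to one array element covering coding or noncoding sequence alike. The function name `tile_region`, the 1,000-bp tile length and the 500-bp step are assumptions chosen for the example and do not describe any actual C. elegans array design.

```python
def tile_region(start, end, tile_len=1000, step=500):
    """Return (tile_start, tile_end) half-open intervals covering [start, end).

    Tiles overlap whenever step < tile_len; the last tile is truncated at
    `end` so that no tile extends past the region.
    """
    if tile_len <= 0 or step <= 0:
        raise ValueError("tile_len and step must be positive")
    tiles = []
    pos = start
    while pos < end:
        tiles.append((pos, min(pos + tile_len, end)))
        if pos + tile_len >= end:
            break  # the region is fully covered
        pos += step
    return tiles


# Hypothetical 2.5-kb interval spanning a promoter, exons and introns.
tiles = tile_region(0, 2500, tile_len=1000, step=500)
for tile_start, tile_end in tiles:
    print(tile_start, tile_end)
```

Choosing a step smaller than the tile length gives overlapping elements, which improves the resolution with which a signal can be mapped; a step larger than the tile length would leave gaps between elements.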
Experimental methods such as chromatin immunoprecipitation 40 and DamID 41 can be used to associate regulatory factors with target binding sites in the genome using these microarrays (see review by J.R. Pollack and V.R. Iyer, pages 515–521, this issue). Genome-wide examination of the organization of regulatory elements will help us to better elucidate the functional constraints on the genomic organization of individual genes. This approach is strengthened by the recently completed Caenorhabditis briggsae genome, which will allow sequence conservation in regulatory regions to be compared between the two species. The combination of genome-wide information from gene expression and tiling path microarrays with other functional genomics techniques will pave the way for modeling the complex regulatory networks that govern metazoan development.

Acknowledgments

I thank K. White, S. West and W. Chi for critically reading the manuscript, and W.G. Kelly for the photograph in Fig. 2b.

1. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).
2. Costanzo, M.C. et al. The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information. Nucleic Acids Res. 28, 73–76 (2000).
3. Gaudet, J. & Mango, S.E. Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science 295, 821–825 (2002).
4. Reinke, V. et al. A global profile of germline gene expression in C. elegans. Mol. Cell 6, 605–616 (2000).
5. Colaiácovo, M.P. et al. A targeted RNAi screen for genes involved in chromosome morphogenesis and nuclear organization in the C. elegans germ line. Genetics 162, 113–128 (2002).
6. MacQueen, A.J., Colaiácovo, M.P., McDonald, K. & Villeneuve, A.M. Synapsis-dependent and -independent mechanisms stabilize homolog pairing during meiotic prophase in C. elegans. Genes Dev. 16, 2428–2442 (2002).
7. Horner, M.A. et al. pha-4, an HNF-3 homolog, specifies pharyngeal organ identity in Caenorhabditis elegans. Genes Dev. 12, 1947–1952 (1998).
8. Roy, P.J., Stuart, J.M., Lund, J. & Kim, S.K. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 418, 975–979 (2002).
9. Gorlach, M., Burd, C.G. & Dreyfuss, G. The mRNA poly(A)-binding protein: localization, abundance, and RNA-binding specificity. Exp. Cell Res. 211, 400–407 (1994).
10. Christensen, M. et al. A primary culture system for functional analysis of C. elegans neurons and muscle cells. Neuron 33, 503–514 (2002).
11. Zhang, Y. et al. Identification of genes expressed in C. elegans touch receptor neurons. Nature 418, 331–335 (2002).
12. Way, J.C. & Chalfie, M. mec-3, a homeobox-containing gene that specifies differentiation of the touch receptor neurons in C. elegans. Cell 54, 5–16 (1988).
13. Kim, S.K. et al. A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092 (2001).
14. Kelly, W.G. et al. X-chromosome silencing in the germline of C. elegans. Development 129, 479–492 (2002).
15. Kelly, W.G. & Fire, A. Chromatin silencing and the maintenance of a functional germline in Caenorhabditis elegans. Development 125, 2451–2456 (1998).
16. Holdeman, R., Nehrt, S. & Strome, S. MES-2, a maternal protein essential for viability of the germline in Caenorhabditis elegans, is homologous to a Drosophila Polycomb group protein. Development 125, 2457–2467 (1998).
17. Korf, I., Fan, Y. & Strome, S. The Polycomb group in Caenorhabditis elegans and maternal control of germline development. Development 125, 2469–2478 (1998).
18. Fong, Y., Bender, L., Wang, W. & Strome, S. Regulation of the different chromatin states of autosomes and X chromosomes in the germ line of C. elegans. Science 296, 2235–2238 (2002).
19. Blumenthal, T. Gene clusters and polycistronic transcription in eukaryotes. BioEssays 20, 480–487 (1998).
20. Blumenthal, T. et al. A global analysis of Caenorhabditis elegans operons. Nature 417, 851–854 (2002).
21. von Mering, C. & Bork, P. Teamed up for transcription. Nature 417, 797–798 (2002).
22. Hill, A.A., Hunter, C.P., Tsung, B.T., Tucker-Kellogg, G. & Brown, E.L. Genomic analysis of gene expression in C. elegans. Science 290, 809–812 (2000).
23. Jiang, M., Ryu, J., Kiraly, M., Duke, K., Reinke, V. & Kim, S.K. Genome-wide analysis of developmental and sex-regulated gene expression profiles in Caenorhabditis elegans. Proc. Natl Acad. Sci. USA 98, 218–223 (2001).
24. Lund, J. et al. Transcriptional profile of aging in C. elegans. Curr. Biol. 12, 1566–1573 (2002).
25. Lynch, A.S., Briggs, D. & Hope, I.A. Developmental expression pattern screen for genes predicted in the C. elegans genome sequencing project. Nature Genet. 11, 309–313 (1995).
26. Jansen, G., Hazendonk, E., Thijssen, K.L. & Plasterk, R.H. Reverse genetics by chemical mutagenesis in Caenorhabditis elegans. Nature Genet. 17, 119–121 (1997).
27. Hunter, C.P. Gene silencing: shrinking the black box of RNAi. Curr. Biol. 9, R440–R442 (1999).
28. Gonczy, P. et al. Functional genomic analysis of cell division in C. elegans using RNAi of genes on chromosome III. Nature 408, 331–336 (2000).
29. Fraser, A.G. et al. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408, 325–330 (2000).
30. Piano, F., Schetter, A.J., Mangone, M., Stein, L. & Kemphues, K.J. RNAi analysis of genes expressed in the ovary of Caenorhabditis elegans. Curr. Biol. 10, 1619–1622 (2000).
31. Maeda, I., Kohara, Y., Yamamoto, M. & Sugimoto, A. Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi. Curr. Biol. 11, 171–176 (2001).
32. Reboul, J. et al. Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans. Nature Genet. 27, 332–336 (2001).
33. Walhout, A.J. et al. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287, 116–122 (2000).
34. Boulton, S.J. et al. Combined functional genomic maps of the C. elegans DNA damage response. Science 295, 127–131 (2002).
35. Piano, F. et al. Gene clustering based on RNAi phenotypes of ovary-enriched genes in C. elegans. Curr. Biol. (in press).
36. Baugh, L.R., Hill, A.A., Brown, E.L. & Hunter, C.P. Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res. 29, E29 (2001).
37. Ge, H., Liu, Z., Church, G.M. & Vidal, M. Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nature Genet. 29, 482–486 (2001).
38. Kemmeren, P. et al. Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol. Cell 9, 1133–1143 (2002).
39. Qian, J., Dolled-Filhart, M., Lin, J., Yu, H. & Gerstein, M. Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J. Mol. Biol. 314, 1053–1066 (2001).
40. Wells, J. & Farnham, P.J. Characterizing transcription factor binding sites using formaldehyde crosslinking and immunoprecipitation. Methods 26, 48–56 (2002).
41. van Steensel, B., Delrow, J. & Henikoff, S. Chromatin profiling using targeted DNA adenine methyltransferase. Nature Genet. 27, 304–308 (2001).

Nature Genetics Supplement, volume 32, December 2002