TRANSCRIPTOMICS (or the analysis of the transcriptome)
Mario Cáceres

Main objectives of genomics
- Determine the entire DNA sequence of an organism
- Identify and annotate the complete set of genes encoded within a genome
- Characterize the gene-expression profiles in different tissues and cell types on a genome-wide scale
- Ascertain the function of each and every gene product
- Understand the genetic basis of the phenotypic differences between individuals and species

Central dogma of molecular biology
- Genome: complete DNA content of an organism, with all its genes and regulatory sequences
- Transcription (genome to transcriptome)
- Transcriptome: complete set of transcripts and their relative levels of expression in a particular cell or tissue, under defined conditions, at a given time
- Translation (transcriptome to proteome)
- Proteome: complete collection of proteins and their relative levels in each cell

Why is the study of RNA so important?
RNA profiling provides clues to:
- Expressed sequences and genes of a genome
- Gene regulation and regulatory sequences
- Function and interaction between genes
- Functional differences between tissues and cell types
- Identification of candidate genes for any given process or disease

Beyond the genome
[Figure labels: gene number, complexity, gene complexity, gene regulation, transcriptome]

Overview
1. Methods for transcriptome analysis
2. Genome transcription landscape
3. Types of RNAs and functions
4. Analysis of gene structure
5. Transcriptome profiling (tissues, individuals)
6. Gene expression and evolution

Methods for transcriptome analysis

Gene expression arrays
- Quantification of transcript abundance
- Single or multiple 3' probes at the end of the CDS or in the 3' UTR (100-500 bp)

Genome tiling arrays
- Identification of transcribed sequences
- Multiple probes along the genome

Alternative splicing arrays
- Quantification of different RNA isoforms
- Probes in exons and exon-exon junctions, distinguishing the inclusion isoform from the exclusion isoform
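
As a rough illustration of how probe signals for the inclusion and exclusion isoforms could be combined into a single measure, the sketch below computes an inclusion ratio for one cassette exon. The function name `inclusion_ratio`, the plain averaging of probes, and all intensity values are assumptions made for this example; the slides do not specify how the arrays are analyzed.

```python
from statistics import mean


def inclusion_ratio(inclusion_probes, exclusion_probes):
    """Return the fraction of signal attributed to the inclusion isoform.

    Both arguments are lists of background-corrected probe intensities.
    Averaging the probes of each isoform is a simplifying assumption.
    """
    inc = mean(inclusion_probes)
    exc = mean(exclusion_probes)
    total = inc + exc
    if total == 0:
        raise ValueError("no signal for either isoform")
    return inc / total


# Hypothetical intensities for one cassette exon (illustrative values only).
inclusion = [820.0, 760.0, 900.0]  # exon probe and the two inclusion junctions
exclusion = [210.0, 190.0]         # junction probes spanning the skipped exon

ratio = inclusion_ratio(inclusion, exclusion)
print(f"inclusion ratio: {ratio:.3f}")
```

A ratio close to 1 would indicate that the exon is mostly included in the sample, and a ratio close to 0 that it is mostly skipped.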

RNA-tag sequencing
- Quantification of transcript abundance
- Single end reads of each RNA species

Whole RNA sequencing
- Identification of transcribed sequences
- Multiple reads along each RNA species

The ENCODE project (ENCyclopedia of DNA Elements)
- Pilot phase (2003-2007): identification of all functional elements in 1% of the human genome
- Production phase (2007-?): identification of all functional elements in the whole human genome

Main results from the ENCODE project
- Pervasive transcription of the human genome (14.7% of the analyzed bases are transcribed in at least one tissue sample, but as much as 93% could be present in primary transcripts)
- Identification of many novel non-protein-coding transcripts (ncRNAs either overlapping protein-coding loci or located in regions thought to be transcriptionally silent)
- Numerous previously unrecognized transcription start sites
- Great complexity and overlap of transcribed regions (multiple overlapping transcripts of different sizes from the same or opposite DNA strands)

The human transcription landscape (Kapranov et al., Science 2002)

The human transcription landscape: genomic distribution of long RNAs (Kapranov et al., Science 2007)
- Most transcribed nucleotides are outside annotated exons

Genome transcription relevance?
1. Transcriptional noise
2. Functional

Types of RNAs (non-coding)
[Figure 1.12, Genomes 3 (Garland Science 2007): structural RNAs and regulatory RNAs; added labels: piRNA, PASR, PALR]

Types of RNAs and characteristics

Type                 Name                                        Size                      Number     Function
mRNA                 messenger RNA                               Several kb                ~30,000    Coding for proteins
rRNAs                ribosomal RNAs                              114-5000 nt               ~300-400   Component of ribosome
tRNAs                transfer RNAs                               74-95 nt                  ~1000      Translation
snRNAs / snoRNAs     small nuclear RNAs / small nucleolar RNAs   100-300 nt / 60-300 nt?   ~375       Splicing / RNA modification
miRNAs               micro RNAs                                  21-23 nt                  ~1000      Gene expression regulation
piRNAs (rasiRNAs)    piwi RNAs                                   26-30 nt                  ~33000     Gametogenesis, transposon defense
lncRNAs              long non-coding RNAs                        >200 nt                   ~35000     Regulation, imprinting

Messenger RNAs
[Figure: typical mRNA]

Micro RNAs
- Small non-coding RNAs (21-23 nt) involved in post-transcriptional regulation of gene expression by binding to the 3' UTR of target mRNAs
- Identified in the early 1990s, but not recognized as a distinct class of regulators until the early 2000s
- Abundant in many cell types and possibly involved in different developmental processes (complex organisms)
- Target around 60% of mammalian genes, and each miRNA can repress hundreds of genes

Micro RNAs pathway
[Figure: micro RNA pathway]
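
To make the idea of a miRNA binding to the 3' UTR of a target concrete, the sketch below scans a UTR for sites that are perfectly complementary to the miRNA "seed" (nucleotides 2-8). Restricting the search to perfect Watson-Crick seed matches is a simplifying assumption of this example, and the function names and both sequences are illustrative; the slides only state that binding occurs in the 3' UTR.

```python
def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """Return 0-based start positions of perfect seed matches in the UTR.

    The seed is taken as miRNA nucleotides 2-8 (slice 1:8), and a site is
    any occurrence of its reverse complement in the UTR (an assumption).
    """
    site = reverse_complement(mirna[1:8])
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Illustrative sequences, written 5' to 3' (not taken from the slides).
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAGCUACCUCAUUUGGCUACCUCAA"

sites = seed_sites(mirna, utr)
print(sites)
```

With these toy sequences the seed `GAGGUAG` pairs with `CUACCUC`, which occurs twice in the UTR, so two candidate sites are reported.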

Other new non-coding RNAs

Transcriptional complexity (61%-72% genes)

Long range transcription
Long range transcription could be more frequent than previously thought and could involve both exons located very far apart (up to megabases) and exons on different chromosomes (Gingeras et al., Nat. Rev. Genet. 2007)

Identification of new TSS
Mapping of CAGE tags to the human genome [figure panels: (all), (>2)] (Carninci et al., Nat. Genet. 2006)

Alternative splicing (AS)
- Cassette exons: exons that are spliced out of certain mRNAs to alter the amino acid sequence of the expressed protein
- Alternative splicing affords extensive proteomic and regulatory diversity from a limited repertoire of genes

Large-scale analysis of alternative splicing

Alternative splicing facts
- 92-98% of multi-exon genes are subject to alternative splicing (almost 86% of human genes)
- ~100,000 AS events of different frequency have been detected in human tissues, and many show variation between tissues
- Less than 20% of alternative splicing events appear to be conserved between humans and mice
- Alternative splicing could be a key mechanism in generating proteomic diversity

Alternative splicing in human tissues (Pan et al.; Wang et al.)

Transcript profiling across tissues
Human, mouse and rat: http://biogps.gnf.org/ (Su et al. 2004)

Transcript profiling in the brain
- Relationship between tissues
- Functional differences between tissues
- Identification of tissue-specific promoters

Transcript profiling in cancer
- Molecular classification of cancer types (ALL: acute lymphoblastic leukemia; AML: acute myeloid leukemia)
- Transcriptome reflects biology
  - Predictive
  - Treatment

Transcript profiling across individuals
- Characterization of expression levels in lymphoblastoid cell lines from individuals of HapMap populations
- Many genes with variation across individuals
- Gene expression phenotypes are highly heritable
- Higher proportion of expression differences related to cis-variants (although they are also easier to detect)

Mapping gene expression variants
[Figure labels: (bigger effects, few genes); (moderate effects, more genes)]

Persistence of lactase expression
Lactase production in adults shows large variability among human populations and seems to be related to pastoralism.

Coding vs. regulatory evolution
Regulatory changes have unique properties that could make them especially important in phenotypic evolution:
- Reduced pleiotropic effects
- Fine-tuning of gene function
- Co-dominance and more efficient selection
Expression analyses are an important tool in the study of evolution.

Multiple levels of regulatory evolution
Factors acting on gene expression levels:

Structural changes
- Gene duplication or deletion
- Chromosomal inversion
- TE insertion

Regulatory change in cis
- Basal promoter
- Tissue-specific element
- Distant enhancer

Epigenetic chromatin modifications
- DNA methylation
- Histone modifications

Changes in trans-acting regulator
- Transcription factor
- microRNA

Changes in the mRNA sequence
- Alternative splicing
- Alternative 5' or 3' UTRs
- RNA editing

Comparison of gene-expression levels in the brain during human evolution

Microarrays and brain evolution
Several studies have compared gene-expression levels in the adult brain of humans and non-human primates:

Table 1. Comparative studies of human and non-human primate gene expression in the brain by oligonucleotide arrays (modified from Preuss, Cáceres et al., Nat. Rev. Genet. 2004)

Enard et al. (2002)
- Platform: HG_U95A arrays
- Coverage: 12533 probe sets (~9100 genes)
- Tissue compared: brain frontal cortex (area 9), plus liver
- Species compared (sex; age in years): 3 humans (all males; adults), 3 chimps (all males; adults), 1 orangutan (male; adult)
- Tissue origin: autopsy (PMI = 2-6 h)
- Follow-up validation: Northern

Cáceres et al. (2003)
- Platform: HG_U95Av2 arrays
- Coverage: 12558 probe sets (~9100 genes)
- Tissue compared: brain frontal and temporal cortex (several regions), plus heart
- Species compared (sex; age in years): 5 humans (3 mal./2 fem.; 30-71), 4 chimps (1 mal./3 fem.; 1-24), 4 rhesus (2 mal./1 fem.; 1-9)
- Tissue origin: surgery and autopsy (PMI = 2-14 h)
- Follow-up validation: cDNA arrays, real-time PCR and in situ hybridization

Khaitovich et al. (2004)
- Platform: HG_U95Av2 arrays
- Coverage: 12558 probe sets (~9100 genes)
- Tissue compared: brain frontal, temporal and occipital cortex (several regions)
- Species compared (sex; age in years): 3 humans (all males; 45-70), 3 chimps (all males; 12-40)
- Tissue origin: autopsy
- Follow-up validation: -

Uddin et al. (2004)
- Platform: HG_U133A and B arrays
- Coverage: 44792 probe sets (~16400 genes)
- Tissue compared: brain anterior cingulate cortex
- Species compared (sex; age in years): 3 humans (all females; 14-28), 1 chimp (female; 34), 1 gorilla (male; 41), 3 rhesus (2 mal./1 fem.; 4-11)
- Tissue origin: autopsy (PMI < 12 h)
- Follow-up validation: -

Khaitovich et al. (2005)
- Platform: HG_U133 Plus2 arrays
- Coverage: 54613 probe sets (~20700 genes)
- Tissue compared: brain frontal cortex (area 9), plus liver, heart, kidney and testis
- Species compared (sex; age in years): 6 humans (all males; 45-70), 5 chimps (all males; 7-40)
- Tissue origin: autopsy
- Follow-up validation: -

Khaitovich et al. (2006)
- Platform: ENCODE 0.1 Fwd. and Rev. arrays
- Coverage: 30 Mb of DNA
- Tissue compared: brain frontal cortex (area 9), plus heart and testis
- Species compared (sex; age in years): 6 humans (all males; 45-70), 5 chimps (all males; 7-40)
- Tissue origin: autopsy
- Follow-up validation: -

Gene-expression evolution patterns
1. Acceleration of gene-expression changes in the brain during human evolution (Enard, Khaitovich et al., Science 2002) (Preuss, Cáceres et al., Nat. Rev. Genet. 2004)
2. Increased expression levels in human brain compared to non-human primates

Gene-expression changes in cortex
Meta-analysis identified 533 differentially expressed genes in human-chimp cortex, with good confirmation by RT-PCR:
- Bayes t-test (PPDE > 0.99): 612 genes
- SAM (t-test, FDR < 0.1%): 577 genes
- Overlap between the two methods: 533 genes (79 detected only by the Bayes t-test, 44 only by SAM)
- Direction of change: 40.2% decreased / 59.8% increased

RT-PCR validation
            Total probe sets (genes)   Sensitivity (TP/(TP+FN))   Specificity (TP/(TP+FP))
All PCRs    78 (61)                    89%                        64%

[Scatter plot: PCR log2(Hs/Pt ratio) against array log2(Hs/Pt ratio); series label: Diff. exp. ps. (differentially expressed probe sets); fitted line y = 0.4775x, R^2 = 0.5023]
(Cáceres, Oldham et al., in prep.)

Functional classification of genes
Genes differentially expressed in human and chimp cortex belong to a wide diversity of functional categories.

Biological process categories:
- Cell adhesion and signaling
- Signal transduction
- Cell cycle and proliferation
- Cell organization and biogenesis
- Nucleic acid metabolism
- Carbohydrate metabolism
- Lipid metabolism **
- Protein metabolism
- Response to stress
- Transport
- Development
- Unknown
[Pie chart of gene counts per category; segment values in extraction order: 54, 45, 76, 38, 71, 45, 24, 46, 82, 86, 33, 18]
(Cáceres, Oldham et al., in prep.)
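
The two ratios quoted for the RT-PCR validation, TP/(TP+FN) and TP/(TP+FP), can be sketched as follows. The counts are hypothetical, chosen only to show the arithmetic, because the slides report the percentages but not the underlying numbers of true positives, false positives and false negatives. Note that TP/(TP+FP), labeled "specificity" on the slide, is more commonly called precision or positive predictive value; the second function name reflects that.

```python
def sensitivity(tp, fn):
    """Fraction of truly changed genes that the array analysis detected."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of array calls confirmed by RT-PCR.

    This is the ratio TP/(TP+FP) that the slide labels 'specificity'.
    """
    return tp / (tp + fp)


# Hypothetical counts for illustration only (not the study's data).
tp, fn, fp = 40, 10, 20

sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"sensitivity = {sens:.1%}, TP/(TP+FP) = {ppv:.1%}")
```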

Follow-up of expression differences
- mRNA levels: cDNA microarrays, real-time RT-PCR, Northern blots
- Protein levels: Western blots
- Cellular distribution: in situ hybridization, immunohistochemistry
- Phenotypic changes
- Regulatory studies
- Molecular evolution
- Functional analysis

Why thrombospondins?
- Thrombospondins (THBSs) are extracellular matrix proteins that regulate cell interactions in multiple tissues
  - THBS1 and THBS2 (Subfamily A)
  - THBS3, THBS4 and COMP (Subfamily B)
  [Figure: THBS domains]
- In rodents, THBSs have been implicated in the induction of synapse formation by astrocytes, both in vitro and in vivo
  [Figure: synapse formation (Christopherson et al., Cell 2005)]

THBSs expression patterns
mRNA and protein levels of THBS4 and THBS2 are specifically upregulated in human cortex:
- THBS4: ~6-fold increase (***)
- THBS2: ~2-fold increase (***)
[Figure panels: real-time RT-PCR; Western blot analysis]
(Cáceres et al., Cerebral Cortex 2006)

Cortical distribution of THBS4
[Figure panels: in situ hybridization (THBS4 mRNA) in human and chimp; immunohistochemistry (THBS4 protein) in human, chimp and rhesus]
Human sections exhibit denser THBS4 labeling of the neuropil, consistent with a potential role in synaptic activity and plasticity
(Cáceres et al., Cerebral Cortex 2006)

Possible functional consequences?
Increased levels of thrombospondins could be related to changes in synaptic organization and plasticity in the cortex:
- Greater density of synapses
- Higher synapse turnover
- Increased neurite growth
[Figure: macaque vs. human (Duan et al., Cereb. Cortex 2003)]