BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1
Traditional Biology vs. Systems Biology. Traditional biology: studies single genes or proteins. Systems biology: simultaneously studies the complex interactions among many levels of biological information to understand how they work together. Data are typically generated using high-throughput technologies, i.e. omics. 2
Systems biology, why? Use computational strategies to study how the parts are assembled into working wholes, drawing on the comprehensive catalogs of genomic and cellular constituents. 3
The realities in living cells: signaling pathways 4
The realities in living cells: cellular metabolism. Glycan biosynthesis, nucleotide metabolism, carbohydrate metabolism, amino acid metabolism, energy metabolism, TCA cycle, lipid metabolism 5
The realities in living cells: chemical reactions Glycolysis pathways 6
Data are Becoming More Plentiful and More Complex 7
Biological systems & data: sequence data; protein folding and 3D structure; taxonomic data; literature; protein families and domains; pathways and networks; small molecules; whole genome data 8
Systems biology, how? 9
Integrated Genomics Introduction Gene Ontology : GO Pathway Database : KEGG Pathway Mapping & Network building : MetaCore 10
The Genome is similar to a jigsaw puzzle http://www.genmapp.org/concept.html 11
The pieces can be grouped into classes Blue Gold/Grey Brown http://www.genmapp.org/concept.html 12
Groups are then organized by interaction Sky Compass Cliff/Trees http://www.genmapp.org/concept.html 13
Ultimately a global picture of all interactions can be assembled http://www.genmapp.org/concept.html 14
Data from high-throughput analyses list thousands of genes in no particular order http://www.genmapp.org/concept.html 15
Genes can be grouped by biological function G protein signaling Fatty acid degradation Apoptosis http://www.genmapp.org/concept.html 16
Fatty acid degradation http://www.genmapp.org/concept.html 17
Fatty acid degradation http://www.genmapp.org/concept.html 18
Integrated Genomics Introduction Gene Ontology : GO Pathway Database : KEGG Public domain tool : DAVID 19
GO Structure 20
How do you describe a gene? How do you identify a group of genes with related biological functions? How do you know a biological function is conserved from one species to another? 21
In biology Tactition Taction Tactile sense? Adopted from http://www.geneontology.org/ 22
Tactition, taction, tactile sense = perception of touch; GO:0050975 Adopted from http://www.geneontology.org/ 23
Bud initiation? Adopted from http://www.geneontology.org/ 24
= tooth bud initiation = cellular bud initiation = flower bud initiation Adopted from http://www.geneontology.org/ 25
GO : Gene Ontology http://www.geneontology.org/ Go Tools 26
What is the Gene Ontology? Genes are linked, or associated, with GO terms by trained curators at genome databases; these links are known as gene associations or GO annotations. Some GO annotations are created automatically. [Diagram: genome and protein databases; gene -> GO term; associated genes; GO database] 25th June 2007 Jane Lomax 27
What is Gene Ontology? Gene Ontology (GO): a controlled vocabulary that classifies concepts and defines the relationships between them. GO uses three hierarchical sets of terms to describe different aspects of every protein: where, what, and why. Cellular component (where): the location of protein activity. Molecular function (what): the biochemical activity the protein accomplishes. Biological process (why): the overall objective toward which the protein contributes. 28
Gene Ontology (GO): provides a controlled vocabulary to describe biological knowledge for genes and gene products. Three components / subontologies in GO (example: BRCA1):
Cellular component (where): its location. BRCA1: nucleus
Molecular function (what): tasks performed on the molecular level. BRCA1: protein binding
Biological process (why): how it pertains to the organism. BRCA1: DNA replication and chromosome cycle 29
Evidence Code of GO 30
The Gene Ontology is like a dictionary Each concept has: a name a definition an ID number term: transcription initiation id: GO:0006352 definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter. Adopted from http://www.geneontology.org/ 31
There are also relationships between terms. DNA binding is_a nucleic acid binding (DNA binding is a type of nucleic acid binding). Nucleic acid binding is_a binding (nucleic acid binding is a type of binding). Adopted from http://www.geneontology.org/ 32
Ontology Structure: terms are linked by two relationships, is-a and part-of. Adopted from http://www.geneontology.org/ 33
Ontology Structure [Diagram: is-a and part-of links among the terms cell, membrane, chloroplast, mitochondrial membrane, and chloroplast membrane] Adopted from http://www.geneontology.org/ 34
Ontology Structure Ontologies are structured as a hierarchical directed acyclic graph (DAG) Terms can have more than one parent and zero, one or more children Adopted from http://www.geneontology.org/ 35
Ontology Structure: Directed Acyclic Graph (DAG), multiple parentage allowed [Diagram: cell, membrane, chloroplast, mitochondrial membrane, chloroplast membrane] Adopted from http://www.geneontology.org/ 36
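The DAG idea can be sketched in a few lines of Python. This is only an illustration: the `Term` class and `ancestors` function are hypothetical names, and the specific is_a / part_of edges below are assumed for the example, since the slide diagram itself is not reproduced in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    # Each parent link is (relationship, parent name);
    # the relationship is either "is_a" or "part_of".
    parents: list[tuple[str, str]] = field(default_factory=list)


def ancestors(terms, name):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [name]
    while stack:
        for _relationship, parent in terms[stack.pop()].parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Assumed edges, for illustration only.
terms = {
    "cell": Term("cell"),
    "membrane": Term("membrane", [("part_of", "cell")]),
    "chloroplast": Term("chloroplast", [("part_of", "cell")]),
    "mitochondrial membrane": Term(
        "mitochondrial membrane", [("is_a", "membrane")]
    ),
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "chloroplast membrane": Term(
        "chloroplast membrane",
        [("is_a", "membrane"), ("part_of", "chloroplast")],
    ),
}

print(sorted(ancestors(terms, "chloroplast membrane")))
```

Because a term may have several parents, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors, which is why the upward traversal is useful.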
GO structure: GO terms are divided into three parts: cellular component, molecular function, biological process 37
Cellular Component: where a gene product acts 38
Cellular Component 39
Cellular Component 40
Cellular Component: Enzyme complexes in the component ontology refer to places, not activities. 41
Molecular Function: activities or jobs of a gene product. Example: glucose-6-phosphate isomerase activity 42
Molecular Function: insulin binding; insulin receptor activity 43
Molecular Function: A gene product may have several functions. Sets of functions make up a biological process. 44
Biological Process: a commonly recognized series of events. Example: cell division 45
Biological Process: transcription 46
Biological Process: regulation of gluconeogenesis 47
Biological Process: limb development 48
Biological Process: courtship behavior 49
GO for microarray analysis: Annotations give function labels to genes, so you can ask meaningful questions of microarray data, e.g. do genes involved in the same process show similar or different expression patterns? 50
Traditional analysis of omics data Gene 1 Apoptosis Cell-cell signaling Protein phosphorylation Mitosis Gene 2 Growth control Mitosis Oncogenesis Protein phosphorylation Gene 3 Growth control Mitosis Gene 4 Oncogenesis Nervous system Protein phosphorylation Pregnancy Oncogenesis Mitosis Gene 100 Positive ctrl. of cell prolif Mitosis Oncogenesis Glucose transport 51
Using GO annotations But by using GO annotations, this work has already been done for you! GO:0006915 : apoptosis 52
Grouping by process
Apoptosis: Gene 1, Gene 53
Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35
Glucose transport: Gene 7, Gene 3, Gene 6
Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12
Growth: Gene 5, Gene 2, Gene 6 53
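Grouping by process amounts to inverting a gene-to-terms mapping into a term-to-genes mapping. A minimal sketch follows; the function name `group_by_process` and the small `annotations` dictionary are made up for illustration and are not taken from any GO tool.

```python
from collections import defaultdict


def group_by_process(gene_to_terms):
    """Invert {gene: [terms]} into {term: [genes]}."""
    groups = defaultdict(list)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            groups[term].append(gene)
    return dict(groups)


# Hypothetical annotations, not real data.
annotations = {
    "Gene 1": ["apoptosis", "mitosis"],
    "Gene 2": ["mitosis", "growth"],
    "Gene 3": ["glucose transport"],
}

by_process = group_by_process(annotations)
for process, genes in by_process.items():
    print(f"{process}: {', '.join(genes)}")
```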
Using GO in practice: a statistical measure of how likely it is that your differentially regulated genes fall into a category by chance. Microarray: 1000 genes. Experiment: 100 genes differentially regulated. [Bar chart: number of differentially regulated genes per process] mitosis 80/100; apoptosis 40/100; p. ctrl. cell prol. 30/100; glucose transp. 20/100 54
Using GO in practice: However, when you look at the distribution of all genes on the microarray:
Process | Genes on array | # genes expected in 100 random genes | # genes occurred
mitosis | 800/1000 | 80 | 80
apoptosis | 400/1000 | 40 | 40
p. ctrl. cell prol. | 100/1000 | 10 | 30
glucose transp. | 50/1000 | 5 | 20 55
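One common way to turn the comparison of expected and observed counts into a p-value is a one-sided hypergeometric test. The slides do not say which statistic is intended, so the sketch below is an assumption; the function name `enrichment_p_value` is hypothetical, and only the Python standard library is used.

```python
from math import comb


def enrichment_p_value(array_total, array_in_category, list_total, list_in_category):
    """One-sided hypergeometric p-value: the probability of drawing at least
    `list_in_category` category genes when `list_total` genes are sampled
    at random, without replacement, from the array."""
    upper = min(array_in_category, list_total)
    favourable = sum(
        comb(array_in_category, k)
        * comb(array_total - array_in_category, list_total - k)
        for k in range(list_in_category, upper + 1)
    )
    return favourable / comb(array_total, list_total)


# Counts from the table: (genes on array in category, genes observed in the list).
table = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "p. ctrl. cell prol.": (100, 30),
    "glucose transp.": (50, 20),
}

for process, (on_array, observed) in table.items():
    expected = 100 * on_array / 1000
    p = enrichment_p_value(1000, on_array, 100, observed)
    print(f"{process}: expected {expected:.0f}, observed {observed}, p = {p:.3g}")
```

Mitosis and apoptosis occur exactly as often as expected by chance, so their p-values are large, while positive control of cell proliferation and glucose transport are strongly over-represented.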
Integrated Genomics Introduction Gene Ontology : GO Pathway Database : KEGG Pathway Mapping & Network building : MetaCore 56
Pathway Database : KEGG Kyoto Encyclopedia of Genes and Genomes http://www.genome.jp/kegg/ 57
KEGG: Kyoto Encyclopedia of Genes and Genomes. KEGG is a database of biological systems which contains: genetic building blocks of genes and proteins (KEGG GENES); chemical building blocks of both endogenous and exogenous substances (KEGG LIGAND); molecular wiring diagrams of interaction and reaction networks (KEGG PATHWAY); hierarchies and relationships of various biological objects (KEGG BRITE). KEGG provides a reference knowledge base for linking genomes to biological systems and also to environments by the processes of PATHWAY mapping and BRITE mapping. 58
KEGG database: Table of contents Total 19 databases subdivided into three categories 59
KEGG database: Identifier KEGG Objects 60
KEGG database: current statistics 61
Main entry sites http://www.genome.jp/kegg/ Main database entry sites 62
Search by key words Search by K or ko number 63
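KEGG identifiers follow a prefix-plus-five-digits convention, which makes it easy to tell what kind of object a search term refers to. The sketch below is an illustration under assumptions: the function `classify_kegg_id` is hypothetical, the prefix table covers only a handful of common object types, and any other two- to four-letter lowercase prefix is treated as an organism code.

```python
import re

# A few common KEGG object prefixes (not an exhaustive list).
OBJECT_PREFIXES = {
    "K": "KO (ortholog group)",
    "C": "compound",
    "R": "reaction",
    "D": "drug",
    "G": "glycan",
}
# Pathway map prefixes that denote reference (not organism-specific) maps.
REFERENCE_PATHWAY_PREFIXES = {"map", "ko", "ec", "rn"}


def classify_kegg_id(identifier):
    """Guess the KEGG object type from the identifier's prefix."""
    match = re.fullmatch(r"([KCRDG])(\d{5})", identifier)
    if match:
        return OBJECT_PREFIXES[match.group(1)]
    match = re.fullmatch(r"([a-z]{2,4})(\d{5})", identifier)
    if match:
        prefix = match.group(1)
        if prefix in REFERENCE_PATHWAY_PREFIXES:
            return f"reference pathway ({prefix})"
        # Assumption: any other short lowercase prefix is an organism code.
        return f"organism-specific pathway ({prefix})"
    return "unknown"


for example in ["K00001", "ko04110", "hsa04110", "CDK2"]:
    print(example, "->", classify_kegg_id(example))
```

A K number therefore names an ortholog group, while a ko number names a reference pathway map; a plain gene name such as CDK2 is not a KEGG identifier and has to go through the keyword or gene-name search instead.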
Drop-down List: Metabolism pathways Total 204 metabolic pathways 64
Pathway description Glycolysis; fatty acid oxidation From Lehninger Biochem 65
Pyruvate dehydrogenase Enzymes 66
Pyruvate dehydrogenase 67
Search pathway by gene name Search by Gene Name 68
Object ID CDK2 related pathways 69
Cell Cycle Pathway 70
Description BRITE PATHWAY Ortholog Module 71
Integration 72
DAVID Bioinformatics Resources DAVID web server : http://david.abcc.ncifcrf.gov/home.jsp 73
Analytic tools/modules in DAVID 74
DAVID analytic modules 75
Start analysis wizard Click Start Analysis from anywhere within the website 76
Submit gene list or use built-in demo gene lists 77
Select one of the DAVID Tools Gene List Manager Panel 78
Gene Name Batch Viewer: User's input gene IDs. Gene names translated by DAVID. Clicking on a gene name leads to more detailed information. RG means Related Genes search function. 79
Gene Functional Classification: Parameter panel. Gene functional groups are separated by the blue rows. A set of functions is provided in the blue row area for each group. Gene clusters identified by DAVID. User's gene IDs & names. 80
2D View of Gene Functional Classification: Green represents a positive association between a term and a gene. Blank represents a negative or no association between a term and a gene. 81
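The 2D view is essentially a binary gene-by-term matrix. A minimal sketch of building and printing one is shown here; the function `association_matrix` and the example genes and terms are hypothetical and do not reproduce DAVID's own implementation.

```python
def association_matrix(gene_to_terms):
    """Build a binary matrix: rows are genes, columns are terms,
    1 means the gene is annotated with the term, 0 means it is not."""
    genes = sorted(gene_to_terms)
    terms = sorted({term for ts in gene_to_terms.values() for term in ts})
    matrix = [
        [1 if term in gene_to_terms[gene] else 0 for term in terms]
        for gene in genes
    ]
    return genes, terms, matrix


# Hypothetical annotations, not real data.
example = {
    "geneA": {"kinase activity", "cell cycle"},
    "geneB": {"cell cycle"},
}

genes, terms, matrix = association_matrix(example)
print("terms:", terms)
for gene, row in zip(genes, matrix):
    # '#' plays the role of a green cell, '.' of a blank cell.
    print(gene, "".join("#" if cell else "." for cell in row))
```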
Select annotation category and run Functional Annotation Chart 82
Select annotation category and run Functional Annotation Chart: Parameter panel. Enriched annotation terms. Enrichment p-value. Clicking on a term name leads to details. Click on the blue bar to list all associated genes. Click on RT to list other related terms. Sort results by different columns. 83
Select annotation category and run Functional Clustering: Parameter panel. Annotation clusters identified by DAVID. Term clusters are separated by the blue rows. A set of functions is provided in the blue row area for each cluster. 84
Functional Table: Annotation categories. Annotation contents. Header for each gene. Each block separated by blue rows contains the contents for one gene. A set of hyperlinks leads to more detailed descriptions. 85
Databases and Tools
Databases:
Gene Ontology at http://www.geneontology.org/
KEGG pathway at http://www.genome.ad.jp/kegg/
BioCarta at http://www.biocarta.com/
Oncomine at http://www.oncomine.org/
PharmGKB at http://www.pharmgkb.org/
REACTOME at http://www.reactome.org/
Tools for gene classification, pathway and network analysis:
BABELOMICS at http://www.fatigo.org/
PANTHER at http://www.pantherdb.org/
GSEA at http://www.broad.mit.edu/gsea/
DAVID at http://david.abcc.ncifcrf.gov/
GenMapp at http://www.genmapp.org/
Cytoscape at http://www.cytoscape.org/
MetaCore at http://www.genego.com/
Ingenuity Pathway at http://www.ingenuity.com/ 87