Geert Geeven. April 14, 2010

Similar documents
Prokaryotic Regulation

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22)

Chapter 15 Active Reading Guide Regulation of Gene Expression

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Introduction to Bioinformatics

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Gene Regulation and Expression

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

Gene Control Mechanisms at Transcription and Translation Levels

Control of Gene Expression

BME 5742 Biosystems Modeling and Control

12-5 Gene Regulation

Topic 4 - #14 The Lactose Operon

Developmental Biology 3230 Midterm Exam 1 March 2006

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

Controlling Gene Expression

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

Translation and Operons

Regulation of gene Expression in Prokaryotes & Eukaryotes

Written Exam 15 December Course name: Introduction to Systems Biology Course no

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Prokaryotic Gene Expression (Learning Objectives)

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

CS-E5880 Modeling biological networks Gene regulatory networks

Introduction. Gene expression is the combined process of :

Genomics and bioinformatics summary. Finding genes -- computer searches

13.4 Gene Regulation and Expression

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Biology: Basics & Interesting Problems

Bi 1x Spring 2014: LacI Titration

Regulation and signaling. Overview. Control of gene expression. Cells need to regulate the amounts of different proteins they express, depending on

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

The Research Plan. Functional Genomics Research Stream. Transcription Factors. Tuning In Is A Good Idea

CHAPTER : Prokaryotic Genetics

Inferring Transcriptional Regulatory Networks from High-throughput Data

GCD3033:Cell Biology. Transcription

Honors Biology Reading Guide Chapter 11

Multiple Choice Review- Eukaryotic Gene Expression

More Protein Synthesis and a Model for Protein Transcription Error Rates

Eukaryotic Gene Expression

Control of Gene Expression in Prokaryotes

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00.

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Physiology 2 nd year. Neuroscience Optional Lecture

Network motifs in the transcriptional regulation network (of Escherichia coli):

A Simple Protein Synthesis Model

Name Period The Control of Gene Expression in Prokaryotes Notes

Neural development its all connected

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Discovering MultipleLevels of Regulatory Networks

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

GENE REGULATION AND PROBLEMS OF DEVELOPMENT

The Making of the Fittest: Evolving Switches, Evolving Bodies

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g.

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Chapter 20. Initiation of transcription. Eukaryotic transcription initiation

Multivariate point process models

Genome 541! Unit 4, lecture 3! Genomics assays

Inferring Protein-Signaling Networks

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Measuring TF-DNA interactions

Nucleus. The nucleus is a membrane bound organelle that store, protect and express most of the genetic information(dna) found in the cell.

SUPPLEMENTARY INFORMATION

AP Biology Gene Regulation and Development Review

Approximate inference for stochastic dynamics in large biological networks

Conclusions. The experimental studies presented in this thesis provide the first molecular insights

4. Why not make all enzymes all the time (even if not needed)? Enzyme synthesis uses a lot of energy.

Lecture 7: Simple genetic circuits I

Regulation of Gene Expression *

Cells. Steven McLoon Department of Neuroscience University of Minnesota

Chapter 11. Development: Differentiation and Determination

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005

PROTEIN SYNTHESIS INTRO

Regulation of Gene Expression

UNIT 5. Protein Synthesis 11/22/16

Bioinformatics 2 - Lecture 4

Predicting Protein Functions and Domain Interactions from Protein Interactions

Lecture 4: Transcription networks basic concepts

Prokaryotic Gene Expression (Learning Objectives)

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

The majority of cells in the nervous system arise during the embryonic and early post

Developmental Biology Lecture Outlines

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

Three types of RNA polymerase in eukaryotic nuclei

ESL Chap3. Some extensions of lasso

2. Mathematical descriptions. (i) the master equation (ii) Langevin theory. 3. Single cell measurements

Welcome to Class 21!

Chapter 17. From Gene to Protein. Biology Kevin Dees

Gene regulation II Biochemistry 302. February 27, 2006

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Identifying Signaling Pathways

Comparative analysis of RNA- Seq data with DESeq2

32 Gene regulation, continued Lecture Outline 11/21/05

56:198:582 Biological Networks Lecture 8

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics

Supplementary materials Quantitative assessment of ribosome drop-off in E. coli

Deciphering regulatory networks by promoter sequence analysis

Transcription:

iction of Gene Regulatory Interactions NDNS+ Workshop April 14, 2010

Today s talk - Outline Outline Biological Background Construction of Predictors The main aim of my project is to better understand the involvement of transcription factors (TFs) that govern spatio-temporal transcription of genes. The outline for today s talk is as follows Biological background - basic mechanism of transcriptional gene regulation. Regression models for gene expression and DNA sequence data. GEMULA (Gene Expression Modeling Using Lasso) Application of GEMULA on data from a biological model of neuronal regeneration.

DNA RNA Protein Outline Biological Background Construction of Predictors Production of a protein requires transcription of the corresponding gene, i.e. the prodcution of a mrna (messenger RNA) molecule which carries a message for the protein synthesizing apparatus of a cell. Figure: The biological processes of transcription and translation. Regulation of (the rate of) transcription is a fundamental mechanism by which cells accomplish differential expression of proteins.

Outline Biological Background Construction of Predictors Typical Structure of an Eukaryotic Gene The term gene is used to refer to the complete DNA sequence which is required for the production of a functional protein. Apart from coding sequences, genes contain regulatory DNA elements that are crucial for their proper function.

Gene Expression and Microarrays Outline Biological Background Construction of Predictors Gene expression profiling experiments involve measuring the relative amount of mrna expressed in two or more experimental conditions. DNA microarrays are widely used to quantify gene expression. Figure: Graphical representation of the experimental steps in a typical microarray experiment.

Outline Biological Background Construction of Predictors Modeling DNA Sequences of TFBSs Eac possible window of length 8 is scored according to a model and the score is compared to a threshold to predict putative TFBSs. S = TGTAGCTGACGTCAATGATGAAGGGTAGAATGACGTAAC In this case S contains 2 TFBSs for the TF CREB.

Outline Biological Background Construction of Predictors Construct Genome-wide Predictors of Gene Expression Given the regulatory DNA sequences of all known genes ((n)) and a set of TFBS models (p), putative binding of TFs to gene sequences can be assesed genome-wide. The result is a matrix [X 1 X p] of dimension n p containing potential predictors that may explain observed variation in gene expression.

Introduction GEMULA Modeling Gene Expression Using Regression Suppose we observe a response vector Y = (Y 1,..., Y n ) that represens gene expression for a set of n genes. Additionally, let a set of p predictor variables X 1,..., X p which are potentially biologically related to Y be given. We assume that Y and X 1,..., X p are related through the following regression model Y = Xβ + ɛ, (1) where X = [1 f 1 (X 1,..., X s ) f d (X 1,..., X s )], is an unknown n (d + 1) design matrix, β = (β 0,..., β d ) is an unknown vector of regression parameters and ɛ = (ɛ 1,..., ɛ n ) N (0, σ 2 I n ).

Introduction GEMULA Linear Models and Model Selection How can we model interactions between predictors using simple linear models? And how to perform model selection? Solution: GEMULA (Gene Expression Modeling Using Lasso). GEMULA selects candidate models M q, q = 1,..., Q restricted to models sub-spaces M q constrained by parameters γ q = (γ q1, γ q2, γ q3), where γ q1 represents the maximum allowed order of interaction between terms, γ q2 is the maximum allowed power to which a candidate predictors is raised and γ q3 is the maximum number of terms allowed in the model. GEMULA uses the lasso for model selection.

The Lasso - An Example Introduction GEMULA Suppose we observe a repsonse Y = (Y 1,..., Y n ) and additionally let predictor variables X 1,..., X 19 be given. We assume that Y = X M β M + ɛ, for X M = [1 X 1 X 19 ]. For a given shrinkage parameter t R +, lasso estimates of β M minimize min β M n Y i β M0 i=1 d M j=1 2 β Mj X ij subject to d M j=1 β Mj t, (2) A fast algorithm called lars is available to solve this problem. It finds all solutions in a small number of steps.

The Lasso - An Example Introduction GEMULA Figure: Example of a lasso path. At each step of the lars algorithm, one predictor enters the active set.

GEMULA: Step I Introduction GEMULA Let M 0 represent the model for which the design matrix satisfies X M0 = [1 X 1 X p ]. Since at each step k, exactly one predictor enters the active set BM k = {j : βk Mj 0}, GEMULA uses the mapping r(j) = min {k : j B k M 0 }, j {1,..., p}, and its inverse r 1 defined by r 1 (s) = j r(j) = s j {1,..., p}, s {1,..., K} to define the order in which predictors enter the model. Now, e.g. when γ 1 = (1, 1, 50), the sub-space M γ1 consists of all possible regression models that contain any subset of main effects for the first 50 predictors and GEMULA uses the lasso to select a model using the design matrix X γ1 = [1 X r 1 (1) X r 1 (50)].

GEMULA: Step II Introduction GEMULA Within each sub-space M q determined by a parameter γ q, GEMULA selects a model using lasso. When interactions between predictors are considered, the restrictions on the maximum number of allowed terms imposed by γ q3 force GEMULA to limit the number of predictors in the following way. Suppose we set γ 2 = (2, 1, 150), then GEMULA first determines s = max{s {1,..., p} : s + s(s 1)/2 150}, and then X γ2 denotes the design matrix that contains all main effects and possible interactions between the predictors X r 1 (1),..., X r 1 (s ). For each matrix X γq, we fit the entire path of lasso solutions and select the optimal lasso-parameter according to the AIC model selection criterion.

GEMULA: Step III Introduction GEMULA GEMULA uses V -fold cross-validation to evaluate the fit of the Q selected candidate models. The R 2 -statistics is used as goodness-of-fit measure, as its has an intuitive and biologically meaningful interpretation. For a model M q with fitted response values Ŷ Mq, it is given by R 2 (M q ) = 1 n i=1 (Y i Ŷ Mq ) 2 n i=1 (Y i Ȳ )2.

Axonal Regeneration After Injury Models of Neuronal Regeneration Results Neurons in the peripheral nervous system (PNS) successfully regenerate following axonal injury. Neurons in the central nervous system (CNS) generally do not. Combining different experimental and computational modeling approaches, we want to gain insight into the transcriptional network that underlies this difference.

The Dorsal Root Ganglion Models of Neuronal Regeneration Results Neurons in the DRG have branches extending both into the PNS and CNS. This provides an excellent model to study the dramatic differences in regenerative capacity between PNS and CNS neurons.

Models of Neuronal Regeneration Results Gene Expression Changes in F11 Cells in Response to Forskolin Stimulation The intrinsic potential of neurons to regrow damaged nerve fibers after an injury depends in part on their ability to initiate a growth promoting gene expression program. Coordinated expression of regeneration associated genes is believed to be governed by interactions between TFs and target genes. The F11 cell line is a fusion product of mouse neuroblastoma cells with embryonic rat DRG neurons. Upon stimulation with Forskolin, F11 cells acquire a neuronal phenotype which results in the outgrowth of neurites. F11 cells are easy to culture and transfect and provide a good in vitro model for neuronal regeneration in vivo. Since cultured F11 cells are a unicellular system, gene expression changes are more homogeneous and less complex than gene expression from in vivo samples of neuronal tissue where cells are in a complex and heterogeneous cellular environment.

Models of Neuronal Regeneration Results Clustering Genes Based on Expression Profiles We distinguish between genes whose expression profile show either early or late response to Forskolin stimulation.

Models Fitted With GEMULA Models of Neuronal Regeneration Results Early responsive genes Late responsive genes Time Model type Model P Model T R 2 cv Model P Model T R 2 cv 2h M1 30 30 0.14 8 8-0.00 2h M2 31 72 0.22 36 53 0.06 2h M3 14 47 0.25 16 27 0.03 4h M1 16 16 0.08 0 0 0 4h M2 31 75 0.14 36 53 0.03 4h M3 14 39 0.07 16 16 0.03 24h M1 50 50 0.01 39 39 0.25 24h M2 31 80 0.11 36 63 0.24 24h M3 14 20 0.02 15 27 0.23 48h M1 5 5-0.01 44 44 0.25 48h M2 31 85 0.11 35 52 0.27 48h M3 14 60 0.04 16 37 0.24 Table: Comparison of models fitted using GEMULA for early and late Forskolin responsive genes in F11 cells at all four time-points. Columns 3-5 correspond to models fitted for the early responsive genes and columns 6-8 to models for the late responsive genes.

Models of Neuronal Regeneration Results Predictors Associated to Gene Expression Changes TFBS ID Motif logo Time Activity V.AP1.Q4.01 Early Activator V.AP1.Q4.01 Late Activator V.AREB6.02 Late Activator V.CREB.Q4.01 Early Activator V.CEBPDELTA.Q6 Early Activator V.CETS1P54.02 Early Repressor V.CETS1P54.02 Late Repressor V.E2F.Q6.01 Late Repressor V.EBF.Q6 Late Activator V.PPARA.01 Early Repressor V.PPARA.01 Late Activator Table: TFBS motif logos of predictors present in models selected by GEMULA.

Models of Neuronal Regeneration Results Experimental Validation of Predictions The functional role of PPARα and PPARγ was studied in F11 cultured cells in vitro. It turned out that PPARγ, and not PPARα promotes neurite outgrowth in F11 cells and that knock-down of PPARγ significantly decreases neurite outgrowth.

Possible Future Improvements Models of Neuronal Regeneration Results In the future, as more and higher quality data becomes available and we better understand the different molecular mechanisms of gene regulation, possible improvements to the model may include (Even) better suitable TFBS predictors more focused at DNA elements that are most likely to be functional. For instance, how are absolute and relative location and conservation of DNA elements related to functionality? Inclusion of in vivo TF binding-assay data (ChIP-chip or ChIP-seq). Inclusion of predictors corresponding to other determinants of transcription rates such as DNA methylation status and nucleosome positions. Use of more accurate (absolute) measurements of gene transcription, e.g. RNA-seq data instead of DNA microarray data.

Acknowledgements Models of Neuronal Regeneration Results Mathisca de Gunst Department of Mathematics, Stochastics Section VU University - Amsterdam Harold McGillavry and Ronald van Kesteren Department of Molecular and Cellular Neurobiology VU University - Neuroscience Campus - Amsterdam