Discovering Multiple Levels of Regulatory Networks

Discovering Multiple Levels of Regulatory Networks
IAS EXTENDED WORKSHOP ON GENOMES, CELLS, AND MATHEMATICS
Hong Kong, July 25, 2018
Gary D. Stormo, Department of Genetics

Outline of the talk
1. Transcriptional Regulation
   Specificity of Transcription Factors (TFs)
     Models and methods of determination
     Relationship to epigenetics
     Uses: network modeling and interpreting genomic variants
   Cooperativity between TFs
     Measurements, effects on specificity
     Making cis-regulatory elements/modules
2. Post-Transcriptional Regulation
   Opportunities for regulation
   Mechanisms
   Feedback loops

[Figures: structure of the EGR1-DNA complex; specificity logo of the EGR1-DNA interaction]

Representing TF Specificity with a Position Weight Matrix (PWM) Model (aka: Weight Matrix, PSSM)

Position:   1    2    3    4    5    6
A:         -8   10   -1    2    1   -8
C:        -10   -9   -3   -2   -1  -12
G:         -7   -9   -1   -1   -4   -9
T:         10   -6    9    0   -1   11

PWM Model: the score of a six-base window is the sum of the matrix entries for the base found at each position.

Window CTATAA in ...a[CTATAA]tg...: Score = -10 - 6 - 1 + 0 + 1 - 8 = -24

Window TATAAT in ...ac[TATAAT]gt...: Score = 10 + 10 + 9 + 2 + 1 + 11 = 43
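The scoring shown on these slides can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the matrix values are those on the slide, while the function names and the example sequence `actataatgt` (pieced together from the letters shown around the two windows) are assumptions made for the sketch.

```python
# Position weight matrix from the slide; list index = motif position 1..6.
PWM = {
    "A": [-8, 10, -1, 2, 1, -8],
    "C": [-10, -9, -3, -2, -1, -12],
    "G": [-7, -9, -1, -1, -4, -9],
    "T": [10, -6, 9, 0, -1, 11],
}
MOTIF_LEN = 6


def score_window(window):
    """Sum the matrix entry for the base observed at each motif position."""
    window = window.upper()
    if len(window) != MOTIF_LEN:
        raise ValueError("window must be exactly %d bases long" % MOTIF_LEN)
    return sum(PWM[base][pos] for pos, base in enumerate(window))


def scan(sequence):
    """Score every window of the sequence; returns (start, window, score) tuples."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - MOTIF_LEN + 1):
        window = seq[start:start + MOTIF_LEN]
        hits.append((start, window, score_window(window)))
    return hits


def best_site(sequence):
    """Return the highest-scoring window in the sequence."""
    return max(scan(sequence), key=lambda hit: hit[2])


if __name__ == "__main__":
    for start, window, score in scan("actataatgt"):  # hypothetical example sequence
        print(start, window, score)
```

Sliding the window by one base moves the score from -24 (CTATAA) to 43 (TATAAT), which is the contrast the two slides illustrate.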

PWM is a linear model: $\mathrm{Score}(S_i \mid W) = W \cdot S_i$
  $S_i$ encodes the sequence (which base occurs at each position)
  $W$ weights those encoded features to provide the score
  Easy to add more features if they are necessary

George Box: All models are wrong. Some models are useful.

Parameter estimation
Various methods for determining parameters:
  Discriminant learning
  Probabilistic modeling (i.e., log-odds)
    Basis of most motif discovery algorithms
  Regression on quantitative data
    Binding energy models
Stormo (2013) Quantitative Biology 1:115-130

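Probabilistic modeling based on known sites
  Counts from aligned sites: $N(b,i)$
  Frequencies, the PFM (PPM, PWM): $F(b,i)$
  Log-odds weights, the PWM (PSSM): $W(b,i) = \log\left[F(b,i)/p(b)\right]$
  Information content at position $i$: $I(i) = \sum_b F(b,i)\,W(b,i)$
  Motif discovery by finding sites with max I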
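The step from counts to frequencies to log-odds weights to information content can be sketched as follows. This is an illustrative sketch under stated assumptions: the pseudocount, the uniform background, and the use of log base 2 (so that information is in bits) are choices made here, not details given on the slide, and the function name is hypothetical.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, background=None, pseudocount=0.5):
    """Turn a count matrix N(b,i) into F(b,i), W(b,i) and I(i).

    counts: dict mapping each base to a list of counts, one per position.
    background: dict of background base probabilities p(b); uniform if None.
    pseudocount: added to every count so no frequency is zero (an assumption).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(counts["A"])
    freqs = {b: [] for b in BASES}
    weights = {b: [] for b in BASES}
    info = []
    for i in range(length):
        total = sum(counts[b][i] + pseudocount for b in BASES)
        column_info = 0.0
        for b in BASES:
            f = (counts[b][i] + pseudocount) / total   # F(b,i)
            w = math.log2(f / background[b])           # W(b,i), log-odds in bits
            freqs[b].append(f)
            weights[b].append(w)
            column_info += f * w                       # contributes to I(i)
        info.append(column_info)
    return freqs, weights, info


if __name__ == "__main__":
    # Hypothetical counts from 10 aligned sites, two positions.
    example = {"A": [10, 3], "C": [0, 2], "G": [0, 3], "T": [0, 2]}
    F, W, I = pwm_from_counts(example)
    print([round(x, 3) for x in I])
```

A column dominated by one base has high information content, while a column close to the background composition contributes almost nothing.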

Classic Logo (from Tom Schneider):
  Height of the column at each position is the information content
  Each base is drawn in proportion to its frequency

Binding probabilities depend on the protein concentration.
Positions are normalized independently, leading to apparent non-independence and mis-ordering of probabilities.
Biophysical (energy) models are preferred.

Measuring Specificity (vs. Affinity)

Modeling Specificity from high-throughput methods
Stormo and Zhao, Nature Reviews Genetics, 2010

Diverse sets: >100 TFs ~20 TFs ~240 TFs Weirauch et al >1000 TFs

Uses Expectation Maximization (EM) to simultaneously infer the binding site on each sequence and the parameters of the model (PWM).
Outperforms all other algorithms on in vitro data; comparable on in vivo data (ChIP-seq).
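The general idea of inferring site positions and matrix parameters together with EM can be sketched as below. This is a generic textbook-style EM that assumes exactly one site per sequence, on the forward strand only, with a uniform background; it is not the algorithm referred to on the slide, whose details are not given here. The function names, the seed-based initialization and the example sequences are all assumptions for illustration.

```python
BASES = "ACGT"


def em_motif(seqs, seed, iterations=20, pseudocount=0.1, background=0.25):
    """EM for a position probability matrix (PPM), one site per sequence.

    seqs: list of uppercase DNA strings, each at least len(seed) long.
    seed: k-mer used to initialize the PPM (an assumed initialization scheme).
    """
    width = len(seed)
    # Soft version of the seed: 0.7 on the seed base, 0.1 on each other base.
    ppm = [{b: (0.7 if b == seed[i] else 0.1) for b in BASES} for i in range(width)]
    posteriors = []
    for _ in range(iterations):
        # E-step: posterior probability of each start position in each sequence.
        posteriors = []
        for s in seqs:
            likes = []
            for start in range(len(s) - width + 1):
                ratio = 1.0
                for i in range(width):
                    ratio *= ppm[i][s[start + i]] / background
                likes.append(ratio)
            total = sum(likes)
            posteriors.append([x / total for x in likes])
        # M-step: re-estimate the PPM from posterior-weighted base counts.
        counts = [{b: pseudocount for b in BASES} for _ in range(width)]
        for s, post in zip(seqs, posteriors):
            for start, p in enumerate(post):
                for i in range(width):
                    counts[i][s[start + i]] += p
        ppm = [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]
    return ppm, posteriors


def best_starts(posteriors):
    """Most probable site start in each sequence."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


if __name__ == "__main__":
    # Hypothetical sequences, each containing one TATAAT site.
    seqs = ["GGCTATAATGCC", "CTATAATGGACG", "ACGGCATATAAT"]
    ppm, post = em_motif(seqs, seed="TATAAT")
    print(best_starts(post))
    print("".join(max(col, key=col.get) for col in ppm))
```

The E-step never commits to a single site per sequence; every window contributes to the re-estimated matrix in proportion to its posterior probability, which is what lets the sites and the model be inferred simultaneously.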

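HT-SELEX (SELEX-Seq)

$$\frac{P(S_i \mid b)}{P(S_i)} \propto \frac{1}{1 + e^{E_i - \mu}}$$

Compared to a reference sequence with $E = 0$:

$$\frac{P(S_i \mid b)}{P(S_i)} = \frac{P(S_{ref} \mid b)}{P(S_{ref})} \cdot \frac{1 + e^{-\mu}}{1 + e^{E_i - \mu}}$$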
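A short numerical sketch makes the role of $\mu$ in this enrichment expression concrete. It assumes energies and $\mu$ are expressed in units of kT, as the bare exponentials on the slide imply; the function names are hypothetical.

```python
import math


def occupancy(energy, mu):
    """Probability that a sequence with binding energy `energy` is bound."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def enrichment_vs_reference(energy, mu):
    """Enrichment of a sequence relative to the reference sequence (E = 0)."""
    return occupancy(energy, mu) / occupancy(0.0, mu)


if __name__ == "__main__":
    energy = 2.0  # hypothetical sequence, 2 kT weaker than the reference
    for mu in (-10.0, 0.0, 10.0):
        print(mu, round(enrichment_vs_reference(energy, mu), 4))
```

When $\mu$ is very negative (low protein concentration) the enrichment approaches $e^{-E_i}$, so it reports the energy directly. When $\mu$ is large the protein saturates all sequences and the enrichment approaches 1, so differences in energy become hard to see.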

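Spec-seq (specificity by sequencing)

$$\frac{P(S_i \mid b)}{P(S_i \mid u)} = e^{\mu - E_i}$$

Compared to a reference sequence with $E = 0$:

$$\frac{P(S_{ref} \mid b)}{P(S_{ref} \mid u)} \cdot \frac{P(S_i \mid u)}{P(S_i \mid b)} = e^{E_i}$$

$$\ln\left[\frac{P(S_{ref} \mid b)}{P(S_{ref} \mid u)} \cdot \frac{P(S_i \mid u)}{P(S_i \mid b)}\right] = E_i$$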

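Spec-seq: Specificity by sequencing

$$P + S_i \rightleftharpoons P \cdot S_i \qquad K_A(S_i) = \frac{[P \cdot S_i]}{[P]\,[S_i]}$$

$$K_A(S_1) : K_A(S_2) : \dots : K_A(S_n) = \frac{[P \cdot S_1]}{[S_1]} : \frac{[P \cdot S_2]}{[S_2]} : \dots : \frac{[P \cdot S_n]}{[S_n]}$$

Zuo and Stormo, Genetics, 2014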
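Because the free protein concentration is the same for every sequence in the reaction, relative affinities follow from the bound-to-unbound ratio of each sequence alone. A minimal sketch, assuming read counts in the bound and unbound fractions are proportional to the corresponding concentrations and that energies are in units of kT; the function name and the counts are made up for illustration.

```python
import math


def relative_energies(bound_counts, unbound_counts, reference):
    """Binding energy of each sequence relative to the reference (E_ref = 0).

    bound_counts, unbound_counts: dicts mapping sequence -> read count in the
    bound and unbound fractions (assumed proportional to concentrations).
    """
    ref_ratio = bound_counts[reference] / unbound_counts[reference]
    energies = {}
    for seq in bound_counts:
        ratio = bound_counts[seq] / unbound_counts[seq]
        # E_i = ln[(bound/unbound of reference) / (bound/unbound of S_i)]
        energies[seq] = math.log(ref_ratio / ratio)
    return energies


if __name__ == "__main__":
    # Hypothetical read counts for three sequence variants.
    bound = {"TATAAT": 1000, "TATGAT": 100, "TACAAT": 400}
    unbound = {"TATAAT": 500, "TATGAT": 500, "TACAAT": 500}
    for seq, energy in relative_energies(bound, unbound, "TATAAT").items():
        print(seq, round(energy, 3))
```

A positive energy means weaker binding than the reference; the ratio of association constants for two sequences is the exponential of their energy difference.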

Can easily measure effects of methylation: M = mC; W = mC on the opposite strand.
Zuo et al., Science Advances, 2017

How well do binding models predict regulatory sites (and networks) in cells?
  In bacteria, pretty well
  In eukaryotes, quite poorly
What information is missing?
  Only a small fraction of the genome is accessible (available for interactions with TFs)
    With that added information, prediction is much improved
  DNA methylation can inhibit or promote binding
    Not sequence alone, but epigenetic marks
  Cooperativity between TFs can be very important
    Can also lead to latent specificity

DNase hypersensitivity + a catalog of PWMs can give rise to a gene regulatory network (GRN).
Neph et al., Cell, 2012

Weinhold et al, Nat Gen 2014

Measuring Cooperativity
Previous method: determine cooperativity factors of Sox-Oct binding using fluorescently labeled DNA targets.
  Oct4 with 11 different Sox TFs
  ~10 different sequences, each in a separate gel lane

Coop-seq for combinatorial binding can get all of the important parameters in one experiment, including cooperativity.

Calculating cooperativity for sequence $S_i$:

$$\omega(S_i) = \frac{K_{AB}(S_i)}{K_A(S_i)\,K_B(S_i)}$$

($K_A$, $K_B$: association constants of each protein alone; $K_{AB}$: association constant for both proteins bound together.)

  $\omega = 1$: no cooperativity.
  $\omega < 1$: anti-cooperativity (if $\omega = 0$, the two proteins are mutually exclusive).
  $\omega > 1$: binding of the second protein is facilitated by binding of the first.

Stormo, Zuo, Chang, Briefings in Functional Genomics, 2015
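The cooperativity factor can be computed from the four species seen for one sequence in a single binding reaction. The sketch below assumes the standard definition of the factor as the association constant of the doubly bound complex divided by the product of the two single-protein constants, which reduces to [both bound][unbound] / ([only A bound][only B bound]); it also assumes read counts are proportional to concentrations. The function names, the tolerance and the counts are hypothetical.

```python
def cooperativity_factor(both_bound, only_a, only_b, unbound):
    """Cooperativity factor for one sequence from its four measured fractions.

    Assumed definition: omega = [A.B.S][S] / ([A.S][B.S]); the free protein
    concentrations cancel, so only the DNA species are needed.
    """
    if only_a <= 0 or only_b <= 0:
        raise ValueError("singly bound fractions must be positive")
    return (both_bound * unbound) / (only_a * only_b)


def interpret(omega, tolerance=0.05):
    """Map omega onto the categories listed on the slide."""
    if omega == 0:
        return "mutually exclusive"
    if abs(omega - 1.0) <= tolerance:
        return "no cooperativity"
    return "cooperative" if omega > 1.0 else "anti-cooperative"


if __name__ == "__main__":
    # Hypothetical read counts: (both bound, only A, only B, unbound).
    for counts in [(200, 100, 100, 500), (20, 100, 100, 500), (5, 100, 100, 500)]:
        omega = cooperativity_factor(*counts)
        print(counts, omega, interpret(omega))
```

Because the calculation uses only ratios within one sequence, it can be repeated for every sequence variant in a pooled library, which is what allows hundreds of sequences to be profiled in parallel.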

Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq.
Chang et al., NAR, 2017 (collaboration with the Ralf Jauch lab)

Protein pairs tested in China
  Sox2 with POU proteins: Oct4, Brn4, Oct6, Brn2
  Oct4 with Sox proteins: Sox2, Sox17, Sox17EK, Sox5, Sox15, Sox18

[Figure: Sox family-Oct4 cooperativity energy. y-axis: -Omega energy; x-axis: spacer. Series: Sox5-Oct4, Sox15-Oct4, Sox17-Oct4, Sox17EK-Oct4, Sox18-Oct4, Sox2-Oct4]

Hu et al., J Mol Biol, 2017
  Sites that bind Pax6 and Sox2 cooperatively prefer a non-consensus sequence for the Pax6 site
  Corresponds to sites observed in vivo with ChIP-seq
  Regulates neuronal developmental genes

DNA-dependent formation of transcription factor pairs alters their binding specificity.
Jolma et al., Nature, 2015
  Surveyed ~9400 pairs of TFs
  Found 315 (~3%) with significant co-motifs
  Many have alterations to the consensus sites for the TFs

Summary of Part 1
  Good methods/models exist for TF specificity and cooperativity, including epigenetic effects
  Combined with additional data (e.g., accessibility, chromatin architecture, expression changes), these can lead to network inference and causal modeling
Open problems
  More data needed (comprehensive list of motifs; which TFs are cooperative; effects of epigenetics)
  How are epigenetic marks established? Causal, cooperative, feedback?
  Small differences in energy can lead to larger than expected differences in regulation. Kinetic proofreading? Other mechanisms?

Overall Process of Gene Expression
DNA → (transcription) → RNA → (translation) → Protein
  Regulation can happen at each step
  The term "gene regulatory network" often refers only to transcriptional regulation, which misses much (>50%?) of the total network

Overall Process of Gene Expression
DNA → (transcription) → RNA → (translation) → Protein
Pre-transcriptional regulation:
  Chromatin modifications
  Epigenetics
  Nuclear architecture
Transcriptional initiation regulation:
  TF binding: activators/repressors
  Cis-regulatory modules
  Looping, insulators
  Signal transduction
  TF modifications
Result: on/off (up/down)

Overall Process of Gene Expression
DNA → (transcription) → RNA → (translation) → Protein
Post-transcriptional regulation:
  Modulation of translation (on/off; up/down)
  Splicing/alternative splicing
  Modulation of termination and stability
  Localization
  RNA editing
  Frame-shifting

Mechanisms of mRNA regulation:
  RNA-RNA interactions (miRNAs, lncRNAs, ...)
  Protein-RNA interactions (many examples, such as splicing factors, translational inhibitors, editing enzymes)
  RNA self-regulation (e.g., riboswitches)
These often involve mRNA secondary structures, which makes motif discovery much harder.

A compendium of RNA-binding motifs for decoding gene regulation
Ray et al., Nature 499, 172-177 (11 July 2013)
"Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes."

Prokaryotic riboswitch motifs
  Widely found in prokaryotic genomes
  Unique RNA structures
  Their structures are conserved across species
http://www.yale.edu/breaker

Prokaryotic mRNA regulatory motifs
Example of a riboswitch regulation mechanism:
  Case 1: Metabolite is limited. Transcription is completed.
  Case 2: Metabolite is abundant. Attenuator: transcription is terminated.
[Diagram labels: regions 1-4, UUUUU, AUG, ORF, attenuator]
http://www.yale.edu/breake

Examples of post-transcriptional autoregulation
  Ribosomal proteins. Primary target: rRNA; secondary: own mRNA
  tRNA synthetases. Primary target: tRNA; secondary: own mRNA
  Translation initiation and release factors
  Some splicing factors
  In polycistronic mRNAs, translational coupling

Russell Betney, Eric de Silva, Jawahar Krishnan, et al. RNA 2010 16: 655-663

[Figure: Gene X; activity (high/low) versus expression (high/low)]

Summary of post-transcriptional regulation part
  Many opportunities (steps) for regulation
  Structural motifs are harder to find than sequence motifs
  Phylogenetic conservation can be a key to finding structural motifs
  Autoregulation can provide efficient control of expression levels
  Autoregulation is easy to evolve: an mRNA evolves to bind its own protein, so that it is inactivated when sufficient protein exists
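The claim that autoregulation helps control expression levels can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration and does not come from the talk: protein is produced at a constant rate that is divided by (1 + protein / k_half) when the protein represses its own mRNA, it is removed by first-order degradation, and the equation is integrated with a simple Euler step. Parameter values are arbitrary.

```python
def simulate(synthesis_rate, autoregulated, k_half=1.0, degradation=1.0,
             dt=0.01, t_end=50.0):
    """Integrate dP/dt = production - degradation * P and return the final P.

    With autoregulation, production = synthesis_rate / (1 + P / k_half),
    a hypothetical repression term; without it, production is constant.
    """
    protein = 0.0
    for _ in range(int(t_end / dt)):
        production = synthesis_rate
        if autoregulated:
            production = synthesis_rate / (1.0 + protein / k_half)
        protein += dt * (production - degradation * protein)
    return protein


def fold_change(autoregulated, low=10.0, high=20.0):
    """Change in steady-state protein when the synthesis rate is doubled."""
    return simulate(high, autoregulated) / simulate(low, autoregulated)


if __name__ == "__main__":
    print("unregulated fold change:  ", round(fold_change(False), 3))
    print("autoregulated fold change:", round(fold_change(True), 3))
```

In this toy model, doubling the synthesis rate doubles the unregulated protein level but raises the autoregulated level by less than a factor of two, so the feedback buffers the protein level against changes in mRNA supply.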

Strategy to identify autoregulated genes
  Use a collection of yeast strains with GFP fused to the protein (native expression of the protein)
  Make inducible plasmids with CRE-less versions of candidate genes
[Diagram labels: Gene-GFP, GFP fusion protein; P3, CRE-less gene, mCherry]

Upon induction, the mCherry version is made.
  Without autoregulation, both proteins accumulate in cells.
  With autoregulation, the mCherry protein turns off expression of the GFP protein.
[Diagram labels: Gene-GFP, GFP fusion protein; P3, CRE-less gene, mCherry, mCherry fusion protein]

Positive control, known to be autoregulated: PDC1 (pyruvate decarboxylase isozyme) [images: red channel, green channel]

Negative: RPS28A [images: red channel, green channel]

New positive: RPL1b [images: red channel, green channel]

Acknowledgements
Stormo Lab: Kenny Chang, Manishi Pandey, Shuxiang Ruan, Zheng Zuo, David Granas, Basab Roy, Jonathan Cher
Collaborators: Josh Swamidass (BME, Wash U), Ralf Jauch (Guangzhou Institutes; now at Univ of Hong Kong)
Funding: NIH