Deciphering regulatory networks by promoter sequence analysis

Similar documents
Intro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye!

XIII International PhD Workshop OWD 2011, October 2011

Discovering MultipleLevels of Regulatory Networks

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00.

Introduction to Bioinformatics

Measuring TF-DNA interactions

Similarity of position frequency matrices for transcription factor binding sites

Alignment. Peak Detection

Chapter 15 Active Reading Guide Regulation of Gene Expression

Computational methods for predicting protein-protein interactions

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Example of Function Prediction

Graph Alignment and Biological Networks

Chapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta

Transcrip:on factor binding mo:fs

Geert Geeven. April 14, 2010

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Bioinformatics. Dept. of Computational Biology & Bioinformatics

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Introduction to Bioinformatics Online Course: IBT

Procedure to Create NCBI KOGS

Modeling Motifs Collecting Data (Measuring and Modeling Specificity of Protein-DNA Interactions)

De novo identification of motifs in one species. Modified from Serafim Batzoglou s lecture notes

Ch. 9 Multiple Sequence Alignment (MSA)

Transcription Regulation And Gene Expression in Eukaryotes UPSTREAM TRANSCRIPTION FACTORS

Computation-Based Discovery of Cis-Regulatory. Modules by Hidden Markov Model

Computational Genomics. Uses of evolutionary theory

Quantitative Bioinformatics

Transcrip)on Regula)on And Gene Expression in Eukaryotes Cycle G2 (lecture 13709) FS 2014 P Ma?hias & RG Clerc

Quantitative Measurement of Genome-wide Protein Domain Co-occurrence of Transcription Factors

Kernels for gene regulatory regions

Whole Genome Alignments and Synteny Maps

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

RNA Synthesis and Processing

Similarity Analysis between Transcription Factor Binding Sites by Bayesian Hypothesis Test *

Multivariate point process models

Network Biology-part II

A Database of human biological pathways

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

SUPPLEMENTARY INFORMATION

Synteny Portal Documentation

Few selected Human Transcription Factors Sequence Analysis and their Phylogenetic Relationship

QB LECTURE #4: Motif Finding

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

identifiers matched to homologous genes. Probeset annotation files for each array platform were used to

Bioinformatics Exercises

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

Position-specific scoring matrices (PSSM)

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Chapter 7: Regulatory Networks

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

BLAST. Varieties of BLAST

PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny

Genome 541 Introduction to Computational Molecular Biology. Max Libbrecht

Comparative Network Analysis

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Inferring Protein-Signaling Networks

Whole Genome Human/Mouse Phylogenetic Footprinting of Potential Transcription Regulatory Signals. E. Cheremushkin, A. Kel

Fundamentally different strategies for transcriptional regulation are revealed by information-theoretical analysis of binding motifs

Practical considerations of working with sequencing data

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

12-5 Gene Regulation

CSCE555 Bioinformatics. Protein Function Annotation

Bioinformatics Chapter 1. Introduction

Transcription factors (TFs) regulate genes by binding to their

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Towards Reverse Engineering of Genetic Regulatory Networks

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Gene Regulation and Expression

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Transcription Factor Binding Site Positioning in Yeast: Proximal Promoter Motifs Characterize TATA-Less Promoters

Clustering and Network

Identifying Signaling Pathways

S1 Gene ontology (GO) analysis of the network alignment results

How much non-coding DNA do eukaryotes require?

Supplementary Information

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Three types of RNA polymerase in eukaryotic nuclei

Latent Variable models for GWAs

Comparing transcription factor regulatory networks of human cell types. The Protein Network Workshop June 8 12, 2015

BIOINFORMATICS LAB AP BIOLOGY

Whole-genome analysis of GCN4 binding in S.cerevisiae

Regulation of Transcription in Eukaryotes. Nelson Saibo

Comparative genomics: Overview & Tools + MUMmer algorithm

Sequence motif analysis

Genome 541! Unit 4, lecture 3! Genomics assays

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Different gene regulation strategies revealed by analysis of binding motifs

Peter Pristas. Gene regulation in eukaryotes

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Transcription:

Bioinformatics Workshop 2009 Interpreting Gene Lists from -omics Studies Deciphering regulatory networks by promoter sequence analysis Elodie Portales-Casamar University of British Columbia www.cisreg.ca Bioinformatics Workshop - Interpreting Gene Lists from -omics Studies 1 Module #: Title of Module Bioinformatics Workshop - Interpreting Gene Lists from -omics Studies 2

Overview Part 1: Overview of transcription Lab 1: Promoters in Genome Browser (UCSC and PAZAR) Part 2: Prediction of transcription factor binding sites using binding profiles ( Discrimination ) Lab 2: TFBS scan (ORCAtk) Part 3: Interrogation of sets of co-expressed genes to identify mediating transcription factors Lab 3: TFBS Over-Representation (opossum) 3 Restrictions in Coverage Focus on Eukaryotic cells and PolII Promoters Principles apply to prokaryotes Will provide suggestions for similar tools for other species as requested Many of the examples drawn from the Wasserman lab s work there are equivalent tools 4

Part 1 Introduction to transcription in eukaryotic cells 5 Complexity in Transcription Chromatin Distal enhancer Proximal enhancer Core Promoter Distal enhancer 6

Studying gene expression at the bench EMSA DNase I footprinting ChIP- chip SELEX experiment Gene reporter assay Expensive and Time-Consuming!!! http://w ww.chiponchip.org/ http://w w w.abcam.com http://w w w.hku.hk http://dukehealth1.org http://opbs.okstate.edu 7 PAZAR and UCSC 8

Part 2 Prediction of TF Binding Sites Teaching a computer to find TFBS 9 TF Binding Profile Aligned binding sites TCACTATGATTCAGCAACAAA TCACAGTGAGTCGGCAAAATT TCATGCTGACTCAGCGGATCG CAACCATGACACAGCATAAAA CAGGCATGACATTGCATTTTT TAATGGTGACAAAGCAACTTT GGAGCATGACCCAGCAGAAGG CTGGGATGACATAGCATTCAT TCAGAATGACAAAGCAGAAAT TCACCGTTACTCAGCACTTTG AGGTGGTGATGTTGCATCACA CCAGGATGACTTAGCAAAAAC AGCCTGTGACTGGGCCGGGGC AGACAATGACTAAGCAGAAAT TCCCCGTGACTCAGCGCTTTG TCAGCATGACTCAGCAGTCGC CCTCCATGACAAAGCACTTTT AGCGGGTGACCAAGCCCTCAA TCAGGGTGACTCAGCAGCTTG TCTGTGTGACTCAGCTTTGGA Position Frequency Matrix (PFM) A 10 0 0 20 0 6 5 16 0 0 15 C 1 0 0 0 17 2 10 0 0 20 2 G 9 0 19 0 1 1 1 2 20 0 2 T 0 20 1 0 2 11 4 2 0 0 1 Position Specific Scoring Matrix (PSSM) A 0.9-2.5-2.5 1.8-2.5 0.2 0.0 1.5-2.5-2.5 1.4 C -1.5-2.5-2.5-2.5 1.6-1.0 0.9-2.5-2.5 1.8-1.0 G 0.7-2.5 1.7-2.5-1.5-1.5-1.5-1.0 1.8-2.5-1.0 T -2.5 1.8-1.5-2.5-1.0 1.0-0.3-1.0-2.5-2.5-1.5 A T G A T T C A G C A Score = 13.6 Binding Profile Logo 10

JASPAR: AN OPEN-ACCESS DATABASE OF TF BINDING PROFILES ( jaspar.genereg.net ) 11 Analysis of TFBS with Phylogenetic Footprinting Scanning a single sequence Scanning a pair orf orthologous sequences for conserved patterns in conserved sequence regions A dramatic improvement in the percentage of biologically significant detections Low specificity of profiles: too many hits great majority not biologically significant 12

Phylogenetic Footprinting Dramatically Reduces Spurious Hits Human Mouse Actin, alpha cardiac 13 Choosing the right species for pairwise comparison... CHICKEN MOUSE HUMAN COW HUMAN HUMAN 14

ORCAtk 15 TFBS Discrimination Tools Phylogenetic Footprinting Servers FOOTER http://biodev.hgen.pitt.edu/footer_php/footerv2_0.php CONSITE http://asp.ii.uib.no:8090/cgi-bin/consite/consite/ rvista http://rvista.dcode.org/ ORCAtk http://burgundy.cmmt.ubc.ca/cgi-bin/orcatk/orcatk SNPs in TFBS Analysis RAVEN http://burgundy.cmmt.ubc.ca/cgi-bin/raven/a?rm=home Prokaryotes or Yeast PRODORIC http://prodoric.tu-bs.de/ YEASTRACT http://www.yeastract.com/index.php Software Packages TOUCAN http://homes.esat.kuleuven.be/~saerts/software/toucan.php Programming Tools TFBS http://tfbs.genereg.net/ ORCAtk http://burgundy.cmmt.ubc.ca/cgi-bin/orcatk/orcatk 16

Part 3: Inferring Regulating TFs for Sets of Co-Expressed Genes 17 Two Examples of TFBS Over-Representation Foreground Foreground Background More Genes with TFBS Background More Total TFBS 18

Statistical Methods for Identifying Over-represented TFBS Fisher exact probability scores Based on the number of genes containing the TFBS relative to background Hypergeometric probability distribution Binomial test (Z scores) Based on the number of occurrences of the TFBS relative to background Normalized for sequence length Simple binomial distribution model 19 opossum Procedure Set of coexpressed genes Automated sequence retrieval from EnsEMBL Phylogenetic Footprinting ORCA Putative mediating transcription factors Statistical significance of binding sites Detection of transcription factor binding sites 20

Validation using Reference Gene Sets A. Muscle-specific (23 input; 16 analyzed) B. Liver-specific (20 input; 12 analyzed) Rank Z-score Fisher Rank Z-score Fisher SRF 1 21.41 1.18e-02 HNF-1 1 38.21 8.83e-08 MEF2 2 18.12 8.05e-04 HLF 2 11.00 9.50e-03 c-myb_1 3 14.41 1.25e-03 Sox-5 3 9.822 1.22e-01 Myf 4 13.54 3.83e-03 FREAC-4 4 7.101 1.60e-01 TEF-1 5 11.22 2.87e-03 HNF-3beta 5 4.494 4.66e-02 deltaef1 6 10.88 1.09e-02 SOX17 6 4.229 4.20e-01 S8 7 5.874 2.93e-01 Yin-Yang 7 4.070 1.16e-01 Irf-1 8 5.245 2.63e-01 S8 8 3.821 1.61e-02 Thing1-E47 9 4.485 4.97e-02 Irf-1 9 3.477 1.69e-01 HNF-1 10 3.353 2.93e-01 COUP-TF 10 3.286 2.97e-01 TFs with experimentally-verified sites in the reference sets. 21 Empirical Selection of Parameters based on Reference Studies 40 30 p65 NF- _B c-rel p50 HNF-1 SRF Z-score 20 10 0 TEF-1 MEF2 FREAC-2 Myf cebp SP1 HNF-3 _ Muscle Liver NF-_B Z-score cutoff Fisher cutoff -10-20 1.0E-09 1.0E-07 1.0E-05 1.0E-03 1.0E-01 Fisher p-value 22

Structurally-related TFs with Indistinguishable TFBS Most structurally related TFs bind to highly similar patterns Zn-finger is a big exception 23 opossum Server 24

TFBS Over-representation Analysis Tools o P O S S U M : h t t p : / / w w w. c i s r e g. c a / o P O S S U M T F M - E x p l o r e r : h ttp :/ / b i o i n f o. lifl.fr/ T F M E / fo rm A s a p : h ttp :/ / a s a p. b i n f. k u. d k / A s a p / H o m e.h t m l 25 REFLECTIONS Part 2 Futility Theorem Essentially predictions of individual TFBS have no relationship to an in vivo function Successful bioinformatics methods for site discrimination incorporate additional information (clusters, conservation) Part 3 TFBS over-representation is a powerful new means to identify TFs likely to contribute to observed patterns of co-expression Generally best performance has been with data directly linked to a transcription factor Statistical significance is extremely sensitive to gene set size TFs in the same structural family tend to have similar binding preferences 26

The end More tomorrow in the lab 27 Part 4: de novo Discovery of TF Binding Sites (Gibbs sampling method) 28

Gibbs Sampling (grossly over-simplified) ttcgctcc cgatacgc tgctacct tgacttcc agacctca ctgtagtg acgcatct 1 2 3 4 5 6 7 8 A 2 0 2 2 2 1 0 1 C 0 2 3 3 2 1 6 2 G 0 4 1 0 1 0 1 1 T 4 1 1 2 2 5 0 2 29 Pattern Discovery Gibbs sampling is guaranteed to return an optimal pattern if repeated sufficiently often Procedure is fast, so running many 1000s of times is feasible Unfortunately, we have a problem what if the mediating TFBS are not strongly overrepresented relative to other patterns 30

Applied Pattern Discovery is Acutely Sensitive to Noise PATTERN SIMILARITY vs. TRUE MEF2 PROFILE 18 16 14 12 Pink line is negative control with no Mef2 sites included 10 0 100 200 300 400 500 600 True Mef2 Binding Sites SEQUENCE LENGTH 31 Four Approaches to Improve Sensitivity Better background models -Higher-order properties of DNA Phylogenetic Footprinting Human:Mouse comparison eliminates ~75% of sequence Regulatory Modules Architectural rules Limit the types of binding profiles allowed TFBS patterns are NOT random 32

Pattern Discovery Summary Pattern discovery methods can recover overrepresented patterns in the promoters of coexpressed genes Methods are acutely sensitive to noise, indicating that the signal we seek is weak TFs tolerate great variability between binding sites As for pattern discrimination, supplementary information/approaches are required to overcome the noise 33