Gene Regulation, ChIP-X and DNA Motifs. Statistics in Genomics Hongkai Ji
Transcription
1 Gene Regulation, ChIP-X and DNA Motifs. Statistics in Genomics. Hongkai Ji (hji@jhsph.edu)
2 Genetic information is stored in DNA. TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC TCACACCTGACATGAAAAGGCACATGAGGATCCTCAAATACCCCGTGATCAGTCTCAGGGTAGCTCTCATAGCCTGGACAGG GCCCCCCTCGGGGGTTGCGCCCAGGTCCAGGCGGGGGATGCACAGCAACAGTCACCGAAGCAGAAGCCGTCACAGTGGTGAT GGGCTGGCAGTAGCTGGGCACAGAGCTGCCCATGGCGGTGGACGTTGGGTTCCGAGGGTTGTGAGAACGGGCCCCACGGGGC CCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTCAAAAGCGTCTCTGCGCGGTCTGTAGGGGGGTG GCCGCAAGCCTTCTCTAGGGGGATCCCTTCGAGGCTGCTGGCCTTGCCGTCCAGGGGACAAGGAGCCAGAGTCCAGGTGGGG CTGTTGCCGAGGGGTCAAGGGAGGCTGATGTCTGGAGTCCGGATGGACCACCTGCAGAGGAGAGACATAGGTCAACACAGGG AGGTAGGATGGTGGTGATGTTCCACCCACAAAAGAAAACCTATTCCTTTAGAAACCTCCAGGATGTGAATCCTGCCTGCACC TGCACAGCTGGCTGGAGGCATATAGCCACTGCCCATAGATCTCAACTTACCCTCACAACCAACTGCCCCCAGGCCTAAGTTC TCTGCCTCAAAACTGCCAAGGCCTGGATAGCCAAGAGCCTGGGTGTCTTGGAAATATGCAACCATAAATAGTAGCTTTTAGA AGTATAAGGCTCCTGTTTCTGGGTCATATTAGTGTTGTTTTCACCTGTCCCCAGCCCTAAGCCAGGTGTGGCCAGAAGCAAA TGTACTGTAAGAGCAGAGCAAAAACTTCCACACAGATAGTTCTGTTAGGCAATACATCTCTGCCTGACTATTAGGAATCTGG TTTCTGGGTCCTCTGTACAAAGCTCGGAGCAACACAGTGGCCACATCAATCAAAAGGACCGTGACCAACTTCAAAGTCGGTG AGCTTGTACCTATTTTTAGGCTCCTGCTGAACAGAACCAGATTCACACTACAGCTCAGCAGGGCATCGTCACGGGTGTGTGT GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGGGGGGGGGGGGTGGACAGAGGACGGGGACACAATTCACTGGCCAGCCCTT CTCTCCTTCAAGGAAGGCTGCTCTAGCCTGGGACTGGAATACACATTTCCTGTAAACATGGTGGGGGCCTCAGGCAAGCCAG AGTTTTGGAGCCTTCCTTAACTCTTCAAGGTGAGCATCTTGACTTGGAGGGTGGGGGTGCGGGTAAGGAAGGAACCTGTGGA CTCCTCCCTACAAGACAGAAAAGGAATAAGCCACGAAGACAATAACGATTTTTGTATCAAGCGTCCTCTCCCATTTCAGCTT ACCTGACAATGAAATCAAATTCGGACCCTGCAAGCATCAGTACACCCAGCAGAGTGGACACAGCACCGTCCAGAACGGGAGC AAACATGTGCTCCAGAGCGAGCATAGCCCTGTGGTTCTTGTCCCCAATGGCTGTCAGAAAGGCCTGAACAAAGGAGAAAATT GACACGGTCACATTCTGGGTGTGGTAAAGTGCTCAGCTGTGTCTATACTTGGGTTTTGTAT Basic units of DNA: A, C, G, T. Total amount of DNA in the human genome: 3 × 10^9 base pairs (bp).
3 A gene needs to be expressed in order to execute its function. Human: ~30,000 genes. Gene expression: DNA (A A G G T C T T C C A G G C T A) → Transcription → mRNA (A A G G U C G U A G C G U) → Translation → Protein (I K V R Q) → Function: cell skeleton, enzymes, signal molecules.
4 Gene expression is tightly regulated, spatially and temporally. [Figure: expression vs. no-expression patterns across space and time]
5 Transcriptional regulation is an important way to control gene expression. Transcription factors (TF): TF1, TF2. Transcription factor binding sites (TFBS): CCACCCAC, TAATAAAAT. Example sequence containing both sites: TTATGTAACCTGCACTTACTACCACCCACAACATAATAAAATCTAAACCACTGAATGAAATACAAAATCTATGTATGA...
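Locating known binding sites in a sequence can be sketched as a plain substring scan. The snippet below is only an illustration: the function name `find_sites` is hypothetical, it matches exact words on one strand only, and real TFBS are degenerate, so a practical scan would use a motif matrix instead.

```python
# Example sequence and the two binding-site words shown on the slide.
PROMOTER = ("TTATGTAACCTGCACTTACTACCACCCACAACATAATAAAATCTAAACCACTGAATGAAATAC"
            "AAAATCTATGTATGA")
SITES = ["CCACCCAC", "TAATAAAAT"]


def find_sites(sequence, motif):
    """Return the 0-based start of every exact occurrence (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    for motif in SITES:
        print(motif, find_sites(PROMOTER, motif))
```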
6 Transcription factors recognize specific motifs. Sequences bound by the TF: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA, TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA, CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA, TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG, AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC, ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG. Transcription Factor Binding Sites (TFBS): TGGGTGGTC, TGGGTGGTA, TGGGAGGTC, TGGGTGGTG, TGAGTGGTC, TGGGTGGTC. [Figure: motif matrix over A, C, G, T]
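The step from aligned binding sites to a motif can be sketched as a per-position count of A, C, G, T. This is a minimal illustration using the six sites listed on the slide; the function names and the pseudocount of 0.5 are assumptions, not something the slides specify.

```python
from collections import Counter

ALPHABET = "ACGT"
# The six aligned binding sites from the slide.
SITES = ["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
         "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"]


def count_matrix(sites):
    """One Counter of base counts per motif position."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[pos] for site in sites) for pos in range(width)]


def frequency_matrix(sites, pseudocount=0.5):
    """Per-position base frequencies; the pseudocount avoids zero entries."""
    total = len(sites) + pseudocount * len(ALPHABET)
    return [{base: (column[base] + pseudocount) / total for base in ALPHABET}
            for column in count_matrix(sites)]


def consensus(sites):
    """Most frequent base at each position."""
    return "".join(max(ALPHABET, key=lambda base: column[base])
                   for column in count_matrix(sites))


if __name__ == "__main__":
    print(consensus(SITES))
    for pos, column in enumerate(frequency_matrix(SITES), start=1):
        print(pos, {base: round(freq, 3) for base, freq in column.items()})
```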
7 Transcription factor binding sites are regulatory codes in the genome. TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC TCACACCACCCATGTTTTGTTTATGAGGATCCTCAAATACCCCGTGATCAGTCTCAGGGTAGCTCTCATAGCCTGGACAGGG CCCCCCTCGGGGGTTGCGCCCAGGTCCAGGCGGGGGATGCACAGCAACAGTCACCGAAGCAGAAGCCGTCACAGTGGTGATG GGCTGGCAGTAGCTGGGCACAGAGCTGCCCATGGCGGTGGACGTTGGGTTCCGAGGGTTGTGAGAACGGGCCCCACGGGGCC CTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTCAAAAGCGTCTCTGCGCGGTCTGTAGGGGGGTGG CCGCAAGCCTTCTCTAGGGGGATCCCTTCGTTGCTGCTGGCCTTGCCGTCCAGGGGACAAGGAGCCAGAGTCCAGGTGGGGC TGTTGCCGAGGGGTCAAGGGAGGCTGATGTCTGGAGTCCGGATGGACCACCTGCAGAGGAGAGACATAGGTCAACACAGGGA GGTAGGATGGTGGTGATGTTCCACCCACAAAAGAAAACCTATTCCTTTAGAAACCTCCAGGATGTGAATCCTGCCTGCACCT GCACAGCTGGCTGGAGGCATATAGCCACTGCCCATAGATCTCAACTTACCCTCACAACCAACTGCCCCCAGGCCTAAGTTCT CTGCCTCAAAACTGCCAAGGCCTGGATAGCCAAGAGCCTGGGTGTCTTGGAAATATGCAACCATAAATAGTAGCTTTTAGAA GTATAAGGCTCCTGTTTCTGGGTCATATTAGTTTTGTTTTCACCTGTCCCCACCCATAAGCCAGGTGTGGCCAGAAGCAAAT GTACTGTAAGAGCAGAGCAAAAACTTCCACACAGATAGTTCTGTTAGGCAATACATCTCTGCCTGACTATTAGGAATCTGGT TTCTGGGTCCTCTGTACAAAGCTCGGAGCAACACAGTGGCCACATCAATCAAAAGGACCGTGACCAACTTCAAAGTCGGTGA GCTTGTACCTATTTTTAGGCTCCTGCTGAACAGAACCAGATTCACACTACAGCTCAGCAGGGCATCGTCACGGGTGTGTGTG TGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGGGGGGGGGGGGTGGACAGAGGACGGGGACACAATTCACTGGCCAGCCCTTC TCTCCTTCAAGGAAGGCTGCTCTAGCCTGGGACTGGAATACACATTTCCTGTAAACATGGTGGGGGCCTCAGGCAAGCCAGA GTTTTGGAGCCTTCCTTAACTCTTCAAGGTGAGCATCTTGACTTGGAGGGTGGGGGTGCGGGTAAGGAAGGAACCTGTGGAC TCCACCCAACAAGACAGAAAAGGAATAAGCCACGAAGACAATAACGATTTTTGTATCAAGCGTCCTCTCCCATTTCAGCTTA CCTGACAATGAAATCAAATTCGGACCCTGCAAGCATCAGTACACCCAGCAGAGTGGACACAGCACCGTCCAGAACGGGAGCA AACATGTGCTCCAGAGCGAGCATAGCCCTGTGGTTCTTGTCCCCAATGGCTGTCAGAAAGGCCTGAACAAAGGAGAAAATTG ACACGGTCACATTCTGGGTGTGGTAAAGTGCTCAGCTGTGTCTATACTTGGGTTTTGTAT [Figure labels: Transcription Factor Binding Sites (TFBS), Gene]
8 Transcription factor binding motifs/sites are the genetic basis for understanding gene regulatory networks. TF1, TF2: TACTACCACCCACAACATAATAAAATCTAA (Gene1). TF2, TF1: TTAATAAAATACCACCCACAACCTAAGGAT (Gene2). TF3: Gene3. Transcription factors act on other genes through activation, repression and other interactions. Misregulation: diseases.
9 Motif discovery and decoding regulatory programs in the genome. Genomic language: GGCCCTGAGCGGTCCCTATTGCTGGGTGGTCAATGCCCTTCATCTGAAATTTCA AAAGCGTCTCTGCGCGGTCTGTAGGGGGGTGGCCGCAAGCCTTCTCTAGGGGGG CCCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTC. Step 1: Dictionary. Step 2: GGCCCTGAGCGGTCCCTATTGCTGGGTGGTCAATGCCCTTCATCTGGAATTTCA AAAGCGTCTCTGCGCGGTCTGTAGGGGGGTGGCCGCAAGCCTTCTCTAGGGGGG CCCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTC. Human language: guesswhatthestoryisaslongasyouknowthelanguageitshouldbeprettyeasy. Step 1: Dictionary (Know, Guess, Be). Step 2: Guess what the story is. As long as you know the language, it should be pretty easy.
10 Motifs can be identified as enriched sequence patterns in co-regulated genes (Roth et al., 1998; Hughes et al., 2000; etc.). Gene1: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA. Gene2: CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA. Gene3: TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA. [Figure: expression of Gene 1, Gene 2, Gene 3, ..., Gene N under Condition1 and Condition2]
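11 De novo motif discovery. Background: $\theta_0$ (frequencies of A, C, G, T), with weight $q_0$. Motif: $\Theta$ of width $W$ (frequencies of A, C, G, T at each position), with weight $q_1$; $q = [q_0, q_1]$. S: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA (observed sequence). A: motif site locations in S (unobserved). Posterior: $f(A, \Theta, W, q \mid S, \theta_0) \propto f(S, A \mid \Theta, W, q, \theta_0)\, \pi(\Theta, W, q)$. Inference by iterative estimation/sampling, alternating between $(\Theta, W, q)$ and $A$. EM: Lawrence and Reilly (1990), Bailey and Elkan (1994), etc. Gibbs sampler: Lawrence et al. (1993), Liu (1994), Liu et al. (1995), etc.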
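One half of the iteration on this slide, updating the site locations A given the current motif, can be sketched as the posterior probability that each window of S is a motif site rather than background. This is a simplified mixture-style illustration, not the exact procedure of any of the cited papers: it treats windows independently, fixes the width W, and takes the motif matrix as given; all function names and the value `q1=0.01` are assumptions.

```python
import math

ALPHABET = "ACGT"


def motif_from_sites(sites, pseudocount=0.5):
    """Current motif estimate Theta: per-position base frequencies."""
    total = len(sites) + pseudocount * len(ALPHABET)
    return [{base: (sum(site[pos] == base for site in sites) + pseudocount) / total
             for base in ALPHABET}
            for pos in range(len(sites[0]))]


def site_posteriors(sequence, theta, theta0, q1):
    """P(window starting at each position is a motif site | Theta, theta0, q)."""
    width = len(theta)
    q0 = 1.0 - q1
    posteriors = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        like_motif = math.prod(theta[pos][base] for pos, base in enumerate(window))
        like_background = math.prod(theta0[base] for base in window)
        posteriors.append(q1 * like_motif /
                          (q1 * like_motif + q0 * like_background))
    return posteriors


if __name__ == "__main__":
    theta = motif_from_sites(["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
                              "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"])
    theta0 = {base: 0.25 for base in ALPHABET}  # uniform background (assumed)
    seq = "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA"
    post = site_posteriors(seq, theta, theta0, q1=0.01)
    best = max(range(len(post)), key=post.__getitem__)
    print(best, seq[best:best + len(theta)], round(post[best], 3))
```

A full EM or Gibbs procedure would alternate this step with re-estimating the motif matrix from the weighted (or sampled) windows.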
12 Motif discovery is difficult in mammalian genomes due to a low signal-to-noise ratio. Yeast: 100~1000 bp regions for Gene1, Gene2, Gene3. Human: 10k~1000k bp regions for Gene1, Gene2, Gene3.
13 Genome-wide chromatin immunoprecipitation analysis: ChIP-chip and ChIP-seq
14 Data and notation. Data: $X_{ijk}$ ($i = 1, 2, \dots, I$; $j = 1, 2, \dots, J$; $k = 1, 2, \dots, K$), where $i$ indexes probes (rows), $j$ indexes conditions ($j$ = IP: red; $j$ = CT: green) and $k$ indexes replicates. $I$ = 40,000,000; $J$ = 2 or more; $K$ = 2 or 3. Each column > $2100. Goal: identify rows where IP > Control (i.e. peaks). DNA fragments are 500~2000 bp long; probes have 35~300 bp spacing. Previous work to detect binding regions: Kampa et al. (2004), Keles et al. (2004).
15 TileMap: a two-step approach for peak detection (Ji and Wong, 2005, Bioinformatics, 21). STEP 1: Compute a test statistic for each probe to summarize probe level information. STEP 2: Combine probe level test statistics of neighboring probes to help infer binding regions.
16 Probe Level Summary. A simple way: for each probe $i = 1, \dots, I$, compute the sample variance and a t-statistic from the replicates $k = 1, 2, 3$. Problem: unstable variance estimates due to the small number of replicates.
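17 Variance Shrinkage Estimator. Probes $1, 2, 3, \dots, I$ have sample variances $s_1^2, s_2^2, s_3^2, \dots, s_I^2$, each with df degrees of freedom. Mean: $\bar{s}^2 = \frac{1}{I}\sum_i s_i^2$. Sum of squares: $S = \sum_i (s_i^2 - \bar{s}^2)^2$. Shrinkage factor: $\hat{B}$, computed from df, $I$, $\bar{s}^2$ and $S$. Variance shrinkage estimator: $\hat{\sigma}_i^2 = (1 - \hat{B})\, s_i^2 + \hat{B}\, \bar{s}^2$, giving variance estimates $\hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\sigma}_3^2, \dots, \hat{\sigma}_I^2$. A modified t-statistic: $\tilde{t}_i = \dfrac{\bar{x}_{i1} - \bar{x}_{i2}}{\sqrt{(1/K_1 + 1/K_2)\, \hat{\sigma}_i^2}}$, giving probe level test statistics $\tilde{t}_1, \tilde{t}_2, \tilde{t}_3, \dots, \tilde{t}_I$.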
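The effect of shrinking probe variances toward their mean can be sketched as follows. This is a minimal illustration, not TileMap's implementation: the shrinkage factor is passed in as a plain number `shrink` (an assumption, since the slide's formula for it did not survive extraction), and the function names are hypothetical.

```python
import math
from statistics import mean


def shrunken_variances(sample_variances, shrink):
    """Pull each probe variance toward the mean variance.

    shrink = 0 keeps the raw variances; shrink = 1 replaces them all by the mean.
    """
    if not 0.0 <= shrink <= 1.0:
        raise ValueError("shrink must lie in [0, 1]")
    s_bar = mean(sample_variances)
    return [(1.0 - shrink) * s2 + shrink * s_bar for s2 in sample_variances]


def modified_t(ip_values, control_values, sigma2):
    """Two-sample t-like statistic using a supplied (shrunken) variance."""
    k1, k2 = len(ip_values), len(control_values)
    diff = mean(ip_values) - mean(control_values)
    return diff / math.sqrt((1.0 / k1 + 1.0 / k2) * sigma2)


if __name__ == "__main__":
    raw = [0.01, 0.5, 2.0]                      # toy per-probe variances
    print(shrunken_variances(raw, shrink=0.6))  # unstable 0.01 is pulled up
    print(modified_t([3.0, 5.0], [1.0, 1.0], sigma2=2.0))
```

With only two or three replicates, a probe can have a near-zero sample variance by chance and an inflated ordinary t-statistic; borrowing strength from the other probes damps that.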
18 Shrinking variance increases statistical power. [Figure: comparison of moving average; t-statistic with variance shrinking; canonical t-statistic; Mean(X1) − Mean(X2)]
19 Peak 2 (180 bp) transgenics: neural tube expression. [Figure: transgenics]
20 Combining neighboring probes: TileMap (HMM). 1. Compute the probe level test statistic t for each probe; 2. Estimate the distribution of t under H0 and H1; 3. Model t by a Hidden Markov Model, and decode the HMM.
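The decoding step can be sketched with a two-state HMM (0 = unbound, 1 = bound) and the Viterbi algorithm. Everything numeric here is an assumption for illustration: the emission distributions are taken to be unit-variance Gaussians centred at 0 under H0 and at `mu1` under H1, and the transition and initial probabilities are made up. TileMap estimates its own distributions from the data.

```python
import math


def log_normal_pdf(x, mu, sigma=1.0):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def viterbi_two_state(t_stats, stay=0.9, prior_bound=0.1, mu1=3.0):
    """Most likely state path (0 = H0/unbound, 1 = H1/bound) for probe statistics."""
    if not t_stats:
        return []
    states = (0, 1)
    log_init = [math.log(1.0 - prior_bound), math.log(prior_bound)]
    log_trans = [[math.log(stay), math.log(1.0 - stay)],
                 [math.log(1.0 - stay), math.log(stay)]]

    def emit(state, t):
        return log_normal_pdf(t, mu1 if state == 1 else 0.0)

    score = [log_init[s] + emit(s, t_stats[0]) for s in states]
    backpointers = []
    for t in t_stats[1:]:
        new_score, pointers = [], []
        for s in states:
            # Best previous state for landing in state s at this probe.
            prev = max(states, key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + emit(s, t))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    probes = [0.1, -0.3, 0.2, 3.1, 2.8, 3.4, 0.0, -0.2]
    print(viterbi_two_state(probes))
```

Runs of consecutive 1s in the decoded path would then be reported as candidate binding regions.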
21 Probe Effects
22 MAT Model (Johnson et al., 2006, PNAS, 103). Model terms: baseline on number of Ts; A, C, G at each position of the 25mer; A, C, G, T count squared; 25mer copy number along the genome.
23 MAT Example
24 TileProbe (Judy and Ji, 2009, Bioinformatics, 25)
25 TileProbe vs. MAT
26 TileProbe vs. MAT
27 ChIP-seq: common designs. One-sample analysis: a ChIP sample only. Two-sample analysis: a ChIP sample + a negative control sample.
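28 CisGenome: two-sample analysis (Ji et al. 2008, Nature Biotechnology, 26). Workflow: Alignment (IP, Control) → Exploration → FDR computation → Peak Detection → Post Processing. At location $i$, $k_{1i}$ and $k_{2i}$ are the IP and control read counts, $n_i = k_{1i} + k_{2i}$, and the null model is $k_{1i} \mid n_i \sim \mathrm{Binom}(n_i, p_0)$.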
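One way to use the conditional binomial null is to ask how surprising an IP count is given the total count at that location. The sketch below computes the upper tail probability; it is an illustration of the null model only, not CisGenome's FDR procedure, and the function name and the example value of `p0` are assumptions.

```python
from math import comb


def binom_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binom(n, p0)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, x) * p0 ** x * (1.0 - p0) ** (n - x)
               for x in range(k, n + 1))


if __name__ == "__main__":
    k_ip, k_control = 18, 2          # toy read counts at one location
    n = k_ip + k_control
    p0 = 0.5                         # assumed: equal sampling rate under the null
    print(binom_upper_tail(k_ip, n, p0))
```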
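29 CisGenome: one-sample analysis. Workflow: Alignment (IP) → Exploration → FDR computation → Peak Detection → Post Processing. Poisson model: $k_i \sim \mathrm{Poisson}(\lambda_0)$. Negative binomial model: $k_i \mid \lambda_i \sim \mathrm{Poisson}(\lambda_i)$, $\lambda_i \sim \mathrm{Gamma}(\alpha, \beta)$; marginally, $k_i \sim \mathrm{NegBinom}(\alpha, \beta)$.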
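The Gamma-Poisson mixture can be written out directly to see why the negative binomial has heavier tails than a Poisson with the same mean. The sketch assumes the Gamma is parameterized by shape α and rate β (the slide does not say which convention is used), which gives mean α/β and variance α/β + α/β².

```python
import math


def neg_binom_pmf(k, alpha, beta):
    """Marginal P(K = k) when K | lam ~ Poisson(lam), lam ~ Gamma(shape alpha, rate beta)."""
    log_p = (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(beta / (beta + 1.0))
             - k * math.log(beta + 1.0))
    return math.exp(log_p)


def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def upper_tail(pmf, k, upper=400):
    """P(K >= k), truncating the sum at `upper` (ample for these toy rates)."""
    return sum(pmf(x) for x in range(k, upper))


if __name__ == "__main__":
    alpha, beta = 2.0, 0.5           # toy values: mean 4, variance 12
    mean_count = alpha / beta
    for threshold in (8, 12, 15):
        nb = upper_tail(lambda x: neg_binom_pmf(x, alpha, beta), threshold)
        po = upper_tail(lambda x: poisson_pmf(x, mean_count), threshold)
        print(threshold, nb, po)
```

With over-dispersed background counts, a Poisson null assigns too little probability to high counts, so it would tend to understate the false discovery rate.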
30 Background Reads Follow Negative Binomial Distribution
31 FDR Estimation: Negative Binomial vs. Poisson
32 Boundary Refinement & Single Strand Filtering
33 Peak Length & Motif Coverage
34 ChIP-chip and ChIP-seq signals are correlated
35 ChIP and control read sampling rates vary across the genome
36 MACS (Zhang et al., 2008, Genome Biology, 9:R137). Locus-dependent background model: $\lambda_{local} = \max(\lambda_{BG}, [\lambda_{1k},]\ \lambda_{5k}, \lambda_{10k})$. Read shift: [figure]
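The locus-dependent background can be sketched as taking the largest of several rate estimates and then scoring a candidate peak against a Poisson with that rate. This is an illustration of the formula only, not MACS code: the function names are hypothetical, the rates are assumed to be already scaled to the peak window length, and the brackets on the slide are read as marking the 1 kb estimate as optional.

```python
import math


def local_lambda(lam_bg, lam_5k, lam_10k, lam_1k=None):
    """lambda_local = max(lambda_BG, [lambda_1k,] lambda_5k, lambda_10k)."""
    candidates = [lam_bg, lam_5k, lam_10k]
    if lam_1k is not None:
        candidates.append(lam_1k)
    return max(candidates)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    cdf_below = sum(math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))
                    for x in range(k))
    return max(0.0, 1.0 - cdf_below)


if __name__ == "__main__":
    lam = local_lambda(lam_bg=2.0, lam_5k=3.5, lam_10k=3.0)
    print(lam, poisson_upper_tail(12, lam))
```

Taking the maximum makes the test conservative in regions where the local background is elevated above the genome-wide rate.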
37 Others. GC content: MOSAiCS (Kuan et al., 2012, JASA). Peak shape: PICS (Zhang et al., 2011, Biometrics, 67:151-63).
38 Motif analysis after ChIP-X. Reference: Ji HK, Vokes SA and Wong WH (2006) A comparative analysis of genome-wide chromatin immunoprecipitation data for mammalian transcription factors. Nucleic Acids Research, 34:e146. doi: /nar/gkl803.
39 Motifs can be successfully recovered without prior information
40 Matched controls are crucial for identifying the key transcription factor binding motifs. MDSCAN score (Liu, Brutlag and Liu 2002, MDSCAN): $S = \dfrac{\log(n)}{W} \sum_{i=1}^{W} \sum_{j=A}^{T} p_{ij} \log\left(\dfrac{p_{ij}}{q_j}\right)$. [Figure: motifs discovered from Sox2 ChIP-chip]
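The score on this slide can be computed from a set of aligned sites as sketched below. The symbol meanings are assumptions read off the formula: n is taken as the number of aligned sites, W the motif width, p_ij the frequency of base j at position i, and q_j the background frequency of base j. The function name is hypothetical and this is not the MDSCAN program itself.

```python
import math

ALPHABET = "ACGT"


def mdscan_style_score(sites, background):
    """S = log(n)/W * sum_i sum_j p_ij * log(p_ij / q_j), with 0*log(0) taken as 0."""
    n = len(sites)
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = 0.0
    for pos in range(width):
        for base in ALPHABET:
            p = sum(site[pos] == base for site in sites) / n
            if p > 0.0:
                total += p * math.log(p / background[base])
    return math.log(n) / width * total


if __name__ == "__main__":
    uniform = {base: 0.25 for base in ALPHABET}
    sites = ["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
             "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"]
    print(mdscan_style_score(sites, uniform))
```

The score grows with both the number of sites and how strongly each position departs from background, so a GC-rich background q lowers the score of GC-rich patterns.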
41 Ascertaining the key motif by comparing to negative control regions. Relative enrichment ($r_1$) = site density in ChIP-chip regions / site density in negative control regions. Examples: (2.0 sites/1000 bp) / (2.0 sites/1000 bp) = 1; (1.5 sites/1000 bp) / (0.3 sites/1000 bp) = 5.
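The relative enrichment arithmetic on this slide can be written as a small helper. The function names and the toy counts are hypothetical; the counts are chosen only to reproduce the two ratios shown.

```python
def site_density(n_sites, total_bp, per_bp=1000):
    """Motif sites per `per_bp` base pairs."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return n_sites * per_bp / total_bp


def relative_enrichment(chip_sites, chip_bp, control_sites, control_bp):
    """Site density in ChIP regions divided by site density in control regions."""
    control_density = site_density(control_sites, control_bp)
    if control_density == 0:
        raise ZeroDivisionError("no motif sites found in the control regions")
    return site_density(chip_sites, chip_bp) / control_density


if __name__ == "__main__":
    # A motif equally common in both region sets: ratio 1 (not specific).
    print(relative_enrichment(20, 10_000, 20, 10_000))
    # A rarer motif that is depleted in controls: ratio about 5.
    print(relative_enrichment(15, 10_000, 3, 10_000))
```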
42 Random genomic control does not solve the problem
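43 Matched genomic controls. [Figure: TF binding regions at distances d1, d2, d3 from Gene 1, Gene 2, Gene 3; control regions at the same distances d1, d2, d3 from other genes (Gene X, Gene Y, Gene Z)]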
44 Matched controls can solve the problem
45 Some insight: binding regions are GC-rich
46 Beyond ChIP-X. Limitations of ChIP-X: (1) one TF at a time; (2) need a good antibody. A new approach: predict ChIP-X using chromatin surrogates: H3K4me1, H3K4me2, H3K4me3, H3K27ac, DNase, FAIRE.
47 Predict ChIP-X using chromatin surrogates
48 DNase predicts ChIP-seq (Pique-Regi R et al., 2011, Genome Res. 21)
49 DNase predicts ChIP-seq
50 Summary. Motifs: low specificity, no context information. ChIP-chip (2000, 2004): increased specificity, genome-wide but limited to array design, context-aware, low resolution, requires a large number of cells. ChIP-seq (2007): high resolution, whole-genome, requires less material. DNase or other surrogates, etc. (2011): multiple TFs.
Alignment. Peak Detection
ChIP seq ChIP Seq Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008 ChIP Seq Analysis Alignment Peak Detection Annotation Visualization Sequence Analysis Motif Analysis Alignment ELAND Bowtie
More informationTranscrip:on factor binding mo:fs
Transcrip:on factor binding mo:fs BMMB- 597D Lecture 29 Shaun Mahony Transcrip.on factor binding sites Short: Typically between 6 20bp long Degenerate: TFs have favorite binding sequences but don t require
More informationGenome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics
Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics Slides vs chalk talk: I m not sure why you chose a chalk talk over ppt. I prefer the latter no issues with readability
More informationGenome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics
Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics I believe it is helpful to number your slides for easy reference. It's been a while since I took
More informationChIP seq peak calling. Statistical integration between ChIP seq and RNA seq
Institute for Computational Biomedicine ChIP seq peak calling Statistical integration between ChIP seq and RNA seq Olivier Elemento, PhD ChIP-seq to map where transcription factors bind DNA Transcription
More informationMatrix-based pattern discovery algorithms
Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)
More informationGenome 541! Unit 4, lecture 3! Genomics assays
Genome 541! Unit 4, lecture 3! Genomics assays Much easier to follow with slides. Good pace.! Having the slides was really helpful clearer to read and easier to follow the trajectory of the lecture.!!
More informationChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier
ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier Data visualization, quality control, normalization & peak calling Peak annotation Presentation () Practical session
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier
ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier Visualization, quality, normalization & peak-calling Presentation (Carl Herrmann) Practical session Peak annotation
More informationGibbs Sampling Methods for Multiple Sequence Alignment
Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical
More informationMODEL-BASED APPROACHES FOR THE DETECTION OF BIOLOGICALLY ACTIVE GENOMIC REGIONS FROM NEXT GENERATION SEQUENCING DATA. Naim Rashid
MODEL-BASED APPROACHES FOR THE DETECTION OF BIOLOGICALLY ACTIVE GENOMIC REGIONS FROM NEXT GENERATION SEQUENCING DATA Naim Rashid A dissertation submitted to the faculty of the University of North Carolina
More informationIns?tute for Computa?onal Biomedicine. ChIP- seq. Olivier Elemento, PhD TA: Jenny Giannopoulou, PhD
Ins?tute for Computa?onal Biomedicine ChIP- seq Olivier Elemento, PhD TA: Jenny Giannopoulou, PhD Plan 1. ChIP- seq 2. Quality Control of ChIP- seq data 3. ChIP- seq Peak detec?on 4. Peak Analysis and
More informationChapter 7: Regulatory Networks
Chapter 7: Regulatory Networks 7.2 Analyzing Regulation Prof. Yechiam Yemini (YY) Computer Science Department Columbia University The Challenge How do we discover regulatory mechanisms? Complexity: hundreds
More informationCSCI1950 Z Computa3onal Methods for Biology Lecture 24. Ben Raphael April 29, hgp://cs.brown.edu/courses/csci1950 z/ Network Mo3fs
CSCI1950 Z Computa3onal Methods for Biology Lecture 24 Ben Raphael April 29, 2009 hgp://cs.brown.edu/courses/csci1950 z/ Network Mo3fs Subnetworks with more occurrences than expected by chance. How to
More informationPrinciples of Gene Expression
Principles of Gene Expression I. Introduc5on Genome : the en*re set of genes (transcrip*on units) of an organism Transcriptome : the en*re set of marns found in a cell at a given *me Proteome : the en*re
More informationNetworks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource
Networks Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource Networks in biology Protein-Protein Interaction Network of Yeast Transcriptional regulatory network of E.coli Experimental
More informationLearning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling
Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence
More informationHidden Markov Models and some applications
Oleg Makhnin New Mexico Tech Dept. of Mathematics November 11, 2011 HMM description Application to genetic analysis Applications to weather and climate modeling Discussion HMM description Application to
More informationLecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008
Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically
More informationIntro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye!
Today Gene regulation Synteny Good bye! Gene regulation What governs gene transcription? Genes active under different circumstances. Gene regulation What governs gene transcription? Genes active under
More informationJoint modelling of ChIP-seq data via a Markov random field model
Joint modelling of ChIP-seq data via a Markov random field model Y. Bao 1, V. Vinciotti 1,, E. Wit 2 and P. t Hoen 3,4 1 School of Information Systems, Computing and Mathematics, Brunel University, UK
More informationTECHNICAL REPORT NO. 1151
DEPARTMENT OF STATISTICS University of Wisconsin 1300 University Avenue Madison, WI 53706 TECHNICAL REPORT NO. 1151 January 12, 2009 A Hierarchical Semi-Markov Model for Detecting Enrichment with Application
More informationnetworks in molecular biology Wolfgang Huber
networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic
More informationChapter 15 Active Reading Guide Regulation of Gene Expression
Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationGenome 541 Introduction to Computational Molecular Biology. Max Libbrecht
Genome 541 Introduction to Computational Molecular Biology Max Libbrecht Genome 541 units Max Libbrecht: Gene regulation and epigenomics Postdoc, Bill Noble s lab Yi Yin: Bayesian statistics Postdoc, Jay
More informationWhole-genome analysis of GCN4 binding in S.cerevisiae
Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?
More informationDe novo identification of motifs in one species. Modified from Serafim Batzoglou s lecture notes
De novo identification of motifs in one species Modified from Serafim Batzoglou s lecture notes Finding Regulatory Motifs... Given a collection of genes that may be regulated by the same transcription
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationGene Regulatory Networks II Computa.onal Genomics Seyoung Kim
Gene Regulatory Networks II 02-710 Computa.onal Genomics Seyoung Kim Goal: Discover Structure and Func;on of Complex systems in the Cell Identify the different regulators and their target genes that are
More informationHidden Markov Models and some applications
Oleg Makhnin New Mexico Tech Dept. of Mathematics November 11, 2011 HMM description Application to genetic analysis Applications to weather and climate modeling Discussion HMM description Hidden Markov
More informationChapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta
Chapter 8 Regulatory Motif Discovery: from Decoding to Meta-Analysis Qing Zhou Mayetri Gupta Abstract Gene transcription is regulated by interactions between transcription factors and their target binding
More informationComplete all warm up questions Focus on operon functioning we will be creating operon models on Monday
Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA
More informationChemistry Chapter 26
Chemistry 2100 Chapter 26 The Central Dogma! The central dogma of molecular biology: Information contained in DNA molecules is expressed in the structure of proteins. Gene expression is the turning on
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationBioinformatics 2 - Lecture 4
Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what
More informationExhaustive search. CS 466 Saurabh Sinha
Exhaustive search CS 466 Saurabh Sinha Agenda Two different problems Restriction mapping Motif finding Common theme: exhaustive search of solution space Reading: Chapter 4. Restriction Mapping Restriction
More informationIntroduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas
Introduc)on to RNA- Seq Data Analysis Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Material: hep://)ny.cc/rnaseq Slides: hep://)ny.cc/slidesrnaseq
More informationDiscovering MultipleLevels of Regulatory Networks
Discovering MultipleLevels of Regulatory Networks IAS EXTENDED WORKSHOP ON GENOMES, CELLS, AND MATHEMATICS Hong Kong, July 25, 2018 Gary D. Stormo Department of Genetics Outline of the talk 1. Transcriptional
More informationGeert Geeven. April 14, 2010
iction of Gene Regulatory Interactions NDNS+ Workshop April 14, 2010 Today s talk - Outline Outline Biological Background Construction of Predictors The main aim of my project is to better understand the
More informationDeciphering regulatory networks by promoter sequence analysis
Bioinformatics Workshop 2009 Interpreting Gene Lists from -omics Studies Deciphering regulatory networks by promoter sequence analysis Elodie Portales-Casamar University of British Columbia www.cisreg.ca
More informationProteomics Systems Biology
Dr. Sanjeeva Srivastava IIT Bombay Proteomics Systems Biology IIT Bombay 2 1 DNA Genomics RNA Transcriptomics Global Cellular Protein Proteomics Global Cellular Metabolite Metabolomics Global Cellular
More informationProbabilistic models of biological sequence motifs
Probabilistic models of biological sequence motifs Discovery of new motifs Master in Bioinformatics UPF 2015-2016 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain what
More informationJMJ14-HA. Col. Col. jmj14-1. jmj14-1 JMJ14ΔFYR-HA. Methylene Blue. Methylene Blue
Fig. S1 JMJ14 JMJ14 JMJ14ΔFYR Methylene Blue Col jmj14-1 JMJ14-HA Methylene Blue Col jmj14-1 JMJ14ΔFYR-HA Fig. S1. The expression level of JMJ14 and truncated JMJ14 with FYR (FYRN + FYRC) domain deletion
More informationPriors in Dependency network learning
Priors in Dependency network learning Sushmita Roy sroy@biostat.wisc.edu Computa:onal Network Biology Biosta2s2cs & Medical Informa2cs 826 Computer Sciences 838 hbps://compnetbiocourse.discovery.wisc.edu
More informationDiscovering molecular pathways from protein interaction and ge
Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why
More informationStatistical Inferences for Isoform Expression in RNA-Seq
Statistical Inferences for Isoform Expression in RNA-Seq Hui Jiang and Wing Hung Wong February 25, 2009 Abstract The development of RNA sequencing (RNA-Seq) makes it possible for us to measure transcription
More informationUnderstanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007
Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007 The Computational Lens Exposes the computational nature of natural processes and provides a language for their description.
More informationComparative analysis of RNA- Seq data with DESeq2
Comparative analysis of RNA- Seq data with DESeq2 Simon Anders EMBL Heidelberg Two applications of RNA- Seq Discovery Eind new transcripts Eind transcript boundaries Eind splice junctions Comparison Given
More informationClustering and Network
Clustering and Network Jing-Dong Jackie Han jdhan@picb.ac.cn http://www.picb.ac.cn/~jdhan Copy Right: Jing-Dong Jackie Han What is clustering? A way of grouping together data samples that are similar in
More informationfor the Analysis of ChIP-Seq Data
Supplementary Materials: A Statistical Framework for the Analysis of ChIP-Seq Data Pei Fen Kuan Departments of Statistics and of Biostatistics and Medical Informatics Dongjun Chung Departments of Statistics
More informationTranslation Part 2 of Protein Synthesis
Translation Part 2 of Protein Synthesis IN: How is transcription like making a jello mold? (be specific) What process does this diagram represent? A. Mutation B. Replication C.Transcription D.Translation
More informationName: SBI 4U. Gene Expression Quiz. Overall Expectation:
Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz
More informationAn overview of deep learning methods for genomics
An overview of deep learning methods for genomics Matthew Ploenzke STAT115/215/BIO/BIST282 Harvard University April 19, 218 1 Snapshot 1. Brief introduction to convolutional neural networks What is deep
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationShane T. Jensen, X. Shirley Liu, Qing Zhou and Jun S. Liu
Statistical Science 2004, Vol. 19, No. 1, 188 204 DOI 10.1214/088342304000000107 Institute of Mathematical Statistics, 2004 Computational Discovery of Gene Regulatory Binding Motifs: A Bayesian Perspective
More informationL3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus
L3.1: Circuits: Introduction to Transcription Networks Cellular Design Principles Prof. Jenna Rickus In this lecture Cognitive problem of the Cell Introduce transcription networks Key processing network
More informationEvolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites
Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Paper by: James P. Balhoff and Gregory A. Wray Presentation by: Stephanie Lucas Reviewed
More informationQuantitative Bioinformatics
Chapter 9 Class Notes Signals in DNA 9.1. The Biological Problem: since proteins cannot read, how do they recognize nucleotides such as A, C, G, T? Although only approximate, proteins actually recognize
More informationDEGseq: an R package for identifying differentially expressed genes from RNA-seq data
DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics
More informationCMARRT: A TOOL FOR THE ANALYSIS OF CHIP-CHIP DATA FROM TILING ARRAYS BY INCORPORATING THE CORRELATION STRUCTURE
CMARRT: A TOOL FOR THE ANALYSIS OF CHIP-CHIP DATA FROM TILING ARRAYS BY INCORPORATING THE CORRELATION STRUCTURE PEI FEN KUAN 1, HYONHO CHUN 1, SÜNDÜZ KELEŞ1,2 1 Department of Statistics, 2 Department of