Modeling Motifs Collecting Data (Measuring and Modeling Specificity of Protein-DNA Interactions)
- Marion Burke
Computational Genomics Course, Cold Spring Harbor Labs, Oct 31, 2016
Gary D. Stormo, Department of Genetics

Outline
- Modeling specificity with a position weight matrix (PWM)
  - General features
  - Limitations and extensions
- How to set the weights
  - General ideas, some history
  - Using high-throughput experimental data
  - Using in vivo location data (ChIP-chip/-seq)
Terminology: Sites vs Motifs
Think of restriction sites:
- EcoRI: sites {GAATTC}, motif GAATTC
- HincII: sites {GTTAAC, GTTGAC, GTCAAC, GTCGAC}, motif GTYRAC
Transcription factor motifs should be quantitative, giving different scores to different sites that reflect differences in binding affinity.
Also: a site is a specific location in the genome.

Representations/Models of Protein-DNA binding
- Transcription factors don't bind to just one sequence.
- A consensus sequence is usually the preferred site, but similar sequences also bind well.
- Not all variants bind equally well; some positions contribute more to the specificity than others.
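The site sets listed for EcoRI and HincII can be recovered from their motifs by expanding the IUPAC degenerate codes (Y = C or T, R = A or G). A minimal Python sketch; the table holds the standard IUPAC nucleotide codes, and `expand_motif` is a hypothetical helper name.

```python
from itertools import product

# IUPAC nucleotide codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_motif(motif: str) -> set[str]:
    """Return every site (concrete sequence) matched by a degenerate motif."""
    choices = [IUPAC[code] for code in motif.upper()]
    return {"".join(bases) for bases in product(*choices)}


print(sorted(expand_motif("GAATTC")))  # EcoRI: ['GAATTC']
print(sorted(expand_motif("GTYRAC")))  # HincII: ['GTCAAC', 'GTCGAC', 'GTTAAC', 'GTTGAC']
```

A motif of this kind is still qualitative: every member of the set counts equally, which is exactly what a transcription factor motif should not do.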
Position Weight Matrix Model (PWM, also PSSM)

[Figure: a PWM with rows A, C, G, T scoring the sequence ACTATAATGT; PWM Model Score = -24.]
[Figure: a PWM with rows A, C, G, T scoring the sequence ACTATAATGT; PWM Model Score = 43.]
PWM Model
The PWM is a generalization of the consensus sequence. There is NO advantage in consensus sequences: given a consensus sequence, one can define a PWM and a threshold that will return the same sites.

The PWM is a linear model:
$$\mathrm{Score}(S) = \sum_i W_i \cdot S_i$$
- $S_i$ encodes the sequence (which base occurs at position $i$).
- $W_i$ weights those encoded features to provide the score.
- It is easy to add more features if they are necessary.
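The claim that a consensus is just a special PWM plus a threshold can be checked directly. A minimal sketch, assuming one-hot encoding of $S_i$, weight 1 for every base the consensus allows at a position and 0 otherwise, and a threshold equal to the motif length; the helper names and the reduced IUPAC table are illustrative assumptions. Matrices here are stored as positions x bases (the transpose of the A/C/G/T row layout on the slides).

```python
from itertools import product

import numpy as np

BASES = "ACGT"
# Only the IUPAC codes needed for this example (assumption: extend as required).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def encode(seq: str) -> np.ndarray:
    """S: an L x 4 indicator matrix, S[i, b] = 1 if base b occurs at position i."""
    s = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        s[i, BASES.index(base)] = 1.0
    return s


def score(W: np.ndarray, seq: str) -> float:
    """Linear PWM score: Score(S) = sum_i W_i . S_i."""
    return float(np.sum(W * encode(seq)))


def consensus_to_pwm(consensus: str) -> np.ndarray:
    """Weight 1 for every base allowed by the consensus at a position, 0 otherwise."""
    W = np.zeros((len(consensus), 4))
    for i, code in enumerate(consensus):
        for base in IUPAC[code]:
            W[i, BASES.index(base)] = 1.0
    return W


W = consensus_to_pwm("GTYRAC")
threshold = W.shape[0]  # require an allowed base at every position
hits = {"".join(s) for s in product(BASES, repeat=6) if score(W, "".join(s)) >= threshold}
print(sorted(hits))  # ['GTCAAC', 'GTCGAC', 'GTTAAC', 'GTTGAC']
```

Lowering the threshold to 5 admits sites with one mismatch; replacing the 0/1 weights with graded values is what turns this into a quantitative model.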
Two important issues need to be addressed:
- Parameter estimation: where do the matrix elements come from? Different types of data lead to different methods of parameter estimation.
- Additivity: do the positions really contribute independently to the binding interaction? If not, how do we extend the model? Complete binding energy list vs. model.
If the simple additive model is inadequate, one can use dinucleotide or higher-order models. Some form of matrix model must be correct, because the binding data itself is a matrix (vector): AA, AC, AG, AT, ..., TT; AAA, AAC, AAG, AAT, ..., TTT.

Alternative approach to higher-order contributions: structure parameters
- Maybe the non-additivity is due to structural preferences that are known to depend on nearest-neighbor bases (or longer contexts).
- Structure parameters may capture context effects with fewer parameters.

For example, see work by Rohs and colleagues:
- Covariation between homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res.
- TFBSshape: a motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res (Database issue).
- Quantitative modeling of transcription factor binding specificities using DNA shape. Proc Natl Acad Sci U S A (15).
- DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo. Cell Syst. Sep 28;3(3).
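Adding dinucleotide terms keeps the model linear: the feature vector just grows. A minimal sketch, assuming one-hot mononucleotide features (4 per position) plus one-hot features for each adjacent pair (16 per pair of positions); the function names are hypothetical. A weight vector `w` of matching length would then score a sequence as `w @ features(seq)`.

```python
from itertools import product

import numpy as np

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT


def mono_features(seq: str) -> np.ndarray:
    """4 indicators per position (the additive PWM features)."""
    x = np.zeros(4 * len(seq))
    for i, base in enumerate(seq):
        x[4 * i + BASES.index(base)] = 1.0
    return x


def dinuc_features(seq: str) -> np.ndarray:
    """16 indicators per adjacent pair of positions (i, i + 1)."""
    x = np.zeros(16 * (len(seq) - 1))
    for i in range(len(seq) - 1):
        x[16 * i + DINUCS.index(seq[i:i + 2])] = 1.0
    return x


def features(seq: str, dinucleotides: bool = True) -> np.ndarray:
    """Concatenate mono- and (optionally) dinucleotide indicators."""
    mono = mono_features(seq)
    return np.concatenate([mono, dinuc_features(seq)]) if dinucleotides else mono


seq = "ACGTA"
print(len(features(seq, dinucleotides=False)), len(features(seq)))  # 20 84
```

For a site of length L this is 4L versus 4L + 16(L - 1) parameters, which is why more data, or more compact structure features, are needed as the model order grows.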
How to Set the Matrix Elements
- Statistical treatment of known sites: needs a reasonable sample size and some assumptions about how the sample is obtained. A probabilistic model is easy, and can be accurate if the assumptions are reasonable.
- Quantitative binding data: determine the matrix parameters that provide the best fit. This has been laborious and slow experimental work, but new technologies make it much easier.

Modeling based on known sites
- $N(b,i)$: the count of base $b$ at position $i$ (PFM); $f(b,i)$: its frequency (PPM). Note: some papers and programs call these a PWM.
- Log-odds PWM: $W(b,i) = \log[f(b,i)/p(b)]$, where $p(b)$ is the background frequency of base $b$.
- Information content: $I(i) = \sum_b f(b,i)\,W(b,i)$
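The chain from counts to log-odds weights and information content can be sketched as follows. This assumes counts given as a 4 x L array with rows A, C, G, T, a uniform background by default, and base-2 logs so that W and I come out in bits (the slide leaves the log base unspecified). The optional pseudocount avoids zero frequencies; `pwm_from_counts` and the example counts are hypothetical.

```python
import numpy as np


def pwm_from_counts(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.0):
    """counts: 4 x L array N(b, i), rows in A, C, G, T order.

    Returns f(b, i), W(b, i) = log2[f(b, i) / p(b)] and I(i) = sum_b f(b, i) W(b, i).
    """
    N = np.asarray(counts, dtype=float) + pseudocount
    f = N / N.sum(axis=0, keepdims=True)              # PPM: column frequencies
    p = np.asarray(background, dtype=float)[:, None]  # background p(b)
    with np.errstate(divide="ignore", invalid="ignore"):
        W = np.log2(f / p)                             # log-odds PWM (bits)
        I = np.where(f > 0, f * W, 0.0).sum(axis=0)    # 0 * log 0 taken as 0
    return f, W, I


# Hypothetical counts from 8 aligned sites of width 4 (rows A, C, G, T).
counts = [[8, 4, 2, 0],
          [0, 0, 2, 0],
          [0, 0, 2, 8],
          [0, 4, 2, 0]]
f, W, I = pwm_from_counts(counts)
print(I)                  # [2. 1. 0. 2.]  information content per position (bits)
print((f * I).round(2))   # logo letter heights: base frequency x column IC
```

The last line is how a classic logo is drawn: column height equals I(i), split among bases in proportion to f(b,i).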
Classic Logo (from Tom Schneider): the height of the column at each position is its information content, and each base is drawn in proportion to its frequency.

Likelihood Ratio Statistics Primer
Given two probability distributions $P_i$ and $Q_i$, with $\sum_i P_i = \sum_i Q_i = 1$, and some data $D_i$, the number of times each type $i$ is observed in $N$ total observations, the likelihood ratio of the data being from distribution $Q$ versus $P$ is
$$LR = \prod_i (Q_i/P_i)^{D_i}$$
and the log-likelihood ratio is
$$LLR = \sum_i D_i \ln(Q_i/P_i)$$
$$LLR = \sum_i D_i \ln(Q_i/P_i)$$
The maximum-likelihood distribution is $Q_i = D_i/N$, so
$$\max LLR = N \sum_i Q_i \ln(Q_i/P_i), \qquad \sum_i Q_i \ln(Q_i/P_i) \ge 0$$
The quantity $\sum_i Q_i \ln(Q_i/P_i)$ is the information content (relative entropy, Kullback-Leibler distance); it is related to the G-statistic and $\chi^2$.

Modeling from experimental data
From single binding site experiments to high-throughput methods that allow the determination of specificity (relative affinity) across all possible sequences at once.
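A quick numerical check of the statements above, using hypothetical counts: the LLR at the maximum-likelihood $Q_i = D_i/N$ equals N times the relative entropy, any other Q gives a smaller LLR, and the G-statistic (with expected counts $N P_i$) is twice the maximum LLR.

```python
import numpy as np


def llr(D, Q, P):
    """LLR = sum_i D_i ln(Q_i / P_i) for counts D under distributions Q vs P."""
    D, Q, P = (np.asarray(a, dtype=float) for a in (D, Q, P))
    return float(np.sum(D * np.log(Q / P)))


D = np.array([30, 10, 40, 20])   # hypothetical counts of each type i
P = np.full(4, 0.25)             # reference distribution P_i
N = D.sum()
Q_ml = D / N                     # maximum-likelihood Q_i = D_i / N

max_llr = llr(D, Q_ml, P)
relative_entropy = float(np.sum(Q_ml * np.log(Q_ml / P)))
g_statistic = 2 * float(np.sum(D * np.log(D / (N * P))))  # G = 2 sum O ln(O / E)

print(np.isclose(max_llr, N * relative_entropy))   # True
print(np.isclose(g_statistic, 2 * max_llr))        # True
print(llr(D, [0.3, 0.2, 0.3, 0.2], P) < max_llr)   # True: another Q does worse
```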
Quantitative Binding: the affinity of a TF for one sequence.
Specificity: the relative affinity for different sequences, ideally for all sequences.
High-throughput experimental methods to measure TF specificity

Specificity Modeling
- High-throughput in vitro binding site analyses can give good, quantitative models of intrinsic binding specificity.
- More data alone isn't sufficient to give better models; good analysis methods are also needed.
- The log-odds method is based on assumptions that may not be true; energetic models can give better descriptions.
- The relationship between binding affinity and binding probability is non-linear at high TF concentration.
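The log-odds method is equivalent to an energy model if the sites are drawn from a Boltzmann distribution, with binding probability proportional to $e^{-E}$:
$$F(S_i) = \frac{P(S_i)\, e^{-E_i}}{Z} \quad\Longrightarrow\quad E_i = -\ln\frac{F(S_i)}{P(S_i)} - \ln Z$$
where $F(S_i)$ is the posterior (bound-site) frequency, $P(S_i)$ the prior, and $E_i$ the binding energy. This is the log-odds relationship between binding energy and frequencies.

In reality the distribution is Fermi-Dirac, with the Boltzmann distribution a special case in the low-concentration range (Djordjevic et al., Genome Res).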
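The difference between the two distributions is easy to see numerically. A minimal sketch, with energies and the chemical potential μ (which rises with TF concentration) in kT units; the specific energies and μ values are hypothetical. At low μ the Fermi-Dirac occupancy matches the Boltzmann form; at high μ the Boltzmann expression exceeds 1 while the true occupancy saturates.

```python
import numpy as np


def p_bound(E, mu):
    """Fermi-Dirac occupancy of a site with energy E (kT units): 1 / (1 + e^(E - mu))."""
    return 1.0 / (1.0 + np.exp(np.asarray(E, dtype=float) - mu))


def p_bound_boltzmann(E, mu):
    """Low-concentration (Boltzmann) limit: e^(mu - E), proportional to e^(-E)."""
    return np.exp(mu - np.asarray(E, dtype=float))


E = np.array([0.0, 1.0, 2.0, 4.0])   # hypothetical site energies (kT)
for mu in (-8.0, 2.0):               # hypothetical low and high TF chemical potentials
    print(mu, p_bound(E, mu).round(4), p_bound_boltzmann(E, mu).round(4))
```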
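GTGGA vs ATGGA: $E_G - E_A = 2kT$
GTGTA vs ATGTA: the same substitution in a different context

Additive changes in binding energy have non-independent (context-dependent) effects on binding probability: probabilities no longer factor, even though energies are additive.

HT-SELEX (SELEX-Seq)
$$\min_{a,b,W,\mu} \sum_{i \in \text{data}} \left[ N(S_i) - \frac{a\, n(S_i)}{1 + e^{W\cdot S_i - \mu}} - b\, n(S_i) \right]^2$$
with $n(S_i)$ and $N(S_i)$ the counts of $S_i$ before and after selection. Parameters to fit: $a$, $b$, $W$, $\mu$.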
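The GTGGA/ATGGA example can be made concrete. A minimal sketch under stated assumptions: an additive energy model in kT where G at position 1 costs 2 kT relative to A (as in the text) and T at position 4 costs 3 kT relative to G (a hypothetical value), with a hypothetical high chemical potential μ = 2. The same 2 kT substitution changes the bound-probability ratio by different amounts in the two contexts; only in the low-concentration limit would both ratios equal $e^2 \approx 7.39$.

```python
import math


def p_bound(E, mu):
    """Fermi-Dirac binding probability, energies in kT."""
    return 1.0 / (1.0 + math.exp(E - mu))


def energy(seq):
    """Hypothetical additive model: G at position 1 costs 2 kT, T at position 4 costs 3 kT."""
    return (2.0 if seq[0] == "G" else 0.0) + (3.0 if seq[3] == "T" else 0.0)


mu = 2.0  # assumed high TF concentration
for a, g in (("ATGGA", "GTGGA"), ("ATGTA", "GTGTA")):
    print(a, g, round(p_bound(energy(a), mu) / p_bound(energy(g), mu), 2))
# ATGGA GTGGA 1.76
# ATGTA GTGTA 5.67
```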
[Figure: fit of model to HT-SELEX data for zif268, BEEML vs BioProspector (Zhao et al., PLoS Comp Bio, 2009)]

Protein Binding Microarray (PBM)
[Figure: example of Plag1 using BEEML-PBM (Zhao and Stormo, Nature Biotechnol; Nat Biotechnol)]
- Most TFs (~90%) are fit well by PWMs.
- BEEML-PBM is among the best methods.
- Some TFs do better with dinucleotide models.
- A few require multiple modes of interaction.
- The best models fit in vivo data as well as in vivo-derived models do.
[Figure: diverse sets: >100 TFs; ~20 TFs; ~240 TFs; Weirauch et al., >1000 TFs]

Bacterial-1-Hybrid (B1H)
B1H on zif268 returns the expected model.
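[Figure: average prediction accuracy for ZFPs]

HT-SELEX (SELEX-Seq)
$$\frac{P(S_i \mid b)}{P(S_i)} \propto \frac{1}{1 + e^{E_i - \mu}}$$
Compared to a reference sequence with $E = 0$:
$$\frac{P(S_i \mid b)/P(S_i)}{P(S_{\mathrm{ref}} \mid b)/P(S_{\mathrm{ref}})} = \frac{1 + e^{-\mu}}{1 + e^{E_i - \mu}}$$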
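The reference-normalized HT-SELEX expression can be inverted to recover $E_i$ from a measured enrichment, provided μ is known or fitted. A minimal sketch in kT units; the function names are hypothetical. Solving $1 + e^{E_i - \mu} = (1 + e^{-\mu})/r_i$ gives $E_i = \mu + \ln[(1 + e^{-\mu})/r_i - 1]$, which only exists while $r_i$ stays below the saturation limit $1 + e^{-\mu}$.

```python
import math


def relative_enrichment(E, mu):
    """[P(S_i|b)/P(S_i)] / [P(S_ref|b)/P(S_ref)] = (1 + e^-mu) / (1 + e^(E - mu)), E_ref = 0."""
    return (1.0 + math.exp(-mu)) / (1.0 + math.exp(E - mu))


def energy_from_enrichment(r, mu):
    """Invert the expression above for E_i, given an estimate of mu (kT units)."""
    x = (1.0 + math.exp(-mu)) / r - 1.0
    if x <= 0:
        raise ValueError("enrichment exceeds the saturation limit 1 + e^-mu")
    return mu + math.log(x)


r = relative_enrichment(2.0, mu=1.0)
print(round(energy_from_enrichment(r, mu=1.0), 6))             # 2.0
print(round(-math.log(relative_enrichment(2.0, mu=-10.0)), 3))  # 2.0 (low-mu limit: E ~ -ln r)
```

In the low-concentration limit the ratio reduces to $e^{-E_i}$, so a plain log-odds estimate is adequate there; at higher μ the full expression is needed.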
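Spec-seq (specificity by sequencing)
$$\frac{P(S_i \mid b)}{P(S_i \mid u)} = e^{\mu - E_i}$$
Compared to a reference sequence with $E = 0$:
$$\frac{P(S_{\mathrm{ref}} \mid b)/P(S_{\mathrm{ref}} \mid u)}{P(S_i \mid b)/P(S_i \mid u)} = e^{E_i}, \qquad \ln\frac{P(S_{\mathrm{ref}} \mid b)/P(S_{\mathrm{ref}} \mid u)}{P(S_i \mid b)/P(S_i \mid u)} = E_i$$

Spec-seq: Specificity by sequencing
$$P + S_i \rightleftharpoons P{\cdot}S_i, \qquad K_A(S_i) = \frac{[P{\cdot}S_i]}{[P][S_i]}$$
$$K_A(S_1) : K_A(S_2) : \cdots : K_A(S_n) = \frac{[P{\cdot}S_1]}{[S_1]} : \frac{[P{\cdot}S_2]}{[S_2]} : \cdots : \frac{[P{\cdot}S_n]}{[S_n]}$$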
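In practice the probabilities above are estimated from sequencing reads of the bound and unbound fractions. A minimal sketch, assuming read counts are proportional to the concentration of each sequence within a fraction (so library-size factors cancel in the ratio); the variant names and counts are hypothetical. Relative association constants follow as $K_A(S_i)/K_A(S_{\mathrm{ref}}) = e^{-E_i}$.

```python
import math


def specseq_energies(bound, unbound, ref):
    """Relative binding energies (kT) from read counts, with E_ref = 0.

    E_i = ln[(N_b(ref) / N_u(ref)) / (N_b(i) / N_u(i))]; library-size factors cancel.
    """
    ref_ratio = bound[ref] / unbound[ref]
    return {s: math.log(ref_ratio / (bound[s] / unbound[s])) for s in bound}


# Hypothetical read counts for three variants in the bound and unbound fractions.
bound = {"REF": 1000, "V1": 250, "V2": 4000}
unbound = {"REF": 500, "V1": 500, "V2": 1000}
E = specseq_energies(bound, unbound, ref="REF")
K_rel = {s: math.exp(-e) for s, e in E.items()}   # K_A(S_i) / K_A(S_ref)
print({s: round(e, 3) for s, e in E.items()})     # {'REF': 0.0, 'V1': 1.386, 'V2': -0.693}
```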
Specificity of the Lac repressor
- The WT operator is asymmetric.
- 4 libraries vary both sequence and spacing: 2560 different binding sites.
- Highly reproducible: ~5% variance in affinity, ~0.1 kT variance in energy.
(Zuo and Stormo, Genetics, 2014)

[Figure: three-dimensional structure of the dimeric lac HP62-O1 operator complex. Kalodimos C G et al. EMBO J. 2002;21. European Molecular Biology Organization]
No motif for half of all human TFs; most are C2H2 zinc finger proteins (Laura Campitelli, Matt Weirauch)
- Human, all TFs (1,665): known motif (637); close ortholog/paralog has motif (219); no motif (809)
- Human, no motif (809): C2H2 with no motif (573); needs heterodimerization partner (56); possibly not sequence-specific (143); not tried/no data (37)
- Human, all C2H2s (714): known motif (97); close ortholog/paralog has motif (44); no motif (573)
ZF specificity prediction
We use three programs: ours, one from a Princeton group, and one from a Toronto group. The logos look pretty different, but the differences are largely quantitative, and there are many high-IC positions of agreement. By averaging the PFMs, one can obtain a consensus sequence that agrees pretty well with all three.

Spec-seq randomizations
Reverse consensus (30 bp): TCTTGATGATGCTGCAATATTAATAATTTA
The consensus binds well enough to show a shift in EMSA, so we randomized five adjacent positions at a time, generating 6 libraries of 1024 sequences. The merged logo shows an overall good match with the consensus and provides quantitative predictions about binding energy contributions.
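The library design follows from simple arithmetic: 30 bp in windows of 5 gives 6 libraries, and each 5-position window has $4^5 = 1024$ variants. A minimal sketch that enumerates the designed sequences, assuming windows are non-overlapping and start at position 0; `window_library` is a hypothetical helper, and real libraries are synthesized with degenerate bases rather than listed.

```python
from itertools import product

# Reverse consensus from the slide (30 bp).
CONSENSUS = "TCTTGATGATGCTGCAATATTAATAATTTA"


def window_library(seq, start, width=5):
    """All 4**width variants of seq with positions [start, start + width) randomized."""
    return [seq[:start] + "".join(v) + seq[start + width:]
            for v in product("ACGT", repeat=width)]


libraries = [window_library(CONSENSUS, start) for start in range(0, len(CONSENSUS), 5)]
print(len(libraries), {len(lib) for lib in libraries})  # 6 {1024}
```

Each library also contains the unmodified consensus, which can serve as the common reference when the six libraries are merged.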
The Spec-seq motif matches well with the motif obtained from in vivo recombination hotspots and with the one obtained using the Affinity-seq method, which pulls out genomic DNA fragments in vitro and sequences them without amplification.
[Figure panels: Affinity-seq motif; Hotspot motif]

Spec-seq can also be easily adapted to study CpG methylation sensitivity:
- ZFP57 is involved in imprinting maintenance.
- It has 2 ZF clusters; one binds TGCCGC and prefers mCpG.
- 3 libraries with random regions and methylation variants.
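Spec-seq for combinatorial binding can get all of the important parameters in one experiment, including cooperativity:
$$K_X(x_1) : K_X(x_2) : \cdots : K_X(x_n) = \frac{N(x_1 \mid B_{X,\varnothing})}{N(x_1 \mid B_{\varnothing,\varnothing})} : \frac{N(x_2 \mid B_{X,\varnothing})}{N(x_2 \mid B_{\varnothing,\varnothing})} : \cdots : \frac{N(x_n \mid B_{X,\varnothing})}{N(x_n \mid B_{\varnothing,\varnothing})}$$
$$K_{X|Y}(x_1) : K_{X|Y}(x_2) : \cdots : K_{X|Y}(x_n) = \frac{N(x_1 \mid B_{X,Y})}{N(x_1 \mid B_{\varnothing,Y})} : \frac{N(x_2 \mid B_{X,Y})}{N(x_2 \mid B_{\varnothing,Y})} : \cdots : \frac{N(x_n \mid B_{X,Y})}{N(x_n \mid B_{\varnothing,Y})}$$
$$\omega_i = \frac{K_{X|Y}(S_i)}{K_X(S_i)} = \frac{K_{Y|X}(S_i)}{K_Y(S_i)} = \frac{K_{X,Y}(S_i)}{K_X(S_i)\,K_Y(S_i)}$$
Here $N(x \mid B_{X,Y})$ is the count of sequence $x$ in the fraction bound by the indicated proteins ($\varnothing$ = not bound).
(Stormo, Zuo, Chang, Briefings in Functional Genomics, 2015)

Specificity Modeling: Conclusions
1. Different types of high-throughput data can be used to obtain good specificity models; good analysis methods are critical.
2. PWMs are often (usually?) good approximations, but higher-order models can be obtained if needed.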
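The cooperativity factor can be computed from four read-count tables, one per bound state. A minimal sketch, assuming counts are proportional to concentrations within each fraction, so every K (and therefore ω) is known only up to a common factor and is best reported relative to a reference sequence. The two functions compute the two forms of $\omega_i$ given above, which agree algebraically; names and counts are hypothetical.

```python
import math


def cooperativity(n_none, n_X, n_Y, n_XY):
    """omega_i = K_X|Y(S_i) / K_X(S_i) for every sequence, from read counts.

    n_none: unbound DNA; n_X: bound by X only; n_Y: bound by Y only;
    n_XY: bound by both X and Y. Each argument maps sequence -> read count.
    """
    return {s: (n_XY[s] / n_Y[s]) / (n_X[s] / n_none[s]) for s in n_none}


def cooperativity_joint(n_none, n_X, n_Y, n_XY):
    """Same quantity written as K_X,Y(S_i) / (K_X(S_i) K_Y(S_i))."""
    return {s: (n_XY[s] / n_none[s]) / ((n_X[s] / n_none[s]) * (n_Y[s] / n_none[s]))
            for s in n_none}


# Hypothetical counts for two sequences that differ only in cooperativity.
n_none = {"S1": 1000, "S2": 1000}
n_X = {"S1": 200, "S2": 200}
n_Y = {"S1": 300, "S2": 300}
n_XY = {"S1": 60, "S2": 240}
omega = cooperativity(n_none, n_X, n_Y, n_XY)
print({s: round(w / omega["S1"], 3) for s, w in omega.items()})  # {'S1': 1.0, 'S2': 4.0}
```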
Discovery of Binding Motifs from in vivo data

Datatypes for Motif Discovery
- Co-regulated genes
  - Genetic studies (deletion, over-expression effects)
  - Expression analysis (microarrays, RNA-Seq)
- Co-bound regions
  - ChIP-chip/-Seq location analysis
- Phylogenetic analysis, conservation across species
  - Phylogenetic footprinting
  - Can be combined with multigene analysis, even over the whole genome

Outline of problem
- Goal: find the most significant pattern in common.
- We can't look at all possible alignments: there are too many.
- In vitro analysis methods don't work; their assumptions are not valid.
Example dataset: promoter regions from co-regulated genes
CE1CG \TAATGTTTGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGCGTGGTGTGAAAGACTGTTTTTTTGATCGTTTTCACAAAAATGGAAGTCCACAGTCTTGACAG\
ECOARABOP \GACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAG\
ECOBGLR1 \ACAAATCCCAATAACTTAATTATTGGGATTTGTTATATATAACTTTATAAATTCCTAAAATTACACAAAGTTAATAACTGTGAGCATGGTCATATTTTTATCAAT\
ECOCRP \CACAAAGCGAAAGCTATGCTAAAACAGTCAGGATGCTACAGTAATACATTGATGTACTGCATGTATGCAAAGGACGTCACATTACCGTGCAGTACAGTTGATAGC\
ECOCYA \ACGGTGCTACACTTGTATGTAGCGCATCTTTCTTTACGGTCAATCAGCAAGGTGTTAAATTGATCACGTTTTAGACCATTTTTTCGTCGTGAAACTAAAAAAACC\
ECODEOP2 \AGTGAATTATTTGAACCAGATCGCATTACAGTGATGCAAACTTGTAAGTAGATTTCCTTAATTGTGATGTGTATCGAAGTGTGTTGCGGAGTAGATGTTAGAATA\
ECOGALE \GCGCATAAAAAACGGCTAAATTCTTGTGTAAACGATTCCACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATTTCATACCATAAGCC\
ECOILVBPR \GCTCCGGCGGGGTTTTTTGTTATCTGCAATTCAGTACAAAACGTGATCAACCCCTCAATTTTCCCTTTGCTGAAAAATTTTCCATTGTCTCCCCTGTAAAGCTGT\
ECOLAC \AACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCAC\
ECOMALBA \ACATTACCGCCAATTCTGTAACAGAGATCACACAAAGCGACGGTGGGGCGTAGGGGCAAGGAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGTTTA\
ECOMALBA \GGAGGAGGCGGGAGGATGAGAACACGGCTTCTGTGAACTAAACCGAGGTCATGTAAGGAATTTCGTGATGTTGCTTGCAAAAATCGTGGCGATTTTATGTGCGCA\
ECOMALT \GATCAGCGTCGTTTTAGGTGAGTTGTTAATAAAGATTTGGAATTGTGACACAGTGCAAATTCAGACACATAAAAAAACGTCATCGCTTGCATTAGAAAGGTTTCT\
ECOOMPA \GCTGACAAAAAAGATTAAACATACCTTATACAAGACTTTTTTTTCATATGCCTGACGGAGTTCACACTTGTAAGTTTTCAACTACGTTGTAGACTTTACATCGCC\
ECOTNAA \TTTTTTAAACATTAAAATTCTTACGTAATTTATAATCTTTAAAAAAAGCATTTAATATTGCTCCCCGAACGATTGTGATTCGATTCACATTTAAACAATTTCAGA\
ECOUXU1 \CCCATGAGAGTGAAATTGTTGTGATGTGGTTAACCCAATTAGAATTCGGGATTGACATGTCTTACCAAAAGGTAGAACTTATACGCCATCTCATCCGATGCAAGC\
PBR322 \CTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTC\
TRN9CAT \CTGTGACGGAAGATCACTTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGACGTTGATCGGCACG\
TDC \GATTTTTATACTTTAACTTGTTGATATTTAAAGGTATTTAATTGTAATAACGATACTCTGGAAAGTATTGAAAGTTAATTTGTGAGTGGTCGCACATATCCTGTT\
Expectation Maximization (EM) Approach to Motif Discovery
Basic idea:
- Given sites, estimate the PWM (log-odds model).
- Given the PWM, pick likely sites according to their probability.
- Make an initial guess, then iterate between those steps until convergence.
Algorithm:
- Start with an initial PWM from the average of all possible sites.
- Using the current PWM, estimate the probability of each position being a site; make a new PWM from the weighted average of all sites.
- Iterate to convergence; usually fast, but there is no guarantee of the optimum.

Gibbs Sampling Approach to Motif Discovery
Same basic idea:
- Given sites, estimate the PWM (log-odds model).
- Given the PWM, pick likely sites according to their probability.
- Iterate between those steps until convergence.
Algorithm:
- Pick 1 site from each of N-1 sequences and make a PWM; use pseudocounts to avoid probability = 0.
- Use the current PWM to pick a site from the left-out sequence by sampling from the probability distribution; update the PWM.
- Iterate to convergence; run multiple times and compare results. There is still no guarantee of the optimum, but sampling avoids the local optima often obtained with EM.
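The Gibbs sampling steps above can be sketched in a few dozen lines. This is a minimal site-sampling version in the spirit of Lawrence et al. (1993), not a reproduction of any published program; it assumes exactly one site of fixed width w per sequence, a uniform background, a pseudocount of 1, and a fixed number of iterations instead of a convergence test. Function names and the planted-motif demo data are hypothetical.

```python
import random

BASES = "ACGT"


def profile(sites, pseudocount=1.0):
    """Column base probabilities of aligned sites, with pseudocounts so no prob. is 0."""
    prof = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        prof.append({b: c / total for b, c in counts.items()})
    return prof


def site_weights(seq, prof, background=0.25):
    """Likelihood ratio (profile vs uniform background) for every start in seq."""
    w = len(prof)
    weights = []
    for j in range(len(seq) - w + 1):
        ratio = 1.0
        for i, base in enumerate(seq[j:j + w]):
            ratio *= prof[i][base] / background
        weights.append(ratio)
    return weights


def gibbs_motif(seqs, w, iters=2000, seed=0):
    """Site-sampling Gibbs sampler; returns one motif start per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]   # random initial sites
    for _ in range(iters):
        k = rng.randrange(len(seqs))                          # sequence left out
        others = [s[p:p + w] for n, (s, p) in enumerate(zip(seqs, starts)) if n != k]
        weights = site_weights(seqs[k], profile(others))      # PWM from N - 1 sites
        # Sample the new site in proportion to its weight instead of taking the max.
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Hypothetical demo: plant the word TTGACA in random 40-mers.
demo_rng = random.Random(1)
seqs = []
for _ in range(8):
    s = [demo_rng.choice(BASES) for _ in range(40)]
    pos = demo_rng.randrange(35)
    s[pos:pos + 6] = "TTGACA"
    seqs.append("".join(s))
starts = gibbs_motif(seqs, w=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

As the slide notes, a single run is not guaranteed to find the best motif, so in practice one would run several seeds and compare the resulting alignments.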
[Figure from Lawrence et al. (1993) Science 262]

Motif discovery from co-regulated genes
[Figure panels A, B: single species; multiple species]
Alignment of conserved regions; Alignment of profiles
Example: Leu3

YGL125W
S. cerevisiae   GAAAAAATAACAGCGACTTTTCTCCCGGTAGCGGGCCGTCGTTTAGTCATTCTATCCCTC
S. mikatae      AAAACATAACAGCGAATTTTCCTCCCGGTAGCGGGCCTTCGTTTAGTCATTCTCTCTCTT
S. bayanus      AAAAAATAACAGCGACTTTTCCCCCCGGTAGCGGGCCGTCGTTTAGTCATTCTCTCTCCC
S. kudriavzevii GAAAAAAAACAACGGCGGCCTCCCCCGGTAGCGGGCCGTCGTTTAGTCATTCTCTCTCTC
***** **** ** *** * *************************************

YOR108W
S. cerevisiae   GCCATCATGGTCCGGTAACGGTCGTAGTGAATGACTCATATTTTTCCATCTCTTT
S. mikatae      GCCATCAAGGTCCGGTAACGGTCGTAGTGAATGACTCACATTTTCTTCGTTATTC
S. bayanus      ACCATTACGGTCCGGTAACGGACTTAGTGAATGATTCATCTTTTCTTCTTTTTTC
S. kudriavzevii GTCGTTAAGGTCCGGTAACGGCCCTCAGCGAATGATTCATAATTTCATTTTTTTC
***** * ************* * ********** *** **** *** ***

YMR108W
S. cerevisiae   AACGCCTAGCCGCCGGAGCCTGCCGGTACCGGCTTGGCTTCAGTTGCTGATCTCGG
S. mikatae      CACAATGACACATACCTAACAGCCGGTACCGGCTTGAATGCCGCCGTTGGCTTCGG
S. bayanus      ATCTTCTAGTCACCGCAGTCTGCCGGTACCGGCTTGAATTCCGCCGTTGATCCTGG
S. kudriavzevii CACATCTCTAGTCCGCGCTCTGCCGGTACCGGCTTAGACTAGCCACGAATCTCGGC
** *** * **** ***************** **** ** * ** **

[Figure: profiles (A, C, G, T) for YGL125W, YOR108W and YMR108W]
(Wang and Stormo, Bioinformatics 2003)

Even whole-genome search for conserved, multi-copy elements (e.g., PhyloNet): Wang and Stormo, PNAS
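The star lines in alignments like these mark columns where every species shares the same base. A minimal sketch, assuming the sequences are already aligned to equal length (gap characters, if present, are compared like any other symbol); `conservation_line` is a hypothetical helper, and the toy alignment is illustrative rather than the Leu3 data.

```python
def conservation_line(aligned):
    """'*' under columns where every aligned sequence has the same character."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must already be aligned to a common length")
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*aligned))


# Hypothetical toy alignment (not the Leu3 data).
toy = ["GCCGGTACCG",
       "GCCGGTACCA",
       "TCCGGTACCG"]
print(repr(conservation_line(toy)))  # ' ******** '
```

Runs of stars across several species are the conserved blocks that phylogenetic footprinting looks for; aligning profiles from several genes then tests whether those blocks share a common motif.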
More informationCHAPTER : Prokaryotic Genetics
CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering
More informationWhole-genome analysis of GCN4 binding in S.cerevisiae
Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?
More information56:198:582 Biological Networks Lecture 8
56:198:582 Biological Networks Lecture 8 Course organization Two complementary approaches to modeling and understanding biological networks Constraint-based modeling (Palsson) System-wide Metabolism Steady-state
More informationProteomics Systems Biology
Dr. Sanjeeva Srivastava IIT Bombay Proteomics Systems Biology IIT Bombay 2 1 DNA Genomics RNA Transcriptomics Global Cellular Protein Proteomics Global Cellular Metabolite Metabolomics Global Cellular
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationHMMs and biological sequence analysis
HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the
More informationAlgorithms in Bioinformatics II SS 07 ZBIT, C. Dieterich, (modified script of D. Huson), April 25,
Algorithms in Bioinformatics II SS 07 ZBIT, C. Dieterich, (modified script of D. Huson), April 25, 200707 Motif Finding This exposition is based on the following sources, which are all recommended reading:.
More informationFundamentally different strategies for transcriptional regulation are revealed by information-theoretical analysis of binding motifs
Fundamentally different strategies for transcriptional regulation are revealed by information-theoretical analysis of binding motifs Zeba Wunderlich 1* and Leonid A. Mirny 1,2 1 Biophysics Program, Harvard
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationA genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12
The integration host factor regulon of E. coli K12 genome 783 A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12 M. Trindade dos Santos and
More informationQuantitative modeling and data analysis of SELEX experiments
Quantitative modeling and data analysis of SELEX experiments Marko Djordjevic, 1,2, and Anirvan M. Sengupta 3 1 Department of Physics, Columbia University, New York, NY 10027 2 Mathematical Biosciences
More informationGS 559. Lecture 12a, 2/12/09 Larry Ruzzo. A little more about motifs
GS 559 Lecture 12a, 2/12/09 Larry Ruzzo A little more about motifs 1 Reflections from 2/10 Bioinformatics: Motif scanning stuff was very cool Good explanation of max likelihood; good use of examples (2)
More informationBioinformatics: Network Analysis
Bioinformatics: Network Analysis Kinetics of Gene Regulation COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 The simplest model of gene expression involves only two steps: the
More informationDNA/RNA structure and packing
DNA/RNA structure and packing Reminder: Nucleic acids one oxygen atom distinguishes RNA from DNA, increases reactivity (so DNA is more stable) base attaches at 1, phosphate at 5 purines pyrimidines Replace
More informationDEGseq: an R package for identifying differentially expressed genes from RNA-seq data
DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics
More informationGenome 541! Unit 4, lecture 3! Genomics assays
Genome 541! Unit 4, lecture 3! Genomics assays Much easier to follow with slides. Good pace.! Having the slides was really helpful clearer to read and easier to follow the trajectory of the lecture.!!
More informationCSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:
Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2
More information3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.
3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of
More informationStephen Scott.
1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative
More informationBayesian Clustering of Multi-Omics
Bayesian Clustering of Multi-Omics for Cardiovascular Diseases Nils Strelow 22./23.01.2019 Final Presentation Trends in Bioinformatics WS18/19 Recap Intermediate presentation Precision Medicine Multi-Omics
More informationJMJ14-HA. Col. Col. jmj14-1. jmj14-1 JMJ14ΔFYR-HA. Methylene Blue. Methylene Blue
Fig. S1 JMJ14 JMJ14 JMJ14ΔFYR Methylene Blue Col jmj14-1 JMJ14-HA Methylene Blue Col jmj14-1 JMJ14ΔFYR-HA Fig. S1. The expression level of JMJ14 and truncated JMJ14 with FYR (FYRN + FYRC) domain deletion
More informationAmino Acid Structures from Klug & Cummings. Bioinformatics (Lec 12)
Amino Acid Structures from Klug & Cummings 2/17/05 1 Amino Acid Structures from Klug & Cummings 2/17/05 2 Amino Acid Structures from Klug & Cummings 2/17/05 3 Amino Acid Structures from Klug & Cummings
More informationSequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,
Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and
More informationSequence motif analysis
Sequence motif analysis Alan Moses Associate Professor and Canada Research Chair in Computational Biology Departments of Cell & Systems Biology, Computer Science, and Ecology & Evolutionary Biology Director,
More informationThe geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia
From Wikipedia, the free encyclopedia Functional genomics..is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects)
More informationAn overview of deep learning methods for genomics
An overview of deep learning methods for genomics Matthew Ploenzke STAT115/215/BIO/BIST282 Harvard University April 19, 218 1 Snapshot 1. Brief introduction to convolutional neural networks What is deep
More informationThree types of RNA polymerase in eukaryotic nuclei
Three types of RNA polymerase in eukaryotic nuclei Type Location RNA synthesized Effect of α-amanitin I Nucleolus Pre-rRNA for 18,.8 and 8S rrnas Insensitive II Nucleoplasm Pre-mRNA, some snrnas Sensitive
More informationCSEP 590B Fall Motifs: Representation & Discovery
CSEP 590B Fall 2014 5 Motifs: Representation & Discovery 1 Outline Previously: Learning from data MLE: Max Likelihood Estimators EM: Expectation Maximization (MLE w/hidden data) These Slides: Bio: Expression
More informationGenome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics
Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics Slides vs chalk talk: I m not sure why you chose a chalk talk over ppt. I prefer the latter no issues with readability
More information5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT
5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:
More information