Modeling Motifs: Collecting Data (Measuring and Modeling Specificity of Protein-DNA Interactions)


Computational Genomics Course, Cold Spring Harbor Labs, Oct 31, 2016
Gary D. Stormo, Department of Genetics

Outline
- Modeling specificity with a position weight matrix (PWM): general features; limitations and extensions
- How to set the weights: general ideas and some history; using high-throughput experimental data; using in vivo location data (ChIP-chip/seq)

Terminology: Sites vs. Motifs
Think of restriction sites: a motif summarizes a set of sites.
- EcoRI: sites {GAATTC}, motif GAATTC
- HincII: sites {GTTAAC, GTTGAC, GTCAAC, GTCGAC}, motif GTYRAC
Transcription factor motifs should be quantitative: they should give different scores to different sites, reflecting differences in binding affinity. Also, "site" can refer to a specific location in the genome.

Representations/Models of Protein-DNA Binding
Transcription factors don't bind to just one sequence. A consensus sequence is usually the preferred site, but similar sequences also bind well. Not all variants bind equally well; some positions contribute more to the specificity than others.
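To make the sites-versus-motifs distinction concrete, here is a minimal Python sketch that expands a degenerate motif such as HincII's GTYRAC back into its set of sites. It assumes the standard IUPAC ambiguity codes (R = A/G, Y = C/T, and so on); the function name `expand_motif` is illustrative, not from any particular library.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_motif(motif):
    """Return the set of sites matched by a degenerate (IUPAC) motif."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in motif.upper()))}

print(sorted(expand_motif("GAATTC")))  # EcoRI: a single site
print(sorted(expand_motif("GTYRAC")))  # HincII: four sites
```

A motif of this kind is purely qualitative: every member of the set counts equally as a site, which is exactly what a quantitative TF motif must go beyond.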

Position Weight Matrix Model (PWM, also PSSM)
[Figure: a PWM with rows A, C, G, T scored against the sequence A C T A T A A T G T; Score = -24. Matrix values not recoverable.]

[Figure: PWM model (rows A, C, G, T) scored against A C T A T A A T G T; Score = 43. Matrix values not recoverable.]

PWM Model
A PWM is a generalization of a consensus sequence, and there is no advantage to using consensus sequences: given a consensus sequence, one can define a PWM and a threshold that will return the same sites.

$\mathrm{Score}(S_i) = W \cdot S_i$
The PWM is a linear model: $S_i$ encodes the sequence (which base occurs at each position), and $W$ weights those encoded features to provide the score. It is easy to add more features if they are necessary.
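The claim that a consensus is a special case of a PWM plus a threshold can be checked by brute force. The sketch below uses one possible construction, assumed for illustration: a weight of 0 for every base the consensus allows and -1 for every other base, scored as the dot product $W \cdot S_i$ of the flattened weights with a one-hot encoding of the sequence. The names `one_hot`, `score`, and `pwm_from_consensus` are hypothetical.

```python
from itertools import product

BASES = "ACGT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}

def one_hot(seq):
    """Flat 0/1 feature vector S: 4 features per position, in ACGT order."""
    return [1 if base == b else 0 for base in seq for b in BASES]

def score(weights, seq):
    """Linear PWM score: dot product of the flat weight vector W with S."""
    return sum(w * x for w, x in zip(weights, one_hot(seq)))

def pwm_from_consensus(consensus):
    """Hypothetical construction: 0 for allowed bases, -1 for disallowed ones."""
    return [0 if b in IUPAC[c] else -1 for c in consensus for b in BASES]

consensus = "GTYRAC"
W = pwm_from_consensus(consensus)
threshold = 0  # a sequence passes only if no position holds a disallowed base

# Score every possible 6-mer and keep those at or above the threshold
hits = {"".join(s) for s in product(BASES, repeat=len(consensus))
        if score(W, "".join(s)) >= threshold}
print(sorted(hits))
```

Lowering the threshold to -1 would return every sequence with at most one disallowed base, the PWM counterpart of "consensus with one mismatch"; a quantitative PWM simply replaces the 0/-1 entries with graded weights.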

Two Important Issues
- Parameter estimation: where do the matrix elements come from? Different types of data lead to different methods of parameter estimation.
- Additivity: do the positions really contribute independently to the binding interaction? If not, how do we extend the model? (Complete binding energy list vs. model.)

If the simple additive model is inadequate, one can use dinucleotide or higher-order models. Some form of matrix model must be correct, because the binding data itself is a matrix (a vector indexed by sequence).
[Figure: feature columns for dinucleotides AA, AC, AG, AT, ..., TT and trinucleotides AAA, AAC, AAG, AAT, ..., TTT]

Alternative Approach to Higher-Order Contributions: Structure Parameters
Perhaps the non-additivity is due to structural preferences that are known to depend on nearest-neighbor (or longer-range) bases. Structure parameters may capture such context effects with fewer parameters. For example, see work by Rohs and colleagues:
- Covariation between homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res.
- TFBSshape: a motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res. (Database issue).
- Quantitative modeling of transcription factor binding specificities using DNA shape. Proc Natl Acad Sci U S A (15).
- DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo. Cell Syst. Sep 28;3(3).
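Adding dinucleotide terms keeps the model linear; only the feature vector grows. The following Python sketch (hypothetical function names, made-up weights) appends 16 one-hot features for each adjacent pair of positions to the 4 mononucleotide features per position, so a 5-bp site has 20 + 64 = 84 parameters. A single dinucleotide weight is enough to produce scores that no purely additive PWM can reproduce.

```python
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT (16)

def features(seq, use_dinuc=True):
    """One-hot mononucleotide features (4 per position), optionally followed by
    one-hot features for each adjacent dinucleotide (16 per neighboring pair)."""
    mono = [1 if s == b else 0 for s in seq for b in BASES]
    if not use_dinuc:
        return mono
    di = [1 if seq[i:i + 2] == d else 0
          for i in range(len(seq) - 1) for d in DINUCS]
    return mono + di

def score(weights, seq, use_dinuc=True):
    """Still a linear model, Score = W . S; only the feature vector S grew."""
    x = features(seq, use_dinuc)
    if len(weights) != len(x):
        raise ValueError(f"expected {len(x)} weights, got {len(weights)}")
    return sum(w * xi for w, xi in zip(weights, x))

# Hypothetical 5-bp example: all mononucleotide weights are 0, plus a single
# dinucleotide weight rewarding G followed by G at positions 1-2.
L = 5
W = [0.0] * (4 * L + 16 * (L - 1))       # 20 + 64 = 84 parameters
W[4 * L + DINUCS.index("GG")] = 1.0      # first pair block covers positions 1-2
for s in ("GGAAA", "GAAAA", "AGAAA", "AAAAA"):
    print(s, score(W, s))
```

An additive model would need the G-to-A change at position 1 to have the same effect whatever is at position 2; here it changes the score by 1 in one context and by 0 in the other.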

How to Set the Matrix Elements
Statistical treatment of known sites: needs a reasonable sample size and some assumptions about how the sample was obtained.
- A probabilistic model is easy, and can be accurate if the assumptions are reasonable.
Quantitative binding data: determine the matrix parameters that provide the best fit.
- This has been laborious and slow experimental work, but new technologies make it much easier.

Modeling Based on Known Sites
Counts N(b,i) and frequencies F(b,i): the PFM or PPM. (Note: some papers and programs call this a PWM.)
Log-odds PWM: $W(b,i) = \log[F(b,i)/p(b)]$, where p(b) is the background probability of base b.
Information content of position i: $I(i) = \sum_b F(b,i)\,W(b,i)$
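As a minimal sketch of the log-odds construction, the Python below turns aligned sites into F(b,i), W(b,i), and I(i). Three choices are assumptions, not part of the slide: a pseudocount of 0.5 per base (so no log is ever taken of 0), a uniform background p(b) = 0.25, and base-2 logs so that I(i) is in bits. The sites are invented for illustration.

```python
import math

BASES = "ACGT"

def log_odds_pwm(sites, background=None, pseudocount=0.5):
    """Counts N(b,i) -> frequencies F(b,i) -> log-odds W(b,i) = log2[F(b,i)/p(b)].
    Returns (F, W, I) where I(i) = sum_b F(b,i) W(b,i), in bits."""
    if pseudocount <= 0:
        raise ValueError("a positive pseudocount keeps every log finite")
    if background is None:
        background = {b: 0.25 for b in BASES}     # assumed uniform background
    F, W, I = [], [], []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}  # N(b,i) plus pseudocounts
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        f = {b: counts[b] / total for b in BASES}
        w = {b: math.log2(f[b] / background[b]) for b in BASES}
        F.append(f)
        W.append(w)
        I.append(sum(f[b] * w[b] for b in BASES))
    return F, W, I

# Hypothetical aligned sites (made up for illustration)
sites = ["TATAAT", "TATAAT", "TAAAAT", "TATGAT"]
F, W, I = log_odds_pwm(sites)
for i, info in enumerate(I, start=1):
    print(f"position {i}: I = {info:.3f} bits")
```

Because I(i) is a relative entropy it is never negative; the pseudocount keeps the fully conserved positions below the 2-bit maximum when only four sites are available.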

Classic Logo (from Tom Schneider)
The height of the column at each position is its information content; within a column, each base's height is in proportion to its frequency.

Likelihood Ratio Statistics Primer
Given two probability distributions $P_i$ and $Q_i$, with $\sum_i P_i = \sum_i Q_i = 1$, and some data $D_i$, the number of times each type i is observed in N total observations, the likelihood ratio of the data being from distribution Q rather than P is:
$LR = \prod_i (Q_i/P_i)^{D_i}$
and the log-likelihood ratio is:
$LLR = \sum_i D_i \ln(Q_i/P_i)$
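The likelihood-ratio definitions can be evaluated directly. This Python sketch uses made-up counts $D_i$ and a uniform P, and checks numerically the result stated next: the LLR is largest at the maximum-likelihood $Q_i = D_i/N$, where it equals N times the relative entropy. Natural logs are used, so 2·LLR is the G-statistic; the function names are illustrative.

```python
import math

def llr(counts, q, p):
    """LLR = sum_i D_i ln(Q_i / P_i); terms with D_i = 0 contribute nothing."""
    return sum(d * math.log(qi / pi) for d, qi, pi in zip(counts, q, p) if d > 0)

def lr(counts, q, p):
    """LR = prod_i (Q_i / P_i)^D_i (fine for small N; use llr for large N)."""
    return math.prod((qi / pi) ** d for d, qi, pi in zip(counts, q, p))

D = [6, 2, 1, 1]                 # hypothetical counts of A, C, G, T (N = 10)
P = [0.25, 0.25, 0.25, 0.25]     # background distribution
N = sum(D)
Q_ml = [d / N for d in D]        # maximum-likelihood Q
Q_other = [0.4, 0.3, 0.2, 0.1]   # any other candidate distribution

rel_entropy = sum(q * math.log(q / p) for q, p in zip(Q_ml, P) if q > 0)
print(llr(D, Q_ml, P), N * rel_entropy)   # equal: max LLR = N * relative entropy
print(llr(D, Q_other, P))                 # smaller than the value at Q_ml
print("G-statistic:", 2 * llr(D, Q_ml, P))
```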

The maximum-likelihood distribution is $Q_i = D_i/N$, so
$\max LLR = N \sum_i Q_i \ln(Q_i/P_i)$, with $\sum_i Q_i \ln(Q_i/P_i) \ge 0$.
This quantity goes by several names: information content, relative entropy, Kullback-Leibler distance. It is related to the G-statistic and χ².

Modeling from Experimental Data
From single-binding-site experiments to high-throughput methods that allow the determination of specificity (relative affinity) across all possible sequences at once.

Quantitative Binding
- Affinity: of a TF for one sequence.
- Specificity: the relative affinity for different sequences, ideally for all sequences.

High-Throughput Experimental Methods to Measure TF Specificity

Specificity Modeling
High-throughput in vitro binding site analyses can give good, quantitative models of intrinsic binding specificity. But more data alone isn't sufficient to give better models; good analysis methods are also needed.
- The log-odds method is based on assumptions that may not be true.
- Energetic models can give better descriptions.
- At high TF concentration, the relationship between binding affinity and binding probability is non-linear.

The log-odds method is equivalent to an energy model if the sites come from a Boltzmann distribution, with binding probability proportional to $e^{-E}$ (energies in units of kT):
$F(S_i) = P(S_i)\,e^{-E_i}/Z$   (posterior = prior × Boltzmann factor of the energy $E_i$)
so that
$E_i = -\ln\frac{F(S_i)}{P(S_i)} - \ln Z$
This is the log-odds relationship between binding energy and frequencies. Reality is a Fermi-Dirac distribution, with Boltzmann as a special case in the low-concentration range (Djordjevic et al., Genome Res).
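A minimal numerical comparison of the two distributions, in Python, with energies in kT units and hypothetical values of the chemical potential μ (set by TF concentration): at low μ the Fermi-Dirac occupancy and the Boltzmann factor agree, while at high μ the Boltzmann expression exceeds 1 and the Fermi-Dirac occupancy saturates. The helper `energy_from_log_odds` implements $E_i = -\ln[F(S_i)/P(S_i)] - \ln Z$ from above; all function names are illustrative.

```python
import math

def energy_from_log_odds(f, p, log_z=0.0):
    """E_i = -ln[F(S_i)/P(S_i)] - ln Z, in kT units. ln Z shifts every energy by
    the same amount, so it can be left at 0 for relative energies."""
    return -math.log(f / p) - log_z

def boltzmann(E, mu):
    """Low-concentration limit: binding probability approx e^(mu - E)."""
    return math.exp(mu - E)

def fermi_dirac(E, mu):
    """Saturating occupancy 1 / (1 + e^(E - mu)); never exceeds 1."""
    return 1.0 / (1.0 + math.exp(E - mu))

for mu in (-10.0, 0.0, 5.0):      # hypothetical chemical potentials (kT)
    for E in (0.0, 2.0, 4.0):
        fd, bz = fermi_dirac(E, mu), boltzmann(E, mu)
        print(f"mu={mu:6.1f} E={E:4.1f} Fermi-Dirac={fd:.5f} Boltzmann={bz:.5f}")
```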

Example: GTGGA vs. ATGGA and GTGTA vs. ATGTA, with $E_G - E_A = 2kT$ at the first position in both pairs.
Additive changes in binding energy have non-independent (context-dependent) effects on binding probability: the probabilities no longer factor, even though the energies are additive.

HT-SELEX (SELEX-seq)
$\min \sum_i \left[ N(S_i) - n(S_i) \right]^2$, with the model $N(S_i) = \frac{a}{1 + e^{W \cdot S_i - \mu}} + b$ and $n(S_i)$ the data.
Parameters to fit: a, b, W, μ.
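The GTGGA/ATGGA example can be reproduced with a toy additive energy model. In this Python sketch, only $E_G - E_A = 2kT$ at the first position comes from the slide; the position-4 energies and μ = 2 are invented. The same 2 kT change gives very different probability ratios in the two contexts, and both differ from the context-independent $e^2$ of the Boltzmann limit.

```python
import math

# Hypothetical additive energies (kT units). Position 1 uses E_G - E_A = 2 kT
# as in the example; the position-4 values are invented for illustration.
POS1 = {"A": 0.0, "G": 2.0}
POS4 = {"G": 0.0, "T": 3.0}

def energy(seq):
    """Additive energy: sum of the two position contributions."""
    return POS1[seq[0]] + POS4[seq[3]]

def p_bound(seq, mu):
    """Fermi-Dirac binding probability at chemical potential mu."""
    return 1.0 / (1.0 + math.exp(energy(seq) - mu))

MU = 2.0  # hypothetical; high enough that the strongest site is near saturation
strong_ctx = p_bound("ATGGA", MU) / p_bound("GTGGA", MU)
weak_ctx = p_bound("ATGTA", MU) / p_bound("GTGTA", MU)
print(f"A vs G in the ..GGA context: {strong_ctx:.2f}")
print(f"A vs G in the ..GTA context: {weak_ctx:.2f}")
print(f"Boltzmann (dilute) limit, both contexts: {math.exp(2.0):.2f}")
```

If the probabilities factored by position, P(GTGTA)·P(ATGGA) would equal P(GTGGA)·P(ATGTA); with saturation they do not.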

Fit of the model to HT-SELEX data for zif268: BEEML vs. BioProspector (Zhao et al., PLoS Comp Bio, 2009)

Protein Binding Microarray (PBM)

Example of Plag1 using BEEML-PBM (Zhao and Stormo, Nature Biotechnol; Nat Biotechnol)
- Most TFs (~90%) are fit well by PWMs.
- BEEML-PBM is among the best methods.
- Some TFs do better with dinucleotide models.
- A few require multiple modes of interaction.
- The best models fit in vivo data as well as in vivo-derived models do.

Diverse sets: >100 TFs; ~20 TFs; ~240 TFs; Weirauch et al., >1000 TFs

Bacterial-1-Hybrid (B1H)

B1H on zif268 returns the expected model.

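Average Prediction Accuracy for ZFPs

HT-SELEX (SELEX-seq)
$\frac{P(S_i \mid b)}{P(S_i)} \propto \frac{1}{1 + e^{E_i - \mu}}$
Compared to a reference sequence with E = 0:
$\frac{P(S_i \mid b)/P(S_i)}{P(S_{\mathrm{ref}} \mid b)/P(S_{\mathrm{ref}})} = \frac{1 + e^{-\mu}}{1 + e^{E_i - \mu}}$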
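Because the relative enrichment is a monotonic function of $E_i$ for a given μ, it can be inverted to recover energies from measured enrichments: $E_i = \mu + \ln\left(\frac{1 + e^{-\mu}}{R_i} - 1\right)$, where $R_i$ is the ratio on the left-hand side above. The Python sketch assumes μ is already known (for example from a fit); the value used is hypothetical, as are the function names.

```python
import math

def relative_enrichment(E, mu):
    """[P(S_i|b)/P(S_i)] / [P(S_ref|b)/P(S_ref)] for a reference with E = 0."""
    return (1.0 + math.exp(-mu)) / (1.0 + math.exp(E - mu))

def energy_from_enrichment(R, mu):
    """Solve R = (1 + e^-mu) / (1 + e^(E - mu)) for E:
    E = mu + ln((1 + e^-mu) / R - 1). Needs R < 1 + e^-mu."""
    x = (1.0 + math.exp(-mu)) / R - 1.0
    if x <= 0:
        raise ValueError("enrichment is at or beyond saturation for this mu")
    return mu + math.log(x)

mu = 1.0   # hypothetical fitted chemical potential (kT)
for E in (0.0, 1.0, 3.0):
    R = relative_enrichment(E, mu)
    print(E, round(R, 4), round(energy_from_enrichment(R, mu), 6))
```

At very negative μ the ratio approaches $e^{-E_i}$, the Boltzmann (log-odds) case; at higher μ, strong sites saturate and their enrichments compress toward $1 + e^{-\mu}$.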

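Spec-seq (specificity by sequencing)
$\frac{P(S_i \mid b)}{P(S_i \mid u)} = e^{\mu - E_i}$
Compared to a reference sequence with E = 0:
$\frac{P(S_{\mathrm{ref}} \mid b)/P(S_{\mathrm{ref}} \mid u)}{P(S_i \mid b)/P(S_i \mid u)} = e^{E_i}, \quad\text{so}\quad \ln\frac{P(S_{\mathrm{ref}} \mid b)/P(S_{\mathrm{ref}} \mid u)}{P(S_i \mid b)/P(S_i \mid u)} = E_i$

Spec-seq: Specificity by Sequencing
$P + S_i \rightleftharpoons PS_i, \qquad K_A(S_i) = \frac{[PS_i]}{[P][S_i]}$
Because [P] is common to all sequences in the same reaction, the relative affinities follow from the bound and unbound amounts of each sequence:
$K_A(S_1) : K_A(S_2) : \dots : K_A(S_n) = \frac{[PS_1]}{[S_1]} : \frac{[PS_2]}{[S_2]} : \dots : \frac{[PS_n]}{[S_n]}$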
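In practice the Spec-seq ratios are estimated from read counts. This Python sketch assumes counts B(S_i) and U(S_i) from sequencing the bound and unbound fractions; dividing by each fraction's total depth would multiply every ratio by the same constant, which cancels against the reference. The counts and sequence names are made up, and sequences with zero counts would need filtering or pseudocounts (not handled here).

```python
import math

def relative_energies(bound, unbound, ref):
    """E_i (kT) relative to `ref`: ln{ [B(ref)/U(ref)] / [B(i)/U(i)] }.
    B and U are read counts in the bound and unbound fractions; the library
    totals cancel because they are shared by every sequence."""
    k_ref = bound[ref] / unbound[ref]
    return {s: math.log(k_ref / (bound[s] / unbound[s])) for s in bound}

def relative_affinities(bound, unbound, ref):
    """K_A(S_i)/K_A(ref) = [B(i)/U(i)] / [B(ref)/U(ref)] = e^(-E_i)."""
    k_ref = bound[ref] / unbound[ref]
    return {s: (bound[s] / unbound[s]) / k_ref for s in bound}

# Hypothetical read counts (sequence names and numbers are made up)
bound = {"ref": 5000, "var1": 1000, "var2": 250}
unbound = {"ref": 500, "var1": 500, "var2": 1000}
E = relative_energies(bound, unbound, "ref")
K = relative_affinities(bound, unbound, "ref")
print(E)
print(K)
```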

Specificity of the Lac Repressor
- The WT operator is asymmetric.
- 4 libraries: vary both sequence and spacing; 2560 different binding sites.
- Highly reproducible: ~5% variance in affinity, ~0.1 kT variance in energy.
(Zuo and Stormo, Genetics, 2014)
[Figure: Three dimensional structure of the dimeric lac HP62 O1 operator complex. Kalodimos C G et al. EMBO J. 2002;21. By European Molecular Biology Organization.]

No Motif for Half of All Human TFs; Most Are C2H2 Zinc Finger Proteins (Laura Campitelli, Matt Weirauch)
[Figure: pie charts]
- Human, all TFs (1,665): known motif (637); close ortholog/paralog has motif (219); no motif (809)
- Human, no motif (809): C2H2 with no motif (573); needs heterodimerization partner (56); possibly not sequence-specific (143); not tried/no data (37)
- Human, all C2H2s (714): known motif (97); close ortholog/paralog has motif (44); no motif (573)

ZF Specificity Prediction
We used three programs: ours, one from a Princeton group, and one from a Toronto group. The logos look quite different, but the differences are largely quantitative, and there are many high-IC positions of agreement. By averaging the PFMs one can obtain a consensus sequence that agrees well with all three.

Spec-seq Randomizations
Reverse consensus (30 bp): TCTTGATGATGCTGCAATATTAATAATTTA
The consensus binds well enough to show a shift in EMSA. So we randomized five adjacent positions at a time, generating 6 libraries of 1024 (= 4^5) sequences. The merged logo shows an overall good match with the consensus and provides quantitative predictions about binding-energy contributions.
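The PFM-averaging step can be sketched in a few lines of Python. The three 3-column PFMs below are invented stand-ins for the outputs of the three predictors, and ties in the consensus are broken in ACGT order; both are assumptions for illustration, as are the function names.

```python
BASES = "ACGT"

def col(a, c, g, t):
    """One PFM column as a {base: frequency} dict."""
    return {"A": a, "C": c, "G": g, "T": t}

def average_pfms(pfms):
    """Column-by-column average of equal-width PFMs."""
    width = len(pfms[0])
    if any(len(m) != width for m in pfms):
        raise ValueError("PFMs must have the same width")
    return [{b: sum(m[i][b] for m in pfms) / len(pfms) for b in BASES}
            for i in range(width)]

def consensus(pfm):
    """Most frequent base per column; ties go to the earlier base in ACGT."""
    return "".join(max(BASES, key=lambda b: column[b]) for column in pfm)

# Invented 3-column PFMs standing in for three predictors' outputs
pred1 = [col(.7, .1, .1, .1), col(.1, .1, .7, .1), col(.1, .4, .1, .4)]
pred2 = [col(.4, .4, .1, .1), col(.1, .1, .6, .2), col(.1, .2, .1, .6)]
pred3 = [col(.6, .2, .1, .1), col(.2, .1, .5, .2), col(.1, .3, .1, .5)]
avg = average_pfms([pred1, pred2, pred3])
print([consensus(p) for p in (pred1, pred2, pred3)], consensus(avg))
```

Averaging lets a position that is ambiguous in one prediction (here the last column of `pred1`) be resolved by the others.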

The Spec-seq motif matches well with the motifs obtained from in vivo recombination hotspots and from the Affinity-seq method. Affinity-seq pulls out genomic DNA fragments in vitro and sequences them without amplification.
[Figure: Affinity-seq motif; hotspot motif]

Spec-seq can also be easily adapted to study CpG methylation sensitivity:
- ZFP57 is involved in imprinting maintenance.
- It has 2 ZF clusters; one binds TGCCGC and prefers methylated CpG (mCpG).
- 3 libraries with random regions and methylation variants.

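Spec-seq for combinatorial binding can get all of the important parameters in one experiment, including cooperativity. Here $B_{X,\emptyset}$ denotes the fraction with X bound alone, $B_{\emptyset,Y}$ with Y bound alone, $B_{X,Y}$ with both bound, and $B_{\emptyset,\emptyset}$ the unbound fraction:
$K_X(x_1) : K_X(x_2) : \dots : K_X(x_n) = \frac{N(x_1 \mid B_{X,\emptyset})}{N(x_1 \mid B_{\emptyset,\emptyset})} : \frac{N(x_2 \mid B_{X,\emptyset})}{N(x_2 \mid B_{\emptyset,\emptyset})} : \dots : \frac{N(x_n \mid B_{X,\emptyset})}{N(x_n \mid B_{\emptyset,\emptyset})}$
$K_{X|Y}(x_1) : K_{X|Y}(x_2) : \dots : K_{X|Y}(x_n) = \frac{N(x_1 \mid B_{X,Y})}{N(x_1 \mid B_{\emptyset,Y})} : \frac{N(x_2 \mid B_{X,Y})}{N(x_2 \mid B_{\emptyset,Y})} : \dots : \frac{N(x_n \mid B_{X,Y})}{N(x_n \mid B_{\emptyset,Y})}$
$\omega_i = \frac{K_{X|Y}(S_i)}{K_X(S_i)} = \frac{K_{Y|X}(S_i)}{K_Y(S_i)} = \frac{K_{X,Y}(S_i)}{K_X(S_i)\,K_Y(S_i)}$
(Stormo, Zuo, Chang, Briefings in Functional Genomics, 2015)

Conclusions: Specificity Modeling
1. Different types of high-throughput data can be used to obtain good specificity models; good analysis methods are critical.
2. PWMs are often (usually?) good approximations, but higher-order models can be obtained if needed.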
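Because each set of K ratios in the cooperativity slide is known only up to a constant, $\omega_i$ is estimated as a ratio of count ratios, which reduces to $N(B_{X,Y})\,N(B_{\emptyset,\emptyset}) / [N(B_{X,\emptyset})\,N(B_{\emptyset,Y})]$ for each sequence. The Python sketch below (hypothetical counts and function names) computes it both ways, X-given-Y and Y-given-X, to show the symmetry in the $\omega_i$ identity. It assumes counts from a single experiment, so that any remaining factor from how each band is recovered and sequenced is shared by all sequences.

```python
def cooperativity(n00, nx0, n0y, nxy):
    """omega_i = K_{X|Y}(S_i) / K_X(S_i), estimated from read counts of each
    sequence in the four states: unbound (n00), X only (nx0), Y only (n0y),
    and X plus Y (nxy). K_X ~ nx0/n00 and K_{X|Y} ~ nxy/n0y."""
    return {s: (nxy[s] / n0y[s]) / (nx0[s] / n00[s]) for s in n00}

def cooperativity_y_first(n00, nx0, n0y, nxy):
    """Same quantity via K_{Y|X}/K_Y: (nxy/nx0) / (n0y/n00)."""
    return {s: (nxy[s] / nx0[s]) / (n0y[s] / n00[s]) for s in n00}

# Hypothetical counts for two sequences
n00 = {"site1": 100, "site2": 100}
nx0 = {"site1": 50, "site2": 50}
n0y = {"site1": 50, "site2": 20}
nxy = {"site1": 100, "site2": 10}
omega = cooperativity(n00, nx0, n0y, nxy)
omega_alt = cooperativity_y_first(n00, nx0, n0y, nxy)
print(omega, omega_alt)   # site1 cooperative (omega > 1), site2 independent
```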

Discovery of Binding Motifs from In Vivo Data

Data Types for Motif Discovery
- Co-regulated genes: genetic studies (deletion, over-expression effects); expression analysis (microarrays, RNA-seq)
- Co-bound regions: ChIP-chip/-seq location analysis
- Phylogenetic analysis: conservation across species (phylogenetic footprinting); can be combined with multigene analysis, even over the whole genome

Outline of the Problem
- Goal: find the most significant pattern in common.
- We can't look at all possible alignments; there are too many.
- In vitro analysis methods don't work, because their assumptions are not valid.
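To see why exhaustive search is hopeless, count the possible alignments when each sequence contributes one site of width w: the product over sequences of (L - w + 1). The Python sketch assumes 18 sequences of 105 bp, the size of the example dataset that follows, and a hypothetical site width of 22 bp; it counts one strand only.

```python
import math

def n_alignments(seq_lengths, width):
    """Ways to pick one width-`width` window per sequence (one strand)."""
    return math.prod(length - width + 1 for length in seq_lengths)

count = n_alignments([105] * 18, 22)   # assumed: 18 x 105 bp, 22-bp sites
print(f"{count:.2e} possible alignments")
```

That is 84^18, far too many to enumerate, which is why the heuristic search methods below are used.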

Example dataset: promoter regions from co-regulated genes
CE1CG \TAATGTTTGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGCGTGGTGTGAAAGACTGTTTTTTTGATCGTTTTCACAAAAATGGAAGTCCACAGTCTTGACAG\
ECOARABOP \GACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAG\
ECOBGLR1 \ACAAATCCCAATAACTTAATTATTGGGATTTGTTATATATAACTTTATAAATTCCTAAAATTACACAAAGTTAATAACTGTGAGCATGGTCATATTTTTATCAAT\
ECOCRP \CACAAAGCGAAAGCTATGCTAAAACAGTCAGGATGCTACAGTAATACATTGATGTACTGCATGTATGCAAAGGACGTCACATTACCGTGCAGTACAGTTGATAGC\
ECOCYA \ACGGTGCTACACTTGTATGTAGCGCATCTTTCTTTACGGTCAATCAGCAAGGTGTTAAATTGATCACGTTTTAGACCATTTTTTCGTCGTGAAACTAAAAAAACC\
ECODEOP2 \AGTGAATTATTTGAACCAGATCGCATTACAGTGATGCAAACTTGTAAGTAGATTTCCTTAATTGTGATGTGTATCGAAGTGTGTTGCGGAGTAGATGTTAGAATA\
ECOGALE \GCGCATAAAAAACGGCTAAATTCTTGTGTAAACGATTCCACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATGCTATGGTTATTTCATACCATAAGCC\
ECOILVBPR \GCTCCGGCGGGGTTTTTTGTTATCTGCAATTCAGTACAAAACGTGATCAACCCCTCAATTTTCCCTTTGCTGAAAAATTTTCCATTGTCTCCCCTGTAAAGCTGT\
ECOLAC \AACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCAC\
ECOMALBA \ACATTACCGCCAATTCTGTAACAGAGATCACACAAAGCGACGGTGGGGCGTAGGGGCAAGGAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGTTTA\
ECOMALBA \GGAGGAGGCGGGAGGATGAGAACACGGCTTCTGTGAACTAAACCGAGGTCATGTAAGGAATTTCGTGATGTTGCTTGCAAAAATCGTGGCGATTTTATGTGCGCA\
ECOMALT \GATCAGCGTCGTTTTAGGTGAGTTGTTAATAAAGATTTGGAATTGTGACACAGTGCAAATTCAGACACATAAAAAAACGTCATCGCTTGCATTAGAAAGGTTTCT\
ECOOMPA \GCTGACAAAAAAGATTAAACATACCTTATACAAGACTTTTTTTTCATATGCCTGACGGAGTTCACACTTGTAAGTTTTCAACTACGTTGTAGACTTTACATCGCC\
ECOTNAA \TTTTTTAAACATTAAAATTCTTACGTAATTTATAATCTTTAAAAAAAGCATTTAATATTGCTCCCCGAACGATTGTGATTCGATTCACATTTAAACAATTTCAGA\
ECOUXU1 \CCCATGAGAGTGAAATTGTTGTGATGTGGTTAACCCAATTAGAATTCGGGATTGACATGTCTTACCAAAAGGTAGAACTTATACGCCATCTCATCCGATGCAAGC\
PBR322 \CTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTC\
TRN9CAT \CTGTGACGGAAGATCACTTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGGCGAAAATGAGACGTTGATCGGCACG\
TDC \GATTTTTATACTTTAACTTGTTGATATTTAAAGGTATTTAATTGTAATAACGATACTCTGGAAAGTATTGAAAGTTAATTTGTGAGTGGTCGCACATATCCTGTT\

Expectation Maximization (EM) Approach to Motif Discovery
Basic idea:
- Given sites, estimate the PWM (log-odds model).
- Given the PWM, pick likely sites according to their probability.
- Make an initial guess, then iterate between those steps until convergence.
Algorithm:
- Initial PWM from the average of all possible sites.
- Using the current PWM, estimate the probability of each position being a site; make a new PWM from the weighted average of all sites.
- Iterate to convergence; usually fast, but with no guarantee of finding the optimum.

Gibbs Sampling Approach to Motif Discovery
Same basic idea:
- Given sites, estimate the PWM (log-odds model).
- Given the PWM, pick likely sites according to their probability.
- Iterate between those steps until convergence.
Algorithm:
- Pick 1 site from each of N-1 sequences and make a PWM; use pseudocounts to avoid probabilities of 0.
- Use the current PWM to pick a site in the left-out sequence by sampling from the probability distribution; update the PWM.
- Iterate to convergence; run multiple times and compare the results. There is still no guarantee of optimality, but sampling often avoids the local optima obtained with EM.
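Both algorithms can be sketched compactly in Python. This is an illustrative implementation under simplifying assumptions, not any published program: exactly one site per sequence, forward strand only, a uniform background, and a pseudocount of 0.5. The EM default start is the "average of all possible sites" described above; the demo instead seeds it with a single word (similar in spirit to how MEME tries substrings as starting points), because from a near-uniform start EM may settle on a poor local optimum. The Gibbs step samples the new site in proportion to its likelihood ratio rather than taking the best one. All names and the planted-motif data are hypothetical.

```python
import math
import random

BASES = "ACGT"

def pwm_from_sites(sites, weights=None, pseudo=0.5):
    """Per-column base frequencies from (optionally weighted) sites, with
    pseudocounts so that no probability is 0."""
    if weights is None:
        weights = [1.0] * len(sites)
    cols = []
    for i in range(len(sites[0])):
        c = {b: pseudo for b in BASES}
        for s, wt in zip(sites, weights):
            c[s[i]] += wt
        total = sum(c.values())
        cols.append({b: c[b] / total for b in BASES})
    return cols

def odds(pwm, site, bg=0.25):
    """P(site | PWM) / P(site | uniform background)."""
    return math.prod(col[b] / bg for col, b in zip(pwm, site))

def windows(seq, w):
    return [seq[j:j + w] for j in range(len(seq) - w + 1)]

def em(seqs, w, iters=50, init=None):
    """EM with one site per sequence. Default start: average of all windows."""
    if init is None:
        init = pwm_from_sites([x for s in seqs for x in windows(s, w)])
    pwm = init
    for _ in range(iters):
        sites, wts = [], []
        for s in seqs:
            ws = windows(s, w)
            lik = [odds(pwm, x) for x in ws]
            z = sum(lik)
            sites += ws
            wts += [v / z for v in lik]       # E-step: posterior per position
        pwm = pwm_from_sites(sites, wts)      # M-step: weighted average of sites
    return pwm

def gibbs(seqs, w, iters=1000, seed=0):
    """Site sampler: leave one sequence out, build a PWM from the other N-1
    sites, then sample (not maximize) a new site in the left-out sequence."""
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        k = rng.randrange(len(seqs))
        others = [s[p:p + w] for j, (s, p) in enumerate(zip(seqs, pos)) if j != k]
        pwm = pwm_from_sites(others)
        ws = windows(seqs[k], w)
        pos[k] = rng.choices(range(len(ws)), weights=[odds(pwm, x) for x in ws])[0]
    return [s[p:p + w] for s, p in zip(seqs, pos)]

def consensus(pwm):
    return "".join(max(BASES, key=lambda b: col[b]) for col in pwm)

# Hypothetical demo: plant TTGACA in random 24-bp backgrounds
rng = random.Random(1)
MOTIF = "TTGACA"
seqs = []
for _ in range(8):
    bg = "".join(rng.choice(BASES) for _ in range(24))
    p = rng.randrange(len(bg) + 1)
    seqs.append(bg[:p] + MOTIF + bg[p:])

em_pwm = em(seqs, len(MOTIF), init=pwm_from_sites([MOTIF]))  # seeded start
sampled_sites = gibbs(seqs, len(MOTIF))
print(consensus(em_pwm), sampled_sites)
```

Running `gibbs` with several seeds and comparing the resulting site sets is the "run multiple times, compare results" step from the slide.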

[Figure from Lawrence et al. (1993) Science 262]

Motif Discovery from Co-regulated Genes
[Figure: (A) single species; (B) multiple species]

Example: Leu3

Alignment of conserved regions

YGL125W
S. cerevisiae   GAAAAAATAACAGCGACTTTTCTCCCGGTAGCGGGCCGTCGTTTAGTCATTCTATCCCTC
S. mikatae      AAAACATAACAGCGAATTTTCCTCCCGGTAGCGGGCCTTCGTTTAGTCATTCTCTCTCTT
S. bayanus      AAAAAATAACAGCGACTTTTCCCCCCGGTAGCGGGCCGTCGTTTAGTCATTCTCTCTCCC
S. kudriavzevii GAAAAAAAACAACGGCGGCCTCCCCCGGTAGCGGGCCGTCGTTTAGTCATTCTCTCTCTC
***** **** ** *** * *************************************

YOR108W
S. cerevisiae   GCCATCATGGTCCGGTAACGGTCGTAGTGAATGACTCATATTTTTCCATCTCTTT
S. mikatae      GCCATCAAGGTCCGGTAACGGTCGTAGTGAATGACTCACATTTTCTTCGTTATTC
S. bayanus      ACCATTACGGTCCGGTAACGGACTTAGTGAATGATTCATCTTTTCTTCTTTTTTC
S. kudriavzevii GTCGTTAAGGTCCGGTAACGGCCCTCAGCGAATGATTCATAATTTCATTTTTTTC
***** * ************* * ********** *** **** *** ***

YMR108W
S. cerevisiae   AACGCCTAGCCGCCGGAGCCTGCCGGTACCGGCTTGGCTTCAGTTGCTGATCTCGG
S. mikatae      CACAATGACACATACCTAACAGCCGGTACCGGCTTGAATGCCGCCGTTGGCTTCGG
S. bayanus      ATCTTCTAGTCACCGCAGTCTGCCGGTACCGGCTTGAATTCCGCCGTTGATCCTGG
S. kudriavzevii CACATCTCTAGTCCGCGCTCTGCCGGTACCGGCTTAGACTAGCCACGAATCTCGGC
** *** * **** ***************** **** ** * ** **

Alignment of profiles
[Figure: profiles (A, C, G, T) for YGL125W, YOR108W, YMR108W]
(Wang and Stormo, Bioinformatics 2003)

Even a whole-genome search for conserved, multi-copy elements is possible (e.g., PhyloNet; Wang and Stormo, PNAS).
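The star lines in these alignments mark columns that are identical in all four species. Here is a minimal Python sketch of that bookkeeping, plus a helper that reports long conserved runs as candidate footprints; the toy alignment, the `min_len` cutoff, and the function names are hypothetical, and real alignments with gaps would first come from an alignment program.

```python
def conservation_line(aligned):
    """'*' where every sequence has the same base in a column, ' ' otherwise.
    Assumes the sequences are already aligned to equal length ('-' = gap)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join("*" if len(set(col)) == 1 and col[0] != "-" else " "
                   for col in zip(*aligned))

def conserved_blocks(line, min_len=8):
    """(start, end) half-open intervals of '*' runs of at least min_len."""
    blocks, start = [], None
    for i, ch in enumerate(line + " "):    # trailing sentinel closes the last run
        if ch == "*" and start is None:
            start = i
        elif ch != "*" and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks

# Hypothetical toy alignment of four orthologous promoter fragments
aln = ["ACCGGTAGCGGGTT",
       "TCCGGTAGCGGGAT",
       "ACCGGTAGCGGGTA",
       "GCCGGTAGCGGGTT"]
stars = conservation_line(aln)
print(stars)
print(conserved_blocks(stars, min_len=8))
```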


More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

What is the expectation maximization algorithm?

What is the expectation maximization algorithm? primer 2008 Nature Publishing Group http://www.nature.com/naturebiotechnology What is the expectation maximization algorithm? Chuong B Do & Serafim Batzoglou The expectation maximization algorithm arises

More information

The Physical Language of Molecules

The Physical Language of Molecules The Physical Language of Molecules How do molecular codes emerge and evolve? International Workshop on Bio Soft Matter Tokyo, 2008 Biological information is carried by molecules Self replicating information

More information

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC

More information

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1. Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

Chapter 9 DNA recognition by eukaryotic transcription factors

Chapter 9 DNA recognition by eukaryotic transcription factors Chapter 9 DNA recognition by eukaryotic transcription factors TRANSCRIPTION 101 Eukaryotic RNA polymerases RNA polymerase RNA polymerase I RNA polymerase II RNA polymerase III RNA polymerase IV Function

More information

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions Jamie Duke 1,2 and Carlos Camacho 3 1 Bioengineering and Bioinformatics Summer Institute,

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

EM-algorithm for motif discovery

EM-algorithm for motif discovery EM-algorithm for motif discovery Xiaohui Xie University of California, Irvine EM-algorithm for motif discovery p.1/19 Position weight matrix Position weight matrix representation of a motif with width

More information

Geert Geeven. April 14, 2010

Geert Geeven. April 14, 2010 iction of Gene Regulatory Interactions NDNS+ Workshop April 14, 2010 Today s talk - Outline Outline Biological Background Construction of Predictors The main aim of my project is to better understand the

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

CHAPTER : Prokaryotic Genetics

CHAPTER : Prokaryotic Genetics CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering

More information

Whole-genome analysis of GCN4 binding in S.cerevisiae

Whole-genome analysis of GCN4 binding in S.cerevisiae Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?

More information

56:198:582 Biological Networks Lecture 8

56:198:582 Biological Networks Lecture 8 56:198:582 Biological Networks Lecture 8 Course organization Two complementary approaches to modeling and understanding biological networks Constraint-based modeling (Palsson) System-wide Metabolism Steady-state

More information

Proteomics Systems Biology

Proteomics Systems Biology Dr. Sanjeeva Srivastava IIT Bombay Proteomics Systems Biology IIT Bombay 2 1 DNA Genomics RNA Transcriptomics Global Cellular Protein Proteomics Global Cellular Metabolite Metabolomics Global Cellular

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

Algorithms in Bioinformatics II SS 07 ZBIT, C. Dieterich, (modified script of D. Huson), April 25,

Algorithms in Bioinformatics II SS 07 ZBIT, C. Dieterich, (modified script of D. Huson), April 25, Algorithms in Bioinformatics II SS 07 ZBIT, C. Dieterich, (modified script of D. Huson), April 25, 200707 Motif Finding This exposition is based on the following sources, which are all recommended reading:.

More information

Fundamentally different strategies for transcriptional regulation are revealed by information-theoretical analysis of binding motifs

Fundamentally different strategies for transcriptional regulation are revealed by information-theoretical analysis of binding motifs Fundamentally different strategies for transcriptional regulation are revealed by information-theoretical analysis of binding motifs Zeba Wunderlich 1* and Leonid A. Mirny 1,2 1 Biophysics Program, Harvard

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12

A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12 The integration host factor regulon of E. coli K12 genome 783 A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12 M. Trindade dos Santos and

More information

Quantitative modeling and data analysis of SELEX experiments

Quantitative modeling and data analysis of SELEX experiments Quantitative modeling and data analysis of SELEX experiments Marko Djordjevic, 1,2, and Anirvan M. Sengupta 3 1 Department of Physics, Columbia University, New York, NY 10027 2 Mathematical Biosciences

More information

GS 559. Lecture 12a, 2/12/09 Larry Ruzzo. A little more about motifs

GS 559. Lecture 12a, 2/12/09 Larry Ruzzo. A little more about motifs GS 559 Lecture 12a, 2/12/09 Larry Ruzzo A little more about motifs 1 Reflections from 2/10 Bioinformatics: Motif scanning stuff was very cool Good explanation of max likelihood; good use of examples (2)

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Kinetics of Gene Regulation COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 The simplest model of gene expression involves only two steps: the

More information

DNA/RNA structure and packing

DNA/RNA structure and packing DNA/RNA structure and packing Reminder: Nucleic acids one oxygen atom distinguishes RNA from DNA, increases reactivity (so DNA is more stable) base attaches at 1, phosphate at 5 purines pyrimidines Replace

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

Genome 541! Unit 4, lecture 3! Genomics assays

Genome 541! Unit 4, lecture 3! Genomics assays Genome 541! Unit 4, lecture 3! Genomics assays Much easier to follow with slides. Good pace.! Having the slides was really helpful clearer to read and easier to follow the trajectory of the lecture.!!

More information

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9: Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Stephen Scott.

Stephen Scott. 1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative

More information

Bayesian Clustering of Multi-Omics

Bayesian Clustering of Multi-Omics Bayesian Clustering of Multi-Omics for Cardiovascular Diseases Nils Strelow 22./23.01.2019 Final Presentation Trends in Bioinformatics WS18/19 Recap Intermediate presentation Precision Medicine Multi-Omics

More information

JMJ14-HA. Col. Col. jmj14-1. jmj14-1 JMJ14ΔFYR-HA. Methylene Blue. Methylene Blue

JMJ14-HA. Col. Col. jmj14-1. jmj14-1 JMJ14ΔFYR-HA. Methylene Blue. Methylene Blue Fig. S1 JMJ14 JMJ14 JMJ14ΔFYR Methylene Blue Col jmj14-1 JMJ14-HA Methylene Blue Col jmj14-1 JMJ14ΔFYR-HA Fig. S1. The expression level of JMJ14 and truncated JMJ14 with FYR (FYRN + FYRC) domain deletion

More information

Amino Acid Structures from Klug & Cummings. Bioinformatics (Lec 12)

Amino Acid Structures from Klug & Cummings. Bioinformatics (Lec 12) Amino Acid Structures from Klug & Cummings 2/17/05 1 Amino Acid Structures from Klug & Cummings 2/17/05 2 Amino Acid Structures from Klug & Cummings 2/17/05 3 Amino Acid Structures from Klug & Cummings

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

Sequence motif analysis

Sequence motif analysis Sequence motif analysis Alan Moses Associate Professor and Canada Research Chair in Computational Biology Departments of Cell & Systems Biology, Computer Science, and Ecology & Evolutionary Biology Director,

More information

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia From Wikipedia, the free encyclopedia Functional genomics..is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects)

More information

An overview of deep learning methods for genomics

An overview of deep learning methods for genomics An overview of deep learning methods for genomics Matthew Ploenzke STAT115/215/BIO/BIST282 Harvard University April 19, 218 1 Snapshot 1. Brief introduction to convolutional neural networks What is deep

More information

Three types of RNA polymerase in eukaryotic nuclei

Three types of RNA polymerase in eukaryotic nuclei Three types of RNA polymerase in eukaryotic nuclei Type Location RNA synthesized Effect of α-amanitin I Nucleolus Pre-rRNA for 18,.8 and 8S rrnas Insensitive II Nucleoplasm Pre-mRNA, some snrnas Sensitive

More information

CSEP 590B Fall Motifs: Representation & Discovery

CSEP 590B Fall Motifs: Representation & Discovery CSEP 590B Fall 2014 5 Motifs: Representation & Discovery 1 Outline Previously: Learning from data MLE: Max Likelihood Estimators EM: Expectation Maximization (MLE w/hidden data) These Slides: Bio: Expression

More information

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics Slides vs chalk talk: I m not sure why you chose a chalk talk over ppt. I prefer the latter no issues with readability

More information

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT 5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:

More information