Transcription factor binding motifs
- Allen Gardner
- 5 years ago
Transcription
1 Transcription factor binding motifs BMMB-597D Lecture 29 Shaun Mahony
2 Transcription factor binding sites
Short: typically between 6 and 20 bp long.
Degenerate: TFs have favorite binding sequences but don't require a perfect match to bind.
Belong to families: related TFs often bind to similar sequences.
Likes to bind: CCGGAA, TGACCT..TGACCT, ATTA, CACGTG
3 We can represent a TF's binding preference as a degenerate string or probabilistically
Definition: A motif is a pattern of conservation in a sequence alignment.
TF: NFκB
Favorite sequence: GGGAATTTCC
Known NFκB binding sites:
GGGAATTTCC GGGAATTTCC GGGAATTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC
GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGAAATTCC GGGACTTCCC
GGGGATTTCC GGGGATTTCC GGGGATTTCC GGGATTTTCC GGGGATTCCC GGGATTTCCC
GGGAATTCAC GGGGCTTTCC GGGGCTTTCC GGGAAGTCCC
Consensus sequence: GGGRNWTYCC
Weight matrix
4 Consensus sequences
Definition: A consensus sequence represents a multiple sequence alignment using a degenerate alphabet to represent variable positions.
Extended DNA alphabet (IUPAC):
A, C, G, T
W = A or T
S = C or G
M = A or C
K = G or T
R = A or G
Y = C or T
N = any base (A/C/G/T)
Alignment:
GGGAATTTCC GGGAATTTCC GGGAATTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC
GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGAAATTCC GGGACTTCCC
GGGGATTTCC GGGGATTTCC GGGGATTTCC GGGATTTTCC
Consensus (as sequences are added to the alignment): GGGAATTTCC, GGGAMTTTCC, GGGAMWTTCC, GGGAMWTYCC, GGGRNWTYCC
GGGGTTTCCC?
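A minimal sketch of how the consensus on this slide can be derived and used for matching. The function names `consensus` and `matches` are illustrative, not from the lecture. The code assumes the reduced alphabet listed on the slide, so any column with three or more distinct bases collapses to N rather than to one of the three-base IUPAC codes.

```python
import re

# Two-base IUPAC codes listed on the slide. Any other set of bases maps to N
# (an assumption that mirrors the slide's reduced alphabet).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AT"): "W", frozenset("CG"): "S", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AG"): "R", frozenset("CT"): "Y",
}

# Regular-expression character class for each code.
CODE_TO_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

def consensus(sites):
    """Collapse each alignment column into a single degenerate code."""
    return "".join(IUPAC.get(frozenset(column), "N") for column in zip(*sites))

def matches(cons, seq):
    """True if seq is compatible with the degenerate consensus string."""
    pattern = "".join(CODE_TO_CLASS[code] for code in cons)
    return re.fullmatch(pattern, seq) is not None

# The 16 aligned sites shown on the slide.
ALIGNMENT = (["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7
             + ["GGGAAATTCC", "GGGACTTCCC"]
             + ["GGGGATTTCC"] * 3 + ["GGGATTTTCC"])

print(consensus(ALIGNMENT))                          # GGGRNWTYCC
print(matches(consensus(ALIGNMENT), "GGGGTTTCCC"))   # True
```

The last line illustrates the weakness of consensus strings: GGGGTTTCCC is accepted even though it never appears in the alignment, because the consensus discards how often each base was seen at each position.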
5 Position weight matrices
Definition: A position weight matrix (PWM) is a matrix of log likelihood values that gives a weighted match to fixed-length strings. Also known as a position-specific scoring matrix (PSSM).
Aligned sites:
GGGAATTTCC GGGAATTTCC GGGAATTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC
GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGAAATTCC GGGACTTCCC
GGGGATTTCC GGGGATTTCC GGGGATTTCC GGGATTTTCC GGGGATTCCC GGGATTTCCC
GGGAATTCAC GGGGCTTTCC
Count matrix (rows A, C, G, T)
Relative frequency matrix (rows A, C, G, T)
$p_{i,j}$: probability of observing letter $i$ at position $j$
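The numeric matrices did not survive in this transcript, but they can be recomputed from the 20 aligned sites listed on the slide. The following is a small sketch; the helper names `count_matrix` and `frequency_matrix` are illustrative.

```python
# The 20 aligned NFκB sites listed on the slide.
SITES = (["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7
         + ["GGGAAATTCC", "GGGACTTCCC"]
         + ["GGGGATTTCC"] * 3
         + ["GGGATTTTCC", "GGGGATTCCC", "GGGATTTCCC", "GGGAATTCAC", "GGGGCTTTCC"])

def count_matrix(sites):
    """Count how often each base occurs at each alignment position."""
    width = len(sites[0])
    counts = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for j, base in enumerate(site):
            counts[base][j] += 1
    return counts

def frequency_matrix(counts):
    """Convert counts to relative frequencies p[i][j] (columns sum to 1)."""
    n_sites = sum(counts[base][0] for base in "ACGT")
    return {base: [c / n_sites for c in counts[base]] for base in "ACGT"}

counts = count_matrix(SITES)
freqs = frequency_matrix(counts)
for base in "ACGT":
    print(base, counts[base])
```

Position 4 (1-based), for example, has 15 A and 5 G, which is why the consensus uses R there.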
6 Position weight matrices
Position weight matrix: $m_{i,j} = \log_2(p_{i,j} / b_i)$, where $b_i$ is the background probability of letter $i$.
Information content: $IC_j = 2 + \sum_{i=A}^{T} p_{i,j} \log_2(p_{i,j})$
Letter height in logo: $\mathrm{Height}_{i,j} = p_{i,j} \cdot IC_j$
Sequence logo
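The three formulas on this slide translate directly into code. This sketch makes two assumptions that the slide does not state: a pseudocount is added before taking the log ratio so that unobserved bases do not give log2(0), and 0·log2(0) is treated as 0 in the information content. The function names and the pseudocount value are illustrative.

```python
import math

BASES = "ACGT"
SITES = (["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7
         + ["GGGAAATTCC", "GGGACTTCCC"]
         + ["GGGGATTTCC"] * 3
         + ["GGGATTTTCC", "GGGGATTCCC", "GGGATTTCCC", "GGGAATTCAC", "GGGGCTTTCC"])

def frequencies(sites, pseudo=0.0):
    """Per-position base frequencies p[j][i], optionally smoothed by a pseudocount."""
    n = len(sites)
    return [
        {b: (sum(s[j] == b for s in sites) + pseudo) / (n + 4 * pseudo) for b in BASES}
        for j in range(len(sites[0]))
    ]

def log_odds(freqs, background=0.25):
    """m[j][i] = log2(p[j][i] / b_i); requires every p > 0 (use a pseudocount)."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]

def information_content(col):
    """IC_j = 2 + sum_i p log2 p, with 0*log2(0) taken as 0."""
    return 2 + sum(p * math.log2(p) for p in col.values() if p > 0)

def logo_heights(col):
    """Height of each letter in the sequence logo: p[i] * IC_j."""
    ic = information_content(col)
    return {b: p * ic for b, p in col.items()}

freqs = frequencies(SITES)                         # unsmoothed, for the logo
pwm = log_odds(frequencies(SITES, pseudo=0.5))     # smoothed, for scoring
print(round(information_content(freqs[0]), 3))     # fully conserved G column
print(round(information_content(freqs[3]), 3))     # A/G column
```

A fully conserved column carries the maximum of 2 bits, while a column with all four bases equally likely carries 0 bits.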
7 Motif scanning: finding instances of a known motif
>Test Sequence
TTACGTTTGTCGATTTATGGGACTTTCCTCTTCGTATTTATTAGGCT
Each window of the sequence is scored against the weight matrix (rows A, C, G, T), starting with the first window, TTACGTTTGT.
Window GGGACTTTCC: Score = 17.57
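Scanning can be sketched as summing log-odds weights over every window of the sequence. The example below builds the matrix from the 20 NFκB sites on slide 5 with an assumed pseudocount of 0.5 and a uniform background. Neither value is given in the lecture, so the scores will not reproduce the 17.57 shown on the slide. Only the forward strand is scanned here.

```python
import math

BASES = "ACGT"
SITES = (["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7
         + ["GGGAAATTCC", "GGGACTTCCC"]
         + ["GGGGATTTCC"] * 3
         + ["GGGATTTTCC", "GGGGATTCCC", "GGGATTTCCC", "GGGAATTCAC", "GGGGCTTTCC"])
TEST_SEQ = "TTACGTTTGTCGATTTATGGGACTTTCCTCTTCGTATTTATTAGGCT"

def build_pwm(sites, pseudo=0.5, background=0.25):
    """Log-odds matrix m[j][base] = log2(p / b) with pseudocount smoothing."""
    n = len(sites)
    pwm = []
    for j in range(len(sites[0])):
        col = {}
        for b in BASES:
            p = (sum(s[j] == b for s in sites) + pseudo) / (n + 4 * pseudo)
            col[b] = math.log2(p / background)
        pwm.append(col)
    return pwm

def scan(seq, pwm):
    """Return (start, score) for every window of motif width in seq."""
    k = len(pwm)
    return [
        (i, sum(pwm[j][b] for j, b in enumerate(seq[i:i + k])))
        for i in range(len(seq) - k + 1)
    ]

PWM = build_pwm(SITES)
hits = scan(TEST_SEQ, PWM)
best_pos, best_score = max(hits, key=lambda hit: hit[1])
print(best_pos, TEST_SEQ[best_pos:best_pos + len(PWM)], round(best_score, 2))
```

In practice a score threshold decides which windows are reported as hits, and the reverse complement is scanned as well.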
8 Databases of known motifs
TRANSFAC: http:// regulation.com/pub/databases.html
Public version is old (2005), private version is expensive. Best coverage, not necessarily highest quality motifs.
Jaspar: http://jaspar.genereg.net
Open access. High quality motifs, medium coverage.
UniProbe: http://the_brain.bwh.harvard.edu/uniprobe
Based on in vitro protein binding microarray experiments. Good coverage of many TF families.
9 Jaspar
10 Motif scanning tools
MotifViz: http://biowulf.bu.edu/motifviz/ Various motif scanning tools included.
FIMO: http://meme.nbcr.net/meme/cgi-bin/fimo.cgi Part of the MEME suite of motif analysis tools.
MATCH: Kel, et al. Nucleic Acids Res (2003)
TAMO: Gordon, et al. Bioinformatics (2005)
MotifScanner/TOUCAN: Aerts, et al. Nucleic Acids Res (2005)
11 FIMO: Finding Individual Motif Occurrences
>Seq1
AATGTTCAGCTAAAATGTATTATTTTTTGTCTATACATAGAACGTGGGAATTTCCCC
AGGGTTCTAAATAGACTCAAGTTAGCCTCTTATCAGAGCAGAA
>Seq2
TGCCTCAGACTTTCAAATTATATCTGGGTGGATCATTCAAAGGCCGGGAAATTCCAA
TAGCTGCCTACATAGATATGGAGTATTGCCATACTTTCTGGCC
>Seq3
CCTCACAGGAACAGGAGCAGAGAAATTTTGAACAAATCAAATCTTGGGATTTCCCCT
GGTGCTGTACCTGCAATAGCCTGTCCCAACTAAGGGTATAACA
12 FIMO: Finding Individual Motif Occurrences
13
14 What if we don't know the motif?
Definition: Motif-finding is the process by which short, degenerate, statistically overrepresented patterns are discovered in a set of sequences.
When do we use motif scanning vs. motif-finding?
Motif scanning: We know TF x binds somewhere in a sequence (e.g. we know it directly regulates a gene) AND we know the motif for TF x. Beware of similarity thresholds!
Motif-finding: We hypothesize that a set of genes are bound by the same (unknown) TF. The set of genes may be defined by similar gene expression profiles.
Motif-finding: We performed high-throughput chromatin immunoprecipitation assays (ChIP-chip/ChIP-seq) on TF y, and we want to characterize TF y's binding preference from that data.
15 Motif-finding approaches
Expectation-Maximization: MEME, Improbizer
Gibbs sampling: AlignACE, GLAM, MotifSampler
Clustering: SOMBRERO
Exhaustive enumeration: Weeder, YMF
Evaluation of various methods in: Tompa, et al. Nature Biotech. (2005)
16 Motif-finding with Expectation Maximization (EM)
Define a mixture model with two components, motif and background, where the motif is initialized randomly.
Simple EM algorithm:
1. Expectation: Find the expected likelihood of the sequence data given the current model, i.e. probabilistically assign each k-mer to the motif and background components.
2. Maximization: Update the model parameters to maximize the expected-likelihood function, i.e. update the motif and background component parameters to reflect the assignments in step 1.
3. Repeat the E and M steps until there is no change in likelihood.
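The three steps above can be sketched as a two-component mixture over all k-mers. This is a simplified illustration, not MEME's actual implementation. It treats overlapping k-mers as independent, uses a zero-order background, and adds a small pseudocount in the M-step. The function name `em_motif` and all default parameter values are assumptions made for the example.

```python
import math
import random

BASES = "ACGT"

def em_motif(seqs, k, n_iter=100, prior=0.05, pseudo=0.01, tol=1e-6, seed=0):
    """Fit a motif/background mixture to all k-mers of seqs with EM."""
    rng = random.Random(seed)
    kmers = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]

    # Random initial motif: one distribution over A, C, G, T per position.
    motif = []
    for _ in range(k):
        weights = [rng.random() + 0.1 for _ in BASES]
        total = sum(weights)
        motif.append({b: w / total for b, w in zip(BASES, weights)})
    background = {b: 0.25 for b in BASES}

    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each k-mer came from the motif.
        resp, ll = [], 0.0
        for kmer in kmers:
            p_motif = prior * math.prod(motif[j][b] for j, b in enumerate(kmer))
            p_bg = (1 - prior) * math.prod(background[b] for b in kmer)
            resp.append(p_motif / (p_motif + p_bg))
            ll += math.log(p_motif + p_bg)

        # M-step: re-estimate parameters from the soft assignments.
        m_counts = [{b: pseudo for b in BASES} for _ in range(k)]
        b_counts = {b: pseudo for b in BASES}
        for kmer, r in zip(kmers, resp):
            for j, b in enumerate(kmer):
                m_counts[j][b] += r
                b_counts[b] += 1 - r
        motif = [{b: c[b] / sum(c.values()) for b in BASES} for c in m_counts]
        bg_total = sum(b_counts.values())
        background = {b: b_counts[b] / bg_total for b in BASES}
        prior = min(max(sum(resp) / len(resp), 1e-6), 1 - 1e-6)

        # Stop when the log-likelihood no longer changes.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return motif, background, prior
```

Because the motif is initialized randomly, EM can converge to different local optima. Running it from several seeds and keeping the fit with the highest likelihood is a common precaution.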
17 Problem: We added galactose to the growth media of yeast (S. cerevisiae), and observed the up-regulation of four genes: GAL7, GAL10, GAL1 & GAL2. We hypothesize that these four genes are under the control of the same transcription factor. To test this hypothesis, we perform motif-finding on the promoter sequences of these genes.
>YBR018C GAL7
GACGGTAGCAACAAGAATATAGCACGAGCCGCGGAGTTCATTTCGTTACTTTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATC
ATAAGGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGGGTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTT
TGCCTCTCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAGATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAA
AGGGTCCAAAAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGTGTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGC
TATGTTCAGTTAGTTTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATTATGCAGAGCATCAACATGATAAA
>YBR019C GAL10
ATCGCTTCGCTGATTAATTACCCCAGAAATAAGGCTAAAAAACTAATCGCATTATCATCCTATGGTTGTTAATTTGATTCGTTAATTTGAAGGTTTGT
GGGGCCAGGTTCTGCCAATTTTTCCTCTTCATAACCATAAAAGCTAGTATTGTAGAATCTTTATTGTTCGGAGCAGTGCGGCGCGAGGCACATCTGC
GTTTCAGGAACGCGACCGGTGAAGACGAGGACGCACGGAGGAGAGTCTTCCGTCGGAGGGCTGTCGCCCGCTCGGCGGCTTCTAATCCGTACTT
CAATATAGCAATGAGCAGTTAAGCGTATTACTGAAAGTTCCAAAGAGAAGGTTTTTTTAGGCTAAGATAATGGGGCTCTTTACATTTCCACAACATA
TAAGTAAGATTAGATATGGATATGTATATGGTGGTAATGCCATGTAATATGATTATTAAACTTCTTTGCGTCCATCCAAAAAAAAAGT
>YBR020W GAL1
ACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTC
TTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTC
TCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGC
TTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGAT
TAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTA
ACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATA
>YLR081W GAL2
AGGTTGCAATTTCTTTTTCTATTAGTAGCTAAAAATGGGTCACGTGATCTATATTCGAAAGGGGCGGTTGCCTCAGGAAGGCACCGGCGGTCTTTC
GTCCGTGCGGAGATATCTGCGCCGTTCAGGGGTCCATGTGCCTTGGACGATATTAAGGCAGAAGGCAGTATCGGGGCGGATCACTCCGAACCGA
GATTAGTTAAGCCCTTCCCATCTCAAGATGGGGAGCAAATGGCATTATACTCCTGCTAGAAAGTTAACTGTGCACATATTCTTAAATTATACAACAT
TCTGGAGAGCTATTGTTCAAAAAACAAACATTTCGCAGGCTAAAATGTGGAGATAGGATAAGTTTTGTAGACATATATAAACAATCAGTAATTGGA
TTGAAAATTTGGTGTTGTGAATTGCTCTTCATTATGCACCTTATTCAATTATCATCAAGAATAGTAATAGTTAAGTAAACACAAGATTA
18 MEME has a lot of options, and good documentation
19 MEME Galactose example
20 MEME Galactose example
21 We found a motif! Now what? (The GAL7, GAL10, GAL1 and GAL2 promoter sequences from slide 17 are repeated on this slide.)
22 Which TFs could bind our motifs?
Match discovered motifs against known motif databases.
Motif alignment software:
STAMP http://
Motif alignment, multiple alignment, clustering
TOMTOM http://meme.nbcr.net/meme/cgi-bin/tomtom.cgi
Integrated with MEME suite
23
24
25 Transcription factor binding motifs come in families
Elk4 NR2F1 Antp Max Eip74EF PPARg Lim3 Myc FEV Esrrb Yox1 NHLH1
26 The futility theorem
Futility theorem*: essentially all TF binding motif occurrences will have no function.
How can we focus on motif instances that are bound and functional?
Conservation
Nucleosome positioning
Cis-regulatory modules (i.e. clusters of sites)
Accessibility (DNaseI hypersensitivity)
Chromatin marks (H3K4me1, H3K27ac)
* Wasserman & Sandelin, Nature Reviews Genetics (2004)
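A back-of-the-envelope calculation suggests why chance matches dominate. The sketch below assumes independent, equally likely bases (real genomes are not uniform) and uses a hypothetical genome length of 3 × 10^9 bp. It takes the NFκB consensus GGGRNWTYCC from slide 3 as the example motif. The function names are illustrative.

```python
from fractions import Fraction

# Number of bases each code on the slide's IUPAC table can stand for.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1,
              "W": 2, "S": 2, "M": 2, "K": 2, "R": 2, "Y": 2, "N": 4}

def match_probability(consensus):
    """Chance that a random window matches, assuming uniform independent bases."""
    p = Fraction(1)
    for code in consensus:
        p *= Fraction(IUPAC_SIZE[code], 4)
    return p

def expected_chance_matches(consensus, genome_length, strands=2):
    """Expected number of matching windows over both strands."""
    windows = genome_length - len(consensus) + 1
    return float(match_probability(consensus) * windows * strands)

print(match_probability("GGGRNWTYCC"))                  # 1/32768
print(round(expected_chance_matches("GGGRNWTYCC", 3_000_000_000)))
```

Under these assumptions a genome of that size is expected to contain on the order of 180,000 matches to the consensus by chance alone, so additional evidence such as conservation or chromatin accessibility is needed to pick out the functional ones.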
27 Priors for motif-finding
Positional priors: When analyzing ChIP-seq data, motifs are more likely to be centered under the peak.
Structural priors: Motifs are more likely to be similar to known motifs and to have particular information content shapes.
28 Further reading
Protein-DNA binding assays:
"Determining the specificity of protein-DNA interactions", Stormo & Zhao, Nature Rev Genetics (2010)
Motif scanning & motif-finding:
"Practical strategies for discovering regulatory DNA sequence motifs", MacIsaac & Fraenkel, PLoS Comp Bio (2006)
"Applied bioinformatics for the identification of regulatory elements", Wasserman & Sandelin, Nature Rev Genetics (2004)
MEME:
"Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Bailey & Elkan, Proc. ISMB (1994)
Gene regulation & Gal4 system:
"Genes & Signals", Ptashne & Gann (2002)
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression
More informationBayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC. Shane T. Jensen
Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC Shane T. Jensen Department of Statistics The Wharton School, University of Pennsylvania stjensen@wharton.upenn.edu
More informationRegulatory Element Detection using a Probabilistic Segmentation Model
Regulatory Element Detection using a Probabilistic Segmentation Model Harmen J Bussemaker 1, Hao Li 2,3, and Eric D Siggia 2,4 1 Swammerdam Institute for Life Sciences and Amsterdam Center for Computational
More informationFinding motifs from all sequences with and without binding sites
BIOINFORMATICS ORIGINAL PAPER Vol. 22 no. 18 2006, pages 2217 2223 doi:10.1093/bioinformatics/btl371 Sequence analysis Finding motifs from all sequences with and without binding sites Henry C. M. Leung
More informationDifferent gene regulation strategies revealed by analysis of binding motifs
Acknowledgements We thank members of the Zhang laboratory and three anonymous reviewers for valuable comments. This work was supported by research grants from the National Institutes of Health to J.Z.
More informationGENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI
1 GENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI Justin Dailey and Xiaoyu Zhang Department of Computer Science, California State University San Marcos San Marcos, CA 92096 Email: daile005@csusm.edu,
More informationObjectives. Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain 1,2 Mentor Dr.
Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae Emily Germain 1,2 Mentor Dr. Hugh Nicholas 3 1 Bioengineering & Bioinformatics Summer Institute, Department of Computational
More informationChapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta
Chapter 8 Regulatory Motif Discovery: from Decoding to Meta-Analysis Qing Zhou Mayetri Gupta Abstract Gene transcription is regulated by interactions between transcription factors and their target binding
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationGenome 541 Introduction to Computational Molecular Biology. Max Libbrecht
Genome 541 Introduction to Computational Molecular Biology Max Libbrecht Genome 541 units Max Libbrecht: Gene regulation and epigenomics Postdoc, Bill Noble s lab Yi Yin: Bayesian statistics Postdoc, Jay
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationGenome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics
Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics I believe it is helpful to number your slides for easy reference. It's been a while since I took
More informationCSEP 590B Fall Motifs: Representation & Discovery
CSEP 590B Fall 2014 5 Motifs: Representation & Discovery 1 Outline Previously: Learning from data MLE: Max Likelihood Estimators EM: Expectation Maximization (MLE w/hidden data) These Slides: Bio: Expression
More informationPhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling
: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling Rahul Siddharthan* The Institute of Mathematical Sciences, Chennai, India Abstract PhyloGibbs, our recent Gibbs-sampling motif-finder,
More informationHub Gene Selection Methods for the Reconstruction of Transcription Networks
for the Reconstruction of Transcription Networks José Miguel Hernández-Lobato (1) and Tjeerd. M. H. Dijkstra (2) (1) Computer Science Department, Universidad Autónoma de Madrid, Spain (2) Institute for
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationDifferen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies
Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies Florian Tramèr, Zhicong Huang, Erman Ayday, Jean- Pierre Hubaux ACM CCS 205 Denver, Colorado,
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple
More informationGoing Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014
Going Beyond SNPs with Next Genera5on Sequencing Technology 02-223 Personalized Medicine: Understanding Your Own Genome Fall 2014 Next Genera5on Sequencing Technology (NGS) NGS technology Discover more
More information