Transcription factor binding motifs


1 Transcription factor binding motifs
BMMB-597D Lecture 29
Shaun Mahony

2 Transcription factor binding sites
Short: Typically between 6 and 20 bp long.
Degenerate: TFs have favorite binding sequences but don't require a perfect match to bind.
Belong to families: Related TFs often bind to similar sequences.
Likes to bind: CCGGAA TGACCT..TGACCT ATTA CACGTG

3 We can represent a TF's binding preference as a degenerate string or probabilistically
Definition: A motif is a pattern of conservation in a sequence alignment.
TF: NFκB
Favorite sequence: GGGAATTTCC
Known NFκB binding sites:
GGGAATTTCC GGGAATTTCC GGGAATTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGAAATTCC GGGACTTCCC GGGGATTTCC GGGGATTTCC GGGGATTTCC GGGATTTTCC GGGGATTCCC GGGATTTCCC GGGAATTCAC GGGGCTTTCC GGGGCTTTCC GGGAAGTCCC
Consensus sequence: GGGRNWTYCC
[Figure: weight matrix]

4 Consensus sequences
Definition: A consensus sequence represents a multiple sequence alignment using a degenerate alphabet to represent variable positions.
Extended DNA Alphabet (IUPAC):
A C G T
W = A or T
S = C or G
M = A or C
K = G or T
R = A or G
Y = C or T
N = any base (A/C/G/T)
Alignment:
GGGAATTTCC GGGAATTTCC GGGAATTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGAAATTCC GGGACTTCCC GGGGATTTCC GGGGATTTCC GGGGATTTCC GGGATTTTCC
Consensus, as it becomes more degenerate while sites are added to the alignment:
GGGAATTTCC → GGGAMTTTCC → GGGAMWTTCC → GGGAMWTYCC → GGGRNWTYCC
GGGGTTTCCC?
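As a rough illustration of how such a consensus could be derived, the sketch below collapses each alignment column to the IUPAC code that covers every base seen in it. It assumes the restricted alphabet from this slide, so any column with three or four distinct bases becomes N. The function names `consensus` and `matches` are hypothetical and are not taken from any tool mentioned in the lecture.

```python
# IUPAC codes listed on the slide. Columns with three or four distinct bases
# fall back to N, because the slide's alphabet has no three-base codes
# (an assumption of this sketch).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AT"): "W", frozenset("CG"): "S", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AG"): "R", frozenset("CT"): "Y",
}
# Reverse lookup: code -> set of bases it stands for.
EXPAND = {code: bases for bases, code in IUPAC.items()}
EXPAND["N"] = frozenset("ACGT")


def consensus(sites):
    """Return a degenerate consensus covering every base seen in each column."""
    return "".join(IUPAC.get(frozenset(column), "N") for column in zip(*sites))


def matches(consensus_seq, candidate):
    """True if every base of candidate is allowed by the consensus."""
    return len(candidate) == len(consensus_seq) and all(
        base in EXPAND[code] for code, base in zip(consensus_seq, candidate)
    )


# The 16 aligned sites shown on this slide.
sites = (
    ["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7 + ["GGGAAATTCC", "GGGACTTCCC"]
    + ["GGGGATTTCC"] * 3 + ["GGGATTTTCC"]
)
print(consensus(sites))
print(matches(consensus(sites), "GGGGTTTCCC"))
```

Under these assumptions the 16 sites give GGGRNWTYCC, and GGGGTTTCCC is accepted by that consensus even though it never appears in the alignment, which is one way to read the question mark on the slide.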

5 Position weight matrices
Definition: A position weight matrix (PWM) is a matrix of log-likelihood values that gives a weighted match to fixed-length strings. Also known as a position-specific scoring matrix (PSSM).
GGGAATTTCC GGGAATTTCC GGGAATTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGACTTTCC GGGAAATTCC GGGACTTCCC GGGGATTTCC GGGGATTTCC GGGGATTTCC GGGATTTTCC GGGGATTCCC GGGATTTCCC GGGAATTCAC GGGGCTTTCC
[Count matrix: rows A, C, G, T; one column per motif position]
[Relative frequency matrix: rows A, C, G, T; one column per motif position]
p_{i,j}: probability of observing letter i at position j
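The two matrices named on this slide can be recomputed from the 20 listed sites. The following is a minimal sketch of that tabulation; the helper names `count_matrix` and `frequency_matrix` are illustrative only.

```python
BASES = "ACGT"

# The 20 aligned NFκB sites listed on this slide.
SITES = (
    ["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7 + ["GGGAAATTCC", "GGGACTTCCC"]
    + ["GGGGATTTCC"] * 3
    + ["GGGATTTTCC", "GGGGATTCCC", "GGGATTTCCC", "GGGAATTCAC", "GGGGCTTTCC"]
)


def count_matrix(sites):
    """Count how often each base occurs at each alignment position."""
    width = len(sites[0])
    counts = {base: [0] * width for base in BASES}
    for site in sites:
        for j, base in enumerate(site):
            counts[base][j] += 1
    return counts


def frequency_matrix(counts):
    """Turn counts into relative frequencies p[i][j] (each column sums to 1)."""
    width = len(counts["A"])
    totals = [sum(counts[b][j] for b in BASES) for j in range(width)]
    return {b: [counts[b][j] / totals[j] for j in range(width)] for b in BASES}


counts = count_matrix(SITES)
freqs = frequency_matrix(counts)
for base in BASES:
    print(base, counts[base], [round(p, 2) for p in freqs[base]])
```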

6 Position weight matrices
[Position weight matrix: rows A, C, G, T; one column per motif position]
Position weight matrix: $m_{i,j} = \log_2(p_{i,j} / b_i)$, where $b_i$ is the background probability of letter $i$.
[Figure: sequence logo]
Information content: $IC_j = 2 + \sum_{i=A}^{T} p_{i,j} \log_2(p_{i,j})$
Letter height in logo: $\mathrm{Height}_{i,j} = p_{i,j} \times IC_j$
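The formulas on this slide translate fairly directly into code. The sketch below is one possible reading, with two assumptions that are not on the slide: a uniform background of 0.25 per base, and a small pseudocount so that the logarithm stays defined for letters never observed at a position. The two toy columns correspond to positions 1 and 4 of the 20 sites on the previous slide.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, background=None, pseudocount=0.25):
    """Log-odds matrix m[i][j] = log2(p[i][j] / b[i]).

    The pseudocount is an assumption of this sketch: it keeps p above zero
    for letters that were never observed at a position.
    """
    background = background or {b: 0.25 for b in BASES}
    width = len(counts["A"])
    pwm = {b: [] for b in BASES}
    for j in range(width):
        total = sum(counts[b][j] for b in BASES) + 4 * pseudocount
        for b in BASES:
            p = (counts[b][j] + pseudocount) / total
            pwm[b].append(math.log2(p / background[b]))
    return pwm


def information_content(freqs):
    """IC_j = 2 + sum_i p[i][j] * log2(p[i][j]), taking 0 * log2(0) as 0."""
    width = len(freqs["A"])
    return [
        2 + sum(freqs[b][j] * math.log2(freqs[b][j])
                for b in BASES if freqs[b][j] > 0)
        for j in range(width)
    ]


def logo_heights(freqs):
    """Height[i][j] = p[i][j] * IC_j, the letter heights in a sequence logo."""
    ic = information_content(freqs)
    return {b: [freqs[b][j] * ic[j] for j in range(len(ic))] for b in BASES}


# Toy input: column 0 is all G (20 of 20), column 1 is 15 A and 5 G.
counts = {"A": [0, 15], "C": [0, 0], "G": [20, 5], "T": [0, 0]}
freqs = {b: [c / 20 for c in counts[b]] for b in BASES}
pwm = pwm_from_counts(counts)
ic = information_content(freqs)
heights = logo_heights(freqs)
print([round(x, 3) for x in ic])
```

A fully conserved column reaches the maximum of 2 bits, while the mixed A/G column carries less information, which is why it appears shorter in the logo.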

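7 Motif scanning: finding instances of a known motif
>Test Sequence
TTACGTTTGTCGATTTATGGGACTTTCCTCTTCGTATTTATTAGGCT
[Figure: the weight matrix (rows A, C, G, T) is aligned against successive windows of the test sequence, and each window receives a score. First window, TTACGTTTGT: Score = (value not preserved). Window at ATGGGACTTTCCTC: Score = 17.57.]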
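Scanning can be sketched as sliding the log-odds matrix along the sequence and summing one matrix entry per position in each window. The example below builds a matrix from the 20 NFκB sites of slide 5 and scans the test sequence on the forward strand only. The pseudocount and the uniform background are assumptions of this sketch, so the scores it produces are not expected to reproduce the 17.57 shown on the slide.

```python
import math

BASES = "ACGT"

# The 20 NFκB sites from slide 5.
SITES = (
    ["GGGAATTTCC"] * 3 + ["GGGACTTTCC"] * 7 + ["GGGAAATTCC", "GGGACTTCCC"]
    + ["GGGGATTTCC"] * 3
    + ["GGGATTTTCC", "GGGGATTCCC", "GGGATTTCCC", "GGGAATTCAC", "GGGGCTTTCC"]
)
TEST_SEQ = "TTACGTTTGTCGATTTATGGGACTTTCCTCTTCGTATTTATTAGGCT"


def build_pwm(sites, pseudocount=0.25, background=0.25):
    """One dict of log2(p / b) scores per motif position (assumed parameters)."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of motif width; returns (start, window, score) tuples."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        hits.append((start, window, score))
    return hits


pwm = build_pwm(SITES)
hits = scan(TEST_SEQ, pwm)
best = max(hits, key=lambda hit: hit[2])
print(best)
```

In practice a score threshold decides which windows are reported as sites, and the reverse strand is usually scanned as well; both are left out here to keep the sketch short.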

8 Databases of known motifs
TRANSFAC: http:// regulation.com/pub/databases.html
Public version is old (2005), private version is expensive. Best coverage, not necessarily highest-quality motifs.
Jaspar: http://jaspar.genereg.net
Open access. High-quality motifs, medium coverage.
UniProbe: http://the_brain.bwh.harvard.edu/uniprobe
Based on in vitro protein binding microarray experiments. Good coverage of many TF families.

9 Jaspar

10 Motif scanning tools
MotifViz: http://biowulf.bu.edu/motifviz/
Various motif scanning tools included.
FIMO: http://meme.nbcr.net/meme/cgi-bin/fimo.cgi
Part of the MEME suite of motif analysis tools.
MATCH: Kel, et al. Nucleic Acids Res (2003)
TAMO: Gordon, et al. Bioinformatics (2005)
MotifScanner/TOUCAN: Aerts, et al. Nucleic Acids Res (2005)

11 FIMO: Finding Individual Motif Occurrences
>Seq1
AATGTTCAGCTAAAATGTATTATTTTTTGTCTATACATAGAACGTGGGAATTTCCCC
AGGGTTCTAAATAGACTCAAGTTAGCCTCTTATCAGAGCAGAA
>Seq2
TGCCTCAGACTTTCAAATTATATCTGGGTGGATCATTCAAAGGCCGGGAAATTCCAA
TAGCTGCCTACATAGATATGGAGTATTGCCATACTTTCTGGCC
>Seq3
CCTCACAGGAACAGGAGCAGAGAAATTTTGAACAAATCAAATCTTGGGATTTCCCCT
GGTGCTGTACCTGCAATAGCCTGTCCCAACTAAGGGTATAACA

12 FIMO: Finding Individual Motif Occurrences

13

14 What if we don't know the motif?
Definition: Motif-finding is the process by which short, degenerate, statistically overrepresented patterns are discovered in a set of sequences.
When do we use motif scanning vs. motif-finding?
Motif scanning: We know TF x binds somewhere in a sequence (e.g. we know it directly regulates a gene) AND we know the motif for TF x. Beware of similarity thresholds!
Motif-finding: We hypothesize that a set of genes is bound by the same (unknown) TF. The set of genes may be defined by similar gene expression profiles.
Motif-finding: We performed high-throughput Chromatin Immunoprecipitation assays (ChIP-chip/ChIP-seq) on TF y, and we want to characterize TF y's binding preference from that data.

15 Motif-finding approaches
Expectation-Maximization: MEME, Improbizer
Gibbs Sampling: AlignACE, GLAM, MotifSampler
Clustering: SOMBRERO
Exhaustive enumeration: Weeder, YMF
Evaluation of various methods in: Tompa, et al. Nature Biotech. (2005)

16 Motif-finding with Expectation Maximization (EM)
Define a mixture model with two components, Motif & Background, where the motif is initialized randomly.
Simple EM algorithm:
1. Expectation: Find the expected likelihood of the sequence data given the current model, i.e. probabilistically assign each k-mer to the motif & background components.
2. Maximization: Update the model parameters to maximize the expected-likelihood function, i.e. update the motif and background component parameters to reflect the assignments in step 1.
3. Repeat E-M until there is no change in likelihood.
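The three steps above can be sketched as a small two-component mixture over all k-mers. This is a simplified illustration and not MEME's actual implementation: it treats every k-mer as an independent observation, uses a single mixing weight `lam` (a name chosen here), and runs from one random start. The toy input, ten random sequences with GGGACTTTCC planted in the middle, is made up for demonstration.

```python
import math
import random

BASES = "ACGT"


def kmers(sequences, k):
    return [s[i:i + k] for s in sequences for i in range(len(s) - k + 1)]


def prob(kmer, columns):
    """Probability of a k-mer under per-position base distributions."""
    p = 1.0
    for column, base in zip(columns, kmer):
        p *= column[base]
    return p


def em_motif(sequences, k, iterations=50, tol=1e-6, seed=0, lam=0.05):
    rng = random.Random(seed)
    data = kmers(sequences, k)
    # Random initial motif: one probability column per position.
    motif = []
    for _ in range(k):
        raw = {b: rng.random() + 0.1 for b in BASES}
        total = sum(raw.values())
        motif.append({b: v / total for b, v in raw.items()})
    # Background starts at the overall base composition.
    text = "".join(sequences)
    bg = {b: text.count(b) / len(text) for b in BASES}
    history = []
    for _ in range(iterations):
        # E-step: responsibility of the motif component for each k-mer.
        resp, loglik = [], 0.0
        for x in data:
            pm = lam * prob(x, motif)
            pb = (1 - lam) * prob(x, [bg] * k)
            resp.append(pm / (pm + pb))
            loglik += math.log(pm + pb)
        history.append(loglik)
        # Step 3: stop once the likelihood no longer improves.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        z_total = sum(resp)
        if z_total == 0:
            break
        # M-step: re-estimate motif, background and mixing weight.
        motif = [
            {b: sum(z for z, x in zip(resp, data) if x[j] == b) / z_total
             for b in BASES}
            for j in range(k)
        ]
        bg_counts = {b: sum((1 - z) * x.count(b) for z, x in zip(resp, data))
                     for b in BASES}
        bg_total = sum(bg_counts.values())
        bg = {b: bg_counts[b] / bg_total for b in BASES}
        lam = z_total / len(data)
    return motif, bg, lam, history


# Toy data (hypothetical): random flanks around a planted site.
toy_rng = random.Random(1)
flank = lambda: "".join(toy_rng.choice(BASES) for _ in range(20))
seqs = [flank() + "GGGACTTTCC" + flank() for _ in range(10)]
motif, bg, lam, history = em_motif(seqs, k=10)
print(round(history[0], 2), round(history[-1], 2), round(lam, 3))
```

The log-likelihood recorded in `history` should never decrease from one iteration to the next, but a single random start can settle in a local optimum, so whether the planted site is recovered depends on the initialization.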

17 Problem: We added galactose to the growth media of yeast (S. cerevisiae), and observed the up-regulation of four genes: GAL7, GAL10, GAL1 & GAL2. We hypothesize that these four genes are under the control of the same transcription factor. To test this hypothesis, we perform motif-finding on the promoter sequences of these genes.
>YBR018C GAL7
GACGGTAGCAACAAGAATATAGCACGAGCCGCGGAGTTCATTTCGTTACTTTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATC
ATAAGGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGGGTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTT
TGCCTCTCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAGATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAA
AGGGTCCAAAAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGTGTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGC
TATGTTCAGTTAGTTTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATTATGCAGAGCATCAACATGATAAA
>YBR019C GAL10
ATCGCTTCGCTGATTAATTACCCCAGAAATAAGGCTAAAAAACTAATCGCATTATCATCCTATGGTTGTTAATTTGATTCGTTAATTTGAAGGTTTGT
GGGGCCAGGTTCTGCCAATTTTTCCTCTTCATAACCATAAAAGCTAGTATTGTAGAATCTTTATTGTTCGGAGCAGTGCGGCGCGAGGCACATCTGC
GTTTCAGGAACGCGACCGGTGAAGACGAGGACGCACGGAGGAGAGTCTTCCGTCGGAGGGCTGTCGCCCGCTCGGCGGCTTCTAATCCGTACTT
CAATATAGCAATGAGCAGTTAAGCGTATTACTGAAAGTTCCAAAGAGAAGGTTTTTTTAGGCTAAGATAATGGGGCTCTTTACATTTCCACAACATA
TAAGTAAGATTAGATATGGATATGTATATGGTGGTAATGCCATGTAATATGATTATTAAACTTCTTTGCGTCCATCCAAAAAAAAAGT
>YBR020W GAL1
ACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTC
TTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTC
TCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGC
TTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGAT
TAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTA
ACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATA
>YLR081W GAL2
AGGTTGCAATTTCTTTTTCTATTAGTAGCTAAAAATGGGTCACGTGATCTATATTCGAAAGGGGCGGTTGCCTCAGGAAGGCACCGGCGGTCTTTC
GTCCGTGCGGAGATATCTGCGCCGTTCAGGGGTCCATGTGCCTTGGACGATATTAAGGCAGAAGGCAGTATCGGGGCGGATCACTCCGAACCGA
GATTAGTTAAGCCCTTCCCATCTCAAGATGGGGAGCAAATGGCATTATACTCCTGCTAGAAAGTTAACTGTGCACATATTCTTAAATTATACAACAT
TCTGGAGAGCTATTGTTCAAAAAACAAACATTTCGCAGGCTAAAATGTGGAGATAGGATAAGTTTTGTAGACATATATAAACAATCAGTAATTGGA
TTGAAAATTTGGTGTTGTGAATTGCTCTTCATTATGCACCTTATTCAATTATCATCAAGAATAGTAATAGTTAAGTAAACACAAGATTA

18 MEME has a lot of options, and good documentation

19 MEME Galactose example

20 MEME Galactose example

21 [Slide repeats the GAL7, GAL10, GAL1 and GAL2 promoter sequences from slide 17.]
We found a motif! Now what?

22 Which TFs could bind our motifs?
Match discovered motifs against known motif databases.
Motif alignment software:
STAMP: http://
Motif alignment, multiple alignment, clustering
TOMTOM: http://meme.nbcr.net/meme/cgi-bin/tomtom.cgi
Integrated with MEME suite
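To give a feel for what matching a discovered motif against a database involves, the sketch below slides one frequency matrix along another without gaps and scores the overlapping columns. This is not the algorithm used by STAMP or TOMTOM; the column similarity (one minus a scaled Euclidean distance), the minimum overlap, the `one_hot` helper and the two-entry library with its names are all assumptions made for illustration.

```python
import math


def column_similarity(p, q):
    """1 for identical columns, 0 for columns concentrated on different bases."""
    distance = math.sqrt(sum((p[b] - q[b]) ** 2 for b in "ACGT"))
    return 1 - distance / math.sqrt(2)


def best_alignment(query, target, min_overlap=4):
    """Slide query along target (ungapped); return (score, offset).

    offset is the target position aligned with query position 0.
    """
    best = (float("-inf"), None)
    low = -(len(query) - min_overlap)
    high = len(target) - min_overlap
    for offset in range(low, high + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_similarity(p, q) for p, q in pairs)
        best = max(best, (score, offset))
    return best


def one_hot(seq, strength=0.85):
    """Hypothetical helper: a frequency matrix peaked on the given sequence."""
    return [
        {b: strength if b == base else (1 - strength) / 3 for b in "ACGT"}
        for base in seq
    ]


# Hypothetical two-entry "database" and a shorter discovered motif.
library = {"known_A": one_hot("GGGACTTTCC"), "known_B": one_hot("CACGTG")}
query = one_hot("GACTTTC")
ranked = sorted(
    ((best_alignment(query, m), name) for name, m in library.items()),
    reverse=True,
)
for (score, offset), name in ranked:
    print(name, round(score, 2), offset)
```

Real tools also compare against the reverse complement and convert the alignment score into a statistical significance, both of which are omitted here.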

23

24

25 Transcription factor binding motifs come in families
Elk4 NR2F1 Antp Max Eip74EF PPARg Lim3 Myc FEV Esrrb Yox1 NHLH1

26 The futility theorem
Futility theorem*: essentially all TF binding motif occurrences will have no function.
How can we focus on motif instances that are bound & functional?
Conservation
Nucleosome positioning
Cis-regulatory modules (i.e. clusters of sites)
Accessibility (DNaseI hypersensitivity)
Chromatin marks (H3K4me1, H3K27ac)
* Wasserman & Sandelin, Nature Reviews Genetics (2004)

27 Priors for motif-finding
Positional priors: When analyzing ChIP-seq data, motifs are more likely to be centered under the peak.
Structural priors: Motifs are more likely to be similar to known motifs and to have particular information content shapes.

28 Further reading
Protein-DNA binding assays:
Determining the specificity of protein-DNA interactions, Stormo & Zhao, Nature Rev Genetics (2010)
Motif scanning & motif-finding:
Practical strategies for discovering regulatory DNA sequence motifs, MacIsaac & Fraenkel, PLoS Comp Bio (2006)
Applied bioinformatics for the identification of regulatory elements, Wasserman & Sandelin, Nature Rev Genetics (2004)
MEME:
Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Bailey & Elkan, Proc. ISMB (1994)
Gene regulation & Gal4 system:
Genes & Signals, Ptashne & Gann (2002)


More information

Statistics of transcriptional regulation

Statistics of transcriptional regulation Statistics of transcriptional regulation Sündüz Keleş Department of Statistics Department of Biostatistics and Medical Informatics University of Wisconsin, Madison February 18-27, 2008 Stat 992 (877) (Spring

More information

Inferring Models of cis-regulatory Modules using Information Theory

Inferring Models of cis-regulatory Modules using Information Theory Inferring Models of cis-regulatory Modules using Information Theory BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 28 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material,

More information

CSCI1950 Z Computa3onal Methods for Biology* (*Working Title) Lecture 1. Ben Raphael January 21, Course Par3culars

CSCI1950 Z Computa3onal Methods for Biology* (*Working Title) Lecture 1. Ben Raphael January 21, Course Par3culars CSCI1950 Z Computa3onal Methods for Biology* (*Working Title) Lecture 1 Ben Raphael January 21, 2009 Course Par3culars Three major topics 1. Phylogeny: ~50% lectures 2. Func3onal Genomics: ~25% lectures

More information

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

Theoretical distribution of PSSM scores

Theoretical distribution of PSSM scores Regulatory Sequence Analysis Theoretical distribution of PSSM scores Jacques van Helden Jacques.van-Helden@univ-amu.fr Aix-Marseille Université, France Technological Advances for Genomics and Clinics (TAGC,

More information

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Professor Wei-Min Shen Week 13.1 and 13.2 1 Status Check Extra credits? Announcement Evalua/on process will start soon

More information

Proteomics Systems Biology

Proteomics Systems Biology Dr. Sanjeeva Srivastava IIT Bombay Proteomics Systems Biology IIT Bombay 2 1 DNA Genomics RNA Transcriptomics Global Cellular Protein Proteomics Global Cellular Metabolite Metabolomics Global Cellular

More information

Transcription factors (TFs) regulate genes by binding to their

Transcription factors (TFs) regulate genes by binding to their CisModule: De novo discovery of cis-regulatory modules by hierarchical mixture modeling Qing Zhou* and Wing H. Wong* *Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138;

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

Genome 541! Unit 4, lecture 3! Genomics assays

Genome 541! Unit 4, lecture 3! Genomics assays Genome 541! Unit 4, lecture 3! Genomics assays Much easier to follow with slides. Good pace.! Having the slides was really helpful clearer to read and easier to follow the trajectory of the lecture.!!

More information

Similarity of position frequency matrices for transcription factor binding sites

Similarity of position frequency matrices for transcription factor binding sites BIOINFORMATICS ORIGINAL PAPER Vol. 21 no. 3 2005, pages 307 313 doi:10.1093/bioinformatics/bth480 Similarity of position frequency matrices for transcription factor binding sites Dustin E. Schones 1,2,,

More information

Discovering Binding Motif Pairs from Interacting Protein Groups

Discovering Binding Motif Pairs from Interacting Protein Groups Discovering Binding Motif Pairs from Interacting Protein Groups Limsoon Wong Institute for Infocomm Research Singapore Copyright 2005 by Limsoon Wong Plan Motivation from biology & problem statement Recasting

More information

STATISTICAL SIGNIFICANCE FOR DNA MOTIF DISCOVERY

STATISTICAL SIGNIFICANCE FOR DNA MOTIF DISCOVERY STATISTICAL SIGNIFICANCE FOR DNA MOTIF DISCOVERY A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor

More information

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki Phylogene)cs IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, 2016 Joyce Nzioki Phylogenetics The study of evolutionary relatedness of organisms. Derived from two Greek words:» Phle/Phylon: Tribe/Race» Genetikos:

More information

Deciphering the cis-regulatory network of an organism is a

Deciphering the cis-regulatory network of an organism is a Identifying the conserved network of cis-regulatory sites of a eukaryotic genome Ting Wang and Gary D. Stormo* Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110 Edited

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

More information

Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC. Shane T. Jensen

Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC. Shane T. Jensen Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC Shane T. Jensen Department of Statistics The Wharton School, University of Pennsylvania stjensen@wharton.upenn.edu

More information

Regulatory Element Detection using a Probabilistic Segmentation Model

Regulatory Element Detection using a Probabilistic Segmentation Model Regulatory Element Detection using a Probabilistic Segmentation Model Harmen J Bussemaker 1, Hao Li 2,3, and Eric D Siggia 2,4 1 Swammerdam Institute for Life Sciences and Amsterdam Center for Computational

More information

Finding motifs from all sequences with and without binding sites

Finding motifs from all sequences with and without binding sites BIOINFORMATICS ORIGINAL PAPER Vol. 22 no. 18 2006, pages 2217 2223 doi:10.1093/bioinformatics/btl371 Sequence analysis Finding motifs from all sequences with and without binding sites Henry C. M. Leung

More information

Different gene regulation strategies revealed by analysis of binding motifs

Different gene regulation strategies revealed by analysis of binding motifs Acknowledgements We thank members of the Zhang laboratory and three anonymous reviewers for valuable comments. This work was supported by research grants from the National Institutes of Health to J.Z.

More information

GENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI

GENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI 1 GENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI Justin Dailey and Xiaoyu Zhang Department of Computer Science, California State University San Marcos San Marcos, CA 92096 Email: daile005@csusm.edu,

More information

Objectives. Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain 1,2 Mentor Dr.

Objectives. Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain 1,2 Mentor Dr. Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae Emily Germain 1,2 Mentor Dr. Hugh Nicholas 3 1 Bioengineering & Bioinformatics Summer Institute, Department of Computational

More information

Chapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta

Chapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta Chapter 8 Regulatory Motif Discovery: from Decoding to Meta-Analysis Qing Zhou Mayetri Gupta Abstract Gene transcription is regulated by interactions between transcription factors and their target binding

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Genome 541 Introduction to Computational Molecular Biology. Max Libbrecht

Genome 541 Introduction to Computational Molecular Biology. Max Libbrecht Genome 541 Introduction to Computational Molecular Biology Max Libbrecht Genome 541 units Max Libbrecht: Gene regulation and epigenomics Postdoc, Bill Noble s lab Yi Yin: Bayesian statistics Postdoc, Jay

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics I believe it is helpful to number your slides for easy reference. It's been a while since I took

More information

CSEP 590B Fall Motifs: Representation & Discovery

CSEP 590B Fall Motifs: Representation & Discovery CSEP 590B Fall 2014 5 Motifs: Representation & Discovery 1 Outline Previously: Learning from data MLE: Max Likelihood Estimators EM: Expectation Maximization (MLE w/hidden data) These Slides: Bio: Expression

More information

PhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling

PhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling : Module Prediction and Discriminative Motif-Finding by Gibbs Sampling Rahul Siddharthan* The Institute of Mathematical Sciences, Chennai, India Abstract PhyloGibbs, our recent Gibbs-sampling motif-finder,

More information

Hub Gene Selection Methods for the Reconstruction of Transcription Networks

Hub Gene Selection Methods for the Reconstruction of Transcription Networks for the Reconstruction of Transcription Networks José Miguel Hernández-Lobato (1) and Tjeerd. M. H. Dijkstra (2) (1) Computer Science Department, Universidad Autónoma de Madrid, Spain (2) Institute for

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies

Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies Florian Tramèr, Zhicong Huang, Erman Ayday, Jean- Pierre Hubaux ACM CCS 205 Denver, Colorado,

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

More information

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014 Going Beyond SNPs with Next Genera5on Sequencing Technology 02-223 Personalized Medicine: Understanding Your Own Genome Fall 2014 Next Genera5on Sequencing Technology (NGS) NGS technology Discover more

More information