De novo identification of motifs in one species. Modified from Serafim Batzoglou's lecture notes
1 De novo identification of motifs in one species
Modified from Serafim Batzoglou's lecture notes
2 Finding Regulatory Motifs
Given a collection of genes that may be regulated by the same transcription factor, find the TF-binding motif they have in common.
3 Characteristics of Regulatory Motifs
Example sites: TCCGGAAGC, TCCGGATGC, TCCGGATCT, CATGGATGC, CCAGGAAGT, GGTGGATGC, ACCGGATGC
Tiny. Degenerate. Roughly constant size, because a constant-size transcription factor binds. Occur everywhere.
4 Enormous non-coding sequences
[Figure: upstream region of a gene; motif location unknown]
Given upstream regions of coregulated genes:
Increasing length makes motif finding harder: random motifs clutter the true ones.
Decreasing length makes motif finding harder: the true motif is missing in some sequences.
5 Problem Definition
Given a collection of promoter sequences $s_1, \ldots, s_N$ of genes with common expression.
Combinatorial motif $M$: $m_1 \ldots m_W$, where some of the $m_i$ are blank. Find $M$ that occurs in all $s_i$ with at most $k$ differences.
Probabilistic motif: $M_{ij}$, $1 \le i \le W$, $1 \le j \le 4$, where $M_{ij} = \mathrm{Prob}[\text{letter } j \text{ at position } i]$. Find the best $M$ and positions $p_1, \ldots, p_N$ in the sequences.
6 Essentially a Multiple Local Alignment
Find the best multiple local alignment.
The alignment score is defined differently in the probabilistic and combinatorial cases.
7 Algorithms
Probabilistic: 1. Expectation Maximization: MEME 2. Gibbs Sampling: AlignACE, BioProspector
Combinatorial: CONSENSUS
8 Discrete Approaches to Motif Finding
9 Discrete Formulations
Given sequences $S = \{x_1, \ldots, x_n\}$.
A motif $W$ is a consensus string $w_1 \ldots w_K$.
Find the motif $W^*$ with the "best" match to $x_1, \ldots, x_n$.
Definition of "best":
$d(W, x_i)$ = minimum Hamming distance between $W$ and a word in $x_i$
$d(W, S) = \sum_i d(W, x_i)$
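The two distances defined on this slide can be sketched directly in Python. This is a minimal illustration, not code from the lecture; the function names `hamming`, `dist_to_sequence` and `total_distance` and the toy sequences are assumptions made for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def dist_to_sequence(w: str, x: str) -> int:
    """d(W, x): minimum Hamming distance between W and any K-long word of x."""
    k = len(w)
    if len(x) < k:
        raise ValueError("sequence shorter than motif")
    return min(hamming(w, x[j:j + k]) for j in range(len(x) - k + 1))


def total_distance(w: str, seqs: list[str]) -> int:
    """d(W, S): sum of d(W, x_i) over all sequences in S."""
    return sum(dist_to_sequence(w, x) for x in seqs)


# Toy data (hypothetical): ACGT occurs exactly in two sequences and
# with one mismatch (ACCT) in the middle one.
toy = ["TTACGTAA", "GGACCTGG", "ACGTCCCC"]
print(total_distance("ACGT", toy))  # prints 1
```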
10 Approaches
Exhaustive Searches
CONSENSUS
MULTIPROFILER, TEIRESIAS, SP-STAR, WINNOWER
11 Exhaustive Searches
1. Pattern-driven algorithm:
For $W$ = AA…A to TT…T ($4^K$ possibilities): find $d(W, S)$.
Report $W^* = \arg\min_W d(W, S)$.
Running time: $O(K N 4^K)$, where $N = \sum_i |x_i|$.
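A minimal sketch of the pattern-driven search, assuming the DNA alphabet ACGT and the Hamming-based $d(W, S)$ from the discrete formulation. The helper `total_distance` and the name `pattern_driven` are illustrative, not from the lecture. Ties are broken by taking the first pattern in lexicographic order.

```python
from itertools import product


def total_distance(w: str, seqs: list[str]) -> int:
    """d(W, S): sum over sequences of the best Hamming match of W."""
    k = len(w)
    return sum(
        min(sum(a != b for a, b in zip(w, x[j:j + k]))
            for j in range(len(x) - k + 1))
        for x in seqs
    )


def pattern_driven(seqs: list[str], k: int) -> tuple[str, int]:
    """Try all 4^k patterns and return (W*, d(W*, S))."""
    best_w, best_d = "", -1
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        d = total_distance(w, seqs)
        if best_d < 0 or d < best_d:
            best_w, best_d = w, d
    return best_w, best_d


# Toy data (hypothetical): ACGT is the only 4-mer present in all sequences.
print(pattern_driven(["TTACGTAA", "GGACGTGG", "ACGTCCCC"], 4))
```

The loop over `product("ACGT", repeat=k)` is what makes the running time grow as $4^K$, so this is only practical for short motifs.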
12 Exhaustive Searches (2)
2. Sample-driven algorithm:
For $W$ = a K-bp word in some $x_i$: find $d(W, S)$.
Report $W^* = \arg\min_W d(W, S)$, OR report a local improvement of $W^*$.
Running time: $O(K N^2)$
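The sample-driven variant restricts the candidates to words that actually occur in the data. The sketch below is illustrative only; `sample_driven` and `total_distance` are assumed names, and the optional local-improvement step is left out.

```python
def total_distance(w: str, seqs: list[str]) -> int:
    """d(W, S): sum over sequences of the best Hamming match of W."""
    k = len(w)
    return sum(
        min(sum(a != b for a, b in zip(w, x[j:j + k]))
            for j in range(len(x) - k + 1))
        for x in seqs
    )


def sample_driven(seqs: list[str], k: int) -> tuple[str, int]:
    """Score every K-long word that occurs in some sequence; return the best."""
    candidates = {x[j:j + k] for x in seqs for j in range(len(x) - k + 1)}
    scored = [(w, total_distance(w, seqs)) for w in sorted(candidates)]
    return min(scored, key=lambda pair: pair[1])


# Toy data (hypothetical): the middle sequence carries a mutated copy (ACCT).
print(sample_driven(["TTACGTAA", "GGACCTGG", "ACGTCCCC"], 4))
```

Only about $N$ candidate words are scored instead of $4^K$, which is where the $O(K N^2)$ bound comes from.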
13 Exhaustive Searches (3)
Problem with the sample-driven approach:
If the true motif does not occur in the data, and the true motif is "weak",
then random strings may score better than any instance of the true motif.
14 CONSENSUS (1)
Algorithm:
Cycle 1:
For each word $W$ in $S_1$, and for each word $W'$ in $S_2$, create a gap-free alignment of $W$ and $W'$.
Keep the $C_1$ best alignments, $A_1, \ldots, A_{C_1}$.
Example alignments: ACGGTTG / ACGCCTG; CGAACTT / AGAACTA; GGGCTCT / GGGGTGT
15 CONSENSUS (2)
Algorithm (cont'd):
Cycle 2 and later (cycle $l$):
For each word $W$ in $S_l$, and for each alignment $A_j$ from the previous cycle, create a gap-free alignment of $W$ and $A_j$.
Keep the $C_l$ best alignments $A_1, \ldots, A_{C_l}$.
16 CONSENSUS (3)
$C_1, \ldots, C_n$ are user-defined heuristic constants.
Running time: $O(N^2) + O(N C_1) + O(N C_2) + \cdots + O(N C_n) = O(N^2 + N C_{total})$,
where $C_{total} = \sum_i C_i$, typically $O(nC)$, where $C$ is a big constant.
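The cycle structure of CONSENSUS can be sketched as a beam search. The slides do not say how alignments are scored, so the `score` function below (per-column count of the most common letter) is a stand-in chosen for brevity, not the score used by the CONSENSUS program. For simplicity, a single beam width `c` replaces the per-cycle constants $C_1, \ldots, C_n$. All names are illustrative.

```python
from collections import Counter


def words(x: str, k: int) -> list[str]:
    """All K-long words of sequence x."""
    return [x[j:j + k] for j in range(len(x) - k + 1)]


def score(alignment: tuple[str, ...]) -> int:
    """Assumed score: per column, count of the most common letter, summed."""
    return sum(max(Counter(col).values()) for col in zip(*alignment))


def consensus(seqs: list[str], k: int, c: int) -> tuple[str, ...]:
    """Greedy CONSENSUS-style search keeping the c best alignments per cycle."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    # Cycle 1: every gap-free pair of words from the first two sequences.
    beam = [(w1, w2) for w1 in words(seqs[0], k) for w2 in words(seqs[1], k)]
    beam = sorted(beam, key=score, reverse=True)[:c]
    # Later cycles: extend each kept alignment with one word of the next sequence.
    for x in seqs[2:]:
        beam = [a + (w,) for a in beam for w in words(x, k)]
        beam = sorted(beam, key=score, reverse=True)[:c]
    return beam[0]


print(consensus(["TTACGTAA", "GGACGTGG", "ACGTCCCC"], 4, 5))
```

Cycle 1 enumerates all word pairs (the $O(N^2)$ term); each later cycle only extends the `c` survivors, which gives the $O(N C_l)$ terms.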
17 MULTIPROFILER
Extended sample-driven approach.
Given a K-bp long word $W$, define:
$N_a(W)$ = words $W'$ in $S$ such that $d(W, W') \le a$
Idea: assume $W$ is an occurrence of the true motif $W^*$. Will use $N_a(W)$ to correct "errors" in $W$.
18 MULTIPROFILER (2)
Assume $W$ differs from the true motif $W^*$ in at most $L$ positions.
Define: a wordlet $G$ of $W$ is an L-bp pattern with blanks, differing from $W$.
Example: K = 7; L = 3
W = ACGTTGA
G = --A--CG
19 Algorithm: MULTIPROFILER (2)
For each $W$ in $S$:
For $L$ = 1 to $L_{max}$:
Find all "strong" L-long wordlets $G$ in $N_a(W)$.
Modify $W$ by the wordlet $G$ to obtain $W'$.
Compute $d(W', S)$.
Report $W^* = \arg\min d(W', S)$.
20 Expectation Maximization in Motif Finding
21 Expectation Maximization (1)
The MM algorithm, part of the MEME package, uses Expectation Maximization.
Algorithm (sketch):
Given genomic sequences, find all K-bp words.
1. Assume each word is motif or background.
2. Find the likeliest motif and background models, and the classification of words.
22 Expectation Maximization (2)
Given sequences $x_1, \ldots, x_N$, find all K-bp words $X_1, \ldots, X_n$.
Define the motif model: $M = (M_1, \ldots, M_K)$, $M_i = (M_{i1}, \ldots, M_{i4})$ (assume {A, C, G, T}),
where $M_{ij} = \mathrm{Prob}[\text{motif position } i \text{ is letter } j]$.
Define the background model: $B = (B_1, \ldots, B_4)$, where $B_j = \mathrm{Prob}[\text{letter } j \text{ in background sequence}]$.
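23 Expectation Maximization (3)
Define:
$Z_{i1}$ = 1 if $X_i$ is motif; 0 otherwise
$Z_{i2}$ = 0 if $X_i$ is motif; 1 otherwise
Given a word $X_i = a[1] \ldots a[K]$:
$P[X_i, Z_{i1} = 1] = \lambda \, M_{1a[1]} \cdots M_{Ka[K]}$
$P[X_i, Z_{i2} = 1] = (1 - \lambda) \, B_{a[1]} \cdots B_{a[K]}$
where $\lambda$ is the prior probability that a word is a motif occurrence.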
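24 Expectation Maximization (4)
Define: parameter space $\theta = (M, B)$, with $\theta_1$ the motif model and $\theta_2$ the background model; $\lambda_1 = \lambda$, $\lambda_2 = 1 - \lambda$.
Objective: maximize the log likelihood of the model:
$$\log P(X_1, \ldots, X_n, Z \mid \theta, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{2} Z_{ij} \log\big(\lambda_j P(X_i \mid \theta_j)\big) = \sum_{i=1}^{n} \sum_{j=1}^{2} Z_{ij} \log P(X_i \mid \theta_j) + \sum_{i=1}^{n} \sum_{j=1}^{2} Z_{ij} \log \lambda_j$$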
25 Expectation Maximization (5)
Maximize the expected likelihood by iterating two steps:
Expectation: find the expected value of the log likelihood, $E[\log P(X_1, \ldots, X_n, Z \mid \theta, \lambda)]$.
Maximization: maximize the expected value over $\theta$, $\lambda$.
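26 Expectation Maximization (6): E-step
Expectation: find the expected value of the log likelihood:
$$E[\log P(X_1, \ldots, X_n, Z \mid \theta, \lambda)] = \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log P(X_i \mid \theta_j) + \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log \lambda_j$$
where the expected values of $Z$ can be computed as follows:
$$E[Z_{ij}] = \frac{\lambda_j P(X_i \mid \theta_j)}{\sum_{k=1}^{2} \lambda_k P(X_i \mid \theta_k)}$$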
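As a rough sketch of the E-step, the posterior $E[Z_{i1}]$ for each word can be computed from the motif matrix, the background vector and $\lambda$. The representation is an assumption made for this example: `M` is a list of K rows of four probabilities in A, C, G, T order, `B` is a list of four probabilities, and `e_step` is an illustrative name.

```python
import math

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order of M and B


def e_step(words: list[str], M: list[list[float]], B: list[float],
           lam: float) -> list[float]:
    """Return E[Z_i1], the posterior probability that word i is a motif."""
    post = []
    for w in words:
        # lambda_1 * P(X_i | motif model)
        p_motif = lam * math.prod(M[i][IDX[c]] for i, c in enumerate(w))
        # lambda_2 * P(X_i | background model)
        p_bg = (1.0 - lam) * math.prod(B[IDX[c]] for c in w)
        post.append(p_motif / (p_motif + p_bg))
    return post


# Toy model (hypothetical): a 2-bp motif that prefers "AC".
M = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
B = [0.25, 0.25, 0.25, 0.25]
print(e_step(["AC", "GT"], M, B, lam=0.5))
```

$E[Z_{i2}]$ is simply one minus the returned value, since each word is either motif or background.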
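27 Expectation Maximization (7): M-step
Maximization: maximize the expected value over $\theta$ and $\lambda$ independently.
For $\lambda$, this is easy:
$$\lambda_j^{NEW} = \arg\max_{\lambda} \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log \lambda_j = \frac{1}{n} \sum_{i=1}^{n} E[Z_{ij}]$$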
28 Expectation Maximization (8): M-step
For $\theta = (M, B)$, define:
$c_{jk}$ = E[number of times letter $k$ appears in motif position $j$]
$c_{0k}$ = E[number of times letter $k$ appears in background]
It easily follows:
$$M_{jk}^{NEW} = \frac{c_{jk}}{\sum_{k'=1}^{4} c_{jk'}}, \qquad B_k^{NEW} = \frac{c_{0k}}{\sum_{k'=1}^{4} c_{0k'}}$$
To not allow any 0s, add pseudocounts.
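The M-step updates can be sketched as weighted counting. The code assumes the posteriors $E[Z_{i1}]$ are already available as a list `post`, uses a single pseudocount value for every cell (one possible choice, not specified by the slides), and also returns the $\lambda$ update from the previous slide. `m_step` and `IDX` are illustrative names.

```python
IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def m_step(words: list[str], post: list[float], k: int, pseudo: float = 1.0):
    """Re-estimate (M, B, lambda) from posteriors post[i] = E[Z_i1]."""
    c = [[pseudo] * 4 for _ in range(k)]   # expected motif counts c_jk
    c0 = [pseudo] * 4                      # expected background counts c_0k
    for w, z in zip(words, post):
        for j, ch in enumerate(w):
            c[j][IDX[ch]] += z             # word counted as motif with weight z
            c0[IDX[ch]] += 1.0 - z         # and as background with weight 1 - z
    M = [[v / sum(row) for v in row] for row in c]
    B = [v / sum(c0) for v in c0]
    lam = sum(post) / len(post)
    return M, B, lam


M_new, B_new, lam_new = m_step(["AC", "AC", "GT"], [1.0, 1.0, 0.0], k=2)
print(M_new[0], B_new, lam_new)
```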
29 Overview of EM Algorithm
1. Initialize parameters $\theta = (M, B)$, $\lambda$: try different values of $\lambda$ from $N^{-1/2}$ up to $1/(2K)$.
2. Repeat: a. Expectation b. Maximization
3. Until the change in $\theta = (M, B)$, $\lambda$ falls below $\epsilon$.
4. Report results for several "good" $\lambda$.
30 Conclusion
One iteration running time: $O(NK)$.
Usually need < N iterations for convergence, and < N starting points.
Overall complexity: unclear; typically $O(N^2 K)$ to $O(N^3 K)$.
EM is a local optimization method. Initial parameters matter.
MEME: Bailey and Elkan, ISMB 1994.
31 Gibbs Sampling in Motif Finding
32 Overview of Gibbs Sampler
Iteratively sample from the conditional distribution when the other parameters are fixed.
In order to draw $X = (X_1, X_2, \ldots, X_n) \sim \pi(X)$, for each $i$ draw $X_i \sim \pi(X_i \mid X_{[-i]})$.
33 Gibbs Sampling (1)
Given: $x_1, \ldots, x_N$, motif length $K$, background $B$.
Find: model $M$ and locations $a_1, \ldots, a_N$ in $x_1, \ldots, x_N$
maximizing the log-odds likelihood ratio:
$$\sum_{i=1}^{N} \sum_{k=1}^{K} \log \frac{M(k, x^i_{a_i + k})}{B(x^i_{a_i + k})}$$
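The objective on this slide can be evaluated with a few lines of Python. The sketch assumes 0-based start positions, a motif matrix `M` with K rows in A, C, G, T order, and a 0th-order background `B`; `log_odds` is an illustrative name.

```python
import math

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def log_odds(seqs: list[str], starts: list[int],
             M: list[list[float]], B: list[float]) -> float:
    """Sum over sequences and motif positions of log(M / B)."""
    k = len(M)
    total = 0.0
    for x, a in zip(seqs, starts):
        for p in range(k):
            letter = IDX[x[a + p]]
            total += math.log(M[p][letter] / B[letter])
    return total


# Toy model (hypothetical): 2-bp motif preferring "AC", uniform background.
M = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
B = [0.25, 0.25, 0.25, 0.25]
print(log_odds(["TTACG", "ACGGG"], [2, 0], M, B))
```

`M` must contain no zero entries for the words being scored, which is one reason the predictive update later adds pseudocounts.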
34 Gibbs Sampling (2)
AlignACE: first statistical motif finder.
BioProspector: improved version of AlignACE.
Algorithm (sketch):
1. Initialization:
a. Select random locations in sequences $x_1, \ldots, x_N$.
b. Compute an initial model $M$ from these locations.
2. Sampling iterations:
a. Remove one sequence $x_i$.
b. Recalculate the model.
c. Pick a new location of the motif in $x_i$ according to the probability that the location is a motif occurrence.
35 Gibbs Sampling (3)
Initialization:
Select random locations $a_1, \ldots, a_N$ in $x_1, \ldots, x_N$.
For these locations, compute $M$:
$$M_{kj} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}(x^i_{a_i + k} = j)$$
That is, $M_{kj}$ is the number of occurrences of letter $j$ in motif position $k$, over the total.
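A minimal sketch of the initialization step, assuming 0-based start positions and an explicit `random.Random` instance so runs are reproducible. The names `profile` and `random_init` are illustrative.

```python
import random

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def profile(seqs: list[str], starts: list[int], k: int) -> list[list[float]]:
    """M[p][j]: fraction of sequences with letter j at motif position p."""
    n = len(seqs)
    M = [[0.0] * 4 for _ in range(k)]
    for x, a in zip(seqs, starts):
        for p in range(k):
            M[p][IDX[x[a + p]]] += 1.0 / n
    return M


def random_init(seqs: list[str], k: int, rng: random.Random):
    """Pick one random start per sequence and build the initial model."""
    starts = [rng.randrange(len(x) - k + 1) for x in seqs]
    return starts, profile(seqs, starts, k)


starts, M = random_init(["TTACGTAA", "GGACGTGG", "ACGTCCCC"], 4, random.Random(0))
print(starts)
print(M)
```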
36 Gibbs Sampling (4)
Predictive update:
Select a sequence $x = x_i$.
Remove $x_i$, recompute the model:
$$M_{kj} = \frac{1}{(N - 1) + B} \Big( \beta_j + \sum_{s=1,\, s \ne i}^{N} \mathbf{1}(x^s_{a_s + k} = j) \Big)$$
where $\beta_j$ are pseudocounts to avoid 0s, and $B = \sum_j \beta_j$.
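The predictive update can be sketched as a leave-one-out count with pseudocounts. The default pseudocount values of 0.5 per letter are an assumption for this example, as are the names `predictive_update` and `skip` (the index of the sequence being removed).

```python
IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def predictive_update(seqs: list[str], starts: list[int], k: int, skip: int,
                      beta: tuple[float, ...] = (0.5, 0.5, 0.5, 0.5)):
    """Rebuild M from all sequences except seqs[skip], with pseudocounts beta."""
    denom = (len(seqs) - 1) + sum(beta)          # (N - 1) + B
    counts = [list(beta) for _ in range(k)]      # start every cell at beta_j
    for s, (x, a) in enumerate(zip(seqs, starts)):
        if s == skip:
            continue                             # the removed sequence x_i
        for p in range(k):
            counts[p][IDX[x[a + p]]] += 1.0
    return [[v / denom for v in row] for row in counts]


M = predictive_update(["ACGT", "ACGA", "TTTT"], [0, 0, 0], k=2, skip=2)
print(M)
```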
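37 Gibbs Sampling (5)
Sampling:
For every K-long word $x_j, \ldots, x_{j+K-1}$ in $x$:
$Q_j = \mathrm{Prob}[\text{word} \mid \text{motif}] = M(1, x_j) \times \cdots \times M(K, x_{j+K-1})$
$P_j = \mathrm{Prob}[\text{word} \mid \text{background}] = B(x_j) \times \cdots \times B(x_{j+K-1})$
Let
$$A_j = \frac{Q_j / P_j}{\sum_{j'=1}^{|x| - K + 1} Q_{j'} / P_{j'}}$$
Sample a random new position $a_i$ according to the probabilities $A_1, \ldots, A_{|x|-K+1}$.
[Figure: sampling probability along positions 0 to |x|]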
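A short sketch of the sampling step, assuming 0-based positions, a motif matrix `M` with no zero entries, and a 0th-order background `B`. `position_weights` and `sample_position` are illustrative names; the draw uses the standard library's `random.Random.choices`.

```python
import math
import random

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def position_weights(x: str, M: list[list[float]], B: list[float]) -> list[float]:
    """A_j: normalized ratio Q_j / P_j for every K-long word of x."""
    k = len(M)
    ratios = []
    for j in range(len(x) - k + 1):
        q = math.prod(M[p][IDX[x[j + p]]] for p in range(k))   # word | motif
        pb = math.prod(B[IDX[c]] for c in x[j:j + k])          # word | background
        ratios.append(q / pb)
    total = sum(ratios)
    return [r / total for r in ratios]


def sample_position(x: str, M: list[list[float]], B: list[float],
                    rng: random.Random) -> int:
    """Draw a new start position a_i with probability A_j."""
    A = position_weights(x, M, B)
    return rng.choices(range(len(A)), weights=A, k=1)[0]


# Toy model (hypothetical): 2-bp motif preferring "AC", uniform background.
M = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
B = [0.25, 0.25, 0.25, 0.25]
print(position_weights("TTACG", M, B))
print(sample_position("TTACG", M, B, random.Random(1)))
```

Sampling from `A` rather than taking its maximum is what lets the procedure escape poor starting alignments.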
38 Gibbs Sampling (6)
Running Gibbs Sampling:
1. Initialize
2. Run until convergence
3. Repeat steps 1 and 2 several times, report common motifs
39 Advantages / Disadvantages
Very similar to EM.
Advantages: easier to implement; less dependent on initial parameters; more versatile, easier to enhance with heuristics.
Disadvantages: more dependent on all sequences exhibiting the motif; less systematic search of the initial parameter space.
40 Repeats, and a Better Background Model
Repeat DNA can be confused as motif, especially low-complexity sequence: CACACA…, AAAAA…, etc.
Solution: a more elaborate background model.
0th order: $B = \{ p_A, p_C, p_G, p_T \}$
1st order: $B = \{ P(A \mid A), P(A \mid C), \ldots, P(T \mid T) \}$
Kth order: $B = \{ P(X \mid b_1 \ldots b_K);\ X, b_i \in \{A, C, G, T\} \}$
Has been applied to EM and Gibbs (up to 3rd order).
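One way to estimate such a background model is to count, for each context of `order` preceding letters, how often each letter follows it. The sketch below assumes sequences contain only A, C, G, T and adds a pseudocount to every cell; `markov_background` is an illustrative name.

```python
from collections import defaultdict


def markov_background(seqs: list[str], order: int, pseudo: float = 1.0):
    """Estimate P(X | b_1 ... b_order) from the given sequences."""
    counts = defaultdict(lambda: {c: pseudo for c in "ACGT"})
    for x in seqs:
        for j in range(order, len(x)):
            context = x[j - order:j]       # empty string when order == 0
            counts[context][x[j]] += 1.0
    return {
        ctx: {c: v / sum(nxt.values()) for c, v in nxt.items()}
        for ctx, nxt in counts.items()
    }


# On a CA-repeat, the 1st-order model learns that A tends to follow C.
B1 = markov_background(["CACACACA"], order=1)
print(B1["C"]["A"])  # prints 0.625
```

With such a model, a repeat like CACACA is well explained by the background, so it no longer looks like a strong motif.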
41 Applications
42 Application 1: Motifs in Yeast
Group: Tavazoie et al. 1999, G. Church's lab, Harvard
Data: microarrays on 6,220 mRNAs from yeast; Affymetrix chips (Cho et al.); 15 time points across two cell cycles
43 Processing of Data
1. Selection of 3,000 genes: genes with the most variable expression were selected.
2. Clustering according to common expression: K-means clustering; 30 clusters, genes/cluster; clusters correlate well with known function.
3. AlignACE motif finding: 600-long upstream regions; 50 regions/trial.
44 Motifs in Periodic Clusters
45 Motifs in Non-periodic Clusters
46 Application 2: Discovery of Heat Shock Motif in C. elegans
Group: GuhaThakurta et al. 2002, C.D. Link's lab & colleagues
Data: microarrays on 11,917 genes from C. elegans
Isolated genes upregulated in heat shock
47 Processing of Data, and Results
Isolated 28 genes upregulated in heat shock during 5 separate experiments.
Motif finding with CONSENSUS and ANNSpec on 500-long upstream regions.
2 motifs found:
TTCTAGAA: known heat shock factor (HSF)
GGGTGTC: previously unreported
Conserved in comparison with C. briggsae.
Validation by in vitro mutagenesis of a GFP reporter.
More informationAmino Acid Structures from Klug & Cummings. Bioinformatics (Lec 12)
Amino Acid Structures from Klug & Cummings 2/17/05 1 Amino Acid Structures from Klug & Cummings 2/17/05 2 Amino Acid Structures from Klug & Cummings 2/17/05 3 Amino Acid Structures from Klug & Cummings
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationDiscovering MultipleLevels of Regulatory Networks
Discovering MultipleLevels of Regulatory Networks IAS EXTENDED WORKSHOP ON GENOMES, CELLS, AND MATHEMATICS Hong Kong, July 25, 2018 Gary D. Stormo Department of Genetics Outline of the talk 1. Transcriptional
More informationWritten Exam 15 December Course name: Introduction to Systems Biology Course no
Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate
More informationComputational Genomics. Reconstructing dynamic regulatory networks in multiple species
02-710 Computational Genomics Reconstructing dynamic regulatory networks in multiple species Methods for reconstructing networks in cells CRH1 SLT2 SLR3 YPS3 YPS1 Amit et al Science 2009 Pe er et al Recomb
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationMining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences
Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences Daisuke Ikeda Department of Informatics, Kyushu University 744 Moto-oka, Fukuoka 819-0395, Japan. daisuke@inf.kyushu-u.ac.jp
More informationPlan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping
Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics
More informationCSEP 590B Fall Motifs: Representation & Discovery
CSEP 590B Fall 2014 5 Motifs: Representation & Discovery 1 Outline Previously: Learning from data MLE: Max Likelihood Estimators EM: Expectation Maximization (MLE w/hidden data) These Slides: Bio: Expression
More informationHidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:
Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm
More informationGenome 541 Introduction to Computational Molecular Biology. Max Libbrecht
Genome 541 Introduction to Computational Molecular Biology Max Libbrecht Genome 541 units Max Libbrecht: Gene regulation and epigenomics Postdoc, Bill Noble s lab Yi Yin: Bayesian statistics Postdoc, Jay
More informationSTATISTICAL SIGNIFICANCE FOR DNA MOTIF DISCOVERY
STATISTICAL SIGNIFICANCE FOR DNA MOTIF DISCOVERY A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor
More informationLogic Regression: Biological Motivation Cyclic Gene Study
Logic Regression Earl F. Glynn March 005 1 Logic Regression 1. Biological Motivation. Background Regression Boolean Algebra 3. Logic Regression Formalities 4. Simple Example in R 5. Take Home Message 1