6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
1 MIT OpenCourseWare / Computational Biology: Genomes, Networks, Evolution, Fall 2008. For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
2 Computational Biology: Genomes, Networks, Evolution Motif Discovery Lecture 9 October 2, 2008
3 Regulatory Motifs Find promoter motifs associated with co-regulated or functionally related genes
4 Motifs Are Degenerate Protein-DNA interactions: proteins read DNA by feeling the chemical properties of the bases, without opening the DNA (not by base complementarity). The topology of the 3D contact (e.g. Ser, Arg, and Asn side chains contacting base pairs) dictates the sequence specificity of binding. Some positions are fully constrained; other positions are degenerate. Ambiguous/degenerate positions are only loosely contacted by the transcription factor. Figure by MIT OpenCourseWare.
5 Other Motifs Splicing signals: splice junctions, Exonic Splicing Enhancers (ESE), Exonic Splicing Suppressors (ESS). Protein domains: glycosylation sites, kinase targets, targeting signals. Protein epitopes: MHC binding specificities.
6 Essential Tasks Modeling motifs: how to computationally represent motifs. Visualizing motifs: motif information. Predicting motif instances: using the model to classify new sequences. Learning motif structure: finding new motifs and assessing their quality.
7 Modeling Motifs
8 Consensus Sequences Useful for publication; IUPAC symbols encode degenerate sites; not very amenable to computation. Example set of sites: HEM13 CCCATTGTTCTC, HEM13 TTTCTGGTTCTC, HEM13 TCAATTGTTTAG, ANB1 CTCATTGTTGTC, ANB1 TCCATTGTTCTC, ANB1 CCTATTGTTCTC, ANB1 TCCATTGTTCGT, ROX1 CCAATTGTTTTG; consensus YCHATTGTTCTC. Figure by MIT OpenCourseWare. Nature Biotechnology 24 (2006).
9 Probabilistic Model From the aligned sites (HEM13 CCCATT, HEM13 TTTCTG, HEM13 TCAATT, ANB1 CTCATT, ANB1 TCCATT, ANB1 CCTATT, ANB1 TCCATT, ROX1 CCAATT), count the frequency of each base at each of the K motif positions and add pseudocounts. The resulting per-position base probabilities P_k(S | M) for positions M_1 ... M_K form the Position Frequency Matrix (PFM). Figure by MIT OpenCourseWare.
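The count-then-add-pseudocounts construction can be sketched in Python (a minimal illustration; the function and variable names are hypothetical, and a pseudocount of 1 per base is assumed):

```python
def build_pfm(sites, pseudocount=1.0):
    """Position frequency matrix from aligned sites, with pseudocounts."""
    K = len(sites[0])
    pfm = {b: [0.0] * K for b in "ACGT"}
    for site in sites:
        for k, c in enumerate(site):
            pfm[c][k] += 1
    denom = len(sites) + 4 * pseudocount     # each column sums to 1 after this
    for b in "ACGT":
        for k in range(K):
            pfm[b][k] = (pfm[b][k] + pseudocount) / denom
    return pfm

# The eight length-6 sites from the slide.
sites = ["CCCATT", "TTTCTG", "TCAATT", "CTCATT",
         "TCCATT", "CCTATT", "TCCATT", "CCAATT"]
pfm = build_pfm(sites)
print(round(pfm["A"][3], 3))   # position 4 is mostly A: (7 + 1) / (8 + 4) = 0.667
```

The pseudocounts guarantee that no matrix entry is zero, so log-odds scores are always defined.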
10 Scoring A Sequence To score a sequence, we compare to a null model: background DNA (B) with uniform base frequencies (A: 0.25, C: 0.25, G: 0.25, T: 0.25).

Score = log [ P(S | PFM) / P(S | B) ] = Σ_{i=1}^{N} log [ P_i(S_i | PFM) / P(S_i | B) ]

Storing the per-position log-ratios log [ P_i(b | PFM) / P(b | B) ] gives the Position Weight Matrix (PWM).
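The log-odds scoring rule can be sketched as follows (a toy two-position PFM with illustrative values and hypothetical names; a uniform background is assumed):

```python
import math

# Toy 2-position PFM; columns sum to 1. Values are illustrative only.
pfm = {"A": [0.7, 0.1], "C": [0.1, 0.1], "G": [0.1, 0.1], "T": [0.1, 0.7]}

def pwm_score(site, pfm, background=None):
    """Log-odds score: sum over positions of log2 P(base|motif) / P(base|background)."""
    if background is None:
        background = {b: 0.25 for b in "ACGT"}   # uniform null model
    return sum(math.log2(pfm[c][k] / background[c]) for k, c in enumerate(site))

print(round(pwm_score("AT", pfm), 3))   # consensus site: highest (positive) score
print(round(pwm_score("CG", pfm), 3))   # non-consensus site: negative score
```

A site is then typically called a hit when its score exceeds a threshold such as the 60%-of-maximum rule mentioned on the next slide.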
11 Scoring a Sequence Courtesy of Kenzie MacIsaac and Ernest Fraenkel. Used with permission. MacIsaac, Kenzie, and Ernest Fraenkel. "Practical Strategies for Discovering Regulatory DNA Sequence Motifs." PLoS Computational Biology 2, no. 4 (2006): e36. A common threshold is 60% of the maximum score.
12 Visualizing Motifs Motif Logos Represent both base frequency and conservation at each position Height of letter proportional to frequency of base at that position Height of stack proportional to conservation at that position
13 Motif Information The height of a stack is often called the motif information at that position, measured in bits:

Motif position information = 2 + Σ_{b ∈ {A,T,G,C}} p_b log2 p_b

Why is this a measure of information?
14 Uncertainty and probability Uncertainty is related to our surprise at an event The sun will rise tomorrow The sun will not rise tomorrow Not surprising (p~1) Very surprising (p<<1) Uncertainty is inversely related to probability of event
15 Average Uncertainty Two possible outcomes for the sun rising: A, the sun will rise tomorrow, with P(A) = p_1; B, the sun will not rise tomorrow, with P(B) = p_2. Our average uncertainty about the sun rising is

P(A) Uncertainty(A) + P(B) Uncertainty(B) = −p_1 log p_1 − p_2 log p_2 = −Σ_i p_i log p_i = Entropy
16 Entropy Entropy measures average uncertainty; equivalently, entropy measures randomness:

H(X) = −Σ_i p_i log2 p_i

If the log is base 2, the units are called bits.
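The entropy formula is easy to check numerically (a minimal sketch; by convention 0 log 0 is treated as 0):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i log2 p_i, in bits (0*log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 bases: 2.0 bits
print(entropy([0.5, 0.5]))                 # fair coin: 1.0 bit
print(round(entropy([0.1, 0.9]), 2))       # heavily biased coin: 0.47 bits
```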
17 Entropy versus randomness Entropy is maximum at maximum randomness. Example, coin toss: P(heads) = 0.1 is not very random, H(X) = 0.47 bits; P(heads) = 0.5 is completely random, H(X) = 1 bit.
18 Entropy Examples Uniform distribution over {A, T, G, C}, P(x) = 0.25 for each base: H(X) = −[4 × 0.25 log2(0.25)] = 2 bits. A skewed distribution concentrated on one base (P = 0.75 for the most frequent base): H(X) = 0.63 bits.
19 Information Content Information is a decrease in uncertainty: once I tell you the sun will rise, your uncertainty about the event decreases. Information = H_before(X) − H_after(X), the difference in entropy before and after receiving the information.
20 Motif Information

Motif position information = 2 + Σ_{b ∈ {A,T,G,C}} p_b log2 p_b = H_background(X) − H_motif_i(X)

H_background(X) is the prior uncertainty about a nucleotide (2 bits for uniform background DNA); H_motif_i(X) is the uncertainty after learning it is position i in a motif. Example: H_background(X) = 2 bits and H_motif_i(X) = 0.63 bits, so the uncertainty at this position has been reduced by 1.37 bits.
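Putting the two entropies together gives the per-position information (a sketch assuming a uniform 2-bit background; the names are hypothetical):

```python
import math

def entropy(probs):
    """Shannon entropy in bits, with 0*log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def position_information(probs, background_bits=2.0):
    """Information at one motif position: H_background(X) - H_motif_i(X), in bits."""
    return background_bits - entropy(probs)

print(position_information([1.0, 0.0, 0.0, 0.0]))      # fully conserved base: 2.0 bits
print(position_information([0.25, 0.25, 0.25, 0.25]))  # unconstrained position: 0.0 bits
```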
21 Motif Logo Conserved Residue Reduction of uncertainty of 2 bits Little Conservation Minimal reduction of uncertainty
22 Background DNA Frequency The definition of information assumes a uniform background DNA nucleotide frequency. What if the background frequency is not uniform (e.g. the AT-rich Plasmodium genome)? Suppose H_background(X) = 1.7 bits while a motif position has H_motif_i(X) = 1.9 bits. Then

Motif position information = H_background(X) − H_motif_i(X) = 1.7 − 1.9 = −0.2 bits

Some motifs could have negative information!
23 A Different Measure Relative entropy, or Kullback-Leibler (KL) divergence, measures the divergence between a true distribution and another distribution:

D_KL(P_motif || P_background) = Σ_{i ∈ {A,T,G,C}} P_motif(i) log [ P_motif(i) / P_background(i) ]

Here P_motif is the true distribution and P_background the other distribution. D_KL is larger the more different P_motif is from P_background, and is the same as information if P_background is uniform. Properties: D_KL ≥ 0; D_KL = 0 if and only if P_motif = P_background; in general D_KL(P || Q) ≠ D_KL(Q || P).
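The KL divergence, and its agreement with information content under a uniform background, can be verified directly (a minimal sketch with illustrative distributions in base order A, T, G, C):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log2(P(i)/Q(i)), in bits (terms with P(i)=0 are 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
at_rich = [0.4, 0.4, 0.1, 0.1]        # hypothetical AT-rich background
motif_pos = [0.7, 0.1, 0.1, 0.1]      # one motif position favoring A

# Against a uniform background, D_KL equals the information content 2 - H(X).
info = 2.0 + sum(p * math.log2(p) for p in motif_pos)
print(round(kl_divergence(motif_pos, uniform), 3), round(info, 3))

# D_KL is asymmetric: swapping the arguments changes the value.
print(round(kl_divergence(motif_pos, at_rich), 3),
      round(kl_divergence(at_rich, motif_pos), 3))
```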
24 Comparing Both Methods Information assuming uniform background DNA KL Distance assuming 20% GC content (e.g. Plasmodium)
25 Online Logo Generation
26 Finding New Motifs Learning Motif Models
27 A Promoter Model Length-K motif within background DNA. The motif is a matrix of per-position base probabilities P_k(S | M) for positions M_1 ... M_K; the background is uniform (A: 0.25, C: 0.25, G: 0.25, T: 0.25), giving P(S | B). The same motif model is assumed in all promoters.
28 Probability of a Sequence Given a sequence, a motif model, and the motif location, the probability factors into background positions and motif positions. For a length-100 sequence with the length-6 motif (here ATATGC) starting at position 60:

P(Seq | M_start = 60, Model) = Π_{i=1}^{59} P(S_i | B) × Π_{k=1}^{6} P_k(S_{k+59} | M) × Π_{i=66}^{100} P(S_i | B)

S_i = nucleotide at position i in the sequence.
29 Parameterizing the Motif Model Given multiple sequences and motif locations but no motif model, e.g. the aligned sites AATGCG, ATATGG, ATATCG, GATGCA: count the base frequencies at each motif position M_1 ... M_6 and add pseudocounts (for example, A appears at position 1 in 3 of the 4 sites, a frequency of 3/4 before pseudocounts).
30 Finding Known Motifs Given multiple sequences and a motif model but no motif locations: slide a window along each sequence and calculate P(Seq_window | Motif) for every starting location.
31 Motif Position Distribution Z_ij, the (i, j) element of the matrix Z, represents the probability that the motif starts at position j in sequence i; each row is a distribution over start positions for one sequence. Some examples: Z_1, no clear winner; Z_2, two candidates; Z_3, one big winner; Z_4, uniform.
32 Calculating the Z Vector By Bayes' rule,

P(Z_ij = 1 | S_i, M) = P(S_i | Z_ij = 1, M) P(Z_ij = 1) / P(S_i)
= P(S_i | Z_ij = 1, M) P(Z_ij = 1) / Σ_{k=1}^{L−K+1} P(S_i | Z_ik = 1, M) P(Z_ik = 1)

Assuming uniform priors (the motif is equally likely to start at any position), the priors cancel:

P(Z_ij = 1 | S_i, M) = P(S_i | Z_ij = 1, M) / Σ_{k=1}^{L−K+1} P(S_i | Z_ik = 1, M)
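The E-step computation can be sketched directly (hypothetical names; a toy two-position motif model, a uniform 0.25 background, and uniform priors over start positions are assumed):

```python
# Toy 2-position motif model; values are illustrative only.
pfm = {"A": [0.85, 0.05], "C": [0.05, 0.05], "G": [0.05, 0.05], "T": [0.05, 0.85]}

def motif_likelihood(S, j, pfm, bg=0.25):
    """P(S | motif starts at j): motif columns inside the window, background elsewhere."""
    K = len(pfm["A"])
    p = 1.0
    for i, c in enumerate(S):
        p *= pfm[c][i - j] if j <= i < j + K else bg
    return p

def e_step_row(S, pfm):
    """One row of Z: posterior over motif start positions (uniform prior cancels)."""
    K = len(pfm["A"])
    likes = [motif_likelihood(S, j, pfm) for j in range(len(S) - K + 1)]
    total = sum(likes)
    return [l / total for l in likes]

Z_row = e_step_row("GCATGC", pfm)
print([round(z, 3) for z in Z_row])
print(max(range(len(Z_row)), key=Z_row.__getitem__))   # most probable start: 2 ("AT")
```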
33 Calculating the Z Vector - Example For a short sequence (G C T G T A G on the slide), compute P(S_i | Z_ij = 1, M) for each candidate start position j, then normalize so that Σ_{j=1}^{L−W+1} Z_ij = 1.
34 Discovering Motifs Given a set of co-regulated genes, we need to discover motifs from the sequences alone: we have neither a motif model nor motif locations, and must discover both. How can we approach this problem? (Hint: start with a random motif model.)
35 Expectation Maximization (EM) Remember the basic idea: 1. Use the model to estimate the distribution of the missing data. 2. Use the estimate to update the model. 3. Repeat until convergence. Here the model is the motif model, and the missing data are the motif locations.
36 EM for Motif Discovery 1. Start with a random motif model. 2. E step: estimate the probability of motif positions for each sequence. 3. M step: use the estimates to update the motif model. 4. Iterate to convergence.
37 The M-Step: Calculating the Motif Matrix M_{c,k} is the probability of character c at motif position k. With specific motif positions, we can estimate M_{c,k} from n_{c,k}, the count of character c at position k across the motif instances, plus pseudocounts d_{c,k}:

M_{c,k} = (n_{c,k} + d_{c,k}) / Σ_b (n_{b,k} + d_{b,k})

But with only probabilities of positions, Z_ij, we average, replacing the counts with expected counts:

n_{c,k} = Σ_{sequences i} Σ_{j : S_{i, j+k−1} = c} Z_ij
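The expected-count version of the M-step can be sketched as follows (hypothetical names; a pseudocount of 1 for every base is assumed):

```python
def m_step(seqs, Z, K, pseudocount=1.0):
    """Re-estimate the motif matrix M from expected counts weighted by Z_ij."""
    counts = {b: [pseudocount] * K for b in "ACGT"}
    for S, z_row in zip(seqs, Z):
        for j, z in enumerate(z_row):
            for k in range(K):
                counts[S[j + k]][k] += z     # expected count of this base at motif pos k
    for k in range(K):
        total = sum(counts[b][k] for b in "ACGT")
        for b in "ACGT":
            counts[b][k] /= total            # normalize each column to sum to 1
    return counts

seqs = ["GCATGC", "TTATCG"]
# Toy Z rows: all posterior mass on the window at index 2 ("AT" in both sequences).
Z = [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
M = m_step(seqs, Z, K=2)
print(round(M["A"][0], 2), round(M["T"][1], 2))   # (2 + 1) / (2 + 4) = 0.5 each
```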
38 MEME MEME implements EM for motif discovery in DNA and proteins; MAST searches sequences for occurrences of motifs given a model.
39 P(Seq | Model) Landscape EM searches for parameters that increase P(seqs | parameters), so it is useful to think of P(seqs | parameters) as a function of the parameters. EM starts at an initial set of parameters and then climbs uphill until it reaches a local maximum. Where EM starts can make a big difference.
40 Search from Many Different Starts To minimize the effects of local maxima, you should search multiple times from different starting points. MEME uses this idea: start at many points, run each for one iteration, choose the starting point that reached the highest likelihood, and continue from there.
41 The ZOOPS Model The approach as we've outlined it assumes that each sequence has exactly one motif occurrence; this is the OOPS model. The ZOOPS model assumes zero or one occurrences per sequence.
42 E-step in the ZOOPS Model We need to consider another alternative: the i-th sequence doesn't contain the motif. We add another parameter λ, the prior probability that any given position in a sequence is the start of a motif, and its relative γ = (L − W + 1)λ, the prior probability that a sequence contains a motif.
43 E-step in the ZOOPS Model

P(Z_ij = 1 | S_i, M) = Pr(S_i | Z_ij = 1, M) λ / [ Pr(S_i | Q_i = 0, M) (1 − γ) + Σ_{k=1}^{L−W+1} Pr(S_i | Z_ik = 1, M) λ ]

Here Q_i is a random variable that takes the value 0 to indicate that sequence i doesn't contain a motif occurrence: Q_i = Σ_{j=1}^{L−W+1} Z_ij.
44 M-step in the ZOOPS Model The update of the motif probabilities p is the same as before; λ and γ are updated as follows:

λ^{(t+1)} = γ^{(t+1)} / (L − W + 1) = [1 / (n (L − W + 1))] Σ_{i=1}^{n} Σ_{j=1}^{L−W+1} Z_ij^{(t)}

i.e., the average of Z_ij^{(t)} across all sequences and positions.
45 The TCM Model The TCM (two-component mixture model) assumes zero or more motif occurrences per sequence
46 Likelihood in the TCM Model The TCM model treats each length-W subsequence independently. For the subsequence S_ij starting at position j of sequence i, assuming a motif starts there:

Pr(S_ij | Z_ij = 1, M) = Π_{k=j}^{j+W−1} M_{c_k, k−j+1}

and assuming a motif doesn't start there:

Pr(S_ij | Z_ij = 0, B) = Π_{k=j}^{j+W−1} P(c_k | B)
47 E-step in the TCM Model

Z_ij = Pr(S_ij | Z_ij = 1, M) λ / [ Pr(S_ij | Z_ij = 0, B) (1 − λ) + Pr(S_ij | Z_ij = 1, M) λ ]

The first term in the denominator is the case where the subsequence isn't a motif; the second, where it is. The M-step is the same as before.
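The TCM E-step for a single window can be sketched as follows (hypothetical names; a toy length-2 motif model and a uniform 0.25 background are assumed):

```python
# Toy 2-position motif model; values are illustrative only.
pfm = {"A": [0.85, 0.05], "C": [0.05, 0.05], "G": [0.05, 0.05], "T": [0.05, 0.85]}

def window_posterior(window, pfm, lam, bg=0.25):
    """Z_ij for one window: motif likelihood * lambda vs background * (1 - lambda)."""
    p_motif = 1.0
    p_bg = 1.0
    for k, c in enumerate(window):
        p_motif *= pfm[c][k]
        p_bg *= bg
    return (p_motif * lam) / (p_bg * (1 - lam) + p_motif * lam)

print(round(window_posterior("AT", pfm, lam=0.1), 3))   # consensus window: high posterior
print(round(window_posterior("GG", pfm, lam=0.1), 3))   # background-like window: near 0
```

Note how the prior λ pulls every window's posterior down: even a perfect match is not certain to be a motif when motifs are rare.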
48 Gibbs Sampling A stochastic version of EM that differs from deterministic EM in two key ways 1. At each iteration, we only update the motif position of a single sequence 2. We may update a motif position to a suboptimal new position
49 Gibbs Sampling 1. Start with random motif locations and calculate a motif model. 2. Randomly select a sequence, remove its motif, and recalculate a temporary model. 3. With the temporary model, calculate the probability of the motif at each position of the selected sequence. 4. Select a new position by sampling from this distribution (the best location is not always chosen). 5. Update the model and iterate.
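The sampling loop can be sketched as follows (a simplified OOPS-style sketch with hypothetical names and a uniform 0.25 background; real implementations such as AlignACE add many refinements):

```python
import random

def site_counts(seqs, starts, K, exclude, pseudocount=1.0):
    """Motif model (per-position base probabilities) from all current sites but one."""
    pfm = {b: [pseudocount] * K for b in "ACGT"}
    for i, (S, s) in enumerate(zip(seqs, starts)):
        if i == exclude:
            continue
        for k in range(K):
            pfm[S[s + k]][k] += 1
    for k in range(K):
        total = sum(pfm[b][k] for b in "ACGT")
        for b in "ACGT":
            pfm[b][k] /= total
    return pfm

def gibbs_iteration(seqs, starts, K, rng):
    i = rng.randrange(len(seqs))                 # steps 1-2: pick a sequence, drop its motif
    pfm = site_counts(seqs, starts, K, exclude=i)
    S = seqs[i]
    weights = []                                 # step 3: score every window under the model
    for j in range(len(S) - K + 1):
        w = 1.0
        for k in range(K):
            w *= pfm[S[j + k]][k] / 0.25         # odds versus uniform background
        weights.append(w)
    # step 4: sample (not argmax) the new start -- allows occasional downhill moves
    starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Tiny demo: the pattern TTAA is planted at index 2 of each (illustrative) sequence.
rng = random.Random(1)
seqs = ["GGTTAAGG", "CCTTAACC", "ATTTAATC"]
starts = [0, 0, 0]
for _ in range(20):
    starts = gibbs_iteration(seqs, starts, K=4, rng=rng)
print(starts)   # sampled starts tend to drift toward the planted sites
```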
50 Gibbs Sampling and Climbing Because Gibbs sampling does not always choose the best new location, it can move to a place that is not directly uphill. In theory, Gibbs sampling is less likely to get stuck in a local maximum.
51 AlignACE Implements Gibbs sampling for motif discovery, with several enhancements. ScanACE looks for motifs in a sequence given a model; CompareACE calculates the similarity between two motifs (e.g. for clustering motifs).
52 Antigen Epitope Prediction
53 Antigens and Epitopes Antigens are molecules that induce the immune system to produce antibodies. Antibodies recognize parts of those molecules called epitopes.
54 Genome to Immunome Pathogen genome sequences define all the proteins that could elicit an immune response, but only a small number of epitopes are typically antigenic: we are looking for a needle in a very big haystack. Vaccinia virus (258 ORFs): 175,716 potential epitopes (8-, 9-, and 10-mers). M. tuberculosis (~4K genes): 433,206 potential epitopes. A. nidulans (~9K genes): 1,579,000 potential epitopes. Can computational approaches predict all antigenic epitopes from a genome?
55 Modeling MHC Epitopes We have a set of peptides that have been associated with a particular MHC allele. We want to discover the motif within these peptides that is bound by the MHC allele, and then use the motif to predict other potential epitopes.
56 Motifs Bound by MHCs MHC class I: closed ends of the binding groove; peptides 8-10 amino acids in length; the motif is the whole peptide. MHC class II: the groove has open ends; peptides have a broad length distribution; we need to find the binding motif within the peptides.
More informationStochastic processes and Markov chains (part II)
Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University Amsterdam, The
More informationRecitation 9: Loopy BP
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 204 Recitation 9: Loopy BP General Comments. In terms of implementation,
More informationModeling Motifs Collecting Data (Measuring and Modeling Specificity of Protein-DNA Interactions)
Modeling Motifs Collecting Data (Measuring and Modeling Specificity of Protein-DNA Interactions) Computational Genomics Course Cold Spring Harbor Labs Oct 31, 2016 Gary D. Stormo Department of Genetics
More informationStatistical Sequence Recognition and Training: An Introduction to HMMs
Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with
More informationHidden Markov Models
Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * The contents are adapted from Dr. Jean Gao at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Primer on Probability Random
More informationSelf-Organization by Optimizing Free-Energy
Self-Organization by Optimizing Free-Energy J.J. Verbeek, N. Vlassis, B.J.A. Kröse University of Amsterdam, Informatics Institute Kruislaan 403, 1098 SJ Amsterdam, The Netherlands Abstract. We present
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationIntroduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.
L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationInferring Models of cis-regulatory Modules using Information Theory
Inferring Models of cis-regulatory Modules using Information Theory BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 28 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material,
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission
More informationFrom Arabidopsis roots to bilinear equations
From Arabidopsis roots to bilinear equations Dustin Cartwright 1 October 22, 2008 1 joint with Philip Benfey, Siobhan Brady, David Orlando (Duke University) and Bernd Sturmfels (UC Berkeley), research
More informationQuantitative Information Flow. Lecture 7
Quantitative Information Flow Lecture 7 1 The basic model: Systems = Information-Theoretic channels Secret Information Observables s1 o1... System... sm on Input Output 2 Probabilistic systems are noisy
More informationInferring Models of cis-regulatory Modules using Information Theory
Inferring Models of cis-regulatory Modules using Information Theory BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 26 Anthony Gitter gitter@biostat.wisc.edu Overview Biological question What is causing
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationStructural biomathematics: an overview of molecular simulations and protein structure prediction
: an overview of molecular simulations and protein structure prediction Figure: Parc de Recerca Biomèdica de Barcelona (PRBB). Contents 1 A Glance at Structural Biology 2 3 1 A Glance at Structural Biology
More informationInferring Protein-Signaling Networks
Inferring Protein-Signaling Networks Lectures 14 Nov 14, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022 1
More informationImage Characteristics
1 Image Characteristics Image Mean I I av = i i j I( i, j 1 j) I I NEW (x,y)=i(x,y)-b x x Changing the image mean Image Contrast The contrast definition of the entire image is ambiguous In general it is
More informationInferring Transcriptional Regulatory Networks from Gene Expression Data II
Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday
More informationExpectation Maximization Algorithm
Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters
More informationLecture 3: Markov chains.
1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.
More informationIntroduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo
Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Assaf Weiner Tuesday, March 13, 2007 1 Introduction Today we will return to the motif finding problem, in lecture 10
More information6.096 Algorithms for Computational Biology. Prof. Manolis Kellis
6.096 Algorithms for Computational Biology Prof. Manolis Kellis Today s Goals Introduction Class introduction Challenges in Computational Biology Gene Regulation: Regulatory Motif Discovery Exhaustive
More information