Gibbs Sampling Methods for Multiple Sequence Alignment
1 Gibbs Sampling Methods for Multiple Sequence Alignment
Scott C. Schmidler (1) and Jun S. Liu (2)
(1) Section on Medical Informatics and (2) Department of Statistics, Stanford University
11/17/99
2 Outline
- Statistical models for multiple alignment
- Bayesian methodology
- Missing data problems
- The Gibbs sampling algorithm
- Extensions to multiple motifs and repeats
- Motif Sampler, PROBE, and beyond
3 Sequence alignment problem
Find commonalities among multiple protein sequences in order to:
- Understand function
- Predict structure
- Reveal evolutionary relationships
Approaches:
- Pairwise vs. multiple
- Global vs. local
- Score-based vs. model-based (EM, Gibbs, HMM)
Our basic view: block-motif model
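4 Multiple (local) sequence alignment
[Diagram: k sequences, each containing one motif of width w; the motif starts at position $a_1$ in sequence 1, $a_2$ in sequence 2, ..., $a_k$ in sequence k, which has length $n_k$.]
Alignment variable: $A = \{a_1, a_2, \ldots, a_k\}$
Objective: find the best (most probable) common patterns.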
5 Statistical models
How do we describe patterns?
- Frequencies of amino acid types
- Multinomial distribution
- More generally, a probabilistic model
[Figure: a typical aligned motif]
6 Multinomial distribution
Observed columns:
        Motif positions
Seq 1   I G K P I E
Seq 2   V G D P G E
Seq 3   V G D D A D
Seq 4   I G Q H P E
Seq 5   L S G P E E
...
A total of k sequences.
Model for the i-th column: $(k_{i,1}, k_{i,2}, \ldots, k_{i,20}) \sim \mathrm{Multinom}(k, p_i)$, where $p_i = (p_{i,1}, \ldots, p_{i,20})$
7 Estimating model probabilities
Maximum likelihood: $\hat{p}_{ij} = k_{ij} / k$
Bayesian estimate:
- Prior: $p_i \sim \mathrm{Dirichlet}(a_{i,1}, \ldots, a_{i,20})$, where the $a_{i,j}$ act as pseudo-counts
- Posterior distribution: $[p_i \mid \text{obs}] \sim \mathrm{Dirichlet}(a_{i,1} + k_{i,1}, \ldots, a_{i,20} + k_{i,20})$
- Posterior mean: $\hat{p}_{ij} = (k_{ij} + a_{i,j}) / (k + a_i)$, with $a_i = \sum_j a_{i,j}$
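The two estimators can be compared on the example columns from the multinomial slide. The following is a minimal sketch, assuming equal pseudo-counts for every residue type (the value 1.0 and the function names are illustrative choices, not taken from the slides).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types

# Aligned motif segments from the multinomial slide (k = 5 sequences, 6 columns).
segments = ["IGKPIE", "VGDPGE", "VGDDAD", "IGQHPE", "LSGPEE"]

def column_counts(segments, i):
    """Counts k_{i,j} of each residue type j in column i (0-based)."""
    return Counter(seg[i] for seg in segments)

def ml_estimate(counts, k):
    """Maximum likelihood estimate k_{i,j} / k."""
    return {aa: counts.get(aa, 0) / k for aa in AMINO_ACIDS}

def posterior_mean(counts, k, pseudo=1.0):
    """Posterior mean (k_{i,j} + a_{i,j}) / (k + a_i).

    Every pseudo-count a_{i,j} is set to `pseudo` (an assumption for illustration).
    """
    total_pseudo = pseudo * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudo) / (k + total_pseudo) for aa in AMINO_ACIDS}

k = len(segments)
counts = column_counts(segments, 1)   # second column: G, G, G, G, S
ml = ml_estimate(counts, k)
pm = posterior_mean(counts, k)
print("ML estimate for G:", ml["G"], " posterior mean for G:", pm["G"])
```

Residue types that never appear in a column get probability zero under maximum likelihood but a small positive probability under the posterior mean, which is the practical reason for the pseudo-counts.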
8 Prior and posterior distributions
The prior belief changes after observing the data.
[Figure: prior density of p with parameters $\alpha = 10$, $\beta = 10$; observed data k = 90, N - k = 0; posterior density of p with $k + \alpha = 100$, $N - k + \beta = 10$. Axis label: Prob of p.]
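The numbers on this slide follow from the conjugate update rule. The sketch below assumes the two prior values of 10 are the parameters of a Beta prior on p (the two-category case of the Dirichlet); the function names are illustrative.

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate update: add the observed counts to the prior parameters."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (10, 10)                                  # prior parameters from the slide
posterior = beta_update(*prior, successes=90, failures=0)   # k = 90, N - k = 0

print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The posterior parameters (100, 10) match the values shown on the slide, and the mean of p moves from 0.5 toward the observed frequency.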
9 Motif alignment model
[Diagram: k sequences with motif start positions $a_1, a_2, \ldots, a_k$; motif width = w; sequence k has length $n_k$.]
Alignment variable: $A = \{a_1, a_2, \ldots, a_k\}$
Non-site positions follow a common multinomial distribution with $p_0 = (p_{0,1}, \ldots, p_{0,20})$.
Each position i in the motif follows its own probability distribution $p_i = (p_{i,1}, \ldots, p_{i,20})$.
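Written out, the two model statements give the likelihood of one sequence given its motif start. This is a sketch that assumes positions are independent given the alignment and writes $R_k(l)$ for the residue at position $l$ of sequence $k$ (the notation used on the predictive updating slide).

```latex
P(R_k \mid a_k, \Theta)
  = \prod_{l \notin \{a_k, \ldots, a_k + w - 1\}} p_{0, R_k(l)}
    \;\times\; \prod_{i=1}^{w} p_{i, R_k(a_k + i - 1)}
  = \Bigl[\prod_{l=1}^{n_k} p_{0, R_k(l)}\Bigr]
    \prod_{i=1}^{w} \frac{p_{i, R_k(a_k + i - 1)}}{p_{0, R_k(a_k + i - 1)}}
```

The bracketed factor does not depend on $a_k$, so only the product of motif-to-background ratios matters when comparing candidate start positions.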
10 The tricky part
We don't know the alignment variables $A = \{a_1, a_2, \ldots, a_k\}$ (they are "missing").
General missing data problem:
- Statistical problem of unobserved data
- May be potentially observable, but unrecorded
- Arises in, e.g., non-response in surveys
Biological examples:
- Sequence alignment
- Protein and RNA structure
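11 Dealing with the missing data
Let $\Theta = (p_0, p_1, \ldots, p_w)$ and $A = \{a_1, a_2, \ldots, a_K\}$.
Iterative sampling: draw from $P(\Theta \mid A, \text{Data})$, then from $P(A \mid \Theta, \text{Data})$.
Predictive updating:
- Pretend K-1 sequences are aligned
- Predict the K-th sequence (stochastically)
[Diagram: sequences aligned at $a_1, a_2, a_3, \ldots$; the position $a_k$ of the remaining sequence is unknown (?).]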
12 The Predictive Updating step
[Diagram: sequences aligned at $a_1, a_2, a_3, \ldots$; the position $a_k$ of sequence k is unknown (?).]
1. Compute the predictive frequencies for each position i in the motif:
   $c_{ij}$ = count of amino acid type j at position i
   $c_{0j}$ = count of amino acid type j in all non-site positions
   $q_{ij} = (c_{ij} + b_j) / (K - 1 + B)$, where $B = \sum_j b_j$ is the total of the pseudo-counts $b_j$
2. Sample from the predictive distribution of $a_k$:
   $P(a_k = l + 1) \propto \prod_{i=1}^{w} q_{i, R_k(l+i)} / q_{0, R_k(l+i)}$
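The two steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: start positions are 0-based, all pseudo-counts $b_j$ are equal, sequences contain only the 20 standard residue letters, and the background frequencies $q_{0j}$ (for which the slide gives no formula) are normalized by the total background count plus B.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def predictive_update(seqs, starts, k, w, pseudo=0.5, rng=random):
    """Resample the motif start of sequence k given the other K-1 alignments.

    Returns (new_start, weights), where weights[l] is proportional to the
    predictive probability that the motif starts at 0-based index l.
    """
    B = pseudo * len(AMINO_ACIDS)
    motif = [Counter() for _ in range(w)]   # c_ij: counts per motif position
    background = Counter()                  # c_0j: counts at non-site positions
    for s, (seq, a) in enumerate(zip(seqs, starts)):
        if s == k:
            continue                        # sequence k is excluded
        for pos, aa in enumerate(seq):
            if a <= pos < a + w:
                motif[pos - a][aa] += 1
            else:
                background[aa] += 1
    n_others = len(seqs) - 1                # K - 1
    n_bg = sum(background.values())
    q = [{aa: (motif[i][aa] + pseudo) / (n_others + B) for aa in AMINO_ACIDS}
         for i in range(w)]
    # Assumed form for the background frequencies q_0j.
    q0 = {aa: (background[aa] + pseudo) / (n_bg + B) for aa in AMINO_ACIDS}

    seq = seqs[k]
    weights = []
    for l in range(len(seq) - w + 1):
        ratio = 1.0
        for i in range(w):
            aa = seq[l + i]
            ratio *= q[i][aa] / q0[aa]
        weights.append(ratio)
    new_start = rng.choices(range(len(weights)), weights=weights)[0]
    return new_start, weights

# Toy example: the motif "WWC" is planted in a background of alanines.
seqs = ["AAWWCAAA", "AWWCAAAA", "AAAAWWCA"]
starts = [2, 1, 0]                          # sequence 2 is currently misaligned
new_start, weights = predictive_update(seqs, starts, k=2, w=3, rng=random.Random(0))
print(new_start, [round(x, 2) for x in weights])
```

Because the new start is sampled rather than chosen as the maximum, the update can occasionally pick a lower-weight position, which is what lets the sampler leave poor alignments.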
13 The algorithm
Initialize by choosing random starting positions $a_1^{(0)}, a_2^{(0)}, \ldots, a_K^{(0)}$.
Iterate the following steps many times:
- Randomly or systematically choose a sequence, say sequence k, to exclude.
- Carry out the Predictive Update step to update $a_k$.
Stop when changes are small, or when another convergence criterion is met.
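A compact sketch of the whole loop follows. The assumptions are the same as for any toy version: 0-based starts, equal pseudo-counts, a systematic scan over sequences, an assumed background estimate, and a fixed number of sweeps standing in for a real convergence criterion. The function names are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_weights(seqs, starts, k, w, pseudo):
    """Predictive weight of each candidate start in sequence k, from the other sequences."""
    B = pseudo * len(AMINO_ACIDS)
    motif = [Counter() for _ in range(w)]
    background = Counter()
    for s, (seq, a) in enumerate(zip(seqs, starts)):
        if s == k:
            continue
        for pos, aa in enumerate(seq):
            if a <= pos < a + w:
                motif[pos - a][aa] += 1
            else:
                background[aa] += 1
    n_others, n_bg = len(seqs) - 1, sum(background.values())
    weights = []
    for l in range(len(seqs[k]) - w + 1):
        ratio = 1.0
        for i, aa in enumerate(seqs[k][l:l + w]):
            q_i = (motif[i][aa] + pseudo) / (n_others + B)
            q_0 = (background[aa] + pseudo) / (n_bg + B)   # assumed background estimate
            ratio *= q_i / q_0
        weights.append(ratio)
    return weights

def gibbs_site_sampler(seqs, w, sweeps=200, pseudo=0.5, seed=0):
    """Run a fixed number of sweeps and return the final 0-based start positions."""
    rng = random.Random(seed)
    # Initialize with random starting positions.
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(sweeps):
        for k in range(len(seqs)):          # systematic scan over sequences
            weights = site_weights(seqs, starts, k, w, pseudo)
            starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

seqs = ["AAWWCAAA", "AWWCAAAA", "AAAAWWCA", "WWCAAAAA"]
print(gibbs_site_sampler(seqs, w=3, sweeps=50, seed=1))
```

In practice one would monitor a summary of the alignment across sweeps and across restarts instead of fixing the number of sweeps in advance.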
14 Phase-shift
The sampler sometimes gets stuck in a "shift" local optimum.
[Diagram: the current alignment is offset from the true motif locations.]
How to escape from this local optimum?
- Simultaneous move: $A \Rightarrow A + \delta = \{a_1 + \delta, \ldots, a_K + \delta\}$
- Use a Metropolis step: accept the move with probability $\min\{1, \pi(A + \delta \mid R) / \pi(A \mid R)\}$
- Compare entropy between added and removed columns
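The simultaneous shift can be sketched as a standard Metropolis step with a symmetric proposal. In this illustration `log_post` is a hypothetical caller-supplied function returning the log posterior of an alignment up to a constant, and the range of proposed shifts (`max_shift`) is an assumption.

```python
import math
import random

def phase_shift_step(starts, seq_lengths, w, log_post, rng, max_shift=2):
    """Propose shifting every start by the same delta; accept with the Metropolis rule.

    Returns (starts, accepted). Proposals that run off a sequence end are rejected.
    """
    shifts = [d for d in range(-max_shift, max_shift + 1) if d != 0]
    delta = rng.choice(shifts)              # symmetric proposal over +/- shifts
    proposal = [a + delta for a in starts]
    if any(a < 0 or a + w > n for a, n in zip(proposal, seq_lengths)):
        return starts, False
    # Accept with probability min{1, pi(A + delta | R) / pi(A | R)}.
    log_ratio = log_post(proposal) - log_post(starts)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return starts, False

# Toy posterior that peaks when every start equals 3 (illustration only).
def toy_log_post(starts):
    return -10.0 * sum((a - 3) ** 2 for a in starts)

rng = random.Random(0)
starts = [2, 2]
for _ in range(200):
    starts, _ = phase_shift_step(starts, [10, 10], 3, toy_log_post, rng)
print(starts)
```

Because all starts move together, the posterior ratio depends only on the columns that enter and leave the motif window, which is why the slide suggests comparing the added and removed columns.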
15 Fragmentation
Imagine we have W positions for the motif, of which only J are "important".
[Diagram: a span of W columns with the J selected columns marked X; binomial coefficient $\binom{W}{J}$.]
- Choose the best J columns among the W possible via a Metropolis algorithm.
- Prior distribution for fragmentation: W is regarded as the length of the span.
- No need to specify W with this model.
16 Example: H-T-H proteins
- HTH (helix-turn-helix): sequence-specific DNA binding, gene regulation.
- Motifs occur as local, isolated structures; the whole 3-D structures are known and very different.
- 30 sequences with known HTH positions were chosen. The set represents a typically diverse cross section of HTH sequences.
- The width of the motif pattern is assumed to be in the range from 17 to 22.
- The "information per parameter" (IPP) criterion is used to determine the optimal width, 21.
- Heuristic convergence criteria were developed (multiple restarts with IPP monitored).
17 Alignment
[Figure: alignment; * = structural alignment]
18 Model probability ratios
[Figure: $100 \times q_{i,j} / p_j$]
19 Determining width of motif
[Figure legend; the first marker is not preserved]
(marker) : All sequences correctly aligned
+ : One or more incorrectly aligned
ο : Sequences with HTH region deleted
x : Randomized sequences
20 Monitoring convergence
[Figure]
21 Motif Sampler
Often one has multiple repeats of several motifs in the set of sequences.
[Diagram: one long sequence with 3 motif types]
Idea: partition the input sequence into segments that correspond to different (unknown) motif models.
Use mixture models (unsupervised learning).
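22 Special case: Bernoulli sampler
Sequence data: $R = r_1 r_2 r_3 \cdots r_N$
Indicator variable: $\Delta = \delta_1 \delta_2 \delta_3 \cdots \delta_N$, where $\delta_i = 1$ if position i is the start of an element and $\delta_i = 0$ if not
$\Theta$: parameters for the motif model
Likelihood: $p(R, \Delta \mid \Theta, \varepsilon)$, where $\varepsilon$ is the prior probability for $\delta_i = 1$
Predictive update:
$\frac{P(\delta_k = 1 \mid \Delta_{[-k]}, R)}{P(\delta_k = 0 \mid \Delta_{[-k]}, R)} = \frac{\hat{\varepsilon}}{1 - \hat{\varepsilon}} \prod_{i=1}^{w} \frac{\hat{p}_{i, r_{k+i-1}}}{\hat{p}_{0, r_{k+i-1}}}$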
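The predictive odds for a single position can be computed directly from current parameter estimates. This sketch uses 0-based indexing, takes the estimates as plain dictionaries, and ignores the constraint that elements must not overlap (an assumption made to keep the example short); the names are illustrative.

```python
def start_odds(seq, k, p_motif, p_background, eps):
    """Odds that a motif element starts at 0-based index k versus not.

    p_motif[i][aa] is the estimated frequency of residue aa at motif position i,
    p_background[aa] the estimated background frequency, and eps the estimated
    prior probability that any given position starts an element.
    """
    w = len(p_motif)
    if k + w > len(seq):
        return 0.0                           # the motif would run off the end
    odds = eps / (1.0 - eps)                 # prior odds
    for i in range(w):
        aa = seq[k + i]
        odds *= p_motif[i][aa] / p_background[aa]
    return odds

def odds_to_probability(odds):
    """Convert odds into the probability used to sample the indicator."""
    return odds / (1.0 + odds)

# Two-letter toy alphabet to keep the arithmetic easy to follow.
p_background = {"A": 0.5, "C": 0.5}
p_motif = [{"A": 0.9, "C": 0.1}, {"A": 0.1, "C": 0.9}]
odds = start_odds("AACA", 1, p_motif, p_background, eps=0.1)
print(odds, odds_to_probability(odds))
```

Each indicator is then drawn as a Bernoulli variable with the resulting probability, which gives the sampler its name.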
23 Example: Porin-like proteins
- Membrane proteins: channels for nutrients, waste products, and antibiotics.
- 25 proteins suspected of being porins.
- 70 segments found, with width = 13.
- 4 repeats detected for each of the 3 porins with known 3-D structures; they correspond to alternating membrane-spanning β-strands, all on the outside of the structure.
24 Conserved repeats in bacterial porins
[Figure]
25 Multiple motif regions detected in bacterial porins
[Figure]
26 Example: HTH proteins (cont'd)
30 HTH sequences (J = 12, unknown number of repeats).
Fragmentation model test: 100 runs of the Bernoulli sampler, with and without fragmentation.
[Table: With vs. Without fragmentation; Average: correct, incorrect]
Most often contained 18 out of 30 known sites.
Most frequently: 28 sites, MAP=139; 25 sites, MAP=147.
Multiple runs needed to reveal subtle qualitative differences.
Different nearby modes are worth
27 Hidden Markov model
Architecture
[Diagram: delete states $D_1, \ldots, D_4$; insert states $I_0, \ldots, I_4$; match states $m_0, \ldots, m_4$; end state $m_E$.]
28 The Path
$\delta_i$ = number of deletions
$G_i = 0$ if $r_i$ is generated by an insertion, $G_i = 1$ if it is generated by a match
[Diagram: residues $r_1, \ldots, r_6$ emitted along paths through the match states $m_0, \ldots, m_4$; each residue $r_i$ carries a label $(\delta_i, G_i)$, $i = 1, \ldots, 6$. A solid path and a dashed path are shown with their label sequences.]
29 More restricted models
Block Motif Propagation Model: limits the total number of gaps, with no deletions allowed.
[Diagram: Motif 1, Motif 2, Motif 3, Motif 4]
With deletion indicators:
[Diagram: model with delete states $D_1, D_2, D_3, D_4$]
30 Bayesian analysis of the propagation model
Random variables in the model:
- A: alignment variable indicating where each motif starts
- $\Theta$: parameters for insertion and model frequencies
- F: fragmentation indicator (which columns are included); the original symbol is lost in the transcript, so F is used here
- L: total number of motifs for the alignment
- W: total number of alignment columns
- $R = (R_1, R_2, \ldots, R_K)$: set of protein sequences to be aligned
Likelihood: $P(R_1, R_2, \ldots, R_K \mid A, \Theta, F, L, W)$
Posterior: $c \cdot P(R \mid A, \Theta, F, L, W) \, P(A, \Theta, F, L, W)$, where c is the normalizing constant and the second factor is the prior
31 The PROBE algorithm
- Iterative (Gibbs) sampling from conditionals: given $A_{[-k]}$ (the alignment of the rest of the sequences), find the distribution of sequence k's alignment.
- Multiple motifs: use recursive summations.
- Genetic algorithm for variable L and W.
[Diagram: motif models 1 to 4, estimated from the other sequences, placed on sequence k at positions $a_1, a_2, a_3, a_4$.]
32 An Illustration
Alignment of 16 putative N-acetyltransferases. The best alignment found has 4 blocks and 47 columns.
The value plotted is the logarithm of the ratio between the likelihoods of the posited model and the null model. The method was also applied to the shuffled sequences.
[Figure: curves for 1, 2, 3, 4, and 5 blocks, and results for shuffled sequences.]
33 An example: Swinging Arm domain in bacterial Membrane Fusion Proteins
[Figure: lipoyl binding domain from Bacillus stearothermophilus dihydrolipoamide acetyltransferase]
- PROBE detects conserved regions in MFPs that are shared with lipoyl and biotin binding domains.
- Large insertions lie between two conserved regions in MFPs, implying a swinging-arm mechanism.
34 Representative alignment
[Figure]
35 References
- Lawrence, C.E. et al. (1993). Gibbs sampling for detecting subtle motifs in multiple sequences. Science 262.
- Liu, J.S. (1994). The collapsed Gibbs sampler with application to a gene regulation problem. J. Amer. Statist. Assoc. 89, 1156
- Liu, J.S., Neuwald, A.F., and Lawrence, C.E. (1999). Markovian structures in biological sequence alignments. J. Amer. Statist. Assoc., forthcoming.
- --- (1995). Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Amer. Statist. Assoc. 90.
- Neuwald, A.F., Liu, J.S., and Lawrence, C.E. (1995). Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Science 4.
- Neuwald, A.F., Liu, J.S., Lipman, D.J., and Lawrence, C.E. (1997). Extracting protein alignment models from the sequence database. Nucleic Acids Res. 25.
More informationTemplate Free Protein Structure Modeling Jianlin Cheng, PhD
Template Free Protein Structure Modeling Jianlin Cheng, PhD Professor Department of EECS Informatics Institute University of Missouri, Columbia 2018 Protein Energy Landscape & Free Sampling http://pubs.acs.org/subscribe/archive/mdd/v03/i09/html/willis.html
More informationTiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1
Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data Statistical Machine Learning from Data Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL),
More informationGentle Introduction to Infinite Gaussian Mixture Modeling
Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for
More informationState Space and Hidden Markov Models
State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State
More information1.5 MM and EM Algorithms
72 CHAPTER 1. SEQUENCE MODELS 1.5 MM and EM Algorithms The MM algorithm [1] is an iterative algorithm that can be used to minimize or maximize a function. We focus on using it to maximize the log likelihood
More informationCh. 9 Multiple Sequence Alignment (MSA)
Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -
More informationGLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data
GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)
More informationLecture 7 Sequence analysis. Hidden Markov Models
Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More information