HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
|
|
- Robert Douglas
- 5 years ago
- Views:
Transcription
1 HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding with HMMs Using the gene finder Apply the gene finder to both the top strand and bottom strand Training using annotated genes 1
2 A simple eukaryotic gene finder Pair HMMs and Sequence Alignment Global alignment with affine gaps revisited Suppose we re aligning sequences x,y of lengths m and n. Dynamic Programming: M(i, j): Optimal alignment of x 1 x i to y 1 y j ending in a match I X (i, j): Optimal alignment of x 1 x i to y 1 y j ending in a gap in x Alignment with affine gaps Initialization: M(0,0) = 0; M(i, 0) = M(0, j) = -, for i, j > 0 I X (i,0) = - (d + i * e); I Y (0, j) = - (d + j * e) Iteration: M(i 1, j 1) M(i, j) = s(x i, y j ) + max I X (i 1, j 1) I X (i 1, j 1) I Y (i, j): Optimal alignment of x 1 x i to y 1 y j ending in a gap in y I X (i, j) = max I Y (i, j) = max I X (i 1, j) - e M(i 1, j) - d I Y (i, j 1) - e M(i, j 1) - d Termination: Optimal alignment given by max { M(m, n), I X (m, n), I Y (m, n) } A state model for alignment Scoring transitions x -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- y TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC XMMYMMMMMMMYYMMMMMMYMMMMMMMXXMMMMMXXX x -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- y TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC XMMYMMMMMMMYYMMMMMMYMMMMMMMXXMMMMMXXX 2
3 Probabilistic interpretation of an alignment An alignment is a hypothesis that the two sequences are related by evolution Goal: Produce the most likely alignment Compute the likelihood that the sequences are indeed related A Pair HMM for alignments ε M δ 1 2δ optional Transition probabilities: δ δ: set so that 1/2δ is avg. length of match region ε: set so that 1/() is avg. length of a gap ε Model generates two sequences simultaneously Emission probabilities: Match/Mismatch state M: Emits pair of symbols with distribution P(x,y) that reflects substitution frequencies between pairs of amino acids Insertion states X,Y: Emit symbol + gap according to P(x), P(y) that reflect amino acid frequencies of x and y Viterbi for pair HMMs A model for unaligned sequences Initialization: Recurrence: ε 1 2δ δ δ R The two sequences are independently generated from one another P(x, y R) = P(x 1 ) P(x m ) P(y 1 ) P(y n ) = Π i P(x i ) Π j P(y j ) Termination: ALIGNMENT vs. RANDOM hypothesis Every pair of letters contributes: M (1 2δ) P(x i, y j ) when matched ε P(x i ) or ε P(y j ) when gapped R P(x i ) P(y j ) in random model ε 1 2δ δ δ ε ALIGNMENT vs. RANDOM hypothesis To compare the models we consider the log odds ratio under the two models, which can be expressed as a sum of terms: δ ε P(x i, y j ) s(x i, y j ) = log + log (1 2δ) P(x i ) P(y j ) δ () P(x i ) d = log (1 2δ) P(x i ) 1 2δ δ = Defn substitution score = Defn gap initiation penalty ε P(x i ) e = log P(x i ) = Defn gap extension penalty 3
4 Log-odds Viterbi Comments The Viterbi algorithm for Pair HMMs corresponds exactly to global alignment DP with affine gaps V M (i, j) = max { V M (i 1, j 1), V X ( i 1, j 1), V Y ( i 1, j 1) } + s(x i, y j ) V X (i, j) = max { V M (i 1, j) d, V X ( i 1, j) e } V Y (i, j) = max { V M (i 1, j) d, V Y ( i 1, j) e } The pair HMM formulation provides a probabilistic interpretation for the pairwise alignment algorithm, and allows assigning values to parameters from first principles. Viterbi provides the probability that two sequences are related by a particular alignment according to the HMM. Using the forward algorithm we can compute the probability that the two sequences are related by any alignment: Probability that two residues are aligned Notation: means x i and y j are aligned f M and b M are the forward and backward probs. that correspond to a match state Pair HMM for local alignment Describing protein families for prediction of protein structure and function Protein sequence and structure Protein classification Figure 4.3 from Durbin et al. 4
5 Primary Structure: Sequence The primary structure of a protein is its amino acid sequence Primary Structure: Sequence Primary Structure: Sequence Twenty different amino acids have distinct shapes and properties Secondary Structure: α, β, & loops α helices and β sheets are stabilized by hydrogen bonds between backbone oxygen and hydrogen atoms A useful mnemonic for the hydrophobic amino acids is "FAMILY VW" Tertiary Structure: the Fold of a protein Structure Determines Function The Protein Folding Problem What determines structure? 3d structure is minimum free-energy conformation How can we determine structure? Experimental methods Computational predictions 5
6 Growth of the Protein Data Bank (PDB) The number of folds in nature New PDB structures Classifying folds Structure Classification Databases Despite the large growth in the number of solved protein structures, the number of new folds identified is small Classifying proteins into folds Generate overview of structure types Detect similarities between proteins Help predict 3D structure of new protein sequences Classification of 25,973 protein structures in PDB SCOP release 1.69, Class # folds All alpha proteins 218 All beta proteins 144 Alpha and beta proteins (a/b) 136 Alpha and beta proteins (a+b) 279 Multi-domain proteins 46 Membrane & cell surface 47 Small proteins 75 Total 945 SCOP Manual classification (A. Murzin) scop.berkeley.edu CATH Semi manual classification (C. Orengo) FSSP Automatic classification (L. Holm) Major classes in SCOP All α: Hemoglobin (1bab) Classes All α proteins All β proteins α and β proteins (α/β) α and β proteins (α+β) Membrane and cell surface proteins Small proteins Morten Nielsen,CBS, BioCentrum, DTU 6
7 All β: Immunoglobulin (8fab) α/β: Triosephosphate isomerase (1hti) Morten Nielsen,CBS, BioCentrum, DTU Morten Nielsen,CBS, BioCentrum, DTU α+β: Lysozyme (1jsf) Protein Family Consists of proteins whose evolutionarily relationship is readily recognizable from sequence (>30% sequence identity) Superfamily Fold Family Proteins Morten Nielsen,CBS, BioCentrum, DTU Protein Superfamily Fold Consists of proteins which are (remotely) evolutionarily related Sequence similarity low Related function Similar in structure Relationships between members of a superfamily hard to recognize from sequence alone Superfamily Family Fold Proteins Proteins in the same fold share the same general architecture, and have the same major secondary structures in the same topological arrangement No evolutionary relationship implied Fold Superfamily Family Proteins 7
8 Protein Classification PSI-BLAST Given a new protein, can we assign it to its position within an existing protein hierarchy? Methods BLAST / PsiBLAST Profile HMMs new protein? Superfamily Family Fold Given a sequence query x, and database D 1. Find all pairwise alignments of x to sequences in D 2. Collect all matches of x to y with some minimum significance 3. Construct a profile M Each sequence y is given a weight so that many similar sequences cannot have much influence on a position (Henikoff & Henikoff 1994) 4. Using M, search D for more matches 5. Iterate 1 4 until convergence Supervised machine learning Proteins Profile M Profiles & Profile HMMs Psi-BLAST builds a profile Profile HMM: a more elaborate version of a profile Intuitively, a profile that models insertions and deletions PWMs as HMMs Insertions I 1 BEGIN M L BEGIN M L A PWM can be described as a very simple HMM where the emissions associated with M i correspond to the i th column of the PWM Insertion: a part of the sequence x that is not modeled by the match states The loop allows for insertions of multiple letters An insert state emits symbols with the background distribution 8
9 Insertions Deletions I 1 BEGIN M L BEGIN Log odds contribution of an insertion of length k: M L Looks like an affine gap penalty Deletion states: silent states Cost of deletion of a sequence of letters: sum of log transition probabilities We could model deletions by allowing all forward transitions from match states: lots of transitions Profile HMMs Profile HMMs BEGIN I 0 I 1 I L-1 I L BEGIN I 0 I 1 I L-1 I L M L Protein Family F Each M state has a position-specific pre-computed substitution table Each I state has position-specific gap penalties (and in principle can have its own emission distributions) Each D allows to skip its corresponding match state In principle, D-D transitions can also be customized per position M L Protein Family F transition between match states a M(i)M(i+1) transitions between match and insert states a M(i)I(i), a I(i)M(i+1) transition within insert state a I(i)I(i) transition between match and delete states a M(i)D(i+1), a D(i)M(i+1) transition within delete state a D(i)D(i+1) emission of amino acid b at a state S e S (b) Estimating Profile HMMs from MSAs Estimating Profile HMMs from MSAs BEGIN I 0 I 1 I L-1 I L M L Columns that have less than a given number of gaps are used as match state of the HMM (starred columns) Amino acid frequencies of each column used to set emission probabilities for the corresponding match state transition probabilities ~ frequency of a transition in alignment emission probabilities ~ frequency of an emission in alignment pseudocounts are usually introduced Akl akl =! A ' ' l kl E k ( a) ek ( a) =! E k ( a ' ) ' a ' 9
10 Profile HMM for the globins example What you can do with a profile HMM Classification: given a protein, we can compute its probability under the model, or compare it to a random model (using a log-odds-ratio). Use forward/backward Alignment: Use the Viterbi algorithm to align a sequence to the HMM. Learning: set model parameters using a given MSA, or from scratch using EM. Figure 5.4 from Durbin et al. Alignment of a protein to a profile HMM Log-odds Viterbi To align sequence x 1 x n to a profile HMM: We will find the most likely alignment with the Viterbi DP algorithm Define V jm (i): score of best alignment of x 1 x i to the HMM ending in x i being emitted from M j V ji (i): score of best alignment of x 1 x i to the HMM ending in x i being emitted from I j V jd (i): score of best alignment of x 1 x i to the HMM ending in D j (x i is the last character emitted before D j ) Denote by q a the frequency of amino acid a in a random protein Log-odds Viterbi (termination) MSA using a profile HMM 10
11 BEGIN I 0 I 1 I m-1 M m I m BEGIN I 0 I 1 I m-1 M m BEGIN I 0 I 1 I m-1 M m I m I m How to build a profile HMM Classification with Profile HMMs Fold Superfamily Family new protein? Classification with Profile HMMs Classifying globins How generative models work Training examples: sequences known to be members of family Tuning parameters with a priori knowledge Model assigns a probability to any given protein sequence Idea: The sequences from the family (hopefully) yields a higher probability than sequences outside the family Log-likelihood ratio as score P(H 1 X) L(X) = log P(H 0 X) Classification using Z score Weighting training sequences The examples of a protein family are not a good sample from the sequences that belong to the family: typically there are sequences that are very closely related. They can given the mistaken impression regarding the pattern of conservation in the family. Solution: assign weights to each sequence. Less weight to related sequences. 11
12 Profile HMMs for local alignment PFAM: profile HMMs for protein families PFAM method: Use Blast to cluster a protein database into families of related proteins Construct a multiple alignment for each protein family. Construct a profile HMM from each alignment Given a protein of unknown origin: Align the target sequence against each HMM to find the best fit PFAM Generative Models Pfam decribes protein domains Each protein domain family in Pfam has: Seed alignment: manually verified multiple alignment of a representative set of sequences. HMM built from the seed alignment for further database searches. Full alignment generated automatically from the HMM The distinction between seed and full alignments facilitates Pfam updates. Seed alignments are a stable resource (manually inspected by an expert) HMMs can be updated with new protein sequences. Generative Models Generative Models 12
13 Digression: Discriminative Models k-mer based SVMs for protein classification Rather than model each class (red/ green), construct a decision boundary between the classes. margin Support Vector Machines (SVM) learn large margin decision boundaries For given word size k, and mismatch tolerance m, define K(x, y) = # distinct k-long word occurrences with m mismatches SVM can be learned by supplying this kernel function X A B A C A R D I w Decision Rule: red: w T x > 0 Let k = 3; m = 1 Y A B R A D A B I K(X, Y) = 4 Benchmarks Resources on the web HMMer a free profile HMM software SAM another free profile HMM software PFAM database of alignments and HMMs for protein families and domains SCOP a structural classification of proteins Transmembrane proteins Prediction of transmembrane proteins 13
14 Transmembrane proteins Sonnhammer et al., ISMB, 6: , 1998: TMHMM Properties of transmembrane proteins: hydrophobic α-helices connected by interfacial loops There are 3 main locations of a residue: TM helix core (in hydrophobic tail of membrane) TM helix cap (in head of membrane) cytoplasmic vs non-cytoplasmic side of the helix core loops cytoplasimc vs non-cytoplasmic (short) vs non-cytoplasmic (long) cyto non-cyto Architecture of TMHMM 14
Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationHidden Markov Models
Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm
More informationHidden Markov Models
Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How
More information2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.
Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationHMMs and biological sequence analysis
HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More informationProtein Structure: Data Bases and Classification Ingo Ruczinski
Protein Structure: Data Bases and Classification Ingo Ruczinski Department of Biostatistics, Johns Hopkins University Reference Bourne and Weissig Structural Bioinformatics Wiley, 2003 More References
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationHidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)
Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P
More informationMoreover, the circular logic
Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationStephen Scott.
1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for
More informationCAP 5510 Lecture 3 Protein Structures
CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity
More informationMultiple Sequence Alignment using Profile HMM
Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang
More informationCSCE 471/871 Lecture 3: Markov Chains and
and and 1 / 26 sscott@cse.unl.edu 2 / 26 Outline and chains models (s) Formal definition Finding most probable state path (Viterbi algorithm) Forward and backward algorithms State sequence known State
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/15/07 CAP5510 1 EM Algorithm Goal: Find θ, Z that maximize Pr
More informationProtein structure alignments
Protein structure alignments Proteins that fold in the same way, i.e. have the same fold are often homologs. Structure evolves slower than sequence Sequence is less conserved than structure If BLAST gives
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationLecture 7 Sequence analysis. Hidden Markov Models
Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden
More informationHidden Markov Models (I)
GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables
More informationPairwise alignment using HMMs
Pairwise alignment using HMMs The states of an HMM fulfill the Markov property: probability of transition depends only on the last state. CpG islands and casino example: HMMs emit sequence of symbols (nucleotides
More information1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)
Protein structure databases; visualization; and classifications 1. Introduction to Protein Data Bank (PDB) 2. Free graphic software for 3D structure visualization 3. Hierarchical classification of protein
More informationBioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing
Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.
More informationHidden Markov Models and Their Applications in Biological Sequence Analysis
Hidden Markov Models and Their Applications in Biological Sequence Analysis Byung-Jun Yoon Dept. of Electrical & Computer Engineering Texas A&M University, College Station, TX 77843-3128, USA Abstract
More informationProcheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.
Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond
More informationBasics of protein structure
Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu
More informationStephen Scott.
1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative
More informationProtein Structure Prediction using String Kernels. Technical Report
Protein Structure Prediction using String Kernels Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street SE Minneapolis, MN 55455-0159
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:
More informationCS612 - Algorithms in Bioinformatics
Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available
More informationSCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like
SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,
More informationWeek 10: Homology Modelling (II) - HHpred
Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative
More informationProtein tertiary structure prediction with new machine learning approaches
Protein tertiary structure prediction with new machine learning approaches Rui Kuang Department of Computer Science Columbia University Supervisor: Jason Weston(NEC) and Christina Leslie(Columbia) NEC
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationIntroduction to Comparative Protein Modeling. Chapter 4 Part I
Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature
More informationCSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:
Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2
More informationLecture 9. Intro to Hidden Markov Models (finish up)
Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is
More informationSyllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)
Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural
More informationIdentification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach
Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach Prof. Dr. M. A. Mottalib, Md. Rahat Hossain Department of Computer Science and Information Technology
More informationStatistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics
Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia
More informationPROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES
PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES Eser Aygün 1, Caner Kömürlü 2, Zafer Aydin 3 and Zehra Çataltepe 1 1 Computer Engineering Department and 2
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationNeural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this
More informationBioinformatics 1--lectures 15, 16. Markov chains Hidden Markov models Profile HMMs
Bioinformatics 1--lectures 15, 16 Markov chains Hidden Markov models Profile HMMs target sequence database input to database search results are sequence family pseudocounts or background-weighted pseudocounts
More informationHIDDEN MARKOV MODELS FOR REMOTE PROTEIN HOMOLOGY DETECTION
From THE CENTER FOR GENOMICS AND BIOINFORMATICS Karolinska Institutet, Stockholm, Sweden HIDDEN MARKOV MODELS FOR REMOTE PROTEIN HOMOLOGY DETECTION Markus Wistrand Stockholm 2005 All previously published
More informationComparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes
More informationConditional Graphical Models for Protein Structure Prediction
Conditional Graphical Models for Protein Structure Prediction Yan Liu Language Technologies Institute School of Computer Science Carnegie Mellon University Submitted in partial fulfillment of the requirements
More informationHidden Markov Models
Andrea Passerini passerini@disi.unitn.it Statistical relational learning The aim Modeling temporal sequences Model signals which vary over time (e.g. speech) Two alternatives: deterministic models directly
More informationALL LECTURES IN SB Introduction
1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL
More informationIntro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models
Last time Domains Hidden Markov Models Today Secondary structure Transmembrane proteins Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL
More informationHeteropolymer. Mostly in regular secondary structure
Heteropolymer - + + - Mostly in regular secondary structure 1 2 3 4 C >N trace how you go around the helix C >N C2 >N6 C1 >N5 What s the pattern? Ci>Ni+? 5 6 move around not quite 120 "#$%&'!()*(+2!3/'!4#5'!1/,#64!#6!,6!
More informationToday. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure
Last time Today Domains Hidden Markov Models Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL SSLGPVVDAHPEYEEVALLERMVIPERVIE FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK
More informationProtein Secondary Structure Prediction
Protein Secondary Structure Prediction Doug Brutlag & Scott C. Schmidler Overview Goals and problem definition Existing approaches Classic methods Recent successful approaches Evaluating prediction algorithms
More informationEBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013
EBI web resources II: Ensembl and InterPro Yanbin Yin Spring 2013 1 Outline Intro to genome annotation Protein family/domain databases InterPro, Pfam, Superfamily etc. Genome browser Ensembl Hands on Practice
More informationCOMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University
COMP 598 Advanced Computational Biology Methods & Research Introduction Jérôme Waldispühl School of Computer Science McGill University General informations (1) Office hours: by appointment Office: TR3018
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/14/07 CAP5510 1 CpG Islands Regions in DNA sequences with increased
More informationCMPS 3110: Bioinformatics. Tertiary Structure Prediction
CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite
More informationNumber sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence
Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence
More informationAmino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1
Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 2 Amino Acid Structures from Klug & Cummings
More informationHomology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded
More informationHidden Markov Models (HMMs) and Profiles
Hidden Markov Models (HMMs) and Profiles Swiss Institute of Bioinformatics (SIB) 26-30 November 2001 Markov Chain Models A Markov Chain Model is a succession of states S i (i = 0, 1,...) connected by transitions.
More informationPairwise sequence alignment and pair hidden Markov models
Pairwise sequence alignment and pair hidden Markov models Martin C. Frith April 13, 2012 ntroduction Pairwise alignment and pair hidden Markov models (phmms) are basic textbook fare [2]. However, there
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction
CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the
More informationDecember 2, :4 WSPC/INSTRUCTION FILE jbcb-profile-kernel. Profile-based string kernels for remote homology detection and motif extraction
Journal of Bioinformatics and Computational Biology c Imperial College Press Profile-based string kernels for remote homology detection and motif extraction Rui Kuang 1, Eugene Ie 1,3, Ke Wang 1, Kai Wang
More informationProtein Structure Prediction
Page 1 Protein Structure Prediction Russ B. Altman BMI 214 CS 274 Protein Folding is different from structure prediction --Folding is concerned with the process of taking the 3D shape, usually based on
More informationHidden Markov Models
Hidden Markov Models A selection of slides taken from the following: Chris Bystroff Protein Folding Initiation Site Motifs Iosif Vaisman Bioinformatics and Gene Discovery Colin Cherry Hidden Markov Models
More informationMotifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC
Motifs, Profiles and Domains Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Comparing Two Proteins Sequence Alignment Determining the pattern of evolution and identifying conserved
More informationAnalysis and Prediction of Protein Structure (I)
Analysis and Prediction of Protein Structure (I) Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng
More informationDATE A DAtabase of TIM Barrel Enzymes
DATE A DAtabase of TIM Barrel Enzymes 2 2.1 Introduction.. 2.2 Objective and salient features of the database 2.2.1 Choice of the dataset.. 2.3 Statistical information on the database.. 2.4 Features....
More informationChapter 5. Proteomics and the analysis of protein sequence Ⅱ
Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationPresentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy
Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy Burkhard Rost and Chris Sander By Kalyan C. Gopavarapu 1 Presentation Outline Major Terminology Problem Method
More informationSome Problems from Enzyme Families
Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems
More informationProtein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1
Protein Structures Sequences of amino acid residues 20 different amino acids Primary Secondary Tertiary Quaternary 10/8/2002 Lecture 12 1 Angles φ and ψ in the polypeptide chain 10/8/2002 Lecture 12 2
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2011 1 HMM Lecture Notes Dannie Durand and Rose Hoberman October 11th 1 Hidden Markov Models In the last few lectures, we have focussed on three problems
More informationExample: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding
Example: The Dishonest Casino Hidden Markov Models Durbin and Eddy, chapter 3 Game:. You bet $. You roll 3. Casino player rolls 4. Highest number wins $ The casino has two dice: Fair die P() = P() = P(3)
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationQuantifying sequence similarity
Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity
More informationCISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)
CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationLearning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling
Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationSequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir
Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out
More informationAlignment Algorithms. Alignment Algorithms
Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationProtein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche
Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its
More informationPair Hidden Markov Models
Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationMarkov Chains and Hidden Markov Models. = stochastic, generative models
Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,
More informationProtein Structure & Motifs
& Motifs Biochemistry 201 Molecular Biology January 12, 2000 Doug Brutlag Introduction Proteins are more flexible than nucleic acids in structure because of both the larger number of types of residues
More information