Supplementing information theory with opposite polarity of amino acids for protein contact prediction
|
|
- Britton Rose
- 5 years ago
- Views:
Transcription
1 Supplementing information theory with opposite polarity of amino acids for protein contact prediction Yancy Liao 1, Jeremy Selengut 1 1 Department of Computer Science, University of Maryland - College Park The covariation of protein residue pairs is often measured by mutual information scores. Residue pairs with high mutual information scores, however, are not guaranteed to be functionally or structurally significant, since their covariation may be due to a number of indirect effects. We propose filtering for residue pairs whose covariational patterns suggest a direct interaction based on their chemical properties. Specifically, we took three protein families (grol, gudd, and BADH) and filtered them for pairs that covary by opposite polarity of charge ([K or R] with [E or D]). The resulting sets of pairs had moderate improvements in physical proximity compared to using mutual information alone. This suggests a potentially new approach of applying mutual information in tandem with chemical properties to improve contact prediction. Introduction Residue pairs which share a high degree of covariation can offer insights concerning the structure and function of a protein family. When changes in one residue is mirrored by compensatory changes in the other residue, this suggests that the residues have co-evolved. 1
2 Covariation can be used to infer residue pairs which are in physical contact or which have functional relationships. This can then be used to construct models of protein folding or mutational processes (Juan et al., 2013). While these inferences have become more widely accessible with the proliferation of sequence data, they are imperfect. Apart from random background noise, there are a variety of confounding factors by which residue pairs can exhibit a spurious degree of covariation. Such spurious relationships can come about, e.g., through a transitive chain of direct interactions, or because of the presence of a common lineage within the sequence data (Marks et al., 2012). Covariational methods are split broadly into two groups: local and global (Marks et al., 2012). While global methods are typically more complex and avoid the problem of transitivity by modeling sequences as a whole, local methods are relatively simpler computationally and allow us to easily isolate for the contributions of each amino acid pairing, and thereby to augment the method to more heavily favor those pairings which correspond to meaningful chemical interactions. Corrected Mutual Information In this paper, we focus on one commonly used local method called mutual information. In particular, we use a variant which corrects for a portion of the background noise including the impact of common lineage. Mutual information is a way to quantify the dependence of two random variables, or how much knowing one variable can tell you about the other. Concretely, suppose X and Y are positions in a multiple sequence alignment (MSA). Then the mutual information of X and Y is: MI(X, Y ) = p(x, y) log y Y x X ( ) p(x, y) p(x)p(y) 2
3 Like many other local methods, mutual information revolves around a comparison of the observed distribution of pairs (p(x, y)) with the expected distribution under an assumption of independence (p(x)p(y)). The contribution of each amino acid pairing (or combination of x and y) is additive, so as we make use of later, we can meaningfully distinguish particular pairings which have an out-sized effect on the overall mutual information score. Introduced by Dunn et al. (2008), corrected mutual information (cmi) is a minor variation which subtracts away a quantity called the average product correction (APC). The APC of positions X and Y is defined as: AP C(X, Y ) = MI(X, )MI(, Y ) MI where, if we allow n to be the length of the MSA, MI = 2 na=1 nb A MI(A, B) n(n 1) is the average of MI scores across all position pairs, and MI(X, ) = 1 nb X MI(X, B) is n 1 the average of MI scores among position pairs featuring X. And finally, we have: cmi(x, Y ) = MI(X, Y ) AP C(X, Y ) By incorporating a positional averaging, the APC under certain theoretical assumptions approximates the amount of background noise or shared ancestry inherent in its input positions, which is then subtracted away from the MI to produce the cmi, which has been empirically shown to be a more accurate measure of the covariation from structural or functional constraints (Dunn et al., 2008). Methods The selection of protein families on which to test our hypothesis was constrained by finding proteins with: ample sequence length, a representative hidden markov model (HMM), a PDB 3
4 crystal structure with multiple chains, and a sufficient number of representatives in our sequence database. With these criteria in mind, we selected and tested on three protein families: grol, gudd, and BADH. In this section we focus on giving a detailed breakdown of the process for grol. Chaperonin GroEL (grol) is a protein found in bacteria which is involved with protein folding. It has an HMM of length 526, denoted TIGR02348 in the TIGRFAMs database. It has a PDB crystal structure of length 525, denoted 2EU1 in the Protein Data Bank. The crystal structure consists of 14 different chains. Physical distance within the crystal structure was defined as the minimum distance between atoms in the r-groups. For glycine, which has no r-group, we used the location of its c-alpha atom. We further refined the notion of distance between two positions to be the minimum distance for those positions across combinations of the 14 approximately identical chains. Using HMMER 3.0, we performed an HMM search against the Uniref full database which found over 5000 sequences above the trusted cut-off, a number of sequences well surpassing the minimum for MI methods (Gloor et al., 2005). We aligned the crystal structure sequence against the HMM as well, in order to translate residue positions from HMM to crystal structure, and vice versa. With the 5000 sequences as our data set, we computed cmi for each pair of positions from the 526. We selected those pairs above the 99.5th percentile in cmi, or the top 687 pairs of positions. For each pair, we translated them to the crystal structure and computed their physical distance. We then generated 196 pairs of positions taken at random, and computed their physical distance. We also generated 99 pairs of adjacent positions, meaning they are within 5 positions apart on the sequence. Intuitively, we would expect the high cmi positions, since they are covarying, to be closer in distance than the random positions. We would also expect the 4
5 adjacent positions to be fairly close in distance since they are near in terms of sequence order. Finally, with regards our hypothesis, we wished to filter the high cmi positions for covariations from amino acids having opposite polarity. We selected high cmi pairs for which [K or R] with [E or D] interactions accounted for at least 15% of their MI score. Since opposite polarities have a physical explanation for covarying, we expected those positions to be closer. Results We first plotted the three sets of position pairs: high cmi, random, and adjacent, according to their physical distance (in angstroms) versus their cmi score (note that while regular MI is always non-negative, it is possible for cmi to be negative). Figure 1: 5
6 We see in Figure 1 that the random positions are spread fairly evenly in terms of distance, from 0 angstroms apart to over 60 angstroms apart, with a mean distance of Meanwhile, they are clustered tightly around 0 in cmi score. The high cmi positions, naturally, only occur on the right side of the graph, with a clear separation from the random positions. While still fairly spread out in terms of distance, the high cmi positions are weighted more heavily towards the lower side of the graph, indicating their closer distances while having a mean of The adjacent positions are salient in the uniformity of their closer distances with a mean of We then plotted within the high cmi group, highlighting those with covariation coming from opposite polarity amino acids (KR-ED correlation). Figure 2: Figure 3: We see in Figure 2 that of the 687 high cmi positions, 112 of them were found to be KR-ED correlated. The KR-ED correlated pairs do indeed have a closer mean distance (12.88) compared to the overall high cmi (19.04). However, of the 112 KR-ED correlated pairs, we found that 38 of them were sequentially adjacent. This is a fairly high percentage. And as we know from the previous Figure 1, adjacent pairs are almost always close in distance. So, what portion of the improved distance in KR-ED correlated pairs could be explained by the sequen- 6
7 tially adjacent? In Figure 3, we excluded the sequentially adjacent and plotted 576 non-adjacent high cmi positions alongside 74 non-adjacent KR-ED correlated positions. The former had a mean of 21.71, while the latter had a mean of 16.98, meaning the difference was diminished slightly, but still presents a moderate improvement. The overall results we presented thus far for grol hold true for the other two protein families, gudd and BADH, as well, and their figures are included in the supplemental section. Discussion and Future Work We have seen that filtering for opposite polarity of covarying amino acid pairs improves upon a mere reliance of high cmi score, even when removing the effects of selecting for sequentially adjacent pairs. The improvement effect is moderate, on an averaged basis, and by no means gives a guaranteed set of contacting pairs. In this study we focused exclusively on opposite polarity. In the course of the study we did attempt to filter by other interactions, such as those for hydrophobic affiliation, but obtained weak or mixed results. We also ran through the most common specific pairings found for close positions versus far positions but did not detect a discernibly generalizable pattern. Nevertheless, this line of investigation may hold untapped potential in leading to additional, more fine-grained filters based on the chemical properties of covariational patterns, and thereby augmenting local methods like mutual information in producing more accurate contact predictions. Acknowledgments I am extremely grateful to Dr. Jeremy Selengut for his generous guidance and expertise while working on this project; to members of the CBCB including Dr. Mihai Pop, Jay Ghurye, and Nidhi Shah for their helpful insights during conversations; and to IARPA for providing the 7
8 funding for this project to occur. References and Notes 1. Dunn, S.D., Wahl, L.M., Gloor, G.B. (2008) Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics, Volume 24, Issue 3, Pages Gloor, Gregory B., Martin, Louise C., Wahl, Lindi M., Dunn, Stanley D. (2005) Mutual Information in Protein Multiple Sequence Alignments Reveals Two Classes of Coevolving Positions. Biochemistry 44 (19), Pages Juan, David de, Pazos, Florencio, Valencia, Alfonso. (2013) Emerging methods in protein co-evolution. Nature Review Genetics 14, Pages Marks, Debora S., Hopf, Thomas A., Sander, Chris. (2012) Protein structure prediction from sequence variation. Nature Biotechnology 30, Pages
9 Supplemental 9
10 10
Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationThe Evolutionary Origins of Protein Sequence Variation
Temple University Structural Bioinformatics II The Evolutionary Origins of Protein Sequence Variation Protein Evolution (tour of concepts & current ideas) Protein Fitness, Marginal Stability, Compensatory
More informationPotts Models and Protein Covariation. Allan Haldane Ron Levy Group
Temple University Structural Bioinformatics II Potts Models and Protein Covariation Allan Haldane Ron Levy Group Outline The Evolutionary Origins of Protein Sequence Variation Contents of the Pfam database
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationProtein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror
Protein structure prediction CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror 1 Outline Why predict protein structure? Can we use (pure) physics-based methods? Knowledge-based methods Two major
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationSupporting Information
Supporting Information I. INFERRING THE ENERGY FUNCTION We downloaded a multiple sequence alignment (MSA) for the HIV-1 clade B Protease protein from the Los Alamos National Laboratory HIV database (http://www.hiv.lanl.gov).
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationCAP 5510 Lecture 3 Protein Structures
CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity
More informationProtein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche
Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its
More informationwhere Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.
Notes on regression analysis 1. Basics in regression analysis key concepts (actual implementation is more complicated) A. Collect data B. Plot data on graph, draw a line through the middle of the scatter
More informationMotif Prediction in Amino Acid Interaction Networks
Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationUsing Information Theory to Reduce Complexities of Neural Networks in Protein Secondary Structure Prediction
Boon Thye Thomas Yeo 4662474 Biochem 218 Final Project Using Information Theory to Reduce Complexities of Neural Networks in Protein Secondary Structure Prediction 1 Abstract Neural networks are commonly
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationJeremy Chang Identifying protein protein interactions with statistical coupling analysis
Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationComparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain, Rensselaer Polytechnic Institute
Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae Emily Germain, Rensselaer Polytechnic Institute Mentor: Dr. Hugh Nicholas, Biomedical Initiative, Pittsburgh Supercomputing
More informationSAM Teacher s Guide Protein Partnering and Function
SAM Teacher s Guide Protein Partnering and Function Overview Students explore protein molecules physical and chemical characteristics and learn that these unique characteristics enable other molecules
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationConditional probabilities and graphical models
Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within
More informationIntroduction to spectral alignment
SI Appendix C. Introduction to spectral alignment Due to the complexity of the anti-symmetric spectral alignment algorithm described in Appendix A, this appendix provides an extended introduction to the
More informationProtein Structure Determination from Pseudocontact Shifts Using ROSETTA
Supporting Information Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Christophe Schmitz, Robert Vernon, Gottfried Otting, David Baker and Thomas Huber Table S0. Biological Magnetic
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More informationMotifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC
Motifs, Profiles and Domains Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Comparing Two Proteins Sequence Alignment Determining the pattern of evolution and identifying conserved
More informationStructural biomathematics: an overview of molecular simulations and protein structure prediction
: an overview of molecular simulations and protein structure prediction Figure: Parc de Recerca Biomèdica de Barcelona (PRBB). Contents 1 A Glance at Structural Biology 2 3 1 A Glance at Structural Biology
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationConditional Random Fields: An Introduction
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 2-24-2004 Conditional Random Fields: An Introduction Hanna M. Wallach University of Pennsylvania
More informationNeural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More informationCHAPTER 5 SOUND LEVEL MEASUREMENT ON POWER TRANSFORMERS
CHAPTER 5 SOUND LEVEL MEASUREMENT ON POWER TRANSFORMERS 5.1 Introduction Sound pressure level measurements have been developed to quantify pressure variations in air that a human ear can detect. The perceived
More informationMolecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007
Molecular Modeling Prediction of Protein 3D Structure from Sequence Vimalkumar Velayudhan Jain Institute of Vocational and Advanced Studies May 21, 2007 Vimalkumar Velayudhan Molecular Modeling 1/23 Outline
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationStructure learning in human causal induction
Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use
More informationQuantifying sequence similarity
Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity
More informationWarm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab
Date: Agenda Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab Ask questions based on 5.1 and 5.2 Quiz on 5.1 and 5.2 How
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More informationPairwise sequence alignments
Pairwise sequence alignments Volker Flegel VI, October 2003 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs VI, October
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationLocal Alignment Statistics
Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison
More informationProtein Structure Prediction, Engineering & Design CHEM 430
Protein Structure Prediction, Engineering & Design CHEM 430 Eero Saarinen The free energy surface of a protein Protein Structure Prediction & Design Full Protein Structure from Sequence - High Alignment
More informationIn order to compare the proteins of the phylogenomic matrix, we needed a similarity
Similarity Matrix Generation In order to compare the proteins of the phylogenomic matrix, we needed a similarity measure. Hamming distances between phylogenetic profiles require the use of thresholds for
More informationQuantitative Stability/Flexibility Relationships; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 12
Quantitative Stability/Flexibility Relationships; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 12 The figure shows that the DCM when applied to the helix-coil transition, and solved
More informationObjectives. Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae. Emily Germain 1,2 Mentor Dr.
Comparison and Analysis of Heat Shock Proteins in Organisms of the Kingdom Viridiplantae Emily Germain 1,2 Mentor Dr. Hugh Nicholas 3 1 Bioengineering & Bioinformatics Summer Institute, Department of Computational
More informationImproving De novo Protein Structure Prediction using Contact Maps Information
CIBCB 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology Improving De novo Protein Structure Prediction using Contact Maps Information Karina Baptista dos Santos
More informationMutual Information & Genotype-Phenotype Association. Norman MacDonald January 31, 2011 CSCI 4181/6802
Mutual Information & Genotype-Phenotype Association Norman MacDonald January 31, 2011 CSCI 4181/6802 2 Overview What is information (specifically Shannon Information)? What are information entropy and
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationSome Problems from Enzyme Families
Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems
More information8/04/2011. last lecture: correlation and regression next lecture: standard MR & hierarchical MR (MR = multiple regression)
psyc3010 lecture 7 analysis of covariance (ANCOVA) last lecture: correlation and regression next lecture: standard MR & hierarchical MR (MR = multiple regression) 1 announcements quiz 2 correlation and
More informationLecture Notes: Markov chains
Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity
More informationDetecting unfolded regions in protein sequences. Anne Poupon Génomique Structurale de la Levure IBBMC Université Paris-Sud / CNRS France
Detecting unfolded regions in protein sequences Anne Poupon Génomique Structurale de la Levure IBBMC Université Paris-Sud / CNRS France Large proteins and complexes: a domain approach Structural studies
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More information114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009
114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 9 Protein tertiary structure Sources for this chapter, which are all recommended reading: D.W. Mount. Bioinformatics: Sequences and Genome
More informationIEEE TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 1. Graphical Models of Residue Coupling in Protein Families
IEEE TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 1 Graphical Models of Residue Coupling in Protein Families John Thomas, Naren Ramakrishnan, and Chris Bailey-Kellogg Abstract Many statistical
More informationSequence Analysis and Databases 2: Sequences and Multiple Alignments
1 Sequence Analysis and Databases 2: Sequences and Multiple Alignments Jose María González-Izarzugaza Martínez CNIO Spanish National Cancer Research Centre (jmgonzalez@cnio.es) 2 Sequence Comparisons:
More informationLECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS
LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationThe Markov equivalence class of the mediation model. Felix Thoemmes, Wang Liao, & Marina Yamasaki Cornell University
The Markov equivalence class of the mediation model Felix Thoemmes, Wang Liao, & Marina Yamasaki Cornell University Mediation model 2 Mediation model If M is a mediator, then indirect effect should be
More informationPairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )
Pairwise sequence alignments Vassilios Ioannidis (From Volker Flegel ) Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs Importance
More informationInquiry Calculus and the Issue of Negative Higher Order Informations
Article Inquiry Calculus and the Issue of Negative Higher Order Informations H. R. Noel van Erp, *, Ronald O. Linger and Pieter H. A. J. M. van Gelder,2 ID Safety and Security Science Group, TU Delft,
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationPresentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy
Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy Burkhard Rost and Chris Sander By Kalyan C. Gopavarapu 1 Presentation Outline Major Terminology Problem Method
More informationOrientational degeneracy in the presence of one alignment tensor.
Orientational degeneracy in the presence of one alignment tensor. Rotation about the x, y and z axes can be performed in the aligned mode of the program to examine the four degenerate orientations of two
More informationForecasting Wind Ramps
Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators
More informationHidden Markov Models
Hidden Markov Models A selection of slides taken from the following: Chris Bystroff Protein Folding Initiation Site Motifs Iosif Vaisman Bioinformatics and Gene Discovery Colin Cherry Hidden Markov Models
More informationOn the errors introduced by the naive Bayes independence assumption
On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of
More information08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega
BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related
More informationAnalysis and Prediction of Protein Structure (I)
Analysis and Prediction of Protein Structure (I) Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng
More informationCHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the
CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful
More informationIntroduction to" Protein Structure
Introduction to" Protein Structure Function, evolution & experimental methods Thomas Blicher, Center for Biological Sequence Analysis Learning Objectives Outline the basic levels of protein structure.
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang
More informationBasics of protein structure
Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu
More informationCNV Methods File format v2.0 Software v2.0.0 September, 2011
File format v2.0 Software v2.0.0 September, 2011 Copyright 2011 Complete Genomics Incorporated. All rights reserved. cpal and DNB are trademarks of Complete Genomics, Inc. in the US and certain other countries.
More informationPacking of Secondary Structures
7.88 Lecture Notes - 4 7.24/7.88J/5.48J The Protein Folding and Human Disease Professor Gossard Retrieving, Viewing Protein Structures from the Protein Data Base Helix helix packing Packing of Secondary
More informationApplications of Hidden Markov Models
18.417 Introduction to Computational Molecular Biology Lecture 18: November 9, 2004 Scribe: Chris Peikert Lecturer: Ross Lippert Editor: Chris Peikert Applications of Hidden Markov Models Review of Notation
More informationLecture 18 Generalized Belief Propagation and Free Energy Approximations
Lecture 18, Generalized Belief Propagation and Free Energy Approximations 1 Lecture 18 Generalized Belief Propagation and Free Energy Approximations In this lecture we talked about graphical models and
More informationHOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.
HOMOLOGY MODELING Homology modeling, also known as comparative modeling of protein refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationProtein Structure: Data Bases and Classification Ingo Ruczinski
Protein Structure: Data Bases and Classification Ingo Ruczinski Department of Biostatistics, Johns Hopkins University Reference Bourne and Weissig Structural Bioinformatics Wiley, 2003 More References
More informationPOPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics
POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the
More informationCladistics and Bioinformatics Questions 2013
AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species
More informationAnalysis of the 500 mb height fields and waves: testing Rossby wave theory
Analysis of the 500 mb height fields and waves: testing Rossby wave theory Jeffrey D. Duda, Suzanne Morris, Michelle Werness, and Benjamin H. McNeill Department of Geologic and Atmospheric Sciences, Iowa
More informationTable S1. Primers used for the constructions of recombinant GAL1 and λ5 mutants. GAL1-E74A ccgagcagcgggcggctgtctttcc ggaaagacagccgcccgctgctcgg
SUPPLEMENTAL DATA Table S1. Primers used for the constructions of recombinant GAL1 and λ5 mutants Sense primer (5 to 3 ) Anti-sense primer (5 to 3 ) GAL1 mutants GAL1-E74A ccgagcagcgggcggctgtctttcc ggaaagacagccgcccgctgctcgg
More informationBuilding a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor
Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor Presented by Stephanie Lee Research Mentor: Dr. Rob Coalson Glycine Alpha 1 Receptor (GlyRa1) Member of the superfamily
More information1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)
Protein structure databases; visualization; and classifications 1. Introduction to Protein Data Bank (PDB) 2. Free graphic software for 3D structure visualization 3. Hierarchical classification of protein
More informationSystematic strategies for real time filtering of turbulent signals in complex systems
Systematic strategies for real time filtering of turbulent signals in complex systems Statistical inversion theory for Gaussian random variables The Kalman Filter for Vector Systems: Reduced Filters and
More informationBig Data Analysis with Apache Spark UC#BERKELEY
Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»
More informationProtein sequences as random fractals
J. Biosci., Vol. 18, Number 2, June 1993, pp 213-220. Printed in India. Protein sequences as random fractals CHANCHAL K MITRA* and MEETA RANI School of Life Sciences, University of Hyderabad, Hyderabad
More informationGLOSSARY. a n + n. a n 1 b + + n. a n r b r + + n C 1. C r. C n
GLOSSARY A absolute cell referencing A spreadsheet feature that blocks automatic adjustment of cell references when formulas are moved or copied. References preceded by a dollar sign $A$1, for example
More informationChapter Chemical Uniqueness 1/23/2009. The Uses of Principles. Zoology: the Study of Animal Life. Fig. 1.1
Fig. 1.1 Chapter 1 Life: Biological Principles and the Science of Zoology BIO 2402 General Zoology Copyright The McGraw Hill Companies, Inc. Permission required for reproduction or display. The Uses of
More informationSecondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure
Bioch/BIMS 503 Lecture 2 Structure and Function of Proteins August 28, 2008 Robert Nakamoto rkn3c@virginia.edu 2-0279 Secondary Structure Φ Ψ angles determine protein structure Φ Ψ angles are restricted
More informationAlpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University
Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and
More informationSTRUCTURAL BIOINFORMATICS I. Fall 2015
STRUCTURAL BIOINFORMATICS I Fall 2015 Info Course Number - Classification: Biology 5411 Class Schedule: Monday 5:30-7:50 PM, SERC Room 456 (4 th floor) Instructors: Vincenzo Carnevale - SERC, Room 704C;
More informationS H/T 0 ph = log([h + ]) E = mc 2 S = klnw G = H T S ph = pk a + log([a ]/[HA]) K a = [H + ][A ]/[HA] G = RTlnK eq e iπ + 1 = 0
Biochemistry 463, Summer II Your Name: University of Maryland, College Park Your SID #: Biochemistry and Physiology Prof. Jason Kahn Exam I (100 points total) July 27, 2007 You have 80 minutes for this
More informationProtein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror
Protein structure prediction CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror 1 Outline Why predict protein structure? Can we use (pure) physics-based methods? Knowledge-based methods Two major
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/32 Lecture 5a Bayesian network April 14, 2016 2/32 Table of contents 1 1. Objectives of Lecture 5a 2 2.Bayesian
More information