Bioinformatics: Secondary Structure Prediction
- Dayna Terry
- 5 years ago
Transcription
1 Bioinformatics: Secondary Structure Prediction Prof. David Jones
2 LMLSTQNPALLKRNIIYWNNVALLWEAGSD The greatest unsolved problem in molecular biology: the Protein Folding Problem?
3 Why predict structure? Growth of sequence and structure databases [Figure: number of entries vs. year for sequences (including metagenomics sequences) and structures]
4 Biological Information Flow: Gene (DNA) → Protein → 3D structure → Function. The unique 3D structure of a protein determines its biochemical function.
5 Protein Secondary Structure HELIX STRAND COIL
6 Secondary Structure Prediction. INPUT: 20-letter amino acid alphabet (e.g. LMLSTQNPALLKRNIIYWNNVALLWEAGSD). OUTPUT: 3-letter secondary structure alphabet (e.g. LMLSTQNPALL → HHHEEEECCCC).
7 Goal: take the primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each residue.
8 1st Generation Methods: based on statistical analysis of single amino acid properties. Examples: Chou & Fasman (1974); Lim (1974); Garnier, Osguthorpe & Robson (1978).
9 Structural Propensities: due to the size, shape and charge of its side chain, each amino acid may fit better in one type of secondary structure than another. Classic example: the rigidity and side-chain angle of proline cannot be accommodated in an α-helical structure.
10 Structural Propensities: there are two ways to view the significance of this preference (or propensity). It may control or affect the folding of the protein in its immediate vicinity (amino acid determines structure), or it may constitute selective pressure to use particular amino acids in regions that must have a particular structure (structure determines amino acid).
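11 Chou-Fasman method: uses a table of conformational parameters (propensities) derived from the frequencies of each amino acid in helices, sheets and turns in proteins of known X-ray structure. The table holds one likelihood for each structure type for each amino acid. For amino acid type A (e.g. leucine) and structure type S (e.g. α-helix), the propensity score is calculated as:
$$P_S(A) = \frac{p(A \mid S)}{p(A)} = \frac{n_{A,S}/n_S}{n_A/n} = \frac{n_{A,S}\, n}{n_A\, n_S}$$
where $n_{A,S}$ is the number of residues of type A found in structure S, $n_A$ the number of residues of type A, $n_S$ the number of residues in structure S, and $n$ the total number of residues.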
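The propensity formula can be checked with a few lines of Python. This is a minimal sketch; the function name `propensity` and the toy counts are hypothetical, not values from Chou & Fasman's table. It uses the last form of the formula, $n_{A,S}\,n / (n_A\,n_S)$, which avoids dividing two small fractions.

```python
def propensity(n_AS: int, n_A: int, n_S: int, n: int) -> float:
    """Chou-Fasman-style propensity P_S(A) = p(A|S) / p(A).

    n_AS: residues of type A found in structure S
    n_A:  residues of type A in the whole dataset
    n_S:  residues in structure S
    n:    all residues in the dataset
    """
    # (n_AS / n_S) / (n_A / n), rearranged to a single division
    return (n_AS * n) / (n_A * n_S)


# Hypothetical toy dataset: 1000 residues, 300 of them helical,
# 80 leucines, 36 of which are in helices.
print(propensity(n_AS=36, n_A=80, n_S=300, n=1000))  # 1.5
```

A value above 1 means the amino acid is over-represented in that structure relative to its background frequency; here leucine would be a helix former in the toy data.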
12 Chou-Fasman propensities (partial table) [Table: Pα (helix), Pβ (sheet) and Pt (turn) for Glu, Met, Ala, Val, Ile, Tyr, Pro, Gly]
13 Chou-Fasman method: a prediction is made for each type of structure for each amino acid. This can result in ambiguity if a region has high propensities for both helix and sheet (the higher value is usually chosen, with exceptions).
14 Chou-Fasman method: the calculation rules are somewhat ad hoc. Example, the method for helix: search for a nucleating region where 4 out of 6 a.a. have Pα > 1.03; extend it until 4 consecutive a.a. have an average Pα < 1.00; if the region is at least 6 a.a. long, has an average Pα > 1.03, and its average Pα > average Pβ, consider the region to be helix.
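The helix rule above can be sketched as follows. This is a simplified illustration that implements only the nucleation, extension and acceptance steps listed on the slide (no breaker residues, turns or helix/sheet overlap resolution); the function name and the two-letter propensity table in the usage example are hypothetical, not real Chou-Fasman values.

```python
def mean(values):
    return sum(values) / len(values)


def predict_helices(seq, p_alpha, p_beta):
    """Mark helical residues ('H') using the slide's helix rule; others stay 'C'."""
    n = len(seq)
    pa = [p_alpha[aa] for aa in seq]
    pb = [p_beta[aa] for aa in seq]
    states = ["C"] * n
    for i in range(n - 5):
        # 1. Nucleation: at least 4 of 6 consecutive residues with P(alpha) > 1.03
        if sum(p > 1.03 for p in pa[i:i + 6]) < 4:
            continue
        start, end = i, i + 6  # half-open region [start, end)
        # 2. Extend right until the tetrapeptide ending at the new residue averages < 1.00
        while end < n and mean(pa[end - 3:end + 1]) >= 1.00:
            end += 1
        # ... and extend left in the same way
        while start > 0 and mean(pa[start - 1:start + 3]) >= 1.00:
            start -= 1
        # 3. Accept: length >= 6, <P(alpha)> > 1.03 and <P(alpha)> > <P(beta)>
        region_a, region_b = pa[start:end], pb[start:end]
        if end - start >= 6 and mean(region_a) > 1.03 and mean(region_a) > mean(region_b):
            for k in range(start, end):
                states[k] = "H"
    return "".join(states)


# Hypothetical two-letter "alphabet": A is a strong helix former, G is not.
toy_p_alpha = {"A": 1.4, "G": 0.5}
toy_p_beta = {"A": 0.8, "G": 0.8}
print(predict_helices("GGGAAAAAAGGG", toy_p_alpha, toy_p_beta))  # CHHHHHHHHHHC
```

Note how the extension step lets the helix spill over some flanking G residues: this is the kind of ad hoc behaviour the slide refers to.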
15 2nd Generation Methods: based on peptide segments / residue pairs. Examples: GOR III (1987). The BIG NEWS, however, was the appearance of the first examples of MACHINE LEARNING in secondary structure prediction. Neural networks: Qian & Sejnowski (1988), Bohr et al. (1988), Holley & Karplus (1989).
16 Neural Networks: originally, neural networks were developed as simple models of brain function, i.e. they were intended to be simulations of real networks of neurons; hence the term Artificial Neural Network. These simple models are now obsolete in neuroscience research, but they have become very useful tools for finding patterns in data.
17 Artificial Neural Networks [Figure: inputs → network → output] An artificial neural network (ANN) is made up of many switching units (artificial neurons) that are connected together according to a specific network architecture. The objective of an artificial neural network is to learn how to transform inputs into meaningful outputs.
18 Training and Using ANNs: we start with completely random connection weights. Then we present an input pattern to the network and compare the output we get with what it should be. If the output is correct, we don't need to change any weights; if the output is WRONG, we make small changes to the connection weights so as to reduce the ERROR. We repeat this for all the examples we have, and keep repeating over all the data until we see no further improvement.
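The training loop above can be sketched for the simplest possible network, a single threshold unit trained with the classic perceptron rule. This is a swapped-in illustration, not the back-propagation used by the multi-layer networks discussed later; the function name, learning rate and toy AND dataset are hypothetical.

```python
import random


def train_perceptron(examples, n_inputs, lr=0.1, max_epochs=1000, seed=0):
    """Train a single threshold unit with the perceptron rule.

    examples: list of (inputs, target) pairs, target in {0, 1}
    Returns the weights (last one is the bias) and a predict function.
    """
    rng = random.Random(seed)
    # Start from random connection weights; the extra weight acts as a bias
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]

    def predict(x):
        s = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        return 1 if s > 0 else 0

    for _ in range(max_epochs):
        errors = 0
        for x, target in examples:
            error = target - predict(x)
            if error == 0:
                continue                   # output correct: leave weights alone
            errors += 1
            for i, xi in enumerate(x):
                w[i] += lr * error * xi    # small change that reduces the error
            w[-1] += lr * error
        if errors == 0:                    # a full pass with no mistakes: stop
            break
    return w, predict


# Usage on a toy, linearly separable task (logical AND)
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, predict = train_perceptron(and_data, n_inputs=2)
print([predict(x) for x, _ in and_data])  # [0, 0, 0, 1]
```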
19 Training and Testing Sets: the data we use to adjust the network weights is called the TRAINING SET. To make sure we are not over-fitting our network to the training set, we should test the network on a completely separate TESTING SET. Splitting data into training and testing sets, and in CROSS-VALIDATION repeating the split so that every example is tested once, is an important concept in statistics and machine learning.
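A k-fold cross-validation split can be sketched in a few lines. This is a minimal illustration with hypothetical chain IDs; for secondary structure data it is safer to split by protein chain rather than by residue, so that residues from one chain never appear in both the training and the testing set.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (train, test) lists so that every item is tested exactly once."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


protein_ids = [f"chain{i}" for i in range(10)]  # hypothetical chain IDs
for train, test in k_fold_splits(protein_ids, k=5):
    print(len(train), len(test))  # prints "8 2" for each of the 5 folds
```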
20 Some real data... [Table: input columns A R N D C Q E G H I L K M F P S T W Y V; desired outputs, one state per residue: Coil Strand Coil Strand Strand Helix Helix Coil Strand Helix Helix Helix Strand Helix Helix Strand Strand Strand Coil Helix Helix Coil Coil Coil... and so on]
21 A Better Scheme for Predicting Secondary Structure by Machine Learning: a window of 15 residues (e.g. from MLSPQAMSDFHEELKWLLCNIPGQKLASLANREYT) is fed to a classifier (neural network), which outputs Helix, Strand or Coil. We are predicting the secondary structure for the central residue of the window.
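Cutting a sequence into one window per residue can be sketched as below. This assumes an odd window size and uses a hypothetical padding character "-" for positions that fall off either end of the chain.

```python
def windows(seq, size=15, pad="-"):
    """Return one window per residue, centred on that residue (size must be odd)."""
    half = size // 2
    padded = pad * half + seq + pad * half
    # padded[i + half] is seq[i], so the slice below is centred on seq[i]
    return [padded[i:i + size] for i in range(len(seq))]


seq = "MLSPQAMSDFHEELKWLLCNIPGQKLASLANREYT"
w = windows(seq)
print(w[0])   # -------MLSPQAMS
print(len(w) == len(seq))  # True
```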
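22 Representation: the usual representation is to use 21 inputs per amino acid, one for each of the 20 amino acid types (e.g. Ala and Val each switch on a different single input) plus one that flags window positions beyond the end of the chain.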
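The 21-inputs-per-residue encoding can be sketched like this. The amino acid order follows the column order used on the "Some real data" slide; which input is assigned to which amino acid is an assumption, as is the "-" padding character for chain ends.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 standard residues, order assumed


def encode_window(window, pad="-"):
    """Encode a window as len(window) x 21 binary inputs.

    Inputs 0-19 flag the amino acid type; input 20 flags a position
    beyond the end of the chain. Non-standard letters raise ValueError.
    """
    vector = []
    for aa in window:
        bits = [0] * 21
        if aa == pad:
            bits[20] = 1
        else:
            bits[AMINO_ACIDS.index(aa)] = 1
        vector.extend(bits)
    return vector


x = encode_window("-------MLSPQAMS")
print(len(x))  # 315 = 15 positions x 21 inputs
```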
23 3rd Generation Neural Network Methods. PHD: Rost, B. & Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232. PSIPRED: Jones, D. T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292.
24 3rd Generation Methods: exploit evolutionary information, based on conservation analysis of multiple sequence alignments (HMMs or profiles). They extract some long-range information via accessibility patterns: conserved hydrophobic residues → BURIED; variable polar residues → EXPOSED.
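The conserved-hydrophobic / variable-polar idea can be illustrated on single alignment columns. This is a crude, hypothetical heuristic, not how PHD or PSIPRED work internally: it swaps in Shannon entropy as the conservation measure, and the hydrophobic set and entropy cut-off are assumptions.

```python
import math
from collections import Counter

HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, gaps ignored.

    Assumes the column contains at least one non-gap residue.
    """
    residues = [aa for aa in column if aa != "-"]
    total = len(residues)
    return sum((c / total) * math.log2(total / c) for c in Counter(residues).values())


def guess_burial(column, max_entropy=1.0):
    """Conserved + hydrophobic -> 'buried'; variable + polar -> 'exposed'."""
    residues = [aa for aa in column if aa != "-"]
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in residues) / len(residues)
    conserved = column_entropy(column) <= max_entropy
    if conserved and hydrophobic_fraction >= 0.5:
        return "buried"
    if not conserved and hydrophobic_fraction < 0.5:
        return "exposed"
    return "unclear"


print(guess_burial("LLLLIL"))  # buried
print(guess_burial("KDESNQ"))  # exposed
```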
25 Variability and Hydrophobicity of Amino Acids [Figure: water-soluble proteins: variable & polar vs. conserved & hydrophobic positions; transmembrane proteins: variable & very hydrophobic vs. conserved & hydrophobic positions]
26 Pros & Cons of 3rd Generation Methods. PROs: high residue accuracy; less underprediction of strands; good-quality segment predictions. CONs: provides a prediction for the FAMILY CONSENSUS structure, NOT THE STRUCTURE OF THE TARGET SEQUENCE.
27 PSIPRED: works directly on PSI-BLAST profiles (PSSMs) and uses 2 separate stages of neural networks. The first network predicts secondary structure; the second network cleans up the outputs of the first. Trained on profiles from ~3000 different proteins of known structure (taken from the PDB).
28 PSIPRED Neural Networks. 1st stage (4 networks): 15x21 input units, 75 hidden units, 3 output units (H/E/C). 2nd network: 15x4 = 60 input units (the 3 outputs from the 1st stage plus a chain-end flag per window position), 60 hidden units, 3 output units (H/E/C).
29 PSIPRED Method: raw profile from PSI-BLAST (log file) → position-specific scoring matrix (columns A R N D C Q E G H I L K M F P S T W Y V) → window of 15 rows × 20 scaled inputs (plus a chain-end flag per row) fed to the 1st network (315 inputs, 75 hidden units, 3 outputs) → window of 15 × 3 outputs (plus a chain-end flag per row) fed to the 2nd network (60 inputs, 60 hidden units, 3 outputs) → final 3-state prediction.
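Building the 315 first-network inputs from a PSSM can be sketched as follows. This assumes the PSSM is a list of rows of 20 log-odds scores and that scores are squashed into 0-1 with the standard logistic function; the function names are hypothetical and this is not PSIPRED's own code.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def first_net_inputs(pssm, centre, half_window=7):
    """Build 15 x 21 = 315 inputs for the residue at index `centre`.

    Each window position contributes 20 scaled profile scores plus
    1 chain-end flag (1.0 when the position lies outside the chain).
    """
    inputs = []
    for pos in range(centre - half_window, centre + half_window + 1):
        if 0 <= pos < len(pssm):
            inputs.extend(logistic(score) for score in pssm[pos])
            inputs.append(0.0)          # inside the chain
        else:
            inputs.extend([0.0] * 20)   # off the end: no profile scores
            inputs.append(1.0)          # chain-end flag
    return inputs


toy_pssm = [[0] * 20 for _ in range(10)]  # hypothetical 10-residue profile, all scores 0
inputs = first_net_inputs(toy_pssm, centre=0)
print(len(inputs))  # 315
```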
30 2nd Network: Filtering Raw Predictions from the 1st Network
Residues 49-62:  T L V R V P G S W E I P V A
Stage 1 output:  E E E E E C C C E H H H H H
Stage 2 output:  E E E E E C C H H H H H H H
Actual:          E E E E E C C H H H H H H H
31 Common Measures of Secondary Structure Prediction Accuracy: Q3 scores give the percentage of correctly predicted residues across the 3 states (H, E, C); this is the most commonly used measure. Other scores, such as the Matthews correlation coefficient (MCC), measure accuracy for individual states (coil, strand, helix) and are more sensitive to over-prediction. For example, if you predict all residues to be random coil, you will get a Q3 score of around 50% simply because around 50% of residues in proteins are in random coil; however, the MCC scores will be close to zero!
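Q3 is simple enough to compute directly. A minimal sketch, applied to the stage 1, stage 2 and actual states for residues 49-62 from the second-network filtering example; the function name is hypothetical.

```python
def q3(predicted, actual):
    """Percentage of residues whose 3-state (H/E/C) label is correct."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


actual = "EEEEECCHHHHHHH"   # residues 49-62
stage1 = "EEEEECCCEHHHHH"
stage2 = "EEEEECCHHHHHHH"
print(round(q3(stage1, actual), 1))  # 85.7
print(q3(stage2, actual))            # 100.0
```

In this example the second network fixes the two errors at residues 56 and 57, raising Q3 from 85.7% to 100%.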
32 Matthews Correlation Coefficient
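The MCC for one state is usually computed from the counts of true/false positives and negatives for that state, $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. A minimal sketch, treating each state one-vs-rest and returning 0 when the denominator is zero (a common convention, assumed here); the example strings are hypothetical.

```python
import math


def mcc(predicted, actual, state):
    """Matthews correlation coefficient for one state (one-vs-rest)."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == state and a == state:
            tp += 1
        elif p == state:
            fp += 1
        elif a == state:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


actual = "HHHCCCCCEE"      # hypothetical: 50% coil
all_coil = "C" * 10        # predict everything as coil -> Q3 = 50%
print([mcc(all_coil, actual, s) for s in "HEC"])  # [0.0, 0.0, 0.0]
print(mcc(actual, actual, "H"))                    # 1.0
```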
33 PSIPRED Benchmark Results [Figure: histogram, number vs. Q3 score (%)] Mean Q3 score: 77.8% (80.6% now)
34 Comparison of Generations by Average Q3 Scores [Chart: GEN 1: Chou & Fasman, GOR I (marked OBSOLETE!!); GEN 2: Qian & Sejnowski (first use of a neural network); GEN 3: PHD (1994), PSIPRED (current best method)]
35 Protein Fold Recognition
36 Sequence Comparison: > 30% identity between two protein sequences implies a probable common structure and possibly a common function. However, there are many exceptions to this rule of thumb.
1PLC  IDVLLGADDGSLAFVPSEFSISPGEKIVFKNNAGFPHNIVFDEDSIPS-GVDASKISM
1NIN  ETYTVKLGSDKGLLVFEPAKLTIKPGDTVEFLNNKVPPHNVVFDAALNPAKSADLAK-SL

1PLC  SEEDLLNAKGETFEVAL---SNKGEYSFYCSPHQGAGMVGKVTVN-
1NIN  SHKQLLMSPGQSTSTTFPADAPAGEYTFYCEPHRGAGMVGKITVAG
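Percentage identity can be computed from an aligned pair. This is a minimal sketch that counts identities over columns where neither sequence has a gap; other conventions (dividing by the shorter sequence or by the full alignment length) give different values. It is applied here to the second block of the 1PLC / 1NIN alignment.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


plc = "SEEDLLNAKGETFEVAL---SNKGEYSFYCSPHQGAGMVGKVTVN-"
nin = "SHKQLLMSPGQSTSTTFPADAPAGEYTFYCEPHRGAGMVGKITVAG"
print(percent_identity(plc, nin))  # 50.0 (21 identical out of 42 ungapped columns)
```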
37 Similar Sequence, Similar 3-D Structure: ribonuclease MC1 and ribonuclease Rh (RMSD = 2.1 Å, sequence identity = 30%)
38 Structure similarity
39 Prediction Methods, in order of increasing difficulty (and decreasing accuracy/reliability): comparative modelling, which requires a known fold + clear homology; fold recognition, which requires a known fold; ab initio / new fold methods, which require only the target sequence.
40 Tertiary Structure Prediction. BASIS: the native fold is expected to be the conformation of lowest energy (true for small molecules). IMPLICATION: the native fold can be found by defining a potential energy function and searching all conformations for the one with the lowest energy.
41 The Levinthal Paradox: for a protein sequence of length $l$, the total number of possible chain conformations $N$ is given by $N \approx 10^{l}$. Even if a protein were able to rearrange itself at the speed of light, it would take ~$10^{75}$ years to locate the global energy minimum for a 100-residue protein.
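The arithmetic behind this estimate can be reproduced roughly. The slide does not state its per-step assumptions, so this sketch assumes (hypothetically) 10 conformations per residue and that each conformational change moves atoms 1 Å at the speed of light; it works in log10 to avoid overflow.

```python
import math


def levinthal_log10_years(length, conformations_per_residue=10.0, step_distance_m=1e-10):
    """log10 of the years needed to visit every conformation once (rough estimate).

    Assumes conformations_per_residue ** length conformations and that each
    conformational change moves atoms step_distance_m at the speed of light.
    """
    speed_of_light = 2.998e8                      # m/s
    seconds_per_year = 365.25 * 24 * 3600
    log10_n = length * math.log10(conformations_per_residue)
    log10_seconds_per_step = math.log10(step_distance_m / speed_of_light)
    return log10_n + log10_seconds_per_step - math.log10(seconds_per_year)


print(round(levinthal_log10_years(100)))  # 74
```

Under these assumptions the search takes about $10^{74}$ years, the same order of magnitude as the slide's figure; the exact exponent depends on the assumed time per step.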
42 Fold recognition: a short cut to predicting protein tertiary structure. Although there is a vast number of possible protein structures, only a few have been observed in Nature. The chance of a newly solved structure having a previously unknown fold is only ~20%. So we might be able to predict protein structure by selecting from already observed folds (a multiple-choice version of the protein folding problem).
43 Protein Structure Prediction by Threading
44 An objective function for protein folding: as yet there is no accurate physical model for protein folding. Physics-based force fields are not able to properly handle entropic solvent effects, so we cannot rely on classical physics. Can we define a statistical model? Can we estimate P(Structure | Sequence)?
45 Original structure (fragment) [Figure: native threading of the fragment T G P A S K D I Q; opposite charges stabilise the fold]
46 Threading alignment 1 [Figure: sequence - W R T M E D Y S threaded onto the structure] Ouch! Charges of the same sign repel!!
47 Threading alignment 2 [Figure: sequence - W - T M E R Y S threaded onto the structure] Much better: opposite charges again!
48 A Scoring Function for Threading: we want a way of assessing the energy of a model, i.e. a model based on an alignment of a sequence with a structure. Energy functions based on physics do not work, so let's use a KNOWLEDGE-BASED APPROACH: look at known structures and see how often particular features (e.g. contacts between amino acids) occur. In other words, we calculate probabilities.
49 Converting Probabilities to Potentials: probabilities are inconvenient for computer algorithms because they must be multiplied, whereas additive quantities (e.g. energies) are easier to handle. So for many applications it is common to transform probabilities into energy-like quantities.
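One practical reason for the switch is numerical. Multiplying many small probabilities underflows double-precision floating point, while summing negative log-probabilities stays well behaved. A minimal sketch with 200 hypothetical, independent features of probability 0.01 each:

```python
import math

probabilities = [0.01] * 200          # hypothetical independent features

product = 1.0
for p in probabilities:
    product *= p                      # true value 1e-400 is below the float range
print(product)                        # 0.0

energy_like = sum(-math.log(p) for p in probabilities)  # additive score
print(round(energy_like, 1))          # 921.0
```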
50 The Inverse Boltzmann Principle: the basic assumption in generating empirical potentials from probabilities is the so-called inverse Boltzmann principle. According to the Boltzmann principle, the probability of occurrence of a given conformational state of energy $E$ scales with the Boltzmann factor $e^{-E/RT}$, where $R$ is the gas constant ($1.987 \times 10^{-3}$ kcal mol$^{-1}$ K$^{-1}$) and $T$ is the absolute temperature (e.g. room temperature).
51 Potentials of Mean Force: count interactions of a given type (e.g. alanine β-carbon to serine α-carbon) in real protein structures. Count interactions of the same type in randomly generated protein structures or randomly selected sites (the reference state). The ratio of probabilities provides an estimate of the free energy change according to the inverse Boltzmann equation:
$$\Delta E = -RT \ln \frac{p(\text{interaction in real proteins})}{p(\text{interaction in decoy structures})}$$
[Figure: energy axis from high-energy state to low-energy state]
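The inverse Boltzmann step can be sketched directly from counts. This assumes T = 298 K as "room temperature" and uses hypothetical counts; real potentials need pseudocounts or smoothing, because a zero count makes the logarithm undefined.

```python
import math

R = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T = 298.0      # assumed room temperature, K


def mean_force_potential(count_real, total_real, count_ref, total_ref, temperature=T):
    """Delta E = -RT ln(p_real / p_ref), in kcal/mol (inverse Boltzmann)."""
    p_real = count_real / total_real   # frequency in real protein structures
    p_ref = count_ref / total_ref      # frequency in the reference (decoy) state
    return -R * temperature * math.log(p_real / p_ref)


# Hypothetical counts: the interaction is twice as common in real proteins
print(round(mean_force_potential(120, 10000, 60, 10000), 2))  # -0.41
```

A negative value marks an interaction that is enriched in real structures, i.e. a favourable, low-energy state.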
52 Potentials of Mean Force [Figure: residues A and S separated by k = 4 positions in sequence and s = 14 Å apart in space] How common is this configuration in real proteins? What about in randomly folded protein chains?
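53 Fold Recognition Potentials, by sequence separation n: short-range (SR) terms (n < 11), medium-range (MR) terms (10 < n < 30), long-range (LR) terms (n ≥ 30), plus solvation terms (based on relative accessibility).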
54 [Plot: short-range pair potential for sequence separation 4, potential vs. distance (Å)]
55 We can estimate the stability of a given protein fold by summing potentials of mean force over all residue pairs. Loops are sometimes ignored; single-residue (solvation) terms are usually included; terms can be scaled according to protein size.
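Summing pair potentials over a threaded model can be sketched as below, binning each pair into SR/MR/LR classes by sequence separation. This assumes one coordinate per residue (e.g. a β-carbon) and takes the potential as a caller-supplied function; `toy_charge_potential` is a hypothetical stand-in that echoes the charge example from the threading slides, not a real knowledge-based table. Solvation terms are omitted.

```python
import math


def separation_class(i, j):
    """Bin a residue pair by sequence separation n = |i - j| (SR/MR/LR)."""
    n = abs(i - j)
    if n < 11:
        return "SR"
    if n < 30:
        return "MR"
    return "LR"


def fold_energy(sequence, coords, pair_potential):
    """Sum pair potentials over all residue pairs of a threaded model.

    coords: one (x, y, z) position per residue, in Angstroms.
    pair_potential(aa_i, aa_j, sep_class, distance) -> energy.
    """
    total = 0.0
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence)):
            d = math.dist(coords[i], coords[j])
            total += pair_potential(sequence[i], sequence[j], separation_class(i, j), d)
    return total


def toy_charge_potential(a, b, sep_class, distance):
    """Hypothetical: opposite charges within 8 A score -1, like charges +1."""
    charge = {"K": 1, "R": 1, "D": -1, "E": -1}
    if distance > 8.0:
        return 0.0
    return float(charge.get(a, 0) * charge.get(b, 0))


print(fold_energy("KD", [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], toy_charge_potential))  # -1.0
```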
56 Partially correct immunoglobulin heavy chain Fab fragment model, coloured by threading potential (from HIGH to LOW)
57 Can statistical potentials find the correct fold amongst a large set of incorrect decoy structures?
58 Search Problem: Threading. Searching for the optimum mapping of sequence to structure while optimizing the sum of pair interactions is NP-complete (proven by R. Lathrop in 1994), i.e. no known algorithm can guarantee an optimal solution without, in the worst case, an exhaustive search. This search process is called THREADING, i.e. what is the best way of threading this sequence through this structure? In practice, because pair potentials are short-range, heuristic solutions work fairly well. Options include: exhaustive search (not practical); dynamic programming; double dynamic programming; branch and bound; Monte Carlo, simulated annealing, Gibbs sampling and genetic algorithms.
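Of the strategies listed, dynamic programming is the easiest to show. The sketch below is a deliberately simplified, swapped-in variant: each residue is scored only against the environment of the structural position it occupies (a single-residue, 3D-profile-style energy), because with pairwise terms dynamic programming no longer guarantees the optimum. The environment labels, `toy_energy` table and gap penalty are hypothetical.

```python
def thread_dp(sequence, environments, energy, gap=1.0):
    """Minimum-energy global alignment of a sequence onto structural positions.

    environments: one environment label per structural position.
    energy(aa, env) -> single-residue energy.
    Returns (total energy, list of (residue index, position index) pairs).
    """
    m, n = len(sequence), len(environments)
    INF = float("inf")
    # best[i][j]: lowest energy aligning sequence[:i] with environments[:j]
    best = [[INF] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:  # place residue i-1 on position j-1
                best[i][j] = min(best[i][j], best[i - 1][j - 1]
                                 + energy(sequence[i - 1], environments[j - 1]))
            if i > 0:            # residue not placed on the structure
                best[i][j] = min(best[i][j], best[i - 1][j] + gap)
            if j > 0:            # structural position left empty
                best[i][j] = min(best[i][j], best[i][j - 1] + gap)
    # Trace back to recover which residue sits on which position
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + energy(
                sequence[i - 1], environments[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best[m][n], pairs[::-1]


def toy_energy(aa, env):
    """Hypothetical: hydrophobic residues prefer buried sites, polar ones exposed."""
    hydrophobic = aa in "LIVF"
    return -1.0 if hydrophobic == (env == "buried") else 1.0


print(thread_dp("LKV", ["buried", "exposed", "buried"], toy_energy))
# (-3.0, [(0, 0), (1, 1), (2, 2)])
```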
59 Threading Methods in Practice: unlike comparative modelling, threading methods can produce models even when no detectable homologue can be found in the PDB. However, the simplifying assumption that the backbone of the structure does not change when the sequence changes results in poor recognition of folds (~30% reliability) and inaccurate models (i.e. inaccurate alignments). For distant homologues, weak clues in the sequences can help.
60 Learning to recognise protein folds: ideally we would like to define a single value that denotes the compatibility of a structure with a sequence, i.e. P(Structure | Sequence). In practice this is not straightforward, because each feature is estimated differently and the features are not independent. So we need to combine different features to decide whether or not a given fold recognition match is correct.
61 GenTHREADER Neural Net [Figure: inputs: pair energy, solvation energy, alignment score, alignment length, Nres (struct), Nres (seq); outputs: proteins related / proteins unrelated]
62 Correlation of Network Output with Structural Similarity [Plot: FSSP Z-score vs. GenTHREADER4 network output]
63 Calibrating the scores: in theory, the network outputs should be good estimates of posterior probabilities. In practice, they are not accurate estimates of p-values when benchmarked. However, we can estimate p-values empirically by fitting to a suitable distribution.
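One simple way to turn scores into empirical p-values is to collect scores for unrelated (null) pairs and fit an exponential tail above a threshold. This is a swapped-in illustration, not necessarily the distribution or procedure used for GenTHREADER; the threshold, function names and synthetic null scores are hypothetical.

```python
import math
import random


def fit_exponential_tail(null_scores, threshold):
    """Return a function estimating p(score >= s) under the null.

    Below `threshold` the empirical fraction is used; above it, the tail
    is modelled as exponential with a maximum-likelihood rate.
    """
    tail = [s for s in null_scores if s > threshold]
    fraction_in_tail = len(tail) / len(null_scores)
    rate = len(tail) / sum(s - threshold for s in tail)  # MLE of exponential rate

    def p_value(score):
        if score <= threshold:
            return sum(s >= score for s in null_scores) / len(null_scores)
        return fraction_in_tail * math.exp(-rate * (score - threshold))

    return p_value


# Synthetic null scores (hypothetical): exponential with rate 10
rng = random.Random(0)
null = [rng.expovariate(10.0) for _ in range(10000)]
p_value = fit_exponential_tail(null, threshold=0.1)
print(p_value(0.5))
```

For this synthetic null the printed value should come out near exp(-5), about 0.007, since the scores really are exponentially distributed.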
64 Estimating GenTHREADER p-values [Histogram: frequency and cumulative % (0-100%) vs. network score; fitted curve $p = 26537\,e^{-25.33x}$]
65 p-value dependence on Pair Energy and Solvation Energy [Plot: p-value vs. pair energy for Solv = 10 and a second Solv value]
66 p-value dependence on Pair Energy and Profile Alignment Score [Plot: p-value vs. profile alignment score for pair energy E = +250 and E = -250]
67 Comparison of Search Methods: run time to search SWISSPROT Release 37 (77977 sequences) for matches to adenylate kinase
Method                     Hits   Time taken
BLAST                      61     8 sec
PSIBLAST                   …      … min
GenTHREADER                …      … min
Perfect threading method   …      2000? days
68 Error Rate/Coverage Plot for Sequence-based Profiles (i.e. HMMs) vs GenTHREADER (1% error rate thresholds marked) [Plot: error rate (%) vs. coverage (sensitivity) for Profile and GenTHREADER; a marked coverage value of 52% is shown]
69 An Example of Fold Recognition
70 An Example: ORF HI0073 from H. influenzae, 114 a.a. long, function UNKNOWN. Hits:
E64000 hypothetical protein HI Haemophilus influenzae (str e-46
A71149 hypothetical protein PH Pyrococcus horikoshii 101 7e-22
H70345 conserved hypothetical protein aq_507 - Aquifex aeolicus 99 2e-21
F72600 hypothetical protein APE Aeropyrum pernix (strain K1) 50 1e-06
C75046 hypothetical protein PAB Pyrococcus abyssi (strain e-04
C64354 hypothetical protein MJ Methanococcus jannaschii
D64375 hypothetical protein MJ Methanococcus jannaschii
H90346 conserved hypothetical protein [imported] - Sulfolobus so
S62544 hypothetical protein SPAC12G12.13c - fission yeast (Schiz
C90279 conserved hypothetical protein [imported] - Sulfolobus so
H64462 hypothetical protein MJ Methanococcus jannaschii
C69282 conserved hypothetical protein AF Archaeoglobus ful
75 Conclusions: HI0073 is a probable nucleotidyl transferase (now confirmed). Consistent fold recognition results: match to 1FA0 chain B. Secondary structure in reasonable agreement. Functionally important residues are CONSERVED.
More informationProtein Structure Determination
Protein Structure Determination Given a protein sequence, determine its 3D structure 1 MIKLGIVMDP IANINIKKDS SFAMLLEAQR RGYELHYMEM GDLYLINGEA 51 RAHTRTLNVK QNYEEWFSFV GEQDLPLADL DVILMRKDPP FDTEFIYATY 101
More informationProtein Secondary Structure Prediction
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Master Course DNA/Protein Structurefunction Analysis and Prediction Lecture 7 Protein Secondary Structure Prediction Protein primary
More informationProtein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.
Protein Dynamics The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron. Below is myoglobin hydrated with 350 water molecules. Only a small
More informationProtein Structure Prediction
Protein Structure Prediction Michael Feig MMTSB/CTBP 2009 Summer Workshop From Sequence to Structure SEALGDTIVKNA Folding with All-Atom Models AAQAAAAQAAAAQAA All-atom MD in general not succesful for real
More informationProcheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.
Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond
More informationHomology Modeling. Roberto Lins EPFL - summer semester 2005
Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,
More informationComputer simulations of protein folding with a small number of distance restraints
Vol. 49 No. 3/2002 683 692 QUARTERLY Computer simulations of protein folding with a small number of distance restraints Andrzej Sikorski 1, Andrzej Kolinski 1,2 and Jeffrey Skolnick 2 1 Department of Chemistry,
More informationProtein Structure. W. M. Grogan, Ph.D. OBJECTIVES
Protein Structure W. M. Grogan, Ph.D. OBJECTIVES 1. Describe the structure and characteristic properties of typical proteins. 2. List and describe the four levels of structure found in proteins. 3. Relate
More informationChapter 5. Proteomics and the analysis of protein sequence Ⅱ
Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and
More informationAn Artificial Neural Network Classifier for the Prediction of Protein Structural Classes
International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2017 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article An Artificial
More informationTemplate-Based 3D Structure Prediction
Template-Based 3D Structure Prediction Sequence and Structure-based Template Detection and Alignment Issues The rate of new sequences is growing exponentially relative to the rate of protein structures
More informationSequence Analysis and Databases 2: Sequences and Multiple Alignments
1 Sequence Analysis and Databases 2: Sequences and Multiple Alignments Jose María González-Izarzugaza Martínez CNIO Spanish National Cancer Research Centre (jmgonzalez@cnio.es) 2 Sequence Comparisons:
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationImproving Protein Secondary-Structure Prediction by Predicting Ends of Secondary-Structure Segments
Improving Protein Secondary-Structure Prediction by Predicting Ends of Secondary-Structure Segments Uros Midic 1 A. Keith Dunker 2 Zoran Obradovic 1* 1 Center for Information Science and Technology Temple
More informationWe used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the
SUPPLEMENTARY METHODS - in silico protein analysis We used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the Protein Data Bank (PDB, http://www.rcsb.org/pdb/) and the NCBI non-redundant
More informationProtein Structure Prediction using String Kernels. Technical Report
Protein Structure Prediction using String Kernels Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street SE Minneapolis, MN 55455-0159
More informationProtein Threading. BMI/CS 776 Colin Dewey Spring 2015
Protein Threading BMI/CS 776 www.biostat.wisc.edu/bmi776/ Colin Dewey cdewey@biostat.wisc.edu Spring 2015 Goals for Lecture the key concepts to understand are the following the threading prediction task
More informationProtein secondary structure prediction with a neural network
Proc. Nati. Acad. Sci. USA Vol. 86, pp. 152-156, January 1989 Biophysics Protein secondary structure prediction with a neural network L. HOWARD HOLLEY AND MARTIN KARPLUS Department of Chemistry, Harvard
More informationDevelopment and Large Scale Benchmark Testing of the PROSPECTOR_3 Threading Algorithm
PROTEINS: Structure, Function, and Bioinformatics 56:502 518 (2004) Development and Large Scale Benchmark Testing of the PROSPECTOR_3 Threading Algorithm Jeffrey Skolnick,* Daisuke Kihara, and Yang Zhang
More informationMotif Prediction in Amino Acid Interaction Networks
Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions
More informationSyllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)
Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural
More informationPredicting Protein Structural Features With Artificial Neural Networks
CHAPTER 4 Predicting Protein Structural Features With Artificial Neural Networks Stephen R. Holbrook, Steven M. Muskal and Sung-Hou Kim 1. Introduction The prediction of protein structure from amino acid
More informationIntro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models
Last time Domains Hidden Markov Models Today Secondary structure Transmembrane proteins Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL
More informationProtein 8-class Secondary Structure Prediction Using Conditional Neural Fields
2010 IEEE International Conference on Bioinformatics and Biomedicine Protein 8-class Secondary Structure Prediction Using Conditional Neural Fields Zhiyong Wang, Feng Zhao, Jian Peng, Jinbo Xu* Toyota
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationToday. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure
Last time Today Domains Hidden Markov Models Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL SSLGPVVDAHPEYEEVALLERMVIPERVIE FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK
More informationProtein Structure Bioinformatics Introduction
1 Swiss Institute of Bioinformatics Protein Structure Bioinformatics Introduction Basel, 27. September 2004 Torsten Schwede Biozentrum - Universität Basel Swiss Institute of Bioinformatics Klingelbergstr
More informationCopyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.
Structure Determination and Sequence Analysis The vast majority of the experimentally determined three-dimensional protein structures have been solved by one of two methods: X-ray diffraction and Nuclear
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationLecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability
Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Part I. Review of forces Covalent bonds Non-covalent Interactions: Van der Waals Interactions
More informationBio nformatics. Lecture 23. Saad Mneimneh
Bio nformatics Lecture 23 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely
More informationBayesian Models and Algorithms for Protein Beta-Sheet Prediction
0 Bayesian Models and Algorithms for Protein Beta-Sheet Prediction Zafer Aydin, Student Member, IEEE, Yucel Altunbasak, Senior Member, IEEE, and Hakan Erdogan, Member, IEEE Abstract Prediction of the three-dimensional
More informationStructure and evolution of the spliceosomal peptidyl-prolyl cistrans isomerase Cwc27
Acta Cryst. (2014). D70, doi:10.1107/s1399004714021695 Supporting information Volume 70 (2014) Supporting information for article: Structure and evolution of the spliceosomal peptidyl-prolyl cistrans isomerase
More informationProperties of amino acids in proteins
Properties of amino acids in proteins one of the primary roles of DNA (but not the only one!) is to code for proteins A typical bacterium builds thousands types of proteins, all from ~20 amino acids repeated
More informationProtein quality assessment
Protein quality assessment Speaker: Renzhi Cao Advisor: Dr. Jianlin Cheng Major: Computer Science May 17 th, 2013 1 Outline Introduction Paper1 Paper2 Paper3 Discussion and research plan Acknowledgement
More informationConditional Graphical Models
PhD Thesis Proposal Conditional Graphical Models for Protein Structure Prediction Yan Liu Language Technologies Institute University Thesis Committee Jaime Carbonell (Chair) John Lafferty Eric P. Xing
More informationProtein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)
Computat onal Biology Lecture 21 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely
More informationBCH 4053 Spring 2003 Chapter 6 Lecture Notes
BCH 4053 Spring 2003 Chapter 6 Lecture Notes 1 CHAPTER 6 Proteins: Secondary, Tertiary, and Quaternary Structure 2 Levels of Protein Structure Primary (sequence) Secondary (ordered structure along peptide
More informationFlexPepDock In a nutshell
FlexPepDock In a nutshell All Tutorial files are located in http://bit.ly/mxtakv FlexPepdock refinement Step 1 Step 3 - Refinement Step 4 - Selection of models Measure of fit FlexPepdock Ab-initio Step
More informationSupersecondary Structures (structural motifs)
Supersecondary Structures (structural motifs) Various Sources Slide 1 Supersecondary Structures (Motifs) Supersecondary Structures (Motifs): : Combinations of secondary structures in specific geometric
More information