Prediction of protein contact maps
- Allyson Murphy
- 5 years ago
1 Prediction of protein contact maps
Piero Fariselli, Department of Biology, University of Bologna
2 From Sequence to Function: Functional Genomics and Proteomics
Genomic sequences → Protein sequences → Protein structures → Protein functions
Example protein sequence:
>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
SKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYH
WPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDE
YSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGI
KSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITR
GNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVS
LAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPY
YLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNT
KRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH
3 The Protein Folding
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN
4 (Rost B.)
5 The Data Bases of Sequences and Structures (November 2009)
EMBL: 195,241,608 sequences; 292,078,866,691 nucleotides
UNIPROT: 154,416,236 residues (example entry: BGAL_SULSO, as on slide 2)
PDB: structures; membrane proteins: 1%
6 What is a multiple alignment? The short answer is this:
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
7 Evolutionary information: from MSA to sequence profile
(Figure: a Multiple Sequence Alignment (MSA) of similar sequences, and the sequence profile derived from it, indexed by sequence position and by the 20 amino acids)
Sequence profile: for each position, a 20-valued vector contains the amino acid composition of the aligned sequences.
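The profile computation can be sketched in a few lines of Python. This is a minimal sketch with the following assumptions:

- Gaps ('-') are ignored when computing each column's composition; the slide does not say how gaps are handled.
- `sequence_profile` is a hypothetical helper name.
- The toy alignment rows are adapted from the first columns of the alignment shown on the slide.

```python
from collections import Counter

# The 20 standard amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_profile(msa: list[str]) -> list[dict[str, float]]:
    """For each alignment column, return the fraction of each amino acid.

    Gaps and non-standard symbols are skipped (assumption); a column made
    only of gaps gets an all-zero vector.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in msa if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


profile = sequence_profile(["YKDYHS-D", "YRDYQT-D", "YRDYQS-D"])
print({aa: round(f, 2) for aa, f in profile[1].items() if f > 0})
```

Each entry of the returned list is the 20-valued vector for one position. In the example, column 2 holds K once and R twice, so its vector has K = 1/3 and R = 2/3.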
8
9
10 3D structure prediction of proteins
(Figure: prediction approach vs. sequence homology (%) to proteins of known structure. New folds: ab initio prediction. Existing folds: threading and building by homology.)
11 Contact definition: Contacts and Contact Maps
(Figure: residues in contact, labelled F156, V299, F297, I269, V271, V238 and I240)
12 Protein contact definitions:
1. Based on Cα
2. Based on Cβ
3. All-atom (without hydrogens)
13 From the 3D structure to the contact map
Given a protein of length L and a square matrix M of dimension L × L:
for each pair of residues i and j
    calculate the distance between i and j
    if distance < threshold, put 1 in the cell M(i,j)
    otherwise, put 0 in the cell M(i,j)
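The procedure above translates almost line by line into NumPy. This is a sketch under stated assumptions:

- One representative atom per residue (for instance Cα or Cβ) is supplied as an (L, 3) coordinate array in Å.
- The default threshold of 8 Å mirrors the 0.8 nm value used later in the deck.
- `contact_map` is a hypothetical name.

```python
import numpy as np


def contact_map(coords: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Binary L x L contact map from one representative atom per residue.

    M[i, j] = 1 if the Euclidean distance between residues i and j is below
    `threshold` (same units as coords), otherwise 0.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # (L, L, 3) pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # (L, L) distance matrix
    return (dist < threshold).astype(int)


# Toy chain: four residues 3.8 Å apart on a straight line
toy = np.array([[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]])
print(contact_map(toy))
```

In the toy chain, residues up to two positions apart fall within 8 Å. As in the slide's procedure, the diagonal (i = j) is set to 1. The predictor described later also discards pairs closer than a minimum sequence gap.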
14 Computation of Contact Maps: from 3D structure to contact map
(Figure: a 3D structure with labelled residues F156, V299, F297, V271, I269, I240 and V238, and a contact map whose axes carry the sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN)
15 Protein Structural Classes: All-α, All-β, α+β, α/β
16 An Example of a Contact Map (All-α): C5A
17 An Example of a Contact Map (All-β): SFP
18 An Example of a Contact Map: 6PTI
19 From the contact map to the 3D structure
Two methods have been proposed:
1. Bohr et al., Protein structure from distance inequalities, J. Mol. Biol. 1993, 231 => based on a steepest descent procedure
2. Vendruscolo and Domany, Fold. Des. 1998, 2 => based on a modified Metropolis procedure
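The steepest-descent idea can be illustrated with a small NumPy sketch. It is not the procedure of Bohr et al.; it is a hypothetical illustration that rests on these assumptions:

- Violated inequalities get a quadratic penalty: contacts must be closer than 8 Å and non-contacts farther.
- Consecutive residues are held near 3.8 Å, the usual Cα-Cα virtual bond length.
- Starting coordinates are random.
- All function names are hypothetical.

```python
import numpy as np


def penalty_and_gradient(x, contacts, threshold=8.0, bond=3.8):
    """Quadratic penalty for violated distance inequalities, and its gradient.

    x: (L, 3) coordinates in Å; contacts: (L, L) 0/1 contact map.
    Contact pairs are penalised when d > threshold, non-contact pairs when
    d < threshold, and consecutive residues when d deviates from `bond`.
    """
    L = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]               # (L, L, 3): x_i - x_j
    d = np.sqrt((diff ** 2).sum(axis=-1)) + np.eye(L)  # +eye avoids 0/0 on the diagonal
    viol = np.where(contacts == 1,
                    np.maximum(d - threshold, 0.0),    # contact pair too far apart
                    np.minimum(d - threshold, 0.0))    # non-contact pair too close
    np.fill_diagonal(viol, 0.0)
    bond_dev = np.zeros((L, L))
    k = np.arange(L - 1)
    bond_dev[k, k + 1] = d[k, k + 1] - bond
    bond_dev[k + 1, k] = d[k + 1, k] - bond
    # Each unordered pair appears twice in the full matrices, hence the 0.5.
    loss = 0.5 * np.sum(viol ** 2) + 0.5 * np.sum(bond_dev ** 2)
    coeff = 2.0 * (viol + bond_dev) / d                # dLoss/dd_ij divided by d_ij
    grad = (coeff[:, :, None] * diff).sum(axis=1)      # chain rule: dd_ij/dx_i = (x_i - x_j)/d_ij
    return loss, grad


def reconstruct(contacts, steps=2000, lr=0.005, seed=0):
    """Steepest descent from random coordinates; returns (coords, penalty history)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=5.0, size=(contacts.shape[0], 3))
    history = []
    for _ in range(steps):
        loss, grad = penalty_and_gradient(x, contacts)
        history.append(loss)
        x = x - lr * grad
    return x, history


# Toy target: an ideal helix-like C-alpha trace (radius 2.3 Å, 100° per
# residue, 1.5 Å rise), turned into an 8 Å contact map.
L = 12
k = np.arange(L)
angle = np.deg2rad(100.0 * k)
target = np.stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * k], axis=1)
dist = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=-1)
target_map = (dist < 8.0).astype(int)

coords, history = reconstruct(target_map)
print(f"penalty: {history[0]:.1f} -> {history[-1]:.3f}")
```

A contact map constrains only distances, so any reconstruction is defined at best up to rotation, translation and mirror image. In practice the RMSD to the true structure is computed after optimal superposition.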
20 6pti reconstruction efficiency (58 residues) (Vendruscolo and Domany, Fold. Des.)
(Figure: RMSD vs. M, the number of random flips)
At M = 200: eliminated true contacts = 6% of real contacts; added false contacts = 52% of real contacts
21 From the contact map to the 3D structure: the reconstruction efficiency
22 3D modelling through contact maps. Example: bacteriorhodopsin 1QHJ (1.9 Å)
(Figure: model and contact map) RMSD = 2.5 Å
23 MARC efficiency in 3D reconstruction from the protein contact map after progressive elimination of true contacts (6pti)
(Figure: RMSD vs. % missing contacts)
24 MARC efficiency in 3D reconstruction after progressive addition of wrong contacts to a protein contact map with 30% of true contacts (6pti)
(Figure: RMSD vs. % wrong contacts)
25 Prediction of Contact Maps
26 Prediction of Contact Maps
Several methods have been applied:
1. Bohr et al., FEBS: 43-46 => based on neural networks
2. Göbel et al., PROTEINS => based on correlated mutations in proteins
3. Thomas et al., Prot. Eng. => based on a statistical method and evolutionary information
4. Olmea and Valencia, Fold. Des.: S25-S32 => based on correlated mutations and other information
5. Fariselli and Casadio, Prot. Eng.: 15-21 => based on neural networks and evolutionary information
6. Fariselli et al., CASP4 and Prot. Eng. (in press) => neural networks and other information
7. Pollastri and Baldi, Bioinformatics: S62-S70 => recurrent neural networks
27 Relevant points
1. Contact threshold
2. Sequence separation (or sequence gap)
3. Number of contacts vs. number of non-contacts
28-31 The Contact Threshold
(Figures: the same contact map computed at progressively lower distance thresholds, from 16 Å down to 8 Å and 6 Å)
32 Sequence separation: VTISCTGSSSNIGAGNHVKWYQQLPG
33 The Sequence Separation: example of a sequence separation of 10 residues
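Imposing a minimum sequence separation s shrinks the set of candidate residue pairs. The small Python sketch below uses hypothetical helper names. It enumerates the pairs (i, j) with j − i ≥ s and checks the count against the closed form (L − s)(L − s + 1)/2. For the 26-residue sequence on slide 32 and a separation of 10 residues, 136 pairs remain.

```python
def pairs_with_min_separation(length: int, min_sep: int) -> list[tuple[int, int]]:
    """All residue pairs (i, j) with j - i >= min_sep (use min_sep >= 1 to exclude i == j)."""
    return [(i, j) for i in range(length) for j in range(i + min_sep, length)]


def count_pairs(length: int, min_sep: int) -> int:
    """Closed form for len(pairs_with_min_separation(length, min_sep))."""
    n = max(length - min_sep, 0)
    return n * (n + 1) // 2


seq = "VTISCTGSSSNIGAGNHVKWYQQLPG"
print(len(seq), count_pairs(len(seq), 10), len(pairs_with_min_separation(len(seq), 10)))
```

The closed form follows because exactly L − s pairs have separation s, L − s − 1 have separation s + 1, and so on down to 1.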
34 Frequency distribution of the real and hypothetical contacts as a function of sequence separation
(Figure: frequency of contacts vs. sequence separation; theoretical and experimental curves)
35 Relation between the number of contacts and the protein length
(Figure: number of contacts vs. protein length)
36 Evaluation of the efficiency of contact map predictions
1) Accuracy: $A = N^*_{cp} / N_{cp}$, where $N^*_{cp}$ and $N_{cp}$ are the number of correctly assigned contacts and the total number of predicted contacts, respectively.
2) Improvement over a random predictor: $R = A / (N_c/N_p)$, where $N_c/N_p$ is the accuracy of a random predictor; $N_c$ is the number of real contacts in the protein of length $L_p$, and $N_p$ is the number of all possible contacts.
3) Difference between the distribution of inter-residue distances in the 3D structure for predicted pairs and that for all pairs in the structure (Pazos et al., 1997):
$X_d = \sum_{i=1}^{n} \frac{P_{ic} - P_{ia}}{n\, d_i}$
where $n$ is the number of bins of the distance distribution (15 equally distributed bins from 4 to 60 Å cluster all the possible distances of residue pairs observed in the protein structure); $d_i$ is the upper limit (normalised to 60 Å) of each bin, e.g. 8 Å for the 4 to 8 Å bin; $P_{ic}$ and $P_{ia}$ are the percentages of predicted contact pairs (with distance between $d_{i-1}$ and $d_i$) and of all possible pairs, respectively.
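The three scores can be sketched in Python. The slides do not state several details, so this sketch assumes the following:

- Contacts are sets of (i, j) index pairs.
- The bin edges for Xd are passed explicitly. The slide mentions both 15 bins between 4 and 60 Å and a 4 to 8 Å example bin, so the exact layout is left to the caller.
- P_ic and P_ia are percentages over all distances supplied.
- Function names are hypothetical.

```python
import numpy as np


def accuracy(predicted: set, real: set) -> float:
    """A = Ncp* / Ncp: fraction of predicted contacts that are real contacts."""
    return len(predicted & real) / len(predicted)


def improvement_over_random(predicted: set, real: set, n_possible: int) -> float:
    """R = A / (Nc / Np), where Nc / Np is the accuracy of a random predictor."""
    return accuracy(predicted, real) / (len(real) / n_possible)


def xd(pred_distances, all_distances, edges) -> float:
    """Xd = sum_i (P_ic - P_ia) / (n * d_i), with d_i the normalised upper bin limit."""
    edges = np.asarray(edges, dtype=float)
    n = len(edges) - 1
    p_c = 100.0 * np.histogram(pred_distances, bins=edges)[0] / len(pred_distances)
    p_a = 100.0 * np.histogram(all_distances, bins=edges)[0] / len(all_distances)
    d = edges[1:] / edges[-1]          # upper bin limits normalised to the largest edge
    return float(np.sum((p_c - p_a) / (n * d)))


pred = {(1, 9), (2, 12), (3, 20), (5, 30)}
real = {(1, 9), (2, 12), (4, 18)}
print(accuracy(pred, real), improvement_over_random(pred, real, n_possible=100))
print(xd([5.0, 5.0], [5.0, 50.0], np.linspace(4.0, 60.0, 16)))
```

Because each bin is weighted by 1/d_i, Xd > 0 when predicted pairs are closer in space than pairs in general. A predictor whose pairs are distributed like all pairs gives Xd ≈ 0.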
37 Tools from machine learning approaches: neural networks
(Figure: a subset of a training database with a known mapping, e.g. from the sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN to its structure, is used to learn general rules, which are then applied to predict a new sequence)
38 Contact definition used: C–C distance < 0.8 nm; sequence gap > 7 residues
39 The database of proteins used to train and test the contact map predictors (PDB codes; in the original table the proteins are grouped by length L, from L < 100 to L > 300):
1c5a 1sco 2sn3 1bkf 1npk 3lzt 1juk 1axn 1a1i_A 1cfh 1spy 2sxl 1bkr_A 1pdn_C 3nul 1kid 1b0m 1a1t_A 1ctj 1sro 3gat_A 1br0 1pkp 5p21 1mml 1bg2 1a68 1cyo 1tbn 3mef_A 1bsn 1poa 7rsa 1mrj 1bgp 1a7i 1fna 1tiv 4mt2 1bv1 1put 1bxo 1acp 1hev 1tle 5pti 1bxa 1ra9 1ad2 1ppn 1dlc 1ah9 1hrz_A 1tsg 1rcf 1akz 1rgs 1irk 1aho 1kbs 1ubi 1a62 1cew_I 1rie 1amm 1rhs 1iso 1aie 1mbh 1uxd 1a6g 1cfe 1skz 1aol 1thv 1kvu 1ail 1mbj 2acy 1acz 1cyx 1tam 1ap8 1vin 1moq 1ajj 1msi 2adx 1asx 1dun 1vsd 1bf8 1xnb 1svb 1aoo 1mzm 2bop_A 1aud_A 1eca 1whi 1bjk 1yub 1uro_A 1ap0 1nxb 2ech 1ax3 1erv 2fsp 1byq_A 1zin 1ysc 1ark 1ocp 2fdn 1b10 1exg 2gdm 1c3d 2baa 2cae 1awd 1opd 2fn2 1bc4 1hfc 2ilk 1cdi 2fha 2dpg 1awj 1pce 2fow 1bd8 1ifc 2lfb 1cne 2pgd 1awo 1plc 2hfh 1bea 1jvr 2pil 1cnv 16pk 3grs 1bbo 1pou 2hoa 1bfe_A 1kpf 2tgi 1csn 1a8e 1bc8_C 1ppt 2hqi 1bfg 1kte 2ucz 1ezm 1ads 1brf 1rof 2lef_A 1bgf 1mak 3chy 1fts 1arv
40 Neural network-based predictor
- 1 output neuron (contact/non-contact)
- 1 hidden layer with 8 neurons
- Input layer with 1071 input neurons: ordered residue pairs (1050 neurons), secondary structures (18 neurons), correlated mutations (1 neuron), sequence conservation (2 neurons)
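The layer sizes above can be illustrated with a minimal NumPy forward pass. Only the 1071-8-1 topology comes from the slide. The sigmoid activations, the random initial weights and the 0.5 decision cut-off are assumptions, and training is not shown. All names are hypothetical.

```python
import numpy as np

# Input layer size from the slide: 1050 + 18 + 1 + 2 = 1071 neurons.
N_PAIRS, N_SS, N_CORR, N_CONS = 1050, 18, 1, 2
N_INPUT = N_PAIRS + N_SS + N_CORR + N_CONS
N_HIDDEN = 8


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(seed=0):
    """Small random weights (assumption) for a 1071 -> 8 -> 1 network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.05, size=(N_HIDDEN, N_INPUT)),
        "b1": np.zeros(N_HIDDEN),
        "w2": rng.normal(scale=0.05, size=N_HIDDEN),
        "b2": 0.0,
    }


def contact_score(x, params):
    """Forward pass for one residue pair; returns a value in (0, 1)."""
    h = sigmoid(params["W1"] @ x + params["b1"])        # hidden layer, 8 units
    return float(sigmoid(params["w2"] @ h + params["b2"]))


params = init_params()
x = np.zeros(N_INPUT)            # placeholder input vector for one pair (i, j)
score = contact_score(x, params)
is_contact = score > 0.5         # assumed decision threshold
print(N_INPUT, round(score, 3), is_contact)
```

In a real predictor, the weights would be fitted on the training proteins. The input vector would be filled with the pair coding, secondary structure, correlated-mutation and conservation features listed on the slide.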
41 Representation of the input coding based on ordered couples.
(A) An alignment of 5 (hypothetical) sequences as represented in an HSSP file (Sander and Schneider, 1991). i and j are the positions of the two residues that may or may not be in contact (A and D in the leading sequence, sequence 1).
(B) Single sequence coding. The position representing the couple (AD) in the vector is set to 1.0, while all other positions are set to 0.
(C) Multiple sequence coding. For each sequence in the alignment (1 to 5 in the scheme in A), the couple of residues in positions i and j is counted. The final input coding, representing the frequency of each couple in the alignment, is normalized to the number of sequences.
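A sketch of the two codings in Python, with the following assumptions made for illustration:

- The vector has 400 elements, one per ordered couple of the 20 amino acids, so (A, D) and (D, A) occupy different positions.
- Couples containing a gap are skipped.
- Names are hypothetical.

The predictor on slide 40 uses 1050 pair neurons, whose layout the slides do not detail.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
N_COUPLES = len(AMINO_ACIDS) ** 2   # 400 ordered couples


def couple_index(a: str, b: str) -> int:
    """Position of the ordered couple (a, b) in the 400-element vector."""
    return INDEX[a] * len(AMINO_ACIDS) + INDEX[b]


def single_sequence_coding(seq: str, i: int, j: int) -> np.ndarray:
    """(B): 1.0 at the position of the couple (seq[i], seq[j]), 0 elsewhere."""
    v = np.zeros(N_COUPLES)
    v[couple_index(seq[i], seq[j])] = 1.0
    return v


def multiple_sequence_coding(msa: list[str], i: int, j: int) -> np.ndarray:
    """(C): count each couple (i, j) over the alignment, normalised by the number of sequences."""
    v = np.zeros(N_COUPLES)
    for seq in msa:
        if seq[i] in INDEX and seq[j] in INDEX:    # skip couples containing gaps
            v[couple_index(seq[i], seq[j])] += 1.0
    return v / len(msa)


msa = ["AKD", "AKD", "ARE", "GKD", "A-D"]          # toy 5-sequence alignment
v = multiple_sequence_coding(msa, 0, 2)
print(v[couple_index("A", "D")], v[couple_index("A", "E")], v[couple_index("G", "D")])
```

In the toy alignment, the couple at positions 0 and 2 is AD in three sequences, AE in one and GD in one. The coding is therefore 0.6, 0.2 and 0.2 at those positions.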
42 Correlated mutations
Given a multiple sequence alignment of N sequences, there are M = N(N−1)/2 couples of sequences. Example alignment (N = 3, M = 3):
1 MVKGPGLYTDIGKKARDLLYKDYHSDKKFTISTYSPTGVAITSS
2 MVKGPGLYSDIGKRARDLLYRDYQSDHKFTLTTYTANGVAITST
3 MVKGPGLYTEIGKKARDLLYRDYQGDQKFSVTTYSSTGVAITTT
For each position i, an M-valued vector $V_i$ is built by scoring every couple of sequences with the McLachlan substitution matrix S, e.g. $V_i = (S(T;S), S(T;T), S(S;T))$ and $V_j = (S(I;L), S(I;V), S(L;V))$.
Correlation: $C_{ij} = \frac{1}{M} \sum_{k=1}^{M} \frac{(V_i(k) - \langle V_i \rangle)(V_j(k) - \langle V_j \rangle)}{\sigma_{V_i}\, \sigma_{V_j}}$
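A minimal sketch of the correlation $C_{ij}$ in Python, with these assumptions:

- The McLachlan matrix is not reproduced here; a hypothetical identity score (1 for identical residues, 0 otherwise) stands in for it.
- Gapped columns are not handled.
- A position whose score vector has zero variance is assigned $C_{ij} = 0$.

The example columns are positions 9, 10, 14 and 31 of the three-sequence alignment on the slide.

```python
from itertools import combinations

import numpy as np


def identity_score(a: str, b: str) -> float:
    """Hypothetical stand-in for the McLachlan matrix: 1 if identical, else 0."""
    return 1.0 if a == b else 0.0


def couple_vector(column: str, score=identity_score) -> np.ndarray:
    """V_i: one score per couple of sequences, in the order (1,2), (1,3), (2,3), ..."""
    return np.array([score(a, b) for a, b in combinations(column, 2)], dtype=float)


def correlation(col_i: str, col_j: str, score=identity_score) -> float:
    """C_ij = (1/M) * sum_k (V_i(k) - <V_i>)(V_j(k) - <V_j>) / (sigma_i * sigma_j)."""
    vi, vj = couple_vector(col_i, score), couple_vector(col_j, score)
    si, sj = vi.std(), vj.std()          # population standard deviations
    if si == 0.0 or sj == 0.0:
        return 0.0                       # assumption for invariant score vectors
    return float(np.mean((vi - vi.mean()) * (vj - vj.mean())) / (si * sj))


# Residues of sequences 1, 2, 3 at a given alignment position
print(correlation("TST", "KRK"))   # positions 9 and 14
print(correlation("TST", "DDE"))   # positions 9 and 10
print(correlation("TST", "ILV"))   # positions 9 and 31
```

Positions 9 and 14 vary together, since sequence 2 differs from the others at both. They give a correlation of 1 up to rounding, while positions 9 and 10 give −0.5. At position 31 every couple is a mismatch, so the identity score cannot separate the conservative I/L/V substitutions. A substitution matrix is used to capture exactly this kind of difference.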
43 The neural network architecture for prediction of contact maps
44 Accuracy of contact map prediction on a cross-validated data set (170 proteins)
(Figure: number of proteins vs. accuracy)
45 T0087: 310 residues (A = 0.20; FR/NF)
46 T0106: 123 residues (A = 0.06; FR/NF)
47 T0128: 222 residues (A = 0.24; CM)
48 T0110: 128 residues (A = 0.30; FR)
49 T0125: 141 residues (A = 0.03; CM)
50 T0124: 242 residues (A = 0.01; NF)
51 T0115: 300 residues (A = 0.17; FR/NF). PDB code: 1FWK (homoserine kinase, Methanococcus jannaschii)
52 Predictive performance on 29 targets
(Table columns: Target; Q3 (SS); predicted Fr(H), Fr(E); observed Fr(H), Fr(E); Lp; Nal; Xd; A; Class. Target classes in table order: FR/NF, CM/FR/NF, FR/NF, CM/FR, FR, FR, FR/NF, FR, FR, FR, FR, CM, CM/FR, FR/NF, FR/NF, CM/FR, CM, CM, FR, FR, CM, FR, FR/NF, FR/NF, NF, FR, FR/NF, NF, CM.)
Q3 = secondary structure prediction accuracy; Fr(H) and Fr(E) = frequency of predicted and observed alpha and beta structures in the chain; Lp = protein length in residues; Nal = number of sequences in the alignment; Xd and A are as defined on slide 36; Class is the classification of targets by prediction difficulty: CM = comparative modeling, FR = fold recognition, NF = new fold.
53 COMMENTS
- The predictor is trained mainly on mixed-class globular proteins
- Contacts among beta structures dominate
- Contacts in all-alpha proteins are more difficult to predict
- A filtering algorithm is needed
More informationProtein Structure Prediction using String Kernels. Technical Report
Protein Structure Prediction using String Kernels Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street SE Minneapolis, MN 55455-0159
More information1-D Predictions. Prediction of local features: Secondary structure & surface exposure
1-D Predictions Prediction of local features: Secondary structure & surface exposure 1 Learning Objectives After today s session you should be able to: Explain the meaning and usage of the following local
More informationProtein-Protein Interaction Classification Using Jordan Recurrent Neural Network
Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network Dilpreet Kaur Department of Computer Science and Engineering PEC University of Technology Chandigarh, India dilpreet.kaur88@gmail.com
More informationAlignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)
Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in
More informationSupporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB
Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications, Cvicek et al. Supporting Text 1 Here we compare the GRoSS alignment
More informationJeremy Chang Identifying protein protein interactions with statistical coupling analysis
Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building
More informationProtein Structure Prediction and Display
Protein Structure Prediction and Display Goal Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each
More informationProtein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror
Protein structure prediction CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror 1 Outline Why predict protein structure? Can we use (pure) physics-based methods? Knowledge-based methods Two major
More informationIntroduction to Evolutionary Concepts
Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq
More informationStructural Alignment of Proteins
Goal Align protein structures Structural Alignment of Proteins 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
More informationA profile-based protein sequence alignment algorithm for a domain clustering database
A profile-based protein sequence alignment algorithm for a domain clustering database Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*
More informationProtein Structure: Data Bases and Classification Ingo Ruczinski
Protein Structure: Data Bases and Classification Ingo Ruczinski Department of Biostatistics, Johns Hopkins University Reference Bourne and Weissig Structural Bioinformatics Wiley, 2003 More References
More informationBIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Protein Structure Prediction I
BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer 2013 9. Protein Structure Prediction I Structure Prediction Overview Overview of problem variants Secondary structure prediction
More informationIdentification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach
Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach Prof. Dr. M. A. Mottalib, Md. Rahat Hossain Department of Computer Science and Information Technology
More informationGetting To Know Your Protein
Getting To Know Your Protein Comparative Protein Analysis: Part III. Protein Structure Prediction and Comparison Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research
More informationPatterns, Profiles, and
Patterns, Profiles, and Mltiple l Alignments Otline Profiles, Position Specific Scoring Matrices Profile Hidden Markov Models Alignment of Profiles Mltiple Alignment Algorithms Problem definition Can we
More informationproteins Comparison of structure-based and threading-based approaches to protein functional annotation Michal Brylinski, and Jeffrey Skolnick*
proteins STRUCTURE O FUNCTION O BIOINFORMATICS Comparison of structure-based and threading-based approaches to protein functional annotation Michal Brylinski, and Jeffrey Skolnick* Center for the Study
More informationComputational methods for predicting protein-protein interactions
Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational
More informationTOUCHSTONE: A Unified Approach to Protein Structure Prediction
PROTEINS: Structure, Function, and Genetics 53:469 479 (2003) TOUCHSTONE: A Unified Approach to Protein Structure Prediction Jeffrey Skolnick, 1 * Yang Zhang, 1 Adrian K. Arakaki, 1 Andrzej Kolinski, 1,2
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationOptimization of the Sliding Window Size for Protein Structure Prediction
Optimization of the Sliding Window Size for Protein Structure Prediction Ke Chen* 1, Lukasz Kurgan 1 and Jishou Ruan 2 1 University of Alberta, Department of Electrical and Computer Engineering, Edmonton,
More information