Analysis and Prediction of Protein Structure (I)

Similar documents
Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Template Based Protein Structure Modeling Jianlin Cheng, PhD

Basics of protein structure

Introduction to" Protein Structure

Protein Structures. 11/19/2002 Lecture 24 1

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

CAP 5510 Lecture 3 Protein Structures

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

ALL LECTURES IN SB Introduction

Getting To Know Your Protein

Protein Structure: Data Bases and Classification Ingo Ruczinski

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Protein Structure Basics

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Details of Protein Structure

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Protein Structure & Motifs

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Protein structure alignments

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Bioinformatics. Macromolecular structure

Useful background reading

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

CS612 - Algorithms in Bioinformatics

Molecular Modeling lecture 2

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Template Free Protein Structure Modeling Jianlin Cheng, PhD

D Dobbs ISU - BCB 444/544X 1

BIRKBECK COLLEGE (University of London)

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

BIOCHEMISTRY Unit 2 Part 4 ACTIVITY #6 (Chapter 5) PROTEINS

Presenter: She Zhang

Protein Structures: Experiments and Modeling. Patrice Koehl

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Protein Structure Prediction and Display

Heteropolymer. Mostly in regular secondary structure

Protein Structure Prediction

Computational Molecular Modeling

Protein structure and folding

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Protein Secondary Structure Assignment and Prediction

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

NMR, X-ray Diffraction, Protein Structure, and RasMol

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Structure to Function. Molecular Bioinformatics, X3, 2006

Prediction and refinement of NMR structures from sparse experimental data

Protein quality assessment

Supersecondary Structures (structural motifs)

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Protein structure analysis. Risto Laakso 10th January 2005

BCH 4053 Spring 2003 Chapter 6 Lecture Notes

Physiochemical Properties of Residues

IT og Sundhed 2010/11

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

HIV protease inhibitor. Certain level of function can be found without structure. But a structure is a key to understand the detailed mechanism.

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Protein Structure Determination

PDBe TUTORIAL. PDBePISA (Protein Interfaces, Surfaces and Assemblies)

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

From Amino Acids to Proteins - in 4 Easy Steps

RNA and Protein Structure Prediction

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Study of Mining Protein Structural Properties and its Application

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Analysis and Prediction of Protein Structure

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

BCB 444/544 Fall 07 Dobbs 1

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

7 Protein secondary structure

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter

Protein Structure Prediction 11/11/05

8 Protein secondary structure

Pymol Practial Guide

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Lecture 10 (10/4/17) Lecture 10 (10/4/17)

Membrane proteins Porins: FadL. Oriol Solà, Dimitri Ivancic, Daniel Folch, Marc Olivella

AP Biology. Proteins. AP Biology. Proteins. Multipurpose molecules

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Model Mélange. Physical Models of Peptides and Proteins

1) NMR is a method of chemical analysis. (Who uses NMR in this way?) 2) NMR is used as a method for medical imaging. (called MRI )

Protein structure similarity based on multi-view images generated from 3D molecular visualization

Introduction to Computational Structural Biology

1. (5) Draw a diagram of an isomeric molecule to demonstrate a structural, geometric, and an enantiomer organization.

The protein folding problem consists of two parts:

Part I: Introduction to Protein Structure Similarity

Introducing Hippy: A visualization tool for understanding the α-helix pair interface

The Structure and Functions of Proteins

Transcription:

Analysis and Prediction of Protein Structure (I) Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng & original sources for some materials

Outline I. Sequence, Structure, Function Relation II. Determination, Storage, Visualization, Analysis, and Comparison III. Structure Classification IV. 1D Prediction V. 2D Prediction VI. 3D Prediction VII. Useful Tools

Sequence, Structure and Function AGCWY Cell

Binding Site Protection Function in Immune System High specificity (amino acid sequence segment at binding site) Strong affinity Antibody Question: there are so many different viruses not known before hand, how an animal cell figure out an Antibody to bind to them, but not bind to its own protein? In stock or make on the fly?

Protein Sequence Primary The first protein was sequenced by Frederick Sanger in 1953. Twice Nobel Laureate (1958, 1980) (other: Curie, Pauling, Bardeen). Determined the amino acid sequence of insulin and proved proteins have specific primary structure. Structure

Protein Sequence N A directional sequence of amino acids/residues C Amino Acid 1 Amino Acid 2 Peptide bond Side Chain Backbone

Amino Acid Structure

Amino Acids Hydrophilic

Protein Secondary Structure Determined by hydrogen bond patterns 3-Class categories: alpha-helix, betasheet, loop (or coil) First deduced by Linus Pauling et al.

Alpha-Helix Jurnak, 2003

Beta-Sheet Anti-Parallel Parallel

Non-Repetitive Secondary Structure Beta-Turn Loop

Tertiary Structure John Kendrew et al., Myoglobin Max Perutz et al., Haemoglobin 1962 Nobel Prize in Chemistry Perutz Kendrew

myoglobin haemoglobin

Quaternary Structure: Complex G-Protein Complex

Anfinsen s Folding Experiment Structure is uniquely determined by protein sequence Protein function is determined by protein structure

Outline I. Sequence, Structure, Function Relation II. Determination, Storage, Visualization, Analysis, and Comparison III. Structure Classification IV. 1D Prediction V. 2D Prediction VI. 3D Prediction VII. Useful Tools

Protein Structure Determination X-ray crystallography Nuclear Magnetic Resonance (NMR) Spectroscopy X-ray: any size, accurate (1-3 Angstrom (10-10 m)), sometime hard to grow crystal NMR: small to medium size, moderate accuracy, structure in solution

Wikipedia, the free encyclopedia

Pacific Northwest National Laboratory's high magnetic field (800 MHz, 18.8 T) NMR spectrometer being loaded with a sample. Wikipedia, the free encyclopedia

Storage in Protein Data Bank Search database

Search protein 1VJG

PDB Format (2C8Q, insulin)

PDB content growth (www.pdb.org) 40,000 30,000 structures 20,000 10,000 2006 2000 1990 1980 year J. Pevsner, 2005

Structure Visualization Rasmol (http://www.umass.edu/microbio/rasmol/getras.ht m) MDL Chime (plug-in) (http://www.mdl.com/products/framework/chime/) Protein Explorer (http://molvis.sdsc.edu/protexpl/frntdoor.htm) Online tools (PDB sites)

J. Pevsner, 2005

Demo Using Rasmol (1VJG)

Structure Analysis Assign secondary structure for amino acids from 3D structure Generate solvent accessible area for amino acids from 3D structure Most widely used tool: DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Kabsch and Sander)

DSSP server: http://bioweb.pasteur.fr/seqanal/interfaces/dssp-simple.html DSSP download: http://swift.cmbi.ru.nl/gv/dssp/ Chris Sander One of the founders of Bioinformatics and Computational Biology DSSP Code: H = alpha helix G = 3-helix (3/10 helix) I = 5 helix (pi helix) B = residue in isolated beta-bridge E = extended strand, participates in beta ladder T = hydrogen bonded turn S = bend Blank = loop

DSSP Web Service http://bioweb.pasteur.fr/seqanal/interfaces/dssp-simple.html

Click here

Amino Acids Secondary Structure Solvent Accessibility

Solvent Accessibility Size of the area of an amino acid that is exposed to solvent (water). Maximum solvent accessible area for each amino acid is its whole surface area. Hydrophobic residues like to be Buried inside (interior). Hydrophilic residues like to be exposed on the surface.

Structure Comparison (Alignment) Are the structures of two protein similar? Are the two structure models of the same protein similar? Different measures (RMSD, GDT-TS (Zemla et al., 1999), MaxSub (Siew et al., 2000), TM score (Zhang and Skolnick,2005))

Basic Idea Try to superimpose two structures to minimize the distances between corresponding atoms (residues) Hard geometric optimization problem Alignment problem with very complex scoring function (treat each position as one character) Root Mean Square Deviation RMSD i n = i= 1 = (( a ix b ix ) 2 + ( a iy n b iy ) 2 + ( a iz b iz ) 2 )

Superimposition David Baker, 2005

Useful Structure Alignment Tools CE (http://cl.sdsc.edu/) DALI (http://www.ebi.ac.uk/ dali/) VAST (NCBI)

Outline I. Sequence, Structure, Function Relation II. Determination, Storage, Visualization, and Comparison III. Structure Classification IV. 1D Prediction V. 2D Prediction VI. 3D Prediction VII. Useful Tools

Protein Folding Favor Disfavor ΔG H-bond, Charge-Charge Interaction Charge-Polar interaction Polar-Polar interaction Entropy

Magnitude of ΔG ΔG is small, only about 1-3 H-Bonds. A small ΔG is critical for maintenance of the conformation flexibility of proteins in biochemical processes. Question: can any amino acid sequences fold into a stable protein structure?

Protein Structure Classification About 3 million known protein sequences About 30,000 known protein structures in PDB Many protein structures are similar due to evolutionary relationship Many protein structures are similar due to convergent evolution Number of unique structure topologies is estimated to be limited (1000 1500?) Number of protein sequences is huge (20 300 )

Protein Structure Universe Proteins. One thousand families for the molecular biologist. C. Chothia. Nature, 1992.

Colors in the universe of protein structures Christine Orengo (Structures, 1997, 5, 1093-1108) Christine Orengo (Structures, 1997, 5, 1093-1108) Christine Orengo 1997 Structures 5 1093-1108 B. Rost, 2005

Mapping Protein Universe: Structural Genomics Chandonia and Brenner, 2005

Why is the evolution of protein structure so slow? Protein sequence evolves faster than protein structure Protein structure is more conserved due to function constraints Nature reuses existing folds for new functions (pretty much like programming paradigm in computer science)

Domain and Fold Protein domain is the structural (also functional) unit. Protein domain is usually defined as a chunk of protein (usually continuous sub-sequences, but not always) that can fold independently into its tertiary structure without other parts of the protein Fold is the topology of a protein or domain: connectivity of secondary structure elements.

Domain Example (HIV-1 Capsid)

Another Domain Example http://scop.berkeley.edu/rsgen.cgi?chime=1;pd=1eft;pr=213-312

Domain Parsing Tools PDP (protein domain parser) (http://123d.ncifcrf.gov/pdp.html) Domain Parser (http://csbl.bmb.uga.edu/downloads/)

Typical Folds Fold: connectivity or arrangement of secondary structure elements. NAD-binding Rossman fold 3 layers, a/b/a, parallel beta-sheet of 3 strands. Order: 321456 http://scop.berkeley.edu/rsgen.cgi?chime=1;pd=1b16;pc=a

Another Fold Example: TIM betaalpha barrel Contains parallel beta-sheet Barrel, closed. 8 strands. Order 1,2,3,4,5,6,7,8. http://scop.berkeley.edu/rsgen.cgi?chime=1;pd=1hti;pc=a

Fold: Helix Bundle (human growth factor) http://scop.berkeley.edu/rsgen.cgi?chime=1;pd=1hgu

Fold: beta barrel http://scop.berkeley.edu/rsgen.cgi?chime=1;pd=1rbp

Fold Example: Lamda repressor DNA Binding http://scop.berkeley.edu/rsgen.cgi?chime=1;pd=5cro;pc=0

Number of unique folds (defined by SCOP) in PDB 1,000 structures 500 2006 2000 1990 1980 year J. Pevsner, 2005

Structure Classification Database SCOP (http://scop.berkeley.edu/) CATH (http://cathwww.biochem.ucl.ac.uk/latest/inde x.html) Dali/FSSP (http://ekhidna.biocenter.helsinki.fi/dali/start)

SCOP Classification Levels SCOP Root Class alpha Beta a+b a/b Small Protein 4-Helix Bundle.. TIM a/b barrel Fold Phospholipase Nuclease.. Superfamily Tyrosyl-DNA Phosphodiesterase. Family

J. Pevsner 2005