Biophysics 101: Genomics & Computational Biology. Section 8: Protein Structure

Faisal Reza
Nov. 11th, 2003

[Figure: B101.pdb from PS5, shown as an animated ball-and-stick model colored CPK, with H-bonds on (colored green) and van der Waals radii on (also colored CPK).]
Based on the backbone and H-bond configuration shown, what secondary structure might this be?

Outline
  Course Projects
  Biology/Chemistry of Protein Structure
    Protein Assembly, Folding, Packing and Interaction
    Primary, Secondary, Tertiary and Quaternary structures
    Class, Fold, Topology
  CS/Math/Physics of Protein Structure
    Experimental Determination and Analysis
    Computational Determination and Analysis
  Proteomics
    Mass Spectrometry

Course Projects
  Videotaping authorization form
  Submission Parameters (via email)
    when: December 2, 2003, 12 noon EST (9 AM EST if presenting on December 2, 2003)
    where: bphys101@fas.harvard.edu
    what: (1) written project (.doc, ~1000-3000 words); (2) presentation slides (.ppt, 1-2 MB)
  Presentation Parameters (in person)
    when: December {2, 9, 16}, 2003, {12-2 PM, 5:30-7:30 PM} EST
    where: HMS Cannon Seminar Room for 12-2 PM; Science Ctr. Lecture Hall A for 5:30-7:30 PM
    what: (1) oral presentations (6 min/person + 2 min/person Q/A); (2) grading rubric and further information: http://www.courses.fas.harvard.edu/~bphys101/projects/index.html

Biology/Chemistry of Protein Structure

  STRUCTURE   Primary    Secondary   Tertiary   Quaternary
  PROCESS     Assembly   Folding     Packing    Interaction

Protein Assembly
  occurs at the ribosome
  involves dehydration synthesis and polymerization of amino acids attached to tRNA:
    NH3+ -{A + B → A-B + H2O}n- COO-
  thermodynamically unfavorable, with ΔE = +10 kJ/mol, thus coupled to reactions that act as sources of free energy
  yields primary structure

Primary Structure
  primary structure of human insulin:
    CHAIN 1: GIVEQ CCTSI CSLYQ LENYC N
    CHAIN 2: FVNQH LCGSH LVEAL YLVCG ERGFF YTPKT
  linear, ordered, 1-dimensional sequence of an amino acid polymer
  by convention, written from amino end to carboxyl end
  a perfectly linear amino acid polymer is neither functional nor energetically favorable → folding!
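As a small illustration of the Primary Structure slide, the sketch below stores the two insulin chains as plain strings and counts residues, peptide bonds, and cysteines. The helper names (peptide_bonds, count_residue) are hypothetical and not part of the course material. The only chemistry assumed is that a linear chain of n residues contains n - 1 peptide bonds, one per dehydration step.

```python
# Human insulin chains as written on the Primary Structure slide,
# in one-letter amino acid code, amino end to carboxyl end.
CHAIN_1 = "GIVEQ CCTSI CSLYQ LENYC N".replace(" ", "")
CHAIN_2 = "FVNQH LCGSH LVEAL YLVCG ERGFF YTPKT".replace(" ", "")


def peptide_bonds(sequence: str) -> int:
    """Peptide bonds (and waters released) in one linear chain: n - 1."""
    return max(len(sequence) - 1, 0)


def count_residue(sequence: str, residue: str) -> int:
    """Count occurrences of a one-letter residue code in a chain."""
    return sequence.count(residue)


if __name__ == "__main__":
    for name, seq in (("CHAIN 1", CHAIN_1), ("CHAIN 2", CHAIN_2)):
        print(
            name,
            len(seq), "residues,",
            peptide_bonds(seq), "peptide bonds,",
            count_residue(seq, "C"), "cysteines",
        )
```

Counting cysteines is a convenient first check on a sequence, since cysteine side chains are the ones that can form the disulfide bonds mentioned later under tertiary structure.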

Protein Folding
  occurs in the cytosol
  involves localized spatial interaction among primary structure elements, i.e. the amino acids
  may or may not involve chaperone proteins
  tumbles towards conformations that reduce E (this process is thermodynamically favorable)
  yields secondary structure

Secondary Structure
  non-linear
  3-dimensional
  localized to regions of an amino acid chain
  formed and stabilized by hydrogen bonding, electrostatic and van der Waals interactions

Ramachandran Plot
  Pauling built models based on the following principles, codified by Ramachandran:
    (1) bond lengths and angles should be similar to those found in individual amino acids and small peptides
    (2) the peptide bond should be planar
    (3) overlaps are not permitted: pairs of atoms may be no closer than the sum of their van der Waals radii
    (4) stabilization: sterics should permit hydrogen bonding
  Two degrees of freedom:
    (1) φ (phi) angle = rotation about the N-Cα bond
    (2) ψ (psi) angle = rotation about the Cα-C bond
  A linear amino acid polymer with some folds is better, but still neither functional nor completely energetically favorable → packing!

Protein Packing
  occurs in the cytosol (~60% bulk water, ~40% water of hydration)
  involves interaction between secondary structure elements and solvent
  may be promoted by chaperones, membrane proteins
  tumbles into molten globule states
  overall entropy loss is small enough that enthalpy determines the sign of ΔE, which decreases (the loss in entropy from packing is counteracted by the gain from desolvation and reorganization of water, i.e. the hydrophobic effect)
  yields tertiary structure

Tertiary Structure
  non-linear
  3-dimensional
  global, but restricted to the amino acid polymer
  formed and stabilized by hydrogen bonding, covalent (e.g. disulfide) bonding, hydrophobic packing toward the core and hydrophilic exposure to solvent
  A globular amino acid polymer, folded and compacted, is somewhat functional (catalytic) and energetically favorable → interaction!

Protein Interaction
  occurs in the cytosol, in close proximity to other folded and packed proteins
  involves interaction among tertiary structure elements of separate polymer chains
  may be promoted by chaperones, membrane proteins, cytosolic and extracellular elements, as well as the proteins' own propensities
  E decreases further due to further desolvation and reduction of surface area
  globular proteins, e.g. hemoglobin, largely involved in catalytic roles
  fibrous proteins, e.g. collagen, largely involved in structural roles
  yields quaternary structure
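The φ and ψ angles on the Ramachandran Plot slide are dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The following is a minimal sketch of how such an angle could be computed from four 3-D coordinates. The function name dihedral_degrees and the toy coordinates in the usage note are illustrative assumptions, not taken from the slides or from any structure file.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees, range -180..180) about the p1-p2 bond.

    For phi pass (C of residue i-1, N, CA, C); for psi pass (N, CA, C, N of
    residue i+1). The four points must not be collinear.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)) describes a cis arrangement (angle 0), while moving the last point to (-1, 0, 1) gives the trans arrangement (magnitude 180). Plotting (φ, ψ) pairs computed this way for every residue is what produces a Ramachandran plot.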

Quaternary Structure
  non-linear
  3-dimensional
  global, and across distinct amino acid polymers
  formed by hydrogen bonding, covalent bonding, hydrophobic packing and hydrophilic exposure
  favorable, functional structures occur frequently and have been categorized

Class/Motif
  class = secondary structure composition, e.g. all α, all β, segregated α+β, mixed α/β
  motif = small, specific combinations of secondary structure elements, e.g. β-α-β loop
  both are subsets of fold/architecture/domains

Fold/Architecture/Domains
  fold = architecture = the overall shape and orientation of the secondary structures, ignoring connectivity between the structures, e.g. α/β barrel, TIM barrel
  domain = the functional property of such a fold or architecture, e.g. binding, cleaving, spanning sites
  subset of topology/fold families/superfamilies
  [Figure: CLASS: α+β; FOLD: sandwich; FOLD FAMILY: flavodoxin]

Topology/Fold families/Superfamilies
  topology = the overall shape and connectivity of the folds and domains
  fold families = categorization that takes into account topology and previous subsets as well as empirical/biological properties, e.g. flavodoxin
  superfamilies = in addition to fold families, includes evolutionary/ancestral properties

CS/Math/Physics of Protein Structure
  Experimental Determination and Analysis
  Computational Determination and Analysis

Experimental Determination and Analysis
  Repositories
    Protein Data Bank
    Molecular Modeling DataBase
  Resolution
    X-Ray Crystallography
    NMR Spectroscopy
    Mass Spectrometry (next week)
    Fluorescence Resonance Energy Transfer

Protein Data Bank
  [Figure: Cumulative increase in the number of domains]
  Coordinates database: RCSB Protein Data Bank (PDB)
  has many structures, partly due to minor differences in structure resolution and annotation
  has far fewer fold families, partly due to evolved pathways and mechanisms
  .pdb = data from experiment, with missing parameters and multiple conformations

Molecular Modeling DataBase
  [Figure: Cumulative increase in the number of folds and superfamilies]
  Comparative database: NCBI Molecular Modeling DataBase (MMDB)
  subset of PDB, excludes theoretical structures, with native .asn format
  .asn = single-coordinate per-atom molecules, explicit bonding and SS remarks
  suited for computation, such as homology modeling and structure comparison

X-Ray Crystallography
  crystallize and immobilize single, perfect protein
  bombard with X-rays, record scattering diffraction patterns
  determine electron density map from scattering and phase via Fourier transform
  use electron density and biochemical knowledge of the protein to refine and determine a model

  "All crystallographic models are not equal.... The brightly colored stereo views of a protein model, which are in fact more akin to cartoons than to molecules, endow the model with a concreteness that exceeds the intentions of the thoughtful crystallographer. It is impossible for the crystallographer, with vivid recall of the massive labor that produced the model, to forget its shortcomings. It is all too easy for users of the model to be unaware of them. It is also all too easy for the user to be unaware that, through temperature factors, occupancies, undetected parts of the protein, and unexplained density, crystallography reveals more than a single molecular model shows."
  - Rhodes, Crystallography Made Crystal Clear, p. 183.
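The equation that accompanied the Fourier transform step on the X-Ray Crystallography slide did not survive transcription. For orientation only, the standard textbook form of the electron density synthesis is sketched below. The symbols are assumptions introduced here rather than the slide's own notation: V is the unit cell volume, (h, k, l) index the reflections, |F_hkl| is the measured structure factor amplitude, α_hkl is its phase, and (x, y, z) are fractional coordinates in the unit cell.

```latex
\rho(x, y, z) \;=\; \frac{1}{V} \sum_{h} \sum_{k} \sum_{l}
  \lvert F_{hkl} \rvert \, e^{\,i \alpha_{hkl}} \,
  e^{-2 \pi i \,(h x + k y + l z)}
```

The amplitudes come from the recorded diffraction intensities, while the phases are not measured directly and must be obtained separately, which is why the slide lists scattering and phase as distinct inputs.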
NMR Spectroscopy
  [Figure panels: determining constraints; using constraints to determine secondary structure]
  protein in aqueous solution, motile and tumbles/vibrates with thermal motion
  NMR detects chemical shifts of atomic nuclei with non-zero spin; shifts are due to the nearby electronic environment
  determine distances between specific pairs of atoms based on shifts (constraints)
  use constraints and biochemical knowledge of the protein to determine an ensemble of models

Fluorescence Resonance Energy Transfer
  FRET is described as a molecular ruler
  segments of a protein are tagged with fluorophores
  energy transfer occurs when donor and acceptor interact, and falls off as 1/d^6, where d is the separation between donor and acceptor
  donor and acceptor must be within 50 Å; acceptor emission is sensitive to distance change
  can determine pairs of side chains that are separated when unfolded and close when folded

Computational Determination and Analysis
  Databases
    CATH (Class, Architecture, Topology, Homologous superfamily)
    SCOP (Structural Classification Of Proteins)
    FSSP (Fold classification based on Structure-Structure alignment of Proteins)
  Prediction
    Ab-initio, theoretical modeling, and conformation space search
    Homology modeling and threading
    Energy minimization, simulation and Monte Carlo
  Proteomics (next week)
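The 1/d^6 distance dependence on the FRET slide is usually expressed through the Förster relation E = 1 / (1 + (d/R0)^6), where R0 is the Förster distance at which transfer efficiency is 50%. The sketch below uses that relation. The function names are hypothetical, and R0 is left as an input because its value depends on the donor-acceptor pair and is not given in the slides.

```python
def fret_efficiency(distance: float, r0: float) -> float:
    """Forster transfer efficiency for donor-acceptor separation `distance`.

    `distance` and `r0` must share the same length unit (e.g. angstroms).
    `r0` is the Forster distance of the dye pair (an input assumption).
    """
    if distance <= 0 or r0 <= 0:
        raise ValueError("distance and r0 must be positive")
    return 1.0 / (1.0 + (distance / r0) ** 6)


def distance_from_efficiency(efficiency: float, r0: float) -> float:
    """Invert the Forster relation: the 'molecular ruler' reading."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0 = 50.0  # illustrative value only, in angstroms
    for d in (25.0, 50.0, 75.0, 100.0):
        print(f"d = {d:5.1f} A  ->  E = {fret_efficiency(d, r0):.3f}")
```

Because of the sixth-power dependence, efficiency changes steeply near R0 and is nearly flat far from it, which is why the technique is most sensitive to distance changes in a limited range around the Förster distance.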

CATH
  a combination of manual and automated hierarchical classification
  four major levels:
    Class (C): based on secondary structure content
    Architecture (A): based on gross orientation of secondary structures
    Topology (T): based on connections and numbers of secondary structures
    Homologous superfamily (H): based on structure/function evolutionary commonalities
  provides useful geometric information (e.g. architecture)
  partial automation may result in examples near fixed thresholds being assigned inaccurately

SCOP
  a purely manual hierarchical classification
  three major levels:
    Family: based on clear evolutionary relationship (pairwise residue identities between proteins are >30%)
    Superfamily: based on probable evolutionary origin (low sequence identity but common structure/function features)
    Fold: based on major structural similarity (major secondary structures in same arrangement and topology)
  provides detailed evolutionary information
  manual process influences update frequency and equally exhaustive examination

FSSP
  a purely automated hierarchical classification
  three major levels:
    representative set: 330 protein chains (less than 30% sequence identity)
    clustering: based on structural alignment into fold families
    convergence: cutting at a high statistical significance level increases the number of distinct families, gradually approaching one family per protein chain
  continually updated, presents data and lets the user assess
  without sufficient knowledge, the user may not assess data appropriately
  [Figure panels: FSSP list of representative set; clustering dendrogram]

CATH vs. SCOP vs. FSSP
  approximately two-thirds of the protein chains in each database are common to all three databases
  FSSP pairwise matches (Z-score 4.0) compared to CATH and SCOP matches at the fold level (a), homology level (b)
  FSSP pairwise matches (Z-score 6.0) compared to CATH and SCOP matches at the fold level (c), homology level (d)
  FSSP pairwise matches (Z-score 8.0) compared to CATH and SCOP matches at the fold level (e), homology level (f)

Ab-initio, theoretical modeling, and conformation space search
  Ab-initio = given the amino acid primary structure, i.e. sequence, derive structure from first principles (e.g. treat amino acids as beads and derive possible structures by rotating through all possible φ, ψ angles using a reliable energy function, then optimize globally)
  Theoretical modeling = subset of ab-initio; given the amino acid primary structure and knowledge about characteristic features, derive a structure that has that sequence and those features (e.g. protein has an iron binding site → possible heme substructure)
  Conformation space search = subset of ab-initio, but a stochastic search in which the sample space is reduced by initial conditions/assumptions (e.g. reduce sample space to conform to the Ramachandran plot)

Homology modeling and threading
  Homology modeling = knowledge-based approach; given a sequence database, use multiple sequence alignment on this database to identify structurally conserved regions, construct the structure backbone and loops based on these regions, restore side chains, and refine through energy minimization (apply to proteins that have high sequence similarity to those in the database)
  Threading = knowledge-based approach; given a structure database of interest (e.g. for fold recognition, one that provides a limited set of possible structures per given sequence; for inverse folding, one that provides one structure per given limited set of possible sequences), use scoring functions and correlations from this database to derive a structure that is in agreement (apply to proteins with moderate sequence similarity to those in the database)
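Both the SCOP family criterion (pairwise residue identities above 30%) and the choice between homology modeling and threading rest on how similar two sequences are. The sketch below computes percent identity for two sequences that have already been aligned. It assumes gaps are written as "-" and that identity is taken over columns where neither sequence has a gap, which is only one of several conventions in use. The function names are hypothetical, and the 30% default is the figure quoted on the SCOP slide.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over non-gap columns of an alignment.

    Both strings must come from the same pairwise alignment and therefore
    have equal length. Columns with a gap in either sequence are skipped
    (an assumed convention; other definitions divide by different lengths).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def passes_family_identity(aligned_a: str, aligned_b: str,
                           threshold: float = 30.0) -> bool:
    """True if the pair exceeds the identity threshold quoted for SCOP families."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For instance, percent_identity("ACDE", "ACDF") gives 75.0, since three of the four aligned positions match. Identity alone is a coarse filter, and the slides note that SCOP families are judged on a clear evolutionary relationship, so a check like this would at most flag candidates for closer inspection.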

Energy minimization, simulation and Monte Carlo
  Energy minimization = select an appropriate energy function and derive conformations that yield minimal energies based on this function
  Simulation = select appropriate molecular conditions and derive conformations that are suited to these molecular conditions
  Monte Carlo = subset of molecular simulation, but an iterated search through a Markov chain of conformations (many iterations → canonical distribution, P(particular conformation) ~ exp(-E/kT)) proposed by N. Metropolis, in which a new conformation is generated from the current one by a small "move" and is accepted with a probability P_acc = min(1, exp(-ΔE/kT)), which depends on the corresponding change in energy, ΔE, and on an external adjustable parameter, kT

Proteomics
  Mass Spectrometry: next week

References
  C. Branden, J. Tooze. Introduction to Protein Structure. Garland Science Publishing, 1999.
  C. Chothia, T. Hubbard, S. Brenner, H. Barns, A. Murzin. Protein Folds in the All-β and All-α Classes. Annu. Rev. Biophys. Biomol. Struct., 1997, 26:597-627.
  G.M. Church. Proteins 1: Structure and Interactions. Biophysics 101: Computational Biology and Genomics, October 28, 2003.
  C. Hadley, D.T. Jones. A systematic comparison of protein structure classifications: SCOP, CATH and FSSP. Structure, August 27, 1999, 7:1099-1112.
  S. Komili. Section 8: Protein Structure. Biophysics 101: Computational Biology and Genomics, November 12, 2002.
  D.L. Nelson, A.L. Lehninger, M.M. Cox. Principles of Biochemistry, Third Edition. Worth Publishing, May 2002.
  .pdb animation created with PDB to MultiGif, http://www.dkfz-heidelberg.de/spec/pdb2mgif/expert.html
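To make the Metropolis acceptance rule from the Monte Carlo slide concrete, here is a minimal sketch on a one-dimensional toy problem. The double-well energy function, the uniform "move", the step size, and all function names are illustrative assumptions. A real protein calculation would use a molecular energy function over many coordinates rather than a single number.

```python
import math
import random
from typing import Callable, List


def acceptance_probability(delta_e: float, kT: float) -> float:
    """Metropolis rule: P_acc = min(1, exp(-delta_e / kT))."""
    if delta_e <= 0.0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_e / kT)


def double_well(x: float) -> float:
    """Toy energy landscape with minima at x = -1 and x = +1 (assumption)."""
    return (x * x - 1.0) ** 2


def metropolis(energy: Callable[[float], float], x0: float, kT: float,
               n_steps: int, step_size: float = 0.5, seed: int = 0) -> List[float]:
    """Run a Metropolis Markov chain and return the visited conformations."""
    rng = random.Random(seed)
    x = x0
    e = energy(x)
    trajectory = [x]
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step_size, step_size)  # small "move"
        e_candidate = energy(candidate)
        if rng.random() < acceptance_probability(e_candidate - e, kT):
            x, e = candidate, e_candidate  # accept the new conformation
        trajectory.append(x)  # a rejected move repeats the current state
    return trajectory


if __name__ == "__main__":
    chain = metropolis(double_well, x0=0.0, kT=0.2, n_steps=5000)
    mean_energy = sum(double_well(x) for x in chain) / len(chain)
    print(f"mean energy over the chain: {mean_energy:.3f}")
```

Lowering kT makes uphill moves rarer, so the chain behaves more like energy minimization, while raising it lets the chain cross barriers between minima more often. Recording the current state again after a rejection is what allows the long-run samples to approach the canonical distribution mentioned on the slide.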