STRUCTURAL BIOINFORMATICS. Barry Grant University of Michigan

Similar documents
Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

From Amino Acids to Proteins - in 4 Easy Steps

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig

Protein Structure Basics

Introduction to" Protein Structure

Physiochemical Properties of Residues

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

CAP 5510 Lecture 3 Protein Structures

Conformational Geometry of Peptides and Proteins:

Biomolecules: lecture 10

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Dana Alsulaibi. Jaleel G.Sweis. Mamoon Ahram

Supersecondary Structures (structural motifs)

Lecture 11: Protein Folding & Stability

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding

BCH 4053 Spring 2003 Chapter 6 Lecture Notes

BCMP 201 Protein biochemistry

4 Proteins: Structure, Function, Folding W. H. Freeman and Company

The Structure and Functions of Proteins

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Introduction to Computational Structural Biology

The protein folding problem consists of two parts:

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

Biomolecules: lecture 9

ALL LECTURES IN SB Introduction

BIBC 100. Structural Biochemistry

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

D Dobbs ISU - BCB 444/544X 1

Protein Folding. I. Characteristics of proteins. C α

Proteins. Division Ave. High School Ms. Foglia AP Biology. Proteins. Proteins. Multipurpose molecules

AP Biology. Proteins. AP Biology. Proteins. Multipurpose molecules

BIOCHEMISTRY Unit 2 Part 4 ACTIVITY #6 (Chapter 5) PROTEINS

Supplementary Information

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Molecular Mechanics, Dynamics & Docking

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Bi 8 Midterm Review. TAs: Sarah Cohen, Doo Young Lee, Erin Isaza, and Courtney Chen

Bioinformatics. Macromolecular structure

Lecture 10 (10/4/17) Lecture 10 (10/4/17)

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Charged amino acids (side-chains)

Biomolecules. Energetics in biology. Biomolecules inside the cell

Protein folding. Today s Outline

Motif Prediction in Amino Acid Interaction Networks

BME Engineering Molecular Cell Biology. Structure and Dynamics of Cellular Molecules. Basics of Cell Biology Literature Reading

Assignment 2 Atomic-Level Molecular Modeling

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004

Bulk behaviour. Alanine. FIG. 1. Chemical structure of the RKLPDA peptide. Numbers on the left mark alpha carbons.

Protein Structure & Motifs

Computational Biology 1

Chapter 1. Topic: Overview of basic principles

Lecture 26: Polymers: DNA Packing and Protein folding 26.1 Problem Set 4 due today. Reading for Lectures 22 24: PKT Chapter 8 [ ].

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation

Basics of protein structure

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

CHEM 463: Advanced Inorganic Chemistry Modeling Metalloproteins for Structural Analysis

BSc and MSc Degree Examinations

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

Details of Protein Structure

Papers listed: Cell2. This weeks papers. Chapt 4. Protein structure and function. The importance of proteins

Contents. xiii. Preface v

Biochemistry,530:,, Introduc5on,to,Structural,Biology, Autumn,Quarter,2015,

Part 7 Bonds and Structural Supports

Bonds and Structural Supports

Free energy, electrostatics, and the hydrophobic effect

Why Proteins Fold. How Proteins Fold? e - ΔG/kT. Protein Folding, Nonbonding Forces, and Free Energy

What is the central dogma of biology?

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Introduction to Protein Folding

titin, has 35,213 amino acid residues (the human version of titin is smaller, with only 34,350 residues in the full length protein).

Major Types of Association of Proteins with Cell Membranes. From Alberts et al

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

Protein Dynamics, Allostery and Function

Enzyme Catalysis & Biotechnology

Principles of Physical Biochemistry

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration:

Protein Structure and Function. Protein Architecture:

Useful background reading

Nanobiotechnology. Place: IOP 1 st Meeting Room Time: 9:30-12:00. Reference: Review Papers. Grade: 40% midterm, 60% final report (oral + written)

Protein structure and folding

Section Week 3. Junaid Malek, M.D.

2: CHEMICAL COMPOSITION OF THE BODY

Biochemistry - I SPRING Mondays and Wednesdays 9:30-10:45 AM (MR-1307) Lectures 3-4. Based on Profs. Kevin Gardner & Reza Khayat

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Proteins are not rigid structures: Protein dynamics, conformational variability, and thermodynamic stability

Model Mélange. Physical Models of Peptides and Proteins

Solutions and Non-Covalent Binding Forces

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

Biochemistry Quiz Review 1I. 1. Of the 20 standard amino acids, only is not optically active. The reason is that its side chain.

Biochemistry: Concepts and Connections

Transcription:

STRUCTURAL BIOINFORMATICS Barry Grant University of Michigan www.thegrantlab.org bjgrant@umich.edu Bergen, Norway 28-Sep-2015

Objective: Provide an introduction to the practice of structural bioinformatics, major goals, current research challenges, and application areas.

What does Bioinformatics mean to you?

Bioinformatics is the application of computers to the collection, archiving, organization, and interpretation of biological data. [Orengo, 2003]

Bioinformatics is the application of computers to the collection, archiving, organization, and interpretation of biological data. [Orengo, 2003] A hybrid of biology and computer science

Bioinformatics is the application of computers to the collection, archiving, organization, and interpretation of biological data. [Orengo, 2003] Bioinformatics is computer aided biology!

So what is structural bioinformatics?

So what is structural bioinformatics? computer aided structural biology! Aims to characterizes biomolecules and their assembles at the molecular & atomic level

Why should we care?

Why should we care? Because biomolecules are nature s robots and because it is only by coiling into specific 3D structures that they are able to perform their functions

BIOINFORMATICS DATA Literature and ontologies Gene expression Genomes Protein sequence DNA & RNA sequence Protein structure DNA & RNA structure Protein families, motifs and domains Chemical entities Protein interactions Pathways Systems

STRUCTURAL DATA IS CENTRAL Literature and ontologies Gene expression Genomes Protein sequence DNA & RNA sequence Protein structure DNA & RNA structure Protein families, motifs and domains Chemical entities Protein interactions Pathways Systems

STRUCTURAL DATA IS CENTRAL Literature and ontologies Gene expression Genomes Protein sequence DNA & RNA sequence DNA & RNA structure Protein families, motifs and domains Protein structure Sequence > Structure > Function Chemical entities Protein interactions Pathways change color to gray and yellow from black and red? Systems

STRUCTURAL DATA IS CENTRAL Literature and ontologies Gene expression Genomes Protein sequence DNA & RNA sequence DNA & RNA structure Protein families, motifs and domains ENERGETICS Protein structure Sequence > Structure > Function > DYNAMICS > Chemical entities Protein interactions Pathways Systems

Sequence Unfolded chain of amino acid chain Highly mobile Inactive Structure Ordered in a precise 3D arrangment Stable but dynamic Function Active in specific conformations Specific associations & precise reactions

KEY CONCEPT: ENERGY LANDSCAPE Unfolded Expanded, Disordered Native Compact, Ordered

KEY CONCEPT: ENERGY LANDSCAPE 1 millisecond Barrier crossing time ~exp(barrier Height) 0.1 microseconds Unfolded Expanded, Disordered Molten Globule Compact, Disordered Barrier Height Native Compact, Ordered

KEY CONCEPT: ENERGY LANDSCAPE 0.1 microseconds Unfolded State Expanded, Disordered Multiple Native Conformations Native (e.g. ligand Barrier bound and unbound) Molten Globule State Compact, Disordered Height 1 millisecond Barrier crossing time ~exp(barrier Height) State(s) Compact, Ordered

OUTLINE: Overview of structural bioinformatics Major motivations, goals and challenges Fundamentals of protein structure Composition, form, forces and dynamics Representing and interpreting protein structure Modeling energy as a function of structure

OUTLINE: Overview of structural bioinformatics Major motivations, goals and challenges Fundamentals of protein structure Composition, form, forces and dynamics Representing and interpreting protein structure Modeling energy as a function of structure

TRADITIONAL FOCUS PROTEIN, DNA AND SMALL MOLECULE DATA SETS WITH MOLECULAR STRUCTURE Protein (PDB) DNA (NDB) Small Molecules (CCDB)

Motivation 1: Detailed understanding of molecular interactions Provides an invaluable structural context for conservation and mechanistic analysis leading to functional insight.

Motivation 1: Detailed understanding of molecular interactions Computational modeling can provide detailed insight into functional interactions, their regulation and potential consequences of perturbation. Grant et al. PLoS. Comp. Biol. (2010)

Motivation 2: Lots of structural data is becoming available 107,958 (4/8/2015) Structural Genomics has contributed to driving down the cost and time required for structural determination Data from: http://www.rcsb.org/pdb/statistics/

target selection Motivation 2: Lots of structural data is becoming available harvesting cloning imaging expression purification crystallization Structural Genomics has contributed to driving down the cost and time required for structural determination PDB bl xtal mounting xtal screening publication annotation data collection phasing tracing struc. validation struc. refinement Image Credit: Structure determination assembly line Adam Godzik

Motivation 3: Theoretical and computational predictions have been, and continue to be, enormously valuable and influential!

SUMMARY OF KEY MOTIVATIONS Sequence > Structure > Function Structure determines function, so understanding structure helps our understanding of function Structure is more conserved than sequence Structure allows identification of more distant evolutionary relationships Structure is encoded in sequence Understanding the determinants of structure allows design and manipulation of proteins for industrial and medical advantage

Goals: Analysis Visualization Comparison Prediction Design Residue No. Grant et al. JMB. (2007)

Goals: Analysis Visualization Comparison Prediction Design Scarabelli and Grant. PLoS. Comp. Biol. (2013)

Goals: Analysis Visualization Comparison Prediction Design Scarabelli and Grant. PLoS. Comp. Biol. (2013)

Goals: Analysis Visualization Comparison Prediction Design myosin kinesin G-protein Grant et al. unpublished

Goals: Analysis Visualization Comparison Prediction Design Grant et al. PLoS One (2011, 2012)

Goals: Analysis Visualization Comparison Prediction Design Grant et al. PLoS Biology (2011)

MAJOR RESEARCH AREAS AND CHALLENGES Include but are not limited to: Protein classification Structure prediction from sequence Binding site detection Binding prediction and drug design Modeling molecular motions Predicting physical properties (stability, binding affinities) Design of structure and function etc... With applications to Biology, Medicine, Agriculture and Industry

NEXT UP: Overview of structural bioinformatics Major motivations, goals and challenges Fundamentals of protein structure Composition, form, forces and dynamics Representing and interpreting protein structure Modeling energy as a function of structure

HIERARCHICAL STRUCTURE OF PROTEINS Primary > Secondary > Tertiary > Quaternary amino acid residues Alpha helix Polypeptide chain Assembled subunits Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

RECAP: AMINO ACID NOMENCLATURE side chain (R group) main chain (backbone) Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

AMINO ACIDS CAN BE GROUPED BY THE PHYSIOCHEMICAL PROPERTIES Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

AMINO ACIDS POLYMERIZE THROUGH PEPTIDE BOND FORMATION side chains backbone Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

PEPTIDES CAN ADOPT DIFFERENT CONFORMATIONS BY VARYING THEIR PHI & PSI BACKBONE TORSIONS φ ψ C- terminal N- terminal Bond angles and lengths are largely invariant Peptide bond is planer (Cα, C, O, N, H, Cα all lie in the same plane) Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

PHI VS PSI PLOTS ARE KNOWN AS RAMACHANDRAN DIAGRAMS Beta Sheet Alpha Helix Steric hindrance dictates torsion angle preference Ramachandran plot show preferred regions of φ and ψ dihedral angles which correspond to major forms of secondary structure Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

MAJOR SECONDARY STRUCTURE TYPES ALPHA HELIX & BETA SHEET α- helix β- sheets Most common from has 3.6 residues per turn (number of residues in one full rotation of 360 ) Hydrogen bonds (dashed lines) between residue i and i+4 stabilize the structure The side chains (in green) protrude outward 3 10 - helix and π- helix forms are less common Hydrogen bond: i i+4 Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

MAJOR SECONDARY STRUCTURE TYPES ALPHA HELIX & BETA SHEET In antiparallel β- sheets Adjacent β- strands run in opposite directions Hydrogen bonds (dashed lines) between NH and CO stabilize the structure The side chains (in green) are above and below the sheet Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

MAJOR SECONDARY STRUCTURE TYPES ALPHA HELIX & BETA SHEET In parallel β- sheets Adjacent β- strands run in same direction Hydrogen bonds (dashed lines) between NH and CO stabilize the structure The side chains (in green) are above and below the sheet Image from: http://www.ncbi.nlm.nih.gov/books/nbk21581/

What Does a Protein Look like?

Proteins are stable (and hidden) in water

Proteins closely interact with water

Proteins are close packed solid but flexible objects (globular)

Due to their large size and complexity it is often hard to see whats important in the structure

Backbone or main- chain representation can help trace chain topology

Backbone or main- chain representation can help trace chain topology & reveal secondary structure

Simplified secondary structure representations are commonly used to communicate structural details Now we can clearly see 2 o, 3 o and 4 o structure Coiled chain of connected secondary structures

DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY Superposition of all 482 structures in RCSB PDB (23/09/2015)

DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY Principal component analysis (PCA) of experimental structures

KEY CONCEPT: ENERGY LANDSCAPE 0.1 microseconds Unfolded State Expanded, Disordered Multiple Native Conformations Native (e.g. ligand Barrier bound and unbound) Molten Globule State Compact, Disordered Height 1 millisecond Barrier crossing time ~exp(barrier Height) State(s) Compact, Ordered

Key forces affecang structure: H- bonding Van der Waals Electrostaacs Hydrophobicity Disulfide Bridges d 2.6 Å < d < 3.1Å 150 < θ < 180

Key forces affecang structure: H- bonding Van der Waals Repulsion Electrostaacs Hydrophobicity Disulfide Bridges Airacaon d 3 Å < d < 4Å

Key forces affecang structure: H- bonding Van der Waals Electrostaacs d d = 2.8 Å Hydrophobicity (some ame called IONIC BONDs or SALT BRIDGEs) Disulfide Bridges Coulomb s law E = Energy k = constant D = Dielectric constant (vacuum = 1; H 2 O = 80) q 1 & q 2 = electronic charges (Coulombs) r = distance (Å)

Key forces affecang structure: H- bonding Van der Waals Electrostaacs Hydrophobicity Disulfide Bridges The force that causes hydrophobic molecules or nonpolar poraons of molecules to aggregate together rather than to dissolve in water is called Hydrophobicity (Greek, water fearing ). This is not a separate bonding force; rather, it is the result of the energy required to insert a nonpolar molecule into water.

Forces affecang structure: H- bonding Van der Waals Electrostaacs Hydrophobicity Disulfide Bridges Other names: cysane bridge disulfide bridge Hair contains lots of disulfide bonds which are broken and reformed by heat 10

NEXT UP: Overview of structural bioinformatics Major motivations, goals and challenges Fundamentals of protein structure Composition, form, forces and dynamics Representing and interpreting protein structure Modeling energy as a function of structure

KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEMS ENERGY AS A FUNCTION OF ITS STRUCTURE Two main approaches: (1). Physics- Based (2). Knowledge- Based

KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEMS ENERGY AS A FUNCTION OF ITS STRUCTURE Two main approaches: (1). Physics- Based (2). Knowledge- Based

PHYSICS-BASED POTENTIALS ENERGY TERMS FROM PHYSICAL THEORY U bond = oscillations about the equilibrium bond length U angle = oscillations of 3 atoms about an equilibrium bond angle U dihedral = torsional rotation of 4 atoms about a central bond U nonbond = non-bonded energy terms (electrostatics and Lenard-Jones) CHARMM P.E. function, see: http://www.charmm.org/

Slide Credit: Michael Levitt

2223234 Slide Credit: Michael Levitt

PHYSICS-ORIENTED APPROACHES Weaknesses Fully physical detail becomes computaaonally intractable Approximaaons are unavoidable (Quantum effects approximated classically, water may be treated crudely) Parameterizaaon sall required Strengths Interpretable, provides guides to design Broadly applicable, in principle at least Clear pathways to improving accuracy Status Useful, widely adopted but far from perfect Mulaple groups working on fewer, beier approxs Force fields, quantum entropy, water effects Moore s law: hardware improving

Put Levit s Slide here on Computer Power Increases! Johnny Appleseed To Do!

SIDE-NOTE: GPUS AND ANTON SUPERCOMPUTER

SIDE-NOTE: GPUS AND ANTON SUPERCOMPUTER

KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEMS ENERGY AS A FUNCTION OF ITS STRUCTURE Two main approaches: (1). Physics- Based (2). Knowledge- Based

KNOWLEDGE-BASED DOCKING POTENTIALS Hisadine Ligand carboxylate Aromaac stacking

ENERGY DETERMINES PROBABILITY (STABILITY) Basic idea: Use probability as a proxy for energy Energy Boltzmann: Probability x Inverse Boltzmann: Example: ligand carboxylate O to protein hisadine N Find all protein- ligand structures in the PDB with a ligand carboxylate O 1. For each structure, histogram the distances from O to every hisadine N 2. Sum the histograms over all structures to obtain p(r O- N ) 3. Compute E(r O- N ) from p(r O- N )

KNOWLEDGE-BASED DOCKING POTENTIALS PMF, Muegge & Martin, J. Med. Chem. (1999) 42:791 A few types of atom pairs, out of several hundred total Nitrogen + /Oxygen - Aromaac carbons Aliphaac carbons Atom- atom distance (Angstroms)

LIMITATIONS OF KNOWLEDGE-BASED POTENTIALS 1. Staasacal limitaaons (e.g., to pairwise potenaals) 100 bins for a histogram of O- N & O- C distances 10 bins for a histogram of O- N distances r 1 r 2 r 10 r O- C r O- N r O- N 2. Even if we had infinite staasacs, would the results be accurate? (Is inverse Boltzmann quite right? Where is entropy?)

KNOWLEDGE-ORIENTED APPROACHES Weaknesses Accuracy limited by availability of data Accuracy may also be limited by overall approach Strengths Relaavely easy to implement Computaaonally fast Status Useful, far from perfect May be at point of diminishing returns (not always clear how to make improvements)

CAUTIONARY NOTES Everything should be made as simple as it can be but not simpler A model is never perfect. A model that is not quanataavely accurate in every respect does not preclude one from establishing results relevant to our understanding of biomolecules as long as the biophysics of the model are properly understood and explored. CalibraIon of the parameters is an ongoing and imperfect process Quesaons and hypotheses should always be designed such that they do not depend crucially on the precise numbers used for the various parameters. A computaional model is rarely universally right or wrong A model may be accurate in some regards, inaccurate in others. These subtleaes can only be uncovered by comparing to all available experimental data.

SUMMARY Structural bioinformaacs is computer aided structural biology Described major moavaaons, goals and challenges of structural bioinformaacs Reviewed the fundamentals of protein structure Introduced both physics and knowledge based modeling approaches for describing the structure, energeacs and dynamics of proteins computaaonally

Ilan Samish et al. Bioinformatics 2015;31:146-150

INFORMING SYSTEMS BIOLOGY? Literature and ontologies Gene expression Genomes Protein sequence DNA & RNA sequence Protein structure DNA & RNA structure Protein families, motifs and domains Chemical entities Protein interactions Pathways Systems