Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Similar documents
Bio nformatics. Lecture 23. Saad Mneimneh

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

RNA and Protein Structure Prediction

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Basics of protein structure

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Objective: Students will be able identify peptide bonds in proteins and describe the overall reaction between amino acids that create peptide bonds.

Orientational degeneracy in the presence of one alignment tensor.

Biology Chemistry & Physics of Biomolecules. Examination #1. Proteins Module. September 29, Answer Key

Supersecondary Structures (structural motifs)

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Secondary structure stability, beta-sheet formation & stability

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Bayesian Models and Algorithms for Protein Beta-Sheet Prediction

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Protein Structure Basics

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

Exact Algorithms for Dominating Induced Matching Based on Graph Partition

Protein Structures. 11/19/2002 Lecture 24 1

Biomolecules: lecture 10

Physiochemical Properties of Residues

Dana Alsulaibi. Jaleel G.Sweis. Mamoon Ahram

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig

Protein Structure Prediction, Engineering & Design CHEM 430

CAP 5510 Lecture 3 Protein Structures

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004

Biophysical Model Building

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Motif Prediction in Amino Acid Interaction Networks

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

Motivating the need for optimal sequence alignments...

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

Assignment 2 Atomic-Level Molecular Modeling

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Protein Secondary Structure Prediction

BIRKBECK COLLEGE (University of London)

Texas A&M University

D Dobbs ISU - BCB 444/544X 1

Structure-Based Comparison of Biomolecules

Exhaustive search. CS 466 Saurabh Sinha

Multimedia : Fibronectin and Titin unfolding simulation movies.

Computational Genomics and Molecular Biology, Fall

Introduction to" Protein Structure

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Sequence analysis and comparison

ALL LECTURES IN SB Introduction

Protein Structure Prediction and Display

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror

This means that we can assume each list ) is

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

A General Model for Amino Acid Interaction Networks

Course Notes: Topics in Computational. Structural Biology.

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Protein Structure Prediction using String Kernels. Technical Report

Genomics and bioinformatics summary. Finding genes -- computer searches

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

3. Branching Algorithms

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

(tree searching technique) (Boolean formulas) satisfying assignment: (X 1, X 2 )

Intractable Problems Part Two

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

3. Branching Algorithms COMP6741: Parameterized and Exact Computation

titin, has 35,213 amino acid residues (the human version of titin is smaller, with only 34,350 residues in the full length protein).

Homework 9: Protein Folding & Simulated Annealing : Programming for Scientists Due: Thursday, April 14, 2016 at 11:59 PM

Supplementary Figure 1. Aligned sequences of yeast IDH1 (top) and IDH2 (bottom) with isocitrate

Algorithms and Complexity theory

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Secondary and sidechain structures

Building 3D models of proteins

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

Model Mélange. Physical Models of Peptides and Proteins

CONTRAfold: RNA Secondary Structure Prediction without Physics-Based Models

Algorithm Design and Analysis

Protein Structure Determination

BME 5742 Biosystems Modeling and Control

Useful background reading

Organic Chemistry Option II: Chemical Biology

AP Biology. Proteins. AP Biology. Proteins. Multipurpose molecules

Analysis and Prediction of Protein Structure (I)

a 1 a 2 a 3 a 4 v i c i c(a 1, a 3 ) = 3

Quiz 2 Morphology of Complex Materials

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Transcription:

Computat onal Biology Lecture 21 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely determines the folding unfold a protein and then release it it immediately folds back to the three-dimensional structure it had before, its native structure Protein secondary structures α-helices β-sheets Neither helices nor sheets, called loops α-helix An α-helix is a simple helix having on average 10 residues (3 turns of the helix) Some helices can have as many as 40 residues Some amino acids appear more frequently on helices than other amino acids, not a strong enough fact to allow accurate prediction 1

β-sheet A β-sheet consists of binding between several sections of the amino acid sequence Each participating section is called a β-strand and has generally 5 to 10 residues The strands become adjacent to each other forming a kind of twisted sheet Certain amino acids show a preference for being in a β-sheet, but again these preferences are not so positive to allow accurate prediction Loop A loop is a section of the sequence that connects the other two kinds of secondary structure Loops are not regular structures both in shape and size In general, loops are outside a folded protein, whereas the other structures form the protein core A folded protein α-helix β-sheet loop 2

Motifs and Domains A motif is a simple combination of a few secondary structure that appear in several different proteins e.g. helix-loop-helix A motif might serve as a binding site to other molecules or may have no role at all. A domain is a more complex combination of secondary structures that have a very specific function; therefore, it contains a binding site (called active site) Protein folding Given the amino acid sequence of a protein, determine: where exactly all of its α-helices, β-sheets, and loops are, and how they arrange themselves in motifs and domains Greedy Approach Given enough chemical and physical information about each amino acid it should be possible to compute the free energy of a folding Enumerate all possible foldings, compute the free energy of each Choose the folding with the minimum free energy (assuming that such a folding is the protein s native structure) 3

Feasibility of greedy Recall that proteins fold thanks in large to the angles φ and ψ between the carbon and the neighboring atoms These angles can assume only a few values independently of each other Therefore each residue can have a configuration given by a pair of values for φ and ψ Assume both φ and ψ can assume 3 values and the protein is 100 residues long, then we have to examine 9 100 foldings! Other problems with greedy No agreement on how to compute the free energy of a folding, too many factors to consider Shape Size Polarity of molecules Strength of interactions of molecules, etc Because of all these difficulties, other techniques have been developed, e.g. Protein Threading Similarity of protein sequences An early method for secondary structure prediction was based on the idea that similar sequences should have similar structures folding of A known B s sequence is similar to A s sequence Folding of B is similar to A s Not generally true! Similar proteins at the sequence level may have different secondary structures On the other hand, certain proteins that are very different at the sequence level are structurally related: different loops, similar cores Protein threading: fit a known structure to a sequence 4

Protein threading We are given A protein sequence A with n amino acids a i A core structural model C, with m core segments, we have: The length c i of each core segment Core segments i and i+1 are connected by loop for which we know the maximum l i max and minimum l i min lengths Properties of each amino acid in C A score function f to evaluate a threading We want to find a best scoring set T = {t 1,, t m } of integers such that each t i indicates what amino acid from A occupies the first position from core segment i Illustration 1 2 3 a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 a 11 a 12 a 13 a 14 a 15 a 16 a 17 a 18 a 19 a 20 T = {5, 8, 17} The threading is an alignment between the sequence and the core structural model If we ignore interactions between amino acids of different structures, while allowing variable-length loop regions, the problem can be solved by a standard dynamic programming technique If we allow pairwise interactions, the problem becomes NP-hard Approach We will allow pairwise interactions of amino acids We will solve the problem exactly with a standard technique used for handling NP-hard problems This technique is known as branch-and-bound 5

Branch-and-Bound Assume we want to find the maximum f(s) among many solutions s in the solution space First, it should be possible to separate the solution space into subspaces according to some constraints [branch] e.g. given a solution space X we could partition it into X 1 (all solutions that have a certain property) and X 2 (all solutions that do not) Second, for every partition X obtain an upper bound on the value of f(s) for every solution s X [bound] Assume we have f(s) for a given s. Now consider a subset X of the solution space with an upper bound u. If f(s) > u, then we can discard X. Some considerations The upper bound should be as close to the actual function value as possible, so that we can find the maximum faster than with a weaker upper bound The upper bound should also be efficient to compute Even if we speedup the process by discarding some subsets of the solution space, the worst case running time is still exponential Finding the solution space Every solution must satisfy: 1 + Σ j<i (c j +l j min ) t i n+1 Σ j i (c j +l j min ) t i + c i + l i min t i+1 t i + c i + l i max This implies that each t i [b i, e i ] in every solution 6

i Illustration i+1 i-1 b i-1 b i e i-1 e i b i+1 e i+1 A set of solutions is given by a collection of intervals, one for each of the m t i s. Branch Given a set of solutions: choose a segment, say i choose a position u i inside the interval [b i, e i ] Split the set into three sets in which the intervals for t i will be: [b i, u i-1 ] [u i+1, e i ] [u i ] Bound Given a solution s, f(s) can be expressed as: f(s) = Σ i g 1 (i, t i ) + Σ i Σ j>i g 2 (i,j,t i,t j ) score for score for interactions each segment Given a set X of solutions, a simple upper bound is: max s X f(s) = max s X Σ i [g 1 (i, t i ) + Σ j>i g 2 (i,j,t i,t j )] Σ i [max bi x ei g 1 (i, x) + Σ j>i max bi y,z ei g 2 (i,j,y,z)] 7

Algorithm X all possible threadings ub upper bound for X use a max priority queue with keys being the upper bounds ENQUEUE(Q, (ub, X)) while (true) do (ub, X) DEQUEUE(Q) if X = 1 [the only remaining threading in the set] then return X else split(x) for each new subset X i from X do ub i upper bound for X i ENQUEUE(Q, (ub i, X i )) 8