Bio nformatics. Lecture 23. Saad Mneimneh

Similar documents
Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

RNA and Protein Structure Prediction

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Basics of protein structure

Orientational degeneracy in the presence of one alignment tensor.

Supersecondary Structures (structural motifs)

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Secondary structure stability, beta-sheet formation & stability

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Bayesian Models and Algorithms for Protein Beta-Sheet Prediction

Objective: Students will be able identify peptide bonds in proteins and describe the overall reaction between amino acids that create peptide bonds.

Biology Chemistry & Physics of Biomolecules. Examination #1. Proteins Module. September 29, Answer Key

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Protein Structure Basics

Exact Algorithms for Dominating Induced Matching Based on Graph Partition

Protein Structures. 11/19/2002 Lecture 24 1

Physiochemical Properties of Residues

Biomolecules: lecture 10

Dana Alsulaibi. Jaleel G.Sweis. Mamoon Ahram

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig

Protein Structure Prediction, Engineering & Design CHEM 430

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

Biophysical Model Building

Motif Prediction in Amino Acid Interaction Networks

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

CAP 5510 Lecture 3 Protein Structures

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004

Motivating the need for optimal sequence alignments...

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Assignment 2 Atomic-Level Molecular Modeling

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Protein Secondary Structure Prediction

D Dobbs ISU - BCB 444/544X 1

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Introduction to" Protein Structure

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Sequence analysis and comparison

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Protein Structure Prediction and Display

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

This means that we can assume each list ) is

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Genomics and bioinformatics summary. Finding genes -- computer searches

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

3. Branching Algorithms

Bio nformatics. Lecture 3. Saad Mneimneh

Texas A&M University

Structure-Based Comparison of Biomolecules

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

(tree searching technique) (Boolean formulas) satisfying assignment: (X 1, X 2 )

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

BIRKBECK COLLEGE (University of London)

3. Branching Algorithms COMP6741: Parameterized and Exact Computation

A Method for Aligning RNA Secondary Structures

titin, has 35,213 amino acid residues (the human version of titin is smaller, with only 34,350 residues in the full length protein).

ALL LECTURES IN SB Introduction

Supplementary Figure 1. Aligned sequences of yeast IDH1 (top) and IDH2 (bottom) with isocitrate

Algorithms and Complexity theory

Multimedia : Fibronectin and Titin unfolding simulation movies.

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Secondary and sidechain structures

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

Computational Genomics and Molecular Biology, Fall

Building 3D models of proteins

Exhaustive search. CS 466 Saurabh Sinha

Model Mélange. Physical Models of Peptides and Proteins

CONTRAfold: RNA Secondary Structure Prediction without Physics-Based Models

Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Algorithm Design and Analysis

Protein Structure Determination

Useful background reading

Chemical Background for Spring 2010 BIO 311C Brand Part II

A General Model for Amino Acid Interaction Networks

Protein Structure Prediction using String Kernels. Technical Report

a 1 a 2 a 3 a 4 v i c i c(a 1, a 3 ) = 3

Quiz 2 Morphology of Complex Materials

Course Notes: Topics in Computational. Structural Biology.

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Matter and Substances Section 3-1

Single Source Shortest Paths

Pymol Practial Guide

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

HIV protease inhibitor. Certain level of function can be found without structure. But a structure is a key to understand the detailed mechanism.

Transcription:

Bio nformatics Lecture 23

Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely determines the folding unfold a protein and then release it it immediately folds back to the three-dimensional structure it had before, its native structure Protein secondary structures α-helices β-sheets Neither helices nor sheets, called loops

α-helix An α-helix is a simple helix having on average 10 residues (3 turns of the helix) Some helices can have as many as 40 residues Some amino acids appear more frequently on helices than other amino acids, not a strong enough fact to allow accurate prediction

β-sheet A β-sheet consists of binding between several sections of the amino acid sequence Each participating section is called a β-strand and has generally 5 to 10 residues The strands become adjacent to each other forming a kind of twisted sheet Certain amino acids show a preference for being in a β-sheet, but again these preferences are not so positive to allow accurate prediction

Loop A loop is a section of the sequence that connects the other two kinds of secondary structure Loops are not regular structures both in shape and size In general, loops are outside a folded protein, whereas the other structures form the protein core

A folded protein α-helix β-sheet loop

Motifs and Domains A motif is a simple combination of a few secondary structure that appear in several different proteins e.g. helix-loop-helix A motif might serve as a binding site to other molecutles or may have no role at all. A domain is a more complex combination of secondary structures that have a very specific function; therefore, it contains a binding site (called active site)

Protein folding Given the amino acid sequence of a protein, determine: where exactly all of its α-helices, β-sheets, and loops are, and how they arrange themselves in motifs and domains

Greedy Approach Given enough chemical and physical information about each amino acid it should be possible to compute the free energy of a folding Enumerate all possible foldings, compute the free energy of each Choose the folding with the minimum free energy (assuming that such a folding is the protein s native structure)

Feasibility of greedy Recall that proteins fold thanks in large to the angles φ and ψ between the carbon and the neighboring atoms These angles can assume only a few values independently of each other Therefore each residue can have a configuration given by a pair of values for φ and ψ Assume both φ and ψ can assume 3 values and the protein is 100 residues long, then we have to examine 9 100 foldings!

Other problems with greedy No agreement on how to compute the free energy of a folding, too many factors to consider Shape Size Polarity of molecules Strength of interactions of molecules, etc Because of all these difficulties, other techniques have been developed, e.g. Protein Threading

Similarity of protein sequences An early method for secondary structure prediction was based on the idea that similar sequences should have similar structures folding of A known B s sequence is similar to A s sequence Folding of B is similar to A s Not generally true! Similar proteins at the sequence level may have different secondary structures On the other hand, certain proteins that are very different at the sequence level are structurally related: different loops, similar cores Protein threading: fit a known structure to a sequence

Protein threading We are given A protein sequence A with n amino acids a i A core structural model C, with m core segments, we have: The length c i of each core segment Core segments i and i+1 are connected by loop for which we know the maximum l i max and minimum l i min lengths Properties of each amino acid in C A score function f to evaluate a threading We want to find a best scoring set T = {t 1,, t m } of integers such that each t i indicates what amino acid from A occupies the first position from core segment i

Illustration 1 2 3 a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 a 11 a 12 a 13 a 14 a 15 a 16 a 17 a 18 a 19 a 20 T = {5, 8, 17} The threading is an alignment between the sequence and the core structural model If we ignore interactions between amino acids of different structures, while allowing variable-length loop regions, the problem can be solved by a standard dynamic programming technique If we allow pairwise interactions, the problem becomes NP-hard

Approach We will allow pairwise interactions of amino acids We will solve the problem exactly with a standard technique used for handling NP-hard problems This technique is known as branch-and-bound

Branch-and-Bound Assume we want to find the maximum f(s) among many solutions s in the solution space First, it should be possible to separate the solution space into subspaces according to some constraints [branch] e.g. given a solution space X we could partition it into X 1 (all solutions that have a certain property) and X 2 (all solutions that do not) Second, for every partition X obtain an upper bound on the value of f(s) for every solution s X [bound] Assume we have f(s) for a given s. Now consider a subset X of the solution space with an upper bound u. If f(s) > u, then we can discard X.

Some considerations The upper bound should be as close to the actual function value as possible, so that we can find the maximum faster than with a weaker upper bound The upper bound should also be efficient to compute Even if we speedup the process by discarding some subsets of the solution space, the worst case running time is still exponential

Finding the solution space Every solution must satisfy: 1 + Σ j<i (c j +l j min ) t i n+1 Σ j i (c j +l j min ) t i + c i + l i min t i+1 t i + c i + l i max This implies that each t i [b i, e i ] in every solution

Illustration i i+1 i-1 b i-1 b i e i-1 e i b i+1 e i+1 A set of solutions is given by a collection of intervals, one for each of the m t i s.

Branch Given a set of solutions: choose a segment, say i choose a position u i inside the interval [b i, e i ] Split the set into three sets in which the intervals for t i will be: [b i, u i-1 ] [u i+1, e i ] [u i ]

Bound Given a solution s, f(s) can be expressed as: f(s) = Σ i g 1 (i, t i ) + Σ i Σ j>i g 2 (i,j,t i,t j ) score for each segment score for interactions Given a set X of solutions, a simple upper bound is: max s X f(s) = max s X Σ i [g 1 (i, t i ) + Σ j>i g 2 (i,j,t i,t j )] Σ i [max bi x ei g 1 (i, x) + Σ j>i max bi y,z ei g 2 (i,j,y,z)]

Algorithm X all possible threadings ub upper bound for X use a max priority queue with keys being the upper bounds ENQUEUE(Q, (ub, X)) while (true) do (ub, X) DEQUEUE(Q) if X = 1 [the only remaining threading in the set] then return X else split(x) for each new subset X i from X do ub i upper bound for X i ENQUEUE(Q, (ub i, X i ))