Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Secondary Structure Prediction Software

Secondary Structure Prediction
[NN based] PSI-pred, nnpredict (2-layer, feedforward NN), Pred2ary
[Consensus Approach] JPRED, SOPMA
[K-nearest neighbor] NNSSP, PREDATOR
[HMM] PSA
ZPRED
SSP
PHD (See Sample)

Motif Detection Tools
PROSITE (database of protein families & domains). Try PDOC00040. Also try PS00041.
PRINTS. Sample Output.
BLOCKS (multiply aligned ungapped segments for highly conserved regions of proteins; automatically created). Sample Output.
Pfam (protein families database of alignments & HMMs). Multiple alignment, domain architectures, species distribution, links: Try
MoST
PROBE
ProDom
DIP

Protein Information Sites
SwissPROT & GenBank
InterPRO is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. See sample.
PIR. Sample Protein page.

Modular Nature of Proteins
Proteins are collections of modular domains. For example:
Coagulation Factor XII: F2, E, F2, E, K, Catalytic Domain
PLAT: F2, E, K, K, Catalytic Domain

Domain Architecture Tools
CDART: Protein AAH24495; Domain Architecture; Its domain relatives; Multiple alignment for 2nd domain
SMART

Predicting Specialized Structures
COILS predicts coiled-coil motifs.
TMPred predicts transmembrane regions.
SignalP predicts signal peptides.
SEG predicts nonglobular regions.

Tertiary & Quaternary Protein Structures
Experimental methods: X-ray crystallography [more accurate!]; Nuclear Magnetic Resonance (NMR) spectroscopy.
If a protein is unfolded (denatured) and then released, it returns to its native 3-D structure.
The tertiary structure is a structure of minimum energy.
Angles φ and ψ are constrained.
Protein structures often have a hydrophobic core.

Protein Folding
Unfolded -> Molten Globule State: rapid (< 1 s)
Molten Globule State -> Folded Native State: slow (1-1000 s)
How to find the minimum energy configuration?

Modular Nature of Protein Structures
Example: Diphtheria Toxin

Structural Classification of Proteins
SCOP (Structural Classification of Proteins)
Based on structural & evolutionary relationships.
Contains ~40,000 domains.
Classes (groups of folds), Folds (proteins sharing folds), Families (proteins related by function/evolution), Superfamilies (distantly related proteins)

SCOP Family View

CATH: Protein Structure Classification
Semi-automatic classification; ~36K domains.
4 levels of classification:
Class (C): depends on secondary structure content.
Architecture (A): orientation of secondary structures.
Topology (T): topological connections.
Homologous Superfamily (H): similar structures and functions.

DALI Domain Dictionary
Completely automated; 3724 domains.
Criteria of compactness & recurrence.
Each domain is assigned a Domain Classification number DC_l_m_n_p representing fold space attractor region (l), globular folding topology (m), functional family (n) and sequence family (p).
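
The four-level Domain Classification number can be unpacked mechanically. The following is a minimal sketch, assuming the identifier is written literally as DC_l_m_n_p with integer fields. The names `DomainClass`, `parse_dc` and `same_fold`, and the rule that two domains share a fold when l and m agree, are illustrative assumptions rather than part of the DALI Domain Dictionary.

```python
from typing import NamedTuple


class DomainClass(NamedTuple):
    attractor: int          # l: fold space attractor region
    topology: int           # m: globular folding topology
    functional_family: int  # n: functional family
    sequence_family: int    # p: sequence family


def parse_dc(identifier: str) -> DomainClass:
    """Split a 'DC_l_m_n_p' string into its four integer levels."""
    prefix, *fields = identifier.split("_")
    if prefix != "DC" or len(fields) != 4:
        raise ValueError(f"not a DC_l_m_n_p identifier: {identifier!r}")
    return DomainClass(*(int(f) for f in fields))


def same_fold(a: DomainClass, b: DomainClass) -> bool:
    """Assumed rule: same attractor (l) and same topology (m)."""
    return a[:2] == b[:2]


# Hypothetical identifiers, for illustration only.
first = parse_dc("DC_3_12_4_1")
second = parse_dc("DC_3_12_7_2")
print(first, same_fold(first, second))
```

Keeping the levels as named fields makes it possible to compare domains at any depth of the hierarchy by slicing the tuple.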

5 Fold Space Classes
Attractor 1 can be characterized as alpha/beta, attractor 2 as all-beta, attractor 3 as all-alpha, attractor 5 as alpha-beta meander (1mli), and attractor 4 contains antiparallel beta-barrels, e.g. OB-fold (1prtF).

Fold Types & Neighbors
Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements.

Sequence Alignment of Fold Neighbors

Frequent Fold Types

Motifs in Protein Sequences
Motifs are combinations of secondary structures in proteins with a specific structure and a specific function. They are also called super-secondary structures.
Examples: Helix-Turn-Helix, Zinc-finger, Homeobox domain, Hairpin-beta motif, Calcium-binding motif, Beta-alpha-beta motif, Coiled-coil motifs.
Several motifs may combine to form domains. Examples: serine proteinase domain, Kringle domain, calcium-binding domain, homeobox domain.

Motif Detection Problem
Input: Set, S, of known (aligned) examples of a motif M; a new protein sequence, P.
Output: Does P have a copy of the motif M?
Example: Zinc Finger Motif
YKCGLCERSFVEKSALSRHORVHKN (marked positions: 3, 6, 19, 23)
Input: Database, D, of known protein sequences; a new protein sequence, P.
Output: What interesting patterns from D are present in P?
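
The first version of the problem can be illustrated on the zinc finger example with a regular expression. This is a minimal sketch: the spacing C-x(2)-C-x(12)-H-x(3)-H is read off the marked positions 3, 6, 19 and 23 of the example sequence and is an assumption made for illustration, not a curated database pattern. The function name `find_zinc_fingers` is hypothetical.

```python
import re

# Spacing inferred from the marked positions 3, 6, 19, 23 in the example:
# C, any 2 residues, C, any 12 residues, H, any 3 residues, H.
PATTERN = r"C.{2}C.{12}H.{3}H"


def find_zinc_fingers(protein):
    """Return the 1-based start position of every window matching PATTERN."""
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    return [m.start() + 1 for m in re.finditer(f"(?={PATTERN})", protein)]


# Example sequence copied verbatim from the slide.
example = "YKCGLCERSFVEKSALSRHORVHKN"
print(find_zinc_fingers(example))
```

A fixed-spacing pattern like this answers only the yes/no question for one motif; it cannot discover new patterns from a database, which is the second version of the problem.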

Helix-Turn-Helix Motifs
Structure: 3-helix complex; length: 22 amino acids; turn angle.
Function: gene regulation by binding to DNA.

DNA Binding at HTH Motif

HTH Motifs: Examples
[Table not recovered; surviving column headers: Loc, Helix 2, Turn]

Basis for New Algorithm
Combinations of residues in specific locations (may not be contiguous) contribute towards stabilizing a structure.
Some reinforcing combinations are relatively rare.

New Motif Detection Algorithm
Pattern Generation: Aligned Motif Examples -> Pattern Generator -> Pattern Dictionary
Motif Detection: New Protein Sequence + Pattern Dictionary -> Motif Detector -> Detection Results

Patterns
[Table partly recovered. Column headers: Loc, Protein Name, Helix 2, Turn, Helix 3; position labels: -1, 0, 1; pattern entries: Q1 G9 N20 A5 G9 V10 I15]

Pattern Mining Algorithm
Algorithm Pattern-Mining
Input: Motif length m, support threshold T, list of aligned motifs M.
Output: Dictionary L of frequent patterns.
    L_1 := all frequent patterns of length 1
    for i = 2 to m do
        C_i := Candidates(L_{i-1})
        L_i := frequent candidates from C_i
        if (|L_i| <= 1) then
            return L as the union of all L_j, j <= i.
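
The level-wise procedure above can be sketched in Python. This is an illustrative reading, not the lecture's implementation. It rests on several assumptions: a pattern is a tuple of (position, residue) pairs with 0-based positions, a pattern is frequent when at least T aligned motifs contain it, Candidates joins patterns that share all but their last element, and the union is also returned if the loop runs to completion. All function names are hypothetical.

```python
from itertools import combinations


def support(pattern, motifs):
    """Count the aligned motifs that contain every (position, residue) pair."""
    return sum(all(m[pos] == res for pos, res in pattern) for m in motifs)


def candidates(prev):
    """Join patterns that agree on all but their last element (assumed rule)."""
    prev = sorted(prev)
    return [a + (b[-1],) for a, b in combinations(prev, 2) if a[:-1] == b[:-1]]


def mine_patterns(motifs, m, threshold):
    """Build L_i from L_{i-1} level by level and return the union of all levels."""
    singles = {((pos, seq[pos]),) for seq in motifs for pos in range(m)}
    level = [p for p in singles if support(p, motifs) >= threshold]
    dictionary = list(level)
    for _ in range(2, m + 1):
        level = [c for c in candidates(level) if support(c, motifs) >= threshold]
        dictionary.extend(level)
        if len(level) <= 1:  # stopping rule from the pseudocode
            break
    return sorted(dictionary)


# Toy aligned motifs of length 3, for illustration only.
for pattern in mine_patterns(["GVS", "GVT", "GAS"], m=3, threshold=2):
    print(pattern)
```

Because every sub-pattern of a frequent pattern is itself frequent, building longer candidates only from the previous level avoids enumerating all residue combinations.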

Candidates Function
L_3: {G1, V2, S3}, {G1, V2, T6}, {G1, V2, I7}, {G1, V2, E8}, {G1, S3, T6}, {G1, T6, I7}, {V2, T6, I7}, {V2, T6, E8}
C_4: {G1, V2, S3, T6}, {G1, V2, S3, I7}, {G1, V2, S3, E8}, {G1, V2, T6, I7}, {G1, V2, T6, E8}, {G1, V2, I7, E8}, {V2, T6, I7, E8}
L_4: {G1, V2, S3, T6}, {G1, V2, S3, I7}, {G1, V2, S3, E8}, {G1, V2, T6, E8}, {V2, T6, I7, E8}
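
The C_4 list in this example is consistent with a prefix join: two length-3 patterns that share their first two elements are merged into one length-4 candidate, with no further pruning. That reading is inferred from the example rather than stated on the slide, so the sketch below is an assumption. The names `position` and `candidates` are hypothetical.

```python
from itertools import combinations


def position(element):
    """'T6' -> 6: the numeric position carried by a residue label."""
    return int(element[1:])


def candidates(previous_level):
    """Join every two patterns that share all but their last element."""
    ordered = [tuple(sorted(p, key=position)) for p in previous_level]
    joined = []
    for a, b in combinations(ordered, 2):
        if a[:-1] == b[:-1]:
            last_two = sorted((a[-1], b[-1]), key=position)
            joined.append(a[:-1] + tuple(last_two))
    return joined


# L_3 exactly as listed in the example.
L3 = [("G1", "V2", "S3"), ("G1", "V2", "T6"), ("G1", "V2", "I7"),
      ("G1", "V2", "E8"), ("G1", "S3", "T6"), ("G1", "T6", "I7"),
      ("V2", "T6", "I7"), ("V2", "T6", "E8")]

for pattern in candidates(L3):
    print(", ".join(pattern))
```

Applied to the L_3 above, the join yields the seven candidates listed as C_4; L_4 then keeps only those candidates that meet the support threshold.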

Motif Detection Algorithm
Algorithm Motif-Detection
Input: Motif length m, threshold score T, pattern dictionary L, and input protein sequence P[1..n].
Output: Information about motif(s) detected.
    for each location i do
        S := MatchScore(P[i..i+m-1], L)
        if (S > T) then
            report it as a possible motif
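
The detection loop can be sketched as a sliding window over the protein. The slides do not define MatchScore, so the scoring used here (the summed length of every dictionary pattern the window satisfies) is a placeholder assumption. The pattern representation as tuples of 0-based (offset, residue) pairs and all names below are likewise hypothetical.

```python
def match_score(window, dictionary):
    """Placeholder score: total length of the dictionary patterns the window satisfies."""
    return sum(
        len(pattern)
        for pattern in dictionary
        if all(window[pos] == res for pos, res in pattern)
    )


def detect_motifs(protein, m, threshold, dictionary):
    """Slide a length-m window over the protein and keep windows scoring above threshold."""
    hits = []
    for i in range(len(protein) - m + 1):
        window = protein[i:i + m]
        score = match_score(window, dictionary)
        if score > threshold:
            hits.append((i + 1, window, score))  # 1-based start, as in P[1..n]
    return hits


# Toy dictionary and sequence, for illustration only.
toy_dictionary = [((0, "G"),), ((0, "G"), (1, "V")), ((2, "S"),)]
print(detect_motifs("AAGVSAAGAT", m=3, threshold=2, dictionary=toy_dictionary))
```

Weighting longer patterns more heavily is one possible design choice; a statistically derived threshold or a substitution matrix, both listed under Experiments, would change only `match_score`.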

Experimental Results: GYM 2.0
Motif: HTH Motif (22)
Protein Family   Number Tested   GYM = DE Agree     Number Annotated   GYM = Annot.
Master           88              88 (100 %)         13                 13
Sigma            314             284 + 23 (98 %)    96                 82
Negates          93              86 (92 %)          0                  0
LysR             130             127 (98 %)         95                 93
AraC             68              57 (84 %)          41                 34
Rreg             116             99 (85 %)          57                 46
Total            675             653 + 23 (94 %)    289                255 (88 %)

Experiments
Basic implementation (Y. Gao).
Improved implementation & comprehensive testing (K. Mathee, GN).
Implementation for homeobox domain detection (X. Wang).
Statistical methods to determine thresholds (C. Bu).
Use of substitution matrix (C. Bu).
Study of patterns causing errors (N. Xu).
Negative training set (N. Xu).
NN implementation & testing (J. Liu & X. He).
HMM implementation & testing (J. Liu & X. He).