CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan


CAP 5510: Introduction to Bioinformatics
CGS 5166: Bioinformatics Tools
Giri Narasimhan
ECS 254; Phone: x3748
giri@cis.fiu.edu
www.cis.fiu.edu/~giri/teach/bioinff18.html

Proteins and Protein Structure

Protein Structures
- Sequences of amino acid residues
- 20 different amino acids
- Primary, Secondary, Tertiary, Quaternary

Proteins: Levels of Description

Proteins
- Primary structure is the sequence of amino acid residues of the protein, e.g., Flavodoxin: AKIGLFYGTQTGVTQTIAESIQQEFGGESIVDLNDIANADA
- Different regions of the sequence form local regular secondary structures, such as alpha helices, beta strands, etc.

More on Secondary Structures
α-helix
- Main chain with peptide bonds
- Side chains project outward from helix
- Stability provided by H-bonds between CO and NH groups of residues 4 locations away.
β-strand
- Stability provided by H-bonds with one or more β-strands, forming β-sheets. Needs a β-turn.

Proteins
- Tertiary structures are formed by packing secondary structural elements into a globular structure.
- Examples: Myoglobin, Lambda Cro

Quaternary Structures in Proteins
- The final structure may contain more than one chain arranged in a quaternary structure.
- Example: Insulin hexamer

More quaternary structures
- Bovine deoxyhemoglobin (heterotetramer)
- Muscle creatine kinase (homodimer)

Amino Acid Types
- Hydrophobic: I, L, M, V, A, F, P
- Charged
  - Basic: K, H, R
  - Acidic: E, D
- Polar: S, T, Y, H, C, N, Q, W
- Small: A, S, T
- Very Small: A, G
- Aromatic: F, Y, W
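The groupings on this slide overlap (H is both basic and polar, A is hydrophobic, small and very small), so a lookup that returns every matching group is more faithful than a one-to-one map. The following is a minimal sketch; the names `AMINO_ACID_TYPES`, `classes_of` and `class_fractions` are illustrative and not part of the course material.

```python
# Residue groups transcribed from the slide; a residue may belong to several.
AMINO_ACID_TYPES = {
    "hydrophobic": set("ILMVAFP"),
    "basic": set("KHR"),
    "acidic": set("ED"),
    "polar": set("STYHCNQW"),
    "small": set("AST"),
    "very_small": set("AG"),
    "aromatic": set("FYW"),
}


def classes_of(residue):
    """Return the sorted names of all slide groups containing a one-letter code."""
    r = residue.upper()
    return sorted(name for name, members in AMINO_ACID_TYPES.items() if r in members)


def class_fractions(sequence):
    """Fraction of residues in `sequence` that fall in each group."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(ch in members for ch in seq) / len(seq)
        for name, members in AMINO_ACID_TYPES.items()
    }
```

For example, `classes_of("H")` reports both the basic and the polar group, and `class_fractions` can be applied to a sequence such as the Flavodoxin fragment shown earlier to summarize its composition by group.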

Amino Acid Types

Structure of a single amino acid
- All 3 figures are cartoons of an amino acid residue.

Chains of amino acids
- Amino acids vs. amino acid residues

Angles φ and ψ in the polypeptide chain
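The angles φ and ψ are dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). As a rough sketch of how such an angle can be computed from 3-D coordinates, the function below uses the standard atan2 formulation; the function names and the tuple-based coordinate representation are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Computing (φ, ψ) for every residue of a structure gives the points that are plotted on a Ramachandran plot.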

Amino Acid Structures from Klug & Cummings

Ramachandran Plot

Alpha Helix

Beta Strands and Sheets

Molecular Representations

Supersecondary structures
- βαβ repeat
- βαβ-meander
- Greek Key
- Gamma β crystallin

Secondary Structure Prediction Software
- Recent ones: GOR V, PREDATOR, Zpred, PROF, NNSSP, PHD, PSIPRED, Jnet

Chou & Fasman Propensities
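A Chou-Fasman style propensity compares how often a residue type occurs in a given secondary structure state with how often it occurs overall; values above 1 suggest the residue favors that state. The sketch below computes such ratios from labelled sequences. It is only an illustration of the ratio, not the published parameter table or the full prediction procedure, and the function name and the "H" label for helix are assumptions.

```python
from collections import Counter


def propensities(sequences, states, target="H"):
    """Propensity of each residue type for the `target` state.

    P(aa) = (fraction of target-state residues that are aa)
            / (fraction of all residues that are aa)

    `sequences` and `states` are parallel lists of equal-length strings,
    e.g. residues "AAGG" with per-residue labels "HHHC".
    """
    total = Counter()
    in_state = Counter()
    for seq, labels in zip(sequences, states):
        if len(seq) != len(labels):
            raise ValueError("sequence and state strings must have equal length")
        for aa, label in zip(seq, labels):
            total[aa] += 1
            if label == target:
                in_state[aa] += 1
    n_total = sum(total.values())
    n_state = sum(in_state.values())
    if n_state == 0:
        raise ValueError("no residues observed in the target state")
    return {
        aa: (in_state[aa] / n_state) / (total[aa] / n_total)
        for aa in total
    }
```

With the toy input `propensities(["AAGG"], ["HHHC"])`, alanine comes out above 1 and glycine below 1; real propensity tables are derived the same way from a large set of solved structures.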

GOR IV prediction for 1bbc

Neural Networks

Neural Network Prediction of SS

PDB: Protein Data Bank
- Database of protein tertiary and quaternary structures and protein complexes. http://www.rcsb.org/pdb/
- Over 29,000 structures as of Feb 1, 2005.
- Structures determined by
  - NMR spectroscopy
  - X-ray crystallography
  - Computational prediction methods
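Legacy PDB files are fixed-column text, so atom coordinates can be read by slicing each ATOM or HETATM line at the documented column positions. The sketch below parses only a handful of fields; the function names and the sample line (with made-up coordinates) are illustrative, and a real project would more likely rely on an established parser.

```python
# One ATOM record laid out in PDB fixed columns (coordinates are made up).
SAMPLE = "ATOM      1  N   ALA A   1      11.104   6.134  -6.504"


def parse_atom_line(line):
    """Parse selected fields of an ATOM/HETATM record; return None otherwise."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }


def ca_coordinates(lines):
    """Return (x, y, z) tuples of all alpha-carbon (CA) atoms in `lines`."""
    coords = []
    for line in lines:
        atom = parse_atom_line(line)
        if atom is not None and atom["name"] == "CA":
            coords.append((atom["x"], atom["y"], atom["z"]))
    return coords
```

`parse_atom_line(SAMPLE)` yields a dictionary describing the backbone nitrogen of residue ALA 1 in chain A, and `ca_coordinates` reduces a file to one point per residue, which is a common input for structure comparison.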

PDB Search Results

Protein Folding
- Unfolded to Molten Globule State: rapid (< 1 s)
- Molten Globule State to Folded Native State: slow (1 to 1000 s)
- How to find minimum energy configuration?

Protein Folding

Energy Landscape

Modular Nature of Protein Structures
- Example: Diphtheria Toxin

Protein Structures
- Most proteins have a hydrophobic core.
- Within the core, specific interactions take place between amino acid side chains.
- Can an amino acid be replaced by some other amino acid?
  - Limited by space and available contacts with nearby amino acids
- Outside the core, proteins are composed of loops and structural elements in contact with water, solvent, other proteins and other structures.

Active Sites
- Active sites in proteins are usually hydrophobic pockets/crevices/troughs that involve sidechain atoms.

Active Sites
- Left: PDB 3RTD (streptavidin) and the first site located by the MOE Site Finder.
- Middle: 3RTD with complexed ligand (biotin).
- Right: Biotin ligand overlaid with calculated alpha spheres of the first site.

Viewing Protein Structures
- SPDBV
- RASMOL
- CHIME

Structural Classification of Proteins
- Over 1000 protein families known
  - Sequence alignment, motif finding, block finding, similarity search
- SCOP (Structural Classification of Proteins)
  - Based on structural & evolutionary relationships.
  - Contains ~40,000 domains
  - Classes (groups of folds), Folds (proteins sharing folds), Families (proteins related by function/evolution), Superfamilies (distantly related proteins)

SCOP Family View

CATH: Protein Structure Classification
- Semi-automatic classification; ~36K domains
- 4 levels of classification:
  - Class (C): depends on secondary structure content (α class, β class, α/β class, α+β class)
  - Architecture (A): orientation of secondary structures
  - Topology (T): topological connections
  - Homologous Superfamily (H): similar structures and functions

DALI/FSSP Database
- Completely automated; 3724 domains
- Criteria of compactness & recurrence
- Each domain is assigned a Domain Classification number DC_l_m_n_p representing fold space attractor region (l), globular folding topology (m), functional family (n) and sequence family (p).

Structural Alignment
- What is structural alignment of proteins?
  - 3-d superimposition of the atoms as best as possible, i.e., to minimize RMSD (root mean square deviation).
  - Can be done using VAST and SARF
- Structural similarity is common, even among proteins that do not share sequence similarity or evolutionary relationship.
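RMSD is the square root of the mean squared distance between paired atoms. The sketch below computes it for two equal-length coordinate lists and offers a translation-only superposition (moving both centroids to the origin). It deliberately omits the optimal rotation step that a full superposition needs (for example via the Kabsch algorithm), and it assumes the atom pairing is already known, which is the hard part that tools such as VAST and SARF address. All function names are illustrative.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def centered(coords):
    """Translate the points so that their centroid sits at the origin."""
    c = centroid(coords)
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def rmsd(a, b):
    """Root mean square deviation between paired points in a and b."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for p, q in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def rmsd_after_centering(a, b):
    """RMSD after removing the translation between the two point sets."""
    return rmsd(centered(a), centered(b))
```

Two copies of the same structure that differ only by a shift have a nonzero `rmsd` but an `rmsd_after_centering` of zero; a remaining rotation would still inflate the value until it is optimized away.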

Other databases & tools
- MMDB: contains groups of structurally related proteins
- SARF: finds structurally similar proteins using secondary structure elements
- VAST: Structure Neighbors
- SSAP: uses double dynamic programming to structurally align proteins

5 Fold Space classes
- Attractor 1 can be characterized as alpha/beta, attractor 2 as all-beta, attractor 3 as all-alpha, attractor 5 as alpha-beta meander (1mli), and attractor 4 contains antiparallel beta-barrels, e.g., OB-fold (1prtF).

Examples of protein classes
- Parvalbumin
- Translation Initiation Factor 5A
- Serotonin N-acetyltransferase
- Hypothetical protein from yeast (α/β alternating fold)

Fold Types & Neighbors
- Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements.

Sequence Alignment of Fold Neighbors

Frequent Fold Types

Protein Structure Prediction
- Holy Grail of bioinformatics
- Protein Structure Initiative: determine a set of protein structures that span protein structure space sufficiently well.
  - WHY? The number of folds in natural proteins is limited. Thus a newly discovered protein should be within modeling distance of some protein in the set.
- CASP: Critical Assessment of techniques for structure prediction
  - To stimulate work in this difficult field

PSP Methods
- Homology-based modeling
- Methods based on fold recognition
  - Threading methods
- Ab initio methods
  - From first principles
  - With the help of databases

Best method for PSP: ROSETTA
- Assumption: as proteins fold, a large number of partially folded, low-energy conformations are formed, and local structures combine to form more global structures with minimum energy.
- Build a database of known structures (I-sites) of short sequences (3-15 residues).
- Monte Carlo simulation assembling possible substructures and computing energy
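The Monte Carlo assembly idea can be illustrated with a toy model: a conformation is a list of backbone angles, a move replaces a short window with a fragment drawn from a library, and the Metropolis criterion decides whether to keep the move. This is a loose sketch and not Rosetta's actual algorithm, fragment library or energy function; `toy_energy`, the three-angle fragments, the temperature and the step count are all hypothetical choices made for illustration.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: lowest (zero) when every angle equals -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def fragment_monte_carlo(n, fragments, energy, steps=2000, temperature=1.0, seed=0):
    """Assemble an n-angle conformation by Metropolis fragment insertion.

    `fragments` is a list of equal-length angle lists (the toy library).
    Returns the lowest-energy conformation visited and its energy.
    """
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    current = [180.0] * n  # start from an extended chain
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        start = rng.randrange(n - frag_len + 1)
        trial = list(current)
        trial[start:start + frag_len] = rng.choice(fragments)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis rule: always accept downhill moves, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

A call such as `fragment_monte_carlo(9, [[-60.0] * 3, [180.0] * 3, [-120.0, 130.0, -120.0]], toy_energy)` returns the best conformation seen during the run. Accepting occasional uphill moves is what lets the search escape local minima of the energy landscape.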

Threading Methods
- See p. 471, Mount
- http://www.bioinformaticsonline.org/links/ch_10_t_7.html

Modeling Servers
- SwissMODEL, 3DJigsaw, CPHModel, ESyPred3D, Geno3D, SDSC1, Rosetta, MolIDE, SCWRL, PSIPred, MODELLER, LOOPY

Genomics
- Study of all genes in a genome
- All aspects of total gene content
- Gene Expression
  - Microarray experiments & analysis
  - RNA-Seq

Proteomics
- Study of all proteins in a genome, or comparison of whole genomes.
- Whole genome annotation & functional proteomics
- Whole genome comparison
- Protein Expression: 2D Gel Electrophoresis

2D-Gels

2D Gel Electrophoresis

2D-gels
- Comparing Proteomes For Differences in Protein Expression
- Comparing Different Sample Types For Changes in Protein Levels

Mass Spectrometry
- Mass measurements by Time-of-Flight
  - Laser ionizes protein
  - Electric field accelerates molecules in sample toward detector
  - Time to detector increases with the mass of the molecule (it is proportional to the square root of the mass-to-charge ratio)
  - Infer molecular weights of proteins and peptides
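For an idealized time-of-flight instrument, an ion of mass m and charge z·e accelerated through a potential V gains kinetic energy z·e·V = ½·m·v², so its flight time over a drift length L is t = L·sqrt(m / (2·z·e·V)). Heavier ions therefore arrive later. The sketch below encodes this relation and its inverse; the default voltage and drift length are arbitrary illustrative values, and real instruments are calibrated against known standards instead of relying on the ideal formula.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, voltage=20000.0, drift_length=1.0):
    """Ideal flight time in seconds for an ion of `mass_da` daltons."""
    mass_kg = mass_da * DALTON
    return drift_length * math.sqrt(
        mass_kg / (2.0 * charge * ELEMENTARY_CHARGE * voltage)
    )


def mass_from_time(time_s, charge=1, voltage=20000.0, drift_length=1.0):
    """Invert flight_time: infer the mass in daltons from a flight time."""
    mass_kg = (
        2.0 * charge * ELEMENTARY_CHARGE * voltage * (time_s / drift_length) ** 2
    )
    return mass_kg / DALTON
```

Quadrupling the mass doubles the flight time, which is the square-root dependence that lets the detector's arrival times be converted back into molecular weights.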

Mass Spectrometry (MS)
- Using Peptide Masses to Identify Proteins
  - Peptide mass fingerprint is a compilation of molecular weights of peptides
  - Use molecular weight of native protein and MS signature to search database for similarly sized proteins with similar MS maps
  - Fairly easy to sequence proteins using MS
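A theoretical peptide mass fingerprint can be sketched by cutting a sequence into peptides and summing residue masses. The example below assumes digestion with trypsin under the common rule "cleave after K or R unless the next residue is P", with no missed cleavages or modifications; the slide does not name a protease, so that choice is an assumption, as are the function names. Masses are monoisotopic residue masses plus one water per peptide.

```python
# Monoisotopic residue masses in daltons.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Split after K or R, except when the next residue is P."""
    seq = sequence.upper()
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if is_last or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide.upper()) + WATER


def mass_fingerprint(sequence):
    """Sorted list of peptide masses from an in-silico tryptic digest."""
    return sorted(peptide_mass(p) for p in tryptic_digest(sequence))
```

Comparing such a theoretical list against the measured peptide masses, within an instrument tolerance, is the basic matching step behind fingerprint-based protein identification.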

Other Proteomics Tools
From ExPASy/SWISS-PROT:
- AACompIdent: identifies proteins from aa composition [Input: aa composition, isoelectric point, mol wt., etc. Output: proteins from DB]
- AACompSim: compares a protein's aa composition with other proteins
- MultIdent: uses mol wt., mass fingerprints, etc. to identify proteins
- PeptIdent: compares experimentally determined mass fingerprints with theoretically determined ones for all proteins
- FindMod: predicts post-translational modifications based on mass difference between experimental and theoretical mass fingerprints
- PeptideMass: theoretical mass fingerprint for a given protein
- GlycoMod: predicts oligosaccharide modifications from mass difference
- TGREASE: calculates hydrophobicity of protein along its length
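Hydrophobicity "along the length" of a protein is typically reported as a sliding-window average of per-residue scores. The sketch below uses the Kyte-Doolittle hydropathy scale with a window of 7; it illustrates the general technique and does not claim to reproduce TGREASE's exact options or output. The function name and default window size are assumptions.

```python
# Kyte-Doolittle hydropathy index (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of each length-`window` segment along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Plotting the returned values against window position gives a hydropathy plot; sustained positive stretches indicate hydrophobic regions such as a buried core or a membrane-spanning segment.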