Protein Structures. Sequences of amino acid residues (20 different amino acids). Four levels of structure: Primary, Secondary, Tertiary, Quaternary. 10/8/2002, Lecture 12



Angles φ and ψ in the polypeptide chain


More on Secondary Structures
α-helix: Main chain with peptide bonds. Side chains project outward from the helix. Stability is provided by H-bonds between the CO and NH groups of residues 4 positions apart.
β-strand: Stability is provided by H-bonds with one or more other β-strands, forming β-sheets. Needs a β-turn.
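The "4 positions apart" rule for the α-helix can be made concrete with a minimal sketch. It assumes the usual convention that the CO group of residue i pairs with the NH group of residue i+4; the function name `helix_hbond_pairs` and the 1-based numbering are illustrative choices, not part of the lecture.

```python
def helix_hbond_pairs(n_residues):
    """List (i, i+4) residue pairs for an idealized alpha-helix.

    Assumption: CO of residue i bonds to NH of residue i+4,
    with residues numbered from 1 to n_residues.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


# A 10-residue helix (about 3 turns) has 6 such backbone H-bonds.
pairs = helix_hbond_pairs(10)
print(pairs)
```

Helices shorter than 5 residues yield an empty list, since no residue has a partner 4 positions away.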

Secondary Structure Prediction Software

Active Sites
Active sites in proteins are usually hydrophobic pockets, crevices, or troughs that involve side-chain atoms.

Active Sites
Left: PDB 3RTD (streptavidin) and the first site located by the MOE Site Finder.
Middle: 3RTD with complexed ligand (biotin).
Right: Biotin ligand overlaid with calculated alpha spheres of the first site.

Motifs in Protein Sequences
Motifs are combinations of secondary structures in proteins with a specific structure and a specific function. They are also called super-secondary structures.
Examples: Helix-Turn-Helix, Zinc-finger, Homeobox domain, Hairpin-beta motif, Calcium-binding motif, Beta-alpha-beta motif, Coiled-coil motifs.
Several motifs may combine to form domains. Examples: serine proteinase domain, Kringle domain, calcium-binding domain, homeobox domain.

Motif Detection Problem
Problem 1
Input: A set, S, of known (aligned) examples of a motif M, and a new protein sequence, P.
Output: Does P have a copy of the motif M?
Example: Zinc Finger Motif
YKCGLCERSFVEKSALSRHQRVHKN (C at positions 3 and 6, H at positions 19 and 23)
Problem 2
Input: A database, D, of known protein sequences, and a new protein sequence, P.
Output: What interesting patterns from D are present in P?
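The zinc finger example can be checked with a short sketch that reads off the residues at the marked positions. The sequence is transcribed from the slide (position 20 is read as Q); the helper name `residues_at` and the 1-based indexing are assumptions made for illustration.

```python
# Zinc finger example from the slide, with its four marked positions.
ZINC_FINGER = "YKCGLCERSFVEKSALSRHQRVHKN"
MARKED_POSITIONS = (3, 6, 19, 23)


def residues_at(sequence, positions):
    """Return the residues found at the given 1-based positions."""
    return [sequence[p - 1] for p in positions]


marked = residues_at(ZINC_FINGER, MARKED_POSITIONS)
print(marked)  # two cysteines followed by two histidines
```

The marked positions pick out two Cys and two His residues, which is the pairing that a pattern-based detector for this motif would need to capture.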

Helix-Turn-Helix Motifs
Structure: 3-helix complex. Length: 22 amino acids. Turn angle.
Function: Gene regulation by binding to DNA.

DNA Binding at HTH Motif

HTH Motifs: Examples
Residue positions -1 to 20 span Helix 2, Turn, and Helix 3.

Loc  Protein Name  -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
14   Cro            F  G  Q  E  K  T  A  K  D  L  G  V  Y  Q  S  A  I  N  K  A  I  H
16   434 Cro        M  T  Q  T  E  L  A  T  K  A  G  V  K  Q  Q  S  I  Q  L  I  E  A
11   P22 Cro        G  T  Q  R  A  V  A  K  A  L  G  I  S  D  A  A  V  S  Q  W  K  E
31   Rep            L  S  Q  E  S  V  A  D  K  M  G  M  G  Q  S  G  V  G  A  L  F  N
16   434 Rep        L  N  Q  A  E  L  A  Q  K  V  G  T  T  Q  Q  S  I  E  Q  L  E  N
19   P22 Rep        I  R  Q  A  A  L  G  K  M  V  G  V  S  N  V  A  I  S  Q  W  E  R
24   CII            L  G  T  E  K  T  A  E  A  V  G  V  D  K  S  Q  I  S  R  W  K  R
4    LacR           V  T  L  Y  D  V  A  E  Y  A  G  V  S  Y  Q  T  V  S  R  V  V  N
167  CAP            I  T  R  Q  E  I  G  Q  I  V  G  C  S  R  E  T  V  G  R  I  L  K
66   TrpR           M  S  Q  R  E  L  K  N  E  L  G  A  G  I  A  T  I  T  R  G  S  N
22   BlaA Pv        L  N  F  T  K  A  A  L  E  L  Y  V  T  Q  G  A  V  S  Q  Q  V  R
23   TrpI Ps        N  S  V  S  Q  A  A  E  Q  L  H  V  T  H  G  A  V  S  R  Q  L  K

Basis for New Algorithm
Combinations of residues in specific locations (not necessarily contiguous) contribute towards stabilizing a structure.
Some reinforcing combinations are relatively rare.

New Motif Detection Algorithm
Pattern Generation: Aligned Motif Examples -> Pattern Generator -> Pattern Dictionary
Motif Detection: New Protein Sequence + Pattern Dictionary -> Motif Detector -> Detection Results

Patterns
Patterns highlighted in the table of aligned HTH examples (the same table as under "HTH Motifs: Examples"):
Q1 G9 N20
A5 G9 V10 I15
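To see which aligned examples carry each highlighted pattern, here is a minimal sketch. It assumes that an item such as Q1 means "residue Q at alignment position 1" and that the first column of the table is position -1; the names `EXAMPLES`, `FIRST_POSITION`, and `supporters` are illustrative and not from the lecture.

```python
# The 12 aligned HTH examples from the table, positions -1 .. 20.
EXAMPLES = {
    "Cro":     "FGQEKTAKDLGVYQSAINKAIH",
    "434 Cro": "MTQTELATKAGVKQQSIQLIEA",
    "P22 Cro": "GTQRAVAKALGISDAAVSQWKE",
    "Rep":     "LSQESVADKMGMGQSGVGALFN",
    "434 Rep": "LNQAELAQKVGTTQQSIEQLEN",
    "P22 Rep": "IRQAALGKMVGVSNVAISQWER",
    "CII":     "LGTEKTAEAVGVDKSQISRWKR",
    "LacR":    "VTLYDVAEYAGVSYQTVSRVVN",
    "CAP":     "ITRQEIGQIVGCSRETVGRILK",
    "TrpR":    "MSQRELKNELGAGIATITRGSN",
    "BlaA Pv": "LNFTKAALELYVTQGAVSQQVR",
    "TrpI Ps": "NSVSQAAEQLHVTHGAVSRQLK",
}
FIRST_POSITION = -1  # alignment position of the first column


def supporters(pattern, examples):
    """Names of the examples that contain every (position, residue) item."""
    return [
        name
        for name, seq in examples.items()
        if all(seq[pos - FIRST_POSITION] == res for pos, res in pattern)
    ]


q1_g9_n20 = supporters([(1, "Q"), (9, "G"), (20, "N")], EXAMPLES)
a5_g9_v10_i15 = supporters([(5, "A"), (9, "G"), (10, "V"), (15, "I")], EXAMPLES)
print(q1_g9_n20)
print(a5_g9_v10_i15)
```

Under this reading, each of the two patterns is supported by three of the twelve examples, and the items of a pattern need not be contiguous.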

Pattern Mining Algorithm
Algorithm Pattern-Mining
Input: Motif length m, support threshold T, list of aligned motifs M.
Output: Dictionary L of frequent patterns.
1. L_1 := All frequent patterns of length 1
2. for i = 2 to m do
3.     C_i := Candidates(L_{i-1})
4.     L_i := Frequent candidates from C_i
5.     if (|L_i| <= 1) then
6.         return L as the union of all L_j, j <= i.
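The level-wise loop above can be sketched in Python as follows. Several details are assumptions rather than the lecture's definitions: a pattern is a tuple of (position, residue) items, it counts as frequent when at least T aligned motifs contain it, and `candidates` joins two patterns that differ only in their last item. The toy alignment is made up for illustration.

```python
def support(pattern, motifs):
    """Number of aligned motifs containing every (position, residue) item."""
    return sum(all(m[pos - 1] == res for pos, res in pattern) for m in motifs)


def candidates(prev_level):
    """Join patterns sharing all but their last item (assumed join rule)."""
    out = set()
    for a in prev_level:
        for b in prev_level:
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                out.add(a + (b[-1],))
    return out


def pattern_mining(m, T, motifs):
    """Return the union of all frequent levels L_1, L_2, ..."""
    items = {((pos, mo[pos - 1]),) for mo in motifs for pos in range(1, m + 1)}
    level = {p for p in items if support(p, motifs) >= T}  # L_1
    dictionary = set(level)
    for _ in range(2, m + 1):
        level = {c for c in candidates(level) if support(c, motifs) >= T}
        dictionary |= level
        if len(level) <= 1:  # step 5: stop once a level has at most 1 pattern
            break
    return dictionary


# Hypothetical toy alignment: 4 motifs of length 4, support threshold 3.
toy_motifs = ["GVSA", "GVTA", "GVSC", "AVSA"]
found = pattern_mining(4, 3, toy_motifs)
for pattern in sorted(found, key=lambda p: (len(p), p)):
    print(" ".join(f"{res}{pos}" for pos, res in pattern))
```

On the toy data the loop stops at length 3, because the only length-3 candidate (V2 S3 A4) appears in just two motifs.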

Candidates Function
L_3:
G1, V2, S3
G1, V2, T6
G1, V2, I7
G1, V2, E8
G1, S3, T6
G1, T6, I7
V2, T6, I7
V2, T6, E8
C_4:
G1, V2, S3, T6
G1, V2, S3, I7
G1, V2, S3, E8
G1, V2, T6, I7
G1, V2, T6, E8
G1, V2, I7, E8
V2, T6, I7, E8
L_4:
G1, V2, S3, T6
G1, V2, S3, I7
G1, V2, S3, E8
G1, V2, T6, E8
V2, T6, I7, E8
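The step from L_3 to C_4 in this example is consistent with a simple prefix join: two patterns of length 3 that share their first two items are merged into one pattern of length 4. The sketch below assumes that rule (the lecture does not spell it out) and checks it against the slide's lists; `parse` and `candidates` are illustrative names.

```python
def parse(text):
    """Turn 'G1, V2, S3' into a tuple of (position, residue) items."""
    return tuple((int(tok[1:]), tok[0]) for tok in text.replace(",", " ").split())


def candidates(prev_level):
    """Join patterns that agree on everything except their last item."""
    out = set()
    for a in prev_level:
        for b in prev_level:
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                out.add(a + (b[-1],))
    return out


L3 = [parse(s) for s in [
    "G1, V2, S3", "G1, V2, T6", "G1, V2, I7", "G1, V2, E8",
    "G1, S3, T6", "G1, T6, I7", "V2, T6, I7", "V2, T6, E8",
]]
C4_SLIDE = {parse(s) for s in [
    "G1, V2, S3, T6", "G1, V2, S3, I7", "G1, V2, S3, E8", "G1, V2, T6, I7",
    "G1, V2, T6, E8", "G1, V2, I7, E8", "V2, T6, I7, E8",
]}

C4 = candidates(L3)
print(C4 == C4_SLIDE)
```

A stricter Apriori-style generator would also discard any candidate that has an infrequent sub-pattern. For instance, G1 V2 S3 T6 contains V2 S3 T6, which is not in L_3, yet the slide keeps it in C_4, so no such pruning is applied in this sketch.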

Motif Detection Algorithm
Algorithm Motif-Detection
Input: Motif length m, threshold score T, pattern dictionary L, and input protein sequence P[1..n].
Output: Information about motif(s) detected.
1. for each location i do
2.     S := MatchScore(P[i..i+m-1], L)
3.     if (S > T) then
4.         Report it as a possible motif
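A minimal sketch of the sliding-window scan follows. The lecture does not define MatchScore, so the version here is only a placeholder assumption: it counts how many dictionary patterns are fully matched by the window. The dictionary, sequence, and threshold are made-up toy values.

```python
def match_score(window, dictionary):
    """Assumed score: number of dictionary patterns matched by the window."""
    return sum(
        all(window[pos - 1] == res for pos, res in pattern)
        for pattern in dictionary
    )


def motif_detection(m, T, dictionary, P):
    """Scan every window of length m and report those scoring above T."""
    hits = []
    for i in range(len(P) - m + 1):
        score = match_score(P[i:i + m], dictionary)
        if score > T:
            hits.append((i + 1, score))  # 1-based start position, as in P[1..n]
    return hits


# Hypothetical pattern dictionary for a motif of length 3.
toy_dictionary = [
    ((1, "G"),),
    ((1, "G"), (2, "V")),
    ((2, "V"), (3, "S")),
]
hits = motif_detection(3, 1, toy_dictionary, "AAGVSAA")
print(hits)
```

Only the window starting at position 3 ("GVS") matches any pattern, and it matches all three, so it is the single reported hit. A real scoring function would likely weight patterns, for example by length or rarity, rather than count them equally.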

Experimental Results: GYM 2.0
Motif: HTH Motif (22)

Protein Family  Number Tested  GYM = DE Agree   Number Annotated  GYM = Annot.
Master          88             88 (100%)        13                13
Sigma           314            284 + 23 (98%)   96                82
Negates         93             86 (92%)         0                 0
LysR            130            127 (98%)        95                93
AraC            68             57 (84%)         41                34
Rreg            116            99 (85%)         57                46
Total           675            653 + 23 (94%)   289               255 (88%)
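The per-family agreement percentages can be recomputed from the counts in the table. This sketch assumes each percentage is the agreeing count divided by the number tested, and that for Sigma both terms of "284 + 23" count toward agreement (that reading reproduces the reported 98%). The variable names are illustrative.

```python
# (family, number tested, number where GYM agrees), taken from the table.
FAMILY_ROWS = [
    ("Master", 88, 88),
    ("Sigma", 314, 284 + 23),  # assumption: both terms count as agreement
    ("Negates", 93, 86),
    ("LysR", 130, 127),
    ("AraC", 68, 57),
    ("Rreg", 116, 99),
]

agreement = {
    family: round(100 * agree / tested)
    for family, tested, agree in FAMILY_ROWS
}
for family, percent in agreement.items():
    print(f"{family:8s} {percent:3d}%")
```

The rounded values match the percentages listed for the six protein families.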

Experiments
Basic implementation (Y. Gao).
Improved implementation & comprehensive testing (K. Mathee, GN).
Implementation for homeobox domain detection (X. Wang).
Statistical methods to determine thresholds (C. Bu).
Use of substitution matrix (C. Bu).
Study of patterns causing errors (N. Xu).
Negative training set (N. Xu).
NN implementation & testing (J. Liu & X. He).
HMM implementation & testing (J. Liu & X. He).