Lecture 15: Realities of Genome Assembly; Protein Sequencing


Study Chapter 8.10-8.15

Euler's Theorems
A directed graph is balanced if, for every vertex, the number of incoming edges equals the number of outgoing edges: in(v) = out(v).
Theorem 1: A connected graph has an Eulerian cycle if and only if each of its vertices is balanced. In mid-tour, for every path onto an island there must be another path off. Exceptions are allowed at the start and end of the tour.
Theorem 2: A connected graph has an Eulerian path if and only if it contains exactly two semi-balanced vertices and all others are balanced. A vertex is semi-balanced if |in(v) - out(v)| = 1. One of the semi-balanced vertices, with out(v) = in(v) + 1, is the start of the tour; the other, with in(v) = out(v) + 1, is the end of the tour.
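Both theorems reduce to counting in- and out-degrees. The sketch below is a minimal Python illustration, assuming the graph is given as a list of directed (u, v) edges and is already known to be connected; `classify_euler` is a hypothetical helper name, and it does not check connectivity itself.

```python
from collections import Counter


def classify_euler(edges):
    """Classify a directed graph using Theorems 1 and 2.

    Assumes the graph is connected (connectivity is NOT checked here).
    Returns "cycle", "path", or "none".
    """
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    vertices = set(out_deg) | set(in_deg)

    # Semi-balanced vertices: out(v) = in(v) + 1 (tour start) or in(v) = out(v) + 1 (tour end).
    starts = [v for v in vertices if out_deg[v] - in_deg[v] == 1]
    ends = [v for v in vertices if in_deg[v] - out_deg[v] == 1]

    if any(abs(out_deg[v] - in_deg[v]) > 1 for v in vertices):
        return "none"
    if not starts and not ends:
        return "cycle"  # Theorem 1: every vertex is balanced
    if len(starts) == 1 and len(ends) == 1:
        return "path"   # Theorem 2: exactly two semi-balanced vertices
    return "none"


print(classify_euler([("A", "B"), ("B", "C"), ("C", "A")]))  # cycle
print(classify_euler([("A", "B"), ("B", "C")]))              # path
```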

Eulerian Cycle
Start at any vertex v and follow a trail of edges until you return to v. As long as there exists a vertex u that belongs to the current tour but has adjacent edges that are not part of the tour, start a new trail from u, follow unused edges until returning to u, and join the new trail to the original tour.
[Figure: a more complicated Königsberg]

Example Problem: Eulerian Path Approach
S = { ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT }
Vertices correspond to (l-1)-mers: { AT, TG, GC, GG, GT, CA, CG }
Edges correspond to l-mers from S.
[Figure: the graph on vertices GT, CG, AT, TG, GC, CA, GG]
Find a path that visits every EDGE once.
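The trail-and-splice procedure above is Hierholzer's algorithm. Below is a minimal Python sketch applied to the example set S, assuming error-free l-mers that each occur exactly once and a graph that does have an Eulerian path; `eulerian_path` and `assemble` are hypothetical helper names. The explicit stack stands in for "start a new trail from u": when the walk gets stuck, vertices are popped until one with unused edges is found, and the sub-tour started there ends up spliced into the final path.

```python
from collections import defaultdict


def eulerian_path(edges, start):
    """Hierholzer's algorithm: return the vertex sequence of an Eulerian path from start."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
    for u in graph:
        graph[u].reverse()  # so that pop() takes each vertex's edges in input order

    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())  # extend the current trail along an unused edge
        else:
            path.append(stack.pop())      # no unused edges left: u is finished
    return path[::-1]


def assemble(lmers):
    """Spell a sequence from l-mers: each l-mer is an edge (l-1)-prefix -> (l-1)-suffix."""
    edges = [(s[:-1], s[1:]) for s in lmers]
    out_minus_in = defaultdict(int)
    for u, v in edges:
        out_minus_in[u] += 1
        out_minus_in[v] -= 1
    # Start at the semi-balanced vertex with out(v) = in(v) + 1, if there is one.
    start = next((v for v, d in out_minus_in.items() if d == 1), edges[0][0])
    vertices = eulerian_path(edges, start)
    return vertices[0] + "".join(v[-1] for v in vertices[1:])


S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
print(assemble(S))  # ATGGCGTGCA
```

Taking TG's outgoing edges in the other order gives ATGCGTGGCA, which contains exactly the same eight 3-mers: a small case of two equally valid Eulerian paths that the l-mers alone cannot tell apart.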

Genome Assembly vs. Minimal Superstring
Minimal superstring problem: every k-mer is known and used as a vertex (all σ^k of them). Paths, and there may be multiple, are solutions.
Read fragments: there is no guarantee that we will see every k-mer, and we can't disambiguate repeats.

From DNA to Proteins
DNA sequences are the OS that controls living biological systems. Sections of DNA (genes) encode proteins, like programs. Triplets of nucleotides (codons) encode the amino-acid sequences, as well as the stop codons, used to assemble proteins. Complications in going from DNA to protein: introns, RNA editing prior to translation, and posttranslational modifications.
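Reading codons three bases at a time can be sketched as follows. This minimal Python illustration assumes the standard genetic code (NCBI translation table 1), starts reading at the first base, and stops at the first stop codon; it deliberately ignores the complications listed above (introns, RNA editing, posttranslational modifications). `translate` is a hypothetical name.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
# with the first base varying slowest. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):  # a trailing partial codon is ignored
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCATGGTAAGGG"))  # MAW (TAA is a stop codon)
```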

Proteins
Proteins are the machinery, or hardware. They compose the cellular structures, control the biochemical reactions in cells, regulate and trigger the chain reactions (metabolic pathways) that result in the cell's life cycle, and determine which parts of the DNA code are activated, executed, and when. Like DNA, proteins are long molecular chains: sequences drawn from 20 amino acid residues rather than 4 nucleotides.

From Genes to Proteins
The central dogma of molecular biology is that information encoded by the bases of DNA is transcribed into RNA and then translated into proteins.

Protein Components
Proteins are made from 20 amino acids. Peptide bonds join amino acids into long chains, hundreds to thousands of amino acid residues long.

Amino acid      3-letter  1-letter  Molecular weight (Da)
Alanine         Ala       A          89.09
Cysteine        Cys       C         121.16
Aspartate       Asp       D         133.10
Glutamate       Glu       E         147.13
Phenylalanine   Phe       F         165.19
Glycine         Gly       G          75.07
Histidine       His       H         155.16
Isoleucine      Ile       I         131.18
Lysine          Lys       K         146.19
Leucine         Leu       L         131.18
Methionine      Met       M         149.21
Asparagine      Asn       N         132.12
Proline         Pro       P         115.13
Glutamine       Gln       Q         146.15
Arginine        Arg       R         174.20
Serine          Ser       S         105.09
Threonine       Thr       T         119.12
Valine          Val       V         117.15
Tryptophan      Trp       W         204.23
Tyrosine        Tyr       Y         181.19

These are the weights of the free amino acids. A residue inside a chain weighs about 18 Da less, because one water molecule is released per peptide bond (e.g. glycine: 75.07 - 18.02 ≈ 57).

Protein Assembly
Amino acids are joined by peptide bonds into long chains. These chains fold into proteins, which interact with other proteins and large molecules.
[Figure: a chain running from the N-terminus to the C-terminus]
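The table is enough to estimate the mass of a linear peptide: add up the free amino-acid weights and subtract one water per peptide bond. A minimal Python sketch, assuming an average water mass of 18.02 Da; `peptide_weight` and `residue_mass` are hypothetical names, and the results are average (not monoisotopic) masses.

```python
# Free amino-acid molecular weights (Da), copied from the table.
AA_WEIGHT = {
    "A": 89.09, "C": 121.16, "D": 133.10, "E": 147.13, "F": 165.19,
    "G": 75.07, "H": 155.16, "I": 131.18, "K": 146.19, "L": 131.18,
    "M": 149.21, "N": 132.12, "P": 115.13, "Q": 146.15, "R": 174.20,
    "S": 105.09, "T": 119.12, "V": 117.15, "W": 204.23, "Y": 181.19,
}
WATER = 18.02  # approximate average mass of H2O released by each peptide bond


def peptide_weight(seq: str) -> float:
    """Sum of free amino-acid weights minus one water per peptide bond."""
    return sum(AA_WEIGHT[aa] for aa in seq) - WATER * (len(seq) - 1)


def residue_mass(aa: str) -> float:
    """Mass an amino acid contributes inside a chain: its free weight minus one water."""
    return AA_WEIGHT[aa] - WATER


print(round(peptide_weight("GPFNA"), 2))
print([round(residue_mass(aa)) for aa in "GPFNA"])  # [57, 97, 147, 114, 71]
```

Rounded residue masses like these (G = 57, P = 97, F = 147, N = 114, A = 71) are the integers that mass-spectrometry examples usually work with.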

Protein Sequencing
Purify a sample. Break it into pieces: proteases cleave proteins into smaller peptide chains. Read the fragments (the hard part): Edman degradation works for short peptide sequences; mass spectrometry measures mass/charge. Reassemble (relatively easy).

Peptide Fragmentation
Collision-induced dissociation (CID).
[Figure: a protonated (H+) peptide backbone ...-NH-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH breaking into a prefix fragment and a suffix fragment]
Peptides tend to fragment along the backbone. Fragments can also lose neutral chemical groups like NH3 and H2O.
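To see why a spectrum holds more peaks than there are fragments, the sketch below breaks a peptide at every backbone bond and adds the -H2O and -NH3 neutral-loss variant of each fragment. It is a deliberately simplified Python model, assuming integer residue masses (Da), ignoring the protons and terminal groups a real fragment ion carries, and covering only a hypothetical subset of residues; `backbone_fragments` and `fragment_peaks` are hypothetical names.

```python
# Integer residue masses (Da); an assumed subset, enough for the examples here.
RESIDUE = {"G": 57, "A": 71, "P": 97, "V": 99, "L": 113, "N": 114, "D": 115, "K": 128, "F": 147}
H2O, NH3 = 18, 17  # nominal masses of the neutral losses


def backbone_fragments(peptide: str):
    """Break the peptide at every backbone bond, giving (prefix, suffix) pairs."""
    return [(peptide[:i], peptide[i:]) for i in range(1, len(peptide))]


def fragment_peaks(peptide: str) -> list:
    """Masses of all prefix and suffix fragments plus their -H2O and -NH3 variants."""
    peaks = set()
    for prefix, suffix in backbone_fragments(peptide):
        for frag in (prefix, suffix):
            m = sum(RESIDUE[aa] for aa in frag)
            peaks.update({m, m - H2O, m - NH3})
    return sorted(peaks)


print(backbone_fragments("GPF"))  # [('G', 'PF'), ('GP', 'F')]
print(fragment_peaks("GPF"))
```

A three-residue peptide already yields 12 peaks from only 4 fragments, before any noise is added.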

Breaking Peptides into Fragment Ions
Proteases, e.g. trypsin, break proteins into peptides. A tandem mass spectrometer (MS/MS) further breaks the peptides down into fragment ions and measures the mass of each piece. The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. The mass spectrometer measures the mass/charge ratio of an ion.

N- and C-terminal Peptides
[Figure: a peptide drawn from its N-terminus (NH2-) to its C-terminus (-CO2H)]
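The protease step is easy to simulate. A minimal Python sketch of in-silico trypsin digestion, assuming the commonly used simplified rule (cut after K or R, but not when the next residue is P) and no missed cleavages; `trypsin_digest` is a hypothetical name, and real digests are messier than this.

```python
def trypsin_digest(protein: str) -> list:
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # keep any C-terminal piece that does not end in K/R
        peptides.append(protein[start:])
    return peptides


print(trypsin_digest("AVGELTKGVDLKPAR"))  # ['AVGELTK', 'GVDLKPAR']
```

The K in "KP" is not cut, so GVDLKPAR stays in one piece.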

Terminal Peptides and Ion Types
Peptide: mass (Da) = 57 + 97 + 147 + 114 = 415
Peptide without H2O: mass (Da) = 57 + 97 + 147 + 114 - 18 = 397

N- and C-terminal Peptides
[Figure: a peptide NH2-...-CO2H of total mass 486, with complementary fragment masses 415 and 71, 301 and 185, 154 and 332, 57 and 429]
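The slide's numbers are consistent with reading the peptide as GPFNA (G = 57, P = 97, F = 147, N = 114, A = 71, integer residue masses): each break of the backbone yields an N-terminal prefix and a C-terminal suffix whose masses sum to the total, 486. That residue assignment is inferred from the masses rather than stated on the slide. The Python sketch below checks it; `terminal_masses` is a hypothetical name.

```python
# Integer residue masses (Da) for the residues in the example peptide.
RESIDUE = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}


def terminal_masses(peptide):
    """Return (total, prefix masses, suffix masses) over every backbone break."""
    total = sum(RESIDUE[aa] for aa in peptide)
    prefixes, running = [], 0
    for aa in peptide[:-1]:
        running += RESIDUE[aa]
        prefixes.append(running)
    # Each suffix is the complement of the prefix produced by the same break.
    suffixes = [total - p for p in prefixes]
    return total, prefixes, suffixes


total, prefixes, suffixes = terminal_masses("GPFNA")
print(total)                            # 486
print(list(zip(prefixes, suffixes)))    # [(57, 429), (154, 332), (301, 185), (415, 71)]
print(terminal_masses("GPFN")[0] - 18)  # 397: the peptide without H2O
```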


Reconstruct the peptide from the set of masses of fragment ions (the mass spectrum): 57, 71, 154, 185, 301, 332, 415, 429, with total peptide mass 486.

Theoretical Mass Spectrum
protein = PLAY
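Both directions, computing the theoretical spectrum of PLAY and recovering a peptide from a spectrum, fit in a short Python toy. Reconstruction here is a brute-force branch-and-bound search: grow the peptide one residue at a time and abandon any branch whose prefix mass is not a peak. It rests on strong assumptions: integer residue masses, a noise-free spectrum containing exactly the prefix and suffix masses, and I folded into L and Q into K because they share masses. It is not the graph-based de novo method the slides refer to. `theoretical_spectrum` and `reconstruct` are hypothetical names.

```python
# Integer residue masses (Da); I is folded into L (113) and Q into K (128).
RESIDUE = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103, "L": 113,
    "N": 114, "D": 115, "K": 128, "E": 129, "M": 131, "H": 137, "F": 147,
    "R": 156, "Y": 163, "W": 186,
}


def theoretical_spectrum(peptide: str) -> list:
    """Ideal spectrum: the prefix and suffix mass from every backbone break."""
    total = sum(RESIDUE[aa] for aa in peptide)
    prefixes = [sum(RESIDUE[aa] for aa in peptide[:i]) for i in range(1, len(peptide))]
    return sorted(prefixes + [total - p for p in prefixes])


def reconstruct(spectrum: list, parent_mass: int) -> list:
    """Branch and bound: extend a candidate only while its prefix mass is a peak."""
    peaks = set(spectrum)
    target = sorted(spectrum)
    solutions = []

    def extend(peptide: str, mass: int) -> None:
        for aa, m in RESIDUE.items():
            new_mass = mass + m
            if new_mass == parent_mass:
                candidate = peptide + aa
                if theoretical_spectrum(candidate) == target:  # full-spectrum check
                    solutions.append(candidate)
            elif new_mass < parent_mass and new_mass in peaks:
                extend(peptide + aa, new_mass)

    extend("", 0)
    return sorted(solutions)


spectrum = theoretical_spectrum("PLAY")
print(spectrum)                    # [97, 163, 210, 234, 281, 347]
print(reconstruct(spectrum, 444))  # ['PLAY', 'YALP']
```

The reversed peptide YALP comes back too, because in this idealized model a peptide and its reverse have identical prefix/suffix mass sets.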

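Mass Spectra
[Figure: mass spectrum (intensity vs. mass) of the peptide GVDLK; gaps between peaks identify residues, e.g. 57 Da = G and 99 Da = V; an H2O-loss peak is also marked]
The peaks in the mass spectrum come from prefix and suffix fragments and from fragments with neutral losses (-H2O, -NH3). Spectra also contain noise and have missing peaks.

Protein Identification with MS/MS
[Figure: MS/MS spectrum of GVDLK used for peptide identification]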
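If every prefix peak were present and nothing else were, reading the peptide would just mean taking the gaps between consecutive peaks, as the 57 Da = G and 99 Da = V labels suggest. The Python sketch below assumes exactly that idealized input (prefix masses of GVDLK starting from 0, integer residue masses, a hypothetical subset of residues). The second call drops one peak to show how a single missing peak breaks the naive reading. `read_prefix_ladder` is a hypothetical name.

```python
# Integer residue masses (Da) -> one-letter code; an assumed subset.
# 113 is reported as L (it is also I) and 128 as K (it is also Q).
MASS_TO_AA = {57: "G", 71: "A", 97: "P", 99: "V", 101: "T", 113: "L", 115: "D", 128: "K", 129: "E", 147: "F"}


def read_prefix_ladder(prefix_masses):
    """Read residues from the gaps between consecutive prefix peaks.

    A gap that matches no single residue (e.g. because a peak is missing) is shown as '?'.
    """
    ladder = sorted(prefix_masses)
    return "".join(MASS_TO_AA.get(b - a, "?") for a, b in zip(ladder, ladder[1:]))


print(read_prefix_ladder([0, 57, 156, 271, 384, 512]))  # GVDLK
print(read_prefix_ladder([0, 57, 271, 384, 512]))       # G?LK (the 156 peak is missing)
```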

[Figure: example MS/MS spectrum (full ms2 scan, precursor m/z 638.00); relative abundance vs. m/z]

De Novo vs. Database Search
[Figure: the same spectrum interpreted two ways. Database search scores each peptide in a database of known peptides (MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ...) by mass and score. De novo sequencing works within the database of all peptides, 20^n of length n (AAAAAAAA, AAAAAAAC, ..., AVGELTI, AVGELTK, AVGELTL, AVGELTM, ..., YYYYYYYY), building the peptide residue by residue. Both identify AVGELTK.]

A Paradox
The database of all peptides is huge: O(20^n). The database of all known peptides is much smaller: O(10^8). However, de novo algorithms can be much faster, even though their search space is much larger! A database search scans every peptide in the database of known peptides to find the best one. De novo sequencing eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
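A database search, in its simplest form, filters the known peptides by parent mass and scores the survivors against the observed peaks. The Python sketch below is a toy under stated assumptions: integer residue masses, ideal prefix/suffix spectra, and a plain shared-peak count standing in for the real scoring functions search engines use. The database holds only the peptides visible on the slide (its list is truncated), and `spectrum` and `database_search` are hypothetical names. The loop is linear in the database size, which is the cost the paradox above is about.

```python
from typing import Optional

# Integer residue masses (Da); I/L and K/Q share masses.
RESIDUE = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103, "I": 113,
    "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def spectrum(peptide):
    """Ideal prefix and suffix masses of a peptide, as a set of peaks."""
    total = sum(RESIDUE[aa] for aa in peptide)
    prefixes = {sum(RESIDUE[aa] for aa in peptide[:i]) for i in range(1, len(peptide))}
    return prefixes | {total - p for p in prefixes}


def database_search(observed, parent_mass, database) -> Optional[str]:
    """Return the peptide with the right parent mass that shares the most peaks."""
    best, best_score = None, -1
    for peptide in database:
        if sum(RESIDUE[aa] for aa in peptide) != parent_mass:
            continue  # parent-mass filter
        score = len(spectrum(peptide) & observed)  # shared-peak count
        if score > best_score:
            best, best_score = peptide, score
    return best


# Known peptides as listed on the slide.
KNOWN = [
    "MDERHILNM", "KLQWVCSDL", "PTYWASDL", "ENQIKRSACVM", "TLACHGGEM",
    "NGALPQWRT", "HLLERTKMNVV", "GGPASSDA", "GGLITGMQSD", "MQPLMNWE",
    "ALKIIMNVRT", "AVGELTK", "HEWAILF", "GHNLWAMNAC", "GVFGSVLRA", "EKLNKAATYIN",
]

# Simulated observation: AVGELTK's ideal peaks with one missing (227) and one noise peak (300).
observed = (spectrum("AVGELTK") - {227}) | {300}
print(database_search(observed, 698, KNOWN))  # AVGELTK
```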