SeqAn and OpenMS Integration Workshop. Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn The Center for Integrative Bioinformatics (CIBI)


Mass-spectrometry data analysis in KNIME Julianus Pfeuffer, Alexander Fillbrunn

OpenMS
- OpenMS: an open-source C++ framework for computational mass spectrometry
- Jointly developed at ETH Zürich, FU Berlin, and the University of Tübingen
- Open source: BSD 3-clause license
- Portable: available on Windows, OS X, and Linux
- Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard

OpenMS TOPP tools
- TOPP: The OpenMS Proteomics Pipeline tools
- Building blocks: one application for each analysis step
- All applications share identical user interfaces
- Uses PSI standard formats
- Can be integrated in various workflow systems: Galaxy, WS-PGRADE/gUSE, KNIME

Kohlbacher et al., Bioinformatics (2007), 23:e191

OpenMS Tools in KNIME
- OpenMS tools are wrapped in KNIME via GenericKNIMENodes (GKN)
- Every tool writes its CommonToolDescription (CTD) via its command line parser
- GKN generates Java source code so that the nodes show up in KNIME
- GKN wraps the C++ executables and provides file handling nodes

Installation of the OpenMS plugin
- Community-contributions update site (stable & trunk)
- Bioinformatics & NGS provides > 180 OpenMS TOPP tools as Community nodes
- SILAC, iTRAQ, TMT, label-free, SWATH, SIP, ...
- Search engines: OMSSA, MASCOT, X!Tandem, MS-GF+, ...
- Protein inference: Fido

Data Flow in Shotgun Proteomics
Sample -> HPLC/MS -> Raw Data (100 GB) -> Sig. Proc. -> Peak Data (1 GB) -> Data Reduction -> Maps (50 MB) -> Identification -> Annotated Maps (50 MB) -> Diff. Quant. -> Differentially Expressed Proteins (50 kB)

Quantification Strategies
Quantitative Proteomics
  Relative Quantification
    Labeled
      In vivo: 14N/15N, SILAC
      In vitro: iTRAQ, TMT, 16O/18O
    Label-Free
      Spectral Counting, MRM, Feature-Based
  Absolute Quantification: AQUA, SISCAPA
After: Lau et al., Proteomics, 2007, 7, 2787

Quantitative Data: LC-MS Maps
- Spectra are acquired at rates of up to dozens per second
- Stacking the spectra yields maps
- Resolution: up to millions of points per spectrum, tens of thousands of spectra per LC run
- Huge 2D datasets of up to hundreds of GB per sample
- MS intensity follows the chromatographic concentration
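To make the "stacking spectra yields maps" idea concrete, here is a minimal Python sketch. The names `Spectrum`, `stack_spectra`, and `extracted_ion_chromatogram` are hypothetical illustrations, not part of the OpenMS API, and the peak values are made up for the example.

```python
from typing import List, NamedTuple, Tuple


class Spectrum(NamedTuple):
    rt: float                         # retention time in seconds
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def stack_spectra(spectra):
    """Flatten spectra into (rt, m/z, intensity) points, sorted by rt, then m/z."""
    return sorted((s.rt, mz, inten) for s in spectra for mz, inten in s.peaks)


def extracted_ion_chromatogram(points, mz, tol):
    """Intensity over retention time for all points within mz +/- tol."""
    return [(rt, inten) for rt, p_mz, inten in points if abs(p_mz - mz) <= tol]


# Three toy spectra (illustrative values only).
spectra = [
    Spectrum(10.0, [(500.25, 100.0), (650.30, 20.0)]),
    Spectrum(11.0, [(500.26, 400.0), (650.31, 25.0)]),
    Spectrum(12.0, [(500.25, 150.0)]),
]
lcms_map = stack_spectra(spectra)
xic = extracted_ion_chromatogram(lcms_map, mz=500.25, tol=0.02)
print(xic)
```

Slicing the map at a fixed m/z gives the chromatographic trace whose intensity follows the eluting concentration, which is what the quantification steps rely on.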

LC-MS Data (Map) -> Quantification (15 nmol/µl, 3x over-expressed, ...)

Label-Free Quantification (LFQ)
- Label-free quantification is probably the most natural way of quantifying
- No labeling required: this removes further sources of error, places no restriction on sample generation, and is cheap
- Data on different samples are acquired in different measurements, so higher reproducibility is needed
- Manual analysis is difficult
- Scales very well with the number of samples: basically no limit, and no difference in the analysis between 2 or 100 samples

LFQ Analysis Strategy
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features (e.g. GDAFFGMSCK)
5. Quantify (e.g. GDAFFGMSCK 1.0 : 1.2 : 0.5)
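The final "Quantify" step can be illustrated with a short Python sketch that turns the intensities of one linked feature into ratios relative to a reference sample. The function names and the intensity values are assumptions chosen to reproduce the 1.0 : 1.2 : 0.5 example; this is not how OpenMS reports its results.

```python
def relative_ratios(intensities, reference=0):
    """Express each sample's feature intensity relative to a reference sample."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return [value / ref for value in intensities]


def format_ratios(ratios):
    """Render ratios in the '1.0 : 1.2 : 0.5' style."""
    return " : ".join(f"{r:.1f}" for r in ratios)


# Hypothetical intensities of one linked feature across three maps.
feature_intensities = [2.0e6, 2.4e6, 1.0e6]
ratios = relative_ratios(feature_intensities)
print("GDAFFGMSCK", format_ratios(ratios))
```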

Feature-Based Alignment
- LC-MS maps can contain millions of peaks
- Retention times of peptides and metabolites can shift between experiments
- In label-free quantification, maps thus need to be aligned in order to identify corresponding features
- Alignment can be done on the raw maps (where it is usually called "dewarping") or on already identified features
- The latter is simpler, as it does not require the alignment of millions of peaks, but just of tens of thousands of features
- Disadvantage: it relies on accurate feature finding
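As a rough illustration of feature-based alignment, the Python sketch below pairs features by m/z and retention time and then fits a linear retention time correction by least squares. This is a simplified stand-in, not the algorithm OpenMS uses; the function names, tolerances, and feature values are all assumptions.

```python
def match_features(ref, other, mz_tol=0.01, rt_window=60.0):
    """Pair each feature in `other` with the closest-RT reference feature of similar m/z."""
    pairs = []
    for rt, mz in other:
        candidates = [
            (abs(r_rt - rt), r_rt)
            for r_rt, r_mz in ref
            if abs(r_mz - mz) <= mz_tol and abs(r_rt - rt) <= rt_window
        ]
        if candidates:
            pairs.append((rt, min(candidates)[1]))
    return pairs


def fit_linear(pairs):
    """Least-squares fit of rt_ref = a * rt + b over the matched pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched features")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("matched features must differ in retention time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Features as (rt, m/z); the second map elutes 10 s later (toy values).
reference = [(100.0, 500.25), (200.0, 620.40), (300.0, 710.10)]
shifted = [(110.0, 500.251), (150.0, 900.00), (210.0, 620.40), (310.0, 710.10)]
pairs = match_features(reference, shifted)
a, b = fit_linear(pairs)
print(f"rt_ref = {a:.2f} * rt + {b:.2f}")
```

Only a handful of features are matched here, which is the point of the feature-based approach: the fit works on features rather than on millions of raw peaks, but it is only as good as the feature finding that precedes it.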

Feature-Based Alignment [Figure: ~350,000 peaks vs. ~700 features]

Feature Finding
- Identify all peaks belonging to one peptide
- Key idea: identify suspicious regions (e.g. highest peaks), fit a model to that region, and identify the peaks explained by it

Feature Finding
- Extension: collect all data points close to the seed
- Refinement: remove peaks that are not consistent with the model
- Fit an optimal model for the reduced set of peaks
- Iterate this until no further improvement can be achieved
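The seed, extend, refine, and refit loop can be sketched in one dimension (a single chromatographic trace). The Python below assumes a Gaussian elution model anchored at the seed and removes the single worst-fitting point per iteration. Both choices are simplifications made for this example, and the names are hypothetical rather than taken from the OpenMS feature finders.

```python
import math


def gaussian(rt, height, center, sigma):
    return height * math.exp(-0.5 * ((rt - center) / sigma) ** 2)


def fit_model(region, seed):
    """Gaussian anchored at the seed; width from the intensity-weighted second moment."""
    seed_rt, seed_int = seed
    total = sum(i for _, i in region)
    var = sum(i * (rt - seed_rt) ** 2 for rt, i in region) / total
    return seed_int, seed_rt, max(math.sqrt(var), 1e-6)


def find_feature(trace, window=5.0, tolerance=0.1, max_iter=20):
    seed = max(trace, key=lambda p: p[1])                        # seed: highest point
    region = [p for p in trace if abs(p[0] - seed[0]) <= window]  # extension
    params = fit_model(region, seed)
    for _ in range(max_iter):
        # Refinement: find the point that the current model explains worst.
        worst = max(region, key=lambda p: abs(p[1] - gaussian(p[0], *params)))
        if abs(worst[1] - gaussian(worst[0], *params)) <= tolerance * seed[1]:
            break                                                # no further improvement
        region = [p for p in region if p != worst]
        params = fit_model(region, seed)                         # refit on reduced set
    return params, region


# Synthetic trace: one Gaussian peak plus two interfering signals (toy values).
trace = [(float(rt), gaussian(rt, 100.0, 10.0, 1.0)) for rt in range(21)]
trace[13] = (13.0, 60.0)  # interfering point inside the extension window
trace[18] = (18.0, 30.0)  # unrelated signal outside the window
params, region = find_feature(trace)
print(params, len(region))
```

In this toy run the interfering point at rt 13 inflates the first width estimate; once it is removed, the refitted width comes back close to the true value of 1.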

Multiple Alignment
- Dewarp k maps onto a comparable coordinate system
- Choose one map (usually the one with the largest number of features) as reference map (here: map 2, so T_2 = 1, the identity)
[Figure: Map 1, Map 2, ..., Map k (axes: rt, m/z) are mapped by transformations T_1, T_2, ..., T_k onto a consensus map]
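A minimal Python sketch of the reference-map strategy: pick the map with the most features as reference, give it the identity transformation, and estimate a transformation for every other map. Here each T_k is reduced to a constant retention time shift (the median offset of features with matching m/z), which is an assumption made for brevity; real transformations are generally more flexible. All names and values are illustrative.

```python
from statistics import median


def choose_reference(maps):
    """Index of the map with the largest number of features."""
    return max(range(len(maps)), key=lambda k: len(maps[k]))


def estimate_shift(ref, other, mz_tol=0.01):
    """Median RT offset between features with matching m/z (a crude T_k)."""
    deltas = [
        r_rt - rt
        for rt, mz in other
        for r_rt, r_mz in ref
        if abs(r_mz - mz) <= mz_tol
    ]
    return median(deltas) if deltas else 0.0


def align(maps):
    ref_idx = choose_reference(maps)
    ref = maps[ref_idx]
    # The reference map keeps its coordinates (identity transformation).
    shifts = [0.0 if k == ref_idx else estimate_shift(ref, m)
              for k, m in enumerate(maps)]
    aligned = [[(rt + s, mz) for rt, mz in m] for m, s in zip(maps, shifts)]
    return ref_idx, shifts, aligned


# Three toy maps of (rt, m/z) features; the second has the most features.
maps = [
    [(105.0, 500.25), (205.0, 620.40)],
    [(100.0, 500.25), (200.0, 620.40), (300.0, 710.10)],
    [(97.0, 500.25), (297.0, 710.10)],
]
ref_idx, shifts, aligned = align(maps)
print(ref_idx, shifts)
```

After this step all maps share the reference map's retention time axis, so features at the same (rt, m/z) position can be linked into a consensus map.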

LFQ with OpenMS in KNIME
- Identification
- Feature finding and mapping
- Map alignment
- Feature linking
- Statistical analysis with R Snippets
- Visualization with KNIME plotting nodes

Preprocessing of single maps

Combining information of maps

Statistical post-processing and visualization