Tutorial 7: Automated SRM data analysis using mProphet


mProphet is a statistical tool which can be employed to achieve automated high-confidence identification of peptides. It has three functionalities:

- First, mMap combines the raw data (mzXML files) with the metadata provided in the transition input file.
- Second, mQuest detects all peak groups among the recorded transitions and scores all detected peak groups for a number of characteristics (e.g. shape correlations).
- Third, mProphet combines the individual peak group scores into a single discriminant score (d_score) for optimal recovery of true signals, while controlling the FDR.

Decoy transition groups are used as negative controls. They represent peptide species that are absent from the biological samples, and help to parameterise the null distribution for estimating the sensitivity and the FDR. Finally, all detected peak groups for each recorded transition group are ranked based on their discriminant score and reported.

Note that mProphet relies on probabilistic modelling. It subdivides each dataset into a training set and a test set, in order to first train a classifier and then apply it to the whole dataset. Since the training set is chosen at random, results from repeated analyses of the same data may vary slightly.

mProphet can use two different types of decoy transitions:

- Empirical decoy transition groups are included in the measurements. They can be created e.g. by adding a random integer to the precursor and fragment ion masses, respectively. The empirical decoy transition approach is highly recommended for workflows that lack reference peptides. The disadvantage of this approach is that the empirical decoy transition groups have to be added to the transition list before the SRM measurements, which increases the number of transitions to be measured.
- Synthetic decoy transition groups are derived post-acquisition, by shifting the retention time of the endogenous transition peak signals by a factor defined in the parameter file (0.5 is the recommended value).
  The synthetic decoy approach is recommended for workflows that include isotope-labelled internal standards in the samples. The advantage of this approach is that the decoy transition groups do not have to be measured.

mProphet offers several pre-defined workflows, which reflect the design of the experiment and define specifics of the decoy transition approach. In this tutorial we will apply the two most common approaches to our case study. For an overview of all workflows see the table at the end of this tutorial.

Workflow SPIKE_IN + synthetic decoys
- Parameter file: param_aqua_heavy_stringent_ref_synthetic.def
- Target transitions measured for light and heavy peptides
- Reference peptides are heavy peptides and we assume they are always present
- Synthetic decoy transition groups are generated from the data

Workflow LABEL_FREE + empirical decoys
- Parameter file: param_light_noref.def
- Target transitions measured for light peptides
- No reference peptides
- Empirical decoy transition groups measured along with target transition groups
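The empirical decoy idea described above (adding a random integer to the precursor and fragment ion masses) can be illustrated with a minimal Python sketch. This is not how Skyline or mProphet generate decoys internally; the function name `make_decoy_group`, the shift range of 1 to `max_shift`, and the choice of one shift per precursor and one per fragment are assumptions made purely for illustration.

```python
import random


def make_decoy_group(precursor_mz, fragment_mzs, max_shift=10, seed=None):
    """Build one decoy transition group from a target transition group.

    Hypothetical helper: shifts the precursor m/z and every fragment m/z
    by a random integer between 1 and max_shift (an assumed range).
    """
    rng = random.Random(seed)  # a seed makes the random shifts reproducible
    precursor_shift = rng.randint(1, max_shift)
    fragment_shifts = [rng.randint(1, max_shift) for _ in fragment_mzs]
    return {
        "precursor_mz": precursor_mz + precursor_shift,
        "fragment_mzs": [mz + s for mz, s in zip(fragment_mzs, fragment_shifts)],
        "precursor_shift": precursor_shift,
        "fragment_shifts": fragment_shifts,
    }


if __name__ == "__main__":
    # Example target group with made-up m/z values.
    print(make_decoy_group(500.25, [600.5, 700.75], seed=1))
```

Because the shifts are random, a decoy list regenerated without a fixed seed will no longer match transitions that have already been measured, which is why decoys must be fixed before acquisition.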

Open the mProphet virtual machine

Double-click on the .vmx file to open the mProphet virtual machine (VM) and log in if necessary (username: mprophet, password: mprophet).

Enable folder sharing in the virtual machine settings

For the VMware Player (Windows) the folder sharing is done in either of the following ways, depending on the version:

- Player → Manage → Virtual machine settings → Options → Shared folders. Select "Always enabled" and add the folder Tutorial-7_mProphet.
- VM → Settings → Options → Shared folders. Select "Always enabled" and add the folder Tutorial-7_mProphet.

For VMware Fusion (Mac) the shared folder is added under: Virtual Machine → Settings → Sharing. Turn "ON" the "Shared folders" option, click on the "+" button, browse to the folder Tutorial-7_mProphet and click "Add".

On the VM your shared folder can be accessed through: Places → Computer → File Systems → mnt → hgfs → Tutorial-7_mProphet

To adjust the keyboard to your preferred language settings: System → Preferences → Keyboard

[Figure: VM desktop, with labels for the folder system to browse files, the terminal / command line, the Google Chrome browser, and the keyboard language setting.]

Export transition information from Skyline using a custom report format

On your laptop (not the VM), open the file SRMcourse_20130717_iRT.sky from the folder Tutorial-5_Scheduling.

Export transitions and other relevant information in a custom report format: File → Export → Report → Import → Select mprophet.skyr → mquest_mrmprophet now appears in the list. Inspect it by clicking the Preview button.

One column has to be added to this new report: Edit list → Select the mquest_mrmprophet report → Edit → Rename it to mquest_mrmprophet_irt and add the column RetentionTimeCalculatorScore → click OK twice.

Save the resulting transition list as mprophet_20130717_label.csv into the folder mprophet_20130717_label in Tutorial-7_mProphet.

Reformat Skyline report using mgen

On the VM, open the terminal and change to the following directory:

    cd /mnt/hgfs/Tutorial-7_mProphet/mprophet_20130717_label

Reformat the line breaks of the transition list to UNIX format:

- Install tofrodos (the password is mprophet). This command requires a working internet connection.

    sudo apt-get install tofrodos

- Run:

    fromdos mprophet_20130717_label.csv

Run mgen:

    mgen.pl -SKY mprophet_20130717_label.csv -num_decoys 0

We set 0 decoys, because we do not want to generate decoys here. The output file ends with _s01_decoy.

On your laptop (not the VM), open the newly generated file in Excel and reorganise it such that only the following columns remain (headers of the required columns are given in brackets):

1. Precursor ion mass (Q1)
2. Fragment ion mass (Q3)
3. Protein name (protein_name)
4. Unmodified peptide sequence (stripped_sequence)
5. Precursor charge state (prec_z)
6. Isotype of the peptide, e.g. heavy or light (isotype)
7. Fragment ion type, e.g. b- or y-ion (frg_type)
8. Fragment number (frg_nr)
9. Fragment ion charge state (frg_z)
10. Relative library intensity of transitions within one transition group, normalised to 100 (relative_intensity)
11. Expected retention time of the peptide: rename PredictedRetentionTime to Tr_recalibrated (Tr_recalibrated)
12. iRT: rename RetentionTimeCalculatorScore to irt (irt)
13. Decoys: in the label workflow no decoys are needed, hence this column can be deleted. Otherwise enter 1 for decoys and 0 for non-decoys (decoy)
14. Delete all other columns.

Save as a tab-delimited file: mprophet_20130717_label_mod.txt. On a Mac, save as MS-DOS txt.

Rename the ending to .xls: mprophet_20130717_label_mod.xls.

Reformat the line breaks to UNIX:

    fromdos mprophet_20130717_label_mod.xls

Delete the other .csv, .xls and .txt files from the folder.

Run mProphet to identify target proteins

To run mProphet, make sure you have all data files (.mzXML), the transition list (.xls), and the parameter file (.def) in the folder mprophet_20130717_label.
On the VM, run all three functionalities of mProphet (mMap, mQuest and mProphet) together using the minteract.pl command:

    minteract.pl -mmap -mquest -mprophet -workflow SPIKE_IN -mmap_machine QTRAP -mmap_cycle_time 2 -def param_aqua_heavy_stringent_ref_synthetic.def

Notes

- This command takes a while and might give a lot of error messages. Please wait until it is finished before you click anything. You can monitor the progress by looking at the newly generated files in the folder mprophet_20130717_label.
- For TSQ data, the parameters -mmap_machine and -mmap_cycle_time do not have to be defined.
- If you would like to repeat the mProphet analysis on the same data, use the -force option at the end of the minteract command line to enforce the re-analysis of files that have already been processed.
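Before starting a long run it can help to confirm that the folder really contains the three kinds of input named above (.mzXML data files, the .xls transition list, the .def parameter file) and that the transition list no longer has DOS line breaks. The following Python sketch shows one way to do this; the function `check_run_folder` and its messages are hypothetical and not part of mProphet.

```python
import tempfile
from pathlib import Path

# File endings the tutorial asks for, with a readable label for each.
REQUIRED = ((".mzxml", "data file"), (".xls", "transition list"), (".def", "parameter file"))


def check_run_folder(folder):
    """Return a list of problems found in an input folder (empty if none)."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    problems = []
    for suffix, label in REQUIRED:
        # Compare case-insensitively so that .mzXML and .mzxml both match.
        if not any(p.name.lower().endswith(suffix) for p in files):
            problems.append(f"missing {label} ({suffix})")
    for p in files:
        # A carriage return means the file was not converted with fromdos.
        if p.name.lower().endswith(".xls") and b"\r" in p.read_bytes():
            problems.append(f"{p.name} still has DOS line breaks")
    return problems


if __name__ == "__main__":
    # Small self-contained demonstration on a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "run1.mzXML").write_bytes(b"<mzXML/>")
        Path(tmp, "list.xls").write_bytes(b"Q1\tQ3\r\n")
        print(check_run_folder(tmp))
```

An empty list means the folder passes these basic checks; it does not validate the contents of the files.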

Tip! It is better to type these commands manually into the terminal rather than to copy and paste them, to avoid incompatibilities between DOS and UNIX.

Tip! To see a list of all possible parameters for mProphet, run:

    minteract.pl -manual

Tip! minteract coordinates mMap, mQuest and mProphet and allows a one-command analysis. However, the three components can also be run separately (see the mProphet manual on the www.mprophet.org website for more information).

Inspect mProphet results

mProphet results are written to the folder mprophet_20130717_label where you ran the command. The following result files are important:

- mprophet.pdf
  This file contains histograms of all the different sub-scores and the overall discriminant score stratified by target and decoy peak groups (left figure) as well as the estimated sensitivity and error rate for discriminant score cut-offs (right figure).
  → Inspect the separation between decoy and target peak groups and the sensitivity (s-value) and FDR (q-value). The s-value represents the expected proportion of true positive peak groups dependent on the discriminant score cut-off, and the q-value the expected proportion of false positives dependent on the discriminant score cut-off. The aim is to define a discriminant score cut-off resulting in a high sensitivity at a low FDR. Note that the s-value does not reach 1 because the retention time peptides are excluded from the statistical models, but later on counted towards the total.

- mprophet_raw_stat.xls
  This file provides the discriminant score cut-off for every possible FDR. It is recommended to select a d-score cut-off that corresponds to an FDR of 1%.
  → In our case, according to mProphet, the FDR of the complete dataset without applying any cut-off is already very low (0.2%) and no cut-off can be applied based on the FDR.

- mprophet_all_peakgroups.xls
  This file contains all the peak-specific scores extracted and calculated from the SRM traces across all the samples. We will need only a few of the columns to continue (see below).
- shtml files
  These files can be opened from within a web browser and allow the visualisation of all the extracted peak groups for each SRM run (see next section).
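To make the relationship between the d-score cut-off and the FDR more concrete, here is a deliberately simplified target-decoy estimate in Python. It is not the statistical model mProphet uses; it only assumes that decoys scoring above a cut-off approximate the number of false target peak groups above the same cut-off. The function names and the example scores are invented for illustration.

```python
def estimate_fdr(target_scores, decoy_scores, cutoff):
    """Simplified FDR estimate at a given d-score cut-off.

    Assumption: the decoys above the cut-off, rescaled to the number of
    targets, stand in for the false target peak groups above the cut-off.
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    targets_above = sum(s >= cutoff for s in target_scores)
    decoys_above = sum(s >= cutoff for s in decoy_scores)
    if targets_above == 0:
        return 0.0
    scale = len(target_scores) / len(decoy_scores)
    return min(1.0, decoys_above * scale / targets_above)


def lowest_cutoff_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest target score whose estimated FDR is <= max_fdr.

    Returns None if no cut-off satisfies the requested FDR.
    """
    passing = [
        c for c in set(target_scores)
        if estimate_fdr(target_scores, decoy_scores, c) <= max_fdr
    ]
    return min(passing) if passing else None


if __name__ == "__main__":
    targets = [5, 4, 3, 2, 1, 0]      # invented d-scores of target peak groups
    decoys = [1, 0, -1, -2, -3, -4]   # invented d-scores of decoy peak groups
    print(lowest_cutoff_for_fdr(targets, decoys, max_fdr=0.01))
```

In practice the cut-off should be read from mprophet_raw_stat.xls; the sketch only shows why a lower cut-off admits more decoys and therefore a higher estimated FDR.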

Visualise the mProphet results

On the VM, copy your data folders to the VM home directory:

    cp -R /mnt/hgfs/Tutorial-7_mProphet/mprophet_20130717_label/ //home/mprophet/mprophet_data_analysis/

There is a space before the //. Do not change the names of the folders anymore.

On the VM, open Google Chrome (link on the Desktop) and navigate to your mProphet results using the following URL: http://localhost/mquest-web/mprophet_data_analysis/

Here you can find, for each of the 9 input data files (mzXML), a corresponding shtml file, which you can now open and browse.

Click through the peptides of a few samples and inspect the peak picking by looking at the plots and the scores in the table. (Decoy peptides are marked by a 1 in the right column.) Here are a few points that you should pay attention to:

- Which d-score cut-off is appropriate for the current data set? At which d-score do the first decoys appear?
- For which precursor was the wrong peak group picked and why? Compare the scores and plots of the second and third peak group by clicking on the peak group rank in the table.
- Are there interfered transitions?

Process mProphet results

The mProphet scores for each precursor can be found in the file mprophet_all_peakgroups.xls. Before continuing with downstream analysis, refine the table in Excel:

- Remove iRT peptides and decoys (decoy column → TRUE).
- Remove all peak groups with peak_group_rank > 1.
- Remove peaks with a d-score lower than the d-score cut-off you selected from manual inspection in the visualisation step above.
- Save as mprophet_all_peakgroups_label.xlsx.
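The three filtering rules above can also be applied programmatically. The sketch below uses only the Python standard library and assumes a tab-delimited table with columns named `protein_name`, `peak_group_rank`, `d_score` and `decoy`; the exact column headers in mprophet_all_peakgroups.xls should be checked first, and recognising iRT peptides by an "iRT" substring in the protein name is an assumption made for this example.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def refine_peakgroups(rows, d_score_cutoff, irt_marker="iRT"):
    """Keep only top-ranked, non-decoy, non-iRT peak groups above the cut-off."""
    kept = []
    for row in rows:
        is_decoy = row["decoy"].strip().upper() in ("TRUE", "1")
        # Assumed convention: iRT peptides carry the marker in protein_name.
        is_irt = irt_marker.lower() in row["protein_name"].lower()
        if is_decoy or is_irt:
            continue
        if int(row["peak_group_rank"]) > 1:
            continue
        if float(row["d_score"]) < d_score_cutoff:
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    # Invented example table, not real mProphet output.
    example = (
        "protein_name\tpeak_group_rank\td_score\tdecoy\n"
        "P1\t1\t3.2\tFALSE\n"
        "P1\t2\t1.0\tFALSE\n"
        "DECOY_P2\t1\t2.5\tTRUE\n"
    )
    print(refine_peakgroups(read_tsv(example), d_score_cutoff=1.5))
```

To use it on a real result file, read the file contents as text and pass them to `read_tsv`, then write the kept rows back out with `csv.DictWriter`.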

mProphet results for identification

The m-score represents the FDR of each identified peak group.

mProphet results for quantification

mProphet outputs also contain all necessary information for quantification. You can either directly extract quantitative information for each precursor (e.g. light_heavy_ratio_totalxic) or extract quantitative information for each transition individually (e.g. abs_area_code_target and abs_area_code_ref). Downstream statistical analysis can be done e.g. using the software SRMstats, which will be discussed in detail tomorrow.

Label-free mProphet analysis

Repeat the mProphet analysis for the label-free dataset in the sub-folder mprophet_20130717_label-free.

Start with the Skyline file SRMcourse_20130717_label-free_decoys.sky from the folder Tutorial-7_mProphet, which contains a decoy transition group for every target transition group.

Note! These decoys were added in Skyline through Edit → Refine → Add decoy peptides → 30 decoy precursors (Skyline automatically excludes iRT peptides from decoy generation) → Decoy generation method: Random mass shift. We are not generating these decoys ourselves in this tutorial, because random m/z shifts are obtained every time decoys are generated and thus will not fit the acquired data anymore. Next to the m/z of each precursor and transition, the mass shift of the decoy is indicated in brackets.

Export a report using the mquest_mrmprophet_irt report format: mprophet_20130717_label-free.csv.

On the VM, in the terminal, change the working directory:

    cd /mnt/hgfs/Tutorial-7_mProphet/mprophet_20130717_label-free

Convert the line breaks to UNIX format:

    fromdos mprophet_20130717_label-free.csv

Run mgen to reformat it:

    mgen.pl -SKY mprophet_20130717_label-free.csv -num_decoys 0

Modify the output as described for the label-based analysis, but remove the irt column and add a decoy column which contains 0 for target peptides and 1 for decoys (indicated in the protein_name column).

Save as an (MS-DOS) tab-delimited file: mprophet_20130717_label-free_mod.txt.

Rename the ending to .xls: mprophet_20130717_label-free_mod.xls.
Reformat to UNIX:

    fromdos mprophet_20130717_label-free_mod.xls

Delete the .csv and the .txt files from the folder.

Run mProphet:

    minteract.pl -mmap -mquest -mprophet -workflow LABEL_FREE -mmap_machine QTRAP -mmap_cycle_time 2 -def param_light_noref.def

Copy the results to the virtual machine for visualisation:

    cp -R /mnt/hgfs/Tutorial-7_mProphet/mprophet_20130717_label-free/ //home/mprophet/mprophet_data_analysis/

Inspect the peak picking in the shtml files.

Refine the mprophet_all_peakgroups.xls as described above and save as mprophet_all_peakgroups_label-free.xls.

Exercises

1. Which are the scores that mProphet takes into account for the discrimination of true and false peak groups for the label-based and the label-free workflow, respectively?
2. What is the typical range in delta_irt of the best-scoring peak group?
3. Which d_score cut-off would you choose to get good results for the label-based and label-free analysis, respectively? Which FDR and sensitivity would you get according to the suggested cut-off (mprophet_raw_stat.xls)? Is the FDR estimated by mProphet what you would expect? Why/why not?
4. Look at a few examples where mProphet picked the wrong peak and try to explain why this happened.
5. How would the results change if you used as an input for mProphet a transition list which has been refined before in Skyline (i.e. containing only non-interfered transitions)?

Acknowledgements and References

The descriptions in this tutorial were adapted from a manuscript which will soon be published (Surinova et al., Nature Protocols 2013, in press). Many thanks to Silvia Surinova and Ruth Hüttenhain for giving us access to the manuscript!

For more information check the mProphet website (www.mprophet.org) and the following publication:

Reiter, L., Rinner, O., Picotti, P., Hüttenhain, R., Beck, M., Brusniak, M.-Y., Hengartner, M.O., and Aebersold, R. (2011). mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nature Methods 8, 430-435.

We would like to thank Prime-XS and SystemsX for supporting the.

mProphet workflows and associated parameter files

The following table describes different experimental designs for SRM experiments and suggests the workflow and parameter file required for the mProphet analysis of SRM data derived from the different experimental designs (Surinova et al., Nature Protocols 2013, in press). The parameter files for all the workflows can be found in the directory /usr/local/apps/biognosys/mquest/conf/ accessible via the mProphet virtual machine.

| Experimental design | Workflow | Parameter file |
|---|---|---|
| Target transitions measured for light. Reference peptides are heavy. Empirical decoy transition groups measured for light peptides. | SPIKE_IN | param_aqua_heavy_stringent_ref.def |
| Target transitions measured for light. Reference peptides are heavy. Synthetic decoy transition groups are generated from the data. | SPIKE_IN | param_aqua_heavy_stringent_ref_synthetic.def |
| Target transitions measured for light. Reference peptides are light. Empirical decoy transition groups measured for heavy peptides. | INVERTED_SPIKE_IN | param_aqua_light_stringent_ref.def |
| Target transitions measured for light. Reference peptides are light. Synthetic decoy transitions generated from the data. | INVERTED_SPIKE_IN | param_aqua_light_stringent_ref_synthetic.def |
| Target and empirical decoy transitions measured only for light peptides. | LABEL_FREE | param_light_noref.def |
| Target and empirical decoy transitions measured only for light peptides. | LABEL_FREE | param_heavy_noref.def |
| Target and empirical decoy transitions measured for light and heavy peptides. Reference peptides derived from metabolically heavy labeled sample (e.g. SILAC/N15). | LABEL | param_silac_heavy_ref.def |
| Target transitions measured for light. Reference peptides derived from metabolically heavy labeled sample (e.g. SILAC/N15). Synthetic decoy transitions generated from the data. | LABEL | param_silac_heavy_ref_synthetic.def |
| Target and empirical decoy transitions measured for light and heavy peptides. Reference peptides derived from a light (non-labeled) sample. | LABEL | param_silac_light_ref.def |
| Target transitions measured for light. Reference peptides derived from a light (non-labeled) sample. Synthetic decoy transitions generated from the data. | LABEL | param_silac_light_ref_synthetic.def |
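The workflow and parameter file pairs in this table can be encoded as a lookup, so that a mismatched combination is caught before a long run starts. The Python sketch below only assembles the argument list; it does not execute anything. The flags are the ones used in this tutorial, while the helper name `build_minteract_command`, the dictionary layout, and the lower-case script name `minteract.pl` (whose exact spelling should be checked on the VM) are assumptions.

```python
# Parameter files listed for each workflow in the table above.
PARAM_FILES = {
    "SPIKE_IN": (
        "param_aqua_heavy_stringent_ref.def",
        "param_aqua_heavy_stringent_ref_synthetic.def",
    ),
    "INVERTED_SPIKE_IN": (
        "param_aqua_light_stringent_ref.def",
        "param_aqua_light_stringent_ref_synthetic.def",
    ),
    "LABEL_FREE": ("param_light_noref.def", "param_heavy_noref.def"),
    "LABEL": (
        "param_silac_heavy_ref.def",
        "param_silac_heavy_ref_synthetic.def",
        "param_silac_light_ref.def",
        "param_silac_light_ref_synthetic.def",
    ),
}


def build_minteract_command(workflow, param_file, machine=None, cycle_time=None, force=False):
    """Return the command as a list of arguments (hypothetical helper)."""
    if param_file not in PARAM_FILES.get(workflow, ()):
        raise ValueError(f"{param_file} is not listed for workflow {workflow}")
    cmd = ["minteract.pl", "-mmap", "-mquest", "-mprophet", "-workflow", workflow]
    # The machine and cycle time are only needed for some instruments.
    if machine is not None:
        cmd += ["-mmap_machine", machine]
    if cycle_time is not None:
        cmd += ["-mmap_cycle_time", str(cycle_time)]
    cmd += ["-def", param_file]
    if force:
        cmd.append("-force")  # re-analyse files that were already processed
    return cmd


if __name__ == "__main__":
    print(" ".join(build_minteract_command(
        "LABEL_FREE", "param_light_noref.def", machine="QTRAP", cycle_time=2)))
```

The returned list could be passed to `subprocess.run` on the VM, or joined with spaces and typed into the terminal.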