mzmatch Excel Template Tutorial


Installation & Requirements
Installation: The template can be used to process mzmatch output text files without additional installations or add-ins. Microsoft Excel 2007 is required (2003 is not sufficient; 2010 has not been tested).
Requirements for full functionality:
R Statistical Software: for mzmatch pre-processing (R packages XCMS from Bioconductor, mzmatch.r from R-Forge, and rJava and XML from CRAN)
R package rcdk: for FormulaGenerator
Firefox or Internet Explorer: for hyperlinks to online databases
Thermo Xcalibur: for EIC lookup
ReAdW: for conversion of .raw files to .mzXML files
If you wish to use the R and Xcalibur links, open the template and update cells D44 and D45 on the Settings sheet to the relevant paths on your computer.

Data Pre-processing
Step 1 - Setup
Open mzmatch_template.xltm and use Save As to save it as yourfile.xlsm (macro-enabled workbook).
Go to the Settings sheet and update cells D44 and D45 to the relevant paths on your computer.
Step 2 - Convert RAW files to centroided mzXML files
Save a copy of ReAdW.exe into the folder containing your RAW data.
Click "Convert RAW to mzxml files" to run the conversion.

Data Pre-processing
Step 3 - If the files are from an Exactive, split them into positive and negative mode files using the blue button.
Step 4 - Select positive or negative mode in cell K1 (process only one mode at a time).
For each polarity, sort the replicate .mzxml files into folders according to their experimental groups (sets).
Check the blue-shaded settings for the mass, RT and related-peaks windows and for the RSD filter (xcms parameters can be changed in the macro).
Run xcms/mzmatch with the purple Combined button.
NOTE: Files must be sorted into sets (folders) for the RSD filter to run.
NOTE: If xcms crashes in negative mode, try selecting the "mzdata alt" method in cell K2.
The mzmatch output files will be saved in the folder with your data files.
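The RSD filter works on the replicates within each set, which is why the files must be sorted into set folders first. As a rough illustration only (the template runs this step in R through mzmatch.r, not in Python), the sketch below computes the relative standard deviation, 100 x SD / mean, for each set. The pass rule (RSD below the threshold in at least one set) and the 50 % default are assumptions, not the template's actual settings.

```python
from statistics import mean, stdev


def rsd_percent(intensities):
    """Relative standard deviation (%) of replicate intensities: 100 * SD / mean."""
    m = mean(intensities)
    if m == 0:
        return float("inf")  # treat an all-zero peak as maximally variable
    return 100.0 * stdev(intensities) / m


def passes_rsd_filter(sets, max_rsd=50.0):
    """True if the peak's RSD is below max_rsd in at least one set.

    `sets` maps a set (folder) name to that set's replicate intensities.
    The 'at least one set' rule and the 50 % default are assumptions.
    """
    return any(len(v) >= 2 and rsd_percent(v) < max_rsd for v in sets.values())


# Hypothetical intensities for one peak in two sets of three replicates.
peak = {"control": [1000, 1100, 950], "treated": [500, 2500, 100]}
print(round(rsd_percent(peak["control"]), 1))  # 7.5
print(passes_rsd_filter(peak))  # True: the control set is reproducible
```

Sets with a single file are skipped because a standard deviation needs at least two replicates.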

Peak data import
Step 1 - Import the mzmatch output file combined_related.txt using the big red button on the Settings sheet.
Manually check that replicate samples are in adjacent columns (if not, cut and paste them until they are).
Step 2 - On the Settings sheet, enter the number of replicates in each set (column F).
NOTE: If you have named your samples with set prefixes, the next green button will do this for you.
Choose the Set-Type for each set using the drop-down options in column C.
NOTE: Hover the mouse over cell C8 for more information.
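Both checks in this step are easy to express in code. The Python sketch below assumes a hypothetical naming convention in which each sample name starts with its set name followed by an underscore (for example `ctrl_1`). It counts replicates per set, which is what column F needs, and tests whether each set occupies one contiguous block of columns. It is not the macro behind the green button.

```python
from collections import Counter


def set_name(sample, sep="_"):
    """Set prefix of a sample name, e.g. 'ctrl_1' -> 'ctrl' (assumed convention)."""
    return sample.split(sep, 1)[0]


def replicates_per_set(samples, sep="_"):
    """Number of replicates per set, in order of first appearance."""
    return dict(Counter(set_name(s, sep) for s in samples))


def replicates_are_adjacent(samples, sep="_"):
    """True if every set occupies a single contiguous run of columns."""
    seen = set()
    previous = None
    for sample in samples:
        prefix = set_name(sample, sep)
        if prefix != previous:
            if prefix in seen:
                return False  # set reappears after another set: needs cut/paste
            seen.add(prefix)
            previous = prefix
    return True


columns = ["blank_1", "blank_2", "ctrl_1", "ctrl_2", "ctrl_3", "drug_1", "drug_2"]
print(replicates_per_set(columns))       # {'blank': 2, 'ctrl': 3, 'drug': 2}
print(replicates_are_adjacent(columns))  # True
```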

Update metabolite DB
Step 1 - Externally, prepare a list of actual retention times for authentic standards analysed under your current chromatographic conditions. Any Excel-readable file with name, RT and (optionally) mass in columns can be imported directly. ToxID is good for this.
Step 2 - NOTE: Names must exactly match those in the DB (except that "_" may be used in place of ",").
Select the Rtcalculator sheet.
Enter the dead-volume time for your chromatographic column (cell O9).
Scroll to the right and manually update the expected retention times for the given Pathways, Maps and Properties (if known).
(Optional) Enter metabolite names and RTs for authentic standards in columns A:B and W:X.
NOTE: These can be entered automatically from an external Excel/TSV/CSV file in Step 3.
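The naming rule in Step 2 is strict: an imported standard is used only if its name matches a DB name exactly, with "_" allowed in place of ",". A minimal Python sketch of that rule follows. Trying the exact name before substituting underscores is an assumption, made so that DB names that genuinely contain "_" still match; the metabolite names and RTs are illustrative.

```python
def match_standards(standards, db_names):
    """Match (name, rt) pairs from an imported file against DB names.

    Names must match exactly; the only allowed substitution is '_' for ','
    (commas cause problems in .csv files). No case-folding is done.
    Returns (matched {db_name: rt}, list of unmatched input names).
    """
    db = set(db_names)
    matched, unmatched = {}, []
    for name, rt in standards:
        if name in db:
            matched[name] = rt
        elif name.replace("_", ",") in db:
            matched[name.replace("_", ",")] = rt
        else:
            unmatched.append(name)
    return matched, unmatched


db = ["2,3-Bisphosphoglycerate", "Glycine", "L-Alanine"]
standards = [("2_3-Bisphosphoglycerate", 14.2), ("Glycine", 12.9), ("alanine", 12.1)]
matched, unmatched = match_standards(standards, db)
print(matched)    # {'2,3-Bisphosphoglycerate': 14.2, 'Glycine': 12.9}
print(unmatched)  # ['alanine']
```

Here "alanine" is rejected because both the case and the "L-" prefix differ from the DB entry, which is exactly the kind of mismatch worth fixing in the input file.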

Update metabolite DB
Step 3 - Run the "Update Retention Times in DB" macro from either the Settings or the Rtcalculator sheet.
If the prediction model looks good (i.e. r² > 0.6), agree to update the RTs in the DB; otherwise, try altering the variables (cells E1:J1) to suit your chromatography and re-run the macro.
Step 4 (optional) - If you have a species-specific database (e.g. from MetaCyc or KEGG), enter these annotations in column G (PreferredDB) of the DB sheet.
NOTE: This can be simplified by matching database identifiers with Excel's VLOOKUP function.
Select the entire database and use Custom Sort: sort by searchmass (ascending), then by PreferredDB (ascending), so that annotated metabolites are at the top of each group of isomers.

Run Metabolite Identification
Step 1 - On the Settings sheet, check that the settings in columns F and I are suitable. The most commonly changed settings are:
Identification RT windows (F3 and F4) and mass window (F6)
RT window for duplicate peaks (I9)
MaxIntensity cutoff (I10)
Select the adducts (cells K15:K21) that you wish to include in the identification search.
Step 2 - Click "Run Identification Macro" on the Settings sheet. This can take from 2 to 20 minutes. Save the file as soon as the macro has finished.

Metabolite Identification: Process
Metabolite Identification Macro
This macro annotates every peak in the alldata sheet. Upon completion:
all basepeaks are copied to the allbasepeaks sheet;
all identifications with confidence < 5 are copied to the notlikely sheet;
all identifications with confidence >= 5 are copied to the identification sheet.
The identification sheet is then checked for duplicates and shoulder peaks, and these are moved to the notlikely sheet.
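The sorting rules above can be summarised as a small function. This Python sketch represents each peak as a dict with hypothetical keys (`name`, `confidence`, `is_basepeak`, `intensity`) and uses the confidence cut-off of 5 from the text. The duplicate check is a simplification and an assumption: it keeps only the most intense peak per metabolite name, whereas the macro also considers shoulder peaks and the RT window in cell I9.

```python
def route_annotations(peaks, min_confidence=5):
    """Distribute annotated peaks over the allbasepeaks/notlikely/identification sheets.

    Each peak is a dict with hypothetical keys: 'name' (None if no DB match),
    'confidence', 'is_basepeak' and 'intensity'.
    """
    sheets = {"allbasepeaks": [], "notlikely": [], "identification": []}
    for peak in peaks:
        if peak["is_basepeak"]:
            sheets["allbasepeaks"].append(peak)
        if peak["name"] is None:
            continue  # no database match, so this is not an identification
        target = "identification" if peak["confidence"] >= min_confidence else "notlikely"
        sheets[target].append(peak)

    # Simplified duplicate check (assumption): keep the most intense peak per name.
    best = {}
    for peak in sheets["identification"]:
        current = best.get(peak["name"])
        if current is None or peak["intensity"] > current["intensity"]:
            best[peak["name"]] = peak
    duplicates = [p for p in sheets["identification"] if best[p["name"]] is not p]
    sheets["identification"] = [p for p in sheets["identification"] if best[p["name"]] is p]
    sheets["notlikely"].extend(duplicates)
    return sheets


peaks = [
    {"name": "Alanine", "confidence": 7, "is_basepeak": True, "intensity": 1.0e6},
    {"name": "Alanine", "confidence": 6, "is_basepeak": False, "intensity": 2.0e5},
    {"name": "Serine", "confidence": 3, "is_basepeak": True, "intensity": 5.0e5},
    {"name": None, "confidence": 0, "is_basepeak": True, "intensity": 1.0e4},
]
sheets = route_annotations(peaks)
print({sheet: len(rows) for sheet, rows in sheets.items()})
# {'allbasepeaks': 3, 'notlikely': 2, 'identification': 1}
```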

Metabolite Identification: Process
Peak Information columns
A: neutral exact mass (from mzmatch)
B: retention time (from mzmatch), in minutes
C: formula from the DB with the closest match to the mass (if within the ppm window)
D: number of isomers in the DB with this exact formula
E: metabolite name: best match from the DB for this mass and RT
F: confidence level, according to the parameters on the Settings sheet
G: records whether the metabolite is in a preferred database (from the DB)
H: Map: the general area of metabolism for this metabolite (usually from KEGG). NOTE: column H can be changed by choosing a different header in cell H1
I: mass error (in ppm) from the nearest match in the DB (if within 2 x the ppm window)
J: RT error relative to the authentic standard (white) or the predicted RT (grey), as % of RT
K: altppm: mass error for the next-closest mass in the DB (if within the ppm window)
L: Sig: records which sample sets are significant (peaks > blank and RSD < window)
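Columns I, J and K are simple error calculations. The Python sketch below uses the usual definitions: ppm error = 10^6 x (observed - theoretical) / theoretical, and RT error as a percentage of the reference RT. Treating identical DB masses (isomers) as a single value when looking for the next-closest mass for altppm, and the 3 ppm default window, are assumptions; in the template the window comes from cell F6.

```python
def ppm_error(observed, theoretical):
    """Mass error in ppm: 1e6 * (observed - theoretical) / theoretical."""
    return 1e6 * (observed - theoretical) / theoretical


def nearest_errors(mass, db_masses, ppm_window=3.0):
    """(column I, column K): ppm error to the nearest DB mass, kept if within
    2 x window, and to the next-closest distinct DB mass, kept if within window."""
    errors = sorted((ppm_error(mass, m) for m in set(db_masses)), key=abs)
    nearest = errors[0] if errors and abs(errors[0]) <= 2 * ppm_window else None
    alt = errors[1] if len(errors) > 1 and abs(errors[1]) <= ppm_window else None
    return nearest, alt


def rt_error_percent(observed_rt, reference_rt):
    """Column J: RT error as a percentage of the reference (standard or predicted) RT."""
    return 100.0 * (observed_rt - reference_rt) / reference_rt


# Hypothetical DB masses; the duplicated 100.0 stands for two isomers.
print(nearest_errors(100.0001, [100.0, 100.0, 100.0003, 150.0]))  # about (1.0, -2.0)
print(rt_error_percent(10.5, 10.0))  # 5.0
```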

Metabolite Identification: Process
Peak Information (cont.)
M: BP: basepeak for that peak
N: Mzdiff: mass difference between this peak and the basepeak; for basepeaks, this column records the common adducts/fragments/isotopes that were found
O: relation.ship: relationship to the basepeak (according to mzmatch)
P: addfrag: common adduct, fragment or neutral loss
Q: % error of the C13-isotope intensity from theoretical
R: % error of the isotope intensity from theoretical for Cl, S, N, O or H
S: RSD for QC samples (or for Treatment if there is no QC)
T: minimum RSD for all included sample sets
U: maximum intensity from all included sets
V: Relation.id (from mzmatch)
W: peak intensity ratio for the mean of treatments vs the mean of controls
X: P-value for an unpaired t-test between treatments and controls
Y: adduct of the formula match to the mass (e.g. H, Na, double charge)
Z: polarity
AA: number of detected peaks in included sets
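Column Q compares the observed M+1 intensity with what the carbon count predicts. For a formula with n carbons and a natural 13C fraction p (about 1.07 %), the expected M+1/M intensity ratio from carbon alone is n*p/(1 - p), roughly 1.08 % per carbon. This Python sketch uses that binomial approximation and ignores 2H, 15N and 17O contributions; whether column Q uses exactly this formula is not stated in the tutorial. Inverting it gives a quick carbon-count estimate, which is also handy for the isotopic-abundance check during manual filtration.

```python
C13_FRACTION = 0.0107  # approximate natural abundance of 13C


def expected_m1_ratio(n_carbons):
    """Expected (M+1)/M intensity ratio from 13C only: n * p / (1 - p)."""
    return n_carbons * C13_FRACTION / (1.0 - C13_FRACTION)


def c13_percent_error(mono_intensity, m1_intensity, n_carbons):
    """% error of the observed M+1 ratio from the carbon-only expectation."""
    observed = m1_intensity / mono_intensity
    expected = expected_m1_ratio(n_carbons)
    return 100.0 * (observed - expected) / expected


def estimated_carbons(mono_intensity, m1_intensity):
    """Carbon number implied by the observed M+1 ratio (inverse of the above)."""
    return (m1_intensity / mono_intensity) * (1.0 - C13_FRACTION) / C13_FRACTION


# Hypothetical peak annotated as a C6 compound (e.g. a hexose).
print(round(expected_m1_ratio(6), 4))                # 0.0649
print(round(c13_percent_error(2.0e6, 1.3e5, 6), 1))  # 0.2
print(round(estimated_carbons(2.0e6, 1.3e5), 1))     # 6.0
```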

Re-calibrate mass accuracy
Step 1 - On the Settings or Identification sheet, click the "ppm check" button. If the polynomial curve looks like a good fit, agree to re-calibrate the masses; otherwise, investigate the mass calibration manually.
Step 2 - Sort the identification sheet by ppm error (use the blue sort button).
Remove metabolites with large errors (> 1.5 ppm) by cutting and pasting them to the notlikely sheet.
NOTE: It is easiest to manually annotate all mis-annotated peaks (in column F), re-sort, and move them all at once.
NOTE: Delete rows that have been removed (even if they appear empty) to speed up processing.
Double-check the altppm column for alternative identifications before you remove peaks.

Manual Data Filtration
Step 1 - Recover false rejections
Go to the notlikely sheet and check for false rejections, particularly those with a confidence of 4 (technical judgement required).
Cut and paste false rejections onto the identification sheet.
Step 2 - Manual filtration
On the identification sheet, check for false positives and move them to the notlikely sheet by cut/paste or with the "remove row" button.
Press the colouring button to make interpretation easier.
Press the hyperlink button to activate weblinks.
Use the sort functions, info-boxes, graphs and hyperlinks to assist (columns B, D, K, L, W).
Step 3 - Manual identification
On the identification sheet, check for duplicate identifications and choose alternative isomers where appropriate.

Manual Data Filtration
Manual filtration: suggested process
1. Related peaks (mass difference, neutral loss)
2. Retention time limits (min, max, % error)
3. Adduct likelihood (2+ or Na+)
4. Isomers (split peaks, duplicate identifications)
5. Isotopic abundance (C13 isotope, other unique isotopes)
6. Peak shape (check the chromatogram if codadw < 0.95)
7. Biological likelihood (related pathways, common contaminants)

Biological data analysis
Step 1 - If you have Exactive positive/negative data, run the Combine Pos/Neg function after processing each set individually.
Step 2 - Run the intensity comparison macro from the Identification or Settings sheet by clicking "Compare All Sets". This calculates the mean and SD for each set and compares each set to the designated control group (relative intensity and t-test).
Step 3 - In the Comparison sheet, sort the data by your column of interest:
Relative intensity vs control
P-value (t-test) vs control
Metabolite Map or KEGG Pathway
Use the buttons at the top to plot graphs or export to motif/metexplore.
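For each set, Compare All Sets reports a ratio of means and a t-test against the control. The tutorial does not say whether the test assumes equal variances, so this Python sketch uses Welch's unequal-variance t statistic with Welch-Satterthwaite degrees of freedom, on made-up intensities. The standard library has no t distribution; if SciPy is available, `scipy.stats.ttest_ind(treated, control, equal_var=False)` returns the same statistic together with a two-sided p-value.

```python
from math import sqrt
from statistics import mean, variance


def compare_to_control(treated, control):
    """Ratio of means and Welch's t-test statistic for one peak."""
    n_t, n_c = len(treated), len(control)
    se2_t = variance(treated) / n_t  # squared standard error of the treated mean
    se2_c = variance(control) / n_c  # squared standard error of the control mean
    t = (mean(treated) - mean(control)) / sqrt(se2_t + se2_c)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_t + se2_c) ** 2 / (se2_t ** 2 / (n_t - 1) + se2_c ** 2 / (n_c - 1))
    return {"ratio": mean(treated) / mean(control), "t": t, "df": df}


result = compare_to_control(treated=[2.0, 2.2, 1.8], control=[1.0, 1.1, 0.9])
print({k: round(v, 2) for k, v in result.items()})
# {'ratio': 2.0, 't': 7.75, 'df': 2.94}
```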

Multivariate analysis
Step 1 - This template does not include functionality for multivariate analysis. Use the light blue Export button to export either allbasepeaks or the identifications to MetaboAnalyst, or to R/MATLAB/etc., for further analysis.
Step 2 - If you wish to analyse all basepeaks, run the "assign Basepeaks" macro to help with annotation.
Step 3 - Unidentified masses can be investigated by clicking the empty formula cell (column C); this runs FormulaGenerator in R.

Other Features
Additional Macros:
Isotope Search: for untargeted metabolic labelling studies (C13, N15 and O18 supported)
Combine Datasets: combines negative and positive data (from the same column)
Formula Generator: identifies formulae for unknown masses (uses rcdk) and checks their validity against Fiehn's Golden Rules
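In a labelling study, each incorporated heavy atom shifts a peak's mass by a fixed amount: about 1.0034 Da per 13C, 0.9970 Da per 15N and 2.0042 Da per 18O. The Python sketch below shows one simple way to look for labelled partners of a given peak within a ppm window. It illustrates the idea only: the tutorial does not describe how the Isotope Search macro works, and the 3 ppm window and maximum of six labels are assumed values.

```python
# Heavy-minus-light isotope mass differences (Da), from atomic masses.
LABEL_SHIFT = {
    "C13": 13.00335483507 - 12.0,
    "N15": 15.00010889888 - 14.00307400443,
    "O18": 17.99915961286 - 15.99491461957,
}


def find_labelled_partners(base_mass, peak_masses, label="C13",
                           max_labels=6, ppm_window=3.0):
    """Return (n_labels, mass) for peaks at base_mass + n * shift (n >= 1)."""
    shift = LABEL_SHIFT[label]
    hits = []
    for n in range(1, max_labels + 1):
        target = base_mass + n * shift
        for mass in peak_masses:
            if abs(mass - target) / target * 1e6 <= ppm_window:
                hits.append((n, mass))
    return hits


# Hypothetical neutral masses: a hexose (C6H12O6) and candidate partner peaks.
hexose = 180.06339
peaks = [181.06674, 186.08352, 181.07000]
print(find_labelled_partners(hexose, peaks))  # [(1, 181.06674), (6, 186.08352)]
```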

Other Features
Additional Functions (Excel formulas):
FormulaMatch: looks up a mass in the database
ExactMass: calculates the exact mass of a formula
PPMcalc: calculates the mass error from a given mass or formula
IsotopeAbundance: calculates the theoretical isotopic abundance for a given atom in a formula
FormulaValid: checks formula validity against 5 Golden Rules
AtomCount: returns the number of specified atoms in a formula
Pos: calculates the positive charge at a given pH (given # cations and basic pKas)
Neg: calculates the negative charge at a given pH (given # anions and acidic pKas)
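Several of these worksheet functions rest on simple formulas. The Python sketch below gives hypothetical equivalents of AtomCount, ExactMass and PPMcalc (monoisotopic masses for C, H, N, O, P and S only; no brackets or charges in formulae) and of Pos/Neg via the Henderson-Hasselbalch relation: each basic group with pKa_b carries on average 1/(1 + 10^(pH - pKa_b)) positive charge, and each acidic group carries 1/(1 + 10^(pKa_a - pH)) negative charge. Counting "# cations" and "# anions" as permanently, fully charged groups, and returning Neg as a negative number, are assumptions about how the template uses them.

```python
import re

# Monoisotopic masses (Da) for the elements this sketch supports.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.00307400443,
        "O": 15.99491461957, "P": 30.97376199842, "S": 31.97207117354}


def parse_formula(formula):
    """'C6H12O6' -> {'C': 6, 'H': 12, 'O': 6}; no brackets or charges."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def atom_count(formula, element):
    return parse_formula(formula).get(element, 0)


def exact_mass(formula):
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def ppm_calc(observed_mass, formula):
    theoretical = exact_mass(formula)
    return 1e6 * (observed_mass - theoretical) / theoretical


def pos_charge(ph, n_cations=0, basic_pkas=()):
    """Fixed cations plus the protonated fraction of each basic group."""
    return n_cations + sum(1 / (1 + 10 ** (ph - pka)) for pka in basic_pkas)


def neg_charge(ph, n_anions=0, acidic_pkas=()):
    """Fixed anions plus deprotonated fraction of each acidic group, as a negative number."""
    return -(n_anions + sum(1 / (1 + 10 ** (pka - ph)) for pka in acidic_pkas))


print(round(exact_mass("C6H12O6"), 5))              # 180.06339
print(atom_count("C5H14NO", "N"))                   # 1
print(round(pos_charge(7.0, basic_pkas=[7.0]), 2))  # 0.5
```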

FAQ
WHERE TO START... which sheet?
All automated functions can be run from the Settings sheet. After automated filtration and identification, you can do manual curation on the identification sheet, including the mass re-calibration. Additional metabolites can be retrieved from the notlikely (or allbasepeaks) sheet simply by using Excel's cut/paste functions; it is recommended to cut/paste whole rows rather than individual cells. The easiest approach for meaningful biochemical analysis is to run the Compare All function and sort the Comparison sheet according to your interests. Additional columns (e.g. statistics, normalised intensities, other information) can always be added to the right of the existing data without affecting macro performance.
POLARITY:
The polarity is automatically corrected by mzmatch.r during the peak-picking process, and all masses that appear in the template are corrected neutral masses. Ensure that you set the correct 'polarity' option on the Settings sheet before running anything. The polarity setting is also used when combining positive and negative mode data, and for the quicklink to Xcalibur Qual Browser EICs (i.e. to decide whether to add or subtract a proton to get from the neutral mass back to m/z).
Note: Because of the automatic polarity correction by mzmatch, the masses of cations in the database have been corrected by one proton (e.g. the mass of choline in the DB is 103 rather than its actual mass of 104).
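The polarity correction described above amounts to adding or subtracting one proton mass (1.007276 Da). As an illustration (a Python sketch, not the template's code, assuming [M+zH]z+ and [M-zH]z- ions with z = 1 by default), the functions below convert between the corrected neutral masses shown in the template and m/z values. They also reproduce the choline note: the choline cation is observed at m/z 104.1070, so the proton-corrected mass stored in the DB is about 103.0997.

```python
PROTON = 1.00727646688  # proton mass (Da)


def neutral_to_mz(neutral_mass, polarity, charge=1):
    """m/z of [M+zH]z+ (positive mode) or [M-zH]z- (negative mode)."""
    if polarity == "positive":
        return (neutral_mass + charge * PROTON) / charge
    if polarity == "negative":
        return (neutral_mass - charge * PROTON) / charge
    raise ValueError("polarity must be 'positive' or 'negative'")


def mz_to_neutral(mz, polarity, charge=1):
    """Inverse of neutral_to_mz: the 'corrected neutral mass' shown in the template."""
    if polarity == "positive":
        return mz * charge - charge * PROTON
    if polarity == "negative":
        return mz * charge + charge * PROTON
    raise ValueError("polarity must be 'positive' or 'negative'")


# Choline is a permanent cation (C5H14NO+), observed directly at m/z 104.1070.
# Treating it like [M+H]+ subtracts a proton, giving the DB's "103" mass.
print(round(mz_to_neutral(104.10699, "positive"), 4))  # 103.0997
print(round(neutral_to_mz(180.06339, "negative"), 4))  # 179.0561
```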

FAQ
WHICH FILE TO USE FOR THE RETENTION TIME UPDATER?
You need to manually generate a list of retention times for authentic standards under the current LC conditions. The simplest way is to use ToxID (or similar); otherwise, do it manually from the raw data. The retention time updater has been tested on ToxID .csv output files, but it should work for any Excel-readable file that has a column for metabolite names and a column for retention times. (Note: the metabolite name must be identical to the name in the database; the only exception is that underscore "_" may be used in place of comma "," to avoid issues with .csv files.)
IF IT RUNS SLOWLY?
The peak-picking process in XCMS is quite slow; it can be left to run overnight if you have many samples. The speed of the mzmatch.r functions and Excel macros depends on the number of samples, the number of detected peaks, and your computer's speed. Speed can be improved by applying tighter filters earlier in the process (e.g. peak-picking parameters and the RSD filter), but this may cause the loss of some peaks of interest. Visualisation of results in Excel can be slow if there are many active formulas; try turning automatic calculation off, de-activating hyperlinks, or running the "Trim file size" macro.

Any Further Questions/Ideas
mzmatch information is available at: Mzmatch.sourceforge.net
xcms information is available at: metlin.scripps.edu/xcms/
Information about this mzmatch template is available directly from:
Dr Darren Creek, University of Glasgow
darrencreek@gmail.com
Darren.creek@glasgow.ac.uk