Tutorial 2: Analysis of DIA data in Skyline

In this tutorial we will learn how to use Skyline to perform targeted post-acquisition analysis for peptide and inferred protein detection and quantitation, using a SWATH-MS dataset acquired on a QqTOF instrument (6600 TripleTOF, AB Sciex) with a precursor isolation scheme of 64 variable-width windows. The data come from the LFQBench study, in which quantitative benchmarking samples were created by mixing the proteomes of 3 organisms in defined ratios. First, we will set all the parameters in the Skyline session that are required to work with data-independent datasets; then we will extract the quantitation information from the raw data files. We will import a subset of the peptide query parameters (spectral library) used in the study into a Skyline document.

[Figure adapted from Navarro, P. et al. A multicenter study benchmarks software tools for label-free proteome quantification. Nat Biotech 34, 1130–1136 (2016)]

1. Preparing Skyline session for data-independent acquisition (SWATH)

In this section we are going to prepare a Skyline session with the appropriate settings for SWATH analysis. We will tune the settings for data-independent analysis, and then we will extract the peptides of interest from the SWATH files.

Open Skyline and select Blank Document.

1.1. Defining the data-independent acquisition settings and isolation scheme

Here we will set the parameters for extracting ion chromatograms from both MS1 and SWATH-MS data.

Go to Settings → Transition Settings → Instrument
  o Change the Max m/z value to 2000 m/z
Go to the Full-Scan tab
Fill in the window as indicated in the screenshot.

Note! The resolving power depends on the type and settings of the instrument used for data acquisition. The optimum will be slightly different for each dataset. In this analysis we are using centroided data to save space, so we will select Centroided and specify a mass accuracy for extraction. With profile mode data, the resolving power of the instrument can be specified instead.
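A mass accuracy given in ppm corresponds to an m/z extraction window whose absolute width grows with the m/z of the ion. The following is a minimal sketch of that arithmetic, using the ±20 ppm value quoted later in section 2.5; the function name and the example m/z are hypothetical and are not part of Skyline.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds of a +/- ppm window around mz."""
    delta = mz * ppm / 1e6  # absolute tolerance in m/z units
    return mz - delta, mz + delta


# Hypothetical fragment ion at m/z 500 extracted with +/- 20 ppm
low, high = ppm_window(500.0, 20.0)
print(f"{low:.4f} to {high:.4f}")  # 499.9900 to 500.0100
```

The same ppm tolerance therefore gives a wider absolute window for heavier ions, which is why the tolerance is specified relative to m/z rather than as a fixed width.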

Now we need to define a new isolation scheme according to the parameters defined on the instrument for data-independent acquisition.

Note! In our case, we used 64 variable-width windows that covered the range from 400 to 1200 m/z with 1 m/z overlap.

From the Isolation scheme drop-down → select Add → fill in the name as DIACourse_SWATH64
Select Prespecified isolation windows and activate Specify Margins
A drop-down menu will appear under Prespecified isolation windows with Measurement selected. Change this option to Extraction (note: the order is important; change this setting to Extraction before pasting the window borders in the next step).
Open the file 64_variable_windows.csv in Excel (from C:\DIA_Course\Tutorial2_Skyline). The first column of the table is the start m/z and the second column is the end m/z of each isolation window. The third column specifies the margin (0.5).

Note! As the quadrupole transmission windows are not perfectly square, the margin option allows you to specify how much of the window edges should not be used for extraction. Skyline will then extract from start+margin to end-margin.

Highlight the relevant columns, copy the Excel table with Ctrl+C, and paste it with Ctrl+V into the Skyline table for isolation windows.
The Edit Isolation Scheme window should now look like this:

Click Graph to see how the isolation windows cover the specified range. Click Close.
Click OK. You should see the message below, indicating that the isolation windows are overlapping. This is intentional. Click Yes.
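The effect of the margin and the reason for the overlap warning can be sketched in a few lines. This follows the tutorial's own description (extraction from start+margin to end-margin). The three windows are hypothetical values chosen to have a 1 m/z overlap; they are not taken from 64_variable_windows.csv, and the function names are illustrative only.

```python
def extraction_range(start: float, end: float, margin: float) -> tuple[float, float]:
    """Trim the margin from both edges of an isolation window."""
    return start + margin, end - margin


def overlapping_pairs(windows: list[tuple[float, float]]) -> list[tuple[int, int]]:
    """Return index pairs of adjacent windows (sorted by start) that overlap."""
    return [
        (i, i + 1)
        for i in range(len(windows) - 1)
        if windows[i][1] > windows[i + 1][0]
    ]


# Hypothetical isolation windows with 1 m/z overlap (NOT the course values)
windows = [(400.0, 409.0), (408.0, 417.0), (416.0, 425.0)]
margin = 0.5

print(overlapping_pairs(windows))  # [(0, 1), (1, 2)]
for start, end in windows:
    print(extraction_range(start, end, margin))
# (400.5, 408.5)
# (408.5, 416.5)
# (416.5, 424.5)
```

With a 1 m/z overlap and a 0.5 margin on each edge, the trimmed extraction ranges in this example meet exactly, so the overlap reported in the warning does not lead to gaps or double coverage after the margins are applied.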

Note: if you reopen the Edit Isolation Scheme dialog for the isolation scheme we just created, the Measurement settings will be shown again; however, if you change back to Extraction you can see that our settings are preserved.

Make sure that the isolation scheme you have just created is selected in the Isolation scheme drop-down menu in the Transition Settings.
Back in the Full-Scan tab, make sure that Retention time filtering is set to Use only scans within 5 minutes of predicted RT. This sets a 10-minute window (±5 minutes) around the predicted RT.
Go to the Filter tab
Fill in the options according to the screenshot
Click OK.
Save your Skyline document as skyline_tutorial_acquisition_scheme.sky in the C:\DIA_Course\Tutorial2_Skyline directory.

1.2. Adding query parameters for the iRT peptides

A set of 11 synthetic peptides with well-characterized chromatographic behavior has been spiked into the samples so that a linear regression can be made between the measured retention times of these peptides and their iRT values [see Escher, C. et al. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–21 (2012)]. This allows us to project the iRT values of the target peptides onto the retention time space of each SWATH run, giving accurate predictions of their retention times in every run we will analyze. We will first add the query parameters for the iRT peptides to the Skyline document.

Open Excel and then open the Biognosys_iRT_for_OpenSWATH_6tr.tsv file. This is a tab-delimited file, so when opening it from Excel you may get a Text Import window asking how the data is delimited. Select Delimited, click Next >, select the Tab option and click Finish.
Notice the headers and format of this file and the type of information contained in the assay library. The stored information consists of the mass spectrometric and chromatographic parameters for the iRT peptides. You generated this format in Tutorial 1; it is used by OpenSWATH [see Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protocols 10, 426–441 (2015)] but can also be read by Skyline.
Close Excel.
In Skyline, click File > Import > Transition List
Select the file Biognosys_iRT_for_OpenSWATH_6tr.tsv and click Open.
Skyline should ask if you want to create a new iRT calculator. Click Skip. We will create the iRT calculator in a separate step later.
Skyline should now ask if you want to create a spectral library from the spectral library intensities. Click Create.
You should now see the 11 iRT peptides on the left of the document. Click on the first peptide (LGG…) and click on the + beside this peptide to expand it. You should see the precursor and fragment ion m/z and the ion type annotations, as well as their intensity rank in the spectral library. You should also see the pseudo-spectrum created by Skyline from the fragment ion relative intensities in the Library Match window (if you don't see the spectrum, click View > Library Match).
There should be 1 protein, 11 peptides, 11 precursors, and 66 transitions (shown in the bottom right corner).
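The idea behind the iRT approach described in this section is an ordinary least-squares line fitted to (iRT, measured RT) pairs of the standard peptides, which is then used to predict the retention time of any target peptide from its library iRT value. The sketch below illustrates this together with the ±5 minute extraction window set in section 1.1. All numeric values and function names are hypothetical, and Skyline's own regression may differ in detail (for example in outlier handling).

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_window(irt: float, slope: float, intercept: float,
              half_width: float = 5.0) -> tuple[float, float]:
    """Predict RT from an iRT value and return the extraction window."""
    predicted = slope * irt + intercept
    return predicted - half_width, predicted + half_width


# Hypothetical iRT values and measured retention times (minutes)
irt_values = [-20.0, 0.0, 50.0, 100.0]
measured_rt = [30.0, 40.0, 65.0, 90.0]

slope, intercept = fit_line(irt_values, measured_rt)
# Extraction window for a hypothetical target peptide with iRT = 60
print(rt_window(60.0, slope, intercept))
```

Because the regression is refitted for every run, the same library iRT value maps to a different predicted retention time in each SWATH file.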

1.3. Adding an iRT calculator and predictor

Go to Settings → Peptide Settings → Prediction
Click on the calculator symbol (next to the Retention time predictor)
  o Select Add
  o Name: Biognosys_iRT_calculator
  o iRT database: click Create → select your Skyline folder and save the file with the name Biognosys_iRT_calculator
  o From the drop-down menu on the right select Biognosys-11 (iRT-C18)

NOTE! Skyline offers direct use of some standard iRT kits, including the Biognosys kit that was used for this experiment. You will see the iRT peptides and their predetermined iRT values from the Biognosys iRT kit appear in the standard peptides table.

  o The window should now look like this:
  o Click OK twice

1.4. Adding peptide query parameters for the target peptides

We have randomly selected 29 proteins (10 human, 10 yeast, and 9 E. coli) and included all of the proteotypic peptides mapping to those proteins in the library we are going to use for this analysis.

NOTE! This library was generated from a DIA-Umpire analysis of the SWATH data (which you will learn about in a later tutorial), following what you learned in Tutorial 1 and our published method [Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protocols 10, 426–441 (2015)]. However, there are other options in Skyline for importing DDA search results and creating spectral libraries to generate peptide query parameters [more info: https://skyline.ms/wiki/home/software/skyline/page.view?name=tutorial_method_edit].

Go to File → Import → Transition List and select ecolihumanyeast_30_protein_library.tsv
Skyline detects that this library contains iRT values and asks if you want to add them to the iRT calculator. Click Add.
Skyline detects that this library contains fragment ion relative intensities and asks if you want to create a (pseudo) spectral library from these. Click Create.

NOTE! Importing peptide query parameters using this method can commonly result in errors if the document settings are not correct. If you encounter problems, make sure that the modifications are set as they were for the database search (Settings > Peptide Settings > Modifications) and that the mass error is set appropriately if measured instead of theoretical masses are being used (Settings > Transition Settings > Instrument > Method match mass tolerance). Other problems can occur if Skyline encounters exotic neutral losses or other unannotated fragment ions during the import.

Reopen the iRT calculator (Settings → Peptide Settings; click the calculator symbol and select Edit Current). Notice that the Measured peptides window has now been populated with the target peptides and their corresponding iRT values from the assay library. Click OK twice.
Open the Spectral Library Explorer (View → Spectral Libraries). Note that pseudo MS/MS spectra have been created using the relative fragment ion intensities read from the spectral library. Each pseudo-spectrum contains only 6 fragments because this is the number that was selected for this spectral library. Close the Spectral Library Explorer.
You should now have 30 proteins, 166 peptides, 196 precursors, and 1,176 transitions.

1.5. Adding Decoys

The peptide query parameters contain only target peptides and proteins. Decoys need to be generated in the next step.

Go to Edit → Refine → Add Decoy Peptides
  o Leave the default number of decoy precursors (155).
  o Select Shuffle Sequence from the decoy generation method drop-down menu.
  o Click OK
Save your Skyline document as skyline_tutorial_query_parameters.sky

2. Performing the SWATH data analysis

2.1. Import and extraction of SWATH data

Go to File → Import → Results
Select Add single-injection replicates in files
Select Files to import simultaneously: Many
Select Show chromatograms during import
Click OK
Select all 6 SWATH files from C:\DIA_Course\Data\DIA_data
  o lgillet_i150211_008_cent_thresh2.mzxml
  o lgillet_i150211_009_cent_thresh2.mzxml
  o lgillet_i150211_010_cent_thresh2.mzxml
  o lgillet_i150211_011_cent_thresh2.mzxml
  o lgillet_i150211_012_cent_thresh2.mzxml
  o lgillet_i150211_013_cent_thresh2.mzxml
Click Open
Skyline should ask if you want to remove the file name prefix. Click Do not remove.
The next tasks in Section 2.2, Sample Annotation, can be performed while the data is importing.

NOTE! The SWATH data should now start importing, and the target and decoy transitions are extracted. This process can take some time (15-20 minutes). For this tutorial we selected data that had already been converted from the raw data format to mzXML and centroided (using msconvert with vendor centroiding). To reduce the file size for the purpose of the course, we also applied an absolute intensity threshold of 2. Using centroided data increases processing speed compared to profile data. Furthermore, we use a small peptide query parameter set to keep the processing time manageable for the course. Many applications use larger peptide query parameter sets, which results in a longer data extraction time.
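As an aside on section 1.5, the idea of a shuffled-sequence decoy can be illustrated with a short sketch: the residues of a target peptide are permuted so that the decoy has the same amino acid composition (and precursor mass) but a different fragment ion series. Keeping the C-terminal residue in place is an assumption made here because it is common practice for tryptic peptides; this is not a description of Skyline's actual implementation, and the function name is hypothetical.

```python
import random


def shuffle_decoy(sequence: str, rng: random.Random, max_tries: int = 20) -> str:
    """Shuffle all residues except the last one (assumed C-terminal K/R)."""
    body, last = list(sequence[:-1]), sequence[-1]
    candidate = sequence
    for _ in range(max_tries):
        rng.shuffle(body)
        candidate = "".join(body) + last
        if candidate != sequence:  # retry if the shuffle reproduced the target
            break
    return candidate


target = "AAGATTANITQAIEQMR"  # first peptide of CLPB_ECOLI in this tutorial
decoy = shuffle_decoy(target, random.Random(0))
print(target, "->", decoy)
```

Because targets and decoys share the same composition, the decoys are extracted and scored under the same conditions as the targets, which is what makes them usable as a null model for the scoring in section 2.3.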

2.2. Sample Annotation

Here we define which samples belong to which experimental group by annotating the samples.

Go to Settings → Document Settings → Annotations
We need to add one annotation to the document. Click Add and enter the values shown below in the Define Annotation window for Condition. In our experiment we have two conditions: condition A samples have a chimeric proteome composition of E. coli 20%, yeast 15% and human 65%, and condition B samples have a composition of E. coli 5%, yeast 30% and human 65%.
Click OK
Back in the Document Settings Annotations tab, select Condition
Click OK
Go to View → Document Grid
Select the Views drop-down menu and select Replicates
Annotate the samples as shown in the figure below and close the Document Grid window:
For easy viewing we will split the data by condition into 2 windows. Go to View → Arrange Graphs → Grouped
Group panes = 2, select: Distribute graphs among groups, Display: Tiled, Sort order: Document → OK
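The expected fold changes implied by the two sample compositions given in this section can be worked out directly, which is useful as a reference when judging the measured ratios later. A minimal sketch; the ratio is computed as A/B here, which is an assumption, so the signs flip if your group comparison is defined as B/A.

```python
import math

# Proteome composition (% of total) per condition, as given in this section
composition = {
    "A": {"E. coli": 20.0, "yeast": 15.0, "human": 65.0},
    "B": {"E. coli": 5.0, "yeast": 30.0, "human": 65.0},
}


def expected_log2_ratio(species: str) -> float:
    """Expected log2(A/B) for proteins of the given species."""
    return math.log2(composition["A"][species] / composition["B"][species])


for species in composition["A"]:
    ratio = composition["A"][species] / composition["B"][species]
    print(f"{species}: A/B = {ratio:g}, log2 = {expected_log2_ratio(species):g}")
# E. coli: A/B = 4, log2 = 2
# yeast: A/B = 0.5, log2 = -1
# human: A/B = 1, log2 = 0
```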

If the data is still importing at this stage, you will need to wait until the import has finished before saving the document.
Save your Skyline document as skyline_tutorial_swath_extracted_data.sky

2.3. mProphet

In this section, we will apply the mProphet algorithm to the data. The best-scoring peak groups are selected and q-values are calculated to enable FDR control.

Go to Edit → Refine → Reintegrate
From the drop-down menu of the peak scoring model select: Add
Click Train Model and inspect the model score distributions
Fill in DIACourse as the name
The window should look like this (the score distribution and weights might be slightly different depending on the Skyline version or minor differences in extraction parameters):
Inspect the target and decoy discriminant score distributions in the Model Scores tab.
Click on the Feature Scores tab and then select each of the scores in the table one by one. Notice which scores are best at discriminating between targets and decoys.
Inspect the P Values and Q Values tabs. Are the majority of the targets detected?
Click OK
Back in the Reintegrate window, select the following:

Click OK.
Save your Skyline document as skyline_tutorial_mprophet.sky.

2.4. Inspect the data manually

Now we will manually inspect some of the chromatography and underlying spectra.

Add the peak area and retention time views to your document if they are not present (View → Retention Times → Replicate Comparison and View → Peak Areas → Replicate Comparison).
If these new windows are floating, you can dock them by clicking on the top border of the floating window, holding the left mouse button down, and dragging the window. Dock the peak area replicate comparison and Library Match windows in the same way so that all information is easily viewable, for example as shown in the screenshots below.
Click on the first E. coli protein, 1/sp|P63284|CLPB_ECOLI. You should see all of the peptides for this protein shown on the various plots (XIC chromatography, peak areas, and retention time replicate graphs). The first screenshot below is an example of one protein being selected, with all of the peptides for this protein summarized in each of these views (except the Library Match window, where nothing is shown). From the peak area replicate comparison, does this protein appear to be differentially regulated?
If you select the first peptide (AAGATTANITQAIEQMR) in this protein, you get specific information for this peptide in all of these views.

Examine the peak area patterns for the rest of the peptides belonging to this protein. Is the quantitative pattern for the peptides from this protein consistent with the expected differential regulation pattern?

NOTE! If there is more than one precursor charge state for a given peptide sequence, these are extracted and scored separately. You can look at them by clicking the + next to the peptide sequence and clicking on the individual charge states.

Click on the first human protein in the document (1/sp|P22314|UBA1_HUMAN). Examine the replicate peak areas from the protein level view and the peptide level view. Are the peak areas consistent with the expected ratios?
Click on the first yeast protein (1/tr|D3UEZ5|D3UEZ5_YEAS8) in the document. You can navigate to it quickly with Ctrl+F and typing or pasting the accession number into the Find box. Examine the replicate peak areas from the protein level view and the peptide level view. Are the peak areas consistent with the expected ratios?
Go back to the first peptide (AAGATTANITQAIEQMR) of the first E. coli protein and examine the XICs in each of the runs. There seems to be an interference in one of the transitions. Notice also that the dot product for this peptide is rather low (0.66).

NOTE! The dot product score indicates the agreement between the fragment ion intensities in the library spectrum and the peak areas measured in the SWATH data.

Click down through the fragment ions of this peptide one by one. The currently selected transition will be highlighted in red in the chromatogram. Notice that the fragment ion with the interference is y8 (the most abundant ion).
Hover the mouse over the XIC for the y8 fragment and click on the circle that appears to retrieve the SWATH scan corresponding to the precursor isolation range in which this peptide falls. You can see the isotopic pattern for this fragment, with the extraction window for the XIC highlighted in yellow.
If you click the left arrow in the top right of the window several times, the scans for previous cycles will be displayed. Notice that there is an interfering isotopic pattern a few cycles earlier (RT of 79.81 min), where the second isotopologue of the interfering species falls into the extraction window for this transition.
Click on the magnifying glass symbol at the top right of the window to zoom out. You can see the full range of the SWATH spectrum with the extraction windows for all transitions highlighted. Zoom in and look at the other fragment ions.
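The dot product described in the note above can be illustrated as a normalized dot product (cosine similarity) between the library intensities and the measured peak areas. This is a sketch of the general idea only: the exact formula used by Skyline may differ, the function name is hypothetical, and all intensities below are made-up values in which the first fragment carries an interference.

```python
import math


def normalized_dot_product(library: list[float], observed: list[float]) -> float:
    """Cosine similarity between library intensities and measured peak areas."""
    dot = sum(l * o for l, o in zip(library, observed))
    norm = math.sqrt(sum(l * l for l in library)) * math.sqrt(sum(o * o for o in observed))
    return dot / norm if norm else 0.0


# Hypothetical values: six fragments, the first one inflated by an interference
library = [100.0, 80.0, 60.0, 40.0, 30.0, 20.0]
observed = [400.0, 82.0, 58.0, 41.0, 29.0, 21.0]

with_interference = normalized_dot_product(library, observed)
without_interference = normalized_dot_product(library[1:], observed[1:])
print(f"all fragments: {with_interference:.2f}")
print(f"interfered fragment excluded: {without_interference:.2f}")
```

A single contaminated transition can dominate the score when it is also the most intense one, which is the situation seen for the y8 fragment here.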

Close the spectrum viewer, right-click on the y8 fragment ion and select Delete to remove this transition. The orange trace should disappear from the chromatography. Did the dot product change? In the peak area comparison, do the relative ratios of the fragment ions agree better with the expected ratios in the library spectrum? Did the peak areas and the ratio between them change?

NOTE! You can toggle between the states with and without the y8 fragment using undo/redo (Ctrl+Z and Ctrl+Y) to inspect these issues.

Explore the data further manually (including some decoys).

2.5. Mass and retention time deviation

We can examine the mass accuracy and retention time prediction accuracy to determine whether the optimal extraction parameters have been used.

Click on View → Mass errors → Replicate Comparison
Click on a few peptides and inspect the mass errors. Close the mass error plot.
Click on View → Mass errors → Peptide Comparison. Right-click on the x-axis labels of the plot and select Order → Peak Areas, which orders the peptides by abundance from left to right (with the highest abundance on the left). What is the relationship between peptide signal intensity and mass error?
Click on View → Mass errors → Histogram to see the distribution of mass errors over the data set. Could the extraction window (±20 ppm) have been further optimized?
Click on View → Retention Times → Regression → Score to Run to see the linear regression used to predict the target peptide retention times based on the iRT peptides and the library iRT values of the target peptides.
Right-click on the graph and select Plot → Residuals to see the deviations from the predicted retention times in this data set. Could the extraction window (±5 minutes) have been further optimized for this analysis?

NOTE! As mentioned in section 1.4, the spectral library for this analysis was constructed from a DIA-Umpire analysis of the same files. As such, the accuracy of the retention time predictions is very good. Retention times from external spectral libraries acquired on different instruments, at different times, or from different samples would lead to larger errors in these predictions.

Save the document as skyline_tutorial_inspected.sky

2.6. Quantitative comparison

Go to View → Group Comparison → Add, fill out the parameters of the group comparison as in the screenshot and click OK (you need to click Advanced to see the lower options).
Go to View → Group Comparison → species_comparison
A table should appear that shows the peptide level fold change and adjusted p-value for the comparison between the A and B groups. Expand the width of the Protein column header so that you can see the full protein names, including the species they came from.
Right-click on the header of the Fold Change Result column and click on Sort Ascending
Inspect the fold changes produced for some of the peptides in the table, keeping in mind which species they are from and the expected ratios. What about the adjusted p-values?

Click on Show Graph in the top left corner of the table. A bar graph of the fold changes for these peptides is shown. You can highlight certain peptides in the graph by clicking on various peptides in the Targets window of the Skyline document (so that you can confirm which species the peptides in the graph are from).
Notice that there are only 138 peptides in the grid (and graph), but we imported 166 target peptides into the Skyline document in section 1.4. Where are the rest?
Go back to the group comparison grid by clicking the species_comparison:grid tab in the lower left corner of the window.
Click Change Settings at the top left of the grid to re-open the Edit Group Comparison window. Check the Use zero for missing peaks box. Notice that the Group Comparison:Grid has now updated and there are 166 peptides.
Click on Show Graph again to see how filling in these quantitative values with zeroes has affected the distribution of fold changes. (Note: if the error bars disappear, you can resize the Graph window to get them back again.)

Go back to the group comparison grid by clicking the species_comparison:grid tab in the lower left corner of the window.
Click Change Settings at the top left of the grid to re-open the Edit Group Comparison window. Change the Scope option from Peptide to Protein and close the dialog box.
Click on Show Graph to see the protein level fold change. Now you can see which species the proteins are from in the x-axis labels. Do the measured ratios fit the expected ratios well?
Save the document as skyline_tutorial_group_comparison.sky

2.7. Comparing MS1 and MS2 data

Until now we have only been looking at extracted ion chromatograms from MS2 data (i.e. fragment ions). Now we will compare the signals from MS1 XICs and MS2 XICs.

Click on Settings → Transition Settings → Filter and add p to the ion types (p indicates precursor). Now click on the Library tab and type 6 in the Pick: product ions box. Click OK
Add the precursor ion XICs by clicking Edit → Refine → Advanced and clicking on Auto-select all: Transitions. This will add the precursor ions for the first 3 isotopologues to the document, and the XICs for those ions will now be visible in the chromatography.
Click on the first peptide in the document (AAGATTANITQAIEQMR). The MS1 XICs of the precursor ions will now be overlaid with the SWATH MS2 XICs of the fragment ions. In general, the precursor ions have a much higher absolute intensity than the fragment ions. As such, it is usually easier to view these in separate graphs (View → Transitions → Split Graph). Now the precursor XICs from the MS1 scans are displayed in the upper graph and the fragment ion XICs from the SWATH scans are displayed in the lower graph.
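The m/z positions of the first three precursor isotopologues mentioned above can be approximated by adding multiples of the 13C-12C mass difference divided by the charge. This is a rough sketch under that approximation only; Skyline derives the isotope peaks from the full isotope distribution, and the precursor m/z, charge and function name used here are hypothetical.

```python
C13_C12_MASS_DIFF = 1.0033548  # Da, mass difference between 13C and 12C


def isotopologue_mzs(monoisotopic_mz: float, charge: int, count: int = 3) -> list[float]:
    """Approximate m/z of the M, M+1, M+2, ... precursor isotopologues."""
    return [monoisotopic_mz + k * C13_C12_MASS_DIFF / charge for k in range(count)]


# Hypothetical doubly charged precursor at m/z 600.0
for label, mz in zip(["M", "M+1", "M+2"], isotopologue_mzs(600.0, 2)):
    print(f"{label}: {mz:.4f}")
```

For a doubly charged precursor the isotopologues are spaced about 0.5 m/z apart, which is why each one needs its own extraction window at the mass accuracy used in this tutorial.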

Browse through some peptides and compare the signals coming from the MS1 XICs and the SWATH MS2 XICs. Is there a difference in selectivity between the MS1 and SWATH MS2 data?
Search (using Ctrl+F) for the peptide ALPAVQQNNLDEDLIRK and examine the difference between the MS1 and MS2 data. Can you find other cases where the SWATH MS2 data is of higher quality (i.e. better selectivity) than the MS1 data, similar to the peptide above? Can you find any cases where the opposite is true?
Save the document as skyline_tutorial_ms1_comparison.sky

We would like to thank SystemsX for supporting the Zurich DIA / SWATH Course 2017.