BIOINF 4120 Bioinformatics 2 - Structures and Systems -

Transcription:

16. Quantitative Proteomics
Oliver Kohlbacher, Summer 2013

Overview
  LC-MS-based proteomics: definition of maps and features
  Quantification approaches
    Labeled quantification
    Label-free quantification
  Algorithmic problems
    Feature finding
    Map alignment
  Examples

LC-MS-Based Proteomics
Quantitative proteomics tries to measure the expression level of all proteins (as many as possible) in a sample.
Problems:
  Sensitivity of MS makes detection of low-abundance proteins difficult.
  MS signal intensity is proportional to peptide concentration, but the factor varies from peptide to peptide!
  No absolute quantification from the signal alone.
  Complexity of the sample makes separation difficult.
  Datasets tend to get huge (up to hundreds of GB per sample), so data analysis is difficult.

Shotgun Proteomics
(Figure: proteins are digested into peptides; the peptide digest is then separated.)
Key ideas:
  Separation of whole proteins is possible but difficult, hence digestion is preferred.
  Usually trypsin is used: it cuts after K and R and ensures peptides suitable for MS (positive charge at the end).
  Separate the peptides; this is easy.
  Identify proteins through their peptides.

HPLC-MS Analysis I
(Figure: HPLC, ESI, TOF; one spectrum (scan) per RT.)
  Separation 1 (HPLC): different peptides have different retention times.
  Ionization (ESI): the peptide receives z charge units.
  Separation 2 (TOF): the detector measures m/z.

Orbitrap analyzer: Introduction
(Figure: spectrum with axes intensity and mass/charge.)

LC-MS Data (Map)
(Figure sequence showing an LC-MS map.)


LC-MS Data (Map)
  Identification (EVAAFAQFGSDLDASTK)
  Quantification (15 nmol/µl, 3x overexpressed, ...)

Quantification
Key problem:
  The detector signal is proportional to peptide concentration.
  The constant factor varies from peptide to peptide!
  Hence, there is no correlation between absolute signal intensity and absolute concentration.
Reason:
  Different ionization/flight behavior of different peptides.
Consequences:
  Relative quantification is possible between two samples.
  Absolute quantification requires an external standard for calibration.

Differential Analysis
Two basic approaches:
  Labeling (e.g., SILAC, iTRAQ, ...)
  Label-free quantification (LFQ)
(Figure: State 1 (healthy), proteins with heavy label, and State 2 (diseased), proteins with light label: mix, fractionate, digest, isolate. Nat. Biotechnol. 17: 994-999, 1999)

SILAC: Stable Isotope Labeling with Amino Acids in Cell Culture
  Introduce stable labels by feeding labeled amino acids to the cell culture.
  Labels will be integrated into all proteins after a reasonable amount of time.
  Mix and compare with an unlabeled sample.
  Tryptic digest ensures that each peptide contains (with some exceptions) exactly one K/R!
  Peptides with heavy and light label are otherwise identical and coelute.
  Spectra contain isotope patterns for both heavy and light peptides.
(Figure: light/heavy SILAC pair with charge 2 and approximately a 1:1 ratio (unperturbed).)

SILAC: Stable Isotope Labeling with Amino Acids in Cell Culture
(Figure: Mumby, Brekken, Genome Biol (2005), 6:230)

Isobaric Labeling
(Figure: http://en.wikipedia.org/wiki/file:isobaric_labeling.png [accessed 19.11.11, 19:48 CET])

Isobaric Labeling
Idea:
  Label the different samples with labels of the same mass (isobaric).
  Design the labels so that they fragment differently upon collision-induced dissociation.
  MS2 spectra will then contain reporter ions.
  Quantification and identification are then both based on tandem spectra only.
Key method: iTRAQ (isobaric tags for relative and absolute quantification)
  Based on covalent modification of the N-terminus of peptides.
  Labeling is performed after digestion (also applicable to clinical samples).
  Kits are available for 4 or 8 distinct labels ("quadruplex", "octoplex").

iTRAQ
(Figures: Ross et al., Mol Cell Prot (2004), 3, 1154-1169.)

Differential Analysis
Two basic approaches:
  Labeling (e.g., SILAC, iTRAQ)
  Label-free quantification (LFQ)
(Figure: Map 1 ("healthy") vs. Map 2 ("diseased")?)

Label-Free Quantification (LFQ)
  Label-free quantification is probably the most natural way of quantifying.
  No labeling is required: this removes further sources of error, puts no restriction on sample generation, and is cheap.
  Data on different samples are acquired in different measurements, so higher reproducibility is needed.
  Manual analysis is difficult.
  Scales very well with the number of samples: basically no limit, and no difference in the analysis between 2 or 100 samples.

Data Reduction
(Figure: a peptide (feature) with its isotope pattern and elution profile.)
Feature finding problem: identify all peaks belonging to one peptide and sum up their intensities.

Data Reduction
(Figure: features.)
Aggregating peaks into features achieves
  an up to 1,000-fold reduction of data volume,
  a reduction to a meaningful quantity: the ion count of one peptide.

LFQ Analysis Strategy
  1. Find features in all maps
  2. Align maps
  3. Link corresponding features
  4. Identify features (example: GDAFFGMSCK)
  5. Quantify (example: GDAFFGMSCK 1.0 : 1.2 : 0.5)
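The final quantification step can be illustrated with a minimal sketch. It assumes that steps 1 to 4 have already produced one summed feature intensity per map for the same peptide; the intensity values and the function name `relative_ratios` are hypothetical and chosen only to reproduce a ratio like the one in the example above.

```python
def relative_ratios(intensities, reference=0):
    """Express feature intensities relative to the map at index `reference`."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return [x / ref for x in intensities]


# Hypothetical summed intensities of one linked feature in three maps.
feature_intensities = [2.0e6, 2.4e6, 1.0e6]
ratios = relative_ratios(feature_intensities)
print(" : ".join(f"{r:.1f}" for r in ratios))
```

Because the proportionality factor between signal and concentration is peptide-specific, such ratios are only meaningful for the same peptide across maps, not between different peptides.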

Proteomics Data Flow
(Figure: Sample -> HPLC/MS -> Raw Data (10 GB) -> Sig. Proc. -> Filtered Raw Data (1 GB) -> Data Reduction -> Maps (50 MB) -> Identification -> Annot. Maps (50 MB) -> Diff. Anal. -> Differentially Expressed Proteins (1 kB))

Quantifying Analytes
  Analytes have to be in solution for proteomics and metabolomics.
  We thus deal with concentrations: amounts per volume of sample V.
  Molar concentration: c_i = n_i / V [SI unit: mol/m^3]
  Mass concentration: ρ_i = m_i / V [SI unit: kg/m^3]
  Molar concentrations can be translated into mass concentrations via the molecular weight M_i of the analyte: ρ_i = c_i M_i

Precision and Accuracy
(Figure: probability density of measured values around a reference value; one distribution with good accuracy but poor precision, one with good precision but poor accuracy.)
  Accuracy: closeness to the true value (mostly influenced by systematic error); repeating the experiment will not improve the result.
  Precision: repeatability of the measurement (mostly influenced by random error); repeating the experiment will yield a value closer to the true value.
  An ideal experiment combines high accuracy with high precision.
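The difference between accuracy and precision can be made concrete with a small sketch. It assumes replicate measurements of a sample whose reference value is known; the measurement values and the helper `bias_and_spread` are made up for illustration. Bias (mean minus reference) stands in for accuracy, and the sample standard deviation stands in for precision.

```python
from statistics import mean, stdev


def bias_and_spread(measurements, reference):
    """Return (bias, spread): mean minus reference, and sample standard deviation."""
    return mean(measurements) - reference, stdev(measurements)


reference = 10.0
# Hypothetical replicates: centered on the reference but widely scattered.
accurate_but_imprecise = [8.0, 12.0, 9.0, 11.0, 10.0]
# Hypothetical replicates: tightly clustered but shifted away from the reference.
precise_but_inaccurate = [12.1, 11.9, 12.0, 12.1, 11.9]

for name, data in [("accurate, imprecise", accurate_but_imprecise),
                   ("precise, inaccurate", precise_but_inaccurate)]:
    bias, spread = bias_and_spread(data, reference)
    print(f"{name}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

Averaging more replicates shrinks the uncertainty of the mean in the first case, but it leaves the bias in the second case untouched.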

Measurement Errors
(Figure: probability density of measured values around a reference value, illustrating accuracy and precision.)
  Each measurement is associated with an error.
  There are two basic types of error:
    Random error: defines the variance of repeated measurements (e.g., due to a high noise level); it is present in every measurement.
    Systematic error (bias): shifts the mean of repeated experiments (e.g., due to an incorrect calibration).

Calibration Curve
(Figure: detector response vs. concentration.)
  Measuring the detector response for various (known) concentrations allows the construction of a calibration curve.
  Most detector responses are chosen such that the response changes linearly with the concentration.
  Once the calibration curve has been measured, it allows the determination of the concentration of an unknown sample.

Response
(Figure: detector response vs. concentration, showing noise, LOD, LOQ, the linear range with slope = sensitivity, LOL, and saturation.)
  LOD: limit of detection; at what concentration can we decide that the analyte is present?
  LOQ: limit of quantification; at what concentration can we accurately quantify it?
  LOL: limit of linearity; saturation effects start here.
  Linear range (dynamic range): the concentration range in which the response is linear in the concentration.
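A calibration curve within the linear range can be sketched as an ordinary least-squares line fit that is then inverted for an unknown sample. The standards below are hypothetical numbers, and the function names are illustrative; the sketch deliberately ignores LOD, LOQ, and saturation, so it only applies between LOQ and LOL.

```python
def fit_line(concentrations, responses):
    """Least-squares fit of response = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_r = sum(responses) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (r - mean_r)
              for c, r in zip(concentrations, responses))
    slope = sxy / sxx          # the slope is the sensitivity of the detector
    intercept = mean_r - slope * mean_c
    return slope, intercept


def concentration_from_response(response, slope, intercept):
    """Invert the calibration line; valid only inside the linear range."""
    return (response - intercept) / slope


# Hypothetical standards: known concentrations and measured responses.
known_concentrations = [1.0, 2.0, 4.0, 8.0]
measured_responses = [2.1, 3.9, 8.1, 15.9]

slope, intercept = fit_line(known_concentrations, measured_responses)
unknown = concentration_from_response(10.0, slope, intercept)
print(f"sensitivity = {slope:.3f}, unknown concentration = {unknown:.2f}")
```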

Feature Finding
  Feature finding is a key data reduction step enabling complex analysis.
  It is a key step for LC-MS data, both for proteomics and metabolomics.
  Feature finding boils down huge maps to hundreds or thousands of features.
  Various algorithms have been proposed for feature finding.
  We will discuss the algorithm proposed by Gröpl et al. (2005), which uses a two-dimensional model fit.
  The model is based on the shape of an ideal feature as defined by the separation process.
Gröpl, C, Lange, E, Reinert, K, Kohlbacher, O, Sturm, M, Huber, C, Mayr, B, and Klein, C (2005). CompLife 2005, Springer LNBI 3695, p. 151-161.

Feature Finding
  Identify all peaks belonging to one peptide.
  Key idea: identify suspicious regions.
  Fit a model to that region and see which peaks are explained by it.

Feature Attributes
  Position (m/z, RT)
  Intensity, volume
  Quality

Feature Models
Feature model = isotope pattern x elution profile

Feature Models
Physical processes leading to the shape of a feature:
  Chromatography
    Elution profiles are (ideally) shaped like a Gaussian.
    Parameters: width, height, position.
  Mass spectrometry
    Mass spectra of peptides are characterized by the isotope pattern.
    Modeled by a binomial distribution.
  A two-dimensional feature is then described by the product of these functions.

Isotope Patterns
Molecule with one carbon atom: two possibilities
  Light variant: 12C
  Heavy variant: 13C
98.9% of all molecules will be light, 1.1% will be heavy.

  Isotope   Abundance
  12C       98.90%
  13C        1.10%
  14N       99.63%
  15N        0.37%
  16O       99.76%
  17O        0.04%
  18O        0.20%
  1H        99.98%
  2H         0.02%
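The product form of the feature model can be written down directly. The sketch below assumes an unnormalized Gaussian elution profile and takes the isotope pattern as a given list of relative intensities; the parameter names (`rt_center`, `rt_width`, `height`) and the example numbers are illustrative, not taken from the slides.

```python
import math


def elution_profile(rt, rt_center, rt_width):
    """Unnormalized Gaussian elution profile along the RT dimension."""
    return math.exp(-0.5 * ((rt - rt_center) / rt_width) ** 2)


def feature_model(rt, k, isotope_pattern, rt_center, rt_width, height):
    """Model intensity of isotope peak k at retention time rt.

    The two-dimensional model is the product of the isotope pattern
    (m/z dimension) and the elution profile (RT dimension).
    """
    return height * isotope_pattern[k] * elution_profile(rt, rt_center, rt_width)


# Hypothetical feature: three isotope peaks, eluting around RT = 100 s.
pattern = [0.55, 0.30, 0.10]
for rt in (90.0, 100.0, 110.0):
    row = [feature_model(rt, k, pattern, 100.0, 5.0, 1000.0) for k in range(3)]
    print(rt, [round(v, 1) for v in row])
```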

Isotope Patterns
Molecule with 10 carbon atoms:
  The lightest variant contains only 12C. This is called monoisotopic.
  Others contain 1-10 13C atoms; these are heavier by 1-10 Da than the monoisotopic one.
  In general, the relative intensities follow a binomial distribution.
  For higher masses (i.e., a larger number of atoms), the monoisotopic peak will no longer be the most likely variant.

Averagine
Since the isotope pattern changes with the composition of the peptide, it is unknown which pattern should be fitted!
Idea:
  We know the mass of the feature.
  Assume an average composition of an amino acid.
  Then we can estimate the composition.
  The elemental composition of such an average amino acid, also called averagine, can be derived statistically: C 4.94, H 7.76, N 1.36, O 1.48, S 0.04

Isotope Patterns
  Based on averagine compositions, one can compute the isotope pattern for any given mass.
  Heavier peptides have smaller monoisotopic peaks.
  In the limit, the distribution approaches a normal distribution.

  m [Da]   P(k=0)   P(k=1)   P(k=2)   P(k=3)   P(k=4)
  1000     0.55     0.30     0.10     0.02     0.00
  2000     0.30     0.33     0.21     0.09     0.03
  3000     0.17     0.28     0.25     0.15     0.08
  4000     0.09     0.20     0.24     0.19     0.12
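The averagine idea can be sketched as follows: scale the averagine composition to the feature mass, then combine the per-atom isotope probabilities of all atoms by repeated convolution. The isotope abundances are the ones listed above; the average atomic weights are standard values that do not appear on the slides, and sulfur isotopes are ignored because no abundances are given for them. The resulting numbers should therefore only roughly resemble the table above.

```python
# Averagine composition as given on the slide.
AVERAGINE = {"C": 4.94, "H": 7.76, "N": 1.36, "O": 1.48, "S": 0.04}
# Assumption: standard average atomic weights in Da (not from the slides).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
# Per-atom probabilities of adding 0, 1, 2 extra neutrons (abundances listed above).
# Sulfur is omitted: its isotope abundances are not given in the slides.
ATOM_ISOTOPES = {
    "C": [0.9890, 0.0110],
    "H": [0.9998, 0.0002],
    "N": [0.9963, 0.0037],
    "O": [0.9976, 0.0004, 0.0020],
}
AVERAGINE_MASS = sum(AVERAGINE[e] * ATOMIC_WEIGHT[e] for e in AVERAGINE)


def convolve(a, b, size):
    """Convolve two distributions, keeping only the first `size` terms."""
    out = [0.0] * size
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < size:
                out[i + j] += x * y
    return out


def isotope_pattern(mass, max_k=4):
    """Approximate P(k) for k = 0..max_k extra neutrons at the given mass."""
    units = mass / AVERAGINE_MASS  # estimated number of averagine residues
    pattern = [1.0] + [0.0] * max_k
    for element, per_atom in ATOM_ISOTOPES.items():
        n_atoms = round(units * AVERAGINE[element])
        for _ in range(n_atoms):
            pattern = convolve(pattern, per_atom, max_k + 1)
    return pattern


for m in (1000, 2000, 3000, 4000):
    print(m, [round(p, 2) for p in isotope_pattern(m)])
```

Truncating each convolution at `max_k` leaves the retained terms exact, since P(k) depends only on terms with index up to k.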

Feature Model
  The isotope pattern is also modulated by the instrument resolution.
  We can assume a Gaussian shape for each of the peaks of the isotope pattern.

Feature Model: RT
  The elution profile is typically assumed to be Gaussian.
  There are some variants that also allow for asymmetric peaks.
  This defines the RT dimension of a feature.

Feature Finding Algorithm
The algorithm consists of four phases:
  1. Seeding. Choose peaks of high intensities, as those are usually in features ("seeds").
  2. Extension. Conservatively add peaks around the seed; never mind if you pick up a few peaks too many.
  3. Modeling. Estimate the parameters of a two-dimensional feature for the region.
  4. Refinement. Optimally fit a model to the collected peaks. Remove peaks not agreeing with the model. Iterate until convergence.

Algorithm: Seeding
  Start with the highest peaks in the map.
  Pick only one seed per feature; thus exclude peaks of already identified features from later seeding.
  More advanced variants of the algorithm use wavelet techniques to detect the best seeds.
Problems:
  Low-intensity features have intensities barely above the surrounding noise.
  Choose a threshold based on the average noise.
  Dilemma: if the threshold is too high, features will not get seeded; if it is too low, millions of noise peaks will be considered as seeds => HUGE run times.

Feature Finding Overview
(Figure: Sturm, OpenMS - A Framework for Computational Mass Spectrometry, Dissertation, Tübingen, 2010)

Algorithm: Extension
  Explore the peaks around the seed.
  Add them to a set of relevant peaks.
  Abort if the peaks are getting too small or too far away.
(Figure: extension proceeds up and down from the seed.)
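Seeding and extension can be sketched on a toy peak list. This is a strongly simplified illustration, not the algorithm of Gröpl et al.: peaks are assumed to be `(rt, mz, intensity)` tuples, the seed threshold is taken as a hypothetical multiple of a single global noise level, and the abort criteria of the extension are reduced to a fixed RT/m/z box plus an intensity cutoff.

```python
def find_seeds(peaks, noise_level, factor=3.0):
    """Candidate seeds: peaks above factor * noise_level, most intense first.

    `factor` is a made-up tuning parameter for this sketch.
    """
    threshold = factor * noise_level
    candidates = [p for p in peaks if p[2] >= threshold]
    return sorted(candidates, key=lambda p: p[2], reverse=True)


def extend(seed, peaks, rt_tol, mz_tol, min_intensity):
    """Collect peaks near the seed that are neither too far away nor too small."""
    return [p for p in peaks
            if abs(p[0] - seed[0]) <= rt_tol
            and abs(p[1] - seed[1]) <= mz_tol
            and p[2] >= min_intensity]


def seed_and_extend(peaks, noise_level, rt_tol, mz_tol):
    """Return (seed, region) pairs; peaks of earlier regions are not reseeded."""
    used = set()
    regions = []
    for seed in find_seeds(peaks, noise_level):
        if seed in used:
            continue  # only one seed per feature
        region = [p for p in extend(seed, peaks, rt_tol, mz_tol, noise_level)
                  if p not in used]
        used.update(region)
        regions.append((seed, region))
    return regions


# Hypothetical peaks: (rt, mz, intensity).
peaks = [(10.0, 500.0, 100.0), (10.5, 500.5, 60.0), (11.0, 500.0, 40.0),
         (50.0, 700.0, 80.0), (30.0, 600.0, 2.0)]
for seed, region in seed_and_extend(peaks, noise_level=5.0, rt_tol=2.0, mz_tol=1.5):
    print("seed", seed, "->", len(region), "peaks")
```

The regions collected this way are only candidates; the modeling and refinement phases decide which peaks actually belong to a feature.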

Algorithm: Refinement
  Remove peaks that are not consistent with the model.
  Determine the optimal model for the reduced set of peaks.
  Iterate this until no further improvement can be achieved.
  Remove all peaks of this feature from the potential seeds.

Feature Finding
  Identify all peaks belonging to one peptide.
  Key idea: identify suspicious regions.
  Fit a model to that region and identify the peaks explained by it.

Feature Finding
  Extension: collect all data points close to the seed.
  Refinement: remove peaks that are not consistent with the model.
  Fit an optimal model for the reduced set of peaks.
  Iterate this until no further improvement can be achieved.

Collecting Mass Traces
  A mass trace is a series of peaks along the RT dimension with little variation in the m/z dimension.
  Mass traces are found with a simple heuristic that aborts the search if the peak intensity hits the local noise level.
  Search for mass traces at the correct distance.
  Limit the length of a mass trace to the length of the most intense mass trace.
(Figure: Sturm, OpenMS - A Framework for Computational Mass Spectrometry, Dissertation, Tübingen, 2010)

Feature Deconvolution
(Figure: Sturm, OpenMS - A Framework for Computational Mass Spectrometry, Dissertation, Tübingen, 2010)

Feature Deconvolution
Features can overlap in various ways:
  Mass traces can contain more than one chromatographic peak (features not baseline-separated in the RT dimension).
  Mass traces can be interleaved between features in the m/z dimension.
  Co-eluting features can share mass traces.
Resolving these conflicts is done in a feature deconvolution step by statistical testing:
  Test several hypotheses that could explain the features.
  The most likely of all hypotheses is identified through comparison with the data.

Algorithm: Modeling
  Test all possible models for different charge states (charge +2, charge +3, ...).
  Decide on the charge of the feature based on the best fit for these models.
(Figure: model fits for charges 1, 2, and 3.)

Algorithm: Modeling/Refinement
  Estimate the quality of fit for model m and data d_i at positions r_i. (Formula shown on the slide.)
  A maximum likelihood estimator determines good starting values for the model parameters.
  Further optimization of the model parameters takes place in the refinement phase (least-squares fit).

Feature Assembly
  Feature resolution is not always possible unambiguously.
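The charge decision can be illustrated with a much cruder criterion than a full model fit. The sketch below relies on the fact that isotope peaks of a z-fold charged peptide are spaced roughly 1.00335/z apart in m/z (the approximate 13C-12C mass difference, a value assumed here rather than taken from the slides) and counts how many consecutive isotope peaks each charge hypothesis explains. The function names and the tolerance are hypothetical.

```python
# Assumption: approximate mass difference between 13C and 12C in Da.
ISOTOPE_SPACING = 1.00335


def consecutive_isotope_peaks(mz_values, mono_mz, charge, tol=0.01):
    """Count consecutive isotope peaks explained by the given charge state."""
    count = 0
    while any(abs(mz - (mono_mz + count * ISOTOPE_SPACING / charge)) <= tol
              for mz in mz_values):
        count += 1
    return count


def best_charge(mz_values, mono_mz, charges=(1, 2, 3, 4)):
    """Pick the charge hypothesis that explains the longest run of peaks."""
    return max(charges,
               key=lambda z: consecutive_isotope_peaks(mz_values, mono_mz, z))


# Hypothetical doubly charged isotope pattern starting at m/z 500.
observed = [500.0, 500.0 + ISOTOPE_SPACING / 2, 500.0 + ISOTOPE_SPACING]
print(best_charge(observed, mono_mz=500.0))
```

Requiring a consecutive run prevents a higher charge from being favored merely because every second or fourth of its expected positions happens to coincide with an observed peak. A real implementation would instead compare the fit quality of the complete two-dimensional model for each charge, as described above.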

Still Difficult: Low-Intensity Features
Problem: the algorithm picked up the blue feature; the red one was not found because it was too close to the noise peaks (green).

Map Alignment
Goal: correct retention time offsets and distortions in label-free experiments.

Multiple Feature Map Alignment
Given k feature maps (Map 1, Map 2, ..., Map k):
  RT and m/z of a peptide may vary between maps.
  Compute a suitable mapping.

Multiple Feature Map Alignment
  Dewarp the k maps onto a comparable coordinate system (Map 1 with T_1, Map 2 with T_2, ..., Map k with T_k).
  Assign corresponding elements across the k maps to obtain a consensus map.

Map Alignment Algorithm
  The algorithm proposed by Lange et al. tries to find an optimal alignment of two maps through pose clustering.
  It assumes an affine transformation between two maps (suitable, unless the chromatographic separation has very poor reproducibility).
  The algorithm consists of two phases:
    Superposition phase: transform all maps onto the coordinate system of a reference map.
    Consensus phase: successive grouping of corresponding elements; group elements in the transformed maps that are nearest neighbors in a weighted Euclidean metric.
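The consensus phase can be sketched for two already superimposed maps. The sketch assumes features are `(rt, mz)` tuples and links two features when they are mutual nearest neighbors within a maximum distance; the mutual-neighbor rule, the weights, and the distance cutoff are assumptions made for this illustration and are not specified on the slides.

```python
import math


def weighted_distance(f, g, rt_weight=1.0, mz_weight=100.0):
    """Weighted Euclidean distance between two (rt, mz) features."""
    return math.sqrt((rt_weight * (f[0] - g[0])) ** 2
                     + (mz_weight * (f[1] - g[1])) ** 2)


def nearest(feature, candidates):
    """Nearest neighbor of `feature` among a non-empty list of candidates."""
    return min(candidates, key=lambda g: weighted_distance(feature, g))


def link_features(map_a, map_b, max_distance):
    """Pair features that are mutual nearest neighbors within max_distance."""
    links = []
    for f in map_a:
        g = nearest(f, map_b)
        if nearest(g, map_a) == f and weighted_distance(f, g) <= max_distance:
            links.append((f, g))
    return links


# Hypothetical aligned maps: (rt, mz).
map_a = [(100.0, 500.00), (200.0, 650.30)]
map_b = [(101.0, 500.01), (199.0, 650.29), (400.0, 800.00)]
for f, g in link_features(map_a, map_b, max_distance=5.0):
    print(f, "<->", g)
```

The large m/z weight reflects that m/z is usually reproduced far more precisely than RT, so a small m/z deviation should count as much as a much larger RT deviation.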

Superposition Phase
(Figure: feature maps S and M.)
  T(s) = A s + b
  The problem is to find the affine transformation T that minimizes the distance between T(S) and M.

Pose Clustering
(Figure: maps S and M, related by T(s) = a s + b.)

Pose Clustering
(Figure: a pair (s_1, s_2) in S is matched onto a pair (m_1, m_2) in M; each match yields one point in the (a, b) parameter plane.)
  m_1 = a s_1 + b
  m_2 = a s_2 + b

Pose Clustering
(Figure: matches of pairs from S onto pairs from M and the resulting points in the (a, b) plane.)
  Matching of corresponding pairs will result in the correct transformation.
  These are more likely than random matches!

Speeding Things Up
  Only consider pairs (s_1, s_2) in S with s_1 having a small distance to s_2 in m/z.
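The voting idea behind pose clustering can be sketched in one dimension. The sketch assumes that only retention times are transformed, that the transformation preserves elution order (a > 0), and that votes are accumulated in a coarse grid obtained by rounding a and b; the bin sizes and the example retention times are hypothetical, and the m/z filters and intensity weighting are left out.

```python
from collections import Counter
from itertools import combinations


def pose_clustering(S, M, a_digits=2, b_digits=1):
    """Estimate (a, b) of m = a * s + b by voting over all pair matches.

    Every pair (s1, s2) from S matched onto a pair (m1, m2) from M
    determines one candidate transformation; correct matches all vote
    for the same (a, b) bin, random matches scatter.
    """
    votes = Counter()
    for s1, s2 in combinations(sorted(S), 2):
        if s1 == s2:
            continue  # identical positions do not define a slope
        for m1, m2 in combinations(sorted(M), 2):
            a = (m2 - m1) / (s2 - s1)
            b = m1 - a * s1
            votes[(round(a, a_digits), round(b, b_digits))] += 1
    return votes.most_common(1)[0][0]


# Hypothetical retention times: M is S stretched by 1.1 and shifted by 5,
# plus one spurious feature at 33.0 that has no partner in S.
S = [10.0, 20.0, 35.0, 50.0]
M = [16.0, 27.0, 43.5, 60.0, 33.0]
a, b = pose_clustering(S, M)
print(f"a = {a}, b = {b}")
```

The cost grows with the product of the numbers of pairs in both maps, which is why restricting the candidate pairs matters in practice.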

Speeding Things Up
  Only match pair (s_1, s_2) onto pair (m_1, m_2) if s_1 and m_1 as well as s_2 and m_2 lie close together in m/z.

Improve Matching
  Normalize intensities in M and S: weigh the vote of each transformation by the intensity similarities of the point matches (s_1, m_1) and (s_2, m_2).

Summary
  Quantitative shotgun proteomics produces large and complex datasets.
  Manual analysis of these datasets is often prohibitively labor-intensive.
  Feature detection significantly reduces the data and makes the quantitative analysis viable.
  Map alignment enables the comparison of features across maps, thus allowing for label-free quantification.

References
Papers on Feature Finding and Map Alignment
  Gröpl, C, Lange, E, Reinert, K, Kohlbacher, O, Sturm, M, Huber, C, Mayr, B, and Klein, C (2005). Algorithms for the automated absolute quantification of diagnostic markers in complex proteomics samples. In: Proceedings of the 1st Symposium on Computational Life Sciences (CLS 2005), edited by M. Berthold, R. Glen, K. Diederichs, O. Kohlbacher, I. Fischer. Springer LNBI 3695, p. 151-161.
  Mayr, B, Kohlbacher, O, Reinert, K, Sturm, M, Gröpl, C, Lange, E, Klein, C, and Huber, CG (2006). Absolute Myoglobin Quantitation in Serum by Combining Two-Dimensional Liquid Chromatography-Electrospray Ionization Mass Spectrometry and Novel Data Analysis Algorithms. J. Proteome Res. 5:414-421.
  Lange, E, Gröpl, C, Schulz-Trieglaff, O, Leinenbach, A, Huber, C, and Reinert, K (2007). A geometric approach for the alignment of liquid chromatography-mass spectrometry data. Bioinformatics 23(13):i273-81.
Web links to software tools
  www.openms.de