Supporting information for:
A New Chemometric Approach for Automatic Identification of Microplastics from Environmental Compartments Based on FT-IR Spectroscopy

Gerrit Renner, Torsten C. Schmidt, and Jürgen Schram

Instrumental Analytical and Environmental Chemistry, Faculty of Chemistry, Niederrhein University of Applied Sciences, Frankenring 20, D-47798 Krefeld, Germany
Instrumental Analytical Chemistry and Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 5, D-45141 Essen, Germany

E-mail: schram@hs-niederrhein.de
Phone: +49 (0)179 1043284

Estimation of Initial Fit Parameters

Asymmetry Factor a

One essential step of µident is curve fitting, and all fit parameters have to be estimated during an initialisation procedure. We decided to use a vibrational band model that respects peak asymmetry, which was presented by Stancik and Brauns. They redefined the static peak width $\gamma$ by means of a dynamic peak width $\gamma$ based on a logistic function and established an asymmetry factor $a$ and the dynamic peak width base value $\gamma_0$.

In the first derivative $I'$ of an asymmetric peak, the absolute values of the extrema $\max(I')$ and $\min(I')$ differ significantly. This can be used to estimate the asymmetry factor itself by analysing the corresponding extreme values and calculating their ratio $R$. We decided to use the modified Canberra distance because of its closed range of $[-1, +1]$, as can be seen in Eq. 1. In addition, Fig. S1 gives an overview of different asymmetry factors and their corresponding peak shapes and ratios $R$, including the first derivative.

$$R = \frac{|\min(I')| - |\max(I')|}{|\min(I')| + |\max(I')|} = f(a) \tag{1}$$

Figure S1: Peak shapes and their corresponding first derivatives with differing asymmetry factors $a$. It can be observed that the ratio $R$ changes and that an absolute ratio $|R| > 0.5$ leads to a significant peak shoulder within the first derivative.

However, the ratio $R$ is also influenced by the dynamic peak width $\gamma_0$, which can be observed in Fig. S2. In this example, the asymmetry factor is kept constant while the dynamic peak width $\gamma_0$ varies, and as a result, the ratio $R$ still changes.

$$R = \frac{|\min(I')| - |\max(I')|}{|\min(I')| + |\max(I')|} = f(a, \gamma_0) \tag{2}$$

Figure S2: Peak shapes and their corresponding first derivatives with differing dynamic peak widths $\gamma_0$ and constant asymmetry factor $a$. It can be observed that the ratio $R$ changes and that an absolute ratio $|R| > 0.5$ leads to a significant peak shoulder within the first derivative.

In Fig. S1 as well as in Fig. S2, a significant change in the first derivative is recognisable if the absolute ratio $|R| > 0.5$. This kind of vibrational band cannot be observed in real infrared spectra of microplastics, and therefore we focussed on the estimation of vibrational bands with an absolute ratio $|R| \le 0.5$. For this purpose, $1 \times 10^4$ vibrational bands with differing dynamic peak widths and asymmetry factors were simulated, and the corresponding local extrema were analysed in order to calculate the ratio $R$. For a suitable estimation of the initial asymmetry factor, an empirical function was defined that describes the dependence of the ratio $R$ on the asymmetry factor $a$ and the dynamic peak width $\gamma_0$. We tried many different two-dimensional logistic functions, but a simple error function fits the empirical data best, which can be seen in Fig. S3. The experimental ratios $R$ can be described by such a function because the Canberra distance itself behaves very similarly to a logistic function.

$$R = \frac{|\min(I')| - |\max(I')|}{|\min(I')| + |\max(I')|} \approx \operatorname{erf}(p_1 \, a \, \gamma_0) \tag{3}$$

Figure S3: Curve fitting of the experimental data of $R$, $a$ and $\gamma_0$ with a two-dimensional error function ($R^2 = 0.9998$).

The empirical parameter $p_1$ was estimated by means of the Trust-Region algorithm, and the corresponding result is given by Eq. 4.

$$p_1 = 0.8795 \pm 0.0005 \tag{4}$$

In a next step, the error function (Eq. 3) is inverted to isolate the asymmetry factor, which is shown in Eq. 5.

$$R = \operatorname{erf}(p_1 \, a \, \gamma_0) \quad\Rightarrow\quad a = \frac{\operatorname{erf}^{-1}(R)}{p_1 \, \gamma_0}, \qquad \text{with: } \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt \tag{5}$$

Evaluation of Estimated Fit Parameters

The developed empirical procedure to estimate the initial fit parameters was tested with $1 \times 10^4$ simulated vibrational bands. During the initialisation, peak position $\tilde{\nu}_0$, peak height $A$, dynamic peak width $\gamma_0$ and asymmetry factor $a$ are estimated. In this test, the results for $A$, $\gamma_0$ and $a$ are plotted against their set values, which can be seen in Fig. S4. All parameters are estimated quite well, and the relative mean squared error (Rel. MSE) ranges between 3.9 and 4.3 %.
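Eqs. 1, 4 and 5 chain into a short initial estimate of the asymmetry factor. The MATLAB sketch below is only an illustration: the extrema magnitudes `dMax` and `dMin` and the width `gamma0` are hypothetical values, not data from the paper, and the numerator order (minimum first) is assumed from Eq. 1. It also checks the inversion against the forward model of Eq. 3.

```matlab
% Hypothetical inputs for illustration only (not values from the paper)
dMax   = 0.80;    % |max(I')|, magnitude of the derivative maximum
dMin   = 1.00;    % |min(I')|, magnitude of the derivative minimum
gamma0 = 4;       % dynamic peak width base value
p1     = 0.8795;  % empirical parameter from Eq. 4

R = (dMin - dMax) / (dMin + dMax);   % Eq. 1: modified Canberra distance
a = erfinv(R) / (p1 * gamma0);       % Eq. 5: initial asymmetry factor
Rcheck = erf(p1 * a * gamma0);       % Eq. 3: forward model, should return R

fprintf('R = %.4f, a = %.4f, R from Eq. 3 = %.4f\n', R, a, Rcheck);
```

Because `erfinv` is only defined on [-1, +1], the closed range of the modified Canberra distance is what makes this inversion safe for any pair of extrema.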

Figure S4: Evaluation of estimated fit parameters. A), C) and E) show the estimates against the theoretical (set) values of the peak parameters $A$, $\gamma_0$ and $a$. All plots indicate a good linear correlation and were fitted by first-order linear regression (fit coefficients: $a_0$ [intercept] and $a_1$ [slope]). The overall accuracy is good, because the regression line coincides almost perfectly with the bisector (intercept = 0 and slope = 1). B), D) and F) illustrate the relative error distributions of the corresponding fit parameter estimation.

In addition to the single-parameter illustration, Fig. S5 shows the overall relative error for all evaluated vibrational bands. According to these data, very small vibrational bands cannot be estimated as accurately as bigger vibrational bands.

Figure S5: Evaluation of estimated fit parameters. The peak parameters $A$, $\gamma_0$ and $a$ are illustrated in a three-dimensional scatter plot (axes: peak height $A$, dynamic peak width $\gamma_0$, asymmetry factor $a$) with a colour map that describes the fourth dimension, the overall relative error [%]. It can be observed that the error rises significantly if the dynamic peak width $\gamma_0 \le 2$.

Evaluation of the Distance Function d

The main concept of the developed distance function is the consideration of all vibrational band areas, positions and their statistical significance. We therefore evaluated an empirical function, which contains four constants $k_1$ to $k_4$. Here, $k_1$ and $k_2$ define two distance thresholds for the dissimilarity analysis of two different vibrational band positions $\tilde{\nu}$: a lower one ($k_1$, no significant dissimilarity) and an upper one ($k_2$, 95 % dissimilarity). Likewise, $k_3$ and $k_4$ define the corresponding limits for the dissimilarity analysis of two different normalised vibrational band areas $\Lambda$. For the estimation, we analysed several different and similar measured vibrational bands from the environmental microplastics spectra (n = 300), focussing on their vibrational band data distributions, as shown by way of example in Fig. S6.

Figure S6: Analysis of vibrational band data distributions. In the presented case, two different vibrational bands at 720 and 730 cm⁻¹ were analysed. The distributions of the detected peak positions have a width of 4 to 5 cm⁻¹, which can be estimated using a 4σ interval. The upper limit is given by the distance between these two vibrational bands, as they are clearly separated with a corresponding difference of $\tilde{\nu}_{95\%} \approx 9$ cm⁻¹. Analogously, normalised vibrational band area distributions were analysed. In the given example, the normalised band area $\Lambda_{720,730}$ has an interquartile range of 0.16, or a 3 IQR interval of 0.48. The upper limit $\Lambda_{95\%}$ was set to 1.75.

For further estimation, all this information was used to define a system of equations, which is given by Eq. 6.

$$
\begin{aligned}
0.05 &= 1 - \left(\exp[k_1 \cdot 5 - k_2] + 1\right)^{-1} \\
0.95 &= 1 - \left(\exp[k_1 \cdot 10 - k_2] + 1\right)^{-1} \\
0.05 &= 1 - \left(\exp[k_3 \cdot 0.5 - k_4] + 1\right)^{-1} \\
0.95 &= 1 - \left(\exp[k_3 \cdot 1.75 - k_4] + 1\right)^{-1}
\end{aligned} \tag{6}
$$

$$k_1 = 1.2, \quad k_2 = 9, \quad k_3 = 4.7, \quad k_4 = 5.3$$
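Each line of Eq. 6 has the form p = 1 - (exp(z) + 1)^-1, which is the logistic function of z, so z = ln(p / (1 - p)) and each pair of equations reduces to a 2 x 2 linear system. The following MATLAB sketch solves both pairs under that reading; the helper names `logit` and `solvePair` are introduced here for illustration only. The solutions come out close to the rounded constants reported with Eq. 6.

```matlab
logit = @(p) log(p ./ (1 - p));      % inverse of 1 - 1/(exp(z) + 1)

% Each row encodes  k_slope * threshold - k_offset = logit(p)
solvePair = @(lower, upper) [lower -1; upper -1] \ logit([0.05; 0.95]);

kPos  = solvePair(5, 10);       % band positions: [k1; k2]
kArea = solvePair(0.5, 1.75);   % normalised band areas: [k3; k4]

fprintf('k1 = %.3f, k2 = %.3f, k3 = %.3f, k4 = %.3f\n', kPos, kArea);
```

The slope constants k1 and k3 control how sharply the dissimilarity rises between the lower and the upper threshold, while k2 and k4 shift the midpoint of that rise.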

Microplastics Identification
Internal Data Normalisation Technique

Contents

Automated Data Preprocessing
    Baseline Correction
    Noise Level Characterisation
    Extract Fingerprint
    Detect Peaks
    Initial Fit Parameters
    Curve Fitting
    Output: Positions, Areas and Weightings
Asymmetric Pseudo Voigt Function (1st Derivative)
    Read Function Parameters
    Substitute Parameters
    Function
Dissimilarity Function
    Filter bands
    Significance Weighting
    Check For Minimum Number Of Peaks
    Normalise Peaks
    Calculate distances between signals

Automated Data Preprocessing

    function z = mident_specfit(x,y)

Baseline Correction

In a first step, the baseline is corrected by differentiation using the Savitzky-Golay algorithm. The polynomial order K is set to K = 2 and the window size F is set to F = 7.

    K = 2; % polynomial order
    F = 7; % window size
    [~, g] = sgolay(K, F); % Savitzky-Golay differentiation filters
    g = g(:, 2); % coefficients of the first derivative
    dx = mean(diff(x)); % step size
    N = size(x, 1); % number of data points
    HalfWin = ((F+1)/2) - 1;
    SG1 = zeros(N - (F+1)/2, size(x,2));
    for n = HalfWin+1:N-HalfWin-1
        SG1(n, :) = dot(g, y(n-HalfWin:n+HalfWin, :));
    end
    yderiv = SG1./dx; % first-order derivative of y
    xderiv = x(1:size(y, 1)-HalfWin-1, :); % corresponding x vector

Noise Level Characterisation

In order to characterise the noise level of the differentiated spectrum, a histogram analysis is performed. To this end, the distribution of all differentiated absorbances is fitted using a Lorentzian model. We define the noise level based on the peak width that encloses 68.3 % of the entire peak area.

    binwidth = 3.5*std(yderiv)/size(yderiv,1)^(1/3); % bin width, based on Scott's rule
    nbins = abs(max(yderiv)-min(yderiv))/binwidth; % corresponding number of bins
    xx = linspace(min(yderiv),max(yderiv),nbins); % bin vector (x data)
    yy = interp1(unique(yderiv),1:numel(unique(yderiv)),xx,'linear'); % bin vector (y data)

Lorentzian Model

    myfunc = @(c,a,s,w,x) a./(((x-c).^2.*(exp(-s*(x-c)) + 1).^2)./w.^2 + 1);
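To see how the asymmetry parameter s acts in this noise model, the sketch below evaluates it for s = 0 and for s > 0. It assumes the operators of the listing as reconstructed here (in particular the negative sign in the exponent), and all numeric inputs are illustrative. Under these assumptions the model reduces, for s = 0, to a symmetric Lorentzian whose half width at half maximum equals w/2.

```matlab
% Noise model as read from the listing; c: centre, a: height,
% s: asymmetry, w: width parameter (operator signs are an assumption)
myfunc = @(c,a,s,w,x) a./(((x-c).^2.*(exp(-s*(x-c)) + 1).^2)./w.^2 + 1);

x = linspace(-5, 5, 1001);        % illustrative abscissa
ySym  = myfunc(0, 1, 0,   2, x);  % s = 0: symmetric Lorentzian
ySkew = myfunc(0, 1, 0.5, 2, x);  % s > 0: slower decay on the side x > c

halfMax = myfunc(0, 1, 0, 2, 1);  % model value at |x - c| = w/2 for s = 0
fprintf('symmetric model at x = w/2: %.2f of the peak height\n', halfMax);
```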

So far, the bin vectors describe the cumulative distribution. To apply a Lorentzian model, these data have to be differentiated. In addition, the edges, which contain vibrational band information, have to be cut off, because this procedure characterises the noise level.

    xx2 = xx(1:end-1)'; yy2 = diff(yy)'; % differentiate bin vector
    hlp = cumsum(yy2)./sum(yy2); % normalised help vector
    yy2 = yy2(hlp>.05 & hlp<.95); % cut bin vector (y data)
    xx2 = xx2(hlp>.05 & hlp<.95); % cut bin vector (x data)

Fit the noise distribution with the predefined Lorentzian model.

    myfit = fit(xx2,yy2,myfunc,...
        'StartPoint',[0 max(yy2) 0 0],...
        'Upper',[.1 max(yy2)*1.2 1 1],...
        'Lower',[-.1 max(yy2)*.8 -1 0],...
        'Algorithm','Trust-Region');
    noise = myfit.w*tan(0.34135*pi); % as mentioned, noise is defined by the peak width

Extract Fingerprint

Assuming that the most characteristic vibrational bands can be obtained from the fingerprint region, this part of the spectrum is extracted. For further data evaluation, limits of 700 cm⁻¹ to 1900 cm⁻¹ are predefined.

    lmts = [700 1900];
    yderiv = yderiv(xderiv>lmts(1) & xderiv<lmts(2)); % extract fingerprint: y data
    xderiv = xderiv(xderiv>lmts(1) & xderiv<lmts(2)); % extract fingerprint: x data

Detect Peaks

In a first step of peak detection, local extrema have to be determined using the findpeaks function. In order to filter out very small peaks, a minimum peak height (the noise level) is defined as threshold.

    [imax,locmax] = findpeaks(yderiv,xderiv,'MinPeakHeight',noise); % find local maxima
    [imin,locmin] = findpeaks(-yderiv,xderiv,'MinPeakHeight',noise); % find local minima
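The factor tan(0.34135*pi) in the noise definition above follows from the cumulative distribution of a symmetric Lorentzian with half width w, F(x) = 1/2 + arctan(x/w)/pi: the interval of plus or minus w*tan(0.34135*pi) encloses 2 x 0.34135 = 68.27 % of the total area. The MATLAB sketch below checks this numerically. It uses a hypothetical width and deliberately ignores the asymmetry term of the fitted model, so it illustrates the idea rather than the exact fitted function.

```matlab
w = 0.01;                                 % hypothetical fitted width
lorentz = @(x) 1 ./ (1 + (x ./ w).^2);    % unit-height symmetric Lorentzian
noise = w * tan(0.34135 * pi);            % half width used in the listing

% The total area of the unit-height Lorentzian is pi*w
enclosed = integral(lorentz, -noise, noise) / (pi * w);
fprintf('enclosed area fraction: %.4f\n', enclosed);
```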

In a next step, every local maximum is connected to a local minimum in its close proximity. These pairs define the inflection points of the vibrational bands in the corresponding raw spectrum.

    [x0,idx] = sortrows([locmax ; locmin],1); % sorted vector of all extreme value positions
    intensity = [imax ones(size(imax,1),1) ; imin zeros(size(imin,1),1)]; % vector of all extreme values
    intensity = intensity(idx,:); % sorted vector of all extreme values, based on their positions
    opt = [x0(:,1) intensity]; % sorted matrix, based on extreme value position
    % [position, intensity, type]
    % type: 1 -> local maximum
    % type: 0 -> local minimum
    opt_clean = [];

The peak matrix opt contains some noisy local extrema that do not describe an inflection point of a significant vibrational band. Those signals have to be eliminated in the following loop.

    while ~isempty(opt)
        if opt(1,3) == 1
            hlp = find(opt(:,3)-1, 1); % first local minimum
        else
            hlp = find(opt(:,3), 1); % first local maximum
        end
        if ~isempty(hlp)
            [~,idxopt] = max(opt(1:hlp-1,2));
            opt_clean = [opt_clean; opt(idxopt,:)];
            opt(1:hlp-1,:) = [];
        else
            [~,idxopt] = max(opt(1:end,2));
            opt_clean = [opt_clean; opt(idxopt,:)];
            opt = [];
        end
    end

In absorbance mode, every vibrational band starts with a local maximum followed by a local minimum. Therefore, it has to be checked whether the peak matrix opt_clean starts with a local minimum or ends with a local maximum.

    if opt_clean(1,3) == 0
        opt_clean(1,:) = [];
    end
    if opt_clean(end,3) == 1
        opt_clean(end,:) = [];
    end

Inflection points are combined into new matrices: [position of maximum, position of minimum] and [intensity of maximum, intensity of minimum].

    inflPtsPos = [opt_clean(opt_clean(:,3)==1,1) opt_clean(opt_clean(:,3)==0,1)];
    inflPtsInt = [opt_clean(opt_clean(:,3)==1,2) opt_clean(opt_clean(:,3)==0,2)];
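The elimination step can be tried on a small toy matrix. The sketch below is a condensed variant of the loop, not the authors' exact code, and the six rows of `opt` are made-up values: from every run of consecutive extrema of the same type, only the strongest one is kept, so the result alternates between maxima and minima.

```matlab
% Toy matrix [position, intensity, type]; type 1 = maximum, 0 = minimum.
% Minima carry positive intensities, as returned by findpeaks(-yderiv).
opt = [1 0.5 1; 2 0.9 1; 3 0.4 0; 4 0.7 0; 5 0.6 1; 6 0.3 0];
opt_clean = [];

while ~isempty(opt)
    % first row whose type differs from the type of the leading row
    hlp = find(opt(:,3) ~= opt(1,3), 1);
    if isempty(hlp)
        hlp = size(opt,1) + 1;              % the run extends to the last row
    end
    [~, idxopt] = max(opt(1:hlp-1, 2));     % strongest extremum of the run
    opt_clean = [opt_clean; opt(idxopt,:)]; %#ok<AGROW>
    opt(1:hlp-1, :) = [];                   % remove the processed run
end

disp(opt_clean)
```

For this input, the rows at positions 2, 4, 5 and 6 survive, which gives two maximum/minimum pairs.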

Initial Fit Parameters

In order to perform curve fitting using the first-order derivative of the asymmetric pseudo-Voigt function, suitable initial parameters have to be estimated.

Vibrational band position: x0

    x0 = zeros(size(inflPtsPos,1),1);
    for u = 1:size(inflPtsPos,1)
        tmpx = xderiv(xderiv>=inflPtsPos(u,1) & xderiv<=inflPtsPos(u,2));
        [~,idx] = min(abs(yderiv(xderiv>=inflPtsPos(u,1) & xderiv<=inflPtsPos(u,2))));
        x0(u) = tmpx(idx); % zero crossing of the derivative between the inflection points
    end

Vibrational band width: w

    w = (inflPtsPos(:,2)-inflPtsPos(:,1))/2;

Vibrational band height: a

    a = max(inflPtsInt*2.*w,[],2);

Asymmetry factor: s

    R = (inflPtsInt(:,1)-inflPtsInt(:,2))./(inflPtsInt(:,1)+inflPtsInt(:,2)); % Canberra distance of inflection point intensities
    a1 = 2.007663533725664; % empirical fit parameter
    b1 = 56.245235357193230; % empirical fit parameter
    c1 = 0.030809453318739; % empirical fit parameter
    s = log(a1./(R.*exp(c1.*w) + 1) - 1)./b1; % empirical estimation of s
    s = real(s) - imag(s);

Shape factor: f

    f = x0*0+.2; % all shape factors are preset to 0.2

Combine all initial parameters in one matrix

    initial = [x0 a w f s];
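The width estimate above relies on the parametrisation of the fit function. Reading the ffunc9 listing for a symmetric band (s = 0), the Lorentzian component has the form a/(1 + x^2/(3*w^2)) and the Gaussian component the form a*exp(-x^2/(2*w^2)); both have their inflection points at x = w and x = -w, so half the distance between the derivative extrema equals w. This reading is an inference from the listing, and the numbers below are hypothetical; the sketch merely checks the relation numerically.

```matlab
w = 3;  a = 1;                        % hypothetical width and height
x = linspace(-20, 20, 40001)';        % fine grid with step 0.001

bands = {a ./ (1 + x.^2 ./ (3*w^2)), ...   % Lorentzian component, s = 0
         a .* exp(-x.^2 ./ (2*w^2))};      % Gaussian component, s = 0

wEst = zeros(1, numel(bands));
for k = 1:numel(bands)
    d = gradient(bands{k}, x);        % numerical first derivative
    [~, iMax] = max(d);               % derivative maximum (rising edge)
    [~, iMin] = min(d);               % derivative minimum (falling edge)
    wEst(k) = (x(iMin) - x(iMax)) / 2;     % same estimate as in the listing
end
fprintf('estimated widths: %.3f (Lorentzian), %.3f (Gaussian)\n', wEst);
```

Because both components share the same inflection points, the estimate of w does not depend on the shape factor f in the symmetric case.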

Filter peaks based on minimum and maximum peak width and minimum peak height

    minwidth = 2*mean(diff(xderiv));
    maxwidth = 100;
    minheight = noise*3;
    initial(...
        initial(:,3)<minwidth |...
        initial(:,3)>maxwidth |...
        initial(:,2)<minheight,:) = [];

Set upper and lower limits of fit parameters.

    up = [initial(:,1) + minwidth,...
        initial(:,2)*10,...
        initial(:,3)*10,...
        ones(size(initial,1),1),...
        ones(size(initial,1),1)*0.2];
    lo = [initial(:,1) - minwidth,...
        initial(:,2)*.1,...
        initial(:,3)*.1,...
        zeros(size(initial,1),1),...
        ones(size(initial,1),1)*(-0.2)];

Reshape parameter matrices to parameter row vectors. The matrices are transposed first so that the five parameters of each peak stay together, which matches the order in which ffunc9 reads them.

    initial = reshape(initial',1,[]);
    up = reshape(up',1,[]);
    lo = reshape(lo',1,[]);

Curve Fitting

In a next step, the incoming raw data (x,y) are fitted using non-linear regression. The first-order derivative of the asymmetric pseudo-Voigt function is chosen as fit function. Since the raw data (x,y) contain not a single but multiple vibrational bands, a cumulative fit function has to be defined.

    n = length(initial)/5; % number of peaks
    FCNprefix = 'ffunc9(x,';
    idxarray = arrayfun(@(u) ['p' sprintf('%03d',u)],1:n,'uni',0);
    pararray = {'p1_c'; 'p2_A'; 'p3_w'; 'p4_mu'; 'p5_a'};
    vararray = cell(5,n);
    for u = 1:n
        for v = 1:5
            vararray{v,u} = [idxarray{u} pararray{v} ','];
        end
    end
    FCNstr = strjoin(reshape(vararray,1,[]));
    FCNstr(end) = []; % remove trailing comma
    FCNstr = [FCNprefix FCNstr ')']; % fit function
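To make the generated fit expression concrete, the sketch below runs the string construction for two peaks. It assumes underscore-separated coefficient names (the separators are not legible in the listing) and needs no toolbox. The numbered prefixes keep the names in peak-by-peak order when sorted alphabetically, which matters if, as is commonly observed for expression-based fit types, the Curve Fitting Toolbox orders coefficients alphabetically.

```matlab
n = 2;                                    % two peaks, for illustration
idxarray = arrayfun(@(u) ['p' sprintf('%03d',u)], 1:n, 'UniformOutput', false);
pararray = {'p1_c'; 'p2_A'; 'p3_w'; 'p4_mu'; 'p5_a'};   % assumed names
vararray = cell(5, n);
for u = 1:n
    for v = 1:5
        vararray{v,u} = [idxarray{u} pararray{v} ','];
    end
end
FCNstr = strjoin(reshape(vararray, 1, []));   % joined with single spaces
FCNstr(end) = [];                             % drop the trailing comma
FCNstr = ['ffunc9(x,' FCNstr ')'];

names = strrep(reshape(vararray, 1, []), ',', '');
disp(FCNstr)
disp(issorted(names))   % true if the names are already in alphabetical order
```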

Options for curve fitting:

    ft = fittype(FCNstr);
    options = fitoptions(ft);
    options.StartPoint = initial;
    options.Lower = lo;
    options.Upper = up;
    options.Robust = 'Bisquare';
    options.Display = 'off';

Fit

    [FIT.curve, FIT.gof] = fit(xderiv,yderiv,ft,options);

Output: Positions, Areas and Weightings

    FITcoefficients = coeffvalues(FIT.curve); % read fit parameters
    heights = FITcoefficients(2:5:end); % fitted peak heights
    widths = FITcoefficients(3:5:end); % fitted peak widths
    shapes = FITcoefficients(4:5:end); % fitted peak shapes
    asymm = FITcoefficients(5:5:end); % fitted asymmetry factors

Fitted peak positions

    z.positions = FITcoefficients(1:5:end)';

Fitted peak areas, calculated by numerical integration

    z.areas = arrayfun(@(u)...
        trapz(...
            linspace(... % x data for integration
                z.positions(u)-8*widths(u),...
                z.positions(u)+8*widths(u),...
                500),...
            mypeakfunction2(... % y data for integration
                linspace(...
                    z.positions(u)-8*widths(u),...
                    z.positions(u)+8*widths(u),...
                    500),...
                z.positions(u),...
                heights(u),...
                widths(u),...
                shapes(u),...
                asymm(u)...
            )...
        ),...
        1:n)';

Fitted peak weightings

    z.weightings = zeros(n,1);
    FIT.data = ffunc9(xderiv,FITcoefficients); % fitted y data
    for i = 1:n
        tmprange = xderiv > z.positions(i)-1.64*widths(i) &...
            xderiv < z.positions(i)+1.64*widths(i); % the factor 1.64 encloses 90 % of the peak area
        % y1: measured data on the xderiv grid (not defined in this listing)
        z.weightings(i) = z.areas(i)/sqrt(sum((FIT.data(tmprange)-y1(tmprange)).^2)/...
            sum((y1(tmprange)-mean(y1(tmprange))).^2));
    end
    end % closing function

Asymmetric Pseudo Voigt Function (1st Derivative)

    function y = ffunc9(x,varargin)

Read Function Parameters

This function takes a variable number of parameters, because every additional peak adds a set of five parameters to the function.

    varargin = cell2mat(varargin);
    c = varargin(1:5:end); % peak centers
    a = varargin(2:5:end); % peak heights
    w = varargin(3:5:end); % peak widths
    f = varargin(4:5:end); % shape factor
    s = varargin(5:5:end); % asymmetry factor
    x = repmat(x,1,size(c,2)); % wavenumbers

Substitute Parameters

In this step, calculations that are performed multiple times are factored out in order to reduce the number of computing operations.

    x = x - c;
    u = exp(s.*x);
    v = u.*x;
    z = w.^2;
    t = u+s.*v+1;

Function

    y = sum(...
        -(24*a.*f.*z.*x.*(u + 1).*t)./(v.^2 + 2.*v.*x + 12.*z + x.^2).^2 + ... % Lorentzian
        (a.*x.*exp(-(x.^2.*(u + 1).^2)./(8.*z)).*(f-1).*(u + 1).*t)./(4.*z),... % Gaussian
        2);

Dissimilarity Function

    function D = mident_dissimilarity(X1,X2,Y1,Y2,S1,S2)

Filter bands

The algorithm compares the closest peaks of the two spectra pairwise. A maximum gap size of 20 cm⁻¹ is preset.

    allX = pdist2(X1,X2); % Calculate pairwise distance of peak positions.
    allX(allX > 20) = NaN; % Delete all pairs with a distance greater than 20.
    hlpIDX = sum(~isnan(allX),1) > 1; % Check reference for multiplets.
    allX(allX ~= min(allX,[],1)) = NaN;
    hlpIDX = sum(~isnan(allX),2) > 1; % Check sample for multiplets.
    allX(allX ~= min(allX,[],2)) = NaN;
    hlpIDX1 = nansum(allX,2)==0; % Check sample for unused bands.
    hlpIDX2 = nansum(allX,1)==0; % Check reference for unused bands.
    X1(hlpIDX1) = []; % Delete all unused sample peak positions.
    X2(hlpIDX2) = []; % Delete all unused reference peak positions.
    Y1(hlpIDX1) = []; % Delete all unused sample peak areas.
    Y2(hlpIDX2) = []; % Delete all unused reference peak areas.
    S1(hlpIDX1) = []; % Delete all unused sample peak significance values.
    S2all = S2; % Save reference significance vector.
    S2(hlpIDX2) = []; % Delete all unused reference peak significance values.

Significance Weighting

The weighting factor W is defined as the cumulated significance S2 of the corresponding peak areas Y2 that can be found in the sample spectrum Y1 as well.

    W = sum(S2)^2/sum(S2all)^2;

Check For Minimum Number Of Peaks

The dissimilarity function only supports spectra with three or more peaks. If this is not the case, the dissimilarity is set to its maximum.

    if numel(X1) > 2 && numel(X2) > 2

Normalise Peaks

In this section, all common sample and reference peaks are normalised. To this end, all possible and unique peak pairs are combined using the Canberra distance.

    rY1 = pdist(Y1,@(u,v) (u-v)./(u+v))'; % Normalise sample peaks
    rY2 = pdist(Y2,@(u,v) (u-v)./(u+v))'; % Normalise reference peaks
    rX1a = pdist(X1,@(u,v) u)'; % Create sample peak position vector 1 (normalised).
    rX1b = pdist(X1,@(u,v) v)'; % Create sample peak position vector 2 (normalised).
    rX2a = pdist(X2,@(u,v) u)'; % Create reference peak position vector 1 (normalised).
    rX2b = pdist(X2,@(u,v) v)'; % Create reference peak position vector 2 (normalised).
    rS1a = pdist(S1,@(u,v) u)'; % Create sample peak significance vector 1 (normalised).
    rS1b = pdist(S1,@(u,v) v)'; % Create sample peak significance vector 2 (normalised).
    rS2a = pdist(S2,@(u,v) u)'; % Create reference peak significance vector 1 (normalised).
    rS2b = pdist(S2,@(u,v) v)'; % Create reference peak significance vector 2 (normalised).
    rS1 = min([rS1a rS1b],[],2); % Calculate sample peak significance vector (normalised).
    rS2 = min([rS2a rS2b],[],2); % Calculate reference peak significance vector (normalised).

Calculate distances between signals

The distance or dissimilarity includes peak position and area differences. In addition, a weighting vector is used, which corresponds to the peak significances.

    dx = sqrt((rX1a-rX2a).^2 + (rX1b-rX2b).^2); % Normalised peak position difference.
    dy = sqrt((rY1-rY2).^2); % Normalised peak area difference.
    dw = mean([rS1 rS2],2); % Weighting vector
    k1 = 1.2; k2 = 9; k3 = 4.7; k4 = 5.3; % Empirical factors
    dz = 1 - 1./((exp(k1*dx-k2)+1).*(exp(k3*dy-k4)+1)); % Normalised peak distance.
    D = sum(dz.*dw/sum(dw))^W; % Calculate weighted dissimilarity.
    else % If sample and reference have no common vibrational band
        D = 1;
    end
    end % closing function
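A small worked example may help to read the final formula. The MATLAB sketch below evaluates the normalised peak distance for three hypothetical peak pairs: one identical pair, one at the lower thresholds and one at the upper thresholds of Eq. 6. The difference values, the equal weights and W = 1 are assumptions chosen for illustration, not data from the paper.

```matlab
k1 = 1.2; k2 = 9; k3 = 4.7; k4 = 5.3;   % empirical factors from the listing

% Hypothetical normalised differences for three peak pairs
dx = [0; 5; 10];       % peak position differences
dy = [0; 0.5; 1.75];   % peak area differences
dw = [1; 1; 1];        % equal significance weights
W  = 1;                % all reference significance is matched

dz = 1 - 1./((exp(k1*dx - k2) + 1).*(exp(k3*dy - k4) + 1));  % per-pair distance
D  = sum(dz.*dw/sum(dw))^W;                                  % weighted dissimilarity

fprintf('dz = %.3f %.3f %.3f, D = %.3f\n', dz, D);
```

The identical pair contributes almost nothing, the pair at the lower thresholds stays below 0.1, and the pair at the upper thresholds is close to 1. Since position and area terms are multiplied, a large difference in either one is enough to drive a pair towards full dissimilarity.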