Chemometrics The secrets behind multivariate methods in a nutshell!


1 Chemometrics The secrets behind multivariate methods in a nutshell! "Statistics means never having to say you're certain." We will use spectroscopy as the analytical method to explain the most commonly applied multivariate models and their + and -! Least Squares Regression (LSR) the starting point! Classical Least Squares Regression (CLS) Inverse Least Squares Regression (ILS) Principal Component Analysis/Regression (PCA/PCR) Partial Least Squares Regression (PLS) (Remember: we are still thinking linear!) Nonlinear? Artificial Neural Networks (ANN)!

2 What do we want? Take advantage of changes we do not see! (Figure: IR absorbance spectra of salt ions in water, NaCl, KCl, NaBr, KBr, MgCl2, CaCl2 and Na2SO4 at concentrations from 0.25 to 5 % w/v, plotted as absorbance (abs. units) vs. wavenumber (cm-1).)

3 What do we want? Another example (courtesy K. Booksh, ASU): 80 corn flour samples, NIR reflectance measurements (differences?). Calibrate for moisture, oil, protein, and starch. (Figure: %R vs. wavelength (nm) for the corn flour spectra.)

4 What we really want Calibration models! Quantitation of analytical results (in our case spectral analysis) requires prior training of the system with samples of known concentration! Simplest case: measurement of band height or band area of samples with known concentration and comparison of raw numeric values to the unknown sample (measured under the same conditions). (Note: one-point calibration) Requirement: well-resolved bands, but how about the real world???

5 A little more sophisticated Calibration equations instead of singular values! Create a calibration equation (or series of equations) based on a set of standard mixtures (= training set). The set reflects the composition of the unknown samples as closely as possible. The set spans the expected range of concentrations and composition. The training set is measured under the same conditions (e.g. path length, sampling method, instrument, resolution, etc.) as the unknown sample.

6 A little more sophisticated Calibration equations instead of singular values! c_a = A·(area_a) + B, or c_a = A·(height_a)² + B·(height_a) + C, etc. A, B, C are the calibration coefficients. Coefficients usually not known. Samples with known concentrations (training set). Minimum number of calibration samples is the number of unknown coefficients! Usually more samples are measured to improve the accuracy of the calibration coefficients. (Note: robust model; minimize sum of squared errors/residuals) Repeated measurements of the same concentration for averaging (Note: noise!).

7 A little more sophisticated Calibration equations instead of singular values! Best way to find calibration coefficients: Least Squares Regression (LSR)! Calculate the coefficients of a given equation such that the differences between the known responses (peak areas or heights) and the predicted responses are minimized. The areas of a spectral component band and the concentrations are used to compute the coefficients of the calibration equation by Least Squares Regression.
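To make this concrete, here is a minimal sketch in Python/NumPy of fitting the coefficients of a linear calibration equation by least squares; the peak areas, concentrations and variable names are invented purely for illustration.

```python
import numpy as np

# Hypothetical training data: peak areas measured for standards of known concentration
areas = np.array([0.12, 0.25, 0.37, 0.51, 0.63])   # integrated band areas (a.u.)
conc = np.array([1.0, 2.0, 3.0, 4.0, 5.0])         # known concentrations (% w/v)

# Fit c = A*(area) + B by least squares; using more samples than coefficients
# improves the robustness of A and B.
A, B = np.polyfit(areas, conc, deg=1)

# Predict an "unknown" sample from its measured peak area
c_unknown = A * 0.44 + B
print(f"A = {A:.3f}, B = {B:.3f}, predicted conc = {c_unknown:.2f} % w/v")
```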

8 LSR The easy way! What do we need to pay attention to: If more than one component is present in the samples, a separate band must be used for each. Hence, one equation is necessary for each component. LSR assumes that the absorbance measurement of peak height or peak area is the result of only one component. Hence, results are not accurate for mixtures with overlapping bands. Predictions will have large errors if interfering (spectrally overlapping) components are present. Solution: more sophisticated statistics!

9 More sophisticated getting closer to real world samples! We should calculate the absorptivity coefficients across a much larger portion of the spectrum. Beer's law: A_λ = ε_λ·c·b. Assuming that for a given λ, ε_λ and b remain constant, we define the constant K_λ = ε_λ·b. Solving this equation: measure the absorbance of a single sample of known concentration and use these values to solve for the given K_λ. Prediction of an unknown sample: c = A_λ / K_λ

10 Classical Least Squares Regression (CLS) also called K-Matrix! Problem? Basing an entire calibration on a single sample is generally not a good idea (noise, instrument error, sample handling error, etc.). Solution? Measure the absorbances of a series of different concentrations and calculate the best fit line through all the data points (see LSR).

11 CLS has more problems Sample with two constituents: Algebraic solution requires: # equations = # unknowns. Let's assume the absorbances of components A and B occur at different λ and the absorptivity constants K_λ are different. We can solve each equation independently provided that the spectrum of one constituent does not interfere with the spectrum of the other. A_λ1 = c_A·K_A,λ1 and A_λ2 = c_B·K_B,λ2. Do you see a problem with that?

12 CLS has more problems Yes, there is a problem! Equations make assumption that absorbance at λ1 is entirely due to constituent A and λ2 entirely due to constituent B! Similar to LSR we need to find two λ in the spectra of the training set exclusively representing constituents A and B. Difficult with complex mixtures or simple mixtures of similar materials! Do you see a solution to that?

13 CLS Beer's law can help us: Absorbances of multiple constituents at the same λ are additive. A_λ1 = c_A·K_A,λ1 + c_B·K_B,λ1 and A_λ2 = c_A·K_A,λ2 + c_B·K_B,λ2. Did we forget something? We assume there is no error in the measurement (i.e. the calculated least squares line(s) that best fits the calibration samples is perfect). Unfortunately, this never happens! Hence, we add a variable E describing the residual error between the least squares fit line and the actual absorbances.

14 CLS Now we write: A_λ1 = c_A·K_A,λ1 + c_B·K_B,λ1 + ... + c_n·K_n,λ1 + E_λ1; A_λ2 = c_A·K_A,λ2 + c_B·K_B,λ2 + ... + c_n·K_n,λ2 + E_λ2; ...; A_λn = ... As with most calibration models CLS requires many more training samples to build an accurate calibration. Hence, we need to solve many equations (for many constituents and λs). Solution? Linear algebra formulating the equations into a matrix every PC is craving for!

15 CLS If we solve the equations for the K matrix we can use the resulting best fit least squares line(s) to predict the concentrations of unknowns: K = A·C⁻¹ (Note: check back on matrix algebra from the beginning of this class!) Advantage compared to LSR: We can use parts of or the entire spectrum for calibration. The averaging effect increases the accuracy of prediction. If the entire spectrum is used for calibration, the rows of the K matrix are actually spectra of the absorptivities for each of the constituents, which look very similar to the pure constituent spectra.
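A minimal CLS sketch in Python/NumPy, using synthetic two-constituent spectra generated from Beer's law; the data, the pure spectra and all variable names are made up for illustration only, and the least-squares call stands in for the matrix solution discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: 2 constituents, 50 wavelengths, 10 training mixtures
pure = np.abs(rng.normal(size=(2, 50)))        # stand-in "pure constituent" spectra
C_train = rng.uniform(1, 5, size=(10, 2))      # known concentrations of ALL constituents
A_train = C_train @ pure + 0.01 * rng.normal(size=(10, 50))   # Beer's law + noise

# CLS: solve A = C K for the K matrix by least squares
K, *_ = np.linalg.lstsq(C_train, A_train, rcond=None)

# Predict an unknown spectrum a: a = c K  =>  c = a K^T (K K^T)^-1
c_true = np.array([2.5, 4.0])
a_unknown = c_true @ pure
c_pred = a_unknown @ K.T @ np.linalg.inv(K @ K.T)
print(c_pred)   # should be close to c_true
```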

16 CLS The Good, the Bad and the Ugly Advantages: Based on Beer's law. Calculations are relatively fast. Applicable to moderately complex mixtures. Calibrations do not require wavelength selection as long as the # wavelengths exceeds the # constituents. Disadvantages: Requires knowing the entire composition and concentration of every constituent of the training set. Limited applicability for mixtures with constituents that interact. Very susceptible to baseline effects (e.g. drifts) as the equations assume the response at a wavelength is only due to the calibrated constituents.

17 Again more sophisticated getting even closer to real world samples! Beer's law: A_λ = ε_λ·c·b. No interference in the spectrum between the individual sample constituents. Concentrations of all the constituents in the samples are known ahead of time. Very unlikely for real world samples! Solution: let's rearrange Beer's law again!

18 Inverse Least Squares Regression (ILS) also called P-Matrix or Multiple Linear Regression (MLR)! Beer's law rearranged: c = A_λ / (ε_λ·b). Assuming that for a given λ, ε_λ and b remain constant, we define the constant P = 1/(ε_λ·b). Now we can write: c = P·A_λ + E. In this expression Beer's law says that the concentration is a function of the absorbances at a series of given wavelengths!??? Where is the difference/advantage to CLS???

19 ILS CLS: A_λ1 = c_A·K_A,λ1 + c_B·K_B,λ1 + E_λ1; A_λ2 = c_A·K_A,λ2 + c_B·K_B,λ2 + E_λ2. Absorbance at a single wavelength is calculated as an additive function of the constituent concentrations, i.e. concentrations of ALL components need to be known! ILS: c_A = A_λ1·P_A,λ1 + A_λ2·P_A,λ2 + E_A; c_B = A_λ1·P_B,λ1 + A_λ2·P_B,λ2 + E_B. NOTE: even if the concentrations of all the other constituents in the mixture are not known, the matrix of coefficients P can still be calculated correctly!!!

20 Consequently: ILS Only the concentrations of the constituents of interest need to be known. No knowledge of the complete sample composition is needed. What do we need? Selected wavelengths must be in a region where the constituent of interest contributes to the overall spectrum. Measurements of the absorbances at different wavelengths are needed for each constituent. A measurement at at least one additional λ is needed for each additional independent variation (constituent) in the spectrum.

21 ILS Matrix algebra is helping again: P = C·A⁻¹. Now we can accurately build models for complex mixtures when only some of the constituent concentrations are known. We just need to select wavelengths corresponding to the absorbances of the desired constituents. So where is the UGLY?
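A small ILS/MLR sketch under the same synthetic-data assumptions as the CLS example: only the concentration of the single constituent of interest is used, and the wavelengths in wl_sel are an arbitrary illustrative selection, not a recommended choice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic mixture spectra: 3 constituents, but we only know concentrations of one
pure = np.abs(rng.normal(size=(3, 50)))
C_all = rng.uniform(1, 5, size=(15, 3))
A_train = C_all @ pure + 0.01 * rng.normal(size=(15, 50))

c_known = C_all[:, 0]        # only the constituent of interest is known
wl_sel = [5, 17, 33]         # a few selected wavelengths (must not exceed # samples)

# ILS/MLR: solve c = A_sel p by least squares, without knowing the other constituents
p, *_ = np.linalg.lstsq(A_train[:, wl_sel], c_known, rcond=None)

# Predict the constituent of interest in an "unknown" mixture
a_unknown = np.array([3.0, 1.2, 4.5]) @ pure
print(a_unknown[wl_sel] @ p)   # should be close to 3.0
```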

22 ILS Dimensionality of matrix equations: (see also: very beginning of this class on matrix algebra!) Number of selected wavelengths cannot exceed the number of training samples. Collinearity: (see also: linear independency!) We could measure many more training samples to allow for additional wavelengths. BUT: absorbances in a spectrum tend to all increase and decrease together as the concentrations of the constituents in the mixture change.

23 Overfitting: (see also: chance correlations!) ILS In general, starting from very few λ and adding more to the model (of course selected to reflect the constituents of interest) will improve the prediction accuracy. BUT: as the number of λ in the calibration equations increases, the likelihood that "unknown" samples will vary in exactly the same manner decreases, and the prediction accuracy goes down again. Noise: (see also: issues of noise!) If too much information (too many λ) is used to calibrate, the model starts to include the spectral noise (which is unique to the training set only) as constituent signal, and the prediction accuracy for unknown samples suffers.

24 Consequence: ILS Averaging effect gained by selecting many wavelengths as in CLS is effectively lost. Wavelength selection is critically important to building an accurate ILS model. Ideal situation: selecting sufficient wavelengths to compute accurate least squares line(s) and few enough that the calibration is not (overly) affected by the collinearity of the spectral data. Hence optimization of model required! Advantage: Main advantage of this multivariate method is the ability to calibrate for a constituent of interest without having to account for interferences in the spectra.

25 ILS The Good, the Bad and the Ugly Advantages: Based on Beer's law. Calculations are relatively fast. True multivariate model, which allows calibration of very complex mixtures since only knowledge of the constituents of interest is required. (Note: multivariate because the concentration (dependent variable) is solved by calculating a solution from the responses at several selected wavelengths (multiple independent variables).) Disadvantages: Wavelength selection can be difficult and time consuming. Collinearity of wavelengths must be avoided. # wavelengths used in the model is limited by # calibration samples. A large number of samples is required for accurate calibration.

26 Maybe we need to think more abstractly to solve some of these problems Why? Spectrum of real world samples: many different variations contribute, incl. constituents in the mixture, interactions between constituents, instrument variations (e.g. detector noise), changing ambient conditions affecting baseline and absorbance, sample handling, etc. What do we hope for? That the largest variations in the calibration set are the changes in the spectrum due to the different concentrations of the constituents!

27 Do we need the absolute absorbance values in the spectrum? Well, even if many complex variations affect the spectrum there should be a finite number of independent common variations in the spectral data. Conclusion? If we can calculate a set of variation spectra representing the changes in the absorbances at all wavelengths in the spectra, this data could be used instead of the raw spectral data for building the calibration model!

28 Let's continue this idea Can we reconstruct a spectrum from variations? If we multiply a sufficient number of variation spectra, each by a different constant scaling factor, and add the results together we should be able to reconstruct a real spectrum. Each spectrum in the calibration set would have a different set of scaling constants for each variation since the concentrations of the constituents are all different. Hence, the fraction of each variation spectrum that must be added to reconstruct the unknown data should be related to the concentration of the constituents.

29 What does this mean mathematically? Variation spectra are called eigenvectors! (also: loading vectors or principal components) The scaling constants applied to reconstruct the spectra (which we multiply with the variation spectra = eigenvectors) are called scores. What do we need to do? We break down a spectroscopic data set into its most basic variations.

30 Variations vs. Absorbances (Figure: spectrum of 3 components.)

31 which is called: Principal Component Analysis (PCA) Why do we want to do that? Because there should be much fewer common variations in the calibration spectra than the number of calibration spectra. Hence, because we are lazy (or time is pressing) we expect to significantly reduce the number of calculations for the calibration equations!

32 Principal Component Analysis (PCA) Why does prediction work on the basis of eigenvectors? The calculated eigenvectors derive from the original calibration data (spectra). Hence, inherently they must relate to the concentrations of the constituents making up the samples. It follows that the same loading vectors (eigenvectors, principal components) can be used to predict unknown samples. Consequence: The only difference between the spectra of samples with different constituent concentrations is the fraction of each added loading vector (the scores).

33 Principal Component Analysis (PCA) What do we need to do? The calculated scores are unique to each separate principal component and to each training spectrum. In fact, they can be used in lieu of absorbances in either of the classical model equations in CLS or ILS. Consequently, the representation of the mixture spectrum is reduced from many wavelengths to a few scores.

34 Principal Component Analysis (PCA) Now comes the real advantage! We can use the ILS expression of Beer's law (c = P·A_λ + E) to calculate concentrations as this allows us to calculate concentrations among interfering species. At the same time the calculations maintain the averaging effect of CLS by using a large number of wavelengths in the spectrum (up to the entire spectrum) for calculating the eigenvectors. Eigenvector models combine the best of both worlds!

35 Principal Component Analysis (PCA) What do we get? Spectral data condensed into the most prevalent spectral variations (principal components, eigenvectors, loadings) and the corresponding scaling coefficients (scores).

36 Principal Component Analysis (PCA) Difference to CLS and ILS: PCA models base the concentration predictions on changes in the data ("variation spectra") and not on absolute absorbance measurements. Conclusion: in order to establish a PCA model the spectral data must change. Simplest way: vary the concentrations of the constituents of interest in the training set. Important: avoid collinearity, i.e. 2 or more components in the calibration samples should not be present in the same ratio (e.g. A and B are present in the stock solution in a ratio of 2:1 and the training set is prepared by dilution of that solution)! PCA will detect only ONE variation! Calibration of eigenvector models requires randomly distributed ratios of the constituents of interest.

37 Principal Component Analysis (PCA) Mean centering of data: Data is commonly mean centered prior to PCA. Mean centering: the mean spectrum (average spectrum) is calculated from all calibration spectra and then subtracted from every calibration spectrum. Effect: enhancement of the small differences between spectra, as the changes in the absorbance data are important and not the absolute absorbance (i.e. the data is not falsified!). Following mean centering, a set of eigenvectors (principal components) is created that represents the changes in the absorbances common to all calibration spectra. After the training data has been fully processed by the PCA algorithm, two main matrices remain: + The eigenvectors (spectra) + The scores (the eigenvector weighting values for all the calibration spectra)

38 Principal Component Analysis (PCA) Matrix expression of PCA: A = S·F + E_A. - E_A is the error matrix describing the model's ability to predict the calibration absorbances; it has the same dimensionality as the A matrix. - E_A is called the matrix of residual spectra (Note: see residual analysis!) (Note: the mean spectrum is only added back if the data were mean centered; the spectral residual (E_A) is the difference between the reconstructed spectrum and the original; keep in mind: no model is perfect!) (Figure: reconstruction of an original calibration spectrum from scores and eigenvectors.)
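A compact sketch of the decomposition A ≈ S·F (after mean centering) using the singular value decomposition; the random matrix and the function name pca_decompose are stand-ins for real calibration spectra and are only illustrative.

```python
import numpy as np

def pca_decompose(A, n_pc):
    """Mean-center the spectra and decompose A ≈ S F + mean (scores, loadings)."""
    mean = A.mean(axis=0)
    Ac = A - mean                          # mean centering
    U, s, Vt = np.linalg.svd(Ac, full_matrices=False)
    S = U[:, :n_pc] * s[:n_pc]             # scores (one row per calibration spectrum)
    F = Vt[:n_pc]                          # loadings / eigenvectors
    return S, F, mean

rng = np.random.default_rng(2)
A = rng.uniform(0, 1, size=(20, 100))      # stand-in calibration spectra
S, F, mean = pca_decompose(A, n_pc=3)
E = A - (S @ F + mean)                     # matrix of residual spectra
print(E.shape, np.abs(E).max())
```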

39 Principal Component Analysis (PCA) Now the question arises how many PCs do we need to model our data? (Note: same question for the # of factors in PLS) - The calculated eigenvectors are ordered by their degree of importance to the model; in the case of PCA the decisive parameter is the variance. - If too many PCs are taken into account ("overfit") the eigenvectors will begin modeling the system noise as the smallest contributions of variance in the training data set. - This is great: if we select the correct number of PCs we effectively filter out noise! - However: if the number of PCs is too small ("underfit") the concentration prediction for unknown samples will suffer. - So here is the task: define a model that contains enough orthogonal (linearly independent) eigenvectors to properly model the components of interest without adding too much contribution from noise! But how?

40 Principal Component Analysis (PCA) Calculate the PRESS value for every possible factor (PRESS=Prediction Residual Error Sum of Squares) - We build a calibration model with a number of factors. - Then we predict some samples of known concentration (usually from the training set) with the model. - The sum of the squared difference between the predicted and known concentrations gives the Prediction Residual Error Sum of Squares for that model. (n is the number of samples in the training set; m is the number of constituents; Cp is the matrix of predicted sample concentrations from the model; C is the matrix of known concentrations of the samples)
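The PRESS formula itself did not survive the transcription (it appeared as a graphic on the slide); based on the definitions given above it presumably has the form

PRESS = Σ(i = 1..n) Σ(j = 1..m) (Cp_ij − C_ij)²

i.e. the squared differences between the predicted (Cp) and known (C) concentrations, summed over all n training samples and m constituents.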

41 Principal Component Analysis (PCA) Self prediction: - Models built using all the spectra in the training set. - Then the same spectra are predicted back against these models. - Disadvantage: all vectors calculated exist in all training spectra. Hence, the PRESS plot will continue to fall as new factors are added to the model and will never rise. This gives the (false) impression that all vectors are constituent vectors and that there are no noise vectors to eliminate (which is never the case!). - However, there is one tempting advantage this method is very fast as the model is only built once! Better way: Cross validation

42 Principal Component Analysis (PCA) Cross validation (1): - Again, unknown samples emulated by training set. - However: sample to be predicted left out during calibration. - Procedure repeated until every calibration sample has been left out and predicted at least once. The calculated squared residual error is added to all the previous PRESS values. - Disadvantage: time consuming as re-calculation is required for every left-out sample. - However, as the predicted samples are not the same as the samples used to build the model, the calculated PRESS value is a very good indication of the error in the accuracy of the model when used to predict "unknown" samples in the future! Hence: The only recommended way!
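A minimal leave-one-out sketch of this procedure in Python/NumPy. The function loo_press and its build/predict arguments are illustrative placeholders for whatever calibration method is being cross-validated; a concrete build/predict pair for PCR is sketched after slide 46 below.

```python
import numpy as np

def loo_press(A, c, build, predict, n_factors):
    """Leave-one-out PRESS: rebuild the model with each sample left out,
    predict that sample, and accumulate the squared prediction errors."""
    press = 0.0
    n = A.shape[0]
    for i in range(n):
        keep = np.arange(n) != i
        model = build(A[keep], c[keep], n_factors)   # calibrate without sample i
        c_hat = predict(model, A[i:i+1])[0]          # predict the left-out sample
        press += (c_hat - c[i]) ** 2
    return press

# Evaluate PRESS for 1, 2, 3, ... factors and pick the number near the minimum.
```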

43 Principal Component Analysis (PCA) Cross validation (2): - Initially the prediction error (PRESS value) decreases as new eigenvectors (PCs) are added to the model. This indicates that the model is still underfit and there are not enough factors to completely account for the constituents of interest. - At some point the PRESS values reach a minimum and start to rise again as PCs that contain uncorrelated noise are added, indicating that the model is overfit.

44 Principal Component Regression (PCR) PCA combined with ILS: - Quantitative models for complex samples can be established. - Instead of directly regressing the constituent concentrations against the spectroscopic response via Beer's law we regress the concentrations against the PCA scores. - The eigenvectors of the PCA decomposition represent the spectral variations common to all of the spectroscopic calibration data. - We can use that information to calculate a regression equation providing a robust model for predicting concentrations of the desired constituents in very complex samples (instead of directly utilizing absorbances).

45 Principal Component Regression (PCR) How does it work? - Let's compare against the techniques we know: CLS: K = A·C⁻¹, A_λ1 = c_A·K_A,λ1 + c_B·K_B,λ1 + E_λ1; ILS: P = C·A⁻¹, c_A = A_λ1·P_A,λ1 + A_λ2·P_A,λ2 + E_A; PCA: A = S·F + E_A. - The F-matrix in PCA (containing the PCs) has a similar function as the K-matrix in CLS: it stores the spectral (or spectral variance) data of the constituents. The F-matrix needs the S-matrix (scores) to be useful; likewise, the K-matrix needs the C-matrix. - The scores summarized in the S-matrix are unique to each calibration spectrum. - An optical spectrum is represented by a collection of absorbances at a series of wavelengths. In analogy, the very same spectrum can be represented by a series of scores for a given set of factors. Hence: we can regress the concentrations (C-matrix) against the scores (similar to the classical approach regressing the concentrations against the absorbances, i.e. the A-matrix).

46 Principal Component Regression (PCR) Using the ILS approach we can formulate: - C = B·S + E_C. C represents the constituent concentration matrix, B the matrix of regression coefficients and S the scores matrix from the PCA. - Now we understand why this approach is called PCR: we combine PCA (first step) with ILS regression (second step) to solve the calibration equation for the model. In contrast (and as we shall see later), partial least squares (PLS) regression performs these operations in one step. - We can use A = S·F rearranged to S = A·F⁻¹ (neglecting the error matrix for simplicity): C = B·A·F⁻¹ + E_C, the PCR model equation.
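A two-step PCR sketch for a single constituent of interest, again in Python/NumPy with invented function names (pcr_build, pcr_predict); it is only an illustration of the scheme described above, not a production implementation.

```python
import numpy as np

def pcr_build(A, c, n_pc):
    """Two-step PCR calibration: PCA decomposition of the spectra, then
    ILS-style regression of the known concentrations against the scores."""
    mean = A.mean(axis=0)
    U, s, Vt = np.linalg.svd(A - mean, full_matrices=False)
    F = Vt[:n_pc]                                 # loadings (eigenvectors)
    S = (A - mean) @ F.T                          # scores of the training spectra
    b, *_ = np.linalg.lstsq(S, c, rcond=None)     # regress c against the scores
    return mean, F, b

def pcr_predict(model, A_unknown):
    mean, F, b = model
    return (A_unknown - mean) @ F.T @ b           # scores of the unknown -> concentration
```

These two functions can be dropped into the loo_press sketch given after slide 42 to choose the number of PCs by cross-validation.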

47 Principal Component Regression (PCR) IMPORTANT (1): - The PCR calibration model is a two-step process: (1) the PCA eigenvectors and scores are calculated; (2) the scores are regressed against the constituent concentrations using a regression method similar to ILS. - NOTE: Remember that the ILS approach can build accurate calibrations, provided that the selected variables are physically related to the constituent concentrations. However, the PCA factors/scores are calculated independently of any knowledge of these concentrations and represent only the largest common variations among all the spectra in the training set. - We assume that these variations will be mostly related to changes in the constituent concentrations, but there is no guarantee this will be true.

48 Principal Component Regression (PCR) IMPORTANT (2): - Practically, many PCR models include more factors than are actually necessary as some of the eigenvectors are probably not related to any of the constituents of interest. - Ideally, a PCR model should be built by performing a selection on the scores (similar to wavelength selection in the ILS model) determining which factors should be used to build a model for each constituent. - As these selection rules are difficult to establish and to wrap into algorithms, a corresponding treatment is not included in most chemometrics packages! That's why we have yet another technique PLS ;-)

49 PCA/PCR The Good, the Bad and the Ugly Advantages: Does not require wavelength selection, usually the whole spectrum or large regions are used (though score selection might be advantageous sometimes!). The larger number of wavelengths provides an averaging effect (model less susceptible to spectral noise). PCA data compression (far fewer PCs than spectra) allows using inverse regression to calculate the model coefficients calibrating only for the constituents of interest. Can be used for very complex mixtures since only knowledge of the constituents of interest is required. Can sometimes be used to predict samples with constituents (contaminants) not present in the original calibration mixtures.

50 PCA/PCR The Good, the Bad and the Ugly Disadvantages: Calculations are slower than most classical methods (not a tremendous problem nowadays given the available computation power). Optimization requires some knowledge of PCA; models are more complex to understand and interpret. A large number of samples is required for accurate calibration. Hence preparation/collection of calibration samples can be difficult while avoiding collinearity of constituent concentrations.

51 Partial Least Squares (PLS) yet another technique! More focused on concentrations - PLS is closely related to PCA. - Main difference: spectral decomposition uses concentration information provided in the training set. - PCA: first we decompose spectral matrix into set of Eigenvectors and scores; then we regress them against the concentrations in a separate step. - PLS: concentration information used already during the decomposition process; hence, spectra containing higher constituent concentrations weighted more heavily than spectra containing low concentrations. - Consequence: Eigenvectors and scores calculated using PLS are different from those in PCR. The main idea of PLS is to get as much concentration information as possible into the first few loading vectors

52 Partial Least Squares (PLS) Here s what we do - PCA decomposes spectra into the most common variations. - PLS takes advantage of the correlation relationship already existing between the spectral data and the constituent concentrations and decomposes the concentration data also into the most common variations. - Consequently: two sets of vectors (one set for spectral data; one set for constituent concentrations) and two sets of corresponding scores are generated for the calibration model.

53 Partial Least Squares (PLS) Let's contrast the results of PCA and PLS: - PCA decomposes first and then performs the regression. - PLS performs decomposition of spectral and concentration data simultaneously (= regression already included in one step). Principal components are called factors in PLS. (Figure: schematic comparison of the PCA and PLS decompositions.)

54 Partial Least Squares (PLS) How does it work? - We will not derive the algorithms. For those interested in applying the methodology guidelines for implementation will be posted on the web. - It is assumed that the two sets of scores (spectral and concentration scores) are related to each other through some type of regression (which appears natural as the spectral features are dominated by the constituent concentrations). Hence, a calibration model can be constructed. - As each new factor is calculated for the model, the scores are "swapped" before the contribution of the factor is removed from the raw data. The reduced data matrices are then used to calculate the next factor. This process is repeated until the desired number of factors is calculated.
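Since the algorithm is not derived here, the following is only a rough NIPALS-style PLS-1 sketch in Python/NumPy for a single constituent y; it illustrates how the concentration data steer the weight vectors and how the "swapping"/deflation proceeds factor by factor. Function names and the final regression-vector formula b = W (PᵀW)⁻¹ q are my own presentation, not the exact algorithm referenced for the course web postings.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Minimal NIPALS-style PLS-1 sketch: spectra X and concentrations y of ONE
    constituent are decomposed together, one factor at a time."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)          # weight vector steered by the concentrations
        t = Xc @ w                      # spectral scores
        p = Xc.T @ t / (t @ t)          # spectral loadings
        qa = yc @ t / (t @ t)           # concentration loading for this factor
        Xc -= np.outer(t, p)            # remove this factor's contribution (deflation)
        yc -= t * qa
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.inv(P.T @ W) @ q  # regression vector in wavelength space
    return x_mean, y_mean, b

def pls1_predict(model, X_unknown):
    x_mean, y_mean, b = model
    return (X_unknown - x_mean) @ b + y_mean
```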

55 Partial Least Squares (PLS) Main difference between PCA and PLS - In PLS the resulting spectral vectors are directly related to the constituents of interest. - In PCR the vectors only represent the most common spectral variations in the data completely ignoring their relation to the constituents of interest until the final regression step.

56 Partial Least Squares (PLS) to make it even more complicated there is a PLS-1 and a PLS-2 algorithm! - PLS-1 is the procedure we just discussed, which results in a separate set of scores and loading vectors for each constituent of interest (see previous slide). Hence, the calculated vectors are optimized for each individual constituent. - PLS-2 basically adopts the strategy of PCA and calibrates for all constituents simultaneously. Hence, the calculated vectors are not optimized for each individual constituent. - Consequence: in principle, the predictions derived from PLS-1 should be more accurate than PLS-2 and PCA. But: speed of calculation! (Note: a separate set of eigenvectors and scores must be calculated for every constituent of interest; training sets with a large number of samples and constituents will significantly increase the time of calculation.)

57 Partial Least Squares (PLS) Advantage of PLS: - For systems that have constituent concentrations that are widely varied. - Example: calibration spectra contain A in concentration range 40-60%, B in concentration range 5-8% and C in concentration range %. - Here, PLS-1 will very likely predict better than PLS-2 or PCA. - If the concentration ranges of A, B and C are approx. the same, PCA and PLS-2 will perform with similar predictive quality, however, PLS-1 will definitely take longer to calculate.

58 PLS The Good, the Bad and the Ugly Advantages: Combines the full spectral coverage of CLS with partial composition regression of ILS ( best of both worlds argument!). Single step decomposition and regression. Eigenvectors/Factors directly related to constituents of interest rather than largest common spectral variations. Calibrations are generally more robust if calibration set accurately reflects range of variability expected in unknown samples. Can sometimes be used to predict samples with constituents (contaminants) not present in the original calibration mixtures. In general, literature argues that PLS has superior predictive ability - HOWEVER: there are many published examples where certain calibrations simply have performed better using PCR or PLS-2 instead of PLS-1!!!

59 PLS The Good, the Bad and the Ugly Disadvantages: Extensive calculation times. Models are fairly abstract and difficult to understand and interpret. A large number of samples is required for accurate calibration. Hence preparation/collection of calibration samples can be difficult while avoiding collinearity of constituent concentrations.

60 Decision maker on the correct method

61 What do we want? 80 corn flour samples, NIR reflectance measurements (differences?). Calibrate for moisture, oil, protein, and starch. Samples: 40 calibration, 20 validation, 20 test. (Figure: %R vs. wavelength (nm) for the corn flour spectra.)

62 Corn flour samples PCR vs. PLS for Oil in Corn Flour. (Tables: factors vs. percent spectral variance and cumulative % spectral variance; left: PCR, right: PLS.)

63 Salinity PCR (Figure: IR absorbance spectra of salt ions in water, NaCl, KCl, NaBr, KBr, CaCl2, MgCl2 and Na2SO4 at concentrations from 0.25 to 5 % w/v, plotted vs. wavenumber (cm-1).)

64 Salinity PCR Spectral evaluation:
- around 3350 cm-1: water absorption is too strong
- around 2100 cm-1: weak water absorption (included)
- around 1635 cm-1: appropriate water absorption (included)
- around 900 cm-1: appropriate water absorption (included)
- around 1100 cm-1: absorption band of the SO₄²⁻ ion
Wavenumber range used for chemometric data evaluation: 2300 cm-1 ... cm-1

65 Salinity PCR (Figure: loading spectra of PC #1 through PC #8 plotted vs. wavenumber (cm-1).)

66 Salinity PCR Synthetic samples. (Figure: measured vs. input concentration (mmol L-1) for Na⁺ + K⁺, Br⁻, Mg²⁺, Cl⁻, Ca²⁺ and SO₄²⁻; estimated detection limits: 100 mmol L-1 for the salt ions and 0.3 mmol L-1 for SO₄²⁻.)

67 Salinity PCR Artificial seawater. (Figure: measured vs. input concentration (mmol L-1) for Na⁺ + K⁺ (typ. 479 mmol L-1), Cl⁻ (typ. 559 mmol L-1), Ca²⁺ (typ. 11 mmol L-1), Br⁻ (typ. 1 mmol L-1), Mg²⁺ (typ. 54 mmol L-1) and SO₄²⁻ (typ. 29 mmol L-1), with estimated detection limits indicated.)

68 Salinity PCR Conclusions: a sensor is proposed for salinity analysis of aqueous samples; investigated ions: Cl⁻, Na⁺, Mg²⁺, SO₄²⁻, Ca²⁺, K⁺, Br⁻; the measurement principle is based on changes of the water IR spectrum due to the ions (species and concentration dependent); multicomponent analysis of several salt ions was successful; the influence of Na⁺ and K⁺ on the water spectrum is too similar to be discriminated; estimated detection limits: 100 mmol L-1 for all ion species except SO₄²⁻, 0.3 mmol L-1 for SO₄²⁻; Cl⁻, Na⁺ + K⁺, and SO₄²⁻ can be determined at the concentrations present in sea water; Ca²⁺, Br⁻, and Mg²⁺ are present in real world samples at too low concentrations.

69 Design of Training Data Set In general Quality of the training data set is the most important aspect! The predictive ability of the equations is only as good as the data used to calculate them in the first place! Control the variables: Collect representative samples. Accurate primary calibration method. Appropriate sample measurements (reproducibility of conditions, etc.).

70 Design of Training Data Set Training samples similar to unknown samples Training samples should be as similar as possible to unknown samples! The spectrum of a pristine constituent looks different from when it is part of a mixture! Exception: very simple mixtures; samples in the gas phase. Factor based models (PCA, PLS) can compensate for inter-constituent interactions. BUT only if the training set contains examples of these!!!

71 Design of Training Data Set Training samples similar to unknown samples If samples are simple mixtures: Few components, distinct absorption features. Use simple models (CLS, ILS)! If samples are complex mixtures: Many components, overlapping absorption features. Use factor based models (PCR, PLS) extracting the relevant information from the spectra and ignoring the rest! HOWEVER: give the model the best chance to learn! Train it using samples that emulate the unknowns as closely as possible.

72 Design of Training Data Set Training samples similar to unknown samples Strategy: Collect actual samples from the measurement environment (e.g. the plant, the field, etc.). Analyze them in the lab using other primary calibration methods (e.g. chromatography, wet chemical tests, etc.). This data, along with the sample spectra, forms the training set to build a reliable calibration model.

73 Design of Training Data Set Bracket concentration range Strategy: Constituent values for the training samples should span the expected range of all future unknown samples. Extrapolation is generally not a good idea! External validation is the only way to determine how well a model will predict outside the original calibration range. Consequently: constituent values in the training samples should be larger and smaller than the expected values in unknown samples. Do not hesitate using a lot of calibration samples!

74 Design of Training Data Set Use enough samples Strategy: Training set must have at least as many samples as there are constituents of interest. Usually many more than that (Note: noise)! How many samples are required to build a good model? As many samples as it takes! The more data fed into the model, the higher your confidence in the prediction!

75 Design of Training Data Set Use enough samples Strategy: Use a sufficiently large number of samples for calibration to allow sufficient factors in the model. For complex matrices you need enough samples to account for all the variability in the real samples. Note: the maximum number of factors that can be calculated for a given training set is limited by the smallest dimension of the data matrix. Example: if a training set has 300 samples but the calibration regions have only 20 total spectral data points, then the maximum number of factors is limited to 20 as well.

76 Design of Training Data Set Use enough samples Keep in mind: Quantity AND quality of the data is important! The more samples, the better the discrimination between analytically relevant signatures and noise. BUT: only accumulating a huge number of spectra as a training set will not guarantee a better model - carefully measuring and qualifying a much smaller number for calibration is the way to go!

77 Design of Training Data Set Constituent collinearity We already know: Collinearity is the effect observed when the relative amounts of two or more constituents are constant throughout all the training samples. Why is this a problem? Factor based models do not calibrate by creating a direct relationship between the constituent data and the spectral response. They correlate the change in concentration to corresponding changes in the spectra. If constituents are collinear, multivariate models cannot differentiate them, and calibrations for the constituents will be unstable.

78 Design of Training Data Set Constituent collinearity Note: For simple bivariate calibrations we make one stock solution with high concentrations of all constituents of interest. Then we make multiple dilutions of that one mixture to create the remaining samples. This procedure will completely fail for multivariate models! For an eigenvector-based model only one factor will arise, containing nearly all the variance in the training data set. How can we determine that collinearity happens?

79 Design of Training Data Set Constituent collinearity Plot the sample concentrations of each constituent in the model against the others: If the points fall on a straight line, the concentrations are collinear. If the constituents were uncorrelated they would form a cluster of points!
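Besides plotting, the same check can be done numerically; a tiny sketch with a deliberately collinear, made-up concentration design (constituents 0 and 1 are in a fixed 1:2 ratio):

```python
import numpy as np

# Hypothetical concentration design: rows = training samples, columns = constituents
C = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.0, 1.3],
              [3.0, 6.0, 0.2],
              [4.0, 8.0, 2.1]])

# Pairwise correlation coefficients between the constituent concentrations;
# values near +/-1 flag collinear pairs (here constituents 0 and 1).
print(np.corrcoef(C, rowvar=False))
```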

80 Spectral region selection An optimization problem Should we always use the whole spectrum? It is very easy (and convenient) to simply select the entire range of the training spectra as the set of data to use for calibration. PLS and PCR models will certainly be able to figure out the regions in the spectra that are most important for calibration. Since there is no apparent penalty in using as many wavelengths as possible for calibrations, why not just use the entire spectrum?

81 Spectral region selection An optimization problem There are many reasons why not Regions of the spectrum where either the detector, the spectrometer source or the optics are not effective (e.g. noise at detector cut-off). Example: including data from wavelengths below the detector cut-off is adding randomly distributed and uncorrelated absorbances to the factors. In general, only selecting the highly correlated regions of the spectrum for calibration will improve the accuracy for predicting the constituents of interest (Note: we evaluate the changes in the spectra!).

82 Spectral region selection An optimization problem There are more reasons why not Use this information along with your chemical knowledge of the samples to pre-select spectral regions for inclusion in a calibration. PCA and PLS factor analysis can correct for some non-linearities (e.g. deviations from Beer's law). However, they cannot correct for regions of over-absorbance (total absorbance).

83 Spectral region selection An optimization problem What is the price to pay? Discovery of impurities in the samples or unknown absorbers may be impaired. If the spectral bands of the impurities do not appear in the selected calibration regions, then there will be no indication that the predicted constituent values are potentially incorrect. Only a problem if the samples are entirely unknown, which is a problem as such anyway (see also: how much should we know about the sample to establish reliable calibration models)!

84 Spectral region selection How do we determine the useful regions? Correlation analysis Calculate the correlation of the absorbance at every wavelength in the training spectra to the concentrations of constituents. Regions that show high correlation are regions that should be selected for calibration, regions that show low or no correlation should be ignored.
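A short sketch of this correlation analysis in Python/NumPy; wavelength_correlation is an illustrative helper name, and the R² plot described on the next slide is simply the square of the returned values.

```python
import numpy as np

def wavelength_correlation(A, c):
    """Correlation of the absorbance at every wavelength with the concentration
    of one constituent; |R| near 1 marks regions worth keeping for calibration."""
    Ac = A - A.mean(axis=0)
    cc = c - c.mean()
    r = (Ac * cc[:, None]).sum(axis=0) / (
        np.sqrt((Ac ** 2).sum(axis=0)) * np.sqrt((cc ** 2).sum()))
    return r            # one R value per wavelength; r**2 gives the R^2 plot
```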

85 Spectral region selection How do we determine the useful regions? 20 FT-IR spectra of ethanol/water mixtures. Coefficient of determination (R²). (Note: goodness of fit for linear regression; 1 = perfect fit; values near 1 indicate regions of high correlation between the spectral absorbances and the constituent concentrations; regions that are near 0 are not correlated.) Linear correlation (R). (Note: this type of correlation plot not only indicates regions of the spectrum that are correlated to the constituents but the type of correlation as well.) Negative correlation in (R): two-constituent mixture; an increase in ethanol concentration gives a corresponding decrease in water.

86 Spectral region selection How do we determine the useful regions? Reason for negative correlation Increasing the concentration of one constituent in a mixture "dilutes" the others. If the dilution occurs as a ratio function of the increase of the added constituent, a negative correlation will appear. In most cases, these regions are as useful for calibration as the positively correlated regions! Always true?

87 Spectral region selection How do we determine the useful regions? No! Collinearity! If the concentrations of the constituents of interest vary as a function of one another (or of other unknown constituents), the correlations will indicate regions that are not really useful. Example: creating a training set by simply making dilutions of a single mixture! Be careful in correlation analysis!

88 Artificial neural networks What is it? Definition? Sophisticated modeling techniques capable of modeling extremely complex functions. Capable of modeling non-linear relationships. Can handle large numbers of variables. They learn by example: NN user gathers representative data, and then invokes training algorithms to automatically learn the structure of the data. Hence easy to use!

89 Artificial neural networks What is it? When to apply? NN are applicable in virtually every situation in which a relationship between the predictor variables (independents, inputs) and predicted variables (dependents, outputs) exists. Even when the relationship is very complex and not easy to articulate in the usual terms of correlations or defined differences between groups.

90 Artificial neural networks What is it? Why neural? Grew out of research in Artificial Intelligence. Attempts to mimic the fault-tolerance and capacity to learn of biological neural systems by modeling the low-level structure of the brain. Idea: the brain is composed of a very large number of neurons, massively interconnected (an average of several thousand interconnects per neuron, although this varies enormously ;-)

91 Artificial neural networks What is it? Why neural? Each neuron is a specialized cell which can propagate an electrochemical signal. Neuron has a branching input structure (dendrites), cell body, and a branching output structure (axon). Axons of one cell connect to the dendrites of another via a synapse. If neuron is activated, fires an electrochemical signal along the axon. Signal crosses the synapses to other neurons, which may in turn fire. Neuron fires only if the total signal received at the cell body from the dendrites exceeds a certain level (firing threshold).

92 Artificial neural networks What is it? Why neural? Conclusion: from a very large number of extremely simple processing units (each performing a weighted sum of its inputs, and then firing a binary signal if the total input exceeds a certain level) the brain manages to perform extremely complex tasks. Can we use that model?

93 Artificial neural networks What is it? The artificial neural network (ANN) Artificial neuron: + Receives a number of inputs (either from original data, or from the output of other neurons in the ANN). + Each input comes via a connection that has a strength ( weight ) + Weights correspond to synaptic efficacy in a biological neuron. + Each neuron also has a single threshold value. + The weighted sum of the inputs is formed, and the threshold subtracted, to compose the activation of the neuron (postsynaptic potential). + The activation signal is passed through an activation function ( transfer function ) to produce the output.

94 Artificial neural networks What is it? The artificial neural network (ANN) How does it work? If a step activation function is used (i.e. the neuron output is 0 if the input is less than zero and 1 if the input is greater than or equal to 0), then the neuron acts like the biological neuron. (Note: usually sigmoid functions are applied) Network: inputs (which carry the values of variables of interest in the outside world) and outputs (which form predictions, or control signals) have to be connected. Inputs and outputs correspond to sensory and motor nerves such as those coming from the eyes and leading to the hands.

95 Artificial neural networks What is it? The artificial neural network (ANN) Usual structure Input layer, hidden layer and output layer connected together in feed-forward structure: signals flow from inputs, forward through any hidden units, finally reaching the output units. Distinct layered topology. Hidden and output layer neurons are each connected to all of the units in the preceding layer.

96 Artificial neural networks How does it work? Operation of an ANN Feed information The input variable values are placed in the input units. Processing + The hidden and output layer units are progressively executed. + Each of them calculates its activation value by taking the weighted sum of the outputs of the units in the preceding layer and subtracting the threshold. + Activation value is passed through the activation function to produce the output of the neuron. + When entire network has been executed: outputs of output layer act as the output of the entire network.
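A minimal sketch of this feed-forward pass in Python/NumPy: each layer forms the weighted sum of its inputs, subtracts the threshold, and passes the result through a sigmoid activation. The layer sizes, weights and the function names are arbitrary illustrations, not a trained network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """One feed-forward pass: weighted sum of inputs, minus threshold,
    passed through the activation function, layer by layer."""
    out = x
    for W, threshold in layers:
        out = sigmoid(out @ W - threshold)
    return out

rng = np.random.default_rng(3)
layers = [(rng.normal(size=(10, 5)), rng.normal(size=5)),   # input -> hidden
          (rng.normal(size=(5, 1)), rng.normal(size=1))]    # hidden -> output
print(forward(rng.normal(size=10), layers))
```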


Mixture Analysis Made Easier: Trace Impurity Identification in Photoresist Developer Solutions Using ATR-IR Spectroscopy and SIMPLISMA Mixture Analysis Made Easier: Trace Impurity Identification in Photoresist Developer Solutions Using ATR-IR Spectroscopy and SIMPLISMA Michel Hachey, Michael Boruta Advanced Chemistry Development, Inc.

More information

Karhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering

Karhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform KLT JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform Has many names cited in literature: Karhunen-Loève Transform (KLT); Karhunen-Loève Decomposition

More information

Chapter 9: The Perceptron

Chapter 9: The Perceptron Chapter 9: The Perceptron 9.1 INTRODUCTION At this point in the book, we have completed all of the exercises that we are going to do with the James program. These exercises have shown that distributed

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it

More information

Artifical Neural Networks

Artifical Neural Networks Neural Networks Artifical Neural Networks Neural Networks Biological Neural Networks.................................. Artificial Neural Networks................................... 3 ANN Structure...........................................

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Unit 8: Introduction to neural networks. Perceptrons

Unit 8: Introduction to neural networks. Perceptrons Unit 8: Introduction to neural networks. Perceptrons D. Balbontín Noval F. J. Martín Mateos J. L. Ruiz Reina A. Riscos Núñez Departamento de Ciencias de la Computación e Inteligencia Artificial Universidad

More information

Introduction to Artificial Neural Networks

Introduction to Artificial Neural Networks Facultés Universitaires Notre-Dame de la Paix 27 March 2007 Outline 1 Introduction 2 Fundamentals Biological neuron Artificial neuron Artificial Neural Network Outline 3 Single-layer ANN Perceptron Adaline

More information

The prediction of house price

The prediction of house price 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

ReducedPCR/PLSRmodelsbysubspaceprojections

ReducedPCR/PLSRmodelsbysubspaceprojections ReducedPCR/PLSRmodelsbysubspaceprojections Rolf Ergon Telemark University College P.O.Box 2, N-9 Porsgrunn, Norway e-mail: rolf.ergon@hit.no Published in Chemometrics and Intelligent Laboratory Systems

More information

EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, Sasidharan Sreedharan

EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, Sasidharan Sreedharan EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, 2012 Sasidharan Sreedharan www.sasidharan.webs.com 3/1/2012 1 Syllabus Artificial Intelligence Systems- Neural Networks, fuzzy logic,

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17 Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 17 Outline Filters and Rotations Generating co-varying random fields Translating co-varying fields into

More information

Mining Classification Knowledge

Mining Classification Knowledge Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

Lecture 6. Regression

Lecture 6. Regression Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron

More information

ECE521 Lecture7. Logistic Regression

ECE521 Lecture7. Logistic Regression ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Spring 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate

More information

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Multivariate Analysis, TMVA, and Artificial Neural Networks

Multivariate Analysis, TMVA, and Artificial Neural Networks http://tmva.sourceforge.net/ Multivariate Analysis, TMVA, and Artificial Neural Networks Matt Jachowski jachowski@stanford.edu 1 Multivariate Analysis Techniques dedicated to analysis of data with multiple

More information

AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009

AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009 AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is

More information

Multilayer Perceptron

Multilayer Perceptron Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

More information

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence Artificial Intelligence (AI) Artificial Intelligence AI is an attempt to reproduce intelligent reasoning using machines * * H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, 1993,

More information

Bearing fault diagnosis based on EMD-KPCA and ELM

Bearing fault diagnosis based on EMD-KPCA and ELM Bearing fault diagnosis based on EMD-KPCA and ELM Zihan Chen, Hang Yuan 2 School of Reliability and Systems Engineering, Beihang University, Beijing 9, China Science and Technology on Reliability & Environmental

More information

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Fall, 2018 Outline Introduction A Brief History ANN Architecture Terminology

More information

Artificial Neural Networks Examination, June 2004

Artificial Neural Networks Examination, June 2004 Artificial Neural Networks Examination, June 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Developing a Spectrophotometric Quantitative Assay for p-nitrophenol

Developing a Spectrophotometric Quantitative Assay for p-nitrophenol Developing a Spectrophotometric Quantitative Assay for p-nitrophenol The insecticide parathion (O,O-diethyl-o-p-nitrophenyl phosphorothioate) undergoes a welldefined pathway of biodegradation. In the first

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Artificial Neural Networks Examination, June 2005

Artificial Neural Networks Examination, June 2005 Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either

More information

CPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018

CPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018 CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,

More information

Using a Hopfield Network: A Nuts and Bolts Approach

Using a Hopfield Network: A Nuts and Bolts Approach Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of

More information

Analysis of Cocoa Butter Using the SpectraStar 2400 NIR Spectrometer

Analysis of Cocoa Butter Using the SpectraStar 2400 NIR Spectrometer Application Note: F04 Analysis of Cocoa Butter Using the SpectraStar 2400 NIR Spectrometer Introduction Near-infrared (NIR) technology has been used in the food, feed, and agriculture industries for over

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 2017-2018 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron

More information

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

More information

2.01 INFRARED ANALYZER

2.01 INFRARED ANALYZER NIR INFORMATION PAGE 1 OF 5 What does it do? First, it allows for better control and knowledge of incoming ingredients. If the ability exists to analyze a truck or rail car immediately upon arrival, the

More information

Analysis of Multilayer Neural Network Modeling and Long Short-Term Memory

Analysis of Multilayer Neural Network Modeling and Long Short-Term Memory Analysis of Multilayer Neural Network Modeling and Long Short-Term Memory Danilo López, Nelson Vera, Luis Pedraza International Science Index, Mathematical and Computational Sciences waset.org/publication/10006216

More information

Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares

Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares R Gutierrez-Osuna Computer Science Department, Wright State University, Dayton, OH 45435,

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

Spectroscopy in Transmission

Spectroscopy in Transmission Spectroscopy in Transmission + Reflectance UV/VIS - NIR Absorption spectra of solids and liquids can be measured with the desktop spectrograph Lambda 9. Extinctions up to in a wavelength range from UV

More information

Neural Networks and Ensemble Methods for Classification

Neural Networks and Ensemble Methods for Classification Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated

More information

Lecture 32. Lidar Error and Sensitivity Analysis

Lecture 32. Lidar Error and Sensitivity Analysis Lecture 3. Lidar Error and Sensitivity Analysis Introduction Accuracy in lidar measurements Precision in lidar measurements Error analysis for Na Doppler lidar Sensitivity analysis Summary 1 Errors vs.

More information

Validating Slope Spectroscopy Methods: A Formula for Robust Measurements

Validating Slope Spectroscopy Methods: A Formula for Robust Measurements CWP0802261A White Paper Validating Slope Spectroscopy Methods: A Formula for Robust Measurements February 26, 2008 I-Tsung Shih, PhD Mark Salerno 1.0 Abstract Demands on measurement systems are ever increasing.

More information

Application of Raman Spectroscopy for Detection of Aflatoxins and Fumonisins in Ground Maize Samples

Application of Raman Spectroscopy for Detection of Aflatoxins and Fumonisins in Ground Maize Samples Application of Raman Spectroscopy for Detection of Aflatoxins and Fumonisins in Ground Maize Samples Kyung-Min Lee and Timothy J. Herrman Office of the Texas State Chemist, Texas A&M AgriLife Research

More information

A Support Vector Regression Model for Forecasting Rainfall

A Support Vector Regression Model for Forecasting Rainfall A Support Vector Regression for Forecasting Nasimul Hasan 1, Nayan Chandra Nath 1, Risul Islam Rasel 2 Department of Computer Science and Engineering, International Islamic University Chittagong, Bangladesh

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技法 ) Lecture 13: Deep Learning Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系

More information

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

ASEAN GUIDELINES FOR VALIDATION OF ANALYTICAL PROCEDURES

ASEAN GUIDELINES FOR VALIDATION OF ANALYTICAL PROCEDURES ASEAN GUIDELINES FOR VALIDATION OF ANALYTICAL PROCEDURES Adopted from ICH Guidelines ICH Q2A: Validation of Analytical Methods: Definitions and Terminology, 27 October 1994. ICH Q2B: Validation of Analytical

More information

Artificial Neural Networks Francesco DI MAIO, Ph.D., Politecnico di Milano Department of Energy - Nuclear Division IEEE - Italian Reliability Chapter

Artificial Neural Networks Francesco DI MAIO, Ph.D., Politecnico di Milano Department of Energy - Nuclear Division IEEE - Italian Reliability Chapter Artificial Neural Networks Francesco DI MAIO, Ph.D., Politecnico di Milano Department of Energy - Nuclear Division IEEE - Italian Reliability Chapter (Chair) STF - China Fellow francesco.dimaio@polimi.it

More information

CHAPTER 3. Pattern Association. Neural Networks

CHAPTER 3. Pattern Association. Neural Networks CHAPTER 3 Pattern Association Neural Networks Pattern Association learning is the process of forming associations between related patterns. The patterns we associate together may be of the same type or

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Article from Predictive Analytics and Futurism July 2016 Issue 13 Regression and Classification: A Deeper Look By Jeff Heaton Classification and regression are the two most common forms of models fitted

More information