Multivariate calibration

Similar documents
Multivariate data analysis (MVA) - Introduction

Putting Near-Infrared Spectroscopy (NIR) in the spotlight. 13. May 2006

Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information.

Spectroscopy in Transmission

Demonstrating the Value of Data Fusion

Independence and Dependence in Calibration: A Discussion FDA and EMA Guidelines

Chromatography & instrumentation in Organic Chemistry

Detection and Quantification of Adulteration in Honey through Near Infrared Spectroscopy

Chemometrics The secrets behind multivariate methods in a nutshell!

Multivariate analysis (chemometrics) - quality of multivariate calibration

FTIR characterization of Mexican honey and its adulteration with sugar syrups by using chemometric methods

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

Chemometrics. 1. Find an important subset of the original variables.

EXTENDING PARTIAL LEAST SQUARES REGRESSION

Water in Food 3 rd International Workshop. Water Analysis of Food Products with at-line and on-line FT-NIR Spectroscopy. Dr.

USE OF NEAR INFRARED SPECTROSCOPY TECHNOLOGY FOR PREDICTING BENDING PROPERTIES OF CLEAR WOOD SPECIMENS KIRK DAVID KLUDT

Rapid Compositional Analysis of Feedstocks Using Near-Infrared Spectroscopy and Chemometrics

Application of Raman Spectroscopy for Noninvasive Detection of Target Compounds. Kyung-Min Lee

Discrimination of Dyed Cotton Fibers Based on UVvisible Microspectrophotometry and Multivariate Statistical Analysis

The Role of a Center of Excellence in Developing PAT Applications

Food authenticity and adulteration testing using trace elemental and inorganic mass spectrometric techniques

Advancing from unsupervised, single variable-based to supervised, multivariate-based methods: A challenge for qualitative analysis

NIRS Getting started. US Dairy Forage Research Center 2004 Consortium Annual Meeting

Molecular Spectroscopy In The Study of Wood & Biomass

ABSTRACT. USDA FS Forest Products Laboratory, Madison WI ADVANCING RAMAN SPECTROSCOPY FOR LlGNlN APPLICATIONS

Cross model validation and multiple testing in latent variable models

DISCRIMINATION AND CLASSIFICATION IN NIR SPECTROSCOPY. 1 Dept. Chemistry, University of Rome La Sapienza, Rome, Italy

Generalized Least Squares for Calibration Transfer. Barry M. Wise, Harald Martens and Martin Høy Eigenvector Research, Inc.

QTOF-based proteomics and metabolomics for the agro-food chain.

Inferential Analysis with NIR and Chemometrics

Chemometrics Techniques for Metabonomics

Robust estimation of principal components from depth-based multivariate rank covariance matrix

LIST OF ABBREVIATIONS:

Pulp and Paper Applications

Robert P. Cogdill, Carl A. Anderson, and James K. Drennen, III

Robot Image Credit: Viktoriya Sukhanova 123RF.com. Dimensionality Reduction

Analysis of Cocoa Butter Using the SpectraStar 2400 NIR Spectrometer

Multivariate data-modelling to separate light scattering from light absorbance: The Optimized Extended Multiplicative Signal Correction OEMSC

The Theory of HPLC. Quantitative and Qualitative HPLC

MULTIVARIATE STATISTICAL ANALYSIS OF SPECTROSCOPIC DATA. Haisheng Lin, Ognjen Marjanovic, Barry Lennox

Vector Space Models. wine_spectral.r

Extraction of pure component spectrum from mixture spectra containing a known diluent

Experimental Classification of Matter

PAT: THE EXTRACTION OF MAXIMUM INFORMATION FROM MESSY SPECTRAL DATA. Zeng-Ping Chen and Julian Morris

Positive and Nondestructive Identification of Acrylic-Based Coatings

Correcting Lineshapes in NMR Spectra

ENERGY = ATP ATP. B. How is Energy stored in our cells? 1. In the chemical bonds between the phosphates

Scientific report STSM COST FP1006

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

NIR Spectroscopy Identification of Persimmon Varieties Based on PCA-SVM

The integration of NeSSI with Continuous Flow Reactors and PAT for Process Optimization

INTRODUCCIÓ A L'ANÀLISI MULTIVARIANT. Estadística Biomèdica Avançada Ricardo Gonzalo Sanz 13/07/2015

Application of Raman Spectroscopy for Detection of Aflatoxins and Fumonisins in Ground Maize Samples

Whole Tablet Measurements Using the Frontier Tablet Autosampler System

Application of near-infrared (NIR) spectroscopy to sensor based sorting of an epithermal Au-Ag ore

Plum-tasting using near infra-red (NIR) technology**

Overview of methods and challenges for microplastic analysis. Jes Vollertsen, Professor of Environmental Engineering, Aalborg University

Application of color detection by near infrared spectroscopy in color detection of printing

2.01 INFRARED ANALYZER

Inkjet printing of pharmaceuticals

Application Note: 130. Measurement of Lozenges and Whole Tablets using the Series 2000 Tablet Analyser

Explaining Correlations by Plotting Orthogonal Contrasts

i-fast validatietraject MOBISPEC

ReducedPCR/PLSRmodelsbysubspaceprojections

Pungency Level Assessment With E-Tongue Application to Tabasco Sauce

DEVELOPMENT OF ANALYTICAL METHODS IN NIR SPECTROSCOPY FOR THE CHARACTERIZATION OF SLUDGE FROM WASTEWATER TREATEMENT PLANTS FOR AGRICULTURAL USE

Développement d un système d imagerie spectrale pour contrôle de la qualité en-ligne des procédés d extrusions de composites plastiques

BASICS OF CHEMOMETRICS

3) In CE separation is based on what two properties of the solutes? (3 pts)

Comparison and combination of near-infrared and Raman spectra for PLS and NAS quantitation of glucose, urea and lactate

Analysis of heterogeneity distribution in powder blends using hyperspectral imaging

Chemometrics. Matti Hotokka Physical chemistry Åbo Akademi University

Multiplexing immunoassay with SERS

Instrumental Chemical Analysis

PAT/QbD: Getting direction from other industries

2 SPECTROSCOPIC ANALYSIS

Introduction to Principal Component Analysis (PCA)

Techniques and Applications of Hyperspectral Image Analysis

To Hy-Ternity and beyond. Boris Larchevêque, INDATECH

Quantitative Analysis of Caffeine in Energy Drinks by High Performance Liquid Chromatography

Analysis of NIR spectroscopy data for blend uniformity using semi-nonnegative Matrix Factorization

COC Biotechnology Program

Development of Near Infrared Spectroscopy for Rapid Quality Assessment of Red Ginseng

Chemometrics: Classification of spectra

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17

Functional Preprocessing for Multilayer Perceptrons

Chemometrics. Classification of Mycobacteria by HPLC and Pattern Recognition. Application Note. Abstract

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

Online NIR-Spectroscopy for the Live Monitoring of Hydrocarbon Index (HI) in Contaminated Soil Material

Quality control analytical methods- Switch from HPLC to UPLC

NMR Spectroscopy in Education : Using NMR Prediction for Investigation of Red Bull and The great NMR trivia challenge!

Experimental design. Matti Hotokka Department of Physical Chemistry Åbo Akademi University

Research Article Honey Discrimination Using Visible and Near-Infrared Spectroscopy

Exploring Raman spectral data using chemometrics

Introduction to Pharmaceutical Chemical Analysis

Substance Characterisation for REACH. Dr Emma Miller Senior Chemist

Whole Tablet Measurements Using the Spectrum One NTS Tablet Autosampler System

Principal component analysis

Yat Yun Wei Food Safety Division Health Sciences Authority. All Rights Reserved Health Sciences Authority

FAST CROSS-VALIDATION IN ROBUST PCA

Transcription:

Multivariate calibration What is calibration? Problems with traditional calibration - selectivity - precision 35 - diagnosis Multivariate calibration - many signals - multivariate space How to do it? observed 30 25 20 15 10 5 Example: Mix 0 0 5 10 15 20 25 30 35 predicted 2012-10-12 MVA Calibration 2007 H. Antti 1

What is calibration? 1) Samples with known concentrations (c i ) Signal amplitudes (A i ) from measurement on samples Standard curve A C 2) New samples with unknown concentrations A Measurements signal amplitudes, A j predicted conc. values, c j for new samples (from standard curve) C 2012-10-12 MVA Calibration 2007 H. Antti 2

Problems with traditional calibration Selectivity: There is NO unique signal where ONLY the analyte absorbs. S Precision: Noise in the signal amplitude is transferred to the predicted concentration for a new sample. Diagnosis: The standard curve is ONLY valid for samples similar to the ones in the calibration. A f C C 2012-10-12 MVA Calibration 2007 H. Antti 3

Multivariate calibration Many signals (spectrum digitised at K different wavelengths) K variables K signals S f Multivariate space - each variable defines a coordinate axis- Space with K coordinate axes. - Points, lines, distances,..., have got the same properties in K as in 2 and 3 dimensions. 2012-10-12 MVA Calibration 2007 H. Antti 4

Multivariate calibration One analyte (y-variable): - all points (digitised spectra) are describing a line ± noise in K-space. One analyte + interacting compounds, or many analytes + interacting compounds : - all points are describing a hyper-plane ± noise in K-space. 2012-10-12 MVA Calibration 2007 H. Antti 5

How to do it? Select samples representing the interesting variation. (Use design - FF, FrF, D-opt, Mixture) Measure Y-data for each sample using the "traditional" method i.e. the method we wish to replace. X Y Use the new method (usually spectroscopy) to characterise the samples, these measurements are the X-data. Select calibration and test samples (PCA of X) Calculate a calibration model using PLS. Evaluate and interpret the model. Test the model using external samples! Use model for classification and prediction of new samples. 2012-10-12 MVA Calibration 2007 H. Antti 6 X Y New samples?

Application areas Wheat, corn,... Protein, water, fat NIR Peat,. Water, energy, sugar, C, N, S NIR Lake water Humus acids, lignin sulfonates UV Beer, wine alcohol, protein, sugar, etc.nir, IR Whisky, wine Taste, smell, quality GC, HPLC Pulp, paper Raw material, lignin, products. NIR, UV, IR, NMR Pigs (living) Fat, meat, etc. NIR Humans (living) Hair, blood, urine, skin, operations NIR, FT-IR, NMR Plant material Screening for natural products NIR Pharmaceuticals Compounds & metabolites UV-Vis, FT-IR Process quality Sensors NIR, IR, UV, GC + many more 2012-10-12 MVA Calibration 2007 H. Antti 7

Example - Mix (Prediction of wood mixes) Pulp wood from three wood species (Pine, Spruce, Birch) was ground to a powder and mixed according to a mixture design. (30 samples in total) - The sum of the three constituents in each mixture = 1 (100%) (Closure) 100 % Pine 100 % Spruce 100 % Birch 2012-10-12 MVA Calibration 2007 H. Antti 8

From spectra to data table For each sample (mixture) a NIR spectrum was acquired. This generated 1050 wavelengths (variables) in the NIR region characterizing each sample. Spectra were digitized giving the X data matrix below. 0.9 0.8 0.7 0.6 0.5 0.4 0.3 1 2 3 1 2 3 Wavelenghts 1 (variables) K X 0.2 0.1 0 1 wavelenghts K (variables) N K: no of wavelengths N: no of samples = no of spectrum 2012-10-12 MVA Calibration 2007 H. Antti 9

Exempel - Mix (Prediktion av vedblandningar) The aim with the study was to use the NIR spectra of known mixtures of wood samples To calculate a multivariate calibration model for prediction of sample mixtures (Y). 20 samples (mixtures) were used to calculate a calibration model (training set). 10 samples were selected for testing the models predictive ability (test set). N NIR data X K PLS 20 NIR data X 1050 PLS 20 Y 3 3Y (% pine, % spruce, % birch) 2012-10-12 MVA Calibration 2007 H. Antti 10

Selection of calibration & test set Based on scores from the PCA of X (spectra) calibration (training) and test samples are selected. The calibration shall span the experimental space and give a good description of the entire space. The test samples shall be equally distributed over the entire space but not outside the limits set by the calibration (training) samples. 1050 30 NIR data X PCA Selection of calibration and test samples. 2012-10-12 MVA Calibration 2007 H. Antti 11

PCA (Principal Component Analysis) 1 2 3 wavelengths 1 (variables) K t 1 t 2 t i X scores N p 1 p 2 p i loadings 2012-10-12 MVA Calibration 2007 H. Antti 12

PCA of X (spectra) 4 PCs significant according to cross validation 2 PCs significant according to eigenvalue (>2) After two PCs 97.8 % of the variation in X is described and 97.3 % of the variation in X can be predicted according to cross validation. Hence, we are describing the main part of the variation with two PCs and set for two PCs for interpretation of the systematic variation in X. 2012-10-12 MVA Calibration 2007 H. Antti 13

Interpretation of scores (t1/t2) (33, 33, 33) 100% Birch 100% Spruce 100% Pine Scores contain discriminating information regarding wood mixtures. I.e. spectra contain discriminating information regarding wood mixtures. 2012-10-12 MVA Calibration 2007 H. Antti 14

Selection of calibration & test set from scores Calibration samples (circled) span the experimental space and are evenly distributed Over the whole surface. Test samples (in squares) are evenly distributed over the surface but not outside the model limits set by the calibration samples. 2012-10-12 MVA Calibration 2007 H. Antti 15

Model limits for calibration The black lines in the score plot define the limits for the calibration model. Within these limits the model will be valid. 2012-10-12 MVA Calibration 2007 H. Antti 16

Loadings (p1 & p2) for PCA Loading (p1/wavelength) shows that the separation in the first PC depends on almost all wavelengths in the spectra. Loading (p2/wavelength) shows that the separation in the second PC depends on early wavelengths in the spectra. 2012-10-12 MVA Calibration 2007 H. Antti 17

DModX for the 30 samples in X DModX for the 30 samples don t suggest any extreme outliers. 2012-10-12 MVA Calibration 2007 H. Antti 18

Calibration and Validation 1 2 3 våglängder 1 (variabler) K X PLS Y Model N 1 A 1 K Test set prediction Y test Validation 2012-10-12 MVA Calibration 2007 H. Antti 19

Estimate/Prediction Estimate: Fit of model to calibration samples Prediction: Prediction of test samples (not included in model) 35 Measured using standard method observed 30 25 20 15 10 5 The model must be tested with external samples in order to make sure that it works in a real situation! 0 0 5 10 15 20 25 30 35 predicted Predicted by PLS model 2012-10-12 MVA Calibration 2007 H. Antti 20

Calculation of calibration model (PLS) Cross validation gives 3 significant components R2X= 0.996, R2Y= 0.979, Q2= 0.874 Q2 increases significantly for every new component added 2012-10-12 MVA Calibration 2007 H. Antti 21

Interpretation of scores for the PLS model t1/u1 correlation X/Y in 1st comp. t1/u1 correlation X/Y in 2nd comp. t1/t2 overview of sample variation in X. u1/u2 overview of sample variation in Y. 2012-10-12 MVA Calibration 2007 H. Antti 22

DModX, DModY for calibration samples No extreme outliers in X nor Y space! 2012-10-12 MVA Calibration 2007 H. Antti 23

Prediction of test samples Validation of the model by prediction of the 10 test samples RMSEP is the average prediction error in the same unit as Y. RMSEP = sqrt(press/n) 2012-10-12 MVA Calibration 2007 H. Antti 24

DModX, DModY for test samples No extreme outliers in X nor Y space! 2012-10-12 MVA Calibration 2007 H. Antti 25

Summary of Calibration Model High R2, Q2 Good correlation between X and Y (t/u) No outliers ( scores, DModX,Y) Good predictions of external samples (test set) Conclusion We have a model that can be used for prediction of unknown samples (within the model limits). 2012-10-12 MVA Calibration 2007 H. Antti 26

Prediction of unknown samples Spectra for five unknown samples were used to predict the mixtures of the three wood species (Pine, Spruce, Birch) Predictions The sum of the predicted values for the three wood species is close to 100 %. This comes from the properties of the experimental design (closure). 2012-10-12 MVA Calibration 2007 H. Antti 27

DModX and Scores for unknown samples DModX + scores imply that sample 23 doesn t really fit the model. Care should be taken in terms of the reliability of the prediction of sample 23. 2012-10-12 MVA Calibration 2007 H. Antti 28

Application of the example - Mix NIR control Raw material Process Product 2012-10-12 MVA Calibration 2007 H. Antti 29

Conclusion - Multivariate calibration Multivariate calibration gives robust models that can separate systematic variation from noise. Multivariate calibration uses many variables for calibration. Multivariate calibration is based on projection methods (PCA, PLS) Replace traditional method with a new faster, simpler, cheaper,. method (spectroscopy). Selection of calibration and test samples (PCA) Correlation X/Y (PLS) An absolute must to validate model with external samples Prediction of unknowns once the model has been validated and is reliable. 2012-10-12 MVA Calibration 2007 H. Antti 30