Optimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties

Similar documents
Eyal Ben-Dor Department of Geography Tel Aviv University

The role of soil spectral library for the food security issue

ANALYSIS OF LARGE SCALE SOIL SPECTRAL LIBRARIES

Chemometrics: Classification of spectra

Multivariate Calibration with Robust Signal Regression

Second EARSeL Workshop on Imaging Spectroscopy, Enschede, 2000

Application of near-infrared (NIR) spectroscopy to sensor based sorting of an epithermal Au-Ag ore

CS229 Final Project. Wentao Zhang Shaochuan Xu

Corresponding Author: Jason Wight,

Hurricane Prediction with Python

Course in Data Science

Visible-near infrared spectroscopy to assess soil contaminated with cobalt

BASICS OF CHEMOMETRICS

PRECISION AGRICULTURE APPLICATIONS OF AN ON-THE-GO SOIL REFLECTANCE SENSOR ABSTRACT

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

The prediction of house price

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Chemometrics. 1. Find an important subset of the original variables.

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

CS 6375 Machine Learning

Machine Learning 11. week

Application of Raman Spectroscopy for Detection of Aflatoxins and Fumonisins in Ground Maize Samples

Basics of Multivariate Modelling and Data Analysis

Multivariate data-modelling to separate light scattering from light absorbance: The Optimized Extended Multiplicative Signal Correction OEMSC

Near Infrared reflectance spectroscopy (NIRS) Dr A T Adesogan Department of Animal Sciences University of Florida

NIR Spectroscopy Identification of Persimmon Varieties Based on PCA-SVM

Lineshape fitting of iodine spectra near 532 nm

Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees

The Future of Spectroscopy

Computational Genomics

Structure-Activity Modeling - QSAR. Uwe Koch

PREDICTION OF SOIL ORGANIC CARBON CONTENT BY SPECTROSCOPY AT EUROPEAN SCALE USING A LOCAL PARTIAL LEAST SQUARE REGRESSION APPROACH

Chemometrics. Classification of Mycobacteria by HPLC and Pattern Recognition. Application Note. Abstract

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Chunmei Chen A,B and Donald L Sparks A. Delaware, Newark, DE 19711, USA.

EO-1 SVT Site: TUMBARUMBA, AUSTRALIA

Performance of spectrometers to estimate soil properties

Use of Near Infrared Spectroscopy to Determine Chemical Properties of Forest Soils. M. Chodak 1, F. Beese 2

Deep Learning. Convolutional Neural Networks Applications

Seismic source modeling by clustering earthquakes and predicting earthquake magnitudes

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature

Issues and Techniques in Pattern Classification

Rapid Compositional Analysis of Feedstocks Using Near-Infrared Spectroscopy and Chemometrics

A COMPARISON BETWEEN DIFFERENT PIXEL-BASED CLASSIFICATION METHODS OVER URBAN AREA USING VERY HIGH RESOLUTION DATA INTRODUCTION

SATELLITE REMOTE SENSING

Intelligence and statistics for rapid and robust earthquake detection, association and location

Data Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction

Generalized Least Squares for Calibration Transfer. Barry M. Wise, Harald Martens and Martin Høy Eigenvector Research, Inc.

Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information.

Predicting Africa Soil Properties Using Machine Learning Techniques

CHEMISTRY. CHEM 0100 PREPARATION FOR GENERAL CHEMISTRY 3 cr. CHEM 0110 GENERAL CHEMISTRY 1 4 cr. CHEM 0120 GENERAL CHEMISTRY 2 4 cr.

CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Sparse Proteomics Analysis (SPA)

ASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful

CS7267 MACHINE LEARNING

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

ABB Remote Sensing Atmospheric Emitted Radiance Interferometer AERI system overview. Applications

Soil surface illumination at micro-relief scale and soil BRDF data collected by a hyperspectral camera

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Holdout and Cross-Validation Methods Overfitting Avoidance

Change Detection Over Sokolov Open Pit Mine Areas, Czech Republic, Using Multi Temporal HyMAP Data ( )

HYPERSPECTRAL IMAGING

9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients

CS534 Machine Learning - Spring Final Exam

GLOBAL SYMPOSIUM ON SOIL ORGANIC CARBON, Rome, Italy, March 2017

Reduced Order Greenhouse Gas Flaring Estimation

Laboratory, field and airborne spectroscopy for monitoring organic carbon content in agricultural soils

COMPLEX PRINCIPAL COMPONENT SPECTRA EXTRACTION

Machine Learning and Data Mining. Linear regression. Kalev Kask

Detection of Chlorophyll Content of Rice Leaves by Chlorophyll Fluorescence Spectrum Based on PCA-ANN Zhou Lina1,a Cheng Shuchao1,b Yu Haiye2,c 1

NASA s Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) AVIRIS: PEARL HARBOR, HAWAII

c 4, < y 2, 1 0, otherwise,

A MULTIVARIATE REGRESSION ANALYSIS FOR DERIVING ENGINEERING PARAMETERS OF EXPANSIVE SOILS FROM SPECTRAL REFLECTANCE

Functional Preprocessing for Multilayer Perceptrons

SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

Latent Variable Methods Course

6.036 midterm review. Wednesday, March 18, 15

Application of Raman Spectroscopy for Noninvasive Detection of Target Compounds. Kyung-Min Lee

WILEY STRUCTURAL HEALTH MONITORING A MACHINE LEARNING PERSPECTIVE. Charles R. Farrar. University of Sheffield, UK. Keith Worden

Metric-based classifiers. Nuno Vasconcelos UCSD

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Active and Semi-supervised Kernel Classification

Anomaly (outlier) detection. Huiping Cao, Anomaly 1

A Service Architecture for Processing Big Earth Data in the Cloud with Geospatial Analytics and Machine Learning

Remote sensing of sealed surfaces and its potential for monitoring and modeling of urban dynamics

Cross model validation and multiple testing in latent variable models

Spectroscopy in Transmission

Introduction. Chapter 1

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

Chemometrics The secrets behind multivariate methods in a nutshell!

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari

Electrical and Computer Engineering Department University of Waterloo Canada

Nearest Correlation Louvain Method for Fast and Good Selection of Input Variables of Statistical Model

Mutual information for the selection of relevant variables in spectrometric nonlinear modelling

Remote Sensing Calibration of Casey Lake and Silver Lake

Gaussian Models

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Transcription:

Optimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties

Soil Spectroscopy Extracting chemical and physical attributes from spectral data Execution from various platforms Handheld, Field, Airborne, Satellite Endless potential applications Agriculture, Environmental, Health

Modeling Soil Properties Ben-Dor, E., & Banin, A. (1995). Near-infrared analysis as a rapid method to simultaneously evaluate several soil properties. Soil Science Society of America Journal, 59(2), 364-372. SSA Specific Surface Area Carbonates CaCO 3 HIGF Hygroscopic Water CEC Cation Exchange Capacity OM Organic Matter Clay Clay Size Fraction

Chemical and Physical Chromophores 60% Cellular Structure Clays 50% Reflectance 40% 30% 20% 10% 0% 420 520 620 720 820 920 1020 1120 1220 1320 1420 1520 1620 1720 1820 1920 2020 2120 2220 2320 Wavelength Chlorophyll Absorption Concrete Soil Grass OH Water Absorptions Specific Absorptions Features Chemistry Composition Absorption Intensity Concentration Overall reflectance Physical Properties

Big Data Modeling Spectral data is Multivariate hundreds and thousands of bands Linear and non-linear spectral ranges Tens and hundreds of samples Data from multiple origins

Supervised Machine Learning Function Fitting (MLR, PCA-R, PLS-R, ANN) Classification (K-means, SVM, Nearest Neighbor)

Multiple Preprocessing Choices

Partial Least Squares Regression (PLS-R) Dimension rotation in covariance to the modeled properties' values New factors (latent variables) as predictors with explained variance Capability to deal with multivariate data (spectra) without overfitting

How to Apply The Unscrumbler SPSS Matlab R Drawbacks Limited output Limited configuration Requires Programming knowledge No Automation for Pre-processing Solution PARACUDA

PARACUDA PARACUDA Engine All Possibilities Approach (APA) spectral manipulation Cloud based Airborne or Field data modeling Easy and Automatic utilization µ PLS/ANN λ

PARACUDA - the drawbacks ONE model for ONE preprocessing sequence No spectral assignments No multi-threading No transformation to modeled attributes Limited validation technique Solution PARACUDA II

PARACUDA II Main Structure Spectral Data Chemical Data Validation

Transformations to modeled values (chemistry) Closer to Normal Distribution Outlier detection Multiple algorithms (box-cox, Log x, sqrt) Normality test

PARACUDA II Preprocessing Module

Observing Preprocessing with Correlograms

Validation Techniques K-fold / leave-one-out cross validation Internal validation (75% Calibration, 25% Test) Internal validation * 256

Validation 1 Model Population 1:1 line for R 2 Cal >R 2 Test Evaluating model clusters

Validation 1 Model Population 256 validation iterations Analysis of models population Reporting both the population performance and best model Results Comparison

Validation 2 Best Available Model Best available model out of 256 iterations Capability to apply on any given data (point and image)

Validation 2 Best Available Model Extremely high performance High significance Results Comparison No overfitting

Validation 3 - Spectral Assignments Sensitivity Ratio for every wavelength Wavelength Importance Model Evaluation Bands with high SR High importance to developed model

Using PARACUDA II Samples How to use Create an Excel file with Spectra + Modeled attributes Run PARACUDA II (one click operation) Operation time ~ 5 minutes / modeled property (with 100 ASD samples) Who can use Anyone Modeled Data Spectral Data

Conclusions Extremely fast operation Parallel programming (multi-threading + GPU) Statistical transformations to modeled attributes Sophisticated APA preprocessing Internal iterative validation New statistics Spectral assignments Best available model One-click operation

Questions? Email: Nimrod.RSLab@gmail.com