Multivariate Calibration with Robust Signal Regression


Multivariate Calibration with Robust Signal Regression
Bin Li and Brian Marx (Louisiana State University), Somsubhra Chakraborty (Indian Institute of Technology Kharagpur), and David C. Weindorf (Texas Tech University). July 31, 2018.

Outline
Motivating example. Recap: Penalized Signal Regression (PSR). Generalized Huber loss and robust PSR. Simulation and empirical results. Related issues.

A Soil Data Example
Data: 675 soil samples collected from CA, NE, and TX in 2014, with 225 samples from each location. All soil samples were scanned using a portable VisNIR spectroradiometer with a spectral range of 350 to 2500 nm. Ten physicochemical properties were measured: soil cation exchange capacity (CEC), total nitrogen, electrical conductivity (EC), total carbon, loss on ignition (LOI), soil organic matter (SOM), clay, sand, silt, and soil pH. LOI and SOM are highly correlated, so LOI was removed. Objective: use the VisNIR spectra to predict the soil properties.

Sample spectra
[Figure: thirty sample spectra (first derivative) for the soil data, plotted against wavelength (nm).]

Penalized Signal Regression
PSR: P. Eilers and B. D. Marx (Statistical Science, 1996). PSR minimizes the objective

    S(α) = ‖y − XBα‖^2 + λ‖Dα‖^2,

where the difference matrix D penalizes differences among adjacent elements of α. Writing U = XB, the closed-form solution is

    α̂ = (U′U + λD′D)^{-1} U′y.

Response y: soil property indicator (m × 1 column vector, m = 675). Input X: VisNIR spectra, an m × p matrix with p = 214. B-spline basis matrix B: p × n, with n = 100. Difference matrix D: (n − d) × n, where d is the order of the difference penalty (d = 0, 1, 2, 3). Coefficient vector α: n × 1.
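A minimal numerical sketch of the PSR solve, assuming the B-spline basis matrix B has already been built (the function name psr_fit and the use of numpy are illustrative, not part of the original slides):

```python
import numpy as np

def psr_fit(X, y, B, lam, d=2):
    """Penalized signal regression: minimize ||y - X B a||^2 + lam * ||D a||^2.

    X : (m, p) spectra, B : (p, n) B-spline basis, lam : penalty weight,
    d : order of the difference penalty. Returns the (n,) coefficient vector.
    """
    U = X @ B                               # m x n effective regressors
    n = B.shape[1]
    D = np.diff(np.eye(n), n=d, axis=0)     # (n - d) x n difference matrix
    return np.linalg.solve(U.T @ U + lam * (D.T @ D), U.T @ y)
```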

Q-Q Plot of PSR Residuals
[Figure: normal quantile-quantile plots of the residuals from the PSR models for nine soil property indicators.]

With vs. Without Outliers on PSR and Robust PSR
[Figure: estimated coefficient curves across wavelength (nm), fitted with and without outliers, and predicted vs. measured values, for PSR (top row) and robust PSR (rPSR, bottom row).]

Generalized Huber Loss
The generalized Huber loss is

    ρ_η(e) = e^2                    if |e| < K,
    ρ_η(e) = K^2 + 2ηK(|e| − K)     if |e| ≥ K,

with 0 ≤ η ≤ 1.

[Figure: ρ_η(e) for η = 1, 0.5, and 0, plotted against e and compared with the squared loss e^2.]
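A direct sketch of this loss in code (vectorized over residuals; the function name gen_huber is illustrative):

```python
import numpy as np

def gen_huber(e, K, eta=1.0):
    """Generalized Huber loss: quadratic inside |e| < K, linear with slope 2*eta*K outside."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) < K, e**2, K**2 + 2.0 * eta * K * (np.abs(e) - K))
```

Setting η = 1 gives a Huber-type loss, while η = 0 keeps the loss constant at K^2 beyond the cutoff, bounding the influence of large residuals.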

Robust Penalized Signal Regression (rPSR)
The rPSR estimator minimizes

    Q(α) = Σ_{i=1}^m ρ_η(y_i − U_i′α) + λ α′D_d′D_d α,

which can be represented as a difference of two convex functions, Q(α) = h_1(α) − h_2(α), where, with e_i = y_i − U_i′α,

    h_1(α) = Σ_{i=1}^m e_i^2 + λ α′D′Dα,
    h_2(α) = Σ_{i=1}^m I(|e_i| > K) [e_i^2 + 2ηK(K − |e_i|) − K^2].

Difference Convex Programming
Difference-of-convex (D.C.) programming: An and Tao (1997). Consider minimizing a nonconvex objective function g(w) = g_1(w) − g_2(w), where both g_1(w) and g_2(w) are convex in w. D.C. programming constructs a sequence of convex subproblems and solves them iteratively. Given the solution w^{(m−1)} of the (m−1)-th subproblem, the m-th subproblem solves

    w^{(m)} = argmin_w { g_1(w) − g_2(w^{(m−1)}) − ⟨w − w^{(m−1)}, ∇g_2(w^{(m−1)})⟩ }
            = argmin_w { g_1(w) − ⟨w, ∇g_2(w^{(m−1)})⟩ },

where ∇g_2(w^{(m−1)}) is a subgradient of g_2(w) at w^{(m−1)}.
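A toy illustration of the D.C. iteration on a scalar problem (not the rPSR objective itself): minimize g(w) = w^2 − 2|w|, with g_1(w) = w^2 and g_2(w) = 2|w|, so the linearized subproblem has a closed form:

```python
import numpy as np

def dca_toy(w0, n_iter=20, tol=1e-8):
    """D.C. iteration for g(w) = w^2 - 2|w|, which has minima at w = -1 and w = +1.

    Each subproblem minimizes g1(w) - w * s, where s is a subgradient of g2 at
    the previous iterate (s = 2 * sign(w_prev)); its minimizer is w = s / 2.
    """
    w = float(w0)
    for _ in range(n_iter):
        s = 2.0 * np.sign(w)          # subgradient of g2(w) = 2|w|
        w_new = s / 2.0               # argmin_w  w^2 - w * s
        if abs(w_new - w) < tol:
            break
        w = w_new
    return w

print(dca_toy(0.3))   # 1.0: each iterate solves a convex surrogate of the nonconvex g
```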

Robust PSR Algorithm
Minimizing the rPSR objective reduces to minimizing a sequence of PSR problems with the adjusted responses Y_A = (y_1^A, …, y_m^A)′, where

    y_i^A = y_i − I(|e_i| > K)[e_i − ηK sign(e_i)],   i = 1, …, m.

Only observations whose residuals exceed K in absolute value are adjusted. If K is greater than all the residuals |e_i|, the rPSR and PSR solutions coincide. (A code sketch of the full iteration follows the next slide.)

Robust PSR Algorithm (cont.)
The initial α̂ is the PSR estimate (with the same value of λ). The algorithm stops when max_{1 ≤ j ≤ n} |(α̂_j^cur − α̂_j^pre)/α̂_j^pre| < 10^{-6}. The cutoff value K is chosen by the 1.5 × IQR rule applied to the residuals at each iteration. Optimal values of λ and η are found by a grid search based on cross-validation performance. The rPSR algorithm usually converges within just a few iterations.
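Putting the two previous slides together, a minimal sketch of the rPSR iteration (it reuses the psr_fit helper sketched earlier; the exact form of the 1.5 × IQR cutoff and the handling of near-zero coefficients in the stopping rule are assumptions):

```python
import numpy as np

def rpsr_fit(X, y, B, lam, eta=1.0, max_iter=100, tol=1e-6):
    """Robust PSR: iterate PSR fits on the adjusted response Y_A."""
    U = X @ B
    alpha = psr_fit(X, y, B, lam)                  # initialize from the ordinary PSR estimate
    for _ in range(max_iter):
        e = y - U @ alpha                          # current residuals
        q1, q3 = np.percentile(e, [25, 75])
        iqr = q3 - q1
        K = max(abs(q1 - 1.5 * iqr), abs(q3 + 1.5 * iqr))    # symmetric cutoff from Tukey fences (assumed form)
        flag = np.abs(e) > K
        y_adj = y - flag * (e - eta * K * np.sign(e))        # adjusted response Y_A
        alpha_new = psr_fit(X, y_adj, B, lam)                # PSR on the adjusted response
        if np.max(np.abs((alpha_new - alpha) / alpha)) < tol:   # relative-change stopping rule
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha
```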

Simulation Studies
Underlying model: Y_i = f(x_i) + ε_i, where f(x_i) is the PSR fitted value on CEC with λ = 10^5.
Three error distributions for ε_i:
  Normal: ε_i ~ N(0, 2.39^2).
  Mixed normal: ε_i ~ 0.95 N(0, 2.39^2) + 0.05 N(0, 23.9^2).
  Slash: ε_i ~ N(0, 1)/U(0, 1).
Three levels of η are considered: 0, 0.5, and 1. 10-fold CV is used to find the optimal value of λ. 50 random splits of the data: 75% training and 25% test sets. Comparative RMSE and MAE are reported on the test samples.
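For concreteness, the three error distributions can be generated as follows (a sketch; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 675

eps_normal = rng.normal(0.0, 2.39, size=m)

# mixed normal: 95% from N(0, 2.39^2), 5% from N(0, 23.9^2)
contaminated = rng.random(m) < 0.05
eps_mixed = np.where(contaminated,
                     rng.normal(0.0, 23.9, size=m),
                     rng.normal(0.0, 2.39, size=m))

# slash distribution: N(0, 1) / U(0, 1), which has very heavy tails
eps_slash = rng.normal(0.0, 1.0, size=m) / rng.uniform(0.0, 1.0, size=m)
```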

Simulation Results
[Figure: boxplots of comparative RMSE and comparative MAE on the test samples for PSR and rPSR (η = 1, 0.5, 0) under normal, mixed normal, and slash errors.]

Simulation Results (cont.)
Average test RMSE and MAE based on 50 replications.

RMSE      PSR      rPSR (η=1)   rPSR (η=0.5)   rPSR (η=0)
Normal    0.696    0.694        0.699          0.709
Mixed     1.422    0.897        0.820          0.800
Slash     13.569   1.728        1.452          1.310

MAE       PSR      rPSR (η=1)   rPSR (η=0.5)   rPSR (η=0)
Normal    0.440    0.437        0.442          0.446
Mixed     0.885    0.545        0.508          0.502
Slash     7.646    1.022        0.850          0.792

Model Stability
Three error distributions as above: normal, mixed normal, and slash. Three levels of η are considered: 0, 0.5, and 1. PSR and rPSR are fitted on 95% of a random sample with λ = 10^5, over 20 random splits of the data. Model stability of the coefficient estimates is evaluated by the L2-distance standard deviation (L2DSD) criterion,

    L2DSD = SD({ ‖β̂^(i) − β̄‖_2 }_{i=1}^{20}),

where β̄ is the average of the β̂ over the 20 replications. Model stability of the predictions is evaluated by the SD of the predicted values over all 675 samples.
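A small sketch of the L2DSD computation, assuming beta_hats is a 20 × n array holding the coefficient estimates from the 20 replications:

```python
import numpy as np

def l2dsd(beta_hats):
    """L2-distance standard deviation: SD of ||beta_hat_i - beta_bar||_2 over replications."""
    beta_bar = beta_hats.mean(axis=0)                       # average coefficient vector
    dists = np.linalg.norm(beta_hats - beta_bar, axis=1)    # L2 distance of each replicate
    return dists.std(ddof=1)                                # sample standard deviation
```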

Simulation Results (cont.)
Summary of L2DSD of β̂ and SD of ŷ based on 20 replications.

L2DSD of β̂   PSR     rPSR (η=1)   rPSR (η=0.5)   rPSR (η=0)
Normal        202     200          191            186
Mixed         1247    402          359            377
Slash         2495    442          416            512

SD of ŷ       PSR     rPSR (η=1)   rPSR (η=0.5)   rPSR (η=0)
Normal        0.126   0.126        0.135          0.148
Mixed         0.361   0.150        0.148          0.152
Slash         0.728   0.275        0.224          0.237

Soil Data Study
50 random splits: 75% training and 25% test sets. Three levels of η are considered: 0, 0.5, and 1. RMSE on test samples.

Property   PSR       rPSR (η=1)   rPSR (η=0.5)   rPSR (η=0)
CEC        2.622     2.590        2.595          2.624
EC         290.2     286.7        287.9          290.1
Nitrogen   0.01848   0.01815      0.01806        0.01818
Carbon     0.1817    0.1794       0.1782         0.1795
SOM        0.3383    0.3284       0.3269         0.3288
Clay       3.240     3.117        3.138          3.204
Sand       5.425     5.362        5.397          5.479
Silt       4.473     4.412        4.413          4.438
pH         0.3740    0.3729       0.3731         0.3782

Soil Data Study (cont.)
MAE on test samples.

Property   PSR       rPSR (η=1)   rPSR (η=0.5)   rPSR (η=0)
CEC        1.795     1.749        1.737          1.747
EC         211.0     206.5        205.5          205.8
Nitrogen   0.01265   0.01240      0.01227        0.01232
Carbon     0.1313    0.1300       0.1287         0.1290
SOM        0.2065    0.1946       0.1924         0.1933
Clay       2.233     2.105        2.101          2.124
Sand       4.011     3.940        3.946          3.977
Silt       3.325     3.291        3.289          3.302
pH         0.2842    0.2828       0.2821         0.2855

Soil Data Study (cont.)
Compare the PSR and rPSR coefficients and identify the outliers. Data: all 675 samples are used as the training set; Y: carbon. PSR vs. the rPSR model with η = 0.5. The leading two principal components explain 79.8% of the total variance. rPSR identifies 17 outliers (about 2.5% of the samples).

[Figure: scores on the first two principal components, colored by location (NE, CA, TX); predicted vs. measured carbon; and PSR vs. rPSR coefficient curves across wavelength (nm).]

Connection With Lee and Oh's Procedure (2007)
Lee and Oh (2007) explored robust penalized regression splines using the Huber loss. They proposed an iterative fitting procedure based on the pseudo-response

    ỹ_i = ŷ_i + ψ(e_i)/2,

where ψ(·) is the first derivative of the Huber loss ρ_H(·). This pseudo-response can be shown to be equivalent to our adjusted response Y_A with η = 1:

    ỹ_i = y_i − I(|e_i| > K)[e_i − ηK sign(e_i)].

Lee and Oh's approach is theoretically supported by Cox's (1983) result, which requires ψ(·) to be twice differentiable. The proposed rPSR procedure generalizes Lee and Oh's procedure and is motivated from a different perspective.
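A quick numerical check of this equivalence (a sketch; it uses the generalized Huber loss with η = 1 from the earlier slide, whose derivative is ψ(e) = 2e for |e| < K and 2K·sign(e) otherwise):

```python
import numpy as np

def psi(e, K):
    """Derivative of the generalized Huber loss with eta = 1."""
    return np.where(np.abs(e) < K, 2.0 * e, 2.0 * K * np.sign(e))

rng = np.random.default_rng(1)
y_hat = rng.normal(size=10)
y = y_hat + rng.standard_t(df=1, size=10)      # heavy-tailed residuals
e, K, eta = y - y_hat, 1.0, 1.0

pseudo = y_hat + psi(e, K) / 2.0                              # Lee and Oh (2007) pseudo-response
adjusted = y - (np.abs(e) > K) * (e - eta * K * np.sign(e))   # rPSR adjusted response
print(np.allclose(pseudo, adjusted))                          # True
```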

References
An, L. and Tao, P. (1997). Solving a class of linearly constrained indefinite quadratic problems by D.C. algorithms. Journal of Global Optimization, 11, 253-285.
Cox, D. (1983). Asymptotics for M-type smoothing splines. The Annals of Statistics, 11(2), 530-551.
Eilers, P. and Marx, B. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89-121.
Lee, T. and Oh, H. (2007). Robust penalized regression spline fitting with application to additive mixed modeling. Computational Statistics, 22(1), 159-171.
Li, B. and Marx, B. (2018). Multivariate calibration with robust signal regression. Accepted in Statistical Modelling: An International Journal.