Virtual Sensors and Large-Scale Gaussian Processes


Virtual Sensors and Large-Scale Gaussian Processes. Ashok N. Srivastava, Ph.D., Principal Investigator, IVHM Project; Group Lead, Intelligent Data Understanding. ashok.n.srivastava@nasa.gov. Coauthors: Kamalika Das, Ph.D., and Santanu Das, Ph.D.

NASA Data Systems

- Earth and Space Science: the Earth Observing System generates ~21 TB of data per week; NASA Ames simulations generate 1-5 TB per day.
- Aeronautical Systems: a distributed archive growing at 100K flights per month, with 2M flights already stored.
- Exploration Systems: the Space Shuttle and International Space Station downlink about 1.5 GB per day.

Developing Virtual Sensors

Virtual Sensors predict the value of one sensor measurement by learning the nonlinear correlations between its values and potentially hundreds of other sensor measurements.

Space Shuttle example: detecting anomalies in the Main Propulsion System (broken valve).

B. Matthews, A. N. Srivastava, et al., "Multidimensional Anomaly Detection on the Space Shuttle Main Propulsion System: A Case Study," submitted to AIAA Journal on Aerospace Computing, Information, and Communication, 2010.
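
To make the virtual-sensor idea concrete, here is a minimal sketch: hold out one channel, train a nonlinear regressor on the remaining channels, and use the fitted model as a stand-in for the held-out sensor. The synthetic data, variable names, and the choice of scikit-learn's GradientBoostingRegressor are illustrative assumptions, not the flight implementation.

```python
# Minimal virtual-sensor sketch: learn one sensor channel from the others.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
readings = rng.normal(size=(5000, 20))            # 5000 samples x 20 channels (synthetic)
# Make channel 7 a nonlinear function of other channels, plus sensor noise:
readings[:, 7] = np.tanh(readings[:, 2]) * readings[:, 11] + 0.05 * rng.normal(size=5000)

X = np.delete(readings, 7, axis=1)                # every channel except the target
y = readings[:, 7]                                # channel the virtual sensor replaces

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
virtual_sensor = GradientBoostingRegressor().fit(X_tr, y_tr)

y_hat = virtual_sensor.predict(X_te)              # estimate of the missing channel
print("RMSE:", np.sqrt(np.mean((y_hat - y_te) ** 2)))
```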

Virtual Sensors for Estimating the Large-Scale Structure of the Universe

[Figure: photometric redshift predictions out to z = 1.10.]

- M. J. Way and A. N. Srivastava, "Novel Methods for Predicting Photometric Redshifts," Astrophysical Journal, 2006.
- L. Foster, A. Waagen, N. Aijaz, M. Hurley, A. Luis, J. Rinsky, C. Satyavolu, M. J. Way, P. Gazis, and A. N. Srivastava, "Stable and Efficient Gaussian Process Calculations," Journal of Machine Learning Research, 10(Apr):857-882, 2009.
- M. J. Way, L. Foster, P. Gazis, and A. N. Srivastava, "New Approaches to Photometric Redshift Prediction," Astrophysical Journal, 2009.

Virtual Sensors in the Earth Sciences

Detecting change in cloud cover:
- New sensors on the MODIS system (circa 1999) can detect clouds over snow and ice using the 1.6 μm band. This is difficult in the visible and thermal infrared wavelengths because of low contrast over snow- and ice-covered surfaces.
- Older sensors on the AVHRR system cannot detect cloud cover over snow and ice because of this poor contrast.
- Approach: predict the 1.6 μm channel for AVHRR using a Virtual Sensor.

[Figure: MODIS spectral measurements show a cloud signal (black); AVHRR spectral measurements show no cloud signal, over the 1980-2005 record.]

Detecting land cover change using surface reflectance measurements:
- Predict missing surface reflectance data in one sensor channel using observations from a combination of other channels.
- Create a high-quality, complete data record for use in new Earth science analyses and explorations.
- Study the residual pattern of the prediction algorithm across years in order to draw significant conclusions about change in land cover across the globe.

A. N. Srivastava, N. C. Oza, and J. Stroeve, "Virtual Sensors: Using Data Mining Techniques to Efficiently Estimate Remote Sensing Spectra," IEEE Transactions on Geoscience and Remote Sensing, Special Issue on Advanced Data Analysis, March 2005.

Prediction Methods for Virtual Sensors

Build a prediction model that offers:
- Interpretability
- Confidence in the prediction
- Scalability

Choices of regression functions:
- Linear regression
- Generalized linear models such as Elastic Nets* (which perform lasso and ridge regression simultaneously; see the sketch after this list)
- Neural networks
- Support vector machines and Gaussian Process Regression

* J. Friedman, T. Hastie, and R. Tibshirani, "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, 2010.
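
As a quick illustration of the elastic net mentioned above, the sketch below fits scikit-learn's ElasticNet, whose l1_ratio interpolates between ridge (0.0) and lasso (1.0). The synthetic data and the alpha/l1_ratio values are assumptions for demonstration only.

```python
# Hedged elastic-net sketch: combined L1 (lasso) and L2 (ridge) penalties.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]          # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=200)

# alpha scales the total penalty; l1_ratio mixes lasso (1.0) and ridge (0.0).
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("nonzero coefficients:", np.sum(model.coef_ != 0))
```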

Gaussian Process Regression

Training data: X, an n × d data matrix of observations; y, an n × 1 vector of target data. Test data: X*, an n* × d matrix of new observations. A covariance function k(·,·). Goal: predict y* corresponding to X*.

Model building:
- Train the hyperparameters on a sample of X.
- Compute the covariance matrix K (n × n).

Prediction:
- Compute the cross-covariance matrix K* (n* × n).
- Compute the mean prediction on y* using $\bar{y}_* = K_* (K + \sigma_n^2 I)^{-1} y$.
- Compute the variance of the prediction using $\operatorname{cov}(y_*) = K_{**} - K_* (K + \sigma_n^2 I)^{-1} K_*^\top$, where $K_{**}$ is the n* × n* covariance among the test points.

Algorithm analysis:
- Storage complexity: storing the covariance matrix is O(n²).
- Time complexity: computing the matrix inversion is O(n³).
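
The sketch below implements exactly these mean and variance formulas with a squared-exponential kernel, using a Cholesky factorization for the O(n³) solve. The kernel choice and the hyperparameter values (ell, sf, noise) are illustrative assumptions.

```python
# Minimal exact GP regression: predictive mean and covariance.
import numpy as np

def sq_exp_kernel(A, B, ell=1.0, sf=1.0):
    """Squared-exponential covariance k(a,b) = sf^2 exp(-||a-b||^2 / (2 ell^2))."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_predict(X, y, Xs, noise=0.1):
    K   = sq_exp_kernel(X, X) + noise**2 * np.eye(len(X))  # n x n
    Ks  = sq_exp_kernel(Xs, X)                             # n* x n cross-covariance
    Kss = sq_exp_kernel(Xs, Xs)                            # n* x n*
    L = np.linalg.cholesky(K)                              # O(n^3) factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # (K + s^2 I)^-1 y
    mean = Ks @ alpha                                      # K* (K + s^2 I)^-1 y
    V = np.linalg.solve(L, Ks.T)
    cov = Kss - V.T @ V                                    # K** - K* (K + s^2 I)^-1 K*^T
    return mean, cov

X  = np.linspace(0, 5, 50)[:, None]
y  = np.sin(X).ravel() + 0.1 * np.random.default_rng(2).normal(size=50)
Xs = np.linspace(0, 5, 200)[:, None]
mean, cov = gp_predict(X, y, Xs)
print(mean.shape, np.diag(cov)[:3])    # predictive mean and pointwise variances
```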

Computational Challenges

Subset of Regressors (Wahba, 1990): approximate the predictive mean using only m ≪ n regressor points,

$\bar{y}_* = K_{*m} \left( K_{mn} K_{nm} + \sigma_n^2 K_{mm} \right)^{-1} K_{mn} y$,

where $K_{mm}$ is the covariance among the m selected points, $K_{mn} = K_{nm}^\top$ is their cross-covariance with all n training points, and $K_{*m}$ is the cross-covariance with the test points.

- Memory: storing the covariance matrix is O(nm).
- Time: solving the linear systems is O(nm²).
- Can be numerically unstable.
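
A minimal, hedged sketch of the Subset of Regressors mean follows; the random choice of the m points, the kernel, and the noise level are assumptions for illustration. Note the m × m system matrix A, which is exactly where the numerical instability mentioned above can arise.

```python
# Subset of Regressors (SoR) sketch: only m x m systems are solved.
import numpy as np

def kern(A, B, ell=1.0):
    """Squared-exponential covariance matrix between row sets A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def sor_mean(X, y, Xs, m=20, noise=0.1):
    idx = np.random.default_rng(3).choice(len(X), m, replace=False)
    Xm  = X[idx]                          # the m "regressor" points
    Kmn = kern(Xm, X)                     # m x n
    Kmm = kern(Xm, Xm)                    # m x m
    Ksm = kern(Xs, Xm)                    # n* x m
    A = Kmn @ Kmn.T + noise**2 * Kmm      # m x m; may be badly conditioned
    return Ksm @ np.linalg.solve(A, Kmn @ y)

X  = np.linspace(0, 5, 1000)[:, None]
y  = np.sin(X).ravel()
Xs = np.linspace(0, 5, 5)[:, None]
print(sor_mean(X, y, Xs))                 # approximate predictive mean
```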

Cures for Numerical Instability

Approach:
1. Select columns to make K₁ well conditioned.
2. Use a stable technique for the least squares problem, such as QR factorization (the V method).
3. Requirement: maintain O(nm) memory use and O(nm²) efficiency.

Column selection:
1. Use Cholesky factorization with pivoting to partially factor K.
2. Pivoting selects appropriate columns for K₁.
3. K₁ will be well conditioned if cond(K₁) is O(condition of the optimal low-rank approximation).

Stable GP

- Approximate $K \approx V V^\top$ by partial Cholesky factorization with pivoting, where $V$ is n × m and its leading block $V_{11}$ is m × m (lower triangular).
- The predicted mean can be rewritten, via the matrix inversion lemma, as $\bar{y}_* = K_* (V V^\top + \sigma_n^2 I)^{-1} y = \frac{1}{\sigma_n^2} K_* \left( y - V (V^\top V + \sigma_n^2 I_m)^{-1} V^\top y \right)$.
- This inverts an m × m instead of an n × n matrix.
- The method is numerically stable.
- The method can be faster and needs less memory.

L. Foster, A. Waagen, N. Aijaz, M. Hurley, A. Luis, J. Rinsky, C. Satyavolu, M. J. Way, P. Gazis, and A. N. Srivastava, "Stable and Efficient Gaussian Process Calculations," Journal of Machine Learning Research, 10(Apr):857-882, 2009.
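
The sketch below illustrates the idea: a greedy pivoted partial Cholesky builds the rank-m factor V, and the matrix inversion lemma reduces the n × n solve to an m × m one. It is a minimal illustration under assumed names, kernel, and parameters, not the code from Foster et al.

```python
# V-method sketch: pivoted partial Cholesky K ~ V V^T, then Woodbury.
import numpy as np

def kern(A, B, ell=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def partial_pivoted_cholesky(K, m, jitter=1e-10):
    """Greedy pivoted Cholesky: returns V (n x <=m) with K ~ V V^T."""
    n = K.shape[0]
    V = np.zeros((n, m))
    d = np.diag(K).copy()                  # residual diagonal of K - V V^T
    for j in range(m):
        p = int(np.argmax(d))              # pivot on largest residual variance
        if d[p] < jitter:                  # numerical rank reached: stop early
            return V[:, :j]
        V[:, j] = (K[:, p] - V @ V[p]) / np.sqrt(d[p])
        d -= V[:, j] ** 2
    return V

def stable_gp_mean(X, y, Xs, m=25, noise=0.1):
    K, Ks = kern(X, X), kern(Xs, X)
    V = partial_pivoted_cholesky(K, m)
    s2 = noise**2
    B = V.T @ V + s2 * np.eye(V.shape[1])  # well-conditioned m x m matrix
    # Woodbury: (V V^T + s2 I)^-1 y = (y - V B^-1 V^T y) / s2
    return Ks @ (y - V @ np.linalg.solve(B, V.T @ y)) / s2

X  = np.linspace(0, 5, 2000)[:, None]
y  = np.sin(X).ravel()
Xs = np.linspace(0, 5, 5)[:, None]
print(stable_gp_mean(X, y, Xs))            # mean via an m x m solve only
```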

GP-V: Scaling to 3 Million Points

[Figure: run time (seconds) versus number of data points (log scale, 10² to 10⁷) on simulated data, comparing standard GP with GP-V.]

- Input data dimension = 5.
- Number of sample points = 3 million.
- Run time = time to build the model + time to evaluate 500 test points.
- Maximum rank = 25 (used for GP-V).
- Hyperparameters are trained on 100 sample points.
- Accuracy does not degrade with the approximation.

Stable GP Results

- Based on 100 runs on NH₃ laser data: d = 34, n = 1166, maximum rank = 340.
- Samples for hyperparameter training = 300.
- With the low-rank matrix inversion approximation using pivoting, Stable GP performed close to standard GP.

A. N. Srivastava and S. Das, "Detection and Prognostics on Low-Dimensional Systems," IEEE Transactions on Systems, Man, and Cybernetics, Part C, 39(1), 2009.

Conclusion

- A new Gaussian Process regression algorithm for Virtual Sensors in Earth science data.
- Demonstrated a method that scales from 10² points to 10⁶ points.
- Scalability depends on: the number of dimensions of the input data, the number of modes in the input data, and the choice of clustering algorithm.
- Accuracy depends on: the choice of covariance function, the choice of the number of clusters and the entropy threshold, and the sparsity of the covariance matrix constructed from the data.

For more information please see: dashlink.arc.nasa.gov/member/ashok

APPENDIX

Gaussian Process Regression

Gaussian Process regression uses Bayesian inference under an additive Gaussian noise assumption to learn a function on a given data set, with a confidence measure*. In the weight-space view, the model is $y = X w + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma_n^2 I)$, where:

- Likelihood function: $p(y \mid X, w) = \mathcal{N}(X w, \sigma_n^2 I)$.
- Gaussian prior over parameters: $w \sim \mathcal{N}(0, \Sigma_p)$.
- Inference is the posterior distribution over the weights, given by $p(w \mid X, y) = \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2} A^{-1} X^\top y,\; A^{-1}\right)$, where $A = \sigma_n^{-2} X^\top X + \Sigma_p^{-1}$.
- The predictive distribution is $p(f_* \mid x_*, X, y) = \mathcal{N}\!\left(\tfrac{1}{\sigma_n^2} x_*^\top A^{-1} X^\top y,\; x_*^\top A^{-1} x_*\right)$.

* C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
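
A short numerical sketch of this weight-space view follows, computing the closed-form posterior and predictive distribution above; the data, prior covariance Sigma_p, and noise level s_n are illustrative assumptions.

```python
# Bayesian linear regression: Gaussian likelihood + Gaussian prior on w.
import numpy as np

rng = np.random.default_rng(4)
n, d, s_n = 100, 3, 0.2
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -0.5, 2.0])
y = X @ w_true + s_n * rng.normal(size=n)

Sigma_p = np.eye(d)                                # prior: w ~ N(0, Sigma_p)
A = X.T @ X / s_n**2 + np.linalg.inv(Sigma_p)      # posterior precision
w_mean = np.linalg.solve(A, X.T @ y) / s_n**2      # posterior mean of w
A_inv = np.linalg.inv(A)                           # posterior covariance of w

x_star = np.array([0.5, 1.0, -1.0])                # a new test input
f_mean = x_star @ w_mean                           # predictive mean of f*
f_var = x_star @ A_inv @ x_star                    # predictive variance of f*
print(w_mean, f_mean, f_var)
```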

Low-Rank Approximations

- Numerical approximation techniques exist, such as the Subset of Regressors, QR decomposition, and the V method.
- Numerical instability can be a problem.
- Solution: Stable GP (the V formulation, using Cholesky decomposition with pivoting).
- The V formulation provides an extremely scalable and numerically stable method to compute Gaussian Process Regression for arbitrary kernels.