Space-Time Kernels. Dr. Jiaqiu Wang, Dr. Tao Cheng James Haworth University College London

Similar documents
Multitask Learning of Environmental Spatial Data

Machine Learning of Environmental Spatial Data Mikhail Kanevski 1, Alexei Pozdnoukhov 2, Vasily Demyanov 3

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Machine Learning Algorithms for GeoSpatial Data. Applications and Software Tools

Discussion About Nonlinear Time Series Prediction Using Least Squares Support Vector Machine

Spatio-Temporal Analytics of Network Data

NON-FIXED AND ASYMMETRICAL MARGIN APPROACH TO STOCK MARKET PREDICTION USING SUPPORT VECTOR REGRESSION. Haiqin Yang, Irwin King and Laiwan Chan

Support Vector Machine Regression for Volatile Stock Market Prediction

Reading, UK 1 2 Abstract

A Support Vector Regression Model for Forecasting Rainfall

Outliers Treatment in Support Vector Regression for Financial Time Series Prediction

Analysis of Multiclass Support Vector Machines

Outline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space

EM-algorithm for Training of State-space Models with Application to Time Series Prediction

arxiv: v1 [q-fin.st] 27 Sep 2007

Similarity and kernels in machine learning

Support Vector Machine For Functional Data Classification

Support Vector Machines for Classification: A Statistical Portrait

Iterative ARIMA-Multiple Support Vector Regression models for long term time series prediction

Improving forecasting under missing data on sparse spatial networks

RAINFALL RUNOFF MODELING USING SUPPORT VECTOR REGRESSION AND ARTIFICIAL NEURAL NETWORKS

From Last Meeting. Studied Fisher Linear Discrimination. - Mathematics. - Point Cloud view. - Likelihood view. - Toy examples

Heterogeneous mixture-of-experts for fusion of locally valid knowledge-based submodels

Linear Dependency Between and the Input Noise in -Support Vector Regression

Decision-Oriented Environmental Mapping with Radial Basis Function Neural Networks

Hierarchical models for the rainfall forecast DATA MINING APPROACH

Machine Learning 2010

Graphical LASSO for local spatio-temporal neighbourhood selection

Kernel Methods & Support Vector Machines

Soft Sensor Modelling based on Just-in-Time Learning and Bagging-PLS for Fermentation Processes

Linear and Non-Linear Dimensionality Reduction

Immediate Reward Reinforcement Learning for Projective Kernel Methods

Introduction to Machine Learning

Predicting Time of Peak Foreign Exchange Rates. Charles Mulemi, Lucio Dery 0. ABSTRACT

Spatio-temporal feature selection for black-box weather forecasting

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,

Two-stage Pedestrian Detection Based on Multiple Features and Machine Learning

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM

Time series prediction

Environmental Data Mining and Modelling Based on Machine Learning Algorithms and Geostatistics

Support Vector Machines.

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

Evaluation of Support Vector Machines and Minimax Probability. Machines for Weather Prediction. Stephen Sullivan

Fast Bootstrap for Least-square Support Vector Machines

Low Bias Bagged Support Vector Machines

Kernel Methods. Foundations of Data Analysis. Torsten Möller. Möller/Mori 1

SVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION

Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature

References. Lecture 7: Support Vector Machines. Optimum Margin Perceptron. Perceptron Learning Rule

SUPPORT VECTOR REGRESSION WITH A GENERALIZED QUADRATIC LOSS

MODELLING TRAFFIC FLOW ON MOTORWAYS: A HYBRID MACROSCOPIC APPROACH

Kernel Methods. Outline

Smooth Bayesian Kernel Machines

Spectral and Spatial Methods for the Classification of Urban Remote Sensing Data

Kobe University Repository : Kernel

A DATA FIELD METHOD FOR URBAN REMOTELY SENSED IMAGERY CLASSIFICATION CONSIDERING SPATIAL CORRELATION

Dynamic Time-Alignment Kernel in Support Vector Machine

Probabilistic Energy Forecasting

Spatio-temporal feature selection for black-box weather forecasting

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Pattern Recognition and Machine Learning

Discussion of Some Problems About Nonlinear Time Series Prediction Using ν-support Vector Machine

Efficient Complex Output Prediction

A Hybrid Method of CART and Artificial Neural Network for Short-term term Load Forecasting in Power Systems

Foreign Exchange Rates Forecasting with a C-Ascending Least Squares Support Vector Regression Model

Predicting Link Travel Times from Floating Car Data

A Hybrid Time-delay Prediction Method for Networked Control System


Analysis of Interest Rate Curves Clustering Using Self-Organising Maps

Discriminative Direction for Kernel Classifiers

Monsuru Adepeju 1 and Andy Evans 2. School of Geography, University of Leeds, LS21 1HB 1

Multivariate statistical methods and data mining in particle physics

Support Vector Machine for Classification and Regression

CIS 520: Machine Learning Oct 09, Kernel Methods

Accuracy of near real time updates in wind power forecasting with regard to different weather regimes

Research on fault prediction method of power electronic circuits based on least squares support vector machine

Neural network time series classification of changes in nuclear power plant processes

Estimating atmospheric visibility using synergy of MODIS data and ground-based observations

Package IDmining. October 25, 2017

A Wavelet Neural Network Forecasting Model Based On ARIMA

COMPARISON OF CLEAR-SKY MODELS FOR EVALUATING SOLAR FORECASTING SKILL

Multiple Kernel Self-Organizing Maps

Curriculum Vitae. Ran Wei

Multiple Similarities Based Kernel Subspace Learning for Image Classification

Support Vector Machine (SVM) and Kernel Methods

A new metric of crime hotspots for Operational Policing

Face Recognition Using Global Gabor Filter in Small Sample Case *

Kernel Methods. Konstantin Tretyakov MTAT Machine Learning

Statistical Learning Reading Assignments

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

GIS Ostrava 2014 Final resume

Load Forecasting Using Artificial Neural Networks and Support Vector Regression

A Brief Overview of Machine Learning Methods for Short-term Traffic Forecasting and Future Directions

Performance Analysis of Some Machine Learning Algorithms for Regression Under Varying Spatial Autocorrelation

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

An Assessment of Crime Forecasting Models

Kernel Methods. Konstantin Tretyakov MTAT Machine Learning

THE POTENTIAL OF APPLYING MACHINE LEARNING FOR PREDICTING CUT-IN BEHAVIOUR OF SURROUNDING TRAFFIC FOR TRUCK-PLATOONING SAFETY

Negatively Correlated Echo State Networks

Transcription:

Space-Time Kernels Dr. Jiaqiu Wang, Dr. Tao Cheng James Haworth University College London Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, Hong Kong, 26 th -28 th May 2010

Methods for modelling spatio-temporal data Griffith (2010) describes basic approaches 1. Multiple ARIMA Models 2. STAR Models 3. 3D Geostatistical Models 4. Panel data models with fixed and random effects 5. Spatial Filter Models 6. Kernel Methods and Machine Learning Griffith, D. A. 2010. Modeling spatio-temporal relationships: retrospect and prospect. Journal of Geographical Systems. 12(2). pp. 111-123 2

Outline of Presentation 3

Outline Introduction to kernels What is a kernel? Kernel Methods Types of kernels Review of kernels in space-time analysis Kernels in spatial analysis Kernels in temporal analysis Kernels in space-time analysis space time kernel Application of STK to temperature prediction Conclusion and Discussion 4

Introduction to kernels 5

Introduction to kernels The problem Machine Learning and Statistical Algorithms are well developed for the linear case. Real world data is often complex and nonlinear. However, many of these algorithms depend on dot products between two vectors. 6

Introduction to kernels Mercer s Theorem: Any continuous, symmetric, positive semi-definite function K(x, y) that can be expressed as a dot product in a high dimensional space is a kernel. Transforms linear algorithms. Data are mapped to high dimensional feature space where a linear solution can be found. 7

URL: http://www.youtube.com/watch?v=3licbrzprza 8

Types of kernel General purpose kernels Linear: Polynomial: Gaussian RBF: 9

Custom Kernels Kernels can be designed to suit specific problem domains Examples: Bag of words kernel Tree kernels Graph kernels 10

Kernel Methods Support Vector Machines Kernel Principal/Independent Components Analysis Canonical Correlation Analysis Spectral Clustering Gaussian Processes Fisher s linear discriminant analysis Ridge regression Many More 11

Kernels in spatial analysis 12

Kernels in spatial analysis Spatial Interpolation Kanevski et al (2007) use SVM with Gaussian kernels for reservoir porosity mapping. Kernel parameter σ (bandwidth) dictates spatial influence of training samples. Optimal Solution Under-fitting Over-fitting σσtoo too low high Kanevski M., Pozdnoukhov A., and Timonin V. Machine Learning Algorithms for Environmental Spatial Data. Theory and Case Studies. EPFL Press, Lausanne, 2007. A book with software on machine learning algorithms, 450 pages. 13

Kernels in spatial analysis Spatial Regression What about local variations in data? Custom multi-scale kernels can be employed to improve results. Case study Caesium contamination in Chernobyl Kernel Dictionaries Input data The multi-scale kernel captures large and small scale variations Weston Pozdnoukhov, J., 1999 A. Extensions And Kanevski, to the M.(2007). Support Multi-scale Vector Method. Support Ph.D. Vector Thesis, Algorithms Royal for Holloway Hot Spot Detection University and of London Modelling. Stochastic Environmental Research and Risk Assessment.22(5), pp.647-660 14

Kernels in temporal analysis 15

Kernels in temporal analysis Kernel methods are widely used in time series forecasting Travel time prediction Financial time series Environmental time series Rϋeping (2001) reviews kernels in time series analysis RBF kernel good general purpose Fourier kernel periodic component Polynomial kernel non-periodic series Ruping, S. 2001. SVM Kernels for Time Series Analysis. In Klinkenberg, R., Ruping, S., Fick, A., Henze, N., Herzog, C., Molitor, R., and Schroder, O. (eds), LLWA 01 - Tagungsband der GI-Workshop-Woche Lernen - Lehren - Wissen - Adaptivitat, pp.43-50, Dortmund, Germany. 16

The Space-time kernel 17

The Space-Time Kernel Problem Can a kernel be designed to deal with space-time series data that is: Non-stationary? Heterogeneous? Multi-scale? 18

The Space-Time Kernel We have seen that different kernels can be applied to different problems Gaussian kernels for spatial problems Fourier, Polynomial kernels for temporal applications According to the theory of convolution kernels (Haussler, 1999), the product of two kernels is also a kernel. 19

The Space-time kernel Weighting parameter Gaussian (RBF) kernel Fourier Kernel Polynomial Kernel 20

Case Study Annual average temperature prediction using Support Vector Regression with Space-time kernel 21

Case Study - Data Data Georeferenced historical annual average temprature time series from 137 meteorological stations in China between 1951 and 1992. All stations are fitted, we consider results from three. Beijing Guangzhou Urumchi T. Cheng a, J.Q. Wang b, X. Li b, W. Zhang.2008. A HYBRID APPROACH TO MODEL NONSTATIONARY SPACE-TIME SERIES. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B2. Beijing 2008 22

Case Study Experimental Procedure 1. Variography is used to determine the spatial neighbourhood of each station. 2. Data is split into training (42 years) and testing (10 years) sets. 3. Each station is fitted using SVR based on its previous values and those of its spatial neighbours. 4. Once the model is trained, multi-step forecasting is used to predict the next ten years. 5. The results are compared with pure time series SVR and standard SVR. 23

Case Study - Results Fitted (1951-1992) RMSE Plain SVR Time series SVR-STK SVR Beijing 0.981 0.462 0.209 Guangzhou 0.910 0.314 0.084 Urumchi 1.173 0.853 0.306 Forecasting (1993-2002) RMSE Plain SVR Time series SVR-STK SVR Beijing 0.802 0.316 0.403 Guangzhou 0.813 0.418 0.387 Urumchi 0.837 0.551 0.541 Table 1. Accuracy (RMSE) measures for three meteorological stations Beijing, Guangzhou and Urumchi in 52 years 24

Conclusions and future directions 25

Conclusions and future directions The results are promising but The space-time series is very short and has no periodic component. The model produced is still a global model. SVR is just one implementation of the spacetime kernel. 26

Some Major References 1. Aizerman, M., Braverman, E., and Rozonoer, L., 1964. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25: 821-837. 2. Cheng, T., Wang, J.Q., 2009. Accommodating spatial associations in DRNN for space time analysis. Computers, Environment and Urban Systems, 33(6), 409-418. 3. Haussler, D. 1999. Convolutional kernels on discrete structures. Technical ReportUCSC- CRL-99-10, Computer Science Dept., UC Santa Cruz. 4. Nussbaumer, H. J., 1982. Fast fourier transform and convolution algorithms. Springer, Berlin. 5. Pozdnoukhov, A., and Kanevski, M., 2008. GeoKernels: modeling of spatial data on GeoManifolds. In M. Verleysen, editor, ESANN 2008: European Symposium on Artificial Neural Networks Advances in Computational Intelligence and Learning, Bruges, Belgium, 23-25, April. 6. Ralaivola, L., and d'alché-buc F., 2004. Dynamical modeling with kernels for nonlinear time series prediction. Advances in neural information processing systems, 16, pp. 129-136. 7. Vapnik, V., 1995. The nature of statistical learning theory. New York, Springer-Verlag. 8. Weston J., 1999 Extensions to the Support Vector Method. Ph.D. Thesis, Royal Holloway University of London 27

Thank you for listening! Visit our website: www.standard.cege.ucl.ac.uk Any questions or comments? 28

Kernels in time analysis time series forecasting SVR for travel time prediction SVR is used to predict travel times over three distances on Taiwan highways Input data Travel time on 45km stretch of highway; 178 and 350km stretches are also used. Experimental setup Time series SVR incorporates the principles of time series modelling. 28 days are used for training, 7 for testing The one-step ahead method is used for training and testing of the data. An embedding dimension of 5 is chosen to train the model. Performance Comparison A comparison with conventional techniques reveals significant improvements in forecasting ability. Forecasting errors SVR versus conventional techniques Wu, C.H., Wei, C.C, Su, D.C, Chang, M.H., Ho, J.M. (2003).Travel Time Prediction with Support Vector Regression. IEEE. http://www.csie.nuk.edu.tw/~wuch/publications/2003-itsc-svr.pdf Forecasting results real versus predicted 29

Application of STK 30