Machine Learning of Environmental Spatial Data Mikhail Kanevski 1, Alexei Pozdnoukhov 2, Vasily Demyanov 3

Similar documents
Multitask Learning of Environmental Spatial Data

Machine Learning Algorithms for GeoSpatial Data. Applications and Software Tools

Analysis of Interest Rate Curves Clustering Using Self-Organising Maps

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

arxiv: v1 [q-fin.st] 27 Sep 2007

Decision-Oriented Environmental Mapping with Radial Basis Function Neural Networks

Environmental Data Mining and Modelling Based on Machine Learning Algorithms and Geostatistics

Space-Time Kernels. Dr. Jiaqiu Wang, Dr. Tao Cheng James Haworth University College London

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

Mining Classification Knowledge

Advanced Mapping of Environmental Data: Introduction

Nonlinear Classification

Hierarchical models for the rainfall forecast DATA MINING APPROACH

Mining Classification Knowledge

Chap.11 Nonlinear principal component analysis [Book, Chap. 10]

Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!

Spatial Statistics & R

ECE521 Lectures 9 Fully Connected Neural Networks

Security Analytics. Topic 6: Perceptron and Support Vector Machine

Machine Learning 2010

Radial Basis Function Networks. Ravi Kaushik Project 1 CSC Neural Networks and Pattern Recognition

Matching the dimensionality of maps with that of the data

CS-E3210 Machine Learning: Basic Principles

Learning from Data. Amos Storkey, School of Informatics. Semester 1. amos/lfd/

Lecture 6. Notes on Linear Algebra. Perceptron

Introduction to SVM and RVM

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21

Application of Fully Recurrent (FRNN) and Radial Basis Function (RBFNN) Neural Networks for Simulating Solar Radiation

Tools of AI. Marcin Sydow. Summary. Machine Learning

Lecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning

Neutron inverse kinetics via Gaussian Processes

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems

Learning from Data: Regression

Internet Engineering Jacek Mazurkiewicz, PhD

Reducing Uncertainty in Modelling Fluvial Reservoirs by using Intelligent Geological Priors

Machine Learning Lecture 7

CSE446: non-parametric methods Spring 2017

Stochastic gradient descent; Classification

Introduction to Neural Networks

Learning from Data: Multi-layer Perceptrons

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35

Probabilistic Energy Forecasting

Artificial neural networks

Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature

COMS 4771 Introduction to Machine Learning. Nakul Verma

Recent Advances in Bayesian Inference Techniques

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

Neural Networks biological neuron artificial neuron 1

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat

Multivariate Analysis, TMVA, and Artificial Neural Networks

Linear & nonlinear classifiers

A bottom-up strategy for uncertainty quantification in complex geo-computational models

Introduction to Neural Networks

Statistical Machine Learning from Data

MODELLING ENERGY DEMAND FORECASTING USING NEURAL NETWORKS WITH UNIVARIATE TIME SERIES

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Relevance Vector Machines for Earthquake Response Spectra

Feed-forward Network Functions

Jae-Bong Lee 1 and Bernard A. Megrey 2. International Symposium on Climate Change Effects on Fish and Fisheries

Deep Feedforward Networks

Pattern Recognition and Machine Learning

Machine Learning. 7. Logistic and Linear Regression

Neural Network Training

SINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES

Multivariate Methods in Statistical Data Analysis

DESIGNING RBF CLASSIFIERS FOR WEIGHTED BOOSTING

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Multilayer Perceptron

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I

Statistical Machine Learning from Data

Support Vector Machine. Industrial AI Lab.

Geographical General Regression Neural Network (GGRNN) Tool For Geographically Weighted Regression Analysis

Multivariate statistical methods and data mining in particle physics

CSC242: Intro to AI. Lecture 21

CSC321 Lecture 4: Learning a Classifier

Artificial Neural Networks

Optimal Artificial Neural Network Modeling of Sedimentation yield and Runoff in high flow season of Indus River at Besham Qila for Terbela Dam

Learning Vector Quantization (LVQ)

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016

Spatio-temporal avalanche forecasting with support vector machines

Data Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction

Reading Group on Deep Learning Session 1

Using Kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method

Kernel Methods and Support Vector Machines

Ch 4. Linear Models for Classification

Neural networks. Chapter 20, Section 5 1

Neural Networks. Nethra Sambamoorthi, Ph.D. Jan CRMportals Inc., Nethra Sambamoorthi, Ph.D. Phone:

Comparing Robustness of Pairwise and Multiclass Neural-Network Systems for Face Recognition

Neural networks. Chapter 19, Sections 1 5 1

Advanced statistical methods for data analysis Lecture 2

Neural Networks and Machine Learning research at the Laboratory of Computer and Information Science, Helsinki University of Technology

Scuola di Calcolo Scientifico con MATLAB (SCSM) 2017 Palermo 31 Luglio - 4 Agosto 2017

SOIL MOISTURE MODELING USING ARTIFICIAL NEURAL NETWORKS

What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1

CSCI 315: Artificial Intelligence through Deep Learning

Transcription:

1 3 4 5 6 7 8 9 10 11 1 13 14 15 16 17 18 19 0 1 3 4 5 6 7 8 9 30 31 3 33 International Environmental Modelling and Software Society (iemss) 01 International Congress on Environmental Modelling and Software Managing Resources of a Limited Planet, Sixth Biennial Meeting, Leipzig, Germany R. Seppelt, A.A. Voinov, S. Lange, D. Bankamp (Eds.) http://www.iemss.org/society/index.php/iemss-01-proceedings Machine Learning of Environmental Spatial Data Mikhail Kanevski 1, Alexei Pozdnoukhov, Vasily Demyanov 3 1 Institute of Geomatics and Analysis of Risk, University of Lausanne (Mikhail.Kanevski@unil.ch) National Centre for Geocomputation, National University of Ireland Maynooth, Ireland Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh WORKING PAPER There is a growing demand for new adaptive processing tools for different environmental problems: spatio-temporal measurements, satellite images, time series of monitoring, environmental risks and natural hazards assessments, renewable resources estimates, topo-climatic and meteorological measurements, etc. The models and approaches proposed for such problems should be nonlinear, robust and automatic able to work in a changing environment with noisy and variable and several spatio-temporal scales data, capable to integrate sciencebased models. Very often the input space the space of independent variables, ios high dimensional and is composed of geographical coordinates and other relevant features, e.g., generated from digital elevation models. An important task is work and to characterize uncertainties both in original data and in the results. Such tools can be provided by Machine Learning (ML), which is a general and powerful field for processing and nonlinear universal modelling of complex high dimensional data. The workshop will present the basic concepts underlying a wide range of conventional ML algorithms and provide the cutting-edge data analysis, modelling and visualisation tools: 34 Artificial neural networks: multilayer perceptrons, radial basis function 35 networks, general regression and probabilistic neural networks, 36 Self-organizing Kohonen maps, 37 Support vector machines and other kernel-based methods. 38 39 40 41 4 43 44 45 46 47 Real case studies from environmental a variety of problems, like pollution (soil, water systems), climate (temperature and precipitation in a complex regions), natural hazards (landslides, avalanches), renewable resources (wind fields) and other fields of applications will be outlined focusing on the software tools used. The workshop will be useful both for the beginners and advanced researchers and users. Some topics of general interest predictability, complexity, automatic data processing will be presented in detail. Workshop deliverables: tutorial slides, detailed "how-to-do-it" case studies, software tools, and datasets.

48 49 50 51 5 53 54 55 56 57 58 The workshop is based on the following books of the authors: 59 60 61 6 63 64 65 66 67 68 69 70 71 7 73 74 75 76 77 78 79 80 81 8 83 84 Kanevski M., and M. Maignan. Analysis and Modelling of Spatial Environmental Data. EPFL Press, Lausanne, Switzerland, 004. Kanevski M. (Editor). Advanced Mapping of Spatial Environmental Data. Geostatistics, Machine Learning, Bayesian Maximum Entropy. iste & Wiley, 008. Kanevski, M., A. Pozdnoukhov, and V. Timonin, Machine Learning for Spatial Environmental Data: Theory, Applications and Software. EPFL Press, Lausanne, 009. ACKNOWLEDGMENTS The authors would like to thank to Dr. V. Timonin, who contributed to the development of machine learning software. Many thanks to colleagues with whom many interesting case studies on environmental data mining were carried out: Dr. D. Tuia, Dr. L. Foresti, G. Matasci, M. Volpi, Ch. Kreis, Ch. Kaiser. Partly the research was carried out within the framework of Swiss National Science Foundation projects GeoKernels. Phase and KernelCD.

85 86 87 88 89 90 91 9 93 94 95 96 97 98 99 100 101 10 103 104 105 106 107 108 Lecture 1. Predictive learning from environmental data MAIN TOPICS: A. ENVIRONMENTAL DATA: a. spatial, temporal, spatio-temporal; b. multivariate; c. noisy; d. extremes and outliers; e. multiscale variability; f. nonstationarity Representativity of data? Predictability? Data = Information(structure, patterns) + noise 109 110 111 11 113 114 115 116 117 118 119 10 11 1 13 14 15 B. PREDICTIVE LEARNING: a. basic problems and concepts b. some theory c. model selection and model assessment C. ILLUSTRATIVE CASE STUDIES

16 17 18 19 130 131 13 133 134 135 136 137 138 139 140 141 14 143 144 145 146 147 148 149 150 151 15 153 154 155 156 157 158 159 160 161 16 163 Lecture. Spatial prediction of environmental data.1 First model: k-nearest Neighbours = benchmark model Patterns (right) and corresponding k-nn cross-validation curves. Multilayer Perceptrons workhorse of machine learning: Data preprocessing Construction of MLP model Training of MLP model Regularization techniques Analysis of the results (residuals) Simulated case studies Real data case studies 164 165 166 167 168 169 170 171 17 173 174 175 176 177 178 179 Noise injection as a regularization procedure

180 181 18 183 184 185 186 187 188 189 190 191 19 193 194 195 196 197 198 199 00 01 0 03 04 05 06 07 08 09 10 11 1 13 14 15 16 17 18 19 0 1 3 4 5 6 7 8 9 30 31 3 33.3 General Regression Neural Networks INPUT IMAGE LAYER GRNN estimate at a node D i from samples Z i : Z OUT INTAGRATION LAYER i= 1 i N OUTPUT ( i ) ( Di h ) Z exp D h GRNN Mapping: detection of patterns, modelling, analysis of the residuals, mapping, assessment of the uncertainty. = N i= 1 exp

34 35 36 37 38 39 40 41 4 43 44 45 46 47 48 49 50 51 5 53 54 55 56 57 58 59 60 61 6 63 64 65 66 67 68 69 70 71 7 73 74 75 76 Lecture 3. Introduction to Statistical Learning Theory Support Vector Machines Support Vector Regression Support Vector Classification/Regression Based on Statistical Learning Theory Non-linear Robust Classification and Regression Kernel Method High dimensional, prone to over-fitting Allows for training errors Unique solution (unlike MLPs) Probabilistic interpretation of the classification Multiclass classification Advanced topics: Active Learning Monitoring networks optimization Multiple kernel learning

77 78 79 80 81 8 83 84 85 86 87 88 89 90 91 9 93 94 95 96 97 98 99 300 301 30 303 304 305 306 307 308 309 Lecture4. ANNEX topics 4.1 Geostatistics and MLA 4. ANNEX Models 4.3 Multitask learning 310 311 31 313 314 315 316 Manifold learning

317 318 319 30 31 3 33 34 35 36 37 4.4 Self-Organizing Maps (Kohonen Maps) SOM is Algorithm that projects high-dimensional data usually onto a two dimensional map (rectangular or hexagonal) having (M x xm y ) neurons/nodes. Each neuron is associated with weight vector with the dimension equal to the dimension of the data. The projection preserves the topology of the data so that similar data will be mapped to nearby locations on the map. 38 39 330 331 33 333 334 335 336 337 338 339 340 341 34 343 344 345 346 347 348 349 350 351 35 353 354 355 356 357 358 359 360 361 36 363 0 10000 0000 Self-organizing (Kohonen) map 1 0 0 1 SOM case study: classification of spatio-temporal pollution data 4.3 Manifold Learning SOM classification of spatio-temporal environmental adata Discussions, Conclusions, Future Research