
MACHINE LEARNING FOR GEOLOGICAL MAPPING: ALGORITHMS AND APPLICATIONS

MATTHEW J. CRACKNELL
BSc (Hons)

ARC Centre of Excellence in Ore Deposits (CODES)
School of Physical Sciences (Earth Sciences)

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

University of Tasmania
May, 2014

Did you ever fly a kite in bed?
Did you ever walk with ten cats on your head?
Did you ever milk this kind of cow?
Well, we can do it. We know how.
If you never did you should.
These things are fun and fun is good.

Dr. Seuss


DECLARATION OF ORIGINALITY

This thesis contains no material which has been accepted for a degree or diploma by the University or any other institution, except by way of background information and duly acknowledged in the thesis, and to the best of my knowledge and belief no material previously published or written by another person except where due acknowledgement is made in the text of the thesis, nor does the thesis contain any material that infringes copyright.

AUTHORITY OF ACCESS

The non-published content of the thesis (see below) may be made available for loan and limited copying and communication in accordance with the Copyright Act 1968.

STATEMENT REGARDING PUBLISHED WORK CONTAINED IN THESIS

Chapter 4 of this thesis is published under a Creative Commons Attribution (CC BY) licence. You are free to copy, communicate and adapt the work, so long as you attribute the authors. To view a copy of this licence, visit http://creativecommons.org/licenses/. The publishers of the papers comprising Chapters 5 and 6 hold the copyright for that content, and access to the material should be sought from the respective journals.

Matthew J. Cracknell
May 2014


STATEMENT OF CO-AUTHORSHIP

The following people and institutions contributed to the publication of work undertaken as part of this thesis:

Candidate: Matthew James Cracknell, ARC Centre of Excellence in Ore Deposits (CODES), School of Earth Sciences, University of Tasmania
Author 1: Anya Marie Reading, ARC Centre of Excellence in Ore Deposits (CODES), School of Earth Sciences, University of Tasmania
Author 2: Andrew William McNeill, Mineral Resources Tasmania, Department of Infrastructure, Energy and Resources (DIER)

Author details and their roles:

Paper 1, "Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information", located in Chapter 4. The candidate was the primary author, with Author 1 contributing to its development, refinement and presentation.

Paper 2, "The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using Random Forests and Support Vector Machines", located in Chapter 5. The candidate was the primary author, with Author 1 contributing to its development, refinement and presentation.

Paper 3, "Mapping geology and volcanic-hosted massive sulfide alteration in the Hellyer Mt Charter region, Tasmania, using Random Forests and Self-Organising Maps", located in Chapter 6. The candidate was the primary author, with Author 1 contributing to its refinement and presentation and Author 2 contributing to its formalisation and development.

We the undersigned agree with the above stated proportion of work undertaken for each of the above published (or submitted) peer-reviewed manuscripts contributing to this thesis:

Signed:

Anya M. Reading
Supervisor
School of Earth Sciences
University of Tasmania

Jocelyn McPhie
Head of School
School of Earth Sciences
University of Tasmania

Date:

ABSTRACT

Machine learning algorithms are designed to efficiently identify, and accurately predict, patterns within multivariate data. They provide analysts with computational tools to aid predictive modelling and the interpretation of interactions between data and the phenomena under investigation. The analysis of large volumes of disparate multivariate geospatial data using machine learning algorithms therefore offers great promise to industry and research in the geosciences. Geoscience data are frequently characterised by a restriction in the number and distribution of direct observations, irreducible noise in these data and a high degree of intraclass variability and interclass similarity. The choice of machine learning algorithm or algorithms, and the details of how they are applied, must therefore be appropriate to the context of geoscience data. With this knowledge, I aim to employ machine learning as a means of understanding the spatial distribution of complex geological phenomena. I conduct a rigorous and comprehensive comparison of machine learning algorithms, representing the five general machine learning strategies, for supervised lithology classification applications. I also develop and test a novel method for obtaining robust estimates of the uncertainty associated with machine learning algorithm categorical predictions. The insights gained from these experiments lead to the further development and comparison of new methods for the incorporation of spatial-contextual information into machine learning supervised classifiers. In using machine learning algorithms for geoscience applications, I have developed best-practice methodologies that address the challenges facing geoscientists for geospatial supervised classification. Guidelines are established that detail the preparation and integration of disparate spatial data, the optimisation of trained classifiers for a given application and the robust statistical and spatial evaluation of outputs.
I demonstrate, through a case study in a region that is prospective for economic mineralisation, the combination of supervised and unsupervised machine learning algorithms for the critical appraisal of pre-existing geological maps and the formulation of meaningful interpretations of geological phenomena.
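The abstract refers to robust estimates of the uncertainty attached to categorical predictions, and Appendix B concerns variance and entropy for multiclass classification uncertainty. The sketch below is an illustration only. It assumes uncertainty is summarised as the Shannon entropy of the class-vote proportions produced by an ensemble such as Random Forests, normalised by the logarithm of the number of classes. The function names, class labels and vote counts are hypothetical, and the thesis may use a different formulation.

```python
import math
from collections import Counter


def class_probabilities(votes, classes):
    """Convert ensemble votes (one predicted label per tree) into class proportions."""
    counts = Counter(votes)
    n = len(votes)
    return [counts[c] / n for c in classes]


def normalised_entropy(probs):
    """Shannon entropy of class proportions, scaled to [0, 1] by log(K).

    Assumption: this is one common uncertainty measure for multiclass
    predictions, used here for illustration rather than as the thesis's method.
    0 means all votes agree; 1 means votes are spread evenly over all K classes.
    """
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)  # skip p = 0 terms (0 * log 0 -> 0)
    return h / math.log(k)


# Hypothetical lithology classes and votes from a 10-tree ensemble at two pixels
classes = ["basalt", "granite", "sandstone"]
confident = ["granite"] * 9 + ["basalt"]
ambiguous = ["granite"] * 4 + ["basalt"] * 3 + ["sandstone"] * 3

for name, votes in [("confident", confident), ("ambiguous", ambiguous)]:
    p = class_probabilities(votes, classes)
    print(name, [round(x, 2) for x in p], round(normalised_entropy(p), 3))
```

Pixels whose votes are split across several classes score close to 1. Under this assumption, mapping the score highlights ambiguous predictions for closer inspection.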

The experiments conducted as part of my research confirm the efficacy of machine learning algorithms in generating accurate geological maps representing a variety of terranes. I identify and explore key aspects of the spatial and statistical distributions of geoscience data that affect machine learning algorithm performance. My research clearly identifies Random Forests as a good first-choice algorithm for the prediction of classes representing lithologies using commonly available multivariate geological and geophysical data. Furthermore, Random Forests prediction uncertainty is shown to be closely related to ambiguous and/or erroneous classifications and thus provides a practical means of indicating variable levels of confidence. Spatial-contextual information is best incorporated into machine learning supervised classifiers via the pre-processing of input variables and/or the post-regularisation of classifications. My findings indicate that a trade-off exists between optimal predictive models and interpretable explanatory models, whereby intuitively interpretable models are not necessarily the most accurate. The practical application of machine learning algorithms requires the implementation of three key stages: (1) data pre-processing; (2) algorithm training; and (3) prediction evaluation. This methodology provides the foundation for generating accurate and geologically meaningful predictions with minimal user intervention and assists in the formulation of robust interpretations of complex geological phenomena. For example, classifications obtained by Random Forests are useful for critically appraising interpreted geological maps. Clusters produced by Self-Organising Maps indicate the presence of discrete, spatially contiguous and geologically significant sub-classes within individual lithological units, which represent regions of contrasting primary composition and alteration styles.
My results may be applied to a broad range of practical geoscience challenges, such as ore deposit targeting, geo-hazard risk assessment, engineering and construction projects, hydrological and environmental modelling and ecological studies. The applications of machine learning algorithms detailed in this thesis align well with state-of-the-art Big Data online infrastructure and virtual laboratories currently emerging in Australia.
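The abstract notes that spatial-contextual information can be incorporated through the post-regularisation of classifications. One common form of post-regularisation is a moving-window majority (modal) filter. The sketch below assumes that approach, with a square window clipped at the grid edges and a rule that keeps a cell's original class whenever it ties for the majority. The function name, window radius and example grid are hypothetical, and the thesis's own post-processing (Section 7.3.3.3) may differ.

```python
from collections import Counter


def majority_filter(grid, radius=1):
    """Hypothetical post-regularisation: replace each cell's class with the
    most frequent class in its (2*radius+1) x (2*radius+1) window.

    Windows are clipped at the grid edges. If the cell's own class ties for
    the highest count it is kept; otherwise ties between other classes are
    broken by the smallest label, so the output is deterministic.
    The input grid is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # write results to a copy, read from the original
    for i in range(rows):
        for j in range(cols):
            window = Counter(
                grid[r][c]
                for r in range(max(0, i - radius), min(rows, i + radius + 1))
                for c in range(max(0, j - radius), min(cols, j + radius + 1))
            )
            best = max(window.values())
            if window[grid[i][j]] < best:
                out[i][j] = min(label for label, n in window.items() if n == best)
    return out


# Hypothetical classified map: unit "A" with one isolated "B" pixel, next to unit "B"
classified = [
    ["A", "A", "A", "B"],
    ["A", "B", "A", "B"],
    ["A", "A", "A", "B"],
]

for row in majority_filter(classified):
    print(" ".join(row))
```

The isolated pixel is absorbed into the surrounding unit, and the contact with unit "B" is preserved. Larger radii remove larger speckles but can also erode thin units, so the window size trades noise removal against loss of narrow lithologies.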

CONTENTS

DECLARATION OF ORIGINALITY
AUTHORITY OF ACCESS
STATEMENT REGARDING PUBLISHED WORK CONTAINED IN THESIS
STATEMENT OF CO-AUTHORSHIP
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS

CHAPTER 1 INTRODUCTION
  1.1. Machine learning
  1.2. Geological maps
  1.3. Research scope and hypothesis
    1.3.1. Major research questions to be addressed
  1.4. Thesis structure

CHAPTER 2 MACHINE LEARNING THEORY AND IMPLEMENTATION
  2.1. Machine learning
    2.1.1. Supervised versus unsupervised learning
  2.2. Supervised classification
    2.2.1. Classification strategies
      2.2.1.1. Statistical learning algorithms
      2.2.1.2. Instance-based learners
      2.2.1.3. Logic-based learners
      2.2.1.4. Support Vector Machines
      2.2.1.5. Perceptrons
    2.2.2. Supervised classifier implementation
      2.2.2.1. Data pre-processing
      2.2.2.2. Classifier training
      2.2.2.3. Prediction evaluation
  2.3. Unsupervised clustering
    2.3.1. Clustering strategies
      2.3.1.1. Partitioning algorithms
      2.3.1.2. Hierarchical algorithms
      2.3.1.3. Self-Organising Maps
    2.3.2. Unsupervised clustering implementation
  2.4. Conclusions

CHAPTER 3 A REVIEW OF MACHINE LEARNING FOR GEOSCIENCE CLASSIFICATION APPLICATIONS
  3.1. Machine learning non-geoscience applications
  3.2. Machine learning geoscience applications
    3.2.1. Classification of 0D data
    3.2.1. Classification of 1D data
      3.2.1.1. One temporal dimension
      3.2.1.2. One spatial dimension
    3.2.1. Classification of 2D data
      3.2.1.3. Land cover/vegetation mapping
      3.2.1.4. Geological mapping
        Supervised classification
        Unsupervised clustering
        Combined supervised and unsupervised methods
  3.3. Practical machine learning implementation
    3.3.1. Data
    3.3.2. Data pre-processing
    3.3.3. Prediction evaluation
    3.3.4. Integrated workflow
  3.4. Conclusions

CHAPTER 4 GEOLOGICAL MAPPING USING REMOTE SENSING DATA: A COMPARISON OF FIVE MACHINE LEARNING ALGORITHMS, THEIR RESPONSE TO VARIATIONS IN THE SPATIAL DISTRIBUTION OF TRAINING DATA AND THE USE OF EXPLICIT SPATIAL INFORMATION
  4.0. Abstract
  4.1. Introduction
    4.1.1. Machine learning for supervised classification
    4.1.2. Machine learning algorithm theory
      4.1.2.1. Naïve Bayes
      4.1.2.2. k-nearest Neighbours
      4.1.2.3. Random Forests
      4.1.2.4. Support Vector Machines
      4.1.2.5. Artificial Neural Networks
    4.1.3. Geology and tectonic setting
  4.2. Data
  4.3. Methods
    4.3.1. Pre-processing
    4.3.2. Classification model training
    4.3.3. Prediction evaluation
  4.4. Results
  4.5. Discussion
    4.5.1. Machine learning algorithms compared
    4.5.2. Influence of training data spatial distribution
    4.5.3. Using spatially constrained data
  4.6. Conclusions
  4.7. Acknowledgements
  4.8. Description of supplementary information

CHAPTER 5 THE UPSIDE OF UNCERTAINTY: IDENTIFICATION OF LITHOLOGY CONTACT ZONES FROM AIRBORNE GEOPHYSICS AND SATELLITE DATA USING RANDOM FORESTS AND SUPPORT VECTOR MACHINES
  5.0. Abstract
  5.1. Introduction
    5.1.1. The lithology prediction problem
    5.1.2. Random Forests
    5.1.3. Support Vector Machines
  5.2. Data
    5.2.1. Tectonic setting and history
    5.2.2. Data sources
    5.2.3. Data pre-processing
  5.3. Methods
    5.3.1. Training and evaluating algorithms
    5.3.2. Variance
  5.4. Results
  5.5. Discussion
  5.6. Conclusions
  5.7. Acknowledgements

CHAPTER 6 MAPPING GEOLOGY AND VOLCANIC-HOSTED MASSIVE SULFIDE ALTERATION IN THE HELLYER MT CHARTER REGION, TASMANIA, USING RANDOM FORESTS AND SELF-ORGANISING MAPS
  6.0. Abstract
  6.1. Introduction
    6.1.1. Geological setting
    6.1.2. Random Forests
    6.1.3. Self-Organising Maps
  6.2. Data and Methods
    6.2.1. Source data
    6.2.2. Data sampling
    6.2.3. Training Random Forests and variable selection
    6.2.4. Implementing Self-Organising Maps
  6.3. Results
    6.3.1. Geological classification using Random Forests
    6.3.2. Discrimination of geological sub-classes using Self-Organising Maps
  6.4. Discussion
  6.5. Conclusions
  6.6. Acknowledgements

CHAPTER 7 SPATIAL-CONTEXTUAL MACHINE LEARNING SUPERVISED CLASSIFIERS: LITHOSTRATIGRAPHY CLASSIFICATION EXAMPLE
  7.0. Abstract
  7.1. Introduction
    7.1.1. Pre-processing methods
      7.1.1.1. Focal operators
      7.1.1.2. Image segmentation
    7.1.2. Training data selection
    7.1.3. Post-processing methods
    7.1.4. Combination methods
    7.1.5. Study aims
  7.2. Data
    7.2.1. Lithostratigraphy classification target
    7.2.2. Geophysical data input variables
      7.2.2.1. Pre-processing
  7.3. Methods
    7.3.1. Data sampling
    7.3.2. Global pixel-based classifiers
    7.3.3. Spatial-contextual classifiers
      7.3.3.1. Pre-processing
      7.3.3.2. Algorithm training
      7.3.3.3. Post-processing
    7.3.4. Prediction evaluation
  7.4. Results
  7.5. Discussion
    7.5.1. Spatial-contextual classifiers compared
    7.5.2. Issues of spatial scale
    7.5.3. Geological interpretations
  7.6. Conclusions

CHAPTER 8 SYNTHESIS AND DISCUSSION
  8.1. Algorithms
    8.1.1. Supervised classification
      8.1.1.1. Implementation
      8.1.1.2. Decision structures
      8.1.1.3. Accuracy comparison
      8.1.1.4. Spatial-contextual classifiers
      8.1.1.5. Prediction uncertainty
    8.1.2. Unsupervised clustering
  8.2. Applications
    8.2.1. Data pre-processing
      8.2.1.1. Data preparation
      8.2.1.2. Variable extraction
      8.2.1.3. Variable selection
    8.2.2. Classifier training
      8.2.2.1. Training and test data
      8.2.2.2. Classifier induction
      8.2.2.3. Classification post-processing
    8.2.3. Evaluation and interpretation
      8.2.3.1. Statistical evaluation
      8.2.3.2. Interrogating decision structures
      8.2.3.3. Complementary interpretation
  8.3. Extended research implications
    8.3.1. Integrated workflow using R
    8.3.2. Wider geoscience applications
    8.3.3. Big Data

CHAPTER 9 CONCLUSIONS

REFERENCES

APPENDIX A MACHINE LEARNING ALGORITHM SENSITIVITY TO IMBALANCED CLASS DISTRIBUTIONS
  A.1. Introduction
  A.2. Methods
  A.3. Results
  A.4. Discussion and Conclusions
APPENDIX B VARIANCE AND ENTROPY FOR MULTICLASS CLASSIFICATION UNCERTAINTY
APPENDIX C SUPPLEMENTARY INFORMATION
  C.1. Data
  C.2. MLA software and parameters
APPENDIX D R PACKAGES
APPENDIX E DATA SOURCES AND PRE-PROCESSING
APPENDIX F R CODE AND SCRIPTS
  README.txt