Using Image Moment Invariants to Distinguish Classes of Geographical Shapes

Similar documents
Enhanced Fourier Shape Descriptor Using Zero-Padding

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava

PRELIMINARY STUDIES ON CONTOUR TREE-BASED TOPOGRAPHIC DATA MINING

Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine

FARSI CHARACTER RECOGNITION USING NEW HYBRID FEATURE EXTRACTION METHODS

Outline. Introduction to SpaceStat and ESTDA. ESTDA & SpaceStat. Learning Objectives. Space-Time Intelligence System. Space-Time Intelligence System

A Machine Learning Approach for Knowledge Base Construction Incorporating GIS Data for Land Cover Classification of Landsat ETM+ Image

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Image Recognition Using Modified Zernike Moments

Outline. Geographic Information Analysis & Spatial Data. Spatial Analysis is a Key Term. Lecture #1

Least-Cost Transportation Corridor Analysis Using Raster Data.

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

DETECTING HUMAN ACTIVITIES IN THE ARCTIC OCEAN BY CONSTRUCTING AND ANALYZING SUPER-RESOLUTION IMAGES FROM MODIS DATA INTRODUCTION

Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns

Comparing Color and Leader Line Approaches for Highlighting in Geovisualization

PRINCIPLES OF PHOTO INTERPRETATION

Object Recognition Using Local Characterisation and Zernike Moments

Exploring representational issues in the visualisation of geographical phenomenon over large changes in scale.

An Introduction to Statistical and Probabilistic Linear Models

Machine Learning Linear Models

Neural networks (NN) 1

Logistic Regression. COMP 527 Danushka Bollegala

ESTIMATION OF LANDFORM CLASSIFICATION BASED ON LAND USE AND ITS CHANGE - Use of Object-based Classification and Altitude Data -

Algorithms for Characterization and Trend Detection in Spatial Databases

Be able to define the following terms and answer basic questions about them:

A comparison of optimal map classification methods incorporating uncertainty information

Temporal and spatial approaches for land cover classification.

Chapter 6 Spatial Analysis

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition

OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES

Multivariate statistical methods and data mining in particle physics

Chapter 6. Fundamentals of GIS-Based Data Analysis for Decision Support. Table 6.1. Spatial Data Transformations by Geospatial Data Types

MIRA, SVM, k-nn. Lirong Xia

Machine Learning and Deep Learning! Vincent Lepetit!

Detection of Unauthorized Electricity Consumption using Machine Learning

Hu and Zernike Moments for Sign Language Recognition

Earthquake-Induced Structural Damage Classification Algorithm

Urban land cover and land use extraction from Very High Resolution remote sensing imagery

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Linear Regression and Discrimination

1. Introduction. S.S. Patil 1, Sachidananda 1, U.B. Angadi 2, and D.K. Prabhuraj 3

Classification using stochastic ensembles

Seismic source modeling by clustering earthquakes and predicting earthquake magnitudes

A Tutorial on Support Vector Machine

Concurrent Self-Organizing Maps for Pattern Classification

Analysis of Hu's Moment Invariants on Image Scaling and Rotation

CS 188: Artificial Intelligence Spring Announcements

Investigation of the Effect of Transportation Network on Urban Growth by Using Satellite Images and Geographic Information Systems

Machine Learning for Signal Processing Bayes Classification

Quality and Coverage of Data Sources

Relationship between Loss Functions and Confirmation Measures

Fractal Dimension and Vector Quantization

Basics of GIS. by Basudeb Bhatta. Computer Aided Design Centre Department of Computer Science and Engineering Jadavpur University

Characterization of Catchments Extracted From. Multiscale Digital Elevation Models

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 30, NO. 1, FEBRUARY I = N

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

An Efficient Algorithm for Fast Computation of Pseudo-Zernike Moments

CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors. Furong Huang /

A New Efficient Method for Producing Global Affine Invariants

GLR-Entropy Model for ECG Arrhythmia Detection

Why Spatial Data Mining?

arxiv: v1 [stat.ml] 23 Dec 2015

AUTOMATED BUILDING DETECTION FROM HIGH-RESOLUTION SATELLITE IMAGE FOR UPDATING GIS BUILDING INVENTORY DATA

Data Mining und Maschinelles Lernen

GIS Methodology in Determining Population Centers and Distances to Nuclear Power Plants

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Machine Learning. Ensemble Methods. Manfred Huber

Using object oriented technique to extract jujube based on landsat8 OLI image in Jialuhe Basin

Spatial analysis. Spatial descriptive analysis. Spatial inferential analysis:

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

Preliminary Research on Grassland Fineclassification

Introduction to Machine Learning

A HYBRID MOMENT METHOD FOR IDENTIFICATION OF 3D-OBJECTS FROM 2D-MONOCROMATIC IMAGES

Introduction to GIS I

A. Pelliccioni (*), R. Cotroneo (*), F. Pungì (*) (*)ISPESL-DIPIA, Via Fontana Candida 1, 00040, Monteporzio Catone (RM), Italy.

CS 188: Artificial Intelligence Spring Announcements

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses

Self-Tuning Semantic Image Segmentation

That s Hot: Predicting Daily Temperature for Different Locations

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, August 18, ISSN

Final Exam, Fall 2002

Learning features by contrasting natural images with noise

Analysis of Fast Input Selection: Application in Time Series Prediction

Development of statewide 30 meter winter sage grouse habitat models for Utah

Fuzzy Geographically Weighted Clustering

Classifier Selection. Nicholas Ver Hoeve Craig Martek Ben Gardner

1 Probabilities. 1.1 Basics 1 PROBABILITIES

Subcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network

SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA

Performance Analysis of Some Machine Learning Algorithms for Regression Under Varying Spatial Autocorrelation

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

FINAL: CS 6375 (Machine Learning) Fall 2014

Maximum Entropy Generative Models for Similarity-based Learning

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

In matrix algebra notation, a linear model is written as

1 Handling of Continuous Attributes in C4.5. Algorithm

OBJECT BASED IMAGE ANALYSIS FOR URBAN MAPPING AND CITY PLANNING IN BELGIUM. P. Lemenkova

An internet based tool for land productivity evaluation in plot-level scale: the D-e-Meter system

Basics of Multivariate Modelling and Data Analysis

Background literature. Data Mining. Data mining: what is it?

Transcription:

Using Image Moment Invariants to Distinguish Classes of Geographical Shapes J. F. Conley, I. J. Turton, M. N. Gahegan Pennsylvania State University Department of Geography 30 Walker Building University Park PA 680 Email: [jfc73, ijt, mng]@psu.edu. Introduction There are plenty of statistical methods and computer algorithms that can be used to find spatial disease clusters, yet simply finding the clusters is only the beginning of the knowledge discovery process (Fayyad et al. 996. Once a cluster is found, an explanation for its existence needs to be hypothesized, and this task of identifying potential causes for the clusters is currently left to the epidemiologist. One computational approach to this problem was tried in the Geographical Explanations Machine (GEM (Openshaw and Turton 00, although it was largely unusable due to the very lengthy computation times involved. Instead of searching for correlations between the locations of the clusters and other variables in the dataset, as in GEM s approach, this research aims to identify potential causes based on the shape of the cluster. To do so, it adapts existing computer vision and pattern recognition techniques to the problem of geographical cluster analysis. Cluster Detection Pattern/Process Database Shape Extraction Shape Representation Pattern Matching Hypothesis Generation Figure. The software system of which this research is a part. This works by comparing the shape of a detected cluster to those in a database containing shapes representative of various geographic processes. The entire computational process from cluster detection to hypothesis generation is shown in figure. This paper focuses on two sections of this system: the shape representation and pattern matching processes. Shapes are represented by a set of shape statistics that are computed for each shape and a pattern recognition system then assigns the detected cluster to one or more potential causes based on what shapes in the pattern/process database have similar

values of the shape statistics. In this case, the shape statistics are the set of seven moment invariants developed by Hu (96 and the classification system is a KNN classifier (Hastie et al. 00, pp. 5-0. This paper reports on experiments into the ability of the Hu s invariants paired with a K-Nearest Neighbor (KNN classifier to distinguish between different classes of geographical shapes in a pattern/process database.. Testing the Invariants Two experiments were conducted with a set of shapes, classified at different levels of aggregation. The shapes are, where possible, taken from Pennsylvania. The first experiment contained twelve classes, as shown in table. This first experiment had a fine level of distinction between the classes, creating road, river, and soil shape classes for each of the physiographic regions of Pennsylvania because of the expected correlations between the terrain contours of an area and the shapes of roads, rivers, and soil regions in that area. Thus the different terrain patterns of the different physiographic regions could give rise to different road, river, and soil region shapes. Shapes for the roads and rivers were generated by drawing a 50-meter buffer around the centre lines from a GIS data layer. The shapes of the soil regions are taken directly from a polygon GIS data layer. The shapes for all classes were taken from publicly available GIS data layers from PASDA (Pennsylvania Spatial Data Access and the USGS (United States Geological Survey, and there are 350 shapes in each class. The second test condenses these twelve classes down to four classes: urban areas, soil regions, roads, and rivers, using 50 shapes from each aggregated class. The 50 shapes are evenly divided between the subgroups for roads, rivers, and soil regions. Urban areas in the United States Rivers in the ridge and valley region Roads in the PA ridge and valley region Rivers in the Allegheny plateau Roads in the Allegheny plateau in PA Rivers in the PA piedmont Roads in the piedmont region of PA Soil regions in the ridge and valley region Roads in Pittsburgh Soil regions in Allegheny plateau Roads in Philadelphia Soil regions in the PA piedmont Table. Shape classes in the -class test.. About the Invariants Moment invariants are derived from the theory of statistical moments (Casella and Berger 00, pp. 59-68. Hu (96 developed seven invariants for computer vision by applying the concepts from bivariate statistical moments to pixel values in gray-scale images. Two-dimensional statistical moments, take the form of equation : the expected ( m p, q value (denoted Ε of the two variables, X and Y, raised to integer powers, p and q. In a geographical context, X and Y are spatial variables and f ( x, y is a function that varies across space. m p q p q = Ε( X Y = x y f ( x, ( y x X y Y

Extending these moments to image processing, f ( x, y is the intensity of the pixel at ( x, y. Furthermore, for black and white images, f ( x, y = if the pixel is black and 0 if it is white. To make these moments invariant to location, so that if a shape is moved in any direction, the moment values remain constant, we shift the x and y values so that the shape s centroid is at (0, 0, giving centralized moments, µ, in equation. p q µ = ( x x ( y y f ( x, y x X y Y Next, the centralized moments are normalized with respect to area, giving normalized moments, η in equation 3. ( µ p, q η p, q = where the normalization factorγ = p+q γ + (3 ( µ 0, 0 Finally, the seven invariants in equations through 0, which are also made invariant to rotation, are given by Hu (96. φ = η + (,0 η0, φ ( φ = η,0 η0,, 3 = ( η, + (3η, η φ + = ( η, + ( η, η φ = ( η 5 (3η, η, ( η ( η, [( η, [3( η,, 3( η ( η,, ] + φ 6 = ( η,0 η0,[( η, ( η, ] + η,( η, ( η, 0, 3 ] (5 (6 (7 (8 (9 φ = (3η 7 ( η, η, ( η ( η, [( η, [3( η,, 3( η ( η,, ] ] (0 To improve classification rates, the natural logarithm of the absolute value of the seven invariants are used instead of the invariants themselves, because the invariants often have very low absolute values (less than 0.00, so taking the logarithm of the invariants reduces the density of the invariants near the origin. Also, the values of the natural logarithms are then converted into standardized z-scores before classification.. Pattern Recognition with a KNN Classifier Once these seven invariants are computed for each shape in the database, a classification algorithm can be applied to the seven-dimensional shape vectors in the data space created by the invariants to determine how well these invariants distinguish between the twelve shape classes listed in table or between the four aggregated shape classes. This research used a K Nearest Neighbor classifier (Hastie et al. 00, pp. 5-0 because a KNN classifier does not make any assumptions about the form of the distribution of the vectors in the data space. A preliminary examination of the 3

distribution suggests a skewed distribution rather than a standard normal distribution, so a classifier that does not rely on a particular distribution is preferred. 3. Results Classification accuracy rates for all values of k between and 00 on both the -class and the aggregated -class tests were computed using a leave-one-out validation method in which each shape s class was estimated as the class that is most frequent in the k nearest neighbors in the rest of the dataset. Figures and 3 show the accuracy rate for each class and value of k in the -class and -class tests respectively. A confusion matrix for the -class test, which is not included because of space considerations, shows that for many of the misclassified shapes, they are still classified in the same general category, i.e., urban areas, roads, rivers, and soils. 0.8 0.7 0.6 0.5 0. 0.3 0. 0. 0 6 6 6 3 36 accuracy 6 5 56 6 66 7 76 8 86 9 96 urban road_ridge road_plateau road_plain road_pitt road_philly river_ridge river_plateau river_plain jun_miff_soil ind_soil lanc_soil k Figure. Accuracy rates for each class on the -class test.. Discussion In both the -class and -class tests the KNN classification of the standardized log invariants was better than random (which would give accuracy rates of 0.083 and 0.5 respectively for most classes and most values of k. The worst classification was that of soils in the -class test. Further investigation shows that the classification for soil region shapes in the piedmont region are often misclassified as urban area shapes, giving this sub-group an accuracy rate well below random in the -class test (e.g., an accuracy of 0. when k = 35, which decreases the accuracy of the combined soil class.

0.9 0.8 0.7 0.6 0.5 0. 0.3 0. 0. 0 7 3 9 5 3 37 3 9 55 6 67 73 79 85 9 97 accuracy urban soil river road k Figure 3. Accuracy rates for each class on the -class test. Further testing will determine the ability of this system to identify the shape classes in the context of cluster detection and analysis. This will be done by generating a synthetic dataset containing disease clusters among a background population where the clusters are based on shapes from this pattern/process shape database, running a cluster detection algorithm on the synthetic dataset, and classifying the detected clusters based on their shapes. If these seven invariants and a KNN classifier fail, other image processing statistics, such as Legendre or Zernike moments (Teh and Chin 988, and other pattern recognition classifiers, such as a maximum likelihood classifier (Hastie et al. 00, pp. 9-3, can be used. 5. Acknowledgements The authors would like to acknowledge the Academic Computing Fellowship program from the Pennsylvania State University Graduate School for financial support of the first author s doctoral research, of which this is a part. 6. References Casella G, and Berger RL, 00, Statistical Inference, nd Edition. Duxbury, Pacific Grove, CA, USA. Fayyad U, Piatetsky-Shapiro G, and Smyth P, 996, From Data Mining to Knowledge Discovery in Databases. AI Magazine, 7(3:37-5. Hastie T, Tibshirani R, and Friedman J, 00, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, USA. Hu M-K, 96, Visual pattern recognition by moment invariants. IEEE Transactions on Information Theory, 8:79-87. Openshaw S, and Turton I, 00, Using a Geographical Explanations Machine to Explore Spatial Factors Relating to Primary School Performance. Geographical & Environmental Modelling, 5(:85-0. Teh C-H, and Chin RT, 988, On Image Analysis by the Methods of Moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 0(:96-53. 5