USING TOPONYM CO-OCCURRENCES TO MEASURE RELATIONSHIPS BETWEEN PLACES


USING TOPONYM CO-OCCURRENCES TO MEASURE RELATIONSHIPS BETWEEN PLACES
A Novel Perspective on the Dutch Settlement System
Evert Meijers (TU Delft)

CITIES CANNOT BE STUDIED IN ISOLATION
The fate and fortune of cities depend on how they are embedded in flows of goods, people, information and capital, as well as on their absorptive capacity to use and exploit these flows (Bourne & Simmons, 1978; Hohenberg & Lees, 1986; Camagni & Capello, 2004; McCann & Acs, 2011; Meijers, Burger & Hoogerbrugge, 2016; Neal, 2013; Taylor & Derudder, 2016).
The importance of networks between places and regions has always been obscured by the difficulty of obtaining consistent information on these networks or flows between places.
The lack of evidence on networks between cities has been considered the "dirty little secret" (Short, Kim, Kuus & Wells, 1996) of research into networks of cities, especially on the global scale.

ENTER: THE TOPONYM CO-OCCURRENCE APPROACH
- Linguistics (Harris, 1954)
- Scientometrics: The use of co-occurrence data in scientometrics assumes that the greater the probability of two elements co-occurring in the same article, the more strongly they are related. (Chavalarias and Cointet, 2013)
- Informetrics: If two organizations are related, their names are likely to be mentioned together on Webpages (Vaughan and You, 2010)
The essence of the toponym co-occurrence approach:
(a) it retrieves information on intercity relationships from text corpora in which places are mentioned together ("semantic relatedness");
(b) it uses machine learning techniques to excavate the context in which these place names co-occur in texts, in order to categorize these relationships in a meaningful way.

EARLY APPLICATIONS IN GEOGRAPHY
Devriendt et al., 2008
Tobler and Wineburg, 1971

RESEARCH APPROACH: LARGE-SCALE APPLICATION TO WEB DATA
- About 70% of online documents contain place references (Hill, 2006)
- CommonCrawl Web Archive (March 2017 snapshot of the internet)
- Focus on 25 million .nl pages (out of 3 billion pages)
- All Dutch places with over 750 inhabitants (N=1639)
- Co-occurrences: minimum 2, maximum 25
- Out of 1,342,341 pairs of places in the Netherlands, 515,658 co-occur at least once (38.4%).
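The counting step behind these figures can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `count_cooccurrences`, the whole-word regex matching, the three example pages, and the reading of "minimum 2, maximum 25" as a limit on the number of distinct places per page are all assumptions made for illustration. The last lines only reproduce the arithmetic of the slide: 1639 places give 1,342,341 unordered pairs, of which 515,658 is about 38.4%.

```python
import re
from collections import Counter
from itertools import combinations
from math import comb


def count_cooccurrences(pages, toponyms, min_places=2, max_places=25):
    """Count, per unordered pair of place names, the pages that mention both."""
    # Whole-word, case-insensitive matching (an assumption; real place-name
    # matching also needs disambiguation, which this sketch ignores).
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in toponyms
    }
    pair_counts = Counter()
    for text in pages:
        found = sorted(name for name, pat in patterns.items() if pat.search(text))
        # Assumed reading of "minimum 2, maximum 25": skip pages that mention
        # too few or too many distinct places.
        if not (min_places <= len(found) <= max_places):
            continue
        pair_counts.update(combinations(found, 2))
    return pair_counts


# Hypothetical example pages (not taken from the CommonCrawl corpus).
pages = [
    "De trein van Delft naar Rotterdam rijdt elke tien minuten.",
    "Rotterdam en Delft werken samen met Leiden.",
    "Leiden heeft een oude universiteit.",
]
counts = count_cooccurrences(pages, ["Delft", "Rotterdam", "Leiden"])

# Arithmetic from the slide: number of unordered pairs among N=1639 places,
# and the share of pairs that co-occur at least once.
total_pairs = comb(1639, 2)
share_cooccurring = 515_658 / total_pairs
```

Sorting the names found on a page before forming combinations keeps each pair in one canonical order, so ("Delft", "Rotterdam") and ("Rotterdam", "Delft") are never counted separately.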

DEALING WITH BIASED PLACE NAMES

THE SETTLEMENT SYSTEM OF THE NETHERLANDS
Observed spatial organisation of the Netherlands based on the pattern of toponym co-occurrences

GRAVITY MODEL EXPLAINS PATTERN REASONABLY WELL

THE SETTLEMENT SYSTEM OF THE NETHERLANDS
Observed versus expected relations between Dutch places, based on toponym co-occurrences (and gravity modelling)

NETWORK EMBEDDEDNESS OF PLACES (ONLY CONSIDERING 100 LARGEST PLACES IN NL) BASED ON RESIDUALS OF GRAVITY MODEL
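How gravity-model residuals separate observed from expected relations can be sketched as follows. The slides do not give the model specification, so the log-linear form log(count) = a + b * log(pop_i * pop_j) + c * log(distance), the ordinary least squares fit, the function names, and the example figures are all assumptions for illustration. A positive residual means a pair co-occurs more often than its size and distance would predict.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_gravity(pairs):
    """Fit log(count) = a + b*log(pop_i*pop_j) + c*log(distance) by least squares.

    pairs: list of (count, pop_i, pop_j, distance_km) with count > 0.
    Returns ([a, b, c], residuals), residuals being observed minus expected logs.
    """
    rows = [[1.0, math.log(pi * pj), math.log(d)] for _, pi, pj, d in pairs]
    y = [math.log(count) for count, _, _, _ in pairs]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve_linear(xtx, xty)
    residuals = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(rows, y)]
    return coefs, residuals


# Hypothetical observations: (co-occurrence count, pop_i, pop_j, distance in km).
observations = [
    (5200, 100_000, 640_000, 13),
    (900, 100_000, 125_000, 20),
    (310, 125_000, 48_000, 95),
    (2600, 640_000, 125_000, 32),
    (140, 48_000, 37_000, 8),
]
coefs, residuals = fit_gravity(observations)
```

Pairs that never co-occur have no logarithm and would need separate treatment (for example a count model), which this sketch leaves out.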

ALSO WORKS FOR SMALLER PLACES: ZEELAND

CLASSIFYING RELATIONSHIPS THROUGH MACHINE LEARNING
- Traditional travel surveys distinguish between different travel motives such as commuting, education, leisure, shopping etc.
- We explored whether it would be possible to classify relationships between places according to similar motives.
- We employ a so-called supervised algorithm. To train the classifier, we used labelled data taken from the open data repository of Statistics Netherlands (CBS), which has tagged the articles on its website in a professional way (and these cover the different travel motives we intend to study).

CLASSIFICATION ATTEMPT UNSATISFYING SO FAR
- Using the 0.75 probability level, only 1.75% of all co-occurrences are categorised.
- Out of 515,658 co-occurrences, our trained classifier managed to label between 11% (commuting) and 61% (collaboration) using the lower probability threshold (0.25).
- The classification method shows promise, but needs to be improved to be truly useful, e.g. by using an unsupervised algorithm or better training.
- Other challenges: an automated way of dealing with place name disambiguation, using other text corpora, etc.
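The effect of the probability threshold on how many co-occurrences get a label can be sketched as below. The classifier itself is not reproduced: the per-category probabilities, the place pairs, and the function names `label_pairs` and `share_labelled` are hypothetical, and treating each category independently (so a pair can receive several labels) is an assumption.

```python
def label_pairs(probabilities, threshold):
    """Keep, for every place pair, the categories whose probability reaches the threshold.

    probabilities: {pair: {category: probability}} as a classifier might output.
    """
    return {
        pair: sorted(cat for cat, p in probs.items() if p >= threshold)
        for pair, probs in probabilities.items()
    }


def share_labelled(labels, category=None):
    """Share of pairs with at least one label, or with the given category."""
    if not labels:
        return 0.0
    if category is None:
        hits = sum(1 for cats in labels.values() if cats)
    else:
        hits = sum(1 for cats in labels.values() if category in cats)
    return hits / len(labels)


# Hypothetical classifier output for four place pairs.
probabilities = {
    ("Delft", "Rotterdam"): {"commuting": 0.80, "collaboration": 0.40},
    ("Delft", "Leiden"): {"commuting": 0.10, "collaboration": 0.30},
    ("Leiden", "Rotterdam"): {"commuting": 0.20, "collaboration": 0.60},
    ("Goes", "Middelburg"): {"commuting": 0.05, "collaboration": 0.15},
}

strict = label_pairs(probabilities, 0.75)  # high-confidence labels only
loose = label_pairs(probabilities, 0.25)   # lower threshold, more labels
```

Lowering the threshold raises coverage at the cost of less certain labels, which is the trade-off between the 0.75 and 0.25 levels reported above.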

MORE INFO: