Data Science Unit. Global DTM Support Team, HQ Geneva


NET FLUX VISUALISATION FOR FLOW MONITORING DATA
Data Science Unit, Global DTM Support Team, HQ Geneva
March 2018

Summary

This annex explains how Flow Monitoring data collected by the Displacement Tracking Matrix (DTM) is visualised on the Flow Monitoring website. The purpose of visualising Flow Monitoring data is to facilitate a better understanding of mobility trends in assessed areas.

CONTENTS

1 Process
  1.1 Definition of migration network
2 Calculation of net flux estimates
  2.1 Plotting of migration waves
3 Strengths of the model
  3.1 Geographical visualisation
  3.2 Simplification of flow monitoring data
  3.3 Estimations provide a margin for data fluctuation
4 Limitations
  4.1 Misconceptions of migration flows
  4.2 Reliance on flow monitoring data

1 PROCESS

Data visualised on the DTM Flow Monitoring website is retrieved from the Flow Monitoring Registry (FMR), the DTM component that collects information on the volume and basic characteristics of populations transiting selected Flow Monitoring Points (FMPs) during observation hours. Data collected includes previous transit point(s), next destination, and intended destination (where possible), means of transportation, as well as the number, sex and nationality of migrants passing through a Flow Monitoring Point.

There are three stages in establishing how individual data contributes to the visualisation of flows: 1) the definition of a migration network; 2) the calculation of net flux estimates; and 3) the plotting of migration waves.

1.1 DEFINITION OF MIGRATION NETWORK

Information on places of origin, transit points, and destinations collected at each FMP in the Flow Monitoring Registry data¹ supports the continuous identification of the components of a network² along which migrants are moving. Figure 1 illustrates a possible migration network for an FMP in relation to places of origin, transit points, and destinations. Calculations are made based on the different routes utilised by respondents.

Figure 1: A possible migration network based on Flow Monitoring Registry data.

¹ Note that this migration network is based on reported routes, places of origin, transits and destinations as claimed by migrants, and thus, in certain cases, may not necessarily represent the actual route they take.

² A network is composed of vertices and edges (see e.g. Diestel R. Graph theory. Springer Publishing Company, Inc; 2017). Thus, the vertices and edges of a migration network are the reported locations and the reported routes, respectively.
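The network definition above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each registry record has already been reduced to an ordered list of reported locations (origin, transit points, destination); the function name `build_network` and this record layout are hypothetical and are not part of the Flow Monitoring Registry.

```python
def build_network(reported_routes):
    """Collect the vertices and edges of a migration network.

    reported_routes: iterable of ordered location lists, e.g. ["E", "A", "B"]
    (hypothetical layout: origin, transit points, destination).
    Returns (vertices, edges); each edge is an unordered pair of locations.
    """
    vertices = set()
    edges = set()
    for route in reported_routes:
        vertices.update(route)
        # Consecutive reported locations form one segment (edge) of the network.
        for a, b in zip(route, route[1:]):
            if a != b:
                edges.add(frozenset((a, b)))
    return vertices, edges


vertices, edges = build_network([["E", "A", "B"], ["B", "A"]])
```

Edges are stored as unordered pairs because a segment can carry flows in both directions; the direction only enters later, through the net flux.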

2 CALCULATION OF NET FLUX ESTIMATES

The volume and direction of migration flows are determined for each segment of the network. Migration flows along a single segment can occur in both directions. To simplify visualisation and to more easily identify trends over time, the net flux (or balance) of migrants moving in the two possible directions is calculated for each segment. For instance, in the scenario of Table 1, 10 migrants are travelling from point A to point B, while 7 migrants are travelling in the opposite direction.

segment   origin    destination   number of migrants
A → B     point A   point B       10
B → A     point B   point A       7

Table 1: Example of bidirectional flows in rows of the FMR dataset corresponding to a single segment in the migration network.

The Flow Monitoring website would present the information in Table 1 as a balance of 3 migrants travelling from point A to point B, with an arrow representing the direction of the net balance (A to B), as shown in the following table:

segment ID   origin    destination   net flux
AB           point A   point B       3

Table 2: Example of the net flux table for a single segment in the migration network.

Net flux estimates are the difference between the number of individuals per segment travelling in each direction. This can be represented with the following equation:

$$\text{net flux}_{\text{segment ID}} = \left| \text{Volume}_{A \to B} - \text{Volume}_{B \to A} \right|, \tag{1}$$

where $|\cdot|$ stands for the absolute value.

2.1 PLOTTING OF MIGRATION WAVES

Since the migration network lies in a plane specified by the projection of geographic coordinates, the final visualisation is a 2D vector field defined by the linear combination of the contributions of each segment of the network. This will be referred to as the final visualisation or the final vector field. If $N$ is the number of segments in the migration network, then the $k$-th segment is defined by the vector

$$\vec{s}_k := \vec{s}_{k,f} - \vec{s}_{k,o}, \tag{2}$$

where $\vec{s}_{k,o}$ and $\vec{s}_{k,f}$ are the segment's initial and final points, for $k = 1, 2, \ldots, N$.
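As an illustration of the net flux calculation of Section 2 (Equation (1)), the following Python sketch collapses bidirectional rows into one net value per segment. The row layout `(origin, destination, migrants)` and the function name `net_flux` are assumptions made for this example only; they do not describe the actual FMR schema.

```python
from collections import defaultdict


def net_flux(rows):
    """Collapse bidirectional flows into one net flux per segment.

    rows: iterable of (origin, destination, migrants) tuples (hypothetical layout).
    Returns {(origin, destination): net_flux}, keyed in the direction of the
    positive balance. Segments with a zero balance are omitted, mirroring the
    fact that no arrow is drawn for them.
    """
    volume = defaultdict(int)
    for origin, destination, migrants in rows:
        volume[(origin, destination)] += migrants

    result = {}
    for (a, b), forward in volume.items():
        if (a, b) in result or (b, a) in result:
            continue  # segment already handled from the other direction
        backward = volume.get((b, a), 0)
        balance = forward - backward
        if balance > 0:
            result[(a, b)] = balance
        elif balance < 0:
            result[(b, a)] = -balance
    return result


# The scenario of Table 1: 10 migrants from A to B, 7 from B to A.
table_2 = net_flux([("A", "B", 10), ("B", "A", 7)])
```

Keying the result by the direction of the positive balance keeps both pieces of information needed for the arrows: the magnitude from Equation (1) and the direction in which the arrow points.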
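The contribution of such a segment to the total vector field, denoted by $\vec{v}_k$, is a vector field parallel to the segment and pointing in the direction for which the net flux is positive. The amplitude of $\vec{v}_k$ is expected to be maximal on top of the migration path, and to decrease as the distance to the segment increases. These properties are fulfilled by a vector field defined as:

$$\vec{v}_k(\vec{r}) = f_k\left[d(\vec{r}, k)\right] \frac{\vec{s}_k}{\|\vec{s}_k\|}, \tag{3}$$

where $\vec{r}$ is a vector in a two-dimensional space, $\|\cdot\|$ denotes the Euclidean norm, $d(\vec{r}, k)$ represents the smallest distance between the point $\vec{r}$ and the $k$-th segment, and $f_k$ is an envelope function with the following properties: it is even, its maximum value is located at the origin, and it decays to zero as its argument tends to infinity. For the envelope function $f_k$, a Gaussian envelope is employed,

$$f_k(x) = \frac{w_k}{\sigma_k \sqrt{2\pi}} \, e^{-x^2 / 2\sigma_k^2}, \quad \text{for } x \in \mathbb{R}, \tag{4}$$

where $\sigma_k$ is the semi-width of the Gaussian function and $w_k$ is the weight of the given segment with respect to the whole migration network. The calibration of those parameters is defined later. Since the segment has a finite length, the distance function $d(\vec{r}, k)$ can be defined explicitly as:

$$d(\vec{r}, k) := \begin{cases} \|\vec{r} - \vec{s}_{k,o}\|, & \text{if } \operatorname{proj}(\vec{r} - \vec{s}_{k,o}, \hat{s}_k) < 0, \\ \|\vec{r} - \vec{s}_{k,f}\|, & \text{if } \operatorname{proj}(\vec{r} - \vec{s}_{k,f}, \hat{s}_k) > 0, \\ d_0(\vec{r}, \vec{s}_k), & \text{elsewhere}, \end{cases} \tag{5}$$

where $\hat{s}_k := \vec{s}_k / \|\vec{s}_k\|$ is the unit vector along the segment, and $\operatorname{proj}(\vec{u}, \vec{v})$ denotes the projection of the vector $\vec{u}$ onto the vector $\vec{v}$, namely

$$\operatorname{proj}(\vec{u}, \vec{v}) := \begin{cases} \dfrac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|}, & \text{if } \|\vec{v}\| > 0, \\ 0, & \text{elsewhere}, \end{cases} \tag{6}$$

where $\cdot$ represents the scalar product. The perpendicular distance to the line containing the segment is

$$d_0(\vec{r}, \vec{s}_k) = \frac{\left| \vec{s}_k \times (\vec{r} - \vec{s}_{k,o}) \right|}{\|\vec{s}_k\|}, \tag{7}$$

where $\times$ stands for the 2D skew product.³

Figure 2: The segment's contribution to the total visualisation vector field. Panel a) for $\operatorname{proj}(\vec{r} - \vec{s}_{k,o}, \hat{s}_k) > 0$ and $\operatorname{proj}(\vec{r} - \vec{s}_{k,f}, \hat{s}_k) < 0$, b) for $\operatorname{proj}(\vec{r} - \vec{s}_{k,o}, \hat{s}_k) < 0$, and c) for $\operatorname{proj}(\vec{r} - \vec{s}_{k,f}, \hat{s}_k) > 0$. The inset graph illustrates the Gaussian envelope function $f_k$.

³ Explicitly, for any pair of 2D vectors $\vec{u} := (u_1, u_2)$ and $\vec{v} := (v_1, v_2)$, $\vec{u} \times \vec{v} := u_2 v_1 - u_1 v_2$.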
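Equations (3) to (7) can be sketched in a few lines of Python. This is only an illustrative reading of the formulas, not the implementation behind the Flow Monitoring website: the function names are hypothetical, points are plain `(x, y)` tuples in the projection plane, and the segment is assumed to be oriented from its initial point to its final point in the direction of positive net flux.

```python
import math


def seg_distance(r, s_o, s_f):
    """Smallest distance from point r to the segment s_o -> s_f (Eqs. 5-7)."""
    sx, sy = s_f[0] - s_o[0], s_f[1] - s_o[1]
    length = math.hypot(sx, sy)
    if length == 0.0:
        # Degenerate segment: fall back to the distance to its single point.
        return math.hypot(r[0] - s_o[0], r[1] - s_o[1])
    ux, uy = sx / length, sy / length  # unit vector along the segment
    # Projection of (r - s_o) onto the unit vector: negative means "before" the start.
    if (r[0] - s_o[0]) * ux + (r[1] - s_o[1]) * uy < 0:
        return math.hypot(r[0] - s_o[0], r[1] - s_o[1])
    # Projection of (r - s_f) onto the unit vector: positive means "beyond" the end.
    if (r[0] - s_f[0]) * ux + (r[1] - s_f[1]) * uy > 0:
        return math.hypot(r[0] - s_f[0], r[1] - s_f[1])
    # Otherwise use the perpendicular distance; the sign convention of the
    # skew product does not matter because the absolute value is taken.
    skew = sx * (r[1] - s_o[1]) - sy * (r[0] - s_o[0])
    return abs(skew) / length


def gaussian_envelope(x, sigma, w):
    """Gaussian envelope of Eq. (4)."""
    return w / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-x * x / (2.0 * sigma * sigma))


def contribution(r, s_o, s_f, sigma, w):
    """Vector contribution of one segment at point r (Eq. 3)."""
    sx, sy = s_f[0] - s_o[0], s_f[1] - s_o[1]
    length = math.hypot(sx, sy)
    if length == 0.0:
        return (0.0, 0.0)  # a zero-length segment has no direction
    amplitude = gaussian_envelope(seg_distance(r, s_o, s_f), sigma, w)
    return (amplitude * sx / length, amplitude * sy / length)
```

For a point lying on the segment the distance is zero, so the arrow length equals the peak of the envelope, $w_k / (\sigma_k \sqrt{2\pi})$; away from the segment the arrow keeps its direction and shrinks with the Gaussian.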

The final visualisation is the vector sum of the contributions of all network segments. For instance, Figure 3 shows the case of a two-segment migration network. Note that the total visualisation is constructed as a set of arrows placed on an even grid. The number of grid points, denoted by $N_g$, determines the resolution of the visualisation, and therefore also affects the value of the parameter $w_k$, which is not yet explicitly defined. Note that $\sigma_k$ and $w_k$ determine the weight of each contribution to the final vector field. Thus, $\sigma_k$ must be proportional to, or at least monotonic with, the net flux along the $k$-th segment, whereas $w_k$ establishes a link between $\sigma_k$ and the number of grid points $N_g$ in such a way that the arrows do not overlap.

Figure 3: Total vector field (green arrows) for two segments, $\vec{s}_1$ and $\vec{s}_2$.

A plausible calibration of those parameters consists of classifying the net flux of each segment into intervals of values that provide the most even distribution, and assigning a value to the weighting parameters based on this classification. This most even distribution can be obtained by solving an optimisation problem on the probability distribution of the net flux by segment, which provides data-driven classification intervals instead of ad hoc choices. The intervals are obtained by plotting histograms of the net flux using different numbers of bins $n$, and choosing the value $n_b$ that provides the histogram $p(x; n_b)$ such that

$$n_b = \operatorname*{arg\,min}_{n \in \mathbb{Z}^+} \frac{1}{n} \sum_{i=1}^{n} \left( p(x_i; n) - \operatorname{median}\left[ \{p(x_j; n)\}_{j=1}^{n} \right] \right)^2, \quad \text{with } \max\left[ \{p(x_i; n)\}_{i=1}^{n} \right] > \gamma, \tag{8}$$

where $\mathbb{Z}^+$ is the set of positive integers, and $\gamma$ is an adequate cut-off value to avoid the case in which $n_b = N$. The histogram thereby defines the intervals for the values of the net flux.

If the values of the net flux are spread over several orders of magnitude, a bijective transformation of the FMR data is required before performing the optimisation routine. The specific form of the transformation will depend on the probability distribution of the net flux values. For instance, Figure 4 shows an example of a probability distribution for the net flux by network segment, where a logarithmic transformation is used to make the data more uniform.

Figure 4: An example of a probability distribution of the net flux. The horizontal axis is displayed in a logarithmic scale.

The output of the optimisation problem in such a case is $n_b = 10$ for $\gamma = 150$, and so the intervals are defined as shown in Figure 5.

Figure 5: Example of an output of the optimisation problem for net flux classification intervals. The horizontal red line indicates the median of the net flux.
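The bin-selection step can be sketched as follows. This is one possible reading of Equation (8), under several assumptions that are not stated in the source: $p(x_i; n)$ is taken to be the raw count in the $i$-th of $n$ equal-width bins, the objective is the mean squared deviation of those counts from their median, the search starts at two bins (a single bin trivially has zero deviation), and ties are resolved in favour of the smaller number of bins. All function and parameter names are hypothetical.

```python
import statistics


def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Values equal to the maximum (or all values, if they are identical)
        # are placed in the last bin.
        i = n_bins - 1 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


def choose_n_bins(values, gamma, max_bins, min_bins=2):
    """Return the number of bins with the most even histogram, or None.

    Candidates whose fullest bin does not exceed the cut-off gamma are skipped,
    which rules out very fine histograms with almost empty bins.
    """
    best_n, best_score = None, None
    for n in range(min_bins, max_bins + 1):
        counts = histogram_counts(values, n)
        if max(counts) <= gamma:
            continue
        med = statistics.median(counts)
        score = sum((c - med) ** 2 for c in counts) / n
        if best_score is None or score < best_score:
            best_n, best_score = n, score
    return best_n
```

If the net flux values span several orders of magnitude, the values passed in would first be transformed (for example with a logarithm), as described above, and the resulting bin edges mapped back through the inverse transformation.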

The grid used for displaying the visualisation is bounded by the frame of the geographic points in the Flow Monitoring Registry dataset. If $L$ is the distance (in kilometres) between the furthest south-eastern point and the furthest north-western point reported in the dataset, the scaling factor $\rho$ is defined as:

$$\rho := \min\left( 10, \frac{L}{\bar{d}} \right), \tag{9}$$

with $\bar{d}$ being the average distance between the centres of the segments in the migration network. The chosen values are $\sigma_k = \rho n$, where $n = 1, 2, \ldots, n_b$ indexes the classification interval into which the net flux of the $k$-th segment falls. Finally, the weight factor $w_k$ is defined by:

$$w_k = d_{\text{cell}} \, \sigma_k \sqrt{2\pi}, \tag{10}$$

where $d_{\text{cell}}$ is the length of the diagonal of each cell in the visualisation grid. With this choice, the maximum of the envelope in Equation (4) equals $d_{\text{cell}}$. All the calculations are performed within the geo-projection plane, in order to avoid dealing with non-Euclidean geometries. Finally, the size and direction of the net flux are taken into account in the design of the arrows, by correspondingly adapting the size and direction of the arrows.

Figure 6: Final visualisation for the migration network. Other features, such as the colour or the opacity of the arrows, can also be used to indicate the intensity of the flow.

As illustrated in Figure 6, the bigger the arrow, the larger the volume of migrants. However, it must be acknowledged that the Flow Monitoring component is unable to capture all possible migration routes. Thus, as the distance from the segment increases, the size of the arrow (signifying the volume of net flux) decreases as well. This should not be read as a definite indication that fewer migrants are travelling along these routes; rather, the arrows serve as informed estimations. For areas that do not contain any Flow Monitoring data and are far from identified migration routes, no information is displayed on the map.

Since the FMR exercise does not identify individual migrants, there is no way to know whether a group of migrants that passes by an FMP has already been counted at another FMP. This could result in double-counting if figures across several FMPs are aggregated. To avoid this situation, the process described previously for the net flux visualisation is performed within a domain of validity for each FMP separately. Such domains of validity are built using a Voronoi tessellation of the area of assessment, with the FMPs as the centres of the Voronoi regions.⁴ An example of these domains of validity is illustrated in Figure 7.

Figure 7: Voronoi tessellation for some FMPs in Africa. Each Voronoi region is drawn around an FMP, and indicates the domain where the most accurate data is reported by the given FMP. (Legend: Voronoi domain border; Flow Monitoring Point.)

Although the Voronoi domains can induce discontinuities in the visualisation, principally around the borders of the domains, these are expected to be minute. However, visible discontinuities can be prevented by means of a cut-off function for the contribution of each FMP in regions beyond the limits of the respective Voronoi domain.

The final visualisation of the data is a snapshot of migration flows at specific moments in time. Therefore, it should not be assumed that a collection of arrows moving in a continuous direction signifies that migrants are necessarily travelling along that exact route. Taking Figure 6 as an example, the continuous stream of arrows from Point E to Point A and finally to Point B should not be interpreted as migrants moving from Point E to Point B. The model merely indicates the density and direction of migration flows at these three locations at a specified time, whereby flows exist independently of each other.

⁴ For more details about Voronoi tessellation, see: Okabe, Atsuyuki, et al. Spatial tessellations: concepts and applications of Voronoi diagrams. Vol. 501. John Wiley & Sons, 2009.

Finally, the level of detail can be controlled by the resolution of the visualisation, with a larger number of grid points resulting in a less coarse-grained picture. Figure 8 illustrates the same flows with different resolutions.

Figure 8: Net flux visualisations for different resolution parameters, showing the effect of the number of grid points $N_g$ on the final visualisation of a migration network. A large value of $N_g$ provides more detailed information (a), whereas a small value of $N_g$ provides a general flow pattern (b).

3 STRENGTHS OF THE MODEL

There are several advantages to using this Flow Monitoring model, including, but not limited to:

3.1 GEOGRAPHICAL VISUALISATION

This model provides a clearer understanding of migration flows at country and regional levels. By plotting Flow Monitoring data, migration dynamics can be more readily identified in terms of volume and direction. Trends can be derived from the model as the input of data displays potential changes in migration flows on a weekly basis. Additionally, the level of detail for the flow is controlled by the chosen resolution of the visualisation. Thus, for a lower resolution visualisation, the routes with a higher volume of net flux are dominant. An increased resolution corresponds to more disaggregated patterns.

3.2 SIMPLIFICATION OF FLOW MONITORING DATA

Displaying all Flow Monitoring data in multiple directions and in varying volumes on a map would result in a highly complex visualisation of a migration network. The confusion that may arise from visualising all this complex data might even prove counter-productive to the purpose of establishing an informative model. As such, this model is useful as it includes a simplified display of net flux calculations only, focusing on the direction of the remainder.

3.3 ESTIMATIONS PROVIDE A MARGIN FOR DATA FLUCTUATION

Rather than providing definite numbers of migrants, the model strictly offers estimations of migration flows. This is in acknowledgement of the limitations of the model, as not all migration routes can be covered by the Flow Monitoring methodology. The model thus factors in these data gaps to provide informed estimations based on the observations of enumerators and interviews with migrants and key informants at Flow Monitoring Points.

4 LIMITATIONS

While there are several strengths to this Flow Monitoring model, there are also limitations that should be considered. Below is a list of some identified limitations:

4.1 MISCONCEPTIONS OF MIGRATION FLOWS

The direction of the arrows can be misleading if the model is not properly understood. Since the arrows only point in the direction of the net flux, this visualisation may give the false impression that migrant flows are only occurring in the arrows' suggested direction. The reader might thus fail to observe the possibility that migrants could also be travelling in the opposite direction of the arrow. In addition, arrows are only plotted onto the map when there is a net flux of migrants travelling in the indicated direction. This means that if a migration route happens to have an equal number of migrants travelling in both directions, no arrows would be displayed on the map. This may potentially lead to the false assumption that no movement is occurring in that area.

4.2 RELIANCE ON FLOW MONITORING DATA

Another weakness of this model is the fact that it is reliant on the availability and reliability of Flow Monitoring data.
As the data is collected at specific points of transit within set time frames, it only provides a partial view of the total volume and characteristics of migrant flows transiting through Flow Monitoring Points. The model thus leaves out migrants who travelled along routes where Flow Monitoring Points were absent. This limitation is particularly evident in countries with limited capacity to set up sufficient Flow Monitoring Points and in countries with many government-restricted areas. Furthermore, the visualisation of a continuous stream of arrows passing through several locations may mislead users into believing that migrants are travelling through all locations covered by the arrows. This may result in inaccurate interpretations as the arrows do not necessarily reflect the itineraries of migrants.

There may be occasions where the arrows pass over potentially inaccessible areas such as lakes, seas, mountains, and deserts. While there remains a possibility that these migrants may have travelled over large bodies of water or through rough terrain, the arrows are meant to indicate the direction of migration flows based on the destination, regardless of the route taken. Thus, more detailed information about the exact itineraries of migrants could potentially reduce the number of arrows over these unexpected areas.