Data Science Unit. Global DTM Support Team, HQ Geneva

NET FLUX VISUALISATION FOR FLOW MONITORING DATA
March 2018

Summary
This annex explains how Flow Monitoring data collected by the Displacement Tracking Matrix (DTM) is visualised on the Flow Monitoring website. The purpose of visualising Flow Monitoring data is to facilitate a better understanding of mobility trends in assessed areas.

CONTENTS
1 Process
  1.1 Definition of migration network
2 Calculation of net flux estimates
  2.1 Plotting of migration waves
3 Strengths of the Model
  3.1 Geographical visualisation
  3.2 Simplification of flow monitoring data
  3.3 Estimations provide a margin for data fluctuation
4 Limitations
  4.1 Misconceptions of migration flows
  4.2 Reliance on flow monitoring data

1 PROCESS
Data visualised on the DTM Flow Monitoring website is retrieved from the Flow Monitoring Registry (FMR), the DTM component that collects information on the volume and basic characteristics of populations transiting selected Flow Monitoring Points (FMPs) during observation hours. Data collected includes previous transit point(s), next destination and intended destination (where possible), means of transportation, as well as the number, sex and nationality of migrants passing through a Flow Monitoring Point.

There are three stages in establishing how individual data contributes to the visualisation of flows: 1) the definition of a migration network; 2) the calculation of net flux estimates; and 3) the plotting of migration waves.

1.1 DEFINITION OF MIGRATION NETWORK
Information on places of origin, transit points and destinations collected at each FMP in the Flow Monitoring Registry data [1] supports the continuous identification of the components of a network [2] along which migrants are moving. Figure 1 illustrates a possible migration network for an FMP in relation to places of origin, transit points and destinations. Calculations are made based on the different routes used by respondents.

Figure 1: A possible migration network based on Flow Monitoring Registry data.

[1] Note that this migration network is based on routes, places of origin, transit points and destinations as reported by migrants and thus, in certain cases, may not represent the actual route they take.
[2] A network is composed of vertices and edges (see e.g. Diestel R. Graph theory. Springer Publishing Company, Inc; 2017). Thus, the vertices and edges of a migration network are the reported locations and the reported routes, respectively.
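As an illustration of the network definition above, the following minimal sketch builds the vertices (reported locations) and edges (reported routes) from a handful of records. The record layout `(origin, destination, count)`, the sample values and the function name `build_network` are assumptions made for this example, not the actual FMR schema.

```python
from collections import defaultdict

# Hypothetical FMR-like records: (reported origin, reported destination, migrants).
records = [
    ("point E", "point A", 12),
    ("point A", "point B", 10),
    ("point B", "point A", 7),
]


def build_network(rows):
    """Return (vertices, edges) for the reported migration network.

    vertices: set of reported locations.
    edges: dict mapping a directed route (origin, destination) to its volume.
    """
    vertices = set()
    edges = defaultdict(int)
    for origin, destination, count in rows:
        vertices.update((origin, destination))
        # Several records may report the same route; their volumes add up.
        edges[(origin, destination)] += count
    return vertices, dict(edges)


vertices, edges = build_network(records)
```

Keeping the edges directed at this stage preserves the bidirectional volumes that are needed later to compute a balance per segment.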

2 CALCULATION OF NET FLUX ESTIMATES
The volume and direction of migration flows are determined for each segment of the network. Migration flows along a single segment can occur in both directions. To simplify visualisation and to more easily identify trends over time, the net flux (or balance) of migrants moving in the two possible directions is calculated for each segment. For instance, in the scenario of Table 1, 10 migrants are travelling from point A to point B, while 7 migrants are travelling in the opposite direction.

segment   origin    destination   number of migrants
A → B     point A   point B       10
B → A     point B   point A       7

Table 1: Example of bidirectional flows in rows in the FMR dataset corresponding to a single segment in the migration network.

The Flow Monitoring website would present the information in Table 1 as a balance of 3 migrants travelling from point A to point B, with an arrow representing the direction of the net balance (A to B), as shown in Table 2.

segment ID   origin    destination   net flux
AB           point A   point B       3

Table 2: Example of the Net Flux table for a single segment in the migration network.

Net flux estimates are the difference between the number of individuals per segment travelling in each direction. This can be represented with the following equation:

\[ \text{net flux}_{\text{segment ID}} = \left| \text{Volume}_{A \to B} - \text{Volume}_{B \to A} \right|, \tag{1} \]

where $|\cdot|$ stands for the absolute value.

2.1 PLOTTING OF MIGRATION WAVES
Since the migration network lies in a plane specified by the projection of geographic coordinates, the final visualisation is a 2D vector field defined by the linear combination of the contributions of each segment of the network. This will be referred to as the final visualisation or the final vector field. If $N$ is the number of segments in the migration network, then the $k$-th segment is defined by the vector

\[ \vec{s}_k := \vec{s}_{k,f} - \vec{s}_{k,o}, \tag{2} \]

where $\vec{s}_{k,o}$ and $\vec{s}_{k,f}$ are the segment's initial and final points, respectively, for $k = 1, 2, \ldots, N$.
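The net flux of Eq. (1) and the segment vector of Eq. (2) can be sketched as follows. This is a minimal illustration rather than the production routine: the dictionary layout, the function names `net_flux` and `segment_vector`, and the choice to key each result by the direction of the positive balance are assumptions made for this example.

```python
def net_flux(edges):
    """Collapse directed volumes into one net flux per segment.

    edges: dict mapping (origin, destination) to the reported volume.
    Returns a dict keyed by (origin, destination) in the direction of the
    positive balance. Balanced segments (net flux 0) are omitted, since no
    arrow would be drawn for them.
    """
    result = {}
    for (a, b), volume_ab in edges.items():
        if (a, b) in result or (b, a) in result:
            continue  # segment already handled from the opposite direction
        volume_ba = edges.get((b, a), 0)
        net = abs(volume_ab - volume_ba)
        if net == 0:
            continue
        key = (a, b) if volume_ab > volume_ba else (b, a)
        result[key] = net
    return result


def segment_vector(s_o, s_f):
    """Vector from the initial point s_o to the final point s_f (both (x, y))."""
    return (s_f[0] - s_o[0], s_f[1] - s_o[1])


# The example of Tables 1 and 2: 10 migrants A -> B and 7 migrants B -> A.
example = {("point A", "point B"): 10, ("point B", "point A"): 7}
balance = net_flux(example)
```

Here `balance` holds a single entry for the segment, oriented from point A to point B with a net flux of 3, matching Table 2.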
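The contribution of the $k$-th segment to the total vector field, denoted by $\vec{v}_k$, is a vector field parallel to the segment and pointing in the direction for which the net flux is positive. The amplitude of $\vec{v}_k$ is expected to be maximal on top of the migration path and to decrease as the distance to the segment increases. These properties are fulfilled by a vector field defined as:

\[ \vec{v}_k(\vec{r}) = f_k\left[ d(\vec{r}, k) \right] \frac{\vec{s}_k}{\|\vec{s}_k\|}, \tag{3} \]

where $\vec{r}$ is a vector in a two-dimensional space, $\|\cdot\|$ denotes the Euclidean norm, $d(\vec{r}, k)$ represents the smallest distance between the point $\vec{r}$ and the $k$-th segment, and $f_k$ is an envelope function with the following properties: it is even, its maximum value is located at the origin, and it decays to zero as its argument tends to infinity. For the envelope function $f_k$, a Gaussian envelope is employed,

\[ f_k(x) = \frac{w_k}{\sigma_k \sqrt{2\pi}} \, e^{-x^2 / 2\sigma_k^2}, \quad \text{for } x \in \mathbb{R}, \tag{4} \]

where $\sigma_k$ is the semi-width of the Gaussian function and $w_k$ is the weight of the given segment with respect to the whole migration network. The calibration of these parameters is defined later.

Since the segment has a finite length, the distance function $d(\vec{r}, k)$ can be defined explicitly as:

\[ d(\vec{r}, k) := \begin{cases} \|\vec{r} - \vec{s}_{k,o}\|, & \text{if } \operatorname{proj}(\vec{r} - \vec{s}_{k,o}, \hat{s}_k) < 0, \\ \|\vec{r} - \vec{s}_{k,f}\|, & \text{if } \operatorname{proj}(\vec{r} - \vec{s}_{k,f}, \hat{s}_k) > 0, \\ d_0(\vec{r}, \vec{s}_k), & \text{elsewhere}, \end{cases} \tag{5} \]

where $\hat{s}_k := \vec{s}_k / \|\vec{s}_k\|$ is the unit vector along the segment and $\operatorname{proj}(\vec{u}, \vec{v})$ denotes the projection of the vector $\vec{u}$ onto the vector $\vec{v}$, namely

\[ \operatorname{proj}(\vec{u}, \vec{v}) := \begin{cases} \dfrac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|}, & \text{if } \|\vec{v}\| > 0, \\ 0, & \text{elsewhere}, \end{cases} \tag{6} \]

where $\cdot$ represents the scalar product. The perpendicular distance to the line containing the segment is

\[ d_0(\vec{r}, \vec{s}_k) = \frac{\left| \vec{s}_k \wedge (\vec{r} - \vec{s}_{k,o}) \right|}{\|\vec{s}_k\|}, \tag{7} \]

where $\wedge$ stands for the 2D skew product [3].

Figure 2: The segment's contribution to the total visualisation vector field. Panel a) for $\operatorname{proj}(\vec{r} - \vec{s}_{k,o}, \hat{s}_k) > 0$ and $\operatorname{proj}(\vec{r} - \vec{s}_{k,f}, \hat{s}_k) < 0$, b) for $\operatorname{proj}(\vec{r} - \vec{s}_{k,o}, \hat{s}_k) < 0$, and c) for $\operatorname{proj}(\vec{r} - \vec{s}_{k,f}, \hat{s}_k) > 0$. The inset graph illustrates the Gaussian envelope function $f_k$.

[3] Explicitly, for any pair of 2D vectors $\vec{u} := (u_1, u_2)$ and $\vec{v} := (v_1, v_2)$, $\vec{u} \wedge \vec{v} := u_2 v_1 - u_1 v_2$.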
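The construction of a single segment's contribution can be sketched as below: the distance from a point to a finite segment, the Gaussian envelope, and the resulting vector parallel to the segment. This is a minimal sketch, assuming that each segment is already oriented from its initial to its final point along the direction of positive net flux, that points are plain `(x, y)` tuples in the projection plane, and that the second endpoint case applies when the point lies beyond the final point (panel c of Figure 2). All function names are hypothetical.

```python
import math


def scalar_projection(u, v):
    """Scalar projection of u onto v; 0 when v has zero length."""
    norm_v = math.hypot(v[0], v[1])
    if norm_v == 0:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / norm_v


def skew(u, v):
    """2D skew product with the sign convention u2*v1 - u1*v2."""
    return u[1] * v[0] - u[0] * v[1]


def distance_to_segment(r, s_o, s_f):
    """Smallest distance between the point r and the segment s_o -> s_f."""
    s = (s_f[0] - s_o[0], s_f[1] - s_o[1])
    from_o = (r[0] - s_o[0], r[1] - s_o[1])
    from_f = (r[0] - s_f[0], r[1] - s_f[1])
    length = math.hypot(s[0], s[1])
    if length == 0 or scalar_projection(from_o, s) < 0:
        return math.hypot(from_o[0], from_o[1])  # before the initial point
    if scalar_projection(from_f, s) > 0:
        return math.hypot(from_f[0], from_f[1])  # beyond the final point
    return abs(skew(s, from_o)) / length         # perpendicular distance


def gaussian_envelope(x, sigma, w):
    """Gaussian envelope with semi-width sigma and weight w."""
    return w / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x**2 / (2 * sigma**2))


def contribution(r, s_o, s_f, sigma, w):
    """Vector contributed at r by one segment, parallel to the segment."""
    s = (s_f[0] - s_o[0], s_f[1] - s_o[1])
    length = math.hypot(s[0], s[1])
    if length == 0:
        return (0.0, 0.0)
    amplitude = gaussian_envelope(distance_to_segment(r, s_o, s_f), sigma, w)
    return (amplitude * s[0] / length, amplitude * s[1] / length)


def total_field(r, segments):
    """Vector sum at r over segments given as (s_o, s_f, sigma, w) tuples."""
    vx = vy = 0.0
    for s_o, s_f, sigma, w in segments:
        cx, cy = contribution(r, s_o, s_f, sigma, w)
        vx, vy = vx + cx, vy + cy
    return (vx, vy)
```

For example, `contribution((1, 0), (0, 0), (2, 0), 1.0, math.sqrt(2 * math.pi))` evaluates the field on top of the segment, where the envelope reaches its maximum, and returns a vector pointing along the positive x axis.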

The final visualisation is the vector sum of the contributions of all network segments. For instance, Figure 3 shows the case of a two-segment migration network. Note that the total visualisation is constructed as a set of arrows placed on an even grid. The number of grid points, denoted by $N_g$, determines the resolution of the visualisation, and therefore also affects the value of the parameter $w_k$, which is not yet explicitly defined. Note that $\sigma_k$ and $w_k$ determine the weight of each contribution to the final vector field. Thus, $\sigma_k$ must be proportional to, or at least monotonic with, the net flux along the $k$-th segment, whereas $w_k$ establishes a link between $\sigma_k$ and the number of points $N_g$ in such a way that the arrows do not overlap.

Figure 3: Total vector field (green arrows) for two segments: $\vec{s}_1$ and $\vec{s}_2$.

A plausible calibration of these parameters consists of classifying the net flux of each segment into intervals of values that provide the most even distribution, and assigning a value to the weighting parameters based on this classification. This most even distribution can be obtained via an optimisation problem using the probability distribution of the net flux by segment. This provides data-driven classification intervals instead of ad hoc choices. These intervals are obtained by plotting histograms of the net flux using different numbers of bins, and choosing the value $n_b$ that provides the histogram $p(x; n_b)$ such that

\[ n_b = \operatorname*{arg\,min}_{n \in \mathbb{Z}^+} \left( \frac{1}{n} \sum_{i=1}^{n} \left( p(x_i; n) - \operatorname{median}\left[ \{p(x_j; n)\}_{j=1}^{n} \right] \right)^2 \right), \quad \text{with } \max\left[ \{p(x_i; n)\}_{i=1}^{n} \right] > \gamma, \tag{8} \]

where $\mathbb{Z}^+$ is the set of non-negative integers, and $\gamma$ is an adequate cut-off value to avoid the case in which $n_b = N$. The histogram thereby defines the intervals for the values of the net flux.

If the values of the net flux are spread over several orders of magnitude, a bijective transformation of the FMR data is required before performing the optimisation routine. The specific form of the transformation will depend on the probability distribution of the net flux values. For instance, Figure 4 shows an example of a probability distribution for the net flux by network segment, where a logarithmic transformation is used to make the data more uniform.

Figure 4: An example of a probability distribution of the net flux. The horizontal axis is displayed in a logarithmic scale.

The output of the optimisation problem in such a case is $n_b = 10$ for $\gamma = 150$, and so the intervals are defined as shown in Figure 5.

Figure 5: Example of an output for the optimisation problem for net flux classification intervals. The horizontal red line indicates the median of the net flux.
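The selection of the number of classification intervals can be sketched as follows. This is only an illustration under several assumptions that are not stated in the text: bins are of equal width, $p(x_i; n)$ is taken as the count in bin $i$, the objective is read as the mean squared deviation of the bin counts from their median, the candidate values of $n$ are supplied by the caller, and ties are resolved in favour of the smallest $n$. The function names are hypothetical.

```python
import math
from statistics import median


def histogram_counts(values, n_bins):
    """Counts of values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the right edge and goes in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


def choose_bin_count(values, gamma, candidates):
    """Return the candidate n whose histogram is the most even.

    Evenness is scored as the mean squared deviation of the bin counts from
    their median. Candidates whose largest bin count does not exceed gamma
    are discarded. Returns None if no candidate qualifies.
    """
    best_n, best_score = None, math.inf
    for n in candidates:
        counts = histogram_counts(values, n)
        if max(counts) <= gamma:
            continue
        centre = median(counts)
        score = sum((c - centre) ** 2 for c in counts) / n
        if score < best_score:
            best_n, best_score = n, score
    return best_n
```

When the net flux values span several orders of magnitude, the values could be transformed first, for example with `[math.log10(v) for v in values]`, before calling `choose_bin_count`.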

The grid used for displaying the visualisation is bounded by the outermost geographic points in the Flow Monitoring Registry dataset. If $L$ is the distance (in kilometres) between the furthest south-eastern point and the furthest north-western point reported in the dataset, the scaling factor $\rho$ is defined as:

\[ \rho := \min\left( 10, \frac{L}{\bar{\ell}} \right), \tag{9} \]

with $\bar{\ell}$ being the average distance between the centres of the segments in the migration network. The chosen values are $\sigma_k = \rho n$, where $n = 1, 2, \ldots, n_b$ is the index of the net flux classification interval to which the $k$-th segment belongs. Finally, the weight factor $w_k$ is defined by:

\[ w_k = d_{\text{cell}} \, \sigma_k \sqrt{2\pi}, \tag{10} \]

where $d_{\text{cell}}$ is the length of the diagonal of each cell in the visualisation grid. With this choice, the maximum of the envelope in Eq. (4) equals $d_{\text{cell}}$, so that the arrows do not overlap. All the calculations are performed within the geo-projection plane, in order to avoid dealing with non-Euclidean geometries. Finally, the size and direction of the net flux are taken into account in the design of the arrows, by correspondingly adapting the size and direction of the arrows.

Figure 6: Final visualisation for the migration network. Other features, such as the colour or the opacity of the arrows, can also be used to indicate the intensity of the flow.

As illustrated in Figure 6, the bigger the arrow, the larger the volume of migrants. However, it must be acknowledged that the Flow Monitoring component is unable to capture all possible migration routes. For this reason, the size of the arrow (signifying the volume of net flux) decreases as the distance from the segment increases. This should not be read as a definite indication that fewer migrants are travelling along these routes, but rather as an informed estimate. For areas that do not contain any Flow Monitoring data and are far from identified migration routes, no information is displayed on the map.

Since the FMR exercise does not identify migrants individually, there is no way to know whether a group of migrants passing through an FMP was already counted by another FMP. This could result in double-counting if figures across several FMPs are aggregated. To avoid this situation, the process described previously for the net flux visualisation is performed within a domain of validity for each FMP separately. Such domains of validity are built using a Voronoi tessellation of the area of assessment, with the FMPs as the centres of the Voronoi regions [4]. An example of these domains of validity is illustrated in Figure 7.

Figure 7: Voronoi tessellation for some FMPs in Africa. Each Voronoi region is drawn around an FMP and indicates the domain where the most accurate data is reported by the given FMP. (Legend: Voronoi domain border; Flow Monitoring Point.)

Although the Voronoi domains can induce discontinuities in the visualisation, principally around the borders of the domains, these are expected to be minute. Where discontinuities are visible, they can be prevented by means of a cut-off function for the contribution of each FMP in regions beyond the limits of the respective Voronoi domain.

The final visualisation of the data is a snapshot of migration flows at specific moments in time. Therefore, it should not be assumed that a collection of arrows moving in a continuous direction signifies that migrants are necessarily travelling along that exact route. Taking Figure 6 as an example, the continuous stream of arrows from Point E to Point A and finally to Point B should not be interpreted as migrants moving from Point E to Point B. The model merely indicates the density and direction of migration flows at these three locations at a specified time, whereby flows exist independently of each other.

[4] For more details about Voronoi tessellations, see: Okabe, Atsuyuki, et al. Spatial tessellations: concepts and applications of Voronoi diagrams. Vol. 501. John Wiley & Sons, 2009.

Finally, the level of detail can be controlled by the resolution of the visualisation, with a larger number of grid points resulting in a less coarse-grained picture. Figure 8 illustrates the same flows with different resolutions.

Figure 8: Net flux visualisations for different resolution parameters, showing the effect of the number of grid points $N_g$ on the final visualisation of a migration network. A large value of $N_g$ provides more detailed information (a), whereas a small value of $N_g$ provides a general flow pattern (b).

3 STRENGTHS OF THE MODEL
There are several advantages to using this Flow Monitoring model, including, but not limited to:

3.1 GEOGRAPHICAL VISUALISATION
This model provides a clearer understanding of migration flows at country and regional levels. By plotting Flow Monitoring data, migration dynamics can be more readily identified in terms of volume and direction. Trends can be derived from the model, as newly input data displays potential changes in migration flows on a weekly basis. Additionally, the level of detail of the flow is controlled by the chosen resolution of the visualisation. Thus, in a lower-resolution visualisation, the routes with a higher volume of net flux are dominant. An increased resolution corresponds to more disaggregated patterns.

3.2 SIMPLIFICATION OF FLOW MONITORING DATA
Displaying all Flow Monitoring data in multiple directions and in varying volumes on a map would result in a highly complex visualisation of a migration network. The confusion that may arise from visualising all this complex data might even prove counter-productive to the purpose of establishing an informative model. As such, this model is useful because it displays only the simplified net flux calculations, focusing on the direction of the remainder.

3.3 ESTIMATIONS PROVIDE A MARGIN FOR DATA FLUCTUATION
Rather than providing definite numbers of migrants, the model strictly offers estimations of migration flows. This acknowledges the limitations of the model, as not all migration routes can be covered by the Flow Monitoring methodology. The model thus factors in these data gaps to provide informed estimations based on the observations of enumerators and interviews with migrants and key informants at Flow Monitoring Points.

4 LIMITATIONS
While there are several strengths to this Flow Monitoring model, there are also limitations that should be considered. Some identified limitations are listed below.

4.1 MISCONCEPTIONS OF MIGRATION FLOWS
The direction of the arrows can be misleading if the model is not properly understood. Since the arrows only point in the direction of the net flux, this visualisation may give the false impression that migrant flows are only occurring in the arrows' suggested direction. The reader might thus overlook the possibility that migrants could also be travelling in the direction opposite to the arrow. In addition, arrows are only plotted on the map when there is a net flux of migrants travelling in the indicated direction. This means that if a migration route happens to have an equal number of migrants travelling in both directions, no arrows would be displayed on the map. This may lead to the false assumption that no movement is occurring in that area.

4.2 RELIANCE ON FLOW MONITORING DATA
Another weakness of this model is that it relies on the availability and reliability of Flow Monitoring data.
As the data is collected at specific points of transit within set time frames, it only provides a partial view of the total volume and characteristics of migrant flows transiting through Flow Monitoring Points. The model thus leaves out migrants who travelled along routes where Flow Monitoring Points were absent. This limitation is particularly evident in countries with limited capacity to set up sufficient Flow Monitoring Points and in countries with many government-restricted areas. Furthermore, the visualisation of a continuous stream of arrows passing through several locations may mislead users into believing that migrants are travelling through all locations covered by the arrows. This may result in inaccurate interpretations, as the arrows do not necessarily reflect the itineraries of migrants.

There may be occasions where the arrows pass over potentially inaccessible areas such as lakes, seas, mountains and deserts. While it remains possible that these migrants travelled over large bodies of water or through rough terrain, the arrows are meant to indicate the direction of migration flows based on the destination, regardless of the route taken. Thus, more detailed information about the exact itineraries of migrants could potentially reduce the number of arrows over these unexpected areas.