Merging statistics and geospatial information: Demography / Commuting / Spatial planning / Registers. Mirosław Migacz, Chief GIS Specialist; Janusz Dygaszewicz, Director; Central Statistical Office of Poland. INSPIRE Conference 2014: INSPIRE for good governance, Aalborg, June 17th 2014
Agenda: The aim. The team. The tasks: spatial visualization of demographic data; enterprise address spatialization; commuting statistics; statistical indicators for spatial planning. The results. Conclusions.
The aim. Geospatial analysis with use of: Population and Housing Census 2011 results; other statistical datasets held by CSO. Evaluation of reference materials in the geostatistics production process: spatial address databases (maintained within official statistics); Database of Topographic Objects (acquired from the mapping agency).
The team: Programming and Coordination of Statistical Surveys Department @ CSO, Warsaw (5); Urban Statistics Centre @ SO Poznań (3); Regional and Environmental Surveys Department @ CSO, Warsaw (2); Statistical Computing Centre, Łódź (1).
SPATIAL VISUALIZATION OF DEMOGRAPHIC DATA
Spatial visualization of demographic data. Source data: attribute data, spatial data. Methods of aggregation to various statistical units: 1 km x 1 km grid, cadastral units, statistical regions, census enumeration areas. Cartographic presentation of the results.
Source data. Attribute: tables with population distribution data acquired from the Population and Housing Census 2011, containing the person ID and X, Y coordinates (acquired from spatial address databases created and maintained within official statistics); 39 tables (one for each million people). Spatial: boundaries of statistical regions and census enumeration areas (spatial address databases); cadastral units (mapping agency); kilometer grid Grid_ETRS89_LAEA_PL_1K (European Forum for Geography and Statistics).
1 km x 1 km grid. Grid_ETRS89_LAEA_PL_1K: the European INSPIRE grid. Cell coordinates: lower left corner. Aggregation of persons to specific grid cells is possible without GIS software (Visual Basic for Applications was used here, for example).
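As an illustration of why no GIS software is needed for this step, the following is a minimal Python sketch (not the VBA code used in the project). It assumes person coordinates are already expressed in metres in the grid's coordinate system, so the lower left corner of the containing cell follows from integer division. The function names and sample coordinates are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 1000  # 1 km grid cell, in metres


def cell_lower_left(x, y, cell_size=CELL_SIZE_M):
    """Return the lower left corner of the grid cell containing (x, y)."""
    return (int(x // cell_size) * cell_size, int(y // cell_size) * cell_size)


def aggregate_to_grid(points, cell_size=CELL_SIZE_M):
    """Count points per grid cell, keyed by the cell's lower left corner."""
    return Counter(cell_lower_left(x, y, cell_size) for x, y in points)


# Hypothetical person coordinates (metres, assumed to be in the grid's CRS).
persons = [
    (5001234.5, 3201999.0),
    (5001999.9, 3201000.0),
    (5002000.0, 3201500.0),
]
counts = aggregate_to_grid(persons)
```

Because each person is assigned to a cell independently, the same function can be applied to each source table in turn and the resulting counters added together.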
1 km x 1 km grid. The number of persons in each grid cell was calculated with ArcGIS (Dissolve tool), though any database software could be used. The operation was conducted separately for each of the 39 tables.
Cadastral units. Aggregation to an irregular division of space requires GIS software. Environment: ArcGIS file geodatabase. Spatial operations on a feature class with 38.5 million objects exceed the RAM capacity of workstations and servers.
Cadastral units. Back to 39 separate tables >> need for automation. Python scripting with the arcpy module, which gives access to the ArcGIS tools. The script processed the 39 datasets one by one: a spatial join of the 1st dataset to the geometry of cadastral units (with calculation of total population) produced the initial dataset; for each subsequent spatial join, the newly calculated population was added to the running total for each cadastral unit.
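The running-total pattern described on this slide can be sketched in plain Python. This is not the project's arcpy script: the ArcGIS Spatial Join is replaced here by a simple ray-casting point-in-polygon test so the example runs without ArcGIS, and the unit polygons, chunk contents, and function names are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_chunk(points, units):
    """Stand-in for one spatial join: persons per unit for a single table."""
    counts = Counter()
    for x, y in points:
        for unit_id, polygon in units.items():
            if point_in_polygon(x, y, polygon):
                counts[unit_id] += 1
                break
    return counts


def accumulate(chunks, units):
    """Process tables one at a time, adding each result to the running total."""
    totals = Counter({unit_id: 0 for unit_id in units})
    for chunk in chunks:
        totals.update(count_chunk(chunk, units))
    return totals


# Hypothetical units (two adjacent squares) and two small "tables" of persons.
units = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
chunks = [[(1, 1), (5, 5), (15, 5)], [(12, 3), (2, 8)]]
totals = accumulate(chunks, units)
```

Only one table is held in memory at a time, which is the point of splitting the work into 39 runs.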
Statistical regions and census enumeration areas. The same tools were used as for cadastral units (ArcGIS, Python), with a slightly different method of cyclic dataset processing: statistical regions / census enumeration areas were spatially joined to the datasets with persons 39 times >> 39 temporary feature classes; the 39 feature classes were merged into one >> 1 feature class with 39 duplicate geometries for each statistical region / census enumeration area; the geometries were deduplicated, with the total population calculated for each geometry (Dissolve tool in ArcGIS).
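The merge-then-dissolve variant can be sketched as follows. This is a plain Python stand-in, not the ArcGIS Merge and Dissolve tools: each partial result is assumed to be a list of (unit ID, population) rows produced by one spatial join, and the IDs and figures are hypothetical.

```python
from collections import defaultdict


def merge(partials):
    """Concatenate per-table results; each unit then appears once per table."""
    merged = []
    for rows in partials:
        merged.extend(rows)
    return merged


def dissolve(rows):
    """Collapse duplicate unit IDs, summing population (a group-by sum)."""
    totals = defaultdict(int)
    for unit_id, population in rows:
        totals[unit_id] += population
    return dict(totals)


# Hypothetical results of three spatial joins for two statistical regions.
partials = [
    [("R1", 120), ("R2", 80)],
    [("R1", 30), ("R2", 0)],
    [("R1", 5), ("R2", 45)],
]
merged = merge(partials)
population = dissolve(merged)
```

Compared with keeping a running total, this approach stores every partial result until the end, but the final summation is a single operation.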
Data aggregation: conclusions. Point data aggregation to grids can be done without GIS software: any database software with e.g. VBA is sufficient. Point data aggregation to an irregular division of space requires GIS software. Processing of huge datasets requires automation, which can be achieved with Python scripting: it requires script preparation and testing on a data sample; all processes can be run on a separate machine / server and do not require the operator's attention.
Cartographic presentation of the results. 1 km x 1 km grid: total population in each grid cell (= population density). Cadastral units, statistical regions, census enumeration areas: choropleth maps of population density. Classifications (5 classes) and colour scales: average value as the centre of the middle class, shown with a 2-colour gradient; quantiles, shown with a monochromatic scale.
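A five-class quantile classification of the kind mentioned on this slide can be sketched with the Python standard library. This is an illustration only: the slides do not say how class breaks were computed in the project, so the interpolation method, the handling of values that fall exactly on a break, and the sample densities are all assumptions.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 cut points that split values into quantiles."""
    return quantiles(values, n=n_classes, method="inclusive")


def classify(value, breaks):
    """Class index 0..len(breaks); a value equal to a break joins the lower class."""
    return bisect_left(breaks, value)


# Hypothetical population densities for ten spatial units.
densities = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
breaks = quantile_breaks(densities)
classes = [classify(d, breaks) for d in densities]
```

With quantiles each class holds roughly the same number of units, so the class boundaries depend entirely on how the units were delineated.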
1 km x 1 km grid (four map slides)
Cadastral units
Census enumeration areas
Cadastral units quantiles
Census enumeration areas quantiles
Cadastral units vs census enumeration areas (quantiles)
Quantiles: conclusions. Significant differences between quantile presentations: for the 1 km x 1 km grid a separate class for 0 was created; large differences in classification between cadastral units and census enumeration areas arise because these divisions were created for different purposes: cadastral units for legal management of land ownership, census enumeration areas for conducting censuses (size dependent on the population count).
ENTERPRISE ADDRESS SPATIALIZATION
Source data. Attribute: social insurance registers; taxpayers register; inland revenues database; statistical register of enterprises. Spatial: spatial address databases (maintained within official statistics); Database of Topographic Objects (acquired from the mapping agency).
Enterprise address spatialization. Address descriptive information was paired with address points from the Spatial Address Databases and address points from the Database of Topographic Objects. Pairing as is (62%); address number simplification, e.g. 3A > 3 (5.9%); no address point: nearest address number (18.6%); no address number: address point on the same street or locality centroid (1.8%); no street ID: locality centroid (3.6%); other cases: locality centroid (8.1%).
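The fallback order listed on this slide can be sketched as a cascade of rules. This is a hedged illustration, not the project's matching procedure: the data layout (a dictionary keyed by locality, street, and number), the way the "nearest" number is chosen, and all names and coordinates are assumptions.

```python
import re


def simplify_number(number):
    """Drop a letter suffix from an address number, e.g. '3A' -> '3'."""
    match = re.match(r"\d+", number)
    return match.group(0) if match else number


def leading_int(number):
    """Numeric part of an address number, or None if there is none."""
    match = re.match(r"\d+", number)
    return int(match.group(0)) if match else None


def locate(address, points, locality_centroids):
    """Return ((x, y), rule) for one enterprise address.

    points: {(locality, street, number): (x, y)} (hypothetical layout).
    """
    locality = address["locality"]
    street = address.get("street")
    number = address.get("number")
    centroid = locality_centroids[locality]

    if street is None:
        return centroid, "no street ID"
    on_street = {num: xy for (loc, st, num), xy in points.items()
                 if (loc, st) == (locality, street)}
    if not on_street:
        return centroid, "other"
    if number is None:
        # Arbitrary but deterministic choice of a point on the same street.
        return on_street[min(on_street)], "no address number"
    if number in on_street:
        return on_street[number], "as is"
    simplified = simplify_number(number)
    if simplified in on_street:
        return on_street[simplified], "simplified number"
    target = leading_int(number)
    numeric = {num: xy for num, xy in on_street.items()
               if leading_int(num) is not None}
    if target is not None and numeric:
        nearest = min(numeric,
                      key=lambda num: (abs(leading_int(num) - target), num))
        return numeric[nearest], "nearest address number"
    return centroid, "other"


# Hypothetical reference data.
points = {
    ("Poznań", "Długa", "3"): (1.0, 1.0),
    ("Poznań", "Długa", "7"): (2.0, 2.0),
}
centroids = {"Poznań": (0.0, 0.0)}
result = locate({"locality": "Poznań", "street": "Długa", "number": "3A"},
                points, centroids)
```

Returning the rule name alongside the coordinates makes it possible to report the share of addresses resolved by each rule, as on the slide.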
COMMUTING STATISTICS
Commuter: a person whose employer's registered office is outside the administrative borders of the gmina (municipality, LAU2) of residence.
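The definition above reduces to a comparison of two gmina identifiers per employed person. A minimal sketch, with hypothetical record fields and gmina codes, that flags commuters and computes their share among employees:

```python
def is_commuter(residence_gmina, workplace_gmina):
    """A commuter works in a gmina other than the gmina of residence."""
    return residence_gmina != workplace_gmina


def commuter_share(employees):
    """Percentage of commuters among the given employees."""
    commuters = sum(
        is_commuter(e["residence_gmina"], e["workplace_gmina"])
        for e in employees
    )
    return 100.0 * commuters / len(employees)


# Hypothetical employee records; "G1" and "G2" are made-up gmina codes.
employees = [
    {"id": 1, "residence_gmina": "G1", "workplace_gmina": "G1"},
    {"id": 2, "residence_gmina": "G1", "workplace_gmina": "G2"},
    {"id": 3, "residence_gmina": "G2", "workplace_gmina": "G1"},
    {"id": 4, "residence_gmina": "G2", "workplace_gmina": "G2"},
]
share = commuter_share(employees)
```

Grouping the records by a larger unit, such as the powiat of residence, before calling `commuter_share` would give the share per unit.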
Commuting statistics. Source data: attribute data, spatial data. Actions: directions of population movements related to employment; commuting to/from Poznań; commuting within voivodships. Cartographic presentation of the results.
Source data. Attribute: tables with demographic data acquired from the Population and Housing Census 2011 (person ID; age; gender; dwelling address and X, Y coordinates; workplace address and X, Y coordinates; income; economic activity classification; fact of commuting); 3.1 million records. Spatial: boundaries of the territorial division of the country; Spatial Address Databases (source of dwelling coordinates and boundaries of statistical regions and census enumeration areas); spatialized enterprise addresses (source of workplace coordinates); kilometer grid Grid_ETRS89_LAEA_PL_1K (European Forum for Geography and Statistics).
Percentage of commuters in the number of employees. Statistical unit: powiat (county, LAU1)
Surplus arriving / departing to work. Statistical unit: 1 km x 1 km grid (ETRS89 LAEA)
Surplus arriving / departing to work. Area: city of Poznań and surroundings. Statistical unit: 1 km x 1 km grid (ETRS89 LAEA)
Quotient of commuting flows. Area: city of Poznań. Statistical unit: census enumeration area
Arriving / departing to work. Area: city of Poznań. Statistical unit: 250 m x 250 m grid (ETRS89 LAEA)
Percentage of people commuting to voivodship (NUTS2) capitals. Statistical unit: gmina (municipality, LAU2)
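One way to derive a per-cell commuting surplus from dwelling and workplace coordinates is sketched below. The slides do not define the surplus precisely, so this sketch assumes it is the number of commuters arriving in a cell minus the number departing from it. The coordinates and function names are hypothetical.

```python
from collections import Counter


def cell(x, y, size=1000):
    """Lower left corner of the grid cell containing (x, y), in metres."""
    return (int(x // size) * size, int(y // size) * size)


def commuting_balance(commuters, size=1000):
    """Arrivals minus departures per cell (assumed definition of surplus).

    commuters: iterable of ((home_x, home_y), (work_x, work_y)) pairs.
    """
    balance = Counter()
    for (home_x, home_y), (work_x, work_y) in commuters:
        balance[cell(home_x, home_y, size)] -= 1  # departs from the home cell
        balance[cell(work_x, work_y, size)] += 1  # arrives in the work cell
    return dict(balance)


# Hypothetical commuters: (dwelling coordinates, workplace coordinates).
commuters = [
    ((500, 500), (2500, 500)),
    ((1500, 500), (2500, 700)),
    ((2600, 100), (600, 900)),
]
balance = commuting_balance(commuters)
```

Passing `size=250` would produce the same balance on a 250 m grid.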
STATISTICAL INDICATORS FOR SPATIAL PLANNING
Statistical indicators for spatial planning. Source data: spatial data. Scope: selected administrative units. Aims: source data usability analysis for the purpose of creating statistical indicators for spatial planning; methodology for statistical indicators describing building density; methodology for statistical indicators describing road density. Cartographic presentation of the indicators.
Source data. Spatial: Database of Topographic Objects (buildings and road network); cadastral data; orthophotomap; boundaries of statistical regions and census enumeration areas (spatial address databases); boundaries of the territorial division of the country.
Scope 27 gminas (LAU2) from 4 powiats (LAU1) located north of Warsaw
Source data evaluation: comparing the content of randomly selected grid cells within the Database of Topographic Objects with the orthophotomap. Roads: 59 grid cells sampled out of a total of 1185; 79.7% of cells with total compliance; the rest with compliance > 75%. Buildings: 135 grid cells sampled out of a total of 2697; 44.5% of cells with total compliance; 37.8% of cells with compliance > 75%; most of the rest with compliance > 50%; gaps found mainly in urban areas (omissions in the building layer).
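The sampling and summary steps of such an evaluation can be sketched as follows. The sample sizes on the slide (59 of 1185 and 135 of 2697) are consistent with a 5% sample, but the slides do not state the sampling fraction, so the 5% default below is an assumption, as are the seed, the function names, and the example scores.

```python
import random


def sample_cells(cell_ids, fraction=0.05, seed=2014):
    """Draw a reproducible random sample of grid cells (5% is an assumption)."""
    k = round(len(cell_ids) * fraction)
    return random.Random(seed).sample(cell_ids, k)


def compliance_summary(scores):
    """Percentage of cells with total, > 75%, and lower compliance.

    scores: compliance per sampled cell as a fraction between 0 and 1.
    """
    n = len(scores)
    full = sum(1 for s in scores if s == 1.0)
    high = sum(1 for s in scores if 0.75 < s < 1.0)
    return {
        "total": 100.0 * full / n,
        "above_75": 100.0 * high / n,
        "rest": 100.0 * (n - full - high) / n,
    }


road_sample = sample_cells(list(range(1185)))
building_sample = sample_cells(list(range(2697)))

# Hypothetical compliance scores for four inspected cells.
summary = compliance_summary([1.0, 1.0, 0.8, 0.6])
```

Fixing the seed keeps the sample reproducible, so the same cells can be re-inspected later.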
Building density indicator (%): $W_Z = \frac{P_Z}{P_P} \cdot 100\%$, where $W_Z$ is the building density ratio, $P_Z$ the total building area, and $P_P$ the survey area.
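The indicator is the ratio of total building area to survey area, expressed as a percentage. A minimal sketch of the computation for one grid cell, assuming building footprints are simple polygons in a metric coordinate system (the footprints and function names are hypothetical, and areas are computed with the shoelace formula rather than a GIS tool):

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def building_density(buildings, survey_area):
    """Total building area divided by survey area, as a percentage."""
    total_building_area = sum(polygon_area(b) for b in buildings)
    return 100.0 * total_building_area / survey_area


# Hypothetical footprints (metres) inside a 1 km x 1 km cell.
buildings = [
    [(0, 0), (10, 0), (10, 20), (0, 20)],      # 200 m2
    [(50, 50), (80, 50), (80, 60), (50, 60)],  # 300 m2
]
density = building_density(buildings, survey_area=1_000_000)
```

Both areas must be in the same unit (here square metres) for the percentage to be meaningful.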
Building density indicator. Town of Ząbki: grid cell with the largest number of buildings. City of Wołomin: grid cell with the highest building density ratio.
Road density indicator (km/km²): $W_D = \frac{D_D}{P_P}$, where $W_D$ is the road density indicator, $D_D$ the total road length, and $P_P$ the survey area.
Road density model (m/km²)
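The indicator divides total road length by survey area. A minimal sketch for one survey area, assuming road segments are polylines with vertices in metres (the sample roads and function names are hypothetical):

```python
from math import hypot


def polyline_length(vertices):
    """Length of a polyline given as a list of (x, y) vertices, in metres."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )


def road_density(roads, survey_area_km2):
    """Total road length in km divided by survey area in km2."""
    total_length_km = sum(polyline_length(r) for r in roads) / 1000.0
    return total_length_km / survey_area_km2


# Hypothetical road polylines (metres) in a 2 km2 survey area.
roads = [
    [(0, 0), (3000, 4000)],  # 5 km
    [(0, 0), (0, 1000)],     # 1 km
]
density = road_density(roads, survey_area_km2=2.0)
```

Skipping the division by 1000 gives the same indicator in m/km².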
CONCLUSIONS
Conclusions: census results referenced to a point (X, Y); huge opportunity for spatial analyses; geostatistical products that reflect user needs; high demand for demographic data below LAU2 level; positive reception of project results; SUCCESS.
Conclusions. GEO.STAT.GOV.PL: the project outcome will have a strong impact on future developments of the Geostatistics Portal (incl. INSPIRE services). Wednesday, June 18th, 16:00 @ Room 4: "Geostatistics Portal: the multitool for statistics on maps" (session: Maps, Stats and Observation Data)
Merging statistics and geospatial information Demography / Commuting / Spatial planning / Registers Mirosław Migacz Chief GIS Specialist Central Statistical Office of Poland @mireslav www.linkedin.com/in/migacz m.migacz@stat.gov.pl www.slideshare.net/mirosawmigacz