WP7 Multi Domains Setting the scene
WP7 Multi Domains
General summary of WP7 work done
Review of the future deliverables and milestones
WP7 Multi Domains TEAM
ESSnet BIG DATA work packages:
WP 0: Co-ordination
WP 1: Webscraping / Job Vacancies
WP 2: Webscraping / Enterprise Characteristics
WP 3: Smart Meters
WP 4: AIS Data
WP 5: Mobile Phone Data
WP 6: Early Estimates
WP 7: Multi Domains
WP 8: Methodology
WP 9: Dissemination
WP 7 is led by GUS (Statistics Poland). The work is carried out together with CBS (Statistics Netherlands) and two other ESSnet Big Data partners: CSO (Statistics Ireland) and ONS (Office for National Statistics, United Kingdom). Portugal joined the team under SGA-2 (from March 2017).
WP7 Multi Domains TEAM
Janusz Dygaszewicz: Project Manager of Polish work
Anna Nowicka: Leader, cooperation
Jacek Maślankowski: Coordinator of methodology
PARTNERS: Alessandra Sozzi, Nigel Swier, Leone Wardman; Piet Daas; John Sheridan, Sinead Bracken; Rui Alves, Sónia Quaresma
Country team for each domain: Regional statistical office in Poznań; Regional statistical office in Bydgoszcz; Regional statistical office in Rzeszów; Regional statistical office in Olsztyn; Department of Social Research; Department of Agriculture
Domains: Population; Tourism/border crossing; Agriculture
Let's introduce ourselves
WP7 Multi Domains
The aim is to investigate how a combination of big data sources and existing official statistical data can be used to improve current statistics and create new statistics in statistical domains. The work package focuses on three statistical domains: Population, Tourism/border crossings and Agriculture. The work package team will describe the data collection, data linking, data processing and methodological aspects of combining data in these domains. The challenges ahead are representativity, linking to other datasets, metadata, international comparability, and long-lasting solutions at sustainable cost.
General summary of WP7 work done
Work done - general overview
Brainstorming on data sources
Questionnaire on different aspects of Big Data implementation, e.g. data access, data quality, combining data, methodology
Final use cases preparation
Several videoconferences
Annotated bibliography
Pre-pilot use cases implementation
SGA-2 perspectives
Extend the scope of pilot surveys
Combine data within each domain as well as across domains
Share a general framework
To achieve its main goals under SGA-2, WP7 has started experimental work and plans to carry out the following three case studies.
POPULATION
Everyday citizen satisfaction
Responsibility: PL coordinator, supported by UK and PT.
Data sources: Social media, blogs, internet portals.
Methodology: Webscraping; data, text and web mining; machine learning. The data sources are selective, i.e. they only cover units that post text on social media and the internet, so the methodology will aim at yielding valid information for the population as a whole. Use will be made of methods described in the literature (such as research on the use of public social media messages done in the Netherlands).
The goal of the case study: to examine the level of daily satisfaction of the population by analyzing the content of messages for the presence of defined expressions describing emotional states, e.g. happiness, joy, sadness, fear, anger; to present the moods of the population associated with various public events; to observe areas of morbidity, e.g. flu.
Plan of Combining Datasets: Combine the selected data from all Big Data sources in one repository; compare them with the results of social studies to add more detailed information; supplement the information gained in social studies.
Main benefits and value added for official statistics: Support for the traditional European Social Survey; a supplement to the research methodology for phenomena that are difficult to measure through traditional polls.
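The idea of scanning messages for defined expressions of emotional states can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the tiny English lexicon, the function names and the sample messages are hypothetical, and the slides do not state which expressions, languages or tools the case study uses. The sketch only counts matches; it does not address the selectivity of the sources.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; the real list of expressions is not given in the slides.
EMOTION_LEXICON = {
    "happiness": {"happy", "glad", "delighted"},
    "sadness": {"sad", "unhappy", "miserable"},
    "fear": {"afraid", "scared", "worried"},
    "anger": {"angry", "furious", "annoyed"},
}


def count_emotions(messages):
    """Count, per emotion, the messages containing at least one lexicon word."""
    counts = Counter()
    for text in messages:
        # Lower-case the message and split it into simple word tokens.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for emotion, words in EMOTION_LEXICON.items():
            if tokens & words:
                counts[emotion] += 1
    return counts


def emotion_shares(messages):
    """Share of messages expressing each emotion (0.0 if there are no messages)."""
    counts = count_emotions(messages)
    n = len(messages)
    return {e: (counts[e] / n if n else 0.0) for e in EMOTION_LEXICON}


# Made-up sample messages, for illustration only.
sample = [
    "So happy about the concert tonight!",
    "I am worried and a bit sad about the news.",
    "Traffic again, I'm furious.",
    "Nothing special today.",
]
shares = emotion_shares(sample)
```

Shares computed this way describe only the people who post such messages; any statement about the whole population would still need the correction methods mentioned under Methodology.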
AGRICULTURE
Estimation of agricultural statistics: pilot case study on crop types based on satellite data
Responsibility: PL coordinator, supported by IE.
Data sources: Satellite images, administrative data, in situ surveys.
Methodology: data fusion of radar and optical remote sensing data; comparison with traditional surveys, e.g. the Farm Structure Survey (FSS); combining administrative data sources with satellite data.
The goal of the case study: Crop type: look at the types of crops being grown and see whether they can be identified accurately from the imagery; analyze the possibilities of using satellite images.
Plan of Combining Datasets: Data fusion, i.e. combining data sources by spatial reference.
Main benefits and value added for official statistics: Higher quality of agricultural surveys; lower respondent burden; more detailed data published by official statistics; potentially lower cost of conducting surveys.
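Combining sources by spatial reference can be sketched as follows, under strong simplifying assumptions: parcels are hypothetical axis-aligned rectangles, the satellite imagery has already been classified into a crop label per pixel centre, and the administrative data hold one declared crop per parcel. All names and values are invented for illustration; the slides do not describe the actual fusion procedure.

```python
from collections import Counter


def majority_crop_per_parcel(parcels, pixels):
    """Assign each parcel the most frequent crop label among pixels inside it.

    parcels: {parcel_id: (xmin, ymin, xmax, ymax)}  # hypothetical bounding boxes
    pixels:  [(x, y, crop_label), ...]              # classified pixel centres
    """
    result = {}
    for parcel_id, (xmin, ymin, xmax, ymax) in parcels.items():
        votes = Counter(
            crop for x, y, crop in pixels
            if xmin <= x < xmax and ymin <= y < ymax
        )
        # None marks parcels that no pixel falls into.
        result[parcel_id] = votes.most_common(1)[0][0] if votes else None
    return result


def compare_with_declared(observed, declared):
    """Flag, per parcel, whether the image-based crop matches the declared one."""
    return {pid: observed.get(pid) == crop for pid, crop in declared.items()}


# Made-up example data.
parcels = {"A": (0, 0, 2, 2), "B": (2, 0, 4, 2)}
pixels = [
    (0.5, 0.5, "wheat"), (1.5, 0.5, "wheat"), (0.5, 1.5, "maize"),
    (2.5, 0.5, "rape"), (3.5, 1.5, "rape"),
]
declared = {"A": "wheat", "B": "maize"}  # stand-in for administrative data

observed = majority_crop_per_parcel(parcels, pixels)
agreement = compare_with_declared(observed, declared)
```

Real parcels are polygons rather than rectangles, so a production version would use a proper geometric containment test; the join logic by location stays the same.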
TOURISM/BORDER CROSSING
Border movement
Responsibility: PL coordinator, supported by NL and PT.
Data sources: Traffic sensors (data already acquired from Polish and German data owners); traditional surveys on tourism; flight statistics such as origin, destination and estimated number of passengers from the Civil Aviation Authority of the Republic of Poland; webscraping. Depending on availability, Mobile Call Data will also be used, building on the results of WP 5.
Methodology: spatial-temporal models and graph interpolation methods; cross-entropy econometrics for combining data sets.
The goal of the case study: to estimate border traffic across internal EU borders (Polish-German, Polish-Slovakian, Polish-Czech and Polish-Lithuanian), also with regard to mirror statistics. A partial estimate of domestic traffic may be an additional result. Selected data sources from national authorities show the scale of border movement that is regarded as tourism in terms of statistical surveys.
Plan of Combining Datasets: Unify the structure of the data sets; collect exogenous variables (road class, etc.); prepare distance and graph matrices; quantify the reliability of each data source (expected standard error); combine traffic data from different sources with cross-entropy econometrics.
Main benefits and value added for official statistics: Reduced burden on interviewers; more detailed results than from the survey alone; data consistent with mirror statistics.
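The role of the expected standard error when combining sources can be shown with a deliberately simpler stand-in. The sketch below uses inverse-variance weighting, not the cross-entropy econometrics named in the plan, and it assumes the sources are independent and unbiased. The function name and the figures are hypothetical.

```python
def combine_estimates(estimates):
    """Combine independent estimates of one quantity by inverse-variance weighting.

    estimates: list of (value, standard_error) pairs, one per data source.
    Returns (combined_value, combined_standard_error).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(se <= 0 for _, se in estimates):
        raise ValueError("standard errors must be positive")

    # A more reliable source (smaller standard error) gets a larger weight.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    combined = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return combined, (1.0 / total) ** 0.5


# Made-up daily crossings at one border point: sensor count vs. survey estimate.
sources = [(10000.0, 500.0), (12000.0, 1000.0)]
value, se = combine_estimates(sources)
```

With these invented figures the combined value lies closer to the sensor count, because its standard error is half that of the survey estimate, and the combined standard error is smaller than either input.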