Globally Estimating the Population Characteristics of Small Geographic Areas. Tom Fitzwater

Similar documents
Targeted LiDAR use in Support of In-Office Address Canvassing (IOAC) March 13, 2017 MAPPS, Silver Spring MD

Lessons Learned from the production of Gridded Population of the World Version 4 (GPW4) Columbia University, CIESIN, USA EFGS October 2014

Gridded Population of the World Version 4 (GPWv4)

Understanding China Census Data with GIS By Shuming Bao and Susan Haynie China Data Center, University of Michigan

Methodological issues in the development of accessibility measures to services: challenges and possible solutions in the Canadian context

GEOGRAPHIC INFORMATION SYSTEMS Session 8

What are we like? Population characteristics from UK censuses. Justin Hayes & Richard Wiseman UK Data Service Census Support

LandScan Global Population Database

Demographic Data in ArcGIS. Harry J. Moore IV

Uganda - National Panel Survey

National assessment on member s readiness to produce indicators

Techniques for Science Teachers: Using GIS in Science Classrooms.

An Introduction to China and US Map Library. Shuming Bao Spatial Data Center & China Data Center University of Michigan

THE GLOBAL INDICATOR FRAMEWORK: NEW AND INNOVATIVE METHODS FOR DISAGGREGATIN BY GEOLOCATION

SUPPORTS SUSTAINABLE GROWTH

Using American Factfinder

Technical Documentation Demostats april 2018

Foundation Geospatial Information to serve National and Global Priorities

Indicator: Proportion of the rural population who live within 2 km of an all-season road

Digitization in a Census

GIS = Geographic Information Systems;

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

Geoscape Capturing Australia s Built Environment for emergency modelling and management. Dan Paull Chief Executive Officer PSMA Australia

AP HUG REVIEW WELCOME TO 2 ND SEMESTER! Annette Parkhurst, M.Ed. January, 2015

Welcome. C o n n e c t i n g

Geospatial Analysis of Job-Housing Mismatch Using ArcGIS and Python

Rules of the territorial division

Disaggregation according to geographic location the need and the challenges

The Road to Data in Baltimore

Evaluating Urban Vegetation Cover Using LiDAR and High Resolution Imagery

Merging statistics and geospatial information

High resolution population grid for the entire United States

Indicator : Average share of the built-up area of cities that is open space for public use for all, by sex, age and persons with disabilities

Crowdsourcing approach for large scale mapping of built-up land

Compact guides GISCO. Geographic information system of the Commission

ENV208/ENV508 Applied GIS. Week 1: What is GIS?

2018 NASCIO Award Submission Category: Cross-Boundary Collaboration and Partnerships. Project Title: Tennessee Wildfires: A Coordinated GIS Response

DATA DISAGGREGATION BY GEOGRAPHIC

COMPILATION OF ENVIRONMENTALLY-RELATED QUESTIONS IN CENSUSES AND SURVEY, AND SPECIALIZED ENVIRONMENTAL SURVEYS

Sharthi Laldaparsad Statistics South Africa, Policy Research & Analysis. Sub-regional workshop on integration of administrative data,

Land-Line Technical information leaflet

GIS ADMINISTRATOR / WEB DEVELOPER EVANSVILLE-VANDERBURGH COUNTY AREA PLAN COMMISSION

Land Accounts - The Canadian Experience

CENSUS MAPPING WITH GIS IN NAMIBIA. BY Mrs. Ottilie Mwazi Central Bureau of Statistics Tel: October 2007

Disaster Management & Recovery Framework: The Surveyors Response

Use of Geospatial Data: Philippine Statistics Authority 1

Integration for Informed Decision Making

Understanding and Measuring Urban Expansion

NR402 GIS Applications in Natural Resources

New Land Cover & Land Use Data for the Chesapeake Bay Watershed

Acknowledgments xiii Preface xv. GIS Tutorial 1 Introducing GIS and health applications 1. What is GIS? 2

Developing Spatial Data to Support Statistical Analysis of Education

Population Research Center (PRC) Oregon Population Forecast Program

Poverty and Hazard Linkages

Machine Learning to Automatically Detect Human Development from Satellite Imagery

Measuring Disaster Risk for Urban areas in Asia-Pacific

Looking at Communities: Comparing Urban and Rural Neighborhoods

Working with Census Operations Case Study from KSA

Summary and Statistical Report of the 2007 Population and Housing Census Results 1

How GIS can be used for improvement of literacy and CE programmes

Chapter 6. Fundamentals of GIS-Based Data Analysis for Decision Support. Table 6.1. Spatial Data Transformations by Geospatial Data Types

Combining Geospatial and Statistical Data for Analysis & Dissemination

Brazil Paper for the. Second Preparatory Meeting of the Proposed United Nations Committee of Experts on Global Geographic Information Management

What s new (or not so new) in Population and Poverty Data Initiatives

Integrating Official Statistics and Geospatial Information NBS Experience

Preparing the GEOGRAPHY for the 2011 Population Census of South Africa

Achieving the Vision Geo-statistical integration addressing South Africa s Developmental Agenda. geospatial + statistics. The Data Revolution

Data Collection. Lecture Notes in Transportation Systems Engineering. Prof. Tom V. Mathew. 1 Overview 1

Introduction to GIS I

User Guide. Affirmatively Furthering Fair Housing Data and Mapping Tool. U.S. Department of Housing and Urban Development

The Church Demographic Specialists

2010 Census Data Release and Current Geographic Programs. Michaellyn Garcia Geographer Seattle Regional Census Center

Welcome to NR502 GIS Applications in Natural Resources. You can take this course for 1 or 2 credits. There is also an option for 3 credits.

Rural Pennsylvania: Where Is It Anyway? A Compendium of the Definitions of Rural and Rationale for Their Use

2011 Built-up Areas - Methodology and Guidance

CONSTRUCTING THE POVERTY AND OPPORTUNITIES/PUBLIC SERVICES MAPS INFORMATION MANAGEMENT. Background: Brazil Without Extreme Poverty Plan

GEOGRAPHY (GEOGRPHY) Geography (GEOGRPHY) 1

Trimble s ecognition Product Suite

Demographic Data. How to get it and how to use it (with caution) By Amber Keller

ParkServe. The Trust for Public Land s. Emmalee Dolfi Gabriel Patterson-King ESRI User Conference 2017

Introduction to GIS. Dr. M.S. Ganesh Prasad

DATA SCIENCE SIMPLIFIED USING ARCGIS API FOR PYTHON

QUESTIONNAIRE THE CURRENT STATUS OF MAPPING IN THE WORLD

Typical information required from the data collection can be grouped into four categories, enumerated as below.

Long Island Breast Cancer Study and the GIS-H (Health)

BROOKINGS May

Exploring the Potential of Integrated Cadastral and Building Data for Evaluation of Remote-Sensing based Multi-Temporal Built-up Land Layers

Modeling global, urban, and rural exposure to climate-related hazards

Spatial Statistical Information Services in KOSTAT

Second High Level Forum on GGIM Seminar on Regional Cooperation in Geospatial Information Management Doha, Qatar, 7 February 2013

GIS Lecture 5: Spatial Data

GDDS2: GIS AND SMALL AREA STATISTICS. Page 1 of 6

Advanced Image Analysis in Disaster Response

STATE GEOGRAPHIC INFORMATION DATABASE

Search for the Gulf of Carpentaria in the remap search bar:

Census Geography, Geographic Standards, and Geographic Information

Hennepin GIS. Tree Planting Priority Areas - Analysis Methodology. GIS Services April 2018 GOAL:

The 2010 Census Successful Because of GIS

Internet GIS Sites. 2 OakMapper webgis Application

NEW YORK DEPARTMENT OF SANITATION. Spatial Analysis of Complaints

Transcription:

Globally Estimating the Population Characteristics of Small Geographic Areas Tom Fitzwater U.S. Census Bureau Population Division

What we know 2

Where do people live? Difficult to measure and quantify. Characteristics: Age and sex distribution, poverty, literacy, access to services, etc. Measuring at many scales: Country Region City Neighborhood Person Data users want answers at every scale. 3

Regional population change Projected population change: 2016 to 2050 Source: U.S. Census Bureau, International Data Base, August 2016 update. http://www.census.gov/popul ation/international/data/idb/r egion.php?n=%20results%20 &T=6&A=aggregate&RT=0&Y =2016,2050&R=110,120,130, 140,150,160&C= 4

Projected population change by country Source: U.S. Census Bureau, International Data Base, August 2016 update. http://www.census.gov/population/international/data/idb/informationgateway.php 5

Population distribution Source: U.S. Census Bureau, subnational population estimates, 2011. 6

7

Our methods 8

International Projects Applied research and analysis: International population estimates and projections HIV/AIDS research tools Global aging issues Geospatial products: Subnational population estimates Gridded products (e.g., Demobase) 9

What we produce Use multiple inputs for guidance, including censuses and surveys. Estimates and projections: Evaluate and separately estimate base population, fertility, mortality, and international migration for each country. Analyze and project trends in above variables which change rates and age/sex composition over time. Additional indicators Censuses Infrequent Detailed geography Surveys More frequent Less detailed geography More characteristics Geospatial Administrative boundaries Statistical boundaries How many? Where? 10

Common census and survey variables No country releases individual records, except in the form of sanitized public microdata. All censuses have total population, age, and sex variables. Other variables differ among countries. Personal: ethnicity, language, health, fertility, mortality Family: marital status, household size Migration: birthplace, citizenship Education: literacy, level of education Work: employment, job categories Economic: income, poverty Housing: wall/roof materials, building age Land use: urban/rural, agriculture Access to services: water, electricity, transport, communications 11

How we evaluate data In our products, we account for census inaccuracies or unexpected events, such as: Systematic undercounting of women and children. Major events, such as disasters, war, disease. Unexplained or unlikely population growth. Differential coverage between censuses. Discrepancies in age reporting, such as age heaping. Problems reported in a post-census survey. These considerations are critical to ensure proper analysis of a country s population characteristics. 12

Studying change over time within a country Comparability: Even if an indicator is present in 2 censuses/surveys, we do not assume they are directly comparable. Methods, definitions, or wording of the question may have changed, rendering comparisons misleading. Evolving geography: Some countries redraw boundaries frequently. We match census or survey data to available geographic boundaries. 13

Comparing data between countries Working across borders is a significant challenge. From a geospatial perspective, boundaries from one country s government may not match boundaries from neighboring country. Census and survey variables may not be comparable from country to country due to differing definitions. Example: How do you define urban areas? 14

Different time series Censuses are held on different dates and at varying frequency. 15

Geospatial data International land boundaries Large-Scale International Boundaries (LSIB) (geonode.state.gov) Shorelines and inland water bodies World Vector Shoreline (WVS), or national sources. Internal/subnational boundaries National sources, GADM, open data repositories. Persistent challenges: Multiple interpretations for international boundaries. Representing the area of enumeration. Inland water bodies. Finding freely licensed subnational data. Matching subnational boundaries with statistical data. 16

New tools and methods 17

New tools and methods We leverage our agency s strengths: One of the largest holders of geographic information in the U.S. government. Center for excellence in production and understanding of human geography and population data. Continually innovating to conduct U.S. censuses and surveys more efficiently and effectively. We apply tools and methods built for domestic purposes to international projects. 18

Automated data collection & categorization Objectives: Index datasets and publications from key websites. Categorize datasets as useful or not useful. Structure datasets regardless of format (PDF, XLSX, CSV, API, HTML, etc.). Save staff time and increase data availability. Primary toolset: Apache Nutch (web scraper), Python Natural Language Toolkit (language), Python scikit-learn (machine learning classifier) 19

Automated data collection & categorization Build language-specific training datasets. Extract text from datasets or documents (initial focus: PDF). Identify common features based on n-grams, or sequences of n words. e.g., 2-gram = 2 word pair like demographic survey Tested Naïve Bayes & Support Vector classifiers. Promising early results: ~93% accuracy (SVM) for English-language documents, compared to human classification. Future considerations: Continuously search for new data. Tag data by subject, source, location. Automatically extract and structure data tables. Develop feedback process to improve classification. Explore geospatial data identification and extraction. Expand to other file formats and APIs. Cloud architecture to scale up/down easily. 20

Building extraction Identifying physical features from remotely sensed data. Use a combination of: High resolution imagery (aerial or satellite) Light detection and ranging (lidar) Vector geospatial data (e.g., road networks) Work under way to enhance U.S. housing unit location accuracy in support of census/survey operations. Census Bureau holds 150 million+ housing unit locations for U.S. Considering applying similar methods to other countries to refine estimates of population distribution. 21

Constructing built-up layer Built/not-built derived from normalized difference vegetation index (NDVI). Change identified by comparing multiple time periods. Multi-date Built layer 22

Extracting ground from lidar Lidar point cloud. Queried for ground points. Expanded ground raster layer. Ground points converted to raster layer. 23

Extracting buildings Expanded ground raster (left) used as mask to remove low areas of Built/Not Built layer for latest year (right). Resultant raster layer represents primarily buildings. Building polygons generated and compared with existing point features in collection. Footprints and heights captured. 24

Extracting roads Start with buildings raster layer Combine with non-ground point cloud to generate Built/Not Built layer Contraction and dilation to separate roads from parking lot to get paved area layer Small road segments eliminated and intersected with existing road vectors. Misaligned road segments identified and saved. 25

Population/settlement surfaces Increasing number of datasets are providing estimates of population and settlements in raster/gridded format. Advantages: Custom geography (not dependent on aggregate results). Works across many scales (from neighborhoods to continents). Ideal for scientific applications. 26

Example datasets Demobase (U.S. Census Bureau) WorldPop (University of Southampton) Gridded Population of the World (CIESIN) High Resolution Settlement Layer (Facebook/CIESIN) LandScan (Oak Ridge National Laboratory) Global Human Settlement Layer (Joint Research Commission) Global Urban Footprint (German Aerospace Center) 27

28

Example methodology: Demobase Starting point: Correlation between population distribution and built-up area. Probability layer: Combination of built-up area and other layers. Where are humans likely to live? Population assigned to pixels based on fraction of pixel that is built up. Probability layer plays an important role in rural areas. 29

Example methodology: Demobase Probability layer combines: Proximity to built-up Settlement points Land cover heterogeneity Slope and elevation Detailed methods: Azar, et al. Generation of fine-scale population layers using multi-resolution satellite imagery and geospatial data. Remote Sensing of Environment (2013). http://dx.doi.org/10.1016/j.rse.2012.11.022 30

Thank you! Tom Fitzwater U.S. Census Bureau Population Division john.t.fitzwater@census.gov 31