Big-Geo-Data EHR Infrastructure Development for On-Demand Analytics

Sohayla Pruitt, MA, Senior Geospatial Scientist, Duke Medicine (DUHS DHTS EIM HIRS)

The Institute of Medicine, the World Health Organization, and others recognize that clinical care may contribute only about 10% to the health of a population. Duke Medicine has spent MILLIONS on the electronic capture of clinical data. How do we make a health system more aware of the other determinants of health?

GeoMedicine

Typical Geospatial Workflow Requirements

Components: personnel, software, data, hardware, and GIS methods. The workflow requires specialized software and highly trained personnel. Data come from a variety of sources (both free and costly) and are often obtained in a mix of spatial and nonspatial formats. Several specialized methods are needed to prepare the data adequately for geospatial visualization and analysis. The work often involves ad hoc analyses that are all too often funded as a small piece of a larger research project.

Typical Geospatial Workflow Disadvantages

- Work is not easily shared
- MANY possible data sources
- Data can be expensive
- Work is not easily scalable
- Data can easily become stagnant
- Significant bias is introduced
- Data require significant preparation
- Not enough time is spent on analysis
- Too much time is spent on data preparation

The Changing Paradigm: On-Demand Geospatial Analytics

We have become an on-demand society. With easy access to online mapping applications and GPS-enabled smartphones and cars, we have come to rely on geospatial information. Successful applications deliver immediate answers to user questions without the user ever having to collect, process, or analyze the data manually.

Duke Medicine's EDW Geospatial Strategic Vision

- Develop an enterprise geospatial infrastructure within Duke's Enterprise Data Warehouse (EDW), where automated methods download, update, and process geospatial data layers and then link them to each patient's geocoded address.
- Ensure that end users have access to thousands of the most up-to-date geospatial data elements.
- Develop sophisticated geospatial visualization and analytics tools that transform data into information on demand, independently of any single project.
- Spur geospatial health care research and help eliminate some of the bias from the analysis.

Duke Medicine's EDW Geospatial Infrastructure Development

[Roadmap diagram: Automated Address Standardization and Geocoding]

Automated Address Standardization and Geocoding

Status: Completed August 2012

Methodology: Used SAS Data Management Studio, the USPS knowledge pack, and the TomTom Rooftop +6 Geocoding Data Pack to deploy an automated process that runs nightly on patient address records.

Results:
- ~5.9 million address records were evaluated for USPS verification/standardization and for rooftop- or street-level geocoding accuracy.
- ~90% of all patients seen in the past 10 years have a current address that has been USPS verified and standardized.
- ~83% of all patients seen in the past 10 years have a current address that has been geocoded to rooftop- or street-level accuracy.
- The remaining ~7% of addresses, which were standardized but not geocoded, were non-physical addresses (e.g., P.O. Boxes, military addresses).

[Figure: BEFORE and AFTER panels]
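
The production process was built in SAS Data Management Studio, whose interfaces are not shown here. As a tool-neutral illustration of the final bookkeeping step, the minimal sketch below assumes address records that have already been through standardization and geocoding, sorts each one into a single outcome, and computes summary percentages like those reported above. The record fields (`line1`, `standardized`, `match_level`), the match-level labels, and the non-physical-address pattern are hypothetical assumptions, not the Duke implementation.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule for non-physical addresses: P.O. boxes and military
# APO/FPO/DPO addresses. A production rule set would be far larger.
NON_PHYSICAL = re.compile(
    r"\b(P\.?\s*O\.?\s*BOX|POST\s+OFFICE\s+BOX|APO|FPO|DPO)\b", re.IGNORECASE
)

# Assumed geocoder match levels that count as "geocoded".
ACCEPTED_LEVELS = {"ROOFTOP", "STREET"}


@dataclass
class AddressRecord:
    line1: str                   # street line after standardization
    standardized: bool           # passed USPS verification/standardization
    match_level: Optional[str]   # geocoder match level, None if unmatched


def classify(rec: AddressRecord) -> str:
    """Assign each record exactly one outcome label."""
    if not rec.standardized:
        return "not_standardized"
    if rec.match_level in ACCEPTED_LEVELS:
        return "geocoded"
    if NON_PHYSICAL.search(rec.line1):
        return "non_physical"
    return "standardized_not_geocoded"


def summarize(records):
    """Percentage of records per outcome, plus the overall standardized share."""
    total = len(records)
    counts = Counter(classify(r) for r in records)
    pct = {label: 100.0 * n / total for label, n in counts.items()}
    pct["standardized"] = 100.0 * sum(r.standardized for r in records) / total
    return pct
```

In this scheme, the gap between the standardized share and the geocoded share is exactly the records that land in the last two outcome buckets.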

[Roadmap diagram: Automated Address Standardization and Geocoding; On-Demand Geospatial Visualization]

On-Demand Geospatial Visualization

Status: Completed November 2012

Results: Users can filter data in the EDW to create patient cohorts and visualize them on a map. The map types available depend on the user's PHI authorization: dot distribution maps, or thematic maps using several geographic boundaries (e.g., counties, ZIP Codes, census tracts, block groups).
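
A thematic map of a cohort needs a count of patients per boundary. A census block group GEOID is a 12-digit code made of state (2 digits), county (3), tract (6), and block group (1), so county, tract, and block group counts can all be derived from one block-group assignment per patient by truncating the code. The sketch below assumes each patient already carries a block-group GEOID, and it assumes (the slides do not say which map type maps to which authorization level) that point-level dot maps are returned only to PHI-authorized users. Function and field names are hypothetical. ZIP Codes are not nested in this hierarchy and would need a separate assignment.

```python
from collections import Counter

# Length of the GEOID prefix that identifies each nested census geography.
PREFIX_LEN = {"county": 5, "tract": 11, "block_group": 12}


def thematic_counts(block_group_ids, level):
    """Count cohort members per boundary by truncating block-group GEOIDs."""
    n = PREFIX_LEN[level]
    return Counter(geoid[:n] for geoid in block_group_ids)


def map_layer(cohort, has_phi_authorization, level="tract"):
    """Return point features for authorized users, otherwise area counts.

    `cohort` is a list of dicts with hypothetical keys 'lon', 'lat', 'bg_geoid'.
    """
    if has_phi_authorization:
        return {"type": "dots", "points": [(p["lon"], p["lat"]) for p in cohort]}
    counts = thematic_counts([p["bg_geoid"] for p in cohort], level)
    return {"type": "thematic", "level": level, "counts": dict(counts)}
```

Storing only the finest level (block group) and rolling up by prefix keeps one source of truth for every nested boundary type.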

[Roadmap diagram: Automated Address Standardization and Geocoding; On-Demand Geospatial Visualization; Automated Geospatial Data Collection, Transformation, Loading]

Automated Geospatial Data Collection, Transformation, and Loading

Status: In Progress / Ongoing

Results: Acquired the following data sets to date:
- ESRI infrastructure data, yielding ~30 feature types (e.g., interstates, roads, parks).
- MapInfo business points, yielding ~7,000 business feature types at multiple SIC code grouping levels (e.g., eating and drinking establishments vs. restaurants vs. fast food locations).
- Census 2010 Summary File 1 demographic data at the block group level, yielding ~6,000 statistical measures expressed as raw counts, percentages, medians, or averages.
- American Community Survey 5-year estimate demographic and socioeconomic data at the block group level (2006-2011), yielding ~6,000 statistical measures expressed as raw counts, percentages, medians, or averages.

Developed automated routines to prepare the data for geospatial analysis (e.g., geolocation, clipping to North Carolina, spatial re-projection, filtering, spatial joins) and to load it into a geodatabase within the EDW.
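
One of the preparation steps listed above, the spatial join, can be illustrated without a GIS library. The minimal sketch below assigns each point (for example, a business location) to the polygon that contains it, using the even-odd ray-casting test. It is a teaching sketch under stated assumptions: polygons are simple rings without holes, all coordinates are already in one common projection, and every polygon is tested by brute force, whereas a geodatabase would use a spatial index. All names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting: True if (x, y) lies inside the closed ring.

    `ring` is a list of (x, y) vertices; the last vertex need not repeat the
    first. Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Map each point id to the id of the first polygon containing it (or None).

    points:   {point_id: (x, y)}
    polygons: {polygon_id: ring}
    """
    result = {}
    for pid, (x, y) in points.items():
        result[pid] = next(
            (gid for gid, ring in polygons.items() if point_in_polygon(x, y, ring)),
            None,
        )
    return result
```

The edge test only divides when the edge straddles the ray's y value, so horizontal edges never cause a division by zero.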

[Roadmap diagram: Automated Address Standardization and Geocoding; On-Demand Geospatial Visualization; Automated Geospatial Data Collection, Transformation, Loading; Automated Address Association with Geospatial Data Features]

Automated Address Association with Geospatial Data Features

Status: In Progress / Ongoing

Results: Developed automated routines to calculate the distance from each patient's address in the EDW to the nearest geospatial feature of each type.
- ~30 new variables characterize each patient's distance to infrastructure features (e.g., distance to interstates, roads, parks).
- ~7,000 new variables characterize each patient's distance to businesses at varying levels of categorization (e.g., distance to eating and drinking establishments, to restaurants, to fast food locations).
- ~12,000 demographic and socioeconomic block group variables are expressed as raw counts, percentages, medians, or averages.
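
Each distance variable above is a nearest-neighbour query per feature type. The minimal sketch below computes one "distance to nearest feature" value per type for a single geocoded address. The slides do not state the distance metric or search method, so straight-line great-circle (haversine) distance on a spherical Earth, a brute-force minimum, and the `dist_to_<type>_km` naming scheme are all assumptions; a geodatabase would use a spatial index, and drive-time or network distance would require a road network.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_variables(patient, features_by_type):
    """One 'distance to nearest <type>' variable per feature type.

    patient:          (lon, lat) of the geocoded address
    features_by_type: {feature_type: [(lon, lat), ...]}
    """
    lon, lat = patient
    return {
        f"dist_to_{ftype}_km": min(haversine_km(lon, lat, flon, flat) for flon, flat in pts)
        for ftype, pts in features_by_type.items()
        if pts  # skip types with no features in the study area
    }
```

Running this once per patient over ~30 infrastructure types and ~7,000 business types would produce the per-patient variable counts listed above.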

[Roadmap diagram: Automated Address Standardization and Geocoding; On-Demand Geospatial Visualization; Automated Geospatial Data Collection, Transformation, Loading; Automated Address Association with Geospatial Data Features; On-Demand Patient Geospatial Variable Filtering/Export]

On-Demand Patient Geospatial Variable Filtering/Export

Status: In Progress / Ongoing

Results: Each patient's socioeconomic and demographic block-group-level variables are available for on-demand filtering, visualization, and export. The new geospatial data elements can be exported and used in advanced statistical models. The distance-to-infrastructure and distance-to-business variables have not yet been made available on demand; that work is in progress.

[Screenshot: New Geo Data Element Filters]
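
As a rough illustration of filter-then-export, the sketch below keeps patients whose block-group variable falls in a range and writes selected columns to CSV for use in an external statistical package. It assumes patient records are plain dictionaries; the variable names (`bg_median_household_income`, `bg_pct_bachelors_or_higher`) are hypothetical and are not the EDW's actual data element names.

```python
import csv
import io


def filter_cohort(patients, variable, minimum=None, maximum=None):
    """Keep patients whose block-group variable lies within [minimum, maximum].

    Patients missing the variable (e.g., ungeocoded addresses) are excluded.
    """
    kept = []
    for p in patients:
        value = p.get(variable)
        if value is None:
            continue
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        kept.append(p)
    return kept


def export_csv(patients, columns):
    """Write the selected columns to CSV text for external statistical models."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any fields not listed in `columns`.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(patients)
    return buf.getvalue()
```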

[Screenshot: New Geo Data Elements for Export]

[Roadmap diagram: Automated Address Standardization and Geocoding; On-Demand Geospatial Visualization; Automated Geospatial Data Collection, Transformation, Loading; Automated Address Association with Geospatial Data Features; On-Demand Patient Geospatial Variable Filtering/Export; On-Demand Patient Geospatial Variable Visualization]

On-Demand Patient Geospatial Variable Visualization

Status: In Progress / Ongoing

Results: Each cohort's socioeconomic and demographic block-group-level variables are available for on-demand charting. Common block group socioeconomic and demographic status variables can also be mapped thematically and visualized alongside a cohort. The distance-to-infrastructure and distance-to-business variables have not yet been made available on demand; that work is in progress.
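
Charting and thematic mapping both rest on simple summaries of a block-group variable. The sketch below shows two common choices: quartile summaries for charting a cohort, and equal-count (quantile) classes for shading block groups on a thematic map. The slides do not say which classification method the EDW tools use, so quantile classification, the five-class default, and the function names are assumptions.

```python
import bisect
import statistics


def cohort_summary(values):
    """Quartile summary suitable for charting a cohort's block-group variable."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"n": len(values), "q1": q1, "median": med, "q3": q3}


def quantile_breaks(values, classes=5):
    """Interior class breaks that split the values into equal-count classes."""
    return statistics.quantiles(values, n=classes, method="inclusive")


def classify_values(values_by_area, classes=5):
    """Assign each area a class index 0..classes-1 for a thematic map."""
    breaks = quantile_breaks(list(values_by_area.values()), classes)
    return {area: bisect.bisect_right(breaks, v) for area, v in values_by_area.items()}
```

Quantile classes put roughly the same number of block groups in each color, which keeps skewed variables such as income from collapsing into one or two classes.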

Results (continued): We are exploring the use of business intelligence (BI) dashboards to provide on-demand visualizations that help answer the questions "who?" and "where?"

Results (continued): We are working with SAS partners to explore using SAS Visual Analytics to provide on-demand analytic functionality. This will move us toward a more complete on-demand environment and reduce the need for researchers to extract data from the EDW onto their own machines for statistical analysis.

[Roadmap diagram: Automated Address Standardization and Geocoding; On-Demand Geospatial Visualization; Automated Geospatial Data Collection, Transformation, Loading; Automated Address Association with Geospatial Data Features; On-Demand Patient Geospatial Variable Filtering/Export; On-Demand Patient Geospatial Variable Visualization; On-Demand Geospatial Predictive Analytics]

On-Demand Geospatial Predictive Analytics

Innovation: Developed the first-ever proof of concept of an mHealth technology capable of learning a user's behavior through time and space, learning the socio-geographic factors that influence that behavior, and delivering real-time interventions just in time and just in place.

Overview: Supported the Community Health and Resource Mapping (CHARM) team in building the big-geo-data infrastructure on top of mHealth data collected on smokers. Demonstrated how 5,000+ geospatial variables could be considered within a logistic regression model to (a) identify the geospatial characteristics shared by, and statistically significant to, the locations where participants reported smoking in their mHealth app, and (b) identify other areas with similar geospatial characteristics where, based on their past behavior, participants have a high statistical probability of smoking in the future.

Output: Generated Probability Hotspot Maps (PHMs)*, whose values range from 0 to 1 and specify the likelihood that participants will engage in smoking behavior at a given location.

Input: Modeled the x,y mHealth smoking logs of 17 smokers across North Carolina against 5,000 geospatial data variables.

* PHMs differ from density hotspot maps: they do not only summarize where the behavior occurred in the past, but also identify NEW areas where the behavior is likely to occur in the future.
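
The model behind a PHM is a logistic regression, p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))), whose fitted p values (0 to 1) are mapped across locations. The dependency-free toy sketch below fits that model by batch gradient descent and then scores grid cells to produce a probability surface. It is not the CHARM team's method: how non-smoking comparison locations were sampled, how variables were selected from 5,000 candidates, how significance was tested, and how the grid was defined are not described in the slides, and the two features and all names here are hypothetical. A real analysis would use a statistics package that reports standard errors and p-values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / m for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Probability (0-1) that the behavior occurs at a location with features x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def probability_surface(w, grid):
    """Score every grid cell, giving {cell_id: probability} for mapping."""
    return {cell: predict_proba(w, feats) for cell, feats in grid.items()}
```

Usage: with hypothetical scaled features such as [distance to nearest bar, share of commercial land], label reported smoking locations 1 and sampled comparison locations 0, call `fit_logistic`, then pass every map cell's features to `probability_surface`. Cells never visited can still score high if their characteristics resemble past smoking locations, which is what distinguishes a PHM from a density hotspot map.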

Summary of Benefits

The geospatial data being integrated into the EHR will contribute a valuable set of data elements that are not collected during interactions with patients (e.g., educational attainment, income, primary mode of transportation, distance from primary care clinics, distance from fitness facilities). This integration will give our community on-demand access to geospatial visualization and analytics without requiring them to be expert geospatial modelers who know where to acquire geospatial data, how to process it, and how to use advanced geospatial software to get the information and analysis they need. This approach to research and management can transform how our organization examines the geographic and environmental determinants in the Population Medicine equation.