Integrating Online and Geospatial Information Sources

Similar documents
Lecture 9: Reference Maps & Aerial Photography

Dan Goldberg GIS Research Laboratory Department of Computer Science University of Southern California

GIS for ChEs Introduction to Geographic Information Systems

Teaching GIS for Land Surveying

Lab 9: Satellite Geodesy (35 points)

Geospatial Data Sources. SLO GIS User Group June 8, 2010

Mapping Historical Information Using GIS

Lecture 9: Geocoding & Network Analysis

Data Creation and Editing

Applied Cartography and Introduction to GIS GEOG 2017 EL. Lecture-2 Chapters 3 and 4

Introduction to the 176A labs and ArcGIS

Geographic Products and Data. Improvements in Spatial Accuracy and Accessing Data

GIS Lecture 5: Spatial Data

File Geodatabase Feature Class. Tags platts, price assessement, crude oil, crude, petroleum

Web Visualization of Geo-Spatial Data using SVG and VRML/X3D

Louisiana Transportation Engineering Conference. Monday, February 12, 2007

2. GETTING STARTED WITH GIS

Principles of IR. Hacettepe University Department of Information Management DOK 324: Principles of IR

Lower Density Growth Management Metadata

Geographic Analysis of Linguistically Encoded Movement Patterns A Contextualized Perspective

Information. Information Technology. Geographic. Services (GIS) 119 W Indiana Ave Deland, FL 32720

Export Basemap Imagery from GIS to CAD

SRJC Applied Technology 54A Introduction to GIS

The Northwest Environmental Training Center presents:

FHWA GIS Outreach Activities. Loveland, Colorado April 17, 2012

Shale Plays. File Geodatabase Feature Class. Tags shale plays, basins, gas production, regions

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS

These modules are covered with a brief information and practical in ArcGIS Software and open source software also like QGIS, ILWIS.

The National Hydrography Dataset in the Pacific Region. U.S. Department of the Interior U.S. Geological Survey

DEVELOPMENT OF GPS PHOTOS DATABASE FOR LAND USE AND LAND COVER APPLICATIONS

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Gridded Ambient Air Pollutant Concentrations for Southern California, User Notes authored by Beau MacDonald, 11/28/2017

NR402 GIS Applications in Natural Resources

Popular Mechanics, 1954

Introduction to the 176A labs and ArcGIS Purpose of the labs

1994: JV formed for civil engineering consultancy COWI/KX A/s, Denmark(51%) IFU, Denmark (24% ) IL&FS Infrastructure Dev Corp, India(25%)

INDOT Office of Traffic Safety

VVS maps are generated using DIVA

Road Extraction and Feature Conflation

Digital Map of Mexico Platform and MxSIG. March 2017

DATA SOURCES AND INPUT IN GIS. By Prof. A. Balasubramanian Centre for Advanced Studies in Earth Science, University of Mysore, Mysore

Please note: This presentation was used as speaker s notes for the 2008 Petroleum User Group Conference on Feb. 27, 2008 in Houston, TX.

Among various open-source GIS programs, QGIS can be the best suitable option which can be used across partners for reasons outlined below.

Spatial Thinking and Modeling of Network-Based Problems

Write a report (6-7 pages, double space) on some examples of Internet Applications. You can choose only ONE of the following application areas:

Utah UIC Geospatial Integration EIEN Project

2010 Census Data Release and Current Geographic Programs. Michaellyn Garcia Geographer Seattle Regional Census Center

IMPERIAL COUNTY PLANNING AND DEVELOPMENT

Gridded Traffic Density Estimates for Southern

Digital Tax Maps Westport Island Project Summary

About places and/or important events Landmarks Maps How the land is, hills or flat or mountain range Connected to maps World Different countries

GeoPostcodes. Trinidad & Tobago

Lauren Jacob May 6, Tectonics of the Northern Menderes Massif: The Simav Detachment and its relationship to three granite plutons

Steve Pietersen Office Telephone No

Utilizing Data from American FactFinder with TIGER/Line Shapefiles in ArcGIS

How we made the maps. By Chris Walker. Map US Coast Survey

John Laznik 273 Delaplane Ave Newark, DE (302)

GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research

Evaluating Physical, Chemical, and Biological Impacts from the Savannah Harbor Expansion Project Cooperative Agreement Number W912HZ

GIS Needs Assessment. for. The City of East Lansing

What is GIS? Arizona Maps & GIS Spatial datasets Library GIS Services More Web resources GIS-AZUL HOME

Inclusion of Non-Street Addresses in Cancer Cluster Analysis

B. Topographic maps are also called. contour maps

GIS-T 2010 Building a Successful Geospatial Data Sharing Framework: A Ohio DOT Success Story

Census and USGS: Bringing Improved TIGER to The National Map. Jennie Karalewich-Census Andrea Johnson- Census Dick Vraga- USGS

Targeted LiDAR use in Support of In-Office Address Canvassing (IOAC) March 13, 2017 MAPPS, Silver Spring MD

Montgomery County Community College GEO 210 Introduction to Geographic Information Systems (GIS) 3-2-2

Introduction to GIS. Dr. M.S. Ganesh Prasad

Esri UC2013. Technical Workshop.

Valdosta State University Strategic Research & Analysis

GIS CONCEPTS Part I. GIS ON THE WEB Part II

Tutorial: Urban Trajectory Visualization. Case Studies. Ye Zhao

Rural Louisiana. A quarterly publication of the Louisiana Tech Rural Development Center

Lecture 7: GIS Data Capture and Metadata

Houston Plat Tracker puts the GIS in Land Development

Utilizing GIS Technology for Rockland County. Rockland County Planning Department Douglas Schuetz & Scott Lounsbury

Visualizing Big Data on Maps: Emerging Tools and Techniques. Ilir Bejleri, Sanjay Ranka

Spatial Organization of Data and Data Extraction from Maptitude

Production Line Tool Sets

GIS Data Structure: Raster vs. Vector RS & GIS XXIII

INTRODUCTION TO GIS. Dr. Ori Gudes

Key Questions and Issues. What is GIS? GIS is to geographic analysis as: What is GIS? 9/3/2013. GEO 327G/386G, UT Austin 1

GeoPostcodes. Luxembourg

Studying Topography, Orographic Rainfall, and Ecosystems (STORE)

Internet GIS Sites. 2 OakMapper webgis Application

Learning ArcGIS: Introduction to ArcCatalog 10.1

Flood Database for Oklahoma: A web-mapping application for spatial data organization and access

Introduction to Geographic Information Systems (GIS): Environmental Science Focus

The Retreating Glaciers of Mt. Jefferson, Oregon. Using EET Methods in Student Research Projects

Census Geography, Geographic Standards, and Geographic Information

Proclaiming Certain Lands as Reservation for the Confederated Tribes of the

Transcription:

Integrating Online and Geospatial Information Sources Craig Knoblock Cyrus Shahabi Jose Luis Ambite Maria Muslea Snehal Thakkar Jason Chen Mehdi Sharifzadeh University of Southern California

Introduction Geospatial data sources have become widely available Huge amount of data available online that can be related to these geospatial sources Challenge is to support the dynamic integration of these two types of sources Craig A. Knoblock University of Southern California 2

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 3

Geospatial Data Sources Imagery Craig A. Knoblock University of Southern California 4

Geospatial Data Sources Imagery Maps Craig A. Knoblock University of Southern California 5

Geospatial Data Sources Imagery Maps Vectors Craig A. Knoblock University of Southern California 6

Geospatial Data Sources Imagery Maps Vectors Elevations Craig A. Knoblock University of Southern California 7

Geospatial Data Sources Imagery Maps Vectors Elevations Points Craig A. Knoblock University of Southern California 8

TerraWorld System Data from the National Imagery and Mapping Agency (NIMA) Includes imagery, map, vector, elevation, and point data Covers most of the world (including the oceans!) Hardware 8 High-end Dell Servers Separate servers for imagery & maps, vectors, databases, and web servers Storage Attached Network (SAN) 3 terabytes of storage Provides high-speed data access to all servers Craig A. Knoblock University of Southern California 9

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 10

Semi-structured Data Sources Property tax sites Craig A. Knoblock University of Southern California 11

Semi-structured Data Sources Property tax sites Telephone books Craig A. Knoblock University of Southern California 12

Semi-structured Data Sources Property tax sites Online telephone books Railroad schedules <IRANIAN_RAILWAYS> <TRAIN> <ROW> <CITY>Tehran</CITY> <TIME>12:35</TIME> </ROW> <ROW> <CITY>Esfahan</CITY> <TIME>19:45</TIME> </ROW> </TRAIN> <TRAIN> <ROW> <CITY>Tehran</CITY> <TIME>14:00</TIME> </ROW> </TRAIN> </IRANIAN_RAILWAYS> Craig A. Knoblock University of Southern California 13

Machine Learning of Wrappers Developed machine learning techniques for rapidly extracting data from semi-structured sources (wrapper) Started a spin-off company from ISI (Fetch Technologies) that has commercial product based on this work GUI Labeled Pages Inductive Learning System Wrapper EC Tree Craig A. Knoblock University of Southern California 14

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 15

Combining Online Schedules with Vectors and Points [Shahabi et al., 2001] Stations Railroads Schedules How do we efficiently determine which trains will pass a given point or region Railroad vectors specify all possible paths of the trains Stations show the locations of the stops Schedules provide the detailed timetable and stops Craig A. Knoblock University of Southern California 16

Integrating Schedules with Vector Data Approach: Create a wrapper for the online schedule and download it to a database Match the names of the stations in the online schedule with the names of the stations in the gazetteer Exploits work we have done on record linkage across sources Align the points in the gazetteer with the vector data of the railroads Find the shortest paths between the stations Compute the trains that will pass a given region within some time interval Determines how much real paths can deviate from the shortest distance between two points to compute this efficiently Craig A. Knoblock University of Southern California 17

Integrating Schedules with Vectors Craig A. Knoblock University of Southern California 18

Integrating Schedules with Vectors Craig A. Knoblock University of Southern California 19

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 20

Aligning Vectors with Imagery (Chen et al., 2003) Integration Challenges Different geographic projections Global transformations do not exist Previously this was performed by: Manually identifying control points Applying conflation techniques Craig A. Knoblock University of Southern California 21

Conflation Conflation: Compiling two geo-spatial datasets by establishing the correspondence between the matched entities and transforming other objects accordingly. Requires identifying matched entities, named control points, on the image and the vectors Each pair of corresponding control points from the two datasets indicates corresponding positions on each datasets Existing algorithms only deal with vector to vector spatial data integration or accomplish imagery to vector data integration manually We explored two techniques Control points generated from online sources Control points produced from localized image processing Vector Data Imagery Find and Filter Control Points Conflating Imagery and Vector Data Craig A. Knoblock University of Southern California 22

Finding Control Points Using Online Sources Online sources can be used to locate points on vector data USGS Gazetteer Points (Micrsoft TerraService) Record Linkage Control Point Pairs US Census TIGER/Line Files Yellow Pages Data for Gazetteer Points Geocoder I Property Tax Data Craig A. Knoblock University of Southern California 23

Finding Control Points Using Online Sources Control Point Pairs Features Previously Identified on Imagery (Yellow points) Feature Name Latitude Longitude Church of Christ 33.91971-118.40790 El Segundo Christian Church 33.91811-118.41790 El Segundo Public Library 33.92391-118.41690 El Segundo Foursquare Church 33.92154-118.41750 First Baptist Church 33.92531-118.40990 Points on vector data (Red points) Feature Name Church of Christ El Segundo Hilltop Community El Segundo Christian Church El Segundo Public Library Foursquare Church Of El Segundo First Baptist Church of El Segundo Address 717 East Grand Ave 223 West Franklin Ave 111 W Mariposa Ave 429 Richmond Street 591 East Palm Avenue Craig A. Knoblock University of Southern California 24

Finding Control Points Using Localized Image Processing Craig A. Knoblock University of Southern California 25

Resulting Control Point Pairs Intersection Points Located on Vector Data (Red points) Intersection Points Detected on Imagery (Yellow points) Craig A. Knoblock University of Southern California 26

Filtering Control Points Vector Median Filter Keep half control-point vectors Vector median Control-point vectors After Filtering Craig A. Knoblock University of Southern California 27

Conflating Imagery and Vector Data Conflate imagery and vector data by computing the transformations between the control point pairs and transforming other objects accordingly Two steps Delaunay Triangulation Partition the space into multiple triangles Linear Rubber-Sheeting Stretching of vector data within each triangle as if it was made of rubber Vector Data Imagery Find and Filter Control Points Delaunay Triangulation : Partition both Imagery and Vector Conflated Vector on Imagery Linear Rubber-Sheeting : Transform Vector data to Imagery Craig A. Knoblock University of Southern California 28

Conflating Imagery and Vector Data: Delaunay Triangulation Sub-divide the vector data into multiple triangles using the control points as vertices, then construct the corresponding triangles on the imagery Red lines : Original Road Network Point : Control Point Pairs Green lines: Delaunay Triangulation Craig A. Knoblock University of Southern California 29

Conflating Imagery and Vector Data: Linear Rubber-Sheeting Imagine stretching a vector map as if it was made of rubber Deform algorithmically, forcing registration of control points over the vector data with their corresponding points on the imagery Red lines : Original Road Network Yellow lines : Conflated Road Network Point : Control Point Pairs Green lines: Delaunay Triangulation Craig A. Knoblock University of Southern California 30

Results El Segundo Mean Std Mean + Std Dataset Displace. Dev Deviation Original TIGER/Lines 26.19 5 (21.19, 31.19) Using Online Sources 15.92 8.38 ( 7.54, 24.3 ) Using Local Image Pro 8.61 6 ( 2.61, 14.61) Craig A. Knoblock University of Southern California 31

Conflation Results of Using Localized Image Processing Before Conflation After Conflation Craig A. Knoblock University of Southern California 32

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 33

Identifying Structures in Imagery Craig A. Knoblock University of Southern California 34

Locate the Roads in the Image Craig A. Knoblock University of Southern California 35

Exploiting Online Sources to Accurately Identify Structures in Imagery Street Vector Data Corrected Tiger Line Files Constraint Satisfaction Satellite Image Terraserver Street Address City, State Zipcode 642 Penn St El Segundo, CA 90245 640 Penn St El Segundo, CA 90245 636 Penn St El Segundo, CA 90245 604 Palm Ave El Segundo, CA 90245 610 Palm Ave El Segundo, CA 90245 645 Sierra St El Segundo, CA 90245 639 Sierra St El Segundo, CA 90245 Census Master Address File 604 or 642 604 or 610 642, Penn or 636,Penn 636,Penn or 630,Penn 630,Penn or 628,Penn 628,Penn or 624,Penn 624,Penn or 618,Penn 610, Palm or 645,Sierra 645, Sierra or 639,Sierra 639, Sierra or 633,Sierra 633, Sierra or 629,Sierra 629, Sierra or 623,Sierra Initial Hypothesis 604 642,644,646 Penn 636,638,640 Penn 630,632,634 Penn 628, Penn 624, Penn 610 645, Sierra 639, Sierra 633, Sierra 629, Sierra 623, Sierra Result After Constraint Satisfaction Address Latitude Longitude 642 Penn St 33.923413-118.409809 640 Penn St 33.923412-118.409809 636 Penn St 33.923412-118.409809 604 Palm Ave 33.923414-118.409809 610 Palm Ave 33.923414-118.409810 645 Sierra St 33.923413-118.409810 639 Sierra St 33.923412-118.409810 Geocoded Houses Address # units Area(sq ft) Lot size 642 Penn St 3 1793 135.72 * 53.33 604 Palm Ave 1 884 69 * 42 610 Palm Ave 1 756 66 * 42 645 Sierra St 1 1337 120 * 62 639 Sierra St 1 1408 121*53.5 Los Angeles County Assessor s Site Data Extracted from On-line Site Property Craig A. Tax Knoblock Records University of Southern California 36

Identifying Structures in Imagery Craig A. Knoblock University of Southern California 37

Labeling Structures in Imagery Craig A. Knoblock University of Southern California 38

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 39

Integrating Vectors and Points with Online Oil Field Maps Goal: Determine which houses are built over abandoned oil wells Integrate the online oil maps with street vector data Challenge: Not given lat/long coordinates of maps Given a database of some of the oil wells on the maps Source : California Dept. of Conservation, Division of Oil, Gas and Geothermal Resources http://www.consrv.ca.gov/dog/maps/index_map.htm Maps: in PDF format. Wells information : vector(point) dataset contains, for example, status/operator/lat/long Craig A. Knoblock University of Southern California 40

Sample Oil Map Craig A. Knoblock University of Southern California 41

Sample Oil Map (Zoom In) Craig A. Knoblock University of Southern California 42

Vector Data ( Online Wells Info ) Issue : Some wells are detected on the maps while not found on the vector data, and vice versa. Craig A. Knoblock University of Southern California 43

Integration Approach (Work in Progress) PDF PDF to Image : Ghostscript ( GSView) Image Extracting well points Extracted Points Online Wells-Info (*.dbf) Well points matching Georeferenced Map Integration DB Points Corrected Vector Data Vector datasets (point datasets) Vector datasets (line datasets/ TIGERLines) Craig A. Knoblock University of Southern California 44

Outline Geospatial Data Sources Semi-structured Data Sources Integrating Semi-structured and Geospatial Sources Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Discussion and Future Work Craig A. Knoblock University of Southern California 45

Discussion Described four example applications Combining online schedules with vectors and points Using online sources and image processing to align vectors and imagery Exploiting property records to identify structures in imagery Integrating vectors and points with online oil field maps Goal is not to develop the specific applications, but to develop the techniques for automatically integrating these diverse types of sources Craig A. Knoblock University of Southern California 46

Future Work Build a general framework for integrating online and geospatial data sources Our previous integration work focused on integrating structured data (e.g., SIMS & Ariadne projects at USC) Extend this to support geospatial data types (imagery, maps, vectors, elevations, points) Develop integration techniques over these types Conflation integration imagery and vectors Moving object queries queries across time and space Constraint satisfaction integrating different types of data Investigate approaches to rapidly and automatically integrating these sources Craig A. Knoblock University of Southern California 47