Mobility Patterns and User Dynamics in Racially Segregated Geographies of US Cities

Nibir Bora, Yu-Han Chang, and Rajiv Maheswaran
Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
nbora@usc.edu, {ychang,maheswar}@isi.edu

Abstract. In this paper we try to understand how racial segregation of the geographic spaces of three major US cities (New York, Los Angeles and Chicago) affects the mobility patterns of the people living in them. Collecting over 75 million geo-tagged tweets from these cities during a period of one year beginning October 2012, we identified home locations for over 30,000 distinct users and prepared models of travel patterns for each of them. Dividing each city's geographic boundary into census tracts and grouping the tracts according to racial segregation information, we try to understand how the mobility of users living within an area of a particular predominant race correlates with that of users living in areas of similar race, and with that of users living in areas of a different race. While these cities remain vastly segregated in the 2010 census data, we observe a compelling amount of deviation in travel patterns when compared to artificially generated ideal mobility. A common trend for all races is to visit areas populated by a similar race more often. Also, blacks, Asians and Hispanics tend to travel less often to predominantly white census tracts, and similarly, predominantly black tracts are less visited by other races.

Keywords: Mobility patterns, racial segregation, Twitter

1 Introduction

Sociologists and economists have long been trying to understand the influence of racial segregation in the United States on various social aspects such as income, education and employment.
Every decennial census has hinted at gradually decreasing residential racial segregation in many major metropolitan areas, but it is important to continually analyze its effects and how they change over time in order to better understand today's social environment. In this paper, we try to understand if, and how, racial segregation affects the way people move around in large metropolitan areas. Ubiquitously available data from geo-location based sharing services like Twitter is a promising source of real-time spatial movement information. Associating users who live in racially predominant geographic areas with their mobility patterns, we look for variations in travel to areas of similar and dissimilar races. We also build generalized models of ideal human mobility and create a corpus of travel activity analogous to the actual data. Comparing the actual mobility of users to the ideal models, we look for bias and interesting behavior patterns and dynamics in three U.S. cities: New York, Los Angeles and Chicago.

2 Related Work

A number of studies ([1], [2], [3]) have documented statistics of racial segregation in major U.S. metropolitan areas and how its extent has changed over time. Most of these studies use one or more of the five indexes explained by Massey and Denton [4] to compare the magnitude of segregation between two racial groups. Analyzing data from the 1980 census in 60 U.S. metropolitan areas, Denton and Massey [1] found that blacks were highly segregated from whites at all socioeconomic levels, relative to Hispanics or Asians. Although the level of segregation declined modestly during the 1980s [5], through the 1990s [2], and up to 2000 [3], blacks still remain more residentially segregated than Hispanics and Asians. Clark [6] points out that although a certain degree of racial integration is acceptable, it is unrealistic to expect large levels of integration across neighborhoods, because households of a given race tend to cluster with others of similar race [7]. Departing from the conventional studies that measure the extent of segregation, a few researchers have tried to identify social problems arising as a result of it. Peterson and Krivo [8] studied the effect of racial segregation on violent crime. Card and Rothstein [9] found that black-white SAT test score gaps during 1998-2001 were much higher in more segregated cities than in nearly integrated cities. In our study, we consider another interesting effect, namely biases in mobility patterns as a result of segregation, and at the same time shed some light on the extent of segregation in the 2010 census data.
Spatiotemporal models of human mobility have been studied on various datasets, such as the circulation of US bank notes [10] and cell phone logs [11]. Temporal human activities like replying to emails and placing phone calls are known to occur in rapid successions of short duration followed by long inactive separations, resembling a Pareto distribution. The truncated power law distribution characterizing heavy-tailed behavior for both the distance and the time duration of hops between subsequent events in a trajectory of normal travel has been established by many studies [10], [11]. Although geo-location based data from Twitter has been used in several applications, such as spotting and tracking earthquake shakes [12] and street-gang behavior [13], it has not been used to model the effects of racial segregation on mobility patterns and behavior.

3 Data Description

Census Tracts & Racial Segregation Data. To build a geographic scaffolding for our experiments, we use census tract polygons defined by the U.S. Census Bureau (http://www.census.gov/) in three large metropolitan areas: New York City, Los Angeles and Chicago. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, typically having a population between 1,200 and 8,000 people. Census information is also available for city blocks; however, such small geographic entities would capture too little user movement data. Census tract shapefiles can be downloaded from the National Historical Geographic Information System (NHGIS) website (https://www.nhgis.org/).

Fig. 1: Racial segregation maps for (a) New York City, (b) Los Angeles, and (c) Chicago. Colors: blue: white, green: black, red: Asian, orange: Hispanic, brown: others. Note: The maps show only a portion of the entire city.

To find the predominant race in each census tract we use table P5 (Hispanic or Latino Origin by Race) in the 2010 Summary File 1 (SF1), also available on the NHGIS website. Table P5 enumerates the number of people in each census tract belonging to the racial groups non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, Hispanic or Latino, and other. The populations of these five categories sum to the total population of the tract. The predominant race in a tract is the one that holds a majority (50% or more) of the population. Figure 1 shows census tract boundaries color coded by the majority race. Tracts with a lighter shade of color are those where the race with the largest population did not have a 50% majority.
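The majority rule described above can be illustrated with a minimal sketch. The function name, the category labels and the example counts below are hypothetical and are not taken from the paper or from the actual P5 table layout; the sketch only assumes that each tract provides one population count per group.

```python
from typing import Dict, Tuple


def predominant_race(counts: Dict[str, int], threshold: float = 0.5) -> Tuple[str, bool]:
    """Return (largest_group, has_majority) for a single census tract.

    `counts` maps a group label to its population in the tract.
    `has_majority` is True when the largest group holds at least
    `threshold` of the tract population (50% by default); False
    corresponds to the lighter-shaded tracts in Figure 1.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("tract has no population")
    largest = max(counts, key=counts.get)
    return largest, counts[largest] / total >= threshold


# Hypothetical tract: the largest group is below 50%, so no majority.
mixed_tract = {"white": 1200, "black": 300, "asian": 250, "hispanic": 900, "other": 50}
# Hypothetical tract with a clear majority.
majority_tract = {"white": 100, "black": 2400, "asian": 50, "hispanic": 400, "other": 50}

print(predominant_race(mixed_tract))
print(predominant_race(majority_tract))
```

Returning the majority flag separately from the label keeps plurality-only tracts distinguishable, which mirrors the lighter shading used in the maps.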

Users & Twitter Data. Once the geographic canvas is ready, we need human entities to model mobility patterns. An ideal dataset would consist of a large set of people, the locations of their homes and their daily movement traces on a geographic coordinate system. Since such a corpus is difficult to build and acquire, we consider an alternative source: Twitter. A geo-tagged tweet is up to 140 characters of text and is associated with a user id, timestamp, latitude and longitude. Frequent Twitter users who use its location sharing service would produce a close representation of their daily movement in their tweeting activity. We collected strictly geo-tagged tweets using Twitter's Streaming API (https://dev.twitter.com/) and limited them to the polygons bounding the three cities New York, Los Angeles and Chicago. This way, we received all tweets with location information and not just a subset. We disregard all other fields obtained from Twitter, including the user's Twitter handle, and in no way use the information we retain to identify personal information about a user. Over a period of one year, beginning October 2012, we accumulated over 75 million geo-tagged tweets. Next, we try to identify the home locations of users by following a very straightforward method based on the assumption that users generally tweet from home at night. For each unique user we start by collecting all tweets between 7:00pm and 4:00am, and apply a single pass of the DBSCAN clustering algorithm [14]. The largest cluster produced by the cluster analysis is chosen as the one corresponding to the user's home, and its centroid is used as the exact coordinates. Skipping users with very few tweets, and ones for whom a cluster could not be formed, we were able to identify home locations for over 30,000 unique users.
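The home-location step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the neighbourhood radius `eps_m`, the density threshold `min_pts`, the function names, and the treatment of timestamps as local time are all assumptions, since the text does not give these details. The DBSCAN here is a plain quadratic-time version written with the standard library only.

```python
import math
from collections import Counter
from datetime import datetime
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(min(1.0, math.sqrt(h)))


def is_night(ts: datetime) -> bool:
    """True for times between 7:00pm and 4:00am (timestamps assumed local)."""
    return ts.hour >= 19 or ts.hour < 4


def dbscan(points: List[Point], eps_m: float, min_pts: int) -> List[int]:
    """Plain O(n^2) DBSCAN; returns one cluster id per point, -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def home_location(tweets, eps_m: float = 100.0, min_pts: int = 5) -> Optional[Point]:
    """Centroid of the largest night-time cluster, or None if no cluster forms.

    `tweets` is an iterable of (timestamp, latitude, longitude) tuples.
    """
    night = [(lat, lon) for ts, lat, lon in tweets if is_night(ts)]
    labels = dbscan(night, eps_m, min_pts)
    sizes = Counter(label for label in labels if label != -1)
    if not sizes:
        return None
    best = sizes.most_common(1)[0][0]
    members = [p for p, label in zip(night, labels) if label == best]
    return (sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members))


# Hypothetical user: six night-time tweets a few metres apart, plus outliers.
night_ts = datetime(2012, 11, 1, 23, 0)
day_ts = datetime(2012, 11, 1, 12, 0)
tweets = [(night_ts, 34.05 + k * 0.0001, -118.25) for k in range(6)]
tweets += [(night_ts, 34.10, -118.30), (day_ts, 34.20, -118.40)]
home = home_location(tweets)
print(home)
```

Returning `None` when no cluster forms corresponds to skipping users with very few tweets. For real data volumes a spatially indexed DBSCAN from an established library would be the more practical choice.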
A user is assigned to a particular census tract if the coordinates of his/her home lie within its geographic bounds, and is assumed to belong to the race of the majority population in that tract. Once these preprocessing steps have been carried out, we are left with a rich dataset comprising users, their race, their home location, and their movement activity in geographic space. This data acts as the seed for all the experiments that follow.

4 Experimental Setup

In line with our primary objective of identifying the effects of racial segregation on movement behavior, we calculate the number of visits among users living in tracts of different races. Human mobility, however, tends to follow uniform patterns and can be simulated by parameterized models. The question that arises is whether, by visiting a tract of similar or dissimilar race, a user is simply adhering to the ideal movement pattern he/she is expected to follow, or whether there is a bias due to the presence of a particular race. To answer this question, we build models of movement patterns for each of the three cities and generate synthetic datasets. Measuring the deviation of the actual mobility data from the ideal (simulated) movement patterns would indicate the presence of any inter-race bias. The steps involved in this process are explained next.

4.1 Models of Movement Pattern

An established characteristic of human mobility is its Levy flight and random walk properties. In essence, the trajectory of movement follows a sequence of random steps where the step size follows a power law distribution (probability density function (PDF): $f(x) = C x^{-\alpha}$), meaning that there are a large number of short hops and fewer long hops. As shown in [13], the distance from home while tweeting also follows a power law distribution. Figure 2 shows the distribution of distance from home over the range 100m to 50km, and the corresponding least-squares power law fit, in the three cities. As a test of correctness we use a two-sample Kolmogorov-Smirnov test, where the null hypothesis states that the two samples are drawn from a continuous power law distribution. In each case, the null hypothesis was accepted with significance (p < 5), hence verifying the correctness of the parameter fits. Tweets within 100m of the home location of users were removed, as such small shifts in distance may occur due to GPS noise even when the user is stationary.

Fig. 2: Displacement from home while tweeting in (a) New York City, (b) Los Angeles, and (c) Chicago, each panel with its fitted curve ("Fit"); distance axis from 0 to 50 km.

As shown in [13], the direction of travel from home also follows a uniform distribution, skewed only by physical and geographic barriers like freeways and oceans. For computational simplicity we disregard any such skew and assume that the direction follows a perfect uniform distribution, i.e. a user is equally likely to travel in any direction. The resulting PDF, shown below, is the product of two probabilities, one for the distance and the other for the direction of travel θ (θ is constant).

$$f(x) = C x^{-\alpha} \cdot \frac{1}{\theta} \qquad (1)$$

Location data for a user is generated by creating a random sample from the distribution in Equation 1.
Keeping the number of simulated tweet locations equal to the number of actual tweets, a synthetic dataset is created by sampling for each user.
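A minimal sketch of this sampling step is given below, under several assumptions that are not specified in the text: the exponent value `alpha = 1.5` is purely illustrative, the distance is drawn by inverse-CDF sampling from a power law truncated to the 100m to 50km range, the direction is drawn uniformly from [0, 2π), and metres are converted to degrees with a simple flat-earth approximation (about 111,320 m per degree of latitude). All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sample_distance(alpha: float, x_min: float, x_max: float, rng: random.Random) -> float:
    """Inverse-CDF draw from a density proportional to x**(-alpha) on [x_min, x_max]."""
    u = rng.random()
    if abs(alpha - 1.0) < 1e-12:
        # Special case alpha == 1: the CDF is logarithmic.
        return x_min * (x_max / x_min) ** u
    a = x_min ** (1.0 - alpha)
    b = x_max ** (1.0 - alpha)
    return (a + u * (b - a)) ** (1.0 / (1.0 - alpha))


def synthetic_locations(home: Tuple[float, float], n: int, alpha: float,
                        rng: random.Random, x_min: float = 100.0,
                        x_max: float = 50000.0) -> List[Tuple[float, float]]:
    """Generate n simulated tweet locations around a home (lat, lon) in degrees."""
    lat0, lon0 = home
    metres_per_deg = 111320.0  # approximate length of one degree of latitude
    locations = []
    for _ in range(n):
        dist = sample_distance(alpha, x_min, x_max, rng)
        theta = rng.uniform(0.0, 2.0 * math.pi)  # uniform direction of travel
        dlat = dist * math.cos(theta) / metres_per_deg
        dlon = dist * math.sin(theta) / (metres_per_deg * math.cos(math.radians(lat0)))
        locations.append((lat0 + dlat, lon0 + dlon))
    return locations


rng = random.Random(0)  # fixed seed so the sketch is reproducible
distances = [sample_distance(1.5, 100.0, 50000.0, rng) for _ in range(1000)]
simulated = synthetic_locations((40.7128, -74.0060), 25, 1.5, rng)
print(min(distances), max(distances), len(simulated))
```

Passing the number of a user's actual tweets as `n` keeps the synthetic and actual datasets the same size, as described above.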

5 Results & Discussion

With the artificial data representing the ideal movement pattern one would be expected to follow, the comparison with actual movements yields a number of interesting results. Figure 3 shows, for each of the three cities, the fraction of visits by people living in white, black, Asian and Hispanic tracts, alongside the simulated ideal fractions. As is clearly visible, the actual movement patterns of users deviate significantly from the expected behavior.

Fig. 3: Fraction of visits between each of the four races in the three cities: (a) New York, (b) Los Angeles, (c) Chicago. Vertical axis: visit fraction; groups: White, Black, Asian, Hispanic; bars carry percentage labels. Colors: blue: white, green: black, red: Asian, orange: Hispanic.

A noticeable trend for any given race is that visits are always higher to tracts of similar race, with the difference from simulated visits being very high for blacks, Asians and Hispanics, going up to four times for Asians. The difference is not as high for whites; however, both the actual and simulated fractions are noticeably larger than visits to any other race. This is explained by the fact that there are far more white tracts than tracts of other races, and white segregation clusters are large as well. For visits to tracts of other races, the actual and simulated data are very close, except in New York and Chicago, where actual visits by people living in white tracts to black tracts are over 50% lower than the simulated ideal. Likewise, people living in black tracts in all three cities visit white tracts less often, while their visits to Asian and Hispanic tracts do not deviate much from the artificial data. Hispanics in New York and Chicago visit black tracts less often than expected, but the opposite holds in Los Angeles. In general, visits to white tracts by blacks, Asians and Hispanics are much lower than simulated. The only exception to this trend is for Asians in Chicago, where there are very few tracts with a majority Asian population, so it could simply be an artifact of the small sample size. Visits to predominantly black tracts by other races are also lower than in the artificial data, although there is an interesting exception in Los Angeles, where Asians and Hispanics visit black tracts more often than expected. It is fascinating to see that all races are biased towards areas of identical race and tend to keep away from others. It is also interesting to note that these trends do not hold equally in all cities. For example, blacks in New York and Los Angeles visit Hispanic tracts about as often as, or even more often than, expected, but in Chicago the fraction of visits is lower.
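One way to make the comparison between actual and simulated visits concrete is sketched below. The text does not spell out how the percentages in Figure 3 were computed, so the formula used here, (actual - simulated) / simulated, is an assumption, as are the function names. The visit lists are invented toy data, not results from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Fractions = Dict[str, Dict[str, float]]


def visit_fractions(visits: Iterable[Tuple[str, str]]) -> Fractions:
    """Per home-tract race, the fraction of visits going to each visited-tract race.

    `visits` holds one (home_race, visited_race) pair per tweet away from home.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for home_race, visited_race in visits:
        counts[home_race][visited_race] += 1
    return {
        home: {visited: c / sum(row.values()) for visited, c in row.items()}
        for home, row in counts.items()
    }


def percent_deviation(actual: Fractions, simulated: Fractions) -> Fractions:
    """Percentage difference of actual from simulated fractions (assumed formula)."""
    result: Fractions = {}
    for home, row in simulated.items():
        result[home] = {
            visited: 100.0 * (actual.get(home, {}).get(visited, 0.0) - frac) / frac
            for visited, frac in row.items()
            if frac > 0
        }
    return result


# Invented toy data for users living in white tracts.
actual_visits = [("white", "white")] * 6 + [("white", "black")] * 1 + [("white", "hispanic")] * 3
simulated_visits = [("white", "white")] * 5 + [("white", "black")] * 2 + [("white", "hispanic")] * 3

deviation = percent_deviation(visit_fractions(actual_visits), visit_fractions(simulated_visits))
for visited, pct in sorted(deviation["white"].items()):
    print(f"white -> {visited}: {pct:+.0f}%")
```

In this toy case, visits from white tracts to white tracts come out above the simulated share and visits to black tracts below it, which is the kind of signed deviation discussed in this section.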
6 Conclusion

In this paper we try to understand the effects of racial segregation on the mobility patterns of people living in three major U.S. metropolitan areas: New York City, Los Angeles and Chicago. We assembled a dataset comprising human entities, their home locations and their daily movement data by accumulating geo-tagged tweets from these cities and performing simple preprocessing steps. The human entities were combined with geographic entities, in this case census tract polygons, and each user was associated with the race that holds the majority population in his/her home tract, as calculated from the 2010 census data. Building parameterized models of human mobility for these cities, we generated synthetic data to compare with the actual movement of people. We observed significant effects of racial segregation on people's mobility, leading to some interesting observations. Although racial segregation in the U.S. has been decreasing over the past few decades, the major metropolitan areas are still vastly segregated. People living within tracts of any particular race tend to visit tracts of similar race more often than tracts of other races. However, the difference in visits to other races is not evenly distributed. Blacks, Asians and Hispanics usually show a higher percentage difference in their visits to white tracts, and similarly, black tracts are less visited by other races. Within these patterns we also observed some variations among the three cities, for example, the higher than expected visits to black tracts in Los Angeles.

Our approach makes it possible to use readily available geo-location based data from Twitter to model human mobility and investigate the effects of geographic and sociological constraints. However, this approach is far from perfect and opens up numerous avenues for future research. For instance, census tracts have people of different races living in them, but human entities are designated to the race with the majority population, when in reality they may belong to a different race. Another assumption we made was the uniform distribution of the direction of travel. It would be interesting to introduce skews in the distribution according to the presence of geographic barriers.

Acknowledgements. This material is based upon work supported in part by the AFOSR under Award No. FA9550-10-1-0569 (Bora, Zaytsev, and Chang) and the ARO under MURI award No. W911NF-11-1-0332 (Maheswaran).

References

1. Denton, N.A., Massey, D.S.: Residential segregation of blacks, hispanics, and asians by socioeconomic status and generation. Social Science Quarterly 69(4) (1988) 797-817
2. Glaeser, E.L., Vigdor, J.L.: Racial segregation in the 2000 census: Promising news. Brookings Institution, Center on Urban and Metropolitan Policy (2001)
3. Logan, J.R., Stults, B.J., Farley, R.: Segregation of minorities in the metropolis: Two decades of change. Demography 41(1) (2004) 1-22
4. Massey, D.S., Denton, N.A.: The dimensions of residential segregation. Social Forces 67(2) (1988) 281-315
5. Massey, D.S., Denton, N.A.: Trends in the residential segregation of blacks, hispanics, and asians: 1970-1980. American Sociological Review (1987) 802-825
6. Clark, W.A.: Residential preferences and neighborhood racial segregation: A test of the Schelling segregation model. Demography 28(1) (1991) 1-19
7. Bayer, P., McMillan, R., Rueben, K.S.: What drives racial segregation? New evidence using census microdata. Journal of Urban Economics 56(3) (2004) 514-535
8. Peterson, R.D., Krivo, L.J.: Racial segregation and black urban homicide. Social Forces 71(4) (1993) 1001-1026
9. Card, D., Rothstein, J.: Racial segregation and the black-white test score gap. Journal of Public Economics 91(11) (2007) 2158-2184
10. Brockmann, D., Hufnagel, L., Geisel, T.: The scaling laws of human travel. Nature 439(7075) (2006) 462-465
11. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453(7196) (2008) 779-782
12. Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the 19th International Conference on World Wide Web, ACM (2010) 851-860
13. Bora, N., Zaytsev, V., Chang, Y.H., Maheswaran, R.: Gang networks, neighborhoods and holidays: Spatiotemporal patterns in social media. In: 2013 ASE/IEEE International Conference on Social Computing. (2013)

14. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. KDD (1996)