A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing

A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
Hien To, Gabriel Ghinita, Cyrus Shahabi
VLDB 2014, Nov 12 2014

Motivation
- Ubiquity of mobile users: 6.5 billion mobile subscriptions, 93.5% of the world population [1]
- Technology advances on mobile devices:
  - Smartphone sensors, e.g., video cameras
  - Network bandwidth improvements: from 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)
[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/

Spatial Crowdsourcing
- Crowdsourcing: outsourcing a set of tasks to a set of workers.
- Spatial crowdsourcing (SC): crowdsourcing a set of spatial tasks to a set of workers. A spatial task is tied to a location, e.g., taking pictures.
- Location privacy is one of the major impediments that may hinder workers from participating in SC.

Problem Statement
- Current solutions require workers to disclose their locations to untrustworthy entities, i.e., the SC-server.
- [Diagram: requesters, SC-server, and workers; workers report their locations to the SC-server]
- Proposed: a framework for protecting the privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy.

Outline
- Background
- Privacy Framework
- Worker PSD (Private Spatial Decomposition)
- Task Assignment
- Experiments

Utility-Privacy Trade-off
[Figure: a utility scale and a privacy scale, each running from 100% to 0%]

Related Work
- Pseudonymity (using a fake identity): e.g., fake identity + location == resident of the home, so the identity can still be inferred.
- k-anonymity model (a record cannot be distinguished among k records):
  - When identities are known, location k-anonymity fails to keep a subject's location from being identified, e.g., when all k users reside in the exact same location.
  - k-anonymity does not provide rigorous privacy.
- Cryptography: such techniques are computationally expensive, so they are not suitable for SC applications.

Differential Privacy (DP)
- DP ensures that an adversary cannot learn from the sanitized data whether an individual is present or not in the original data.
- ε-distinguishability [Dwork 06]: a database produces transcript U on a set of queries QS. Transcript U satisfies ε-distinguishability if, for every pair of sibling datasets $D_1$ and $D_2$ with $|D_1| = |D_2|$ that differ in only one record, it holds that
  $\left| \ln \frac{\Pr[QS(D_1) = U]}{\Pr[QS(D_2) = U]} \right| \le \varepsilon$
  where ε is the privacy budget.
- DP allows only aggregate queries, e.g., count, sum.
- L1-sensitivity: given neighboring datasets $D_1$ and $D_2$, the sensitivity of query set QS (with q queries) is the maximum change in their query results:
  $\sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| QS_i(D_1) - QS_i(D_2) \right|$
- [Dwork 06] shows that it is sufficient to achieve ε-DP by adding random Laplace noise with zero mean and scale $\lambda = \sigma(QS) / \varepsilon$.
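The Laplace mechanism on this slide can be illustrated with a short sketch. It assumes a single count query (sensitivity 1); the function names `laplace_noise` and `noisy_count` are hypothetical and not taken from the slides.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Perturb a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    # A smaller budget epsilon means a larger noise scale.
    for eps in (0.1, 0.4, 0.7, 1.0):
        print(eps, noisy_count(100, eps, rng=rng))
```

A smaller ε gives a larger noise scale, which is the privacy side of the utility-privacy trade-off.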

Outline
- Background
- Privacy Framework
- Worker Private Spatial Decomposition
- Task Assignment
- Experiments

Privacy Framework
0. Workers send their locations to a trusted CSP (Cell Service Provider).
1. The CSP releases a PSD according to ε. The PSD is accessed by the SC-server.
2. The SC-server receives tasks from requesters.
3. When the SC-server receives task t, it queries the PSD to determine a geocast region (GR) that encloses sufficient workers. Then the SC-server initiates geocast communication to disseminate t to all workers within the GR.
4. Workers confirm their availability to perform the assigned task.
Assumptions:
- Workers trust the CSP.
- Workers do not trust the SC-server and requesters.
- The focus is on private task assignment rather than post-assignment.
[Diagram: workers, worker database (CSP), PSD, SC-server, requesters, and the GR; arrows labeled 0. Report Locations, 1. Sanitized Release, 2. Task Request t, 3. Geocast {t, GR}, 4. Consent]

Design Goal and Performance Metrics
Protecting worker locations may reduce the effectiveness and efficiency of worker-task matching, captured by the following metrics:
- Assignment Success Rate (ASR): the ratio of tasks accepted by workers to the total number of task requests.
- Worker Travel Distance (WTD): the average travel distance of all workers.
- System Overhead: the average number of notified workers (ANW). ANW affects both the communication overhead required to geocast task requests and the computation overhead of the matching algorithm.
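As a rough illustration of how the three metrics could be computed from a log of task requests, here is a minimal sketch. The record layout (`accepted`, `notified`, `travel_distance`) is hypothetical, and WTD is averaged over the workers who accepted a task, which is an assumption about how the slide's definition is applied.

```python
def assignment_metrics(outcomes):
    """Compute ASR, WTD and ANW from per-task outcome records.

    outcomes: list of dicts with keys
      'accepted' (bool), 'notified' (int), and
      'travel_distance' (float, only read when 'accepted' is True).
    """
    total = len(outcomes)
    accepted = [o for o in outcomes if o["accepted"]]
    # ASR: accepted tasks over all task requests.
    asr = len(accepted) / total
    # WTD: average distance travelled by the accepting workers (assumption).
    wtd = sum(o["travel_distance"] for o in accepted) / len(accepted) if accepted else 0.0
    # ANW: average number of workers notified per task request.
    anw = sum(o["notified"] for o in outcomes) / total
    return {"ASR": asr, "WTD": wtd, "ANW": anw}
```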

Outline
- Background
- Privacy Framework
- Worker PSD (Private Spatial Decomposition)
- Task Assignment
- Experiments

Adaptive Grid (Worker PSD) [Qardaji 13]
- Level 1: creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain, then issues count queries for each level-1 cell using $\varepsilon_1$:
  $m_1 = \max\left(10, \frac{1}{4}\sqrt{\frac{N \varepsilon}{k_2}}\right)$
- Level 2: partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell:
  $m_2 = \sqrt{\frac{N' \varepsilon_2}{k_2}}$
- Budget: $\varepsilon = \varepsilon_1 + \varepsilon_2$
[Figure: level-1 cells A, B, C, D with noisy counts $N'_A = 100$, $N'_B = 100$, $N'_C = 100$, $N'_D = 200$, partitioned into level-2 cells c1 to c21]
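The two granularity rules can be sketched as follows. This is only an illustration under stated assumptions: results are rounded up with a ceiling (the slide does not show the rounding), the constant under the level-1 square root is taken to be the same `k2` as at level 2, and the function names are hypothetical.

```python
import math


def level1_granularity(n_workers, epsilon, k2=5.0):
    """Level-1 grid size m1 = max(10, 1/4 * sqrt(N * eps / k2)), rounded up (assumption)."""
    return max(10, math.ceil(0.25 * math.sqrt(n_workers * epsilon / k2)))


def level2_granularity(noisy_count, epsilon2, k2=5.0):
    """Level-2 grid size m2 = sqrt(N' * eps2 / k2), rounded up (assumption)."""
    if noisy_count <= 0:
        # A non-positive noisy count gives no reason to split the cell.
        return 1
    return max(1, math.ceil(math.sqrt(noisy_count * epsilon2 / k2)))


if __name__ == "__main__":
    eps = 1.0
    eps1 = eps2 = eps / 2  # even budget split, an assumption for this example
    print(level1_granularity(1_000_000, eps))
    print(level2_granularity(100, eps2, k2=math.sqrt(2)))
```

A smaller `k2` yields more level-2 cells per level-1 cell, which is the knob the customized AG turns.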

Customized AG
- Expected number of workers (noisy count) in a level-2 cell: $n = N' / m_2^2 = k_2 / \varepsilon_2$
- Example with $N' = 100$.
- Original AG ($k_2 = 5$) leads to high communication cost:

  ε     ε2     m2    n
  1     0.5    3     11
  0.5   0.25   2     25
  0.1   0.05   1     100

- Customized AG ($k_2 = \sqrt{2}$, $p_h = 88\%$):

  ε     ε2     m2    n
  1     0.5    6     2.8
  0.5   0.25   5     5.6
  0.1   0.05   2     28

- Increase $m_2$ to decrease overhead, but only to the point where there is at least one worker in a cell.
- The probability that the real count is larger than zero, given the noisy count $count_{PSD} = n$:
  $p_h = 1 - \frac{1}{2} \exp\left(-\frac{count_{PSD}}{1/\varepsilon_2}\right) = 1 - \frac{1}{2} \exp(-n \varepsilon_2)$
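The link between $k_2$ and the 88% figure can be checked numerically. The sketch below assumes the relation $n = k_2 / \varepsilon_2$ and the Laplace-tail expression for $p_h$ given on this slide; the function names are hypothetical.

```python
import math


def expected_workers_per_cell(k2, epsilon2):
    """Expected noisy count n in a level-2 cell, n = k2 / eps2."""
    return k2 / epsilon2


def prob_nonempty(noisy_count, epsilon2):
    """p_h = 1 - 0.5 * exp(-n * eps2): chance that the real count exceeds zero."""
    return 1.0 - 0.5 * math.exp(-noisy_count * epsilon2)


if __name__ == "__main__":
    for k2 in (5.0, math.sqrt(2)):
        n = expected_workers_per_cell(k2, 0.5)
        print(f"k2={k2:.3f}  n={n:.1f}  p_h={prob_nonempty(n, 0.5):.3f}")
```

Because $n \varepsilon_2 = k_2$, the value of $p_h$ depends only on $k_2$, so choosing $k_2 = \sqrt{2}$ fixes it at about 0.88 for every budget.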

Customized AG
- Both original AG and customized AG adapt to the data distribution.
- Original AG minimizes the overall estimation error of region queries, while customized AG increases the number of level-2 cells.
[Figure: Yelp dataset partitioned with original AG and with customized AG]

Outline
- Background
- Privacy Framework
- Worker PSD (Private Spatial Decomposition)
- Task Assignment
- Experiments

Analytical Utility Model
- We define the acceptance rate as a decreasing function of task-worker distance (e.g., linear, Zipf):
  $p_a = F(d); \quad 0 \le p_a \le 1$
- The SC-server establishes an Expected Utility (EU) threshold, which is the targeted success rate for a task: $EU > p_a$.
- X is a random variable for the event that a worker accepts a received task:
  $P(X = \text{True}) = p_a; \quad P(X = \text{False}) = 1 - p_a$
- Assuming w independent workers, U is the probability that at least one worker accepts the task:
  $X \sim \text{Binomial}(w, p_a); \quad U = 1 - (1 - p_a)^w$
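A small sketch of the utility formula, together with the smallest number of workers needed to reach a given EU threshold. The helper `min_workers` is not on the slide; it simply inverts $U = 1 - (1 - p_a)^w$ and assumes every notified worker has the same acceptance rate.

```python
import math


def utility(p_a, w):
    """Probability that at least one of w independent workers accepts the task."""
    return 1.0 - (1.0 - p_a) ** w


def min_workers(p_a, eu):
    """Smallest w with utility(p_a, w) >= eu, for 0 < p_a <= 1 and 0 < eu < 1."""
    if not 0.0 < p_a <= 1.0 or not 0.0 < eu < 1.0:
        raise ValueError("expected 0 < p_a <= 1 and 0 < eu < 1")
    if p_a == 1.0:
        return 1
    w = max(1, math.ceil(math.log(1.0 - eu) / math.log(1.0 - p_a)))
    # Guard against floating-point rounding in the logarithms.
    while utility(p_a, w) < eu:
        w += 1
    return w
```

For example, with an acceptance rate of 0.1, ten workers give a utility of about 0.65, and 22 workers are needed to reach EU = 0.9.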

Acceptance Rate Functions
[Figure: acceptance rate (y-axis, 0 to 0.5) versus distance (x-axis, 0 to MTD)]

Geocast Region Construction
Determines a small region that contains sufficient workers.
Greedy Algorithm (GDY):
1. Init GR = {}, max-heap of candidates Q = {the cell that contains t}
2. $c_i \leftarrow$ top of Q; add $c_i$ to GR
3. $U \leftarrow 1 - (1 - U)(1 - U_{c_i})$
4. If $U \ge EU$, return GR
5. neighbors = {$c_i$'s neighbors within MTD that are not already in GR or Q}
6. $Q = Q \cup$ neighbors; go to 2.
[Figure: task t located on the grid of level-2 cells c1 to c21]
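The greedy loop can be sketched on a simplified model. The sketch below is not the authors' implementation: it assumes a uniform grid with 4-connected neighbors instead of the adaptive grid, measures distance between cell indices, and uses a linear acceptance rate with a hypothetical maximum `max_ar`. All names are illustrative.

```python
import heapq
import math


def cell_utility(noisy_count, distance, mtd, max_ar=0.4):
    """Chance that at least one worker in a cell accepts (linear acceptance, an assumption)."""
    if noisy_count <= 0 or distance >= mtd:
        return 0.0
    p_a = max_ar * (1.0 - distance / mtd)
    return 1.0 - (1.0 - p_a) ** noisy_count


def greedy_geocast(counts, task_cell, eu, mtd, cell_size=1.0):
    """Grow a geocast region until the utility reaches eu.

    counts: dict mapping (row, col) -> noisy worker count.
    Returns (list of cells in the region, achieved utility).
    """
    def dist(cell):
        return cell_size * math.hypot(cell[0] - task_cell[0], cell[1] - task_cell[1])

    def util(cell):
        return cell_utility(counts[cell], dist(cell), mtd)

    region, u = [], 0.0
    seen = {task_cell}
    # heapq is a min-heap, so utilities are stored negated to pop the best cell first.
    heap = [(-util(task_cell), task_cell)]
    while heap:
        neg_u, cell = heapq.heappop(heap)
        region.append(cell)
        u = 1.0 - (1.0 - u) * (1.0 - (-neg_u))
        if u >= eu:
            break
        r, c = cell
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in counts and nb not in seen and dist(nb) <= mtd:
                seen.add(nb)
                heapq.heappush(heap, (-util(nb), nb))
    return region, u
```

If the heap empties before the threshold is met, the function returns the region built so far together with the utility it achieved, so the caller can tell that EU was not reached.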

Partial Cell Selection
- The number of workers can still be large with AG, especially when $\varepsilon_2$ is small.
- Allow partial cell inclusion on the last added cell $c_i$.
[Figure: splitting the last added cell $c_i$ (here $c_7$) into a sub-cell $c_i'$, with points $t_0$ to $t_8$ on the grid of cells c1 to c21]

Communication Cost
- Infrastructure-based mode (cellular, Internet, WLAN) vs. infrastructure-less mode (mobile ad-hoc networks).
- The more compact the GR, the lower the cost.
- Digital Compactness Measurement [Kim 84]:
  $DCM = \frac{area(GR)}{area(MIN\ BALL)}$
- Measurement: hop count
  $\text{hop count} = \frac{\text{farthest distance between two workers}}{2 \times \text{communication range}}$
[Figure: task t and a geocast region on the grid of cells c1 to c21]
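Both cost measures are simple ratios and can be sketched as below. The sketch assumes worker positions are planar coordinates and that MIN BALL is a circle whose radius is supplied by the caller (computing the minimum enclosing circle itself is out of scope here); the function names are hypothetical.

```python
import math
from itertools import combinations


def hop_count(worker_positions, comm_range):
    """Farthest pairwise worker distance divided by twice the communication range."""
    farthest = max(
        (math.dist(a, b) for a, b in combinations(worker_positions, 2)),
        default=0.0,
    )
    return farthest / (2.0 * comm_range)


def dcm(gr_area, min_ball_radius):
    """Digital compactness: area of the GR over the area of its enclosing circle."""
    return gr_area / (math.pi * min_ball_radius ** 2)
```

A DCM close to 1 means the region nearly fills its enclosing circle, which corresponds to the compact regions the slide favors.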

Geocast Regions
[Figure: four example geocast regions, panels A to D]

Outline
- Background
- Privacy Framework
- Worker PSD (Private Spatial Decomposition)
- Task Assignment
- Experiments

Experimental Setup
Datasets:

  Name      #Tasks    #Workers   MTD (km)
  Gowalla   151,075   6,160      3.6
  Yelp      15,583    70,817     13.5

Assumptions:
- Gowalla and Yelp users are workers.
- Check-in points (i.e., of restaurants) are task locations.
Parameter settings:
- ε = {0.1, 0.4, 0.7, 1}
- 1000 random tasks x 10 seeds
- EU = {0.3, 0.5, 0.7, 0.9}
- MaxAR = {0.1, 0.4, 0.7, 1}

GR Construction Heuristics (Gow.-Linear)
- GDY = geocast (GREedy algorithm) + original adaptive grid (AG) [Qardaji 13]
- G-GR = geocast + AG with customized GRanularity
- G-PA = geocast with PArtial cell selection + original adaptive grid (AG)
- G-GP = geocast with Partial cell selection + AG with customized Granularity
[Figure: three panels (ANW, WTD-FC, HOP) comparing GDY, G-GR, G-PA, and G-GP for Eps = 0.1, 0.4, 0.7, 1]

Effect of Grid Size on ASR
[Figure: ASR (50 to 100) versus k2 (0.1, 0.2, 0.4, 0.8, 1.41, 1.6, 3.2, 6.4, 12.8, 25.6) for Gowalla-Linear, Gowalla-Zipf, Yelp-Linear, and Yelp-Zipf, with over-provision and under-provision regions marked]
Average ASR over all values of the budget, varying k2.

Compactness-based Heuristics (Yelp-Zipf)
[Figure: two panels (HOP, ANW) comparing G-GP-Pure, G-GP-Hybrid, and G-GP-Compact for Eps = 0.1, 0.4, 0.7, 1]

Overhead of Achieving Privacy (Gow.-Zipf)
[Figure: three panels (ANW, WTD-FC, ASR) comparing Privacy and Non-Privacy for Eps = 0.1, 0.4, 0.7, 1]

Effect of Varying MAR (Yelp-Linear)
[Figure: three panels (ANW, WTD-FC, CELL) versus AR = 0.1, 0.4, 0.7, 1, with one series each for Eps = 0.1, 0.4, 0.7, 1]

Effect of Varying EU (Yelp-Linear)
[Figure: three panels (ANW, WTD-FC, CELL) versus EU = 30, 50, 70, 90, with one series each for Eps = 0.1, 0.4, 0.7, 1]

Demo
http://geocast.azurewebsites.net/geocast/
https://www.youtube.com/watch?v=4zkij9gk79s

Conclusion
- Introduced a novel privacy-aware framework in SC, which enables worker participation without compromising location privacy.
- Identified geocasting as a needed step to preserve privacy prior to workers consenting to a task.
- Provided heuristics and optimizations for determining effective geocast regions that achieve a high assignment success rate with low overhead.
- Experimental results on real datasets show that the proposed techniques are effective and the cost of privacy is practical.

References
- Hien To, Gabriel Ghinita, Cyrus Shahabi. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014).
- Hien To, Gabriel Ghinita, Cyrus Shahabi. PriGeoCrowd: A Toolbox for Private Spatial Crowdsourcing (demo). In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015).