A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
Nov 12 2014
Hien To, Gabriel Ghinita, Cyrus Shahabi
VLDB 2014
Motivation
- Ubiquity of mobile users
  - 6.5 billion mobile subscriptions, 93.5% of the world population [1]
- Technology advances on mobiles
  - Smartphone sensors, e.g., video cameras
  - Network bandwidth improvements: from 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)
[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/
Spatial Crowdsourcing
- Crowdsourcing: outsourcing a set of tasks to a set of workers
- Spatial Crowdsourcing (SC): crowdsourcing a set of spatial tasks to a set of workers
  - A spatial task is tied to a location, e.g., taking pictures
- Location privacy is one of the major impediments that may hinder workers from participating in SC
Problem Statement
- Current solutions require workers to disclose their locations to untrustworthy entities, i.e., the SC-server.
- Proposed: a framework for protecting the privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy.
[Figure: requesters, SC-server and workers; workers report their locations to the SC-server.]
Outline
- Background
- Privacy Framework
- Worker PSD (Private Spatial Decomposition)
- Task Assignment
- Experiments
Utility-Privacy Trade-off
[Figure: utility (0% to 100%) against privacy (0% to 100%).]
Related Work
- Pseudonymity (using a fake identity)
  - A fake identity combined with a location can still reveal the person, e.g., a home location identifies the resident of that home.
- K-anonymity model (a record cannot be distinguished among k records)
  - Identities are known.
  - Location k-anonymity fails to keep a subject's location from being identified when all k users reside in the exact same location.
  - K-anonymity does not provide rigorous privacy guarantees.
- Cryptography
  - Such techniques are computationally expensive, so they are not suitable for SC applications.
Differential Privacy (DP)
- DP ensures that an adversary cannot tell from the sanitized data whether an individual is present in the original data.
- ε-distinguishability [Dwork 06]: a database produces transcript U on a set of queries QS. Transcript U satisfies ε-distinguishability if, for every pair of sibling datasets $D_1$ and $D_2$ with $|D_1| = |D_2|$ that differ in only one record, it holds that
$$ \left| \ln \frac{\Pr[QS(D_1) = U]}{\Pr[QS(D_2) = U]} \right| \le \varepsilon $$
  where ε is the privacy budget.
- DP allows only aggregate queries, e.g., count, sum.
- $L_1$-sensitivity: given neighboring datasets $D_1$ and $D_2$, the sensitivity of query set QS is the maximum change in their query results:
$$ \sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| QS_i(D_1) - QS_i(D_2) \right| $$
- [Dwork 06] shows that it is sufficient to achieve ε-DP by adding random Laplace noise with zero mean and scale $\lambda = \sigma(QS)/\varepsilon$ to each query result.
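The Laplace mechanism described on this slide can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes a single count query with sensitivity 1; it is an illustration under those assumptions, not the authors' implementation. Noise is drawn as the difference of two exponential variables, which follows a zero-mean Laplace distribution.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale lambda = sigma(QS) / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace noise: the difference of two i.i.d. exponential
    variables with rate 1/scale is Laplace-distributed with that scale."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Sanitize one count query (assumed sensitivity: 1)."""
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(7)
    for eps in (0.1, 0.4, 0.7, 1.0):
        # Smaller budgets give larger noise scales and noisier counts.
        print(eps, laplace_scale(1.0, eps), round(noisy_count(100, eps, rng), 2))
```

A smaller ε gives a larger scale λ, so the released count is noisier and the privacy guarantee stronger.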
Privacy Framework
0. Workers send their locations to a trusted CSP (Cell Service Provider).
1. The CSP releases a PSD according to ε. The PSD is accessed by the SC-server.
2. The SC-server receives tasks from requesters.
3. When the SC-server receives task t, it queries the PSD to determine a geocast region (GR) that encloses sufficient workers. The SC-server then initiates geocast communication to disseminate t to all workers within GR.
4. Workers confirm their availability to perform the assigned task.
Assumptions:
- Workers trust the CSP.
- Workers do not trust the SC-server and requesters.
- The focus is on private task assignment rather than post assignment.
[Figure: system architecture. Workers report locations to the CSP's worker database (0. Report Locations); the CSP releases the PSD to the SC-server (1. Sanitized Release); requesters submit tasks (2. Task Request); the SC-server geocasts {t, GR} (3. Geocast); workers consent (4. Consent).]
Design Goal and Performance Metrics
Protecting worker location may reduce the effectiveness and efficiency of worker-task matching, captured by the following metrics:
- Assignment Success Rate (ASR): the ratio of tasks accepted by workers to the total number of task requests
- Worker Travel Distance (WTD): the average travel distance of all workers
- System Overhead: the average number of notified workers (ANW). ANW affects both the communication overhead required to geocast task requests and the computation overhead of the matching algorithm.
Adaptive Grid (Worker PSD) [Qardaji 13]
- Level 1: creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain, then issues count queries for each level-1 cell using budget $\varepsilon_1$:
$$ m_1 = \max\left(10, \frac{1}{4}\sqrt{\frac{N\varepsilon}{k_2}}\right) $$
- Level 2: partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell; level-2 counts use budget $\varepsilon_2$:
$$ m_2 = \sqrt{\frac{N'\varepsilon_2}{k_2}} $$
- Total budget: $\varepsilon = \varepsilon_1 + \varepsilon_2$
[Figure: two-level adaptive grid. Level-1 cells A, B, C, D with noisy counts $N'_A = 100$, $N'_B = 100$, $N'_C = 100$, $N'_D = 200$; level-2 cells c1 to c21.]
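As a rough sketch of how the two granularities could be computed, the Python below follows the formulas on this slide. The function names are hypothetical, and two details are assumptions not stated on the slide: the level-1 granularity is rounded up to an integer, and a level-1 cell with a non-positive noisy count is left unsplit.

```python
import math


def level1_granularity(n_workers: int, epsilon: float, k2: float) -> int:
    """m1 = max(10, (1/4) * sqrt(N * epsilon / k2)).
    Rounding up to an integer is an assumption of this sketch."""
    return max(10, math.ceil(0.25 * math.sqrt(n_workers * epsilon / k2)))


def level2_granularity(noisy_count: float, epsilon2: float, k2: float) -> float:
    """Unrounded m2 = sqrt(N' * epsilon2 / k2) for one level-1 cell.
    The caller chooses how to round to an integer grid size."""
    if noisy_count <= 0:
        # Assumption: a cell whose noisy count is not positive stays unsplit.
        return 1.0
    return math.sqrt(noisy_count * epsilon2 / k2)


if __name__ == "__main__":
    total_eps = 1.0
    eps2 = total_eps / 2  # assumed even split between the two levels
    print("m1 =", level1_granularity(70817, total_eps, 5.0))
    for n_prime in (100, 200):
        print("N' =", n_prime, "m2 =", round(level2_granularity(n_prime, eps2, 5.0), 2))
```

Because $m_2$ grows with the noisy count, dense level-1 cells are split more finely than sparse ones.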
Customized AG
- Expected number of workers (noisy count) in a level-2 cell: $n = k_2/\varepsilon_2 = N'/m_2^2$
- Original AG ($k_2 = 5$) leads to high communication cost. For $N' = 100$:

    ε      ε2      m2     n
    1      0.5     3      11
    0.5    0.25    2      25
    0.1    0.05    1      100

- Customized AG ($k_2 = \sqrt{2}$, $p_h = 88\%$). For $N' = 100$:

    ε      ε2      m2     n
    1      0.5     6      2.8
    0.5    0.25    5      5.6
    0.1    0.05    2      28

- Increase $m_2$ to decrease overhead, but only to the point where there is at least one worker in a cell.
- The probability that the real count is larger than zero, given a PSD noisy count $n$:
$$ p_h = 1 - \frac{1}{2}\exp\left(-\frac{n}{1/\varepsilon_2}\right) $$
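The relationship between $k_2$, the expected per-cell count and $p_h$ can be checked numerically. The sketch below uses hypothetical function names and assumes Laplace noise with scale $1/\varepsilon_2$ on level-2 counts, as on the slide; substituting $n = k_2/\varepsilon_2$ makes $p_h$ depend on $k_2$ only.

```python
import math


def expected_workers_per_cell(k2: float, epsilon2: float) -> float:
    """n = k2 / epsilon2, the expected noisy count of a level-2 cell."""
    return k2 / epsilon2


def prob_nonempty(noisy_count: float, epsilon2: float) -> float:
    """Probability that Laplace noise of scale 1/epsilon2 is smaller than
    the noisy count, i.e. that the real count is above zero."""
    if noisy_count >= 0:
        return 1.0 - 0.5 * math.exp(-noisy_count * epsilon2)
    return 0.5 * math.exp(noisy_count * epsilon2)


if __name__ == "__main__":
    for k2 in (5.0, math.sqrt(2)):
        for eps2 in (0.5, 0.25, 0.05):
            n = expected_workers_per_cell(k2, eps2)
            print(round(k2, 2), eps2, round(n, 1), round(prob_nonempty(n, eps2), 3))
```

With $k_2 = \sqrt{2}$ the probability is about 0.88 for every $\varepsilon_2$, which matches the 88% quoted for the customized AG.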
Customized AG
- Original AG and customized AG both adapt to data distributions.
- Original AG minimizes the overall estimation error of region queries, while customized AG increases the number of 2nd-level cells.
[Figure: Yelp dataset; grids produced by original AG and customized AG.]
Analytical Utility Model
- We define the Acceptance Rate as a decreasing function of task-worker distance (e.g., linear, Zipfian):
$$ p^a = F(d), \quad 0 \le p^a \le 1 $$
- The SC-server establishes an Expected Utility (EU) threshold, which is the targeted success rate for a task; $EU > p^a$.
- X is a random variable for the event that a worker accepts a received task:
$$ P(X = \text{True}) = p^a, \quad P(X = \text{False}) = 1 - p^a $$
- Assuming w independent workers, $X \sim \text{Binomial}(w, p^a)$, and U is the probability that at least one worker accepts the task:
$$ U = 1 - (1 - p^a)^w $$
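A short Python sketch of the utility model follows. The function names are hypothetical, and the linear acceptance function (starting at a maximum rate at distance 0 and reaching 0 at MTD) is one assumed instance of the decreasing function F(d), not a definition taken from the slides.

```python
def linear_acceptance_rate(distance: float, mtd: float, max_ar: float) -> float:
    """Assumed linear F(d): max_ar at distance 0, falling to 0 at MTD."""
    if distance >= mtd:
        return 0.0
    return max_ar * (1.0 - distance / mtd)


def utility(p_accept: float, workers: int) -> float:
    """U = 1 - (1 - p^a)^w: at least one of w independent workers accepts."""
    return 1.0 - (1.0 - p_accept) ** workers


def workers_needed(p_accept: float, eu: float) -> int:
    """Smallest w whose utility reaches the EU threshold."""
    if not 0.0 < p_accept <= 1.0 or not 0.0 <= eu < 1.0:
        raise ValueError("need 0 < p_accept <= 1 and 0 <= eu < 1")
    w = 1
    while utility(p_accept, w) < eu:
        w += 1
    return w


if __name__ == "__main__":
    p = linear_acceptance_rate(distance=1.8, mtd=3.6, max_ar=0.4)
    for eu in (0.3, 0.5, 0.7, 0.9):
        print(eu, workers_needed(p, eu))
```

The number of workers that must be notified grows quickly as EU approaches 1, which is why the threshold directly drives the size of the geocast region.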
Acceptance Rate Functions
[Figure: acceptance rate against distance, with distance ranging from 0 to MTD.]
Geocast Region Construction
Determines a small region that contains sufficient workers.
Greedy Algorithm (GDY):
1. Init GR = {}, max-heap of candidates Q = {the cell that contains t}
2. $c_i \leftarrow$ top of Q; add $c_i$ to GR
3. $U \leftarrow 1 - (1 - U)(1 - U_{c_i})$
4. If $U \ge EU$, return GR
5. neighbors = {$c_i$'s neighbors within MTD that are not already in GR or Q}
6. $Q = Q \cup \text{neighbors}$; go to 2.
[Figure: task t on the two-level grid with level-2 cells c1 to c21.]
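The greedy steps can be sketched in Python on a toy grid. Everything named here is hypothetical: the `Cell` structure, the toy cells, and in particular `cell_utility`, which assumes a linear acceptance rate and treats the rounded noisy count as the number of workers in the cell. It is a sketch under those assumptions rather than the authors' algorithm as published.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Cell:
    noisy_count: float   # sanitized worker count read from the PSD
    distance: float      # distance from the cell to the task
    neighbors: list = field(default_factory=list)


def cell_utility(cell: Cell, mtd: float, max_ar: float) -> float:
    """Assumed model: linear acceptance rate, rounded noisy count as w."""
    if cell.distance >= mtd:
        return 0.0
    p = max_ar * (1.0 - cell.distance / mtd)
    w = max(0, round(cell.noisy_count))
    return 1.0 - (1.0 - p) ** w


def greedy_geocast(cells: dict, start: str, eu: float, mtd: float, max_ar: float):
    """Grow GR from the task's cell, always taking the best candidate cell."""
    gr, utility = [], 0.0
    queued = {start}
    # heapq is a min-heap, so utilities are negated to pop the largest first.
    heap = [(-cell_utility(cells[start], mtd, max_ar), start)]
    while heap:
        neg_u, name = heapq.heappop(heap)
        gr.append(name)
        utility = 1.0 - (1.0 - utility) * (1.0 - (-neg_u))
        if utility >= eu:
            break
        for nb in cells[name].neighbors:
            if nb not in queued and cells[nb].distance <= mtd:
                queued.add(nb)
                heapq.heappush(heap, (-cell_utility(cells[nb], mtd, max_ar), nb))
    return gr, utility


def toy_grid() -> dict:
    """Four hypothetical cells; c4 lies beyond the MTD used below."""
    return {
        "c1": Cell(2, 0.0, ["c2", "c3"]),
        "c2": Cell(5, 1.0, ["c1", "c4"]),
        "c3": Cell(1, 1.0, ["c1", "c4"]),
        "c4": Cell(10, 5.0, ["c2", "c3"]),
    }


if __name__ == "__main__":
    print(greedy_geocast(toy_grid(), "c1", eu=0.8, mtd=3.0, max_ar=0.3))
```

If the heap empties before EU is reached, the sketch returns the region built so far together with the utility it achieved, so the caller can detect an under-provisioned task.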
Partial Cell Selection
- The number of workers can still be large with AG, especially when $\varepsilon_2$ is small.
- Allow partial cell inclusion on the last added cell $c_i$.
[Figure: the last added cell $c_i$ is split, and a sub-cell $c_i'$ is selected; shown on the grid of level-2 cells c1 to c21 around task t.]
Communication Cost
- Infrastructure-based mode (cellular, Internet, WLAN) vs. infrastructure-less mode (mobile ad-hoc networks)
- The more compact the GR, the lower the cost.
- Digital Compactness Measurement [Kim 84]:
$$ DCM = \frac{area(GR)}{area(MIN\ BALL)} $$
- Measurement: hop count
$$ \text{Hop count} = \frac{\text{farthest distance between two workers}}{2 \times \text{communication range}} $$
[Figure: geocast region on the grid of level-2 cells c1 to c21 around task t.]
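Both measures reduce to simple ratios, as the sketch below shows. The function names are hypothetical, and the radius of the minimum enclosing ball is taken as an input rather than computed, which is a simplification of this sketch.

```python
import math


def dcm(region_area: float, min_ball_radius: float) -> float:
    """Digital compactness: area(GR) over the area of its minimum enclosing
    ball (the radius is supplied by the caller in this sketch)."""
    return region_area / (math.pi * min_ball_radius ** 2)


def hop_count(farthest_distance: float, communication_range: float) -> float:
    """Farthest distance between two workers over twice the radio range."""
    return farthest_distance / (2.0 * communication_range)


if __name__ == "__main__":
    # A 2 x 2 square region: its enclosing circle has radius sqrt(2).
    print(round(dcm(4.0, math.sqrt(2)), 3))
    # A 1 x 4 strip with the same area: enclosing radius sqrt(17) / 2.
    print(round(dcm(4.0, math.sqrt(17) / 2), 3))
    print(hop_count(1000.0, 100.0))
```

A square region scores higher than a thin strip of the same area, which is the sense in which a more compact GR is cheaper to cover.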
Geocast Regions
[Figure: four example geocast regions, panels A to D.]
Experimental Setup
Datasets:

    Name      #Tasks     #Workers    MTD (km)
    Gowalla   151,075    6,160       3.6
    Yelp      15,583     70,817      13.5

Assumptions:
- Gowalla and Yelp users are workers.
- Check-in points (i.e., restaurants) are task locations.
Parameter settings:
- ε = {0.1, 0.4, 0.7, 1}
- 1000 random tasks x 10 seeds
- EU = {0.3, 0.5, 0.7, 0.9}
- MaxAR = {0.1, 0.4, 0.7, 1}
GR Construction Heuristics (Gow.-Linear)
- GDY = geocast (greedy algorithm) + original adaptive grid (AG) [Qardaji 13]
- G-GR = geocast + AG with customized granularity
- G-PA = geocast with partial cell selection + original AG
- G-GP = geocast with partial cell selection + AG with customized granularity
[Figure: three panels (ANW, WTD-FC, HOP) comparing GDY, G-GR, G-PA and G-GP at Eps = 0.1, 0.4, 0.7 and 1.]
Effect of Grid Size on ASR
[Figure: ASR against k2 (0.1 to 25.6, including 1.41) for Gowalla-Linear, Gowalla-Zipf, Yelp-Linear and Yelp-Zipf, with over-provision and under-provision regions marked.]
Average ASR over all values of the budget, varying k2.
Compactness-based Heuristics (Yelp-Zipf)
[Figure: two panels (HOP, ANW) comparing G-GP-Pure, G-GP-Hybrid and G-GP-Compact at Eps = 0.1, 0.4, 0.7 and 1.]
Overhead of Achieving Privacy (Gow.-Zipf)
[Figure: three panels (ANW, WTD-FC, ASR) comparing Privacy and Non-Privacy at Eps = 0.1, 0.4, 0.7 and 1.]
Effect of Varying MAR (Yelp-Linear)
[Figure: three panels (ANW, WTD-FC, CELL) against AR = 0.1, 0.4, 0.7 and 1, with one series per Eps = 0.1, 0.4, 0.7 and 1.]
Effect of Varying EU (Yelp-Linear)
[Figure: three panels (ANW, WTD-FC, CELL) against EU = 30, 50, 70 and 90, with one series per Eps = 0.1, 0.4, 0.7 and 1.]
Demo
http://geocast.azurewebsites.net/geocast/
https://www.youtube.com/watch?v=4zkij9gk79s
Conclusion
- Introduced a novel privacy-aware framework in SC, which enables worker participation without compromising location privacy
- Identified geocasting as a needed step to preserve privacy prior to workers consenting to a task
- Provided heuristics and optimizations for determining effective geocast regions that achieve a high assignment success rate with low overhead
- Experimental results on real datasets show that the proposed techniques are effective and the cost of privacy is practical
References
- Hien To, Gabriel Ghinita, Cyrus Shahabi. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014)
- Hien To, Gabriel Ghinita, Cyrus Shahabi. PriGeoCrowd: A Toolbox for Private Spatial Crowdsourcing (demo). In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)