Differentially Private Real-time Data Release over Infinite Trajectory Streams

Size: px
Start display at page:

Download "Differentially Private Real-time Data Release over Infinite Trajectory Streams"

Transcription

1 Differentially Private Real-time Data Release over Infinite Trajectory Streams Kyoto University, Japan Department of Social Informatics Yang Cao, Masatoshi Yoshikawa 1

2 Outline Motivation: opportunity & privacy risk Problem definition and analysis Proposed solution Experiment results Conclusion & Future work 2

3 Motivation: Opportunity A great opportunity to utilize personal real-life data Easy to collect Life log data, along with people s trajectory. <uid,time,loc, > Trajectory streams are consisting of many people s trajectories Statistics of trajectory streams are useful e.g., Q count {how many people at Pittsburgh station now?} A trajectory: time sequenced locations 3

4 Motivation: Opportunity A great opportunity to utilize personal real-life data Easy to collect Life log data, along with people s trajectory. <uid,time,loc, > Trajectory streams are consisting of many people s trajectories Statistics of trajectory streams are useful e.g., Count: How many people at Pittsburgh station now? A trajectory: time sequenced locations e.g., Count: How many people at Pittsburgh station with Heart Rate>100 now? Health-aware Navigation System Marketing Analysis Intelligent Transportation System Leverage statistics of trajectory streams to data-based innovations! 4

5 Motivation: Privacy Risk Publish Statistics (of personal data) is risky 5

6 Motivation: Privacy Risk Publish Statistics (of personal data) is risky E.g., Release the data of Q1: COUNT(Sex=Female)=A. Q2: COUNT(Sex=Female OR What if B=A+1? Name Age Sex Dis. u1 40 M HIV u2 30 M - (Age=40 & Sex=Male & Employer= u1 )) = B 6

7 Motivation: Privacy Risk Publish Statistics (of personal data) is risky E.g., Release the data of Q1: COUNT(Sex=Female)=A. Q2: COUNT(Sex=Female OR If B=A+1 (Age=40 & Sex=Male & Employer= u1 )) = B Q3: COUNT(Sex=Female OR (Age=42 & Sex=Male & Employer= u1 ) & Diagnosis= HIV ) = C C = 1 or 0 Name Age Sex Dis. u1 40 M HIV u2 30 M - 7

8 Motivation: Privacy Risk Publish Statistics (of personal data) is risky E.g., Release the data of Q1: COUNT(Sex=Female)=A. Q2: COUNT(Sex=Female OR If B=A+1 (Age=40 & Sex=Male & Employer= u1 )) = B Q3: COUNT(Sex=Female OR (Age=42 & Sex=Male & Employer= u1 ) & Diagnosis= HIV ) = C C = 1 or 0 Positively or negatively compromised! Name Age Sex Dis. u1 40 M HIV u2 30 M - 8

9 Motivation: Privacy Risk Publish Statistics (of personal data) is risky Linkage attack on anonymized data [1][2] Released database An attacker s database [1] L. Sweeney, Simple demographics often identify people uniquely, Health (San Francisco), [2] C. Dwork, A Firm Foundation for Private Data Analysis, Commun. ACM, Jan

10 Motivation: Privacy Risk Publish Statistics (of personal data) is risky Linkage attack [1][2] Personal Trajectory data is highly sensitive! Four spatiotemporal data points can identify 95% of the individuals [3] Goal: to publish statistics of trajectory streams by a Privacy Preserving Data Publishing (PPDP) method, for open data utilizing untrusted cloud services data mining outsourcing [1] L. Sweeney, Simple demographics often identify people uniquely, Health (San Francisco), [2] C. Dwork, A Firm Foundation for Private Data Analysis, Commun. ACM, Jan [3]Y.-A. de Montjoye et al, Unique in the Crowd: The privacy bounds of human mobility, Sci. Rep.,Mar

11 Our contributions A rigorous and flexible PPDP framework over infinite trajectory streams The first definition of personalized privacy model for spatiotemporal data The protection is based on ε-differential Privacy. rigorous Designed algorithms to publish counts in real-time. E.g., Counts: How many people at Pittsburgh station now? Published data utility is better than the previous results. flexible real-time trajectory data users privacy preferences Privacy Model & PPDP algorithm sensitive data raw statistics in real-time noisy data publishable! Private Data 11

12 Our contributions A rigorous and flexible PPDP framework over infinite trajectory streams The first definition of personalized privacy model for spatiotemporal data The protection is based on ε-differential Privacy. rigorousness Designed algorithms to publish counts in real-time. E.g., Counts: How many people at Pittsburgh station now Personlized privacy & Published data utility is better. flexibility real-time trajectory data users privacy preferences Privacy Model & PPDP algorithm sensitive data raw statistics in real-time noisy data publishable! Private Data 12

13 Outline Motivation: opportunity & privacy risk Problem definition and analysis Proposed solution Experiment result Conclusion & Future work 13

14 Problem Definition PPDP over infinite trajectory streams Data collection process as follows: Trusted Server uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (b) trajectory representation of raw data safely publish risk locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics (a) raw data 14

15 Problem Definition PPDP over infinite trajectory streams How to transform (c) to safe version (c), while keeping it similar to (c) as much as possible? Trusted Server safely publish uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (b) trajectory representation of raw data risk locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics 15

16 Problem Definition PPDP over infinite trajectory streams How to transform (c) to safe version (c), while keeping it similar to (c) as much as possible? Ad-hoc methods CANNOT provide a reliable privacy guarantee e.g., method of deleting the values of 1 It is hard to model the attacker s background knowledge in this big data locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk Ad-hoc PPDP algorithm locs t1 t2 t3 t4 t5 park office bar gym (c) safe(?) statistics 16

17 Differential Privacy: A rigorous privacy definition ε-differential Privacy (ε-dp) [4] As a de facto privacy standard for statistical data publishing. Randomized algorithm A achieve ε-dp, if it satisfies Pr[A(Q(D))] Pr[A(Q(D*))] eε, ε > 0 D*:database except any one individual s data then A achieves ε-dp ; ε is a given positive parameter. D Q( ) D* Q( ) ε: Privacy budget unitary privacy level control. Robust under attacker with arbitrary background knowl. Laplace Mechanism & Exponential Mechanism can be used as sub-procedure of A [4] C. Dwork, et al., Calibrating Noise to Sensitivity in Private Data Analysis, TCC [5] F. McSherry and K. Talwar, Mechanism Design via Differential Privacy, FOCS,

18 Differential Privacy: A rigorous privacy definition ε-differential Privacy (ε-dp) [4] As a de facto privacy standard for statistical data publishing. Randomized algorithm A achieve ε-dp, if it satisfies Pr[A(Q(D))] Pr[A(Q(D*))] eε, ε > 0 D*:database except any one individual s data then A achieves ε-dp ; ε is a given positive parameter. D A( Q( )) D* A( Q( )) ε: Privacy budget unitary privacy level control. Robust under attacker with arbitrary background knowl. Laplace Mechanism & Exponential Mechanism can be used as sub-procedure of A [4] C. Dwork, et al., Calibrating Noise to Sensitivity in Private Data Analysis, TCC [5] F. McSherry and K. Talwar, Mechanism Design via Differential Privacy, FOCS, Less noise More High Data utility Low Low Privacy Level High ε 0 18

19 Differential Privacy: A rigorous privacy definition ε-differential Privacy (ε-dp) [4] As a de facto privacy standard for statistical data publishing. Randomized algorithm A achieve ε-dp, if it satisfies Pr[A(Q(D))] Pr[A(Q(D*))] eε, ε > 0 D*:database except any one individual s data then A achieves ε-dp ; ε is a given positive parameter. D A( Q( )) D* A( Q( )) ε: Privacy budget unitary privacy level control. Robust under Linkage attack. Laplace Mechanism [4] & Exponential Mechanism [5] Less noise High Data utility can be used as sub-procedure of A [4] C. Dwork, et al., Calibrating Noise to Sensitivity in Private Data Analysis, TCC [5] F. McSherry and K. Talwar, Mechanism Design via Differential Privacy, FOCS, More Low Low Privacy Level High ε 0 19

20 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! 20

21 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! 21

22 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! Two naive methods: (1) to protect data of each one timestamp. (2) to protect each user s data of all timestamps. However, it is not safe that protecting only 1 data point uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data ε ε ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (b) trajectory representation of raw data risk locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics 22

23 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! Two naive methods: (1) to protect data of each one timestamp. (2) to protect each user s data of all timestamps. However, it is unrealistic for infinite trajectory streams uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (b) trajectory representation of raw data risk locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics 23

24 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! Two naive methods: (1) to protect data of each one timestamp. (2) to protect each user s data of all timestamps. Using Laplace Mechanism [4] as sub-procedure: uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (b) trajectory representation of raw data risk locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics [4] C. Dwork, et al., Calibrating Noise to Sensitivity in Private Data Analysis, TCC

25 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! Two naive methods: (1) to protect data of each one timestamp. (2) to protect each user s data of all timestamps. Using Laplace Mechanism [4] as sub-procedure: risk locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics adding special Laplace noise to counts PPDP algo. A locs t1 t2 t3 t4 t5 park office bar gym (c) ε-dp statistics [4] C. Dwork, et al., Calibrating Noise to Sensitivity in Private Data Analysis, TCC

26 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! Two naive methods: (1) to protect data of each one timestamp. (2) to protect each user s data of all timestamps. Our observation: In real-life, individuals may have different requirements on privacy. ε ε ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (1) protecting any one patiotemporal data point ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (2) protecting all length of trajectories 26

27 Problem Analysis PPDP over infinite trajectory streams How to apply ε-dp to PPDP of infinite trajectory streams? depends on what we want to protect! Two naive methods: (1) to protect data of each one timestamp. (2) to protect each user s data of all timestamps. Our observation: In real-life, individuals may have different requirements on privacy. 2-trajectory 3-trajectory ll1 ll2 ll3 privacy preference of user i overall privacy : ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (our method) to protect any ll-trajectory ll successive spatio-temporal data points 27

28 Related work Differential Privacy on finite streams (1) to protect data of each one timestamp. event-level privacy [8] (2) to protect each user s data of all timestamps. user-level privacy [8] FAST [7]: Laplace + Kalman filter (predict/correct the noisy data) Differential Privacy on infinite streams w-event privacy [6] [6]G. Kellaris et al, Differentially Private Event Sequences over Infinite Streams, VLDB 14. [7]L. Fan et al, FAST: Differentially Private Real-time Aggregate Monitor with Filtering and Adaptive Sampling, SIGMOD 13. [8] C. Dwork, Differential Privacy in New Settings., SODA, pp ,

29 Related work Differential Privacy on finite streams (1) to protect data of each one timestamp. event-level privacy [8] (2) to protect each user s data of all timestamps. user-level privacy [8] FAST [7]: Laplace + Kalman filter (predict/correct the noisy data) Differential Privacy on infinite streams w-event privacy [6] ε ε ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (1) event-level privacy ε ε ε t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar (2) user-level privacy [7]L. Fan et al, FAST: Differentially Private Real-time Aggregate Monitor with Filtering and Adaptive Sampling, SIGMOD 13. [8] C. Dwork, Differential Privacy in New Settings., SODA, pp ,

30 Related work Differential Privacy on finite streams (1) to protect data of each one timestamp. event-level privacy [8] (2) to protect each user s data of all timestamps. user-level privacy [8] FAST [7]: Laplace + Kalman filter (predict/correct the noisy data) Differential Privacy on infinite streams w-event privacy [6] t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar ll-trajectory privacy(ll=3) (this study) t1 t2 t3 t4 t5 u1 park bar u2 bar park u3 park office gym bar w-event privacy (w=3) [6] [6]G. Kellaris et al, Differentially Private Event Sequences over Infinite Streams, VLDB

31 Outline Motivation: opportunity & privacy risk Problem definition and analysis Proposed solution Experiment result Conclusion & Future work 31

32 Overview of our solution A rigorous and flexible privacy model: ll-trajectory privacy Challenge: proof of how to achieve it ll1 ll2 ll3 PPDP algorithm to publish counts in real-time. u 1 u 2 u 3 Challenge: real-time & infinite streams design a Greedy Algorithm (GA) to dynamically add noise at each timestamp; Challenge: to much noise re-publish the noisy data with Minimum Manhattan Distance (MMD) to the current data ll-trajectory PPDP algorithm privacy model t1 t2 t3 t4 t5 pa ba rk r ba pa r rk pa rk offi ce gy ba m r (b) Infinite trajectories risk loc t t t t t par offi bar gy (c) raw statistics Laplace Mechanism re-publish strategy loc t t t t t par offi bar gy (c) l-trajectory private data 32

33 Overview of our solution A rigorous and flexible privacy model: ll-trajectory privacy Challenge: proof of how to achieve it by DP mechanism ll1 ll2 ll3 PPDP algorithm to publish counts in real-time. u 1 u 2 u 3 Challenge: real-time & infinite streams design a Greedy Algorithm (GA) to dynamically add noise at each timestamp; Challenge: to much noise re-publish the noisy data with Minimum Manhattan Distance (MMD) to the current data ll-trajectory PPDP algorithm privacy model t1 t2 t3 t4 t5 pa ba rk r ba pa r rk pa rk offi ce gy ba m r (b) Infinite trajectories risk loc t t t t t par offi bar gy (c) raw statistics Laplace Mechanism re-publish strategy loc t t t t t par offi bar gy (c) l-trajectory private data 33

34 Overview of our solution A rigorous and flexible privacy model: ll-trajectory privacy Challenge: proof of how to achieve it by DP mechanism ll1 ll2 ll3 PPDP algorithm to publish counts in real-time. u 1 u 2 u 3 Challenge: real-time & infinite streams design a Greedy Algorithm (GA) to dynamically add noise at each timestamp; Challenge: to much noise re-publish the noisy data with Minimum Manhattan Distance (MMD) to the current data ll-trajectory PPDP algorithm privacy model t1 t2 t3 t4 t5 pa ba rk r ba pa r rk pa rk offi ce gy ba m r (b) Infinite trajectories risk loc t t t t t par offi bar gy (c) raw statistics Laplace Mechanism GA re-publish strategy loc t t t t t par offi bar gy (c) l-trajectory private data 34

35 Overview of our solution A rigorous and flexible privacy model: ll-trajectory privacy Challenge: proof of how to achieve it by DP mechanism ll1 ll2 ll3 PPDP algorithm to publish counts in real-time. u 1 u 2 u 3 Challenge: real-time & infinite streams design a Greedy Algorithm (GA) to dynamically add noise at each timestamp; Challenge: to much noise re-publish the noisy data with Minimum Manhattan Distance (MMD) to the current data ll-trajectory PPDP algorithm privacy model t1 t2 t3 t4 t5 pa ba rk r ba pa r rk pa rk offi ce gy ba m r (b) Infinite trajectories risk loc t t t t t par offi bar gy (c) raw statistics Laplace Mechanism GA re-publish strategy MMD loc t t t t t par offi bar gy (c) l-trajectory private data 35

36 Overview of our solution A rigorous and flexible privacy model: ll-trajectory privacy Challenge: proof of how to achieve it by DP mechanism ll1 ll2 ll3 PPDP algorithm to publish counts in real-time. u 1 u 2 u 3 Challenge: real-time & infinite streams design a Greedy Algorithm (GA) to dynamically add noise at each timestamp; Challenge: to much noise re-publish the noisy data with Minimum Manhattan Distance (MMD) to the current data ll-trajectory PPDP algorithm privacy model t1 t2 t3 t4 t5 pa ba rk r ba pa r rk pa rk offi ce gy ba m r (b) Infinite trajectories risk loc t t t t t par offi bar gy (c) raw statistics Laplace Mechanism GA re-publish strategy MMD loc t t t t t par offi bar gy (c) l-trajectory private data 36

37 How to achieve ll-trajectory privacy We proved that how to achieve it by conventional DP mechanisms (e.g., Laplace/Exponential mechanism) ε i is privacy budget variables at each ti the sum of ε i at timestamp i of any ll-trajectory should be less than ε uid tim loc. u1 t1 e park t1 t2 t3 t4 t5 u3 t1 park u2 t2 bar u1 park bar ll1 u3 t2 office u2 bar park ll2 u3 t3 gym u1 t5 bar u3 park office gym bar ll3 u2 t5 park u3 t5 bar (a) raw data privacy budget variables: ε1 ε2 ε3 ε4 ε5 (b) trajectory representation of raw data locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 37

38 How to achieve ll-trajectory privacy We proved that how to achieve it by conventional DP mechanisms (e.g., Laplace/Exponential mechanism) ε i is privacy budget variables at each ti the sum of ε i at timestamp i of any ll-trajectory should be less than ε ε1+ε5 ε ε1+ε2+ε3 ε uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data privacy budget variables: ε1 ε2 ε3 ε4 ε5 t1 t2 t3 t4 t5 u1 park bar ll1 u2 bar park ll2 u3 park office gym bar ll3 (b) trajectory representation of raw data ε2+ε5 ε ε2+ε3+ε5 ε locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 38

39 uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data How to achieve ll-trajectory privacy We proved that how to achieve it by conventional DP mechanisms (e.g., Laplace/Exponential mechanism) If we know all data in advance: ε i is privacy budget variables at each ti using Linear Programming the sum of ε i at timestamp i of any Maximize: ll-trajectory should be less than ε ε1+ε5 ε ε1+ε2+ε3 ε privacy budget variables: ε2+ε5 ε ε1 ε2 ε3 ε4 ε5 ε2+ε3+ε5 ε t1 t2 t3 t4 t5 u1 park bar ll1 u2 bar park ll2 u3 park office gym bar ll3 (b) trajectory representation of raw data approximate Maximum Utility locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 39

40 How to achieve ll-trajectory privacy We proved that how to achieve it by conventional DP mechanisms (e.g., Laplace/Exponential mechanism) If we know all data in advance: privacy budget variables ε However, we need: i at each ti using Real-time Linear publishing Programming & the sum of ε i at timestamp i of any for Maximize: ll-trajectory Infinite streams should be less than ε ε1+ε5 ε ε1+ε2+ε3 ε privacy budget variables: ε2+ε5 ε ε1 ε2 ε3 ε4 ε5 ε2+ε3+ε5 ε uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data t1 t2 t3 t4 t5 u1 park bar ll1 u2 bar park ll2 u3 park office gym bar ll3 (b) trajectory representation of raw data locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 40

41 How to achieve ll-trajectory privacy We proved that how to achieve it by conventional DP mechanisms (e.g., Laplace/Exponential mechanism) privacy budget variables ε i at each ti the sum of ε i at timestamp i of any ll-trajectory should be less than ε ε1+ε5 ε? ε1+ε2+ε3 ε privacy budget variables: ε2+ε5 ε? ε1 ε2 ε3 ε4 ε5 ε2+ε3+ε5 ε? uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data t1 t2 t3 t4 t5 u1 park bar ll1 u2 bar park ll2 u3 park office gym bar ll3 (b) trajectory representation of raw data locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 41

42 PPDP algorithm: GA GA: Greedy Algorithm to get approximately optimal ε i by incomplete information uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data privacy budget variables: ε1 ε2 ε3 ε4 ε5 t1 t2 t3 t4 t5 u1 park bar ll1 u2 bar park ll2 u3 park office gym bar ll3 (b) trajectory representation of raw data ε1+ε5 ε? ε1+ε2+ε3 ε ε2+ε5 ε? ε2+ε3+ε5 ε? locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 42

43 PPDP algorithm: GA GA: Greedy Algorithm to get approximately optimal ε i by incomplete information idea: exponential decay (1) set? =0, then compute ε3=w (2) use w/2 as the value of ε3 (reserve w/2 for? ) uid tim loc. u1 t1 e park u3 t1 park u2 t2 bar u3 t2 office u3 t3 gym u1 t5 bar u2 t5 park u3 t5 bar (a) raw data privacy budget variables: ε1 ε2 ε3 ε4 ε5 t1 t2 t3 t4 t5 u1 park bar ll1 u2 bar park ll2 u3 park office gym bar ll3 (b) trajectory representation of raw data ε1+ε5 ε? ε1+ε2+ε3 ε ε2+ε5 ε? ε2+ε3+ε5 ε? locs t1 t2 t3 t4 t5 park office bar gym (c) raw statistics risk 43

44 PPDP algorithm: MMD Republishing strategy to improve data utility of counts idea: in real-life data, the data has periodically repeated pattern re-publish Adjacent noisy (Adj) data (has been studied in [6]) re-publish noisy data who is holding Minimum Manhattan Distance (MMD) to real data of current timestamp (using Exponential Mechanism[5]) published noisy data n1 ~ nt-1 the current real data rt counts time= nmmd or nadj [5] F. McSherry and K. Talwar, Mechanism Design via Differential Privacy, FOCS, [6]G. Kellaris et al, Differentially Private Event Sequences over Infinite Streams, VLDB

45 Personalized privacy control Proposed framework dynamic budget allocation (GA) Private Approximation strategy PPDP framework collected database at timestamp t ε 1 ε t-1 us 1 us t n 1 n t-1 r t GA Dynamic Budget Allocation set of uids us t real statistics r t ε t,1 ε t,2 D t Private Approximation Strategy Private Publishing stream Adj or MMD Private publishing ε t private statistics n t capable of publishing diverse statistical data over infinite trajectory streams Adj / MMD is optimized for counts publishing 45

46 Outline Motivation: opportunity & privacy risk Problem definition and analysis Proposed solution Experiment result Future work 46

47 Experiments Four real-life trajectory datasets Desc. PeopleFlow 1 Geolife 2 T-Drive 3 WorldCup98 4 people moving; people moving; diverse type of mobility; sparse;beijing taxis moving; Beijing webpage click streams Locs. Amt ,000 users Amt. 11, , ,762 TimeStamps Amt. 1,694 1, Interval of TS 5 mins 1 min 10 mins 1 hour Length of TS 6 days 24 hours* ~7 days ~35 days Datapoints Amt. 102, ,990 37,255 1,258, *.Aggregating 50,176 hours timestamps to 24 hours by omitting the date. 47

48 Experiments Budget Allocation Approximation Strategy realtime Utility Evaluation Uniform uniformly X O bad LP Approximately global optimised X X normal GA+Adj dynamically republish Adj O better GA+MMD dynamically republish MMD O Best FAST fix [7] uniformly _ O better,not stable [7]L. Fan et al, FAST: Differentially Private Real-time Aggregate Monitor with Filtering and Adaptive Sampling, SIGMOD

49 Experiments Real data v.s. Noisy data(ll=20,ε=1) real data GA+MMD FAST Uniform PeopleFlow counts real data GA+MMD FAST [7] timestamp

50 Experiments Metrics Mean of Absolute Error (MAE) MAE(R, N) = 1 T * locs T locs i=1 j=1 r i [ j] n i [ j] All of them are the lower, the better Mean of Square Error (MSE) sensitivity to large error MSE(R, N) = 1 T * locs T locs i=1 j=1 r i [ j] n i [ j] 2 KL-divergence similarity of two distribution (the lower the more similar ) D KL (R N) = 1 T T locs! ln r i[ j] * $ # & r " n i [ j] * i [ j] * % i=1 j=1 50

51 Experiments MAE/MSE by varying ll (ε=1) PeopleFlow Geolife T-Drive WorldCup98 1E+03 1E+02 UNIFORM FAST GA+Adj GA+MMD 1E+03 1E+02 UNIFORM FAST GA+Adj GA+MMD 1E+03 1E+02 UNIFORM FAST GA+Adj GA+MMD 1E+03 1E+02 UNIFORM FAST GA+Adj GA+MMD MAE 1E+01 1E+01 1E+01 1E+01 1E+00 1E E E ll ll ll ll PeopleFlow Geolife T-Drive WorldCup98 1E+06 1E+05 Uniform GA+Adj FAST GA+MMD 1E+06 1E+05 Uniform GA+Adj FAST GA+MMD 1E+06 1E+05 Uniform GA+Adj FAST GA+MMD 1E+06 1E+05 Uniform GA+Adj FAST GA+MMD MSE 1E+04 1E+03 1E+04 1E+03 1E+04 1E+03 1E+04 1E+03 1E+02 1E+02 1E+02 1E+02 1E+01 1E ll 1E ll 1E ll ll 51

52 Experiments MAE/MSE by varying ε (ll=20) PeopleFlow Geolife T-Drive WorldCup98 1E+06 1E+05 1E+04 UNIFORM FAST GA+Adj GA+MMD 1E+06 1E+04 UNIFORM FAST GA+Adj GA+MMD 1E+06 1E+04 UNIFORM FAST GA+Adj GA+MMD 1E+06 1E+04 UNIFORM FAST GA+Adj GA+MMD MAE 1E+03 1E+02 1E+02 1E+02 1E+02 1E+01 1E+00 1E+00 1E+00 1E E E E E ε ε ε ε PeopleFlow Geolife T-Drive WorldCup98 1E+12 1E+09 UNIFORM FAST GA+Adj GA+MMD 1E+12 1E+08 UNIFORM FAST GA+Adj GA+MMD 1E+12 1E+09 UNIFORM FAST GA+Adj GA+MMD 1E+12 1E+09 UNIFORM FAST GA+Adj GA+MMD MSE 1E+06 1E+06 1E+06 1E+03 1E+04 1E+03 1E+03 1E+00 1E+00 1E+00 1E E E E E ε ε ε ε 52

53 Outline Motivation: opportunity & privacy risk Problem definition and analysis Proposed solution Experiment result Conclusion & Future work 53

54 Conclusion Proposed a rigorous and flexible privacy model for spatio-temporal data: l-trajectory privacy; Personal Data application: PPDM Privacy Models business model: privacy as services/money PPDP algorithms PPDM algorithms PPDP Framework for spatio-temporal data; safe OpenData safe MiningResult Algorithms GA+MMD for publishing private counts with high utility in real-time. 54

55 Future Work Improve algorithm GA+MMD for private counts the counts should be non-negative & integer More flexible privacy models ll-trajectory privacy Location-based privacy model PPDP/ PPDM for mining personal spatio-temporal data 55

56 Thank you! Any question? 56

Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks. Yang Cao Emory University

Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks. Yang Cao Emory University Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks Yang Cao Emory University 207..5 Outline Data Mining with Differential Privacy (DP) Scenario: Spatiotemporal Data

More information

The Optimal Mechanism in Differential Privacy

The Optimal Mechanism in Differential Privacy The Optimal Mechanism in Differential Privacy Quan Geng Advisor: Prof. Pramod Viswanath 11/07/2013 PhD Final Exam of Quan Geng, ECE, UIUC 1 Outline Background on Differential Privacy ε-differential Privacy:

More information

Maryam Shoaran Alex Thomo Jens Weber. University of Victoria, Canada

Maryam Shoaran Alex Thomo Jens Weber. University of Victoria, Canada Maryam Shoaran Alex Thomo Jens Weber University of Victoria, Canada Introduction Challenge: Evidence of Participation Sample Aggregates Zero-Knowledge Privacy Analysis of Utility of ZKP Conclusions 12/17/2015

More information

A Framework for Protec/ng Worker Loca/on Privacy in Spa/al Crowdsourcing

A Framework for Protec/ng Worker Loca/on Privacy in Spa/al Crowdsourcing A Framework for Protec/ng Worker Loca/on Privacy in Spa/al Crowdsourcing Nov 12 2014 Hien To, Gabriel Ghinita, Cyrus Shahabi VLDB 2014 1 Mo/va/on Ubiquity of mobile users 6.5 billion mobile subscrip/ons,

More information

The Optimal Mechanism in Differential Privacy

The Optimal Mechanism in Differential Privacy The Optimal Mechanism in Differential Privacy Quan Geng Advisor: Prof. Pramod Viswanath 3/29/2013 PhD Prelimary Exam of Quan Geng, ECE, UIUC 1 Outline Background on differential privacy Problem formulation

More information

Enabling Accurate Analysis of Private Network Data

Enabling Accurate Analysis of Private Network Data Enabling Accurate Analysis of Private Network Data Michael Hay Joint work with Gerome Miklau, David Jensen, Chao Li, Don Towsley University of Massachusetts, Amherst Vibhor Rastogi, Dan Suciu University

More information

CMPUT651: Differential Privacy

CMPUT651: Differential Privacy CMPUT65: Differential Privacy Homework assignment # 2 Due date: Apr. 3rd, 208 Discussion and the exchange of ideas are essential to doing academic work. For assignments in this course, you are encouraged

More information

Human resource data location privacy protection method based on prefix characteristics

Human resource data location privacy protection method based on prefix characteristics Acta Technica 62 No. 1B/2017, 437 446 c 2017 Institute of Thermomechanics CAS, v.v.i. Human resource data location privacy protection method based on prefix characteristics Yulong Qi 1, 2, Enyi Zhou 1

More information

Lecture 11- Differential Privacy

Lecture 11- Differential Privacy 6.889 New Developments in Cryptography May 3, 2011 Lecture 11- Differential Privacy Lecturer: Salil Vadhan Scribes: Alan Deckelbaum and Emily Shen 1 Introduction In class today (and the next two lectures)

More information

Differentially Private Event Sequences over Infinite Streams

Differentially Private Event Sequences over Infinite Streams Differentially Private Event Sequences over Infinite Streams Georgios Kellaris 1 Stavros Papadopoulos 2 Xiaokui Xiao 3 Dimitris Papadias 1 1 HKUST 2 Intel Labs / MIT 3 Nanyang Technological University

More information

Pufferfish Privacy Mechanisms for Correlated Data. Shuang Song, Yizhen Wang, Kamalika Chaudhuri University of California, San Diego

Pufferfish Privacy Mechanisms for Correlated Data. Shuang Song, Yizhen Wang, Kamalika Chaudhuri University of California, San Diego Pufferfish Privacy Mechanisms for Correlated Data Shuang Song, Yizhen Wang, Kamalika Chaudhuri University of California, San Diego Sensitive Data with Correlation } Data from social networks, } eg. Spreading

More information

Database Privacy: k-anonymity and de-anonymization attacks

Database Privacy: k-anonymity and de-anonymization attacks 18734: Foundations of Privacy Database Privacy: k-anonymity and de-anonymization attacks Piotr Mardziel or Anupam Datta CMU Fall 2018 Publicly Released Large Datasets } Useful for improving recommendation

More information

Rényi Differential Privacy

Rényi Differential Privacy Rényi Differential Privacy Ilya Mironov Google Brain DIMACS, October 24, 2017 Relaxation of Differential Privacy (ε, δ)-differential Privacy: The distribution of the output M(D) on database D is (nearly)

More information

Locally Differentially Private Protocols for Frequency Estimation. Tianhao Wang, Jeremiah Blocki, Ninghui Li, Somesh Jha

Locally Differentially Private Protocols for Frequency Estimation. Tianhao Wang, Jeremiah Blocki, Ninghui Li, Somesh Jha Locally Differentially Private Protocols for Frequency Estimation Tianhao Wang, Jeremiah Blocki, Ninghui Li, Somesh Jha Differential Privacy Differential Privacy Classical setting Differential Privacy

More information

Privacy of Numeric Queries Via Simple Value Perturbation. The Laplace Mechanism

Privacy of Numeric Queries Via Simple Value Perturbation. The Laplace Mechanism Privacy of Numeric Queries Via Simple Value Perturbation The Laplace Mechanism Differential Privacy A Basic Model Let X represent an abstract data universe and D be a multi-set of elements from X. i.e.

More information

Report on Differential Privacy

Report on Differential Privacy Report on Differential Privacy Lembit Valgma Supervised by Vesal Vojdani December 19, 2017 1 Introduction Over the past decade the collection and analysis of personal data has increased a lot. This has

More information

Personalized Social Recommendations Accurate or Private

Personalized Social Recommendations Accurate or Private Personalized Social Recommendations Accurate or Private Presented by: Lurye Jenny Paper by: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma Outline Introduction Motivation The model General

More information

Differential Privacy and its Application in Aggregation

Differential Privacy and its Application in Aggregation Differential Privacy and its Application in Aggregation Part 1 Differential Privacy presenter: Le Chen Nanyang Technological University lechen0213@gmail.com October 5, 2013 Introduction Outline Introduction

More information

On Node-differentially Private Algorithms for Graph Statistics

On Node-differentially Private Algorithms for Graph Statistics On Node-differentially Private Algorithms for Graph Statistics Om Dipakbhai Thakkar August 26, 2015 Abstract In this report, we start by surveying three papers on node differential privacy. First, we look

More information

Calibrating Noise to Sensitivity in Private Data Analysis

Calibrating Noise to Sensitivity in Private Data Analysis Calibrating Noise to Sensitivity in Private Data Analysis Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith Mamadou H. Diallo 1 Overview Issues: preserving privacy in statistical databases Assumption:

More information

Differential Privacy: a short tutorial. Presenter: WANG Yuxiang

Differential Privacy: a short tutorial. Presenter: WANG Yuxiang Differential Privacy: a short tutorial Presenter: WANG Yuxiang Some slides/materials extracted from Aaron Roth s Lecture at CMU The Algorithmic Foundations of Data Privacy [Website] Cynthia Dwork s FOCS

More information

Privacy in Statistical Databases

Privacy in Statistical Databases Privacy in Statistical Databases Individuals x 1 x 2 x n Server/agency ) answers. A queries Users Government, researchers, businesses or) Malicious adversary What information can be released? Two conflicting

More information

What Can We Learn Privately?

What Can We Learn Privately? What Can We Learn Privately? Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan Homin Lee Kobbi Nissim Adam Smith Los Alamos UT Austin Ben Gurion Penn State To appear in SICOMP

More information

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization Katrina Ligett HUJI & Caltech joint with Seth Neel, Aaron Roth, Bo Waggoner, Steven Wu NIPS 2017

More information

Information-theoretic foundations of differential privacy

Information-theoretic foundations of differential privacy Information-theoretic foundations of differential privacy Darakhshan J. Mir Rutgers University, Piscataway NJ 08854, USA, mir@cs.rutgers.edu Abstract. We examine the information-theoretic foundations of

More information

Lecture 20: Introduction to Differential Privacy

Lecture 20: Introduction to Differential Privacy 6.885: Advanced Topics in Data Processing Fall 2013 Stephen Tu Lecture 20: Introduction to Differential Privacy 1 Overview This lecture aims to provide a very broad introduction to the topic of differential

More information

Differentially Private Linear Regression

Differentially Private Linear Regression Differentially Private Linear Regression Christian Baehr August 5, 2017 Your abstract. Abstract 1 Introduction My research involved testing and implementing tools into the Harvard Privacy Tools Project

More information

PoS(CENet2017)018. Privacy Preserving SVM with Different Kernel Functions for Multi-Classification Datasets. Speaker 2

PoS(CENet2017)018. Privacy Preserving SVM with Different Kernel Functions for Multi-Classification Datasets. Speaker 2 Privacy Preserving SVM with Different Kernel Functions for Multi-Classification Datasets 1 Shaanxi Normal University, Xi'an, China E-mail: lizekun@snnu.edu.cn Shuyu Li Shaanxi Normal University, Xi'an,

More information

Differentially Private Publication of Location Entropy

Differentially Private Publication of Location Entropy Differentially Private Publication of Location Entropy Hien To hto@usc.edu Kien Nguyen kien.nguyen@usc.edu yrus Shahabi shahabi@usc.edu Department of omputer Science, University of Southern alifornia Los

More information

Semantic Security: Privacy Definitions Revisited

Semantic Security: Privacy Definitions Revisited 185 198 Semantic Security: Privacy Definitions Revisited Jinfei Liu, Li Xiong, Jun Luo Department of Mathematics & Computer Science, Emory University, Atlanta, USA Huawei Noah s Ark Laboratory, HongKong

More information

Differential Privacy and Pan-Private Algorithms. Cynthia Dwork, Microsoft Research

Differential Privacy and Pan-Private Algorithms. Cynthia Dwork, Microsoft Research Differential Privacy and Pan-Private Algorithms Cynthia Dwork, Microsoft Research A Dream? C? Original Database Sanitization Very Vague AndVery Ambitious Census, medical, educational, financial data, commuting

More information

Differentially Private ANOVA Testing

Differentially Private ANOVA Testing Differentially Private ANOVA Testing Zachary Campbell campbza@reed.edu Andrew Bray abray@reed.edu Anna Ritz Biology Department aritz@reed.edu Adam Groce agroce@reed.edu arxiv:1711335v2 [cs.cr] 20 Feb 2018

More information

Analysis Based on SVM for Untrusted Mobile Crowd Sensing

Analysis Based on SVM for Untrusted Mobile Crowd Sensing Analysis Based on SVM for Untrusted Mobile Crowd Sensing * Ms. Yuga. R. Belkhode, Dr. S. W. Mohod *Student, Professor Computer Science and Engineering, Bapurao Deshmukh College of Engineering, India. *Email

More information

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data Jinzhong Wang April 13, 2016 The UBD Group Mobile and Social Computing Laboratory School of Software, Dalian University

More information

New Statistical Applications for Differential Privacy

New Statistical Applications for Differential Privacy New Statistical Applications for Differential Privacy Rob Hall 11/5/2012 Committee: Stephen Fienberg, Larry Wasserman, Alessandro Rinaldo, Adam Smith. rjhall@cs.cmu.edu http://www.cs.cmu.edu/~rjhall 1

More information

Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies

Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies Differential Privacy with ounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies ASTRACT Florian Tramèr Zhicong Huang Jean-Pierre Hubaux School of IC, EPFL firstname.lastname@epfl.ch

More information

Encapsulating Urban Traffic Rhythms into Road Networks

Encapsulating Urban Traffic Rhythms into Road Networks Encapsulating Urban Traffic Rhythms into Road Networks Junjie Wang +, Dong Wei +, Kun He, Hang Gong, Pu Wang * School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan,

More information

Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri

Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri Computer Science and Engineering University of California, San Diego Sensitive Structured Data Medical Records Search Logs

More information

How Well Does Privacy Compose?

How Well Does Privacy Compose? How Well Does Privacy Compose? Thomas Steinke Almaden IPAM/UCLA, Los Angeles CA, 10 Jan. 2018 This Talk Composition!! What is composition? Why is it important? Composition & high-dimensional (e.g. genetic)

More information

Jun Zhang Department of Computer Science University of Kentucky

Jun Zhang Department of Computer Science University of Kentucky Application i of Wavelets in Privacy-preserving Data Mining Jun Zhang Department of Computer Science University of Kentucky Outline Privacy-preserving in Collaborative Data Analysis Advantages of Wavelets

More information

Diagnosing New York City s Noises with Ubiquitous Data

Diagnosing New York City s Noises with Ubiquitous Data Diagnosing New York City s Noises with Ubiquitous Data Dr. Yu Zheng yuzheng@microsoft.com Lead Researcher, Microsoft Research Chair Professor at Shanghai Jiao Tong University Background Many cities suffer

More information

TASK ASSIGNMENT WITH RIGOROUS PRIVACY PROTECTION IN SPATIAL CROWDSOURCING. Hien To. Submitted in Partial Fulfillment of the Requirements

TASK ASSIGNMENT WITH RIGOROUS PRIVACY PROTECTION IN SPATIAL CROWDSOURCING. Hien To. Submitted in Partial Fulfillment of the Requirements TASK ASSIGNMENT WITH RIGOROUS PRIVACY PROTECTION IN SPATIAL CROWDSOURCING by Hien To Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science Viterbi

More information

1 Differential Privacy and Statistical Query Learning

1 Differential Privacy and Statistical Query Learning 10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 5: December 07, 015 1 Differential Privacy and Statistical Query Learning 1.1 Differential Privacy Suppose

More information

Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services

Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services Caleb Chen CAO, Jieying SHE, Yongxin TONG, Lei CHEN The Hong Kong University of Science and Technology Is Istanbul the capital

More information

Local Differential Privacy

Local Differential Privacy Local Differential Privacy Peter Kairouz Department of Electrical & Computer Engineering University of Illinois at Urbana-Champaign Joint work with Sewoong Oh (UIUC) and Pramod Viswanath (UIUC) / 33 Wireless

More information

A Computational Movement Analysis Framework For Exploring Anonymity In Human Mobility Trajectories

A Computational Movement Analysis Framework For Exploring Anonymity In Human Mobility Trajectories A Computational Movement Analysis Framework For Exploring Anonymity In Human Mobility Trajectories Jennifer A. Miller 2015 UT CID Report #1512 This UT CID research was supported in part by the following

More information

Tuan V. Dinh, Lachlan Andrew and Philip Branch

Tuan V. Dinh, Lachlan Andrew and Philip Branch Predicting supercomputing workload using per user information Tuan V. Dinh, Lachlan Andrew and Philip Branch 13 th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Delft, 14 th -16

More information

Mobility Analytics through Social and Personal Data. Pierre Senellart

Mobility Analytics through Social and Personal Data. Pierre Senellart Mobility Analytics through Social and Personal Data Pierre Senellart Session: Big Data & Transport Business Convention on Big Data Université Paris-Saclay, 25 novembre 2015 Analyzing Transportation and

More information

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015 CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring 2015 3 February 2015 The interactive online mechanism that we discussed in the last two lectures allows efficient

More information

Spatial Data Science. Soumya K Ghosh

Spatial Data Science. Soumya K Ghosh Workshop on Data Science and Machine Learning (DSML 17) ISI Kolkata, March 28-31, 2017 Spatial Data Science Soumya K Ghosh Professor Department of Computer Science and Engineering Indian Institute of Technology,

More information

Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI

Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales Desheng Zhang & Tian He University of Minnesota, USA Jun Huang, Ye Li, Fan Zhang, Chengzhong Xu Shenzhen Institute

More information

Estimating Large Scale Population Movement ML Dublin Meetup

Estimating Large Scale Population Movement ML Dublin Meetup Deutsche Bank COO Chief Data Office Estimating Large Scale Population Movement ML Dublin Meetup John Doyle PhD Assistant Vice President CDO Research & Development Science & Innovation john.doyle@db.com

More information

Lecture 1 Introduction to Differential Privacy: January 28

Lecture 1 Introduction to Differential Privacy: January 28 Introduction to Differential Privacy CSE711: Topics in Differential Privacy Spring 2016 Lecture 1 Introduction to Differential Privacy: January 28 Lecturer: Marco Gaboardi Scribe: Marco Gaboardi This note

More information

An Overview of Traffic Matrix Estimation Methods

An Overview of Traffic Matrix Estimation Methods An Overview of Traffic Matrix Estimation Methods Nina Taft Berkeley www.intel.com/research Problem Statement 1 st generation solutions 2 nd generation solutions 3 rd generation solutions Summary Outline

More information

Differential Privacy in an RKHS

Differential Privacy in an RKHS Differential Privacy in an RKHS Rob Hall (with Larry Wasserman and Alessandro Rinaldo) 2/20/2012 rjhall@cs.cmu.edu http://www.cs.cmu.edu/~rjhall 1 Overview Why care about privacy? Differential Privacy

More information

Northrop Grumman Concept Paper

Northrop Grumman Concept Paper Northrop Grumman Concept Paper A Comprehensive Geospatial Web-based Solution for NWS Impact-based Decision Support Services Glenn Higgins April 10, 2014 Northrop Grumman Corporation Information Systems

More information

1 Hoeffding s Inequality

1 Hoeffding s Inequality Proailistic Method: Hoeffding s Inequality and Differential Privacy Lecturer: Huert Chan Date: 27 May 22 Hoeffding s Inequality. Approximate Counting y Random Sampling Suppose there is a ag containing

More information

ArcGIS is Advancing. Both Contributing and Integrating many new Innovations. IoT. Smart Mapping. Smart Devices Advanced Analytics

ArcGIS is Advancing. Both Contributing and Integrating many new Innovations. IoT. Smart Mapping. Smart Devices Advanced Analytics ArcGIS is Advancing IoT Smart Devices Advanced Analytics Smart Mapping Real-Time Faster Computing Web Services Crowdsourcing Sensor Networks Both Contributing and Integrating many new Innovations ArcGIS

More information

Privacy-Preserving Data Mining

Privacy-Preserving Data Mining CS 380S Privacy-Preserving Data Mining Vitaly Shmatikov slide 1 Reading Assignment Evfimievski, Gehrke, Srikant. Limiting Privacy Breaches in Privacy-Preserving Data Mining (PODS 2003). Blum, Dwork, McSherry,

More information

Extremal Mechanisms for Local Differential Privacy

Extremal Mechanisms for Local Differential Privacy Extremal Mechanisms for Local Differential Privacy Peter Kairouz Sewoong Oh 2 Pramod Viswanath Department of Electrical & Computer Engineering 2 Department of Industrial & Enterprise Systems Engineering

More information

Privacy-preserving Data Mining

Privacy-preserving Data Mining Privacy-preserving Data Mining What is [data] privacy? Privacy and Data Mining Privacy-preserving Data mining: main approaches Anonymization Obfuscation Cryptographic hiding Challenges Definition of privacy

More information

CHARTING SPATIAL BUSINESS TRANSFORMATION

CHARTING SPATIAL BUSINESS TRANSFORMATION CHARTING SPATIAL BUSINESS TRANSFORMATION An in-depth look at the business patterns of GIS and location intelligence adoption in the private sector EXECUTIVE SUMMARY The global use of geographic information

More information

H E L S I N K I 3D+ City Models and Smart Projects. Project Manager/ Architect/MSc (Civ.Eng) Jarmo Suomisto

H E L S I N K I 3D+ City Models and Smart Projects. Project Manager/ Architect/MSc (Civ.Eng) Jarmo Suomisto H E L S I N K I 3D+ 3D City Models and Smart Projects Project Manager/ Architect/MSc (Civ.Eng) Jarmo Suomisto Project Manager/ MSc (Civ.Eng) Kari Kaisla Coordinator/ MSc (Civ.Eng) Enni Airaksinen Project

More information

Spatial Crowdsourcing: Challenges and Applications

Spatial Crowdsourcing: Challenges and Applications Spatial Crowdsourcing: Challenges and Applications Liu Qiyu, Zeng Yuxiang 2018/3/25 We get some pictures and materials from tutorial spatial crowdsourcing: Challenges, Techniques,and Applications in VLDB

More information

Towards information flow control. Chaire Informatique et sciences numériques Collège de France, cours du 30 mars 2011

Towards information flow control. Chaire Informatique et sciences numériques Collège de France, cours du 30 mars 2011 Towards information flow control Chaire Informatique et sciences numériques Collège de France, cours du 30 mars 2011 Mandatory access controls and security levels DAC vs. MAC Discretionary access control

More information

Time Series Data Cleaning

Time Series Data Cleaning Time Series Data Cleaning Shaoxu Song http://ise.thss.tsinghua.edu.cn/sxsong/ Dirty Time Series Data Unreliable Readings Sensor monitoring GPS trajectory J. Freire, A. Bessa, F. Chirigati, H. T. Vo, K.

More information

Jun Zhang Department of Computer Science University of Kentucky

Jun Zhang Department of Computer Science University of Kentucky Jun Zhang Department of Computer Science University of Kentucky Background on Privacy Attacks General Data Perturbation Model SVD and Its Properties Data Privacy Attacks Experimental Results Summary 2

More information

Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area

Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area Song Gao 1, Jiue-An Yang 1,2, Bo Yan 1, Yingjie Hu 1, Krzysztof Janowicz 1, Grant McKenzie 1 1 STKO Lab, Department

More information

Data-driven characterization of multidirectional pedestrian trac

Data-driven characterization of multidirectional pedestrian trac Data-driven characterization of multidirectional pedestrian trac Marija Nikoli Michel Bierlaire Flurin Hänseler TGF 5 Delft University of Technology, the Netherlands October 8, 5 / 5 Outline. Motivation

More information

Visualisation of Spatial Data

Visualisation of Spatial Data Visualisation of Spatial Data VU Visual Data Science Johanna Schmidt WS 2018/19 2 Visual Data Science Introduction to Visualisation Basics of Information Visualisation Charts and Techniques Introduction

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

Differentially Private Sequential Data Publication via Variable-Length N-Grams

Differentially Private Sequential Data Publication via Variable-Length N-Grams Differentially Private Sequential Data Publication via Variable-Length N-Grams Rui Chen Concordia University Montreal, Canada ru_che@encs.concordia.ca Gergely Acs INRIA France gergely.acs@inria.fr Claude

More information

TRAITS to put you on the map

TRAITS to put you on the map TRAITS to put you on the map Know what s where See the big picture Connect the dots Get it right Use where to say WOW Look around Spread the word Make it yours Finding your way Location is associated with

More information

Differential Privacy and Verification. Marco Gaboardi University at Buffalo, SUNY

Differential Privacy and Verification. Marco Gaboardi University at Buffalo, SUNY Differential Privacy and Verification Marco Gaboardi University at Buffalo, SUNY Given a program P, is it differentially private? P Verification Tool yes? no? Proof P Verification Tool yes? no? Given a

More information

Demographic Data in ArcGIS. Harry J. Moore IV

Demographic Data in ArcGIS. Harry J. Moore IV Demographic Data in ArcGIS Harry J. Moore IV Outline What is demographic data? Esri Demographic data - Real world examples with GIS - Redistricting - Emergency Preparedness - Economic Development Next

More information

Differentially Private Sequential Data Publication via Variable-Length N-Grams

Differentially Private Sequential Data Publication via Variable-Length N-Grams Author manuscript, published in "ACM Computer and Communication Security (CCS) (2012)" Differentially Private Sequential Data Publication via Variable-Length N-Grams Rui Chen Concordia University Montreal,

More information

Publishing Search Logs A Comparative Study of Privacy Guarantees

Publishing Search Logs A Comparative Study of Privacy Guarantees Publishing Search Logs A Comparative Study of Privacy Guarantees Michaela Götz Ashwin Machanavajjhala Guozhang Wang Xiaokui Xiao Johannes Gehrke Abstract Search engine companies collect the database of

More information

Crime Analysis. GIS Solutions for Intelligence-Led Policing

Crime Analysis. GIS Solutions for Intelligence-Led Policing Crime Analysis GIS Solutions for Intelligence-Led Policing Applying GIS Technology to Crime Analysis Know Your Community Analyze Your Crime Use Your Advantage GIS aids crime analysis by Identifying and

More information

[Title removed for anonymity]

[Title removed for anonymity] [Title removed for anonymity] Graham Cormode graham@research.att.com Magda Procopiuc(AT&T) Divesh Srivastava(AT&T) Thanh Tran (UMass Amherst) 1 Introduction Privacy is a common theme in public discourse

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

Wishart Mechanism for Differentially Private Principal Components Analysis

Wishart Mechanism for Differentially Private Principal Components Analysis Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-6) Wishart Mechanism for Differentially Private Principal Components Analysis Wuxuan Jiang, Cong Xie and Zhihua Zhang Department

More information

Notes on the Exponential Mechanism. (Differential privacy)

Notes on the Exponential Mechanism. (Differential privacy) Notes on the Exponential Mechanism. (Differential privacy) Boston University CS 558. Sharon Goldberg. February 22, 2012 1 Description of the mechanism Let D be the domain of input datasets. Let R be the

More information

* Abstract. Keywords: Smart Card Data, Public Transportation, Land Use, Non-negative Matrix Factorization.

*  Abstract. Keywords: Smart Card Data, Public Transportation, Land Use, Non-negative Matrix Factorization. Analysis of Activity Trends Based on Smart Card Data of Public Transportation T. N. Maeda* 1, J. Mori 1, F. Toriumi 1, H. Ohashi 1 1 The University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo, Japan *Email:

More information

DPT: Differentially Private Trajectory Synthesis Using Hierarchical Reference Systems

DPT: Differentially Private Trajectory Synthesis Using Hierarchical Reference Systems DPT: Differentially Private Trajectory Synthesis Using Hierarchical Reference Systems Xi He Duke University hexi88@cs.duke.edu Cecilia M. Procopiuc AT&T Labs-Research mpro@google.com Graham Cormode University

More information

Cognitive Engineering for Geographic Information Science

Cognitive Engineering for Geographic Information Science Cognitive Engineering for Geographic Information Science Martin Raubal Department of Geography, UCSB raubal@geog.ucsb.edu 21 Jan 2009 ThinkSpatial, UCSB 1 GIScience Motivation systematic study of all aspects

More information

Biggest problem with data analytics. tons of data, not much insights.

Biggest problem with data analytics. tons of data, not much insights. Biggest problem with data analytics tons of data, not much insights. Mobile Marketing CAC KPI Advertisers Organic Which customer segment should we go after? Owned Media S E O What ad copy should I create?

More information

Differentially Private Password Frequency Lists

Differentially Private Password Frequency Lists Differentially Private Password Frequency Lists Or, How to release statistics from 70 million passwords on purpose Jeremiah Blocki Microsoft Research Email: jblocki@microsoftcom Anupam Datta Carnegie Mellon

More information

Uncertain Time-Series Similarity: Return to the Basics

Uncertain Time-Series Similarity: Return to the Basics Uncertain Time-Series Similarity: Return to the Basics Dallachiesa et al., VLDB 2012 Li Xiong, CS730 Problem Problem: uncertain time-series similarity Applications: location tracking of moving objects;

More information

Answering Many Queries with Differential Privacy

Answering Many Queries with Differential Privacy 6.889 New Developments in Cryptography May 6, 2011 Answering Many Queries with Differential Privacy Instructors: Shafi Goldwasser, Yael Kalai, Leo Reyzin, Boaz Barak, and Salil Vadhan Lecturer: Jonathan

More information

Differential Privacy Models for Location- Based Services

Differential Privacy Models for Location- Based Services Differential Privacy Models for Location- Based Services Ehab Elsalamouny, Sébastien Gambs To cite this version: Ehab Elsalamouny, Sébastien Gambs. Differential Privacy Models for Location- Based Services.

More information

Lesson 16: Technology Trends and Research

Lesson 16: Technology Trends and Research http://www.esri.com/library/whitepapers/pdfs/integrated-geoenabled-soa.pdf GEOG DL582 : GIS Data Management Lesson 16: Technology Trends and Research Overview Learning Objective Questions: 1. Why is integration

More information

On Differentially Private Frequent Itemsets Mining

On Differentially Private Frequent Itemsets Mining On Differentially Private Frequent Itemsets Mining Chen Zeng University of Wisconsin-Madison zeng@cs.wisc.edu Jeffrey F. Naughton University of Wisconsin-Madison naughton@cs.wisc.edu Jin-Yi Cai University

More information

An Industry Perspective. Bryn Fosburgh Vice President Trimble

An Industry Perspective. Bryn Fosburgh Vice President Trimble An Industry Perspective Bryn Fosburgh Vice President Trimble Who are we? Professionals & Consultants Geospatial Professionals working at or with: AEC Consultants Transportation Departments Construction

More information

RAPPOR: Randomized Aggregatable Privacy- Preserving Ordinal Response

RAPPOR: Randomized Aggregatable Privacy- Preserving Ordinal Response RAPPOR: Randomized Aggregatable Privacy- Preserving Ordinal Response Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova Google & USC Presented By: Pat Pannuto RAPPOR, What is is good for? (Absolutely something!)

More information

Oman NSDI Supporting Economic Development. Saud Al-Nofli Director of Spatial Data Directorate General of NSDI, NCSI

Oman NSDI Supporting Economic Development. Saud Al-Nofli Director of Spatial Data Directorate General of NSDI, NCSI Oman NSDI Supporting Economic Development 2017 Saud Al-Nofli Director of Spatial Data Directorate General of NSDI, NCSI "It s critical to make correct decisions the first time to optimize the Investments

More information

Release Connection Fingerprints in Social Networks Using Personalized Differential Privacy

Release Connection Fingerprints in Social Networks Using Personalized Differential Privacy Release Connection Fingerprints in Social Networks Using Personalized Differential Privacy Yongkai Li, Shubo Liu, Jun Wang, and Mengjun Liu School of Computer, Wuhan University, Wuhan, China Key Laboratory

More information

Lessons From the Trenches: using Mobile Phone Data for Official Statistics

Lessons From the Trenches: using Mobile Phone Data for Official Statistics Lessons From the Trenches: using Mobile Phone Data for Official Statistics Maarten Vanhoof Orange Labs/Newcastle University M.vanhoof1@newcastle.ac.uk @Metti Hoof MaartenVanhoof.com Mobile Phone Data (Call

More information

Uncovering the Digital Divide and the Physical Divide in Senegal Using Mobile Phone Data

Uncovering the Digital Divide and the Physical Divide in Senegal Using Mobile Phone Data Uncovering the Digital Divide and the Physical Divide in Senegal Using Mobile Phone Data Song Gao, Bo Yan, Li Gong, Blake Regalia, Yiting Ju, Yingjie Hu STKO Lab, Department of Geography, University of

More information

Progress in Data Anonymization: from k-anonymity to the minimality attack

Progress in Data Anonymization: from k-anonymity to the minimality attack Progress in Data Anonymization: from k-anonymity to the minimality attack Graham Cormode graham@research.att.com Tiancheng Li, Ninghua Li, Divesh Srivastava 1 Why Anonymize? For Data Sharing Give real(istic)

More information

Accessibility as an Instrument in Planning Practice. Derek Halden DHC 2 Dean Path, Edinburgh EH4 3BA

Accessibility as an Instrument in Planning Practice. Derek Halden DHC 2 Dean Path, Edinburgh EH4 3BA Accessibility as an Instrument in Planning Practice Derek Halden DHC 2 Dean Path, Edinburgh EH4 3BA derek.halden@dhc1.co.uk www.dhc1.co.uk Theory to practice a starting point Shared goals for access to

More information