Database Privacy: k-anonymity and de-anonymization attacks


1 18734: Foundations of Privacy
Database Privacy: k-anonymity and de-anonymization attacks
Piotr Mardziel or Anupam Datta, CMU, Fall 2018

2 Publicly Released Large Datasets
- Useful for improving recommendation systems and for collaborative research
- Contain personal information
- Mechanisms to protect privacy, e.g., anonymization by removing names
- Yet private information is leaked by attacks on these anonymization mechanisms

3 Non-Interactive Linking
[Diagram: background/auxiliary information from DB1 and a de-identified record from DB2 are fed to an algorithm that links the information.]

4 Roadmap
- Motivation
- Privacy definitions
- Netflix-IMDb attack
- Theoretical analysis
- Empirical verification of assumptions
- Conclusion

5 Sanitization of Databases
Real database (health records, census data) becomes a sanitized database (add noise, delete names, etc.).
Goals: protect privacy while providing useful information (utility).

6 Database Privacy
Releasing sanitized databases:
1. k-anonymity [Samarati 2001; Sweeney 2002]
2. Differential privacy [Dwork et al. 2006] (future lecture)

7 Re-identification by Linking
Linking two sets of data on shared attributes may uniquely identify some individuals: 87% of the US population is uniquely identifiable by 5-digit ZIP code, gender, and date of birth.
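The linking attack above can be sketched in a few lines. This is a minimal illustration, not any specific real attack; the dataset names, field names, and values are all invented.

```python
# A minimal sketch of re-identification by linking, in the spirit of the
# ZIP/gender/DOB example above. All names and values here are invented.

medical = [  # "anonymized": names removed, quasi-identifiers kept
    {"zip": "15213", "gender": "F", "dob": "1970-03-02", "diagnosis": "flu"},
    {"zip": "15217", "gender": "M", "dob": "1982-11-19", "diagnosis": "asthma"},
]
voter_roll = [  # public dataset that still carries names
    {"name": "Alice", "zip": "15213", "gender": "F", "dob": "1970-03-02"},
    {"name": "Bob", "zip": "15217", "gender": "M", "dob": "1982-11-19"},
]

QUASI = ("zip", "gender", "dob")

def link(anon, public, quasi=QUASI):
    """Join the two datasets on shared quasi-identifiers.

    A unique match re-identifies the 'anonymous' record.
    """
    matches = []
    for rec in anon:
        key = tuple(rec[q] for q in quasi)
        hits = [p for p in public if tuple(p[q] for q in quasi) == key]
        if len(hits) == 1:  # unique quasi-identifier value => re-identified
            matches.append((hits[0]["name"], rec["diagnosis"]))
    return matches
```

Here each "anonymized" medical record matches exactly one voter-roll entry, so both diagnoses are linked back to named individuals.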

8 k-Anonymity
- Quasi-identifier: a set of attributes that can be linked with external data to uniquely identify individuals
- Make every record in the table indistinguishable from at least k-1 other records with respect to the quasi-identifiers
- Linking on quasi-identifiers then yields at least k records for each possible value of the quasi-identifier
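The k-anonymity property itself is easy to check mechanically. Below is a small sketch; the generalized table (truncated ZIPs, bucketed ages) is hypothetical.

```python
from collections import Counter

def is_k_anonymous(table, quasi, k):
    """True iff every combination of quasi-identifier values that occurs
    in the table occurs in at least k records."""
    counts = Counter(tuple(row[a] for a in quasi) for row in table)
    return all(c >= k for c in counts.values())

# Generalized table: ZIP truncated, age bucketed (hypothetical values).
table = [
    {"zip": "152**", "age": "20-30", "disease": "flu"},
    {"zip": "152**", "age": "20-30", "disease": "asthma"},
    {"zip": "153**", "age": "30-40", "disease": "flu"},
    {"zip": "153**", "age": "30-40", "disease": "ulcer"},
]
```

This table is 2-anonymous but not 3-anonymous with respect to (zip, age): each quasi-identifier combination appears exactly twice.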

9 k-Anonymity and Beyond
Provides some protection: linking on ZIP, age, and nationality yields at least 4 records.
Limitations: lack of diversity in sensitive attributes, background knowledge, subsequent releases of the same data set.
Refinements: l-diversity, m-invariance, t-closeness, ...

10 Re-identification Attacks in Practice
Examples:
- Netflix-IMDb
- MovieLens attack
- Twitter-Flickr
- Recommendation systems (Amazon, Hunch, ...)
Goal of de-anonymization: to find information about a record in the released dataset.

11 Roadmap
- Motivation
- Privacy definitions
- Netflix-IMDb attack
- Theoretical analysis
- Empirical verification of assumptions
- Conclusion

12 Anonymization Mechanism
Delete name identifiers and add noise.
[Table: each row corresponds to an individual (Bob, Alice, Charlie), each column to an attribute, e.g., a movie (Gladiator, Titanic, Heidi). In the anonymized Netflix DB the names are replaced by unlabeled records r.]

13 De-anonymization Attacks Still Possible
- Isolation attacks: recover an individual's record from the anonymized database, e.g., find a user's record in the anonymized Netflix movie database
- Information amplification attacks: find more information about an individual in the anonymized database, e.g., find a user's ratings for specific movies in the Netflix database

14 Netflix-IMDb Empirical Attack [Narayanan et al. 2008]
[Diagram: the anonymized Netflix DB (records r with ratings for Gladiator, Titanic, Heidi) is matched, via a weighted scoring algorithm, against publicly available noisy IMDb ratings (e.g., Bob rated Titanic 2 and Heidi 1) used as auxiliary information: an isolation attack.]

15 Problem Statement
[Diagram: an anonymized database (records r over Gladiator, Titanic, Heidi) and noisy auxiliary information about one record (Bob rated Titanic 2 and Heidi 1). The attacker uses an algorithm to find the record.]
Attacker's goal: find Bob's record r, or a record similar to Bob's record.
Broader aim: enhance theoretical understanding of why empirical de-anonymization attacks work.

16 Research Goal
Characterize classes of auxiliary information and properties of the database for which re-identification is possible.

17 Roadmap
- Motivation
- Privacy definitions
- Netflix-IMDb attack
- Theoretical analysis
- Empirical verification of assumptions
- Conclusion

18 Netflix-IMDb Empirical Attack [Narayanan et al. 2008]
[Diagram as before: the anonymized Netflix DB, publicly available noisy IMDb ratings used as auxiliary information, and the weighted scoring algorithm.]
Two questions: What does auxiliary information about a record mean? How do you measure the similarity of a database record with Bob's record (a similarity metric)?

19 Definition: Asymmetric Similarity Metric
Individual attribute similarity (intuition: measures how closely two people's ratings match on one movie):
T(y(i), r(i)) = 1 - |y(i) - r(i)| / p(i)
where p(i) is the range of attribute i.
Example, with records y and r over Gladiator (v1), Titanic (v2), Heidi (v3): T(y(v1), r(v1)) = 1 - 5/5 = 0, giving per-movie similarities T = 0 (Gladiator), 0.6 (Titanic), 0 (Heidi).
Similarity metric (intuition: measures how closely two people's ratings match overall):
S(y, r) = ( Σ_{i ∈ supp(y)} T(y(i), r(i)) ) / |supp(y)|
where supp(y) is the set of non-null attributes in y. In the example, S(y, r) = 0.6/2 = 0.3.
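The two definitions above translate directly into code. A minimal sketch, assuming a 0-5 rating range and treating ratings missing from r as 0 (the slides leave the null-handling convention implicit):

```python
def attr_sim(y_i, r_i, p_i=5):
    """Per-attribute similarity T(y(i), r(i)) = 1 - |y(i) - r(i)| / p(i),
    where p(i) is the range of attribute i (5 for a 0-5 star rating)."""
    return 1 - abs(y_i - r_i) / p_i

def similarity(y, r, p=5):
    """S(y, r): average of T over supp(y), the non-null attributes of y.
    Records are dicts mapping movie -> rating; a rating absent from r is
    treated as 0 here (an assumption)."""
    supp = y.keys()
    return sum(attr_sim(y[i], r.get(i, 0), p) for i in supp) / len(supp)
```

For instance, attr_sim(5, 0) reproduces the slide's worked example T = 1 - 5/5 = 0.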

20 Definition: Auxiliary Information
Intuition: aux about y should be a (possibly noisy) subset of record y; it captures information available outside the normal data release process (e.g., IMDb, as opposed to the Netflix release). aux is obtained from y by sampling attributes and perturbing their values.
Bound the level of perturbation in aux by γ ∈ [0, 1]:
(m, γ)-perturbed auxiliary information: ∀ i ∈ supp(aux), T(y(i), aux(i)) ≥ 1 - γ, where |supp(aux)| = m is the number of non-null attributes in aux.
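The (m, γ)-perturbation condition can be checked directly. A sketch under the same assumptions as before (0-5 rating range, records as dicts):

```python
def is_m_gamma_perturbed(aux, y, gamma, p=5):
    """Check the slide's condition: for every i in supp(aux),
    T(y(i), aux(i)) >= 1 - gamma, i.e., each auxiliary rating is within
    gamma * p of the true rating. Here m is simply len(aux)."""
    return all(1 - abs(y[i] - aux[i]) / p >= 1 - gamma for i in aux)
```

With y = {Titanic: 2, Heidi: 1, Gladiator: 5} and aux = {Titanic: 3, Heidi: 1}, the aux is (2, 0.2)-perturbed (the Titanic rating is off by 1, i.e., 1/5 = 0.2) but not (2, 0.1)-perturbed.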

21 Weighted Scoring [Narayanan et al. 2008; Frankowski et al. 2006]
Intuition: the fewer the people who watched a movie, the rarer it is. Use weight as an indicator of rarity:
w(i) = 1 / log(|supp(i)|)
where supp(i) is the set of non-null entries in column i.
Scoring methodology: for every record r_j in the anonymized DB, compute
Score(aux, r_j) = ( Σ_{i ∈ supp(aux)} w(i) · T(aux(i), r_j(i)) ) / |supp(aux)|
with |supp(aux)| = m the number of non-null attributes in aux. Score is a weighted average of how closely two people match on every movie, giving higher weight to rare movies; the record with the highest Score is taken to be the one closest to the target record y.
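The weight and score formulas above, as a sketch. The slide does not fix the log base (natural log is used here), and missing ratings are again treated as 0; both are assumptions.

```python
import math

def weight(n_raters):
    """w(i) = 1 / log(|supp(i)|): the fewer people who rated movie i, the
    higher its weight. Assumes |supp(i)| >= 2 so the log is positive."""
    return 1 / math.log(n_raters)

def score(aux, record, supports, p=5):
    """Score(aux, r) = sum over i in supp(aux) of w(i) * T(aux(i), r(i)),
    divided by |supp(aux)|. `supports[i]` is the number of non-null
    entries in column i; ratings missing from `record` count as 0."""
    total = sum(weight(supports[i]) * (1 - abs(aux[i] - record.get(i, 0)) / p)
                for i in aux)
    return total / len(aux)
```

A record that matches aux exactly on a rare movie outscores one that matches only on a blockbuster, which is exactly the rarity intuition.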

22 Weighted Scoring Algorithm [Narayanan et al. 2008]
One of the records r in the anonymized database D is the target y; which row is it?
1. Compute Score(aux, r_j) = ( Σ_{i ∈ supp(aux)} w(i) · T(aux(i), r_j(i)) ) / |supp(aux)| for every record r_j in D.
2. If the eccentricity measure e(aux, D) = max_{r ∈ D} Score(aux, r) - max2_{r ∈ D} Score(aux, r) (the gap between the highest and second-highest scores) exceeds a threshold, output the record with the maximum Score.
Score(aux, r) is used to predict S(y, r).
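Putting the two steps together gives the isolation procedure. A self-contained sketch (natural log and zero-for-missing conventions are assumptions, as before):

```python
import math

def isolate(aux, db, supports, threshold, p=5):
    """Weighted-scoring isolation sketch: score every record in db and
    output the best match only if the eccentricity (gap between the best
    and second-best score) exceeds the threshold. Returns the index of
    the isolated record, or None if the match is not eccentric enough."""
    def score(r):
        return sum((1 / math.log(supports[i])) *
                   (1 - abs(aux[i] - r.get(i, 0)) / p)
                   for i in aux) / len(aux)

    scored = sorted(((score(r), idx) for idx, r in enumerate(db)),
                    reverse=True)
    best_score, best_idx = scored[0]
    second_score, _ = scored[1]
    if best_score - second_score > threshold:  # eccentricity test
        return best_idx
    return None
```

Returning None when the eccentricity is small is what keeps the algorithm from guessing when several records look equally plausible.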

23 Where Do the Theorems Fit?
Computed: the Score of all records r in D with respect to aux. Desired: a guarantee about the similarity S(y, r). The theorems help bridge the gap.

24 Theorems
- Theorem 1: when do isolation attacks work?
- Theorem 2: why do information amplification attacks work?

25 Theorem 1: When Do Isolation Attacks Work?
Intuition: if the eccentricity is high, the algorithm always finds the record corresponding to the auxiliary information.
If aux is (m, γ)-perturbed and the eccentricity threshold exceeds γM, where γ indicates the perturbation in aux and M is the average of the weights in aux, then Score(aux, r) = Score(aux, y), where r is the record output by the algorithm and y is the target record. If r is the only record with the highest score, then r = y.
(Eccentricity: highest score minus second-highest score.)

26 Isolation Attack: Theorem
A. Datta, D. Sharma and A. Sinha. Provable De-anonymization of Large Datasets with Sparse Dimensions. In Proceedings of the First Conference on Principles of Security and Trust (POST 2012, ETAPS).

27 Theorems
- Theorem 1: when do isolation attacks work?
- Theorem 2: why do information amplification attacks work?

28 Intuition: Why Do Information Amplification Attacks Work?
- If two records agree on rare attributes, then with high probability they agree on other attributes too.
- Use this intuition to find a record r similar to aux on many rare attributes (using aux as a proxy for y).
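Once a record similar to aux has been isolated, the amplification step is just reading off the attributes aux does not cover. A minimal sketch:

```python
def amplify(aux, isolated_record):
    """Once a record similar to aux has been isolated, read off its
    values on attributes that aux does not cover: this is the extra
    information gained by the amplification attack (using aux as a
    proxy for the target record y)."""
    return {i: v for i, v in isolated_record.items() if i not in aux}
```

Whether those read-off values are actually close to the target's true ratings is exactly what Theorem 2 on the following slides quantifies.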

29 Intuition: Why Do Information Amplification Attacks Work?
If a high fraction of the attributes in aux are rare, then any record r that is similar to aux is also similar to y: e.g., for more than 90% of records, similarity to aux above 0.75 implies similarity to y above 0.75.

30 Theorem 2: Why Do Information Amplification Attacks Work?
Define a function f_D(η1, η2, η3) measuring the overall similarity between the target record y and r, which depends on:
- η1: fraction of rare attributes in aux
- η2: lower bound on the similarity between r and aux
- η3: fraction of target records for which the guarantee holds
If a high fraction of the attributes in aux are rare, then any record r similar to aux is similar to y:
S(y, r) ≥ f_D(η1, η2, η3)

31 Theorem 2: Why Do Information Amplification Attacks Work?
Using the function f_D(η1, η2, η3) and the bound S(y, r) ≥ f_D(η1, η2, η3), the theorem gives a guarantee about the similarity of the record output by the algorithm with the target record.

32 Roadmap
- Motivation
- Privacy definitions
- Netflix-IMDb attack
- Theoretical analysis
- Empirical verification of assumptions
- Conclusion

33 Empirical Verification
- Use the 'anonymized' Netflix database with 480,189 users and 17,770 movies.
- Percentage values claimed in our results = the percentage of records not filtered out because of insufficient attributes required to form aux, or insufficient rare or non-rare attributes required to form aux.
A. Datta, D. Sharma and A. Sinha. Provable De-anonymization of Large Datasets with Sparse Dimensions. In Proceedings of the First Conference on Principles of Security and Trust (POST 2012, ETAPS).

34 Do the Assumptions Hold over the Netflix Database?
[Chart: percentage of records for which Theorem 1's assumptions hold, plotted against the perturbation measure γ, for m = 10 and m = 20 attributes in aux; averaged over a sample of records chosen with replacement.]
A. Datta, D. Sharma and A. Sinha. Provable De-anonymization of Large Datasets with Sparse Dimensions. In Proceedings of the First Conference on Principles of Security and Trust (POST 2012, ETAPS).

35 Does the Intuition About f_D Hold for the Netflix Database?
f_D(η1, η2, η3) can be evaluated given D, with η1 the fraction of rare attributes in aux, η2 the lower bound on the similarity between r and aux (the minimum value of S(aux, r)), and η3 the fraction of target records for which the guarantee S(y, r) ≥ f_D(η1, η2, η3) holds.
[Chart: f(η1, η2, 0.9) plotted against η1 and η2.]
For the Netflix DB, f_D(η1, η2, η3) is monotonically increasing in η1 and η2, and tends to 1 as η2 increases to 1. Intuition verified.

36 Roadmap
- Motivation
- Privacy definitions
- Netflix-IMDb attack
- Theoretical analysis
- Empirical verification of assumptions
- Conclusion

37 Conclusion
- Naïve anonymization mechanisms do not work.
- We obtain provable bounds about, and verify empirically, why some de-anonymization attacks work in practice.
- Even perturbed auxiliary information can be used to launch de-anonymization attacks if the database has many rare dimensions and the auxiliary information covers these rare dimensions.

38 Summary
- Anonymity via sanitization: offline sanitization; online sanitization (next lecture)
- Privacy definitions: k-anonymity, l-diversity, ...

39 Summary
- De-anonymization attacks: isolation, amplification
- Measuring attack success without ground truth
- Assumptions: (m, γ)-perturbation
- Measurables: similarity, eccentricity, η1, η2, η3

40 De-anonymization
[Diagram: ground truth Y (record y) is transformed by the sanitization process into the sanitized database R (record r) and, via a modeling assumption, relates to the auxiliary information Aux (record aux).]

41 Isolation Attack
[Diagram as before, with the modeling assumption made precise as (m, γ)-perturbation; the isolation attack maps aux in Aux back to a record r in R.]

42 Amplification Attack
[Diagram: the sanitization process maps ground truth y (actual identity) to a sanitized record r* (assumed identity r); the (m, γ)-perturbation modeling assumption relates y to aux; the amplification attack uses r to predict the values of attributes missing from aux.]
- η1: fraction of rare attributes in aux
- η2: minimal similarity between r and aux
- η3: fraction of records satisfying η1, η2

43 Anonymization Settings and Privacy Definitions
Anonymization settings: offline/non-interactive (release a sanitized dataset) vs. online/interactive (sanitize queries).
Privacy definitions: k-anonymity (minimum anonymity-set size) vs. l-diversity (minimum sensitive-range size).

44 Modeling
Y: ground truth records (NOT KNOWN); R: sanitized records = Obf(Y); Aux: auxiliary records.
Assumptions and experimental measurements:
- (m, γ)-perturbation: ground-truth/aux relationship
- Measurements: eccentricity e (best isolated r vs. second-best r); η1 (fraction of rare attributes in aux); η2 (minimal similarity between r and aux); η3 (fraction of records satisfying η1, η2)
De-anonymization attacks:
- Isolation: given aux in Aux, isolate the r in R closest to it. Link aux to r in R: is aux the same identity as the ground-truth y behind r?
- Amplification: use R to find the values of fields not in aux. Are the predicted values close to the ground-truth y?
Success estimation:
- Theorem 1 (isolation success): (m, γ) and eccentricity imply successful isolation.
- Theorem 2 (amplification success): η1, η2, η3 imply the isolated r is close to the ground-truth y.


More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 29-30, 2015 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 29-30, 2015 1 / 61 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

Context: Theory of variable privacy. The channel coding theorem and the security of binary randomization. Poorvi Vora Hewlett-Packard Co.

Context: Theory of variable privacy. The channel coding theorem and the security of binary randomization. Poorvi Vora Hewlett-Packard Co. Context: Theory of variable privacy The channel coding theorem and the security of binary randomization Poorvi Vora Hewlett-Packard Co. Theory of security allows binary choice for data revelation: Bob

More information

Privacy-preserving logistic regression

Privacy-preserving logistic regression Privacy-preserving logistic regression Kamalika Chaudhuri Information Theory and Applications University of California, San Diego kamalika@soe.ucsd.edu Claire Monteleoni Center for Computational Learning

More information

Lecture 1 Introduction to Differential Privacy: January 28

Lecture 1 Introduction to Differential Privacy: January 28 Introduction to Differential Privacy CSE711: Topics in Differential Privacy Spring 2016 Lecture 1 Introduction to Differential Privacy: January 28 Lecturer: Marco Gaboardi Scribe: Marco Gaboardi This note

More information

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

More information

AN AXIOMATIC VIEW OF STATISTICAL PRIVACY AND UTILITY

AN AXIOMATIC VIEW OF STATISTICAL PRIVACY AND UTILITY AN AXIOMATIC VIEW OF STATISTICAL PRIVACY AND UTILITY DANIEL KIFER AND BING-RONG LIN Abstract. Privacy and utility are words that frequently appear in the literature on statistical privacy. But what do

More information

Enabling Accurate Analysis of Private Network Data

Enabling Accurate Analysis of Private Network Data Enabling Accurate Analysis of Private Network Data Michael Hay Joint work with Gerome Miklau, David Jensen, Chao Li, Don Towsley University of Massachusetts, Amherst Vibhor Rastogi, Dan Suciu University

More information

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning Mingyan Liu (Joint work with Yang Liu) Department of Electrical Engineering and Computer Science University of Michigan,

More information

Pufferfish Privacy Mechanisms for Correlated Data. Shuang Song, Yizhen Wang, Kamalika Chaudhuri University of California, San Diego

Pufferfish Privacy Mechanisms for Correlated Data. Shuang Song, Yizhen Wang, Kamalika Chaudhuri University of California, San Diego Pufferfish Privacy Mechanisms for Correlated Data Shuang Song, Yizhen Wang, Kamalika Chaudhuri University of California, San Diego Sensitive Data with Correlation } Data from social networks, } eg. Spreading

More information

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization Katrina Ligett HUJI & Caltech joint with Seth Neel, Aaron Roth, Bo Waggoner, Steven Wu NIPS 2017

More information

Answering Numeric Queries

Answering Numeric Queries Answering Numeric Queries The Laplace Distribution: Lap(b) is the probability distribution with p.d.f.: p x b) = 1 x exp 2b b i.e. a symmetric exponential distribution Y Lap b, E Y = b Pr Y t b = e t Answering

More information

Nature Methods: doi: /nmeth Supplementary Figure 1

Nature Methods: doi: /nmeth Supplementary Figure 1 Supplementary Figure 1 Schematic comparison of linking attacks and detection of a genome in a mixture of attacks. (a) Each box in the figure represents a dataset in the form of a matrix. Multiple boxes

More information

On the Complexity of the Minimum Independent Set Partition Problem

On the Complexity of the Minimum Independent Set Partition Problem On the Complexity of the Minimum Independent Set Partition Problem T-H. Hubert Chan 1, Charalampos Papamanthou 2, and Zhichao Zhao 1 1 Department of Computer Science the University of Hong Kong {hubert,zczhao}@cs.hku.hk

More information

CSE 598A Algorithmic Challenges in Data Privacy January 28, Lecture 2

CSE 598A Algorithmic Challenges in Data Privacy January 28, Lecture 2 CSE 598A Algorithmic Challenges in Data Privacy January 28, 2010 Lecture 2 Lecturer: Sofya Raskhodnikova & Adam Smith Scribe: Bing-Rong Lin Theme: Using global sensitivity of f. Suppose we have f of interest.

More information

Statistical Privacy For Privacy Preserving Information Sharing

Statistical Privacy For Privacy Preserving Information Sharing Statistical Privacy For Privacy Preserving Information Sharing Johannes Gehrke Cornell University http://www.cs.cornell.edu/johannes Joint work with: Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh

More information

arxiv: v4 [cs.db] 1 Sep 2017

arxiv: v4 [cs.db] 1 Sep 2017 Composing Differential Privacy and Secure Computation: A case study on scaling private record linkage arxiv:70.00535v4 [cs.db] Sep 07 ABSTRACT Xi He Duke University hexi88@cs.duke.edu Cheryl Flynn AT&T

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Recommender Systems Instructor: Yizhou Sun yzsun@cs.ucla.edu May 17, 2017 Methods Learnt: Last Lecture Classification Clustering Vector Data Text Data Recommender System Decision

More information

Differential Privacy: An Economic Method for Choosing Epsilon

Differential Privacy: An Economic Method for Choosing Epsilon Differential Privacy: An Economic Method for Choosing Epsilon Justin Hsu Marco Gaboardi Andreas Haeberlen Sanjeev Khanna Arjun Narayan Benjamin C. Pierce Aaron Roth University of Pennsylvania, USA University

More information

Spectral k-support Norm Regularization

Spectral k-support Norm Regularization Spectral k-support Norm Regularization Andrew McDonald Department of Computer Science, UCL (Joint work with Massimiliano Pontil and Dimitris Stamos) 25 March, 2015 1 / 19 Problem: Matrix Completion Goal:

More information

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria Community Detection fundamental limits & efficient algorithms Laurent Massoulié, Inria Community Detection From graph of node-to-node interactions, identify groups of similar nodes Example: Graph of US

More information

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University 2018 EE448, Big Data Mining, Lecture 10 Recommender Systems Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html Content of This Course Overview of

More information

CS120, Quantum Cryptography, Fall 2016

CS120, Quantum Cryptography, Fall 2016 CS10, Quantum Cryptography, Fall 016 Homework # due: 10:9AM, October 18th, 016 Ground rules: Your homework should be submitted to the marked bins that will be by Annenberg 41. Please format your solutions

More information

Identification, Data Combination and the Risk of Disclosure 1 Tatiana Komarova, 2 Denis Nekipelov, 3 Evgeny Yakovlev. 4

Identification, Data Combination and the Risk of Disclosure 1 Tatiana Komarova, 2 Denis Nekipelov, 3 Evgeny Yakovlev. 4 Identification, Data Combination and the Risk of Disclosure 1 Tatiana Komarova, 2 Denis Nekipelov, 3 Evgeny Yakovlev. 4 This version: September 24, 2012 ABSTRACT Data combination is routinely used by researchers

More information

k-points-of-interest Low-Complexity Privacy-Preserving k-pois Search Scheme by Dividing and Aggregating POI-Table

k-points-of-interest Low-Complexity Privacy-Preserving k-pois Search Scheme by Dividing and Aggregating POI-Table Computer Security Symposium 2014 22-24 October 2014 k-points-of-interest 223-8522 3-14-1 utsunomiya@sasase.ics.keio.ac.jp POIs Points of Interest Lien POI POI POI POI Low-Complexity Privacy-Preserving

More information

Lecture 10. Public Key Cryptography: Encryption + Signatures. Identification

Lecture 10. Public Key Cryptography: Encryption + Signatures. Identification Lecture 10 Public Key Cryptography: Encryption + Signatures 1 Identification Public key cryptography can be also used for IDENTIFICATION Identification is an interactive protocol whereby one party: prover

More information

Differentially Private Password Frequency Lists

Differentially Private Password Frequency Lists Differentially Private Password Frequency Lists Or, How to release statistics from 70 million passwords on purpose Jeremiah Blocki Microsoft Research Email: jblocki@microsoftcom Anupam Datta Carnegie Mellon

More information

Extremal Mechanisms for Local Differential Privacy

Extremal Mechanisms for Local Differential Privacy Extremal Mechanisms for Local Differential Privacy Peter Kairouz Sewoong Oh 2 Pramod Viswanath Department of Electrical & Computer Engineering 2 Department of Industrial & Enterprise Systems Engineering

More information

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015 CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring 2015 3 February 2015 The interactive online mechanism that we discussed in the last two lectures allows efficient

More information

CS246 Final Exam, Winter 2011

CS246 Final Exam, Winter 2011 CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including

More information

Privacy Preserving Neighbourhood-based Collaborative Filtering Recommendation Algorithms

Privacy Preserving Neighbourhood-based Collaborative Filtering Recommendation Algorithms Privacy Preserving Neighbourhood-based Collaborative Filtering Recommendation Algorithms Zhigang Lu School of Computer Science The University of Adelaide A thesis submitted for the degree of Master of

More information

VIDEO Intypedia008en LESSON 8: SECRET SHARING PROTOCOL. AUTHOR: Luis Hernández Encinas. Spanish Scientific Research Council in Madrid, Spain

VIDEO Intypedia008en LESSON 8: SECRET SHARING PROTOCOL. AUTHOR: Luis Hernández Encinas. Spanish Scientific Research Council in Madrid, Spain VIDEO Intypedia008en LESSON 8: SECRET SHARING PROTOCOL AUTHOR: Luis Hernández Encinas Spanish Scientific Research Council in Madrid, Spain Hello and welcome to Intypedia. We have learned many things about

More information

DAN FELDMAN, UNIVERSITY OF HAIFA

DAN FELDMAN, UNIVERSITY OF HAIFA PRIVATE CORE-SETS FOR FACE IDENTIFICATION DAN FELDMAN, UNIVERSITY OF HAIFA A JOINT WORK WITH Rita Osadchy: SOME SLIDES FROM: KOBBI NISSIM [STOC 11] PRIVATE CORESETS [OAKLAND 10] - SECURE COMPUTATION OF

More information

Local Differential Privacy

Local Differential Privacy Local Differential Privacy Peter Kairouz Department of Electrical & Computer Engineering University of Illinois at Urbana-Champaign Joint work with Sewoong Oh (UIUC) and Pramod Viswanath (UIUC) / 33 Wireless

More information

ECash and Anonymous Credentials

ECash and Anonymous Credentials ECash and Anonymous Credentials CS/ECE 598MAN: Applied Cryptography Nikita Borisov November 9, 2009 1 E-cash Chaum s E-cash Offline E-cash 2 Anonymous Credentials e-cash-based Credentials Brands Credentials

More information

Utility-Privacy Tradeoff in Databases: An Information-theoretic Approach

Utility-Privacy Tradeoff in Databases: An Information-theoretic Approach 1 Utility-Privacy Tradeoff in Databases: An Information-theoretic Approach Lalitha Sankar, Member, IEEE, S. Raj Rajagopalan, and H. Vincent Poor, Fellow, IEEE arxiv:1102.3751v4 [cs.it] 21 Jan 2013 Abstract

More information

Interact with Strangers

Interact with Strangers Interact with Strangers RATE: Recommendation-aware Trust Evaluation in Online Social Networks Wenjun Jiang 1, 2, Jie Wu 2, and Guojun Wang 1 1. School of Information Science and Engineering, Central South

More information

Bounded Privacy: Formalising the Trade-Off Between Privacy and Quality of Service

Bounded Privacy: Formalising the Trade-Off Between Privacy and Quality of Service H. Langweg, H. Langweg, M. Meier, M. Meier, B.C. B.C. Witt, Witt, D. Reinhardt D. Reinhardt et al. (Hrsg.): Sicherheit 2018, Lecture Notes in ininformatics (LNI), Gesellschaft für fürinformatik, Bonn 2018

More information