RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
1 RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova (Google & USC)
Presented by: Pat Pannuto
2 RAPPOR: What is it good for? (Absolutely something!)
1. Google wants to collect user metrics
2. Google doesn't want to be creepy (or subject to subpoenas, etc.)
3. Generic tool to collect pretty much any information of interest:
   - Booleans
   - Ordinals
   - Numeric values
   - Arbitrary strings (!)
3 (Refresher): Randomized Survey Mechanism
Consider a potentially embarrassing question: Did you vote for Donald Trump?
1. Flip a coin.
   - If heads: say Yes.
   - If tails: flip the coin again.
     - If heads: say No.
     - If tails: answer truthfully.
2. Writing $P(\text{reported} \mid \text{true})$: $P(Y \mid Y) = 0.5 + 0.25 = 0.75$; $P(Y \mid N) = 0.5 + 0 = 0.5$; $P(N \mid Y) = 0.25$; $P(N \mid N) = 0.25 + 0.25 = 0.5$
But what if I ask the same question again tomorrow?
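The coin-flip survey above can be simulated to see why aggregate answers stay useful even though each individual answer is deniable. This is a minimal sketch, not code from the talk: the function names and the 30% true rate are made up for illustration. It relies on the fact that, under the mechanism as described, a respondent reports Yes with probability 0.75 if the truth is Yes and 0.5 otherwise, so P(report Yes) = 0.5 + 0.25 * (true Yes rate).

```python
import random


def randomized_answer(truth, rng):
    """One round of the coin-flip survey; returns the reported answer."""
    if rng.random() < 0.5:   # first flip heads: always say Yes
        return True
    if rng.random() < 0.5:   # second flip heads: always say No
        return False
    return truth             # second flip tails: answer truthfully


def estimate_true_yes_rate(reports):
    """Invert P(report Yes) = 0.5 + 0.25 * rate to recover the true rate."""
    observed = sum(reports) / len(reports)
    return (observed - 0.5) / 0.25


if __name__ == "__main__":
    rng = random.Random(0)
    true_rate = 0.30  # hypothetical population value
    population = [rng.random() < true_rate for _ in range(200_000)]
    reports = [randomized_answer(t, rng) for t in population]
    print("estimated true Yes rate:", round(estimate_true_yes_rate(reports), 3))
```

The estimate is noisy for small populations, since the observed rate is divided by 0.25 and its sampling error is amplified fourfold.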
4 Memoization enables privacy tradeoff
The idea: play the randomized response game twice.
- For the actual answer A, generate a permanent randomized response R
- The client saves a permanent mapping of A's -> R's for all time
- For every query, generate a noisy response randomly from R
- Longitudinal attacks reveal R, not A
- Noisy responses mitigate short-term tracking
- Not protected: long-term widespread tracking ("big data"); this is protected by policy, e.g. data retention rules
5 Memoization alone is not sufficient
- Guarantees weaken as the true value changes
- Example: report the number of days old you are, every day
6 The RAPPOR Algorithm
1. Given actual value v, use h hashes to populate a Bloom filter B of size k
2. Permanent Response: for each bit i of the Bloom filter, $B'_i = 1$ with probability $\tfrac{1}{2}f$; $0$ with probability $\tfrac{1}{2}f$; $B_i$ with probability $1 - f$, where f is a parameter that controls the longitudinal privacy guarantee
3. Instantaneous Response: for each bit i of the response S, $P(S_i = 1) = q$ if $B'_i = 1$; $p$ if $B'_i = 0$
Pipeline: v -> B -> B' -> S
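The three steps of the algorithm can be sketched as a small client. This is an illustration under stated assumptions, not Google's implementation: the SHA-256 salting scheme, the `RapporClient` class, and all function names are hypothetical, and the default parameters are simply the ones quoted on a later slide (h = 2, k = 128, f = 0.5, p = 0.5, q = 0.75). Here B is the Bloom filter, B' the memoized permanent response, and S the per-report instantaneous response.

```python
import hashlib
import random


def bloom_bits(value, cohort, h, k):
    """Step 1: set up to h bits of a k-bit Bloom filter B for `value`.

    Salting SHA-256 with the cohort and hash index is an illustrative
    assumption, not the hash family used in the paper.
    """
    bits = [0] * k
    for j in range(h):
        digest = hashlib.sha256(f"{cohort}:{j}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % k] = 1
    return bits


def permanent_response(bits, f, rng):
    """Step 2: B'_i = 1 w.p. f/2, 0 w.p. f/2, B_i w.p. 1 - f."""
    out = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            out.append(1)
        elif u < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_response(prr, p, q, rng):
    """Step 3: P(S_i = 1) = q if B'_i = 1, else p."""
    return [1 if rng.random() < (q if b else p) else 0 for b in prr]


class RapporClient:
    """Hypothetical client: memoizes B' per value, re-randomizes S per report."""

    def __init__(self, cohort, h=2, k=128, f=0.5, p=0.5, q=0.75, seed=None):
        self.cohort, self.h, self.k = cohort, h, k
        self.f, self.p, self.q = f, p, q
        self.rng = random.Random(seed)
        self.memo = {}  # value -> permanent response B'

    def report(self, value):
        if value not in self.memo:
            b = bloom_bits(value, self.cohort, self.h, self.k)
            self.memo[value] = permanent_response(b, self.f, self.rng)
        return instantaneous_response(self.memo[value], self.p, self.q, self.rng)


if __name__ == "__main__":
    client = RapporClient(cohort=3, seed=42)
    first = client.report("example.com")
    second = client.report("example.com")
    # Same memoized B', but the two reports are re-randomized independently.
    print(sum(first), sum(second))
```

The memo dictionary is what makes repeated queries safe: an attacker who averages many reports learns B' at best, never B.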
7 Variations on RAPPOR (aka: when Pat wonders if this isn't what Google is really doing in practice)
- One-Time RAPPOR: skip generation of S, just report B'
- Basic RAPPOR: no Bloom filter (i.e. map responses directly to bits; equivalently h = 1)
- Basic One-Time RAPPOR: combine the above
Key: the one-time variants don't actually memoize (fixes the space problem at the expense of longitudinal privacy)
8 How Private (and proofs!*)? (*in the paper)
Permanent Randomized Response:
$\varepsilon_\infty = 2h \ln\left(\frac{1 - \frac{1}{2}f}{\frac{1}{2}f}\right)$
Small note: there is no k here, i.e. Bloom filters do not provide differential privacy.
Instantaneous Randomized Response. Probabilities of seeing a 1 given B set or not set:
$q^* = P(S_i = 1 \mid b_i = 1) = \tfrac{1}{2}f(p + q) + (1 - f)q$
$p^* = P(S_i = 1 \mid b_i = 0) = \tfrac{1}{2}f(p + q) + (1 - f)p$
$\varepsilon_1 = h \log\left(\frac{q^*(1 - p^*)}{p^*(1 - q^*)}\right)$
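To get a feel for these bounds, the two formulas can be evaluated directly. The sketch below is an illustration with hypothetical function names. It assumes the logarithm in the instantaneous bound is the natural log, which is consistent with the deck's later claim that f = 0, p = 0.5, q = 0.75 gives ln(3) for the basic (h = 1) variant. With f = 0 there is no permanent noise, so the longitudinal bound is treated as infinite.

```python
import math


def epsilon_infinity(f, h):
    """Longitudinal bound: 2h * ln((1 - f/2) / (f/2)); infinite when f = 0."""
    if f == 0:
        return math.inf
    return 2 * h * math.log((1 - f / 2) / (f / 2))


def epsilon_one(f, p, q, h):
    """Single-report bound: h * ln(q*(1 - p*) / (p*(1 - q*)))."""
    q_star = 0.5 * f * (p + q) + (1 - f) * q  # P(S_i = 1 | b_i = 1)
    p_star = 0.5 * f * (p + q) + (1 - f) * p  # P(S_i = 1 | b_i = 0)
    return h * math.log((q_star * (1 - p_star)) / (p_star * (1 - q_star)))


if __name__ == "__main__":
    # Basic One-Time RAPPOR parameters quoted in the deck.
    print(epsilon_one(f=0, p=0.5, q=0.75, h=1), math.log(3))
    # Parameters from the exponential-distribution experiment.
    print(epsilon_infinity(f=0.5, h=2), epsilon_one(f=0.5, p=0.5, q=0.75, h=2))
```

Neither function takes k or m as an argument, which mirrors the point that the Bloom filter size and cohort count do not enter the privacy bounds.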
9 Undoing all that hard work: learning from RAPPOR-collected data
- Mitigate hash collisions via cohorts
- For each cohort, attempt to reconstruct the aggregate real Bloom filters:
  $t_{ij} = \frac{c_{ij} - (p + \frac{1}{2}fq - \frac{1}{2}fp)N_j}{(1 - f)(q - p)}$
  where $c_{ij}$ is the count of times bit i is set in S for cohort j, $N_j$ is the number of reports for cohort j, and $t_{ij}$ is the estimate of times bit i is set in the hidden B for all reporters in cohort j
- Consolidate into a vector Y of all $t_{ij}$, $i \in [1, k]$, $j \in [1, m]$
- Create a design matrix X of size km x M, where M is the number of candidate strings
- Each column of X contains hm 1's: the concatenation of the candidate's Bloom filters for all m cohorts
- Lasso regression for Y ~ X, then least squares, then Bonferroni correction of 0.05/M [or Benjamini-Hochberg]
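The first decoding step, turning observed bit counts for one cohort into estimates of how often each bit was set in the hidden Bloom filters, is simple enough to sketch. The function names below are hypothetical, and the Lasso and least-squares stages are deliberately left out. The estimate subtracts the count expected from pure noise, (p + fq/2 - fp/2) times the number of reports, and rescales by (1 - f)(q - p).

```python
def estimate_true_counts(counts, n_reports, f, p, q):
    """Per-bit estimate t_i for one cohort.

    counts[i] is how many of the cohort's n_reports had bit i set in S.
    """
    noise_floor = (p + 0.5 * f * q - 0.5 * f * p) * n_reports
    scale = (1 - f) * (q - p)
    return [(c - noise_floor) / scale for c in counts]


def bonferroni_threshold(n_candidates, alpha=0.05):
    """Per-candidate significance level alpha / M used after the regression."""
    return alpha / n_candidates


if __name__ == "__main__":
    f, p, q, n = 0.5, 0.5, 0.75, 10_000
    q_star = 0.5 * f * (p + q) + (1 - f) * q  # expected rate if bit always set
    p_star = 0.5 * f * (p + q) + (1 - f) * p  # expected rate if bit never set
    # Idealized counts: bit 0 set for every reporter, bit 1 for none.
    print(estimate_true_counts([q_star * n, p_star * n], n, f, p, q))
    print(bonferroni_threshold(1000))
```

With idealized counts the estimator returns the full report count for a bit everyone set and zero for a bit nobody set. Real counts fluctuate, so individual estimates can be negative or exceed the number of reports.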
10 RAPPOR parameter selection
- Must choose f, p, q, k, h, m
- Recall: k and m do not affect the privacy bounds, e.g. $\varepsilon_\infty = 2h \ln\left(\frac{1 - \frac{1}{2}f}{\frac{1}{2}f}\right)$
11 What can Basic One-Time RAPPOR learn?
- For f = 0, p = 0.5, q = 0.75, and confidence =
- And a uniform distribution of strings (uniform -> SNR problem)
- For ln(3)-differential privacy: roughly $\sqrt{N}/10$ strings for N samples
  - 1% frequency -> 1 million samples
  - 0.01% -> 10 billion
- No theoretical analysis for real RAPPOR / non-uniform samples
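The two sample-size figures on this slide fit a square-root rule of thumb: about sqrt(N)/10 distinct strings are detectable from N samples. Under a uniform distribution, detecting strings at frequency 1/M then needs roughly (10 M)^2 samples. The sketch below only checks that arithmetic. The function names are hypothetical and the rule is a rough estimate, not a guarantee.

```python
import math


def max_detectable_strings(n_samples):
    """Rule of thumb: about sqrt(N) / 10 distinct strings for N samples."""
    return math.sqrt(n_samples) / 10


def samples_needed(frequency):
    """Samples needed so that 1/frequency uniform strings are detectable."""
    n_strings = 1 / frequency
    return (10 * n_strings) ** 2


if __name__ == "__main__":
    print(samples_needed(0.01))          # about 1 million
    print(samples_needed(0.0001))        # about 10 billion
    print(max_detectable_strings(10**6)) # about 100 strings
```

The quadratic growth is the practical takeaway: each 10x drop in the frequency to be detected costs about 100x more reports.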
12 Trade-off: False Discovery Rate vs Rare String Detection
13 Simulating learning a normal distribution q = 0.75, p = 0.5, ε = ln (3), f = 0
14 Exponential distribution of 1 million strings
- Query: is the string present?
- p = 0.5, q = 0.75, f = 0.5, h = 2, k = 128, m = 16
- Caught everything > 1%
- Also two false positives
- The point: the tail is hard
15 Real-world data: Windows Process Names, Chrome Homepages
- 187k reports; 10k machines
- unexpected frequency
- ~2% have BADAPPLE
- how did they search??
16 Final thoughts
- High-level concept is simple and intuitive: 2-level randomized response
- Extracting information requires knowing it is there
- Unclear how well client-side permanent randomized response scales