Lecture 7: Fingerprinting. David Woodruff Carnegie Mellon University

Similar documents
Chapter 6 Randomization Algorithm Theory WS 2012/13 Fabian Kuhn

Chapter 7 Randomization Algorithm Theory WS 2017/18 Fabian Kuhn

Lecture 10 Oct. 3, 2017

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then

compare to comparison and pointer based sorting, binary trees

LECTURE 5: APPLICATIONS TO CRYPTOGRAPHY AND COMPUTATIONS

Randomness. What next?

Lecture 31: Miller Rabin Test. Miller Rabin Test

CS483 Design and Analysis of Algorithms

1. Algebra 1.7. Prime numbers

U.C. Berkeley CS174: Randomized Algorithms Lecture Note 12 Professor Luca Trevisan April 29, 2003

25.2 Last Time: Matrix Multiplication in Streaming Model

Cryptography CS 555. Topic 18: RSA Implementation and Security. CS555 Topic 18 1

Introduction to Modern Cryptography. Benny Chor

CPSC 467b: Cryptography and Computer Security

Primality Testing. 1 Introduction. 2 Brief Chronology of Primality Testing. CS265/CME309, Fall Instructor: Gregory Valiant

Lecture 2: Randomized Algorithms

Lecture 22: RSA Encryption. RSA Encryption

CSE525: Randomized Algorithms and Probabilistic Analysis April 2, Lecture 1

1 Recommended Reading 1. 2 Public Key/Private Key Cryptography Overview RSA Algorithm... 2

Instructor: Bobby Kleinberg Lecture Notes, 25 April The Miller-Rabin Randomized Primality Test

Lecture 1: Introduction to Public key cryptography

Cryptography: Joining the RSA Cryptosystem

1 Randomized Computation

Suppose F is a field and a1,..., a6 F. Definition 1. An elliptic curve E over a field F is a curve given by an equation:

Applied Cryptography and Computer Security CSE 664 Spring 2018

ALG 4.1 Randomized Pattern Matching:

basics of security/cryptography

conp = { L L NP } (1) This problem is essentially the same as SAT because a formula is not satisfiable if and only if its negation is a tautology.

The streaming k-mismatch problem

Theory of Computer Science to Msc Students, Spring Lecture 2

15-855: Intensive Intro to Complexity Theory Spring Lecture 16: Nisan s PRG for small space

Algorithms CMSC Homework set #1 due January 14, 2015

cse 311: foundations of computing Fall 2015 Lecture 12: Primes, GCD, applications

CSCI Honor seminar in algorithms Homework 2 Solution

Correctness, Security and Efficiency of RSA

2. Exact String Matching

CSCI3390-Lecture 16: Probabilistic Algorithms: Number Theory and Cryptography

Lecture 16: Communication Complexity

Lecture 11: Key Agreement

Communication Complexity. The dialogues of Alice and Bob...

CPSC 467b: Cryptography and Computer Security

Great Theoretical Ideas in Computer Science

Asymmetric Encryption

Factorization & Primality Testing

Tutorial on Quantum Computing. Vwani P. Roychowdhury. Lecture 1: Introduction

WXML Final Report: AKS Primality Test

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

CS5371 Theory of Computation. Lecture 18: Complexity III (Two Classes: P and NP)

Lecture 5: Hashing. David Woodruff Carnegie Mellon University

CPSC 467b: Cryptography and Computer Security

Lecture 7: ElGamal and Discrete Logarithms

Module 9: Tries and String Matching

Introduction to Cryptography. Lecture 6

Continuing discussion of CRC s, especially looking at two-bit errors

University of Tokyo: Advanced Algorithms Summer Lecture 6 27 May. Let s keep in mind definitions from the previous lecture:

One can use elliptic curves to factor integers, although probably not RSA moduli.

CPSC 467: Cryptography and Computer Security

Lecture 3 Sept. 4, 2014

Cryptography IV: Asymmetric Ciphers

10 Concrete candidates for public key crypto

10 Modular Arithmetic and Cryptography

Distributed Statistical Estimation of Matrix Products with Applications

String Search. 6th September 2018

Public-Key Cryptosystems CHAPTER 4

Lecture Lecture 9 October 1, 2015

Lecture 19: Public-key Cryptography (Diffie-Hellman Key Exchange & ElGamal Encryption) Public-key Cryptography

Theme : Cryptography. Instructor : Prof. C Pandu Rangan. Speaker : Arun Moorthy CS

15-251: Great Theoretical Ideas In Computer Science Recitation 9 : Randomized Algorithms and Communication Complexity Solutions

Ma/CS 6a Class 4: Primality Testing

CIS 551 / TCOM 401 Computer and Network Security

Definition: For a positive integer n, if 0<a<n and gcd(a,n)=1, a is relatively prime to n. Ahmet Burak Can Hacettepe University

Overview. Background / Context. CSC 580 Cryptography and Computer Security. March 21, 2017

Cryptography and Security Final Exam

Chapter 9 Mathematics of Cryptography Part III: Primes and Related Congruence Equations

ENEE 457: Computer Systems Security 10/3/16. Lecture 9 RSA Encryption and Diffie-Helmann Key Exchange

Lecture 11 - Basic Number Theory.

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography

Topics in Cryptography. Lecture 5: Basic Number Theory

Public-Key Encryption: ElGamal, RSA, Rabin

Primes and Factorization

Problem Set 4 Solutions

1 The Low-Degree Testing Assumption

COMS W4995 Introduction to Cryptography September 29, Lecture 8: Number Theory

TECHNISCHE UNIVERSITEIT EINDHOVEN Faculty of Mathematics and Computer Science Exam Cryptology, Tuesday 30 October 2018

APA: Estep, Samuel (2018) "Elliptic Curves" The Kabod 4( 2 (2018)), Article 1. Retrieved from vol4/iss2/1

1 Number Theory Basics

Lecture Notes 20: Zero-Knowledge Proofs

INF 4130 / /8-2017

Public Key Cryptography. All secret key algorithms & hash algorithms do the same thing but public key algorithms look very different from each other.

INF 4130 / /8-2014

Introduction to Number Theory. The study of the integers

NORTHWESTERN UNIVERSITY Thrusday, Oct 6th, 2011 ANSWERS FALL 2011 NU PUTNAM SELECTION TEST

Computing the Entropy of a Stream

Lecture 8: Number theory

Lecture 24: Randomized Complexity, Course Summary

Introduction to Cryptography. Lecture 8

Some mysteries of multiplication, and how to generate random factored integers

CSCI-B609: A Theorist s Toolkit, Fall 2016 Oct 4. Theorem 1. A non-zero, univariate polynomial with degree d has at most d roots.

Lecture Prelude: Agarwal-Biswas Probabilistic Testing

Transcription:

Lecture 7: Fingerprinting David Woodruff Carnegie Mellon University

How to Pick a Random Prime How to pick a random prime in the range {1, 2,, M}? How to pick a random integer X? Pick a uniformly random bit string of length log M 1 If it represents a number M, output X. Else go back to the last step In expectation, repeat this step at most twice How to check if X is prime? Miller Rabin primality test very efficient but fails with tiny probability Agrawal Kayal Saxena has a worse running time, but deterministic How likely is X to be prime?

Density of Primes Let π n be the number primes in the set {1, 2,, n} Prime Number Theorem: lim 1 / Chebyshev: π n n/ ln n for every n 2 If we want at least k primes in {1, 2,, n}, then n 2klgk, if k 4 Dusart: For n > 60184, we have. π n

String Equality Problem Alice Bob Is x = y? x x and y are N bit strings y Alice and Bob want to exchange messages to decide if x = y Alice could send x to Bob but this takes N communication Is there a more efficient scheme?

String Equality Problem Suppose we are OK if we achieve a probabilistic guarantee: If x = y, then Pr[Bob says equal] = 1 If x y, then Pr[Bob says unequal] 1 δ Protocol Alice chooses a random prime p from {1, 2,, M} for M 2 5N lg 5N She sends Bob p and the value h x x mod p, where we think of x as an integer in {0, 1, 2,, 2 1} If h x y mod p, Bob says equal, else he says unequal

String Equality Problem Lemma: If x = y, then Bob always says equal Proof: If x = y, then x mod p = y mod p. So Bob s test will always succeed Lemma: If x y, then Pr[Bob says equal].2 Proof: Interpret x, y 0, 1, 2,, 2 1 If Bob says equal, then x mod p = y mod p, i.e., (x y) = 0 mod p So p divides D = x y, and D < 2 D = p p p for primes p,,p which may repeat Since each p 2,we have k < N Pr[p divides D] why?,,,

Communication Cost If Alice were to naively send x to Bob, would take N bits of communication Instead she sends a prime p and x mod p, where p is in {1, 2,, M} and M 2 5N lg 5N Communication = O(log p) = O(log M) = O(log N + log log N) = O(log N) bits

Reducing the Error Probability We have 20% error probability, how to reduce it to δ? Repeat the scheme r = log δ times independently with primes p,,p {1, 2,, M}, and M 2 5N lg 5N Bob outputs equal if and only if x = y mod p for each i If x = y, Bob outputs equal with probability 1 If x y, Bob outputs equal with probability at most δ Communication cost is O(log(1/δ log N. Can we do better? If instead Alice sets M = 2 snlg sn, the number of primes in {1, 2,, M} is at least sn, and so error probability is 1/s. Set s = 1/δ. Communication is O(log M) = O(log s + log N) = O(log(1/δ + log N)

Fingerprinting (the Karp Rabin Method) In the string matching problem, we have A text T of length m A pattern P of length n Goal: output all occurrences of the pattern P inside the text T If T = abracadabra and P = ab, the output should be {0,7} abracadabra Consider h x x mod p for x 0,1, where we think of x as an integer in {0, 1, 2,, 2 1}

Fingerprinting (the Karp Rabin Method) h x x mod p for x 0,1 Create x by dropping the most significant bit of x, and appending a bit to the right E.g., if x = 0011001, then x could be 0110010 or 0110011 Given h x z, can we compute h x quickly? Suppose x is the lowest order bit of x, and x is the highest order bit of x x 2 x x 2 x Since h a b h a h b mod p, and h 2a 2h a mod p, h x 2h x x h 2 x mod p Given and 2, this is just O(1) arithmetic operations mod p

Fingerprinting (the Karp Rabin Method) T denotes the string from the a th to b th positions of T, inclusive Goal: output all locations a in {0, 1,, m n} such that T P 1. Pick a random prime p {1, 2,, M} with M 2s n lg sn for some s 2. Compute h P and h 2 and store the results 3. Compute h T and check if it equals h P. If so, output match at location 0 4. For each i {0,, m n 1}, compute h T using h T and h 2. If h T h P, output match at location i+1

Error Probability m n 1 mcomparisons, each with probability at most 1/s of failure By a union bound, the probability there is at least one failure is at most m/s If s = 100m, we succeed on all comparisons with probability 99/100 M 2s n lg sn = O(mn log(mn)), so O(log m + log n) bits to store Since p in {1, 2,, M}, p takes O(log m + log n) bits to store Assume unit cost RAM model, so operations on O(log(mn)) bits take O(1) time

Running Time Computing h x for n bit x can be done in O(n) time. Why? So h P,h 2, and h T,, can be computed in O(n) time Computing h T using h T and h 2 done in O(1) time! can be Total time is O(m + n), which is optimal

Fingerprinting Extensions Fingerprinting also works for strings x 0, 1, 2,, q 1 Think of x as an integer q,, x in its q ary representation Drop the leftmost digit of x to create x, and append a digit to the right If x x,x,x,,x, then x x,x,,x,x x q x x q x h x q h x x h q x mod p Given h x and h q, if q < p, computing h x arithmetic operations mod p requires O(1)

Extensions How would you solve the following? Given an m x m bit rectangular binary text T, and an n x n bit pattern P, where n m and n m, find all occurrences of P inside T. Show how to do this in O m m time Assume you can do modular arithmetic of integers at most poly(m m in O(1) time

Extensions Walk through the columns of T, and create fingerprints h T,, of the n values T,,T,,,T, q poly m m n Walk through the rows of T, and for the (i,j) th entry, create a fingerprint of the n values h T,,, h T,,,, h T,, Note: the fingerprints are of q ary instead of binary strings, but when fingerprinting these strings we can use a prime p poly m m n n. Show this! Walking through the columns and rows and creating the fingerprints, and comparing with the hash of the pattern P, takes O m m time