Computational and Statistical Learning Theory


Computational and Statistical Learning Theory
TTIC 31120, Prof. Nati Srebro
Lecture 6: Computational Complexity of Learning; Proper vs. Improper Learning

Learning Using FIND-CONS

For any family of hypothesis classes, consider:

FINDCONS_H: given a sample S, either return h ∈ H_n consistent with the sample (i.e. such that L_S(h) = 0), or declare that none exists.

Decision problem: CONS_H(S) = 1 iff ∃ h ∈ H_n s.t. L_S(h) = 0.

Theorem: If H_n over X_n = R^n is efficiently properly PAC learnable, then VCdim(H_n) ≤ poly(n) and CONS_H ∈ RP.

3-TERM-DNF_n = { T_1 ∨ T_2 ∨ T_3 | T_i ∈ CONJ_n }
E.g. h(x) = (x[1] ∧ x[7] ∧ x[23]) ∨ (x[75] ∧ x[2]) ∨ (x[3] ∧ x[5] ∧ x[6] ∧ x[7])
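As a concrete instance of a poly-time FINDCONS, here is a minimal sketch for the class CONJ_n of conjunctions of literals (function name and literal encoding are my own): the classic elimination algorithm keeps the most specific conjunction consistent with the positive examples and then checks it against the negatives; if it fails on a negative, no consistent conjunction exists.

```python
def find_cons_conj(sample):
    """FINDCONS for conjunctions over n boolean variables.

    sample: list of (x, y) with x a tuple of 0/1 values and y in {0, 1}.
    Returns a consistent hypothesis as a set of literals (i, b), meaning
    "x[i] must equal b", or None if no consistent conjunction exists.
    """
    n = len(sample[0][0])
    # Start from the most specific conjunction: all 2n literals present.
    literals = {(i, b) for i in range(n) for b in (0, 1)}
    # Each positive example eliminates every literal it violates.
    for x, y in sample:
        if y == 1:
            literals &= {(i, x[i]) for i in range(n)}
    # The surviving conjunction is the most specific consistent with the
    # positives; if it misclassifies a negative, nothing in CONJ_n works.
    def h(x):
        return all(x[i] == b for (i, b) in literals)
    for x, y in sample:
        if y == 0 and h(x):
            return None
    return literals
```

For example, on a sample labeled by x[0] ∧ x[1], the positives shrink the literal set down to {(0,1), (1,1)} and the negatives pass the check.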

Hardness of 3-TERM-DNF

Claim: CONS_{3-TERM-DNF} is NP-hard.

Proof: Reduction from graph 3-colorability.
3COLOR = { undirected graphs G(V, E) | ∃ c: V → {1,2,3} s.t. ∀(u,v) ∈ E, c(u) ≠ c(v) }

Map G(V, E) ↦ S(G) s.t. G ∈ 3COLOR ⟺ S(G) ∈ CONS_{3-TERM-DNF}:
S(G) = { (x^i, y=1) | i = 1..|V| }, where x^i = (1,...,1,0,1,...,1) with the 0 in position i,
  ∪ { (x^{ij}, y=0) | (i,j) ∈ E }, where x^{ij} = (1,...,1,0,1,...,1,0,1,...,1) with 0s in positions i and j.

Claim: G ∈ 3COLOR ⟹ S(G) ∈ CONS_{3-TERM-DNF}: the sample is satisfied by T = T_1 ∨ T_2 ∨ T_3, where T_k = ∧_{i: c(i) ≠ k} x[i].
Claim: S(G) ∈ CONS_{3-TERM-DNF} ⟹ G ∈ 3COLOR: take c(i) = min k s.t. T_k is satisfied by x^i.
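The map G ↦ S(G) in the reduction above can be sketched directly (a hedged illustration; the function name and tuple encoding are mine, not from the lecture):

```python
def sample_from_graph(num_vertices, edges):
    """Build the sample S(G) whose 3-term-DNF consistency is equivalent
    to 3-colorability of G (the Pitt-Valiant style reduction).

    Vertices are 0..num_vertices-1; edges is a list of pairs (i, j).
    Returns a list of (x, y) with x a tuple in {0,1}^num_vertices.
    """
    sample = []
    for i in range(num_vertices):
        # x^i: all ones except a 0 in position i, labeled positive.
        x = tuple(0 if k == i else 1 for k in range(num_vertices))
        sample.append((x, 1))
    for (i, j) in edges:
        # x^{ij}: all ones except 0s in positions i and j, labeled negative.
        x = tuple(0 if k in (i, j) else 1 for k in range(num_vertices))
        sample.append((x, 0))
    return sample
```

For a triangle (which is 3-colorable) the resulting six examples are consistent with the 3-term DNF built from the coloring; for a graph that is not 3-colorable, no 3-term DNF fits.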

Hardness of Learning 3-TERM-DNF

Conclusion: If NP ≠ RP, then 3-TERM-DNF is not efficiently properly PAC learnable.
Proof: 3-TERM-DNF efficiently properly PAC learnable ⟹ CONS_{3-TERM-DNF} ∈ RP ⟹ 3COLOR ∈ RP ⟹ RP = NP.

Theorem: Assuming NP ≠ RP, H_n over X_n = {0,1}^n or X_n = R^n is efficiently properly PAC learnable iff:
- VCdim(H_n) ≤ poly(n)  [required for learnability], and
- there is a poly-time algorithm for CONS_H (polynomial in the size of the input)  [required for efficient proper learnability].

What about improper learning?

Hardness of CONSISTENT

CONSISTENT is poly-time:
- Axis-aligned rectangles in n dimensions
- Halfspaces in n dimensions
- Conjunctions on n variables
What does this imply?

CONSISTENT is NP-hard:
- 3-term DNFs
- DNF formulas of size poly(n)
- Generic logical formulas of size poly(n)
- Python programs of size at most poly(n)
- Decision trees of size at most poly(n)
- Neural nets with at most poly(n) units
- Functions computable in poly(n) time
What does this imply?

Another Look at 3-TERM-DNF

3-TERM-DNF_n = { T_1 ∨ T_2 ∨ T_3 | T_i ∈ CONJ_n }
E.g. h(x) = (x[1] ∧ x[7] ∧ x[23]) ∨ (x[2] ∧ x[3]) ∨ (x[6] ∧ x[7])

Distributing ∨ over ∧, we can also write:
h(x) = (x[1]∨x[2]∨x[6]) ∧ (x[1]∨x[2]∨x[7]) ∧ (x[1]∨x[3]∨x[6]) ∧ (x[1]∨x[3]∨x[7])
     ∧ (x[7]∨x[2]∨x[6]) ∧ (x[7]∨x[2]∨x[7]) ∧ (x[7]∨x[3]∨x[6]) ∧ (x[7]∨x[3]∨x[7])
     ∧ (x[23]∨x[2]∨x[6]) ∧ (x[23]∨x[2]∨x[7]) ∧ (x[23]∨x[3]∨x[6]) ∧ (x[23]∨x[3]∨x[7])

Claim: 3-TERM-DNF_n ⊆ 3CNF_n, where 3CNF_n = { ∧_{i=1}^r T_i | T_i = a disjunction of three literals }.

But 3CNF_n ⊆ CONJ over (2n)^3 "variables": treating each 3-disjunction as a single variable, 3CNF can be efficiently learned using FINDCONS_CONJ.
I.e. map x ↦ (x[1]∨x[2]∨x[3], x[1]∨x[2]∨x[4], ...)
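The variable-substitution trick above can be sketched as follows (function names are mine; I enumerate unordered triples of distinct literals rather than all (2n)^3 ordered ones, which drops only duplicates, and since a CNF is a monotone conjunction of clause-indicators, only un-negated meta-variables are needed):

```python
from itertools import combinations

def expand_3cnf_features(x):
    """Map x in {0,1}^n to the truth values of all disjunctions of three
    literals over x: one 'meta-variable' per 3-disjunction."""
    n = len(x)
    lits = [x[i] for i in range(n)] + [1 - x[i] for i in range(n)]
    return tuple(max(lits[a], lits[b], lits[c])
                 for a, b, c in combinations(range(2 * n), 3))

def learn_3cnf(sample):
    """FINDCONS for 3CNF, run as conjunction elimination over the
    expanded features. Returns a predictor, or None if nothing fits."""
    expanded = [(expand_3cnf_features(x), y) for x, y in sample]
    keep = set(range(len(expanded[0][0])))  # clauses still in the CNF
    # Positive examples eliminate every clause they falsify.
    for z, y in expanded:
        if y == 1:
            keep = {j for j in keep if z[j] == 1}
    # If the surviving CNF accepts a negative example, no 3CNF fits.
    for z, y in expanded:
        if y == 0 and all(z[j] == 1 for j in keep):
            return None
    return lambda x: all(expand_3cnf_features(x)[j] == 1 for j in keep)
```

For example, labeling all of {0,1}^3 by the single clause x[0] ∨ x[1] ∨ ¬x[2] yields a learned predictor agreeing with that clause.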

Efficiently Learning 3-TERM-DNF

Sample complexity of learning 3-TERM-DNF using FINDCONS_3CNF: O(n^3/ε).
Compare with the cardinality bound for 3-TERM-DNF itself: O(n/ε).
Why the gap? We relax a statistically easy but computationally hard class to a larger class that is statistically harder but computationally easier: 3-TERM-DNF ⊆ 3CNF.

                    CONS_{3-TERM-DNF}    CONS_{3CNF}
Sample complexity:  O(n/ε)               O(n^3/ε)
Runtime:            2^{O(n)}/ε           O(n^6/ε)

Hardness of CONSISTENT

CONSISTENT is poly-time ⟹ efficiently properly learnable:
- Axis-aligned rectangles in n dimensions
- Halfspaces in n dimensions
- Conjunctions on n variables

CONSISTENT is NP-hard ⟹ not efficiently properly learnable:
- 3-term DNFs
- DNF formulas of size poly(n)
- Generic logical formulas of size poly(n)
- Decision trees of size at most poly(n)
- Neural nets with at most poly(n) units
- Functions computable in poly(n) time
- Python programs of size at most poly(n)
Improperly learnable?

Proper vs Improper Learning

3-TERM-DNF is not efficiently properly learnable, but it is efficiently (improperly) learnable.
Are there classes that are not efficiently learnable, even improperly?
What about the class of all functions computable in time poly(n)?

Universal class: any H_n whose predictors are poly-time computable satisfies H_n ⊆ TIME(poly(n)):
- VCdim = O(poly(n))
- CONS is NP-complete

If P = NP, we could actually learn this universal class! If RP ≠ NP, we can't efficiently properly learn it. But can we improperly learn it?
How do we show that a class cannot be learned, no matter what hypothesis we are allowed to output?

Learning and Crypto

[plaintext message s ∈ {0,1}^n] → f_K(·) → [encrypted message c = f_K(s)] → f_K^{-1}(·) → [decoded message s = f_K^{-1}(c)], with encryption key K ∈ K.

How to attack the cryptosystem:
- Usually assume you know the form of f (but not K): e.g. you know the target is using a GSM phone, or ssh, or the file format specifies the encryption, or you captured an Enigma machine.
- Collect messages for which you know the plaintext, i.e. pairs (s_i, c_i), or have partial or statistical information on the plaintext s_i: e.g. s_i is an English word, and so Pr[s_i[1] = "t"] = 0.17.
- Search for a key K s.t. c_i = f_K(s_i).

Information theoretically: need m ≈ log|K| / (n − H(s)) message pairs.
Only way to be secure: the One Time Pad, i.e. K = {0,1}^{mn}. (Claude Shannon, 1916-2001)
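The "search for a key K s.t. c_i = f_K(s_i)" step can be illustrated on a toy cipher with a tiny key space (the shift cipher and function names here are my own illustration, not part of the lecture): with even one known plaintext-ciphertext pair, exhaustive search over keys recovers K.

```python
def shift_encrypt(s, k):
    """Toy cipher f_k: shift each lowercase letter of s by k (mod 26)."""
    return "".join(chr((ord(ch) - 97 + k) % 26 + 97) for ch in s)

def known_plaintext_attack(pairs):
    """Brute-force key search: return the unique key k for which
    c == f_k(s) holds on every known pair (s, c), else None."""
    candidates = [k for k in range(26)
                  if all(shift_encrypt(s, k) == c for s, c in pairs)]
    return candidates[0] if len(candidates) == 1 else None
```

Here |K| = 26, so a single pair suffices; the information-theoretic bound above says the same thing, since log|K| is tiny compared to the redundancy of an English word.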

Learning and Crypto

[plaintext message s ∈ {0,1}^n] → f_K(·) → [encrypted message c = f_K(s)] → f_K^{-1}(·) → [decoded message s = f_K^{-1}(c)], with encryption key K ∈ K.

Reducing code breaking to machine learning:
- Hypothesis class { f_K^{-1} | K ∈ K } over the domain X = {encrypted messages c}, with targets = plaintext messages s.
- Training set (c_i, s_i), i = 1..m, possibly with noisy s_i.
- Learn a decoder c ↦ s.

Proper learning: find a decoder of the form f_K^{-1}. But really, finding any decoder that decodes most messages is fine: improper learning.

Statistically, it is enough to have m ≈ log|H| = log|K| messages. So breaking had better be computationally hard:
Learning is computationally easy ⟹ breaking the code is easy ⟹ no crypto.
Crypto is possible ⟹ learning is computationally hard.

Discrete Cube Root Cryptosystem

Choose primes p, q s.t. 3 ∤ (p−1)(q−1).
Public (encryption) key: K = p·q ≤ 2^n.
Private (decryption) key: D = 1/3 mod (p−1)(q−1).

f_K(s) = s^3 mod K: easy to compute from K and s, in time O(n^2).
f_K^{-1}(c) = c^D mod K: easy to compute from D and c, in time O(n^3); hard to compute from K and c alone.
Proof that decryption works: 3D = a(p−1)(q−1) + 1 for some integer a, and so mod K we have (c^D)^3 = c^{3D} = (c^{(p−1)(q−1)})^a · c = 1^a · c = c, by Euler's theorem.

Hard in what sense?
- Worst case ("no poly-time algorithm that works for every c"): too weak; maybe it is only hard on crazy c, and we can still decrypt most messages.
- What we would like ("no poly-time algorithm that works for any c"): impossible; always return 3487 and you'll be right on some c.
- Enough: no poly-time algorithm that works for random c. We can make c random by sending (f_K(s ⊕ r), r) for random r.
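A small numeric sketch of the scheme, using toy primes far too small for any security (function names are mine; the modular inverse uses Python's three-argument pow, available since 3.8):

```python
def cube_root_keys(p, q):
    """Toy discrete-cube-root keys from primes p, q with 3 coprime to
    (p-1)(q-1). Returns (K, D) with D = 3^{-1} mod (p-1)(q-1)."""
    phi = (p - 1) * (q - 1)
    assert phi % 3 != 0, "need 3 not dividing (p-1)(q-1)"
    K = p * q
    D = pow(3, -1, phi)  # modular inverse of 3
    return K, D

def encrypt(K, s):
    return pow(s, 3, K)   # f_K(s) = s^3 mod K: fast, needs only the public key

def decrypt(K, D, c):
    return pow(c, D, K)   # f_K^{-1}(c) = c^D mod K: fast given the trapdoor D
```

With p = 5, q = 11 we get K = 55 and D = 27 (since 3·27 = 81 ≡ 1 mod 40), and decrypt(K, D, encrypt(K, s)) = s for every s < K.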

Cryptographic Assumption

1. There exists a poly-time algorithm ENC(K, s) = f_K(s) = s^3 mod K.
2. For every K ∈ K there exists D(K) and a poly-time algorithm DEC(D(K), c) = f_K^{-1}(c) = c^{D(K)} mod K, i.e. s.t. ∀s ∈ S, DEC(D(K), f_K(s)) = s.
3. There is no poly-time (randomized) algorithm BREAK(K, c) s.t. ∀K ∈ K:
   Pr_{s ~ Unif(S), randomness in BREAK}[ BREAK(K, f_K(s)) = s ] ≥ 1/poly(n),
where K, S ⊆ {0,1}^n, i.e. the size of the inputs is O(n).

For the discrete cube root: K = { K = pq | p, q prime, 3 ∤ (p−1)(q−1), 2^{n−2} ≤ K < 2^n }, and S = {0,1}^{n−2} = { s | 0 ≤ s < 2^{n−2} }.

For a usable public-key cryptosystem one also needs a poly(n) algorithm for generating (K, D(K)); we won't use this. One would also really want Pr_{K,s}[BREAK(K, f_K(s)) = s] ≤ 1/poly(n), i.e. hardness on average over keys, rather than the ∀K formulation above (but the ∀K version is enough for us).

Hardness of Learning the Discrete Cube Root

H_n = { (c, i) ↦ f_K^{-1}(c)[i] | K ∈ K, i = 1..n−2 }
VCdim(H_n) ≤ log|H_n| ≈ log|{0,1}^n| = n
All predictors in H_n are computable in time O(n^3).

Claim: Subject to the discrete cube root assumption, H_n is not efficiently PAC learnable.
Proof: Given a poly-time learning algorithm A, construct BREAK(K, c):
- For i = 1..n−2: h_i ← A(D_i, ε = 1/(4n), δ = 1/(4n)), where D_i is the distribution over (x, y) given by a ~ Unif({0,1}^{n−2}), x = (f_K(a), i), y = a[i].
- Return [h_1(c), h_2(c), ..., h_{n−2}(c)].
With probability ≥ 1 − (n−2)/(4n) > 3/4 (over the randomness in BREAK), learning succeeded for all i, and in that case the probability of erring on some bit is ≤ (n−2)/(4n) < 1/4 (over the choice of c). So overall, with probability > 1/2, BREAK(K, c) = f_K^{-1}(c).

Crypto Hardness of Discrete Cube Root ⟹ Hardness of Learning

We identified a family of functions that is provably not efficiently PAC-learnable:
- despite having polynomial VC-dimension,
- despite containing only easily computable functions,
- subject to an assumption about the security of a cryptosystem.

Since H_n ⊆ TIME(O(n^3)), this implies that poly-time computable functions are not efficiently PAC-learnable.
In fact, if there is any secure public-key cryptosystem, then poly-time computable functions are not efficiently PAC-learnable.
Even stronger: if we can efficiently learn poly-time computable functions, then no cipher is secure against known-plaintext attacks.

Learning and Crypto

[plaintext message s ∈ {0,1}^n] → f(·) → [encrypted message c = f(s)] → f^{-1}(·) → [decoded message s = f^{-1}(c)], for any poly-time encryption function f with a poly-time computable decryption function f^{-1}.

Reducing the breaking of an unknown code to learning poly-time functions:
- Hypothesis class: poly-time computable h: X → Y over the domain X = {encrypted messages c}, with targets = plaintext messages s.
- Training set (c_i, s_i), i = 1..m, possibly with noisy s_i.
- Learn a decoder c ↦ s.

Conclusion: if we can efficiently learn poly-time computable functions, we can break any cipher with a known-plaintext attack. If there exists any secure cipher system, then poly-time functions are not efficiently PAC learnable.

Hardness of Learning via Crypto

Public-key crypto is possible (one-way functions exist) ⟹ hard to learn poly-time functions.

Hardness of the Discrete Cube Root ⟹
- hard to learn log(n)-depth logic circuits
- hard to learn log(n)-depth poly-size neural networks

Hardness of breaking RSA ⟹
- hard to learn poly-length logical formulas
- hard to learn poly-size automata
- hard to learn push-down automata, i.e. regexps
- for some depth d, hard to learn poly-size depth-d threshold circuits (the output of a unit is one iff the number of its input units that are one is greater than a threshold)

Hardness of lattice-shortest-vector based cryptography ⟹
- hard to learn intersections of n^r halfspaces (for any r > 0)