Computational and Statistical Learning Theory


Computational and Statistical Learning Theory
TTIC 31120, Prof. Nati Srebro
Lecture 6: Computational Complexity of Learning; Proper vs. Improper Learning

Efficient PAC Learning

Definition: A family {H_n} of hypothesis classes is efficiently properly PAC-learnable if there exists a learning rule A and a sample size m(n, ε, δ) such that for all n, all ε, δ > 0, and every distribution D with L_D(h) = 0 for some h ∈ H_n:
- with probability at least 1 − δ over S ~ D^m(n,ε,δ), L_D(A(S)) ≤ ε;
- A can be computed in time poly(n, 1/ε, log(1/δ));
- A always outputs a predictor in H_n.

What does it mean to output a predictor?
View 1: A(S) outputs a program (e.g. a Turing machine description) mapping X_n → Y that runs in time poly(n, 1/ε, log(1/δ)).
View 2: A(S, x) (or A(D, ε, δ, x)) directly outputs y = h(x).
View 3: A(S) outputs a description of h. Formally: output w ∈ {0,1}* of length |w| ≤ poly(n, 1/ε, log(1/δ)), where w is a description of a hypothesis h_w ∈ H_n, and there exists a poly-time algorithm B(w, x) = h_w(x).

Learning Using FIND-CONS

For any family of hypothesis classes, consider:
FINDCONS_H: given a sample S, either return h ∈ H_n consistent with the sample (i.e. such that L_S(h) = 0), or declare that none exists.
Decision problem: CONS_H(S) = 1 iff there exists h ∈ H_n such that L_S(h) = 0.

Theorem: If VCdim(H_n) ≤ poly(n) and there is a poly-time algorithm for FINDCONS_H, then {H_n} is efficiently PAC-learnable.
Theorem: If {H_n} is efficiently properly PAC-learnable, then VCdim(H_n) ≤ poly(n) and CONS_H ∈ RP.
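As a concrete instance of a poly-time FINDCONS, here is a minimal sketch of the classic elimination algorithm for conjunctions over Boolean features (one of the classes listed later as having a poly-time CONSISTENT procedure). The representation, a set of literals (i, b) meaning "x[i] == b", is my own choice:

```python
def find_cons_conj(sample):
    """FINDCONS for conjunctions of literals over n Boolean variables.

    sample: list of (x, y) pairs, x a tuple of 0/1 values, y in {0, 1}.
    Returns a consistent conjunction as a set of literals (i, b) meaning
    "x[i] == b", or None if no consistent conjunction exists.
    """
    n = len(sample[0][0])
    # Start with all 2n literals; every positive example eliminates the
    # literals it violates.
    literals = {(i, b) for i in range(n) for b in (0, 1)}
    for x, y in sample:
        if y == 1:
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    # The surviving conjunction is consistent with every positive example
    # by construction; check that it still rejects every negative one.
    for x, y in sample:
        if y == 0 and all(x[i] == b for (i, b) in literals):
            return None
    return literals
```

Each positive example is processed once, so the whole procedure runs in time O(mn), polynomial as the theorem requires.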

3-TERM-DNF

H_n = 3-TERM-DNF_n = { T_1 ∨ T_2 ∨ T_3 | T_i ∈ CONJ_n }
E.g. h(x) = (x[1] ∧ x[7] ∧ x[23] ∧ x[75]) ∨ (x[2] ∧ x[3]) ∨ (x[5] ∧ x[6] ∧ x[7]) (literals may also appear negated)
VCdim(3-TERM-DNF_n) ≤ log|3-TERM-DNF_n| ≤ log 3^{3(n+1)} = O(n)
So 3-TERM-DNF is learnable with m(n, ε, δ) ≤ poly(n, 1/ε, log(1/δ)) samples. But efficiently?

Hardness of 3-TERM-DNF

Claim: CONS_{3-TERM-DNF} is NP-hard.
Proof: reduction from graph 3-colorability.
3COLOR = { undirected graphs G(V, E) | ∃ c: V → {1, 2, 3} s.t. ∀(u, v) ∈ E, c(u) ≠ c(v) }
Map G(V, E) → S_G such that G ∈ 3COLOR ⟺ S_G ∈ CONS_{3-TERM-DNF}:
S_G = { (x^i, 1) | i = 1..|V| } ∪ { (x^{ij}, 0) | (i, j) ∈ E },
where x^i = (1, …, 1, 0, 1, …, 1) with a single 0 in coordinate i, and x^{ij} has 0s exactly in coordinates i and j.
Claim: G ∈ 3COLOR ⟹ S_G ∈ CONS_{3-TERM-DNF}: S_G is satisfied by T_1 ∨ T_2 ∨ T_3 where T_k = ∧_{i : c(i) ≠ k} x_i.
Claim: S_G ∈ CONS_{3-TERM-DNF} ⟹ G ∈ 3COLOR: set c(i) = min k s.t. T_k satisfies x^i. This is a proper coloring: if c(i) = c(j) = k for some edge (i, j), then T_k contains neither x_i nor x_j, so T_k would also satisfy the negative example x^{ij}.
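The forward direction of the reduction is easy to check mechanically. A small sketch (the function names and representation, terms stored as lists of variable indices that must equal 1, are my own):

```python
def sample_from_graph(n_vertices, edges):
    """Build the sample S_G of the reduction: one positive example per
    vertex (all-ones with a single 0 in coordinate i), one negative
    example per edge (0s exactly at the two endpoints)."""
    S = []
    for i in range(n_vertices):
        x = [1] * n_vertices
        x[i] = 0
        S.append((tuple(x), 1))
    for (i, j) in edges:
        x = [1] * n_vertices
        x[i] = 0
        x[j] = 0
        S.append((tuple(x), 0))
    return S

def dnf_from_coloring(coloring):
    """T_k = conjunction of x_i over vertices i NOT colored k; the DNF
    is returned as a list of three terms (index lists)."""
    return [[i for i, c in enumerate(coloring) if c != k] for k in (0, 1, 2)]

def dnf_value(terms, x):
    """Evaluate the 3-term DNF: some term has all its variables set to 1."""
    return int(any(all(x[i] == 1 for i in term) for term in terms))
```

For example, a triangle graph with the coloring (0, 1, 2) yields a 3-term DNF consistent with every example in S_G, matching the first claim of the proof.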

Hardness of Learning 3-TERM-DNF

Conclusion: If NP ≠ RP, then 3-TERM-DNF is not efficiently properly PAC-learnable.
Proof: 3-TERM-DNF efficiently properly PAC-learnable ⟹ CONS_{3-TERM-DNF} ∈ RP ⟹ 3COLOR ∈ RP ⟹ RP = NP.

Hardness of CONSISTENT

CONSISTENT poly-time; efficiently properly learnable:
- Axis-aligned rectangles in n dimensions
- Halfspaces in n dimensions
- Conjunctions on n variables

CONSISTENT NP-hard; not efficiently properly learnable (improperly learnable?):
- 3-term DNFs
- DNF formulas of size poly(n)
- Generic logical formulas of size poly(n)
- Decision trees of size at most poly(n)
- Neural nets with at most poly(n) units
- Functions computable in poly(n) time
- Python programs of size at most poly(n)

Another Look at 3-TERM-DNF

3-TERM-DNF_n = { T_1 ∨ T_2 ∨ T_3 | T_i ∈ CONJ_n }
E.g. h(x) = (x[1] ∧ x[7] ∧ x[23]) ∨ (x[2] ∧ x[3]) ∨ (x[6] ∧ x[7])
Distributing ∨ over ∧, we can also write:
h(x) = (x[1] ∨ x[2] ∨ x[6]) ∧ (x[1] ∨ x[2] ∨ x[7]) ∧ (x[1] ∨ x[3] ∨ x[6]) ∧ (x[1] ∨ x[3] ∨ x[7]) ∧ (x[7] ∨ x[2] ∨ x[6]) ∧ (x[7] ∨ x[2] ∨ x[7]) ∧ (x[7] ∨ x[3] ∨ x[6]) ∧ (x[7] ∨ x[3] ∨ x[7]) ∧ (x[23] ∨ x[2] ∨ x[6]) ∧ (x[23] ∨ x[2] ∨ x[7]) ∧ (x[23] ∨ x[3] ∨ x[6]) ∧ (x[23] ∨ x[3] ∨ x[7])

Claim: 3-TERM-DNF_n ⊆ 3CNF_n, where 3CNF_n = { ∧_{i=1}^r T_i | T_i a disjunction of three literals }.
But 3CNF_n = CONJ over the (2n)^3 disjunctions of three literals, so it can be efficiently learned using FINDCONS_CONJ, treating each 3-disjunction as a single variable.
I.e. map x → ((x[1] ∨ x[2] ∨ x[3]), (x[1] ∨ x[2] ∨ x[4]), …).
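The feature map in the last line can be sketched directly; a toy sketch, where the enumeration of the (2n)^3 ordered literal triples is my own representation:

```python
from itertools import product

def expand_3cnf_features(x):
    """Map x in {0,1}^n to the vector of all (2n)^3 three-literal
    disjunctions: the feature for the triple ((i,a),(j,b),(k,c)) is 1
    iff x[i]==a or x[j]==b or x[k]==c."""
    n = len(x)
    lits = [(i, b) for i in range(n) for b in (0, 1)]  # literal (i,b): "x[i]==b"
    return tuple(int(any(x[i] == b for (i, b) in trio))
                 for trio in product(lits, repeat=3))
```

Applying this map to every example and then running the poly-time FINDCONS for conjunctions on the expanded sample is exactly the improper learner for 3-TERM-DNF described above.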

Efficiently Learning 3-TERM-DNF

Sample complexity of learning 3-TERM-DNF using FINDCONS_3CNF: O(n^3/ε).
Compare with the cardinality bound: O(n/ε). Why the gap?
We relaxed a statistically easy but computationally hard class to a larger class that is statistically harder but computationally easier: 3-TERM-DNF ⊆ 3CNF.

                    proper (CONS_3-TERM-DNF)   via CONS_3CNF
Sample complexity:  O(n/ε)                     O(n^3/ε)
Runtime:            2^{O(n)}/ε                 O(n^6/ε)

Proper vs Improper Learning

3-TERM-DNF is not efficiently properly learnable, but it is efficiently (improperly) learnable.
Are there classes that are not efficiently learnable even improperly?
What about the class of all functions computable in time poly(n)?
Universal class: any H_n whose predictors are poly-time computable satisfies H_n ⊆ TIME(poly(n)).
VCdim(TIME(poly(n))) = O(poly(n)), and CONS is NP-complete:
If P = NP we could actually learn this universal class!
If RP ≠ NP, we can't efficiently properly learn it. But can we improperly learn it?
How do we show that a class cannot be learned, no matter what hypothesis we are allowed to output?

Learning and Crypto

[Diagram: plaintext message s ∈ {0,1}^n → f_K(·) → encrypted message c = f_K(s) → f_K^{-1}(·) → decoded message s = f_K^{-1}(c), with encryption key K ∈ 𝒦.]

How to attack the cryptosystem:
Usually assume you know the form of f (but not K), e.g. you know the target is using a GSM phone, or ssh, or the file format specifies the encryption, or you captured an Enigma machine.
Collect messages for which you know the plaintext, i.e. pairs (s_i, c_i), or have partial or statistical information on the plaintexts s_i, e.g. s_i is an English word, and so Pr[s_i[1] = "t"] = 0.17.
Search for a key K s.t. c_i = f_K(s_i).
Information-theoretically: need m ≈ log|𝒦| / (n − H(s)) message pairs.
Only way to be perfectly secure: the one-time pad, i.e. 𝒦 = {0,1}^{mn} (Claude Shannon, 1916-2001).
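The "search for a key consistent with known plaintext" step can be illustrated with a deliberately toy cipher (a single-byte XOR key; this example is mine, not from the lecture, and the keyspace is tiny precisely so exhaustive search works):

```python
def xor_encrypt(s, K):
    """Toy cipher: one-byte key K XORed into every byte of s."""
    return bytes(b ^ K for b in s)

def recover_key(pairs, keyspace=range(256)):
    """Known-plaintext attack: exhaustively search for a key K that is
    consistent with every observed (plaintext, ciphertext) pair."""
    for K in keyspace:
        if all(xor_encrypt(s, K) == c for (s, c) in pairs):
            return K
    return None
```

With a real cipher the keyspace is exponentially large, so this brute-force search is exactly what should be computationally infeasible; the information-theoretic bound above only says how many pairs pin the key down uniquely.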

Learning and Crypto

Reducing code breaking to machine learning:
Hypothesis class { f_K^{-1} | K ∈ 𝒦 } over the domain X = {encrypted messages c}, with targets the plaintext messages s.
Training set (c_i, s_i), i = 1..m, possibly with noisy s_i.
Learn a decoder c ↦ s.
Proper learning: find a decoder of the form f_K^{-1}.
But really, finding any decoder that decodes most messages is fine: improper learning.
Statistically, m ≈ log|H| = log|𝒦| messages are enough, so breaking had better be computationally hard.
Learning is computationally easy ⟹ breaking the code is easy ⟹ no crypto.
Crypto is possible ⟹ learning is computationally hard.

Discrete Cube Root Cryptosystem

Choose primes p, q s.t. 3 ∤ (p−1)(q−1).
Public (encryption) key: K = p·q ≈ 2^n.
Private (decryption) key: D = 3^{-1} mod (p−1)(q−1).
f_K(s) = s^3 mod K: easy to compute from K and s, in time O(n^2).
f_K^{-1}(c) = c^D mod K: easy to compute from D and c in time O(n^3), but hard to compute from K and c.
Proof that decryption works: 3D = a(p−1)(q−1) + 1 for some integer a, so mod p (and similarly mod q) we have (c^D)^3 = c^{3D} = (c^{p−1})^{a(q−1)} · c = 1^{a(q−1)} · c = c; by the CRT, c^{3D} = c mod K.

Hard in what sense?
Worst case: no poly-time algorithm that works for every c. But maybe it's only hard on crazy c, while we can still decrypt most messages.
We would like: no poly-time algorithm that works for any c. Impossible: always return 3487 and you'll be right on some c.
Enough: no poly-time algorithm that works for a random c. We can make c random by sending (f_K(s ⊕ r), r) for random r.
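A toy sketch of the system with tiny (hopelessly insecure) primes, using Python's built-in modular exponentiation and modular inverse (`pow` with a negative exponent, Python 3.8+):

```python
# Toy discrete cube root cryptosystem; illustrative only, as the primes
# below are far too small to be secure.
p, q = 11, 17                      # 3 does not divide (p-1)(q-1) = 160
K = p * q                          # public key K = pq = 187
D = pow(3, -1, (p - 1) * (q - 1))  # private key D = 3^{-1} mod 160

def encrypt(s):
    return pow(s, 3, K)   # f_K(s) = s^3 mod K

def decrypt(c):
    return pow(c, D, K)   # f_K^{-1}(c) = c^D mod K
```

Since 3D ≡ 1 (mod (p−1)(q−1)), the argument on the slide gives decrypt(encrypt(s)) == s for every message s in [0, K).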

Cryptographic Assumption

1. There exists a poly-time algorithm ENC(K, s) = f_K(s).
2. For every K ∈ 𝒦 there exists D(K) and a poly-time algorithm DEC(D(K), c) = f_K^{-1}(c), i.e. s.t. ∀s ∈ S, DEC(D(K), f_K(s)) = s.
3. There is no poly-time (randomized) algorithm BREAK(K, c) with
   Pr_{s ~ Unif(S), randomness in BREAK}[ BREAK(K, f_K(s)) = s ] ≥ 1/poly(n).
Here 𝒦, S ⊆ {0,1}^n, i.e. the size of the inputs is O(n).
For the discrete cube root: 𝒦 = { K = pq | p, q prime, 3 ∤ (p−1)(q−1), 2^{n/2} ≤ K < 2^n }, and S = {0,1}^{n/2} = { s | 0 ≤ s < 2^{n/2} }.
For a usable public-key cryptosystem we would also need a poly(n) algorithm for generating (K, D(K)); we won't use this.

Hardness of Learning Discrete Cube Root

H_n = { (c, i) ↦ f_K^{-1}(c)[i] | K ∈ 𝒦, i = 1..n/2 }
VCdim(H_n) ≤ log|H_n| ≤ log|{0,1}^n| = n
All predictors in H_n are computable in time O(n^3).
Claim: subject to the discrete cube root assumption, H_n is not efficiently PAC-learnable.
Proof: given a poly-time learning algorithm A, construct BREAK(K, c):
For i = 1..n/2: h_i ← A(D_i, ε = 1/(4n), δ = 1/(4n)), where D_i over (x, y) is: a ~ Unif({0,1}^{n/2}), x = (f_K(a), i), y = a[i].
Return [h_1(c, 1), h_2(c, 2), …, h_{n/2}(c, n/2)].
With probability ≥ 1 − (n/2)·(1/(4n)) > 3/4 (over the randomness in BREAK), learning succeeded for all i; in that case the probability of erring on some bit is ≤ (n/2)·(1/(4n)) < 1/4 (over the choice of c). So overall, w.p. > 1/2, BREAK(K, c) = f_K^{-1}(c).
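The reduction in this proof is just plumbing: call the learner once per bit position, then evaluate every returned predictor on the target ciphertext. A minimal sketch of that wiring; the learner is passed in as an oracle, and in the demo we instantiate it with a hypothetical "perfect" learner that cheats by using the private key D of the toy cube-root system, purely to exercise the reduction (a real BREAK has no access to D and would obtain each h_i from the PAC learner):

```python
def break_cipher(learner, c, nbits):
    """The BREAK(K, c) wiring from the proof: obtain one predictor h_i
    per plaintext bit position i from the learner, evaluate each on the
    target ciphertext c, and reassemble the bits into the plaintext."""
    hypotheses = [learner(i) for i in range(nbits)]  # h_i = A(D_i, eps, delta)
    bits = [h(c) for h in hypotheses]                # h_i evaluated at (c, i)
    return sum(b << i for i, b in enumerate(bits))

# Demo with the toy cube-root system: p=11, q=17, K=187, D = 3^{-1} mod 160.
K, D = 187, 107

def cheating_learner(i):
    # Stand-in for the PAC learner: the returned predictor decrypts with
    # the private key and reads off bit i. For demonstration only.
    return lambda c: (pow(c, D, K) >> i) & 1
```

With the cheating learner, break_cipher recovers the plaintext exactly; the proof's point is that even a learner that is only (1 − ε)-accurate per bit suffices with probability > 1/2.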

Hardness of Learning: Conclusions

We identified a family of functions that is provably not efficiently PAC-learnable:
- despite having polynomial VC-dimension,
- despite containing only easily computable functions,
- subject to an assumption about the security of a cryptosystem.
Since H_n ⊆ TIME(O(n^3)), this implies that poly-time computable functions are not efficiently PAC-learnable.
In fact, if there is any secure public-key cryptosystem, then poly-time computable functions are not efficiently PAC-learnable.
Even stronger: if we can efficiently learn poly-time computable functions, then no cipher is secure against known-plaintext attacks.

Learning and Crypto

[Diagram: plaintext message s ∈ {0,1}^n → f(·) → encrypted message c = f(s) → f^{-1}(·) → decoded message s = f^{-1}(c), for any poly-time encryption function f with a poly-time computable decryption function f^{-1}.]

Reducing breaking an unknown code to learning poly-time functions:
Hypothesis class: poly-time computable h: X → Y over the domain X = {encrypted messages c}, with targets the plaintext messages s.
Training set (c_i, s_i), i = 1..m, possibly with noisy s_i.
Learn a decoder c ↦ s.
Conclusion: if we can efficiently learn poly-time computable functions, we can break any cipher with a known-plaintext attack.
If there exists any secure cipher system, then poly-time functions are not efficiently PAC-learnable.

Hardness of Learning: Conclusions (cont'd)

Moreover, f_K^{-1} can be computed by circuits with O(n^3) gates and depth O(log n), so:
- we can't efficiently PAC-learn O(log n)-depth circuits;
- we can't efficiently PAC-learn O(log n)-depth neural nets.