Introduction to Machine Learning

1 Lecture 4: Computational Complexity of Learning & Surrogate Losses
Prof. Nir Ailon

2 Efficient PAC Learning
Until now we were mostly worried about sample complexity: how many examples do we need in order to Probably Approximately Correctly learn from a specific concept class? This is a CS course, not a stats course, so we can't ignore the computational price! How much computational effort is needed for PAC learning?

3 Definition of Efficient PAC Learning
Of course we want polynomial. But polynomial in what?
1st attempt: in the number of training examples $m$. But the number of necessary examples $m$ is itself a function of the complexity of $H$ and of the parameters $\delta, \epsilon$ (which are what we really care about), and if $m$ is large, we can simply throw away the excess.
2nd attempt: in $1/\delta$ and $1/\epsilon$. The algorithm must also output an efficient prediction rule; otherwise a "learner" can defer all of its work to prediction time:
  Learning: procedure one-step-erm($S = ((x_1, y_1), \dots, (x_m, y_m))$): return $h_S$
  Predicting: $h_S(x) = \left(\arg\min_{h' \in H} L_S(h')\right)(x)$
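To see why the definition must also bound the cost of evaluating the output, here is a minimal Python sketch of one-step-erm. It assumes a finite class $H$ given as a list of Python functions and the 0-1 loss; the names `one_step_erm` and `empirical_loss` and the toy threshold class are illustrative, not from the lecture. "Learning" returns instantly, while every single prediction re-runs ERM over all of $H$.

```python
from typing import Callable, List, Sequence, Tuple

# Illustrative toy setting: instances are ints, labels are 0/1.
Hypothesis = Callable[[int], int]
Sample = Sequence[Tuple[int, int]]


def empirical_loss(h: Hypothesis, S: Sample) -> float:
    """0-1 empirical loss L_S(h)."""
    return sum(1 for x, y in S if h(x) != y) / len(S)


def one_step_erm(S: Sample, H: List[Hypothesis]) -> Hypothesis:
    """'Learning' does no work at all: it just closes over S and H."""

    def h_S(x: int) -> int:
        # The full ERM cost, |H| * m loss evaluations, is paid on every prediction.
        best = min(H, key=lambda h: empirical_loss(h, S))
        return best(x)

    return h_S


# Toy usage: threshold functions x >= t for t = 0..4.
H = [lambda x, t=t: int(x >= t) for t in range(5)]
S = [(0, 0), (1, 0), (2, 1), (3, 1)]
h_S = one_step_erm(S, H)
print(h_S(1), h_S(2))  # 0 1
```

The training step is constant-time, so a runtime bound on learning alone says nothing; the cost has simply moved into $h_S$.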

4 Polynomial In What?
We will define the complexity of a learning algorithm $A$ with respect to $\delta$, $\epsilon$, and another parameter $n$ that is related to the "size" of $H$ and $X$; for example, $n$ can be the embedding dimension. Example: if we decide to use $n$ features to describe objects, how will that increase the runtime? The standard way to do this is to define a sequence of pairs $(X_n, H_n)_{n=1}^{\infty}$ and study the asymptotic complexity of learning $(X_n, H_n)$ as $n$ grows. Important to remember: $A$ does not get the distribution as part of its input.

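5 Formal Definition: Efficient PAC Learning [Valiant 1984]
A sequence $(X_n, H_n)_{n=1}^{\infty}$ is efficiently PAC learnable if there exist an algorithm $A(S, \epsilon, \delta)$ and a polynomial $p(n, 1/\epsilon, 1/\delta)$ such that for all $n$, for every distribution $D$ over $X_n \times Y$ such that $\exists h \in H_n: L_D(h) = 0$, and for all $\epsilon, \delta$: $A$ receives as input $S \sim D^m$ with $m \le p(n, 1/\epsilon, 1/\delta)$, runs in time at most $p(n, 1/\epsilon, 1/\delta)$, and outputs a predictor $h: X_n \to Y$ that can be evaluated in time $p(n, 1/\epsilon, 1/\delta)$ and satisfies, with probability at least $1 - \delta$ (over the sample and/or the algorithm's randomization), $L_D(h) \le \epsilon$.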

6 Efficient PAC Learning Using CONSISTENT
Reminder: $\mathrm{CONSISTENT}_H(S = ((x_1, y_1), \dots, (x_m, y_m)))$: if $\exists h \in H$ such that $h(x_i) = y_i$ for all $i$, output such an $h$; otherwise, output "doesn't exist".
If $\mathrm{VCdim}(H) \le D$, we can learn (in the realizable case) using CONSISTENT on $O\left(\frac{D \log(1/\epsilon) + \log(1/\delta)}{\epsilon}\right)$ samples.
Conclusion: if for the sequence $(X_n, H_n)_{n=1}^{\infty}$ we have (1) $\mathrm{VCdim}(H_n) \le \mathrm{poly}(n)$ and (2) CONSISTENT is polytime computable (in the sample size), then the sequence is efficiently PAC learnable.
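For a finite class, CONSISTENT can always be implemented by brute force. The sketch below assumes $H$ is an explicit Python list of hypotheses and uses `None` to stand for "doesn't exist"; the function name `consistent` and the toy threshold class are illustrative. Its running time is at most $|H| \cdot m$ hypothesis evaluations, which is polynomial only when $|H|$ is, so it does not by itself give condition (2) for exponentially large classes.

```python
from typing import Callable, Optional, Sequence, Tuple

Hypothesis = Callable[[int], int]


def consistent(H: Sequence[Hypothesis], S: Sequence[Tuple[int, int]]) -> Optional[Hypothesis]:
    """Return some h in H with h(x_i) == y_i for every (x_i, y_i) in S, else None."""
    for h in H:  # at most |H| candidates ...
        if all(h(x) == y for x, y in S):  # ... each checked on m examples
            return h
    return None  # plays the role of "doesn't exist"


thresholds = [lambda x, t=t: int(x >= t) for t in range(5)]
h = consistent(thresholds, [(0, 0), (3, 1)])
print(h is not None)  # True
# No threshold labels 0 as 1 and 1 as 0, so this sample is not realizable:
print(consistent(thresholds, [(0, 1), (1, 0)]) is None)  # True
```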

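7 Example: $n$ Encodes $|H_n|$
Assume $|H_n| = n$, and for all $n$, $x \in X_n$, $h \in H_n$, $h(x)$ is computable in polynomial time $q(n)$. Then $\mathrm{VCdim}(H_n) \le \log_2 n$, and CONSISTENT (trying every $h \in H_n$ on every example) is computable in time $m \cdot n \cdot q(n)$. Since $m = O(\log(n/\delta)/\epsilon)$ examples suffice for a finite class, the problem is efficiently PAC learnable in $O\left(q(n) \cdot n \cdot \epsilon^{-1} \log(n/\delta)\right)$ time.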
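The arithmetic on this slide can be checked numerically. This sketch assumes the standard realizable-case bound $m \ge \log(|H|/\delta)/\epsilon$ (natural log) for a finite class, and takes a hypothetical per-evaluation cost function `q`; both function names are illustrative.

```python
import math
from typing import Callable


def finite_class_sample_size(h_size: int, eps: float, delta: float) -> int:
    """Realizable case, finite class: m >= log(|H| / delta) / eps examples suffice."""
    return math.ceil(math.log(h_size / delta) / eps)


def brute_force_cost(n: int, eps: float, delta: float, q: Callable[[int], int]) -> int:
    """Brute-force CONSISTENT with |H_n| = n: m examples * n hypotheses * q(n) per evaluation."""
    m = finite_class_sample_size(n, eps, delta)
    return m * n * q(n)


# Example: n = 1000 hypotheses, each evaluated in q(n) = n steps (a hypothetical cost model).
print(finite_class_sample_size(1000, 0.1, 0.01))       # 116
print(brute_force_cost(1000, 0.1, 0.01, lambda n: n))  # 116000000
```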

8 Exponential Size (or Infinite) Classes
Axis-aligned rectangles in $n$ dimensions? Halfspaces in $n$ dimensions? Boolean functions on $n$ variables?

9 Axis-Aligned Rectangles in n Dimensions
$\mathrm{VCdim}(H_n) = O(n)$. CONSISTENT is solvable in time $O(nm)$.
[Figure: example with $n = 2$]
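A common way to get the $O(nm)$ bound, sketched below under the assumption that rectangles are closed and that the empty rectangle is allowed, is to take the tightest box around the positive examples and then check the negatives. Any consistent rectangle must contain every positive example, hence contains this box; so if the box captures a negative example, no consistent rectangle exists. The function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Rectangle = List[Tuple[float, float]]  # one closed interval [low, high] per coordinate


def inside(rect: Rectangle, x: Point) -> bool:
    return all(low <= xi <= high for (low, high), xi in zip(rect, x))


def consistent_rectangle(S: Sequence[Tuple[Point, int]], n: int) -> Optional[Rectangle]:
    # First O(nm) pass: tightest box around the positives.
    # With no positives, low > high in every coordinate, i.e. the empty rectangle.
    low = [math.inf] * n
    high = [-math.inf] * n
    for x, y in S:
        if y == 1:
            for i in range(n):
                low[i] = min(low[i], x[i])
                high[i] = max(high[i], x[i])
    rect = list(zip(low, high))
    # Second O(nm) pass: the box must exclude every negative example.
    if any(y == 0 and inside(rect, x) for x, y in S):
        return None  # "doesn't exist"
    return rect


S = [((1.0, 1.0), 1), ((2.0, 3.0), 1), ((0.0, 0.0), 0), ((5.0, 5.0), 0)]
print(consistent_rectangle(S, 2))                      # [(1.0, 2.0), (1.0, 3.0)]
print(consistent_rectangle(S + [((1.5, 2.0), 0)], 2))  # None
```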

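10 Halfspaces in n Dimensions
Reminder: $h_w(x) = \mathrm{sign}(\langle w, x \rangle)$. $\mathrm{VCdim}(H_n) = O(n)$. CONSISTENT is solvable in time $\mathrm{poly}(n, m)$ (e.g., by linear programming).
[Figure: example with $n = 2$]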
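One standard way to see the $\mathrm{poly}(n, m)$ bound (a sketch, not necessarily the construction used in the lecture, with labels $y_i \in \{-1, +1\}$): in the realizable case, finding a consistent halfspace is a linear-programming feasibility problem. If some $w^*$ satisfies $y_i \langle w^*, x_i \rangle > 0$ for all $i$, then rescaling by $\gamma = \min_i y_i \langle w^*, x_i \rangle$ gives a feasible point $w^*/\gamma$ of

```latex
\text{find } w \in \mathbb{R}^n \quad \text{such that} \quad
y_i \langle w, x_i \rangle \ge 1, \qquad i = 1, \dots, m,
```

and conversely any feasible $w$ classifies every example correctly. The LP has $n$ variables and $m$ constraints, and linear programs are solvable in polynomial time (e.g., by interior-point methods).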

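11 Boolean Conjunctions
$X_n = \{0,1\}^n$, $Y = \{0,1\}$.
$H_n = \{h_{i_1..i_k, j_1..j_r} : i_1, \dots, i_k, j_1, \dots, j_r \in [n]\}$, where $h_{i_1..i_k, j_1..j_r}(x_1, \dots, x_n) = x_{i_1} \wedge \dots \wedge x_{i_k} \wedge \neg x_{j_1} \wedge \dots \wedge \neg x_{j_r}$.
Note: $|H_n| = 3^n + 1$.
Can we solve CONSISTENT efficiently? Yes!
Start with $h = h_{1..n, 1..n}$ (all $2n$ literals).
Scan the samples in any order.
Ignore samples $(x, 0)$ (in the realizable case they are automatically satisfied).
Given a sample $(x, 1)$: fix $h$ by removing the literals that $x$ violates.
What is the running time?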
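The elimination procedure can be sketched in Python as follows, assuming a literal is encoded as a pair `(i, positive)` and a conjunction as a set of literals; these encodings and the function names are illustrative. Each positive example costs $O(n)$ work, so the scan takes $O(nm)$. Because "ignore samples $(x, 0)$" relies on realizability, the sketch adds a final check of the negative examples so that it also reports "doesn't exist" (`None`) on non-realizable samples: the surviving $h$ is the most specific conjunction satisfied by all positives, so if it accepts a negative example, so does every conjunction consistent with the positives.

```python
from typing import Optional, Sequence, Set, Tuple

Literal = Tuple[int, bool]  # (i, True) means x_i, (i, False) means NOT x_i


def satisfies(x: Sequence[int], lit: Literal) -> bool:
    i, positive = lit
    return x[i] == int(positive)


def evaluate(h: Set[Literal], x: Sequence[int]) -> int:
    return int(all(satisfies(x, lit) for lit in h))


def consistent_conjunction(S: Sequence[Tuple[Sequence[int], int]], n: int) -> Optional[Set[Literal]]:
    # Start with h_{1..n,1..n}: all 2n literals (the always-false conjunction).
    h = {(i, p) for i in range(n) for p in (True, False)}
    for x, y in S:
        if y == 1:
            # Remove the literals this positive example violates: O(n).
            h = {lit for lit in h if satisfies(x, lit)}
    # Negatives are automatically fine in the realizable case; checked here in general.
    if any(evaluate(h, x) != y for x, y in S):
        return None
    return h


# Target x_0 AND NOT x_2 on n = 3 variables.
S = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
print(sorted(consistent_conjunction(S, 3)))  # [(0, True), (2, False)]
```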

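12 3-Term DNFs
$X_n = \{0,1\}^n$, $Y = \{0,1\}$.
$H_n$ = 3-term DNFs $= \{h_{A_1, A_2, A_3} : A_1, A_2, A_3 \text{ conjunctions}\}$, where $h_{A_1, A_2, A_3}(x) = A_1(x) \vee A_2(x) \vee A_3(x)$.
Note: $|H_n| \le (3^n + 1)^3$, roughly $3^{3n}$.
Can we still solve CONSISTENT efficiently? Probably not! It's NP-hard.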

13 Exponential Size (or Infinite) Classes
CONSISTENT is poly-time for: axis-aligned rectangles in $n$ dimensions; halfspaces in $n$ dimensions; conjunctions on $n$ variables. What does this imply?
CONSISTENT is NP-hard for: 3-term DNFs; Python programs of size at most $n$; Python programs that run in $\mathrm{poly}(n)$ time; decision trees of size at most $n$; circuits of size at most $n$; even circuits of depth at most $\log n$. What does THIS imply???

14 Implication of CONSISTENT Being Hard
CONSISTENT computes an $h \in H_n$ consistent with the sample. Efficient PAC learnability, however, allows outputting any function as a prediction (as long as it is efficiently computable). Hence:
CONSISTENT is hard $\not\Rightarrow$ learning is hard.
CONSISTENT is hard $\Rightarrow$ proper learning is hard.
Def: A proper learning algorithm is a learning algorithm that must output $h \in H_n$.
We already saw improper learning. When? The halving algorithm!
Is there a problem that is not efficiently properly PAC learnable, but is still efficiently (improperly) PAC learnable?

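15 Improper Learning of 3-Term DNFs
Distributive law: $(a \wedge b) \vee (c \wedge d) = (a \vee c) \wedge (a \vee d) \wedge (b \vee c) \wedge (b \vee d)$.
Likewise, $A_1 \vee A_2 \vee A_3 = \bigwedge_{u \in A_1, v \in A_2, w \in A_3} (u \vee v \vee w)$, where $u, v, w$ range over the literals of each conjunction.
There are at most $(2n)^3$ possibilities for a clause $(u \vee v \vee w)$, so we can view the result as a conjunction over $(2n)^3$ variables, with at most $3^{(2n)^3} + 1$ possible conjunctions.
We can solve it using CONSISTENT for conjunctions in dimension $(2n)^3$, so we can efficiently PAC learn 3-term DNFs (improperly: the output is a conjunction of clauses, not a 3-term DNF).
Pay polynomially in sample complexity, gain exponentially in computational complexity.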
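A minimal sketch of this reduction, encoding a literal as a pair `(i, positive)`: each instance is mapped to $(2n)^3$ Boolean features, one per ordered triple of literals $(u, v, w)$, with value $u \vee v \vee w$. A 3-term DNF then becomes a conjunction of un-negated features, so the elimination learner only needs to track positive features. The function names and the toy target are illustrative; the output is a 3-CNF rather than a 3-term DNF, which is what makes the learner improper.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

Literal = Tuple[int, bool]  # (i, True) means x_i, (i, False) means NOT x_i
Clause = Tuple[Literal, Literal, Literal]  # u OR v OR w


def all_literals(n: int) -> List[Literal]:
    return [(i, p) for i in range(n) for p in (True, False)]


def lit_true(x: Sequence[int], lit: Literal) -> bool:
    i, positive = lit
    return x[i] == int(positive)


def clause_true(x: Sequence[int], c: Clause) -> bool:
    return any(lit_true(x, lit) for lit in c)


def learn_3cnf(S: Sequence[Tuple[Sequence[int], int]], n: int) -> Set[Clause]:
    # Start with all (2n)^3 clause-features, then run elimination:
    # each positive example removes the clauses it falsifies.
    h = set(product(all_literals(n), repeat=3))
    for x, y in S:
        if y == 1:
            h = {c for c in h if clause_true(x, c)}
    return h


def predict(h: Set[Clause], x: Sequence[int]) -> int:
    return int(all(clause_true(x, c) for c in h))


# Toy target 3-term DNF on n = 3 variables: (x0 AND x1) OR (NOT x2) OR (x0 AND x2).
def target(x: Sequence[int]) -> int:
    return int((x[0] and x[1]) or (not x[2]) or (x[0] and x[2]))


cube = list(product((0, 1), repeat=3))
h = learn_3cnf([(x, target(x)) for x in cube], n=3)
print(all(predict(h, x) == target(x) for x in cube))  # True
```

Every clause of the target's CNF expansion survives elimination, so the learned conjunction implies the target, and it accepts every positive example by construction.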

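16 Improper Learning
3-term DNF over $\{0,1\}^n$: sample complexity $O\left(\frac{n + \log(1/\delta)}{\epsilon}\right)$; computational complexity of CONSISTENT: NP-hard.
Conjunctions over $\{0,1\}^{(2n)^3}$: sample complexity $O\left(\frac{n^3 + \log(1/\delta)}{\epsilon}\right)$; computational complexity of CONSISTENT: $O\left(n^3 \cdot \frac{n^3 + \log(1/\delta)}{\epsilon}\right)$.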

17 Improper Learning
[Figure: 3-term DNF over $\{0,1\}^n$ (smaller concept class, hard CONSISTENT) nested inside conjunctions over $\{0,1\}^{(2n)^3}$ (larger concept class, easy CONSISTENT).]

18 So How Do We Prove Hardness of Learning?
Hardness of learning is reminiscent of cryptography. The business of cryptography is preventing an adversary from uncovering a secret (key), even using partial observations.
Much of cryptography is based on the existence of trapdoor one-way functions $f: \{0,1\}^n \to \{0,1\}^n$ such that:
(1) it is easy to compute $f$;
(2) it is hard to compute $f^{-1}$, even only with high probability;
[(3) it is easy to compute $f^{-1}$ given a trapdoor $s_f$.]
Given a family $F_n$ of trapdoor one-way functions $f$, each with a distinct trapdoor $s_f \in \{0,1\}^{\mathrm{poly}(n)}$, define $H_n = \{h = f^{-1} : f \in F_n\}$. By (3), every $h \in H_n$ is efficiently computable. Efficient PAC learning of $H_n$ would violate (2), because thanks to (1) we could simulate a sample $((x_1 = f(y_1), y_1), \dots, (x_m = f(y_m), y_m))$.

19 Breaking a Crypto System Given Efficient PAC Learnability
break-crypto-system($x \in \{0,1\}^n$, $f \in F_n$):
  draw $y_1, \dots, y_m \in \{0,1\}^n$ randomly   // $m = \mathrm{poly}(n)$
  compute $x_1 = f(y_1), \dots, x_m = f(y_m)$   // easy
  send $((x_1, y_1), \dots, (x_m, y_m))$ to the efficient PAC learner, obtain $h$   // efficiently computable
  return $h(x)$
By efficient PAC learnability, this succeeds for most $x$'s with high probability.
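The procedure can be sketched in Python as follows. Everything here is a hypothetical stand-in: `f` is any efficiently computable function on $n$-bit tuples and `learner` is an assumed efficient PAC learner mapping a sample to a predictor. For real trapdoor one-way families no such learner is believed to exist; the toy usage plugs in a cyclic bit shift (a trivially invertible permutation, not one-way) and a memorizing lookup-table "learner" only to exercise the code path.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]


def break_crypto_system(
    x: Bits,
    f: Callable[[Bits], Bits],
    learner: Callable[[Sequence[Tuple[Bits, Bits]]], Callable[[Bits], Bits]],
    n: int,
    m: int,
    rng: random.Random,
) -> Bits:
    # Draw y_1, ..., y_m uniformly from {0,1}^n (m = poly(n)).
    ys: List[Bits] = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]
    # Computing x_i = f(y_i) is easy, by property (1).
    sample = [(f(y), y) for y in ys]
    # The learner returns an efficiently computable h; use it to "invert" f at x.
    h = learner(sample)
    return h(x)


def shift(y: Bits) -> Bits:
    """Toy f: cyclic bit shift (a permutation, NOT one-way)."""
    return y[1:] + y[:1]


def memorize(sample: Sequence[Tuple[Bits, Bits]]) -> Callable[[Bits], Bits]:
    """Toy 'learner' that only works because n is tiny."""
    table = {x: y for x, y in sample}
    return lambda x: table.get(x)


secret = (1, 0, 1)
print(break_crypto_system(shift(secret), shift, memorize, n=3, m=500, rng=random.Random(0)))
```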

20 The Cubic Root Problem
Fix an integer $N$ of $n \approx \log N$ bits such that $N = pq$ for two primes $p, q$. Let $f(x) = x^3 \bmod N$. Under a mild number-theoretic assumption, $f^{-1}$ is well defined.
Given $p, q$, it is easy to compute $f^{-1}$: using a Python program of polynomial size, running in $\mathrm{poly}(n)$ time, or using a circuit of $O(\log n)$ depth.
Without $p, q$, computing $f^{-1}$ is believed to be hard.
$\Rightarrow$ No efficient PAC learning of: short Python programs; efficient Python programs; logarithmic-depth circuits.
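A toy Python sketch of why the trapdoor helps. It assumes the "mild assumption" is that 3 does not divide $(p-1)(q-1)$, which makes $f$ a bijection on $\mathbb{Z}_N$: knowing $p$ and $q$, one computes $d = 3^{-1} \bmod (p-1)(q-1)$, and then $f^{-1}(z) = z^d \bmod N$, in time polynomial in $n$. The primes below are far too small to be secure, the function names are illustrative, and `pow(3, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd
from typing import Tuple


def trapdoor_setup(p: int, q: int) -> Tuple[int, int]:
    """Return (N, d) with d = 3^{-1} mod (p-1)(q-1)."""
    N, phi = p * q, (p - 1) * (q - 1)
    if gcd(3, phi) != 1:
        raise ValueError("need gcd(3, (p-1)(q-1)) = 1 for f to be invertible")
    return N, pow(3, -1, phi)  # modular inverse (Python 3.8+)


def f(x: int, N: int) -> int:
    return pow(x, 3, N)  # easy for anyone who knows N


def f_inverse_with_trapdoor(z: int, N: int, d: int) -> int:
    return pow(z, d, N)  # easy, but only given d (i.e., given p and q)


N, d = trapdoor_setup(11, 17)
print(N, d)  # 187 107
print(all(f_inverse_with_trapdoor(f(x, N), N, d) == x for x in range(N)))  # True
```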

21 The Agnostic Case
If you thought the realizable case was hard... the agnostic case is even harder. ERM for halfspaces is NP-hard, and improperly learning halfspaces is crypto-hard. Everything interesting we know of is hard in the agnostic case (except maybe for intervals). So what to do?

22 What is the Source of Hardness?
Is it the fact that the concept classes are large? No: we saw that with squared losses (over the reals) we can efficiently optimize linear classifiers. Working with discrete-valued losses is what makes a hard problem hard.
[Figure: $\ell_{sqr}(1, h(x))$ and $\ell_{0-1}(1, h(x))$ plotted against $h(x) \in \mathbb{R}$.]
Why is the squared loss easier? (a) It's continuous (b) It's convex (c) All of the above

23 Convexity
Definition (convex set): A set $C$ in a vector space is convex if for all $u, v \in C$ and all $\alpha \in [0, 1]$: $\alpha u + (1 - \alpha) v \in C$.

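24 Convexity
Definition (convex function): A function $f: C \to \mathbb{R}$ on a convex domain $C$ is convex if for all $u, v \in C$ and $\alpha \in [0, 1]$: $f(\alpha u + (1 - \alpha) v) \le \alpha f(u) + (1 - \alpha) f(v)$.
[Figure: over $C$, the chord value $\alpha f(u) + (1 - \alpha) f(v)$ lies above $f(\alpha u + (1 - \alpha) v)$.]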

25 Minimizing Convex Functions
Any local minimum is also a global minimum (see the proof in the book). So we can greedily search for increasingly better solutions in small neighborhoods: when we're stuck, we're done!
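Gradient descent is one concrete instance of this greedy local search. The sketch below minimizes the illustrative convex function $F(w) = (w - 3)^2 + 1$ with a fixed step size of 0.1 (an assumption; in general the step size must be chosen with care). Because $F$ is convex, the point where the search gets stuck is the global minimizer $w = 3$.

```python
from typing import Callable


def gradient_descent(grad: Callable[[float], float], w0: float, step: float, iters: int) -> float:
    w = w0
    for _ in range(iters):
        w -= step * grad(w)  # take a small step in the locally steepest descent direction
    return w


# F(w) = (w - 3)^2 + 1 is convex with F'(w) = 2(w - 3).
w = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=-10.0, step=0.1, iters=200)
print(round(w, 6))  # 3.0
```

Here each step shrinks the distance to 3 by a factor of 0.8, so 200 iterations are far more than enough.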

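26 Surrogate Loss Functions
Instead of minimizing $\ell_{0-1}$, minimize something that is:
convex (for any fixed $y$), which makes it easier to minimize; and
an upper bound of $\ell_{0-1}$, so that if the surrogate is small, so is $\ell_{0-1}$.
For example, with $y \in \{-1, +1\}$:
$\ell_{sqr}(y, h(x)) = (y - h(x))^2$
$\ell_{logistic}(y, h(x)) = \log_2(1 + \exp(-y h(x)))$
$\ell_{hinge}(y, h(x)) = \max\{0, 1 - y h(x)\}$
[Figure: $\ell_{0-1}$ and $\ell_{hinge}$ as functions of $h(x) \in \mathbb{R}$, for $y = 1$ and for $y = -1$.]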
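The upper-bound property can be checked numerically. This sketch assumes labels $y \in \{-1, +1\}$, real-valued predictions, and the convention that the 0-1 loss counts $y \cdot h(x) \le 0$ (including a tie at 0) as a mistake; the grid of prediction values is arbitrary.

```python
import math


def zero_one(y: int, y_hat: float) -> float:
    return 1.0 if y * y_hat <= 0 else 0.0  # a tie at 0 counts as a mistake


def squared(y: int, y_hat: float) -> float:
    return (y - y_hat) ** 2


def logistic(y: int, y_hat: float) -> float:
    # Base-2 log: with the natural log, the value at y * y_hat = 0 would be ln 2 < 1.
    return math.log2(1.0 + math.exp(-y * y_hat))


def hinge(y: int, y_hat: float) -> float:
    return max(0.0, 1.0 - y * y_hat)


grid = [k / 10 for k in range(-50, 51)]  # predictions h(x) in [-5, 5]
ok = all(
    surrogate(y, v) >= zero_one(y, v)
    for y in (-1, 1)
    for v in grid
    for surrogate in (squared, logistic, hinge)
)
print(ok)  # True
```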

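27 Summary: Learning Using Convex Surrogates
If your problem is hard to learn w.r.t. $\ell: Y \times Y \to \mathbb{R}$ (predicting using $h: X \to Y$), define a surrogate $\ell_{sur}: Y \times Y' \to \mathbb{R}$, where $Y'$ is a convex set, $\ell_{sur}(y, y')$ is a convex function of $y'$ for all $y$, and $\ell \le \ell_{sur}$; predict using $h: X \to Y'$.
We can now naturally define
$L_D^{sur}(h) = \mathbb{E}_{(x,y) \sim D}\left[\ell_{sur}(y, h(x))\right] \ge L_D(h)$,
$L_S^{sur}(h) = \frac{1}{m} \sum_{i=1}^{m} \ell_{sur}(y_i, h(x_i)) \ge L_S(h)$.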
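A minimal sketch of the two empirical risks, assuming a one-dimensional linear predictor $h(x) = w \cdot x$ with $w = 1$, labels in $\{-1, +1\}$, and the hinge loss as $\ell_{sur}$; the data and names are illustrative. Since $\ell_{0-1} \le \ell_{hinge}$ pointwise, the average inherits the inequality $L_S(h) \le L_S^{sur}(h)$.

```python
from typing import Callable, Sequence, Tuple


def zero_one(y: int, y_hat: float) -> float:
    return 1.0 if y * y_hat <= 0 else 0.0


def hinge(y: int, y_hat: float) -> float:
    return max(0.0, 1.0 - y * y_hat)


def empirical_risk(
    loss: Callable[[int, float], float],
    h: Callable[[float], float],
    S: Sequence[Tuple[float, int]],
) -> float:
    """(1/m) * sum_i loss(y_i, h(x_i))."""
    return sum(loss(y, h(x)) for x, y in S) / len(S)


def h(x: float) -> float:
    return 1.0 * x  # h_w(x) = w * x with w = 1


S = [(2.0, 1), (-1.0, -1), (0.5, -1), (-3.0, 1)]
print(empirical_risk(zero_one, h, S))  # 0.5
print(empirical_risk(hinge, h, S))     # 1.375
```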

28 Sample Bounds for Probably Approximately Optimizing over Surrogates
Can we approximately, probably minimize $L_D^{sur}$ by minimizing $L_S^{sur}$ over some class of functions $\{h: X \to Y'\}$?
If $Y' = [-M, M]$ (bounded functions) and the surrogate loss takes values in an interval of length $2M$, then the Hoeffding bound can be used (as in the binary case) to say that, for any fixed $h$:
$\Pr_{S \sim D^m}\left[\left|L_D^{sur}(h) - L_S^{sur}(h)\right| > t\right] \le 2 e^{-t^2 m / (2M^2)}$.
A union bound won't work here (the class of functions is typically uncountable). VC-subgraph is a method for extending VC theory to real-valued functions.
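To get a feel for the numbers: solving $2 e^{-t^2 m / (2M^2)} \le \delta$ for $m$ gives $m \ge 2M^2 \ln(2/\delta)/t^2$. The sketch below computes this; it applies to one fixed $h$ only, which is exactly why a uniform argument such as VC-subgraph is still needed for a whole class. The function names and parameter values are illustrative.

```python
import math


def hoeffding_bound(t: float, M: float, m: int) -> float:
    """2 * exp(-t^2 m / (2 M^2)): bound on Pr[|L_D^sur(h) - L_S^sur(h)| > t] for one fixed h."""
    return 2.0 * math.exp(-t * t * m / (2.0 * M * M))


def samples_needed(t: float, M: float, delta: float) -> int:
    """m obtained by solving hoeffding_bound(t, M, m) <= delta."""
    return math.ceil(2.0 * M * M * math.log(2.0 / delta) / (t * t))


m = samples_needed(t=0.1, M=1.0, delta=0.05)
print(m)                                     # 738
print(hoeffding_bound(0.1, 1.0, m) <= 0.05)  # True
```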


More information

On Basing Lower-Bounds for Learning on Worst-Case Assumptions

On Basing Lower-Bounds for Learning on Worst-Case Assumptions On Basing Lower-Bounds for Learning on Worst-Case Assumptions Benny Applebaum Boaz Barak David Xiao Abstract We consider the question of whether P NP implies that there exists some concept class that is

More information

Great Theoretical Ideas in Computer Science. Lecture 9: Introduction to Computational Complexity

Great Theoretical Ideas in Computer Science. Lecture 9: Introduction to Computational Complexity 15-251 Great Theoretical Ideas in Computer Science Lecture 9: Introduction to Computational Complexity February 14th, 2017 Poll What is the running time of this algorithm? Choose the tightest bound. def

More information

Complexity, P and NP

Complexity, P and NP Complexity, P and NP EECS 477 Lecture 21, 11/26/2002 Last week Lower bound arguments Information theoretic (12.2) Decision trees (sorting) Adversary arguments (12.3) Maximum of an array Graph connectivity

More information

Computational and Statistical Learning theory

Computational and Statistical Learning theory Computational and Statistical Learning theory Problem set 2 Due: January 31st Email solutions to : karthik at ttic dot edu Notation : Input space : X Label space : Y = {±1} Sample : (x 1, y 1,..., (x n,

More information

Lecture 4 : Quest for Structure in Counting Problems

Lecture 4 : Quest for Structure in Counting Problems CS6840: Advanced Complexity Theory Jan 10, 2012 Lecture 4 : Quest for Structure in Counting Problems Lecturer: Jayalal Sarma M.N. Scribe: Dinesh K. Theme: Between P and PSPACE. Lecture Plan:Counting problems

More information

Lecture 25: Cook s Theorem (1997) Steven Skiena. skiena

Lecture 25: Cook s Theorem (1997) Steven Skiena.   skiena Lecture 25: Cook s Theorem (1997) Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Prove that Hamiltonian Path is NP

More information

CSC 2429 Approaches to the P vs. NP Question and Related Complexity Questions Lecture 2: Switching Lemma, AC 0 Circuit Lower Bounds

CSC 2429 Approaches to the P vs. NP Question and Related Complexity Questions Lecture 2: Switching Lemma, AC 0 Circuit Lower Bounds CSC 2429 Approaches to the P vs. NP Question and Related Complexity Questions Lecture 2: Switching Lemma, AC 0 Circuit Lower Bounds Lecturer: Toniann Pitassi Scribe: Robert Robere Winter 2014 1 Switching

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Lecture 10: Learning DNF, AC 0, Juntas. 1 Learning DNF in Almost Polynomial Time

Lecture 10: Learning DNF, AC 0, Juntas. 1 Learning DNF in Almost Polynomial Time Analysis of Boolean Functions (CMU 8-859S, Spring 2007) Lecture 0: Learning DNF, AC 0, Juntas Feb 5, 2007 Lecturer: Ryan O Donnell Scribe: Elaine Shi Learning DNF in Almost Polynomial Time From previous

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Vapnik Chervonenkis Theory Barnabás Póczos Empirical Risk and True Risk 2 Empirical Risk Shorthand: True risk of f (deterministic): Bayes risk: Let us use the empirical

More information

Intro to Theory of Computation

Intro to Theory of Computation Intro to Theory of Computation LECTURE 25 Last time Class NP Today Polynomial-time reductions Adam Smith; Sofya Raskhodnikova 4/18/2016 L25.1 The classes P and NP P is the class of languages decidable

More information

6.045: Automata, Computability, and Complexity (GITCS) Class 17 Nancy Lynch

6.045: Automata, Computability, and Complexity (GITCS) Class 17 Nancy Lynch 6.045: Automata, Computability, and Complexity (GITCS) Class 17 Nancy Lynch Today Probabilistic Turing Machines and Probabilistic Time Complexity Classes Now add a new capability to standard TMs: random

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: NP-Completeness I Date: 11/13/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: NP-Completeness I Date: 11/13/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: NP-Completeness I Date: 11/13/18 20.1 Introduction Definition 20.1.1 We say that an algorithm runs in polynomial time if its running

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem

More information

Generating Function Notes , Fall 2005, Prof. Peter Shor

Generating Function Notes , Fall 2005, Prof. Peter Shor Counting Change Generating Function Notes 80, Fall 00, Prof Peter Shor In this lecture, I m going to talk about generating functions We ve already seen an example of generating functions Recall when we

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 12: Weak Learnability and the l 1 margin Converse to Scale-Sensitive Learning Stability Convex-Lipschitz-Bounded Problems

More information

CPSC 467b: Cryptography and Computer Security

CPSC 467b: Cryptography and Computer Security CPSC 467b: Cryptography and Computer Security Michael J. Fischer Lecture 10 February 19, 2013 CPSC 467b, Lecture 10 1/45 Primality Tests Strong primality tests Weak tests of compositeness Reformulation

More information

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science

More information

Boolean circuits. Lecture Definitions

Boolean circuits. Lecture Definitions Lecture 20 Boolean circuits In this lecture we will discuss the Boolean circuit model of computation and its connection to the Turing machine model. Although the Boolean circuit model is fundamentally

More information

Introduction to Computational Learning Theory

Introduction to Computational Learning Theory Introduction to Computational Learning Theory The classification problem Consistent Hypothesis Model Probably Approximately Correct (PAC) Learning c Hung Q. Ngo (SUNY at Buffalo) CSE 694 A Fun Course 1

More information

Lecture 15: A Brief Look at PCP

Lecture 15: A Brief Look at PCP IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 15: A Brief Look at PCP David Mix Barrington and Alexis Maciel August 4, 2000 1. Overview

More information

1 Cryptographic hash functions

1 Cryptographic hash functions CSCI 5440: Cryptography Lecture 6 The Chinese University of Hong Kong 24 October 2012 1 Cryptographic hash functions Last time we saw a construction of message authentication codes (MACs) for fixed-length

More information

1 Circuit Complexity. CS 6743 Lecture 15 1 Fall Definitions

1 Circuit Complexity. CS 6743 Lecture 15 1 Fall Definitions CS 6743 Lecture 15 1 Fall 2007 1 Circuit Complexity 1.1 Definitions A Boolean circuit C on n inputs x 1,..., x n is a directed acyclic graph (DAG) with n nodes of in-degree 0 (the inputs x 1,..., x n ),

More information

CSCI3390-Lecture 14: The class NP

CSCI3390-Lecture 14: The class NP CSCI3390-Lecture 14: The class NP 1 Problems and Witnesses All of the decision problems described below have the form: Is there a solution to X? where X is the given problem instance. If the instance is

More information

Introduction to Computational Complexity

Introduction to Computational Complexity Introduction to Computational Complexity Tandy Warnow October 30, 2018 CS 173, Introduction to Computational Complexity Tandy Warnow Overview Topics: Solving problems using oracles Proving the answer to

More information

Notes for Lecture 21

Notes for Lecture 21 U.C. Berkeley CS170: Intro to CS Theory Handout N21 Professor Luca Trevisan November 20, 2001 Notes for Lecture 21 1 Tractable and Intractable Problems So far, almost all of the problems that we have studied

More information

Empirical Risk Minimization

Empirical Risk Minimization Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space

More information

Name (NetID): (1 Point)

Name (NetID): (1 Point) CS446: Machine Learning Fall 2016 October 25 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains four

More information