ICML '97 and AAAI '97 Tutorials

Size: px
Start display at page:

Download "ICML '97 and AAAI '97 Tutorials"

Transcription

1 A Short Course in Computational Learning Theory: ICML '97 and AAAI '97 Tutorials Michael Kearns AT&T Laboratories

2 Outline Sample Complexity/Learning Curves: nite classes, Occam's VC dimension Razor, Best Experts/Multiplicative Update Algorithms Statistical Query Learning and Noisy Data Boosting Computational Intractability Results Fourier Methods and Membership Queries

3 The Basic Model Target function f : X! Y (Y = f0; 1g or f+1;,1g), may from class F come Input distribution/density P over X, may be known or arbitrary Class of hypothesis functions H from X to Y Random sample S of m pairs hx i ;f(x i )i Generalization Error (h) =Pr P [h(x) 6= f(x)]

4 Measures of Eciency Parameters ; 2 [0; 1]: ask for (h) with probability at 1, least Input dimension n Target function \complexity" s(f) Sample and computational complexity: scale nicely with 1=;n;s(f) 1=;

5 Variations Ad Innitum Target f from known class F, distribution P arbitrary: model PAC/Valiant Fixed, known P : Distribution-specic PAC model Target f arbitrary: Agnostic model Target f from known class F, add black-box access to f: model with membership queries PAC

6 Sample Complexity/Learning Curves Algorithm-specic vs. general Assume f 2 H How many examples does an arbitrary consistent algorithm to achieve (h)? require

7 The Case of Finite H Fix unknown f 2 H, distribution P Fix \bad" h 0 2 H ((h 0 ) ) Probability h 0 survives m examples (1, ) m (Independence) Probability some bad hypothesis survives jhj(1, ) m (Union Bound) Solve jhj(1, ) m, m = ((1=) log(jhj=)) suces Example: jhj =2 n, m = ((n =) log(1=)) suces Independent of distribution P

8 Occam's Razor Assume f 2 H 0, but given m examples h is chosen from Hm, H0 H 1 Hm Same argument establishes that failure probability is bounded jhmj(1, ) m by Example: jhmj =2 n m, m = ((n m =) log(1=)) suces; m = (((n =) log(1=)) 1=(1,) ) or Compression ( <1) implies Learning

9 An Example: Covering Methods Target f is a conjunction of boolean attributes chosen from x 1 ;:::;xn Eliminate any x i such that x i =0in a positive example Any surviving x i \covers" or explains all negative examples x i =0 with Greedy approach: always is an x i covering (1, 1=k opt ) of so cover all negatives in O(k opt log(m)) remainder,

10 Innite H and the VC Dimension Still true that probability that bad h 2 H survives is (1,) m, now jhj is innite but H shatters x 1 ;:::;x d if all 2 d labelings are realized by H VC dimension d H : size of the largest shattered set

11 Examples of the VC Dimension Finite class H: need 2 d functions to shatter d points, so H log(jhj) d Axis-parallel rectangles in the plane: d H =4 Convex d-gons in the plane: d H =2d +1 Hyperplanes in n dimensions: d H = n +1 d H usually \nicely" related to number of parameters, number operations of

12 The Dichotomy Counting Function For any set of points S, dene H (S) =fh T S : h 2 Hg and = max jsjm fj H (S)jg H(m) H (m) counts maximum number of labelings realized by H m points, so H (m) 2 m always on Important Lemma: for m d H, H (m) m d H Proof is by double induction on m and d H

13 Two Clever Tricks Idea: in expression jhj(1, ) m, try to replace jhj by H (2m) Two-Sample Trick: probability some bad h 2 H survives m probability some h 2 H makes NO mistakes on examples rst m examples and m=2 mistakes on second m examples Incremental Trick: draw 2m sample S = Randomization S S2 rst, randomly split into S 1 and S 2 later 1 S Fix one of the H (2m) labelings of S which makes at least mistakes; probability (wrt split) all mistakes end up in m=2 S 2 is exponentially small in m Failure probability H (2m)2,m=2, sucient sample size is = ((d H =) log(1=)+(1=) log(1=)) m

14 Extensions and Renements Not all h 2 H with (h) have (h) = ; compute error shells (Energy vs. Entropy, Sta- distribution-specic tistical Mechanics) Replace H (m) with distribution-specic expected number dichotomies of \Uniform Convergence Happens": unrealizable f, squared log-loss, ::: error,

15 Best Experts, Multiplicative Updates, Weighted Majority::: Assume nothing about the data Input sequence x 1 ;:::;xm arbitrary Label sequence y 1 ;:::;ym arbitrary Given x m+1, want to predict y m+1 What could we hope to say?

16 A Modest Proposal Only compare performance to a xed collection of \expert h 1 ;:::;h N advisors" Expert h i predicts y i j on x j Goal: for any data sequence, match the number of mistakes by the best advisor on that sequence made Idea: to punish us, force adversary to punish all the advisors As in sample complexity analysis for probabilistic assumptions, expect performance to degrade for large N

17 A Simple Algorithm Maintain weight w i for advisor h i, w i =1=N initially Predict using weighted majority at each trial If we err, and h i erred, w i w i =2

18 A Simple Analysis If W = P N i=1 w i, then when we err W 0 (W=2) + (1=2)(W=2)=(3=4)W After k errors, W 1 (3=4) k If expert i makes ` errors, then w i (1=N)(1=2)` Total weight w i gives (3=4) k (1=N)(1=2)` k (1= log(4=3))(` + log(n)) 2:41(` + log(n)) Sparse vs. distributed representations

19 Statistical Query Learning Model algorithms that use a random sample only to compute statistics Replace source of random examples of target f drawn from P by an oracle for estimating probabilities distribution Learning algorithm submits a query : X Y!f0; 1g Example: (x; y) =1if and only if x 5 = y Response is an estimate of P =Pr P [(x; f(x)) = 1] Demand that complexity of queries and accuracy of estimates permit ecient simulation from examples Captures almost all natural algorithms

20 Noise Tolerance of SQ Algorithms Let source of examples return hx; yi, where y = f(x) with 1, and y = :f(x) with probability probability Dene X 1 = fx : (x; 0) 6= (x; 1)g, X 2 = X, X 1, and 1 =Pr P [x 2 X 1 ] p P = (p 1 =(1, 2)) Pr P; [(x; y) = 1jx 2 X 1 ] + P; [((x; y) =1)^ (x 2 X 2 )] Pr Can eciently estimate P from source of noisy examples Only known method for noise tolerance; other P decompositions give tolerance to other forms of corruption

21 Limitations of SQ Algorithms How many queries are required to learn? SQ dimension d F;P : largest number of pairwise (almost) functions in F with respect to P uncorrelated 1=3 d F;P queries required for nontrivial generalization (Easy case: queries are functions in F ) Also lower bounded by VC dimension of F Application: no SQ algorithm for small decision trees, uniform distribution (including C4.5/CART)

22 Boosting Replace source of random examples with an oracle accepting distributions On input P i, oracle returns a function h i 2 H such that P [h i (x) 6= h(x)] 1=2,, 2 (0; 1=2] (weak learning) Pr Each successive P i is a ltering of true input distribution P, can simulate from random examples so Oracle models a mediocre but nontrivial heuristic Goal: combine the h i into h such that Pr P [h(x) 6= f(x)]

23 How is it Possible? Intuition: each successive P i should force oracle to learn \new" something Example: P 1 = P ; P 2 balances h 1 (x) =f(x) and h 1 6= f(x); 3 restricted to h 1 (x) 6= h 2 (x) P h(x) is majority vote of h 1 ;h 2 ;h 3 Claim: if each h i satises Pr Pi [h i (x) 6= f(x)], then P [h(x) 6= f(x)] 3 2, 2 3 Pr Represent error algebraically, solve constrained maximization problem Now recurse

24 Measuring Performance How many rounds (ltered distributions) required to go from, to? 1=2 Recursive scheme: polynomial in log(1=) and 1=, hypothesis is ternary majority tree of weak hypotheses Adaboost: (1= 2 ) log(1=), hypothesis is linear threshold of weak hypotheses function Top-down decision tree algorithms: (1=) 1=2, hypothesis is decision tree over weak hypotheses a Advantage is actually problem- and algorithm-dependent

25 Computational Intractability Results for Learning Representation-dependent: hard to learn the functions in using hypothesis representation H F Representation-independent: hard to learn the functions F, period in

26 A Standard Reduction Given examples of f 2 F according to any P, L outputs h 2 H (h) with probability 1, in time polynomial in with 1=; 1=;n;s(f) Given sample S of m pairs hx i ;f(x i )i, run algorithm L using = 1=(2m) and drawing randomly from S to provide exam- ples So: reduce hard combinatorial problem to consistency problem for (F; H), then learning is hard unless RP = NP Note: converse from Occam's Razor Then if there exists h 2 H consistent with S, L will nd it probability 1, with

27 Examples of Representation-Dependent Intractability Consistency for (k-term DNF,k-term DNF) as hard as graph k-coloring Consistency for (k-term DNF,2k-term DNF) as hard as approximate graph coloring Consistency for (k-term DNF,k-CNF) is easy Approximate consistency for (conjunctions with errors, conjunctions) as hard as set covering

28 Representation-Independent Intractability Complexity theory: set of examples problem instance, simple hypothesis solution for instance \Read o" a k-coloring of original graph G from the sytactic of a k-term DNF formula for the associated sample S G form No restrictions on form of h: everything is learnable At least ask that hypotheses be polynomially evaluatable: h(x) in time polynomial in jxj;s(h) compute Want learning to be hard for any ecient algorithm

29 Public-Key Cryptography and Learning A sends many encrypted messages F (x 1 );:::;F(xm) to B Ecient eavesdropper E may obtain (or generate) many pairs i ;F(x i )i hx Want it to be hard for E to decrypt a \new" F (x 0 ) (generalization) Thus, E wants to \learn" inverse of F (decryption) from examples! Inverse is \simple" given \trapdoor" (fairness of learning)

30 Applications Given by PKC: encryption schemes F with polynomial-time E's task as hard as factoring (F (x) =x e mod p q) inverses, Thus, learning (small) boolean circuits is intractable Also hard: boolean formulae, nite automata, small-depth networks neural DNF and decision trees?

31 The Fourier Representation Any function f : f0; 1g n!<is a 2 n -dimensional real vector Inner product: hf; hi =(1=2 n ) P x f(x)h(x) hf; hi = E U [f(x)h(x)] = Pr U [f(x) =h(x)], Pr U [f(x) 6= h(x)] Set of 2 n orthogonal basis functions: gz(x) = +1 if x has parity on bits i where z i =1,else gz(x) =,1 even Write f(x) = P z z(f) gz(x), coecients z(f) =hf; gzi

32 Fun Fourier Facts Parseval's Identity: hf; fi = P z(z(f)) 2 =1 Small DNF f: P z:w(z)c0 log(n) ( z(f)) 2 1 Small decision tree f: spectrum is sparse, only a small number of non-zero coecients Tree-based spectrum estimation using membership queries

Computational Learning Theory. Definitions

Computational Learning Theory. Definitions Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational

More information

Lecture 25 of 42. PAC Learning, VC Dimension, and Mistake Bounds

Lecture 25 of 42. PAC Learning, VC Dimension, and Mistake Bounds Lecture 25 of 42 PAC Learning, VC Dimension, and Mistake Bounds Thursday, 15 March 2007 William H. Hsu, KSU http://www.kddresearch.org/courses/spring2007/cis732 Readings: Sections 7.4.17.4.3, 7.5.17.5.3,

More information

Computational Learning Theory - Hilary Term : Introduction to the PAC Learning Framework

Computational Learning Theory - Hilary Term : Introduction to the PAC Learning Framework Computational Learning Theory - Hilary Term 2018 1 : Introduction to the PAC Learning Framework Lecturer: Varun Kanade 1 What is computational learning theory? Machine learning techniques lie at the heart

More information

Computational Learning Theory

Computational Learning Theory CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful

More information

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science

More information

Introduction to Computational Learning Theory

Introduction to Computational Learning Theory Introduction to Computational Learning Theory The classification problem Consistent Hypothesis Model Probably Approximately Correct (PAC) Learning c Hung Q. Ngo (SUNY at Buffalo) CSE 694 A Fun Course 1

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Sinh Hoa Nguyen, Hung Son Nguyen Polish-Japanese Institute of Information Technology Institute of Mathematics, Warsaw University February 14, 2006 inh Hoa Nguyen, Hung Son

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 600.463 Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 25.1 Introduction Today we re going to talk about machine learning, but from an

More information

Dan Roth 461C, 3401 Walnut

Dan Roth  461C, 3401 Walnut CIS 519/419 Applied Machine Learning www.seas.upenn.edu/~cis519 Dan Roth danroth@seas.upenn.edu http://www.cis.upenn.edu/~danroth/ 461C, 3401 Walnut Slides were created by Dan Roth (for CIS519/419 at Penn

More information

Lecture 29: Computational Learning Theory

Lecture 29: Computational Learning Theory CS 710: Complexity Theory 5/4/2010 Lecture 29: Computational Learning Theory Instructor: Dieter van Melkebeek Scribe: Dmitri Svetlov and Jake Rosin Today we will provide a brief introduction to computational

More information

PAC Learning. prof. dr Arno Siebes. Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht

PAC Learning. prof. dr Arno Siebes. Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht PAC Learning prof. dr Arno Siebes Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht Recall: PAC Learning (Version 1) A hypothesis class H is PAC learnable

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 6: Computational Complexity of Learning Proper vs Improper Learning Learning Using FIND-CONS For any family of hypothesis

More information

Computational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar

Computational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory

More information

Active Learning and Optimized Information Gathering

Active Learning and Optimized Information Gathering Active Learning and Optimized Information Gathering Lecture 7 Learning Theory CS 101.2 Andreas Krause Announcements Project proposal: Due tomorrow 1/27 Homework 1: Due Thursday 1/29 Any time is ok. Office

More information

Computational Learning Theory

Computational Learning Theory 1 Computational Learning Theory 2 Computational learning theory Introduction Is it possible to identify classes of learning problems that are inherently easy or difficult? Can we characterize the number

More information

Computational Learning Theory (COLT)

Computational Learning Theory (COLT) Computational Learning Theory (COLT) Goals: Theoretical characterization of 1 Difficulty of machine learning problems Under what conditions is learning possible and impossible? 2 Capabilities of machine

More information

Simple Learning Algorithms for. Decision Trees and Multivariate Polynomials. xed constant bounded product distribution. is depth 3 formulas.

Simple Learning Algorithms for. Decision Trees and Multivariate Polynomials. xed constant bounded product distribution. is depth 3 formulas. Simple Learning Algorithms for Decision Trees and Multivariate Polynomials Nader H. Bshouty Department of Computer Science University of Calgary Calgary, Alberta, Canada Yishay Mansour Department of Computer

More information

On the power and the limits of evolvability. Vitaly Feldman Almaden Research Center

On the power and the limits of evolvability. Vitaly Feldman Almaden Research Center On the power and the limits of evolvability Vitaly Feldman Almaden Research Center Learning from examples vs evolvability Learnable from examples Evolvable Parity functions The core model: PAC [Valiant

More information

Generalization, Overfitting, and Model Selection

Generalization, Overfitting, and Model Selection Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How

More information

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity Universität zu Lübeck Institut für Theoretische Informatik Lecture notes on Knowledge-Based and Learning Systems by Maciej Liśkiewicz Lecture 5: Efficient PAC Learning 1 Consistent Learning: a Bound on

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #5 Scribe: Allen(Zhelun) Wu February 19, ). Then: Pr[err D (h A ) > ɛ] δ

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #5 Scribe: Allen(Zhelun) Wu February 19, ). Then: Pr[err D (h A ) > ɛ] δ COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #5 Scribe: Allen(Zhelun) Wu February 19, 018 Review Theorem (Occam s Razor). Say algorithm A finds a hypothesis h A H consistent with

More information

Computational Learning Theory: Shattering and VC Dimensions. Machine Learning. Spring The slides are mainly from Vivek Srikumar

Computational Learning Theory: Shattering and VC Dimensions. Machine Learning. Spring The slides are mainly from Vivek Srikumar Computational Learning Theory: Shattering and VC Dimensions Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory of Generalization

More information

Web-Mining Agents Computational Learning Theory

Web-Mining Agents Computational Learning Theory Web-Mining Agents Computational Learning Theory Prof. Dr. Ralf Möller Dr. Özgür Özcep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercise Lab) Computational Learning Theory (Adapted)

More information

On the Sample Complexity of Noise-Tolerant Learning

On the Sample Complexity of Noise-Tolerant Learning On the Sample Complexity of Noise-Tolerant Learning Javed A. Aslam Department of Computer Science Dartmouth College Hanover, NH 03755 Scott E. Decatur Laboratory for Computer Science Massachusetts Institute

More information

COMP9444: Neural Networks. Vapnik Chervonenkis Dimension, PAC Learning and Structural Risk Minimization

COMP9444: Neural Networks. Vapnik Chervonenkis Dimension, PAC Learning and Structural Risk Minimization : Neural Networks Vapnik Chervonenkis Dimension, PAC Learning and Structural Risk Minimization 11s2 VC-dimension and PAC-learning 1 How good a classifier does a learner produce? Training error is the precentage

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Computational Learning Theory

Computational Learning Theory 0. Computational Learning Theory Based on Machine Learning, T. Mitchell, McGRAW Hill, 1997, ch. 7 Acknowledgement: The present slides are an adaptation of slides drawn by T. Mitchell 1. Main Questions

More information

Machine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning

More information

Testing by Implicit Learning

Testing by Implicit Learning Testing by Implicit Learning Rocco Servedio Columbia University ITCS Property Testing Workshop Beijing January 2010 What this talk is about 1. Testing by Implicit Learning: method for testing classes of

More information

Can PAC Learning Algorithms Tolerate. Random Attribute Noise? Sally A. Goldman. Department of Computer Science. Washington University

Can PAC Learning Algorithms Tolerate. Random Attribute Noise? Sally A. Goldman. Department of Computer Science. Washington University Can PAC Learning Algorithms Tolerate Random Attribute Noise? Sally A. Goldman Department of Computer Science Washington University St. Louis, Missouri 63130 Robert H. Sloan y Dept. of Electrical Engineering

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 6: Computational Complexity of Learning Proper vs Improper Learning Efficient PAC Learning Definition: A family H n of

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Slides by and Nathalie Japkowicz (Reading: R&N AIMA 3 rd ed., Chapter 18.5) Computational Learning Theory Inductive learning: given the training set, a learning algorithm

More information

Lecture 8. Instructor: Haipeng Luo

Lecture 8. Instructor: Haipeng Luo Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine

More information

TTIC An Introduction to the Theory of Machine Learning. Learning from noisy data, intro to SQ model

TTIC An Introduction to the Theory of Machine Learning. Learning from noisy data, intro to SQ model TTIC 325 An Introduction to the Theory of Machine Learning Learning from noisy data, intro to SQ model Avrim Blum 4/25/8 Learning when there is no perfect predictor Hoeffding/Chernoff bounds: minimizing

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 236756 Prof. Nir Ailon Lecture 4: Computational Complexity of Learning & Surrogate Losses Efficient PAC Learning Until now we were mostly worried about sample complexity

More information

CS446: Machine Learning Spring Problem Set 4

CS446: Machine Learning Spring Problem Set 4 CS446: Machine Learning Spring 2017 Problem Set 4 Handed Out: February 27 th, 2017 Due: March 11 th, 2017 Feel free to talk to other members of the class in doing the homework. I am more concerned that

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Generalization and Overfitting

Generalization and Overfitting Generalization and Overfitting Model Selection Maria-Florina (Nina) Balcan February 24th, 2016 PAC/SLT models for Supervised Learning Data Source Distribution D on X Learning Algorithm Expert / Oracle

More information

Polynomial time Prediction Strategy with almost Optimal Mistake Probability

Polynomial time Prediction Strategy with almost Optimal Mistake Probability Polynomial time Prediction Strategy with almost Optimal Mistake Probability Nader H. Bshouty Department of Computer Science Technion, 32000 Haifa, Israel bshouty@cs.technion.ac.il Abstract We give the

More information

Data Mining und Maschinelles Lernen

Data Mining und Maschinelles Lernen Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting

More information

Statistical Learning Learning From Examples

Statistical Learning Learning From Examples Statistical Learning Learning From Examples We want to estimate the working temperature range of an iphone. We could study the physics and chemistry that affect the performance of the phone too hard We

More information

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate

More information

From Batch to Transductive Online Learning

From Batch to Transductive Online Learning From Batch to Transductive Online Learning Sham Kakade Toyota Technological Institute Chicago, IL 60637 sham@tti-c.org Adam Tauman Kalai Toyota Technological Institute Chicago, IL 60637 kalai@tti-c.org

More information

Name (NetID): (1 Point)

Name (NetID): (1 Point) CS446: Machine Learning Fall 2016 October 25 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains four

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

CS 395T Computational Learning Theory. Scribe: Mike Halcrow. x 4. x 2. x 6

CS 395T Computational Learning Theory. Scribe: Mike Halcrow. x 4. x 2. x 6 CS 395T Computational Learning Theory Lecture 3: September 0, 2007 Lecturer: Adam Klivans Scribe: Mike Halcrow 3. Decision List Recap In the last class, we determined that, when learning a t-decision list,

More information

1 Active Learning Foundations of Machine Learning and Data Science. Lecturer: Maria-Florina Balcan Lecture 20 & 21: November 16 & 18, 2015

1 Active Learning Foundations of Machine Learning and Data Science. Lecturer: Maria-Florina Balcan Lecture 20 & 21: November 16 & 18, 2015 10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 20 & 21: November 16 & 18, 2015 1 Active Learning Most classic machine learning methods and the formal learning

More information

VC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces.

VC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces. VC Dimension Review The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces. Previously, in discussing PAC learning, we were trying to answer questions about

More information

CS 446: Machine Learning Lecture 4, Part 2: On-Line Learning

CS 446: Machine Learning Lecture 4, Part 2: On-Line Learning CS 446: Machine Learning Lecture 4, Part 2: On-Line Learning 0.1 Linear Functions So far, we have been looking at Linear Functions { as a class of functions which can 1 if W1 X separate some data and not

More information

Computational Learning Theory. CS534 - Machine Learning

Computational Learning Theory. CS534 - Machine Learning Computational Learning Theory CS534 Machine Learning Introduction Computational learning theory Provides a theoretical analysis of learning Shows when a learning algorithm can be expected to succeed Shows

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Decision Trees Instructor: Yang Liu 1 Supervised Classifier X 1 X 2. X M Ref class label 2 1 Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short}

More information

2 Upper-bound of Generalization Error of AdaBoost

2 Upper-bound of Generalization Error of AdaBoost COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Haipeng Zheng March 5, 2008 1 Review of AdaBoost Algorithm Here is the AdaBoost Algorithm: input: (x 1,y 1 ),...,(x m,y

More information

Lecture Notes 20: Zero-Knowledge Proofs

Lecture Notes 20: Zero-Knowledge Proofs CS 127/CSCI E-127: Introduction to Cryptography Prof. Salil Vadhan Fall 2013 Lecture Notes 20: Zero-Knowledge Proofs Reading. Katz-Lindell Ÿ14.6.0-14.6.4,14.7 1 Interactive Proofs Motivation: how can parties

More information

New Results for Random Walk Learning

New Results for Random Walk Learning Journal of Machine Learning Research 15 (2014) 3815-3846 Submitted 1/13; Revised 5/14; Published 11/14 New Results for Random Walk Learning Jeffrey C. Jackson Karl Wimmer Duquesne University 600 Forbes

More information

Distributed Machine Learning. Maria-Florina Balcan, Georgia Tech

Distributed Machine Learning. Maria-Florina Balcan, Georgia Tech Distributed Machine Learning Maria-Florina Balcan, Georgia Tech Model for reasoning about key issues in supervised learning [Balcan-Blum-Fine-Mansour, COLT 12] Supervised Learning Example: which emails

More information

Efficient Noise-Tolerant Learning from Statistical Queries

Efficient Noise-Tolerant Learning from Statistical Queries Efficient Noise-Tolerant Learning from Statistical Queries MICHAEL KEARNS AT&T Laboratories Research, Florham Park, New Jersey Abstract. In this paper, we study the problem of learning in the presence

More information

Testing Problems with Sub-Learning Sample Complexity

Testing Problems with Sub-Learning Sample Complexity Testing Problems with Sub-Learning Sample Complexity Michael Kearns AT&T Labs Research 180 Park Avenue Florham Park, NJ, 07932 mkearns@researchattcom Dana Ron Laboratory for Computer Science, MIT 545 Technology

More information

8.1 Polynomial Threshold Functions

8.1 Polynomial Threshold Functions CS 395T Computational Learning Theory Lecture 8: September 22, 2008 Lecturer: Adam Klivans Scribe: John Wright 8.1 Polynomial Threshold Functions In the previous lecture, we proved that any function over

More information

Computational Lower Bounds for Statistical Estimation Problems

Computational Lower Bounds for Statistical Estimation Problems Computational Lower Bounds for Statistical Estimation Problems Ilias Diakonikolas (USC) (joint with Daniel Kane (UCSD) and Alistair Stewart (USC)) Workshop on Local Algorithms, MIT, June 2018 THIS TALK

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Abstract. In this paper we study an extension of the distribution-free model of learning introduced

Abstract. In this paper we study an extension of the distribution-free model of learning introduced Learning in the Presence of Malicious Errors Michael Kearns y AT&T Bell Laboratories Ming Li z University ofwaterloo Abstract In this paper we study an extension of the distribution-free model of learning

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 10 LEARNING THEORY For nothing ought to be posited without a reason given, unless it is self-evident or known by experience or proved by the authority of Sacred

More information

Computational Learning Theory. CS 486/686: Introduction to Artificial Intelligence Fall 2013

Computational Learning Theory. CS 486/686: Introduction to Artificial Intelligence Fall 2013 Computational Learning Theory CS 486/686: Introduction to Artificial Intelligence Fall 2013 1 Overview Introduction to Computational Learning Theory PAC Learning Theory Thanks to T Mitchell 2 Introduction

More information

Machine Learning. Computational Learning Theory. Eric Xing , Fall Lecture 9, October 5, 2016

Machine Learning. Computational Learning Theory. Eric Xing , Fall Lecture 9, October 5, 2016 Machine Learning 10-701, Fall 2016 Computational Learning Theory Eric Xing Lecture 9, October 5, 2016 Reading: Chap. 7 T.M book Eric Xing @ CMU, 2006-2016 1 Generalizability of Learning In machine learning

More information

An Introduction to Statistical Theory of Learning. Nakul Verma Janelia, HHMI

An Introduction to Statistical Theory of Learning. Nakul Verma Janelia, HHMI An Introduction to Statistical Theory of Learning Nakul Verma Janelia, HHMI Towards formalizing learning What does it mean to learn a concept? Gain knowledge or experience of the concept. The basic process

More information

Computational learning theory. PAC learning. VC dimension.

Computational learning theory. PAC learning. VC dimension. Computational learning theory. PAC learning. VC dimension. Petr Pošík Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics COLT 2 Concept...........................................................................................................

More information

Computational Learning Theory

Computational Learning Theory 09s1: COMP9417 Machine Learning and Data Mining Computational Learning Theory May 20, 2009 Acknowledgement: Material derived from slides for the book Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997

More information

COMS 4771 Introduction to Machine Learning. Nakul Verma

COMS 4771 Introduction to Machine Learning. Nakul Verma COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW2 due now! Project proposal due on tomorrow Midterm next lecture! HW3 posted Last time Linear Regression Parametric vs Nonparametric

More information

CS 6375: Machine Learning Computational Learning Theory

CS 6375: Machine Learning Computational Learning Theory CS 6375: Machine Learning Computational Learning Theory Vibhav Gogate The University of Texas at Dallas Many slides borrowed from Ray Mooney 1 Learning Theory Theoretical characterizations of Difficulty

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 7: Computational Complexity of Learning Agnostic Learning Hardness of Learning via Crypto Assumption: No poly-time algorithm

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms

More information

1 Differential Privacy and Statistical Query Learning

1 Differential Privacy and Statistical Query Learning 10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 5: December 07, 015 1 Differential Privacy and Statistical Query Learning 1.1 Differential Privacy Suppose

More information

COS597D: Information Theory in Computer Science October 5, Lecture 6

COS597D: Information Theory in Computer Science October 5, Lecture 6 COS597D: Information Theory in Computer Science October 5, 2011 Lecture 6 Lecturer: Mark Braverman Scribe: Yonatan Naamad 1 A lower bound for perfect hash families. In the previous lecture, we saw that

More information

CS 498F: Machine Learning Fall 2010

CS 498F: Machine Learning Fall 2010 Instructions: Make sure you write your name on every page you submit. This is a closed-book, closed-note exam. There are 8 questions + 1 extra credit question. Some reminders of material you may need is

More information

Statistical and Computational Learning Theory

Statistical and Computational Learning Theory Statistical and Computational Learning Theory Fundamental Question: Predict Error Rates Given: Find: The space H of hypotheses The number and distribution of the training examples S The complexity of the

More information

Lecture 10: Learning DNF, AC 0, Juntas. 1 Learning DNF in Almost Polynomial Time

Lecture 10: Learning DNF, AC 0, Juntas. 1 Learning DNF in Almost Polynomial Time Analysis of Boolean Functions (CMU 8-859S, Spring 2007) Lecture 0: Learning DNF, AC 0, Juntas Feb 5, 2007 Lecturer: Ryan O Donnell Scribe: Elaine Shi Learning DNF in Almost Polynomial Time From previous

More information

An Algorithms-based Intro to Machine Learning

An Algorithms-based Intro to Machine Learning CMU 15451 lecture 12/08/11 An Algorithmsbased Intro to Machine Learning Plan for today Machine Learning intro: models and basic issues An interesting algorithm for combining expert advice Avrim Blum [Based

More information

Name (NetID): (1 Point)

Name (NetID): (1 Point) CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains

More information

Foundations of Machine Learning and Data Science. Lecturer: Avrim Blum Lecture 9: October 7, 2015

Foundations of Machine Learning and Data Science. Lecturer: Avrim Blum Lecture 9: October 7, 2015 10-806 Foundations of Machine Learning and Data Science Lecturer: Avrim Blum Lecture 9: October 7, 2015 1 Computational Hardness of Learning Today we will talk about some computational hardness results

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

3 Finish learning monotone Boolean functions

3 Finish learning monotone Boolean functions COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 5: 02/19/2014 Spring 2014 Scribes: Dimitris Paidarakis 1 Last time Finished KM algorithm; Applications of KM

More information

1 The Probably Approximately Correct (PAC) Model

1 The Probably Approximately Correct (PAC) Model COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #3 Scribe: Yuhui Luo February 11, 2008 1 The Probably Approximately Correct (PAC) Model A target concept class C is PAC-learnable by

More information

A Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University

A Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University A Gentle Introduction to Gradient Boosting Cheng Li chengli@ccs.neu.edu College of Computer and Information Science Northeastern University Gradient Boosting a powerful machine learning algorithm it can

More information

Lecture 5: February 16, 2012

Lecture 5: February 16, 2012 COMS 6253: Advanced Computational Learning Theory Lecturer: Rocco Servedio Lecture 5: February 16, 2012 Spring 2012 Scribe: Igor Carboni Oliveira 1 Last time and today Previously: Finished first unit on

More information

Computational Learning Theory (VC Dimension)

Computational Learning Theory (VC Dimension) Computational Learning Theory (VC Dimension) 1 Difficulty of machine learning problems 2 Capabilities of machine learning algorithms 1 Version Space with associated errors error is the true error, r is

More information

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard

More information

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181. Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität

More information

Agnostic Online learnability

Agnostic Online learnability Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 8: Boosting (and Compression Schemes) Boosting the Error If we have an efficient learning algorithm that for any distribution

More information

Majority is incompressible by AC 0 [p] circuits

Majority is incompressible by AC 0 [p] circuits Majority is incompressible by AC 0 [p] circuits Igor Carboni Oliveira Columbia University Joint work with Rahul Santhanam (Univ. Edinburgh) 1 Part 1 Background, Examples, and Motivation 2 Basic Definitions

More information

Upper and Lower Bounds for Tree-like. Cutting Planes Proofs. Abstract. In this paper we study the complexity of Cutting Planes (CP) refutations, and

Upper and Lower Bounds for Tree-like. Cutting Planes Proofs. Abstract. In this paper we study the complexity of Cutting Planes (CP) refutations, and Upper and Lower Bounds for Tree-like Cutting Planes Proofs Russell Impagliazzo Toniann Pitassi y Alasdair Urquhart z UCSD UCSD and U. of Pittsburgh University of Toronto Abstract In this paper we study

More information

What Can We Learn Privately?

What Can We Learn Privately? What Can We Learn Privately? Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan Homin Lee Kobbi Nissim Adam Smith Los Alamos UT Austin Ben Gurion Penn State To appear in SICOMP

More information

Littlestone s Dimension and Online Learnability

Littlestone s Dimension and Online Learnability Littlestone s Dimension and Online Learnability Shai Shalev-Shwartz Toyota Technological Institute at Chicago The Hebrew University Talk at UCSD workshop, February, 2009 Joint work with Shai Ben-David

More information

Theoretical Advances in Neural Computation and Learning, V. P. Roychowdhury, K. Y. Siu, A. Perspectives of Current Research about

Theoretical Advances in Neural Computation and Learning, V. P. Roychowdhury, K. Y. Siu, A. Perspectives of Current Research about to appear in: Theoretical Advances in Neural Computation and Learning, V. P. Roychowdhury, K. Y. Siu, A. Orlitsky, editors, Kluwer Academic Publishers Perspectives of Current Research about the Complexity

More information

Active Learning: Disagreement Coefficient

Active Learning: Disagreement Coefficient Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which

More information

Lecture 4: LMN Learning (Part 2)

Lecture 4: LMN Learning (Part 2) CS 294-114 Fine-Grained Compleity and Algorithms Sept 8, 2015 Lecture 4: LMN Learning (Part 2) Instructor: Russell Impagliazzo Scribe: Preetum Nakkiran 1 Overview Continuing from last lecture, we will

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;

Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity; CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and

More information

Part of the slides are adapted from Ziko Kolter

Part of the slides are adapted from Ziko Kolter Part of the slides are adapted from Ziko Kolter OUTLINE 1 Supervised learning: classification........................................................ 2 2 Non-linear regression/classification, overfitting,

More information

1 Last time and today

1 Last time and today COMS 6253: Advanced Computational Learning Spring 2012 Theory Lecture 12: April 12, 2012 Lecturer: Rocco Servedio 1 Last time and today Scribe: Dean Alderucci Previously: Started the BKW algorithm for

More information