Computational and Statistical Learning Theory
1 Computational and Statistical Learning Theory
TTIC 31120, Prof. Nati Srebro
Lecture 7: Computational Complexity of Learning; Agnostic Learning
2 Hardness of Learning via Crypto
Assumption (hardness of the discrete cube root): no poly-time algorithm computes b^(1/3) mod K for a non-negligible fraction of b, where K = pq (p, q primes with 3 not dividing (p-1)(q-1)).
- (K, b) -> f_K^{-1}(b) is very hard: no poly-time algorithm succeeds for a non-negligible fraction of (K, b)
- (K, a) -> f_K(a) = a^3 mod K is easy
- Given the decryption key D_K: b -> f_K^{-1}(b) is easy (e.g. poly-time)
=> Hard to learn H = { h_K : (b, i) -> [f_K^{-1}(b)]_i }  (the i-th bit of the cube root)
=> Hard to learn poly-time computable functions
Moreover, each h_K (with K, D_K fixed) is computable by a log-depth logic circuit, and hence by a log-depth neural net.
Hard to learn H => hard to learn log-depth circuits => hard to learn log-depth NNs
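To make the trapdoor structure concrete, here is a minimal Python sketch (not from the lecture; the small primes and the names f_K / f_K_inverse are illustrative only): cubing modulo K = pq is easy for anyone, while extracting cube roots is easy only given the factorization, from which the decryption exponent d = 3^{-1} mod phi(K) can be computed.

```python
# Minimal sketch of the discrete-cube-root trapdoor (illustrative primes, not secure sizes).
from math import gcd

p, q = 1019, 1151                      # primes with 3 not dividing (p-1)(q-1)
assert gcd(3, (p - 1) * (q - 1)) == 1  # so cubing is a bijection mod K
K = p * q
phi = (p - 1) * (q - 1)

def f_K(a):
    """Forward direction: easy for everyone."""
    return pow(a, 3, K)

# Trapdoor D_K: with the factorization we can invert the exponent 3 modulo phi(K).
d = pow(3, -1, phi)                    # Python 3.8+: modular inverse

def f_K_inverse(b):
    """Inverse direction: easy only given the trapdoor d (i.e., the factorization)."""
    return pow(b, d, K)

a = 123456 % K
b = f_K(a)
assert f_K_inverse(b) == a             # cube root recovered using the trapdoor
```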
3 Hardness of Learning via Crypto
- Public-key crypto is possible => hard to learn poly-time computable functions
- Hardness of the discrete cube root
  => hard to learn log(n)-depth logic circuits
  => hard to learn log(n)-depth, poly-size neural networks
- Hardness of breaking RSA
  => hard to learn poly-length logical formulas
  => hard to learn poly-size automata (i.e. regexps)
  => hard to learn push-down automata
  => for some depth d, hard to learn poly-size depth-d threshold circuits (the output of a unit is one iff the number of input units that are one exceeds the threshold)
  => hard to learn O(1)-depth, poly-size neural networks
- Hardness of lattice-shortest-vector based cryptography
  => hard to learn intersections of n^r halfspaces (for any r > 0)
  => hard to learn depth-2 neural networks with n^r hidden units
4 Intersections of Halfspaces
H^n_{k(n)} = { x -> AND_{i=1..k(n)} [<w_i, x> > 0] : w_1, ..., w_{k(n)} in R^n }
The unique shortest lattice vector problem:
SVP(v_1, v_2, ..., v_n in R^n) = argmin_{a_1,...,a_n in Z, not all zero} || a_1 v_1 + a_2 v_2 + ... + a_n v_n ||
O(n^1.5)-uSVP: only required to return the shortest vector if the next-shortest is O(n^1.5) times longer.
O(n^1.5)-uSVP not in RP
=> lattice-based cryptosystem is secure
=> for any r > 0, hard to learn H^n_{k(n)=n^r}
=> hard to learn 2-layer NNs with n^r hidden units
[Adam Klivans, Sasha Sherstov]
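The step from intersections of halfspaces to 2-layer networks is just the observation that an AND of threshold units is itself a threshold unit. A small sketch (my own illustration, with made-up weights W):

```python
import numpy as np

def intersection_of_halfspaces(W):
    """h(x) = 1 iff <w_i, x> > 0 for every row w_i of W."""
    def h(x):
        return int(np.all(W @ x > 0))
    return h

def two_layer_threshold_net(W):
    """The same predictor written as a depth-2 threshold network:
    hidden unit i fires iff <w_i, x> > 0; the output unit fires iff all k hidden units fire."""
    k = W.shape[0]
    def h(x):
        hidden = (W @ x > 0).astype(int)      # k hidden threshold units
        return int(hidden.sum() >= k)          # output threshold unit computes the AND
    return h

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))               # k = 3 halfspaces in R^5 (arbitrary example)
x = rng.standard_normal(5)
assert intersection_of_halfspaces(W)(x) == two_layer_threshold_net(W)(x)
```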
5 Hardness of Learning via Crypto
- (K, b) -> f_K^{-1}(b) is very hard: no poly-time algorithm succeeds for a non-negligible fraction of (K, b)
- Easy to generate a random key pair (K, D_K)
- (K, a) -> f_K(a) is easy
- Given D_K: b -> f_K^{-1}(b) is easy (e.g. poly-time)
=> Hard to learn H = { h_K : (b, i) -> [f_K^{-1}(b)]_i }
=> Hard to learn poly-time computable functions
6 Hardness of Learning via Crypto
- (K, b) -> f_K^{-1}(b) is very hard: no poly-time algorithm succeeds for a non-negligible fraction of (K, b)
- Easy to generate a random key pair (K, D_K)
- No poly-time algorithm for all K and almost all b
- (K, a) -> f_K(a) is easy
- Given D_K: b -> f_K^{-1}(b) is easy (e.g. poly-time)
=> Hard to learn H = { h_K : (b, i) -> [f_K^{-1}(b)]_i }
=> Hard to learn poly-time computable functions
7 Hardness of Learning: Take II
Recall how we proved hardness of proper learning: reduction from deciding consistency with H. If we had an efficient proper learner, we could train it and find a consistent hypothesis in H whenever one exists.
Problem: if learning is not proper, the learner might return a good hypothesis not in H, even though D is not consistent with H.
Instead: reduction from deciding between two possibilities:
- The sample is consistent with H: for every consistent sample, return 1 w.p. 3/4 (over the randomization in the algorithm).
- The sample comes from a random distribution, e.g. sampled such that the labels y are independent of x: for all but a negligible fraction of samples S ~ D^m, return 0 w.p. 3/4.
[Amit Daniely]
8 Hardness Relative to RSAT
RSAT assumption: for some f(K) = ω(1), there is no poly-time randomized algorithm that gets as input a K-SAT formula with n^{f(K)} constraints and:
- if the input is satisfiable, then w.p. 3/4 (over the randomization in the algorithm) it outputs 1;
- if each constraint is generated independently and uniformly at random, then with probability approaching 1 (as n -> infinity) over the formula, w.p. 3/4 (over the randomization in the algorithm) it outputs 0.
Theorem: under the RSAT assumption,
- poly-length DNFs are not efficiently PAC learnable (e.g. h(x) = (x_1 AND x_7 AND x_15) OR (x_17 AND x_2 AND x_24) OR ...)
- intersections of ω(1) halfspaces are not efficiently PAC learnable
- 2-layer neural networks with O(log log log n) hidden units are not efficiently PAC learnable
[Amit Daniely]
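A hedged sketch of the two input distributions behind this distinguishing problem (the planted construction below is just one illustrative way to produce satisfiable instances, and all names here are mine; the assumption itself asks the algorithm to accept every satisfiable formula and to reject almost all uniformly random ones):

```python
import random

def random_clause(n, K, rng):
    """One K-SAT constraint: K distinct variables, each negated with probability 1/2."""
    vars_ = rng.sample(range(1, n + 1), K)
    return [v if rng.random() < 0.5 else -v for v in vars_]

def uniform_random_formula(n, m, K, rng):
    """'Random' case: each of the m constraints drawn independently and uniformly."""
    return [random_clause(n, K, rng) for _ in range(m)]

def planted_satisfiable_formula(n, m, K, rng):
    """'Satisfiable' case (one illustrative construction): draw clauses uniformly,
    but resample any clause violated by a hidden planted assignment."""
    assignment = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    formula = []
    while len(formula) < m:
        clause = random_clause(n, K, rng)
        if any((lit > 0) == assignment[abs(lit)] for lit in clause):
            formula.append(clause)
    return formula

rng = random.Random(0)
n, K = 50, 4
m = n ** 2   # n^{f(K)} constraints for some f(K) = omega(1); n^2 chosen here only for illustration
F_rand = uniform_random_formula(n, m, K, rng)
F_sat = planted_satisfiable_formula(n, m, K, rng)
# The RSAT assumption says no poly-time algorithm can reliably output 1 on satisfiable
# formulas while outputting 0 on formulas drawn like F_rand.
```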
9 Hardness of Learning
Efficiently properly learnable:
- axis-aligned rectangles in n dimensions
- halfspaces in n dimensions
- conjunctions on n variables
Efficiently learnable, but not properly:
- 3-term DNFs
Not efficiently learnable:
- DNF formulas of size poly(n)
- generic logical formulas of size poly(n)
- neural nets with at most poly(n) units
- functions computable in poly(n) time
10 Realizable vs Agnostic
Definition: A family H_n of hypothesis classes is efficiently properly PAC-learnable if there exists a learning rule A such that for all n and all ε, δ > 0 there is m(n, ε, δ) such that for every D with L_D(h) = 0 for some h in H_n,
P_{S ~ D^{m(n,ε,δ)}} ( L_D(A(S)) ≤ ε ) ≥ 1 - δ,
A(S)(x) can be computed in time poly(n, 1/ε, log(1/δ)), and A always outputs a predictor in H_n.
Definition: A family H_n of hypothesis classes is efficiently properly agnostically PAC-learnable if there exists a learning rule A such that for all n and all ε, δ > 0 there is m(n, ε, δ) such that for every D,
P_{S ~ D^{m(n,ε,δ)}} ( L_D(A(S)) ≤ inf_{h in H_n} L_D(h) + ε ) ≥ 1 - δ,
A(S)(x) can be computed in time poly(n, 1/ε, log(1/δ)), and A always outputs a predictor in H_n.
11 Conditions for Efficient Agnostic Learning
ERM_H(S) = argmin_{h in H} L_S(h)
Claim: If VCdim(H_n) ≤ poly(n), each h in H_n is computable in time poly(n), and there is a poly-time (in the size of its input) algorithm for ERM_H (i.e. one that returns any empirical risk minimizer), then H_n is efficiently agnostically properly PAC learnable.
AGREEMENT_H(S, k) = 1 iff there exists h in H with L_S(h) ≤ 1 - k/|S| (i.e. h agrees with at least k of the |S| examples).
Claim: If H_n is efficiently properly agnostically PAC learnable, then AGREEMENT_H is in RP.
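For intuition about the first claim, a minimal sketch (my own toy example, not from the lecture) of ERM over an explicitly enumerated finite class: once ERM can be computed efficiently, agnostic proper learning is just running it on a sample whose size is governed by the VC dimension.

```python
import numpy as np

def empirical_error(h, S):
    """L_S(h): fraction of examples (x, y) in S that h misclassifies."""
    return np.mean([h(x) != y for x, y in S])

def erm(hypotheses, S):
    """ERM_H(S): return any hypothesis minimizing the empirical error."""
    return min(hypotheses, key=lambda h: empirical_error(h, S))

# Toy class: decision stumps on a single real feature (an illustrative, finite H).
thresholds = np.linspace(0.0, 1.0, 21)
H = [(lambda x, t=t, s=s: int(s * (x - t) > 0)) for t in thresholds for s in (+1, -1)]

rng = np.random.default_rng(1)
xs = rng.random(200)
ys = (xs > 0.6).astype(int)
ys[rng.random(200) < 0.1] ^= 1          # 10% label noise: the problem is agnostic, not realizable
S = list(zip(xs, ys))

h_hat = erm(H, S)
print("empirical error of ERM:", empirical_error(h_hat, S))
```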
12 What is Properly Agnostically Learnable?
- Poly-time functions? No! (not even in the realizable case)
- Poly-length logical formulas? No! (not even in the realizable case)
- Poly-size depth-2 neural networks? No! (not even in the realizable case)
- Halfspaces (linear predictors)? X_n = {0,1}^n, H_n = { x -> [<w, x> > 0] : w in R^n }
  Claim: AGREEMENT_H is NP-hard (optional HW problem).
  Conclusion: if NP ≠ RP, halfspaces are not efficiently properly agnostically learnable. No!
- Conjunctions? Also NP-hard! No!
- Unions of segments on the line? X_n = [0,1], H_n = { x -> OR_{i=1..n} [a_i ≤ x ≤ b_i] : a_i, b_i in [0,1] }
  Yes! Efficiently properly agnostically PAC learnable!
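To illustrate why unions of segments on the line are tractable, here is a hedged sketch (not from the lecture; the function name and data are mine) of a dynamic program that computes the minimal empirical error over unions of at most k intervals in 1D, in time polynomial in the sample size:

```python
def erm_union_of_intervals(points, labels, k):
    """Minimum number of mistakes over hypotheses 'x is positive iff x lies in one of at
    most k intervals', computed by dynamic programming over the sorted sample."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    ys = [labels[i] for i in order]          # labels in order of increasing x
    INF = float("inf")

    # dp[j][inside]: min mistakes on the prefix processed so far, having opened j intervals,
    # with inside = 1 if the most recently opened interval is still open.
    dp = [[INF, INF] for _ in range(k + 1)]
    dp[0][0] = 0
    for y in ys:
        new = [[INF, INF] for _ in range(k + 1)]
        for j in range(k + 1):
            for inside in (0, 1):
                cur = dp[j][inside]
                if cur == INF:
                    continue
                # Option 1: keep the current state (this point is predicted as 'inside').
                new[j][inside] = min(new[j][inside], cur + (0 if y == inside else 1))
                # Option 2: toggle -- open a new interval (uses one of the k) or close the open one.
                if inside == 0 and j < k:
                    new[j + 1][1] = min(new[j + 1][1], cur + (0 if y == 1 else 1))
                elif inside == 1:
                    new[j][0] = min(new[j][0], cur + (0 if y == 0 else 1))
        dp = new
    return min(min(row) for row in dp)

# Tiny usage example (made-up data): positives clustered in three runs, only two intervals allowed.
xs = [0.05, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
ys = [1, 1, 0, 0, 1, 1, 0, 1]
print(erm_union_of_intervals(xs, ys, k=2))   # -> 1 (one unavoidable mistake with 2 intervals)
```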
13 Source of the Hardness
min_{h_w in H} Σ_i l(h_w(x_i); y_i),   where h_w(x) = <w, x> and l_01(h(x); y) = 1[ y·h(x) ≤ 0 ]
[Figure: l_01(h(x); y = 1) and l_sqr(h(x); y = 1) plotted as functions of h(x) in R.]
14 Using a surrogate loss
min_{h_w in H} Σ_i l(h_w(x_i); y_i)
Instead of l_01(z; y), use a surrogate l(z; y) such that:
- for every y, l(z; y) is convex in z (and so easy to optimize)
- for all z, y: l_01(z; y) ≤ l(z; y)
Examples:
- l_sqr(z; y) = (y - z)^2
- l_hinge(z; y) = [1 - yz]_+ = max{0, 1 - yz}
- l_logistic(z; y) = log(1 + exp(-yz))
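A small sketch (mine, not from the lecture) that evaluates the three surrogates on a grid and numerically checks the upper-bound property; note that for the logistic loss I take the logarithm in base 2 so that its value at the decision boundary is 1, which makes the upper bound on the 0/1 loss hold.

```python
import numpy as np

def loss_01(z, y):       return (y * z <= 0).astype(float)
def loss_sqr(z, y):      return (y - z) ** 2
def loss_hinge(z, y):    return np.maximum(0.0, 1.0 - y * z)
def loss_logistic(z, y): return np.log2(1.0 + np.exp(-y * z))   # base 2: equals 1 at yz = 0

z = np.linspace(-3, 3, 601)
for y in (+1, -1):
    for name, surrogate in [("sqr", loss_sqr), ("hinge", loss_hinge), ("logistic", loss_logistic)]:
        # Second property from the slide: the surrogate upper-bounds the 0/1 loss everywhere.
        assert np.all(surrogate(z, y) >= loss_01(z, y) - 1e-12), name

# First property (convexity in z) holds analytically for all three; a quick numerical check
# for the hinge loss: discrete second differences are nonnegative.
vals = loss_hinge(z, 1)
assert np.all(np.diff(vals, 2) >= -1e-12)
```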
15 Agnostically Learning Halfspaces with the Hinge Loss
H = { x -> [<w, x> > 0] : w in R^n },  relaxed to real-valued predictors H' = { x -> <w, x> : w in R^n }
argmin_{h in H'} (1/m) Σ_i l_hinge(h(x_i); y_i)  =  argmin_{w in R^n} (1/m) Σ_i [1 - y_i <w, x_i>]_+
Solve using linear programming:
min_{w in R^n, ξ in R^m} Σ_i ξ_i
s.t. y_i <w, x_i> ≥ 1 - ξ_i,  ξ_i ≥ 0
(at the optimum, ξ_i = [1 - y_i <w, x_i>]_+)
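A hedged sketch (illustrative data and function name, not from the lecture) of solving exactly this LP with scipy.optimize.linprog: the decision variables are stacked as (w, ξ), and each constraint y_i <w, x_i> ≥ 1 - ξ_i becomes -y_i x_i·w - ξ_i ≤ -1.

```python
import numpy as np
from scipy.optimize import linprog

def hinge_erm_lp(X, y):
    """Minimize sum_i xi_i s.t. y_i <w, x_i> >= 1 - xi_i and xi_i >= 0, as an LP over (w, xi)."""
    m, n = X.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])        # objective: sum of slacks only
    A_ub = np.hstack([-y[:, None] * X, -np.eye(m)])      # -y_i x_i . w - xi_i <= -1
    b_ub = -np.ones(m)
    bounds = [(None, None)] * n + [(0, None)] * m        # w free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# Tiny usage example with made-up, separable data.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))
w_hat = hinge_erm_lp(X, y)
print("training 0/1 error:", np.mean(np.sign(X @ w_hat) != y))
```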
16 Does Minimizing a Surrogate Loss Also Minimize the 0/1 Loss?
l_01(z; y) = 1[yz ≤ 0]  ≤  [1 - yz]_+  =  l_hinge(z; y)
Realizable case: if there is w* with L_S^01(x -> <w*, x>) = 0, then L_S^hinge(x -> <w~, x>) = 0 for the rescaled w~ = w* / min_i y_i <w*, x_i> (every example now has margin ≥ 1).
Hence L_S^hinge(ERM^hinge(S)) = 0, and so
L_S^01(ERM^hinge(S)) ≤ L_S^hinge(ERM^hinge(S)) = 0.
Non-realizable case: what can we ensure by minimizing the surrogate loss???
17 Can we Efficiently Agnostically Learn?
- Minimizing a surrogate loss can be very bad: it might return ŵ with L_01(ŵ) = 0.49 even when L_01(w*) = 0.001.
- Halfspaces are not efficiently properly agnostically PAC learnable: finding the halfspace that minimizes the number of errors on a training set is NP-hard.
- What about improper learning? Next week we'll reduce learning intersections of halfspaces to agnostically learning halfspaces.
18 Why Study Hardness?
- Understand why machine learning is essentially a computational problem.
- Understand why we must sometimes take a non-exact / heuristic approach, and that it cannot be exact (e.g. use a surrogate loss).
- Understand what we can never guarantee, and not try to guarantee it (e.g. we cannot learn with a large NN just because there is a small NN that completely explains the data).
- Understand, and be able to argue about, sample complexity gaps between the statistical limit (using any learning rule) and the computational limit (using a tractable learning rule).
19 Weak vs Strong Learning
Recall the definition of (realizable) PAC learning of H using rule A(·): for any D s.t. inf_{h in H} L_D(h) = 0, and any ε, δ > 0, using m(ε, δ) samples,
P_{S ~ D^{m(ε,δ)}} ( L_D(A(S)) < ε ) ≥ 1 - δ.
A(·) is a weak learner for H if there exist ε < 1/2, δ < 1 and m such that for any D with inf_{h in H} L_D(h) = 0,
P_{S ~ D^m} ( L_D(A(S)) < ε ) ≥ 1 - δ   (e.g. ε = 0.49 and 1 - δ = 0.01).
If H is weakly learnable, is it also strongly learnable?
Yes: H is weakly learnable => VCdim(H) < ∞ => H is (strongly) learnable.
If H_n is efficiently weakly learnable, is it also efficiently strongly learnable?
If we have access to an (efficient) weak learner A(·), can we use it to build an (efficient) strong learner?
20 The Boosting Problem
Boosting the confidence: if the learning algorithm works only with some very small fixed probability 1 - δ_0 (e.g. 1 - δ_0 = 0.01), can we construct a new algorithm that works with arbitrarily high probability 1 - δ (for any δ > 0)?
Boosting the error: if the learning algorithm only returns a predictor that is guaranteed to be slightly better than chance, i.e. has error ε_0 = 1/2 - γ < 1/2 (for some fixed γ > 0), can we construct a new algorithm that achieves arbitrarily low error ε?
21 Boosting the Confidence
For any δ:
1. For i = 1..k, with k = log(2/δ) / log(1/δ_0): collect m_0 independent samples S_i and set h_i = A(S_i).
2. Collect m_val = (4/ε²) log(4k/δ) additional independent samples S_val.
3. Return ĥ = argmin_{h in {h_1,...,h_k}} L_{S_val}(h)   (ERM over a class of size k).
Each run returns h_i with L(h_i) ≤ ε_0 w.p. 1 - δ_0, so with k runs at least one succeeds w.p. ≥ 1 - δ/2.
Claim: w.p. ≥ 1 - δ, L(ĥ) ≤ ε_0 + ε.
Total samples used: O( m_0(ε_0) log(1/δ) + log(1/δ)/ε² ).
An efficient algorithm that succeeds with some fixed probability 1 - δ_0 > 0, for all ε > 0, with runtime and sample complexity poly(n, 1/ε)
=> an efficient algorithm for any δ > 0 with runtime poly(n, 1/ε, log(1/δ)).
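A minimal sketch of this wrap-and-validate procedure (only the structure mirrors the slide; the unreliable base learner, the toy distribution, and all names are made up for illustration):

```python
import math
import numpy as np

def boost_confidence(base_learner, sample_stream, m0, eps, delta, delta0):
    """Run the base learner k times on fresh samples, then return the candidate with the
    smallest validation error on an additional held-out sample."""
    k = max(1, math.ceil(math.log(2 / delta) / math.log(1 / delta0)))
    candidates = [base_learner(sample_stream(m0)) for _ in range(k)]
    m_val = math.ceil((4 / eps ** 2) * math.log(4 * k / delta))
    X_val, y_val = sample_stream(m_val)
    errors = [np.mean(h(X_val) != y_val) for h in candidates]
    return candidates[int(np.argmin(errors))]

# --- Illustrative usage with a toy source distribution and an unreliable base learner. ---
rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 0.5])

def sample_stream(m):
    X = rng.standard_normal((m, 3))
    return X, np.sign(X @ w_true)

def unreliable_learner(S):
    """Stand-in base procedure: usually returns a useless random halfspace,
    but with small probability (about 1 - delta_0) actually fits the data."""
    X, y = S
    if rng.random() < 0.05:                       # the rare 'success' branch
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    else:
        w = rng.standard_normal(3)
    return lambda Xq: np.sign(Xq @ w)

h = boost_confidence(unreliable_learner, sample_stream, m0=200, eps=0.05, delta=0.01, delta0=0.95)
X_test, y_test = sample_stream(5000)
print("test error:", np.mean(h(X_test) != y_test))
```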
22 Boosting the Error?
What if we can only find a predictor with relatively high excess error ε?
(We can always find a predictor with error 1/2, e.g. by guessing at random.)
What if we have an algorithm A(·) that, for any source distribution D s.t. inf_h L_D(h) = 0, finds L_D(A(S)) ≤ 1/2 - γ?
Can we use A(·) to find a predictor with arbitrarily low error?
23 Example: Weak Learning with a Weak Class
X = R², H = axis-aligned rectangles.
Decision stumps: B = { x -> [s·x_i < θ] : i in {1,2}, s = ±1, θ in R }.
Claim: for any D, if there is h in H with L_D(h) = 0, then there is h in B with L_D(h) ≤ 3/7 < 1/2.
Since VCdim(B) = 3, with m = m_VC(D = 3, ε = 0.001, δ = 0.9): w.p. ≥ 0.1 over S ~ D^m, L_D(ERM_B(S)) < 0.43.
Conclusion: ERM_B(·) is a weak learner for H with ε = 0.43 < 0.5 and δ = 0.9 < 1.
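A small sketch (my own illustration; the function name and data are made up) of ERM over the stump class B: exhaustive search over coordinate, sign, and thresholds between consecutive data values. On data labeled by an axis-aligned rectangle, the best stump's error is noticeably better than 1/2 even though it is far from 0.

```python
import numpy as np

def erm_decision_stump(X, y):
    """ERM over B = { x -> 1[s * x_i < theta] : i in {1,2}, s = +-1, theta in R }."""
    m, d = X.shape
    best_err, best_stump = np.inf, None
    for i in range(d):
        for s in (+1, -1):
            z = s * X[:, i]
            zs = np.sort(z)
            candidates = np.concatenate(([zs[0] - 1.0], (zs[:-1] + zs[1:]) / 2, [zs[-1] + 1.0]))
            for theta in candidates:
                err = np.mean((z < theta).astype(int) != y)
                if err < best_err:
                    best_err, best_stump = err, (i, s, theta)
    i, s, theta = best_stump
    return (lambda Xq: (s * Xq[:, i] < theta).astype(int)), best_err

# Illustrative usage: labels given by an axis-aligned rectangle (realizable for H, not for B).
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 2))
y = ((np.abs(X[:, 0]) < 0.5) & (np.abs(X[:, 1]) < 0.3)).astype(int)
h, err = erm_decision_stump(X, y)
print("empirical error of best stump:", err)   # well below 1/2, though far from 0
```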