VC dimension and Model Selection
1 VC dimension and Model Selection
2 Overview
- PAC model: review
- VC dimension: definition, examples
- Sample complexity: lower bound, upper bound
- Model selection
3 PAC model: Setting
- A distribution: $D$ (unknown)
- Target function: $c_t$ from $C$, $c_t: X \to \{0,1\}$
- Hypothesis: $h$ from $H$, $h: X \to \{0,1\}$
- Error probability: $error(h) = \Pr_D[h(x) \neq c_t(x)]$
- Oracle: $EX(c_t, D)$
4 PAC model: Definition
$C$ and $H$ are concept classes over $X$. $C$ is PAC learnable by $H$ if there exists an algorithm $A$ such that:
- for any distribution $D$ over $X$ and any $c_t$ in $C$,
- for every input $\varepsilon$ and $\delta$,
- $A$ outputs a hypothesis $h$ in $H$, while having access to $EX(c_t, D)$,
- such that with probability $1-\delta$ we have $error(h) < \varepsilon$.
Complexities: sample size, running time.
5 PAC model: last week
For a finite hypothesis class $H$, sample size $m$:
- Realizable case: $m > \frac{1}{\varepsilon} \ln\frac{|H|}{\delta}$
- Non-realizable case: $m > \frac{2}{\varepsilon^2} \ln\frac{2|H|}{\delta}$
Impossibility (lower bound) results:
- $m > \frac{1}{2} \log|H|$
- $m > \frac{1}{4\varepsilon} \ln\frac{1}{\delta}$
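As a quick numerical sanity check (not from the slides), here is a short Python sketch that plugs values into the two finite-class bounds above; the function name `sample_size_finite` is ours:

```python
import math

def sample_size_finite(H_size, eps, delta, realizable=True):
    """Sample size sufficient for a finite class of size |H| = H_size."""
    if realizable:
        # m > (1/eps) * ln(|H| / delta)
        return math.ceil(math.log(H_size / delta) / eps)
    # m > (2/eps^2) * ln(2|H| / delta)
    return math.ceil(2.0 * math.log(2.0 * H_size / delta) / eps ** 2)

print(sample_size_finite(2 ** 10, eps=0.1, delta=0.05))                     # ~100
print(sample_size_finite(2 ** 10, eps=0.1, delta=0.05, realizable=False))   # ~2125
```

Note the characteristic gap: moving from the realizable to the non-realizable case replaces the $1/\varepsilon$ dependence with $1/\varepsilon^2$.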
6 VC dimension: motivation
Infinite hypothesis classes: thresholds, rectangles.
TODAY: the general VC dimension, which applies both to the realizable and the non-realizable case.
7 VC dimension: definition
Notation: $C$ - concept class; $S$ - sample.
- Projection: $\Pi_C(S) = \{ c|_S : c \in C \}$
- Shattering: $C$ shatters $S$ if $|\Pi_C(S)| = 2^{|S|}$
- VC dimension: the size of the largest shattered $S$, i.e. $\max\{ d : \exists S,\ |S| = d,\ |\Pi_C(S)| = 2^{|S|} \}$
- If there is no maximum, the VC dimension is infinite: for every $d$ there is a shattered set of size $d$.
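For small finite cases these definitions translate directly into a brute-force check; a minimal sketch (helper names are ours, assuming concepts are given as Python callables):

```python
def projection(concepts, S):
    """Pi_C(S): the set of distinct labelings the concept class induces on S."""
    return {tuple(c(x) for x in S) for c in concepts}

def shatters(concepts, S):
    """C shatters S iff the projection realizes all 2^|S| labelings."""
    return len(projection(concepts, S)) == 2 ** len(S)
```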
8 VC dimension: Threshold
$c_\theta(x) = I(x \geq \theta)$
- VC-dim $\geq 1$: for $S = \{0.5\}$ both labelings are realizable ($c = 1$ and $c = 0$).
- VC-dim $< 2$: for $S = \{z_1, z_2\}$, assume $z_1 < z_2$; the labeling $c(z_1) = 1$, $c(z_2) = 0$ is not realizable.
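Reusing the `shatters` helper above on thresholds (an illustrative sketch of ours, not part of the slides; a finite grid of $\theta$ values suffices for points in $[0,1]$):

```python
# c_theta(x) = I(x >= theta), sampled on a grid of thresholds.
thresholds = [lambda x, t=t: int(x >= t) for t in [i / 100 for i in range(101)]]

print(shatters(thresholds, [0.5]))         # True  -> VC-dim >= 1
print(shatters(thresholds, [0.25, 0.75]))  # False -> labeling (1, 0) is unrealizable
```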
9 VC dimension: union of intervals
Unions of finitely many (but unboundedly many) intervals on $[0,1]$: any $d$ points can be shattered, by covering exactly the positive points with small intervals. VC-dim = infinity.
10 VC dimension: convex polygon
Convex polygons (with any number of vertices): any $d$ points in convex position (e.g., on a circle) can be shattered, by taking the positive points as the vertices. VC dimension = infinity.
11 VC dimension: hyperplane
$c_{w,\theta}(x) = \mathrm{sign}\left(\sum_i w_i x_i + \theta\right)$
VC dimension $\geq d+1$: take $S = \{0, e_1, \ldots, e_d\}$.
Given a labeling $L \in \{-1,+1\}^{d+1}$, define $c_{w,\theta}$ by $w_i = L(e_i)$ and $\theta = L(0)/2$.
Then $c_{w,\theta}(0) = \mathrm{sign}(\theta)$ and $c_{w,\theta}(e_i) = \mathrm{sign}(w_i + \theta)$, which realize the labeling.
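The construction on this slide can be verified mechanically; a small numpy sketch (variable names are ours) that realizes every labeling of $S = \{0, e_1, \ldots, e_d\}$:

```python
import numpy as np
from itertools import product

d = 3
S = np.vstack([np.zeros(d), np.eye(d)])  # the d+1 points: 0, e_1, ..., e_d

for L in product([-1, 1], repeat=d + 1):
    w = np.array(L[1:], dtype=float)  # w_i = L(e_i)
    theta = L[0] / 2.0                # sign(theta) = L(0)
    got = [int(np.sign(w @ x + theta)) for x in S]
    assert got == list(L), (L, got)
print("all", 2 ** (d + 1), "labelings realized -> VC-dim >= d+1")
```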
12 VC dimension: hyperplane
VC dimension $< d+2$, by contradiction: assume there is a shattered set $S$ with $|S| = d+2$.
Radon's Theorem: any $S \subseteq R^d$ with $|S| \geq d+2$ has a partition into $S'$ and $S \setminus S'$ such that $\mathrm{conv}(S') \cap \mathrm{conv}(S \setminus S') \neq \emptyset$.
Take such a partition; let $S'$ be labeled positive and $S \setminus S'$ negative. Since $S$ is shattered, let $c_{w,\theta}$ be a separating hyperplane, and let POS and NEG be its positive and negative halfspaces.
13 VC dimension: hyperplane
$\mathrm{conv}(S') \subseteq \mathrm{POS}$ and $\mathrm{conv}(S \setminus S') \subseteq \mathrm{NEG}$, since halfspaces are closed under convex combinations.
By Radon's Theorem, $\mathrm{conv}(S') \cap \mathrm{conv}(S \setminus S') \neq \emptyset$. However, $\mathrm{POS} \cap \mathrm{NEG} = \emptyset$. Contradiction!
There is no such set $S$, so VC-dim $< d+2$. QED
14 VC dim: Sample lower bound
Theorem: if VC-dim$(C) = d+1$, then any PAC learner needs $m \geq \frac{d}{16\varepsilon}$ samples.
Proof: let $\{z_0, z_1, \ldots, z_d\}$ be a shattered set and define
$D(x) = 1 - 8\varepsilon$ for $x = z_0$; $D(x) = \frac{8\varepsilon}{d}$ for $x = z_i$, $i \geq 1$; $0$ otherwise.
Target function: $c_t(z_0) = 1$; $c_t(z_i) = 0$ or $1$, each with probability $\frac{1}{2}$.
Let $\mathrm{RARE} = \{z_1, \ldots, z_d\}$. Assume $|S \cap \mathrm{RARE}| \leq \frac{d}{2}$; then $|\mathrm{UNSEEN}| \geq \frac{d}{2}$ and
$\Pr[\mathrm{error}] \geq \frac{1}{2} \cdot \frac{8\varepsilon}{d} \cdot |\mathrm{UNSEEN}| \geq 2\varepsilon$.
15 VC dim: Sample lower bound
$E[|S \cap \mathrm{RARE}|] = 8\varepsilon m \leq \frac{d}{2}$ for $m \leq \frac{d}{16\varepsilon}$.
Hence (by Markov's inequality, up to constants) with probability at least $\frac{1}{2}$ the sample misses at least $\frac{d}{2}$ of the rare points, and then the error is at least $2\varepsilon$. QED
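A Monte Carlo illustration of this argument (parameter choices are ours): at the threshold sample size $m = d/(16\varepsilon)$, roughly half of the rare points go unseen, leaving error around $2\varepsilon$ in expectation.

```python
import random

d, eps = 200, 0.05
m = int(d / (16 * eps))                       # the lower-bound sample size
weights = [1 - 8 * eps] + [8 * eps / d] * d   # D(z_0), then D(z_1), ..., D(z_d)

sample = random.choices(range(d + 1), weights=weights, k=m)
unseen = d - len(set(sample) - {0})           # rare points never drawn
print(f"m={m}, unseen rare points: {unseen} of {d}")
# Each unseen rare point contributes (1/2) * (8*eps/d) expected error:
print("expected error from unseen points:", 0.5 * (8 * eps / d) * unseen)
```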
16 VC dim: sample upper bound
Incorrect proof: for a sample $S$, $C_S = \Pi_C(S)$ is finite, so use the finite class bound $m \geq \frac{1}{\varepsilon} \log\frac{|\Pi_C(S)|}{\delta}$.
Problem: $S$ itself defines $C_S = \Pi_C(S)$.
Solution: take $2m$ points, $S = S_1 \cup S_2$, and use the randomization in the split into $S_1$ and $S_2$.
Benefit: we can work with $\Pi_C(S)$.
17 VC dim: sample upper bound
Bad concepts: $\mathrm{Bad} = \{h : error(h) > \varepsilon\}$.
Hitting set $S$: for every $h$ in Bad there exists $x$ in $S$ with $c_t(x) \neq h(x)$.
Goal: compute the probability of $S$ being a hitting set.
Event A: $S_1$ is not a hitting set, i.e., there exists $h$ in Bad which is consistent with $S_1$. $\Pr[A] \leq$ ???
Event B: there exists $h$ in Bad that is consistent with $S_1$ and has $\geq \varepsilon m$ errors on $S_2$.
18 VC dim: sample upper bound
$\Pr[B] = \Pr[B \mid A] \Pr[A]$, since B implies A. To bound $\Pr[B \mid A]$, fix such an $h$: its expected number of errors on $S_2$ is $\geq \varepsilon m$, so with probability at least $\frac{1}{2}$ it has $\geq \varepsilon m$ errors on $S_2$.
Result: $2\Pr[B] \geq \Pr[A]$.
Let $F = \Pi_C(S_1 \cup S_2)$. Fix $h \in F$ that is consistent with $S_1$ and has $\geq \varepsilon m$ errors on $S_2$; let $l$ be its number of errors. Compute the probability over the random partition into $S_1$ and $S_2$.
19 VC dim: sample upper bound
For $h$ to be consistent on $S_1$, all $l$ error points must fall in $S_2$; the probability of this over a random partition is
$\frac{\binom{m}{l}}{\binom{2m}{l}} = \prod_{i=0}^{l-1} \frac{m-i}{2m-i} \leq 2^{-l}$.
Bounding the probabilities with a union bound over $h \in F$:
$\Pr[B] \leq |F| \, 2^{-\varepsilon m}$, so $\Pr[A] \leq 2\Pr[B] \leq 2|F| \, 2^{-\varepsilon m}$.
20 VC dim: sample upper bound
High confidence: require $2|F| \, 2^{-\varepsilon m} \leq \delta$, i.e., $m \geq \frac{1}{\varepsilon} \log\frac{2|F|}{\delta}$.
Need to bound $|F| = |\Pi_C(S_1 \cup S_2)|$.
Sauer-Shelah Lemma: if VC-dim$(C) = d$ and $|S| = 2m$, then $|\Pi_C(S)| \leq \sum_{i=0}^{d} \binom{2m}{i}$.
Bound: $\leq 2^{2m}$ for $m \leq d$; $\leq 2(2m)^d$ for $m > d$.
21 VC dim: Sampling Theorem
Sample bound: $m \geq \frac{1}{\varepsilon} \log\frac{4(2m)^d}{\delta}$, i.e., $m \geq \frac{1}{\varepsilon} \log\frac{4}{\delta} + \frac{d}{\varepsilon} \log 2m$, which gives
$m = O\left(\frac{1}{\varepsilon} \log\frac{1}{\delta} + \frac{d}{\varepsilon} \log\frac{d}{\varepsilon}\right)$ in the realizable case (this is the proof methodology above).
Non-realizable: $m = O\left(\frac{1}{\varepsilon^2} \log\frac{1}{\delta} + \frac{d}{\varepsilon^2} \log\frac{d}{\varepsilon}\right)$.
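Since the bound contains $m$ on both sides, in practice one can solve it numerically; a small fixed-point sketch (function name is ours, using $\log_2$ as in the derivation):

```python
import math

def vc_sample_bound(d, eps, delta, iters=200):
    """Iterate m <- (1/eps) * log2(4 * (2m)^d / delta) to a fixed point."""
    m = 1.0
    for _ in range(iters):
        m = math.log2(4 * (2 * m) ** d / delta) / eps
    return math.ceil(m)

print(vc_sample_bound(d=10, eps=0.1, delta=0.05))  # on the order of (d/eps) * log(d/eps)
```

The iteration is a contraction near its fixed point because the right-hand side grows only logarithmically in $m$.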
22 Rademacher Complexity
Motivation: tighter, distribution-dependent bounds.
Notation: $f: X \to \{-1, +1\}$, $f \in F$; Rademacher variables $\sigma_i$ with $\Pr[\sigma_i = +1] = \Pr[\sigma_i = -1] = \frac{1}{2}$.
23 Rademacher Complexity
Definition (Rademacher complexity): for a sample $S$ of size $m$,
$R_S(F) = E_\sigma\left[\max_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i)\right]$ and $R_D(F) = E_S[R_S(F)]$.
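$R_S(F)$ can be estimated by Monte Carlo over random sign vectors; a sketch under the assumption that $F$ is a finite list of $\pm 1$-valued callables (names are ours):

```python
import random

def empirical_rademacher(F, xs, trials=2000):
    """Monte Carlo estimate of R_S(F) = E_sigma[max_f (1/m) sum_i sigma_i f(x_i)]."""
    m, total = len(xs), 0.0
    for _ in range(trials):
        sigma = [random.choice([-1, 1]) for _ in range(m)]
        total += max(sum(s * f(x) for s, x in zip(sigma, xs)) / m for f in F)
    return total / trials

# Thresholds as +/-1 functions on a random sample from [0, 1]:
F = [lambda x, t=t: 1 if x >= t else -1 for t in [i / 20 for i in range(21)]]
xs = [random.random() for _ in range(50)]
print(empirical_rademacher(F, xs))  # roughly of order sqrt(log|F| / m)
```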
24 Rademacher Complexity: expected overfitting
Theorem (expected overfitting): $E_S\left[\max_{f \in F} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - E_D[f(x)] \right)\right] \leq 2 R_D(F)$.
Proof: two-sample trick; add a ghost sample $S'$ with $E_D[f(x)] = E_{S'}\left[\frac{1}{m} \sum_{i=1}^{m} f(x_i')\right]$. Since the max of an expectation is at most the expectation of the max, the left-hand side is at most
$E_{S,S'}\left[\max_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - f(x_i') \right)\right]$.
25 Rademacher Complexity: expected overfitting
Since swapping $x_i$ and $x_i'$ does not change the distribution, this equals
$E_{S,S',\sigma}\left[\max_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i \left( f(x_i) - f(x_i') \right)\right]
\leq E_{S,\sigma}\left[\max_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i)\right] + E_{S',\sigma}\left[\max_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i')\right] = 2 R_D(F)$,
using that $-\sigma_i$ has the same distribution as $\sigma_i$. QED
26 Rademacher Theorem
With probability $1-\delta$, for every $h \in H$:
$\varepsilon(h) \leq \hat{\varepsilon}(h) + R_D(H) + \sqrt{\frac{\ln(2/\delta)}{2m}} \leq \hat{\varepsilon}(h) + R_S(H) + 3\sqrt{\frac{\ln(2/\delta)}{2m}}$.
27 Model selection - Outline
- Motivation
- Overfitting
- Structural Risk Minimization
- Hypothesis Validation
28 Motivation
Problems: we have too few examples, or a very rich hypothesis class. How can we find the best hypothesis?
Alternatively: usually we choose the hypothesis class. How rich a class do we want? How should we go about choosing it?
29 Overfitting
Concept class: intervals on a line. It can classify any training set perfectly, so it achieves zero training error. Is that the only goal?!
30 Overfitting: Intervals
With enough intervals we can always get zero training error! But are we really interested in zero training error?!
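A concrete sketch of the failure mode (the construction is ours): memorize every positive training point with a tiny interval, which gives zero training error but near-chance test error on noisy data.

```python
import random

def memorize_intervals(train):
    """One tiny interval around each positive training point: zero training error."""
    return [(x - 1e-9, x + 1e-9) for x, y in train if y == 1]

def predict(intervals, x):
    return int(any(a <= x <= b for a, b in intervals))

random.seed(0)
def noisy_label(x):          # simple concept I(x < 0.5) with 20% label noise
    y = int(x < 0.5)
    return y if random.random() > 0.2 else 1 - y

data = [(x, noisy_label(x)) for x in (random.random() for _ in range(200))]
train, test = data[:100], data[100:]
h = memorize_intervals(train)
for name, S in [("train", train), ("test", test)]:
    print(name, sum(predict(h, x) != y for x, y in S) / len(S))  # 0.0 vs ~0.5
```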
31 Overfitting: Intervals
[Plot: training errors as a function of the number of intervals.]
32 Overfitting
Two indistinguishable explanations of the same data: a simple concept plus noise, or a very complex concept. With an insufficient number of examples plus noise (noise rate 1/3 in the illustration), we cannot tell them apart.
33 Model Selection
[Plot: training error, generalization error, and complexity penalty as a function of model complexity.]
34 Theoretical Model
Nested hypothesis classes: $H_1 \subseteq H_2 \subseteq H_3 \subseteq \cdots \subseteq H_i \subseteq \cdots$
There is a target function $c_t(x)$; the setting is non-realizable.
True errors: $\varepsilon(h) = \Pr[h \neq c_t]$; $\varepsilon_i = \inf_{h \in H_i} \varepsilon(h)$; $\varepsilon(h^*) = \inf_i \varepsilon_i$, where $h^*$ is the best hypothesis.
Training error: $\hat{\varepsilon}(h) = \frac{1}{m} \sum_{i=1}^{m} I[h(x_i) \neq c_t(x_i)]$; $\hat{\varepsilon}_i = \inf_{h \in H_i} \hat{\varepsilon}(h)$.
35 Theoretical Model
Complexity of $h$: $d(h) = \min \{ i : h \in H_i \}$.
Penalty based: add a penalty for $d(h)$, and choose the hypothesis which minimizes $\hat{\varepsilon}(h) + \mathrm{penalty}(h)$.
36 Structural Risk Minimization
Parameters $\lambda_i$ and $\delta_i$ such that $\Pr[\exists h \in H_i : |\varepsilon(h) - \hat{\varepsilon}(h)| > \lambda_i] \leq \delta_i$ and $\sum_i \delta_i = \delta$ (e.g., $\delta_i = \delta / 2^i$).
Implies: with probability $1-\delta$, for every $h \in H$, $|\varepsilon(h) - \hat{\varepsilon}(h)| \leq \lambda_{d(h)}$.
37 Structural Risk Minimization
Setting the penalty: $\mathrm{penalty}(h) = \lambda_{d(h)}$.
- Finite $H_i$: $\lambda_i = \sqrt{\frac{\log(|H_i| / \delta_i)}{m}}$
- VC-dim$(H_i) = i$: $\lambda_i = \sqrt{\frac{i \log(i / \delta_i)}{m}}$ (up to constants)
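Putting SRM together as code: a sketch under our own assumptions (classes given as finite lists of callables, the finite-class $\lambda_i$ from this slide, and $\delta_i = \delta/2^i$):

```python
import math

def srm_select(classes, train, delta):
    """Pick the h minimizing training error + lambda_{d(h)} (SRM sketch)."""
    m = len(train)
    best, best_score = None, float("inf")
    for i, H in enumerate(classes, start=1):
        lam = math.sqrt(math.log(len(H) / (delta / 2 ** i)) / m)  # lambda_i
        for h in H:
            score = sum(h(x) != y for x, y in train) / m + lam    # eps_hat + penalty
            if score < best_score:
                best, best_score = h, score
    return best
```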
38 SRM: Performance
THEOREM: let $h^*$ be the best hypothesis and $g_{srm}$ the SRM choice. With probability $1-\delta$:
$\varepsilon(h^*) \leq \varepsilon(g_{srm}) \leq \varepsilon(h^*) + 2\,\mathrm{penalty}(h^*)$.
Note: the bound depends only on $h^*$.
39 Proof
Bounding the error in each $H_i$:
$\Pr[\varepsilon(g_{srm}) - \hat{\varepsilon}(g_{srm}) > \lambda_{srm}] \leq \Pr[\exists h \in H_{d(g_{srm})} : \varepsilon(h) - \hat{\varepsilon}(h) > \lambda_{srm}] \leq \delta_{srm}$.
Bounding the error across the $H_i$ (using that $g_{srm}$ minimizes $\hat{\varepsilon}(h) + \lambda_{d(h)}$):
$\varepsilon(g_{srm}) \leq \hat{\varepsilon}(g_{srm}) + \lambda_{srm} \leq \hat{\varepsilon}(h^*) + \lambda_{d(h^*)} \leq \varepsilon(h^*) + 2\lambda_{d(h^*)}$. QED
40 Hypothesis Validation
Separate the sample into a training part and a selection part.
Using the training part, select from each $H_i$ a candidate $g_i$; using the selection sample, select between $g_1, \ldots, g_m$.
The split: $(1-\gamma)m$ examples for the training set and $\gamma m$ for the selection set.
41 Hypothesis Validation: Algorithm
Using $(1-\gamma)m$ examples $S_1$: let $\hat{\varepsilon}_1(h)$ be the error on $S_1$; set $g_i = \arg\min_{h \in H_i} \hat{\varepsilon}_1(h)$.
Using $\gamma m$ examples $S_2$: let $\hat{\varepsilon}_2(h)$ be the error on $S_2$; set $g_{HV} = \arg\min_{g_i \in G} \hat{\varepsilon}_2(g_i)$.
Return $g_{HV}$.
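The same algorithm as a code sketch (names are ours; each $H_i$ is assumed to be a finite list of callables):

```python
def hypothesis_validation(classes, sample, gamma):
    """Train candidates g_i on S1, then pick g_HV on the selection sample S2."""
    split = int((1 - gamma) * len(sample))
    S1, S2 = sample[:split], sample[split:]
    err = lambda h, S: sum(h(x) != y for x, y in S) / len(S)
    candidates = [min(H, key=lambda h: err(h, S1)) for H in classes]  # the g_i
    return min(candidates, key=lambda g: err(g, S2))                  # g_HV
```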
42 Hypo. Validation: Performance
Errors: $\varepsilon_{hv}(m)$ = error of HV using $m$ examples; $\varepsilon_A(m)$ = error of any algorithm $A$ using $m$ examples.
The only restriction on $A$ is that it selects $g_i$ from $H_i$ (for example, using any penalty function).
Theorem: with probability $1-\delta$,
$\varepsilon_{hv}(m) \leq \varepsilon_A((1-\gamma)m) + 2\sqrt{\frac{\ln(2m/\delta)}{\gamma m}}$.
43 Hypo. Validation: Analysis
$\Pr[|\varepsilon(g_i) - \hat{\varepsilon}_2(g_i)| > \lambda] \leq 2 e^{-\lambda^2 \gamma m}$
$\Pr[\exists i : |\varepsilon(g_i) - \hat{\varepsilon}_2(g_i)| > \lambda] \leq 2|G| e^{-\lambda^2 \gamma m} = \delta$.
Since $|G| \leq m$: $\lambda = \sqrt{\frac{\ln(2m/\delta)}{\gamma m}}$.
For any $g_i$: $\varepsilon(g_{HV}) \leq \hat{\varepsilon}_2(g_{HV}) + \lambda \leq \hat{\varepsilon}_2(g_i) + \lambda \leq \varepsilon(g_i) + 2\lambda$.
44 Summary
- PAC model
- Generalization bounds: Empirical Risk Minimization, VC dimension, Rademacher complexity
- Model selection: Structural Risk Minimization (SRM), hypothesis validation