Introduction to Machine Learning
Introduction to Machine Learning
Machine Learning: Jordan Boyd-Graber, University of Maryland
RADEMACHER COMPLEXITY
Slides adapted from Rob Schapire
Recap
Rademacher complexity provides nice guarantees:
    R(h) \le \hat{R}(h) + \mathfrak{R}_m(H) + \sqrt{\frac{\log(1/\delta)}{2m}}    (1)
But in practice it is hard to compute for real hypothesis classes. Is there a relationship with simpler combinatorial measures?
Growth Function
Define the growth function \Pi_H : \mathbb{N} \to \mathbb{N} for a hypothesis set H as:
    \forall m \in \mathbb{N}, \quad \Pi_H(m) \equiv \max_{\{x_1, \dots, x_m\} \subseteq X} \left|\{ (h(x_1), \dots, h(x_m)) : h \in H \}\right|    (2)
i.e., the maximum number of ways m points can be classified using H.
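The growth function for a small class can be computed by brute force. A minimal Python sketch, using one-sided threshold classifiers on the line as an illustrative hypothesis set (the function names and the choice of class are mine, not from the slides):

```python
def growth_function(hypotheses, points):
    """Count distinct labelings of `points` realized by `hypotheses`.

    Each hypothesis is a function x -> {-1, +1}.  This computes
    |{(h(x_1), ..., h(x_m)) : h in H}| for one fixed sample, i.e.
    the quantity inside the max in the growth function definition.
    """
    return len({tuple(h(x) for x in points) for h in hypotheses})

# Thresholds on the line: h_t(x) = +1 iff x >= t.  On m distinct points
# only m + 1 behaviors are possible, far fewer than the 2^m labelings.
points = [0.5, 1.5, 2.5, 3.5]
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]  # one threshold per "gap"
hypotheses = [lambda x, t=t: 1 if x >= t else -1 for t in thresholds]

print(growth_function(hypotheses, points))  # 5, i.e. m + 1
```

Note the `t=t` default argument, which freezes each threshold at lambda-creation time; without it every lambda would close over the same loop variable.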
Rademacher Complexity vs. Growth Function
If G is a function class taking values in \{-1, +1\}, then
    \mathfrak{R}_m(G) \le \sqrt{\frac{2 \ln \Pi_G(m)}{m}}    (3)
The proof uses Massart's lemma.
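This bound can be checked numerically. The sketch below Monte-Carlo-estimates the empirical Rademacher complexity of one-sided thresholds on a sample of 50 points (each hypothesis represented by its label vector, so \Pi_G(m) = m + 1 here) and compares it to the growth-function bound. The estimator and the threshold class are illustrative choices of mine:

```python
import math
import random

def empirical_rademacher(label_vectors, m, trials=500, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity:
    average over random sign vectors sigma of
    sup_h (1/m) * sum_i sigma_i h(x_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * v for s, v in zip(sigma, vec))
                     for vec in label_vectors) / m
    return total / trials

m = 50
# One-sided thresholds on m sorted points realize exactly m + 1 label vectors.
label_vectors = [[-1] * k + [1] * (m - k) for k in range(m + 1)]

estimate = empirical_rademacher(label_vectors, m)
bound = math.sqrt(2 * math.log(len(label_vectors)) / m)  # sqrt(2 ln Pi_G(m) / m)
print(f"estimate {estimate:.3f} <= bound {bound:.3f}")
```

For thresholds the estimate comes out well under the bound, which is expected: Massart's lemma only uses the number of distinct behaviors, not their nested structure.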
Vapnik-Chervonenkis Dimension
    VC(H) \equiv \max \{ m : \Pi_H(m) = 2^m \}    (4)
The size of the largest set that can be fully shattered by H.
VC Dimension for Hypotheses
To establish the VC dimension we need upper and lower bounds:
Lower bound: exhibit an example of d points that H shatters
Upper bound: prove that no set of d + 1 points can be shattered by H (harder)
Intervals
What is the VC dimension of [a, b] intervals on the real line?
What about two points? Two points can be perfectly classified, so the VC dimension is at least 2.
What about three points? No set of three points can be shattered: on x_1 < x_2 < x_3, no single interval can realize the labeling +, -, +.
Thus, the VC dimension of intervals is 2.
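The two-points-yes, three-points-no argument can be verified exhaustively. A sketch (the helper names are mine); it relies on the observation that it suffices to try interval endpoints at, between, and outside the sample points:

```python
from itertools import product

def interval_labels(a, b, points):
    """Label each point +1 if it falls inside [a, b], else -1."""
    return tuple(1 if a <= x <= b else -1 for x in points)

def shatters(points):
    """Can [a, b] intervals realize all 2^m labelings of `points`?"""
    xs = sorted(points)
    cands = ([xs[0] - 1.0, xs[-1] + 1.0] + xs
             + [(p + q) / 2 for p, q in zip(xs, xs[1:])])
    realized = {interval_labels(a, b, points)
                for a, b in product(cands, repeat=2) if a <= b}
    return len(realized) == 2 ** len(points)

print(shatters([1.0, 2.0]))       # True: two points can be shattered
print(shatters([1.0, 2.0, 3.0]))  # False: the labeling +, -, + is unreachable
```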
Sine Functions
Consider hypotheses that classify points on a line as either being above or below a sine wave:
    \{ t \mapsto \operatorname{sgn}(\sin(\omega t)) : \omega \in \mathbb{R} \}    (5)
Can you shatter three points? What about four points? How many points can you shatter?
Suitably chosen sets of any finite size can be shattered; thus, the VC dimension of sine functions on the line is infinite.
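One way to see the shattering concretely is a brute-force frequency scan: for a "generic" (rationally independent) triple of points, sweeping \omega eventually produces every sign pattern of \sin(\omega t). A sketch, with my own helper name and an illustrative choice of points:

```python
import math

def sine_sign_patterns(points, omega_max=500.0, step=1e-3):
    """Scan frequencies omega and collect the sign patterns of
    sin(omega * x) on the sample; stop once every pattern is seen."""
    seen, target = set(), 2 ** len(points)
    for k in range(1, int(omega_max / step)):
        omega = k * step
        seen.add(tuple(1 if math.sin(omega * x) > 0 else -1 for x in points))
        if len(seen) == target:
            break
    return seen

# Rationally independent points keep the scan from getting stuck in a cycle.
pts = [1.0, math.sqrt(2), math.sqrt(5)]
print(len(sine_sign_patterns(pts)))  # all 2^3 = 8 patterns appear
```

The same scan succeeds for any number of such points (with a longer sweep), which is the brute-force counterpart to the classical proof that this class shatters arbitrarily large sets.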
Connecting VC with Growth Function
VC dimension obviously encodes the complexity of a hypothesis class, but we want to connect it to Rademacher complexity and the growth function so we can prove generalization bounds.
Theorem (Sauer's Lemma). Let H be a hypothesis set with VC dimension d. Then
    \forall m \in \mathbb{N}, \quad \Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \equiv \Phi_d(m)    (6)
This is good because each term, multiplied out, is \binom{m}{i} = \frac{m(m-1)\cdots(m-i+1)}{i!} = O(m^i), so the sum is O(m^d). When we plug this into the learning error bounds: \log(\Pi_H(2m)) = \log(O(m^d)) = O(d \log m).
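Sauer's lemma can be checked numerically on the intervals class (VC dimension 2): the positives must form a contiguous block, which makes the growth function easy to count, and for intervals the bound actually holds with equality. A sketch with my own function names:

```python
from math import comb

def interval_growth(m):
    """Distinct labelings of m collinear points by [a, b] intervals:
    the positives form a contiguous (possibly empty) block."""
    blocks = {(i, j) for i in range(m) for j in range(i, m)}  # nonempty blocks
    return len(blocks) + 1  # + 1 for the all-negative labeling

def phi(d, m):
    """Phi_d(m) = sum_{i=0}^{d} C(m, i), the Sauer bound."""
    return sum(comb(m, i) for i in range(d + 1))

for m in range(1, 10):
    assert interval_growth(m) <= phi(2, m)  # Sauer's lemma with d = 2
print([(interval_growth(m), phi(2, m)) for m in (3, 5, 8)])
```

Both sides equal m(m+1)/2 + 1 here, so intervals are a class where the growth function meets the Sauer bound exactly.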
Proof of Sauer's Lemma
Prelim: We'll proceed by induction. Our two base cases are:
If m = 0, \Pi_H(m) = 1: you have no data, so there's only one (degenerate) labeling.
If d = 0, \Pi_H(m) = 1: if you can't even shatter a single point, then H behaves as a fixed function.
Induction Step
Assume that the claim holds for all m', d' for which m' + d' < m + d. We are given H, |S| = m with S = \{x_1, \dots, x_m\}, and d is the VC dimension of H.
Build two new hypothesis spaces: H_1 is the restriction of H to the first m - 1 points, and H_2 encodes where the extended set has differences on the first m - 1 points, i.e., the labelings of \{x_1, \dots, x_{m-1}\} that H can extend to x_m in both ways.
What is the VC dimension of H_1 and H_2?
If a set is shattered by H_1, then it is also shattered by H:
    VC\text{-}\dim(H_1) \le VC\text{-}\dim(H) = d    (7)
If a set T is shattered by H_2, then T \cup \{x_m\} is shattered by H, since there are two hypotheses in H for every element of H_2 (obtained by adding x_m with either label):
    VC\text{-}\dim(H_2) \le d - 1    (8)
Bounding Growth Function
    \Pi_H(S) = |H_1| + |H_2|    (9)
    \le \sum_{i=0}^{d} \binom{m-1}{i} + \sum_{i=0}^{d-1} \binom{m-1}{i}    (10)
    = \sum_{i=0}^{d} \left[ \binom{m-1}{i} + \binom{m-1}{i-1} \right]    (11)
    = \sum_{i=0}^{d} \binom{m}{i}    (12)
    = \Phi_d(m)    (13)
We can rewrite (10) as (11) because \binom{x}{-1} = 0; step (12) is Pascal's rule (Pascal's triangle).
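The chain from line (10) to line (12) is pure combinatorics and can be spot-checked with `math.comb` (the variable names are mine):

```python
from math import comb

m, d = 20, 5
# Line (10): the two sums over C(m-1, i) bounding |H_1| + |H_2|.
line10 = (sum(comb(m - 1, i) for i in range(d + 1))
          + sum(comb(m - 1, i) for i in range(d)))
# Line (11): fold both into one sum, using C(m-1, -1) = 0 for the index shift.
line11 = sum(comb(m - 1, i) + (comb(m - 1, i - 1) if i > 0 else 0)
             for i in range(d + 1))
# Line (12): Pascal's rule, C(m-1, i) + C(m-1, i-1) = C(m, i).
line12 = sum(comb(m, i) for i in range(d + 1))
assert line10 == line11 == line12
print(line10)  # 21700 for m = 20, d = 5
```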
Wait a minute...
Is this combinatorial expression really O(m^d)?
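Yes. The standard estimate \Phi_d(m) \le (em/d)^d for m \ge d \ge 1 (a known bound, not derived on these slides) makes the polynomial growth explicit, and is easy to sanity-check:

```python
from math import comb, e

def phi(d, m):
    """Phi_d(m) = sum_{i=0}^{d} C(m, i), the Sauer bound."""
    return sum(comb(m, i) for i in range(d + 1))

# Phi_d(m) <= (e m / d)^d for m >= d >= 1, so the sum is O(m^d).
for d in (1, 2, 5):
    for m in (d, 2 * d, 10 * d, 100 * d):
        assert phi(d, m) <= (e * m / d) ** d
print("Phi_d(m) <= (e m / d)^d on all checked (d, m)")
```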
Generalization Bounds
Combining our previous generalization results with Sauer's lemma, we have that for a hypothesis class H with VC dimension d, for any \delta > 0, with probability at least 1 - \delta, for any h \in H:
    R(h) \le \hat{R}(h) + \sqrt{\frac{2d \log \frac{em}{d}}{m}} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}    (14)
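To get a feel for the bound's scale, here is a short sketch that plugs in concrete numbers (the function name and sample values are illustrative):

```python
import math

def vc_generalization_gap(d, m, delta):
    """The two slack terms of the VC generalization bound (14):
    sqrt(2 d log(e m / d) / m) + sqrt(log(1 / delta) / (2 m))."""
    complexity = math.sqrt(2 * d * math.log(math.e * m / d) / m)
    confidence = math.sqrt(math.log(1 / delta) / (2 * m))
    return complexity + confidence

# Intervals (d = 2) with 10,000 samples at 95% confidence:
print(round(vc_generalization_gap(2, 10_000, 0.05), 3))  # 0.074
```

So with ten thousand samples, the true risk of an interval classifier exceeds its empirical risk by at most about seven percentage points, with 95% probability.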
Whew!
We now have some theory down.
We're now going to see if we can find an algorithm that has good VC dimension and works well in practice... Support Vector Machines.
In class: more VC dimension examples.
More informationComputational Learning Theory for Artificial Neural Networks
Computational Learning Theory for Artificial Neural Networks Martin Anthony and Norman Biggs Department of Statistical and Mathematical Sciences, London School of Economics and Political Science, Houghton
More informationVC-dimension for characterizing classifiers
VC-dimension for characterizing classifiers Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 17: Stochastic Optimization Part II: Realizable vs Agnostic Rates Part III: Nearest Neighbor Classification Stochastic
More informationk-fold unions of low-dimensional concept classes
k-fold unions of low-dimensional concept classes David Eisenstat September 2009 Abstract We show that 2 is the minimum VC dimension of a concept class whose k-fold union has VC dimension Ω(k log k). Keywords:
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationOn the Sample Complexity of Noise-Tolerant Learning
On the Sample Complexity of Noise-Tolerant Learning Javed A. Aslam Department of Computer Science Dartmouth College Hanover, NH 03755 Scott E. Decatur Laboratory for Computer Science Massachusetts Institute
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationComputational Learning Theory
09s1: COMP9417 Machine Learning and Data Mining Computational Learning Theory May 20, 2009 Acknowledgement: Material derived from slides for the book Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997
More informationLecture 29: Computational Learning Theory
CS 710: Complexity Theory 5/4/2010 Lecture 29: Computational Learning Theory Instructor: Dieter van Melkebeek Scribe: Dmitri Svetlov and Jake Rosin Today we will provide a brief introduction to computational
More informationBITS F464: MACHINE LEARNING
BITS F464: MACHINE LEARNING Lecture-09: Concept Learning Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems Engineering, BITS Pilani, Rajasthan-333031 INDIA Jan
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 10 LEARNING THEORY For nothing ought to be posited without a reason given, unless it is self-evident or known by experience or proved by the authority of Sacred
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationOptimization Methods for Machine Learning (OMML)
Optimization Methods for Machine Learning (OMML) 2nd lecture (2 slots) Prof. L. Palagi 16/10/2014 1 What is (not) Data Mining? By Namwar Rizvi - Ad Hoc Query: ad Hoc queries just examines the current data
More informationName (NetID): (1 Point)
CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationLecture 25 of 42. PAC Learning, VC Dimension, and Mistake Bounds
Lecture 25 of 42 PAC Learning, VC Dimension, and Mistake Bounds Thursday, 15 March 2007 William H. Hsu, KSU http://www.kddresearch.org/courses/spring2007/cis732 Readings: Sections 7.4.17.4.3, 7.5.17.5.3,
More informationMachine Learning. Machine Learning: Jordan Boyd-Graber University of Maryland REINFORCEMENT LEARNING. Slides adapted from Tom Mitchell and Peter Abeel
Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland REINFORCEMENT LEARNING Slides adapted from Tom Mitchell and Peter Abeel Machine Learning: Jordan Boyd-Graber UMD Machine Learning
More information