12. Structural Risk Minimization. ECE 830 & CS 761, Spring 2016
1 12. Structural Risk Minimization, ECE 830 & CS 761, Spring 2016
2 General setup for statistical learning theory
We observe training examples $\{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathcal{X}$ are features and $y_i \in \mathcal{Y}$ are labels / responses.
Definition: predictor / classifier. A predictor is a function $f : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ is the feature space and $\mathcal{Y}$ is the label space.
Let $\mathcal{F}$ be a collection of predictors. We also have a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$; e.g., if $y$ is the true label and $\hat{y}$ is the predicted label, we measure $\ell(y, \hat{y}) \ge 0$. In this lecture we focus on $\ell(y, \hat{y}) = 1\{y \ne \hat{y}\}$ (the 0/1 loss).
Main assumption: $(x_i, y_i) \stackrel{iid}{\sim} P$, where $P$ is unknown.
3 Goal: Select an $f \in \mathcal{F}$ so that it minimizes the risk
$R(f) = \mathbb{E}_{(x,y) \sim P}[\ell(y, f(x))].$
So far we have focused on empirical risk minimization, where
$\hat{R}(f) := \frac{1}{n} \sum_{i=1}^n \ell(y_i, f(x_i))$
is the empirical risk. Then the empirical risk minimizer (ERM) is $\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}(f)$.
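To make the ERM concrete, here is a minimal sketch in Python. The synthetic data, the finite class of threshold classifiers, and all variable names are illustrative assumptions, not part of the lecture.

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, size=n)                    # features, X = [0, 1]
y = (x > 0.3).astype(int)                        # labels, Y = {0, 1}
y = y ^ (rng.random(n) < 0.1).astype(int)        # flip 10% of labels (label noise)

# A finite class F: threshold classifiers f_t(x) = 1{x > t}.
thresholds = np.linspace(0, 1, 21)

def empirical_risk(t):
    """R_hat(f_t): average 0/1 loss of the threshold classifier f_t on the sample."""
    return np.mean((x > t).astype(int) != y)

# The empirical risk minimizer (ERM) over F.
risks = np.array([empirical_risk(t) for t in thresholds])
t_hat = thresholds[risks.argmin()]
print(f"ERM threshold t = {t_hat:.2f}, empirical risk = {risks.min():.3f}")

The design point is simply that the ERM needs only the training sample; the true risk $R(f)$ under $P$ is never computed.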
4 ERM performance
What can be said about the ERM's performance?
Case 1: Finite sets of classifiers. We showed that, with probability at least $1 - \delta$, for all $f \in \mathcal{F}$,
$R(f) \le \hat{R}(f) + \sqrt{\tfrac{\log|\mathcal{F}| + \log(1/\delta)}{2n}},$
which leads to the bound
$\mathbb{E}[R(\hat{f})] - \min_{f \in \mathcal{F}} R(f) \lesssim \sqrt{\tfrac{\log|\mathcal{F}| + \log n}{n}}.$
5 ERM performance
What can be said about the ERM's performance?
Case 2: Classifiers with finite VC dimension. Let $S(\mathcal{F}, n)$ be the shatter coefficient of $\mathcal{F}$, the number of effectively different classifiers in $\mathcal{F}$ on $n$ training samples; the VC dimension is
$V_{\mathcal{F}} = \max\{k \ge 1 : S(\mathcal{F}, k) = 2^k\}$
and can be thought of as the largest number of examples that can be labeled arbitrarily by classifiers in $\mathcal{F}$. We showed, using Sauer's lemma $S(\mathcal{F}, n) \le (n+1)^{V_{\mathcal{F}}}$, that with probability at least $1 - \delta$
$R(f) \le \hat{R}(f) + \sqrt{\tfrac{V_{\mathcal{F}} \log n + \log(1/\delta)}{n}},$
which leads to the bound
$\mathbb{E}[R(\hat{f})] - \min_{f \in \mathcal{F}} R(f) \lesssim \sqrt{\tfrac{V_{\mathcal{F}} \log n}{n}}.$
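A brute-force sketch of the shatter coefficient for a simple class may help: for 1-D threshold classifiers $f_t(x) = 1\{x > t\}$ we can enumerate the distinct labelings realizable on a fixed set of points. The class, point set, and function names below are illustrative assumptions.

import numpy as np

def shatter_coefficient(points):
    """Number of distinct 0/1 labelings of `points` realizable by thresholds f_t(x) = 1{x > t}."""
    labelings = set()
    # Candidate thresholds: below, at, and above the sorted points.
    cuts = np.concatenate(([points.min() - 1], points, [points.max() + 1]))
    for t in cuts:
        labelings.add(tuple((points > t).astype(int)))
    return len(labelings)

pts = np.sort(np.random.default_rng(1).uniform(0, 1, size=5))
print(shatter_coefficient(pts))   # n + 1 = 6 labelings, far fewer than 2^5 = 32
# The VC dimension is the largest n with S(F, n) = 2^n; for thresholds that is 1.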
6 Example: Histogram classifiers
Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$ with 0/1 loss. Let $\mathcal{F}_k$ denote the set of histogram classifiers with $k$ bins. Note $|\mathcal{F}_k| = 2^k$. If we fix the number of bins to $m$, then we have with probability at least $1 - \delta$
$R(f) \le \hat{R}(f) + \sqrt{\tfrac{m \log 2 + \log(2/\delta)}{2n}}.$
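A minimal sketch of a histogram classifier, written for the one-dimensional case $\mathcal{X} = [0,1]$ for simplicity (the $d = 1$ restriction and all names here are assumptions for illustration): partition $[0,1]$ into $k$ equal bins and label each bin by majority vote.

import numpy as np

def fit_histogram(x, y, k):
    """Return per-bin labels for a k-bin histogram classifier on [0, 1]."""
    bins = np.minimum((x * k).astype(int), k - 1)     # bin index of each point
    labels = np.zeros(k, dtype=int)
    for b in range(k):
        in_bin = bins == b
        if in_bin.any():
            labels[b] = int(y[in_bin].mean() > 0.5)   # majority vote in bin b
    return labels

def predict_histogram(labels, x):
    """Predict with the fitted k-bin histogram rule."""
    k = len(labels)
    bins = np.minimum((x * k).astype(int), k - 1)
    return labels[bins]

Note that for 0/1 loss the per-bin majority vote is exactly the empirical risk minimizer within $\mathcal{F}_k$, since the bins partition the data and each bin's label can be chosen independently.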
7 Example: Histograms continued
Let $R^*$ be the Bayes risk: $R^* = \min_f R(f)$, where the minimization is over all classifiers, including those potentially not in $\mathcal{F}$ and based on the unknown distribution $P$. Then for the empirical risk minimizer over $\mathcal{F} = \mathcal{F}_m$ we have
$\mathbb{E}[R(\hat{f})] - R^* = \underbrace{\mathbb{E}[R(\hat{f})] - \min_{f \in \mathcal{F}} R(f)}_{\text{estimation error}} + \underbrace{\min_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation error}} \lesssim \sqrt{\tfrac{m \log n}{n}} + \underbrace{\min_{f \in \mathcal{F}} R(f) - R^*}_{\text{decreases with } m}$
How should we choose $m$? Choosing $m$ in advance based on $n$ means that we cannot optimally balance the two terms in the bound for all distributions $P$. We might consider $\mathcal{F} = \bigcup_{k \ge 1} \mathcal{F}_k$, but this set has $|\mathcal{F}| = \infty$ and $V_{\mathcal{F}} = \infty$.
8 Countably Infinite Sets of Classifiers
Suppose that $\mathcal{F}$ is a countable, possibly infinite, collection of candidate functions. Example: histogram classifiers, $\mathcal{F} = \bigcup_{k \ge 1} \mathcal{F}_k$.
Further suppose that we have some prior distribution $p$ over this set, so that $p(f) \ge 0$ for all $f \in \mathcal{F}$ and $\sum_{f \in \mathcal{F}} p(f) = 1$. This provides two advantages:
1. By choosing $p(f)$ larger for certain $f$, we can preferentially treat those candidates.
2. We do not need $\mathcal{F}$ to be finite; we only require $\sum_{f \in \mathcal{F}} p(f) = 1$.
9 Let $c(f) = -\log p(f)$; then we have $\sum_{f \in \mathcal{F}} e^{-c(f)} = 1$. The numbers $c(f)$ can be interpreted as $-\log$ of prior probabilities, as codelengths, or as measures of complexity.
10 Now recall Hoeffding's inequality. For each $f$ and every $\epsilon > 0$,
$P\big(R(f) - \hat{R}_n(f) \ge \epsilon\big) \le e^{-2n\epsilon^2},$
or, for every $\delta > 0$,
$P\Big(R(f) - \hat{R}_n(f) \ge \sqrt{\tfrac{\log(1/\delta)}{2n}}\Big) \le \delta.$
Suppose $\delta > 0$ is specified. Using the values $c(f)$ for $f \in \mathcal{F}$, define
$\delta(f) := \delta\, e^{-c(f)}.$
Then we have
$P\Big(R(f) - \hat{R}_n(f) \ge \sqrt{\tfrac{\log(1/\delta(f))}{2n}}\Big) \le \delta(f).$
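A small numeric sketch of the per-classifier confidence width used above, $\sqrt{\log(1/\delta(f))/(2n)}$ with $\delta(f) = \delta e^{-c(f)}$; the particular values of $n$, $\delta$, and $c(f)$ below are made up for illustration.

import math

def deviation_width(c_f, n, delta):
    """Hoeffding radius sqrt(log(1/delta(f)) / (2n)) with delta(f) = delta * exp(-c(f))."""
    delta_f = delta * math.exp(-c_f)
    return math.sqrt(math.log(1.0 / delta_f) / (2 * n))

n, delta = 1000, 0.05
for c_f in [0.0, 2.0, 10.0]:
    print(c_f, round(deviation_width(c_f, n, delta), 4))
# Larger c(f), i.e. smaller prior weight, gives a wider confidence interval.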
11 Furthermore, we can apply the union bound as follows:
$P\Big(\exists f \in \mathcal{F} : R(f) - \hat{R}_n(f) \ge \sqrt{\tfrac{\log(1/\delta(f))}{2n}}\Big) \le \sum_{f \in \mathcal{F}} P\Big(R(f) - \hat{R}_n(f) \ge \sqrt{\tfrac{\log(1/\delta(f))}{2n}}\Big) \le \sum_{f \in \mathcal{F}} \delta(f) = \sum_{f \in \mathcal{F}} e^{-c(f)}\, \delta = \delta.$
12 Summary
We have that for all $f \in \mathcal{F}$ and $\delta > 0$, with probability at least $1 - \delta$,
$R(f) \le \hat{R}_n(f) + \sqrt{\tfrac{\log(1/\delta(f))}{2n}} = \hat{R}_n(f) + \sqrt{\tfrac{c(f) + \log(1/\delta)}{2n}}.$
13 Example: Finite sets
Suppose $\mathcal{F}$ is finite and $c(f) = \log|\mathcal{F}|$ for all $f \in \mathcal{F}$ (this is a uniform prior). Then
$e^{-c(f)} = e^{-\log|\mathcal{F}|} = \tfrac{1}{|\mathcal{F}|}, \quad \sum_{f \in \mathcal{F}} e^{-c(f)} = 1, \quad \delta(f) = \tfrac{\delta}{|\mathcal{F}|},$
which implies that for all $f \in \mathcal{F}$, $|\mathcal{F}| < \infty$, and $\delta > 0$, with probability at least $1 - \delta$,
$R(f) \le \hat{R}(f) + \sqrt{\tfrac{\log|\mathcal{F}| + \log(1/\delta)}{2n}}.$
Note that this is precisely the PAC bound we derived in the previous lectures.
14 Example: Histogram Classifiers
$\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$. Let $\mathcal{F}_k$, $k = 1, 2, \ldots$ denote the collection of histogram classification rules with $k$ equal-volume bins, and let $\mathcal{F} = \bigcup_{k \ge 1} \mathcal{F}_k$. For $f \in \mathcal{F}_k$, we choose $c(f) = 2k \log 2$, i.e., a codelength of $2k$ bits. (We will show how to derive this, and that it satisfies $\sum_f e^{-c(f)} \le 1$, in the next lecture.) Then for all $f \in \bigcup_{k \ge 1} \mathcal{F}_k$ and $\delta > 0$, with probability at least $1 - \delta$,
$R(f) \le \hat{R}_n(f) + \sqrt{\tfrac{2 k_f \log 2 + \log(1/\delta)}{2n}},$
where $k_f$ is the number of bins in the histogram corresponding to $f$.
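A quick numeric check of the Kraft-style condition $\sum_f e^{-c(f)} \le 1$ under the codelength choice above (this is an illustration under that choice, not the derivation promised for the next lecture): each $\mathcal{F}_k$ contains $2^k$ classifiers, each with $e^{-c(f)} = 2^{-2k}$.

import math

# Sum over k of |F_k| * e^{-2k log 2} = sum over k of 2^k * 2^{-2k} = sum of 2^{-k}.
total = sum((2 ** k) * math.exp(-2 * k * math.log(2)) for k in range(1, 60))
print(total)   # approaches 1 from below, so the condition sum_f e^{-c(f)} <= 1 holds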
15 Example: Histograms continued
Contrast with the bound we had for the class of $m$-bin histograms alone: for all $f \in \mathcal{F}_m$ and $\delta > 0$, with probability at least $1 - \delta$,
$R(f) \le \hat{R}(f) + \sqrt{\tfrac{m \log 2 + \log(2/\delta)}{2n}}.$
Notice that the bound for all histogram rules is almost as good as the bound for only the $m$-bin rules: when $k_f = m$ the bounds are within a factor of 2. On the other hand, the new bound is a big improvement, since it also gives us a guide for selecting the number of bins.
16 Beyond ERMs
The above bounds can be used to derive a useful alternative to empirical risk minimization, one which exploits the prior encapsulated by $p(f) = e^{-c(f)}$. In general, we want to choose a classifier $\tilde{f}_n \in \mathcal{F}$ so that $\mathbb{E}[R(\tilde{f}_n)] - \inf_{f \in \mathcal{F}} R(f)$ is as small as possible. We would like to choose $\hat{f} = \arg\min_{f \in \mathcal{F}} R(f)$, but we cannot measure $R(f)$ to minimize it. However, we can minimize its upper bound!
17 Structural risk minimization
Definition: Structural risk minimizer
For a countably infinite or finite class $\mathcal{F}$, let $c(f)$ be a function such that $\sum_{f \in \mathcal{F}} e^{-c(f)} \le 1$. Then for any $\delta \in (0, 1)$,
$\tilde{f}_n^{\delta} = \arg\min_{f \in \mathcal{F}} \big\{ \hat{R}_n(f) + C(f, n, \delta) \big\}, \quad \text{where } C(f, n, \delta) = \sqrt{\tfrac{c(f) + \log(2/\delta)}{2n}},$
is the structural risk minimizer.
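A sketch of the selection rule just defined: among candidates with known empirical risks and complexities $c(f)$, pick the one minimizing $\hat{R}_n(f) + C(f, n, \delta)$. The candidate names, empirical risks, and complexities below are made-up numbers chosen only to illustrate the tradeoff.

import math

def srm_select(candidates, n, delta):
    """candidates: list of (name, empirical_risk, c_f). Return the structural risk minimizer."""
    def penalized(item):
        _, emp_risk, c_f = item
        penalty = math.sqrt((c_f + math.log(2.0 / delta)) / (2 * n))
        return emp_risk + penalty
    return min(candidates, key=penalized)

cands = [("k=2 bins", 0.21, 4 * math.log(2)),
         ("k=8 bins", 0.12, 16 * math.log(2)),
         ("k=64 bins", 0.05, 128 * math.log(2))]
print(srm_select(cands, n=500, delta=0.05))   # complex rules need much lower empirical risk to win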
18 According to the PAC bound, for all $f \in \mathcal{F}$ and $\delta > 0$, with probability at least $1 - \delta$,
$R(f) \le \hat{R}_n(f) + C(f, n, \delta),$
and in particular,
$R(\tilde{f}_n^{\delta}) \le \hat{R}_n(\tilde{f}_n^{\delta}) + C(\tilde{f}_n^{\delta}, n, \delta),$
so, by the definition of $\tilde{f}_n^{\delta}$, for all $f \in \mathcal{F}$,
$R(\tilde{f}_n^{\delta}) \le \hat{R}_n(f) + C(f, n, \delta).$
We will make use of the inequality above in a moment. First note that for all $f \in \mathcal{F}$,
$\mathbb{E}[R(\tilde{f}_n^{\delta})] - R(f) = \mathbb{E}[R(\tilde{f}_n^{\delta}) - \hat{R}_n(f)] + \mathbb{E}[\hat{R}_n(f) - R(f)].$
The second term is exactly 0, since $\mathbb{E}[\hat{R}_n(f)] = R(f)$.
19 Now consider the first term $\mathbb{E}[R(\tilde{f}_n^{\delta}) - \hat{R}_n(f)]$. Let $\Omega$ be the event on which
$R(\tilde{f}_n^{\delta}) - \hat{R}_n(f) \le C(f, n, \delta) \quad \text{for all } f \in \mathcal{F}.$
From our PAC bound, we know that $P(\Omega) \ge 1 - \delta$. Thus,
$\mathbb{E}[R(\tilde{f}_n^{\delta}) - \hat{R}_n(f)] = \mathbb{E}[R(\tilde{f}_n^{\delta}) - \hat{R}_n(f) \mid \Omega]\, P(\Omega) + \mathbb{E}[R(\tilde{f}_n^{\delta}) - \hat{R}_n(f) \mid \Omega^c]\, (1 - P(\Omega))$
$\le C(f, n, \delta) + \delta \quad$ (since $0 \le R, \hat{R}_n \le 1$, $P(\Omega) \le 1$, and $1 - P(\Omega) \le \delta$)
$= \sqrt{\tfrac{c(f) + \log(2/\delta)}{2n}} + \delta = \sqrt{\tfrac{c(f) + \log(2n)}{2n}} + \tfrac{1}{n} \quad$ (by setting $\delta = \tfrac{1}{n}$).
20 We can summarize our analysis with the following theorem.
Theorem: Complexity Regularized Model Selection
Let $\mathcal{F}$ be a collection of functions, and assign a positive number $c(f)$ to each $f \in \mathcal{F}$ such that $\sum_{f \in \mathcal{F}} e^{-c(f)} \le 1$. Define the structural risk minimizer
$\tilde{f}_n = \arg\min_{f \in \mathcal{F}} \Big\{ \hat{R}_n(f) + \sqrt{\tfrac{c(f) + \log(2n)}{2n}} \Big\}.$
Then,
$\mathbb{E}[R(\tilde{f}_n)] \le \inf_{f \in \mathcal{F}} \Big\{ R(f) + \sqrt{\tfrac{c(f) + \log(2n)}{2n}} \Big\} + \tfrac{1}{n}.$
21 This shows that
$\hat{R}_n(f) + \sqrt{\tfrac{c(f) + \log(2n)}{2n}}$
is a reasonable surrogate for
$R(f) + \sqrt{\tfrac{c(f) + \log(2n)}{2n}}.$
22 Example: Histogram Classifiers
Let $\mathcal{X} = [0,1]^d$ be the input space and $\mathcal{Y} = \{0,1\}$ be the output space. Let $\mathcal{F}_k$, $k = 1, 2, \ldots$ denote the collection of histogram classification rules with $k$ equal-volume bins. Let $\tilde{f}_n$ be the structural risk minimizer
$\tilde{f}_n = \arg\min_{f \in \mathcal{F}} \Big\{ \hat{R}_n(f) + \sqrt{\tfrac{c(f) + \log(2n)}{2n}} \Big\}.$
Recall our choice $c(f) = 2k \log 2$ for $f$ a $k$-bin histogram classifier. Then, equivalently,
$\tilde{f}_n = \arg\min_{k \ge 1} \min_{f \in \mathcal{F}_k} \Big\{ \hat{R}_n(f) + \sqrt{\tfrac{2k \log 2 + \log(2n)}{2n}} \Big\}.$
23 Example: Histograms continued
That is, for each $k$, let
$\hat{f}_n^{(k)} = \arg\min_{f \in \mathcal{F}_k} \hat{R}_n(f).$
Then select the best $k$ according to
$\hat{k} = \arg\min_{k \ge 1} \Big\{ \hat{R}_n(\hat{f}_n^{(k)}) + \sqrt{\tfrac{2k \log 2 + \log(2n)}{2n}} \Big\}$
and set $\tilde{f}_n = \hat{f}_n^{(\hat{k})}$. Then,
$\mathbb{E}[R(\tilde{f}_n)] \le \inf_{k \ge 1} \min_{f \in \mathcal{F}_k} \Big\{ R(f) + \sqrt{\tfrac{2k \log 2 + \log(2n)}{2n}} \Big\} + \tfrac{1}{n}.$
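A self-contained sketch of this two-stage procedure: for each $k$ fit the $k$-bin histogram ERM, then pick $k$ by the penalized empirical risk. The synthetic data, the restriction to $d = 1$, the cap on $k$, and all names are assumptions made only for illustration.

import math
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 1, n)
y = ((x > 0.35) ^ (rng.random(n) < 0.1)).astype(int)    # noisy step-function labels

def erm_risk(k):
    """Empirical risk of the ERM (majority-vote rule) within F_k."""
    bins = np.minimum((x * k).astype(int), k - 1)
    labels = np.array([int(y[bins == b].mean() > 0.5) if (bins == b).any() else 0
                       for b in range(k)])
    return np.mean(labels[bins] != y)

def penalized(k):
    """Penalized empirical risk with c(f) = 2k log 2 and delta = 1/n."""
    c_f = 2 * k * math.log(2)
    return erm_risk(k) + math.sqrt((c_f + math.log(2 * n)) / (2 * n))

k_hat = min(range(1, 65), key=penalized)
print("selected k:", k_hat, "penalized risk:", round(penalized(k_hat), 3))

The selected number of bins comes out of the penalty alone; no holdout set or user-chosen $m$ is involved, which is exactly the point of the complexity-regularized rule.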
24 Example: Histograms continued
From the third homework, we know that if $d = 2$ and the Bayes decision boundary is a 1-d curve, then by setting $k = \sqrt{n}$ and selecting the best $f$ from $\mathcal{F}_{\sqrt{n}}$ we have
$\mathbb{E}[R(\tilde{f}_n)] = O(n^{-1/4}).$
It is a simple exercise to show that the complexity-regularized classifier performs just as well automatically. That is, the proper $k$ is selected automatically, without user intervention.