Nearly-tight VC-dimension bounds for piecewise linear neural networks
Transcription
1 Nearly-tight VC-dimension bounds for piecewise linear neural networks. Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian. University of British Columbia. COLT 2017, July 10, 2017
2 Neural networks. Each unit computes σ(w · x + b), where σ(x) = max{x, 0} is the ReLU activation; the output unit uses the identity activation. [Figure: a feedforward network with input layer x_1, …, x_5, hidden layers 1 and 2, and an output layer.]
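To make the picture concrete, here is a minimal NumPy sketch (mine, not from the talk) of the network in the figure: two ReLU hidden layers and an identity output, with illustrative layer widths.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)  # sigma(x) = max{x, 0}

def forward(x, params):
    """Forward pass: each unit computes sigma(w . x + b); identity output."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(W1 @ x + b1)   # hidden layer 1
    h2 = relu(W2 @ h1 + b2)  # hidden layer 2
    return W3 @ h2 + b3      # output layer, identity activation

rng = np.random.default_rng(0)
dims = [5, 4, 3, 1]  # illustrative widths: 5 inputs, two hidden layers, 1 output
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(dims, dims[1:])]
print(forward(rng.standard_normal(5), params))
```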
3–4 VC-dimension. Defn: If F is a family of functions, then VCdim(F) ≥ k iff there exists X = {x_1, …, x_k} such that F achieves all 2^k sign patterns, i.e. {(sign(f(x_1)), …, sign(f(x_k))) : f ∈ F} = {0, 1}^k. E.g., hyperplanes in R^d have VC-dimension d + 1. [Figure: three points in the plane can be shattered; it is impossible to shatter any 4 points.]
Thm [Fundamental theorem of learning]: F is learnable iff VCdim(F) < ∞. Moreover, the sample complexity is Θ(VCdim(F)).
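As a quick illustration of the definition (my own snippet, not from the slides), the following brute-force search verifies that halfspaces sign(w · x + b) achieve all 2^3 sign patterns on three points in general position in R^2, witnessing VC-dimension ≥ 3 = d + 1:

```python
import itertools
import numpy as np

# Three non-collinear points in R^2.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

achieved = set()
grid = np.linspace(-2.0, 2.0, 9)  # coarse grid over (w1, w2, b)
for w1, w2, b in itertools.product(grid, repeat=3):
    pattern = tuple((points @ np.array([w1, w2]) + b > 0).astype(int))
    achieved.add(pattern)

assert len(achieved) == 2 ** 3  # all 8 sign patterns: the set is shattered
print(sorted(achieved))
```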
5–8 VC-dimension of NNs. W = # parameters/edges, L = # layers.
Known upper bounds: O(WL log W + WL^2) [BMM 98]; O(W^2) [GJ 95].
Known lower bounds: Ω(WL) [BMM 98]; Ω(W log W) [M 94].
Main Thm [HLM 17]: For a ReLU NN with W parameters and L layers, Ω(WL log(W/L)) ≤ VCdim ≤ O(WL log W). (The lower bound means there exists a NN with this VCdim.) Independently proved by Bartlett '17.
Recently, there has been lots of work on the power of depth for the expressiveness of NNs [T 16, ES 16, Y 16, LS 16, SS 16, CSS 16, LGMRA 17, D 17].
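To get a feel for how close the two bounds are, here is a quick numeric comparison (my own, with illustrative values of W and L); the remaining gap is just the ratio log W / log(W/L):

```python
import math

W, L = 10**6, 10  # illustrative parameter count and depth
lower = W * L * math.log2(W / L)   # Omega(WL log(W/L)), constants ignored
upper = W * L * math.log2(W)       # O(WL log W), constants ignored
print(f"lower ~ {lower:.3g}, upper ~ {upper:.3g}, ratio = {upper / lower:.3f}")
```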
9–12 Lower bound (refinement of [BMM 98]).
Shattered set: S = {e_i}_{i ∈ [n]} × {e_j}_{j ∈ [m]}. Encode f with weights a_i = 0.a_{i,1} … a_{i,m} (in binary), where a_{i,j} = f(e_i, e_j). Given e_i, it is easy to extract a_i; then design a bit extractor to extract a_{i,j}.
[BMM 98] do this 1 bit per layer, giving Ω(WL). More efficient: log(W/L) bits per layer, giving Ω(WL log(W/L)).
[Figure: a NN block that, given inputs e_i and e_j, produces a_i and then selects bit j from a_i; the result feeds the rest of the NN.]
Thm [HLM 17]: Suppose a ReLU NN with W parameters and L layers extracts the m-th bit of its input. Then m ≤ O(L log(W/L)).
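The following toy loop (my own sketch, not the paper's gadget) shows the one-bit-per-step idea behind [BMM 98]: threshold at 1/2 to read the leading bit, then rescale so the next bit moves to the front. In the actual network each threshold is implemented (approximately) by ReLU units, so this costs one layer per bit; the refinement extracts log(W/L) bits per layer instead.

```python
def extract_bits(a, m):
    """Peel off the first m binary digits of a = 0.a_1 a_2 ... in [0, 1)."""
    bits = []
    for _ in range(m):
        b = 1 if a >= 0.5 else 0  # a steep ReLU ramp can approximate this threshold
        bits.append(b)
        a = 2 * a - b             # shift the remaining digits left by one place
    return bits

print(extract_bits(0.8125, 4))  # 0.8125 = 0.1101 in binary -> [1, 1, 0, 1]
```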
13–18 Upper bound (refinement of [BMM 98] for ReLU).
Fix a shattered set X = {x_1, …, x_m}. Partition the parameter space so that the input to the 1st hidden layer has constant sign; then each ReLU can be replaced by 0 (if < 0) or by the identity (if > 0). The size of the partition is small, namely (Cm)^W [Warren 68]*. Repeat the procedure for each layer to get a partition of size (CLm)^{O(WL)}. In each piece, the output is a polynomial of degree ≤ L, so the total number of sign patterns is ≤ (CLm)^{O(WL)}. Since X is shattered, we need 2^m ≤ (CLm)^{O(WL)}, which implies m = O(WL log W).
[Figure: the network evaluated on a shattered point with coordinates x_1^(i), …, x_5^(i).]
* C > 1 is some constant.
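Unpacking the final counting step (my own arithmetic, not on the slides):

```latex
% Solving 2^m \le (CLm)^{cWL} for m, with C, c > 1 constants:
\[
  2^m \le (CLm)^{cWL}
  \iff m \le cWL \, \log_2(CLm).
\]
% An inequality of the form m <= A log_2(CLm) with A = cWL forces
% m = O(A \log A); since L <= W, this gives
% m = O(WL \log(WL)) = O(WL \log W).
```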
19–20 Open questions.
- Can we close the gap for ReLU NNs: Ω(WL log(W/L)) vs O(WL log W)?
- For polynomial NNs, we have Ω(WL) ≤ VCdim ≤ O(WL^2). Can we close this gap?
- Do poly NNs have higher VCdim than ReLU NNs?
- What about VC-dimensions of CNNs, RNNs, ResNets, etc.?
Thank you!