Nearly-tight VC-dimension bounds for piecewise linear neural networks

Transcription

1 Nearly-tight VC-dimension bounds for piecewise linear neural networks. Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, University of British Columbia. COLT 2017, July 1, 2017

2 Neural networks. Each unit computes σ(w · x + b), where σ(x) = max{x, 0} is the ReLU activation; the output unit uses the identity activation. [Diagram: inputs x_1, ..., x_5, input layer, hidden layer 1, hidden layer 2, output layer.]
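As a concrete illustration (my own sketch, not part of the slides), here is a minimal NumPy forward pass for a network of the kind pictured: inputs x_1, ..., x_5, two hidden ReLU layers, and a single identity output unit. The hidden width and random weights are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # sigma(x) = max{x, 0}, applied elementwise
    return np.maximum(x, 0.0)

def forward(x, params):
    """Two hidden ReLU layers followed by a linear (identity) output unit."""
    (W1, b1), (W2, b2), (w3, b3) = params
    h1 = relu(W1 @ x + b1)   # hidden layer 1
    h2 = relu(W2 @ h1 + b2)  # hidden layer 2
    return w3 @ h2 + b3      # output layer, identity activation

rng = np.random.default_rng(0)
d, h = 5, 4  # 5 inputs as in the diagram; hidden width 4 is an arbitrary choice
params = [(rng.standard_normal((h, d)), rng.standard_normal(h)),
          (rng.standard_normal((h, h)), rng.standard_normal(h)),
          (rng.standard_normal(h), 0.0)]
print(forward(rng.standard_normal(d), params))
```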

3–4 VC-dimension. Defn: If F is a family of functions, then VCdim(F) ≥ k iff there exists X = {x_1, ..., x_k} such that F achieves all 2^k sign patterns, i.e. {(sign(f(x_1)), ..., sign(f(x_k))) : f ∈ F} = {0, 1}^k. E.g., hyperplanes in R^d have VC-dimension d + 1: in R^2, three points in general position can be shattered, but it is impossible to shatter any 4 points. Thm [Fundamental thm. of learning]: F is learnable iff VCdim(F) < ∞. Moreover, the sample complexity is Θ(VCdim(F)).
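To make the definition concrete, here is a brute-force sketch (my own illustration, not from the talk): enumerate hyperplanes sign(w · x + b) in R^2 over a coarse grid of (w, b) and record the sign patterns they achieve on a candidate point set. The grid is an assumption; it suffices to exhibit shattering of 3 points, while 4 points can never be shattered by any hyperplanes in R^2.

```python
from itertools import product
import numpy as np

def sign_patterns(points, classifiers):
    """All distinct 0/1 patterns the classifiers achieve on the given points."""
    return {tuple(int(f(x) > 0) for x in points) for f in classifiers}

def is_shattered(points, classifiers):
    return len(sign_patterns(points, classifiers)) == 2 ** len(points)

# Hyperplanes in R^2: f(x) = w . x + b, classified by its sign.
grid = np.linspace(-3, 3, 13)
classifiers = [lambda x, w=np.array([w1, w2]), b=b: float(w @ x + b)
               for w1, w2, b in product(grid, grid, grid)]

three_pts = [np.array(p) for p in [(0, 0), (1, 0), (0, 1)]]
four_pts = [np.array(p) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(is_shattered(three_pts, classifiers))  # True: VCdim is d + 1 = 3 in R^2
print(is_shattered(four_pts, classifiers))   # False: the XOR pattern is unreachable
```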

5–8 VC-dimension of NNs. W = # parameters/edges, L = # layers.
Known upper bounds: O(WL log W + WL^2) [BMM 98], O(W^2) [GJ 95].
Known lower bounds: Ω(WL) [BMM 98], Ω(W log W) [M 94].
Main Thm [HLM 17]: For a ReLU NN with W parameters and L layers, Ω(WL log(W/L)) ≤ VCdim ≤ O(WL log W). (The lower bound means there exists a NN with this VCdim.)
Independently proved by Bartlett '17.
Recently, lots of work on the power of depth for the expressiveness of NNs [T 16, ES 16, Y 16, LS 16, SS 16, CSS 16, LGMRA 17, D 17].
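For a rough sense of the remaining gap between the two bounds in the main theorem, the sketch below (mine; the hidden constant factors are set to 1, purely as an illustrative assumption) evaluates both expressions for a few choices of W and L. The ratio log W / log(W/L) is the factor the open questions at the end ask about.

```python
import math

def vc_bounds(W, L):
    """Evaluate the lower and upper bound expressions with constants set to 1."""
    lower = W * L * math.log2(W / L)  # Omega(W L log(W/L))
    upper = W * L * math.log2(W)      # O(W L log W)
    return lower, upper

for W, L in [(10_000, 10), (10_000, 100), (10_000, 1000)]:
    lo, up = vc_bounds(W, L)
    print(f"W={W}, L={L}: lower ~ {lo:.2e}, upper ~ {up:.2e}, ratio = {up / lo:.2f}")
```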

9–12 Lower bound (refinement of [BMM 98]).
Shattered set: S = {e_i}_{i ∈ [n]} × {e_j}_{j ∈ [m]}. Encode f with weights a_i = 0.a_{i,1} ... a_{i,m} (binary expansion), where a_{i,j} = f(e_i, e_j).
Given e_i, it is easy to extract a_i; an NN block then extracts the bits of a_i. [Diagram: inputs e_i, e_j; a block selecting bit j from a_i; the rest of the NN.]
Design a bit extractor to extract a_{i,j}. [BMM 98] do this 1 bit per layer, giving Ω(WL). More efficient: log(W/L) bits per layer, giving Ω(WL log(W/L)).
Thm [HLM 17]: Suppose a ReLU NN with W parameters and L layers extracts the m-th bit of its input. Then m = O(L log(W/L)).
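The following plain-Python sketch (my illustration; it is not the slides' ReLU circuit) shows the bit-extraction idea behind the lower bound: the values f(e_i, e_1), ..., f(e_i, e_m) are packed into the binary expansion of a single weight a_i and recovered one bit per step. In the actual construction the threshold comparison is implemented by ReLU units, and the refined extractor recovers log(W/L) bits per layer rather than one.

```python
def extract_bits(a, m):
    """Peel off the first m binary digits of a in [0, 1), one per step."""
    bits = []
    for _ in range(m):
        b = 1 if a >= 0.5 else 0  # current leading bit (a threshold comparison)
        bits.append(b)
        a = 2 * a - b             # shift the remaining bits left
    return bits

# Encode f(e_i, e_j) = 1, 0, 1, 1 for j = 1..4 as the weight a_i = 0.1011 (binary).
a_i = 1/2 + 0/4 + 1/8 + 1/16
print(extract_bits(a_i, 4))       # [1, 0, 1, 1]
```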

13–18 Upper bound (refinement of [BMM 98] for ReLU).
Fix a shattered set X = {x_1, ..., x_m}.
Partition the parameter space so that the input to the 1st hidden layer has constant sign: the ReLU can then be replaced with 0 (if < 0) or the identity (if > 0). The size of the partition is small, i.e. (Cm)^W [Warren 68]*.
Repeat the procedure for each layer to get a partition of size (CLm)^{O(WL)}.
In each piece, the output is a polynomial of degree L, so the total # of sign patterns is ≤ (CLm)^{O(WL)}.
Since X is shattered, we need 2^m ≤ (CLm)^{O(WL)}, which implies m = O(WL log W).
* C > 1 is some constant.
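To see how the final inequality pins down m, here is a small numeric sketch (mine; the constant C and the exponent WL stand in for the O(·) terms, which is an assumption) that scans for the largest m satisfying 2^m ≤ (CLm)^{WL}:

```python
import math

def max_shatterable_m(W, L, C=2.0):
    """Largest m with 2^m <= (C*L*m)^(W*L), i.e. the counting step above."""
    m = 1
    while m * math.log(2) <= W * L * math.log(C * L * m):
        m += 1
    return m - 1

for W, L in [(100, 2), (100, 5), (1000, 5)]:
    m = max_shatterable_m(W, L)
    print(f"W={W}, L={L}: m <= {m}, compare W*L*log2(W) = {W * L * math.log2(W):.0f}")
```

The resulting m grows like WL log(WL), i.e. O(WL log W) since L ≤ W, matching the bound on the slide.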

19–20 Open questions. Can we close the gap for ReLU NNs: Ω(WL log(W/L)) vs O(WL log W)? For polynomial NNs, we have Ω(WL) ≤ VCdim ≤ O(WL^2). Can we close this gap? Do poly NNs have higher VCdim than ReLU NNs? What about the VC dimensions of CNNs, RNNs, ResNets, etc.? Thank you!
