Lipschitz Properties of General Convolutional Neural Networks


1 Lipschitz Properties of General Convolutional Neural Networks Radu Balan Department of Mathematics, AMSC, CSCAMM and NWC University of Maryland, College Park, MD Joint work with Dongmian Zou (UMD), Maneesh Singh (Verisk) March 22, 2017 Image Analysis Seminar University of Houston, Houston TX

2 This material is based upon work supported by the National Science Foundation under Grant No. DMS Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The author has been partially supported by ARO under grant W911NF

3 Table of Contents: 1 Problem Formulation 2 Deep Convolutional Neural Networks 3 Lipschitz Analysis 4 Numerical Results

4 Problem Formulation Nonlinear Maps Consider a nonlinear function between two metric spaces, F : (X, d_X) → (Y, d_Y).

5 Problem Formulation Lipschitz analysis of nonlinear systems F : (X, d_X) → (Y, d_Y). F is called Lipschitz with constant C if for any f, f̃ ∈ X, d_Y(F(f), F(f̃)) ≤ C d_X(f, f̃). The optimal (i.e. smallest) Lipschitz constant is denoted Lip(F). The square C² is called the Lipschitz bound (similar to the Bessel bound). F is called bi-Lipschitz with constants C₁, C₂ > 0 if for any f, f̃ ∈ X, C₁ d_X(f, f̃) ≤ d_Y(F(f), F(f̃)) ≤ C₂ d_X(f, f̃). The squares C₁², C₂² are called Lipschitz bounds (similar to frame bounds).
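As a quick aside (not part of the slides), a minimal numerical illustration of the definition: sampling input pairs gives empirical ratios that can never exceed the Lipschitz constant, here 1 for ReLU on (R^n, ‖·‖₂). The dimension and sample count are arbitrary choices.

import numpy as np

# Empirically check d_Y(F(f), F(f~)) <= C d_X(f, f~) for F = ReLU, whose optimal
# Lipschitz constant on (R^n, ||.||_2) is C = 1.
rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

ratios = []
for _ in range(10000):
    f, f_tilde = rng.standard_normal(64), rng.standard_normal(64)
    ratios.append(np.linalg.norm(relu(f) - relu(f_tilde)) / np.linalg.norm(f - f_tilde))

print(max(ratios))   # a lower bound on Lip(ReLU); it stays <= 1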

6 Problem Formulation Motivating Examples Consider the typical neural network as a feature extractor component in a classification system: g = F(f) = F_M(... F_1(f; W_1, φ_1); ...; W_M, φ_M), F_m(f; W_m, φ_m) = φ_m(W_m f), where W_m is a linear operator (matrix) and φ_m is a Lip(1) scalar nonlinearity (e.g. the Rectified Linear Unit).
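A minimal sketch of this layered structure (layer sizes and random weights are hypothetical; any Lip(1) scalar nonlinearity plays the role of φ_m):

import numpy as np

# Sketch of g = F_M(... F_1(f; W_1, phi_1) ...; W_M, phi_M) with phi_m = ReLU.
rng = np.random.default_rng(1)
sizes = [32, 64, 64, 10]                      # input dim, two hidden layers, output dim
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) / np.sqrt(sizes[i])
      for i in range(len(sizes) - 1)]

def F(f):
    for W in Ws:
        f = np.maximum(W @ f, 0.0)            # F_m(f; W_m, phi_m) = phi_m(W_m f)
    return f

f = rng.standard_normal(sizes[0])
print(F(f).shape)                             # (10,)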

7 Problem Formulation Motivating Example: AlexNet Example from [SZSBEGF13] ("Intriguing properties..."). ImageNet dataset [DDSLLF09] (10,184 categories; 8.9 mil. images); AlexNet architecture [KSH12]: Figure: From Krizhevsky et al. 2012 [KSH12]: AlexNet: 5 convolutional layers + 3 dense layers. Input size: 224x224x3 pixels. Output size: 1000.

8 Problem Formulation Motivating Example: AlexNet The authors of [SZSBEGF13] ("Intriguing properties...") found small, almost imperceptible variations of the input that produce completely different classification decisions: Figure: From Szegedy et al. 2013 [SZSBEGF13]: 6 different classes: original image, difference, and adversarial example; every adversarial example is classified as ostrich.

9 Problem Formulation Motivating Example: AlexNet Szegedy et al. 2013 [SZSBEGF13] computed the Lipschitz constant (largest singular value) of each layer, for the 5 convolutional layers and the 3 fully connected layers (the first of size 43264). Overall Lipschitz constant: Lip = 5,612,006.
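The same kind of per-layer estimate can be reproduced for any network of linear maps followed by Lip(1) nonlinearities; a sketch with small random matrices standing in for the (much larger) AlexNet layers:

import numpy as np

# Chain-rule estimate: with Lip(1) nonlinearities, Lip(F) <= prod_m ||W_m||_2,
# the product of the largest singular values of the layer matrices.
rng = np.random.default_rng(2)
layers = [rng.standard_normal((128, 256)),
          rng.standard_normal((64, 128)),
          rng.standard_normal((10, 64))]

sing_vals = [np.linalg.svd(W, compute_uv=False)[0] for W in layers]
print(sing_vals)                 # per-layer Lipschitz constants
print(np.prod(sing_vals))        # (often very pessimistic) overall bound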

10 Problem Formulation Motivating Example: Scattering Network Example of a Scattering Network; definition and properties: [Mallat12]; this example from [BSZ17]. Input: f; outputs: y = (y_{l,k}).

11-14 Problem Formulation Motivating Example: Scattering Network Remarks: outputs from each layer; tree-like topology; backpropagation/chain rule gives Lipschitz bound 40; Mallat's result predicts Lip = 1.

15 Problem Formulation Problem 1 Given a deep network: estimate the Lipschitz constant, or bound: Lip = sup_{f ≠ f̃ ∈ L²} ‖y − ỹ‖₂ / ‖f − f̃‖₂ , Bound = sup_{f ≠ f̃ ∈ L²} ‖y − ỹ‖₂² / ‖f − f̃‖₂².

16 Problem Formulation Problem 1 Given a deep network: estimate the Lipschitz constant, or bound: Lip = sup_{f ≠ f̃ ∈ L²} ‖y − ỹ‖₂ / ‖f − f̃‖₂ , Bound = sup_{f ≠ f̃ ∈ L²} ‖y − ỹ‖₂² / ‖f − f̃‖₂². Methods (Approaches): 1 Standard Method: backpropagation, or chain rule; 2 New Method: storage-function based approach (dissipative systems); 3 Numerical Method: simulations (a sketch follows).
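A minimal sketch of the simulation approach (the small random network is a hypothetical stand-in): sample input pairs and keep the largest ratio, which gives a lower bound on Lip; maximizing the ratio, e.g. by gradient ascent, would tighten it.

import numpy as np

# Simulation lower bound on Lip: sample pairs (f, f~) and track ||y - y~||_2 / ||f - f~||_2.
rng = np.random.default_rng(3)
W1, W2 = rng.standard_normal((40, 20)), rng.standard_normal((10, 40))
F = lambda f: W2 @ np.maximum(W1 @ f, 0.0)      # replace with the network of interest

best = 0.0
for _ in range(20000):
    f = rng.standard_normal(20)
    f_tilde = f + 1e-3 * rng.standard_normal(20)          # a nearby input
    best = max(best, np.linalg.norm(F(f) - F(f_tilde)) / np.linalg.norm(f - f_tilde))

print(best, best**2)    # empirical lower bounds on Lip and on the Lipschitz bound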

17 Problem Formulation Problem 2 Given a deep network: Estimate the stability of the output to specific variations of the input: 1 Invariance to deformations: f̃(x) = f(x − τ(x)), for some smooth τ; 2 Covariance to such deformations f̃(x) = f(x − τ(x)), for smooth τ and bandlimited signals f; 3 Tail bounds when f has a known statistical distribution (e.g. normal with known spectral power).

18 ConvNet Topology A deep convolutional network is composed of multiple layers:

19 ConvNet One Layer Each layer is composed of two or three sublayers: convolution, downsampling, detection/pooling/merge.

20 ConvNet: Sublayers Linear Filters: Convolution and Pooling-to-Output Sublayer f^(2) = g ∗ f^(1), f^(2)(x) = ∫ g(x − ξ) f^(1)(ξ) dξ, where g ∈ B = {g ∈ S′ : ĝ ∈ L∞(R^d)}. (B, ∗) is a Banach algebra with norm ‖g‖_B = ‖ĝ‖_∞. Notation: g for regular convolution filters, and Φ for pooling-to-output filters.
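In a discrete simulation, the Banach-algebra norm ‖g‖_B = ‖ĝ‖_∞ of a (periodized) filter is simply the peak of its frequency response, e.g. via an FFT; a small sketch with a hypothetical smoothing filter:

import numpy as np

# Discrete analogue of ||g||_B = ||g_hat||_inf: the operator norm of circular
# convolution with a filter g on l^2 equals the peak magnitude of its DFT.
g = np.array([0.25, 0.5, 0.25])          # hypothetical smoothing filter
g_hat = np.fft.fft(g, n=512)             # frequency response sampled on a fine grid
print(np.max(np.abs(g_hat)))             # ||g||_B; equals 1.0 for this filter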

21 ConvNet: Sublayers Downsampling Sublayer f^(2)(x) = f^(1)(Dx). For f^(1) ∈ L²(R^d) and D = D₀ I, f^(2) ∈ L²(R^d) and ‖f^(2)‖₂² = ∫_{R^d} |f^(2)(x)|² dx = (1/det(D)) ∫_{R^d} |f^(1)(x)|² dx = (1/D₀^d) ‖f^(1)‖₂².
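A quick Riemann-sum check of this identity in d = 1, with a Gaussian standing in for f^(1) (an illustration only, not from the slides):

import numpy as np

# Riemann-sum check of ||f^(2)||_2^2 = D0^(-d) ||f^(1)||_2^2 in dimension d = 1,
# with f^(1)(x) = exp(-x^2) and f^(2)(x) = f^(1)(D0 x).
D0 = 2.0
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]
f1 = np.exp(-x**2)
f2 = np.exp(-(D0 * x)**2)
print(np.sum(f2**2) * dx, np.sum(f1**2) * dx / D0)   # both sides agree (~0.6267)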

22 ConvNet: Sublayers Detection and Pooling Sublayer We consider three types of detection/pooling/merge sublayers: Type I, τ₁: componentwise addition: z = ∑_{j=1}^k σ_j(y_j). Type II, τ₂: p-norm aggregation: z = (∑_{j=1}^k |σ_j(y_j)|^p)^{1/p}. Type III, τ₃: componentwise multiplication: z = ∏_{j=1}^k σ_j(y_j). Assumptions: (1) the σ_j are scalar Lipschitz functions with Lip(σ_j) ≤ 1; (2) if σ_j is connected to a multiplication block then ‖σ_j‖_∞ ≤ 1.
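A small sketch of the three merge blocks acting on k channel signals (ReLU serves as a Lip(1) σ_j for τ₁ and τ₂; tanh is used for τ₃ so that the extra boundedness assumption ‖σ_j‖_∞ ≤ 1 holds; the choices are illustrative):

import numpy as np

# The three detection/pooling/merge blocks for k channel signals y_1, ..., y_k:
#   tau_1: z = sum_j sigma_j(y_j)
#   tau_2: z = (sum_j |sigma_j(y_j)|^p)^(1/p)
#   tau_3: z = prod_j sigma_j(y_j)
relu = lambda t: np.maximum(t, 0.0)
p = 2

def tau1(y): return np.sum([relu(yj) for yj in y], axis=0)
def tau2(y): return np.sum([np.abs(relu(yj))**p for yj in y], axis=0) ** (1.0 / p)
def tau3(y): return np.prod([np.tanh(yj) for yj in y], axis=0)

rng = np.random.default_rng(4)
y = [rng.standard_normal(8) for _ in range(3)]
print(tau1(y).shape, tau2(y).shape, tau3(y).shape)   # each merge returns one channel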

23-24 ConvNet: Sublayers MaxPooling and AveragePooling Both MaxPooling and AveragePooling can be implemented with the sublayers above (see the sketch below).
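The slides' diagrams are not reproduced here; one standard construction (an assumption on my part) expresses the pairwise max through Lip(1) pieces via max(a, b) = ReLU(a − b) + b, and average pooling as convolution with a normalized box filter followed by downsampling:

import numpy as np

# Hypothetical realization of the two pooling blocks with the sublayers above.
relu = lambda t: np.maximum(t, 0.0)

def maxpool2(a, b):
    return relu(a - b) + b                             # identical to np.maximum(a, b)

def avgpool(f, width=2):
    box = np.ones(width) / width                       # normalized box filter
    return np.convolve(f, box, mode="same")[::width]   # convolve, then downsample

f = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
print(maxpool2(f[0::2], f[1::2]))                      # [3. 5. 4.]
print(avgpool(f))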

25 ConvNet: Layer m Components of the m-th layer

26-27 ConvNet: Layer m Topology coding of the m-th layer n_m denotes the number of input nodes in the m-th layer: I_m = {N_{m,1}, N_{m,2}, ..., N_{m,n_m}}. Filters: 1 pooling filter φ_{m,n} for node n, in layer m; 2 convolution filters g_{m,n;k} for input node n to output node k, in layer m. For node n: G_{m,n} = {g_{m,n;1}, ..., g_{m,n;k_{m,n}}}. The set of all convolution filters in layer m: G_m = ∪_{n=1}^{n_m} G_{m,n}. O_m = {N'_{m,1}, N'_{m,2}, ..., N'_{m,n'_m}} is the set of output nodes of the m-th layer. Note that n'_m = n_{m+1} and there is a one-to-one correspondence between O_m and I_{m+1}. The output nodes automatically partition G_m into n'_m disjoint subsets G_m = ∪_{n'=1}^{n'_m} G'_{m,n'}, where G'_{m,n'} is the set of filters merged into N'_{m,n'}.

28 ConvNet: Layer m Topology coding of the m-th layer For each filter g_{m,n;k}, we define an associated multiplier ℓ_{m,n;k} in the following way: suppose g_{m,n;k} ∈ G_{m,n} ∩ G'_{m,n'}, and let K denote the cardinality of G'_{m,n'}. Then ℓ_{m,n;k} = K if g_{m,n;k} feeds into a τ₁ or τ₃ block, and ℓ_{m,n;k} = K^{max{0, 2/p − 1}} if g_{m,n;k} feeds into a τ₂ block. (2.1)
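A one-function helper implementing (2.1) (the argument merge_type ∈ {1, 2, 3} names the block type the filter feeds into; the function name is mine):

# Multiplier l_{m,n;k} from (2.1): K for a filter feeding an additive (tau_1) or
# multiplicative (tau_3) block, K^max(0, 2/p - 1) for a p-norm (tau_2) block.
def multiplier(K, merge_type, p=2):
    if merge_type in (1, 3):
        return float(K)
    if merge_type == 2:
        return float(K) ** max(0.0, 2.0 / p - 1.0)
    raise ValueError("merge_type must be 1, 2 or 3")

print(multiplier(4, 1), multiplier(4, 2, p=2), multiplier(4, 2, p=1))   # 4.0 1.0 4.0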

29 Layer Analysis Bessel Bounds In each layer m and for each input node n we define three types of Bessel bounds: 1st type Bessel bound: B_{m,n}^(1) = ‖ |φ̂_{m,n}|² + ∑_{k=1}^{k_{m,n}} ℓ_{m,n;k} D_{m,n;k}^{-d} |ĝ_{m,n;k}|² ‖_∞ (3.2); 2nd type Bessel bound: B_{m,n}^(2) = ‖ ∑_{k=1}^{k_{m,n}} ℓ_{m,n;k} D_{m,n;k}^{-d} |ĝ_{m,n;k}|² ‖_∞ (3.3); 3rd type (or generating) bound: B_{m,n}^(3) = ‖φ̂_{m,n}‖_∞². (3.4)

30 Layer Analysis Bessel Bounds Next we define the layer m Bessel bounds: 1st type Bessel bound B_m^(1) = max_{1 ≤ n ≤ n_m} B_{m,n}^(1) (3.5); 2nd type Bessel bound B_m^(2) = max_{1 ≤ n ≤ n_m} B_{m,n}^(2) (3.6); 3rd type (generating) Bessel bound B_m^(3) = max_{1 ≤ n ≤ n_m} B_{m,n}^(3). (3.7) Remark. These bounds characterize semi-discrete Bessel systems.
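Numerically, the essential supremum over frequency becomes a maximum over a grid (or over FFT bins); a sketch for one input node, with toy frequency responses, unit multipliers, no downsampling, and d = 1 (all values hypothetical):

import numpy as np

# Bessel bounds for one input node on a frequency grid: phi_hat is the pooling
# filter's frequency response, g_hats the convolution filters', ls the multipliers,
# Ds the downsampling factors.
def node_bessel_bounds(phi_hat, g_hats, ls, Ds, d=1):
    conv = sum(l * D**(-d) * np.abs(gh)**2 for l, D, gh in zip(ls, Ds, g_hats))
    B1 = np.max(np.abs(phi_hat)**2 + conv)      # 1st type, (3.2)
    B2 = np.max(conv)                           # 2nd type, (3.3)
    B3 = np.max(np.abs(phi_hat))**2             # 3rd type, (3.4)
    return B1, B2, B3

w = np.linspace(-np.pi, np.pi, 1024)
phi_hat = np.exp(-w**2)
g_hats = [np.exp(-(w - 1.0)**2), np.exp(-(w + 1.0)**2)]
print(node_bessel_bounds(phi_hat, g_hats, ls=[1.0, 1.0], Ds=[1.0, 1.0]))
# The layer bounds (3.5)-(3.7) are the maxima of these quantities over the layer's nodes.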

31 Lipschitz Analysis First Result Theorem [BSZ17] Consider a Convolutional Neural Network with M layers as described before, where all scalar nonlinear functions are Lipschitz with Lip(φ_{m,n,n'}) ≤ 1. Additionally, those φ_{m,n,n'} that aggregate into a multiplicative block satisfy ‖φ_{m,n,n'}‖_∞ ≤ 1. Let the m-th layer 1st type Bessel bound be B_m^(1) = max_{1 ≤ n ≤ n_m} ‖ |φ̂_{m,n}|² + ∑_{k=1}^{k_{m,n}} ℓ_{m,n;k} D_{m,n;k}^{-d} |ĝ_{m,n;k}|² ‖_∞. Then the Lipschitz bound of the entire CNN is upper bounded by ∏_{m=1}^M max(1, B_m^(1)). Specifically, for any f, f̃ ∈ L²(R^d): ‖F(f) − F(f̃)‖₂² ≤ ( ∏_{m=1}^M max(1, B_m^(1)) ) ‖f − f̃‖₂².
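Evaluating this bound once the per-layer B_m^(1) are known is a one-liner (the values below are hypothetical):

import numpy as np

# Theorem 1 bound: ||F(f) - F(f~)||_2^2 <= (prod_m max(1, B1_m)) ||f - f~||_2^2.
B1 = [2.0, 1.0, 0.5]                            # hypothetical 1st type Bessel bounds
lip_bound = np.prod([max(1.0, b) for b in B1])  # Lipschitz bound
print(lip_bound, np.sqrt(lip_bound))            # bound and the corresponding Lip estimate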

32 Lipschitz Analysis Second Result Theorem Consider a Convolutional Neural Network with M layers as described before, where all scalar nonlinearities satisfy the same conditions as in the previous result. For layer m, let B_m^(1), B_m^(2), and B_m^(3) denote the three Bessel bounds defined earlier. Denote by Γ the optimal value of the following linear program: Γ = max_{y_1,...,y_M, z_1,...,z_M ≥ 0} ∑_{m=1}^M z_m subject to y_0 = 1, y_m + z_m ≤ B_m^(1) y_{m−1}, y_m ≤ B_m^(2) y_{m−1}, z_m ≤ B_m^(3) y_{m−1}, for 1 ≤ m ≤ M. (3.8)

33 Lipschitz Analysis Second Result - cont'd Theorem Then the Lipschitz bound satisfies Lip(F)² ≤ Γ. Specifically, for any f, f̃ ∈ L²(R^d): ‖F(f) − F(f̃)‖₂² ≤ Γ ‖f − f̃‖₂².
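The program (3.8) is small (2M variables, 3M inequality constraints), so any LP solver handles it; a sketch using scipy.optimize.linprog (the variable layout and the helper name lipschitz_lp are my own):

import numpy as np
from scipy.optimize import linprog

# Linear program (3.8): variables y_1..y_M, z_1..z_M >= 0, with y_0 = 1,
#   maximize sum_m z_m  subject to
#   y_m + z_m <= B1[m] y_{m-1},  y_m <= B2[m] y_{m-1},  z_m <= B3[m] y_{m-1}.
# The optimal value Gamma satisfies Lip(F)^2 <= Gamma.
def lipschitz_lp(B1, B2, B3):
    M = len(B1)
    c = np.concatenate([np.zeros(M), -np.ones(M)])    # linprog minimizes, so use -sum z_m
    A, b = [], []
    for m in range(M):                                # index m encodes layer m+1
        for (cy, cz), Bm in (((1.0, 1.0), B1[m]), ((1.0, 0.0), B2[m]), ((0.0, 1.0), B3[m])):
            row = np.zeros(2 * M)
            row[m], row[M + m] = cy, cz               # coefficients of y_m and z_m
            if m == 0:
                b.append(Bm)                          # right-hand side Bm * y_0 with y_0 = 1
            else:
                row[m - 1] -= Bm                      # move Bm * y_{m-1} to the left-hand side
                b.append(0.0)
            A.append(row)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=(0, None))
    return -res.fun                                   # Gamma

Gamma = lipschitz_lp(B1=[1.0] * 4, B2=[1.0] * 4, B3=[1.0] * 4)
print(Gamma, np.sqrt(Gamma))    # unit Bessel bounds give Gamma = 1, i.e. Lip <= 1 (cf. Example 1)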

34-35 Example 1: Scattering Network The Lipschitz constant: backpropagation/chain rule gives Lipschitz bound 40 (hence Lip ≤ 6.3). Using our main theorem, Lip ≤ 1, and Mallat's result gives Lip = 1. Filters have been chosen as in a dyadic wavelet decomposition, thus B_m^(1) = B_m^(2) = B_m^(3) = 1 for 1 ≤ m ≤ 4.

36 Example 2: A General Convolutive Neural Network

37 Example 2: A General Convolutive Neural Network Set p = 2 and:
F(ω) = exp((4ω² + 4ω + 1)/(4ω² + 4ω)) χ_(−1,−1/2)(ω) + χ_(−1/2,1/2)(ω) + exp((4ω² − 4ω + 1)/(4ω² − 4ω)) χ_(1/2,1)(ω).
φ̂_1(ω) = F(ω)
ĝ_{1,j}(ω) = F(ω + 2j − 1/2) + F(ω − 2j + 1/2), j = 1, 2, 3, 4
φ̂_2(ω) = exp((4ω² + 12ω + 9)/(4ω² + 12ω + 8)) χ_(−2,−3/2)(ω) + χ_(−3/2,3/2)(ω) + exp((4ω² − 12ω + 9)/(4ω² − 12ω + 8)) χ_(3/2,2)(ω)
ĝ_{2,j}(ω) = F(ω + 2j) + F(ω − 2j), j = 1, 2, 3
ĝ_{2,4}(ω) = F(ω + 2) + F(ω − 2)
ĝ_{2,5}(ω) = F(ω + 5) + F(ω − 5)
φ̂_3(ω) = exp((4ω² + 20ω + 25)/(4ω² + 20ω + 24)) χ_(−3,−5/2)(ω) + χ_(−5/2,5/2)(ω) + exp((4ω² − 20ω + 25)/(4ω² − 20ω + 24)) χ_(5/2,3)(ω).

38 Example 2: A General Convolutive Neural Network Bessel Bounds: B_m^(1) = 2e^{−1/3} ≈ 1.43, B_m^(2) = B_m^(3) = 1. The Lipschitz bound: Using backpropagation/chain-rule: Lip 2 5. Using Theorem 1: Lip Using Theorem 2 (linear program): Lip

39 References [BSZ17] Radu Balan, Maneesh Singh, Dongmian Zou, Lipschitz Properties for Deep Convolutional Networks, available online, arXiv [cs.LG], 18 Jan. [BZ15] Radu Balan, Dongmian Zou, On Lipschitz Analysis and Lipschitz Synthesis for the Phase Retrieval Problem, available online, arXiv v1 [math.FA], 6 June 2015; Lin. Alg. and Appl. 496 (2016). [BGC16] Yoshua Bengio, Ian Goodfellow and Aaron Courville, Deep Learning, MIT Press. [BM13] Joan Bruna and Stéphane Mallat, Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), no. 8. [BCLPS15] Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, and Mark Tygert, A theoretical argument for

40 complex-valued convolutional networks, CoRR abs/ (2015). [DDSLLF09] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009. [HS97] Sepp Hochreiter and Jürgen Schmidhuber, Long short-term memory, Neural Comput. 9 (1997), no. 8. [KSH12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25. [LBH15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep learning, Nature 521 (2015), no. 7553. [LSS14] Roi Livni, Shai Shalev-Shwartz, and Ohad Shamir, On the computational efficiency of training neural networks, Advances in

41 Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), Curran Associates, Inc., 2014. [Mallat12] Stéphane Mallat, Group invariant scattering, Communications on Pure and Applied Mathematics 65 (2012), no. 10. [SVS15] Tara N. Sainath, Oriol Vinyals, Andrew W. Senior, and Hasim Sak, Convolutional, long short-term memory, fully connected deep neural networks, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015. [SLJSRAEVR15] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, Going deeper with convolutions, CVPR 2015.

42 [SZSBEGF13] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus, Intriguing properties of neural networks, CoRR abs/ (2013). [WB15a] Thomas Wiatowski and Helmut Bölcskei, Deep convolutional neural networks based on semi-discrete frames, Proc. of IEEE International Symposium on Information Theory (ISIT), June 2015. [WB15b] Thomas Wiatowski and Helmut Bölcskei, A mathematical theory of deep convolutional neural networks for feature extraction, IEEE Transactions on Information Theory (2015).
