Lipschitz Properties of General Convolutional Neural Networks
1 Lipschitz Properties of General Convolutional Neural Networks. Radu Balan, Department of Mathematics, AMSC, CSCAMM and NWC, University of Maryland, College Park, MD. Joint work with Dongmian Zou (UMD) and Maneesh Singh (Verisk). March 22, 2017, Image Analysis Seminar, University of Houston, Houston, TX.
2 This material is based upon work supported by the National Science Foundation under Grant No. DMS. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The author has been partially supported by ARO under grant W911NF.
3 Table of Contents: 1. Problem Formulation; 2. Deep Convolutional Neural Networks; 3. Lipschitz Analysis; 4. Numerical Results.
4 Problem Formulation Nonlinear Maps. Consider a nonlinear function between two metric spaces, $F : (X, d_X) \to (Y, d_Y)$.
5 Problem Formulation Lipschitz analysis of nonlinear systems. $F : (X, d_X) \to (Y, d_Y)$ is called Lipschitz with constant $C$ if for any $f, \tilde f \in X$, $d_Y(F(f), F(\tilde f)) \le C\, d_X(f, \tilde f)$. The optimal (i.e. smallest) Lipschitz constant is denoted $\mathrm{Lip}(F)$; the square $C^2$ is called the Lipschitz bound (similar to the Bessel bound). $F$ is called bi-Lipschitz with constants $C_1, C_2 > 0$ if for any $f, \tilde f \in X$, $C_1\, d_X(f, \tilde f) \le d_Y(F(f), F(\tilde f)) \le C_2\, d_X(f, \tilde f)$; the squares $C_1^2, C_2^2$ are called Lipschitz bounds (similar to frame bounds).
6 Problem Formulation Motivating Examples. Consider the typical neural network as a feature extractor component in a classification system: $g = F(f) = F_M(\dots F_1(f; W_1, \varphi_1); \dots; W_M, \varphi_M)$, where $F_m(f; W_m, \varphi_m) = \varphi_m(W_m f)$, $W_m$ is a linear operator (matrix), and $\varphi_m$ is a Lip(1) scalar nonlinearity (e.g. the Rectified Linear Unit).
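To make the composition concrete, here is a minimal sketch of such a feature extractor with ReLU nonlinearities; the matrices and input are hypothetical placeholders, not data from the talk:

```python
import numpy as np

def relu(x):
    # ReLU is 1-Lipschitz: |relu(a) - relu(b)| <= |a - b|
    return np.maximum(x, 0.0)

def feature_extractor(f, weights):
    """g = F_M(... F_1(f; W_1, phi_1) ...; W_M, phi_M) with phi_m = ReLU."""
    y = f
    for W in weights:
        y = relu(W @ y)  # F_m(y; W_m, phi_m) = phi_m(W_m y)
    return y

# Hypothetical two-layer example
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 16)), rng.standard_normal((4, 8))]
g = feature_extractor(rng.standard_normal(16), weights)
```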
7 Problem Formulation Motivating Example: AlexNet. Example from [SZSBEGF13] ("Intriguing properties..."). ImageNet dataset [DDSLLF09] (10,184 categories; 8.9 million images); AlexNet architecture [KSH12]. Figure: from Krizhevsky et al. 2012 [KSH12]: AlexNet has 5 convolutional layers + 3 dense layers; input size 224x224x3 pixels; output size 1000.
8 Problem Formulation Motivating Example: AlexNet. The authors of [SZSBEGF13] ("Intriguing properties...") found small, almost imperceptible variations of the input that produced completely different classification decisions. Figure: from Szegedy et al. 2013 [SZSBEGF13]: for 6 different classes, the original image, the difference, and the adversarial example; every adversarial example is classified as "ostrich".
9 Problem Formulation Motivating Example: AlexNet. Szegedy et al. 2013 [SZSBEGF13] computed the Lipschitz constant of each layer (the largest singular value of its linear part), tabulated as Layer | Size | Sing. Val. for Conv1-Conv5 and the three fully connected layers (the first of size 43264); the per-layer values are lost in this transcription. Multiplying them gives the overall Lipschitz bound: Lip = 5,612,006.
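The chain-rule estimate behind this table multiplies the largest singular value of each layer; since ReLU is 1-Lipschitz, $\mathrm{Lip}(F) \le \prod_m \|W_m\|_2$. A sketch of that computation on hypothetical weight matrices:

```python
import numpy as np

def chain_rule_lipschitz_bound(weights):
    """Upper bound Lip(F) by the product of per-layer spectral norms,
    valid when every scalar nonlinearity is 1-Lipschitz."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, 2)  # largest singular value of W
    return bound
```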
10 Problem Formulation Motivating Example: Scattering Network. Example of a scattering network; definition and properties in [Mallat12]; this example from [BSZ17]. Input: $f$; outputs: $y = (y_{l,k})$.
11-14 Problem Formulation Motivating Example: Scattering Network. Remarks: outputs are taken from each layer; the topology is tree-like; backpropagation/chain rule gives a Lipschitz bound of 40; Mallat's result predicts Lip = 1.
15-16 Problem Formulation Problem 1. Given a deep network, estimate the Lipschitz constant, or bound: $$\mathrm{Lip} = \sup_{f \neq \tilde f \in L^2} \frac{\|y - \tilde y\|_2}{\|f - \tilde f\|_2}, \qquad \mathrm{Bound} = \sup_{f \neq \tilde f \in L^2} \frac{\|y - \tilde y\|_2^2}{\|f - \tilde f\|_2^2}.$$ Methods (approaches): 1. Standard method: backpropagation, or chain rule. 2. New method: storage-function-based approach (dissipative systems). 3. Numerical method: simulations.
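For the third (numerical) approach, one can lower-bound Lip by sampling input pairs and taking the largest output/input distance ratio; a minimal sketch, where `F` is any network callable (a hypothetical placeholder):

```python
import numpy as np

def empirical_lipschitz(F, dim, n_pairs=1000, seed=0):
    """Monte-Carlo lower bound on Lip(F) = sup ||F(f)-F(f~)|| / ||f-f~||."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_pairs):
        f, ft = rng.standard_normal(dim), rng.standard_normal(dim)
        den = np.linalg.norm(f - ft)
        if den > 0:
            best = max(best, np.linalg.norm(F(f) - F(ft)) / den)
    return best  # a lower bound; the true Lip(F) may be larger
```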
17 Problem Formulation Problem 2. Given a deep network, estimate the stability of the output to specific variations of the input: 1. invariance to deformations: $\tilde f(x) = f(x - \tau(x))$, for some smooth $\tau$; 2. covariance to such deformations $\tilde f(x) = f(x - \tau(x))$, for smooth $\tau$ and bandlimited signals $f$; 3. tail bounds when $f$ has a known statistical distribution (e.g. normal with known spectral power).
18 ConvNet Topology. A deep convolutional network is composed of multiple layers.
19 ConvNet One Layer Each layer is composed of two or three sublayers: convolution, downsampling, detection/pooling/merge.
20 ConvNet: Sublayers Linear Filters: Convolution and Pooling-to-Output Sublayer. $f^{(2)} = g * f^{(1)}$, $f^{(2)}(x) = \int g(x - \xi) f^{(1)}(\xi)\, d\xi$, where $g \in \mathcal{B} = \{g \in \mathcal{S}' : \hat g \in L^\infty(\mathbb{R}^d)\}$. $(\mathcal{B}, \|\cdot\|_{\mathcal{B}})$ is a Banach algebra with norm $\|g\|_{\mathcal{B}} = \|\hat g\|_\infty$. Notation: $g$ for regular convolution filters, and $\phi$ for pooling-to-output filters.
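On a sampled grid, the Banach-algebra norm $\|g\|_{\mathcal{B}} = \|\hat g\|_\infty$ can be approximated with the FFT; a sketch assuming a discretized filter `g` (an assumption, since the talk works with continuous filters):

```python
import numpy as np

def banach_algebra_norm(g):
    """Approximate ||g||_B = sup_w |g_hat(w)| for a sampled filter g."""
    return np.abs(np.fft.fft(g)).max()
```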
21 ConvNet: Sublayers Downsampling Sublayer. $f^{(2)}(x) = f^{(1)}(Dx)$. For $f^{(1)} \in L^2(\mathbb{R}^d)$ and $D = D_0 I$, $f^{(2)} \in L^2(\mathbb{R}^d)$ and, by the change of variables $u = Dx$, $$\|f^{(2)}\|_2^2 = \int_{\mathbb{R}^d} |f^{(2)}(x)|^2\, dx = \frac{1}{\det(D)} \int_{\mathbb{R}^d} |f^{(1)}(u)|^2\, du = \frac{1}{D_0^d}\, \|f^{(1)}\|_2^2.$$
22 ConvNet: Sublayers Detection and Pooling Sublayer. We consider three types of detection/pooling/merge sublayers: Type I ($\tau_1$), componentwise addition: $z = \sum_{j=1}^k \sigma_j(y_j)$; Type II ($\tau_2$), $p$-norm aggregation: $z = \left(\sum_{j=1}^k |\sigma_j(y_j)|^p\right)^{1/p}$; Type III ($\tau_3$), componentwise multiplication: $z = \prod_{j=1}^k \sigma_j(y_j)$. Assumptions: (1) the $\sigma_j$ are scalar Lipschitz functions with $\mathrm{Lip}(\sigma_j) \le 1$; (2) if $\sigma_j$ is connected to a multiplication block then $\|\sigma_j\|_\infty \le 1$. A code sketch of the three merge types follows below.
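A minimal sketch of the three merge types on a list `y` of channel arrays, with `sigmas` a list of 1-Lipschitz scalar functions (both hypothetical names, not the talk's code):

```python
import numpy as np

def merge_type_I(y, sigmas):
    # Componentwise addition: z = sum_j sigma_j(y_j)
    return sum(s(yj) for s, yj in zip(sigmas, y))

def merge_type_II(y, sigmas, p=2):
    # p-norm aggregation: z = (sum_j |sigma_j(y_j)|^p)^(1/p)
    return sum(np.abs(s(yj)) ** p for s, yj in zip(sigmas, y)) ** (1.0 / p)

def merge_type_III(y, sigmas):
    # Componentwise multiplication: z = prod_j sigma_j(y_j);
    # the analysis requires |sigma_j| <= 1 for multiplicative blocks
    z = np.ones_like(y[0])
    for s, yj in zip(sigmas, y):
        z = z * s(yj)
    return z
```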
23-24 ConvNet: Sublayers MaxPooling and AveragePooling. Both MaxPooling and AveragePooling can be implemented using the sublayers above; the slides' block diagrams are lost in this transcription, but a standard construction is sketched below.
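A minimal sketch of such constructions for 1-D signals, an illustration rather than the talk's exact diagram: the identity max(a, b) = b + relu(a - b) uses only linear combinations and a 1-Lipschitz nonlinearity, and average pooling is convolution with a constant filter followed by downsampling:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max2(a, b):
    # max(a, b) = b + relu(a - b): a linear map plus a 1-Lipschitz
    # nonlinearity, so it fits the sublayer framework above
    return b + relu(a - b)

def average_pool(f, width):
    # convolution with a constant filter of total mass 1, then downsampling
    kernel = np.ones(width) / width
    smoothed = np.convolve(f, kernel, mode="valid")
    return smoothed[::width]
```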
25 ConvNet: Layer m. Components of the $m$-th layer.
26-27 ConvNet: Layer m Topology coding of the $m$-th layer. $n_m$ denotes the number of input nodes in the $m$-th layer: $I_m = \{N_{m,1}, N_{m,2}, \dots, N_{m,n_m}\}$. Filters: 1. pooling filter $\phi_{m,n}$ for node $n$ in layer $m$; 2. convolution filter $g_{m,n;k}$ from input node $n$ to output node $k$ in layer $m$; for node $n$, $\mathcal{G}_{m,n} = \{g_{m,n;1}, \dots, g_{m,n;k_{m,n}}\}$. The set of all convolution filters in layer $m$: $\mathcal{G}_m = \cup_{n=1}^{n_m} \mathcal{G}_{m,n}$. $O_m = \{N'_{m,1}, N'_{m,2}, \dots, N'_{m,n'_m}\}$ is the set of output nodes of the $m$-th layer. Note that $n'_m = n_{m+1}$ and there is a one-to-one correspondence between $O_m$ and $I_{m+1}$. The output nodes automatically partition $\mathcal{G}_m$ into $n'_m$ disjoint subsets, $\mathcal{G}_m = \cup_{n'=1}^{n'_m} \mathcal{G}'_{m,n'}$, where $\mathcal{G}'_{m,n'}$ is the set of filters merged into $N'_{m,n'}$.
28 ConvNet: Layer m Topology coding of the $m$-th layer. For each filter $g_{m,n;k}$, we define an associated multiplier $l_{m,n;k}$ in the following way: suppose $g_{m,n;k} \in \mathcal{G}'_{m,n'} \subset \mathcal{G}_m$, and let $K = |\mathcal{G}'_{m,n'}|$ denote the cardinality of $\mathcal{G}'_{m,n'}$. Then $$l_{m,n;k} = \begin{cases} K, & \text{if } g_{m,n;k} \text{ merges into a } \tau_1 \text{ or } \tau_3 \text{ block}, \\ K^{\max\{0,\, 2/p - 1\}}, & \text{if } g_{m,n;k} \text{ merges into a } \tau_2 \text{ block}. \end{cases} \tag{2.1}$$
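As a sketch, the multiplier rule (2.1) in code, with `K` the cardinality of the merge set and `merge_type` in {1, 2, 3} (hypothetical argument names):

```python
def multiplier(K, merge_type, p=2):
    """l_{m,n;k} from Eq. (2.1): K for Type I/III merges,
    K^max{0, 2/p - 1} for Type II (p-norm) merges."""
    if merge_type in (1, 3):
        return float(K)
    return float(K) ** max(0.0, 2.0 / p - 1.0)
```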
29 Layer Analysis Bessel Bounds. In each layer $m$ and for each input node $n$ we define three types of Bessel bounds. 1st type Bessel bound: $$B^{(1)}_{m,n} = \left\| |\hat\phi_{m,n}|^2 + \sum_{k=1}^{k_{m,n}} l_{m,n;k}\, D_{m,n;k}^d\, |\hat g_{m,n;k}|^2 \right\|_\infty; \tag{3.2}$$ 2nd type Bessel bound: $$B^{(2)}_{m,n} = \left\| \sum_{k=1}^{k_{m,n}} l_{m,n;k}\, D_{m,n;k}^d\, |\hat g_{m,n;k}|^2 \right\|_\infty; \tag{3.3}$$ 3rd type (or generating) bound: $$B^{(3)}_{m,n} = \|\hat\phi_{m,n}\|_\infty^2. \tag{3.4}$$
30 Layer Analysis Bessel Bounds. Next we define the layer-$m$ Bessel bounds: 1st type Bessel bound $B_m^{(1)} = \max_{1 \le n \le n_m} B_{m,n}^{(1)}$ (3.5); 2nd type Bessel bound $B_m^{(2)} = \max_{1 \le n \le n_m} B_{m,n}^{(2)}$ (3.6); 3rd type (generating) Bessel bound $B_m^{(3)} = \max_{1 \le n \le n_m} B_{m,n}^{(3)}$ (3.7). Remark: these bounds characterize semi-discrete Bessel systems.
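Numerically, on a frequency grid, the node bounds (3.2)-(3.4) and layer bounds (3.5)-(3.7) can be approximated as below; `phi_hat`, `g_hats`, `mults`, and `Ds` (the factors $D_{m,n;k}^d$) are hypothetical sampled inputs, a sketch rather than the talk's implementation:

```python
import numpy as np

def node_bessel_bounds(phi_hat, g_hats, mults, Ds):
    """B1, B2, B3 for one input node from sampled Fourier transforms.
    g_hats: list of arrays; mults: l_{m,n;k}; Ds: D_{m,n;k}^d factors."""
    s = sum(l * D * np.abs(gh) ** 2 for l, D, gh in zip(mults, Ds, g_hats))
    B2 = s.max()                              # Eq. (3.3)
    B1 = (np.abs(phi_hat) ** 2 + s).max()     # Eq. (3.2)
    B3 = (np.abs(phi_hat) ** 2).max()         # Eq. (3.4)
    return B1, B2, B3

def layer_bessel_bounds(node_bounds):
    # (3.5)-(3.7): maximize each bound over the layer's input nodes
    arr = np.array(node_bounds)   # shape (n_m, 3)
    return arr.max(axis=0)        # B_m^(1), B_m^(2), B_m^(3)
```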
31 Lipschitz Analysis First Result. Theorem [BSZ17]. Consider a convolutional neural network with $M$ layers as described before, where all scalar nonlinear functions are Lipschitz with $\mathrm{Lip}(\varphi_{m,n,n'}) \le 1$; additionally, those $\varphi_{m,n,n'}$ that aggregate into a multiplicative block satisfy $\|\varphi_{m,n,n'}\|_\infty \le 1$. Let the $m$-th layer 1st type Bessel bound be $$B_m^{(1)} = \max_{1 \le n \le n_m} \left\| |\hat\phi_{m,n}|^2 + \sum_{k=1}^{k_{m,n}} l_{m,n;k}\, D_{m,n;k}^d\, |\hat g_{m,n;k}|^2 \right\|_\infty.$$ Then the Lipschitz bound of the entire CNN is upper bounded by $\prod_{m=1}^M \max(1, B_m^{(1)})$. Specifically, for any $f, \tilde f \in L^2(\mathbb{R}^d)$: $$\|F(f) - F(\tilde f)\|_2^2 \le \left( \prod_{m=1}^M \max(1, B_m^{(1)}) \right) \|f - \tilde f\|_2^2.$$
32 Lipschitz Analysis Second Result. Theorem. Consider a convolutional neural network with $M$ layers as described before, where all scalar nonlinearities satisfy the same conditions as in the previous result. For layer $m$, let $B_m^{(1)}$, $B_m^{(2)}$, and $B_m^{(3)}$ denote the three Bessel bounds defined earlier. Denote by $\Gamma$ the optimal value of the following linear program: $$\Gamma = \max_{y_1, \dots, y_M,\, z_1, \dots, z_M \ge 0} \sum_{m=1}^M z_m \quad \text{s.t.} \quad y_0 = 1, \quad y_m + z_m \le B_m^{(1)} y_{m-1}, \quad y_m \le B_m^{(2)} y_{m-1}, \quad z_m \le B_m^{(3)} y_{m-1}, \quad 1 \le m \le M. \tag{3.8}$$
33 Lipschitz Analysis Second Result, continued. Theorem. Then the Lipschitz bound satisfies $\mathrm{Lip}(F)^2 \le \Gamma$. Specifically, for any $f, \tilde f \in L^2(\mathbb{R}^d)$: $\|F(f) - F(\tilde f)\|_2^2 \le \Gamma\, \|f - \tilde f\|_2^2$.
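A sketch of the linear program (3.8) solved with scipy.optimize.linprog (which minimizes, so the objective is negated); `B1`, `B2`, `B3` are hypothetical per-layer bound lists:

```python
import numpy as np
from scipy.optimize import linprog

def lipschitz_lp_bound(B1, B2, B3):
    """Solve (3.8) with variables x = (y_1..y_M, z_1..z_M), y_0 = 1.
    Returns Gamma; then Lip(F) <= sqrt(Gamma)."""
    M = len(B1)
    c = np.concatenate([np.zeros(M), -np.ones(M)])  # maximize sum z_m
    A, b = [], []
    for m in range(M):
        ym = np.zeros(2 * M); ym[m] = 1.0
        zm = np.zeros(2 * M); zm[M + m] = 1.0
        ym1 = np.zeros(2 * M)
        if m > 0:
            ym1[m - 1] = 1.0
        rhs = 1.0 if m == 0 else 0.0  # y_0 = 1 moves to the right-hand side
        A.append(ym + zm - B1[m] * ym1); b.append(B1[m] * rhs)
        A.append(ym - B2[m] * ym1);      b.append(B2[m] * rhs)
        A.append(zm - B3[m] * ym1);      b.append(B3[m] * rhs)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(0, None)] * (2 * M))
    return -res.fun  # Gamma

# e.g. for the scattering example, where all four layers' bounds equal 1,
# lipschitz_lp_bound([1]*4, [1]*4, [1]*4) returns 1.0, i.e. Lip <= 1.
```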
34-35 Example 1: Scattering Network. The Lipschitz constant: backpropagation/chain rule gives a Lipschitz bound of 40 (hence Lip ≤ √40 ≈ 6.3). Using our main theorem, Lip ≤ 1, matching Mallat's result: Lip = 1. The filters have been chosen as in a dyadic wavelet decomposition, so $B_m^{(1)} = B_m^{(2)} = B_m^{(3)} = 1$ for $1 \le m \le 4$.
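The dyadic-wavelet claim $B_m^{(1)} = 1$ amounts to a Littlewood-Paley identity $|\hat\phi(\omega)|^2 + \sum_j |\hat\psi_j(\omega)|^2 = 1$. A quick numeric check with ideal (Shannon-type) band-pass filters, an illustration rather than the talk's exact filter bank:

```python
import numpy as np

w = np.linspace(-8, 8, 4001)
phi_hat = (np.abs(w) <= 1).astype(float)              # ideal low-pass
psi_hats = [((2**j < np.abs(w)) & (np.abs(w) <= 2**(j + 1))).astype(float)
            for j in range(3)]                        # dyadic band-passes

total = phi_hat**2 + sum(p**2 for p in psi_hats)
assert np.allclose(total, 1.0)  # partition of unity on |w| <= 8
```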
36 Example 2: A General Convolutive Neural Network
37 Example 2: A General Convolutive Neural Network. Set $p = 2$ and: $$F(\omega) = \exp\!\left(\frac{4\omega^2 + 4\omega + 1}{4\omega^2 + 4\omega}\right)\chi_{(-1,-1/2)}(\omega) + \chi_{(-1/2,1/2)}(\omega) + \exp\!\left(\frac{4\omega^2 - 4\omega + 1}{4\omega^2 - 4\omega}\right)\chi_{(1/2,1)}(\omega),$$ $$\hat\phi_1(\omega) = F(\omega), \qquad \hat g_{1,j}(\omega) = F(\omega + 2j - 1/2) + F(\omega - 2j + 1/2), \quad j = 1, 2, 3, 4,$$ $$\hat\phi_2(\omega) = \exp\!\left(\frac{4\omega^2 + 12\omega + 9}{4\omega^2 + 12\omega + 8}\right)\chi_{(-2,-3/2)}(\omega) + \chi_{(-3/2,3/2)}(\omega) + \exp\!\left(\frac{4\omega^2 - 12\omega + 9}{4\omega^2 - 12\omega + 8}\right)\chi_{(3/2,2)}(\omega),$$ $$\hat g_{2,j}(\omega) = F(\omega + 2j) + F(\omega - 2j), \quad j = 1, 2, 3, \qquad \hat g_{2,4}(\omega) = F(\omega + 2) + F(\omega - 2), \qquad \hat g_{2,5}(\omega) = F(\omega + 5) + F(\omega - 5),$$ $$\hat\phi_3(\omega) = \exp\!\left(\frac{4\omega^2 + 20\omega + 25}{4\omega^2 + 20\omega + 24}\right)\chi_{(-3,-5/2)}(\omega) + \chi_{(-5/2,5/2)}(\omega) + \exp\!\left(\frac{4\omega^2 - 20\omega + 25}{4\omega^2 - 20\omega + 24}\right)\chi_{(5/2,3)}(\omega).$$
38 Example 2: A General Convolutive Neural Network. Bessel bounds: $B_m^{(1)} = 2e^{-1/3} \approx 1.43$, $B_m^{(2)} = B_m^{(3)} = 1$. The Lipschitz bound: using backpropagation/chain rule, $\mathrm{Lip}^2 \le 5$; Theorem 1 and Theorem 2 (linear program) give progressively tighter bounds (the numerical values are lost in this transcription).
39-42 References
[BSZ17] Radu Balan, Maneesh Singh, Dongmian Zou, Lipschitz properties for deep convolutional networks, arXiv preprint [cs.LG], 18 Jan.
[BZ15] Radu Balan, Dongmian Zou, On Lipschitz analysis and Lipschitz synthesis for the phase retrieval problem, arXiv v1 [math.FA], 6 June 2015; Linear Algebra and its Applications 496 (2016).
[BGC16] Yoshua Bengio, Ian Goodfellow, Aaron Courville, Deep Learning, MIT Press, 2016.
[BM13] Joan Bruna, Stéphane Mallat, Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), no. 8.
[BCLPS15] Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert, A theoretical argument for complex-valued convolutional networks, CoRR (2015).
[DDSLLF09] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei, ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[HS97] Sepp Hochreiter, Jürgen Schmidhuber, Long short-term memory, Neural Computation 9 (1997), no. 8.
[KSH12] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, 2012.
[LBH15] Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Deep learning, Nature 521 (2015), no. 7553.
[LSS14] Roi Livni, Shai Shalev-Shwartz, Ohad Shamir, On the computational efficiency of training neural networks, Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger, eds.), Curran Associates, Inc., 2014.
[Mallat12] Stéphane Mallat, Group invariant scattering, Communications on Pure and Applied Mathematics 65 (2012), no. 10.
[SVS15] Tara N. Sainath, Oriol Vinyals, Andrew W. Senior, Haşim Sak, Convolutional, long short-term memory, fully connected deep neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, April 19-24, 2015.
[SLJSRAEVR15] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going deeper with convolutions, CVPR 2015.
[SZSBEGF13] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, Rob Fergus, Intriguing properties of neural networks, CoRR (2013).
[WB15a] Thomas Wiatowski, Helmut Bölcskei, Deep convolutional neural networks based on semi-discrete frames, Proc. IEEE International Symposium on Information Theory (ISIT), June 2015.
[WB15b] Thomas Wiatowski, Helmut Bölcskei, A mathematical theory of deep convolutional neural networks for feature extraction, IEEE Transactions on Information Theory (2015).