Convolutional Neural Network Architecture

Zhisheng Zhong. February 2nd, 2018.

Outline
1 Introduction of Convolution: Motivation; Operation; Properties; From FC Layer to CONV Layer
2 CNN Architecture: The Benchmark Dataset; Go Deeper; Go Wider; Information Flow
3 Summary

Overview diagram: the CONV layer turns a plain NN into a CNN; go deeper (AlexNet, VGGNet, GoogleNet, ResNet); go wider (ResNeXt, MultiResNet, FractalNet); information flow (DenseNet, CliqueNet, our work).

Introduction of Convolution: Motivation. In the 1960s, scientists studied the cells of the cat's visual cortex and found that each visual neuron processes only a small area of the visual image, its receptive field.

Introduction of Convolution: Operation. We first take an input with a single channel as an example. (Figure: an $H \times W$ input, zero-padded by $P$ on every side, is convolved with an $h \times w$ kernel at stride $S = 1$ to give an $H' \times W'$ output.) The parameters are the input size $H, W$, the padding $P$, the stride $S$, the kernel size $h, w$, and the output size $H', W'$. They are related by $H' = (H + 2P - h)/S + 1$ and $W' = (W + 2P - w)/S + 1$.
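As a quick check of this size formula, here is a tiny Python helper (not from the slides; the name conv_output_size and the use of floor division, which most frameworks apply when the stride does not divide evenly, are my own choices):

```python
def conv_output_size(H, W, h, w, P=0, S=1):
    """Output height/width for an H x W input, h x w kernel, padding P, stride S."""
    H_out = (H + 2 * P - h) // S + 1
    W_out = (W + 2 * P - w) // S + 1
    return H_out, W_out

# Example: a 5x5 input with a 3x3 kernel, padding 1 and stride 1 keeps its size.
assert conv_output_size(5, 5, 3, 3, P=1, S=1) == (5, 5)
```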

Introduction of Convolution: Operation. Suppose we use $X$ to denote the input matrix, $W$ the convolution kernel, and $X'$ the output matrix. The convolution operation ($*$ denotes convolution) is computed by the following formula:
$$X'(m, n) = (X * W)(m, n) = \sum_{i=1}^{h} \sum_{j=1}^{w} X(m + i, n + j)\, W(i, j). \qquad (1)$$
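The following minimal NumPy sketch (an illustration, not the slides' code) evaluates Eq. (1) directly with two loops over output positions; the only difference from the formula is that indices start at 0 rather than 1:

```python
import numpy as np

def conv2d_single(X, K, P=0, S=1):
    """Single-channel convolution of Eq. (1): slide the h x w kernel K over the
    zero-padded input X with stride S and sum the element-wise products."""
    H, W = X.shape
    h, w = K.shape
    Xp = np.pad(X, P)                                  # zero padding of width P
    H_out = (H + 2 * P - h) // S + 1
    W_out = (W + 2 * P - w) // S + 1
    out = np.zeros((H_out, W_out))
    for m in range(H_out):
        for n in range(W_out):
            out[m, n] = np.sum(Xp[m * S:m * S + h, n * S:n * S + w] * K)
    return out

# Example: a 4x4 input and a 3x3 kernel give a 2x2 output (P = 0, S = 1).
print(conv2d_single(np.arange(16.0).reshape(4, 4), np.ones((3, 3))))
```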

Introduction of Convolution: Operation. In general, the input is not a matrix but a 3-D tensor $X \in \mathbb{R}^{H \times W \times C_i}$, the kernel becomes a 4-D tensor $W \in \mathbb{R}^{h \times w \times C_i \times C_o}$, and the output is also a 3-D tensor $X' \in \mathbb{R}^{H' \times W' \times C_o}$.
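To see how these shapes fit together, here is a short NumPy sketch with made-up sizes (no padding, stride 1): gather all h × w patches of the input and contract over the kernel window and the input channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

H, W, C_i, C_o, h, w = 8, 8, 3, 4, 3, 3
X = np.random.randn(H, W, C_i)            # input tensor
K = np.random.randn(h, w, C_i, C_o)       # kernel tensor

# All h x w patches of X: shape (H - h + 1, W - w + 1, C_i, h, w).
patches = sliding_window_view(X, (h, w), axis=(0, 1))

# Contract over the window indices (i, j) and the input channels c.
out = np.einsum('pqcij,ijcd->pqd', patches, K)
print(out.shape)                          # (6, 6, 4) = (H', W', C_o)
```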

Introduction of Convolution: Properties. In image processing, the convolution kernel is also called a filter. It is important to note that filters act as feature detectors on the original input image.

Introduction of Convolution: Properties. In signal processing, convolution has a strong connection with the Fourier transform. Convolution Theorem:
$$X * Y = \mathcal{F}^{-1}\big(\mathcal{F}(X) \odot \mathcal{F}(Y)\big), \qquad (2)$$
where $\mathcal{F}$ denotes the Fourier transform, $\mathcal{F}^{-1}$ the inverse Fourier transform, and $\odot$ the element-wise product. Convolution in the spatial domain is equivalent to element-wise multiplication in the Fourier domain.
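A small NumPy check of the theorem (illustrative only; for the discrete Fourier transform the identity holds for circular convolution, so a linear convolution would need zero-padding first):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 6, 6
X = rng.standard_normal((H, W))
Y = rng.standard_normal((H, W))

# Direct circular convolution: Z(m, n) = sum_{i,j} X(i, j) * Y((m - i) mod H, (n - j) mod W).
Z_direct = np.zeros((H, W))
for m in range(H):
    for n in range(W):
        for i in range(H):
            for j in range(W):
                Z_direct[m, n] += X[i, j] * Y[(m - i) % H, (n - j) % W]

# Fourier-domain version: element-wise product of the transforms, then inverse FFT.
Z_fft = np.real(np.fft.ifft2(np.fft.fft2(X) * np.fft.fft2(Y)))

assert np.allclose(Z_direct, Z_fft)
```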

Introduction of Convolution: Properties. Because the computational complexities of the FFT and IFFT are both $O(HW \log(HW))$ while direct computation of the convolution requires $O(HW \cdot h \cdot w)$, we can use the FFT and IFFT to reduce the amount of computation and accelerate the forward propagation whenever $hw$ is larger than $\log(HW)$, i.e. for sufficiently large kernels.

Introduction of Convolution: From FC Layer to CONV Layer.

Items   | FC layer | CONV layer
Input   | vector   | 3-D tensor
Weight  | matrix   | 4-D tensor
Output  | vector   | 3-D tensor

Introduction of Convolution: From FC Layer to CONV Layer. Figure: converting a CONV layer to an FC layer, i.e. writing the convolution of a 3×3 input (entries 1..9) with a 2×2 kernel (a, b, c, d) as a matrix multiplication producing the outputs I, J, K, L. Compared with an FC layer, a CONV layer has two properties: local connectivity and weight sharing. Both FC and CONV layers are linear transformations; thanks to the convolution operator, the CONV layer preserves spatial information. A small numerical check of this equivalence is given below.
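The check below (using the figure's sizes, but a random kernel) builds the matrix M that represents the convolution of a 3×3 input with a 2×2 kernel and verifies that the matrix multiply reproduces the convolution; the repeated kernel entries across the rows of M are the weight sharing, and the zeros are the local connectivity.

```python
import numpy as np

X = np.arange(1.0, 10.0).reshape(3, 3)    # the 3x3 input (entries 1..9, as in the figure)
K = np.random.randn(2, 2)                 # a 2x2 kernel (a, b, c, d in the figure)

H, W = X.shape
h, w = K.shape
H_out, W_out = H - h + 1, W - w + 1       # 2x2 output, stride 1, no padding

# Row r of M holds the kernel weights at exactly the input positions seen by output r.
M = np.zeros((H_out * W_out, H * W))
for m in range(H_out):
    for n in range(W_out):
        for i in range(h):
            for j in range(w):
                M[m * W_out + n, (m + i) * W + (n + j)] = K[i, j]

direct = np.array([[np.sum(X[m:m + h, n:n + w] * K) for n in range(W_out)]
                   for m in range(H_out)])
assert np.allclose(M @ X.flatten(), direct.flatten())
```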

Introduction of Convolution: From FC Layer to CONV Layer. Strong representation and predictive power: the toy models are run on the MNIST dataset, which contains 60,000 training examples, each a 28×28 pixel image of a digit from 0 to 9.

Items                      | Setting 1       | Setting 2
Hidden layer 1             | CONV (5,5,1,10) | FC (784,1440)
Number of parameters of H1 | 250             | 1,128,960
Hidden layer 2             | FC (1440,10)    | FC (1440,10)
Number of parameters of H2 | 14,400          | 14,400
Minimum training loss      | 0.0038          | 0.0107
Maximum test accuracy      | 98.32%          | 97.25%
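A hedged PyTorch sketch of the two settings (the slides do not give the exact models; the ReLU activations and the 2×2 max-pool in Setting 1, chosen so the hidden representation has 10 × 12 × 12 = 1440 units, are my assumptions, and the table's parameter counts ignore biases):

```python
import torch.nn as nn

# Setting 1: CONV (5, 5, 1, 10) as hidden layer 1, then FC (1440, 10).
setting1 = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),   # 28x28 -> 24x24; 5 * 5 * 1 * 10 = 250 weights
    nn.ReLU(),
    nn.MaxPool2d(2),                   # 24x24 -> 12x12, so 10 * 12 * 12 = 1440 units
    nn.Flatten(),
    nn.Linear(1440, 10),               # 1440 * 10 = 14,400 weights
)

# Setting 2: FC (784, 1440) as hidden layer 1, then FC (1440, 10).
setting2 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 1440),              # 784 * 1440 = 1,128,960 weights
    nn.ReLU(),
    nn.Linear(1440, 10),
)
```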

The Benchmark Dataset: ImageNet.

The Benchmark Dataset: ImageNet difficulty.

Go Deeper: Overview. Revolution of depth, ImageNet classification top-5 error (%): ILSVRC'10 (shallow): 28.2; ILSVRC'11 (shallow): 25.8; ILSVRC'12 AlexNet (8 layers): 16.4; ILSVRC'13 ZFNet (8 layers): 11.7; ILSVRC'14 VGG (19 layers): 7.3; ILSVRC'14 GoogleNet (22 layers): 6.7; ILSVRC'15 ResNet (152 layers): 3.57.

Go Deeper: Vanishing Gradient Problem. The vanishing gradient problem is a difficulty found in training neural networks with gradient-based learning methods and backpropagation. We take backpropagation as an example. Denote an $L$-layer feedforward neural network by
$$X = \Sigma_0 \xrightarrow{W_1} \Sigma_1 \xrightarrow{W_2} \Sigma_2 \longrightarrow \cdots \xrightarrow{W_L} \Sigma_L = Y,$$
where $\Sigma_i = \sigma(\Sigma_{i-1} W_i)$ for $i = 1, \dots, L-1$, and $\Sigma_L = g(\Sigma_{L-1} W_L)$. Here $\sigma(\cdot)$ is the activation function (sigmoid, ReLU, ...) and $g(\cdot)$ is the transform of the last layer, e.g. the identity for regression or softmax for classification.

Go Deeper: Vanishing Gradient Problem. Define $f(\mathbf{W})$ as the loss as a function of the weights $\mathbf{W}$. After choosing a proper loss function $l$, backpropagation gives the gradients of all the weights:
$$\nabla_{W_L} f = \Sigma_{L-1}^{T} \Phi_1, \qquad \Phi_1 = g'(\Sigma_{L-1} W_L) \odot l'(\Sigma_L),$$
$$\nabla_{W_{L-1}} f = \Sigma_{L-2}^{T} \Phi_2, \qquad \Phi_2 = \sigma'(\Sigma_{L-2} W_{L-1}) \odot (\Phi_1 W_L^{T}),$$
$$\nabla_{W_{L-2}} f = \Sigma_{L-3}^{T} \Phi_3, \qquad \Phi_3 = \sigma'(\Sigma_{L-3} W_{L-2}) \odot (\Phi_2 W_{L-1}^{T}),$$
where $\odot$ denotes the element-wise product. Each additional layer multiplies the backpropagated signal by another factor of $\sigma'(\cdot)$ and $W^{T}$.

Go Deeper: Vanishing Gradient Problem. When we use the sigmoid function as the activation,
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma(x) \in (0, 1),$$
its first-order derivative satisfies
$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \le \frac{1}{4}.$$
When we use the ReLU function as the activation,
$$\sigma'(x) = \begin{cases} 1, & \text{if } x \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$
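The consequence for depth, in one line of arithmetic (ignoring the weight matrices, so this is only the activation part of the factor):

```python
# Worst-case activation factor accumulated over L layers.
for L in (10, 30, 50):
    print(f"L = {L:2d}: sigmoid bound (1/4)^L = {0.25 ** L:.3e}, ReLU factor 1^L = {1.0 ** L}")
```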

Go Deeper: AlexNet. (Figure: the AlexNet architecture, divided into a feature extraction part and a classification part.)

Go Deeper: VGGNet.

Go Deeper: VGGNet. Main contribution: the use of only 3×3 filters, which is quite different from AlexNet's 11×11 filters in the first layer and ZF Net's 7×7 filters. This idea is widely adopted by later CNN architectures such as ResNet, DenseNet, and so on.

Go Deeper: GoogleNet.

Go Deeper: GoogleNet, the Inception block. Main contributions: 1. GoogLeNet was one of the first models to show that CNN layers do not always have to be stacked up sequentially; an Inception block contains parallel paths. 2. Auxiliary loss functions attached to middle layers help the network learn more efficiently.

Go Deeper: ResNet, overview.

Go Deeper: ResNet, motivation. Figure: training error (left) and test error (right) on CIFAR-10 with 20-layer and 56-layer plain networks. The degradation (of training accuracy) indicates that not all systems are similarly easy to optimize.

Go Deeper: ResNet, conclusion and solution. Conclusion: the identity map may be difficult to learn when networks become deeper. Solution: design a residual block with a skip identity map, in which the input $x$ bypasses the weight layers that compute $\mathcal{F}(x)$ and is added back before the final ReLU:
$$X_{i+1} = \sigma\big(\mathcal{F}(X_i, W_i) + X_i\big). \qquad (3)$$
A minimal code sketch of such a block is given below.
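A minimal PyTorch sketch of a residual block (an illustration that assumes equal input and output channels; the batch normalization after each convolution follows common ResNet practice):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Eq. (3): X_{i+1} = relu(F(X_i, W_i) + X_i), with F two 3x3 conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.f(x) + x)   # the skip identity connection

x = torch.randn(1, 64, 32, 32)
assert ResidualBlock(64)(x).shape == x.shape
```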

Go Deeper: ResNet, improvement. Figure: left, plain networks of 18 and 34 layers; right, ResNets of 18 and 34 layers. In this plot, the residual networks have no extra parameters compared to their plain counterparts.

Go Deeper: ResNet, main contribution. Importance of the identity skip connection. For simplicity, we do not consider the effect of the activation function:
$$X_{i+1} = X_i + \mathcal{F}(X_i, W_i), \qquad (4)$$
$$X_{i+2} = X_{i+1} + \mathcal{F}(X_{i+1}, W_{i+1}) = X_i + \mathcal{F}(X_i, W_i) + \mathcal{F}(X_{i+1}, W_{i+1}). \qquad (5)$$
From these two equations we obtain the recursive form
$$X_L = X_k + \sum_{i=k}^{L-1} \mathcal{F}(X_i, W_i). \qquad (6)$$

Go Deeper: ResNet, main contribution. Importance of the identity skip connection. Starting from $X_L = X_k + \sum_{i=k}^{L-1} \mathcal{F}(X_i, W_i)$, the gradient at the $k$-th layer is
$$\frac{\partial l}{\partial X_k} = \frac{\partial l}{\partial X_L}\left(1 + \frac{\partial}{\partial X_k}\sum_{i=k}^{L-1} \mathcal{F}(X_i, W_i)\right). \qquad (7)$$
The additive term $\frac{\partial l}{\partial X_L}$ ensures that information is directly propagated back to any shallower layer $k$.

Go Deeper: ResNet, main contribution. Importance of the identity skip connection. Let us consider a simple modification that breaks the identity shortcut:
$$X_{i+1} = \lambda_i X_i + \mathcal{F}(X_i, W_i), \qquad (8)$$
$$X_L = \left(\prod_{i=k}^{L-1} \lambda_i\right) X_k + \sum_{i=k}^{L-1} \hat{\mathcal{F}}(X_i, W_i), \qquad (9)$$
where $\hat{\mathcal{F}}$ absorbs the scalars $\lambda_i$ into the residual functions. The gradient at the $k$-th layer becomes
$$\frac{\partial l}{\partial X_k} = \frac{\partial l}{\partial X_L}\left(\prod_{i=k}^{L-1} \lambda_i + \frac{\partial}{\partial X_k}\sum_{i=k}^{L-1} \hat{\mathcal{F}}(X_i, W_i)\right). \qquad (10)$$
For an extremely deep network (large $L$), if $\lambda_i > 1$ for all $i$ this factor can be exponentially large; if $\lambda_i < 1$ for all $i$ it can be exponentially small and vanish, so the information flow is broken.
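A one-line numerical illustration of the scalar factor in Eq. (10) (illustrative only; real residual branches are not scalars):

```python
import numpy as np

# prod_i lambda_i over 100 blocks: a small deviation from the identity shortcut
# makes the factor explode or vanish, while lambda_i = 1 keeps it exactly 1.
L = 100
for lam in (1.05, 1.0, 0.95):
    print(f"lambda = {lam}: factor = {np.prod(np.full(L, lam)):.3e}")
```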

Go Deeper: ResNet, another interpretation: ensemble.
$$X_3 = X_2 + f_3(X_2), \qquad (11)$$
$$X_3 = X_1 + f_2(X_1) + f_3\big(X_1 + f_2(X_1)\big), \qquad (12)$$
$$X_3 = X_0 + f_1(X_0) + f_2\big(X_0 + f_1(X_0)\big) + f_3\Big(X_0 + f_1(X_0) + f_2\big(X_0 + f_1(X_0)\big)\Big).$$
If $f_1, f_2, f_3$ are linear operators,
$$X_3 = X_0 + f_1(X_0) + f_2(X_0) + f_2 f_1(X_0) + f_3(X_0) + f_3 f_1(X_0) + f_3 f_2(X_0) + f_3 f_2 f_1(X_0),$$
so the network behaves like an ensemble of $2^3$ paths from input to output.

Go Deeper: ResNet, another interpretation: ensemble. The distribution of path lengths follows a binomial distribution. With 54 blocks in total, more than 95% of the paths go through 19 to 35 modules (ResNet-110).

Go Deeper: ResNet, another interpretation: ensemble. The effective paths in residual networks are relatively shallow: the gradient at the input layer mostly passes through the shallow paths.

Go Wider. Motivated by the observation that the effective paths in residual networks are relatively shallow, researchers tried to make networks shallower and wider, and achieved better results: ResNeXt, Multi-ResNet, FractalNet.

Information Flow: DenseNet. Motivated by ResNet, where the identity skip connections are very important because they let the signal be propagated directly from any unit to any other, both forward and backward, DenseNet connects each layer to all of its preceding layers and feeds layer $l$ the concatenation of all earlier feature maps:
$$X_l = H_l\big([X_0, X_1, \dots, X_{l-1}]\big). \qquad (13)$$
A minimal sketch of such a dense block is given below.
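A minimal PyTorch sketch of a dense block in the spirit of Eq. (13) (the growth rate, depth, and the use of plain 3×3 convolutions with ReLU are illustrative choices, not the exact DenseNet layer):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Layer l receives the concatenation [X_0, X_1, ..., X_{l-1}] as its input."""
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + l * growth_rate, growth_rate, kernel_size=3, padding=1)
            for l in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(torch.relu(layer(torch.cat(features, dim=1))))
        return torch.cat(features, dim=1)

x = torch.randn(1, 16, 32, 32)
print(DenseBlock(16)(x).shape)   # 16 + 4 * 12 = 64 output channels
```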

Information Flow: DenseNet. This introduces $L(L-1)/2$ connections in an $L$-layer network. Problems: 1. the number of connections grows quadratically with depth; 2. the number of input feature maps grows linearly with depth. Solution: organize the network into dense blocks separated by transition layers. (Figure: dense blocks connected by transition layers.)

Information Flow: CliqueNet (our work). Motivated by the success of ResNet and DenseNet, we add more identity skip connections to maximize the information flow of the networks.

Information Flow: CliqueNet (our work). Stage-I:
$$X_i^{(1)} = \sigma\Big(\sum_{0 \le l < i} W_{li}\, X_l^{(1)}\Big). \qquad (14)$$
Stage-II ($k \ge 2$), computed as a loop over the layers of the block:
$$X_i^{(k)} = \sigma\Big(\sum_{0 < l < i} W_{li}\, X_l^{(k)} + \sum_{i < m \le L} W_{mi}\, X_m^{(k-1)}\Big). \qquad (15)$$
A toy sketch of the two stages is given below.
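A toy NumPy sketch of the two stages (purely illustrative: every $W_{li}$ is treated as a plain d × d matrix rather than a convolution, the layers are refreshed in place so lower-indexed layers already carry their stage-$k$ values, and all sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 4, 8                                   # layers per block, feature dimension
sigma = lambda z: np.maximum(z, 0.0)          # ReLU
W = {(l, i): 0.1 * rng.standard_normal((d, d))
     for l in range(L + 1) for i in range(1, L + 1) if l != i}

X = {0: rng.standard_normal(d)}               # X_0 is the block input

# Stage-I, Eq. (14): X_i^(1) = sigma(sum over 0 <= l < i of W_li X_l).
for i in range(1, L + 1):
    X[i] = sigma(sum(W[(l, i)] @ X[l] for l in range(i)))

# Stage-II, Eq. (15) with k = 2: lower layers contribute their refreshed values,
# higher layers their stage-(k-1) values, and the input X_0 is no longer used.
X_prev = dict(X)
for i in range(1, L + 1):
    X[i] = sigma(sum(W[(l, i)] @ X[l] for l in range(1, i)) +
                 sum(W[(m, i)] @ X_prev[m] for m in range(i + 1, L + 1)))
```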

Information Flow: CliqueNet (our work). (Figure: the overall architecture; each block produces block features and transition features.)

Information Flow: CliqueNet (our work). Comparison of the number of parameters (DenseNet vs. CliqueNet). We set the number of blocks to 5; each block contains 5 layers, and each layer produces 64 feature maps. (Figure: the number of input channels per layer for DenseNet, left, and for CliqueNet, right.) Under the same conditions, the ratio of the number of parameters of DenseNet to that of CliqueNet is about 0.95. When the number of blocks becomes larger, the number of parameters of CliqueNet becomes smaller than that of DenseNet.

Information Flow: CliqueNet (our work). Figure: visualization of the weights in the first block of a pretrained DenseNet (left) and CliqueNet (right), obtained by computing the average absolute value of $W_{ij}$. The visualization suggests that CliqueNet's parameter efficiency is better than DenseNet's.

Information Flow: CliqueNet (our work). Results on CIFAR-10, CIFAR-100 and SVHN.

Information Flow: CliqueNet (our work). Results on ImageNet.

Information Flow: CliqueNet (our work). Our contributions: 1. to maximize the information flow, we use a fully connected graph; 2. we are the first to propose an unfolded loop block structure; 3. we use multi-scale features as the input to the loss function, and two kinds of features are propagated in our network.

Summary. The CONV layer is locally connected and uses weight sharing, which makes it more powerful for image feature extraction and turns a plain NN into a CNN. Go deeper: AlexNet, VGGNet, GoogleNet, ResNet. Go wider: ResNeXt, MultiResNet, FractalNet. Information flow: DenseNet, CliqueNet (our work). The most important ingredient: the skip identity connection. Design criterion: a directed computation graph without loops.