A practical theory for designing very deep convolutional neural networks. Xudong Cao.
|
|
- Carmel Glenn
- 6 years ago
- Views:
Transcription
1 A practical theory for designing very deep convolutional neural networks Xudong Cao Abstract Going deep is essential for deep learning. However it is not easy, there are many ways of going deep but most of them are ineffective. In this work, we propose two novel constrains in the design of deep structure to guarantee the performance gain when going deep. Firstly, for each convolutional layer, its capacity of learning more complex patterns should be guaranteed; Secondly, the receptive field of the topmost layer should be no larger than the image region. Given these two constrains, we cast the task of designing deep convolutional neural network into a constrained optimization problem. We present an analytic optimal solution under certain conditions. 1. Introduction Going deep has demonstrated a great success in the ImageNet competition Both Google and VGG won the competition using very deep convolutional neural networks. Going deep greatly improves the learning/fitting capacity of the entire network while only increase model size and computational cost linearly. Although it has become a common sense to going deep, it is still unclear how to design a very deep convolution neural network effectively. Most times arbitrarily adding more layers does not help, or even worsen the performance. In this work, we propose a practical theory for designing very deep convolutional neural network effectively. As shown in Figure 1, we divide a convolutional neural network into two levels i.e. classifier and feature levels, we focus on designing a very deep architecture in the feature level while use a fixed simple design in the classifier level (will describe in the end of this section). We cast the design of deep convolutional neural network into a constrained optimization problem. The objective is maximizing the depth of the target convolutional neural network, subjecting to two constraints: (1) the c-value of each layer should not be too small, c-value is a metric for measuring the capacity of learning more complex patterns; (2) the receptive field of the topmost convolutional layer in the feature-level should no larger than image size. We proof Classifier Level Feature Level Small Feature Map Large Conv Kernel Identical for All Nets Our theory will provide a CNN design Figure 1. We divide a convolutional neural network into two levels, i.e. classifier level and feature level. In this work, we focus on designing very deep convolutional neural network in the feature level, while using a fixed simple classifier-level design for all networks. that there is a close form solution for this constrained optimization problem under certain conditions. The convolutional neural networks we designed achieved state-of-theart performance in many tasks, including cifar10/cifar100 classification, kaggle plankton classification and face recognition/verification 1. It is worth noting that our theory is task independent. The architecture of the designed convolutional neural network only depends on the raw image size and the fixed filter size 2. This fact could become less surprising if we consider that some recent works achieve good results on various tasks using VGGNet and GooleNet. It implies that we could design good architecture regardless the specific tasks. Of course, we believe better architectures can be designed by incorporating task specific knowledge, such as the cyclic pooling and rolling network designed by the first prize winner of the national data science bowl. Before directly dive into detailed mathematical formulation, we firstly describe the two novel constraints and the underlined intuitions in the subsequent sections, then we present the formal mathematical formulations as well as the 1 I will detail those experiments in a formal academic paper. For this tech report, I mainly describe the methods and the underlined intuitions. 2 In this work we use one kind of filter (size) in the designing of convolutional neural network for clarity and simplicity. It is possible to remove this restriction and extend the ideas to more general cases. 1
2 Pattern A Pattern B Pattern Response down sample stride 2 down sample stride 2 (a) Figure 2. An illustration of learning capacity. The exemplar (a) has better learning capacity comparing to the exemplar (b). The blue rectangle represent the convolutional filter/kernel. (b) 8 conv layers 3-by-3 kernel 2: 2: 16 4 conv layers 3-by-3 kernel 20: 4: 32 4 conv layers 3-by-3 kernel 40: 8: 64 receptive field size Figure 3. An illustration of receptive field size. The receptive field sizes of all convolutional layers are represented in matlab-style arrays for clarity. Fror example, 40: 8: 64 represents [40, 48, 56, 64]. The size of input image is 64-by-64. We subtract an annoying small constant 1 when calculating the receptive field sizes in order to make the description and subsequent derivation more concise. close form solution under certain conditions. The design of classifier level. The common design in classifier level is two fully connected layers with dropout on each of them. However we found this design is prone to over-fitting if the training set is not sufficiently large. We empirically found that it is better to down sample the input feature map to a small size (6x6, 7x7 or 8x8), and then apply two 5x5 convolutional layers, finally max pool the feature map into a vector and dropout this vector. As kernel size is very large relative to the feature map, herein the convolutional layers are more like fully connected layers. One perspective of understanding this design is it conduct dense sliding window test (used in VGG s work) and aggregate the final results by max pooling. This design is inspired by Network in Network and GoogleNet. The difference is that we found it is better to use large convolutional kernels relative to the feature map and replace the average pooling with max pooling. 2. Capacity of learning the first constraint The functionality of a convolutional layer is composing more complex patterns from input patterns. As shown in Figure 2(a), given the responses of pattern A and B and their spatial relationship, a convolutional layer can form/detect a more complex pattern AB. However, a convolutional layer does not always has the capacity of learning more complex patterns. In Figure 2(b), we show a simple case when a convolutional layer fail to learn a more complex pattern. In this case, the filter size is smaller than the distance between the responses of pattern A and B, in other words, it can not detect both pattern A and B as well as their spatial relationship, therefore fail to learn the more complex pattern AB. To regain the capacity of learning, we can either use larger convolutional kernel, or cut the responses distance by half via down sampling with stride 2. In this work, we only s- tudy regaining the learning capacity via down sampling. The odds of the learning failure grows, as a convolutional neural network goes deep without down sampling, because the sizes of the detected patterns and their meaningful s- patial relationships grows layer by layer. Once the spatial relationships of the majority input patterns exceed the filter size of a convolutional layer, this convolutional layer will lose its capacity of learning more complex patterns. To quantitatively measure the learning capacity of a convolutional layer we define the c-value of a convolutional layer as follows. c-value = Real Filter Size Receptive Field Size where the real filter size of a k-by-k convolutional layer is k if there is no down sampling, it doubles after each down sampling i.e. 2k after one down sampling and 4k after two down sampling etc. The receptive field size is defined as the maximum size of a neuron can see on the raw image. It grows proportionally as the convolutional neural network goes deep. Figure 3 shows how the receptive fields grows in an exemplar convolutional neural network. It is worth noting that we use the receptive field size instead of the typical size of the spatial relationships of the input patterns. Because the later is not measurable, even though it is conceptually more accurate. In the definition, we implicitly assume that the receptive field size of a convolutional layer is proportional to the typical size of the spatial relationships of the input patterns. Provided the discussions above, we are ready to present the the first constraint for designing very deep convolutional neural network: the c-value of each convolutional layer should be larger than a minimum value t. We empirically found t = 1/6 is a good lower bound of c-value for all convolutional layers in various tasks. (1)
3 3. Necessity of learning the second constraint As the receptive field grows, new and more complex patterns are constantly emerging. In the process, we need additional layers to learn the newly emerged patterns as the receptive field grows. However when the receptive field reach the image size i.e. neurons have seen the entire image region, it stops emerging new and more complex patterns, the driven force for adding new layers no longer exists. Empirically, we found adding many layers after the saturation of receptive field does not help the performance in general, what is worse, it increase the risk of over-fitting and hurt the performance in most cases. Base the analysis above and our empirical study, we propose the second constraint for designing very deep convolutional neural network. The receptive field size of the topmost convolutional layer should be no larger than the image size. This constraint implies that the receptive field of the topmost convolutional layer must be around the entire image region. If the receptive field of the topmost layer is much smaller the image region, we can add one more layer to improve our objective (i.e. increase the depth) without violating the constraint. From another perspective, if the receptive field is much smaller the image size, the network will lose the opportunity to see and learn the high-level patterns/features, which is suboptimal to the performance. For some tasks, there is no meaningful or discriminative patterns/features in a certain range of receptive field, thus no necessity of using many layers for learning i.e. maximizing the depth of entire network is no longer valid. We will have more discussions about this topic in the later part of this paper. 4. The mathematical formulation In this section, we formulate the design of deep architecture into a constrained optimization problem, and then we present and prove an optimal solution under certain conditions. As aforementioned, we assume the filter sizes are the same for all layers, we do not take the filter numbers into consideration, because it is task and data set dependent The formulation Herein we define the notations of the input parameters. They are the image size z, the filter size k and the minimum c-value t. The architecture of a deep model can determined by the total number of stages n and the number of layers in various stages {a i }. Various stages are divided by a down sampling (stride 2). For example, n = 3 and a 1, a 2, a 3 = 4, 3, 2 represent a model with 3 stages, the number of layers in the first, second and the third stage are 4, 3, 2 respectively. In between are down sampling with stride 2. The goal of going deep essentially is to maximize the total number of layers i.e. i a i, given the two constrains proposed in the previous two sections. The first constrain requires the c-values of all layers are no smaller than the minimum c-value t. As the receptive field keep growing, and the real filter size in one stage stay the same, the c-value of the last layer in one stage is the smallest. Therefore the first constrain is equivalent to ensuring the c-value of the last layer in each stage no smaller than the minimum c-value t, which can be translated into a set of inequations, 2 l k l 2i 1 (k 1)a i t, where l = 1, 2,..., n (2) where 2 l k is the real filter size at l-th stage, l 2i 1 (k 1)a i is the receptive field 3 of the last layer at l-th stage, t is the minimum c-value, we set t = 1/6 for all tasks. The second constrain requires the receptive field of the topmost of convolutional layer is no larger than the entire image region. It can be formally represented as follows. 2 i 1 (k 1)a i z (3) i where the left term is the receptive field of the topmost convolutional layer, 2 i 1 (k 1) is the receptive field increment of a layer at i-th stage, 2 i 1 (k 1)a i is the total receptive field increment at i-th stage. The objective function of our formulation can be formally represented by maximizing the total number of layers, subjecting to the two constraints in Equations (2) and (3). With slightly transforming the two constraints, we achieve our final formulation as follows. i max n,{a i} n a i (4) i a i 2 i 1 2l k where l = 1, 2,..., n (5) t(k 1) a i 2 i 1 z (6) k 1 where both n and {a i } are integers. We simplify the forms of the first and second constrains by equivalent transforms to facilitate the later discussions. 3 We omit the small constant 1 in calculating the receptive field sizes to simplify the description and subsequent derivation.
4 net without augment net with augment 32x32 input image 24x24 input image conv11, 5x5, 32x32, 128 conv11, 5x5, 24x24, 192 conv12, 5x5, 32x32, 128 conv12, 5x5, 24x24, 192 conv13, 5x5, 32x32, 128 conv13, 5x5, 24x24, 192 conv14, 5x5, 32x32, 128 conv14, 5x5, 24x24, 192 conv15, 5x5, 32x32, 128 conv15, 5x5, 24x24, 192 conv16, 5x5, 32x32, 128 conv16, 5x5, 24x24, 192 max pool stride 2 conv21, 5x5, 16x16, 128 max pool stride 2 max pool stride 4 conv31, 5x5, 8x8, 128 conv21, 5x5, 6x6, 192 conv32, 5x5, 8x8, 64 conv22, 5x5, 6x6, 64 max pool stride 8 max pool stride 6 Table 1. Networks designed using 5x5 filter size for cifar10 and cifar100 data. The description conv15, 5x5, 32x32, 128, from left to right, represents layer name, filter size, feature map size and filter number The optimal solution under certain conditions In this section, we show that the optimal solution of our formulation can be found under certain conditions. Assume that the image size z = 2 m 1 k/t, and the layer numbers {a i } are relaxed from integers to positive real numbers, we can show that the optimal solution to our objective function is that a 1 = n = m (7) k (k 1)t, a 2 =... = a n = 1/2a 1 (8) Due to the limit of time, we just roughly sketch the proof in Section 6. Discussion about the optimal solution. Although this optimal solution is obtained under certain conditions, it provides great insights about how to design effective deep architecture under general conditions. First, it guides how many times of down samplings should we choose given the input parameters. Second, it shows the number of layers should be as evenly distributed as possible in all stages, except the first stage. Third, it shows the maximum depth could be achieved by various filter size, based on which we can make better tradeoff between various filter sizes. net without augment net with augment 32x32 input image 24x24 input image conv11, 3x3, 32x32, 192 conv11, 3x3, 24x24, 256 conv12, 3x3, 32x32, 192 conv12, 3x3, 24x24, 256 conv13, 3x3, 32x32, 192 conv13, 3x3, 24x24, 256 conv14, 3x3, 32x32, 192 conv14, 3x3, 24x24, 256 conv15, 3x3, 32x32, 192 conv15, 3x3, 24x24, 256 conv16, 3x3, 32x32, 192 conv16, 3x3, 24x24, 256 conv17, 3x3, 32x32, 192 conv17, 3x3, 24x24, 256 conv18, 3x3, 32x32, 192 conv18, 3x3, 24x24, 256 max pool stride 2 max pool stride 2 conv21, 3x3, 16x16, 192 conv21, 3x3, 16x16, 256 conv22, 3x3, 16x16, 192 conv22, 3x3, 16x16, 256 conv23, 3x3, 16x16, 192 conv24, 3x3, 16x16, 192 max pool stride 2 max pool stride 4 conv31, 5x5, 8x8, 128 conv31, 5x5, 6x6, 192 conv32, 5x5, 8x8, 64 conv32, 5x5, 6x6, 64 max pool stride 8 max pool stride 6 Table 2. Networks designed using 3x3 filter size for cifar10 and cifar100 data. 5. Some exemplar networks Herein we show some networks designed according to our theory Networks for cifar10/cifar100 CIFAR10 and CIFAR100 are proposed by Alex. It contains 60,000 tiny color images with 32x32 size. Without data augmentation, the image size feed into the convolutional neural network is 32x32. With Alex s data augmentation, the image size feed into convolutional neural network is 24x24. As our design depends on the input image size and the filter size, we present four networks in Table (2) and (1). We achieve the best performance on cifar10 and cifar100 using the designed networks. On cifar10, our best result is 7.44%/9.02% (7.44% with data augmentation and 9.02% without data augmentation), better than Deeply Supervised Network (7.97%/9.69%), Network in Network (8.81%/10.41%), Maxout Network (9.32%/11.68%). On cifar100, our best result is 33.03% without data augmentation, better than Deeply Supervised Network (34.57%), Network in Network (35.68%) and Maxout Network (38.57%). It is worth noting that fractional max pooling proposed by Dr. Ben achieves much better results using more sophisticated data augmentation. Due to different data augmentation and testing settings, herein we did not directly compare with his work.
5 image size 48x48 image size 112x112 image size 128x128 image size 144x144 filter size 5x5 filter size 2x2 filter size 3x3 filter size 4x4 conv01, 112x112, 5x5, 16 conv01, 128x128, 5x5, 16 conv01, 144x144, 5x5, 16 max pool stride 2 max pool stride 2 max pool stride 2 conv11, 48x48, 5x5, 16 conv11, 56x56, 2x2, 32 conv11, 64x64, 3x3, 32 conv11, 72x72, 4x4, 32 conv12, 48x48, 5x5, 16 conv12, 56x56, 2x2, 32 conv12, 64x64, 3x3, 32 conv12, 72x72, 4x4, 32 conv13, 48x48, 5x5, 32 conv13, 56x56, 2x2, 32 conv13, 64x64, 3x3, 64 conv13, 72x72, 4x4, 64 conv14, 48x48, 5x5, 32 conv14, 56x56, 2x2, 32 conv14, 64x64, 3x3, 64 conv14, 72x72, 4x4, 64 conv15, 48x48, 5x5, 128 conv15, 56x56, 2x2, 64 conv15, 64x64, 3x3, 128 conv15, 72x72, 4x4, 128 conv16, 48x48, 5x5, 128 conv16, 56x56, 2x2, 64 conv16, 64x64, 3x3, 128 conv16, 72x72, 4x4, 128 conv17, 56x56, 2x2, 64 conv17, 64x64, 3x3, 192 conv17, 72x72, 4x4, 128 conv18, 56x56, 2x2, 64 conv18, 64x64, 3x3, 192 conv18, 72x72, 4x4, 128 conv19, 56x56, 2x2, 192 conv110, 56x56, 2x2, 192 conv111, 56x56, 2x2, 192 conv112, 56x56, 2x2, 192 max pool stride 2 max pool stride 2 max pool stride 2 max pool stride 2 conv21, 24x24, 5x5, 256 conv21, 28x28, 2x2, 384 conv21, 32x32, 3x3, 256 conv21, 36x36, 4x4, 256 conv22, 24x24, 5x5, 256 conv22, 28x28, 2x2, 384 conv22, 32x32, 3x3, 256 conv22, 36x36, 4x4, 256 conv23, 24x24, 5x5, 256 conv23, 28x28, 2x2, 384 conv23, 32x32, 3x3, 256 conv23, 36x36, 4x4, 256 conv24, 28x28, 2x2, 384 conv24, 32x32, 3x3, 256 conv24, 36x36, 4x4, 256 conv25, 28x28, 2x2, 384 conv26, 28x28, 2x2, 384 max pool stride 2 max pool stride 2 max pool stride 2 conv31, 14x14, 2x2, 768 conv31, 16x16, 3x3, 512 conv31, 18x18, 4x4, 512 conv32, 14x14, 2x2, 768 conv32, 16x16, 3x3, 512 conv32, 18x18, 4x4, 512 conv33, 14x14, 2x2, 768 conv33, 16x16, 3x3, 512 conv34, 14x14, 2x2, 768 conv34, 16x16, 3x3, 512 conv35, 14x14, 2x2, 768 conv36, 14x14, 2x2, 768 max pool stride 2 conv41, 7x7, 2x2, 1024 conv42, 7x7, 2x2, 1024 max pool stride 4 max pool stride 2 max pool stride 2 max pool stride 3 conv31, 6x6, 5x5, 256 conv51, 3x3, 3x3, 768 conv41, 8x8, 5x5, 512 conv41, 6x6, 5x5, 512 conv32, 6x6, 5x5, 256 conv52, 3x3, 3x3, 768 conv42, 8x8, 5x5, 512 conv42, 6x6, 5x5, 512 Table 3. Networks for national data science bowl Networks for national data science bowl National data science bowl is a kaggle competition. The task is classifying plankton images into 121 pre-defined classes. There are around 30k training data and 130k testing data. To make the designed networks complementary to each other, we design our networks with various input image sizes and filter sizes. In Table (3), we show 4 representative networks designed following our theory. Among them, the third net (image size 128, filter size 3) and the forth net (image size 144, filter size 4) achieved and log loss score on the public leader board. Our final ensemble consists of the four models in Table (3), a VGG-like model and 2 variants of our designs. The 7-model ensemble achieved public leader board score. Combining with Bings score, we final achieved score on the public leader board.
6 6. The sketch of the proof First, the solution satisfies the constrains in the inequations 5 and 6. We can easily verify this by plugging the solution into the inequations. We found that the solution pushes the left term of those inequations exactly equal to the right term. In other words, this solution is at the boundary of the set of feasible solutions. Second, assume there is a solution l and {b 1,..., b l }, and l m. By linearly composing the inequations in 5 and 6, we can always show that m b i (l + 1)C/2 = k where C = (k 1)t. This inequations implies there is no better solution when l m. Third, assume there is a solution l and {b 1,..., b l }, and l m. By linearly composing the inequations in 5 and 6, we can have that m b i + 2 i=m+1 a i b i (l + 1)C/2 By reorganize the the left term, we have b i + i=m+1 b i (l + 1)C/2 = m From the inequations above we tell that there is no better solution when l > m. References a i
Convolutional neural networks
11-1: Convolutional neural networks Prof. J.C. Kao, UCLA Convolutional neural networks Motivation Biological inspiration Convolution operation Convolutional layer Padding and stride CNN architecture 11-2:
More informationConvolutional Neural Network Architecture
Convolutional Neural Network Architecture Zhisheng Zhong Feburary 2nd, 2018 Zhisheng Zhong Convolutional Neural Network Architecture Feburary 2nd, 2018 1 / 55 Outline 1 Introduction of Convolution Motivation
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationBagging and Other Ensemble Methods
Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationCS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS
CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting
More informationRAGAV VENKATESAN VIJETHA GATUPALLI BAOXIN LI NEURAL DATASET GENERALITY
RAGAV VENKATESAN VIJETHA GATUPALLI BAOXIN LI NEURAL DATASET GENERALITY SIFT HOG ALL ABOUT THE FEATURES DAISY GABOR AlexNet GoogleNet CONVOLUTIONAL NEURAL NETWORKS VGG-19 ResNet FEATURES COMES FROM DATA
More informationConvolution and Pooling as an Infinitely Strong Prior
Convolution and Pooling as an Infinitely Strong Prior Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Convolutional
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationNeural Networks 2. 2 Receptive fields and dealing with image inputs
CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There
More informationLecture 7 Convolutional Neural Networks
Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 17, 2017 We saw before: ŷ x 1 x 2 x 3 x 4 A series of matrix multiplications:
More informationMaxout Networks. Hien Quoc Dang
Maxout Networks Hien Quoc Dang Outline Introduction Maxout Networks Description A Universal Approximator & Proof Experiments with Maxout Why does Maxout work? Conclusion 10/12/13 Hien Quoc Dang Machine
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationDeep Learning. Convolutional Neural Network (CNNs) Ali Ghodsi. October 30, Slides are partially based on Book in preparation, Deep Learning
Convolutional Neural Network (CNNs) University of Waterloo October 30, 2015 Slides are partially based on Book in preparation, by Bengio, Goodfellow, and Aaron Courville, 2015 Convolutional Networks Convolutional
More informationTasks ADAS. Self Driving. Non-machine Learning. Traditional MLP. Machine-Learning based method. Supervised CNN. Methods. Deep-Learning based
UNDERSTANDING CNN ADAS Tasks Self Driving Localizati on Perception Planning/ Control Driver state Vehicle Diagnosis Smart factory Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More informationAsaf Bar Zvi Adi Hayat. Semantic Segmentation
Asaf Bar Zvi Adi Hayat Semantic Segmentation Today s Topics Fully Convolutional Networks (FCN) (CVPR 2015) Conditional Random Fields as Recurrent Neural Networks (ICCV 2015) Gaussian Conditional random
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationCSCI 315: Artificial Intelligence through Deep Learning
CSCI 35: Artificial Intelligence through Deep Learning W&L Fall Term 27 Prof. Levy Convolutional Networks http://wernerstudio.typepad.com/.a/6ad83549adb53ef53629ccf97c-5wi Convolution: Convolution is
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationNotes on Adversarial Examples
Notes on Adversarial Examples David Meyer dmm@{1-4-5.net,uoregon.edu,...} March 14, 2017 1 Introduction The surprising discovery of adversarial examples by Szegedy et al. [6] has led to new ways of thinking
More informationBinary Convolutional Neural Network on RRAM
Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua
More informationConvolutional Neural Networks
Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationDeep Convolutional Neural Networks for Pairwise Causality
Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, Delhi Tata Consultancy Services Ltd. {karamjit.singh,
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationIntroduction to Machine Learning (67577)
Introduction to Machine Learning (67577) Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Deep Learning Shai Shalev-Shwartz (Hebrew U) IML Deep Learning Neural Networks
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationNeural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture
Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are
More informationClassification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks
Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks Mohit Shridhar Stanford University mohits@stanford.edu, mohit@u.nus.edu Abstract In particle physics, Higgs Boson to tau-tau
More informationFinal Examination CS540-2: Introduction to Artificial Intelligence
Final Examination CS540-2: Introduction to Artificial Intelligence May 9, 2018 LAST NAME: SOLUTIONS FIRST NAME: Directions 1. This exam contains 33 questions worth a total of 100 points 2. Fill in your
More informationSpatial Transformer Networks
BIL722 - Deep Learning for Computer Vision Spatial Transformer Networks Max Jaderberg Andrew Zisserman Karen Simonyan Koray Kavukcuoglu Contents Introduction to Spatial Transformers Related Works Spatial
More informationIntroduction to Deep Learning CMPT 733. Steven Bergner
Introduction to Deep Learning CMPT 733 Steven Bergner Overview Renaissance of artificial neural networks Representation learning vs feature engineering Background Linear Algebra, Optimization Regularization
More informationTwo at Once: Enhancing Learning and Generalization Capacities via IBN-Net
Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net Supplementary Material Xingang Pan 1, Ping Luo 1, Jianping Shi 2, and Xiaoou Tang 1 1 CUHK-SenseTime Joint Lab, The Chinese University
More information1 What a Neural Network Computes
Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists
More informationRegML 2018 Class 8 Deep learning
RegML 2018 Class 8 Deep learning Lorenzo Rosasco UNIGE-MIT-IIT June 18, 2018 Supervised vs unsupervised learning? So far we have been thinking of learning schemes made in two steps f(x) = w, Φ(x) F, x
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationConvolutional Neural Networks. Srikumar Ramalingam
Convolutional Neural Networks Srikumar Ramalingam Reference Many of the slides are prepared using the following resources: neuralnetworksanddeeplearning.com (mainly Chapter 6) http://cs231n.github.io/convolutional-networks/
More informationMachine Learning Techniques
Machine Learning Techniques ( 機器學習技法 ) Lecture 13: Deep Learning Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationIs Robustness the Cost of Accuracy? A Comprehensive Study on the Robustness of 18 Deep Image Classification Models
Is Robustness the Cost of Accuracy? A Comprehensive Study on the Robustness of 18 Deep Image Classification Models Dong Su 1*, Huan Zhang 2*, Hongge Chen 3, Jinfeng Yi 4, Pin-Yu Chen 1, and Yupeng Gao
More informationJakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
More informationFeature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
More informationPRUNING CONVOLUTIONAL NEURAL NETWORKS. Pavlo Molchanov Stephen Tyree Tero Karras Timo Aila Jan Kautz
PRUNING CONVOLUTIONAL NEURAL NETWORKS Pavlo Molchanov Stephen Tyree Tero Karras Timo Aila Jan Kautz 2017 WHY WE CAN PRUNE CNNS? 2 WHY WE CAN PRUNE CNNS? Optimization failures : Some neurons are "dead":
More informationChannel Pruning and Other Methods for Compressing CNN
Channel Pruning and Other Methods for Compressing CNN Yihui He Xi an Jiaotong Univ. October 12, 2017 Yihui He (Xi an Jiaotong Univ.) Channel Pruning and Other Methods for Compressing CNN October 12, 2017
More informationAnalyzing the Performance of Multilayer Neural Networks for Object Recognition
Analyzing the Performance of Multilayer Neural Networks for Object Recognition Pulkit Agrawal, Ross Girshick, Jitendra Malik {pulkitag,rbg,malik}@eecs.berkeley.edu University of California Berkeley Supplementary
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationBayesian Networks (Part I)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part I) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,
More informationMachine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016
Machine Learning for Signal Processing Neural Networks Continue Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 1 So what are neural networks?? Voice signal N.Net Transcription Image N.Net Text
More informationGlobal Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond
Global Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond Ben Haeffele and René Vidal Center for Imaging Science Mathematical Institute for Data Science Johns Hopkins University This
More informationDeep Learning (CNNs)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deep Learning (CNNs) Deep Learning Readings: Murphy 28 Bishop - - HTF - - Mitchell
More informationLocal Affine Approximators for Improving Knowledge Transfer
Local Affine Approximators for Improving Knowledge Transfer Suraj Srinivas & François Fleuret Idiap Research Institute and EPFL {suraj.srinivas, francois.fleuret}@idiap.ch Abstract The Jacobian of a neural
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationRegularization in Neural Networks
Regularization in Neural Networks Sargur Srihari 1 Topics in Neural Network Regularization What is regularization? Methods 1. Determining optimal number of hidden units 2. Use of regularizer in error function
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationarxiv: v1 [stat.ml] 27 Jul 2016
Convolutional Neural Networks Analyzed via Convolutional Sparse Coding arxiv:1607.08194v1 [stat.ml] 27 Jul 2016 Vardan Papyan*, Yaniv Romano*, Michael Elad October 17, 2017 Abstract In recent years, deep
More informationAxiomatic Attribution of Neural Networks
Axiomatic Attribution of Neural Networks Mukund Sundararajan* 1, Ankur Taly* 1, Qiqi Yan* 1 1 Google Inc. Presenter: Arshdeep Sekhon Mukund Sundararajan*, Ankur Taly*, Qiqi Yan*Axiomatic Attribution of
More informationarxiv: v1 [cs.lg] 3 Jan 2017
Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, New-Delhi, India January 4, 2017 arxiv:1701.00597v1
More informationDo not tear exam apart!
6.036: Final Exam: Fall 2017 Do not tear exam apart! This is a closed book exam. Calculators not permitted. Useful formulas on page 1. The problems are not necessarily in any order of di culty. Record
More informationUnderstanding CNNs using visualisation and transformation analysis
Understanding CNNs using visualisation and transformation analysis Andrea Vedaldi Medical Imaging Summer School August 2016 Image representations 2 encoder Φ representation An encoder maps the data into
More informationarxiv: v1 [cs.cv] 11 May 2015 Abstract
Training Deeper Convolutional Networks with Deep Supervision Liwei Wang Computer Science Dept UIUC lwang97@illinois.edu Chen-Yu Lee ECE Dept UCSD chl260@ucsd.edu Zhuowen Tu CogSci Dept UCSD ztu0@ucsd.edu
More informationDeep Residual. Variations
Deep Residual Network and Its Variations Diyu Yang (Originally prepared by Kaiming He from Microsoft Research) Advantages of Depth Degradation Problem Possible Causes? Vanishing/Exploding Gradients. Overfitting
More informationClassifaction of gg gh against qg qh
Jet Variables, Machine Learning, Box vs Triangle Diagrams Joshua Lin + Ben Nachman joshua.z.lin@gmail.com LBNL, Berkeley 2018 January 8-12 Conference, LBNL Introduction 2 These days, we understand very
More informationIntroduction to Convolutional Neural Networks 2018 / 02 / 23
Introduction to Convolutional Neural Networks 2018 / 02 / 23 Buzzword: CNN Convolutional neural networks (CNN, ConvNet) is a class of deep, feed-forward (not recurrent) artificial neural networks that
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More informationGenerative adversarial networks
14-1: Generative adversarial networks Prof. J.C. Kao, UCLA Generative adversarial networks Why GANs? GAN intuition GAN equilibrium GAN implementation Practical considerations Much of these notes are based
More informationMultiple Wavelet Coefficients Fusion in Deep Residual Networks for Fault Diagnosis
Multiple Wavelet Coefficients Fusion in Deep Residual Networks for Fault Diagnosis Minghang Zhao, Myeongsu Kang, Baoping Tang, Michael Pecht 1 Backgrounds Accurate fault diagnosis is important to ensure
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationFA Homework 5 Recitation 1. Easwaran Ramamurthy Guoquan (GQ) Zhao Logan Brooks
FA17 10-701 Homework 5 Recitation 1 Easwaran Ramamurthy Guoquan (GQ) Zhao Logan Brooks Note Remember that there is no problem set covering some of the lecture material; you may need to study these topics
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationTTIC 31230, Fundamentals of Deep Learning David McAllester, Winter Generalization and Regularization
TTIC 31230, Fundamentals of Deep Learning David McAllester, Winter 2019 Generalization and Regularization 1 Chomsky vs. Kolmogorov and Hinton Noam Chomsky: Natural language grammar cannot be learned by
More informationMitosis Detection in Breast Cancer Histology Images with Multi Column Deep Neural Networks
Mitosis Detection in Breast Cancer Histology Images with Multi Column Deep Neural Networks IDSIA, Lugano, Switzerland dan.ciresan@gmail.com Dan C. Cireşan and Alessandro Giusti DNN for Visual Pattern Recognition
More informationRegion Description for Recognition
Region Description for Recognition For object recognition, descriptions of regions in an image have to be compared with descriptions of regions of meaningful objects (models). The general problem of object
More informationConvolution arithmetic
Institut des algorithmes d apprentissage de Montréal Convolution arithmetic Vincent Dumoulin Francesco Visin Introduction Why should I care? Convolutional neural networks are not a new idea (LeCun et al.
More informationDetecting Dark Matter Halos Sam Beder, Brie Bunge, Adriana Diakite December 14, 2012
Detecting Dark Matter Halos Sam Beder, Brie Bunge, Adriana Diakite December 14, 2012 Introduction Dark matter s gravitational pull affects the positions of galaxies in the universe. If galaxies were circular,
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationStable and Efficient Representation Learning with Nonnegativity Constraints. Tsung-Han Lin and H.T. Kung
Stable and Efficient Representation Learning with Nonnegativity Constraints Tsung-Han Lin and H.T. Kung Unsupervised Representation Learning Layer 3 Representation Encoding Sparse encoder Layer 2 Representation
More informationLecture 7: DecisionTrees
Lecture 7: DecisionTrees What are decision trees? Brief interlude on information theory Decision tree construction Overfitting avoidance Regression trees COMP-652, Lecture 7 - September 28, 2009 1 Recall:
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationGlobal Scene Representations. Tilke Judd
Global Scene Representations Tilke Judd Papers Oliva and Torralba [2001] Fei Fei and Perona [2005] Labzebnik, Schmid and Ponce [2006] Commonalities Goal: Recognize natural scene categories Extract features
More informationConvolutional Neural Network. Hung-yi Lee
al Neural Network Hung-yi Lee Why CNN for Image? [Zeiler, M. D., ECCV 2014] x 1 x 2 Represented as pixels x N The most basic classifiers Use 1 st layer as module to build classifiers Use 2 nd layer as
More informationCS 1674: Intro to Computer Vision. Final Review. Prof. Adriana Kovashka University of Pittsburgh December 7, 2016
CS 1674: Intro to Computer Vision Final Review Prof. Adriana Kovashka University of Pittsburgh December 7, 2016 Final info Format: multiple-choice, true/false, fill in the blank, short answers, apply an
More informationSGN (4 cr) Chapter 5
SGN-41006 (4 cr) Chapter 5 Linear Discriminant Analysis Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology January 21, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006
More information