A practical theory for designing very deep convolutional neural networks
Xudong Cao


Abstract

Going deep is essential for deep learning. However, it is not easy: there are many ways of going deep, but most of them are ineffective. In this work, we propose two novel constraints for the design of deep structures that guarantee a performance gain when going deep. First, each convolutional layer should retain the capacity to learn more complex patterns; second, the receptive field of the topmost layer should be no larger than the image region. Given these two constraints, we cast the task of designing a deep convolutional neural network as a constrained optimization problem, and we present an analytic optimal solution under certain conditions.

1. Introduction

Going deep has demonstrated great success in the ImageNet competition: both Google and VGG won the competition using very deep convolutional neural networks. Going deep greatly improves the learning/fitting capacity of the entire network while increasing model size and computational cost only linearly. Although going deep has become common sense, it is still unclear how to design a very deep convolutional neural network effectively. Most of the time, arbitrarily adding more layers does not help, or even worsens the performance.

In this work, we propose a practical theory for designing very deep convolutional neural networks effectively. As shown in Figure 1, we divide a convolutional neural network into two levels, i.e. the classifier level and the feature level. We focus on designing a very deep architecture in the feature level, while using a fixed, simple design in the classifier level (we will describe it at the end of this section). We cast the design of a deep convolutional neural network as a constrained optimization problem. The objective is to maximize the depth of the target network, subject to two constraints: (1) the c-value of each layer should not be too small, where the c-value is a metric measuring the capacity of learning more complex patterns; (2) the receptive field of the topmost convolutional layer in the feature level should be no larger than the image size. We prove that there is a closed-form solution to this constrained optimization problem under certain conditions.

Figure 1. We divide a convolutional neural network into two levels, i.e. classifier level and feature level. In this work, we focus on designing a very deep convolutional neural network in the feature level, while using a fixed, simple classifier-level design (small feature map, large conv kernel) that is identical for all networks.

The convolutional neural networks we designed achieved state-of-the-art performance in many tasks, including cifar10/cifar100 classification, kaggle plankton classification and face recognition/verification [1]. It is worth noting that our theory is task independent: the architecture of the designed convolutional neural network depends only on the raw image size and the fixed filter size [2]. This fact becomes less surprising if we consider that some recent works achieve good results on various tasks using VGGNet and GoogLeNet; it implies that good architectures can be designed regardless of the specific task. Of course, we believe better architectures can be designed by incorporating task-specific knowledge, such as the cyclic pooling and rolling network designed by the first-prize winner of the national data science bowl.
Before diving directly into the detailed mathematical formulation, we first describe the two novel constraints and the underlying intuitions in the subsequent sections; we then present the formal mathematical formulation as well as the closed-form solution under certain conditions.

[1] I will detail those experiments in a formal academic paper. In this tech report, I mainly describe the methods and the underlying intuitions.
[2] In this work we use a single filter size in the design of the convolutional neural network, for clarity and simplicity. It is possible to remove this restriction and extend the ideas to more general cases.

Figure 2. An illustration of learning capacity. Exemplar (a) has better learning capacity than exemplar (b). The blue rectangle represents the convolutional filter/kernel. (The diagram shows the responses of pattern A and pattern B under down sampling with stride 2.)

Figure 3. An illustration of receptive field size. The receptive field sizes of all convolutional layers are given in matlab-style arrays for clarity; for example, 40: 8: 64 represents [40, 48, 56, 64]. The input image size is 64-by-64. We subtract a small constant 1 when calculating the receptive field sizes in order to make the description and the subsequent derivation more concise. (The diagram shows 8 conv layers with 3-by-3 kernels and receptive fields 2: 2: 16, followed by 4 conv layers with receptive fields 20: 4: 32, followed by 4 conv layers with receptive fields 40: 8: 64.)

The design of the classifier level. The common design of the classifier level is two fully connected layers, each with dropout. However, we found this design to be prone to over-fitting when the training set is not sufficiently large. We empirically found that it is better to down sample the input feature map to a small size (6x6, 7x7 or 8x8), then apply two 5x5 convolutional layers, and finally max pool the feature map into a vector and apply dropout to this vector. As the kernel size is very large relative to the feature map, these convolutional layers behave more like fully connected layers. One way to understand this design is that it conducts a dense sliding-window test (as used in VGG's work) and aggregates the final results by max pooling. The design is inspired by Network in Network and GoogLeNet; the difference is that we found it better to use convolutional kernels that are large relative to the feature map and to replace the average pooling with max pooling.
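To make the classifier-level design concrete, here is a minimal PyTorch sketch of such a head, assuming the feature level has already down sampled the map to roughly 6x6-8x8. The padding, layer widths, dropout rate and the final linear layer are illustrative assumptions on my part, not taken from the paper.

```python
import torch
import torch.nn as nn

class ClassifierLevel(nn.Module):
    """Fixed classifier-level head: two 5x5 convs on a small map, then
    max pool the map into a vector, dropout, and a linear classifier."""

    def __init__(self, in_channels, num_classes, mid_channels=128):
        super().__init__()
        # 5x5 kernels are large relative to a 6x6-8x8 map, so these convs
        # behave almost like fully connected layers (dense sliding windows).
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=5, padding=2)
        self.pool = nn.AdaptiveMaxPool2d(1)    # max pool the map into a vector
        self.drop = nn.Dropout(p=0.5)          # dropout on that vector
        self.fc = nn.Linear(mid_channels, num_classes)

    def forward(self, x):                      # x: (N, C, ~6-8, ~6-8)
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.drop(self.pool(x).flatten(1))
        return self.fc(x)

# Example: a head on top of a 128-channel, 8x8 feature map for 10 classes.
head = ClassifierLevel(in_channels=128, num_classes=10)
out = head(torch.randn(2, 128, 8, 8))          # -> shape (2, 10)
```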
2. Capacity of learning: the first constraint

The functionality of a convolutional layer is to compose more complex patterns from input patterns. As shown in Figure 2(a), given the responses of patterns A and B and their spatial relationship, a convolutional layer can form/detect a more complex pattern AB. However, a convolutional layer does not always have the capacity to learn more complex patterns. In Figure 2(b), we show a simple case in which a convolutional layer fails to learn a more complex pattern: the filter size is smaller than the distance between the responses of patterns A and B. In other words, the layer cannot detect both pattern A and pattern B together with their spatial relationship, and therefore fails to learn the more complex pattern AB. To regain the capacity of learning, we can either use a larger convolutional kernel, or cut the response distance in half via down sampling with stride 2. In this work, we only study regaining the learning capacity via down sampling.

The odds of such learning failures grow as a convolutional neural network goes deep without down sampling, because the sizes of the detected patterns and of their meaningful spatial relationships grow layer by layer. Once the spatial relationships of the majority of input patterns exceed the filter size of a convolutional layer, that layer loses its capacity of learning more complex patterns. To quantitatively measure the learning capacity of a convolutional layer, we define its c-value as follows:

    c-value = Real Filter Size / Receptive Field Size    (1)

where the real filter size of a k-by-k convolutional layer is k if there is no down sampling, and it doubles after each down sampling, i.e. 2k after one down sampling, 4k after two down samplings, etc. The receptive field size is defined as the maximum region a neuron can see on the raw image; it grows proportionally as the convolutional neural network goes deep. Figure 3 shows how the receptive fields grow in an exemplar convolutional neural network.

It is worth noting that we use the receptive field size instead of the typical size of the spatial relationships of the input patterns, because the latter is not measurable, even though it is conceptually more accurate. In the definition, we implicitly assume that the receptive field size of a convolutional layer is proportional to the typical size of the spatial relationships of its input patterns.

Given the discussion above, we are ready to present the first constraint for designing very deep convolutional neural networks: the c-value of each convolutional layer should be larger than a minimum value t. We empirically found t = 1/6 to be a good lower bound on the c-value for all convolutional layers in various tasks.
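The bookkeeping behind this definition is simple enough to script. The sketch below (my own illustration in Python) tracks the real filter size, the receptive field (omitting the small constant 1, as in Figure 3) and the c-value of every layer for a given stage configuration; the example reproduces the 8/4/4-layer, 3-by-3-kernel network of Figure 3.

```python
def c_values(stage_layers, k, t=1.0 / 6.0):
    """Receptive field and c-value of every conv layer, given the number of
    layers per stage (stages are separated by stride-2 down sampling)."""
    report = []
    receptive_field = 0
    for stage, num_layers in enumerate(stage_layers):   # stage index 0, 1, 2, ...
        real_filter = (2 ** stage) * k                  # real filter size doubles per down sampling
        growth = (2 ** stage) * (k - 1)                 # receptive field increment per layer
        for _ in range(num_layers):
            receptive_field += growth
            c = real_filter / receptive_field
            report.append((stage + 1, receptive_field, c, c >= t))
    return report

# Example: the 8/4/4-layer, 3x3-kernel network of Figure 3 (64x64 input image).
for stage, rf, c, ok in c_values([8, 4, 4], k=3):
    print(f"stage {stage}: receptive field {rf:3d}, c-value {c:.3f}, c >= t: {ok}")
```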

3. Necessity of learning: the second constraint

As the receptive field grows, new and more complex patterns keep emerging, and we need additional layers to learn these newly emerged patterns. However, when the receptive field reaches the image size, i.e. neurons have seen the entire image region, no new and more complex patterns emerge, and the driving force for adding new layers no longer exists. Empirically, we found that adding many layers after the receptive field saturates does not help the performance in general; worse, it increases the risk of over-fitting and hurts the performance in most cases.

Based on the analysis above and our empirical study, we propose the second constraint for designing very deep convolutional neural networks: the receptive field size of the topmost convolutional layer should be no larger than the image size. This constraint implies that the receptive field of the topmost convolutional layer should be around the entire image region. If the receptive field of the topmost layer is much smaller than the image region, we can add one more layer to improve our objective (i.e. increase the depth) without violating the constraint. From another perspective, if the receptive field is much smaller than the image size, the network loses the opportunity to see and learn high-level patterns/features, which is suboptimal for performance. For some tasks there are no meaningful or discriminative patterns/features in a certain range of receptive fields, and hence no need to spend many layers learning there; in such cases maximizing the depth of the entire network is no longer valid. We will have more discussion on this topic in the later part of this paper.

4. The mathematical formulation

In this section, we formulate the design of a deep architecture as a constrained optimization problem, and then we present and prove an optimal solution under certain conditions. As aforementioned, we assume the filter sizes are the same for all layers. We do not take the filter numbers into consideration, because they are task and data-set dependent.

4.1. The formulation

Herein we define the input parameters: the image size z, the filter size k and the minimum c-value t. The architecture of a deep model is determined by the total number of stages n and the numbers of layers in the stages {a_i}. Stages are separated by a down sampling with stride 2. For example, n = 3 and a_1, a_2, a_3 = 4, 3, 2 represent a model with 3 stages, where the first, second and third stages have 4, 3 and 2 layers respectively, with stride-2 down samplings in between.

The goal of going deep is essentially to maximize the total number of layers, i.e. \sum_i a_i, given the two constraints proposed in the previous two sections. The first constraint requires that the c-values of all layers are no smaller than the minimum c-value t. Since the receptive field keeps growing while the real filter size stays the same within a stage, the c-value of the last layer in each stage is the smallest. Therefore the first constraint is equivalent to ensuring that the c-value of the last layer in each stage is no smaller than the minimum c-value t, which translates into a set of inequalities:

    \frac{2^{l-1} k}{\sum_{i=1}^{l} 2^{i-1} (k-1) a_i} \ge t, \qquad l = 1, 2, \dots, n    (2)

where 2^{l-1} k is the real filter size in the l-th stage, \sum_{i=1}^{l} 2^{i-1} (k-1) a_i is the receptive field [3] of the last layer in the l-th stage, and t is the minimum c-value; we set t = 1/6 for all tasks.

[3] We omit the small constant 1 in calculating the receptive field sizes to simplify the description and the subsequent derivation.
The second constraint requires that the receptive field of the topmost convolutional layer is no larger than the entire image region. It can be formally represented as follows:

    \sum_{i=1}^{n} 2^{i-1} (k-1) a_i \le z    (3)

where the left term is the receptive field of the topmost convolutional layer: 2^{i-1} (k-1) is the receptive field increment of a layer in the i-th stage, and 2^{i-1} (k-1) a_i is the total receptive field increment of the i-th stage.

The objective of our formulation is to maximize the total number of layers, subject to the two constraints in Equations (2) and (3). Slightly transforming the two constraints, we arrive at our final formulation:

    \max_{n, \{a_i\}} \sum_{i=1}^{n} a_i    (4)

    \text{s.t.} \quad \sum_{i=1}^{l} 2^{i-1} a_i \le \frac{2^{l-1} k}{t (k-1)}, \qquad l = 1, 2, \dots, n    (5)

    \phantom{\text{s.t.} \quad} \sum_{i=1}^{n} 2^{i-1} a_i \le \frac{z}{k-1}    (6)

where both n and the {a_i} are integers. We simplify the forms of the first and second constraints via equivalent transformations to facilitate the later discussion.
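As a sanity check of the formulation, the sketch below evaluates constraints (5) and (6) for a candidate stage configuration and brute-forces small integer configurations for the deepest feasible one. The brute-force search is my own illustration; only the constraints come from the text.

```python
from itertools import product

def feasible(a, k, z, t=1.0 / 6.0):
    """Check constraints (5) and (6) for a stage configuration a = [a_1, ..., a_n]."""
    # Constraint (5): sum_{i<=l} 2^(i-1) a_i <= 2^(l-1) k / (t (k-1)) for every stage l.
    for l in range(1, len(a) + 1):
        if sum(2 ** i * a[i] for i in range(l)) > 2 ** (l - 1) * k / (t * (k - 1)) + 1e-9:
            return False
    # Constraint (6): sum_i 2^(i-1) a_i <= z / (k - 1).
    return sum(2 ** i * a[i] for i in range(len(a))) <= z / (k - 1) + 1e-9

def deepest(k, z, t=1.0 / 6.0, max_stages=4, max_layers=12):
    """Brute-force the deepest feasible integer configuration (objective (4))."""
    best = (0, None)
    for n in range(1, max_stages + 1):
        for a in product(range(1, max_layers + 1), repeat=n):
            if sum(a) > best[0] and feasible(a, k, z, t):
                best = (sum(a), a)
    return best

# Example: 64x64 images with 3x3 filters, the setting of Figure 3.
print(deepest(k=3, z=64))   # maximum depth 16, the same total as the 8+4+4 network of Figure 3
```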

4.2. The optimal solution under certain conditions

In this section, we show that the optimal solution of our formulation can be found under certain conditions. Assume that the image size is z = 2^{m-1} k / t, and that the layer numbers {a_i} are relaxed from integers to positive real numbers. Then the optimal solution to our objective is

    n = m    (7)

    a_1 = \frac{k}{(k-1)\, t}, \qquad a_2 = \dots = a_n = \frac{a_1}{2}    (8)

Due to the limit of time, we only roughly sketch the proof in Section 6.

Discussion of the optimal solution. Although this optimal solution is obtained under certain conditions, it provides great insight into how to design effective deep architectures under general conditions. First, it tells us how many down samplings we should use given the input parameters. Second, it shows that the number of layers should be distributed as evenly as possible across all stages, except the first stage. Third, it shows the maximum depth achievable with various filter sizes, from which we can make a better trade-off between filter sizes.
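A small sketch of how the closed-form solution can be evaluated numerically; the function and the rounding of m are my own illustration, under the stated assumption z = 2^{m-1} k / t.

```python
import math

def optimal_design(z, k, t=1.0 / 6.0):
    """Closed-form solution of Equations (7)-(8), valid when z = 2^(m-1) k / t."""
    m = int(round(math.log2(z * t / k))) + 1      # number of stages, n = m
    a1 = k / ((k - 1) * t)                        # layers in the first stage
    return m, [a1] + [a1 / 2.0] * (m - 1)         # later stages get a1 / 2 each

# Example: k = 5, t = 1/6 and z = 2^1 * 5 * 6 = 60 gives m = 2 stages with
# 7.5 and 3.75 layers; in practice these real-valued counts are rounded.
print(optimal_design(z=60, k=5))
```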
5. Some exemplar networks

Herein we show some networks designed according to our theory.

5.1. Networks for cifar10/cifar100

CIFAR10 and CIFAR100 were proposed by Alex Krizhevsky; each contains 60,000 tiny color images of size 32x32. Without data augmentation, the image size fed into the convolutional neural network is 32x32; with Alex's data augmentation, it is 24x24. As our design depends on the input image size and the filter size, we present four networks in Tables (1) and (2).

Table 1. Networks designed using 5x5 filter size for cifar10 and cifar100 data. The description "conv15, 5x5, 32x32, 128" gives, from left to right, the layer name, filter size, feature map size and filter number.

net without augment (32x32 input image):
conv11, 5x5, 32x32, 128
conv12, 5x5, 32x32, 128
conv13, 5x5, 32x32, 128
conv14, 5x5, 32x32, 128
conv15, 5x5, 32x32, 128
conv16, 5x5, 32x32, 128
max pool stride 2
conv21, 5x5, 16x16, 128
max pool stride 2
conv31, 5x5, 8x8, 128
conv32, 5x5, 8x8, 64
max pool stride 8

net with augment (24x24 input image):
conv11, 5x5, 24x24, 192
conv12, 5x5, 24x24, 192
conv13, 5x5, 24x24, 192
conv14, 5x5, 24x24, 192
conv15, 5x5, 24x24, 192
conv16, 5x5, 24x24, 192
max pool stride 4
conv21, 5x5, 6x6, 192
conv22, 5x5, 6x6, 64
max pool stride 6

Table 2. Networks designed using 3x3 filter size for cifar10 and cifar100 data.

net without augment (32x32 input image):
conv11, 3x3, 32x32, 192
conv12, 3x3, 32x32, 192
conv13, 3x3, 32x32, 192
conv14, 3x3, 32x32, 192
conv15, 3x3, 32x32, 192
conv16, 3x3, 32x32, 192
conv17, 3x3, 32x32, 192
conv18, 3x3, 32x32, 192
max pool stride 2
conv21, 3x3, 16x16, 192
conv22, 3x3, 16x16, 192
conv23, 3x3, 16x16, 192
conv24, 3x3, 16x16, 192
max pool stride 2
conv31, 5x5, 8x8, 128
conv32, 5x5, 8x8, 64
max pool stride 8

net with augment (24x24 input image):
conv11, 3x3, 24x24, 256
conv12, 3x3, 24x24, 256
conv13, 3x3, 24x24, 256
conv14, 3x3, 24x24, 256
conv15, 3x3, 24x24, 256
conv16, 3x3, 24x24, 256
conv17, 3x3, 24x24, 256
conv18, 3x3, 24x24, 256
max pool stride 2
conv21, 3x3, 16x16, 256
conv22, 3x3, 16x16, 256
max pool stride 4
conv31, 5x5, 6x6, 192
conv32, 5x5, 6x6, 64
max pool stride 6

We achieve the best performance on cifar10 and cifar100 using the designed networks. On cifar10, our best results are 7.44% with data augmentation and 9.02% without, better than Deeply Supervised Network (7.97%/9.69%), Network in Network (8.81%/10.41%) and Maxout Network (9.32%/11.68%). On cifar100, our best result is 33.03% without data augmentation, better than Deeply Supervised Network (34.57%), Network in Network (35.68%) and Maxout Network (38.57%). It is worth noting that the fractional max pooling proposed by Dr. Ben Graham achieves much better results using more sophisticated data augmentation; due to the different data augmentation and testing settings, we do not directly compare with his work.
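For concreteness, the "net without augment" column of Table 1 can be written down directly, e.g. in PyTorch as below. The padding, ReLU activations, dropout rate and final linear layer are assumptions on my part; the table only lists the conv/pool structure and filter counts.

```python
import torch.nn as nn

def conv5(cin, cout):
    # 5x5 convolution with padding 2 keeps the feature map size, as in Table 1.
    return nn.Sequential(nn.Conv2d(cin, cout, kernel_size=5, padding=2), nn.ReLU(inplace=True))

cifar_net_no_augment = nn.Sequential(
    # feature level, stage 1: conv11-conv16 on the 32x32 map
    conv5(3, 128), conv5(128, 128), conv5(128, 128),
    conv5(128, 128), conv5(128, 128), conv5(128, 128),
    nn.MaxPool2d(2),                  # 32x32 -> 16x16
    conv5(128, 128),                  # stage 2: conv21
    nn.MaxPool2d(2),                  # 16x16 -> 8x8
    # classifier level: conv31, conv32 with kernels large relative to the 8x8 map
    conv5(128, 128), conv5(128, 64),
    nn.MaxPool2d(8),                  # max pool stride 8: 8x8 -> 1x1 vector
    nn.Flatten(), nn.Dropout(0.5), nn.Linear(64, 10),
)
```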

5.2. Networks for the national data science bowl

The national data science bowl is a kaggle competition. The task is to classify plankton images into 121 pre-defined classes; there are around 30k training images and 130k testing images. To make the designed networks complementary to each other, we design our networks with various input image sizes and filter sizes. Table 3 shows 4 representative networks designed following our theory. Among them, the third net (image size 128, filter size 3) and the fourth net (image size 144, filter size 4) achieved ... and ... log loss on the public leaderboard. Our final ensemble consists of the four models in Table 3, a VGG-like model and 2 variants of our designs. The 7-model ensemble achieved ... on the public leaderboard. Combining with Bing's score, we finally achieved ... on the public leaderboard.

Table 3. Networks for the national data science bowl, listed one network per column of the original table. The description "conv11, 48x48, 5x5, 16" gives, from left to right, the layer name, feature map size, filter size and filter number.

Net 1 (image size 48x48, filter size 5x5):
conv11, 48x48, 5x5, 16
conv12, 48x48, 5x5, 16
conv13, 48x48, 5x5, 32
conv14, 48x48, 5x5, 32
conv15, 48x48, 5x5, 128
conv16, 48x48, 5x5, 128
max pool stride 2
conv21, 24x24, 5x5, 256
conv22, 24x24, 5x5, 256
conv23, 24x24, 5x5, 256
max pool stride 4
conv31, 6x6, 5x5, 256
conv32, 6x6, 5x5, 256

Net 2 (image size 112x112, filter size 2x2):
conv01, 112x112, 5x5, 16
max pool stride 2
conv11, 56x56, 2x2, 32
conv12, 56x56, 2x2, 32
conv13, 56x56, 2x2, 32
conv14, 56x56, 2x2, 32
conv15, 56x56, 2x2, 64
conv16, 56x56, 2x2, 64
conv17, 56x56, 2x2, 64
conv18, 56x56, 2x2, 64
conv19, 56x56, 2x2, 192
conv110, 56x56, 2x2, 192
conv111, 56x56, 2x2, 192
conv112, 56x56, 2x2, 192
max pool stride 2
conv21, 28x28, 2x2, 384
conv22, 28x28, 2x2, 384
conv23, 28x28, 2x2, 384
conv24, 28x28, 2x2, 384
conv25, 28x28, 2x2, 384
conv26, 28x28, 2x2, 384
max pool stride 2
conv31, 14x14, 2x2, 768
conv32, 14x14, 2x2, 768
conv33, 14x14, 2x2, 768
conv34, 14x14, 2x2, 768
conv35, 14x14, 2x2, 768
conv36, 14x14, 2x2, 768
max pool stride 2
conv41, 7x7, 2x2, 1024
conv42, 7x7, 2x2, 1024
max pool stride 2
conv51, 3x3, 3x3, 768
conv52, 3x3, 3x3, 768

Net 3 (image size 128x128, filter size 3x3):
conv01, 128x128, 5x5, 16
max pool stride 2
conv11, 64x64, 3x3, 32
conv12, 64x64, 3x3, 32
conv13, 64x64, 3x3, 64
conv14, 64x64, 3x3, 64
conv15, 64x64, 3x3, 128
conv16, 64x64, 3x3, 128
conv17, 64x64, 3x3, 192
conv18, 64x64, 3x3, 192
max pool stride 2
conv21, 32x32, 3x3, 256
conv22, 32x32, 3x3, 256
conv23, 32x32, 3x3, 256
conv24, 32x32, 3x3, 256
max pool stride 2
conv31, 16x16, 3x3, 512
conv32, 16x16, 3x3, 512
conv33, 16x16, 3x3, 512
conv34, 16x16, 3x3, 512
max pool stride 2
conv41, 8x8, 5x5, 512
conv42, 8x8, 5x5, 512

Net 4 (image size 144x144, filter size 4x4):
conv01, 144x144, 5x5, 16
max pool stride 2
conv11, 72x72, 4x4, 32
conv12, 72x72, 4x4, 32
conv13, 72x72, 4x4, 64
conv14, 72x72, 4x4, 64
conv15, 72x72, 4x4, 128
conv16, 72x72, 4x4, 128
conv17, 72x72, 4x4, 128
conv18, 72x72, 4x4, 128
max pool stride 2
conv21, 36x36, 4x4, 256
conv22, 36x36, 4x4, 256
conv23, 36x36, 4x4, 256
conv24, 36x36, 4x4, 256
max pool stride 2
conv31, 18x18, 4x4, 512
conv32, 18x18, 4x4, 512
max pool stride 3
conv41, 6x6, 5x5, 512
conv42, 6x6, 5x5, 512

6. The sketch of the proof

First, the solution satisfies the constraints in inequalities (5) and (6). We can easily verify this by plugging the solution into the inequalities; doing so makes the left term of each inequality exactly equal to the right term. In other words, this solution lies on the boundary of the set of feasible solutions.

Second, assume there is a solution with l stages and layer numbers {b_1, ..., b_l}, where l <= m. By linearly combining the inequalities in (5) and (6), we can always show that

    \sum_{i=1}^{l} b_i \le (l+1) C / 2, \qquad \text{where } C = \frac{k}{(k-1)\, t}.

Since (l+1) C / 2 <= (m+1) C / 2 = \sum_i a_i, there is no better solution when l <= m.

Third, assume there is a solution with l stages and layer numbers {b_1, ..., b_l}, where l > m. By linearly combining the inequalities in (5) and (6), we have

    \sum_{i=1}^{m} b_i + 2 \sum_{i=m+1}^{l} b_i \le (m+1) C / 2.

Reorganizing the left term,

    \sum_{i=1}^{l} b_i + \sum_{i=m+1}^{l} b_i \le (m+1) C / 2 = \sum_i a_i,

and hence there is no better solution when l > m either.
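To make the first step of the sketch concrete, the short derivation below (my own, under the stated assumption z = 2^{m-1} k / t and with C = k/((k-1)t)) plugs the solution of Equations (7) and (8) into constraints (5) and (6) and shows that both hold with equality.

```latex
% Plugging a_1 = C, a_2 = ... = a_m = C/2, with C = k/((k-1)t), into (5) and (6):
\begin{align*}
\sum_{i=1}^{l} 2^{i-1} a_i
  &= C + \frac{C}{2}\sum_{i=2}^{l} 2^{i-1}
   = C + \frac{C}{2}\bigl(2^{l} - 2\bigr)
   = 2^{l-1} C
   = \frac{2^{l-1} k}{t(k-1)}
   \quad \text{for } l = 1, \dots, m, \\
\sum_{i=1}^{m} 2^{i-1} a_i
  &= 2^{m-1} C
   = \frac{2^{m-1} k}{t(k-1)}
   = \frac{z}{k-1}
   \quad \text{since } z = \frac{2^{m-1} k}{t}.
\end{align*}
```

Both constraints are therefore met with equality, which is exactly the boundary property used in the first step of the sketch.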
