Compression of Deep Neural Networks on the Fly

Guillaume Soulié, Vincent Gripon, Maëlys Robert
Télécom Bretagne

arXiv v1 [cs.LG], 9 Sep 2015. This work was funded in part by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement.

Abstract—Because of their performance, deep neural networks are increasingly used for object recognition. They are particularly attractive because of their ability to absorb great quantities of labeled data through millions of parameters. However, as accuracy and model size increase, so do the memory requirements of the classifiers. This prohibits their usage on resource-limited hardware such as cell phones or other embedded devices. We introduce a novel compression method for deep neural networks that operates during the learning phase. It consists in adding an extra regularization term to the cost function of fully connected layers. We combine this method with Product Quantization (PQ) of the trained weights for higher savings in memory and storage consumption. We evaluate our method on two data sets (MNIST and CIFAR10), on which we achieve significantly larger compression than state-of-the-art methods.

I. MOTIVATION

Deep Convolutional Neural Networks (CNNs) [1, 2, 3, 4] have become the gold standard for object recognition, image classification and retrieval. Almost all recent successful recognition systems [4, 5, 6, 7, 8, 9] are built on top of this architecture. Importing CNNs onto embedded platforms [10], in line with the recent trend toward mobile computing, has a wide range of potential applications. It is especially useful when bandwidth is limited and when files are sensitive and should not be sent to distant servers. However, the size of a CNN is typically large (e.g. more than 200 MB), which limits the applicability of such models on embedded platforms. For example, it is hard to convince users to download an iPhone application that big. The purpose of this paper is to propose new techniques for compressing networks without sacrificing performance.

In this article we focus on compressing CNNs for computer vision tasks, although our methodology does not take advantage of this particular application field and we expect it to perform similarly on other types of learning tasks. A typical CNN [5, 7, 8] with good performance in visual object recognition contains eight layers (five convolutional layers and three densely connected layers) and a huge number of parameters in order to achieve state-of-the-art results. These parameters are highly redundant [11] and we aim at compressing them. Note that our motivation is mainly reducing storage rather than speeding up testing time [12].

A few early works on compressing CNNs have been published. In [12] and [13] the authors use compression methods to speed up CNN testing time. More recently, some works have focused on compressing neural networks to reduce their storage. These works can generally be put in two categories: some focus on compressing the fully connected layers and others on compressing the convolutional layers. In [14] the authors focus on compressing densely connected layers. In their work, instead of the traditional matrix factorization methods considered in [12, 13], they consider information-theoretic vector quantization methods [15, 16] for compressing densely connected layers. They obtain good results by applying Product Quantization.
In [17] the authors focus on compressing the fully connected layers of a Multi-Layer Perceptron (MLP) using the Hashing Trick, a low-cost hash function that randomly groups connection weights into hash buckets and assigns the same value to all the parameters in the same bucket. In [18] the authors propose compressing convolutional layers using a Discrete Cosine Transform applied to the convolutional filters, followed by the Hashing Trick, as for the fully connected layers.

For a typical CNN such as the one introduced in [8], about 90% of the storage is taken up by the densely connected layers and more than 90% of the running time is taken by the convolutional layers. This is why our work focuses on compressing the densely connected layers to reduce the storage of neural networks. Instead of using a post-learning method to compress the network, our approach consists in modifying the regularization function used during the learning phase in order to favor quantized weights in some layers, especially the output ones. To achieve this, we use an idea that was originally proposed in [19]. In order to compress the obtained networks further, we also apply PQ as described in [14] afterwards. We perform experiments on both Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs).

In this paper, we introduce a novel way to quantize weights in deep learning systems. More precisely:
- We introduce a regularization term that forces weights to converge to either 0 or 1, before applying product quantization to the learned weights.
- We show how this extra term impacts performance depending on the depth of the layer it is applied to.
- We evaluate our proposed method on celebrated benchmarks and compare it with state-of-the-art techniques.

The outline of the paper is as follows. In Section II we discuss related work. Section III introduces our methodology for compressing layers in deep neural networks. In Section IV we run experiments on celebrated databases. In Section V we discuss the biological motivation for our work. Section VI is a conclusion.

II. RELATED WORK

As already mentioned in the introduction, a state-of-the-art CNN usually involves hundreds of millions of parameters, thus requiring a storage capability that may be hard to obtain in practice. The bottleneck comes from model storage and testing speed.

Several works have been published on speeding up CNN prediction. In [20] the authors use CPU-specific tricks to speed up the execution of CNNs; in particular they focus on memory alignment and SIMD operations to boost matrix computations. In [21] the authors show that convolutional operations can be efficiently carried out in the Fourier domain, leading to a speed-up of 200%. Two very recent works [12, 13] use linear matrix factorization methods for speeding up convolutional layers and obtain a 200% speed-up with almost no loss in classification performance.

Almost all of the above-mentioned works focus on making CNN prediction faster; little work has been specifically devoted to making CNN models smaller. In [11], the authors demonstrate the redundancy of neural network parameters: they show that the weights within one layer can be accurately predicted from a small (about 5%) subset of the parameters. These results motivated the authors of [14] to apply vector quantization methods to compress the redundancy in parameter space. Their work can be viewed as a compression counterpart of the parameter prediction results reported in [11]: they are able to compress the parameters about 20 times with little decrease in performance, which further confirms the empirical findings of [11]. In their paper, they tackle the model storage issue by investigating information-theoretic vector quantization methods for compressing the parameters of CNNs. In particular, they show that for compressing the most storage-demanding densely connected layers, vector quantization performs much better than existing matrix factorization methods: simply applying k-means clustering to the weights or conducting product quantization leads to a very good balance between model size and recognition accuracy. For the 1000-category classification task of the ImageNet challenge, they achieve 16 to 24 times compression of the network with only 1% loss of classification accuracy using a state-of-the-art CNN.

In [17], a learning-based method is proposed for the first time to compress neural networks. This method, based on the Hashing Trick, allows efficient compression rates. In particular, the authors show that compressing a big neural network may be more efficient than using a smaller one: in their example they are able to divide the loss by two using an eight times larger neural network compressed eight times. The same authors also propose in [18] to compress filters in convolutional layers, arguing that the size of the convolutional layers in state-of-the-art CNNs is increasing year after year. Using the fact that learned CNN filters are often smooth, a Discrete Cosine Transform followed by the Hashing Trick allows them to compress a whole neural network without losing too much accuracy.

III. METHODOLOGY

In this section, we consider two methods for compressing the parameters of CNN layers.
We first introduce the product quantization method proposed in [14], and then present our learning-based method.

A. Product Quantization (PQ)

For more details on this quantization method, refer to [14]. The key idea is to use structured vector quantization to compress the parameters. More precisely, it consists in applying Product Quantization (PQ) [15] to exploit the redundancy of structures in vector space. PQ partitions the vector space into disjoint sub-spaces and performs quantization in each of them; it performs increasingly better as the redundancy within each subspace grows. Specifically, given the matrix W, we partition it column-wise into several sub-matrices:

W = [W^1, W^2, \dots, W^s],  (1)

where W^i \in \mathbb{R}^{m \times (n/s)}, assuming n is divisible by s (n/s is called the segment size). We can then perform a k-means clustering on the rows of each sub-matrix W^i:

\min \sum_{z=1}^{m} \min_{1 \le j \le k} \left\| w_z^i - c_j^i \right\|_2^2,  (2)

where w_z^i denotes the z-th row of sub-matrix W^i, and c_j^i denotes the j-th row of the sub-codebook C^i \in \mathbb{R}^{k \times (n/s)}. For each sub-vector w_z^i, we only need to store its cluster index and the associated codebook. Thus, the reconstructed matrix is

\hat{W} = [\hat{W}^1, \hat{W}^2, \dots, \hat{W}^s],  (3)

where \hat{w}_z^i = c_j^i, with j a minimizer of \| w_z^i - c_j^i \|_2. Note that PQ can be applied to either the x-axis or the y-axis of the matrix; in our experiments, we evaluated both cases and obtained similar performance. We need to store the cluster indexes and codebooks for each sub-matrix, and the codebook in particular is not negligible. The compression rate for this method is 32mn / (32kn + \log_2(k) m s). With a fixed segment size, increasing k decreases the compression rate.
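To make the procedure above concrete, here is a minimal Python sketch of PQ applied to a single weight matrix, using scikit-learn's k-means. The matrix size, the segment count s and the codebook size k below are illustrative choices of ours, not values taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def product_quantize(W, s=4, k=16):
    """Split an m x n matrix W column-wise into s sub-matrices and run
    k-means with k centroids on the rows of each one."""
    m, n = W.shape
    assert n % s == 0, "n must be divisible by the number of segments"
    seg = n // s
    codebooks, indexes = [], []
    for i in range(s):
        Wi = W[:, i * seg:(i + 1) * seg]              # sub-matrix W^i, shape (m, n/s)
        km = KMeans(n_clusters=k, n_init=10).fit(Wi)  # k-means on its rows
        codebooks.append(km.cluster_centers_)         # sub-codebook C^i, shape (k, n/s)
        indexes.append(km.labels_)                    # one cluster index per row
    return codebooks, indexes

def reconstruct(codebooks, indexes):
    """Rebuild the approximated matrix W_hat from indexes and codebooks."""
    return np.hstack([C[idx] for C, idx in zip(codebooks, indexes)])

# Toy usage with hypothetical sizes (not the paper's): a 512 x 256 layer.
W = np.random.randn(512, 256).astype(np.float32)
codebooks, indexes = product_quantize(W, s=4, k=16)
W_hat = reconstruct(codebooks, indexes)

m, n, s, k = 512, 256, 4, 16
rate = (32 * m * n) / (32 * k * n + np.log2(k) * m * s)   # formula from the text
print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
print("compression rate:", rate)
```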

B. Proposed method

Our proposed method is twofold: first, we use a specific additional regularization term in order to attract network weights toward binary values; then we coarsely quantize the output layers. Let us recall that training a neural network is generally performed by minimizing a cost function using a variant of gradient descent. In order to attract network weights toward binary values, we add a binarization cost (regularizer) during the learning phase. This added cost pushes weights toward binary values. As a result, solutions of the minimization problem are expected to be binary or almost binary, depending on the scaling of the added cost with respect to the initial one. This idea is not new, although we did not find any work applying it to deep learning in the literature. Our choice of regularization term has been greatly inspired by [19]. More precisely, denoting by W the weights of the neural network, f(W) the cost associated with W, and h_W(X) and y(X) respectively the output and the label for a given input X, we obtain:

f(W) = \sum_X \left\| h_W(X) - y(X) \right\|^2 + \alpha \sum_{w \in W} |w| \, |w - 1|,  (4)

where \alpha is a scaling parameter representing the importance of the binarization cost with respect to the initial cost. Finding a good value for \alpha may be tricky: too small a value results in a failure of the binarization process, while too large a value creates local minima that prevent the network from training successfully. To facilitate this selection, we use a barrier method [19] that consists in starting with a small value of \alpha and incrementing it regularly to help the quantization process.

We observed that some layers are typically very well quantized at the end of this learning phase, whereas others are still far from binary. For that reason we then binarize some of the layers but not all. In order to further improve the compression rate, we then apply the PQ method presented in the previous subsection. The compression rate for the combined method is 32mn / (kn + \log_2(k) m s) (instead of 32mn / (32kn + \log_2(k) m s) for Product Quantization alone). With a fixed segment size, increasing k decreases the compression rate.
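As an illustration of the training-time regularizer, here is a minimal PyTorch-style sketch in which a binarization penalty in the spirit of Equation (4) is added to the task loss and α is increased on a regular schedule, mimicking the barrier method. The architecture, the cross-entropy task loss, the optimizer settings, the schedule and the `train_loader` object are our own assumptions for the sake of the example, not the paper's actual setup.

```python
import torch
import torch.nn as nn

def binarization_penalty(layer):
    """Sum of |w| * |w - 1| over the weights of one layer (cf. Equation (4))."""
    w = layer.weight
    return (w.abs() * (w - 1).abs()).sum()

# Hypothetical small MLP; only the two output (fully connected) layers
# receive the binarization cost, as suggested by the method above.
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(784, 500), nn.ReLU(),
                      nn.Linear(500, 10))
regularized_layers = [model[1], model[3]]

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

alpha, alpha_step = 1e-4, 1e-4          # barrier method: start small, grow regularly

for epoch in range(20):                 # assumed number of epochs
    for x, y in train_loader:           # train_loader is assumed to exist
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss = loss + alpha * sum(binarization_penalty(l) for l in regularized_layers)
        loss.backward()
        optimizer.step()
    alpha += alpha_step                 # increment the barrier parameter each epoch

# After training, the weights of the well-quantized layers are rounded to {0, 1}
# before PQ is applied, as described in the text.
```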
IV. EXPERIMENTS

We evaluate these methods on two image classification datasets: MNIST and CIFAR10.

A. Experimental settings

1) MNIST: The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. The neural network we use with MNIST is LeNet-5, a convolutional neural network introduced by Yann LeCun [22] and one of the LeNet family of networks, depicted in Figure 1. We use exactly the same parameters for the network (number of layers, sizes of layers, etc.).

Figure 1. A graphical depiction of a LeNet model: alternating convolution and max-pooling layers producing successive feature maps, followed by a fully connected MLP (figure inspired by [23]).

2) CIFAR10: The CIFAR10 database has a training set of 50,000 examples and a test set of 10,000 examples. It is a subset of the 80 million tiny images dataset and consists of 32x32 colour images partitioned into ten classes. With the CIFAR10 database, we use a convolutional neural network made of four convolutional layers followed by two fully connected layers. This network has been introduced in [24].

B. Layers to quantize

Our first experiments (with the MNIST database), depicted in Figure 2, show the influence of the choice of quantized layers on performance. We observe that performance strongly depends on which layers are quantized. More precisely, this experiment shows that one should quantize layers from the output to the input rather than the contrary. This result is not surprising to us, as input layers have often been described as similar to wavelet transforms, which are intrinsically analog operators, whereas output layers are often compared to high-level patterns whose detection in an image is often enough for good classification.

Figure 2. Performance of the classification task depending on which layers of the network are quantized, on the MNIST database.

C. Performance comparison

Our second experiment is a comparison with previous work. The results are depicted in Figures 3 and 4. Note that in both cases the compared networks have the exact same architecture. As far as our proposed method is concerned, we chose to compress only the two output layers, which are fully connected. Since their sizes are distinct, we were not able to use the same PQ coefficients k and m twice. Note that the larger of these layers contains almost all of the weights and is therefore the one on which we investigated the role of each parameter. We found that our added regularization cost significantly improves performance: for example, on the MNIST database, at a fixed tolerated accuracy loss, PQ alone yields a compression rate of 33, whereas our learning-based method leads to a compression rate of 107.

Figure 3. Comparison of our proposed method with previous work on the MNIST dataset.
Figure 4. Comparison of our proposed method with previous work on the CIFAR10 dataset.
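For reference, a short helper evaluating the two compression-rate expressions of Section III; the layer dimensions and PQ parameters below are arbitrary examples of ours, so the printed rates are not meant to reproduce the figures reported above.

```python
import math

def rate_pq(m, n, s, k):
    """PQ alone: 32-bit weights replaced by 32-bit codebooks plus log2(k)-bit indexes."""
    return (32 * m * n) / (32 * k * n + math.log2(k) * m * s)

def rate_binarized_pq(m, n, s, k):
    """Binarization followed by PQ: codebook entries cost 1 bit instead of 32."""
    return (32 * m * n) / (k * n + math.log2(k) * m * s)

# Hypothetical fully connected layer: 800 rows, 500 columns, 4 segments, 16 centroids.
m, n, s, k = 800, 500, 4, 16
print("PQ alone:       %.1f" % rate_pq(m, n, s, k))
print("binarized + PQ: %.1f" % rate_binarized_pq(m, n, s, k))
```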

V. BIOLOGICAL DISCUSSION

Deep neural networks have often been compared to the sensory parts of the human cortex, both in terms of architecture and performance [25]. As a matter of fact, some authors [26] propose biologically inspired models that are very similar to CNNs. For decades, the question of whether the encoding of information in the brain is analog or digital has been debated. A recent paper [27] performed experiments suggesting that the brain might use different encodings depending on the region considered. In particular, the authors show that, as far as the visual cortex is concerned, the further a region is from the sensors, the more likely its encodings are to be digital. It is tempting to draw a parallel between these findings and the ability of the brain to associate analog-like data, such as sounds and images, with digital data, such as language and labels. The results depicted in Figure 2 suggest a similar behavior in deep artificial neural systems.

VI. CONCLUSION AND PERSPECTIVES

In this paper we introduced a new method to compress convolutional neural networks. This method consists in adding an extra term to the cost function that forces weights to become almost binary. In order to compress the network even more, we then apply Product Quantization; the combination of both allows us to reach compression performance above state-of-the-art methods. We also demonstrate the influence of the depth of the binarized layers on performance. These findings are of particular interest to us and a motivation to further explore the connections between actual biological data and deep neural systems.

In future work, we will consider exploring more complex regularization functions. We also plan on performing PQ on the fly during the learning phase. Finally, other strategies could try to benefit from the sparsity of the trained weights; such sparsity could be enforced through mechanisms similar to the ones we introduce in this paper.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[2] B. B. Le Cun, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems, 1990.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv preprint arXiv:1409.4842, 2014.
[4] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[5] Y. Jia, "Caffe: An open source convolutional architecture for fast feature embedding," 2013.
[6] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," arXiv preprint, 2013.
[7] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.
[8] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," arXiv preprint, 2013.
[9] Y. Gong, L. Wang, R. Guo, and S. Lazebnik, "Multi-scale orderless pooling of deep convolutional activation features," in Computer Vision - ECCV 2014. Springer, 2014.
[10] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, "A 240 G-ops/s mobile coprocessor for deep neural networks," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. IEEE, 2014.
[11] M. Denil, B. Shakibi, L. Dinh, N. de Freitas et al., "Predicting parameters in deep learning," in Advances in Neural Information Processing Systems, 2013, pp. 2148-2156.
[12] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, "Exploiting linear structure within convolutional networks for efficient evaluation," in Advances in Neural Information Processing Systems, 2014, pp. 1269-1277.
[13] M. Jaderberg, A. Vedaldi, and A. Zisserman, "Speeding up convolutional neural networks with low rank expansions," arXiv preprint arXiv:1405.3866, 2014.
[14] Y. Gong, L. Liu, M. Yang, and L. Bourdev, "Compressing deep convolutional networks using vector quantization," arXiv preprint arXiv:1412.6115, 2014.
[15] H. Jegou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117-128, 2011.
[16] Y. Chen, T. Guan, and C. Wang, "Approximate nearest neighbor search by residual vector quantization," Sensors, vol. 10, no. 12, 2010.
[17] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing neural networks with the hashing trick," arXiv preprint arXiv:1504.04788, 2015.
[18] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing convolutional neural networks," arXiv preprint, 2015.
[19] W. Murray and K.-M. Ng, "An algorithm for nonlinear optimization problems with binary variables," Computational Optimization and Applications, vol. 47, no. 2, pp. 257-288, 2010.
[20] V. Vanhoucke, A. Senior, and M. Z. Mao, "Improving the speed of neural networks on CPUs," in Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop, vol. 1, 2011.
[21] M. Mathieu, M. Henaff, and Y. LeCun, "Fast training of convolutional networks through FFTs," arXiv preprint arXiv:1312.5851, 2013.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[23] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio, "Theano: new features and speed improvements," Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
[24] F. Chollet, "Keras: Theano-based deep learning library." [Online]. Available: https://github.com/fchollet/keras
[25] C. F. Cadieu, H. Hong, D. L. Yamins, N. Pinto, D. Ardila, E. A. Solomon, N. J. Majaj, and J. J. DiCarlo, "Deep neural networks rival the representation of primate IT cortex for core visual object recognition," PLoS Computational Biology, vol. 10, no. 12, 2014.
[26] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 411-426, 2007.
[27] Y. Mochizuki and S. Shinomoto, "Analog and digital codes in the brain," Physical Review E, vol. 89, no. 2, p. 022705, 2014.
