Compression of Deep Neural Networks on the Fly
Guillaume Soulié, Vincent Gripon, Maëlys Robert
Télécom Bretagne
arXiv v1 [cs.LG] 9 Sep 2015

This work was funded in part by the European Research Council under the European Union's Seventh Framework Program (FP7) / ERC grant agreement.

Abstract

Because of their performance, deep neural networks are increasingly used for object recognition. They are particularly attractive because of their ability to absorb great quantities of labeled data through millions of parameters. However, as the accuracy and the model sizes increase, so do the memory requirements of the classifiers. This prohibits their usage on resource-limited hardware, including cell phones and other embedded devices. We introduce a novel compression method for deep neural networks that is performed during the learning phase. It consists in adding an extra regularization term to the cost function of fully-connected layers. We combine this method with Product Quantization (PQ) of the trained weights for higher savings in memory and storage consumption. We evaluate our method on two data sets (MNIST and CIFAR10), on which we achieve significantly larger compression than state-of-the-art methods.

I. MOTIVATION

Deep Convolutional Neural Networks (CNNs) [1, 2, 3, 4] have become the gold standard for object recognition, image classification and retrieval. Almost all of the recent successful recognition systems [4, 5, 6, 7, 8, 9] are built on top of this architecture. Importing CNNs onto embedded platforms [10], the recent trend toward mobile computing, has a wide range of application impacts. It is especially useful when bandwidth is limited and when files are sensitive and should not be sent to distant servers. However, the size of a CNN is typically large (e.g. more than 200M bytes), which limits the applicability of such models on embedded platforms. For example, it is hard to ask users to download an iPhone application that large. The purpose of this paper is to propose new techniques for compressing networks without sacrificing performance.

In this article we focus on compressing CNNs for computer vision tasks, although our methodology does not take advantage of this particular application field and we expect it to perform similarly on other types of learning tasks. A typical CNN [5, 7, 8] with good performance in visual object recognition contains eight layers (five convolutional layers and three densely connected layers) and a huge number of parameters in order to achieve state-of-the-art results. These parameters are over-parameterized [11] and we aim at compressing them. Note that our motivation is mainly reducing storage rather than speeding up testing time [12].

A few early works on compressing CNNs have been published. In [12] and [13] the authors use compression methods for speeding up CNN testing time. More recently, some works focus on compressing neural networks in order to reduce the storage of the network. These works can generally be put in two different categories: some of them focus on compressing the fully connected layers and others on compressing the convolutional layers. In [14] the authors focus on compressing densely connected layers. In their work, instead of the traditional matrix factorization methods considered in [12, 13], they consider information-theoretic vector quantization methods [15, 16] for compressing densely connected layers. They obtain good results by applying Product Quantization.
In [17] the authors focus on compressing the fully connected layers of a Multi-Layer Perceptron (MLP) using the Hashing Trick, a low-cost hash function that randomly groups connection weights into hash buckets, all parameters in the same bucket sharing a single value. In [18] the authors propose compressing convolutional layers using a Discrete Cosine Transform applied to the convolutional filters, followed by the Hashing Trick as for the fully connected layers.

For a typical CNN such as the one introduced in [8], about 90% of the storage is taken up by the densely connected layers and more than 90% of the running time is taken by the convolutional layers. This is why our work focuses on compressing the densely connected layers to reduce the storage of neural networks. Instead of using a post-learning method to compress the network, our approach consists in modifying the regularization function used during the learning phase in order to favor quantized weights in some layers, especially the output ones. To achieve this, we use an idea that was originally proposed in [19]. In order to compress the obtained networks further, we also apply PQ as described in [14] afterwards. We perform experiments both on Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN).

In this paper, we introduce a novel way to quantize weights in deep learning systems. More precisely:
- We introduce a regularization term that forces weights to converge to either 0 or 1, before using product quantization on the learned weights.
- We show how this extra term impacts performance depending on the depth of the layer it is applied to.
- We evaluate our proposed method on well-known benchmarks and compare it with state-of-the-art techniques.

The outline of the paper is as follows. In Section II we discuss related work. Section III introduces our methodology for compressing layers in deep neural networks.
In Section IV we run experiments on well-known databases. In Section V we discuss the biological motivation for our work. Section VI is a conclusion.

II. RELATED WORK

As already mentioned in the introduction, a state-of-the-art CNN usually involves hundreds of millions of parameters, thus requiring an important storage capability that may be hard to obtain in practice. The bottleneck comes from model storage and testing speed.

Several works have been published on speeding up CNN prediction. In [20] the authors use CPU-level tricks to speed up the execution of CNNs. In particular they focus on memory alignment and SIMD operations to boost matrix operations. In [21] the authors show that the convolutional operations can be efficiently carried out in the Fourier domain, leading to a speed-up of 200%. Two very recent works [12, 13] use linear matrix factorization methods for speeding up convolutional layers and obtain a 200% speed-up with almost no loss in classification performance.

Almost all of the above-mentioned works focus on making CNN prediction faster; little work has been specifically devoted to making CNN models smaller. In [11], the authors demonstrate the redundancies in neural network parameters. Indeed, they show that the weights within one layer can be accurately predicted from a small (5%) subset of the parameters. These results motivate [14] to apply vector quantization methods to compress the redundancy in parameter space. Their work can be viewed as a compression counterpart of the parameter prediction results reported in [11]. They obtain results similar to those of [11], in that they are able to compress the parameters about 20 times with little decrease of performance. This result further confirms the interesting empirical findings in [11]. In their paper, they tackle the model storage issue by investigating information-theoretic vector quantization methods for compressing the parameters of CNNs. In particular, they show that for compressing the most storage-demanding densely connected layers, vector quantization performs much better than existing matrix factorization methods. Simply applying k-means clustering to the weights or conducting product quantization can lead to a very good balance between model size and recognition accuracy. For the 1000-category classification task of the ImageNet challenge, they are able to achieve 16-24 times compression of the network with only 1% loss of classification accuracy using a state-of-the-art CNN.

In [17], a learning-based method is proposed for the first time to compress neural networks. This method, based on the Hashing Trick, allows efficient compression rates. In particular, the authors show that compressing a big neural network may be more efficient than using a smaller one: in their example they are able to divide the loss by two using an eight times larger neural network compressed eight times. The same authors also propose in [18] to compress filters in convolutional layers, arguing that the size of the convolutional layers in state-of-the-art CNNs is increasing year after year. Using the fact that learned CNN filters are often smooth, their Discrete Cosine Transform followed by the Hashing Trick allows them to compress a whole neural network without losing too much accuracy.

III. METHODOLOGY

In this section, we consider two methods for compressing the parameters of CNN layers.
We first present the product quantization method proposed in [14], and then introduce our proposed learning-based method.

A. Product Quantization (PQ)

For more details on this quantization method, refer to [14]. The key idea is to use structured vector quantization for compressing parameters. More precisely, the idea consists in applying Product Quantization (PQ) [15] to exploit the redundancy of structures in vector space. PQ partitions the vector space into disjoint sub-spaces and performs quantization in each of them. It performs increasingly better as the redundancy in each subspace grows. Specifically, given the matrix W, we partition it column-wise into several sub-matrices:

W = [W^1, W^2, ..., W^s],    (1)

where W^i \in R^{m \times (n/s)}, assuming n is divisible by s (n/s is called the segment size). We can then perform a k-means clustering for each sub-matrix W^i:

\min_{C^i} \sum_{z=1}^{m} \min_{1 \le j \le k} \| w_z^i - c_j^i \|_2^2,    (2)

where w_z^i denotes the z-th row of sub-matrix W^i, and c_j^i denotes the j-th row of the sub-codebook C^i \in R^{k \times (n/s)}. For each sub-vector w_z^i, we only need to store its corresponding cluster index and the associated codebooks. Thus, the reconstructed matrix is:

Ŵ = [Ŵ^1, Ŵ^2, ..., Ŵ^s],    (3)

where ŵ_z^i = c_j^i, j being a minimizer of \min_j \| w_z^i - c_j^i \|.

Note that PQ can be applied along either axis of the matrix (rows or columns). In our experiments, we evaluated both cases and obtained similar performance. We need to store the cluster indexes and codebooks for each sub-vector; in particular, the codebook storage is not negligible. The compression rate for this method is (32mn)/(32kn + log2(k)ms). With a fixed segment size, increasing k decreases the compression rate.
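To make the procedure above concrete, the following sketch (our own illustration, not code from the paper) product-quantizes a weight matrix column-wise with NumPy and scikit-learn and reports the compression rate from the formula above, assuming 32-bit floating-point weights; the matrix sizes and the (k, s) values are arbitrary toy choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(W, s, k):
    """Product-quantize W (m x n) into s column segments with k centroids each.
    Returns per-segment codebooks, cluster indexes, the reconstruction and the rate."""
    m, n = W.shape
    assert n % s == 0, "n must be divisible by the number of segments"
    seg = n // s                                   # segment size (n/s)
    codebooks, indexes, W_hat = [], [], np.empty_like(W)
    for i in range(s):
        sub = W[:, i * seg:(i + 1) * seg]          # sub-matrix W^i, shape (m, n/s)
        km = KMeans(n_clusters=k, n_init=10).fit(sub)
        codebooks.append(km.cluster_centers_)      # sub-codebook C^i, shape (k, n/s)
        indexes.append(km.labels_)                 # one index per row, log2(k) bits each
        W_hat[:, i * seg:(i + 1) * seg] = km.cluster_centers_[km.labels_]
    # Compression rate for 32-bit weights: 32mn / (32kn + log2(k) * m * s)
    rate = (32 * m * n) / (32 * k * n + np.log2(k) * m * s)
    return codebooks, indexes, W_hat, rate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 1024)).astype(np.float32)   # toy dense layer
    _, _, W_hat, rate = pq_compress(W, s=8, k=16)
    print("compression rate: %.1f, reconstruction MSE: %.4f"
          % (rate, float(np.mean((W - W_hat) ** 2))))
```

With these toy sizes the formula gives a rate of roughly 31; k and the segment size trade compression against reconstruction error.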
B. Proposed method

Our proposed method is twofold: first, we use a specific additional regularization term in order to attract network weights to binary values; then we coarsely quantize the output layers. Recall that training a neural network is generally performed by minimizing a cost function using a gradient descent algorithm or one of its variants. In order to attract network weights to binary values, we add a binarization cost (regularizer) during the learning phase. This added cost pushes weights towards binary values. As a result, solutions of the minimization problem are expected to be binary or almost binary, depending on the scaling of the added cost with respect to the initial one. This idea is not new, although we did not find any work applying it to deep learning in the literature. Our choice for the regularization term has been greatly inspired by [19]. More precisely, denoting by W the weights of the neural network, f(W) the cost associated with W, and h_W(X) and y(X) respectively the output and the label for a given input X, we obtain:

f(W) = \sum_X \| h_W(X) - y(X) \|^2 + \alpha \sum_{w \in W} w^2 (w - 1)^2,    (4)

where \alpha is a scaling parameter representing the importance of the binarization cost with respect to the initial cost. Finding a good value for \alpha can be tricky, as too small a value results in a failure of the binarization process, and too large a value creates local minima that prevent the network from training successfully. To facilitate the selection of \alpha, we use a barrier method [19] that consists in starting with a small value of \alpha and incrementing it regularly to help the quantization process.

We observed that typically some layers are very well quantized at the end of this learning phase, whereas others are still far from binary. For that reason we then binarize some of the layers but not all. In order to further improve the compression rate, we then apply the PQ method presented in the previous subsection. The compression rate for the combined method is (32mn)/(kn + log2(k)ms) (instead of (32mn)/(32kn + log2(k)ms) for Product Quantization alone). With a fixed segment size, increasing k decreases the compression rate.
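The following sketch (again ours, not the authors' code) illustrates how such a binarization regularizer and a barrier-style schedule for \alpha could be plugged into plain gradient descent; the double-well penalty and its gradient follow Equation (4), while the learning rate and the \alpha schedule are placeholder values chosen for the example.

```python
import numpy as np

def binarization_penalty(W, alpha):
    """Double-well regularizer of Eq. (4): alpha * sum_w w^2 (w - 1)^2."""
    return alpha * np.sum(W ** 2 * (W - 1.0) ** 2)

def binarization_grad(W, alpha):
    """Gradient of the penalty, added to the task-loss gradient:
    d/dw [w^2 (w - 1)^2] = 2 w (w - 1) (2 w - 1)."""
    return alpha * 2.0 * W * (W - 1.0) * (2.0 * W - 1.0)

def sgd_step(W, task_grad, alpha, lr=0.1):
    """One gradient-descent step on the regularized cost."""
    return W - lr * (task_grad + binarization_grad(W, alpha))

# Barrier-style schedule: start with a small alpha and raise it regularly, so the
# task is learned first and the weights are pushed towards {0, 1} later.
# (These values are illustrative; the paper does not fix a particular schedule here.)
alphas = [0.01 * (2 ** step) for step in range(10)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.uniform(0.0, 1.0, size=(4, 4))        # toy weight matrix
    for alpha in alphas:
        for _ in range(200):
            # Zero task gradient: only the binarization cost acts in this demo.
            W = sgd_step(W, task_grad=np.zeros_like(W), alpha=alpha)
    print(np.round(W, 3))   # most entries end up close to 0 or 1
```

In a real training run the task gradient would come from backpropagation on the classification loss, and only the layers selected for binarization would receive the extra gradient term.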
IV. EXPERIMENTS

We evaluate these methods on two image classification datasets: MNIST and CIFAR10.

A. Experimental settings

1) MNIST: The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. The neural network we use with MNIST is LeNet, a convolutional neural network introduced by Yann LeCun [22] and one of the LeNet family of networks, depicted in Figure 1. We use exactly the same parameters for the network (number of layers, layer sizes, etc.).

Figure 1. A graphical depiction of a LeNet model (figure inspired by [23]).

2) CIFAR10: The CIFAR10 database has a training set of 50,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from the 80 million tiny images dataset. It consists of 32x32 colour images partitioned into ten classes. With the CIFAR10 database, we use a convolutional neural network made of four convolutional layers followed by two fully connected layers. This network was introduced in [24].

B. Layers to quantize

Our first experiments (with the MNIST database), depicted in Figure 2, show the influence of the quantized layers on performance. We observe that performance strongly depends on which layers are quantized. More precisely, this experiment shows that one should quantize layers from the output towards the input rather than the contrary. This result is not surprising to us, as input layers have often been described as similar to wavelet transforms, which are intrinsically analog operators, whereas output layers are often compared to high-level patterns whose detection in an image is often enough for good classification results.

C. Performance comparison

Our second experiment is a comparison with previous work. The results are depicted in Figures 3 and 4. Note that in both cases the compared networks have the exact same architecture. As far as our proposed method is concerned, we chose to compress only the two output layers, which are fully connected. Since their sizes are distinct, we were not able to use the same PQ coefficients k and m twice. Note that one of these layers contains almost all the weights and is therefore the one on which we chose to investigate the role of each parameter. We found that our added regularization cost significantly improves performance. For example, on the MNIST database, for the same loss in accuracy we obtain a compression rate of 33 with single PQ, whereas our learning-based method leads to a compression rate of 107.
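As a sanity check on such numbers, the small helper below (ours, not from the paper) evaluates the two compression-rate expressions from Section III, PQ alone versus binarization followed by PQ, for hypothetical layer dimensions and PQ settings; the actual layer sizes and (k, s) values used in the experiments are not reproduced here.

```python
import math

def pq_rate(m, n, k, s):
    """Compression rate of PQ alone (32-bit weights): 32mn / (32kn + log2(k)*m*s)."""
    return (32 * m * n) / (32 * k * n + math.log2(k) * m * s)

def binarized_pq_rate(m, n, k, s):
    """Compression rate when the layer is binarized first, so each codebook entry
    costs 1 bit instead of 32: 32mn / (kn + log2(k)*m*s)."""
    return (32 * m * n) / (k * n + math.log2(k) * m * s)

if __name__ == "__main__":
    # Hypothetical fully connected layer and PQ settings (not the paper's values).
    m, n, k, s = 800, 500, 16, 4
    print("PQ alone:       %.1f" % pq_rate(m, n, k, s))
    print("binarized + PQ: %.1f" % binarized_pq_rate(m, n, k, s))
```

The gap between the two expressions comes entirely from the codebook term, whose cost shrinks by a factor of 32 once the codebook entries are binary.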
V. BIOLOGICAL DISCUSSION

Deep neural networks have often been compared to the sensory parts of the human cortex, both in terms of architecture and performance [25]. As a matter of fact, some authors [26] propose models that are very similar to CNNs and biologically inspired. For decades the question of whether the encoding of information in the brain is analog or digital has been debated. A recent paper [27] reports experiments suggesting that the brain might use different encodings depending on the region considered. In particular, the authors show that, as far as the visual cortex is concerned, the further a region is from the sensors, the more likely its encoding is digital. It is tempting to draw a parallel between these findings and the ability of the brain to associate analog-like data, such as sounds and images, with digital-like data, such as language and labels. The results depicted in Figure 2 suggest a similar behavior in deep artificial neural systems.

Figure 2. Performance of the classification task depending on which layers of the network are quantized, on the MNIST database.

Figure 3. Comparison of our proposed method with previous work on the MNIST dataset.

Figure 4. Comparison of our proposed method with previous work on the CIFAR10 dataset.

VI. CONCLUSION AND PERSPECTIVES

In this paper we introduced a new method to compress convolutional neural networks. This method consists in adding an extra term to the cost function that forces weights to become almost binary. In order to compress the network even further, we then apply Product Quantization; the combination of both allows us to reach performance above state-of-the-art methods. We also demonstrate the influence of the depth of the binarized layer on performance. These findings are of particular interest to us, and a motivation to further explore the connections between actual biological data and deep neural systems. In future work, we will consider exploring more complex regularization functions. We also plan on performing PQ on the fly during the learning phase. Finally, other strategies could try to benefit from the sparsity of the trained weights; such sparsity could be enforced by mechanisms similar to the ones we introduce in this paper.
REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.
[2] B. B. Le Cun, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems, 1990.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv preprint, 2014.
[4] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint, 2014.
[5] Y. Jia, "Caffe: An open source convolutional architecture for fast feature embedding," 2013.
[6] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," arXiv preprint, 2013.
[7] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint, 2013.
[8] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," arXiv preprint, 2013.
[9] Y. Gong, L. Wang, R. Guo, and S. Lazebnik, "Multi-scale orderless pooling of deep convolutional activation features," in Computer Vision - ECCV 2014. Springer, 2014.
[10] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, "A 240 G-ops/s mobile coprocessor for deep neural networks," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014.
[11] M. Denil, B. Shakibi, L. Dinh, N. de Freitas et al., "Predicting parameters in deep learning," in Advances in Neural Information Processing Systems, 2013.
[12] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, "Exploiting linear structure within convolutional networks for efficient evaluation," in Advances in Neural Information Processing Systems, 2014.
[13] M. Jaderberg, A. Vedaldi, and A. Zisserman, "Speeding up convolutional neural networks with low rank expansions," arXiv preprint, 2014.
[14] Y. Gong, L. Liu, M. Yang, and L. Bourdev, "Compressing deep convolutional networks using vector quantization," arXiv preprint, 2014.
[15] H. Jegou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, 2011.
[16] Y. Chen, T. Guan, and C. Wang, "Approximate nearest neighbor search by residual vector quantization," Sensors, vol. 10, 2010.
[17] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing neural networks with the hashing trick," arXiv preprint, 2015.
[18] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing convolutional neural networks," arXiv preprint, 2015.
[19] W. Murray and K.-M. Ng, "An algorithm for nonlinear optimization problems with binary variables," Computational Optimization and Applications, vol. 47, no. 2, 2010.
[20] V. Vanhoucke, A. Senior, and M. Z. Mao, "Improving the speed of neural networks on CPUs," in Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop, vol. 1, 2011.
[21] M. Mathieu, M. Henaff, and Y. LeCun, "Fast training of convolutional networks through FFTs," arXiv preprint, 2013.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, 1998.
[23] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio, "Theano: new features and speed improvements," Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
[24] F. Chollet, "Keras: Theano-based deep learning library." [Online].
[25] C. F. Cadieu, H. Hong, D. L. Yamins, N. Pinto, D. Ardila, E. A. Solomon, N. J. Majaj, and J. J. DiCarlo, "Deep neural networks rival the representation of primate IT cortex for core visual object recognition," PLoS Computational Biology, vol. 10, 2014.
[26] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, 2007.
[27] Y. Mochizuki and S. Shinomoto, "Analog and digital codes in the brain," Physical Review E, vol. 89, 2014.