Expressive Efficiency and Inductive Bias of Convolutional Networks: Analysis & Design via Hierarchical Tensor Decompositions

Nadav Cohen, The Hebrew University of Jerusalem
AAAI Spring Symposium Series 2017, Science of Intelligence: Computational Principles of Natural and Artificial Intelligence
Sources

- Deep SimNets. N. Cohen, O. Sharir and A. Shashua. Computer Vision and Pattern Recognition (CVPR) 2016
- On the Expressive Power of Deep Learning: A Tensor Analysis. N. Cohen, O. Sharir and A. Shashua. Conference on Learning Theory (COLT) 2016
- Convolutional Rectifier Networks as Generalized Tensor Decompositions. N. Cohen and A. Shashua. International Conference on Machine Learning (ICML) 2016
- Inductive Bias of Deep Convolutional Networks through Pooling Geometry. N. Cohen and A. Shashua. International Conference on Learning Representations (ICLR) 2017
- Tractable Generative Convolutional Arithmetic Circuits. O. Sharir, R. Tamari, N. Cohen and A. Shashua. arXiv preprint 2017
- On the Expressive Power of Overlapping Operations of Deep Networks. O. Sharir and A. Shashua. arXiv preprint 2017
- Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions. N. Cohen, R. Tamari and A. Shashua. arXiv preprint 2017
Collaborators: Prof. Amnon Shashua, Or Sharir, David Yakira, Ronen Tamari, Yoav Levine
Classic vs. State of the Art Deep Learning

Classic: Multilayer Perceptron (MLP). Architectural choices: depth, layer widths, activation types.

State of the Art: Convolutional Networks (ConvNets). Architectural choices: depth, layer widths, activation types, pooling types, convolution/pooling windows, convolution/pooling strides, dilation factors, connectivity, and more...

Can the architectural choices of state of the art ConvNets be theoretically analyzed?
Outline

1 Expressiveness
2 Expressiveness of Convolutional Networks: Questions
3 Convolutional Arithmetic Circuits
4 Efficiency of Depth (Cohen+Sharir+Shashua@COLT 16, Cohen+Shashua@ICML 16)
5 Inductive Bias of Pooling Geometry (Cohen+Shashua@ICLR 17)
6 Efficiency of Overlapping Operations (Sharir+Shashua@arXiv 17)
7 Efficiency of Interconnectivity (Cohen+Tamari+Shashua@arXiv 17)
Expressiveness

Expressiveness: the ability to compactly represent rich and effective classes of functions. It is the driving force behind deep networks.

Fundamental theoretical questions:
What kinds of functions can different network architectures represent?
Why are these functions suitable for real-world tasks?
What is the representational benefit of depth?
Can other architectural features deliver representational benefits?
Expressive Efficiency

Expressive efficiency compares network architectures in terms of their ability to compactly represent functions. Let:
H_A - the space of functions compactly representable by network architecture A
H_B - the space of functions compactly representable by network architecture B

A is efficient w.r.t. B if H_B is a strict subset of H_A.
A is completely efficient w.r.t. B if H_B has zero volume inside H_A.
Efficiency: Formal Definition

Network architecture A is efficient w.r.t. network architecture B if:
(1) any function realized by B with size r_B can be realized by A with size r_A ∈ O(r_B);
(2) there exist functions realized by A with size r_A that require B to have size r_B ∈ Ω(f(r_A)), where f(·) is super-linear.

A is completely efficient w.r.t. B if (2) holds for all of its functions but a set of Lebesgue measure zero (in weight space).
Inductive Bias

Networks of reasonable size can only realize a fraction of all possible functions. Efficiency does not explain why this fraction is effective. Why are these functions interesting?

To explain the effectiveness, one must consider the inductive bias:
Not all functions are equally useful for a given task.
A network only needs to represent the useful functions.
Expressiveness of Convolutional Networks: Questions
Efficiency of Depth

Longstanding conjecture, proven for MLPs: deep networks are efficient w.r.t. shallow ones.
Q: Can this be proven for ConvNets?
Q: Is their efficiency of depth complete?
Inductive Bias of Convolution/Pooling Geometry

ConvNets typically employ square conv/pool windows. Recently, dilated windows have also become popular.
Q: What is the inductive bias of conv/pool window geometry?
Q: Can the geometries be tailored for a given task?
Efficiency of Overlapping Operations

Modern ConvNets employ both overlapping and non-overlapping conv/pool operations.
Q: Do overlapping operations introduce efficiency?
Efficiency of Connectivity Schemes

Nearly all state of the art ConvNets employ elaborate connectivity schemes: Inception (GoogLeNet), ResNet, DenseNet.
Q: Can this be justified in terms of efficiency?
Convolutional Arithmetic Circuits
Convolutional Arithmetic Circuits

To address the raised questions, we consider a surrogate (special case) of ConvNets: Convolutional Arithmetic Circuits (ConvACs).

ConvACs are equivalent to hierarchical tensor decompositions, allowing theoretical analysis with mathematical tools from various fields, e.g.: functional analysis, measure theory, matrix algebra, graph theory.

ConvACs are superior to ReLU ConvNets in terms of expressiveness [1], and deliver promising results in practice: they excel in computationally constrained settings [2] and classify optimally under missing data [3].

[1] Convolutional Rectifier Networks as Generalized Tensor Decompositions, ICML 16
[2] Deep SimNets, CVPR 16
[3] Tractable Generative Convolutional Arithmetic Circuits, arXiv 17
Baseline Architecture

Baseline ConvAC architecture: a 2D ConvNet with linear activation (σ(z) = z), product pooling (P{c_j} = ∏_j c_j), 1×1 convolution windows, and non-overlapping pooling windows. For input X with patches x_i, the layers compute:

rep(i, d) = f_{θ_d}(x_i),  d ∈ {1,...,M}  (representation)
conv_0(j, γ) = ⟨a^{0,j,γ}, rep(j, :)⟩,  γ ∈ {1,...,r_0}  (1×1 conv)
pool_0(j, γ) = ∏_{j' ∈ window j} conv_0(j', γ)  (pooling)
...
pool_{L-1}(γ) = ∏_{j' covers space} conv_{L-1}(j', γ)
out(y) = ⟨a^{L,y}, pool_{L-1}(:)⟩,  y ∈ {1,...,Y}  (dense output)
Grid Tensors

ConvNets realize functions over many local structures: f(x_1, x_2, ..., x_N), where the x_i are image patches (2D network) or sequence samples (1D network).

f(·) may be studied by discretizing each x_i into one of {v^(1), ..., v^(M)}:

A_{d_1...d_N} = f(v^(d_1), ..., v^(d_N)),  d_1...d_N ∈ {1,...,M}

The lookup table A is an N-dimensional array (tensor) with length M in each axis, referred to as the grid tensor of f(·).
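As a concrete illustration, a grid tensor can be tabulated directly for a toy function (a minimal numpy sketch; the function f, the sizes N and M, and the scalar templates are illustrative assumptions, not part of the talk):

```python
import itertools
import numpy as np

# Hypothetical toy setup: N = 3 input patches, each discretized into one of
# M = 4 scalar "templates" v(1), ..., v(M)
N, M = 3, 4
templates = np.linspace(-1.0, 1.0, M)

def f(xs):
    # Example function over N local structures (a product of shifted inputs)
    return float(np.prod([x + 0.5 for x in xs]))

# Grid tensor: an N-dimensional array with length M in each axis,
# A[d1,...,dN] = f(v(d1), ..., v(dN))
A = np.empty((M,) * N)
for idx in itertools.product(range(M), repeat=N):
    A[idx] = f(templates[list(idx)])

print(A.shape)  # (4, 4, 4)
```

Even in this tiny example the tensor has M^N entries, which is why the grid tensor is an analysis device rather than an object to be stored explicitly.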
Tensor Decompositions: Compact Parameterizations

High-dimensional tensors (arrays) are exponentially large, so they cannot be used directly. They may be represented and manipulated via tensor decompositions: compact algebraic parameterizations that generalize low-rank matrix decompositions (large matrix = product of compact factors).
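The matrix analogue can be sketched in a few lines (illustrative sizes; a rank-r matrix is fully parameterized by two thin factors):

```python
import numpy as np

rng = np.random.default_rng(4)

# A 200 x 200 matrix of rank r = 5 is described by 2 * 200 * 5 = 2000
# parameters instead of 40000 entries: the factor pair IS the compact parameterization
n, r = 200, 5
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))
A = U @ V.T                      # large matrix = product of compact factors

print(np.linalg.matrix_rank(A))  # 5
print(U.size + V.size)           # 2000
```

Hierarchical tensor decompositions push the same idea to order-N tensors, where the savings become exponential.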
Hierarchical Tensor Decompositions

Hierarchical tensor decompositions represent high-dimensional tensors by incrementally generating intermediate tensors of increasing dimension. The generation process can be described by a tree over tensor modes (axes), e.g. for N = 16:

1D tensors: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15} {16}
2D tensors: {1,2} {3,4} {5,6} {7,8} {9,10} {11,12} {13,14} {15,16}
4D tensors: {1,2,3,4} {5,6,7,8} {9,10,11,12} {13,14,15,16}
8D tensors: {1,2,3,4,5,6,7,8} {9,10,11,12,13,14,15,16}
16D tensor (decomposition output): {1,2,...,16}
Convolutional Arithmetic Circuits ↔ Hierarchical Tensor Decompositions

Observation: grid tensors of functions realized by ConvACs are given by hierarchical tensor decompositions:
network structure (depth, width, pooling etc.) ↔ decomposition type (mode tree, internal ranks etc.)
network weights ↔ decomposition parameters

We can study networks through their corresponding decompositions!
Example 1: Shallow Network ↔ CP Decomposition

A shallow network (single hidden layer, global product pooling) corresponds to the classic CP decomposition:

A^y = Σ_{γ=1}^{r_0} a^{1,1,y}_γ · a^{0,1,γ} ⊗ a^{0,2,γ} ⊗ ... ⊗ a^{0,N,γ}

(⊗ denotes outer product)
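The CP formula can be materialized directly (a minimal numpy sketch; the sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N = 4 inputs, M = 3 templates, r0 = 2 hidden channels
N, M, r0 = 4, 3, 2
a_out = rng.standard_normal(r0)          # a^{1,1,y}: output weights
a_hid = rng.standard_normal((N, r0, M))  # a^{0,i,gamma}: per-location weight vectors

def outer(vectors):
    """Outer product of a list of vectors -> tensor of order len(vectors)."""
    t = vectors[0]
    for v in vectors[1:]:
        t = np.tensordot(t, v, axes=0)
    return t

# CP decomposition: A^y = sum_gamma a^{1,1,y}_gamma * a^{0,1,gamma} x ... x a^{0,N,gamma}
A = sum(a_out[g] * outer([a_hid[i, g] for i in range(N)]) for g in range(r0))
print(A.shape)  # (3, 3, 3, 3)
```

Each term is a rank-1 tensor, so the whole grid tensor is a sum of r_0 rank-1 tensors, mirroring the r_0 hidden channels of the shallow network.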
Example 2: Deep Network ↔ HT Decomposition

A deep network with size-2 pooling windows (L = log_2 N hidden layers) corresponds to the Hierarchical Tucker (HT) decomposition:

φ^{1,j,γ} = Σ_{α=1}^{r_0} a^{1,j,γ}_α · a^{0,2j-1,α} ⊗ a^{0,2j,α}
φ^{l,j,γ} = Σ_{α=1}^{r_{l-1}} a^{l,j,γ}_α · φ^{l-1,2j-1,α} ⊗ φ^{l-1,2j,α}
A^y = Σ_{α=1}^{r_{L-1}} a^{L,1,y}_α · φ^{L-1,1,α} ⊗ φ^{L-1,2,α}
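For N = 4 (so L = 2) the HT recursion has one intermediate level and can be written out explicitly (a sketch with illustrative sizes and random weights):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes for N = 4 inputs: M templates, internal ranks r0, r1
M, r0, r1 = 3, 2, 2
a0 = rng.standard_normal((4, r0, M))   # a^{0,i,alpha}: leaf vectors
a1 = rng.standard_normal((2, r1, r0))  # a^{1,j,gamma}: level-1 mixing weights
aL = rng.standard_normal(r1)           # a^{2,1,y}: output weights

# Level 1: phi^{1,j,gamma} = sum_alpha a^{1,j,gamma}_alpha * a^{0,2j-1,alpha} x a^{0,2j,alpha}
phi1 = np.zeros((2, r1, M, M))
for j in range(2):
    for g in range(r1):
        for al in range(r0):
            phi1[j, g] += a1[j, g, al] * np.tensordot(a0[2 * j, al], a0[2 * j + 1, al], axes=0)

# Output: A^y = sum_alpha a^{L,1,y}_alpha * phi^{1,1,alpha} x phi^{1,2,alpha}
A = sum(aL[al] * np.tensordot(phi1[0, al], phi1[1, al], axes=0) for al in range(r1))
print(A.shape)  # (3, 3, 3, 3)
```

Each level pairs adjacent groups of modes, exactly like the size-2 pooling windows pair adjacent spatial locations.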
Efficiency of Depth (Cohen+Sharir+Shashua@COLT 16, Cohen+Shashua@ICML 16)
Tensor Matricization

Let A be a tensor of order (dimension) N, and let (I, J) be a partition of [N], i.e. I ∪ J = [N] := {1,...,N} with I and J disjoint.

The matricization ⟦A⟧_{I,J} of A w.r.t. (I, J) is the arrangement of A as a matrix: rows correspond to the modes (axes) indexed by I, columns to the modes indexed by J. For example, an order-3 tensor of size 2×2×3 matricized w.r.t. I = {1,3}, J = {2} becomes a 6×2 matrix.
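Matricization is a transpose followed by a reshape (a minimal sketch; the helper name and the example tensor are illustrative):

```python
from math import prod
import numpy as np

def matricize(A, I, J):
    """Matricization of tensor A w.r.t. a partition (I, J) of its modes
    (0-based indices): rows indexed by modes in I, columns by modes in J."""
    rows = prod(A.shape[i] for i in I)
    cols = prod(A.shape[j] for j in J)
    return np.transpose(A, axes=list(I) + list(J)).reshape(rows, cols)

# Example: 2x2x3 tensor, I = {0, 2}, J = {1} ("modes 1 & 3 vs. mode 2" in 1-based terms)
A = np.arange(2 * 2 * 3).reshape(2, 2, 3)
M = matricize(A, [0, 2], [1])
print(M.shape)  # (6, 2)
```

Bringing the I-modes to the front before reshaping is what makes the row index enumerate exactly the mode combinations in I.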
Exponential & Complete Efficiency of Depth

Claim: tensors generated by a CP decomposition with r_0 terms, when matricized under any partition (I, J), have rank r_0 or less.

Theorem: consider the partition I_odd = {1, 3, ..., N-1}, J_even = {2, 4, ..., N}. Besides a set of measure zero, all parameter settings of the HT decomposition give tensors that, when matricized w.r.t. (I_odd, J_even), have exponential ranks.

Since the number of terms in the CP decomposition corresponds to the number of hidden channels in the shallow ConvAC:

Corollary: almost all functions realizable by a deep ConvAC cannot be replicated by a shallow ConvAC with less than exponentially many hidden channels.

With ConvACs, efficiency of depth is exponential and complete!
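The claim on CP matricization ranks is easy to check numerically (a sketch; sizes and the random CP tensor are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random CP tensor with r0 = 2 terms, N = 4 modes of length M = 3
N, M, r0 = 4, 3, 2
factors = rng.standard_normal((r0, N, M))

A = np.zeros((M,) * N)
for g in range(r0):
    t = factors[g, 0]
    for i in range(1, N):
        t = np.tensordot(t, factors[g, i], axes=0)
    A += t

# Matricize w.r.t. the odd/even partition I = {1,3}, J = {2,4} (1-based)
Mat = np.transpose(A, (0, 2, 1, 3)).reshape(M * M, M * M)
print(np.linalg.matrix_rank(Mat))  # at most r0 = 2
```

Each rank-1 CP term stays rank-1 under any matricization, so the sum has rank at most r_0 regardless of the partition, which is exactly why a shallow ConvAC cannot match the exponential odd/even ranks of the deep one.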
From Convolutional Arithmetic Circuits to Convolutional Rectifier Networks

Transform ConvACs into convolutional rectifier networks (R-ConvNets) by replacing:
linear activation → ReLU activation: σ(z) = max{z, 0}
product pooling → max/average pooling: P{c_j} = max{c_j} or mean{c_j}

This is the most successful deep learning architecture to date!
Generalized Tensor Decompositions

ConvACs correspond to tensor decompositions based on the tensor product ⊗:

(A ⊗ B)_{d_1,...,d_{P+Q}} = A_{d_1,...,d_P} · B_{d_{P+1},...,d_{P+Q}}

For an operator g : R × R → R, the generalized tensor product ⊗_g is:

(A ⊗_g B)_{d_1,...,d_{P+Q}} := g(A_{d_1,...,d_P}, B_{d_{P+1},...,d_{P+Q}})

(same as ⊗ but with g(·) instead of multiplication). Generalized tensor decompositions are obtained by replacing ⊗ with ⊗_g.
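The generalized tensor product is a one-liner with broadcasting (a sketch; the helper name and inputs are illustrative):

```python
import numpy as np

def gen_tensor_product(A, B, g):
    """Generalized tensor product: (A x_g B)[i..., j...] = g(A[i...], B[j...])."""
    # Expand A with trailing singleton axes so g is applied to every index pair
    A_exp = A.reshape(A.shape + (1,) * B.ndim)
    return g(A_exp, B)

A = np.array([1.0, 2.0])
B = np.array([3.0, 4.0])

# g = multiplication recovers the standard tensor (outer) product
print(gen_tensor_product(A, B, np.multiply))  # [[3. 4.] [6. 8.]]
```

Swapping `np.multiply` for any other elementwise operator g yields the corresponding generalized decomposition's building block.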
Convolutional Rectifier Networks ↔ Generalized Tensor Decompositions

Define the activation-pooling operator: ρ_{σ/P}(a, b) := P{σ(a), σ(b)}.

Grid tensors of functions realized by R-ConvNets are given by generalized tensor decompositions with g(·) = ρ_{σ/P}(·):
Shallow R-ConvNet ↔ generalized CP decomposition with g(·) = ρ_{σ/P}(·)
Deep R-ConvNet ↔ generalized HT decomposition with g(·) = ρ_{σ/P}(·)
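For ReLU activation and max pooling, the activation-pooling operator is concretely (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def rho_relu_max(a, b):
    """Activation-pooling operator for ReLU activation and max pooling:
    rho(a, b) = max{ max(a, 0), max(b, 0) }."""
    return np.maximum(np.maximum(a, 0.0), np.maximum(b, 0.0))

print(rho_relu_max(-1.0, 2.0))   # 2.0
print(rho_relu_max(-1.0, -2.0))  # 0.0
```

Note that ρ is not multiplicative (e.g. ρ(a, b) = 0 whenever both arguments are negative), which is the source of the different rank behavior of the generalized decompositions below.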
Exponential But Incomplete Efficiency of Depth

By analyzing matricization ranks of tensors realized by generalized CP and HT decompositions with g(·) = ρ_{σ/P}(·), we show:

Claim: there exist functions realizable by a deep R-ConvNet that require a shallow R-ConvNet to be exponentially large.

On the other hand:

Claim: a non-negligible (positive measure) set of the functions realizable by a deep R-ConvNet can be replicated by a shallow R-ConvNet with few hidden channels.

With R-ConvNets, efficiency of depth is exponential but incomplete! Developing optimization methods for ConvACs may give rise to an architecture that is provably superior but has so far been overlooked.
Inductive Bias of Pooling Geometry (Cohen+Shashua@ICLR 17)
Separation Rank: A Measure of Input Correlations

ConvNets realize functions over many local structures: f(x_1, x_2, ..., x_N), where the x_i are image patches (2D network) or sequence samples (1D network). An important feature of f(·) is the correlations it models between the x_i's.

The separation rank is a formal measure of these correlations: the separation rank of f(·) w.r.t. an input partition (I, J) measures its distance from separability. A higher separation rank means more correlation between (x_i)_{i∈I} and (x_j)_{j∈J}.
Deep Networks Favor Some Correlations Over Others

Claim: with a ConvAC, the separation rank w.r.t. (I, J) is equal to the rank of ⟦A^y⟧_{I,J} - the grid tensor matricized w.r.t. (I, J).

Theorem: the maximal rank of a tensor generated by the HT decomposition, when matricized w.r.t. (I, J), is:
Exponential for interleaved partitions
Polynomial for coarse partitions

Corollary: a deep ConvAC can realize exponential separation ranks (correlations) for favored partitions, but only polynomial ones for others.
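The partition dependence is visible even at toy scale: rebuilding the N = 4 HT tensor from the decomposition above and matricizing it under a coarse vs. an interleaved partition gives different ranks (a sketch; sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random HT tensor for N = 4 inputs (see the HT decomposition earlier)
M, r0, r1 = 3, 2, 2
a0 = rng.standard_normal((4, r0, M))
a1 = rng.standard_normal((2, r1, r0))
aL = rng.standard_normal(r1)

phi1 = np.zeros((2, r1, M, M))
for j in range(2):
    for g in range(r1):
        for al in range(r0):
            phi1[j, g] += a1[j, g, al] * np.tensordot(a0[2 * j, al], a0[2 * j + 1, al], axes=0)
A = sum(aL[al] * np.tensordot(phi1[0, al], phi1[1, al], axes=0) for al in range(r1))

def rank_under(T, perm):
    """Rank of the matricization with row modes perm[:2], column modes perm[2:]."""
    return np.linalg.matrix_rank(np.transpose(T, perm).reshape(M * M, M * M))

print(rank_under(A, (0, 1, 2, 3)))  # coarse partition {1,2}|{3,4}: rank <= r1
print(rank_under(A, (0, 2, 1, 3)))  # interleaved partition {1,3}|{2,4}: typically higher
```

The coarse partition aligns with the pooling tree, so its rank is capped by the internal rank r_1; the interleaved partition cuts across the tree and generically attains a larger rank.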
Pooling Geometry Controls the Preference

Contiguous pooling favors local correlations; alternative pooling geometries favor alternative correlations.

The pooling geometry of a deep ConvAC determines which partitions are favored, and thereby controls the correlation profile - the inductive bias!
Efficiency of Overlapping Operations (Sharir+Shashua@arXiv 17)
Overlapping Operations

The baseline ConvAC architecture has non-overlapping conv and pool windows. Replace those by (possibly) overlapping generalized convolution (GC) layers.
71 Efficiency of Overlapping Operations Exponential Efficiency Theorem 17 Various ConvACs w/overlapping GC layers realize func requiring ConvAC w/no overlaps to be exponentially large Examples Network starts with large receptive field: input X representa*on GC GC Arbitrary Layers N > N 2 stride 1x1 GC Output N > N 2 M Typical scheme of alternating B B conv and 2 2 pool : N input X N representa-on k: BxB s: 1x1 M Block 0 BxB GC 2x2 GC k: 2x2 s: 2x2 M r 0 r 0 Block L-1 K BxB GC 2x2 GC dense (output) r L-1 r L-1 K Nadav Cohen (Hebrew U.) Expressiveness of Convolutional Networks Science of Intelligence, / 44
72 Efficiency of Overlapping Operations: Exponential Efficiency, continued (Sharir+Shashua@arXiv 17)
With ConvACs, overlaps lead to exponential efficiency!
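To see concretely when windows overlap, here is a sketch of standard receptive-field arithmetic for a 1D conv stack (a hypothetical helper, not from the talk; the B=3 choice below is an assumed instance of the alternating BxB-conv / 2x2-pool scheme). Windows overlap whenever a layer's kernel is wider than its stride, and the receptive field of a top unit then exceeds what a non-overlapping stack of the same depth covers:

```python
def receptive_field(layers):
    """Receptive field (in input positions) of one top-level unit of a
    1D conv stack.  `layers` lists (kernel_size, stride), bottom to top."""
    rf, jump = 1, 1          # jump = input distance between adjacent units
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

non_overlapping = [(2, 2)] * 4           # baseline ConvAC: window == stride
overlapping     = [(3, 1), (2, 2)] * 4   # size-3 conv (stride 1) + 2x pool

assert receptive_field(non_overlapping) == 16
assert receptive_field(overlapping) == 46
```

With the same number of pooling stages, the overlapping stack's units see 46 input positions instead of 16, and neighboring units share most of them.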
73 Outline
1 Expressiveness
2 Expressiveness of Convolutional Networks: Questions
3 Convolutional Arithmetic Circuits
4 Efficiency of Depth (Cohen+Sharir+Shashua@COLT 16, Cohen+Shashua@ICML 16)
5 Inductive Bias of Pooling Geometry (Cohen+Shashua@ICLR 17)
6 Efficiency of Overlapping Operations (Sharir+Shashua@arXiv 17)
7 Efficiency of Interconnectivity (Cohen+Tamari+Shashua@arXiv 17)
74 Efficiency of Interconnectivity: Dilated Convolutional Networks (Cohen+Tamari+Shashua@arXiv 17)
Study efficiency of interconnectivity with dilated convolutional networks:
[figure: 1D network over time points t-2^L+1 .. t; input (r_0 channels) -> L-1 hidden layers (r_1 .. r_{L-1} channels) -> output (r_L)]
1D ConvNets (sequence data)
Dilated (gapped) conv windows, no pooling
N := 2^L time points
Layer l applies a size-2 conv with dilation 2^(l-1) (dilations 1, 2, ..., 2^(L-1))
Underlie Google's WaveNet & ByteNet, state of the art for audio & text!
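The coverage claim (N = 2^L time points) is easy to check mechanically. The sketch below (a hypothetical helper, not from the paper) traces which input time steps a single output depends on under the dilation scheme 1, 2, ..., 2^(L-1):

```python
def dependency_set(t, num_layers):
    """Input time steps that the output at time t depends on, for a stack
    of size-2 convolutions with dilation 2**l at layer l (l = 0..L-1).
    Each size-2 window combines a position with the one 2**l steps back."""
    deps = {t}
    for l in range(num_layers):
        deps |= {s - 2 ** l for s in deps}
    return deps

# With L = 4 layers the output sees exactly N = 2**4 = 16 consecutive steps.
assert dependency_set(0, 4) == set(range(-15, 1))
```

Doubling the dilation at every layer is what lets the receptive field grow exponentially in depth while each layer stays size-2, which is why WaveNet-style stacks can model long sequences cheaply.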
75 Efficiency of Interconnectivity: Mixing Tensor Decompositions (Cohen+Tamari+Shashua@arXiv 17)
With dilated ConvNets, the mode (axes) tree underlying the corresponding tensor decomposition determines the dilation scheme.
[figure: two mode trees over indices 1..16: the standard tree pairs adjacent indices ({1,2}, {3,4}, ...) and yields dilations 1, 2, 4, 8; an alternate tree pairs interleaved indices ({1,3}, {2,4}, ...) and yields dilations 2, 1, 8, 4]
A mixed tensor decomposition blending different mode (axes) trees corresponds to interconnected networks with different dilations.
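The two trees in the figure can be generated programmatically. The helper below is hypothetical (not from the paper); the pairing rule, merging the node that starts at index m with the node starting at m + d where d is that level's dilation, is an assumption chosen to reproduce the two trees on the slide:

```python
def mode_tree_levels(n, dilations):
    """Bottom-up levels of a binary mode tree over indices 0..n-1.  At the
    level with dilation d, the node whose smallest index is m merges with
    the (unused) node whose smallest index is m + d."""
    nodes = [frozenset([i]) for i in range(n)]
    levels = [list(nodes)]
    for d in dilations:
        nodes = sorted(nodes, key=min)
        merged, used = [], set()
        for node in nodes:
            if min(node) in used:
                continue
            partner = next(m for m in nodes
                           if min(m) == min(node) + d and min(m) not in used)
            used |= {min(node), min(partner)}
            merged.append(node | partner)
        nodes = merged
        levels.append(list(nodes))
    return levels

standard = mode_tree_levels(16, [1, 2, 4, 8])  # standard dilation scheme
mixed    = mode_tree_levels(16, [2, 1, 8, 4])  # alternate scheme from slide

assert standard[1][0] == frozenset({0, 1})     # adjacent pairing
assert mixed[1][0] == frozenset({0, 2})        # interleaved pairing
assert standard[-1] == mixed[-1] == [frozenset(range(16))]
```

Both trees share the same leaves and the same root; they differ only in the intermediate groupings, and it is exactly this difference that a mixed decomposition exploits by connecting the two networks at intermediate layers.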
76 Efficiency of Interconnectivity (Cohen+Tamari+Shashua@arXiv 17)
Theorem: The mixed tensor decomposition generates tensors that can be realized by the individual decompositions only if these grow quadratically large.
Corollary: Interconnected dilated ConvNets realize functions that cannot be realized by the individual networks unless these are quadratically larger.
77 Efficiency of Interconnectivity, continued (Cohen+Tamari+Shashua@arXiv 17)
With dilated ConvNets, interconnectivity brings efficiency!
78 Outline
1 Expressiveness
2 Expressiveness of Convolutional Networks: Questions
3 Convolutional Arithmetic Circuits
4 Efficiency of Depth (Cohen+Sharir+Shashua@COLT 16, Cohen+Shashua@ICML 16)
5 Inductive Bias of Pooling Geometry (Cohen+Shashua@ICLR 17)
6 Efficiency of Overlapping Operations (Sharir+Shashua@arXiv 17)
7 Efficiency of Interconnectivity (Cohen+Tamari+Shashua@arXiv 17)
79 Conclusion
Expressiveness: the driving force behind deep networks
80 Conclusion, continued
Formal concepts for treating expressiveness:
Efficiency: a network architecture realizes functions requiring an alternative architecture to be much larger
Inductive bias: prioritization of some functions over others, given prior knowledge on the task at hand
81 Conclusion, continued
We analyzed the efficiency and inductive bias of ConvNet architectural features: depth, pooling geometry, overlapping operations, interconnectivity
82 Conclusion, continued
Fundamental tool underlying all of our analyses: ConvNets correspond to hierarchical tensor decompositions
83 Thank You
More information