Group Equivariant Convolutional Networks. Taco Cohen


Slide 1: Group Equivariant Convolutional Networks. Taco Cohen

Slide 2: Outline
1. Symmetry & Deep Learning: statistical power from symmetry; invariance vs. equivariance; equivariance in deep learning
2. Group Theory: symmetry, groups; subgroups, cosets, quotients; wallpaper groups
3. Group Equivariant Networks: CNNs and translation equivariance; G-convolutions; equivariance of non-linearities; equivariance of the G-pooling operator; backpropagation
4. Algorithms: spatial & spectral G-convs
5. Results

Slide 3: Background: Invariance, Equivariance & Symmetry

Slide 4: Symmetry in ML. A symmetry of a function is a transformation that leaves that function invariant. In ML we look for symmetries of densities, factors, label functions, etc. Shariff, R. (2015). Exploiting Symmetries to Construct Efficient MCMC Algorithms.

Slide 5: Invariance. (Figure: face, eyes, nose, mouth.) The Picasso Problem.

Slide 6: Equivariance. A map $\Phi : Z_1 \to Z_2$ is equivariant if $\Phi(T^1_g x) = T^2_g \Phi(x)$, where $T^1_g$ acts on the input space $Z_1$ and $T^2_g$ on the output space $Z_2$.
Hinton, G., Krizhevsky, A., & Wang, S. (2011). Transforming Auto-encoders. (ICANN)
Lenc, K., & Vedaldi, A. (2015). Understanding Image Representations by Measuring Their Equivariance and Equivalence. (CVPR)
Cohen, T., & Welling, M. (2014). Learning the Irreducible Representations of Commutative Lie Groups. (ICML)

Slide 7: Symmetry in DL. Why do CNNs work so well? They exploit translational symmetry. In deep nets, each layer should preserve the symmetry: the representation should be an equivariant map for the symmetry group. Szegedy et al. (2015): Deeeeeeeeep!

Slide 8: ConvNets are Translation Equivariant. (Figure.)

Slide 9: Are ConvNets Rotation-Equivariant? (Figure.)

Slide 10: CNNs want to be Equivariant

Slide 11: Visual Group Theory. With figures from Visual Group Theory by Nathan Carter (2009).

Slide 12: Symmetries

Slide 13: Groups. A group is a set G together with a binary operation that combines two elements a, b in G to produce another element ab, satisfying the group axioms:
1. Identity: there exists an element e in G such that for every element a in G, ea = ae = a.
2. Associativity: for all a, b, c in G, (ab)c = a(bc).
3. Closure: for all a, b in G, the composition ab is also in G.
4. Inverse: for each a in G, there exists an element b in G such that ab = ba = e.
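
To make the axioms concrete, here is a minimal sketch (my addition, not from the slides): the group $C_4$ of quarter-turn rotations realized as 2x2 matrices, with each axiom checked numerically.

```python
import numpy as np

def rot(r):
    # Rotation by r * 90 degrees; rounding makes the entries exactly 0 or +/-1.
    theta = r * np.pi / 2
    return np.round(np.array([[np.cos(theta), -np.sin(theta)],
                              [np.sin(theta),  np.cos(theta)]]))

C4 = [rot(r) for r in range(4)]
e = rot(0)  # the identity element
for a in C4:
    assert np.allclose(e @ a, a) and np.allclose(a @ e, a)    # identity
    assert any(np.allclose(a @ b, e) for b in C4)             # inverse
    for b in C4:
        assert any(np.allclose(a @ b, c) for c in C4)         # closure
        for c in C4:
            assert np.allclose((a @ b) @ c, a @ (b @ c))      # associativity
```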

Slide 14: The Symmetries of an Object form a Group. (Figure: flips h, v and rotation r of a rectangle; e.g. $hh^{-1} = e$ and $hv = r$.)

Slide 15: Examples. Lie groups, discrete groups, topological groups, Galois groups, wallpaper groups, space groups.

Slide 16: Cayley Graphs

Slide 17: Subgroups. A subgroup of a group G is a subset of G that is itself a group under the same operation.

Slide 18: Cosets. The left coset of subgroup H in G with respect to g is the set $gH = \{gh : h \in H\}$. Question: when is a coset a subgroup? Question: do the cosets always partition the group?
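
A small illustrative computation (my addition, assuming nothing beyond the definition above): the left cosets of a subgroup of $\mathbb{Z}/6$, which also answers the two questions on this slide.

```python
# Left cosets of H = {0, 3} in G = Z/6 (addition mod 6).
G = set(range(6))
H = {0, 3}
cosets = {frozenset((g + h) % 6 for h in H) for g in G}
print(sorted(map(sorted, cosets)))   # [[0, 3], [1, 4], [2, 5]]

# Answers, visible in this example:
# - gH is a subgroup iff g is in H (only then does gH contain the identity).
# - The cosets always partition G: every element lies in exactly one coset.
assert sum(len(c) for c in cosets) == len(G)
# The set of cosets is the quotient G/H (here isomorphic to Z/3) -- next slide.
```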

Slide 19: Quotients. The quotient of G by subgroup H is the set of cosets of H in G: $G/H = \{gH : g \in G\}$.

Slide 20: Wallpaper Groups

Slide 21: The Groups p4 & p4m. Elements are parameterized by a mirror bit $m$, a rotation $r$, and a translation $(u, v)$, as homogeneous-coordinate matrices:
$$g(r, u, v) = \begin{pmatrix} \cos(r\pi/2) & -\sin(r\pi/2) & u \\ \sin(r\pi/2) & \cos(r\pi/2) & v \\ 0 & 0 & 1 \end{pmatrix}, \qquad g(m, r, u, v) = \begin{pmatrix} (-1)^m \cos(r\pi/2) & -(-1)^m \sin(r\pi/2) & u \\ \sin(r\pi/2) & \cos(r\pi/2) & v \\ 0 & 0 & 1 \end{pmatrix}$$
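
As a sketch of this parameterization (the code is my paraphrase of the formula above): build the matrix for a p4m element and compose elements by matrix multiplication.

```python
import numpy as np

def g(m, r, u, v):
    # A p4m element: mirror bit m in {0, 1}, rotation r in {0, 1, 2, 3},
    # translation (u, v) in Z^2, as a 3x3 homogeneous-coordinate matrix.
    c, s = np.cos(r * np.pi / 2), np.sin(r * np.pi / 2)
    return np.array([[(-1) ** m * c, -(-1) ** m * s, u],
                     [s,              c,             v],
                     [0.0,            0.0,           1.0]])

# Composition is matrix multiplication; p4 is the subgroup with m = 0.
a, b = g(0, 1, 2, 0), g(0, 2, 0, -1)
print(np.round(a @ b))   # another p4 element: rotation part R(3*pi/2), some shift
```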

Slide 22: The Groups p6 & p6m. Same construction with sixth-turn rotations:
$$g(m, r, u, v) = \begin{pmatrix} (-1)^m \cos(r\pi/3) & -(-1)^m \sin(r\pi/3) & u \\ \sin(r\pi/3) & \cos(r\pi/3) & v \\ 0 & 0 & 1 \end{pmatrix}$$

Slide 23: Group Equivariant CNNs

Slide 24: How to think about CNNs. A stack of feature maps is a 3D array; equivalently, a stack of feature maps is a vector-valued function $f^l : \mathbb{Z}^2 \to \mathbb{R}^{K_l}$. (Figure: a donut, a genus-one topological space. Mmm, donuts.)

Slide 25: G-Equivariant Correlation on $\mathbb{Z}^2$. Standard correlation: translate the canonical filter and compute an inner product. G-correlation: transform the canonical filter and compute an inner product.

Slide 26: Translational Correlation.
Translation: $[T_s f](x) = f(x - s)$.
Correlation: $[f \star \psi](s) = \sum_{x \in \mathbb{Z}^2} \sum_{k=1}^{K} f_k(x)\,[T_s \psi]_k(x)$.
Equivariance: $[T_s f] \star \psi = T_s [f \star \psi]$.
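
A quick numerical check of this equivariance claim (my addition), assuming periodic boundary conditions so that translation is exact on a finite grid:

```python
import numpy as np

def translate(f, s):
    # [T_s f](x) = f(x - s), cyclic on the grid.
    return np.roll(f, shift=s, axis=(0, 1))

def corr(f, psi):
    # [f * psi](s) = sum_x f(x) [T_s psi](x); single channel for brevity.
    out = np.zeros_like(f)
    for sy in range(f.shape[0]):
        for sx in range(f.shape[1]):
            out[sy, sx] = np.sum(f * translate(psi, (sy, sx)))
    return out

rng = np.random.default_rng(0)
f, psi = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
s = (2, 3)
# [T_s f] * psi == T_s [f * psi]
assert np.allclose(corr(translate(f, s), psi), translate(corr(f, psi), s))
```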

Slide 27: Group Correlation on $\mathbb{Z}^2$.
Transformation: $[T_g f](x) = f(g^{-1} x)$.
G-correlation: $[f \star \psi](g) = \sum_{x \in \mathbb{Z}^2} \sum_{k=1}^{K} f_k(x)\,[T_g \psi]_k(x)$.
Equivariance: $[T_g f] \star \psi = T_g [f \star \psi]$.
Note that the input $f$ lives on $\mathbb{Z}^2$, but the output $f \star \psi$ is a function on the group $G$.
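
A sketch of the first-layer case for $G = p4$ (my addition; the rotation-direction convention is glossed over): the G-grid factors into a rotation and a translation, so the output gets one plane per rotation, each a planar correlation with a rotated filter. `corr` is the routine from the previous sketch.

```python
import numpy as np

def g_corr_z2(f, psi):
    # f, psi: (H, W) arrays. Output shape (4, H, W): [f * psi](r, t)
    # for r in {0, 1, 2, 3}, one planar correlation per rotated filter.
    return np.stack([corr(f, np.rot90(psi, r)) for r in range(4)])
```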

Slide 28: Feature Transformation Law (p4). $[T_g f] \star \psi = T_g [f \star \psi]$, but how does a transformation of the 2D image act on a feature map that lives on p4? Left-multiply the group elements:
$$T_r : \begin{pmatrix} R(\theta_0) & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} R(\theta) & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} R(\theta_0 + \theta) & R(\theta_0)\,t \\ 0 & 1 \end{pmatrix}$$
So a rotation rotates each feature plane and cyclically shifts the rotation coordinate.

Slide 29: Feature Transformation Law (p4m). $[T_g f] \star \psi = T_g [f \star \psi]$. (Figures: the action of a rotation $T_r$ and of a mirroring $T_m$ on a p4m feature map.)

Slide 30: G-Conv is Non-Commutative. For $G = p4$, $f \star \psi \neq \psi \star f$ in general (figure: $f$, $\psi$, $f \star \psi$, $\psi \star f$), but both carry the same information content: $f \star \psi(g^{-1}) = \psi \star f(g)$.

Slide 31: Group Correlation on G.
Transformation: $[T_g f](h) = f(g^{-1} h)$.
G-correlation: $[f \star \psi](g) = \sum_{h \in G} \sum_{k=1}^{K} f_k(h)\,[T_g \psi]_k(h)$.
Equivariance: $[T_g f] \star \psi = T_g [f \star \psi]$.
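
A hedged sketch of a full p4-to-p4 layer (my addition, reusing `corr` from the slide-26 sketch; the axis conventions are assumptions): a p4 feature map is an array of shape (4, H, W), and by the transformation law of slide 28 a rotation rotates each plane and cyclically shifts the rotation axis.

```python
import numpy as np

def rotate_p4(psi, r):
    # Action of rotation r on a p4 filter of shape (4, k, k):
    # rotate every plane spatially and cyclically shift the rotation axis.
    return np.roll(np.stack([np.rot90(p, r) for p in psi]), shift=r, axis=0)

def g_corr_p4(f, psi):
    # [f * psi](r, t) = sum_s sum_x f_s(x) [T_(r,t) psi]_s(x),
    # reduced to planar correlations over the four rotation channels s.
    return np.stack([sum(corr(f[s], rotate_p4(psi, r)[s]) for s in range(4))
                     for r in range(4)])
```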

Slide 32: Non-linearities: Equivariant. Pointwise function composition commutes with domain transformations: $\nu[T_h f] = \nu \circ (f \circ h^{-1}) = (\nu \circ f) \circ h^{-1} = T_h[\nu \circ f]$. (Commutative diagram: $f \mapsto \nu \circ f$, with $T_h$ on both sides.)
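
The claim is easy to check numerically, since a pointwise nonlinearity touches only values while a domain transformation touches only arguments; a tiny sketch (my addition):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=(4, 8, 8))          # a p4 feature map, as in earlier sketches
relu = lambda a: np.maximum(a, 0.0)     # pointwise nonlinearity nu
shift = lambda a: np.roll(a, (1, 2, 3), axis=(0, 1, 2))  # a stand-in T_h
# nu[T_h f] == T_h[nu f]
assert np.allclose(relu(shift(f)), shift(relu(f)))
```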

Slide 33: Strideless G-Pooling: Equivariant. Max-pool over a neighborhood $gU$ of $g$: $Pf(g) = \max_{k \in gU} f(k)$. The pooling operator commutes with the G-action: $P T_h = T_h P$.

Slide 34: $\mathbb{Z}^2$ pooling of a p6 feature map. (Figure: $f$, $T_r f$, the pooling domain $U$, $Pf$, and $P T_r f = T_r P f$.)

Slide 35: Subsampling. Subsample on a subgroup H of G, e.g. $2\mathbb{Z}^2 \subset \mathbb{Z}^2$, $C_4 \ltimes 2\mathbb{Z}^2 \subset C_4 \ltimes \mathbb{Z}^2$, $\mathbb{Z}^2 \subset C_4 \ltimes \mathbb{Z}^2$, or $C_4 \subset C_4 \ltimes \mathbb{Z}^2$ (where p4 $= C_4 \ltimes \mathbb{Z}^2$).

Slide 36: Coset Pooling. Choose the pooling domain U to be a subgroup H, with cosets $gH = \{gh : h \in H\}$. The pooled feature map is constant on cosets: $Pf(gh) = \max_{k \in ghH} f(k) = \max_{k \in gH} f(k)$. That is, the pooled feature map is a function on the quotient $G/H$.

Slide 37: Example: p4 Coset Pooling. (Figure: a p4 feature map array with axes $r$, $x$, $y$.) Pool over $U = C_4$: a $p4/C_4 = \mathbb{Z}^2$ feature map array. Pool over $U = \mathbb{Z}^2$: a $p4/\mathbb{Z}^2 = C_4$ feature map array. Pool over $U = C_4 \ltimes 2\mathbb{Z}^2$: a $\mathbb{Z}^2/2\mathbb{Z}^2$ feature map array.
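
In array terms, the first two poolings are just maxima over axes of the (4, H, W) p4 feature map; a minimal sketch (my addition):

```python
import numpy as np

f = np.random.default_rng(2).normal(size=(4, 8, 8))   # p4 feature map: (r, x, y)

# Pool over U = C4: cosets of the rotation subgroup are indexed by position,
# so p4 / C4 = Z^2 and the result is a single spatial plane.
pool_C4 = f.max(axis=0)        # shape (8, 8)

# Pool over U = Z^2: cosets of the translation subgroup are indexed by rotation,
# so p4 / Z^2 = C4 and the result is one value per rotation.
pool_Z2 = f.max(axis=(1, 2))   # shape (4,)
```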

Slide 38: Backprop. Input gradient, an involuted correlation: $\frac{\partial L}{\partial f^{l-1}_j}(k) = \big[\frac{\partial L}{\partial f^l} \star \psi^l_j\big](k)$. Filter gradient, an involuted convolution: $\frac{\partial L}{\partial \psi^l_{ij}}(k) = \big[\frac{\partial L}{\partial f^l_i} \star f^{l-1}_j\big](k)$. (Involution: $\psi^\dagger(g) = \psi(g^{-1})$, the group analogue of filter flipping.)

Slide 39: Algorithms: Spatial & Spectral G-Convs

Slide 40: Naive Spatial Implementation. Compute $r(g) = f \star \psi(g)$. For each output channel $j$ and each $g$ in the G-grid: warp the filter, $\psi^g_j = T_g \psi_j$, then compute the inner product
$$r_j[g] = \langle f, \psi^g_j \rangle = \sum_i \sum_{t_x} \sum_{t_y} f_i[t_x, t_y]\, \psi^g_{ji}[t_x, t_y].$$

Slide 41: Efficient Spatial Implementation. G-correlation: $[f \star \psi](g) = \sum_{h \in G} \sum_{k=1}^{K} f_k(h)\,[T_g \psi]_k(h)$. Decompose the transformation into a translation and a rotation: $T_g = T_{tr} = T_t T_r$. To get the output feature plane for rotation $r$, do a planar correlation with the rotated filter:
$$[f \star \psi](t, r) = \sum_{h} \sum_{k} f_k(h)\,[T_t T_r \psi]_k(h).$$
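
A sketch of this filter-bank trick (my addition, using scipy's planar correlation with periodic padding; the function and variable names are mine): precompute the rotated copies of every filter once, then every output plane is an ordinary planar correlation, so existing fast conv routines do all the work.

```python
import numpy as np
from scipy.ndimage import correlate

def p4_conv_fast(f, psis):
    # f: (H, W) input; psis: (C_out, k, k) canonical filters.
    # Returns (C_out, 4, H, W): one plane per (output channel, rotation).
    bank = np.stack([[np.rot90(p, r) for r in range(4)] for p in psis])
    return np.stack([[correlate(f, w, mode='wrap') for w in rots]
                     for rots in bank])
```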

Slide 42: Spectral G-Convolution. (Diagram: Fourier-transform $f$ and $\psi$ on the group, $\mathcal{F}: f \mapsto (\hat{f}_0, \hat{f}_1, \ldots, \hat{f}_K)$; multiply the corresponding Fourier blocks, $\hat{f}_0\hat{\psi}_0, \hat{f}_1\hat{\psi}_1, \ldots, \hat{f}_K\hat{\psi}_K$; apply $\mathcal{F}^{-1}$ to get $f \star \psi$.) Folland, G. B. (1995). A Course in Abstract Harmonic Analysis.

Slide 43: Results: Rotated MNIST (test error, %).
SVM [1]: 10.38 | NNet [1]: 17.62 | DBN [1]: 12.11 | RC-RBM [2]: 3.98 | Z2CNN: 5.68 | P4CNNRP: 3.… | P4CNN: …
[1] Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. (ICML)
[2] Schmidt, U., & Roth, S. (2012). Learning Rotation-Aware Features: From Invariant Priors to Equivariant Descriptors. (CVPR)

Slide 44: CIFAR-10. All-CNN architecture:
3 x 3 conv. 96 ReLU
3 x 3 conv. 96 ReLU
3 x 3 conv. 96 ReLU (stride 2)
3 x 3 conv. 192 ReLU
3 x 3 conv. 192 ReLU
3 x 3 conv. 192 ReLU (stride 2)
3 x 3 conv. 192 ReLU
1 x 1 conv. 192 ReLU
1 x 1 conv. 10 ReLU
Global averaging, softmax
P4-All-CNN: replace each conv by a p4-conv and halve the number of filters (keeping the parameter count roughly the same). (Results table.)

Slide 45: Work in Progress…

Slide 46: Projects / Theses / Internships. Contact: taco.cohen@gmail.com

Slide 47: Questions?
