Spatial Transformer. Ref: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Spatial Transformer Networks, NIPS, 2015

1 Spatial Transformer. Ref: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Spatial Transformer Networks, NIPS, 2015

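2 Spatial Transformer Layer
A CNN is not invariant to scaling and rotation.
[Figure: CNNs recognizing the digits 5 and 6, with an NN layer placed in front.]
The spatial transformer is itself an NN layer, so it is learned end-to-end. It can also transform feature maps.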

3 Spatial Transformer Layer: How to transform an image/feature map
[Figure: the 3×3 activations $a^{l-1}_{11}, \dots, a^{l-1}_{33}$ of layer $l-1$ are translated by the spatial transformer layer into the activations $a^{l}_{11}, \dots, a^{l}_{33}$ of layer $l$.]
General layer: $$a^{l}_{nm} = \sum_{i=1}^{3} \sum_{j=1}^{3} w^{l}_{nm,ij}\, a^{l-1}_{ij}$$
If we want to translate as above, $a^{l}_{nm} = a^{l-1}_{(n-1)m}$, which corresponds to the weights
$$w^{l}_{nm,ij} = \begin{cases} 1 & \text{if } i = n-1,\ j = m \\ 0 & \text{otherwise} \end{cases}$$
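The translation weights above can be checked on a small grid. The following is a minimal sketch in plain Python, not code from the slides: the function names `translation_weights` and `apply_layer` are made up for illustration, and indices are 0-based here while the slide counts from 1.

```python
# A layer written as a weighted sum over all input positions:
#   out[n][m] = sum over (i, j) of w[n][m][i][j] * inp[i][j]
# Indices are 0-based here; the slide uses 1-based indices.

def translation_weights(size):
    """w[n][m][i][j] = 1 if i == n - 1 and j == m, else 0 (shift down by one row)."""
    return [[[[1.0 if (i == n - 1 and j == m) else 0.0
               for j in range(size)]
              for i in range(size)]
             for m in range(size)]
            for n in range(size)]

def apply_layer(w, inp):
    """Compute every output position as a weighted sum of all input positions."""
    size = len(inp)
    return [[sum(w[n][m][i][j] * inp[i][j]
                 for i in range(size) for j in range(size))
             for m in range(size)]
            for n in range(size)]

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# The first output row has no source row (i = -1), so it stays zero;
# the other rows are copies of the row above them in the input.
shifted = apply_layer(translation_weights(3), image)
```

Only the pattern of ones in the weights decides where each input value ends up, which is why a network that controls these connections can move the image.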

4 Spatial Transformer Layer: How to transform an image/feature map
[Figure: two examples, each mapping the 3×3 activations $a^{l-1}_{11}, \dots, a^{l-1}_{33}$ of layer $l-1$ onto the 3×3 activations $a^{l}_{11}, \dots, a^{l}_{33}$ of layer $l$. In both, an NN controls the connections.]

5 Image Transformation: Expansion, Compression, Translation
[Equations: each transformation maps the coordinates $(x, y)$ to $(x', y')$ with a 2×2 matrix plus an offset vector. Only the entries 2 and (1, 1) survive from the original matrices.]

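6 reationdetail.php?sn= Image Transformation: Rotation
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
[Figure: the point $(x, y)$ is rotated by $\theta$ to $(x', y')$.]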

7 Spatial Transformer Layer
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$
Here $(x', y')$ is an index of layer $l-1$ and $(x, y)$ is an index of layer $l$. 6 parameters describe the affine transformation.
[Figure: an NN outputs $a, b, c, d, e, f$, which connect the 3×3 activations of layer $l-1$ to the 3×3 activations of layer $l$.]
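As a rough illustration of the six-parameter map, the sketch below applies it to single coordinates in plain Python. The helper names `affine_map`, `rotation_params`, and `scaling_params` are assumptions for this example only. Rotation and uniform scaling are shown as special cases with a zero offset.

```python
import math

def affine_map(params, x, y):
    """Apply [x'; y'] = [[a, b], [c, d]] [x; y] + [e; f]."""
    a, b, c, d, e, f = params
    return a * x + b * y + e, c * x + d * y + f

def rotation_params(theta):
    """Rotation by theta (radians): a special case with zero offset."""
    return (math.cos(theta), -math.sin(theta),
            math.sin(theta), math.cos(theta), 0.0, 0.0)

def scaling_params(s):
    """Uniform expansion (s > 1) or compression (s < 1) with zero offset."""
    return (s, 0.0, 0.0, s, 0.0, 0.0)

# Rotating the point (1, 0) by 90 degrees gives approximately (0, 1).
rotated = affine_map(rotation_params(math.pi / 2), 1.0, 0.0)

# Scaling the point (1, 3) by 2 gives (2.0, 6.0).
doubled = affine_map(scaling_params(2.0), 1.0, 3.0)
```

Because all six numbers enter the map linearly, a network can output them directly and the same function covers translation, scaling, and rotation.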

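8 Spatial Transformer Layer
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
Here $(x', y')$ is an index of layer $l-1$ and $(x, y)$ is an index of layer $l$. 6 parameters describe the affine transformation; in this example $a = 0$, $b = 1$, $c = 1$, $d = 0$, $e = 1$, $f = 1$.
[Figure: the NN outputs these values, which connect the 3×3 activations of layer $l-1$ to the 3×3 activations of layer $l$.]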

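9 Spatial Transformer Layer
$$\begin{bmatrix} 1.6 \\ 2.4 \end{bmatrix} = \begin{bmatrix} 0 & 0.5 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} + \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$$
The left-hand side is an index of layer $l-1$ and $(2, 2)$ is an index of layer $l$. 6 parameters describe the affine transformation; here $a = 0$, $b = 0.5$, $c = 1$, $d = 0$, $e = 0.6$, $f = 0.4$.
What is the problem? The source index $(1.6, 2.4)$ is not an integer. If it is rounded to the nearest position, a small change in the parameters does not change the output, so the gradient is always zero.
[Figure: the NN outputs the six values, which connect the 3×3 activations of layer $l-1$ to the 3×3 activations of layer $l$.]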
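The zero-gradient problem can be seen numerically. The sketch below is an illustration under assumptions: it reads layer $l-1$ at the rounded source index and estimates the derivative with respect to the offset $e$ by a finite difference. The function names and the 3×3 example values are made up, and rounding to the nearest index is assumed to be the non-interpolated scheme the slide has in mind.

```python
def affine_map(params, x, y):
    """Apply [x'; y'] = [[a, b], [c, d]] [x; y] + [e; f]."""
    a, b, c, d, e, f = params
    return a * x + b * y + e, c * x + d * y + f

def sample_nearest(prev, params, x, y):
    """Read layer l-1 at the rounded source index (1-based, as on the slide)."""
    sx, sy = affine_map(params, x, y)
    row, col = round(sx), round(sy)
    return prev[row - 1][col - 1]

# Hypothetical activations of layer l-1.
prev = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]

params = (0.0, 0.5, 1.0, 0.0, 0.6, 0.4)        # a, b, c, d, e, f from the slide
step = 1e-3
nudged_params = (0.0, 0.5, 1.0, 0.0, 0.6 + step, 0.4)

base = sample_nearest(prev, params, 2, 2)           # source (1.6, 2.4) rounds to (2, 2)
nudged = sample_nearest(prev, nudged_params, 2, 2)  # source (1.601, 2.4) still rounds to (2, 2)

# The output did not move, so the estimated derivative with respect to e is zero.
finite_difference = (nudged - base) / step
```

The output only changes when the rounded index jumps to another position, so gradient descent receives no signal for the transformation parameters.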

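10 Interpolation: Now we can use gradient descent
$$\begin{bmatrix} 1.6 \\ 2.4 \end{bmatrix} = \begin{bmatrix} 0 & 0.5 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} + \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$$
The left-hand side is an index of layer $l-1$ and $(2, 2)$ is an index of layer $l$. 6 parameters describe the affine transformation.
The output is interpolated from the four positions of layer $l-1$ around $(1.6, 2.4)$:
$$a^{l}_{22} = (1-0.4)(1-0.4)\, a^{l-1}_{22} + (1-0.6)(1-0.4)\, a^{l-1}_{12} + (1-0.6)(1-0.6)\, a^{l-1}_{13} + (1-0.4)(1-0.6)\, a^{l-1}_{23}$$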
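A small sketch of the interpolation, assuming the usual bilinear weights $(1 - |x' - n|)(1 - |y' - m|)$ for each neighboring position $(n, m)$. The function name `bilinear_sample` and the example activations are hypothetical, and the sketch does no bounds handling, so the source index must lie strictly inside the grid.

```python
import math

def bilinear_sample(prev, sx, sy):
    """Bilinear read of a 1-based grid `prev` at the fractional index (sx, sy)."""
    x0, y0 = math.floor(sx), math.floor(sy)
    total = 0.0
    for n in (x0, x0 + 1):
        for m in (y0, y0 + 1):
            # Weight shrinks linearly with the distance to the neighbor.
            weight = (1 - abs(sx - n)) * (1 - abs(sy - m))
            total += weight * prev[n - 1][m - 1]
    return total

# Hypothetical activations of layer l-1.
prev = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]

# Source index (1.6, 2.4) from the slide: mixes a_22, a_12, a_13 and a_23.
value = bilinear_sample(prev, 1.6, 2.4)

# Nudging the source index now changes the output, so the finite-difference
# estimate of the derivative with respect to the offset e is no longer zero.
step = 1e-3
finite_difference = (bilinear_sample(prev, 1.6 + step, 2.4) - value) / step
```

With these example values the four weights are 0.36, 0.24, 0.16 and 0.24, matching the factors in the slide's formula, and the output varies smoothly with the transformation parameters.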

13 Street View House Number
Single: one transformation layer. Multi: many transformation layers.

14 Bird Recognition
[Figure: the affine parameters $a, b, c, d, e, f$, with two of the matrix entries fixed to 0.]

15 Highway Network & Grid LSTM

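16 Feedforward vs. Recurrent
1. A feedforward network does not have an input at each step.
2. A feedforward network has different parameters for each layer.
Feedforward ($x \to f_1 \to a^1 \to f_2 \to a^2 \to f_3 \to a^3 \to f_4 \to y$; $t$ is the layer): $$a^t = f_t(a^{t-1}) = \sigma(W^t a^{t-1} + b^t)$$
Recurrent ($h^0, h^1, h^2, h^3$ with inputs $x^1, x^2, x^3, x^4$, followed by $g$ and the output $y^4$; $t$ is the time step): $$a^t = f(a^{t-1}, x^t) = \sigma(W^h a^{t-1} + W^i x^t + b^i)$$
Idea: apply the gated structure in a feedforward network.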
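The two differences can be made concrete with a short sketch in plain Python. The helper names (`sigmoid`, `matvec`, `add`, `feedforward`, `recurrent`) and the toy parameters are assumptions for illustration. The only point is which parameters are reused and where inputs enter.

```python
import math

def sigmoid(v):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*vectors)]

def feedforward(x, layers):
    """layers is a list of (W_t, b_t): a different pair for every layer t.
    The input x enters only once, at the bottom."""
    a = x
    for W_t, b_t in layers:
        a = sigmoid(add(matvec(W_t, a), b_t))
    return a

def recurrent(h0, inputs, W_h, W_i, b):
    """The same (W_h, W_i, b) is reused at every time step t,
    and every step reads a new input x_t."""
    a = h0
    for x_t in inputs:
        a = sigmoid(add(matvec(W_h, a), matvec(W_i, x_t), b))
    return a

# Toy parameters (all zeros), so every unit outputs sigmoid(0) = 0.5.
zeros = [[0.0, 0.0], [0.0, 0.0]]
ff_out = feedforward([1.0, 2.0], [(zeros, [0.0, 0.0]), (zeros, [0.0, 0.0])])
rnn_out = recurrent([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], zeros, zeros, [0.0, 0.0])
```

Viewed this way, a deep feedforward network is a recurrence over depth without per-step inputs, which is what makes it possible to borrow gates from recurrent cells.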

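17 GRU → Highway Network
No input $x^t$ at each step.
No output $y^t$ at each step.
$a^{t-1}$ is the output of the $(t-1)$-th layer.
$a^t$ is the output of the $t$-th layer.
No reset gate.
[Figure: a GRU cell with reset gate $r$, update gate $z$ (with its complement $1-z$), and candidate $h'$; $h^{t-1}$ and $h^t$ are relabeled $a^{t-1}$ and $a^t$, and the inputs $x^t$ are removed.]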

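18 Highway Network
Highway Network (a gate controller $z$ decides how much of $a^{t-1}$ is copied): $$h' = \sigma(W a^{t-1}), \qquad z = \sigma(W' a^{t-1}), \qquad a^t = z \odot a^{t-1} + (1 - z) \odot h'$$
Residual Network ($a^{t-1}$ is copied and added): $$a^t = h + a^{t-1}$$
References: Training Very Deep Networks, 8v2.pdf; Deep Residual Learning for Image Recognition.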
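The following sketch contrasts the two layer types in plain Python. It assumes the highway update has the GRU-style form $a^t = z \odot a^{t-1} + (1 - z) \odot h'$ with $h' = \sigma(W a^{t-1})$ and $z = \sigma(W' a^{t-1})$, and that the residual layer adds the copied input to $h$. All function and parameter names are illustrative.

```python
import math

def sigmoid(v):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def highway_layer(a_prev, W, W_gate):
    """a_t = z * a_prev + (1 - z) * h', with gate z and candidate h'."""
    h = sigmoid(matvec(W, a_prev))        # candidate h'
    z = sigmoid(matvec(W_gate, a_prev))   # gate controller
    return [z_k * a_k + (1.0 - z_k) * h_k
            for z_k, a_k, h_k in zip(z, a_prev, h)]

def residual_layer(a_prev, W):
    """a_t = h + a_prev: the input is always copied and added."""
    h = sigmoid(matvec(W, a_prev))
    return [h_k + a_k for h_k, a_k in zip(h, a_prev)]

a_prev = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
open_gate = [[50.0, 0.0], [0.0, 50.0]]   # drives z to 1 for positive inputs

copied = highway_layer(a_prev, zeros, open_gate)  # gate saturated: layer is skipped
mixed = highway_layer(a_prev, zeros, zeros)       # z = 0.5 and h' = 0.5
added = residual_layer(a_prev, zeros)             # h = 0.5 is added to the input
```

When the gate saturates at 1 the layer passes its input through unchanged, which is the mechanism by which a highway network can effectively drop layers it does not need.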

19 Highway Network automatically determines the layers needed!
[Figure: three networks, each drawn from its input layer up to its output layer.]

20 Highway Network

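21 Grid LSTM: Memory for both time and depth
[Figure: an LSTM block carries $c$ and $h$ along the time axis, with input $x$ and output $y$. A Grid LSTM block carries $c$ and $h$ along the time axis and additionally $a$ and $b$ along the depth axis.]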

22 [Figure: a 2×2 arrangement of Grid LSTM blocks. Along time, the blocks pass $(c^{t-1}, h^{t-1})$ to $(c^{t}, h^{t})$ to $(c^{t+1}, h^{t+1})$. Along depth, they pass $(a^{l-1}, b^{l-1})$ to $(a^{l}, b^{l})$ to $(a^{l+1}, b^{l+1})$.]

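23 Grid LSTM
[Figure: inside a Grid LSTM block. Inputs $c$, $h$, $a$, $b$; outputs $c'$, $h'$, $a'$, $b'$; the internal gate signals include $z^i$ and $z^o$, combined through a sum and a tanh.]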

24 3D Grid LSTM
[Figure: a Grid LSTM block with a third pair of memories, $e$ and $f$, in addition to $(c, h)$ and $(a, b)$.]

25 Recursive Structure

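26 Application: Sentiment Analysis
Recurrent structure (a special case of the recursive structure): the word sequence $x^1, x^2, x^3, x^4$ (word vectors) is folded in one word at a time by $f$, starting from $h^0$ and producing $h^1, h^2, h^3$; $g$ then outputs the sentiment.
Recursive structure (how to stack the function $f$ is already determined): $f$ combines $x^1$ and $x^2$ into $h^1$, $x^3$ and $x^4$ into $h^2$, and $h^1$ and $h^2$ into $h^3$; $g$ then outputs the sentiment. The $x$ and $h$ vectors have the same dimension.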

27 Recursive Model
Word sequence: "not very good".
Syntactic structure: "very" and "good" are combined first, and the result is combined with "not". How to obtain the syntactic structure is out of the scope.

28 Recursive Model
By composing the two meanings, what should the resulting meaning be?
Dimension of a word vector = Z. Input of $f$: 2 × Z; output: Z.
[Figure: following the syntactic structure of "not very good", $f$ combines V("very") and V("good") into V("very good"), the meaning of "very good", which is then combined with V("not").]

29 Recursive Model
$$V(w_A\, w_B) \neq V(w_A) + V(w_B)$$
Example: "not" is neutral and "good" is positive, but "not good" is negative.
[Figure: the composition tree for "not very good", with V("not"), V("very"), V("good") at the leaves and V("very good") as the meaning of "very good".]

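30 Recursive Model
$$V(w_A\, w_B) \neq V(w_A) + V(w_B)$$
Example: "棒" ("great") is positive and "好棒" ("so great") is positive, but "好棒棒" (a sarcastic "so great") is negative. The composition function therefore has to be a network.
[Figure: the composition tree for "not very good", with V("not"), V("very"), V("good") at the leaves and V("very good") as the meaning of "very good".]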

31 Recursive Model
"not good" versus "not bad": "not" reverses the other input.
[Figure: the composition tree for "not very good", with V("not"), V("very"), V("good") at the leaves and V("very good") as the meaning of "very good".]

32 Recursive Model
"very good" versus "very bad": "very" emphasizes the other input.
[Figure: the composition tree for "not very good", with V("not"), V("very"), V("good") at the leaves and V("very good") as the meaning of "very good".]

33 Output: 5 classes (--, -, 0, +, ++)
$g$ maps V("not very good") to one of the classes. Train both $f$ and $g$.
[Figure: the composition tree for "not very good", with V("not"), V("very"), V("good") at the leaves.]
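The pipeline of a composition function $f$ applied along the syntactic structure, followed by a classifier $g$, can be sketched as follows. This is only an illustration under assumptions: the word vectors and parameters are made-up placeholders rather than trained values, and the tanh in $f$ and the softmax in $g$ are choices for this sketch that the slides do not specify.

```python
import math

Z = 2  # dimension of every word vector and every composed vector

# Hypothetical word vectors, chosen only for illustration.
WORD_VECTORS = {"not": [-1.0, 0.0], "very": [0.0, 1.0], "good": [1.0, 0.5]}

def compose(left, right, W, b):
    """f: concatenate two Z-dim vectors (2 * Z inputs) and map them back to Z dims."""
    x = left + right
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def encode(tree, W, b):
    """Apply f bottom-up along a binary tree given as nested tuples of words."""
    if isinstance(tree, str):
        return WORD_VECTORS[tree]
    left, right = tree
    return compose(encode(left, W, b), encode(right, W, b), W, b)

def classify(vector, U):
    """g: linear scores followed by a softmax over the 5 sentiment classes."""
    scores = [sum(u * v for u, v in zip(row, vector)) for row in U]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Untrained placeholder parameters; in practice f and g are trained jointly.
W = [[0.5, -0.5, 0.25, 0.25], [0.1, 0.2, -0.3, 0.4]]
b = [0.0, 0.0]
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0], [-0.5, 0.5], [0.0, 1.0]]

# Syntactic structure of "not very good": ("very", "good") is composed first.
tree = ("not", ("very", "good"))
sentence_vector = encode(tree, W, b)
class_probs = classify(sentence_vector, U)  # one probability per class (--, -, 0, +, ++)
```

Because every composed vector has the same dimension Z as a word vector, the same $f$ can be reused at every node of the tree, whatever its depth.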

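34 Recursive Neural Tensor Network
Plain composition, with little interaction between $a$ and $b$: $$p = \sigma\left(W \begin{bmatrix} a \\ b \end{bmatrix}\right)$$
Tensor composition, with $x = \begin{bmatrix} a \\ b \end{bmatrix}$: $$p = \sigma\left(x^{T} W x + W \begin{bmatrix} a \\ b \end{bmatrix}\right), \qquad x^{T} W x = \sum_{i,j} W_{ij}\, x_i x_j$$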
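A minimal sketch of the tensor composition in plain Python. One point is an assumption beyond the slide: since $x^T W x$ is a single number, the sketch uses a separate matrix `V[k]` for each output dimension $k$, so that the result is again a vector. The names `rntn_compose`, `V`, and `W` are illustrative.

```python
import math

def rntn_compose(a, b, V, W):
    """p_k = sigmoid(x^T V[k] x + W[k] . x), where x is a concatenated with b."""
    x = a + b
    size = len(x)
    out = []
    for V_k, W_k in zip(V, W):
        # Quadratic term: sum over i, j of V_k[i][j] * x_i * x_j.
        quadratic = sum(V_k[i][j] * x[i] * x[j]
                        for i in range(size) for j in range(size))
        # Linear term: the plain composition, with little interaction between a and b.
        linear = sum(w * v for w, v in zip(W_k, x))
        out.append(1.0 / (1.0 + math.exp(-(quadratic + linear))))
    return out

a = [1.0, 2.0]
b = [3.0, 4.0]

zero_matrix = [[0.0] * 4 for _ in range(4)]
cross_term = [row[:] for row in zero_matrix]
cross_term[0][2] = 1.0            # multiplies a[0] by b[0]

V = [cross_term, zero_matrix]     # one 4x4 matrix per output dimension
W = [[0.0] * 4, [0.0] * 4]        # linear part switched off for clarity

# First output sees a[0] * b[0] = 3; second output sees nothing.
p = rntn_compose(a, b, V, W)
```

The cross entries of each matrix multiply components of $a$ with components of $b$ directly, which is the interaction the plain composition lacks.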

35 Experiments
5-class sentiment classification (--, -, 0, +, ++). Demo:

36 Experiments
Socher, Richard, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." Proceedings of the conference on empirical methods in natural language processing (EMNLP). Vol

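37 Matrix-Vector Recursive Network
Each word carries a vector (its inherent meaning, $a$ or $b$) and a matrix (how it changes the others, $A$ or $B$).
Composition (arguments lost in extraction): the new vector is $\sigma(W\,[\dots])$ and the new matrix is $W_M\,[\dots]$.
Example, "not good": is the vector of "not" zero? Is the vector of "good" informative? Is a matrix the identity $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$?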

38 Tree LSTM
[Figure: a typical LSTM maps $(h^{t-1}, m^{t-1})$ and the input $x^t$ to $(h^{t}, m^{t})$. A Tree LSTM combines $(h^1, m^1)$ and $(h^2, m^2)$ into $(h^3, m^3)$.]

39 More Applications: Sentence relatedness
[Figure: Sentence 1 and Sentence 2 are each encoded by a Recursive Neural Network, and an NN compares the two encodings.]
Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from tree-structured long short-term memory networks." arXiv preprint arXiv: (2015).

40 Batch Normalization

41 Experimental Results
