Spatial Transformer
Ref: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Spatial Transformer Networks, NIPS, 2015
Spatial Transformer Layer
A CNN is not invariant to scaling and rotation: the same digit at a different scale or rotation can confuse a plain CNN. The spatial transformer is an NN layer learned end-to-end with the rest of the network, and it can also transform feature maps, not just the input image.
Spatial Transformer Layer: how to transform an image/feature map
Connect layer l-1 (a 3×3 map $a^{l-1}_{ij}$) to layer l through a general fully connected layer:
$a^l_{nm} = \sum_{i=1}^{3}\sum_{j=1}^{3} w^l_{nm,ij}\, a^{l-1}_{ij}$
If we want the translation $a^l_{nm} = a^{l-1}_{(n-1)m}$ (each output row copies the input row above it), set
$w^l_{nm,ij} = 1$ if $i = n-1,\ j = m$, and $w^l_{nm,ij} = 0$ otherwise.
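To make the weight view concrete, here is a minimal numpy sketch (my own illustration, not from the paper) that builds the translation weights above and applies them to a 3×3 map:

```python
import numpy as np

# Feature map of layer l-1 (3x3).
a_prev = np.arange(1, 10, dtype=float).reshape(3, 3)

# General layer: a^l_{nm} = sum_{ij} w^l_{nm,ij} * a^{l-1}_{ij}.
# For the translation a^l_{nm} = a^{l-1}_{(n-1)m}, set
# w_{nm,ij} = 1 iff (i, j) == (n-1, m), else 0.
w = np.zeros((3, 3, 3, 3))
for n in range(3):
    for m in range(3):
        i, j = n - 1, m
        if 0 <= i < 3:            # indices outside the map get no connection
            w[n, m, i, j] = 1.0

# Apply the layer: contract over the input indices (i, j).
a_next = np.einsum('nmij,ij->nm', w, a_prev)
print(a_next)  # row n of the output is row n-1 of the input; row 0 is zero
```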
Spatial Transformer Layer: how to transform an image/feature map
(figure: the 3×3 maps of layer l-1 and layer l shown twice; an NN controls the connection weights between the two layers)
Image Transformation: expansion, compression, translation
Expansion: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
Compression with translation: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$
Image Transformation: rotation by θ
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
(image credit: https://home.gamer.com.tw/creationdetail.php?sn=792585)
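A small numpy sketch (illustrative only; the helper name `affine` is mine) applying the three transformations above to a point:

```python
import numpy as np

def affine(A, t, xy):
    """Apply x' = A x + t to a point xy = (x, y)."""
    return A @ np.asarray(xy, dtype=float) + t

p = np.array([1.0, 1.0])

# Expansion: [[2, 0], [0, 2]], no translation.
print(affine(np.array([[2.0, 0.0], [0.0, 2.0]]), np.zeros(2), p))  # [2. 2.]

# Compression plus translation: [[0.5, 0], [0, 0.5]] with offset (0.5, 0.5).
print(affine(np.array([[0.5, 0.0], [0.0, 0.5]]),
             np.array([0.5, 0.5]), p))                             # [1. 1.]

# Rotation by theta = 90 degrees.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(affine(R, np.zeros(2), p))  # [-1. 1.] up to floating-point error
```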
Spatial Transformer Layer
6 parameters describe a general affine transformation:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$
Here (x, y) is an index of layer l and (x', y') is the index of layer l-1 it reads from; an NN looks at layer l-1 and outputs the 6 parameters (a, b, c, d, e, f).
Spatial Transformer Layer
Example with a = 0, b = 1, c = 1, d = 0, e = 1, f = 1:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}$
Each index of layer l is connected to the corresponding transformed index of layer l-1.
Spatial Transformer Layer
With a = 0, b = 0.5, c = 1, d = 0, e = 0.6, f = 0.4, the index (2, 2) of layer l maps to a fractional position in layer l-1:
$\begin{bmatrix} 1.6 \\ 2.4 \end{bmatrix} = \begin{bmatrix} 0 & 0.5 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} + \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$
What is the problem? If the fractional position is rounded to the nearest index, an infinitesimal change in the 6 parameters does not change which pixel is selected, so the gradient is always zero and the transformer cannot be trained by gradient descent.
Interpolation
Instead of rounding, interpolate: the value at index (2, 2) of layer l is the bilinear interpolation of layer l-1 around position (1.6, 2.4), with each of the four nearest indices weighted by its distance to the sampling position:
$a^l_{22} = (1-0.4)(1-0.4)\,a^{l-1}_{22} + (1-0.6)(1-0.4)\,a^{l-1}_{12} + (1-0.6)(1-0.6)\,a^{l-1}_{13} + (1-0.4)(1-0.6)\,a^{l-1}_{23}$
The output now changes smoothly with the 6 parameters, so we can use gradient descent.
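A minimal sketch of bilinear sampling, assuming a single-channel map indexed by (row, column); the helper `bilinear_sample` is my naming, not from the paper. Each neighbor's weight shrinks linearly with distance, which is exactly what gives a nonzero gradient with respect to the sampling position (and hence the 6 affine parameters):

```python
import numpy as np

def bilinear_sample(fmap, r, c):
    """Bilinearly interpolate fmap at fractional position (r, c).

    Each of the four nearest pixels contributes with weight
    (1 - |r - r_i|) * (1 - |c - c_j|), so the output changes
    smoothly as (r, c) moves.
    """
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    out = 0.0
    for ri in (r0, r0 + 1):
        for ci in (c0, c0 + 1):
            if 0 <= ri < fmap.shape[0] and 0 <= ci < fmap.shape[1]:
                w = (1 - abs(r - ri)) * (1 - abs(c - ci))
                out += w * fmap[ri, ci]
    return out

fmap = np.arange(9, dtype=float).reshape(3, 3)
# Sample at (0.6, 1.4): mixes the four pixels around row 0.6, col 1.4.
print(bilinear_sample(fmap, 0.6, 1.4))  # 3.2
```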
Street View House Numbers
Single: one transformation layer. Multi: multiple transformation layers.
Bird Recognition
(figure: spatial transformers locate discriminative parts of the bird; each transformer predicts affine parameters, with some entries fixed to 0)
Highway Network & Grid LSTM
Feedforward vs. Recurrent
1. A feedforward network does not receive a new input at each step; a recurrent network reads an input x^t at every time step.
2. A feedforward network has different parameters for each layer; a recurrent network reuses the same parameters at every time step.
Feedforward, where t is the layer: $a^t = f(a^{t-1}) = \sigma(W^t a^{t-1} + b^t)$
Recurrent, where t is the time step: $a^t = f(a^{t-1}, x^t) = \sigma(W^h a^{t-1} + W^i x^t + b^i)$
Idea: apply the gated structure of RNNs in a feedforward network.
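A short sketch contrasting the two update rules (shapes and names are illustrative, not from the slides): the feedforward stack uses a different (W^t, b^t) per layer with no per-step input, while the recurrent loop reuses one set of weights and consumes x^t at every step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 4
rng = np.random.default_rng(0)

# Feedforward: a different (W^t, b^t) per layer t; no input after layer 0.
Ws = [rng.normal(0, 0.1, (d, d)) for _ in range(3)]
bs = [np.zeros(d) for _ in range(3)]
a = rng.normal(0, 1, d)                 # a^0 = x
for W_t, b_t in zip(Ws, bs):            # a^t = sigma(W^t a^{t-1} + b^t)
    a = sigmoid(W_t @ a + b_t)

# Recurrent: the same (W_h, W_i, b) at every step; a new input x^t each step.
W_h = rng.normal(0, 0.1, (d, d))
W_i = rng.normal(0, 0.1, (d, d))
b = np.zeros(d)
h = np.zeros(d)
for x_t in rng.normal(0, 1, (5, d)):    # a^t = sigma(W^h a^{t-1} + W^i x^t + b^i)
    h = sigmoid(W_h @ h + W_i @ x_t + b)
```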
GRU → Highway Network
Start from a GRU and remove what a feedforward layer does not need: there is no input x^t at each step and no output y^t at each step; a^{t-1} is the output of the (t-1)-th layer and a^t is the output of the t-th layer; the reset gate is dropped, leaving only the update gate.
Highway Network
$h = \sigma(W a^{t-1})$
$z = \sigma(W' a^{t-1})$
$a^t = z \odot a^{t-1} + (1 - z) \odot h$
The gate z controls how much of the input is copied straight through.
Residual Network: $a^t = h + a^{t-1}$ (the input is always copied and added).
Training Very Deep Networks, https://arxiv.org/pdf/1507.06228v2.pdf
Deep Residual Learning for Image Recognition, http://arxiv.org/abs/1512.03385
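A minimal numpy sketch of one highway layer under the formulas above (weights random purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(a_prev, W, W_gate):
    """One highway layer: a^t = z * a^{t-1} + (1 - z) * h.

    When the gate z is close to 1, the layer simply copies its input,
    which is how the network can effectively skip layers it does not need.
    """
    h = sigmoid(W @ a_prev)        # candidate: h = sigma(W a^{t-1})
    z = sigmoid(W_gate @ a_prev)   # gate:      z = sigma(W' a^{t-1})
    return z * a_prev + (1 - z) * h

d = 8
rng = np.random.default_rng(0)
a = rng.normal(0, 1, d)
for _ in range(10):                # a deep stack of highway layers
    a = highway_layer(a, rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d)))
```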
Highway Network automatically determines the number of layers needed!
(figure: three networks of different effective depths, from input layer to output layer; when z ≈ 1 a layer simply copies its input)
Grid LSTM
Memory for both time and depth: a standard LSTM passes memory c and hidden state h along the time dimension only; a Grid LSTM additionally passes a second memory pair (a, b) along the depth dimension.
(figure: Grid LSTM blocks arranged in a 2D grid; (c, h) flow horizontally across time steps t-1, t, t+1, while (a, b) flow vertically across layers l-1, l, l+1)
Grid LSTM
One block takes (c, h) from the time dimension and (a, b) from the depth dimension, and outputs (c', h') and (a', b'). Inside, h and b are combined and transformed into LSTM-style pre-activations: a candidate z (through tanh), an input gate z^i, a forget gate, and an output gate z^o.
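A hedged sketch of one Grid LSTM block, assuming (as the figure suggests) that a standard LSTM update is applied once along time and once along depth, both reading the concatenated [h, b]; the exact parameterization in the Grid LSTM paper differs in details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_update(x, c, W):
    """Standard LSTM update of memory c given input x.

    W maps x to four pre-activations: candidate z, input gate z_i,
    forget gate z_f, and output gate z_o.
    """
    z, zi, zf, zo = np.split(W @ x, 4)
    c_new = sigmoid(zf) * c + sigmoid(zi) * np.tanh(z)
    h_new = sigmoid(zo) * np.tanh(c_new)
    return c_new, h_new

def grid_lstm_block(c, h, a, b, W_time, W_depth):
    """One Grid LSTM block: both dimensions read the shared input [h, b];
    one LSTM update produces (c', h') along time, another produces
    (a', b') along depth."""
    x = np.concatenate([h, b])
    c_new, h_new = lstm_update(x, c, W_time)
    a_new, b_new = lstm_update(x, a, W_depth)
    return c_new, h_new, a_new, b_new

d = 4
rng = np.random.default_rng(0)
W_time = rng.normal(0, 0.1, (4 * d, 2 * d))
W_depth = rng.normal(0, 0.1, (4 * d, 2 * d))
c = h = a = b = np.zeros(d)
c, h, a, b = grid_lstm_block(c, h, a, b, W_time, W_depth)
```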
3D Grid LSTM
(figure: a third dimension is added, with a third memory pair (e, f) flowing alongside (a, b) and (c, h))
Recursive Structure
Application: Sentiment Analysis
Recurrent structure: word vectors x^1, …, x^4 are read in order, h^0 → h^1 → h^2 → h^3, and g maps the final state to a sentiment. A recurrent structure is a special case of a recursive structure: the order in which the function is stacked is already determined (left to right).
Recursive structure: how to stack the function f can instead follow any tree, e.g. h^1 = f(x^1, x^2), h^2 = f(x^3, x^4), h^3 = f(h^1, h^2), then g(h^3); the inputs and output of f have the same dimension.
Recursive Model
Word sequence: "not very good". Its syntactic structure combines "very" and "good" first, then combines "not" with the result. How to obtain the syntactic structure is beyond the scope of this lecture.
Recursive Model
Let Z be the dimension of a word vector. The composition function f takes two meanings and outputs one: input dimension 2 × Z, output dimension Z. By composing V("very") and V("good"), what should the meaning V("very good") be?
Recursive Model
V(w_A w_B) ≠ V(w_A) + V(w_B): meanings do not simply add. "not" is neutral and "good" is positive, yet "not good" is negative.
Recursive Model
Another example: 棒 ("great") is positive and 好棒 ("very great") is positive, yet 好棒棒 (a sarcastic "oh, how great") is negative. The composition must therefore be a network, not addition.
Recursive Model
"not good" vs. "not bad": the network can learn that "not" reverses the meaning of the other input.
Recursive Model
"very good" vs. "very bad": the network can learn that "very" emphasizes the other input.
Output layer: V("not very good") is fed into a classifier g that outputs 5 sentiment classes (--, -, 0, +, ++). Train both the composition function f and the classifier g jointly.
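A minimal end-to-end sketch (all weights random, names mine, purely illustrative) of composing "not very good" with f and classifying with g; in training, gradients would flow into W_f, W_g, and the word vectors jointly.

```python
import numpy as np

Z = 8                                    # word-vector dimension
rng = np.random.default_rng(0)

W_f = rng.normal(0, 0.1, (Z, 2 * Z))     # composition f: 2Z -> Z
W_g = rng.normal(0, 0.1, (5, Z))         # classifier g: Z -> 5 classes

def f(a, b):
    """Compose two meanings into one of the same dimension Z."""
    return np.tanh(W_f @ np.concatenate([a, b]))

def g(v):
    """Map a phrase vector to the 5 sentiment classes (--, -, 0, +, ++)."""
    s = np.exp(W_g @ v)
    return s / s.sum()

# Word vectors (random here; in practice learned jointly with f and g).
v_not, v_very, v_good = (rng.normal(0, 1, Z) for _ in range(3))

# Follow the parse tree: (not (very good)).
v_very_good = f(v_very, v_good)
v_not_very_good = f(v_not, v_very_good)
print(g(v_not_very_good))                # 5-class distribution
```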
Recursive Neural Tensor Network
The plain composition $p = \sigma(W x)$ with $x = [a; b]$ allows little interaction between a and b. The tensor version adds a quadratic term,
$p = \sigma\big(x^T V x + W x\big)$, where $(x^T V x)_k = \sum_{i,j} V_{k,ij}\, x_i x_j$,
so every element of a can interact multiplicatively with every element of b.
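A sketch of the tensor composition, assuming one 2Z×2Z slice V[k] per output dimension k (the common RNTN form); tanh stands in for σ:

```python
import numpy as np

Z = 4
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (Z, 2 * Z))           # linear part
V = rng.normal(0, 0.1, (Z, 2 * Z, 2 * Z))    # one 2Z x 2Z slice per output dim

def rntn_compose(a, b):
    """p_k = sigma(x^T V[k] x + (W x)_k), with x = [a; b].

    The quadratic term sums V[k]_{ij} x_i x_j over all pairs (i, j), so
    elements of a and b interact directly, unlike plain sigma(W [a; b]).
    """
    x = np.concatenate([a, b])
    quad = np.einsum('i,kij,j->k', x, V, x)
    return np.tanh(quad + W @ x)

a, b = rng.normal(0, 1, Z), rng.normal(0, 1, Z)
print(rntn_compose(a, b))
```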
Experiments
5-class sentiment classification (--, -, 0, +, ++)
Demo: http://nlp.stanford.edu:8080/sentiment/rntndemo.html
Experiments
Socher, Richard, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.
Matrix-Vector Recursive Network
Each word carries both a vector (its inherent meaning) and a matrix (how it changes the meaning of other words). For "not good": the vector of "not" may be close to zero (it is barely informative on its own) while its matrix is informative (it reverses the other word); the vector of "good" is informative while its matrix may be close to the identity $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ (it barely changes the other word). The composed phrase again gets a vector, via $\sigma(W \cdot)$, and a matrix, via $W_M \cdot$, computed from the children.
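A loose numpy sketch of the matrix-vector composition (following Socher et al.'s MV-RNN in spirit; the exact wiring here is my assumption): each word's matrix transforms the other word's vector before the vectors are composed, and the two matrices are composed linearly.

```python
import numpy as np

Z = 4
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (Z, 2 * Z))     # composes the two transformed vectors
W_M = rng.normal(0, 0.1, (Z, 2 * Z))   # composes the two matrices

def mv_compose(a, A, b, B):
    """Compose (a, A) and (b, B) into a new (vector, matrix) pair.

    Each word's matrix first transforms the *other* word's vector.
    """
    vec = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # vector of the phrase
    mat = W_M @ np.vstack([A, B])                      # matrix of the phrase
    return vec, mat

# "not": vector ~ zero (nearly neutral alone), matrix flips the other word.
a, A = np.zeros(Z), -np.eye(Z)
# "good": informative vector, matrix ~ identity (barely changes the other word).
b, B = rng.normal(0, 1, Z), np.eye(Z)
v, M = mv_compose(a, A, b, B)
print(v)   # roughly the "reversed" meaning of "good"
```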
Tree LSTM
Typical LSTM: $(h^{t-1}, m^{t-1}, x^t) \rightarrow (h^t, m^t)$, one predecessor per time step.
Tree LSTM: follows the parse tree, combining two children $(h^1, m^1)$ and $(h^2, m^2)$ into $(h^3, m^3)$.
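A simplified sketch of a binary Tree-LSTM node (a reduced form of Tai et al.'s model; the gate layout is my assumption): one forget gate per child, all gates computed from the concatenated child hidden states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tree_lstm(h1, m1, h2, m2, W):
    """Combine two children into (h3, m3), LSTM-style.

    W maps [h1; h2] to five pre-activations: candidate z, input gate
    z_i, one forget gate per child (z_f1, z_f2), and output gate z_o.
    """
    x = np.concatenate([h1, h2])
    z, zi, zf1, zf2, zo = np.split(W @ x, 5)
    m3 = sigmoid(zf1) * m1 + sigmoid(zf2) * m2 + sigmoid(zi) * np.tanh(z)
    h3 = sigmoid(zo) * np.tanh(m3)
    return h3, m3

d = 4
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (5 * d, 2 * d))
h1, m1, h2, m2 = (rng.normal(0, 1, d) for _ in range(4))
h3, m3 = tree_lstm(h1, m1, h2, m2, W)
```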
More Applications
Sentence relatedness: encode Sentence 1 and Sentence 2 with two recursive neural networks, then feed both representations into an NN that predicts relatedness.
Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from tree-structured long short-term memory networks." arXiv preprint arXiv:1503.00075 (2015).
Batch Normalization
Experimental Results