Image classification via Scattering Transform

Sophia Lynch
5 years ago
1 Image classification via Scattering Transform. Edouard Oyallon, under the supervision of Stéphane Mallat, following the work of Laurent Sifre, Joan Bruna,

2 Classification of signals. Let $n > 0$ and let $(X, Y) \in \mathbb{R}^n \times \mathcal{Y}$ be random variables. Problem: estimate $\hat y$ such that $\hat y = \arg\inf_{\tilde y} \mathbb{E}(|\tilde y(X) - Y|)$. We are given a training set $(x_i, y_i) \in \mathbb{R}^n \times \mathcal{Y}$ to build $\hat y$. Say one can write $\hat y = \mathrm{Classifier}(\Phi x)$, the classifier being built with $(\Phi x_i, y_i)$. There are 3 ways to build $\Phi$: supervised, from $(x_i, y_i)_i$; unsupervised, from $(x_i)_i$; predefined, from geometric priors. [Figure: two-class example with $n = 2$ and a classifier $w$.]

3 Research statement. What are the mathematical foundations of signal classification? Which mechanisms can give us state-of-the-art results on standard datasets? How much supervision is required? Is it a geometrical problem? How can we build $\Phi$? [Figure: rhino or cat?] The variability is huge!

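4 High-dimensional variabilities. Claim: in $\mathbb{R}^n$, $n \gg 1$, the variance is huge. Ex.: if $X \sim \mathcal{N}(0, I_n)$, so that $\mathbb{E}(X) = 0$, then $\exists C > 0, \forall n, \; \mathbb{P}(\|X\| \geq t) \leq 2 e^{-t^2/(Cn)}$. Claim: small deformations (not parametric) can have huge effects. Ex.: for $x \in L^2(\mathbb{R}^n)$ and $\tau \in C^\infty$, define $L_\tau x(u) = x(u - \tau(u))$; with $\tau(u) = \epsilon$ and $C \subset \mathbb{R}^2$, $\|1_C - L_\tau 1_C\| = 2$. [Figure: two images $x$ and $y$ with $\|x - y\|^2 = 2$.] The variance is high, and the bias is difficult to estimate. There are also few available samples. How to handle that?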
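A small numerical sketch of the two claims, under assumptions that are not in the slides: the dimension, the sample count and the seed are arbitrary, and the deformation is replaced by a one-sample cyclic shift of a discrete signal. It checks that the total variance of a standard Gaussian grows like $n$, and that two unit-norm signals differing by a tiny shift can be at squared distance 2.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
n, n_samples = 1000, 200        # illustrative sizes, not taken from the slides

# Draw X ~ N(0, I_n): each row is one sample in R^n.
X = rng.standard_normal((n_samples, n))
sq_norms = np.sum(X ** 2, axis=1)

# E||X||^2 = n, so the total variance grows linearly with the dimension.
mean_sq_norm = sq_norms.mean()

# A unit-norm signal and its translation by one sample.
x = np.zeros(n)
x[10] = 1.0
y = np.roll(x, 1)               # y(u) = x(u - 1)

# x and y are orthogonal unit vectors, hence ||x - y||^2 = 2.
dist_sq = np.sum((x - y) ** 2)
```

The shift moves the signal by one sample out of a thousand, yet the Euclidean distance is as large as it can be between two unit-norm signals with non-negative entries.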

5 A metric for natural images? Claim: images do not lie on a low-dimensional manifold: an image is only an element of $L^2(\mathbb{R}^2)$. Two kinds of variabilities $L$: geometric, $L \in A(E)$, and occlusion. Fix the metric: how to design an appropriate $\Phi$? Relative to the variability $L$: $\|\Phi L x - \Phi x\| \leq \|L\|$. Relative to the class: intra-class, $y_i = y_j \Rightarrow \|\Phi x_i - \Phi x_j\| \ll 1$; extra-class, $y_i \neq y_j \Rightarrow \|\Phi x_i - \Phi x_j\| \gg 1$. Observe that the final dimensionality reduction is extreme: $n \approx 10^6$, $|\mathcal{Y}| \approx 100$.

6 Learning: DeepNet, an engineered solution. Neural networks obtain state-of-the-art results on many datasets, yet they remain a pure black box. Engineers propose to use a cascade of operators $(F_j)_j$, each followed by a pointwise nonlinearity $f$, where each operator is learned (neural networks). [Figure: cascade $x \to f(F_1 \cdot) \to f(F_2 \cdot) \to \dots \to f(F_J \cdot) \to$ Classifier.] The parameters are learned from the samples with supervision. Ref.: Convolutional Networks and Applications in Vision, Y. LeCun et al.
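The cascade described on this slide can be sketched as follows. This is only an illustration: the operators are random matrices standing in for learned ones, the nonlinearity is assumed to be a ReLU, and the layer sizes and the names `cascade` and `relu` are hypothetical.

```python
import numpy as np

def relu(z):
    """Pointwise nonlinearity f (assumed here to be a ReLU)."""
    return np.maximum(z, 0.0)

def cascade(x, operators, f=relu):
    """Compute f(F_J ... f(F_2 f(F_1 x))) for a list of matrices F_1, ..., F_J."""
    for F in operators:
        x = f(F @ x)
    return x

rng = np.random.default_rng(0)
dims = [16, 12, 8, 4]  # hypothetical layer sizes

# Random matrices stand in for the operators that a network would learn.
operators = [rng.standard_normal((dims[j + 1], dims[j])) for j in range(3)]

x = rng.standard_normal(dims[0])
features = cascade(x, operators)  # input of the final classifier
```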

7 Dimensionality reduction. Two interpretable mechanisms seem to be implemented empirically. Linearization: only a linear projection is then required to handle the variability completely. Theorem: $\sup_{x \neq y} \frac{\|\Phi x - \Phi y\|}{\|x - y\|} < \infty \Rightarrow \exists\, \partial\Phi_x$ such that $\Phi(x + h) \approx \Phi x + \partial\Phi_x h$. Invariance: an averaging along the orbit $\mathcal{L}$ of a variability, $\sum_{L \in \mathcal{L}} \Phi L x$; however, we lose in discriminability. How could we take advantage of these strategies?
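A minimal sketch of invariance by orbit averaging, assuming the variability is the group of cyclic translations of a short discrete signal. The helper names and the two feature maps are hypothetical. It illustrates both halves of the statement: the average is invariant, and with a linear feature map two different signals become indistinguishable.

```python
def shift(x, k):
    """Cyclic translation: (g_k.x)[u] = x[u - k]."""
    n = len(x)
    return [x[(u - k) % n] for u in range(n)]

def orbit_average(x, phi):
    """Average phi over the orbit {g_k.x} of the cyclic translation group."""
    n = len(x)
    images = [phi(shift(x, k)) for k in range(n)]
    return [sum(column) / n for column in zip(*images)]

def identity(x):
    """Linear feature map: no processing before the averaging."""
    return list(x)

def modulus_of_differences(x):
    """Nonlinear feature map: modulus of a finite difference."""
    return [abs(x[u] - x[u - 1]) for u in range(len(x))]

x = [1, 0, 0, 3]
y = [2, 2, 0, 0]  # a different signal with the same mean as x

# Invariance: translating x does not change the averaged features.
invariant = orbit_average(x, identity) == orbit_average(shift(x, 1), identity)

# Loss of discriminability: x and y collapse to the same average.
collapsed = orbit_average(x, identity) == orbit_average(y, identity)

# A nonlinearity applied before the averaging keeps x and y apart.
separated = (orbit_average(x, modulus_of_differences)
             != orbit_average(y, modulus_of_differences))
```

Averaging the raw signal only keeps its mean, which is why some processing before the averaging is needed to retain information.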

8 Geometric variability modelled as Lie groups. The affine group $A(E)$ is a good approximation of 3D variability, acting smoothly on images: $\forall x \in L^2(\mathbb{R}^2), \forall g \in \mathbb{R}^2 \rtimes SO_2(\mathbb{R}) \times \mathbb{R}^+$, $g.x(u) = (v, r_\theta, s).x(u) = x\left(\frac{r_{-\theta}(u - v)}{s}\right)$. A Lie group $G$ can be equipped with a Haar measure, which is invariant by $G$: $\forall \tilde g, \forall A \in \mathcal{B}(G), \int_G 1_A(g)\,dg = \int_G 1_A(\tilde g.g)\,dg$. Invariance can be achieved via an averaging: $\int_G L_{\tilde g} f(g)\,dg = \int_G f(g)\,dg$. Yet how do we recover the nonzero frequencies?
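The action of a translation, rotation and dilation on an image can be sketched on a function of two variables. The function `act` and the test image `bump` are hypothetical names introduced for this illustration, and the image is treated as a continuous function rather than a pixel array.

```python
import math

def act(x, v, theta, s):
    """Return g.x for g = (v, r_theta, s): (g.x)(u) = x(r_{-theta}(u - v) / s)."""
    c, d = math.cos(theta), math.sin(theta)

    def gx(u):
        du0, du1 = u[0] - v[0], u[1] - v[1]
        # r_{-theta} is the rotation by the angle -theta.
        w0 = (c * du0 + d * du1) / s
        w1 = (-d * du0 + c * du1) / s
        return x((w0, w1))

    return gx

def bump(u):
    """Hypothetical test image: a Gaussian bump whose peak is at (1, 0)."""
    return math.exp(-((u[0] - 1.0) ** 2 + u[1] ** 2))

# Translate by v = (3, 4), rotate by pi/2 and dilate by 2.
g_bump = act(bump, v=(3.0, 4.0), theta=math.pi / 2, s=2.0)

# The peak moves from (1, 0) to v + s * r_theta (1, 0) = (3, 6).
peak_value = g_bump((3.0, 6.0))
```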

9 Wavelet Transform on Lie groups. A wavelet is defined by $\psi \in L^2(G)$, $\hat\psi(e) = 0$, and can be dilated via $\psi_\lambda = L_\lambda \psi$. Theorem: let $G$ be a compact Lie group; for an appropriate mother wavelet $\psi$, $Wx = \{\int_G x, \; x \star_G \psi_\lambda\}_\lambda$ is an isometry and is covariant with the action of $G$. Proposition: $W$ almost commutes with deformations but is not invariant to translation: $\|[W, L_\tau]\| \leq C \|\nabla\tau\|$. Ex.: in the case of images, a wavelet is obtained via a mother wavelet dilated and rotated: $\psi_{j,\theta}(u) = \frac{1}{2^{2j}} \psi\left(r_{-\theta}\frac{u}{2^j}\right)$. Littlewood-Paley for images: $\sum_\lambda |\hat\psi_\lambda(\omega)|^2 \approx 1$. Ref.: Stein, E. M. Topics in harmonic analysis related to the Littlewood-Paley theory. Ref.: Group Invariant Scattering, Mallat S.
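For images, the dilated and rotated wavelets can be sampled on a grid as in the sketch below. The mother wavelet is an assumption: a Morlet-like wavelet with hypothetical parameters `xi` and `sigma`, chosen only because it has zero mean. The grid size, the scales and the four orientations are also illustrative.

```python
import numpy as np

def mother_wavelet(u0, u1, xi=3 * np.pi / 4, sigma=0.8):
    """Morlet-like wavelet (an assumed choice): a Gaussian window times an
    oscillation along the first axis, minus a constant so the mean is zero."""
    envelope = np.exp(-(u0 ** 2 + u1 ** 2) / (2 * sigma ** 2))
    kappa = np.exp(-(sigma * xi) ** 2 / 2)  # zero-mean correction
    return envelope * (np.exp(1j * xi * u0) - kappa)

def wavelet(j, theta, size=64):
    """Sample psi_{j,theta}(u) = 2^{-2j} psi(r_{-theta} u / 2^j) on a grid."""
    grid = np.arange(size) - size // 2
    u0, u1 = np.meshgrid(grid, grid, indexing="ij")
    c, s = np.cos(theta), np.sin(theta)
    # Apply the rotation r_{-theta}, then divide by the scale 2^j.
    w0 = (c * u0 + s * u1) / 2 ** j
    w1 = (-s * u0 + c * u1) / 2 ** j
    return mother_wavelet(w0, w1) / 2 ** (2 * j)

# Three scales and four orientations, indexed by (j, k) with theta = k * pi / 4.
bank = {(j, k): wavelet(j, np.pi * k / 4) for j in (1, 2, 3) for k in range(4)}
```

The factor $2^{-2j}$ keeps the $L^1$ norm of the wavelet roughly constant across scales, so every filter in the bank has a comparable gain.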

10 The Roto-translation Scattering Transform. A Scattering Transform is a cascade of complex wavelet transforms and modulus, followed by a final averaging: $Sx = \{\int_{G_1} x, \; \int_{G_2} |x \star_{G_1} \psi_{\lambda_1}|, \; \int_{G_3} ||x \star_{G_1} \psi_{\lambda_1}| \star_{G_2} \psi_{\lambda_2}|, \dots\}$. Ex.: $G_1 = \mathbb{R}^2$, $G_2 = \mathbb{R}^2$ or $\mathbb{R}^2 \rtimes SO_2(\mathbb{R})$. Theorem: $S$ can be reformulated as a neural network. Theorem: $S$ is invariant and linearizes deformations. Issue: the averaging can absorb all the variability necessary for discrimination. Ref.: Invariant Scattering Convolution Networks, J. Bruna and S. Mallat. Ref.: Group Invariant Scattering, Mallat S.
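The cascade of wavelet convolution, modulus and averaging can be sketched in one dimension. This is a simplified translation scattering on cyclic signals, not the roto-translation transform of the slides. The Gaussian band-pass filters, the restriction of the second order to coarser scales and the global average are all assumptions made for brevity.

```python
import numpy as np

def filter_bank(n, num_scales):
    """Assumed filters, defined in the Fourier domain: one Gaussian band-pass
    per scale, kept on positive frequencies so that the wavelets are complex."""
    omega = np.fft.fftfreq(n)  # frequencies in cycles per sample
    psis = []
    for j in range(num_scales):
        centre = 0.25 / 2 ** j
        bump = np.exp(-((omega - centre) ** 2) / (2 * (centre / 2) ** 2))
        psis.append(bump * (omega > 0))
    return psis

def conv_modulus(z, psi_hat):
    """|z * psi|, with the cyclic convolution computed through the FFT."""
    return np.abs(np.fft.ifft(np.fft.fft(z) * psi_hat))

def scattering(x, psis):
    """Orders 0, 1 and 2 of S x, using a global average as final averaging."""
    coeffs = [x.mean()]
    for j1, psi1 in enumerate(psis):
        u1 = conv_modulus(x, psi1)              # |x * psi_{j1}|
        coeffs.append(u1.mean())
        for psi2 in psis[j1 + 1:]:              # coarser scales only
            coeffs.append(conv_modulus(u1, psi2).mean())
    return np.array(coeffs)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
psis = filter_bank(256, num_scales=3)

S_x = scattering(x, psis)
S_shifted = scattering(np.roll(x, 17), psis)  # same signal, translated
```

Because every step commutes with cyclic translations and the last step is a global average, `S_x` and `S_shifted` agree up to floating-point error.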

11 Selecting variability. Discovering more complex groups is necessary to build more complex invariants: $\mathbb{R}^2 \hookrightarrow \mathbb{R}^2 \rtimes SO_2(\mathbb{R}) \hookrightarrow \dots$ The metrics defined by translation and roto-translation are not equivalent (recombining angles increases discriminability). Claim: too much group invariance reduces the discriminative power; some invariance is likely to be selected. [Figure: two images compared; Translation Scattering does not see the difference.] How competitive are we with learned representations?

12 How much learning is really required?

Dataset      Type          Paper           Accuracy (%)
Caltech101   Scattering                    79.9
Caltech101   Unsupervised  Ask the locals  77.3
Caltech101   Supervised    DeepNet         91.4
CIFAR100     Scattering                    56.8
CIFAR100     Unsupervised  RFL             54.2
CIFAR100     Supervised    DeepNet         65.4

Caltech101: $10^4$ images, 101 classes, $256 \times 256$ color images. The representation is identical for both datasets. Group representations are competitive with representations learned from data without labels. Ref.: Deep Roto-Translation Scattering for Object Classification, E. Oyallon and S. Mallat.

13 Explaining the dimensionality reduction in Neural Networks. Understanding neural networks is a problem of both exploration and experimentation. We need to measure variability reduction, with nonlinear and linear criteria: sparsity as a measure, $\|x\|_0 = \#\{i, x_i \neq 0\}$; rank of the operators; linear separability; low dimensionality; low co-dimensionality. The origin of dimension reduction could be either learning (contextual) or the architecture (structural); the correct explanation is probably a mix.
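Two of these measures are easy to compute directly. The sketch below uses a hypothetical feature vector and a hypothetical rank-one operator; `numpy.linalg.matrix_rank` estimates the rank from the singular values.

```python
import numpy as np

def l0_norm(x):
    """Sparsity measure ||x||_0 = #{i : x_i != 0}."""
    return int(np.count_nonzero(x))

# Hypothetical feature vector with two active coefficients.
x = np.array([0.0, 1.5, 0.0, -2.0, 0.0])
sparsity = l0_norm(x)

# Hypothetical operator built as an outer product, hence of rank one.
F = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0, 2.0])
rank = int(np.linalg.matrix_rank(F))
```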

14 Linearization: strategy. A good measure is the ability of a linear classifier (SVM) to classify correctly. In a regular architecture, as we get deeper, more and more linearization occurs. [Figure: accuracy of a linear SVM applied after each operator $F_1$, $F_2$, $F_3$, $F_4$.] Ref.: Support-vector networks, Cortes C. & Vapnik V. What is the nature of the linearized variabilities? Could they lead to a complex model on natural images?
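The idea of probing each depth with a linear classifier can be sketched on a toy problem. Two simplifications are assumed here: the linear SVM is replaced by a least-squares linear classifier, to keep the example dependency-free, and the "network" is a single hand-written ReLU layer on a one-dimensional input. All names are hypothetical.

```python
import numpy as np

def linear_probe_accuracy(features, labels):
    """Fit a least-squares linear classifier (with bias) and return its
    training accuracy. This is a stand-in for the linear SVM of the slide."""
    A = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return float(np.mean(np.sign(A @ w) == labels))

# Toy 1-D problem: class +1 if |x| > 0.5, class -1 otherwise.
x = (np.arange(400) + 0.5) / 200 - 1
labels = np.where(np.abs(x) > 0.5, 1.0, -1.0)

# Depth 0 is the raw input; depth 1 is one ReLU layer with weights +1 and -1.
layer0 = x[:, None]
layer1 = np.hstack([np.maximum(layer0, 0), np.maximum(-layer0, 0)])

acc0 = linear_probe_accuracy(layer0, labels)
acc1 = linear_probe_accuracy(layer1, labels)
```

On this toy problem no linear classifier on the raw input can exceed 75% accuracy, because the positive class lies on both sides of the negative one, whereas the probe separates the classes perfectly after the ReLU layer.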

15 Fill in the performance gap. We used small datasets (fewer than $5 \cdot 10^4$ images, small or big) to perform our numerical experiments. Challenge: ImageNet ($10^6$ large images). It requires the use of GPUs and lots of engineering. Typical computations take around 1 s per input. But there is also a lot of maths to discover!

16 Conclusion. We propose tools to analyse deep networks and to reduce variabilities in high-dimensional problems. State-of-the-art results with unsupervised learning. We provide software (or send me an email to get the latest!). Thanks to the DIM of the Île-de-France region!

On the road to Affine Scattering Transform

1 On the road to Affine Scattering Transform. Edouard Oyallon, with Stéphane Mallat, following the work of Laurent Sifre, Joan Bruna, High-dimensional classification: $(x_i, y_i) \in \mathbb{R}^{512^2} \times \{1, \dots, 100\}$, $i = 1 \dots 10$