Harmonic Analysis of Deep Convolutional Neural Networks


1 Harmonic Analysis of Deep Convolutional Neural Networks Helmut Bölcskei Department of Information Technology and Electrical Engineering October 2017 joint work with Thomas Wiatowski and Philipp Grohs

2 ImageNet

3 ImageNet ski rock plant coffee

4 ImageNet ski rock plant coffee CNNs win the ImageNet 2015 challenge [He et al., 2015 ]

5 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015 ]

6 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015 ] Carlos.

7 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015 ] Carlos Kleiber.

8 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015 ] Carlos Kleiber conducting the.

9 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015] Carlos Kleiber conducting the Vienna Philharmonic's.

10 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015] Carlos Kleiber conducting the Vienna Philharmonic's New Year's Concert.

11 Describing the content of an image CNNs generate sentences describing the content of an image [Vinyals et al., 2015] Carlos Kleiber conducting the Vienna Philharmonic's New Year's Concert 1989.

12 Feature extraction and classification: input $f$ → non-linear feature extraction → feature vector $\Phi(f)$ → linear classifier → output: $\langle w, \Phi(f)\rangle > 0$ or $\langle w, \Phi(f)\rangle < 0$. [Slide shows portraits of Shannon and von Neumann.]

13 Why non-linear feature extractors? Task: Separate two categories of data through a linear classifier: $\langle w, f\rangle > 0$ vs. $\langle w, f\rangle < 0$. [Figure: two classes separated radially in the plane.]

14 Why non-linear feature extractors? Task: Separate two categories of data through a linear classifier: $\langle w, f\rangle > 0$ vs. $\langle w, f\rangle < 0$. Not possible!

15 Why non-linear feature extractors? Task: Separate two categories of data through a linear classifier. With the feature map $\Phi(f) = \begin{bmatrix} |f| \\ 1 \end{bmatrix}$, separation according to $\langle w, \Phi(f)\rangle > 0$ vs. $\langle w, \Phi(f)\rangle < 0$ becomes possible with a suitable $w$.

16 Why non-linear feature extractors? Task: Separate two categories of data through a linear classifier. $\Phi(f) = \begin{bmatrix} |f| \\ 1 \end{bmatrix}$ is invariant to the angular component of the data.

17 Why non-linear feature extractors? Task: Separate two categories of data through a linear classifier. $\Phi(f) = \begin{bmatrix} |f| \\ 1 \end{bmatrix}$ is invariant to the angular component of the data. Linear separability in feature space!
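A minimal numerical illustration of this lift (my own sketch, not from the slides): two classes that differ only in radius are not linearly separable in the plane, but become linearly separable after the feature map $\Phi(f) = [|f|, 1]^\top$. The data, the radius threshold, and the choice $w = [1, -1]^\top$ are assumptions made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes separated only by their radius (angular position is random).
n = 200
inner = rng.uniform(0.0, 0.8, n)          # radii of class -1
outer = rng.uniform(1.2, 2.0, n)          # radii of class +1
theta = rng.uniform(0.0, 2 * np.pi, 2 * n)
radii = np.concatenate([inner, outer])
X = np.stack([radii * np.cos(theta), radii * np.sin(theta)], axis=1)
y = np.concatenate([-np.ones(n), np.ones(n)])

# Feature map Phi(f) = [|f|, 1]: keep only the radial component.
Phi = np.stack([np.linalg.norm(X, axis=1), np.ones(2 * n)], axis=1)

# My own choice of weights: w = [1, -1] thresholds the radius at 1.
w = np.array([1.0, -1.0])
pred = np.sign(Phi @ w)
print("accuracy of the linear classifier in feature space:", (pred == y).mean())
```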

18 Translation invariance Handwritten digits from the MNIST database [LeCun & Cortes, 1998] Feature vector should be invariant to spatial location ⇒ translation invariance

19 Deformation insensitivity Feature vector should be independent of cameras (of different resolutions), and insensitive to small acquisition jitters

20 Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015]) [Figure: network tree of feature maps obtained by iterating convolutions and modulus-type non-linearities, e.g. $\big||f * g_{\lambda_1}^{(k)}| * g_{\lambda_2}^{(l)}\big|$, with the input $f$ at the root.]

21 Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015]) [Figure: the same network tree, now with output-generating filters $\chi_1, \chi_2, \chi_3$ attached to the feature maps in each layer.]

22 Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015]) [Figure: the network tree with output-generating filters $\chi_n$; the outputs of all layers are collected into the feature vector $\Phi(f)$.] General scattering networks guarantee [Wiatowski & HB, 2015]
- (vertical) translation invariance
- small deformation sensitivity
essentially irrespective of filters, non-linearities, and poolings!

23 Building blocks Basic operations in the n-th network layer: convolution with the filters $g_{\lambda_n}$, followed by a non-linearity and a pooling operation. Filters: Semi-discrete frame $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ satisfying
$$A_n \|f\|_2^2 \;\le\; \|f * \chi_n\|_2^2 + \sum_{\lambda_n \in \Lambda_n} \|f * g_{\lambda_n}\|_2^2 \;\le\; B_n \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d).$$

24 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Filters: Semi-discrete frame $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ satisfying the frame condition above. e.g.: Structured filters

25 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Filters: Semi-discrete frame $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ satisfying the frame condition above. e.g.: Unstructured filters

26 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Filters: Semi-discrete frame $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ satisfying the frame condition above. e.g.: Learned filters

27 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Non-linearities: Point-wise and Lipschitz-continuous,
$$\|M_n(f) - M_n(h)\|_2 \;\le\; L_n \|f - h\|_2, \quad \forall f, h \in L^2(\mathbb{R}^d).$$

28 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Non-linearities: Point-wise and Lipschitz-continuous,
$$\|M_n(f) - M_n(h)\|_2 \;\le\; L_n \|f - h\|_2, \quad \forall f, h \in L^2(\mathbb{R}^d).$$
Satisfied by virtually all non-linearities used in the deep learning literature! ReLU: $L_n = 1$; modulus: $L_n = 1$; logistic sigmoid: $L_n = \tfrac{1}{4}$; ...

29 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Pooling: In continuous time according to $f \mapsto S_n^{d/2}\, P_n(f)(S_n \cdot)$, where $S_n \ge 1$ is the pooling factor and $P_n : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$ is $R_n$-Lipschitz-continuous.

30 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Pooling: In continuous time according to $f \mapsto S_n^{d/2}\, P_n(f)(S_n \cdot)$, as above. Emulates most poolings used in the deep learning literature! e.g.: Pooling by sub-sampling, $P_n(f) = f$ with $R_n = 1$.

31 Building blocks Basic operations in the n-th network layer: convolution, non-linearity, pooling. Pooling: In continuous time according to $f \mapsto S_n^{d/2}\, P_n(f)(S_n \cdot)$, as above. Emulates most poolings used in the deep learning literature! e.g.: Pooling by averaging, $P_n(f) = f * \phi_n$ with $R_n = \|\phi_n\|_1$.
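To make the three building blocks concrete, here is a minimal discrete 1-D sketch of one layer (my own construction: the modulated-Gaussian band-pass filters are arbitrary illustrative choices, not the semi-discrete frames of the talk): convolve with a small filter bank, apply a pointwise non-linearity, then pool by averaging and sub-sampling with factor S.

```python
import numpy as np

def layer(f, filters, S=2, nonlin=np.abs):
    """One network layer: filter bank -> pointwise non-linearity -> pooling.

    The pooling step emulates f -> S^(1/2) * P(f)(S.) with P chosen as
    averaging over a window of length S, followed by sub-sampling by S.
    """
    outputs = []
    for g in filters:
        u = np.convolve(f, g, mode="same")   # convolution with filter g_lambda
        u = nonlin(u)                        # Lipschitz pointwise non-linearity
        u = u.reshape(-1, S).mean(axis=1)    # averaging + sub-sampling by S
        outputs.append(np.sqrt(S) * u)       # normalization factor S^(1/2)
    return outputs

# Toy band-pass filters (modulated Gaussians); purely illustrative choices.
t = np.arange(-8, 9)
filters = [np.cos(0.5 * k * t) * np.exp(-t**2 / 8.0) for k in (1, 2, 3)]

f = np.sin(2 * np.pi * 0.05 * np.arange(256)) \
    + 0.1 * np.random.default_rng(1).standard_normal(256)
feature_maps = layer(f, filters, S=2)
print([fm.shape for fm in feature_maps])
```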

32 Vertical translation invariance Theorem (Wiatowski and HB, 2015): Assume that the filters, non-linearities, and poolings satisfy $B_n \le \min\{1, L_n^{-2} R_n^{-2}\}$ for all $n \in \mathbb{N}$, and let the pooling factors satisfy $S_n \ge 1$, $n \in \mathbb{N}$. Then
$$\|\Phi^n(T_t f) - \Phi^n(f)\| = \mathcal{O}\!\left(\frac{\|t\|}{S_1 \cdots S_n}\right),$$
for all $f \in L^2(\mathbb{R}^d)$, $t \in \mathbb{R}^d$, $n \in \mathbb{N}$.

33 Vertical translation invariance Theorem (Wiatowski and HB, 2015), as above. Features become more invariant with increasing network depth!

34 Vertical translation invariance Theorem (Wiatowski and HB, 2015), as above. Full translation invariance: If $\lim_{n \to \infty} S_1 S_2 \cdots S_n = \infty$, then $\lim_{n \to \infty} \|\Phi^n(T_t f) - \Phi^n(f)\| = 0$.

35 Vertical translation invariance Theorem (Wiatowski and HB, 2015), as above. The condition $B_n \le \min\{1, L_n^{-2} R_n^{-2}\}$, $n \in \mathbb{N}$, is easily satisfied by normalizing the filters $\{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$.

36 Vertical translation invariance Theorem (Wiatowski and HB, 2015), as above. Applies to general filters, non-linearities, and poolings.

37 Philosophy behind invariance results Mallat's horizontal translation invariance [Mallat, 2012]:
$$\lim_{J \to \infty} \|\Phi_W(T_t f) - \Phi_W(f)\| = 0, \quad \forall f \in L^2(\mathbb{R}^d),\ t \in \mathbb{R}^d.$$
Vertical translation invariance:
$$\lim_{n \to \infty} \|\Phi^n(T_t f) - \Phi^n(f)\| = 0, \quad \forall f \in L^2(\mathbb{R}^d),\ t \in \mathbb{R}^d.$$

38 Philosophy behind invariance results Mallat's horizontal translation invariance [Mallat, 2012], as above:
- features become invariant in every network layer, but needs $J \to \infty$
Vertical translation invariance, as above:
- features become more invariant with increasing network depth

39 Philosophy behind invariance results Mallat's horizontal translation invariance [Mallat, 2012], as above:
- features become invariant in every network layer, but needs $J \to \infty$
- applies to wavelet transform and modulus non-linearity without pooling
Vertical translation invariance, as above:
- features become more invariant with increasing network depth
- applies to general filters, general non-linearities, and general poolings

40 Non-linear deformations Non-linear deformation $(F_\tau f)(x) = f(x - \tau(x))$, where $\tau : \mathbb{R}^d \to \mathbb{R}^d$. For small $\tau$: [Figure: mildly deformed signal.]

41 Non-linear deformations Non-linear deformation $(F_\tau f)(x) = f(x - \tau(x))$, where $\tau : \mathbb{R}^d \to \mathbb{R}^d$. For large $\tau$: [Figure: strongly deformed signal.]

42 Deformation sensitivity for signal classes Consider $(F_\tau f)(x) = f(x - \tau(x)) = f(x - e^{-x^2})$. [Figure: $f_1(x)$ and $(F_\tau f_1)(x)$; $f_2(x)$ and $(F_\tau f_2)(x)$.] For a given $\tau$, the amount of deformation induced can depend drastically on $f \in L^2(\mathbb{R}^d)$.
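A quick numerical check of this point (my own sketch; the smooth and oscillatory test signals below are arbitrary stand-ins for $f_1$, $f_2$): apply the same deformation $\tau(x) = e^{-x^2}$ to both and compare the relative change in L² norm.

```python
import numpy as np

x = np.linspace(-5, 5, 4000)
dx = x[1] - x[0]
tau = np.exp(-x**2)                     # the deformation tau(x) = exp(-x^2)

def deform(func):
    return func(x - tau)                # (F_tau f)(x) = f(x - tau(x))

l2 = lambda v: np.sqrt(np.sum(np.abs(v)**2) * dx)

f1 = lambda z: np.exp(-z**2 / 4.0)                      # smooth, slowly varying
f2 = lambda z: np.cos(40 * z) * np.exp(-z**2 / 4.0)     # highly oscillatory

for name, f in (("smooth f1", f1), ("oscillatory f2", f2)):
    err = l2(deform(f) - f(x)) / l2(f(x))
    print(f"{name}: relative deformation error {err:.3f}")
```

The same $\tau$ barely moves the smooth signal in L² norm while changing the oscillatory one substantially, which is exactly the sensitivity gap the slide illustrates.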

43 Philosophy behind deformation stability/sensitivity bounds Mallat's deformation stability bound [Mallat, 2012]:
$$\|\Phi_W(F_\tau f) - \Phi_W(f)\| \le C\big(2^{-J}\|\tau\|_\infty + J\|D\tau\|_\infty + \|D^2\tau\|_\infty\big)\|f\|_W, \quad \forall f \in H_W \subseteq L^2(\mathbb{R}^d).$$
- The signal class $H_W$ and the corresponding norm $\|\cdot\|_W$ depend on the mother wavelet (and hence the network).
Our deformation sensitivity bound:
$$\|\Phi(F_\tau f) - \Phi(f)\| \le C_{\mathcal{C}}\,\|\tau\|_\infty^{\alpha}, \quad \forall f \in \mathcal{C} \subseteq L^2(\mathbb{R}^d).$$
- The signal class $\mathcal{C}$ (band-limited functions, cartoon functions, or Lipschitz functions) is independent of the network.

44 Philosophy behind deformation stability/sensitivity bounds Mallat's deformation stability bound [Mallat, 2012], as above:
- Signal class description complexity implicit via the norm $\|\cdot\|_W$.
Our deformation sensitivity bound, as above:
- Signal class description complexity explicit via $C_{\mathcal{C}}$:
  - $L$-band-limited functions: $C_{\mathcal{C}} = \mathcal{O}(L)$
  - cartoon functions of size $K$: $C_{\mathcal{C}} = \mathcal{O}(K^{3/2})$
  - $M$-Lipschitz functions: $C_{\mathcal{C}} = \mathcal{O}(M)$

45 Philosophy behind deformation stability/sensitivity bounds Mallat's deformation stability bound and our deformation sensitivity bound, as above:
- The decay rate $\alpha > 0$ of the deformation error is signal-class-specific (band-limited functions: $\alpha = 1$, cartoon functions: $\alpha = \tfrac{1}{2}$, Lipschitz functions: $\alpha = 1$).

46 Philosophy behind deformation stability/sensitivity bounds Mallat's deformation stability bound [Mallat, 2012], as above:
- The bound depends explicitly on higher-order derivatives of $\tau$.
Our deformation sensitivity bound, as above:
- The bound depends implicitly on the derivative of $\tau$ via the condition $\|D\tau\|_\infty \le \frac{1}{2d}$.

47 Philosophy behind deformation stability/sensitivity bounds Mallat's deformation stability bound [Mallat, 2012], as above:
- The bound is coupled to horizontal translation invariance $\lim_{J \to \infty} \|\Phi_W(T_t f) - \Phi_W(f)\| = 0$, $\forall f \in L^2(\mathbb{R}^d)$, $t \in \mathbb{R}^d$.
Our deformation sensitivity bound, as above:
- The bound is decoupled from vertical translation invariance $\lim_{n \to \infty} \|\Phi^n(T_t f) - \Phi^n(f)\| = 0$, $\forall f \in L^2(\mathbb{R}^d)$, $t \in \mathbb{R}^d$.

48 CNNs in a nutshell CNNs used in practice employ potentially hundreds of layers and 10,000s of nodes!

49 CNNs in a nutshell CNNs used in practice employ potentially hundreds of layers and 10,000s of nodes! e.g.: Winner of the ImageNet 2015 challenge [He et al., 2015]
- Network depth: 152 layers
- average # of nodes per layer:
- # of FLOPS for a single forward pass: 11.3 billion

50 CNNs in a nutshell CNNs used in practice employ potentially hundreds of layers and 10,000s of nodes! e.g.: Winner of the ImageNet 2015 challenge [He et al., 2015]
- Network depth: 152 layers
- average # of nodes per layer:
- # of FLOPS for a single forward pass: 11.3 billion
Such depths (and breadths) pose formidable computational challenges in training and operating the network!

51 Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers

52 Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers Guarantee trivial null-space for feature extractor Φ

53 Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers Guarantee trivial null-space for feature extractor Φ Specify the number of layers needed to have most of the input signal energy be contained in the feature vector

54 Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers Guarantee trivial null-space for feature extractor Φ Specify the number of layers needed to have most of the input signal energy be contained in the feature vector For a fixed (possibly small) depth, design CNNs that capture most of the input signal energy

55 Building blocks Basic operations in the n-th network layer: convolution with the filters $g_{\lambda_n}$, modulus non-linearity, and sub-sampling by $S$. Filters: Semi-discrete frame $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$. Non-linearity: Modulus. Pooling: Sub-sampling with pooling factor $S \ge 1$.

56 Demodulation effect of modulus non-linearity Components of the feature vector are given by $|f * g_{\lambda_n}| * \chi_{n+1}$. [Figure: spectra $\widehat{g}_{\lambda_n}(\omega)$, $\widehat{\chi}_{n+1}(\omega)$, and $\widehat{f}(\omega)$ on a common frequency axis.]

57 Demodulation effect of modulus non-linearity Components of the feature vector are given by $|f * g_{\lambda_n}| * \chi_{n+1}$. [Figure: in addition, the spectrum of $f * g_{\lambda_n}$, concentrated on the support of $\widehat{g}_{\lambda_n}$.]

58 Demodulation effect of modulus non-linearity Components of the feature vector are given by $|f * g_{\lambda_n}| * \chi_{n+1}$. Modulus squared: $|f * g_{\lambda_n}(x)|^2$. [Figure: corresponding spectra.]

59 Demodulation effect of modulus non-linearity Components of the feature vector are given by $|f * g_{\lambda_n}| * \chi_{n+1}$. [Figure: the spectrum of $|f * g_{\lambda_n}|$ is concentrated at low frequencies, where it is extracted into $\Phi(f)$ via $\chi_{n+1}$.]

60 Do all non-linearities demodulate? High-pass filtered signal: $\mathcal{F}(f * g_\lambda)$ supported in a band of width $2R$ away from the origin. [Figure: high-pass spectrum.]

61 Do all non-linearities demodulate? High-pass filtered signal: $\mathcal{F}(f * g_\lambda)$ as above. Modulus: Yes! $\mathcal{F}(|f * g_\lambda|)$ is concentrated in $[-2R, 2R]$ ... but has (small) tails!

62 Do all non-linearities demodulate? High-pass filtered signal: $\mathcal{F}(f * g_\lambda)$ as above. Modulus squared: Yes, and sharply so! $\mathcal{F}(|f * g_\lambda|^2)$ is supported in $[-2R, 2R]$ ... but the squared modulus is not Lipschitz-continuous!

63 Do all non-linearities demodulate? High-pass filtered signal: $\mathcal{F}(f * g_\lambda)$ as above. Rectified linear unit: No! $\mathcal{F}(\mathrm{ReLU}(f * g_\lambda))$ retains the high-frequency components.
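The spectral pictures on these slides can be reproduced numerically. Below is a small FFT sketch of my own (a real-valued broadband test signal and a brick-wall band-pass filter; the slides' analyticity assumption is dropped for simplicity), comparing how much energy $f * g_\lambda$, $|f * g_\lambda|$, $|f * g_\lambda|^2$, and $\mathrm{ReLU}(f * g_\lambda)$ place at low frequencies. Only the qualitative ordering is meant to be illustrative.

```python
import numpy as np

N = 4096
rng = np.random.default_rng(0)
f = rng.standard_normal(N)                      # broadband test signal

# Ideal band-pass filtering in the frequency domain: keep |freq| in (0.20, 0.25).
freqs = np.fft.fftfreq(N)
band = (np.abs(freqs) > 0.20) & (np.abs(freqs) < 0.25)
u = np.fft.ifft(np.fft.fft(f) * band).real      # u plays the role of f * g_lambda

def lowfreq_energy_fraction(v, cutoff=0.05):
    """Fraction of spectral energy of v below the given frequency cutoff."""
    V = np.fft.fft(v)
    low = np.abs(freqs) < cutoff
    return np.sum(np.abs(V[low])**2) / np.sum(np.abs(V)**2)

for name, v in (("f*g", u), ("|f*g|", np.abs(u)),
                ("|f*g|^2", np.abs(u)**2), ("ReLU(f*g)", np.maximum(u, 0))):
    print(f"{name:10s} energy fraction below 0.05: {lowfreq_energy_fraction(v):.3f}")
```

The modulus (and, more sharply, the squared modulus) move a large share of the energy toward the baseband, whereas ReLU keeps a substantial part in the original high-frequency band, matching the "No!" on this slide.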

64 First goal: Quantify feature map energy decay [Figure: the scattering network tree with output-generating filters $\chi_1, \chi_2, \chi_3$; $W_1(f)$ denotes the total energy of the propagated signals $|f * g_{\lambda_1}|$ in the first layer, $W_2(f)$ the total energy of the propagated signals in the second layer.]

65 Assumptions (on the filters) i) Analyticity: For every filter $g_{\lambda_n}$ there exists a (not necessarily canonical) orthant $H_{\lambda_n} \subseteq \mathbb{R}^d$ such that $\mathrm{supp}(\widehat{g}_{\lambda_n}) \subseteq H_{\lambda_n}$. ii) High-pass: There exists $\delta > 0$ such that $\sum_{\lambda_n \in \Lambda_n} |\widehat{g}_{\lambda_n}(\omega)|^2 = 0$ for a.e. $\omega \in B_\delta(0)$.

66 Assumptions (on the filters) i) Analyticity and ii) High-pass, as above. Comprises various constructions of WH filters, wavelets, ridgelets, ($\alpha$)-curvelets, shearlets. [Figure: frequency-plane tiling of analytic band-limited curvelets.]

67 Input signal classes Sobolev functions of order $s \ge 0$:
$$H^s(\mathbb{R}^d) = \Big\{ f \in L^2(\mathbb{R}^d) \;\Big|\; \int_{\mathbb{R}^d} (1 + |\omega|^2)^s\, |\widehat{f}(\omega)|^2\, d\omega < \infty \Big\}$$

68 Input signal classes Sobolev functions of order $s \ge 0$, as above. $H^s(\mathbb{R}^d)$ contains a wide range of practically relevant signal classes.

69 Input signal classes $H^s(\mathbb{R}^d)$ contains a wide range of practically relevant signal classes:
- square-integrable functions $L^2(\mathbb{R}^d) = H^0(\mathbb{R}^d)$

70 Input signal classes $H^s(\mathbb{R}^d)$ contains a wide range of practically relevant signal classes:
- square-integrable functions $L^2(\mathbb{R}^d) = H^0(\mathbb{R}^d)$
- $L$-band-limited functions $L^2_L(\mathbb{R}^d) \subseteq H^s(\mathbb{R}^d)$, $L > 0$, $s \ge 0$

71 Input signal classes $H^s(\mathbb{R}^d)$ contains a wide range of practically relevant signal classes:
- square-integrable functions $L^2(\mathbb{R}^d) = H^0(\mathbb{R}^d)$
- $L$-band-limited functions $L^2_L(\mathbb{R}^d) \subseteq H^s(\mathbb{R}^d)$, $L > 0$, $s \ge 0$
- cartoon functions [Donoho, 2001] $\mathcal{C}^{\mathrm{CART}} \subseteq H^s(\mathbb{R}^d)$, $s \in [0, \tfrac{1}{2})$
Handwritten digits from the MNIST database [LeCun & Cortes, 1998].
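For discrete data, the Sobolev-type quantity in this definition can be estimated directly from the FFT. The sketch below is my own crude Riemann-sum approximation on a periodic grid (toy image, arbitrary bump-plus-disc construction), purely to illustrate how the integrand $(1+|\omega|^2)^s |\widehat{f}(\omega)|^2$ weights high frequencies more heavily as $s$ grows.

```python
import numpy as np

def sobolev_energy(img, s):
    """Discrete stand-in for  int (1+|w|^2)^s |f_hat(w)|^2 dw  on a periodic grid."""
    F = np.fft.fft2(img) / img.size                  # normalized 2-D DFT
    wy = 2 * np.pi * np.fft.fftfreq(img.shape[0])    # angular frequency grid
    wx = 2 * np.pi * np.fft.fftfreq(img.shape[1])
    W2 = wy[:, None] ** 2 + wx[None, :] ** 2         # |w|^2 on the grid
    return np.sum((1.0 + W2) ** s * np.abs(F) ** 2)

# Toy "cartoon-like" image: a smooth bump plus a sharp disc (jump discontinuity).
y, x = np.mgrid[0:128, 0:128] / 128.0
img = np.exp(-((x - 0.3)**2 + (y - 0.3)**2) / 0.02) \
      + ((x - 0.7)**2 + (y - 0.7)**2 < 0.04)

for s in (0.0, 0.25, 0.5, 1.0):
    print(f"s = {s:4.2f}: Sobolev-type energy {sobolev_energy(img, s):.3e}")
```

On a finite grid every such sum is finite; the rapid growth with $s$ for the discontinuous part is what reflects, heuristically, that cartoon functions only belong to $H^s$ for $s < \tfrac{1}{2}$.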

72 Exponential energy decay Theorem: Let the filters be wavelets with mother wavelet satisfying $\mathrm{supp}(\widehat{\psi}) \subseteq [r^{-1}, r]$, $r > 1$, or Weyl-Heisenberg (WH) filters with prototype function satisfying $\mathrm{supp}(\widehat{g}) \subseteq [-R, R]$, $R > 0$. Then, for every $f \in H^s(\mathbb{R}^d)$, there exists $\beta > 0$ such that
$$W_n(f) = \mathcal{O}\Big(a^{-\frac{n(2s+\beta)}{2s+\beta+1}}\Big),$$
where $a = \frac{r^2+1}{r^2-1}$ in the wavelet case, and $a$ is the corresponding explicit factor determined by $R$ in the WH case.

73 Exponential energy decay Theorem, as above. The decay factor $a$ is explicit and can be tuned via $r$, $R$.

74 Exponential energy decay Exponential energy decay:
$$W_n(f) = \mathcal{O}\Big(a^{-\frac{n(2s+\beta)}{2s+\beta+1}}\Big)$$

75 Exponential energy decay Exponential energy decay, as above.
- $\beta > 0$ determines the decay of $\widehat{f}(\omega)$ (as $|\omega| \to \infty$) according to $|\widehat{f}(\omega)| \le \mu\,(1 + |\omega|^2)^{-(s + \frac{\beta}{4})}$ for $|\omega| \ge L$, for some $\mu > 0$, and $L$ acts as an effective bandwidth.

76 Exponential energy decay Exponential energy decay, as above.
- $\beta > 0$ determines the decay of $\widehat{f}(\omega)$, as above.
- smoother input signals (i.e., larger $s$) lead to faster energy decay

77 Exponential energy decay Exponential energy decay, as above.
- $\beta > 0$ determines the decay of $\widehat{f}(\omega)$, as above.
- smoother input signals (i.e., larger $s$) lead to faster energy decay
- pooling through sub-sampling $f \mapsto S^{1/2} f(S\,\cdot)$ leads to decay factor $aS$

78 Exponential energy decay Exponential energy decay, as above.
- $\beta > 0$ determines the decay of $\widehat{f}(\omega)$, as above.
- smoother input signals (i.e., larger $s$) lead to faster energy decay
- pooling through sub-sampling $f \mapsto S^{1/2} f(S\,\cdot)$ leads to decay factor $aS$
What about general filters? Polynomial energy decay!
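To see $W_n(f)$ concretely, here is a small self-contained 1-D sketch, entirely my own construction: brick-wall band-pass filters forming a Parseval frame, modulus non-linearity, no pooling, and none of the theorem's wavelet/WH or analyticity assumptions, so the stated decay rate is not claimed. It propagates a test signal through a few layers and prints the per-layer energy $W_n(f)$ together with the cumulative output energy $\sum_{k \le n} \|\Phi^k(f)\|^2$.

```python
import numpy as np

N = 2048
freqs = np.fft.fftfreq(N)

# Parseval "filter bank": brick-wall low-pass chi plus dyadic band-pass filters;
# their squared magnitudes partition the frequency axis and hence sum to 1.
edges = [0.02, 0.04, 0.08, 0.16, 0.32, 0.5000001]
chi_hat = (np.abs(freqs) <= edges[0]).astype(float)
band_hats = [((np.abs(freqs) > lo) & (np.abs(freqs) <= hi)).astype(float)
             for lo, hi in zip(edges[:-1], edges[1:])]

def filt(u, h_hat):                             # circular convolution via FFT
    return np.fft.ifft(np.fft.fft(u) * h_hat)

rng = np.random.default_rng(0)
f = rng.standard_normal(N)
energy = lambda u: np.sum(np.abs(u) ** 2) / N   # consistently normalized energy

signals = [f]                                    # propagated signals in layer 0
total_output = 0.0
for n in range(4):
    W_n = sum(energy(u) for u in signals)                     # layer-n energy
    Phi_n = sum(energy(filt(u, chi_hat)) for u in signals)    # layer-n output energy
    total_output += Phi_n
    print(f"layer {n}: W_n(f) = {W_n:.4f}, cumulative output energy = {total_output:.4f}")
    # next layer: modulus of every band-pass channel of every propagated signal
    signals = [np.abs(filt(u, g_hat)) for u in signals for g_hat in band_hats]
```

Because the filters form a Parseval frame, the printed cumulative output energy plus $W_n(f)$ equals the input energy at every layer, which is exactly the telescoping identity appearing on the energy-conservation slide below.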

79 ... our second goal ... trivial null-space for $\Phi$ Why trivial null-space? [Figure: feature space with a separating hyperplane of normal $w$; classes $\langle w, \Phi(f)\rangle > 0$ and $\langle w, \Phi(f)\rangle < 0$.]

80 ... our second goal ... trivial null-space for $\Phi$ Why trivial null-space? [Figure: as above, additionally marking $\Phi(f^*)$.] Non-trivial null-space: there exists $f^* \ne 0$ such that $\Phi(f^*) = 0$, hence $\langle w, \Phi(f^*)\rangle = 0$ for all $w$! These $f^*$ become unclassifiable!

81 ... our second goal ... Trivial null-space for the feature extractor: $\{ f \in L^2(\mathbb{R}^d) \mid \Phi(f) = 0 \} = \{ 0 \}$. The feature extractor $\Phi(\cdot) = \bigcup_{n=0}^{\infty} \Phi^n(\cdot)$ shall satisfy
$$A \|f\|_2^2 \le \|\Phi(f)\|^2 \le B \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d),$$
for some $A, B > 0$.

82 Energy conservation Theorem: For the frame upper bounds $\{B_n\}_{n \in \mathbb{N}}$ and frame lower bounds $\{A_n\}_{n \in \mathbb{N}}$, define $B := \prod_{n=1}^{\infty} \max\{1, B_n\}$ and $A := \prod_{n=1}^{\infty} \min\{1, A_n\}$. If $0 < A \le B < \infty$, then
$$A \|f\|_2^2 \le \|\Phi(f)\|^2 \le B \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d).$$

83 Energy conservation Theorem, as above.
- For Parseval frames (i.e., $A_n = B_n = 1$, $n \in \mathbb{N}$), this yields $\|\Phi(f)\|^2 = \|f\|_2^2$.

84 Energy conservation Theorem, as above.
- For Parseval frames (i.e., $A_n = B_n = 1$, $n \in \mathbb{N}$), this yields $\|\Phi(f)\|^2 = \|f\|_2^2$.
- Connection to energy decay:
$$\|f\|_2^2 = \sum_{k=0}^{n-1} \|\Phi^k(f)\|^2 + \underbrace{W_n(f)}_{\to\,0}$$

85 ... and our third goal... For a given CNN, specify the number of layers needed to capture most of the input signal energy

86 ... and our third goal ... For a given CNN, specify the number of layers needed to capture most of the input signal energy. How many layers $n$ are needed to have at least $((1 - \varepsilon) \cdot 100)\%$ of the input signal energy be contained in the feature vector, i.e.,
$$(1 - \varepsilon)\|f\|_2^2 \le \sum_{k=0}^{n} \|\Phi^k(f)\|^2 \le \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d)?$$

87 Number of layers needed Theorem: Let the frame bounds satisfy $A_n = B_n = 1$, $n \in \mathbb{N}$. Let the input signal $f$ be $L$-band-limited, and let $\varepsilon \in (0, 1)$. If
$$n \ge \log_a\!\Big(\frac{L}{1 - \sqrt{1 - \varepsilon}}\Big),$$
then
$$(1 - \varepsilon)\|f\|_2^2 \le \sum_{k=0}^{n} \|\Phi^k(f)\|^2 \le \|f\|_2^2.$$

88 Number of layers needed Theorem, as above. This also guarantees a trivial null-space for $\bigcup_{k=0}^{n} \Phi^k(f)$.

89 Number of layers needed Theorem, as above.
- The lower bound depends on
  - the description complexity of the input signals (i.e., the bandwidth $L$)
  - the decay factor (wavelets: $a = \frac{r^2+1}{r^2-1}$; WH filters: $a$ determined by $R$)

90 Number of layers needed Theorem, as above.
- The lower bound depends on
  - the description complexity of the input signals (i.e., the bandwidth $L$)
  - the decay factor (wavelets: $a = \frac{r^2+1}{r^2-1}$; WH filters: $a$ determined by $R$)
- Similar estimates hold for Sobolev input signals and for general filters (polynomial decay!).
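The depth estimate in this theorem is a one-line computation. The helper below is my own sketch that simply evaluates the bound $n \ge \log_a\big(L/(1 - \sqrt{1-\varepsilon})\big)$ as stated above; the example values of $L$, $r$, and the energy fractions are illustrative choices, not the slide's table.

```python
import math

def min_depth(L, eps, a):
    """Smallest integer n with n >= log_a( L / (1 - sqrt(1 - eps)) )."""
    kappa = L / (1.0 - math.sqrt(1.0 - eps))
    return max(0, math.ceil(math.log(kappa, a)))

# Example: bandwidth L = 1, wavelets with r = 2  =>  a = (r^2 + 1) / (r^2 - 1) = 5/3.
a_wavelet = (2**2 + 1) / (2**2 - 1)
for one_minus_eps in (0.25, 0.5, 0.75, 0.95):
    n = min_depth(L=1.0, eps=1.0 - one_minus_eps, a=a_wavelet)
    print(f"capture {one_minus_eps:.0%} of the input energy: depth n >= {n}")
```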

91 Number of layers needed Numerical example for bandwidth $L = 1$: [Table: required depth $n$ versus the captured energy fraction $(1 - \varepsilon)$, for wavelets ($r = 2$), WH filters ($R = 1$), and general filters.]

92 Number of layers needed Numerical example for bandwidth $L = 1$, as above.

93 Number of layers needed Numerical example for bandwidth $L = 1$, as above. Recall: Winner of the ImageNet 2015 challenge [He et al., 2015]
- Network depth: 152 layers
- average # of nodes per layer:
- # of FLOPS for a single forward pass: 11.3 billion

94 ... our fourth and last goal... For a fixed (possibly small) depth N, design scattering networks that capture most of the input signal energy

95 ... our fourth and last goal ... For a fixed (possibly small) depth $N$, design scattering networks that capture most of the input signal energy. Recall: Let the filters be wavelets with mother wavelet satisfying $\mathrm{supp}(\widehat{\psi}) \subseteq [r^{-1}, r]$, $r > 1$, or Weyl-Heisenberg filters with prototype function satisfying $\mathrm{supp}(\widehat{g}) \subseteq [-R, R]$, $R > 0$.

96 ... our fourth and last goal ... For a fixed (possibly small) depth $N$, design scattering networks that capture most of the input signal energy. For fixed depth $N$, we want to choose $r$ in the wavelet case and $R$ in the WH case so that
$$(1 - \varepsilon)\|f\|_2^2 \le \sum_{k=0}^{N} \|\Phi^k(f)\|^2 \le \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d).$$

97 Depth-constrained networks Theorem: Let the frame bounds satisfy $A_n = B_n = 1$, $n \in \mathbb{N}$. Let the input signal $f$ be $L$-band-limited, and fix $\varepsilon \in (0, 1)$ and $N \in \mathbb{N}$. Set $\kappa := \frac{L}{1 - \sqrt{1 - \varepsilon}}$. If the decay factor satisfies $a \ge \kappa^{1/N}$, which in the wavelet case amounts to choosing $1 < r \le \big(\frac{\kappa^{1/N} + 1}{\kappa^{1/N} - 1}\big)^{1/2}$ and in the WH case to choosing $R > 0$ sufficiently small, then
$$(1 - \varepsilon)\|f\|_2^2 \le \sum_{k=0}^{N} \|\Phi^k(f)\|^2 \le \|f\|_2^2.$$
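Reading the condition the other way around, a fixed depth $N$ forces the decay factor to satisfy $a \ge \kappa^{1/N}$. The sketch below is my own rearrangement of that inequality, using the wavelet relation $a = (r^2+1)/(r^2-1)$ from the earlier energy-decay theorem, to compute the largest admissible mother-wavelet bandwidth $r$ for a few depths; the values of $L$ and $\varepsilon$ are illustrative assumptions.

```python
import math

def max_wavelet_bandwidth(L, eps, N):
    """Largest r with (r^2+1)/(r^2-1) >= kappa^(1/N), kappa = L / (1 - sqrt(1-eps))."""
    kappa = L / (1.0 - math.sqrt(1.0 - eps))
    a_req = kappa ** (1.0 / N)            # decay factor required for depth N
    if a_req <= 1.0:
        return float("inf")                # any r > 1 already suffices
    return math.sqrt((a_req + 1.0) / (a_req - 1.0))

for N in (1, 2, 3, 5, 10):
    r = max_wavelet_bandwidth(L=1.0, eps=0.05, N=N)
    print(f"depth N = {N:2d}: choose mother wavelet bandwidth r <= {r:.3f}")
```

The printed bound on $r$ grows with $N$: the shallower the network, the narrower the mother wavelet has to be, which is precisely the tradeoff discussed on the next slides.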

98 Depth-width tradeoff [Figure: spectral supports of the wavelet filters: $\widehat{\psi}$ together with $\widehat{g}_1, \widehat{g}_2, \ldots$ covering the bands up to $r, r^2, r^3, \ldots, L$ on the frequency axis $\omega$.]

99 Depth-width tradeoff [Figure: spectral supports of the wavelet filters, as above.] Smaller depth $N$ ⇒ smaller bandwidth $r$ of the mother wavelet.

100 Depth-width tradeoff [Figure: spectral supports of the wavelet filters, as above.] Smaller depth $N$ ⇒ smaller bandwidth $r$ of the mother wavelet ⇒ larger number of wavelets ($\mathcal{O}(\log_r(L))$) to cover the spectral support $[-L, L]$ of the input signal.

101 Depth-width tradeoff [Figure: spectral supports of the wavelet filters, as above.] Smaller depth $N$ ⇒ smaller bandwidth $r$ of the mother wavelet ⇒ larger number of wavelets ($\mathcal{O}(\log_r(L))$) to cover the spectral support $[-L, L]$ of the input signal ⇒ larger number of filters in the first layer.

102 Depth-width tradeoff [Figure: spectral supports of the wavelet filters, as above.] Smaller depth $N$ ⇒ smaller bandwidth $r$ of the mother wavelet ⇒ larger number of wavelets ($\mathcal{O}(\log_r(L))$) to cover the spectral support $[-L, L]$ of the input signal ⇒ larger number of filters in the first layer ⇒ depth-width tradeoff.
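The width side of the tradeoff is simple arithmetic; the two-liner below (my own sketch, with an arbitrarily assumed input bandwidth $L$) counts the roughly $\lceil \log_r L \rceil$ wavelet scales needed to cover $[-L, L]$ as the bandwidth $r$ of the mother wavelet shrinks.

```python
import math

L = 256.0                      # assumed input bandwidth, purely illustrative
for r in (4.0, 2.0, 1.5, 1.1):
    n_scales = math.ceil(math.log(L, r))
    print(f"r = {r:3.1f}: about {n_scales} wavelet scales to cover [-{L:.0f}, {L:.0f}]")
```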

103 Yours truly

104 Experiment: Handwritten digit classification
- Dataset: MNIST database of handwritten digits [LeCun & Cortes, 1998]; 60,000 training and 10,000 test images
- $\Phi$-network: $D = 3$ layers; same filters, non-linearities, and pooling operators in all layers
- Classifier: SVM with radial basis function kernel [Vapnik, 1995]
- Dimensionality reduction: Supervised orthogonal least squares scheme [Chen et al., 1991]
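A hedged sketch of this experimental pipeline (my own reimplementation outline using scikit-learn, not the authors' code): the function `scattering_features` is a placeholder standing in for the 3-layer $\Phi$-network, the orthogonal least squares step is replaced by a generic supervised feature selector, and random arrays stand in for MNIST so the sketch runs end to end.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif

def scattering_features(images):
    """Placeholder for the D = 3 layer Phi-network described on this slide.

    A real implementation would convolve with the chosen wavelet filters,
    apply the non-linearity, pool, and iterate; here we simply flatten the
    images so the pipeline below runs end to end.
    """
    return images.reshape(len(images), -1)

# Stand-in data with the MNIST image shape (28 x 28) and 10 classes.
rng = np.random.default_rng(0)
X = rng.random((200, 28, 28))
y = rng.integers(0, 10, 200)

features = scattering_features(X)

# Supervised dimensionality reduction (generic stand-in for the orthogonal
# least squares scheme of Chen et al., 1991) followed by an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=64),
                    SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(features, y)
print("training accuracy on stand-in data:", clf.score(features, y))
```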

105 Experiment: Handwritten digit classification Classification error in percent: [Table: rows n.p. (no pooling), sub-sampling, max pooling, and average pooling; columns abs, ReLU, tanh, LogSig non-linearities, for Haar and bi-orthogonal wavelet filters.]

106 Experiment: Handwritten digit classification Classification error in percent: [Table as above.]
- modulus and ReLU perform better than tanh and LogSig

107 Experiment: Handwritten digit classification Classification error in percent: [Table as above.]
- modulus and ReLU perform better than tanh and LogSig
- results with pooling ($S = 2$) are competitive with those without pooling, at significantly lower computational cost

108 Experiment: Handwritten digit classification Classification error in percent: [Table as above.]
- modulus and ReLU perform better than tanh and LogSig
- results with pooling ($S = 2$) are competitive with those without pooling, at significantly lower computational cost
- state-of-the-art: 0.43 [Bruna and Mallat, 2013]
  - similar feature extraction network with directional, non-separable wavelets and no pooling
  - significantly higher computational complexity

109 Energy decay: Related work [Waldspurger, 2017]: Exponential energy decay $W_n(f) = \mathcal{O}(a^{-n})$, for some unspecified $a > 1$.
- 1-D wavelet filters
- every network layer equipped with the same set of wavelets

110 Energy decay: Related work [Waldspurger, 2017]: Exponential energy decay $W_n(f) = \mathcal{O}(a^{-n})$, for some unspecified $a > 1$.
- 1-D wavelet filters
- every network layer equipped with the same set of wavelets
- vanishing moments condition on the mother wavelet

111 Energy decay: Related work [Waldspurger, 2017]: Exponential energy decay $W_n(f) = \mathcal{O}(a^{-n})$, for some unspecified $a > 1$.
- 1-D wavelet filters
- every network layer equipped with the same set of wavelets
- vanishing moments condition on the mother wavelet
- applies to 1-D real-valued band-limited input signals $f \in L^2(\mathbb{R})$

112 Energy decay: Related work [Czaja and Li, 2016]: Exponential energy decay $W_n(f) = \mathcal{O}(a^{-n})$, for some unspecified $a > 1$.
- d-dimensional uniform covering filters (similar to Weyl-Heisenberg filters), but does not cover multi-scale filters (e.g. wavelets, ridgelets, curvelets, etc.)
- every network layer equipped with the same set of filters

113 Energy decay: Related work [Czaja and Li, 2016]: Exponential energy decay $W_n(f) = \mathcal{O}(a^{-n})$, for some unspecified $a > 1$.
- d-dimensional uniform covering filters (similar to Weyl-Heisenberg filters), but does not cover multi-scale filters (e.g. wavelets, ridgelets, curvelets, etc.)
- every network layer equipped with the same set of filters
- analyticity and vanishing moments conditions on the filters

114 Energy decay: Related work [Czaja and Li, 2016]: Exponential energy decay $W_n(f) = \mathcal{O}(a^{-n})$, for some unspecified $a > 1$.
- d-dimensional uniform covering filters (similar to Weyl-Heisenberg filters), but does not cover multi-scale filters (e.g. wavelets, ridgelets, curvelets, etc.)
- every network layer equipped with the same set of filters
- analyticity and vanishing moments conditions on the filters
- applies to d-dimensional complex-valued input signals $f \in L^2(\mathbb{R}^d)$
