Deep Neural Networks and Partial Differential Equations: Approximation Theory and Structural Properties. Philipp Christian Petersen

Joint work with: Helmut Bölcskei (ETH Zürich), Philipp Grohs (University of Vienna), Joost Opschoor (ETH Zürich), Gitta Kutyniok (TU Berlin), Mones Raslan (TU Berlin), Christoph Schwab (ETH Zürich), Felix Voigtlaender (KU Eichstätt-Ingolstadt).

Today's Goal. Goal of this talk: discuss the suitability of neural networks as an ansatz system for the solution of PDEs. Two threads:
- Approximation theory: universal approximation; optimal approximation rates for all classical function spaces; reduced curse of dimension.
- Structural properties: non-convex, non-closed ansatz spaces; parametrization not stable; very hard to optimize over.

Outline:
- Neural networks: introduction to neural networks; approaches to solve PDEs.
- Approximation theory of neural networks: classical results; optimality; high-dimensional approximation.
- Structural results: convexity; closedness; stable parametrization.

Neural networks. We consider neural networks as a special kind of function: $d = N_0 \in \mathbb{N}$ is the input dimension, $L$ the number of layers, $\varrho \colon \mathbb{R} \to \mathbb{R}$ the activation function, and $T_\ell \colon \mathbb{R}^{N_{\ell-1}} \to \mathbb{R}^{N_\ell}$, $\ell = 1, \dots, L$, are affine-linear maps. Then $\Phi^\varrho \colon \mathbb{R}^d \to \mathbb{R}^{N_L}$ given by
$$\Phi^\varrho(x) = T_L(\varrho(T_{L-1}(\varrho(\dots \varrho(T_1(x)))))), \qquad x \in \mathbb{R}^d,$$
is called a neural network (NN). The sequence $(d, N_1, \dots, N_L)$ is called the architecture of $\Phi^\varrho$.
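To make the definition concrete, here is a minimal sketch in Python/NumPy (the ReLU activation and the random 2-5-1 architecture are illustrative choices, not taken from the slides):

```python
import numpy as np

def relu(x):
    """Example activation function rho(x) = max(x, 0)."""
    return np.maximum(x, 0.0)

def neural_network(x, weights, biases, rho=relu):
    """Realize Phi^rho(x) = T_L(rho(T_{L-1}(... rho(T_1(x)) ...))).

    weights[l], biases[l] define the affine map T_{l+1}(y) = A y + b;
    the activation is applied after every layer except the last.
    """
    y = x
    for A, b in zip(weights[:-1], biases[:-1]):
        y = rho(A @ y + b)
    A, b = weights[-1], biases[-1]
    return A @ y + b

# Architecture (d, N_1, N_2) = (2, 5, 1): input dim 2, one hidden layer, scalar output.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 2)), rng.standard_normal((1, 5))]
bs = [rng.standard_normal(5), rng.standard_normal(1)]
print(neural_network(np.array([0.3, -1.2]), Ws, bs))
```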

Why are neural networks interesting? - I. Deep learning: deep learning describes a variety of techniques based on data-driven adaptation of the affine-linear maps in a neural network. Overwhelming success: image classification [Ren, He, Girshick, Sun; 2015], text understanding, game intelligence. Hardware design of the future!

Why are neural networks interesting? - II. Expressibility: neural networks constitute a very powerful architecture.
Theorem (Cybenko, 1989; Hornik, 1991; Pinkus, 1999). Let $d \in \mathbb{N}$, $K \subset \mathbb{R}^d$ compact, $f \colon K \to \mathbb{R}$ continuous, and $\varrho \colon \mathbb{R} \to \mathbb{R}$ continuous and not a polynomial. Then for every $\varepsilon > 0$ there exists a two-layer NN $\Phi^\varrho$ with $\|f - \Phi^\varrho\|_\infty \leq \varepsilon$.
Efficient expressibility: $\mathbb{R}^M \ni \theta = (T_1, \dots, T_L) \mapsto \Phi^\varrho_\theta$ yields a parametrized system of functions. In a sense this parametrization is optimally efficient (more on this below).
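As a numerical illustration of the universal approximation theorem above (not part of the slides), the sketch below, assuming NumPy, builds a two-layer ReLU network for a continuous target on $[-1, 1]$ by fixing random hidden-layer weights and fitting the output layer by least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x) + 0.5 * x           # continuous target on K = [-1, 1]
x = np.linspace(-1.0, 1.0, 400)

n_hidden = 50                                   # width of the single hidden layer
A = rng.standard_normal((n_hidden, 1)) * 3
b = rng.standard_normal(n_hidden)
H = np.maximum(A @ x[None, :] + b[:, None], 0)  # hidden activations rho(A x + b), rho = ReLU
c, *_ = np.linalg.lstsq(H.T, f(x), rcond=None)  # fit the output layer by least squares

print("sup-norm error:", np.max(np.abs(H.T @ c - f(x))))
```

Increasing the hidden width typically drives the error down, in line with the theorem, although the theorem itself is non-constructive and says nothing about this particular random-feature fit.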

How can we apply NNs to solve PDEs? PDE problem: for $D \subset \mathbb{R}^d$, $d \in \mathbb{N}$, find $u$ such that
$$G(x, u(x), \nabla u(x), \nabla^2 u(x)) = 0 \quad \text{for all } x \in D.$$
Approach of [Lagaris, Likas, Fotiadis; 1998]: let $(x_i)_{i \in I} \subset D$ and find a NN $\Phi^\varrho_\theta$ such that
$$G(x_i, \Phi^\varrho_\theta(x_i), \nabla \Phi^\varrho_\theta(x_i), \nabla^2 \Phi^\varrho_\theta(x_i)) = 0 \quad \text{for all } i \in I.$$
Standard methods can be used to find the parameters $\theta$.
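A toy version of this collocation idea (assuming NumPy and SciPy; the 1D problem $u' - u = 0$, $u(0) = 1$, the small tanh network, and the finite-difference derivative are illustrative simplifications of the general framework):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: find u on [0, 1] with u'(x) - u(x) = 0 and u(0) = 1 (solution: exp(x)).
xs = np.linspace(0.0, 1.0, 20)                 # collocation points (x_i)_{i in I}
h = 1e-4                                        # step for a finite-difference derivative

def net(theta, x):
    """Two-layer tanh network with 8 hidden neurons; theta packs all weights."""
    a, b, c, d = theta[:8], theta[8:16], theta[16:24], theta[24]
    return np.tanh(np.outer(x, a) + b) @ c + d

def loss(theta):
    du = (net(theta, xs + h) - net(theta, xs - h)) / (2 * h)   # approximate u'(x_i)
    residual = du - net(theta, xs)                             # G at the collocation points
    boundary = net(theta, np.array([0.0]))[0] - 1.0            # enforce u(0) = 1
    return np.sum(residual**2) + boundary**2

theta0 = 0.1 * np.random.default_rng(2).standard_normal(25)
theta = minimize(loss, theta0, method="BFGS").x
print("max error vs exp:", np.max(np.abs(net(theta, xs) - np.exp(xs))))
```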

Approaches to solve PDEs - Examples. General framework, Deep Ritz method [E, Yu; 2017]: NNs as trial functions; SGD naturally replaces quadrature.
High-dimensional PDEs [Sirignano, Spiliopoulos; 2017]: let $D \subset \mathbb{R}^d$, $d \geq 100$, and find $u$ such that
$$\partial_t u(t, x) + H(u)(t, x) = 0, \quad (t, x) \in [0, T] \times \Omega, \quad + \text{BC} + \text{IC}.$$
As the number of parameters of the NN increases, the minimizer of the associated energy approaches the true solution. No mesh generation required!
[Berner, Grohs, Hornung, Jentzen, von Wurstemberger; 2017]: phrasing the problem as empirical risk minimization provably avoids the curse of dimension, both in the approximation problem and in the number of samples.
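For orientation, a standard way to write the Deep Ritz objective (a generic formulation with a boundary penalty, not a formula taken verbatim from the talk): for the Poisson problem $-\Delta u = f$ on $\Omega$ with $u = g$ on $\partial\Omega$, one minimizes over the network parameters $\theta$
$$E(\Phi^\varrho_\theta) = \int_{\Omega} \Big( \tfrac{1}{2} \big|\nabla \Phi^\varrho_\theta(x)\big|^2 - f(x)\, \Phi^\varrho_\theta(x) \Big)\, dx + \lambda \int_{\partial\Omega} \big|\Phi^\varrho_\theta(x) - g(x)\big|^2 \, ds(x),$$
where both integrals are replaced by Monte Carlo averages over sampled points, so that SGD on $\theta$ takes over the role of quadrature.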

How can we apply NNs to solve PDEs? Deep learning and PDEs: both approaches above rest on two ideas. First, neural networks are highly efficient in representing solutions of PDEs, hence the complexity of the problem can be greatly reduced. Second, there exist black-box methods from machine learning that solve the optimization problem.
This talk: we will show exactly how efficient the representations are, and raise doubt that the black box can produce reliable results in general.

Approximation theory of neural networks

Complexity of neural networks. Recall: $\Phi^\varrho(x) = T_L(\varrho(T_{L-1}(\varrho(\dots \varrho(T_1(x))))))$, $x \in \mathbb{R}^d$. Each affine-linear map $T_\ell$ is defined by a matrix $A_\ell \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ and a translation $b_\ell \in \mathbb{R}^{N_\ell}$ via $T_\ell(x) = A_\ell x + b_\ell$. The number of weights $W(\Phi^\varrho)$ and the number of neurons $N(\Phi^\varrho)$ are
$$W(\Phi^\varrho) = \sum_{\ell=1}^{L} \big( \|A_\ell\|_0 + \|b_\ell\|_0 \big) \quad \text{and} \quad N(\Phi^\varrho) = \sum_{\ell=0}^{L} N_\ell.$$
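A small sketch of these complexity measures (assuming NumPy; $W$ counts nonzero entries of the matrices and biases, matching the $\|\cdot\|_0$ definition above):

```python
import numpy as np

def complexity(weights, biases, input_dim):
    """Return (W, N): number of nonzero weights and total number of neurons."""
    W = sum(np.count_nonzero(A) + np.count_nonzero(b)
            for A, b in zip(weights, biases))
    N = input_dim + sum(len(b) for b in biases)   # N_0 + N_1 + ... + N_L
    return W, N

A1, b1 = np.array([[1.0, 0.0], [2.0, -1.0]]), np.array([0.0, 1.0])
A2, b2 = np.array([[0.0, 3.0]]), np.array([0.5])
print(complexity([A1, A2], [b1, b2], input_dim=2))   # (6, 5)
```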

Power of the architecture - Exemplary results. Given $f$ from some class of functions, how many weights/neurons does an $\varepsilon$-approximating NN need to have? Not so many...
Theorem (Maiorov, Pinkus; 1999). There exists an activation function $\varrho_{\mathrm{weird}} \colon \mathbb{R} \to \mathbb{R}$ that is analytic and strictly increasing and satisfies $\lim_{x \to -\infty} \varrho_{\mathrm{weird}}(x) = 0$ and $\lim_{x \to \infty} \varrho_{\mathrm{weird}}(x) = 1$, such that for any $d \in \mathbb{N}$, any $f \in C([0,1]^d)$, and any $\varepsilon > 0$, there is a 3-layer $\varrho_{\mathrm{weird}}$-network $\Phi^{\varrho_{\mathrm{weird}}}_\varepsilon$ with $\|f - \Phi^{\varrho_{\mathrm{weird}}}_\varepsilon\|_{L^\infty} \leq \varepsilon$ and $N(\Phi^{\varrho_{\mathrm{weird}}}_\varepsilon) = 9d + 3$, independently of $\varepsilon$.

Power of the architecture - Exemplary results:
- Barron; 1993: Approximation rate for functions with one finite Fourier moment using shallow networks with an activation function $\varrho$ sigmoidal of order zero.
- Mhaskar; 1993: Let $\varrho$ be sigmoidal of order $k \geq 2$. For $f \in C^s([0,1]^d)$ we have $\|f - \Phi^\varrho_n\|_{L^\infty} \lesssim N(\Phi^\varrho_n)^{-s/d}$ and $L(\Phi^\varrho_n) = L(d, s, k)$.
- Yarotsky; 2017: For $f \in C^s([0,1]^d)$ and $\varrho(x) = x_+$ (the ReLU), we have $\|f - \Phi^\varrho_n\|_{L^\infty} \lesssim W(\Phi^\varrho_n)^{-s/d}$ and $L(\Phi^\varrho_n) \lesssim \log(n)$; see the sketch after this list.
- Shaham, Cloninger, Coifman; 2015: One can implement certain wavelets using 4-layer NNs.
- He, Li, Xu, Zheng; 2018, Opschoor, Schwab, P.; 2019: ReLU NNs reproduce the approximation rates of h-, p-, and hp-FEM.
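The ReLU rates above rest on explicit constructions. As an illustration (assuming NumPy; a numerical sketch, not the slides' own material), the snippet below evaluates the telescoping approximation of $x \mapsto x^2$ on $[0,1]$ by composed hat functions that underlies Yarotsky's result; each hat function is exactly representable by a tiny ReLU network:

```python
import numpy as np

def hat(x):
    """Tent map g(x) = 2 min(x, 1 - x) on [0, 1]; realizable by a small ReLU network."""
    return 2.0 * np.minimum(x, 1.0 - x)

def approx_square(x, m):
    """f_m(x) = x - sum_{s=1}^m g_s(x) / 4^s, g_s the s-fold composition of the hat map."""
    out, g = np.array(x, dtype=float), np.array(x, dtype=float)
    for s in range(1, m + 1):
        g = hat(g)
        out = out - g / 4.0**s
    return out

x = np.linspace(0.0, 1.0, 1001)
for m in (2, 4, 8):
    print(m, np.max(np.abs(approx_square(x, m) - x**2)))   # error decays like 4^{-(m+1)}
```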

Lower bounds. Optimal approximation rates: lower bounds on the required network size only exist under additional assumptions (recall the networks based on $\varrho_{\mathrm{weird}}$). Options:
(A) Place restrictions on the activation function (e.g. only consider the ReLU), thereby excluding pathological examples like $\varrho_{\mathrm{weird}}$ (VC-dimension bounds).
(B) Place restrictions on the weights (information-theoretical bounds, entropy arguments).
(C) Use still other concepts, like continuous N-widths.

Asymptotic min-max rate distortion. Encoders: let $\mathcal{C} \subset L^2(\mathbb{R}^d)$ and $\ell \in \mathbb{N}$; set
$$\mathfrak{E}^\ell := \big\{ E \colon \mathcal{C} \to \{0,1\}^\ell \big\}, \qquad \mathfrak{D}^\ell := \big\{ D \colon \{0,1\}^\ell \to L^2(\mathbb{R}^d) \big\}.$$
Min-max code length:
$$L(\epsilon, \mathcal{C}) := \min \Big\{ \ell \in \mathbb{N} : \exists\, D \in \mathfrak{D}^\ell, E \in \mathfrak{E}^\ell : \sup_{f \in \mathcal{C}} \| D(E(f)) - f \|_2 < \epsilon \Big\}.$$
Optimal exponent:
$$\gamma^*(\mathcal{C}) := \inf \big\{ \gamma > 0 : L(\epsilon, \mathcal{C}) = O(\epsilon^{-\gamma}) \big\}.$$

Asymptotic min-max rate distortion. Theorem (Bölcskei, Grohs, Kutyniok, P.; 2017). Let $\mathcal{C} \subset L^2(\mathbb{R}^d)$ and $\varrho \colon \mathbb{R} \to \mathbb{R}$. Then for all $\epsilon > 0$:
$$\sup_{f \in \mathcal{C}} \; \inf_{\substack{\Phi^\varrho \text{ NN with quantized weights} \\ \|\Phi^\varrho - f\|_2 \leq \epsilon}} W(\Phi^\varrho) \gtrsim \epsilon^{-\gamma^*(\mathcal{C})}. \tag{1}$$
Optimal approximation/parametrization: if for $\mathcal{C} \subset L^2(\mathbb{R}^d)$ one also has $\lesssim$ in (1), then NNs approximate the function class optimally.
Versatility: it turns out that NNs achieve optimal approximation rates for many practically-used function classes.

Some instances of optimal approximation:
- Mhaskar; 1993: Let $\varrho$ be sigmoidal of order $k \geq 2$. For $f \in C^s([0,1]^d)$ we have $\|f - \Phi^\varrho_n\|_{L^\infty} \lesssim N(\Phi^\varrho_n)^{-s/d}$. We have $\gamma^*(\{ f \in C^s([0,1]^d) : \|f\| \leq 1 \}) = d/s$.
- Shaham, Cloninger, Coifman; 2015: One can implement certain wavelets using 4-layer ReLU NNs. Optimal when wavelets are optimal.
- Bölcskei, Grohs, Kutyniok, P.; 2017: Networks yield optimal rates if any affine system does. Example: shearlets for cartoon-like functions.

ReLU Approximation. Piecewise smooth functions: $\mathcal{E}^{\beta,d}$ denotes the $d$-dimensional $C^\beta$-piecewise smooth functions on $[0,1]^d$ with interfaces in $C^\beta$.
Theorem (P., Voigtlaender; 2018). Let $d \in \mathbb{N}$, $\beta \geq 0$, and $\varrho(x) = x_+$. Then
$$\sup_{f \in \mathcal{E}^{\beta,d}} \; \inf_{\substack{\Phi^\varrho \text{ NN with quantized weights} \\ \|\Phi^\varrho - f\|_2 \leq \epsilon}} W(\Phi^\varrho) \sim \epsilon^{-\gamma^*(\mathcal{E}^{\beta,d})} = \epsilon^{-2(d-1)/\beta}.$$
The optimal depth of the networks scales like $\beta/d$.

High-dimensional approximation. Curse of dimension: to guarantee approximation with error $\varepsilon$ of functions in $\mathcal{E}^{\beta,d}$, one requires networks with $O(\varepsilon^{-2(d-1)/\beta})$ weights. Symmetries and invariances: image classifiers are often translation, dilation, and rotation invariant; invariant to small deformations; and invariant to small changes in brightness, contrast, and color.

Curse of dimension. Two-step setup: $f = \chi \circ \tau$, where $\tau \colon \mathbb{R}^D \to \mathbb{R}^d$ is a smooth dimension-reducing feature map and $\chi \in \mathcal{E}^{\beta,d}$ performs classification on the low-dimensional space.
Theorem (P., Voigtlaender; 2017). Let $\varrho(x) = x_+$. There are constants $c > 0$, $L \in \mathbb{N}$ such that for any $f = \chi \circ \tau$ and any $\varepsilon \in (0, 1/2)$, there is a NN $\Phi^\varrho_\varepsilon$ with at most $L$ layers and at most $c\, \varepsilon^{-2(d-1)/\beta}$ non-zero weights such that $\|\Phi^\varrho_\varepsilon - f\|_{L^2} < \varepsilon$. The asymptotic approximation rate depends only on $d$, not on $D$.

Compositional functions [Mhaskar, Poggio; 2016]: high-dimensional functions as a dyadic composition of 2-dimensional functions, e.g.
$$\mathbb{R}^8 \ni x \mapsto h^3_1\big(h^2_1(h^1_1(x_1, x_2), h^1_2(x_3, x_4)),\; h^2_2(h^1_3(x_5, x_6), h^1_4(x_7, x_8))\big).$$
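A small sketch of evaluating such a dyadic composition (assuming NumPy; the two-variable building block `h` is an arbitrary illustrative choice):

```python
import numpy as np

def compose_dyadic(x, layers):
    """Evaluate a binary-tree composition: layers[l] is a list of 2-variable functions
    that pairwise combine the outputs of the previous level."""
    vals = list(x)
    for hs in layers:
        vals = [h(vals[2 * j], vals[2 * j + 1]) for j, h in enumerate(hs)]
    return vals[0]

h = lambda a, b: np.tanh(a + 2 * b)        # illustrative 2D building block
layers = [[h] * 4, [h] * 2, [h]]           # d = 8 inputs -> 4 -> 2 -> 1
print(compose_dyadic(np.arange(8.0), layers))
```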

Extensions. Approximation with respect to Sobolev norms: ReLU NNs $\Phi$ are Lipschitz continuous. Hence, for $s \in [0,1]$, $p \geq 1$, and $f \in W^{s,p}(\Omega)$, we can measure $\|f - \Phi\|_{W^{s,p}(\Omega)}$. ReLU networks achieve the same approximation rates as h-, p-, and hp-FEM [Opschoor, P., Schwab; 2019].
Convolutional neural networks: there is a direct correspondence between approximation by CNNs (without pooling) and approximation by fully-connected networks [P., Voigtlaender; 2018].

Optimal parametrization: neural networks yield optimal representations of many function classes relevant in PDE applications; the approximation is flexible, and its quality improves if low-dimensional structure is present. PDE discretization: the problem complexity is drastically reduced, and no design of an ansatz system is necessary, since NNs approximate almost every function class well. Can neural networks really be this good?

The inconvenient structure of neural networks

Fixed architecture networks. Goal: fix a space of networks with prescribed shape and understand the associated set of functions. Fixed architecture networks: let $d, L \in \mathbb{N}$, $N_1, \dots, N_{L-1} \in \mathbb{N}$, and $\varrho \colon \mathbb{R} \to \mathbb{R}$; then we denote by $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ the set of NNs with architecture $(d, N_1, \dots, N_{L-1}, 1)$. (Figure: a network with $d = 8$, $N_1 = N_2 = N_3 = 12$, $N_4 = 8$.)

Back to the basics. Topological properties: is $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ star-shaped? Convex? Approximately convex? Closed? Is the map $(T_1, \dots, T_L) \mapsto \Phi$ open? Implications for optimization: if we do not have the properties above, then we can have terrible local minima, exploding weights, and very slow convergence.

Star-shapedness: $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is trivially star-shaped with center 0... but...
Proposition (P., Raslan, Voigtlaender; 2018). Let $d, L, N_1, \dots, N_{L-1} \in \mathbb{N}$ and let $\varrho \colon \mathbb{R} \to \mathbb{R}$ be locally Lipschitz continuous. Then the number of linearly independent centers of $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is at most $\sum_{\ell=1}^{L} (N_{\ell-1} + 1) N_\ell$, where $N_0 = d$.

Convexity? Corollary (P., Raslan, Voigtlaender; 2018). Let $d, L, N_1, \dots, N_{L-1} \in \mathbb{N}$, $N_0 = d$, and let $\varrho \colon \mathbb{R} \to \mathbb{R}$ be locally Lipschitz continuous. If $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ contains more than $\sum_{\ell=1}^{L} (N_{\ell-1} + 1) N_\ell$ linearly independent functions, then $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is not convex.
From translation invariance: if $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ contains only finitely many linearly independent functions, then $\varrho$ is a finite sum of complex exponentials multiplied with polynomials.

Weak Convexity? Weak convexity: $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is almost never convex, but what about $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1) + B_\epsilon(0)$ for a hopefully small $\epsilon > 0$?
Theorem (P., Raslan, Voigtlaender; 2018). Let $d, L, N_1, \dots, N_{L-1} \in \mathbb{N}$, $N_0 = d$. For all commonly-used activation functions there does not exist an $\epsilon > 0$ such that $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1) + B_\epsilon(0)$ is convex. As a corollary, we also get that $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is usually nowhere dense.

Illustration: the set $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ has very few centers, it is scaling invariant, not approximately convex, and nowhere dense.

Closedness in $L^p$. Compact weights: if the activation function $\varrho$ is continuous, then a compactness argument shows that the set of networks with parameters from a compact parameter set is closed.
Theorem (P., Raslan, Voigtlaender; 2018). Let $d, L, N_1, \dots, N_{L-1} \in \mathbb{N}$, $N_0 = d$. If $\varrho$ has one of the properties below, then $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is not closed in $L^p$, $p \in (0, \infty)$:
- analytic, bounded, not constant;
- $C^1$ but not $C^\infty$;
- continuous, monotone, bounded, and $\varrho'(x_0)$ exists and is non-zero in at least one point $x_0 \in \mathbb{R}$;
- continuous, monotone, continuously differentiable outside a compact set, and $\lim_{x \to \infty} \varrho'(x)$, $\lim_{x \to -\infty} \varrho'(x)$ exist and do not coincide.

Closedness in $L^\infty$. Theorem (P., Raslan, Voigtlaender; 2018). Let $d, L, N_1, \dots, N_{L-1} \in \mathbb{N}$, $N_0 = d$. If $\varrho$ has one of the properties below, then $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is not closed in $L^\infty$:
- analytic, bounded, not constant;
- $C^1$ but not $C^\infty$;
- $\varrho \in C^p$ and $\varrho(x) - x_+^p$ bounded, for $p \geq 1$.
ReLU: the set of two-layer ReLU NNs is closed in $L^\infty$!

Illustration: for most activation functions $\varrho$ (except the ReLU), the set $\mathcal{NN}_\varrho(d, N_1, \dots, N_{L-1}, 1)$ is star-shaped with center 0, not approximately convex, and not closed.

Stable parametrization. Continuous parametrization: it is not hard to see that if $\varrho$ is continuous, then so is the realization map $R_\varrho \colon (T_1, \dots, T_L) \mapsto \Phi$.
Quotient map: we can also ask whether $R_\varrho$ is a quotient map, i.e., if $\Phi^1, \Phi^2$ are NNs that are close (w.r.t. $\|\cdot\|_{\sup}$), do there exist $(T^1_1, \dots, T^1_L)$ and $(T^2_1, \dots, T^2_L)$ that are close in some norm with $R_\varrho((T^1_1, \dots, T^1_L)) = \Phi^1$ and $R_\varrho((T^2_1, \dots, T^2_L)) = \Phi^2$?
Proposition (P., Raslan, Voigtlaender; 2018). Let $\varrho$ be Lipschitz continuous and not affine-linear. Then $R_\varrho$ is not a quotient map.

Consequences. No convexity: we want to solve $J(\Phi) = 0$ for an energy $J$ and a NN $\Phi$. Not only can $J$ be non-convex, but so is the set we optimize over; this is similar to N-term approximation with dictionaries. No closedness: exploding coefficients (if $P_{\mathcal{NN}}(f) \notin \mathcal{NN}$); no low-neuron approximation. No inverse-stable parametrization: the error term can be very small while the parametrization is far from optimal; potentially very slow convergence.

Where to go from here? Different networks: special types of networks could be more robust; convolutional neural networks are probably still too large a class [P., Voigtlaender; 2018]. Stronger norms: stronger norms naturally help with closedness and inverse stability; an example is Sobolev training [Czarnecki, Osindero, Jaderberg, Swirszcz, Pascanu; 2017]. Many arguments of our results break down if the $W^{1,\infty}$ norm is used.
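For illustration, a Sobolev-type training objective in one dimension can be sketched as follows (assuming NumPy; the finite-difference derivative and the example functions are illustrative, and this is only the loss, not the full training scheme of Czarnecki et al.):

```python
import numpy as np

def sobolev_loss(net, target, xs, h=1e-4, weight=1.0):
    """Match both values and first derivatives on sample points xs
    (a W^{1,2}-type objective instead of a plain L^2 fit)."""
    d_net = (net(xs + h) - net(xs - h)) / (2 * h)
    d_tgt = (target(xs + h) - target(xs - h)) / (2 * h)
    value_term = np.mean((net(xs) - target(xs)) ** 2)
    deriv_term = np.mean((d_net - d_tgt) ** 2)
    return value_term + weight * deriv_term

xs = np.linspace(-1.0, 1.0, 200)
print(sobolev_loss(np.tanh, np.sin, xs))   # example: distance of tanh from sin in this metric
```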

Conclusion. Approximation: NNs are a very powerful approximation tool: often optimally efficient parametrization; overcome the curse of dimension; surprisingly efficient black-box optimization. Topological structure: NNs form an impractical set: non-convex; non-closed; no inverse-stable parametrization.

References:
- H. Andrade-Loarca, G. Kutyniok, O. Öktem, P. Petersen, Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks, arXiv preprint.
- H. Bölcskei, P. Grohs, G. Kutyniok, P. Petersen, Optimal Approximation with Sparsely Connected Deep Neural Networks, arXiv preprint.
- J. Opschoor, P. Petersen, Ch. Schwab, Deep ReLU Networks and High-Order Finite Element Methods, SAM Report, ETH Zürich.
- P. Petersen, F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Networks (2018).
- P. Petersen, M. Raslan, F. Voigtlaender, Topological properties of the set of functions generated by neural networks of fixed size, arXiv preprint.
- P. Petersen, F. Voigtlaender, Equivalence of approximation by convolutional neural networks and fully-connected networks, arXiv preprint.

Thank you for your attention!
