Deep Neural Networks and Partial Differential Equations: Approximation Theory and Structural Properties. Philipp Christian Petersen
Joint work with: Helmut Bölcskei (ETH Zürich), Philipp Grohs (University of Vienna), Joost Opschoor (ETH Zürich), Gitta Kutyniok (TU Berlin), Mones Raslan (TU Berlin), Christoph Schwab (ETH Zürich), Felix Voigtlaender (KU Eichstätt-Ingolstadt).
Today's Goal

Goal of this talk: Discuss the suitability of neural networks as an ansatz system for the solution of PDEs. Two threads:

Approximation theory:
- universal approximation
- optimal approximation rates for all classical function spaces
- reduced curse of dimension

Structural properties:
- non-convex, non-closed ansatz spaces
- parametrization not stable
- very hard to optimize over
Outline

- Neural networks: introduction to neural networks; approaches to solve PDEs
- Approximation theory of neural networks: classical results; optimality; high-dimensional approximation
- Structural results: convexity; closedness; stable parametrization
Neural networks

We consider neural networks as a special kind of function:
- d = N_0 ∈ N: input dimension,
- L ∈ N: number of layers,
- ϱ: R → R: activation function,
- T_l: R^{N_{l-1}} → R^{N_l}, l = 1, ..., L: affine-linear maps.

Then Φ^ϱ: R^d → R^{N_L} given by

    Φ^ϱ(x) = T_L(ϱ(T_{L-1}(ϱ(... ϱ(T_1(x)))))),   x ∈ R^d,

is called a neural network (NN). The sequence (d, N_1, ..., N_L) is called the architecture of Φ^ϱ.
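The definition above is just an alternating composition of affine maps and a componentwise activation; a minimal NumPy sketch (all names illustrative, not from the talk):

```python
import numpy as np

def relu(x):
    # activation function rho, applied componentwise
    return np.maximum(x, 0.0)

def neural_network(x, weights, biases, rho=relu):
    """Evaluate Phi^rho(x) = T_L(rho(T_{L-1}(... rho(T_1(x))))),
    where T_l(x) = A_l x + b_l and no activation follows the last layer."""
    for A, b in zip(weights[:-1], biases[:-1]):
        x = rho(A @ x + b)
    A, b = weights[-1], biases[-1]
    return A @ x + b  # final affine map T_L

# architecture (d, N_1, N_2) = (2, 3, 1)
weights = [np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
           np.array([[1.0, 1.0, 1.0]])]
biases = [np.zeros(3), np.zeros(1)]
y = neural_network(np.array([1.0, -2.0]), weights, biases)  # -> array([1.])
```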
Why are neural networks interesting? - I

Deep Learning: Deep learning describes a variety of techniques based on data-driven adaptation of the affine-linear maps in a neural network. Overwhelming success in:
- image classification [Ren, He, Girshick, Sun; 2015]
- text understanding
- game intelligence
Hardware design of the future!
Why are neural networks interesting? - II

Expressibility: Neural networks constitute a very powerful architecture.

Theorem (Cybenko; 1989, Hornik; 1991, Pinkus; 1999). Let d ∈ N, K ⊂ R^d compact, f: K → R continuous, and ϱ: R → R continuous and not a polynomial. Then for every ε > 0 there exists a two-layer NN Φ^ϱ with ‖f − Φ^ϱ‖_∞ ≤ ε.

Efficient expressibility: R^M ∋ θ = (T_1, ..., T_L) ↦ Φ^ϱ_θ yields a parametrized system of functions. In a sense this parametrization is optimally efficient. (More on this below.)
How can we apply NNs to solve PDEs?

PDE problem: For D ⊂ R^d, d ∈ N, find u such that

    G(x, u(x), ∇u(x), ∇²u(x)) = 0   for all x ∈ D.

Approach of [Lagaris, Likas, Fotiadis; 1998]: Let (x_i)_{i∈I} ⊂ D and find a NN Φ^ϱ_θ such that

    G(x_i, Φ^ϱ_θ(x_i), ∇Φ^ϱ_θ(x_i), ∇²Φ^ϱ_θ(x_i)) = 0   for all i ∈ I.

Standard methods can be used to find the parameters θ.
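As a rough sketch of this collocation idea (not the authors' code): for the 1-D model problem u''(x) = f(x), one can form the squared residual of G at collocation points, here with a fixed small ReLU network and finite-difference second derivatives; in practice θ would then be optimized, e.g. by stochastic gradient descent. All names and parameter values below are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def net(x, theta):
    # tiny two-layer network Phi^rho_theta : R -> R with 3 hidden neurons;
    # theta packs (w, b, v) for Phi(x) = sum_j v_j * rho(w_j x + b_j)
    w, b, v = theta
    return relu(np.outer(np.atleast_1d(x), w) + b) @ v

def collocation_loss(theta, xs, f, h=1e-4):
    # squared residual of G(x, u, u'') = u''(x) - f(x) at points x_i,
    # with u'' replaced by a central finite difference
    u = lambda x: net(x, theta)
    upp = (u(xs + h) - 2.0 * u(xs) + u(xs - h)) / h**2
    return float(np.sum((upp - f(xs)) ** 2))

theta = (np.array([1.0, -1.0, 0.5]),
         np.array([0.0, 0.15, -0.22]),
         np.array([0.3, 0.2, -0.1]))
xs = np.linspace(0.1, 0.9, 9)          # collocation points (x_i)
loss = collocation_loss(theta, xs, f=lambda x: np.zeros_like(x))
```

Note that a ReLU network is piecewise linear, so away from its kinks the discrete residual is essentially zero here; the point of the sketch is the structure of the loss, not this particular θ.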
Approaches to solve PDEs - Examples

Deep Ritz Method [E, Yu; 2017]: NNs as trial functions; SGD naturally replaces quadrature.

High-dimensional PDEs [Sirignano, Spiliopoulos; 2017]: Let D ⊂ R^d, d ≥ 100, and find u such that

    ∂u/∂t(t, x) + H(u)(t, x) = 0   for (t, x) ∈ [0, T] × Ω,

plus boundary and initial conditions. As the number of parameters of the NNs increases, the minimizer of the associated energy approaches the true solution. No mesh generation required!

[Berner, Grohs, Hornung, Jentzen, von Wurstemberger; 2017]: Phrasing the problem as empirical risk minimization gives provably no curse of dimension in the approximation problem or in the number of samples.
How can we apply NNs to solve PDEs?

Deep learning and PDEs: Both approaches above rest on two ideas:
- Neural networks are highly efficient in representing solutions of PDEs, hence the complexity of the problem can be greatly reduced.
- There exist black-box methods from machine learning that solve the optimization problem.

This talk: We will show exactly how efficient the representations are, and raise doubt that the black box can produce reliable results in general.
Approximation theory of neural networks
Complexity of neural networks

Recall: Φ^ϱ(x) = T_L(ϱ(T_{L-1}(ϱ(... ϱ(T_1(x)))))), x ∈ R^d. Each affine-linear map T_l is defined by a matrix A_l ∈ R^{N_l × N_{l-1}} and a translation b_l ∈ R^{N_l} via T_l(x) = A_l x + b_l.

The number of weights W(Φ^ϱ) and the number of neurons N(Φ^ϱ) are

    W(Φ^ϱ) = Σ_{l=1}^{L} (‖A_l‖_0 + ‖b_l‖_0)   and   N(Φ^ϱ) = Σ_{j=0}^{L} N_j,

where ‖·‖_0 counts the nonzero entries.
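In code, both complexity measures are simple counts over the parameters (a sketch; `n_weights` and `n_neurons` are hypothetical names):

```python
import numpy as np

def n_weights(weights, biases):
    # W(Phi) = sum_l (||A_l||_0 + ||b_l||_0): total number of nonzero entries
    return sum(int(np.count_nonzero(A)) + int(np.count_nonzero(b))
               for A, b in zip(weights, biases))

def n_neurons(arch):
    # N(Phi) = sum_{j=0}^{L} N_j for the architecture (d, N_1, ..., N_L)
    return sum(arch)

A1 = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]); b1 = np.array([0.0, 1.0, 0.0])
A2 = np.array([[1.0, 0.0, 3.0]]);                    b2 = np.array([0.0])
W = n_weights([A1, A2], [b1, b2])   # 2 + 1 + 2 + 0 = 5
N = n_neurons((2, 3, 1))            # 2 + 3 + 1 = 6
```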
Power of the architecture - Exemplary results

Given f from some class of functions, how many weights/neurons does an ε-approximating NN need to have? Not so many...

Theorem (Maiorov, Pinkus; 1999). There exists an activation function ϱ_weird: R → R that is analytic and strictly increasing and satisfies lim_{x→−∞} ϱ_weird(x) = 0 and lim_{x→∞} ϱ_weird(x) = 1, such that for any d ∈ N, any f ∈ C([0,1]^d), and any ε > 0, there is a 3-layer ϱ_weird-network Φ^{ϱ_weird}_ε with ‖f − Φ^{ϱ_weird}_ε‖_{L^∞} ≤ ε and N(Φ^{ϱ_weird}_ε) = 9d + 3.
Power of the architecture - Exemplary results

- Barron; 1993: Approximation rate for functions with one finite Fourier moment, using shallow networks with activation function ϱ sigmoidal of order zero.
- Mhaskar; 1993: Let ϱ be a sigmoidal function of order k ≥ 2. For f ∈ C^s([0,1]^d), we have ‖f − Φ^ϱ_n‖_{L^∞} ≲ N(Φ^ϱ_n)^{−s/d} with L(Φ^ϱ_n) = L(d, s, k).
- Yarotsky; 2017: For f ∈ C^s([0,1]^d) and ϱ(x) = x_+ = max{x, 0} (called ReLU), we have ‖f − Φ^ϱ_n‖_{L^∞} ≲ W(Φ^ϱ_n)^{−s/d} with L(Φ^ϱ_n) ≍ log(n).
- Shaham, Cloninger, Coifman; 2015: One can implement certain wavelets using 4-layer NNs.
- He, Li, Xu, Zheng; 2018, Opschoor, Schwab, P.; 2019: ReLU NNs reproduce the approximation rates of h-, p-, and hp-FEM.
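Yarotsky's ReLU rate rests on an explicit construction: the sawtooth g(x) = 2ϱ(x) − 4ϱ(x − 1/2) + 2ϱ(x − 1) is itself a small ReLU network, and composing it with itself m times yields an approximation of x² on [0,1] with error 4^{−(m+1)} at depth O(m). A sketch of that construction:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sawtooth(x):
    # hat function g on [0,1], realized by a one-hidden-layer ReLU net:
    # g(x) = 2 rho(x) - 4 rho(x - 1/2) + 2 rho(x - 1)
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def yarotsky_square(x, m):
    # f_m(x) = x - sum_{s=1}^m g^{(s)}(x) / 4^s is the piecewise-linear
    # interpolant of x^2 on the dyadic grid of width 2^{-m};
    # each extra composition costs O(1) layers, so depth grows like m
    g, out = x, np.array(x, dtype=float)
    for s in range(1, m + 1):
        g = sawtooth(g)
        out -= g / 4.0**s
    return out

xs = np.linspace(0.0, 1.0, 1001)
err = np.max(np.abs(yarotsky_square(xs, 5) - xs**2))  # bounded by 4^(-6)
```

Exponential accuracy in the depth is exactly the mechanism behind the W(Φ)^{−s/d} rate: products, and hence local Taylor polynomials, can be assembled from such squaring networks.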
Lower bounds

Optimal approximation rates: Lower bounds on the required network size only exist under additional assumptions (recall the networks based on ϱ_weird). Options:
(A) Place restrictions on the activation function (e.g. only consider the ReLU), thereby excluding pathological examples like ϱ_weird. (→ VC-dimension bounds)
(B) Place restrictions on the weights. (→ Information-theoretic bounds, entropy arguments)
(C) Use still other concepts like continuous N-widths.
Asymptotic min-max rate distortion

Encoders and decoders: Let C ⊂ L²(R^d) and l ∈ N. Define

    E^l := { E: C → {0,1}^l },   D^l := { D: {0,1}^l → L²(R^d) }.

Min-max code length:

    L(ε, C) := min{ l ∈ N : ∃ D ∈ D^l, E ∈ E^l : sup_{f∈C} ‖D(E(f)) − f‖_2 < ε }.

Optimal exponent:

    γ*(C) := inf{ γ > 0 : L(ε, C) = O(ε^{−γ}) }.
Asymptotic min-max rate distortion

Theorem (Bölcskei, Grohs, Kutyniok, P.; 2017). Let C ⊂ L²(R^d) and ϱ: R → R. Then for all ε > 0:

    sup_{f∈C} inf { W(Φ^ϱ) : Φ^ϱ NN with quantised weights, ‖Φ^ϱ − f‖_2 ≤ ε } ≳ ε^{−γ*(C)}.   (1)

Optimal approximation/parametrization: If for C ⊂ L²(R^d) one also has the matching upper bound ≲ in (1), then NNs approximate the function class optimally.

Versatility: It turns out that NNs achieve optimal approximation rates for many practically-used function classes.
Some instances of optimal approximation

- Mhaskar; 1993: Let ϱ be a sigmoidal function of order k ≥ 2. For f ∈ C^s([0,1]^d), we have ‖f − Φ^ϱ_n‖_{L^∞} ≲ N(Φ^ϱ_n)^{−s/d}. We have γ*({f ∈ C^s([0,1]^d) : ‖f‖ ≤ 1}) = d/s.
- Shaham, Cloninger, Coifman; 2015: One can implement certain wavelets using 4-layer ReLU NNs. Optimal when wavelets are optimal.
- Bölcskei, Grohs, Kutyniok, P.; 2017: Networks yield optimal rates if any affine system does. Example: shearlets for cartoon-like functions.
ReLU Approximation

Piecewise smooth functions: E^{β,d} denotes the d-dimensional C^β-piecewise smooth functions on [0,1]^d with interfaces in C^β.

Theorem (P., Voigtlaender; 2018). Let d ∈ N, β ≥ 0, and ϱ(x) = x_+. Then

    sup_{f∈E^{β,d}} inf { W(Φ^ϱ) : Φ^ϱ NN with quantised weights, ‖Φ^ϱ − f‖_2 ≤ ε } ≍ ε^{−γ*(E^{β,d})} = ε^{−2(d−1)/β}.

The optimal depth of the networks is ≍ β/d.
High-dimensional approximation

Curse of dimension: To guarantee approximation with error ε of functions in E^{β,d}, one requires networks with O(ε^{−2(d−1)/β}) weights.

Symmetries and invariances: Image classifiers are often:
- translation, dilation, and rotation invariant,
- invariant to small deformations,
- invariant to small changes in brightness, contrast, and color.
Curse of dimension

Two-step setup: f = χ ∘ τ, where
- τ: R^D → R^d is a smooth dimension-reducing feature map,
- χ ∈ E^{β,d} performs classification on the low-dimensional space.

Theorem (P., Voigtlaender; 2017). Let ϱ(x) = x_+. There are constants c > 0, L ∈ N such that for any f = χ ∘ τ and any ε ∈ (0, 1/2), there is a NN Φ^ϱ_ε with at most L layers and at most c ε^{−2(d−1)/β} non-zero weights such that ‖Φ^ϱ_ε − f‖_{L²} < ε.

The asymptotic approximation rate depends only on d, not on D.
Compositional functions

Compositional functions [Mhaskar, Poggio; 2016]: High-dimensional functions as a dyadic composition of 2-dimensional functions:

    R^8 ∋ x ↦ h^3_1(h^2_1(h^1_1(x_1, x_2), h^1_2(x_3, x_4)), h^2_2(h^1_3(x_5, x_6), h^1_4(x_7, x_8))).
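Evaluating such a compositional function is a pairwise reduction over the inputs; a sketch with a hypothetical choice h(a, b) = tanh(a + b) at every node (the slide allows different functions h^l_j per node):

```python
import numpy as np

# each node function h^l_j is 2-dimensional; here one illustrative choice
h = lambda a, b: np.tanh(a + b)

def dyadic_composition(x):
    # f(x) = h(h(h(x1,x2), h(x3,x4)), h(h(x5,x6), h(x7,x8))) : R^8 -> R;
    # each pass halves the number of values until one output remains
    level = list(x)
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

y = dyadic_composition(np.zeros(8))  # tanh(0) = 0 at every node, so y = 0.0
```

The point of the structure is that a network only ever needs to approximate 2-dimensional functions, so the approximation cost scales with the tree depth rather than with the ambient dimension 8.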
Extensions

Approximation with respect to Sobolev norms: ReLU NNs Φ are Lipschitz continuous. Hence, for s ∈ [0,1], p ≥ 1, and f ∈ W^{s,p}(Ω), we can measure ‖f − Φ‖_{W^{s,p}(Ω)}. ReLU networks achieve the same approximation rates as h-, p-, and hp-FEM [Opschoor, P., Schwab; 2019].

Convolutional neural networks: Direct correspondence between approximation by CNNs (without pooling) and approximation by fully-connected networks [P., Voigtlaender; 2018].
Optimal parametrization

- Neural networks yield optimal representations of many function classes relevant in PDE applications.
- Approximation is flexible, and quality improves if low-dimensional structure is present.

PDE discretization:
- Problem complexity drastically reduced.
- No design of an ansatz system necessary, since NNs approximate almost every function class well.

Can neural networks really be this good?
The inconvenient structure of neural networks
Fixed architecture networks

Goal: Fix a space of networks with prescribed shape and understand the associated set of functions.

Fixed architecture networks: Let d, L ∈ N, N_1, ..., N_{L-1} ∈ N, and ϱ: R → R. We denote by NN_ϱ(d, N_1, ..., N_{L-1}, 1) the set of NNs with architecture (d, N_1, ..., N_{L-1}, 1). (Example architecture pictured on the slide: d = 8, N_1 = N_2 = N_3 = 12, N_4 = 8.)
Back to the basics

Topological properties: Is NN_ϱ(d, N_1, ..., N_{L-1}, 1)
- star-shaped?
- convex? approximately convex?
- closed?
Is the map (T_1, ..., T_L) ↦ Φ open?

Implications for optimization: Without the properties above, we can have terrible local minima, exploding weights, and very slow convergence.
Star-shapedness

NN_ϱ(d, N_1, ..., N_{L-1}, 1) is trivially star-shaped with center 0. ...but...

Proposition (P., Raslan, Voigtlaender; 2018). Let d, L, N_1, ..., N_{L-1} ∈ N and let ϱ: R → R be locally Lipschitz continuous. Then the number of linearly independent centers of NN_ϱ(d, N_1, ..., N_L) is at most Σ_{l=1}^{L} (N_{l-1} + 1) N_l, where N_0 = d.
Convexity?

Corollary (P., Raslan, Voigtlaender; 2018). Let d, L, N_1, ..., N_{L-1} ∈ N, N_0 = d, and let ϱ: R → R be locally Lipschitz continuous. If NN_ϱ(d, N_1, ..., N_{L-1}, 1) contains more than Σ_{l=1}^{L} (N_{l-1} + 1) N_l linearly independent functions, then NN_ϱ(d, N_1, ..., N_{L-1}, 1) is not convex.

From translation invariance: If NN_ϱ(d, N_1, ..., N_{L-1}, 1) contains only finitely many linearly independent functions, then ϱ is a finite sum of complex exponentials multiplied with polynomials.
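A concrete instance of the non-convexity (an illustrative example, not from the slides): the midpoint of the two one-hidden-neuron ReLU networks ϱ(x) and ϱ(−x) is |x|/2, which has non-zero slope on both sides of its kink, while any single network a·ϱ(wx + b) + c is constant on one side of its kink. So the midpoint leaves NN_ϱ(1, 1, 1):

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

f = lambda x: relu(x)               # a one-hidden-neuron ReLU network
g = lambda x: relu(-x)              # another one-hidden-neuron ReLU network
mid = lambda x: 0.5 * (f(x) + g(x))  # convex combination, equals |x| / 2

# any net a*relu(w*x + b) + c has slope 0 on one side of its kink;
# the midpoint has nonzero slope on both sides, so it lies outside the set
slope_left = (mid(-1.0) - mid(-2.0)) / 1.0    # = -0.5
slope_right = (mid(2.0) - mid(1.0)) / 1.0     # =  0.5
```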
Weak Convexity?

Weak convexity: NN_ϱ(d, N_1, ..., N_{L-1}, 1) is almost never convex, but what about NN_ϱ(d, N_1, ..., N_{L-1}, 1) + B_ε(0) for a hopefully small ε > 0?

Theorem (P., Raslan, Voigtlaender; 2018). Let d, L, N_1, ..., N_{L-1} ∈ N, N_0 = d. For all commonly-used activation functions there does not exist ε > 0 such that NN_ϱ(d, N_1, ..., N_{L-1}, 1) + B_ε(0) is convex.

As a corollary, we also get that NN_ϱ(d, N_1, ..., N_{L-1}, 1) is usually nowhere dense.
Illustration

The set NN_ϱ(d, N_1, ..., N_{L-1}, 1) has very few centers, is scaling invariant, not approximately convex, and nowhere dense.
Closedness in L^p

Compact weights: If the activation function ϱ is continuous, then a compactness argument shows that the set of networks with a compact parameter set is closed.

Theorem (P., Raslan, Voigtlaender; 2018). Let d, L, N_1, ..., N_{L-1} ∈ N, N_0 = d. If ϱ has one of the properties below, then NN_ϱ(d, N_1, ..., N_{L-1}, 1) is not closed in L^p, p ∈ (0, ∞):
- analytic, bounded, not constant;
- C^1 but not C^∞;
- continuous, monotone, bounded, and ϱ'(x_0) exists and is non-zero in at least one point x_0 ∈ R;
- continuous, monotone, continuously differentiable outside a compact set, and lim_{x→∞} ϱ'(x), lim_{x→−∞} ϱ'(x) exist and do not coincide.
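The failure of closedness can be seen concretely (an illustrative example consistent with the last condition of the theorem, which the ReLU satisfies): the two-neuron ReLU networks Φ_n(x) = n(ϱ(x + 1/n) − ϱ(x)) converge in L^p to a discontinuous step function, which, being discontinuous, is not itself a ReLU network:

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

def phi(n, x):
    # two-neuron ReLU net: 0 for x <= -1/n, a ramp on [-1/n, 0], 1 for x >= 0
    return n * (relu(x + 1.0 / n) - relu(x))

step = lambda x: (x >= 0).astype(float)  # discontinuous limit, not a ReLU net

# L^1([-1,1]) distance via a Riemann sum; analytically it equals 1/(2n) -> 0
xs = np.linspace(-1.0, 1.0, 200001)
dx = xs[1] - xs[0]
dist = lambda n: float(np.sum(np.abs(phi(n, xs) - step(xs))) * dx)
d10, d100 = dist(10), dist(100)   # roughly 0.05 and 0.005
```

Since the Φ_n have fixed architecture but the limit lies outside the set, the infimum of the approximation error over NN_ϱ(1, 2, 1) is not attained; this is the mechanism behind exploding coefficients.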
Closedness in L^∞

Theorem (P., Raslan, Voigtlaender; 2018). Let d, L, N_1, ..., N_{L-1} ∈ N, N_0 = d. If ϱ has one of the properties below, then NN_ϱ(d, N_1, ..., N_{L-1}, 1) is not closed in L^∞:
- analytic, bounded, not constant;
- C^1 but not C^∞;
- ϱ ∈ C^p and ϱ(x) − x_+^p bounded, for p ≥ 1.

ReLU: The set of two-layer ReLU NNs is closed in L^∞!
Illustration

For most activation functions ϱ (the ReLU being an exception), the set NN_ϱ(d, N_1, ..., N_{L-1}, 1) is star-shaped with center 0, not approximately convex, and not closed.
Stable parametrization

Continuous parametrization: It is not hard to see that if ϱ is continuous, then so is the map R_ϱ: (T_1, ..., T_L) ↦ Φ.

Quotient map: We can also ask whether R_ϱ is a quotient map, i.e., if Φ^1, Φ^2 are NNs which are close (w.r.t. ‖·‖_∞), do there exist (T_1^1, ..., T_L^1) and (T_1^2, ..., T_L^2) which are close in some norm with R_ϱ((T_1^1, ..., T_L^1)) = Φ^1 and R_ϱ((T_1^2, ..., T_L^2)) = Φ^2?

Proposition (P., Raslan, Voigtlaender; 2018). Let ϱ be Lipschitz continuous and not affine-linear. Then R_ϱ is not a quotient map.
Consequences

No convexity: We want to solve J(Φ) = 0 for an energy J and a NN Φ. Not only could J be non-convex, but so is the set we optimize over. Similar to N-term approximation by dictionaries.

No closedness: Exploding coefficients (if the best approximation P_{NN}(f) does not lie in NN). No low-neuron approximation.

No inverse-stable parametrization: The error term can be very small while the parametrization is far from optimal. Potentially very slow convergence.
Where to go from here?

Different networks: Special types of networks could be more robust. Convolutional neural networks are probably still too large a class [P., Voigtlaender; 2018].

Stronger norms: Stronger norms naturally help with closedness and inverse stability. An example is Sobolev training [Czarnecki, Osindero, Jaderberg, Swirszcz, Pascanu; 2017]. Many arguments of our results break down if the W^{1,∞} norm is used.
Conclusion

Approximation: NNs are a very powerful approximation tool:
- often optimally efficient parametrization
- overcome the curse of dimension
- surprisingly efficient black-box optimization

Topological structure: NNs form an impractical set:
- non-convex
- non-closed
- no inverse-stable parametrization
References:
- H. Andrade-Loarca, G. Kutyniok, O. Öktem, P. Petersen, Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks, arXiv preprint.
- H. Bölcskei, P. Grohs, G. Kutyniok, P. Petersen, Optimal approximation with sparsely connected deep neural networks, arXiv preprint.
- J. Opschoor, P. Petersen, Ch. Schwab, Deep ReLU networks and high-order finite element methods, SAM report, ETH Zürich.
- P. Petersen, F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Networks (2018).
- P. Petersen, M. Raslan, F. Voigtlaender, Topological properties of the set of functions generated by neural networks of fixed size, arXiv preprint.
- P. Petersen, F. Voigtlaender, Equivalence of approximation by convolutional neural networks and fully-connected networks, arXiv preprint.
Thank you for your attention!
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationGeneralized quantiles as risk measures
Generalized quantiles as risk measures Bellini, Klar, Muller, Rosazza Gianin December 1, 2014 Vorisek Jan Introduction Quantiles q α of a random variable X can be defined as the minimizers of a piecewise
More informationOn the Structure of Anisotropic Frames
On the Structure of Anisotropic Frames P. Grohs ETH Zurich, Seminar for Applied Mathematics ESI Modern Methods of Time-Frequency Analysis Motivation P. Grohs ESI Modern Methods of Time-Frequency Analysis
More informationSobolev spaces. May 18
Sobolev spaces May 18 2015 1 Weak derivatives The purpose of these notes is to give a very basic introduction to Sobolev spaces. More extensive treatments can e.g. be found in the classical references
More informationDeep learning quantum matter
Deep learning quantum matter Machine-learning approaches to the quantum many-body problem Joaquín E. Drut & Andrew C. Loheac University of North Carolina at Chapel Hill SAS, September 2018 Outline Why
More informationDeep Feedforward Networks. Han Shao, Hou Pong Chan, and Hongyi Zhang
Deep Feedforward Networks Han Shao, Hou Pong Chan, and Hongyi Zhang Deep Feedforward Networks Goal: approximate some function f e.g., a classifier, maps input to a class y = f (x) x y Defines a mapping
More informationConvergence of Multivariate Quantile Surfaces
Convergence of Multivariate Quantile Surfaces Adil Ahidar Institut de Mathématiques de Toulouse - CERFACS August 30, 2013 Adil Ahidar (Institut de Mathématiques de Toulouse Convergence - CERFACS) of Multivariate
More informationNotes for Functional Analysis
Notes for Functional Analysis Wang Zuoqin (typed by Xiyu Zhai) September 29, 2015 1 Lecture 09 1.1 Equicontinuity First let s recall the conception of equicontinuity for family of functions that we learned
More informationIN the field of ElectroMagnetics (EM), Boundary Value
1 A Neural Network Based ElectroMagnetic Solver Sethu Hareesh Kolluru (hareesh@stanford.edu) General Machine Learning I. BACKGROUND AND MOTIVATION IN the field of ElectroMagnetics (EM), Boundary Value
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationTopics in Harmonic Analysis, Sparse Representations, and Data Analysis
Topics in Harmonic Analysis, Sparse Representations, and Data Analysis Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu Thesis Defense,
More informationarxiv: v3 [cs.lg] 8 Jun 2018
Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions Quynh Nguyen Mahesh Chandra Mukkamala Matthias Hein 2 arxiv:803.00094v3 [cs.lg] 8 Jun 208 Abstract In the recent literature
More informationDiscussion of Hypothesis testing by convex optimization
Electronic Journal of Statistics Vol. 9 (2015) 1 6 ISSN: 1935-7524 DOI: 10.1214/15-EJS990 Discussion of Hypothesis testing by convex optimization Fabienne Comte, Céline Duval and Valentine Genon-Catalot
More informationON THE STABILITY OF DEEP NETWORKS
ON THE STABILITY OF DEEP NETWORKS AND THEIR RELATIONSHIP TO COMPRESSED SENSING AND METRIC LEARNING RAJA GIRYES AND GUILLERMO SAPIRO DUKE UNIVERSITY Mathematics of Deep Learning International Conference
More informationThe Mathematics of Deep Learning Part 1: Continuous-time Theory
The Mathematics of Deep Learning Part 1: Continuous-time Theory Helmut Bőlcskei Department of Information Technology and Electrical Engineering June 2016 joint work with Thomas Wiatowski, Philipp Grohs,
More informationContent. Learning. Regression vs Classification. Regression a.k.a. function approximation and Classification a.k.a. pattern recognition
Content Andrew Kusiak Intelligent Systems Laboratory 239 Seamans Center The University of Iowa Iowa City, IA 52242-527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Introduction to learning
More informationTakens embedding theorem for infinite-dimensional dynamical systems
Takens embedding theorem for infinite-dimensional dynamical systems James C. Robinson Mathematics Institute, University of Warwick, Coventry, CV4 7AL, U.K. E-mail: jcr@maths.warwick.ac.uk Abstract. Takens
More informationIntroduction to Machine Learning (67577) Lecture 7
Introduction to Machine Learning (67577) Lecture 7 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Solving Convex Problems using SGD and RLM Shai Shalev-Shwartz (Hebrew
More informationCourse Description for Real Analysis, Math 156
Course Description for Real Analysis, Math 156 In this class, we will study elliptic PDE, Fourier analysis, and dispersive PDE. Here is a quick summary of the topics will study study. They re described
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationNon-Convex Optimization in Machine Learning. Jan Mrkos AIC
Non-Convex Optimization in Machine Learning Jan Mrkos AIC The Plan 1. Introduction 2. Non convexity 3. (Some) optimization approaches 4. Speed and stuff? Neural net universal approximation Theorem (1989):
More informationNumerical Solutions to Partial Differential Equations
Numerical Solutions to Partial Differential Equations Zhiping Li LMAM and School of Mathematical Sciences Peking University The Residual and Error of Finite Element Solutions Mixed BVP of Poisson Equation
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationBounded uniformly continuous functions
Bounded uniformly continuous functions Objectives. To study the basic properties of the C -algebra of the bounded uniformly continuous functions on some metric space. Requirements. Basic concepts of analysis:
More informationSignal Recovery, Uncertainty Relations, and Minkowski Dimension
Signal Recovery, Uncertainty Relations, and Minkowski Dimension Helmut Bőlcskei ETH Zurich December 2013 Joint work with C. Aubel, P. Kuppinger, G. Pope, E. Riegler, D. Stotz, and C. Studer Aim of this
More informationEnergy Landscapes of Deep Networks
Energy Landscapes of Deep Networks Pratik Chaudhari February 29, 2016 References Auffinger, A., Arous, G. B., & Černý, J. (2013). Random matrices and complexity of spin glasses. arxiv:1003.1129.# Choromanska,
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationS chauder Theory. x 2. = log( x 1 + x 2 ) + 1 ( x 1 + x 2 ) 2. ( 5) x 1 + x 2 x 1 + x 2. 2 = 2 x 1. x 1 x 2. 1 x 1.
Sep. 1 9 Intuitively, the solution u to the Poisson equation S chauder Theory u = f 1 should have better regularity than the right hand side f. In particular one expects u to be twice more differentiable
More informationReliable and Interpretable Artificial Intelligence
(Sept 15, 2017) Reliable and Interpretable Artificial Intelligence How good (robust) is your neural net? [1] Pei et. al., DeepXplore: Automated Whitebox Testing of Deep Learning Systems, SOSP 2017 Attacks
More informationRegML 2018 Class 8 Deep learning
RegML 2018 Class 8 Deep learning Lorenzo Rosasco UNIGE-MIT-IIT June 18, 2018 Supervised vs unsupervised learning? So far we have been thinking of learning schemes made in two steps f(x) = w, Φ(x) F, x
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations
Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationMATH 590: Meshfree Methods
MATH 590: Meshfree Methods The Connection to Green s Kernels Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2014 fasshauer@iit.edu MATH 590 1 Outline 1 Introduction
More informationTowards stability and optimality in stochastic gradient descent
Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction
More informationQuarkonial frames of wavelet type - Stability, approximation and compression properties
Quarkonial frames of wavelet type - Stability, approximation and compression properties Stephan Dahlke 1 Peter Oswald 2 Thorsten Raasch 3 ESI Workshop Wavelet methods in scientific computing Vienna, November
More informationWHY DEEP NEURAL NETWORKS FOR FUNCTION APPROXIMATION SHIYU LIANG THESIS
c 2017 Shiyu Liang WHY DEEP NEURAL NETWORKS FOR FUNCTION APPROXIMATION BY SHIYU LIANG THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer
More informationOn the Hilbert Transform of Wavelets
On the Hilbert Transform of Wavelets Kunal Narayan Chaudhury and Michael Unser Abstract A wavelet is a localized function having a prescribed number of vanishing moments. In this correspondence, we provide
More informationImage classification via Scattering Transform
1 Image classification via Scattering Transform Edouard Oyallon under the supervision of Stéphane Mallat following the work of Laurent Sifre, Joan Bruna, Classification of signals Let n>0, (X, Y ) 2 R
More informationInstitut de Recherche MAthématique de Rennes
LMS Durham Symposium: Computational methods for wave propagation in direct scattering. - July, Durham, UK The hp version of the Weighted Regularization Method for Maxwell Equations Martin COSTABEL & Monique
More informationFast learning rates for plug-in classifiers under the margin condition
Fast learning rates for plug-in classifiers under the margin condition Jean-Yves Audibert 1 Alexandre B. Tsybakov 2 1 Certis ParisTech - Ecole des Ponts, France 2 LPMA Université Pierre et Marie Curie,
More informationA Bayesian perspective on GMM and IV
A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all
More informationContents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.
Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationLecture 15: Neural Networks Theory
EECS 598-5: Theoretical Foundations of Machine Learning Fall 25 Lecture 5: Neural Networks Theory Lecturer: Matus Telgarsky Scribes: Xintong Wang, Editors: Yuan Zhuang 5. Neural Networks Definition and
More informationContinuous functions that are nowhere differentiable
Continuous functions that are nowhere differentiable S. Kesavan The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai - 600113. e-mail: kesh @imsc.res.in Abstract It is shown that the existence
More information