Fundamentality of Ridge Functions

Similar documents
Approximation of Multivariate Functions

On Ridge Functions. Allan Pinkus. September 23, Technion. Allan Pinkus (Technion) Ridge Function September 23, / 27

Smoothness and uniqueness in ridge function representation

Interpolation on lines by ridge functions

Strictly Positive Definite Functions on a Real Inner Product Space

Where is matrix multiplication locally open?

Scattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions

Lower Bounds for Approximation by MLP Neural Networks

REAL AND COMPLEX ANALYSIS

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

Approximating By Ridge Functions. Allan Pinkus. We hope it will also encourage some readers to consider researching

Strictly positive definite functions on a real inner product space

Decomposition of Riesz frames and wavelets into a finite union of linearly independent sets

Commutative Banach algebras 79

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Approximation by Conditionally Positive Definite Functions with Finitely Many Centers

10. Smooth Varieties. 82 Andreas Gathmann

Maths 212: Homework Solutions

Linear Algebra Practice Problems

Integral Jensen inequality

THEOREMS, ETC., FOR MATH 515

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

Multilayer feedforward networks are universal approximators

Assignment 1 Math 5341 Linear Algebra Review. Give complete answers to each of the following questions. Show all of your work.

MATH 225 Summer 2005 Linear Algebra II Solutions to Assignment 1 Due: Wednesday July 13, 2005

An introduction to some aspects of functional analysis

MASTERS EXAMINATION IN MATHEMATICS SOLUTIONS

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Functional Analysis HW #1

On Riesz-Fischer sequences and lower frame bounds

Spanning and Independence Properties of Finite Frames

Abstract Vector Spaces and Concrete Examples

Math 421, Homework #9 Solutions

A NICE PROOF OF FARKAS LEMMA

Math 341: Convex Geometry. Xi Chen

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

THEOREMS, ETC., FOR MATH 516

Math 209B Homework 2

Fourier Transform & Sobolev Spaces

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

2. Function spaces and approximation

COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE

Chapter 2 Linear Transformations

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

Comparing continuous and discrete versions of Hilbert s thirteenth problem

Banach Spaces V: A Closer Look at the w- and the w -Topologies

V (v i + W i ) (v i + W i ) is path-connected and hence is connected.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

Some Background Material

We denote the space of distributions on Ω by D ( Ω) 2.

MAT Linear Algebra Collection of sample exams

Math 426 Homework 4 Due 3 November 2017

Topology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:

Min-Rank Conjecture for Log-Depth Circuits

Topological vectorspaces

290 J.M. Carnicer, J.M. Pe~na basis (u 1 ; : : : ; u n ) consisting of minimally supported elements, yet also has a basis (v 1 ; : : : ; v n ) which f

En busca de la linealidad en Matemáticas

Analysis Preliminary Exam Workshop: Hilbert Spaces

Math 61CM - Solutions to homework 6

Problem Set 6: Solutions Math 201A: Fall a n x n,

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

Here we used the multiindex notation:

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

ARCS IN FINITE PROJECTIVE SPACES. Basic objects and definitions

Continuous Functions on Metric Spaces

P-adic Functions - Part 1

INTRODUCTION TO REAL ANALYTIC GEOMETRY

Representation theory and quantum mechanics tutorial Spin and the hydrogen atom

FREMLIN TENSOR PRODUCTS OF CONCAVIFICATIONS OF BANACH LATTICES

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Trace Class Operators and Lidskii s Theorem

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina

Immerse Metric Space Homework

Universal Approximation by Ridge Computational Models and Neural Networks: A Survey

L p Approximation of Sigma Pi Neural Networks

The Skorokhod reflection problem for functions with discontinuities (contractive case)

MATH 8253 ALGEBRAIC GEOMETRY WEEK 12

On stable inversion of the attenuated Radon transform with half data Jan Boman. We shall consider weighted Radon transforms of the form

Spectral theorems for bounded self-adjoint operators on a Hilbert space

Irrationality exponent and rational approximations with prescribed growth

9 Radon-Nikodym theorem and conditioning

Review and problem list for Applied Math I

Chapter 1. Preliminaries

Math 210B. Artin Rees and completions

TOPOLOGICAL GROUPS MATH 519

Part III. 10 Topological Space Basics. Topological Spaces

LECTURE 15: COMPLETENESS AND CONVEXITY

Math 321: Linear Algebra

Fragmentability and σ-fragmentability

The Dirichlet-to-Neumann operator

On m-accretive Schrödinger operators in L p -spaces on manifolds of bounded geometry

Reminder Notes for the Course on Distribution Theory

1 The Local-to-Global Lemma

Recall that any inner product space V has an associated norm defined by

Whitney s Extension Problem for C m

Analysis II Lecture notes

12. Hilbert Polynomials and Bézout s Theorem

ALGEBRAIC GEOMETRY COURSE NOTES, LECTURE 2: HILBERT S NULLSTELLENSATZ.

Analysis-3 lecture schemes

Transcription:

Fundamentality of Ridge Functions Vladimir Ya. Lin and Allan Pinkus Abstract. For a given integer d, 1 d n 1, let Ω be a subset of the set of all d n real matrices. Define the subspace M(Ω) = span{g(ax) : A Ω, g C(IR d, IR)}. We give necessary and sufficient conditions on Ω so that M(Ω) is dense in C(IR n, IR) in the topology of uniform convergence on compact subsets. This generalizes work of Vostrecov and Kreines. We also consider some related problems. 1. Introduction Ridge functions on IR n, in their simplest case, are functions F of the form F (x) = f(a x) where f : IR IR, a IR n \{0} is a fixed vector, x = (x 1,..., x n ) IR n, and a x = n i=1 a ix i. Such functions, and both generalizations and linear combinations thereof, arise in various contexts. They arise in problems of tomography, see e. g. [8], [12], [13] and references therein, projection pursuit in statistics, see e. g. [6], [9], neural networks, see [1], [2], [11] and references therein, partial differential equations [10] (where they are called plane waves ), and approximation theory, see e. g. [1], [3], [4], [5], and [14]. We consider, for given d, 1 d n 1, functions G of the form G(x) = g(ax) where A is a fixed d n real matrix, and g : IR d IR. For d = 1, this reduces to ridge functions. In this paper we let Ω be a subset of all d n real matrices. Set M(Ω) = span{g(ax) : A Ω, g C(IR d, IR)}. (We run over all A Ω and all g C(IR d, IR).) In Section 2 we determine necessary and sufficient conditions on Ω such that the subspace M(Ω) is dense in C(IR n, IR), i. e., M(Ω) = C(IR n, IR), in the topology of uniform convergence on compact subsets. In Section 3 we highlight various consequences of this result. In Section 4 we discuss the question of characterizing M(Ω) in general, and its relationship to kernels of differential operators. Finally, in Section 5 we ask a slightly different question. We fix a natural number k and

2 Vladimir Ya. Lin and Allan Pinkus ask whether it is possible to approximate functions in C(IR n, IR) (in the above sense) by linear combinations of k functions of the form g(ax) where we are free to choose the k directions A, as well as the functions g. We prove that the answer is no. We learned from Prof. Y. Brudnyi, only after we completed the work on this paper, of work of Vostrecov and Kreines, [15] and [16]. Two main results of our paper, namely Theorem 2.1 and part of Theorem 4.1, were proven for the case d = 1 in these two papers from the early 60 s. These papers were unfortunately overlooked. We hope that they now receive the attention which is their due. (For example, in [8] and [12] can be found a version of Theorem 2.1 in the case d = 1 and n = 2.) Other papers of Vostrecov, see especially [17], are very much related to more recent work in [5]. A paper dealing with the above topics should note, and possibly use, the interrelationship between the results of this paper and Radon transform theory, polynomial ideals, exponential solutions, Zariski topology, Zariski closure, and other related matters. It was our desire to write this paper in as elementary a fashion as possible. We hope that the reader will make the appropriate connections. 2. Main Result and Proof Let Ω and M(Ω) be as defined above. For each A Ω, we let L(A) denote the span of the d rows of A. In what follows A, B Ω are considered distinct if L(A) L(B). This is because span{g(ax) : g C(IR d, IR)} = span{g(bx) : g C(IR d, IR)} if and only if L(A) = L(B). Set L(Ω) = A Ω L(A). Let Hk n i. e., denote the set of homogeneous polynomials of n variables of total degree k, Hk n = { c m s m }, and H n the set of all homogeneous polynomials of n variables, i. e., H n = Hk n. k=0 We use the standard notation m = (m 1,..., m n ) Z n +, m = m 1 + + m n, and s m = s m 1 1 s m n n. Note that dim Hk n = ( ) n 1+k k. Theorem 2.1. The linear space M(Ω) is dense in C(IR n, IR) in the topology of uniform convergence on compact subsets if and only if the only polynomial in H n which vanishes identically on L(Ω) is the zero polynomial.

Ridge Functions 3 Proof. ( ). Assume that for some k IN there exists a p Hk n \{0} such that p(ξ) = 0 for all ξ L(Ω). Let p(ξ) = b m ξ m. Choose any φ C0 (IR n ), φ 0, i. e., φ is a nontrivial C function with compact support. For each m Z n +, m = k, set We define D m = ψ(x) = k x m 1 1 x m n n. b m D m φ(x). Note that ψ C 0 (IRn ), ψ 0, (suppψ suppφ), and ψ = i k φ p where denotes the Fourier transform. We claim that (2.1) g(ax)ψ(x)dx = 0 IR n for all A Ω and g C(IR d, IR), i. e., the nontrivial linear functional defined by integrating against ψ annihilates M(Ω). This implies the desired result. We prove (2.1) as follows. For given A Ω, let m = dim L(A) d. Write x = (x, x ), where (x, 0) and (0, x ) are the orthogonal projections of x onto L(A) and its orthogonal complement, respectively. Then for any ρ C0 (IR n ) ρ(x)dx = IR n IR m [ IR n m ρ(x, x )dx ] dx. Every c IR m is such that c = (c, 0) L(A). Thus 0 = i k 1 φ(c)p(c) = ψ(c) = ψ(x)e ic x dx (2π) n/2 IR n 1 [ ] = ψ(x, x )dx e ic x dx. (2π) n/2 IR m IR n m Set H(x ) = ψ(x, x )dx. IR n m Then H C0 (IRm ), and the previous equation can be rewritten as 1 0 = H(x )e ic x dx = (2π) m/2 Ĥ(c ) IR k

4 Vladimir Ya. Lin and Allan Pinkus for all c IR m. Thus H 0. Set x = (x, 0). Since (0, x ) is orthogonal to L(A), it is clear that Ax = A x for all x = (x, x ) IR n. Thus for any g C(IR d, IR), g(ax)ψ(x)dx = IR n IR m [ IR n m ψ(x, x )dx ] g(a x )dx = H(x )g(a x )dx = 0. IR m ( ). Assume that for a given k IN no non-trivial p Hk n vanishes identically on L(Ω). We will prove that Hk n M(Ω). If the above holds for all k Z +, it then follows that M(Ω) contains all polynomials, and from the Weierstrass Theorem, M(Ω) = C(IR n, IR). Let d L(Ω). Then there exists a y IR d and A Ω such that d = ya. Set g(ax) = (y Ax) k = (d x) k. Thus (d x) k M(Ω) for all d L(Ω). Since D m 1 x m 2 = δ m1,m 2 k!, for m 1, m 2 Z n +, m 1 = m 2 = k, it easily follows that every linear functional l on the finite dimensional linear space Hk n may be represented by some q Hk n via l(p) = q(d)p for each p H n k. For any given q H n k, q(d) (d x) k = k! q(d). If the linear functional l annihilates (d x) k for all d L(Ω), then its representor q H n k vanishes on L(Ω). By assumption this implies that q 0. Thus H n k = span{(d x)k : d L(Ω)} M(Ω). Remark 2.1. The proof of this theorem in the case d = 1 in Vostrecov and Kreines [15] is much the same, although we like to think that our proof is somewhat more elegant. Remark 2.2. In Section 4 we provide a different proof of the sufficiency. In fact that proof will be more general. The above proof is given because it is elementary and highlights an important fact. Namely, M(Ω) is dense in C(IR n, IR) in the topology of uniform convergence on compact subsets if and only if M(Ω) explicitly contains the polynomials. Remark 2.3. Our choice of the topology of uniform convergence on compact subsets is rather arbitrary. Theorem 2.1 will hold for many other linear spaces defined on IR n or subsets thereof. It is sufficient that the polynomials are dense therein (the Weierstrass Theorem holds) and we can choose ψ, as in the first part of the proof of Theorem 2.1, in such a way that it defines a continuous linear functional on the space which annihilates M(Ω).

Ridge Functions 5 3. Consequences and Remarks In this section we note some simple consequences of Theorem 2.1. But first we remark that the property that no non-trivial polynomial in H n identically vanishes on L(Ω) is equivalent to dim Hk n L(Ω) = dim Hk n, for each k IN. For notational ease, we sometimes use this latter form. Proposition 3.1. If dim Hk n L(Ω) = dim Hk n for some k IN, then dim H l n for all l < k. L(Ω) = dim H n l L(Ω) < dim Hl n, then there exists a p H l n \{0} vanishing on L(Ω). Let Proof. If dim Hl n q Hk l n \{0}. Then pq Hn k \{0} and vanishes on L(Ω). A contradiction. Proposition 3.2. If Ω = Ω 1 Ω2, then M(Ω) = C(IR n, IR) if and only if M(Ω j ) = C(IR n, IR) for j = 1 or j = 2. Proof. The proof in one direction is simple. To prove the other direction, assume M(Ω j ) C(IR n, IR) for j = 1 and j = 2. As such there exist p j H n \{0} such that p j vanishes on L(Ω j ), j = 1, 2. Then p = p 1 p 2 H n \{0} and vanishes on L(Ω). Thus M(Ω) C(IR n, IR). As a consequence of Proposition 3.2, we have: Corollary 3.3. C(IR n, IR). If Ω contains only a finite number of distinct elements, then M(Ω) Proof. Given a d n matrix A, d n 1, there exists a vector c IR n \{0} such that Ac = 0. Set p(ξ) = c ξ. p vanishes on L(A). Apply Proposition 3.2. The case d = 1 (n = 2) of this next result was proved in [15], see also [12] and [8], by a much different method. Proposition 3.4. If d = n 1, and Ω contains an infinite number of distinct A of rank n 1, then M(Ω) = C(IR n, IR). Proof. Recall that A, B Ω are distinct if L(A) L(B). For each A Ω of rank n 1, there exists a unique (up to multiplication by a constant) c A IR n \{0} (the normal to L(A)) such that Ac A = 0. Let p A (ξ) = c A ξ. p A vanishes exactly on L(A) and is irreducible. Assume M(Ω) C(IR n, IR). Thus for some k IN, there exists a p Hk n \{0} which vanishes on L(Ω). For each A Ω, p vanishes on all of L(A). Thus, by the Hilbert Nullstellensatz, the polynomial p A must be a divisor of p. This is true for an infinite number of irreducible distinct polynomials, which is a contradiction.

6 Vladimir Ya. Lin and Allan Pinkus Let Π n k denote the space of algebraic polynomials of n variables and total degree k, i. e., Π n k = k l=0 Hn l. Note that dim Πn k = dim Hn+1 k. We also let Π n denote the set of all algebraic polynomials of n variables. For each a IR n with a n 0, set τ(a) = ( a 1 a n,..., a n 1 a n ). (τ(λa) = τ(a) for all λ IR\{0}.) If a n = 0, τ(a) is not defined. Proposition 3.5. The following are equivalent: a) The only polynomial in H n which vanishes identically on L(Ω) is the zero polynomial. b) The only polynomial in Π n which vanishes identically on L(Ω) is the zero polynomial. c) The only polynomial in Π n 1 which vanishes identically on τ(l(ω)) is the zero polynomial. Proof. We first prove the equivalence of (a) and (b). Obviously (b) implies (a). To prove that (a) implies (b), we first note that L(Ω) is a balanced cone, i. e., if a L(Ω) then λa L(Ω) for all λ IR. Thus if p Π n \{0} vanishes on L(Ω), then for each a L(Ω) and every λ IR 0 = p(λa) = q j (λa) = λ j q j (a) where q j H n j j=0 j=0 in the expansion of p. But this implies that q j (a) = 0, j = 0, 1,..., k. Thus there exists a non-trivial homogeneous polynomial vanishing on L(Ω). We now prove the equivalence of (a) and (c). Assume (c) does not hold. Then, for some k IN, there exists a q Π n 1 k \{0} which vanishes on τ(l(ω)). Let q(x) = d m x m where x = (x 1,..., x n 1 ). Set p(x) = m k m k d m x m x k+1 m n. Then p Hk+1 n \{0}, and as is easily checked, p vanishes on L(Ω). Thus (a) does not hold. Now assume that (a) does not hold. For some k IN there exists a p Hk n \{0} which vanishes on L(Ω). Let p(x) = c m x m, x = (x 1,..., x n ). Set q(x) = c m x m 1 1 x m n 1 n 1.

Ridge Functions 7 Then q Π n 1 k \{0} and q vanishes on τ(l(ω)). Thus (c) does not hold. Let U 1,..., U n IR. By U = U 1 U n we mean all vectors a = (a 1,..., a n ) IR n with a i U i, i = 1,..., n. Given d, 1 d n 1, let Ω(U) denote the subset of the set of d n matrices, the rows of which are all possible vectors in U. Proposition 3.6. M(Ω(U)) = C(IR n, IR) if and only if a) at least n d of the U 1,..., U n have an infinite number of distinct elements b) at most one of the U 1,..., U n has only one element, and none has only the zero element. Proof. ( ). (a). Assume U 1,..., U d+1 each have a finite number of elements. For each distinct set of d vectors à = {ã i} d i=1 in U 1 U d+1 IR d+1, let cã IR d+1 satisfy cã ã i = 0, i = 1,..., d. Let cã = ( cã, 0,..., 0) IR n. Take the product of all linear polynomials cã x, for different Ã. (There are a finite number of such polynomials.) This product is a homogeneous polynomial which vanishes on L(Ω(U)). (b). If U 1 = {0}, let p(x) = x 1. If U 1 = {u 1 } and U 2 = {u 2 }, where u 1, u 2 0, let p(x) = (u 2 x 1 u 1 x 2 ). These homogeneous polynomials vanish on their respective L(Ω(U)). ( ). Let us assume that both (a) and (b) hold. Our argument is via induction on n (with d fixed). As such we first prove the result for d = n 1. Assume d = n 1, U n has an infinite number of elements, and (b) holds. Thus assume U i has at least two distinct elements a i, b i, i = 1,..., n 2, and b n 1 U n 1 with b n 1 0. Let B denote the n 1 n 1 matrix a 1 b 2 b n 2 b n 1 b 1 a 2 b n 2 b n 1 B =........ b 1 b 2 a n 2 b n 1 b 1 b 2 b n 2 b n 1 This matrix has rank n 1. Let u 1 n,..., un 1 n be distinct elements of U n. For any x U n, consider the n 1 n matrix U x = B u 1 n. un n 1. x Each U x is in Ω, and it is easily checked that for distinct x U n, the associated U x are distinct in the sense of Proposition 3.4. Thus, by the result therein, M(Ω(U)) = C(IR n, IR).

8 Vladimir Ya. Lin and Allan Pinkus We now continue the induction argument. Assume that 1 d < n 1 and U n contains an infinite number of distinct elements. Let p Hk n vanish on L(Ω(U)). Then p(x) = q 0 ( x)x k n + q 1( x)x k 1 n + + q k ( x) where x = (x 1,..., x n 1 ), and q j ( x) is a homogeneous polynomial of degree j in the n 1 variables x 1,..., x n 1. For ũ Ũ = U 1 U n 1, p(ũ, x n ) = q 0 (ũ)x k n + q 1(ũ)x k 1 n + + q k (ũ) vanishes at each element of U n. Since there are an infinite number of such elements, we have p(ũ, x n ) 0 for each ũ Ũ. Thus q 0 (ũ) = = q k (ũ) = 0 for each ũ Ũ. We apply the induction hypothesis to obtain Thus p 0. q 0 q k 0. For d = 1, this result was proved in [14]. The condition that no non-trivial polynomial in H n (or Π n ) vanishes identically on L(Ω) is not one which is easily checked, unless d = n 1. The fact is that no simple condition seems possible because of the very complicated nature of the zero set of multivariate polynomials. 4. The Closure of M(Ω) Assume M(Ω) C(IR n, IR), i. e., M(Ω) does not satisfy the conditions of Theorem 2.1. Can we then identify in some way the closure of M(Ω)? Before stating the result of this section, we need some additional notation. For Ω as previously defined, let P Ω denote the set of those polynomials which vanish on L(Ω), i. e., P Ω = {p : p Π n, p L(Ω) = 0 }. (P Ω is a polynomial ideal.) Set N = ker P Ω = p P Ω ker p = {s : p(s) = 0, all p P Ω }. Note that L(Ω) N, and in general N may be much the larger set.

Ridge Functions 9 It is somewhat of a problem to simultaneously work with both Ω, which is a subset of d n matrices, and N a subset of IR n (1 n matrices). As such, let us note that in Theorem 2.1 the case where d > 1 is essentially equivalent to the case d = 1 in the following sense: Given Ω, let M(L(Ω)) = span{f(a x) : a L(Ω), f C(IR, IR)}. Then M(Ω) = M(L(Ω)). To see this simply note that as an application of Theorem 2.1, for each d n matrix A span{g(ax) : g C(IR d, IR)} = span{f(a x) : a L(A), f C(IR, IR)}. Theorem 4.1. In the topology of uniform convergence on compact subsets, the following three sets are equal: 1) A = span{g(ax) : A Ω, g C(IR d, IR)} 2) B = span{f(a x) : a N, f C(IR, IR)} 3) C = span{q(x) : q Π n, p(d)q = 0 for all p P Ω }. Proof. By definition, and from the remark previous to the statement of Theorem 4.1, A = M(Ω) = M(L(Ω)) = span{f(a x) : a L(Ω), f C(IR, IR)}. Since L(Ω) N, it follows that M(L(Ω)) B. Thus A B. The set L(Ω) is a balanced cone (see Proposition 3.5). Let p P Ω, and write p in the form r p = where r IN, and each p k is a homogeneous polynomial of total degree k. Since p(λa) = k=0 p k r λ k p k (a) k=0 for each a IR n, it follows from the balanced cone property of L(Ω) that p L(Ω) k = 0 for each k = 0, 1,..., r. Thus p k P Ω for each k, and from the definition of N we see that N is also a balanced cone. We now prove that B C. For each a IR n, l Z +, and homogeneous polynomial p k of total degree k { 0, k > l p(d)(a x) l = l! (l k)! p(a)(a x)l k k l. Thus for each a N, and every l Z +, p(d)(a x) l = 0 for all p P Ω since each p P Ω vanishes on N. Therefore (a x) l C for each a N and l Z +. The Weierstrass Theorem implies that B C.

10 Vladimir Ya. Lin and Allan Pinkus It remains to prove that C A. Since A is a closed linear subspace of C, it suffices to prove that each continuous linear functional on C(IR n, IR) which annihilates A also annihilates C. Every continuous linear functional m on C(IR n, IR) (in the topology of uniform convergence on compact subsets) may be represented in the form m(h) = h(x)dµ(x) IR n where µ is a Borel measure of finite total variation and compact support, see e. g. [7, p. 203]. Set 1 µ(ξ) = e iξ x dµ(x), (2π) n/2 IR n i. e., µ is the Fourier transform of µ. As is well-known, see e. g. [7, p. 389], µ is an entire analytic function on C n. Furthermore, assuming that µ annihilates A (i. e., g(ax)dµ(x) = 0 for all A Ω, g C(IR d, IR)), we have that µ vanishes on L(Ω). Set IR n µ(ξ) = u k (ξ) k=0 where u k is the homogeneous polynomial of total degree k in the power series expansion of µ. Since L(Ω) is a balanced cone it follows, as previously shown, that each u k vanishes on L(Ω). That is, u k P Ω for each k Z +. Now u k (ξ) = a m ξ m, and m 1! m n!a m = D m µ(ξ) ξ=0 = ( i)k x m dµ(x). (2π) n/2 IR n Therefore for any homogeneous polynomial q k of total degree k of the form we have IR n q k (x)dµ(x) = (2π) n/2 ( i) k Furthermore, as is easily checked, q k (x) = c m x m, m 1! m n!a m c m = (2π) n/2 ( i) k u k (D)q k (x). u k (D)q l (x) x=0 = 0

Ridge Functions 11 if l k. Thus if q Π n, and q = N q k, where each q k is a homogeneous polynomial of total degree k, then IR n q(x)dµ(x) = (2π) n/2 k=0 N ( i) k u k (D)q(x) x=0. k=0 This formula together with the fact that u k P Ω for all k implies that for each q Π n satisfying p(d)q = 0 for all p P Ω, we have IR n q(x)dµ(x) = 0. This proves that C A. Remark 4.1. If no non-trivial polynomial vanishes on L(Ω), then Theorem 4.1 implies that M(Ω) = span{f(a x) : a IR n, f C(IR, IR)} = span{q : q Π n } = C(IR n, IR). This is the different proof of sufficiency in Theorem 2.1 alluded to in Remark 2.2. Remark 4.2. In [16], Vostrecov and Kreines prove the equivalence A = B in the case d = 1. That is, an arbitrary function f(b x) (f C(IR, IR)) can be uniformly approximated on compact subsets by functions from span{g(a x) : a L(Ω), g C(IR, IR)} (Ω a subset of IR n ) if and only if all (homogeneous) polynomials which vanish on L(Ω) also vanish at b. 5. Variable Directions If we are given a finite number of d n matrices (d < n is fixed throughout this section) A 1,..., A k, then we know that M(A 1,..., A k ) = { g i (A i x) : g i C(IR d, IR)} i=1 is not dense in C(IR n, IR) (Corollary 3.3). However it does not follow from this fact that if we are permitted to vary the {A i } k i=1, while keeping k fixed (k may be very large), we do not get all of C(IR n, IR). This is the problem we address in this section. We set M k = M(A 1,..., A k ) A 1,...,A k

12 Vladimir Ya. Lin and Allan Pinkus and ask whether to each f C(IR n, IR) and compact K in IR n inf f g L g M (K) = 0. k The answer is no. However this is a natural question to ask as one of the objects and advantages of working with ridge functions is in choosing the directions A 1,..., A k depending upon the function f. We will prove the following result. Theorem 5.1. Given any k IN, there exists a f C(IR n, IR) and K IR n, compact, such that inf g M k f g L (K) > 0. Our proof of Theorem 5.1 is elementary, but not short. We start with some preliminary material. In the proof of Theorem 2.1 we constructed a linear functional which vanished on M(Ω). For Ω with only a finite number of terms A 1,..., A k, there exist simpler linear functionals annihilating M(A 1,..., A k ). One such set of linear functionals was given in [3] and in this more general setting may be defined as follows. For each i {1,..., k}, let b i IR n \{0} satisfy A i b i = 0. (Assume for convenience that the A i are distinct, L(A i ) L(A j ) for i j, and the vectors b i are distinct (and even pairwise linearly independent)). Consider the 2 k points (not necessarily all distinct) of the form y + ε j b j where ε j {0, 1}, j = 1,..., k, and y is any fixed vector in IR n. To each such point we associate the weight ( 1) ε, where ε = k ε j. Lemma 5.1. The non-trivial linear functional on C(IR n, IR) given by annihilates M(A 1,..., A k ). l(h) = Proof. It suffices to prove that ε j {0,1},...,k ε j {0,1},...,k ( 1) ε h(y + ( 1) ε g(a i (y + ε j b j ) ε j b j )) = 0 for every g C(IR d, IR) and each i {1,..., k}. Since A i b i = 0, it follows that the sum of the coefficients ( 1) ε of each of the distinct points A i (y+ k ε jb j ) in the above sum are

Ridge Functions 13 zero. That is, for each given {ε j }, j i, A i (y+ k ε jb j ) is a constant vector independent of ε i {0, 1}, and thus both +1 and 1 are the coefficients of g(a i (y + k ε jb j )) in the above sum. We will also use this next result. Lemma 5.2. There exists a constant c(k, n) depending only on k and n, such that for any given unit vectors c 1,..., c k IR n there exists a unit vector v IR n satisfying i = 1,..., k. c i v c(k, n), Proof. Let α n be the surface area of the unit ball in IR n. For each t (0, 1], let γ(t) denote the surface area of the unit ball covered by the set {w : w = 1, c w < t}, where c is any fixed unit vector in IR n. Obviously γ(t) is a continuous function of t, (γ(1) = α n ) and lim t 0 + γ(t) = 0. Let t k satisfy kγ(t k ) = α n. Let c 1,..., c k be any k unit vectors. By construction there must exist a unit vector v such that Set c(k, n) = t k. c i v t k i = 1,..., k. We are now prepared to prove Theorem 5.1. Proof of Theorem 5.1. Let A 1,..., A k be any k d n matrices. With no loss of generality, assume that L(A i ) L(A j ) for i j. Let K be any compact subset of IR n with interior. Without loss of generality we will also assume that K contains the ball, centered at the origin, with radius σ k, some σ > 0. Let f C(IR n, IR) vanish on K outside the ball with center 0 and radius σ c(k, n), and f(0) = 1. To prove the theorem we will show that inf f g L g M (K) k c(k, n) 2 k. We recall that if L is a subspace of a normed linear space X, and l X annihilates L, then inf f g X l(f). g L l X (This follows very easily.) We will apply this inequality to prove our result. For each A 1,..., A k as above, choose unit vectors c 1,..., c k such that A i c i = 0, i = 1,..., k. (We may choose the c i so that they are pairwise disjoint.) Let v be a unit vector satisfying c i v c(k, n), i = 1,..., k, as given by Lemma 5.2. Choose δ i { 1, 1} so that δ i c i v c(k, n), i = 1,..., k,

14 Vladimir Ya. Lin and Allan Pinkus and set b i = σδ i c i, i = 1,..., k. Define l(g) = ε j {0,1},...,k ( 1) ε g( ε j b j ). l is a non-trivial linear functional on C(K) (since each of the k ε jb j lie in K) of norm at most 2 k. By Lemma 5.1, l annihilates M(A 1,..., A k ). Now if any of the {ε j } k are not zero, then Thus ε j b j ( k ε j b j v ) This proves the theorem. l(f) = ε j {0,1},...,k ( 1) ε f( ε j (b j v) σ c(k, n). ε j b j ) = f(0) = 1. Remark 5.1. M k is not a closed set (k > 1). That is, there exist g M k which are not in M(A 1,..., A k ) for any choice of distinct A 1,..., A k. For example if k = 2, n = 2 (and thus d = 1), then x 1 x 2 2 M 2, but x 1 x 2 2 M(a1, a 2 ) for any two vectors a 1, a 2 IR 2. The mechanisms here are understood (see e.g. [17]), but are technical. We will not go into detail. Acknowledgement. We would like to thank Profs. Ron Aharoni, Yuri Brudnyi, Amos Ron, and Vladimir Tikhomirov for their comments, suggestions and contributions. References [1] Chui, C. K., Li, Xin, Approximation by ridge functions and neural networks with one hidden layer, to appear in J. Approx. Theory. [2] Cybenko, G., Approximation by superpositions of a sigmoidal function, Math. of Control, Signals, and Systems 2 (1989), 303 314. [3] Braess, D., Pinkus, A., Interpolation by ridge functions, to appear in J. Approx. Theory. [4] Dahmen, W., Micchelli, C. A., Some remarks on ridge functions, Approx. Theory and its Appl. 3 (1987), 139 143. [5] Diaconis, P., Shahshahani, M., On nonlinear functions of linear combinations, SIAM J. Sci. Stat. Comput. Applications 5 (1984), 175 191. [6] Donoho, D. L., Johnstone, I. M., Projection-based approximation and a duality method with kernel methods, Ann. Statist. 17 (1989), 58 106.

Ridge Functions 15 [7] Edwards, R. E., Functional Analysis, Theory and Applications, Holt, Rinehart and Winston, New York, 1965. [8] Hamaker, C., Solomon, D. C., The angles between the null spaces of X-rays, J. Math. Anal. and Appl. 62 (1978), 1 23. [9] Huber, P. J., Projection Pursuit, Ann. Statist. 13 (1985), 435 475. [10] John, F., Plane Waves and Spherical Means applied to Partial Differential Equations, Interscience Publishers, Inc., New York, 1955. [11] Leshno, L., Lin, V. Ya., Pinkus, A., Schocken, S., Multilayer feedforward networks with a non-polynomial activation function can approximate any function, preprint. [12] Logan, B. F., Shepp, L. A., Optimal reconstruction of a function from its projections, Duke Math. J. 42 (1975), 645 659. [13] Natterer, F., The Mathematics of Computerized Tomography, John Wiley & Sons, 1986. [14] Sun, X., Cheney, E. W., The fundamentality of sets of ridge functions, preprint. [15] Vostrecov, B. A., Kreines, M. A., Approximation of continuous functions by superpositions of plane waves, Dokl. Akad. Nauk SSSR 140 (1961), 1237 1240 = Soviet Math. Dokl. 2 (1961), 1326 1329. [16] Vostrecov, B. A., Kreines, M. A., Approximation of a plane wave by superpositions of plane waves of given directions, Dokl. Akad. Nauk SSSR 144 (1962), 1212 1214 = Soviet Math. Dokl. 3 (1962), 875 877. [17] Vostrecov, B. A., Conditions for a function of many variables to be representable as a sum of a finite number of plane waves traveling in given directions, Dokl. Akad. Nauk SSSR 153 (1963), 16 19 = Soviet Math. Dokl. 4 (1963), 1588 1591. Vladimir Ya. Lin Department of Mathematics Technion, I. I. T. Haifa, 32000 Israel Allan Pinkus Department of Mathematics Technion, I. I. T. Haifa, 32000 Israel