# The learning task is to use examples of classied points to be able to correctly classify all possible points. In neural network learning, decision bou

Save this PDF as:

Size: px
Start display at page:

Download "The learning task is to use examples of classied points to be able to correctly classify all possible points. In neural network learning, decision bou"

## Transcription

1 Decision Region Approximation by Polynomials or Neural Networks Kim L. Blackmore Iven M.Y. Mareels Robert C. Williamson Abstract We give degree of approximation results for decision regions which are dened by polynomial parametrizations and by neural network parametrizations. The volume of the misclassied region is used to measure the approximation error. We use results from the degree of approximation of functions by rst constructing an intermediate function which correctly classies almost all points. For both classes of approximating decision regions, we show that the degree of approximation is at least r, where r can be any number in the open interval (0 1). Keywords: Rate of approximation Decision region Classication Polynomials Neural Networks. 1 Introduction Decision regions arise in the machine learning problem of sorting or classication of data [24]. Points contained in the decision region are positively classied, and points outside the decision region are negatively classied. For a decision region D R n, this classication can be described by the discriminant fuction y D (x) = 8 >< >: 1 if x 2 D ;1 otherwise Comms Division, TSAS, DSTO, PO Box 1500, Salisbury SA 5108, Australia. Formerly at: Department of Engineering, Australian National University, Canberra ACT 0200, Australia (1.1) 1

2 The learning task is to use examples of classied points to be able to correctly classify all possible points. In neural network learning, decision boundaries are often represented as zero sets of certain functions, with points contained in the decision region yielding positive values of the function, and points outside the decision region yielding negative values [4]. In this case, the learning task is to use examples of correctly classied points to identify a parameter a 2 R m for which the set fx : f(a x) 0g, called the positive domain of f(a ), matches the true decision region. For the purposes of analysing a learning algorithm, it is useful to assume that a suitable value of the parameter exists. However, there is no general reason why such an assumption is satised in practice. Even if there is a class of functions f( ) and a parameter a such that the positive domain of f(a ) matches the true decision region, there is usually no way of identifying this class a priori. It is therefore useful to know how well particular classes of functions can approximate decision regions with prescribed general properties. In particular, it is important to know how fast the approximation error decreases as the approximating class becomes more complicated e.g. as the degree of a polynomial or the number of nodes of a neural network increases. The question of approximation of functions has been widely studied. The classical Weierstrass Theorem showed that polynomials are universal approximators [19] (in the sense that they are dense in the space of continuous functions on an interval). Many other classes have been shown to be universal approximators, including those dened by neural networks [11]. Other theoretical questions involve determining whether or not best approximations exist and are unique, for approximating functions from particular classes [6, 26]. There are also degree of approximation results, which tell the user how complicated a class of approximating functions must be in order to guarantee a certain degree of accuracy of the best approximation. The classical Jackson Theorem [6] is the rst example of this. Hornik [12], Barron [1], Mhaskar and Michelli [21], Mhaskar [20], Darken et al. [7] and Hornik et al. [13] give degree of approximation results for neural networks. Even more powerful results give the best class of functions to use in approximating particular classes of functions, by showing converses to Jackson type theorems for certain classes of functions [22]. Given that we are representing a decision region as the positive domain of a function, function approximation results do not immediately translate into the decision region context. In order to approximate a given decision region, the rst task is 2

3 to identify a function g with suitable smoothness properties for which the positive domain closely matches the decision region. Function approximation then establishes the existence of a good approximation f of g, where f belongs to the desired class of functions. \Best" may be measured in terms of any function norm, such as the innity norm or the 2-norm. However, such measures of good function approximation do not guarantee that the positive domain of the approximate function is close to the original decision region. For instance, there may be arbitrarily many points x where g(x) is small even though x is far from a zero of g. At these points good approximation of g may not ensure that the sign of g(x) equals the sign of the best approximation f(x), so the positive domains of g and f may not match at arbitrarily many points. Restrictions on the function g may guarantee that good function approximation will imply good decision region approximation. However, it is not clear how restrictions on the function describing a decision region translate to restrictions on the decision region itself, which is all that is given in the original problem formulation the function g is introduced solely for the sake of solving the problem. The problem of approximating sets, rather than functions, has received some attention in the literature. Approximation of (unparametrised) sets and curves has been studied for pattern recognition and computer vision purposes [2, 15, 27]. The approach is quite dierent to the approach here. Theoretical work can be grouped according to two basic approaches namely explicit and implicit parametrisations. \Explicit parametrisation" refers to frameworks where the decision boundary is parametrised. For example if the decision region is a set in R n, the decision boundary might be considered the graph of a function on R n;1, or acombination of such graphs. \Implicit parametrisation" refers to frameworks (as used in this thesis) where the decision region is the positive domain of some function. Most existing work is in terms of explicit parametrisations [16]. For instance, Korostelev and Tsybakov [17, 18] consider the estimation (from sample data) of decision regions. Although they consider non-parametric estimation, it is in fact the explicit rather than implicit framework as dened above (they reduce the problem to estimating functions whose graphs make up parts of the decision boundary). In a similar vein, Dudley [8] and Shchebrina [23] have determined the metric entropy of certain smooth curves. Regarding the implicit problem, Mhaskar [20] gives a universal approximation type result for approximation by positive domains of certain neural network functions. It appears that the argument in [20] can not be used to determine the degree of 3

4 approximation. Ivanov [14] summarises many problems in algebraic geometry concerned with the question of when a smooth manifold can be approximated by a real algebraic set but does not address the degree of approximation question. In work similar to that described in [14], Broglia and Tognoli [5] consider when a C 1 function can be approximated by certain classes of functions without changing the positive domain. In this paper we use function approximation results to determine the degree of approximation of decision regions by positive domains of polynomial functions. This is achieved by constructing a continuous function from the discriminant function for the decision region, using a convolution process. The continuous function can be approximated by polynomials, to a certain degree of accuracy, and this gives a bound on the distance that the boundary of the approximate decision region can be from the true decision boundary. We then use a result from dierential geometry to link this distance with the size of the misclassied volume. Since most learning problems can be analysed probabilistically, thevolume of the misclassied region has a natural interpretation as the probability of misclassication by the approximate decision region when the data are drawn from a uniform distribution over the input space. The next section of this paper contains a formal statement of the degree of approximation problem for decision regions, and denitions of the dierent measures of approximation error we use in the paper, along with results showing how the measures can be connected. Section 3 contains the construction of a smooth function g whose positive domain closely matches any (reasonable) decision region, by convolution of the discriminant function for the decision region with a square convolution kernel. Section 4 contains the polynomial approximation results. Our main result is Theorem 4.2, which says that the volume of the misclassied region when a decision region with smooth boundary is approximated by the positive domain of a polynomial of degree m, goes to zero at least as fast as m ;r, where r can be made as close to (but less than) 1 as desired. By \smooth boundary" we mean essentially that the boundary is a nite union of n ; 1 dimensional manifolds. In Section 5 a similar result is given for decision regions dened by neural networks and the two results are compared. Section 6 concludes. 4

5 2 Measuring Approximation Error 2.1 The Approximation Problem We assume that a decision region is a closed subset D of a compact set X R n, called the sample space. Points in the sample space are classied positively if they are contained in the decision region, and negatively if they are not. We wish to determine how well a decision region can be approximated by the positive domain of functions belonging to a parametrized class of functions, in the sense of minimizing the probability of misclassication. If points to be classied are chosen uniformly throughout the sample space X, the probability of misclassication is equal to the volume of the misclassied region, i.e.the volume of the symmetric dierence of the two sets. For decision regions D 1 D 2 X, the volume of the misclassied region is V (D 1 D 2 ):=vol(d 1 4D 2 )= Z D 1 4D 2 For a decision region D X and an approximate decision region X, we say that approximates D well if V (D ) is small thus most points in X are correctly classied by. It is assumed that the approximating decision regions belong to a class C d of subsets of X which gets progressively larger as d increases. That is, C d 1 C d 2 if d 1 < d 2. Typically, d is a non-decreasing function of the dimension of the parameter space. If the true decision region is D, then for any particular choice of d the minimum approximation error is inf 2C d V (D ). Clearly the minimum approximation error is a non-increasing function of d. For some choices of C d, the minimum approximation error goes to zero as d! 1. In such cases, the classes C d are said to be uniform approximators. The degree of approximation problem for uniform approximators C d involves determining how quickly the minimum approximation error decreases. dx: The Degree of Approximation Problem Let X R n be compact, let D be a set of subsets of X and for each d>0, let C d be a set of subsets of X, such that lim sup d!1 D2D inf V (D ) = 0: 2C d Find the largest R 0 such that, for all suciently large d, sup inf V (D ) c 2C d d R D2D 5

6 where c is constant with respect to d. The constant R in (2.1) is called the degree of approximation for the class C d of decision regions. Typically one is interested in solving the degree of approximation question for a class D of subsets of D. Our results are for D := fd is a nite union of smooth manifoldsg. In Sections 4and5itisshown that the degree of approximation for D by polynomial and neural network decision regions is bounded below by any r 2 (0 1). 2.2 Corridor Size Let B(x ) := fz 2 R n : kx ; zk g, the closed ball with centre x and radius, where kk denotes the 2 norm (Euclidean distance) in R n. For any set D R denotes the boundary of D. Denition 2.1 The corridor around any set D R n is the set [ D + := B(x ): x2d The corridor size from D to is dened to be (D ) := + g The corridor size is the smallest value of such that all points in the boundary of are not further than from the boundary of D. The corridor size is not a metric because it is not symmetric however it is worth noting that maxf(d ) ( D)g is the Hausdor distance (which is a metric). 2.3 Relating V to The construction in Section 4 of the approximating set gives an upper bound on the minimum corridor size from D to rather than on the minimum volume of the misclassied region. So in order to answer the approximation problem, it is necessary to relate the corridor size from D to to the volume of the misclassied region between D and. The following relationship follows immediately from the denitions of V and. 6

7 Lemma 2.2 Let D R n. If (D ) = and D +, then V (D ) + ): The requirement that D + is necessary because the corridor size can be small if either most points are correctly classied, or most points are misclassied. The function classes considered in Sections 4 and 5 contain complements of all of their members, so this is not a restrictive assumption. Lemma 2.2 shows it is possible to bound the volume of the misclassied region by bounding the volume of the corridor This requires some knowledge of the size and smoothness For instance, is a space lling curve, then the volume of any corridor will be equal to the volume of X, and knowledge of the size of the corridor oers no advantage. On the other hand, if D is a ball with radius greater than the corridor size, then the volume is equal to two times the corridor size multiplied by the surface area of the ball. In order to obtain a general result, we assume that the decision boundary is nite union of hypersurfaces (n ; 1) dimensional submanifolds of R n, and measure the size of the hypersurface using the surface area [25, 3]. Denition 2.3 A set M R n is an n ; 1 dimensional submanifold of R n if for every x 2 M, there exists an open neighbourhood U R n of x and a function f : U! R n such that f(u) R n is open, f is a C 1 dieomorphism onto its image and either 1. f(u \ M) =f(u) \ R n;1, or 2. f(u \ M) =f(u) \fy R n;1 : y(1) 0g. Here y(1) denotes the rst component of the vector y. The usual denition of a submanifold allows only the rst case. When both cases are allowed, M is usually called a submanifold with boundary. We allow both cases because our consideration of decision regions conned to a compact domain implies that many interesting decision boundaries are not true submanifolds. Moreover, to be a union of submanifolds rather than a single submanifold means D may have (well behaved) sharp edges. For instance if X = [1 1] n and the decision region is the halfspace fx 2 X : a > x 0g, then the decision boundary consists of a union of up to 2n polygonal faces. Each of these faces is an n;1 dimensional submanifold (with boundary). 7

8 Denition 2.4 Let M be a union of nitely many n ; 1 dimensional submanifolds of R n. Let the points u 2 M be locally referred to parameters u(1) ::: u(n ; 1), which are mapped to the Euclidean space R n;1 with the coordinates v(1) ::: v(n;1). The surface area of M is dened as area(m) := Z M det(r)du(1) :::du(n ; 1) where R = [R ij ] R ij Thus area(m) is the volume of the image of M R n;1. If n =2,then M is acurve in the plane, and area(m) is the length of M. Using these denitions the volume of the corridor around a decision region can be bounded as follows: Lemma 2.5 Let D X R n, X compact. is a union of nitely many n ; 1 dimensional submanifolds of R n then there exists =(D) > 0 such that + ) 4 for all such that 0 <<. This result is intuitively obvious, can be locally approximated by an n ; 1 dimensional hyper-plane, and the volume of the corridor around a piece of an n;1 dimensional hyper-plane with area a is 2a + O( 2 ). A rigorous proof of Lemma 2.5 can be given using a result by Weyl that appears in [25]. If the decision boundary is a union of n ; 1 dimensional submanifolds of R n, Lemmas 2.2 and 2.5 can be employed to translate an upper bound on the minimum corridor distance into an upper bound on the minimum misclassied volume. Using this method, the surface area of the decision boundary does not aect the degree of approximation of decision regions, but only the constant in the approximation bound. The smoothness properties of the decision boundary, such as the curvature, do not even aect the constant in the approximation bound, according to the result in Weyl [25], they determine constants multiplying higher order terms in. This is in contrast with the function approximation results, where higher order smoothness of the original function does give higher degree of approximation (see Theorem 4.1). It appears unknown whether such a relationship exists for approximation of decision regions by positive domains of parametrised functions. 8

9 3 Construction of a Smooth Discriminant In the following lemma we construct a Lipschitz continuous approximation to the discriminant function by convolution of the discriminant function with a function of compact support. This new function satises sgn g(x) =y D (x) for all x suciently far from the decision boundary. In Sections 4 and 5 we use this constructed function to apply a bound on the rate of function approximation to our problem of bounding the rate of decision region approximation. In the following, the i-th component of any vector x 2 R n is denoted x(i), and I (x s) :=fz 2 R n : jz(i) ; x(i)j < s i=1 ::: ng is the open n-cube with centre x 2 and side s. Lemma 3.1 Let D R n, and 0 < < 1 p n. Dene functions h g : R n! R as follows: h(x) := y I(0 s) +1 2s n (3.1) where s = 2 p n, and g(x) :=(h y D )(x) = Z R n h(x ; t)y D (t)dt: (3.2) Then 1. For all x +, g(x) =y D (x). 2. g is Lipschitz continuous, with Lipschitz constant n 3 2. From equations 3.1 and 3.2 it can be seen that Z g(x) =s ;n y D (t)dt Therefore g(x) 2 [;1 1]. = I(x s) vol (I (x s) \ D) ; vol (I (x s) nd) (3.3) vol (I (x s)) Proof 1. Let x 2 + ). Since the greatest distance from x to any point ini (x s) is 9

10 , I (x s) D. Thus the second volume in the numerator of (3.3) is zero. Therefore g(x) =1=y D (x). A similar argument gives the result if x 62 D and x 62 + ). 2. Continuity of g follows from the denition of g asaconvolution of two bounded, piecewise constant, functions. For Lipschitz continuity, we need to show that jg(x 1 ) ; g(x 2 )j n 3 2 kx 1 ; x 2 k (3.4) for any x 1 x 2 2 R n. First, note that for any x 1 x 2 2 R n, jg(x 1 ) ; g(x 2 )j2 2n = n 3 2 s. So if kx 1 ; x 2 ks then (3.4) holds. Now assume that kx 1 ; x 2 k < s. Then I 1 \ I 2 6=, where I 1 = I (x 1 s) and I 2 = I (x 2 s). From (3.3), jg(x 1 ) ; g(x 2 )j = s ;n vol(i1 \ D) ; vol(i 1 nd) ; vol(i 2 \ D)+vol(I 2 nd) = s ;n vol(i1 ni 2 \ D) ; vol((i 1 ni 2 )nd) ; vol(i 2 ni 1 \ D)+vol((I 2 ni 1 )nd) s ;n (vol(i 1 ni 2 )+vol(i 2 ni 1 )) which is the volume of the symmetric dierence between I 1 and I 2, divided by the volume of the n-cubes I 1 and I 2. That is, jg(x 1 ) ; g(x 2 )j2 ; 2s ;n vol(i 1 \ I 2 ) The intersection I 1 \ I 2 is a rectangular region in R n, with side of length s ;jx 1 (i) ; x 2 (i)j in the direction of the i-th axis. Thus where kxk 1 = max i=1 ::: n jx(i)j. jg(x 1 ) ; g(x 2 )j2; 2s ;n (s ;jx 1 (i) ; x 2 (i)j) i=1 (! 2 1 ; 1 ; kx n ) 1 ; x 2 k 1 s Writing z =1; kx 1;x 2 k 1, we use the fact that s ny 1 ; z n n(1 ; z) whenever 0 <z< 1 and n>1 (see Theorem 42 of [10]), to give jg(x 1 ) ; g(x 2 )j 2n s kx 1 ; x 2 k 1 = n 3 2 kx 1 ; x 2 k 1 n 3 2 kx 1 ; x 2 k: 10

11 Thus (3.4) holds for all x 1 x 2 2 R n. The function g : R n! R is continuous and sgn g correctly classifes all points in R n, except possibly points +. If h is replaced with a smoother convolution kernel, the resulting function g will be smoother. For instance, if h has p continuous derivatives, and the p-th derivative is bounded, then g will have p continuous derivatives, and the p-th derviative will be bounded [28]. At the end of Section 4 we show that this apparent improvement does not actually aect the order of the approximation result achievable by our argument. 4 Polynomial Decision Regions First, some notation from polynomial function approximation: 1. For any function f : R n! R and any vector 2 N n 0, we say D (n), where P n i=1 (i) =p (i) 0 is a p-th order derivative of f. 2. P n d is the space of polynomials of degree at most d in x(1) ::: x(n). That is, P n d is the space of all linear combinations of x(1) s 1 x(2) s 2 x(n) sn with P ni=1 s i d s i 2 N 0. The number of parameters necessary to identify elements in Pd n is the number of ways of choosing n nonnegative integers s i so that P ni=1 s i d, since the parameters are the coecients in the linear combination. This number is less than (d +1) n. 3. CPd n is the class of polynomial decision regions. Each decision region in CPd n is the positive domain of a polynomial in Pd n. Specically, CP n d := 8 >< >: X : 9f 2Pn d satisfying f(x) 0 if x 2 f(x) < 0 if x 62 9 >= > : In this section and in Section 5, c 2 R denotes a quantity which is independent of d. Dependence of c on other variables will be indicated by, for instance, c = c(n). If no such indication is given, c is an absolute constant. The exact value of c will change without notice, even in a single expression. 11

12 The following result is an n-dimensional generalisation of Jackson's Theorem. It can be derived from Theorem 9.10 of Feinerman and Newman [9]. The derivation closely mimics the derivation of Theorem 4.5 in [9] from Theorem 4.2 in [9]. Theorem 4.1 Let X = [;1 1] n and g : X! R. with Lipschitz constant L for all 2 N n 0 c(p) > 1 such that if d + n p +1. inf sup f2p n d x2x If D g is Lipschitz continuous such that P n i=1 (i) =p, then there exists jg(x) ; f(x)j c(p) L n 3 2 (p+1) (d + n) p+1 In the following Theorem 4.1 is used to determine the degree of approximation of decision regions possessing a smooth boundary by polynomial decision regions. First it is shown that the minimum corridor distance between the true decision region and the approximating decision regions goes to zero at least as fast as d ;r, where 0 <r<1. This bound is then used in order to obtain aboundon the misclassied volume. Theorem 4.2 Let D X = [;1 1] n. is a union of nitely many n ; 1 dimensional submanifolds of R n then for any r 2 (0 1) there exist constants c, M(r D n) > 1 such that for all d c(r D n). inf V (D ) < c 2CP n d d r Proof Choose d 1 and r 2 (0 1). Dene d = 1 (d+n) r. Dene g d as in (3.2), using = d. Then g d is C 0, with Lipschitz constant n 3 2 According to Theorem 4.1, there exists f d 2Pd n satisfying d := sup x2x If d d r =(cn 3 ) 1 1;r ; n then d < 1. jg d (x) ; f d (x)j c n 3 2 d n 3 2 d + n cn 3 (d + n) 1;r : d. 12

13 By Lemma 3.1, jg d (x)j =1for all x + d. So if d>d r, for all points outside of the d corridor f d (x) =g d (x) ; (g d (x) ; f d (x)) has the same sign as g d (x). That is, points outside of the d corridor are correctly classied. The corridor size from D to the positive domain of f d is thus bounded above by d. Set c = 4 and c(r D n) = maxfd r ;1 ; ng, where is dened in Lemma 2.5. The result follows from Lemmas 2.2 and 2.5. The requirement that r<1 in Theorem 4.2 is essential since the constant d r increases exponentially with 1. Obviously larger r gives a stronger order of approximation 1;r result in the limit d!1,butc(r D n), the lower bound on d for which the result holds, will be larger. It might seem that Theorem 4.2 could be improved upon by making the intermediate function g smoother, since this gives great improvement in the result of Theorem 4.1. However, this improvement cannot be achieved using the obvious generalisation of our convolution technique, as we now show. Assume g = h y D where h is any nonnegative function with support I(0 2n ;1=2 ) which is dierentiable to (p+1)-th order suchthatjd hjb for all 2 N n 0 suchthat P ni=1 (i) =p +1. Then g is dierentiable to (p + 1)-th order and jd gjc(n) n B for all 2 N n 0 such that P n i=1 (i) = p +1. This bound is found by using the fact that P n i=1 (i) = p +1. D g = y D h (Theorem 18.4 in [28]), and that convolution involves integration over an n-cube of side 2n ;1=2. On the other hand the fact that h has compact support and is dierentiable to p + 1-th order implies that jh(x)j c(n p) p+1 M, so for all x +, jg(x)j c(n p) p+1+n M. Now applying Theorem 4.1 gives inf sup g2p n d x2x jg(x) ; f(x)j c(p n) n B n 3 2 (p+1) (d + n) p+1 : Following the proof of Theorem 4.2, a value of is sought which will guarantee that the function approximation error is less than the absolute value of g for all x +. This involves nding d such that c(n p) d nb <c(n p)p+1+n p+1 d M: (d + n) This only holds for all large d if d > c(n p). But the decision region approximation d+n error is a constant times d, so if this method is used the lower bound on the degree of approximation must be less than 1. 13

14 Thus it has been demonstrated that if g is the convolution of the discriminant function with any function h of compact support, then an argument analogous to that of Theorem 4.2 cannot give a bound with better order of magnitude in d. (Of course this does not preclude better bounds existing.) Thus the degree of approximation result obtainable by the method of Theorem 4.2 does not depend on the smoothness of the intermediate function g, which is constructed solely for the sake of the proof. 5 Neural Network Decision Regions Using the techniques of the last section, a result similar to Theorem 4.2 which applies when the approximating decision regions are dened by neural networks will be derived. In order to do this, function approximation result similar to Theorem 4.1, for functions dened by neural networks is used. First some notation: 1. A sigmoidal function is any continuous function : R! R which satises lim (x) =0 lim x!;1 (x) =1: x!1 2. N n d is the space of functions dened by two hidden layer feedforward neural networks with n inputs, n(d n + 1) nodes in the rst layer, and d n +1 nodes in the second layer. That is, Nd n is the space of all linear combinations d n X j=0 j nx k=1! jk > jk x + jk + "j where x jk 2 R n, and j jk jk " j 2 R. In order to identify elements in N n d, one must specify (n 2 +2n + 2)(d n + 1) real numbers. 3. CNd n is the class of neural network decision regions. Each decision region in is the positive domain of a function in Nd n. Specically, CN n d CN n d := 8 >< >: X : 9f 2Nn d satisfying f(x) 0 if x 2 f(x) < 0 if x 62 Theorem 5.1 is a special case of Theorem 3.4 of Mhaskar [20]. (In [20] the function is dened on [0 1] n rather than [;1 1] n.) 9 >= > : 14

15 Theorem 5.1 Mhaskar Let X = [;1 1] n and g : X! R. If g is Lipschitz continuous with Lipschitz constant L then there exists a constant c( n) such that for all d 1. inf f2n n d sup jg(x) ; f(x)j c( n)l x2x d This result is very similar to Theorem 4.1 in the case p =0. Using the technique in Section 4, the following theorem holds: Theorem 5.2 Let D X = [;1 1] n. is a union of nitely many n ; 1 dimensional submanifolds of R n then for any r 2 (0 1) there exist constants c, c(r D n) > 1 such that for all d c(r D n). inf V (D ) < c 2CN n d d r Comparing Theorem 5.2 with Theorem 4.2, it can be seen the two classes of approximating decision regions give the same upper bound on the degree of approximation for the two class CPd n and CNd n. Moreover, the number of parameters necessary to specify elements in the approximating classes is of order d n in both cases. Thus there is no apparent advantage of one class over the other. However lower bounds on the degree of approximation are needed in order to conclude that polynomial decision regions and neural network decision regions are equally capable approximators of decision regions. 6 Concluding Remarks We have given degree of approximation results for implicit decision region approximation which are similar to Jackson's Theorem for polynomial function approximation. The approximating decision regions are dened by the positive domains of polynomial functions or feedforward neural networks. These results support our intuition that classes of functions which are good function approximators tend to be good implicit decision region approximators. Many open problems remain the most pressing being \What conditions give better degree of approximation?" In function approximation, higher order smoothness of the approximated function gives a better degree of approximation. For instance, 15

16 in Theorem 4.1 if the p-th derivative is Lipschitz continuous, then the degree of approximation is at least p + 1. We would expect that there exist restrictions on the decision region to be approximated, D, which will guarantee a better degree of approximation than our results suggest. Moreover, we would expect that there would be a series of successively tighter restrictions on D which would guarantee successively better degree of approximation results. However, it is not clear what the right conditions are. As discussed in Section 4, using a smoother convolution kernel h to construct a smoother intermediate function f will not give a better degree of approximation using our methods. It is also clear that bounding the curvature of the boundary of D will not aect the degree of approximation using our argument, since all information about the decision boundary other than its area aects only higher order terms in the approximation bound, not the degree of approximation obtained in Theorem 4.2. Perhaps the number of connected components in D is the condition we need. Or perhaps the curvature properties of the decision boundary are important, but a tighter method of bounding V (D ) than the volume of the corridor size is needed. Maybe a completely dierent proof technique is needed to get higher degree of approximation results. 7 Acknowledgements The authors would like to thank Peter Bartlett for helpful discussions motivating this research, and the reviewers and associate editor of this journal for their comments. This work was supported by the Australian Research Council. References [1] A.R. Barron. Universal approximation bounds for superposition of a sigmoidal function. IEEE Transactions on Information Theory, 39:930{945, [2] A. Bengtsson and J.-O. Eklundh. Shape representation by multiscale contour approximation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:85{93,

17 [3] M. Berger and B. Gostiaux. Dierential Geometry: Manifolds, Curves, and Surfaces. Springer-Verlag, New York, [4] K.L. Blackmore, R.C. Williamson, and I.M.Y. Mareels. Learning nonlinearly parametrized decision regions. Journal of Mathematical Systems, Estimation, and Control, To appear. [5] F. Broglia and A. Tognoli. Approximation of C 1 functions without changing their zero-set. Ann. Inst. Fourier, Grenoble, 39(3):611{632, [6] E.W. Cheney. Introduction to Approximation Theory. Chelsea, New York, 2nd edition, [7] C. Darken, M. Donahue, L. Gurvits, and E. Sontag. Rate of approximation results motivated by robust neural network learning. In Proceedings of the Sixth ACM Conference on Computational Learning Theory, pages 103{109, [8] R.M. Dudley. Metric entropy of some classes of sets with dierentiable boundaries. Journal of Approximation Theory, 10:227{236, [9] R.P. Feinerman and D.J. Newman. Polynomial Approximation. The Williams and Wilkins Company, Baltimore, [10] G.H. Hardy, J.E. Littlewood, and G. Polya. Inequalities. Cambridge University Press, Cambridge, [11] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359{366, [12] K. Hornik, M. Stinchcombe, and H. White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward neural networks. Neural Networks, 3:551{560, [13] K. Hornik, M. Stinchcombe, H. White, and P. Auer. Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Technical Report NC-TR , NeuroCOLT Technical Report Series, January [14] N.V. Ivanov. Approximation of smooth manifolds by real algebraic sets. Russian Math. Surveys, 37(1):1{59, [15] P.S. Kenderov and N.K. Kirov. A dynamical systems approach to the polygonal approximation of plane convex compacts. Journal of Approximation Theory, 74:1{ 15, [16] N.P. Korneichuk. Approximation and optimal coding of plane curves. Ukranian Mathematical Journal, 41(4):429{435,

18 [17] A.P. Korostelev and A.B. Tsybakov. Estimation of the density support and its functionals. Problems of Information Transmission, 29(1):1{15, [18] A.P. Korostelev and A.B. Tsybakov. Minimax Theory of Image Reconstruction. Lecture Notes in Statistics v. 82. Springer, New York, [19] G.G. Lorentz. Approximation of Functions. Holt, Rinehart and Winston, New York, [20] H.N. Mhaskar. Approximation properties of a multilayered feedforward articial neural network. Advances in Computational Mathematics, 1:61{80, [21] H.N. Mhaskar and C.A. Micchelli. Approximation by superposition of sigmoid and radial basis functions. Advances in Applied Mathematics, 13:350{373, [22] P.P. Petrushev and V.A. Popov. Rational Approximation of Real Functions. Cambridge University Press, Cambridge, [23] N.V. Shchebrina. Entropy of the space of twice smooth curves in IR n+1. Math. Acad. Sci. USSR, 47:515{521, [24] J. Sklansky and G.N. Wassel. Pattern Classiers and Trainable Machines. Springer- Verlag, New York, [25] H. Weyl. On the volume of tubes. American Journal of Mathematics, 61:461{472, [26] R.C. Williamson and U. Helmke. Existence and uniqueness results for neural network approximations. IEEE Transactions on Neural Networks, 6(1):2{13, [27] J.-S. Wu and J.-J. Leou. New polygonal approximation schemes for object shape representation. Pattern Recognition, 26:471{484, [28] A.C. Zaanen. Continuity, Integration and Fourier Theory. Springer-Verlag, Berlin,

### Introduction to Real Analysis

Introduction to Real Analysis Joshua Wilde, revised by Isabel Tecu, Takeshi Suzuki and María José Boccardi August 13, 2013 1 Sets Sets are the basic objects of mathematics. In fact, they are so basic that

### Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

### McGill University Math 354: Honors Analysis 3

Practice problems McGill University Math 354: Honors Analysis 3 not for credit Problem 1. Determine whether the family of F = {f n } functions f n (x) = x n is uniformly equicontinuous. 1st Solution: The

### balls, Edalat and Heckmann [4] provided a simple explicit construction of a computational model for a Polish space. They also gave domain theoretic pr

Electronic Notes in Theoretical Computer Science 6 (1997) URL: http://www.elsevier.nl/locate/entcs/volume6.html 9 pages Computational Models for Ultrametric Spaces Bob Flagg Department of Mathematics and

### Estimating the sample complexity of a multi-class. discriminant model

Estimating the sample complexity of a multi-class discriminant model Yann Guermeur LIP, UMR NRS 0, Universit Paris, place Jussieu, 55 Paris cedex 05 Yann.Guermeur@lip.fr Andr Elissee and H l ne Paugam-Moisy

### Some Background Material

Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

### Near convexity, metric convexity, and convexity

Near convexity, metric convexity, and convexity Fred Richman Florida Atlantic University Boca Raton, FL 33431 28 February 2005 Abstract It is shown that a subset of a uniformly convex normed space is nearly

### Multilayer feedforward networks are universal approximators

Multilayer feedforward networks are universal approximators Kur Hornik, Maxwell Stinchcombe and Halber White (1989) Presenter: Sonia Todorova Theoretical properties of multilayer feedforward networks -

### below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing

Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,

### Entrance Exam, Real Analysis September 1, 2017 Solve exactly 6 out of the 8 problems

September, 27 Solve exactly 6 out of the 8 problems. Prove by denition (in ɛ δ language) that f(x) = + x 2 is uniformly continuous in (, ). Is f(x) uniformly continuous in (, )? Prove your conclusion.

### REAL AND COMPLEX ANALYSIS

REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

### Geometric intuition: from Hölder spaces to the Calderón-Zygmund estimate

Geometric intuition: from Hölder spaces to the Calderón-Zygmund estimate A survey of Lihe Wang s paper Michael Snarski December 5, 22 Contents Hölder spaces. Control on functions......................................2

### Identication and Control of Nonlinear Systems Using. Neural Network Models: Design and Stability Analysis. Marios M. Polycarpou and Petros A.

Identication and Control of Nonlinear Systems Using Neural Network Models: Design and Stability Analysis by Marios M. Polycarpou and Petros A. Ioannou Report 91-09-01 September 1991 Identication and Control

### 290 J.M. Carnicer, J.M. Pe~na basis (u 1 ; : : : ; u n ) consisting of minimally supported elements, yet also has a basis (v 1 ; : : : ; v n ) which f

Numer. Math. 67: 289{301 (1994) Numerische Mathematik c Springer-Verlag 1994 Electronic Edition Least supported bases and local linear independence J.M. Carnicer, J.M. Pe~na? Departamento de Matematica

### MATH 409 Advanced Calculus I Lecture 12: Uniform continuity. Exponential functions.

MATH 409 Advanced Calculus I Lecture 12: Uniform continuity. Exponential functions. Uniform continuity Definition. A function f : E R defined on a set E R is called uniformly continuous on E if for every

### satisfying ( i ; j ) = ij Here ij = if i = j and 0 otherwise The idea to use lattices is the following Suppose we are given a lattice L and a point ~x

Dual Vectors and Lower Bounds for the Nearest Lattice Point Problem Johan Hastad* MIT Abstract: We prove that given a point ~z outside a given lattice L then there is a dual vector which gives a fairly

### Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques

Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques Hung Chen hchen@math.ntu.edu.tw Department of Mathematics National Taiwan University 3rd March 2004 Meet at NS 104 On Wednesday

### Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

2 1 Borel Regular Measures We now state and prove an important regularity property of Borel regular outer measures: Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon

### REVIEW OF ESSENTIAL MATH 346 TOPICS

REVIEW OF ESSENTIAL MATH 346 TOPICS 1. AXIOMATIC STRUCTURE OF R Doğan Çömez The real number system is a complete ordered field, i.e., it is a set R which is endowed with addition and multiplication operations

### Plan of Class 4. Radial Basis Functions with moving centers. Projection Pursuit Regression and ridge. Principal Component Analysis: basic ideas

Plan of Class 4 Radial Basis Functions with moving centers Multilayer Perceptrons Projection Pursuit Regression and ridge functions approximation Principal Component Analysis: basic ideas Radial Basis

### Richard DiSalvo. Dr. Elmer. Mathematical Foundations of Economics. Fall/Spring,

The Finite Dimensional Normed Linear Space Theorem Richard DiSalvo Dr. Elmer Mathematical Foundations of Economics Fall/Spring, 20-202 The claim that follows, which I have called the nite-dimensional normed

### Summer Jump-Start Program for Analysis, 2012 Song-Ying Li. 1 Lecture 7: Equicontinuity and Series of functions

Summer Jump-Start Program for Analysis, 0 Song-Ying Li Lecture 7: Equicontinuity and Series of functions. Equicontinuity Definition. Let (X, d) be a metric space, K X and K is a compact subset of X. C(K)

### Garrett: `Bernstein's analytic continuation of complex powers' 2 Let f be a polynomial in x 1 ; : : : ; x n with real coecients. For complex s, let f

1 Bernstein's analytic continuation of complex powers c1995, Paul Garrett, garrettmath.umn.edu version January 27, 1998 Analytic continuation of distributions Statement of the theorems on analytic continuation

### HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.

HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard

### Chapter 5. Measurable Functions

Chapter 5. Measurable Functions 1. Measurable Functions Let X be a nonempty set, and let S be a σ-algebra of subsets of X. Then (X, S) is a measurable space. A subset E of X is said to be measurable if

### Math 209B Homework 2

Math 29B Homework 2 Edward Burkard Note: All vector spaces are over the field F = R or C 4.6. Two Compactness Theorems. 4. Point Set Topology Exercise 6 The product of countably many sequentally compact

### Real Analysis Problems

Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

### Sample width for multi-category classifiers

R u t c o r Research R e p o r t Sample width for multi-category classifiers Martin Anthony a Joel Ratsaby b RRR 29-2012, November 2012 RUTCOR Rutgers Center for Operations Research Rutgers University

### ADDENDUM B: CONSTRUCTION OF R AND THE COMPLETION OF A METRIC SPACE

ADDENDUM B: CONSTRUCTION OF R AND THE COMPLETION OF A METRIC SPACE ANDREAS LEOPOLD KNUTSEN Abstract. These notes are written as supplementary notes for the course MAT11- Real Analysis, taught at the University

### The Erwin Schrodinger International Boltzmanngasse 9. Institute for Mathematical Physics A-1090 Wien, Austria

ESI The Erwin Schrodinger International Boltzmanngasse 9 Institute for Mathematical Physics A-090 Wien, Austria Lipschitz Image of a Measure{null Set Can Have a Null Complement J. Lindenstrauss E. Matouskova

### 1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

### Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011

Random Walks and Brownian Motion Tel Aviv University Spring 20 Instructor: Ron Peled Lecture 5 Lecture date: Feb 28, 20 Scribe: Yishai Kohn In today's lecture we return to the Chung-Fuchs theorem regarding

### NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

### Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

### College Algebra Notes

Metropolitan Community College Contents Introduction 2 Unit 1 3 Rational Expressions........................................... 3 Quadratic Equations........................................... 9 Polynomial,

IMC 05, Blagoevgrad, Bulgaria Day, July 9, 05 Problem. For any integer n and two n n matrices with real entries, B that satisfy the equation + B ( + B prove that det( det(b. Does the same conclusion follow

### An upper bound for curvature integral

An upper bound for curvature integral Anton Petrunin Abstract Here I show that the integral of scalar curvature of a closed Riemannian manifold can be bounded from above in terms of its dimension, diameter,

### In the Name of God. Lectures 15&16: Radial Basis Function Networks

1 In the Name of God Lectures 15&16: Radial Basis Function Networks Some Historical Notes Learning is equivalent to finding a surface in a multidimensional space that provides a best fit to the training

### M311 Functions of Several Variables. CHAPTER 1. Continuity CHAPTER 2. The Bolzano Weierstrass Theorem and Compact Sets CHAPTER 3.

M311 Functions of Several Variables 2006 CHAPTER 1. Continuity CHAPTER 2. The Bolzano Weierstrass Theorem and Compact Sets CHAPTER 3. Differentiability 1 2 CHAPTER 1. Continuity If (a, b) R 2 then we write

### Principle of Mathematical Induction

Advanced Calculus I. Math 451, Fall 2016, Prof. Vershynin Principle of Mathematical Induction 1. Prove that 1 + 2 + + n = 1 n(n + 1) for all n N. 2 2. Prove that 1 2 + 2 2 + + n 2 = 1 n(n + 1)(2n + 1)

### Economics 204 Fall 2011 Problem Set 2 Suggested Solutions

Economics 24 Fall 211 Problem Set 2 Suggested Solutions 1. Determine whether the following sets are open, closed, both or neither under the topology induced by the usual metric. (Hint: think about limit

### CALCULUS JIA-MING (FRANK) LIOU

CALCULUS JIA-MING (FRANK) LIOU Abstract. Contents. Power Series.. Polynomials and Formal Power Series.2. Radius of Convergence 2.3. Derivative and Antiderivative of Power Series 4.4. Power Series Expansion

### Techinical Proofs for Nonlinear Learning using Local Coordinate Coding

Techinical Proofs for Nonlinear Learning using Local Coordinate Coding 1 Notations and Main Results Denition 1.1 (Lipschitz Smoothness) A function f(x) on R d is (α, β, p)-lipschitz smooth with respect

### Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

### This material is based upon work supported under a National Science Foundation graduate fellowship.

TRAINING A 3-NODE NEURAL NETWORK IS NP-COMPLETE Avrim L. Blum MIT Laboratory for Computer Science Cambridge, Mass. 02139 USA Ronald L. Rivest y MIT Laboratory for Computer Science Cambridge, Mass. 02139

### Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

Upper and Lower Bounds on the Number of Faults a System Can Withstand Without Repairs Michel Goemans y Nancy Lynch z Isaac Saias x Laboratory for Computer Science Massachusetts Institute of Technology

### Congurations of periodic orbits for equations with delayed positive feedback

Congurations of periodic orbits for equations with delayed positive feedback Dedicated to Professor Tibor Krisztin on the occasion of his 60th birthday Gabriella Vas 1 MTA-SZTE Analysis and Stochastics

### A LITTLE REAL ANALYSIS AND TOPOLOGY

A LITTLE REAL ANALYSIS AND TOPOLOGY 1. NOTATION Before we begin some notational definitions are useful. (1) Z = {, 3, 2, 1, 0, 1, 2, 3, }is the set of integers. (2) Q = { a b : aεz, bεz {0}} is the set

### We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

### converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n

Solve the following 6 problems. 1. Prove that if series n=1 a nx n converges for all x such that x < 1, then the series n=1 a n xn 1 x converges as well if x < 1. n For x < 1, x n 0 as n, so there exists

### 2 J. BRACHO, L. MONTEJANO, AND D. OLIVEROS Observe as an example, that the circle yields a Zindler carrousel with n chairs, because we can inscribe in

A CLASSIFICATION THEOREM FOR ZINDLER CARROUSELS J. BRACHO, L. MONTEJANO, AND D. OLIVEROS Abstract. The purpose of this paper is to give a complete classication of Zindler Carrousels with ve chairs. This

### Math 421, Homework #9 Solutions

Math 41, Homework #9 Solutions (1) (a) A set E R n is said to be path connected if for any pair of points x E and y E there exists a continuous function γ : [0, 1] R n satisfying γ(0) = x, γ(1) = y, and

### THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

### An Introduction to Complex Analysis and Geometry John P. D Angelo, Pure and Applied Undergraduate Texts Volume 12, American Mathematical Society, 2010

An Introduction to Complex Analysis and Geometry John P. D Angelo, Pure and Applied Undergraduate Texts Volume 12, American Mathematical Society, 2010 John P. D Angelo, Univ. of Illinois, Urbana IL 61801.

### 1 Holomorphic functions

Robert Oeckl CA NOTES 1 15/09/2009 1 1 Holomorphic functions 11 The complex derivative The basic objects of complex analysis are the holomorphic functions These are functions that posses a complex derivative

### Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

### Continuity of convex functions in normed spaces

Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

### L p Approximation of Sigma Pi Neural Networks

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 6, NOVEMBER 2000 1485 L p Approximation of Sigma Pi Neural Networks Yue-hu Luo and Shi-yi Shen Abstract A feedforward Sigma Pi neural networks with a

### Most Continuous Functions are Nowhere Differentiable

Most Continuous Functions are Nowhere Differentiable Spring 2004 The Space of Continuous Functions Let K = [0, 1] and let C(K) be the set of all continuous functions f : K R. Definition 1 For f C(K) we

### Analysis on Graphs. Alexander Grigoryan Lecture Notes. University of Bielefeld, WS 2011/12

Analysis on Graphs Alexander Grigoryan Lecture Notes University of Bielefeld, WS 0/ Contents The Laplace operator on graphs 5. The notion of a graph............................. 5. Cayley graphs..................................

### Vectors in many applications, most of the i, which are found by solving a quadratic program, turn out to be 0. Excellent classication accuracies in bo

Fast Approximation of Support Vector Kernel Expansions, and an Interpretation of Clustering as Approximation in Feature Spaces Bernhard Scholkopf 1, Phil Knirsch 2, Alex Smola 1, and Chris Burges 2 1 GMD

### Singular Integrals. 1 Calderon-Zygmund decomposition

Singular Integrals Analysis III Calderon-Zygmund decomposition Let f be an integrable function f dx 0, f = g + b with g Cα almost everywhere, with b

### Implicit Functions, Curves and Surfaces

Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

### Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

### Chapter 1. Functions 1.1. Functions and Their Graphs

1.1 Functions and Their Graphs 1 Chapter 1. Functions 1.1. Functions and Their Graphs Note. We start by assuming that you are familiar with the idea of a set and the set theoretic symbol ( an element of

### A note on a construction of J. F. Feinstein

STUDIA MATHEMATICA 169 (1) (2005) A note on a construction of J. F. Feinstein by M. J. Heath (Nottingham) Abstract. In [6] J. F. Feinstein constructed a compact plane set X such that R(X), the uniform

### SOME QUESTIONS FOR MATH 766, SPRING Question 1. Let C([0, 1]) be the set of all continuous functions on [0, 1] endowed with the norm

SOME QUESTIONS FOR MATH 766, SPRING 2016 SHUANGLIN SHAO Question 1. Let C([0, 1]) be the set of all continuous functions on [0, 1] endowed with the norm f C = sup f(x). 0 x 1 Prove that C([0, 1]) is a

### Chapter 8. P-adic numbers. 8.1 Absolute values

Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.

### PUTNAM PROBLEMS SEQUENCES, SERIES AND RECURRENCES. Notes

PUTNAM PROBLEMS SEQUENCES, SERIES AND RECURRENCES Notes. x n+ = ax n has the general solution x n = x a n. 2. x n+ = x n + b has the general solution x n = x + (n )b. 3. x n+ = ax n + b (with a ) can be

### Lecture 5 - Hausdorff and Gromov-Hausdorff Distance

Lecture 5 - Hausdorff and Gromov-Hausdorff Distance August 1, 2011 1 Definition and Basic Properties Given a metric space X, the set of closed sets of X supports a metric, the Hausdorff metric. If A is

### Journal of Inequalities in Pure and Applied Mathematics

Journal of Inequalities in Pure and Applied Mathematics THE EXTENSION OF MAJORIZATION INEQUALITIES WITHIN THE FRAMEWORK OF RELATIVE CONVEXITY CONSTANTIN P. NICULESCU AND FLORIN POPOVICI University of Craiova

### EASY PUTNAM PROBLEMS

EASY PUTNAM PROBLEMS (Last updated: December 11, 2017) Remark. The problems in the Putnam Competition are usually very hard, but practically every session contains at least one problem very easy to solve

### Average Reward Parameters

Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend

### What to remember about metric spaces

Division of the Humanities and Social Sciences What to remember about metric spaces KC Border These notes are (I hope) a gentle introduction to the topological concepts used in economic theory. If the

### MATH 590: Meshfree Methods

MATH 590: Meshfree Methods Chapter 9: Conditionally Positive Definite Radial Functions Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu MATH

### YET MORE ON THE DIFFERENTIABILITY OF CONVEX FUNCTIONS

PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 103, Number 3, July 1988 YET MORE ON THE DIFFERENTIABILITY OF CONVEX FUNCTIONS JOHN RAINWATER (Communicated by William J. Davis) ABSTRACT. Generic

### A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract

A Finite Element Method for an Ill-Posed Problem W. Lucht Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D-699 Halle, Germany Abstract For an ill-posed problem which has its origin

### MARIA GIRARDI Fact 1.1. For a bounded linear operator T from L 1 into X, the following statements are equivalent. (1) T is Dunford-Pettis. () T maps w

DENTABILITY, TREES, AND DUNFORD-PETTIS OPERATORS ON L 1 Maria Girardi University of Illinois at Urbana-Champaign Pacic J. Math. 148 (1991) 59{79 Abstract. If all bounded linear operators from L1 into a

### and the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that

Sampling short lattice vectors and the closest lattice vector problem Miklos Ajtai Ravi Kumar D. Sivakumar IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120. fajtai, ravi, sivag@almaden.ibm.com

### 1. Introduction The nonlinear complementarity problem (NCP) is to nd a point x 2 IR n such that hx; F (x)i = ; x 2 IR n + ; F (x) 2 IRn + ; where F is

New NCP-Functions and Their Properties 3 by Christian Kanzow y, Nobuo Yamashita z and Masao Fukushima z y University of Hamburg, Institute of Applied Mathematics, Bundesstrasse 55, D-2146 Hamburg, Germany,

### Continuity. Chapter 4

Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of

### Math 117: Continuity of Functions

Math 117: Continuity of Functions John Douglas Moore November 21, 2008 We finally get to the topic of ɛ δ proofs, which in some sense is the goal of the course. It may appear somewhat laborious to use

### Proposition 5. Group composition in G 1 (N) induces the structure of an abelian group on K 1 (X):

2 RICHARD MELROSE 3. Lecture 3: K-groups and loop groups Wednesday, 3 September, 2008 Reconstructed, since I did not really have notes { because I was concentrating too hard on the 3 lectures on blow-up

### The Uniformity Principle: A New Tool for. Probabilistic Robustness Analysis. B. R. Barmish and C. M. Lagoa. further discussion.

The Uniformity Principle A New Tool for Probabilistic Robustness Analysis B. R. Barmish and C. M. Lagoa Department of Electrical and Computer Engineering University of Wisconsin-Madison, Madison, WI 53706

### A Course in Real Analysis. Saeed Zakeri

A Course in Real Analysis Saeed Zakeri Contents Chapter 1. Topology of Metric Spaces I 5 1.1. Basic definitions 5 1.2. Convergence 11 1.3. Continuity 12 1.4. Completeness 14 1.5. Compactness 17 1.6. Connectedness

### On Coarse Geometry and Coarse Embeddability

On Coarse Geometry and Coarse Embeddability Ilmari Kangasniemi August 10, 2016 Master's Thesis University of Helsinki Faculty of Science Department of Mathematics and Statistics Supervised by Erik Elfving

### 3. After the rst version of this paper had been written, Curt McMullen informed us that he also had a proof of Theorems 1.1 and 1.2. See [McM97] for a

Separated nets in Euclidean space and Jacobians of bilipschitz maps Dimtri Burago Bruce Kleiner y March 31, 1997 Abstract We show that there are separated nets in the Euclidean plane which are not bilipschitz

### 5.3. Compactness results for Dirichlet spaces Compactness results for manifolds 39 References 39 sec:intro 1. Introduction We mean by a spectr

CONVERGENCE OF SPECTRAL STRUCTURES: A FUNCTIONAL ANALYTIC THEORY AND ITS APPLICATIONS TO SPECTRAL GEOMETRY KAZUHIRO KUWAE AND TAKASHI SHIOYA Abstract. The purpose of this paper is to present a functional

### Semi-strongly asymptotically non-expansive mappings and their applications on xed point theory

Hacettepe Journal of Mathematics and Statistics Volume 46 (4) (2017), 613 620 Semi-strongly asymptotically non-expansive mappings and their applications on xed point theory Chris Lennard and Veysel Nezir

### CHARACTERISTIC POINTS, RECTIFIABILITY AND PERIMETER MEASURE ON STRATIFIED GROUPS

CHARACTERISTIC POINTS, RECTIFIABILITY AND PERIMETER MEASURE ON STRATIFIED GROUPS VALENTINO MAGNANI Abstract. We establish an explicit formula between the perimeter measure of an open set E with C 1 boundary

### The rst attempt to consider coupled map lattices with multidimensional local subsystems of hyperbolic type was made by Pesin and Sinai in [PS]. Assumi

Equilibrium Measures for Coupled Map Lattices: Existence, Uniqueness and Finite-Dimensional Approximations Miaohua Jiang Center for Dynamical Systems and Nonlinear Studies Georgia Institute of Technology,

### University of California. Berkeley, CA fzhangjun johans lygeros Abstract

Dynamical Systems Revisited: Hybrid Systems with Zeno Executions Jun Zhang, Karl Henrik Johansson y, John Lygeros, and Shankar Sastry Department of Electrical Engineering and Computer Sciences University

### The p-adic numbers. Given a prime p, we define a valuation on the rationals by

The p-adic numbers There are quite a few reasons to be interested in the p-adic numbers Q p. They are useful for solving diophantine equations, using tools like Hensel s lemma and the Hasse principle,

### MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES JAMES READY Abstract. In this paper, we rst introduce the concepts of Markov Chains and their stationary distributions. We then discuss

### Pavel Dvurechensky Alexander Gasnikov Alexander Tiurin. July 26, 2017

Randomized Similar Triangles Method: A Unifying Framework for Accelerated Randomized Optimization Methods Coordinate Descent, Directional Search, Derivative-Free Method) Pavel Dvurechensky Alexander Gasnikov

### Jeong-Hyun Kang Department of Mathematics, University of West Georgia, Carrollton, GA

#A33 INTEGERS 10 (2010), 379-392 DISTANCE GRAPHS FROM P -ADIC NORMS Jeong-Hyun Kang Department of Mathematics, University of West Georgia, Carrollton, GA 30118 jkang@westga.edu Hiren Maharaj Department

### ON THE GROUND STATE OF QUANTUM LAYERS

ON THE GROUND STATE OF QUANTUM LAYERS ZHIQIN LU 1. Introduction The problem is from mesoscopic physics: let p : R 3 be an embedded surface in R 3, we assume that (1) is orientable, complete, but non-compact;

### Lecture 32: Taylor Series and McLaurin series We saw last day that some functions are equal to a power series on part of their domain.

Lecture 32: Taylor Series and McLaurin series We saw last day that some functions are equal to a power series on part of their domain. For example f(x) = 1 1 x = 1 + x + x2 + x 3 + = ln(1 + x) = x x2 2