
Journal of Mathematical Systems, Estimation, and Control, Vol. 6, No. 2, 1996, pp. 1–18. © 1996 Birkhäuser-Boston

Local Minima and Attractors at Infinity for Gradient Descent Learning Algorithms*

Kim L. Blackmore†, Robert C. Williamson†, Iven M.Y. Mareels†

Abstract

In the paper "Learning Nonlinearly Parametrized Decision Regions", an online scheme for learning a very general class of decision regions is given, together with conditions on both the parametrization and on the sequence of input examples under which good learning can be guaranteed to occur. In this paper, we discuss these conditions, in particular the requirement that there be no non-global local minima of the relevant error function, and the more specific problem of no attractor at infinity. Somewhat simpler sufficient conditions are given. A number of examples are discussed.

Key words: error function, local minimum, attractor at infinity, learning, decision region

AMS Subject Classifications: 68T05

1 Introduction

In [3] we presented a gradient descent based learning algorithm which was motivated by neural network learning algorithms. We gave a deterministic analysis of the convergence properties of the algorithm, using techniques of dynamical systems analysis. In the course of this analysis, we derived conditions under which the algorithm is guaranteed to learn effectively. The conditions for convergence of the algorithm include restrictions on the topological properties of an associated error surface: specifically, that it does not have non-global local minima, and does not produce an attractor at infinity.

* Received November 5, 1993; received in final form June 30. Summary appeared in Volume 6, Number 6.
† This work was supported by the Australian Research Council.

As the error surface is defined by an average over all of the training examples, it is difficult to test conditions on the error surface. The aim of this paper is to develop simpler sufficient conditions which are easier to test in applications. We have had some success in this aim, particularly with the question of attractors at infinity, although the question of existence of local minima remains intractable for some very simple parametrizations. This is not surprising when one considers the value to general optimization theory of identifying functions with no local minima. This question has long been investigated, with few helpful results [5]. The results given in this paper suggest considerable difficulty in analysing more complicated neural network parametrizations. However, some potentially tractable areas for further analysis are identified.

We present a number of examples of applying these results to understanding the behaviour of the algorithm for different classes of decision regions. We look at three different parametrizations for the class of half spaces containing the origin, and show that two of these satisfy the conditions for convergence, but the third does not. The third parametrization is shown to produce an attractor at infinity. This is reminiscent of behaviour observed in neural network learning, where estimate parameters may drift off to infinity. This exercise shows that the large scale behaviour of the learning algorithm is very much dependent on the choice of parametrization. The fact that such different behaviour is observed strongly suggests (but of course does not prove) that for more complex decision regions an even wider range of behaviours may be apparent with different parametrizations of the decision boundary. We also look at a problem motivated by a radar application, where the decision regions are stripes, and at decision regions which are an intersection of two half spaces. We show that there is no attractor at infinity for our parametrization of an (approximate) intersection of two half spaces.

In the following section we outline the algorithm presented in [3] and state the conditions for convergence that we will be discussing. In Section 3 we assume that the example points are uniformly distributed. Section 3.1 contains conditions under which both local minima and attractors at infinity are excluded, whereas Section 3.2 focuses on the exclusion of attractors at infinity when local minima may or may not be present. In Section 4 we look at the application to learning half spaces, in Section 5 we look at the problem of learning stripes, and in Section 6 we discuss intersections of half spaces. In Section 7 we raise the question of whether additional problems occur when the examples are nonuniformly distributed, and we show that the results of Section 3 must be modified. Section 8 concludes.

2 Problem Formulation and Previous Results

It is assumed that a sequence of data samples $((x_k, y_k))_{k \in \mathbb{Z}^+}$ is received, where $x_k \in X \subseteq \mathbb{R}^n$ and $y_k \in \{-1, 1\}$. $X$ is called the sample space, and $y_k$ indicates the membership of $x_k$ in some decision region $\Sigma \subseteq X$. If $x_k$ is contained in $\Sigma$ then $y_k = 1$; otherwise $y_k = -1$. We assume that $\Sigma$ belongs to a class $C$ of subsets of $X$ for which there exists some parameter space $A \subseteq \mathbb{R}^m$ and some epimorphism (onto mapping) $\Sigma : A \to C$. Moreover, it is assumed that there exists a parametrization $f$ for $C$, defined below.

Definition 2.1 Let $C$ be a class of closed subsets of $X$ with parameter space $A = \mathbb{R}^m$ and some epimorphism $\Sigma : A \to C$. A parametrization of $C$ is a function $f : A \times X \to \mathbb{R}$ such that for all $a \in A$

$$f(a, x) \;\begin{cases} > 0 & \text{if } x \in \text{interior of } \Sigma(a) \\ = 0 & \text{if } x \in \text{boundary of } \Sigma(a) \\ < 0 & \text{if } x \notin \Sigma(a) \end{cases} \qquad (2.1)$$

In addition, $f(a,x)$ is required to be twice differentiable with respect to $a$ on $A \times X$; $f$ and its first two derivatives with respect to $a$ are to be bounded in a compact domain; and $f$ must be Lipschitz continuous in $x$ in a compact domain.

The algorithm proposed in [3] is as follows:

Algorithm 2.2

Step 0: Choose the stepsize $\mu \in (0,1)$. Choose a boundary sensitivity parameter $\varepsilon \in (0,1)$. Choose an initial parameter estimate $a_0 \in A$.

Step 1: Commencing at $k = 0$, iterate the recursion

$$a_{k+1} = a_k - \mu\, \frac{\partial f}{\partial a}(a_k, x_k)\, \big(g(a_k, x_k) - y_k\big), \qquad (2.2)$$

where

$$g(a, x) := \frac{2}{\pi} \arctan \frac{f(a, x)}{\varepsilon}. \qquad (2.3)$$

In [3], Algorithm 2.2 is shown to be a perturbation of stepwise gradient descent on the cost function

$$J(a) = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} \left[ f(a, x_k)\big(g(a, x_k) - g(a^*, x_k)\big) - \frac{\varepsilon}{\pi} \ln\!\left(1 + \frac{f(a, x_k)^2}{\varepsilon^2}\right) \right], \qquad (2.4)$$

where $a^* \in A$ is the true parameter vector.
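As an illustration (not part of the original analysis), Algorithm 2.2 can be realised in a few lines of NumPy. The function names and the default values of $\mu$ and $\varepsilon$ below are ours; the parametrization $f$ and its gradient $\partial f/\partial a$ must be supplied by the user.

```python
import numpy as np

def g(f_val, eps):
    # Squashing function (2.3): g(a, x) = (2 / pi) * arctan(f(a, x) / eps)
    return (2.0 / np.pi) * np.arctan(f_val / eps)

def learn(f, grad_f, a0, samples, mu=0.1, eps=0.01):
    """Sketch of Algorithm 2.2.

    f(a, x)      -- parametrization value (scalar), positive inside Sigma(a)
    grad_f(a, x) -- gradient of f with respect to a (vector of length m)
    samples      -- iterable of (x_k, y_k) pairs with y_k in {-1, +1}
    """
    a = np.asarray(a0, dtype=float)
    for x, y in samples:
        # Recursion (2.2): a_{k+1} = a_k - mu * (df/da)(a_k, x_k) * (g(a_k, x_k) - y_k)
        a = a - mu * grad_f(a, x) * (g(f(a, x), eps) - y)
    return a
```

For the linear half-space parametrization $f_1$ of Section 4, for example, one would pass `f=lambda a, x: a @ x + 1.0` and `grad_f=lambda a, x: x`.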

Based on this fact, we derived in [3] two sets of conditions, either of which guarantees that the estimate parameters $a_k$ asymptotically enter and remain in a neighbourhood of the parameters of the true decision region. This implies that after sufficiently many iterations, the misclassified region is very small.

One of the conditions in the first set (assumption B2 in [3]) is that the cost function $J$ has a unique critical point in $A$. If $a = a^*$ then each term in (2.4) is zero. Therefore the true parameter $a^*$ is always a critical point of $J$, and B2 says that $a^*$ is the only critical point of $J$. Assumption B2 cannot hold when there is more than one $a$ for which $f(a, \cdot) \equiv f(a^*, \cdot)$. This occurs in many important examples where there is symmetry in the parametrization. We say there is more than one true parameter value. In [3] we showed that B2 is not necessary for effective learning, and it is sufficient to replace it with two conditions in the second set: (C3) local minima of $J$ only occur at the points where $f(a, \cdot) \equiv f(a^*, \cdot)$; and (C4) for all $a_0 \in A$, the solution $a(t)$ of the initial value problem (IVP)

$$\dot{a}(t) = -\left(\frac{dJ}{da}\big(a(t)\big)\right)^{\!\top}, \qquad a(0) = a_0, \qquad (2.5)$$

does not cross the boundary of $A$, where $A$ is assumed to be compact. That is, $a(t) \in A$ for all $t \geq 0$.

For many interesting examples, $A = \mathbb{R}^m$ for some $m > 0$, in which case C4 says that, for all $a_0 \in A$, the solution of (2.5) is bounded for all $t \geq 0$. We say in that case that there is no attractor at infinity for the ordinary differential equation in (2.5). If C4 is violated, the estimate parameters generated by Algorithm 2.2 may drift off to infinity as the algorithm updates. This undesirable behaviour has been observed in neural network learning, and in practice it would be good to anticipate this divergence, and avoid it if possible.

For a given parametrization, it is not clear how to test conditions on the cost function $J$. This is due in part to the averaging involved in defining $J$, and partly to the application of the arctan squashing function to the parametrization before taking the difference between $f(a, \cdot)$ and $f(a^*, \cdot)$.

3 Results

In this section we ignore the influence that different input sequences can play in the learning. This is achieved by assuming that the input examples $(x_k)_{k \in \mathbb{Z}^+}$ cover the sample space $X$. By this we mean that for any integrable function $f : X \to \mathbb{R}$,

$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} f(x_k) = \frac{1}{\operatorname{vol} X} \int_X f(x)\, dx. \qquad (3.1)$$

Here $\operatorname{vol} X = \int_X dx$.
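Condition C4 concerns the gradient flow (2.5) rather than the discrete algorithm. When no closed form for $J$ is available, one can probe it numerically. The sketch below is our construction, not the paper's: given any numerical approximation of $J$ (for instance a sample average over points covering $X$, per (3.1)), it Euler-integrates the IVP and flags trajectories that escape a large ball, which is suggestive (though of course not proof) of an attractor at infinity.

```python
import numpy as np

def grad_num(J, a, h=1e-5):
    # Central-difference estimate of dJ/da for a scalar-valued J
    grad = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        grad[i] = (J(a + e) - J(a - e)) / (2.0 * h)
    return grad

def stays_bounded(J, a0, dt=0.01, steps=100000, radius=1e3):
    # Euler integration of the IVP (2.5): da/dt = -dJ/da, a(0) = a0
    a = np.asarray(a0, dtype=float)
    for _ in range(steps):
        a = a - dt * grad_num(J, a)
        if np.linalg.norm(a) > radius:
            return False  # escaped: consistent with an attractor at infinity
    return True
```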

This is essentially a deterministic way of saying that the input examples are uniformly distributed. In particular, this means that

$$J(a) = \frac{1}{\operatorname{vol} X} \int_X \left[ f(a,x)\big(g(a,x) - g(a^*,x)\big) - \frac{\varepsilon}{\pi} \ln\!\left(1 + \frac{f(a,x)^2}{\varepsilon^2}\right) \right] dx. \qquad (3.2)$$

3.1 Local minima

First we deal with condition B2, that the cost function $J$ has a unique critical point $a^*$. This is guaranteed if $J$ is strictly increasing along rays originating at $a^*$. That is, the directional derivative at any point $a$ in the direction $a - a^*$ should be positive.

Definition 3.1 The directional derivative of $J : A \to \mathbb{R}$ at point $a \in A$ and in direction $b \in A$ is

$$D_b J(a) := \left.\frac{dJ}{da}\right|_a \cdot b. \qquad (3.3)$$

Similarly, the directional derivative of $f : A \times X \to \mathbb{R}$ with respect to $a \in A$ at point $(a,x) \in A \times X$, in direction $b \in A$, is

$$D_b f(a,x) := \left.\frac{\partial f}{\partial a}\right|_{(a,x)} \cdot b. \qquad (3.4)$$

Note that $D_0 J(a) = D_0 f(a,x) = 0$ for any values of $a$ and $x$.

Theorem 3.2 and Corollary 3.3 use the directional derivative to give sufficient conditions that B2 is satisfied. For brevity, we use the notation

$$\mathcal{F}(a,x) := D_{a-a^*} f(a,x)\,\big(f(a,x) - f(a^*,x)\big), \qquad (3.5)$$

$$\mathcal{J}(a,x) := D_{a-a^*} f(a,x)\,\big(g(a,x) - g(a^*,x)\big), \qquad (3.6)$$

where $g$ is defined by (2.3). Note that $\mathcal{F}(a^*,x) = \mathcal{J}(a^*,x) = 0$. For any $a \in \mathbb{R}^m$ we partition $X$ into the two subsets $S_a := \{x \in X : \mathcal{F}(a,x) \geq 0\}$ and $T_a := X \setminus S_a$.

Theorem 3.2 Assume $X \subset \mathbb{R}^n$ is compact and $(x_k)$ covers $X$. Let $f : \mathbb{R}^m \times X \to \mathbb{R}$ be a parametrization of some class of decision regions, and assume $a^* \in \mathbb{R}^m$. If, for every $a \in \mathbb{R}^m$, there exists a set $R_a \subseteq S_a$ such that

$$\inf_{x \in R_a} \mathcal{F}(a,x)\, \operatorname{vol} R_a \;>\; \left(1 + \sup_{x \in R_a} \frac{f(a,x)^2}{\varepsilon^2}\right) \sup_{x \in T_a} |f(a,x)|\, \operatorname{vol} T_a, \qquad (3.7)$$

then $\left.\frac{dJ}{da}\right|_a = 0$ if and only if $a = a^*$, where $J$ is defined by (2.4).
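Inequality (3.7) can be checked numerically at any fixed $a$ once $X$ is discretized. The sketch below is ours (the paper gives no such algorithm): it takes $R_a = S_a$, one admissible choice, and replaces the volumes by point counts over a grid or sample cloud covering $X$, so it is a heuristic check only.

```python
import numpy as np

def check_3_7(f, grad_f, a, a_star, xs, eps):
    """Heuristic check of inequality (3.7) at a single parameter a.
    xs: array of points covering X, shape (N, n)."""
    a, a_star = np.asarray(a, float), np.asarray(a_star, float)
    # F(a, x) = D_{a-a*} f(a, x) * (f(a, x) - f(a*, x)), definition (3.5)
    F = np.array([(grad_f(a, x) @ (a - a_star)) * (f(a, x) - f(a_star, x)) for x in xs])
    fa = np.array([f(a, x) for x in xs])
    S = F >= 0                      # "good" set S_a
    T = ~S                          # "bad" set T_a
    if not S.any():
        return False
    if not T.any():                 # vol T_a = 0: Corollary 3.3 territory
        return F[S].min() > 0
    lhs = F[S].min() * S.sum()
    rhs = (1.0 + (fa[S] ** 2).max() / eps ** 2) * np.abs(fa[T]).max() * T.sum()
    return lhs > rhs
```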

The proof of this theorem is given in the appendix.

The regions $S_a$ and $T_a$ can be interpreted as "good" and "bad" regions respectively. If the current estimate parameter is $a_k$, and $x_k$ falls in $S_{a_k}$, then gradient descent on the instantaneous cost

$$f(a_k, x_k)\big(g(a_k, x_k) - g(a^*, x_k)\big) - \frac{\varepsilon}{\pi} \ln\!\left(1 + \frac{f(a_k, x_k)^2}{\varepsilon^2}\right)$$

will cause an update which brings the estimate closer to the true parameter vector. Thus if $x_k \in S_{a_k}$ (and $\mu$ is sufficiently small), then $\|a_{k+1} - a^*\| \leq \|a_k - a^*\|$, where $a_{k+1}$ is determined by (2.2). However, if $x_k$ is chosen in $T_{a_k}$, then the estimate parameters will move away from the true parameters. Averaging relies on the idea that the effect of the erroneous updates is negligible. This requires that for any $a \in A = \mathbb{R}^m$, the volume of $T_a$ is small compared to that of $S_a$, and updates made when $x_k$ falls in $T_a$ are not significantly larger than when $x_k$ falls in $S_a$. The following corollary deals with the special case where the volume of $T_a$ is zero.

Corollary 3.3 Assume $X \subset \mathbb{R}^n$ is compact and $(x_k)$ covers $X$. Let $f : \mathbb{R}^m \times X \to \mathbb{R}$ be a parametrization of some class of decision regions, and assume $a^* \in \mathbb{R}^m$. If, for every $a \in \mathbb{R}^m$, there exists a set $U_a \subseteq X$ such that the closure of $U_a$ is $X$ and $\mathcal{F}(a,x) > 0$ for all $x \in U_a$, then $\left.\frac{dJ}{da}\right|_a = 0$ if and only if $a = a^*$, where $J$ is defined by (2.4).

Proof: From the definition of $S_a$, it is clear that $U_a \subseteq S_a \subseteq X$; but the closure of $U_a$ is $X$, so $\bar{S}_a = X$. Therefore $\operatorname{vol} S_a = \operatorname{vol} X$ and $\operatorname{vol} T_a = 0$ for all $a \in \mathbb{R}^m$. Because $\mathcal{F}(a,x) > 0$ for all $x \in U_a$, there exists a set $R_a \subseteq U_a$ such that $\operatorname{vol} R_a > 0$ and $\inf_{R_a} \mathcal{F}(a,x) > 0$. Thus Theorem 3.2 applies. □

The above results rely on showing that $J$ is strictly increasing along rays in parameter space originating at $a^*$. If this is relaxed to $J$ being nondecreasing along the rays, we may have critical points of $J$ which are not at $a^*$, so B2 is violated. However, provided $J$ is not constant along the rays, the behaviour of the algorithm will not be significantly altered by this relaxation, because the critical points will not be local minima of $J$, and all solutions of the IVP (2.5) will be bounded for all $t \geq 0$. That is, assumptions C3 and C4 will be satisfied. As shown in [3], this is sufficient for convergence of the algorithm.

To this point, we have only considered situations where there is a unique true parameter vector. If this is not the case, B2 cannot hold, and there must be regions where the directional derivative of $J$ along rays originating at any of the true parameter vectors will be negative. In this case it is much more difficult to test for local minima.

We may have some success if the true parameter vectors are isolated and countable, and $\mathbb{R}^m$ can be partitioned into convex sets $\Lambda_i$, each containing exactly one true parameter vector $a^{*i}$. Then if the directional derivative along all rays originating at $a^{*i}$ is positive at all points in $\Lambda_i$, for all possible $i$, then there cannot be any local minima of $J$, and there is no attractor at infinity. Thus C3 and C4 are satisfied. In practice, it is generally difficult to determine the sets $\Lambda_i$, so this is probably not a useful result.

3.2 Attractors at infinity

Even when local minima of the cost function exist, it is useful to know that there is no attractor at infinity. Algorithm 2.2 uses a finite step size and gradient descent on the instantaneous cost, rather than the average cost $J$, so the estimate parameters will jump around rather than moving smoothly to the minima of $J$. Moreover, they will continue to jump around even when they are in a local minimum, or even at the global minimum of $J$. Large deviations theory suggests that estimate parameters will eventually escape from the local minima (under some additional assumptions) [1]. They may also escape from the global minimum, though this is harder because the size of the updates is smaller when the value of $J$ is closer to zero. If there is an attractor at infinity, large deviations theory suggests that estimate parameters will eventually appear in the basin of attraction of this attractor. They will then head off to infinity and may become so large that no amount of jumping around will cause them to return to the basin of attraction of the global minimum. So if there is an attractor at infinity then ultimately the estimate parameters will converge there, even though the value of $J$ at this attractor may be large.

If there is no attractor at infinity, all local minima of $J$ are constrained to lie within some compact set. Parameters will not wander off to infinity but will spend most of their time near a local or global minimum of $J$. It is much less likely that the estimate parameters will leave a global minimum than a local minimum (since the cost driving the algorithm will be less), so the estimate parameters are likely to spend most of their time near the global minimum.

A classic result dealing with attractors at infinity gives the following (see page 204 of [4]):

Lemma 3.4 Let $J : \mathbb{R}^m \to [0, \infty)$ have continuous second derivatives. If $J^{-1}([0,c])$ is compact for all $c \in [0,\infty)$, and $\left.\frac{dJ}{da}\right|_a \neq 0$ except at a finite number of points $a^{*1}, \ldots, a^{*r} \in \mathbb{R}^m$, then for all $a_0 \in A$, the solution of the IVP (2.5) is bounded for all $t \geq 0$.

So, under the conditions of Lemma 3.4, if $A = \mathbb{R}^m$ then assumption C4 is satisfied.

Using Lemma 3.4 involves determining the lower level sets $J^{-1}([0,c])$, which is not a simple problem. Essentially the idea here is that the cost function keeps increasing as the parameter $a$ goes to infinity in any direction. We are not worried about the shape of the cost surface for finite parameter values, but only as the parameter becomes large. Therefore we use the following, more easily tested result.

Lemma 3.5 Let $J : \mathbb{R}^m \to [0, \infty)$ have continuous second derivatives and $J(a^*) = 0$ for some $a^* \in \mathbb{R}^m$. If there exists a constant $C > 0$ such that

$$D_{a-a^*} J(a) > 0 \quad \text{for all } a \in \mathbb{R}^m, \ \|a\| \geq C,$$

then for all $a_0 \in A$, the solution of the IVP (2.5) is bounded for all $t \geq 0$.

Proof: Let $a(t) \in \mathbb{R}^m$. If $\|a(t)\| \geq C$ then $D_{a-a^*} J(a) > 0$, so the solution of the IVP (2.5) moves towards $a^*$, in the sense that $\|a(t+\varepsilon) - a^*\| \leq \|a(t) - a^*\|$ for all sufficiently small $\varepsilon > 0$. Thus all solutions of the IVP (2.5) enter and remain in the closed ball with centre $a^*$ and radius $C$. □

Lemmas 3.4 and 3.5 are true for any function $J$ which satisfies the stated assumptions. We can now use the ideas of the last section to determine assumptions which are specific to the cost function $J$ defined in (2.4).

Theorem 3.6 Assume $X \subset \mathbb{R}^n$ is compact and $(x_k)$ covers $X$. Let $f : \mathbb{R}^m \times X \to \mathbb{R}$ be a parametrization of some class of decision regions, and assume $a^* \in \mathbb{R}^m$. If there exists a constant $C > 0$ such that, for all $a \in \mathbb{R}^m$ with $\|a\| \geq C$, there exists a set $R_a \subseteq S_a$ such that

$$\inf_{x \in R_a} \mathcal{F}(a,x)\, \operatorname{vol} R_a \;>\; \left(1 + \sup_{x \in R_a} \frac{f(a,x)^2}{\varepsilon^2}\right) \sup_{x \in T_a} |f(a,x)|\, \operatorname{vol} T_a, \qquad (3.8)$$

then for all $a_0 \in A$, the solution of the IVP (2.5) is bounded for all $t \geq 0$, where $J$ is defined in (2.4).

Proof: The proof of this result follows similar lines to the proof of Theorem 3.2. □

In order to show that there is no attractor at infinity, we are only ever interested in what happens for large $\|a\|$. For some cases it may be sufficient to only calculate the limiting behaviour of $\mathcal{F}(ca,x)$ as $c \to \infty$, which is often much simpler than calculating $\mathcal{F}(a,x)$ for finite values of $a$. In particular, if $\lim_{c\to\infty} \mathcal{F}(ca,x)$ exists for all $a$ and $x$, is never negative, and for each $a$, $\lim_{c\to\infty} \mathcal{F}(ca,x)$ is positive for some values of $x$, then Theorem 3.6 will hold. Or if, for each $a$, $\mathcal{F}(ca,x) \to \infty$ for some values of $x$, and $\lim_{c\to\infty} \mathcal{F}(ca,x)$ exists for all other values of $x$, then Theorem 3.6 will hold.
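Before attempting a closed-form limit, the behaviour of $\mathcal{F}(ca,x)$ for large $c$ can be eyeballed numerically. The following small sketch is ours, not from the paper:

```python
import numpy as np

def F(f, grad_f, a, a_star, x):
    # F(a, x) = D_{a-a*} f(a, x) * (f(a, x) - f(a*, x)), definition (3.5)
    return (grad_f(a, x) @ (a - a_star)) * (f(a, x) - f(a_star, x))

def limiting_F(f, grad_f, a, a_star, x, cs=(1e1, 1e2, 1e3, 1e4)):
    """Evaluate F(ca, x) along the ray c*a for growing c.  Values settling
    toward a constant suggest the limit exists; monotone growth suggests
    divergence to +infinity, as in assumption 2 of Theorem 3.8 below."""
    a, a_star = np.asarray(a, float), np.asarray(a_star, float)
    return [F(f, grad_f, c * a, a_star, x) for c in cs]
```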

Thus we have the following results.

Theorem 3.7 Assume $X \subset \mathbb{R}^n$ is compact and $(x_k)$ covers $X$. Let $f : \mathbb{R}^m \times X \to \mathbb{R}$ be a parametrization of some class of decision regions, and assume $a^* \in \mathbb{R}^m$. If

1. $\lim_{c\to\infty} \mathcal{F}(ca,x) \geq 0$ for all $a \in A$, $a \neq 0$, $x \in X$; and

2. for all $a \in \mathbb{R}^m$, $a \neq 0$, there exists a set $V_a \subseteq X$ and a constant $r > 0$ such that $\operatorname{vol} V_a \neq 0$ and $\lim_{c\to\infty} \inf_{x \in V_a} \mathcal{F}(ca,x) \geq r$,

then for all $a_0 \in A$, the solution of the IVP (2.5) is bounded for all $t \geq 0$, where $J$ is defined in (2.4).

Proof: Choose $a \in \mathbb{R}^m$ such that $\|a\| = 1$. Define

$$\delta_a := \frac{r\, \operatorname{vol} V_a}{2\, \operatorname{vol} X \left(1 + \sup_{c>0} \sup_{x \in V_a} \dfrac{f(ca,x)^2}{\varepsilon^2}\right)}. \qquad (3.9)$$

The first assumption implies that $\sup_{x \in T_{ca}} |f(ca,x)| \to 0$ as $c \to \infty$, so there exists a constant $C_{\delta_a}$ such that

$$\sup_{x \in T_{ca}} |f(ca,x)| < \delta_a \qquad \forall\, c \geq C_{\delta_a}. \qquad (3.10)$$

Assumption 2 implies that there exists a constant $C_a$ such that

$$\inf_{x \in V_a} \mathcal{F}(ca,x) \geq \frac{r}{2} \qquad \forall\, c \geq C_a. \qquad (3.11)$$

Let $C = \sup_{\|a\|=1} \max\{C_{\delta_a}, C_a\}$. For any $a \in \mathbb{R}^m$ we can write $a = c_n a_n$, where $c_n = \|a\|$ and $a_n = a/\|a\|$. By equation (3.10), if $\|a\| \geq C$ then

$$\sup_{x \in T_{c_n a_n}} |f(c_n a_n, x)| \;<\; \frac{r\, \operatorname{vol} R_a}{2\, \operatorname{vol} X \left(1 + \sup_{c>0} \sup_{x \in R_a} \dfrac{f(ca_n,x)^2}{\varepsilon^2}\right)}, \qquad (3.12)$$

where $R_a = V_{a_n}$. By definition, $\operatorname{vol} T_a \leq \operatorname{vol} X$, so equation (3.11) implies that

$$\sup_{x \in T_a} |f(a,x)| \;<\; \frac{\inf_{x \in R_a} \mathcal{F}(a,x)\, \operatorname{vol} R_a}{\operatorname{vol} T_a \left(1 + \sup_{x \in R_a} \dfrac{f(a,x)^2}{\varepsilon^2}\right)}. \qquad (3.14)$$

Thus Theorem 3.6 holds. □
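The hypothesis of Lemma 3.5, on which Theorems 3.6 and 3.7 ultimately rest, can also be sampled directly: pick parameters of large norm and check that the radial derivative $D_{a-a^*} J(a)$ is positive. The sketch below is our illustration only; it samples finitely many directions, so it can disprove nothing rigorously, but a negative value flags a direction along which $J$ is still decreasing far from $a^*$, i.e. a possible attractor at infinity.

```python
import numpy as np

def radial_derivative(J, a, a_star, h=1e-6):
    # D_{a-a*} J(a) = (dJ/da)|_a . (a - a*), definition (3.3), by central difference
    d = a - a_star
    n = np.linalg.norm(d)
    u = d / n
    return n * (J(a + h * u) - J(a - h * u)) / (2.0 * h)

def probe_lemma_3_5(J, a_star, C, dim, trials=500, spread=10.0, seed=0):
    """Sample points with ||a|| >= C and test D_{a-a*} J(a) > 0."""
    rng = np.random.default_rng(seed)
    a_star = np.asarray(a_star, float)
    for _ in range(trials):
        u = rng.standard_normal(dim)
        a = (C + spread * rng.random()) * u / np.linalg.norm(u)
        if radial_derivative(J, a, a_star) <= 0:
            return False
    return True
```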

Theorem 3.8 Assume $X \subset \mathbb{R}^n$ is compact and $(x_k)$ covers $X$. Let $f : \mathbb{R}^m \times X \to \mathbb{R}$ be a parametrization of some class of decision regions, and assume $a^* \in \mathbb{R}^m$. If, for all $a \in \mathbb{R}^m$, $a \neq 0$,

1. there exists $r_a > 0$ such that $\lim_{c\to\infty} \mathcal{F}(ca,x) \geq -r_a$ for all $x \in X$; and

2. there exists a set $V_a \subseteq X$ such that $\operatorname{vol} V_a \neq 0$ and $\lim_{c\to\infty} \inf_{x \in V_a} \mathcal{F}(ca,x) = \infty$,

then for all $a_0 \in A$, the solution of the IVP (2.5) is bounded for all $t \geq 0$, where $J$ is defined in (2.4).

Proof: The proof follows along similar lines to that of Theorem 3.7, the main difference being that here we can choose $\inf_{x \in R_a} \mathcal{F}(a,x)$ as large as we wish, whereas in Theorem 3.7 we can choose $\sup_{x \in T_a} |f(a,x)|$ as small as we wish. □

Converse results, giving conditions under which solutions of (2.5) are unbounded ($J$ has an attractor at infinity), can also be given. Theorem 3.8 will be used in Sections 5 and 6 to show that for particular parametrizations of a stripe and the (approximate) intersection of two half spaces there is no attractor at infinity.

4 Application: Learning a Half Space

We consider decision regions which are half spaces in $\mathbb{R}^n$ containing the origin. Three different parametrizations will be discussed. The first is the natural choice of parametrization, and there appears to be no good reason why either of the other two would be chosen. However, for more complicated classes of decision regions, it can be difficult to find a suitable parametrization, and given two candidate parametrizations it is not immediately apparent which one is preferable. By looking at different parametrizations for a half space, we illustrate some of the issues that must be considered in choosing a parametrization.

The natural choice for a parametrization of the half space $a^\top x + 1 > 0$ is

$$f_1(a,x) = a^\top x + 1, \qquad (4.1)$$

where the parameter space $A = \mathbb{R}^n$. This is the parametrization used in single node artificial neural networks. In this case, the directional derivative satisfies

$$\mathcal{F}_1(a,x) = \big((a - a^*)^\top x\big)^2, \qquad (4.2)$$

which is never negative, so Corollary 3.3 holds.

It turns out that the cost function $J_1$ induced by $f_1$ is convex.

Another suitable parametrization is

$$f_2(a,x) = 1 - e^{-p(a^\top x + 1)}. \qquad (4.3)$$

Note that in this case the magnitude of $f_2(a,x)$ will be much larger for points $x$ outside the decision region than for those an equal distance inside the decision region. Nonetheless, the decision regions defined by $f_1(a,\cdot)$ and $f_2(a,\cdot)$ are identical. Taking the directional derivative, we see that

$$\mathcal{F}_2(a,x) = p\,(a - a^*)^\top x \left(e^{p(a-a^*)^\top x} - 1\right) e^{-2p(a^\top x + 1)}, \qquad (4.4)$$

which is again never negative, since $e^z > 1 \iff z > 0$. Again, Corollary 3.3 shows that the cost function induced by $f_2$ has a unique critical point.

[Figure 1: The cost function $J_3$ induced by $f_3$ when $a^* = -3$, $X = [-1,1]$, $\varepsilon = 0.001$ and example points $(x_k)$ cover $X$. The average was determined by simple numerical integration.]

Now consider the parametrization

$$f_3(a,x) = (a^\top x + 1)\, e^{-(a^\top x + 1)^2}. \qquad (4.5)$$

For a particular $a$, the decision region identified by $f_3$ is identical to that identified by $f_1$. However, in most simulations using this parametrization, estimate parameters drift off towards infinity. This suggests that there is an attractor at infinity generated by the parametrization (4.5).
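The three half-space parametrizations can be passed straight to the earlier sketches. The definitions below transcribe (4.1), (4.3) and (4.5); the value p = 2.0 is an arbitrary illustrative choice of ours.

```python
import numpy as np

p = 2.0  # rate constant in f_2; arbitrary illustrative value

# f_1(a, x) = a^T x + 1, eq. (4.1)
f1      = lambda a, x: a @ x + 1.0
grad_f1 = lambda a, x: x

# f_2(a, x) = 1 - exp(-p (a^T x + 1)), eq. (4.3)
f2      = lambda a, x: 1.0 - np.exp(-p * (a @ x + 1.0))
grad_f2 = lambda a, x: p * np.exp(-p * (a @ x + 1.0)) * x

# f_3(a, x) = (a^T x + 1) exp(-(a^T x + 1)^2), eq. (4.5)
f3      = lambda a, x: (a @ x + 1.0) * np.exp(-((a @ x + 1.0) ** 2))
grad_f3 = lambda a, x: (1.0 - 2.0 * (a @ x + 1.0) ** 2) * np.exp(-((a @ x + 1.0) ** 2)) * x
```

In our experience with the `learn()` sketch, running it with `f3` on uniformly sampled points tends to show $\|a_k\|$ growing, mirroring the drift reported in the simulations above.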

The directional derivative satisfies

$$\mathcal{F}_3(a,x) = (a - a^*)^\top x \left(1 - 2(a^\top x + 1)^2\right) e^{-(a^\top x + 1)^2} \left[ (a^\top x + 1)\, e^{-(a^\top x + 1)^2} - (a^{*\top} x + 1)\, e^{-(a^{*\top} x + 1)^2} \right]. \qquad (4.6)$$

So for any $a \in \mathbb{R}^n$, in the limit $c \to \infty$,

$$\mathcal{F}_3(ca,x) \to 2c^3 (a^{*\top} x + 1)(a^\top x)^3\, e^{-c^2 (a^\top x)^2 - (a^{*\top} x + 1)^2} \to 0 \qquad (4.7)$$

for all $x \in X$ such that $a^\top x \neq 0$. Thus in the limit $c \to \infty$, $S_{ca} = \{x \in X : (a^{*\top} x + 1)\, a^\top x \geq 0\}$ and $T_{ca} = \{x \in X : (a^{*\top} x + 1)\, a^\top x < 0\}$.

We could try to prove there is an attractor at infinity by finding opposite bounds to those in Theorem 3.6. That is, we would try to show that for any $a \in \mathbb{R}^m$ there exists a set $W_a \subseteq X$ and a constant $r > 0$ such that $\operatorname{vol} W_a \neq 0$ and $\lim_{c\to\infty} \mathcal{F}_3(ca,x) \leq -r$ for all $x \in W_a$. But (4.7) shows that no such $r$ can be found in this case. Thus we have not been able to prove that there is always an attractor at infinity. However, for any one dimensional or two dimensional example, the cost function $J_3$ induced by $f_3$ can be plotted. Figure 1 shows the plot of $J_3$ for a particular one dimensional case. It can be seen from the figure that $J_3$ has one non-global local minimum and furthermore is decreasing (albeit slowly) as $\|a\|$ goes to infinity.

5 Application: Learning a Stripe

Next consider the parametrization

$$f_4(a,x) = \eta - (a^\top x + 1)^2, \qquad (5.1)$$

where $a, x \in \mathbb{R}^n$ and $\eta > 0$ is some fixed constant. The decision regions identified by $f_4$ are "stripes" in $\mathbb{R}^n$, where $\Sigma(a)$ is of width $2\sqrt{\eta}/\|a\|$ and is normal to $a$. Such decision regions arise in a radar problem. At $a$, the directional derivative is

$$\mathcal{F}_4(a,x) = 4\big((a - a^*)^\top x\big)^2 (a^\top x + 1)\left(\tfrac{1}{2}(a + a^*)^\top x + 1\right). \qquad (5.2)$$

This is nonnegative if $(a^\top x + 1)\left(\tfrac{1}{2}(a + a^*)^\top x + 1\right) \geq 0$, and negative otherwise. Figure 2 depicts the regions $S_a$ and $T_a$ for a particular choice of estimate parameter $a$ and true parameter $a^*$, when $X \subset \mathbb{R}^2$ and $A = \mathbb{R}^2$.

In the limit $c \to \infty$,

$$\mathcal{F}_4(ca,x) \to 2c^4 (a^\top x)^4 \to \infty \qquad (5.3)$$

for all $x \in \{x \in X : a^\top x \neq 0\}$, and if $a^\top x = 0$ then

$$\mathcal{F}_4(ca,x) = -4\,(a^{*\top} x)^2 \left(\tfrac{1}{2}\, a^{*\top} x + 1\right). \qquad (5.4)$$

Compactness of $X$ implies that $s^* = \sup_{x \in X} a^{*\top} x$ exists, so assumption 1 of Theorem 3.8 is satisfied, with $r_a = 4 s^{*2} \left(\tfrac{1}{2} s^* + 1\right)$. Setting $V_a = \{x \in X : |a^\top x| \geq \varepsilon\}$ for some $\varepsilon > 0$, it is clear that assumption 2 of Theorem 3.8 is also satisfied, so there is no attractor at infinity for this parametrization.

[Figure 2: The "good" and "bad" regions $S_a$ and $T_a$ for a particular choice of $a$ and $a^*$. Here the sample space is $X = [-1,1]^2$, the true parameter vector is $a^* = (0,4)$, the estimate is $a = (2,0)$, and the width is determined by $\eta = 0.04$. The dark shaded region is the true decision region $\Sigma(a^*)$, and the light shaded region is the estimate decision region $\Sigma(a)$. $T_a$ is the diagonally hashed area, and $S_a$ is the rest of $X$; the figure also marks the lines $a^\top x + 1 = 0$ and $\tfrac{1}{2}(a + a^*)^\top x + 1 = 0$.]

As an aside, note that if the input sequence $(x_k)$ consists entirely of positive examples ($y_k = +1$), then $X = \Sigma(a^*)$. Then $S_a$ is much larger than $T_a$ for any choice of $a$, while the size of the integrand is of the same order in both $S_a$ and $T_a$. This is interesting because it explains why in simulations the estimate parameters converge much more smoothly if only positive examples are used in training. This property is a special feature of this particular class of decision regions, and certainly does not apply in general. Generally one would probably not get convergence at all with solely positive examples.
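A sketch of the stripe parametrization (5.1) and the closed form (5.2), with η = 0.04 as in Figure 2 (the helper names are ours):

```python
import numpy as np

eta = 0.04  # stripe width parameter, as in Figure 2

# f_4(a, x) = eta - (a^T x + 1)^2, eq. (5.1): positive inside the stripe Sigma(a)
f4      = lambda a, x: eta - (a @ x + 1.0) ** 2
grad_f4 = lambda a, x: -2.0 * (a @ x + 1.0) * x

def F4(a, a_star, x):
    # Closed form (5.2) for the directional-derivative product F_4(a, x)
    return (4.0 * ((a - a_star) @ x) ** 2
            * (a @ x + 1.0) * (0.5 * (a + a_star) @ x + 1.0))
```

Evaluating `F4(c * a, a_star, x)` for growing `c` at points with $a^\top x \neq 0$ illustrates the divergence in (5.3).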

6 Application: Learning an Intersection of Two Half Spaces

Now consider decision regions which are intersections of half spaces containing the origin. If $X \subset \mathbb{R}^n$, the natural choice of parameter space is $\mathbb{R}^{2n}$. As in [3], we use the parametrization

$$f_p(a,x) = 1 - e^{-p(n_1^\top x + 1)} - e^{-p(n_2^\top x + 1)}, \qquad (6.1)$$

where $a = \operatorname{vec}(n_1, n_2)$, $n_1, n_2 \in \mathbb{R}^n$. There are two true parameter vectors in this case, $\operatorname{vec}(n_1^*, n_2^*)$ and $\operatorname{vec}(n_2^*, n_1^*)$, so the assumptions of Theorem 3.2 do not hold. The directional derivative satisfies

$$\begin{aligned} \mathcal{F}_p(a,x) ={}& p\,(n_1 - n_1^*)^\top x \left(e^{p(n_1 - n_1^*)^\top x} - 1\right) e^{-2p(n_1^\top x + 1)} \\ &+ p\,(n_1 - n_1^*)^\top x \left(e^{p(n_2 - n_2^*)^\top x} - 1\right) e^{-2p\left(\left(\frac{n_1 + n_2}{2}\right)^\top x + 1\right)} \\ &+ p\,(n_2 - n_2^*)^\top x \left(e^{p(n_1 - n_1^*)^\top x} - 1\right) e^{-2p\left(\left(\frac{n_1 + n_2}{2}\right)^\top x + 1\right)} \\ &+ p\,(n_2 - n_2^*)^\top x \left(e^{p(n_2 - n_2^*)^\top x} - 1\right) e^{-2p(n_2^\top x + 1)}. \end{aligned} \qquad (6.2)$$

Due to the nonlinear, coupled nature of this equation, there appears to be no simple way of describing the regions $S_a$ and $T_a$. The first and last terms are always positive, but the middle terms can be negative. All terms will be positive if $\operatorname{sgn}(n_1 - n_1^*)^\top x = \operatorname{sgn}(n_2 - n_2^*)^\top x$. So if $(n_1 - n_1^*) = c\,(n_2 - n_2^*)$ for some positive constant $c$, then $\mathcal{F}_p(a,x) \geq 0$ for all $x \in X$. That is, $S_a = X$ if $a \in \left\{ \operatorname{vec}\!\left(n_1,\ \frac{n_1 - n_1^*}{c} + n_2^*\right) : c \in \mathbb{R}^+ \right\}$.

As in the previous example, we cannot show there are no local minima, but we can show that there is no attractor at infinity. Let $a = \operatorname{vec}(n_1, n_2) \in \mathbb{R}^{2n}$, where either $n_1$ may equal $0$ or $n_2$ may equal $0$, but not both. Define $\alpha := p\, n_1^\top x$ and $\beta := p\, n_2^\top x$. In the limit $c \to \infty$, $\mathcal{F}_p(ca,x)$ takes on the following values:

Case 1. $\alpha > 0$ and $\beta > 0$: $\quad \mathcal{F}_p(ca,x) \to 2\left(c\alpha\, e^{-c\alpha} + c\beta\, e^{-c\beta}\right) \to +0 \qquad (6.3)$

Case 2. $\alpha < 0$ and $\beta > 0$: $\quad \mathcal{F}_p(ca,x) \to -c\alpha\, e^{-2c\alpha} \to +\infty \qquad (6.4)$

Case 3. $\alpha < 0$ and $\beta < 0$: $\quad \mathcal{F}_p(ca,x) \to \left(-c\alpha\, e^{-c\alpha} - c\beta\, e^{-c\beta}\right)\left(e^{-c\alpha} + e^{-c\beta}\right) \to +\infty \qquad (6.5)$

Case 4. $\alpha = 0$ and $\beta > 0$: $\quad \mathcal{F}_p(ca,x) \to -p\, n_1^{*\top} x \left(e^{-p\, n_1^{*\top} x} - 1\right) e^{-2p} \geq 0 \qquad (6.6)$

Case 5. $\alpha = 0$ and $\beta < 0$: $\quad \mathcal{F}_p(ca,x) \to -c\beta\, e^{-2c\beta} \to +\infty \qquad (6.7)$

Case 6. $\alpha = 0$ and $\beta = 0$: $\quad \mathcal{F}_p(ca,x) = -p\,(n_1^* + n_2^*)^\top x \left(e^{-p\, n_1^{*\top} x} + e^{-p\, n_2^{*\top} x} - 2\right) e^{-2p} \qquad (6.8)$

Let $s_1 = \sup_{x \in X} n_1^{*\top} x$ and $s_2 = \sup_{x \in X} n_2^{*\top} x$. Setting $r_a = -p\,(s_1 + s_2)\left(e^{-p s_1} + e^{-p s_2} - 2\right) e^{-2p}$ and $V_a = \{x \in X : n_1^\top x \leq -\varepsilon \text{ or } n_2^\top x \leq -\varepsilon\}$ for some $\varepsilon > 0$, the assumptions of Theorem 3.8 are satisfied.
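The intersection parametrization (6.1) in code, with $a = \operatorname{vec}(n_1, n_2)$ stored as a single concatenated vector (p = 2.0 is again an arbitrary illustrative value of ours):

```python
import numpy as np

def fp(a, x, p=2.0):
    # f_p(a, x) = 1 - exp(-p(n1^T x + 1)) - exp(-p(n2^T x + 1)), eq. (6.1)
    n1, n2 = np.split(np.asarray(a, float), 2)
    return 1.0 - np.exp(-p * (n1 @ x + 1.0)) - np.exp(-p * (n2 @ x + 1.0))

def grad_fp(a, x, p=2.0):
    # Gradient with respect to a = vec(n1, n2): the two n-dimensional blocks stack
    n1, n2 = np.split(np.asarray(a, float), 2)
    return np.concatenate([p * np.exp(-p * (n1 @ x + 1.0)) * x,
                           p * np.exp(-p * (n2 @ x + 1.0)) * x])
```

Feeding `fp` and `grad_fp` to the `limiting_F` sketch of Section 3.2, at points with the various signs of $n_1^\top x$ and $n_2^\top x$, reproduces the six cases above numerically.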

7 Nonuniformly Distributed Examples

The results and examples given so far in this paper have all assumed that the examples are such that $(x_k)$ covers $X$. We have given conditions which guarantee that there is a unique critical point of the cost function, or that no attractors at infinity exist. But these conditions may not be sufficient if the examples do not cover $X$.

From our definitions of the sets $S_a$ and $T_a$ in Section 3, we see that it is often possible to construct malicious input sequences which will force the estimate parameters generated by Algorithm 2.2 to diverge to infinity. If the parametrization is such that $T_a \neq \emptyset$ for all $a \in A$, a suitable malicious sequence is achieved by choosing $x_k \in T_{a_k}$ for each $k \geq 0$. In a similar vein, Sontag and Sussmann [7] have given an example of a particular sequence of input examples which will cause the error surface for a simple neural network learning problem to have non-global local minima. These examples require special choices of the examples which vary with $k$, so this is quite a large departure from our original assumption that $(x_k)$ covers $X$.

A smaller relaxation of the covering assumption is that the examples are i.i.d. (independent and identically distributed) with some nonuniform distribution. In this case the results of Section 3 no longer apply. Assume that the $x_k$ are i.i.d. random variables with some known distribution $P(x) : X \to [0,1]$. The cost function $J$ becomes

$$J(a) = \frac{1}{\operatorname{vol} X} \int_X \left[ f(a,x)\big(g_\varepsilon(a,x) - g_\varepsilon(a^*,x)\big) - \frac{\varepsilon}{\pi} \ln\!\left(1 + \frac{f(a,x)^2}{\varepsilon^2}\right) \right] dP(x). \qquad (7.1)$$

Results similar to those in Section 3 can be derived, using $\inf P(x)$ and $\sup P(x)$ on the sets $S_a$ and $T_a$. Whether or not the nature of the cost surface can be changed significantly by the choice of $P(\cdot)$ is unclear. Do there exist parametrizations for which there is no attractor at infinity when examples are uniformly distributed, but there is an attractor at infinity if examples are chosen according to some other distribution? If this is the case, then satisfying the conditions for uniformly distributed points does not always guarantee good convergence if the points are nonuniformly distributed.
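The cost (3.2), and its nonuniform variant (7.1), can be estimated by a weighted sample average. In the small sketch below (ours, not the paper's), uniform weights correspond to examples that cover $X$, while nonuniform weights play the role of $dP(x)$; the resulting estimate can be fed to the `stays_bounded` and `probe_lemma_3_5` sketches above.

```python
import numpy as np

def J_estimate(f, a, a_star, xs, eps, weights=None):
    """Sample-average estimate of the cost (3.2) / (7.1).
    xs: points covering X; weights: optional, proportional to dP(x)."""
    a, a_star = np.asarray(a, float), np.asarray(a_star, float)
    fa  = np.array([f(a, x) for x in xs])
    fas = np.array([f(a_star, x) for x in xs])
    g = lambda v: (2.0 / np.pi) * np.arctan(v / eps)   # squashing function (2.3)
    vals = fa * (g(fa) - g(fas)) - (eps / np.pi) * np.log1p(fa ** 2 / eps ** 2)
    return np.average(vals, weights=weights)
```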

8 Conclusions

In this paper we have investigated conditions guaranteeing convergence of an algorithm for learning nonlinearly parametrized decision regions, discussed in [3]. The conditions given in [3] involve a cost function, where the average is taken over the input examples. The conditions are hard to test directly, so our aim here has been to develop simpler sufficient conditions which can be tested for a particular parametrization. The approach taken has been to relate the directional derivative of the cost function to the directional derivative of the parametrization, and integrate over the entire sample space.

A number of examples of the application of the new conditions to particular parametrizations have been given, and some interesting insights have been gained. We have shown that even half spaces can be parametrized in a way which will cause the learning algorithm to fail, though for the obvious linear parametrization, and for some nonlinear parametrizations of a half space, the algorithm will work. We have been able to explain why training with only positive examples gives smooth convergence for the parametrization of a stripe. The parametrization of an intersection of half spaces which was given in [3] has been investigated further, and it was shown that parameters will not drift off to infinity if points are uniformly distributed.

Appendix: Proof of Theorem 3.2

Let $a \in \mathbb{R}^m$. By the definition of a parametrization, both $f$ and $\partial f/\partial a$ are bounded on a compact domain, so $\inf_{R_a} \mathcal{F}(a,x)$ and $\sup_{T_a} |f(a,x)|$ are both finite. In this section we write, for instance, $\inf_{R_a} \mathcal{F}(a,x)$ rather than $\inf_{x \in R_a} \mathcal{F}(a,x)$ for the infimum over a subset of $X$, and similarly for suprema.

By (3.2), the directional derivative of $J$ in the direction $a - a^*$ is

$$D_{a-a^*} J(a) = \frac{1}{\operatorname{vol} X} \int_{S_a} \mathcal{J}(a,x)\, dx + \frac{1}{\operatorname{vol} X} \int_{T_a} \mathcal{J}(a,x)\, dx. \qquad (A.1)$$

Note that the smoothness property of $f$, and hence $g$, has been used to interchange the order of differentiation and integration.

It is clear from the definitions of $\mathcal{F}$ and $\mathcal{J}$ that for any $a \in \mathbb{R}^m$, $x \in X$,

$$\mathcal{J}(a,x) = \frac{g(a,x) - g(a^*,x)}{f(a,x) - f(a^*,x)}\, \mathcal{F}(a,x). \qquad (A.2)$$

Because arctan is a strictly monotonic function, $\operatorname{sgn}\big(g(a,x) - g(a^*,x)\big) = \operatorname{sgn}\big(f(a,x) - f(a^*,x)\big)$. Thus for any $a \in \mathbb{R}^m$, $x \in X$, $\operatorname{sgn} \mathcal{J}(a,x) = \operatorname{sgn} \mathcal{F}(a,x)$. In particular, for any $x \in S_a$, $\mathcal{J}(a,x)$ is nonnegative, and for any $x \in T_a$, $\mathcal{J}(a,x)$ is negative.

From the intermediate value theorem, we know that for any $a$ and $x$ there is an intermediate value $\tilde{f}(a,x)$, lying between $f(a^*,x)$ and $f(a,x)$, such that

$$\frac{g(a,x) - g(a^*,x)}{f(a,x) - f(a^*,x)} = \frac{2}{\pi}\, \frac{\varepsilon}{\varepsilon^2 + \tilde{f}(a,x)^2}. \qquad (A.3)$$

But

$$\frac{2}{\pi}\, \frac{\varepsilon}{\varepsilon^2 + \tilde{f}(a,x)^2} \;\leq\; \frac{2}{\pi \varepsilon} \qquad (A.4)$$

always, so $|\mathcal{J}(a,x)| \leq \frac{2}{\pi\varepsilon}\, |f(a,x)|$ for any values of $a$ and $x$. Similarly, for any $x \in U \subseteq X$,

$$|\mathcal{J}(a,x)| \;\geq\; \frac{2\varepsilon}{\pi\left(\varepsilon^2 + \sup_U f^2\right)}\, |f(a,x)|.$$

Now choose $a \in A$, $a \neq a^*$. From (A.1),

$$\begin{aligned} D_{a-a^*} J(a) &\geq \frac{1}{\operatorname{vol} X}\left[ \int_{R_a} \mathcal{J}(a,x)\, dx + \int_{T_a} \mathcal{J}(a,x)\, dx \right] \\ &\geq \frac{1}{\operatorname{vol} X}\left[ \inf_{R_a} \mathcal{J}(a,x)\, \operatorname{vol} R_a - \sup_{T_a} |\mathcal{J}(a,x)|\, \operatorname{vol} T_a \right] \\ &\geq \frac{1}{\operatorname{vol} X}\left[ \frac{2\varepsilon}{\pi\left(\varepsilon^2 + \sup_{R_a} f^2\right)} \inf_{R_a} \mathcal{F}(a,x)\, \operatorname{vol} R_a - \frac{2}{\pi\varepsilon} \sup_{T_a} |f(a,x)|\, \operatorname{vol} T_a \right] \\ &= \frac{2}{\pi\varepsilon \operatorname{vol} X}\left[ \frac{\inf_{R_a} \mathcal{F}(a,x)\, \operatorname{vol} R_a}{1 + \sup_{R_a} (f/\varepsilon)^2} - \sup_{T_a} |f(a,x)|\, \operatorname{vol} T_a \right] \\ &> 0 \end{aligned}$$

by assumption. Thus $\left.\frac{dJ}{da}\right|_a \neq 0$ if $a \neq a^*$, as required. □

References

[1] M.R. Frater, R.R. Bitmead, and C.R. Johnson, Jr. Escape from stable equilibria in blind adaptive equalization, Conf. on Decision and Control, Tucson, Arizona, December 1992, 1756–.

[2] G.C. Goodwin and K.S. Sin. Adaptive Filtering Prediction and Control. New York: Prentice-Hall.

[3] K.L. Blackmore, R.C. Williamson, and I.M.Y. Mareels. Summary: Learning nonlinearly parametrized decision regions, J. Math. Systems, Estimation, and Control 6(1) (1996). Full length paper available electronically.

[4] M.W. Hirsch and S. Smale. Differential Equations, Dynamical Systems, and Linear Algebra. New York: Academic Press.

[5] R. Horst and P.T. Thach. A topological property of limes-arcwise strictly quasiconvex functions, Jour. Math. Analysis and Applications 134 (1988), 426–430.

[6] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. Cambridge: MIT Press.

[7] E.D. Sontag and H.J. Sussmann. Back propagation can give rise to spurious local minima even for networks without hidden layers, Complex Systems 3 (1989), 91–106.

Department of Engineering, Australian National University, Canberra ACT 0200, Australia

Department of Engineering, Australian National University, Canberra ACT 0200, Australia

Department of Engineering, Australian National University, Canberra ACT 0200, Australia

Communicated by David Hill


More information

+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1

+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1 Appendix To understand weak derivatives and distributional derivatives in the simplest context of functions of a single variable, we describe without proof some results from real analysis (see [7] and

More information

THE RANGE OF A VECTOR-VALUED MEASURE

THE RANGE OF A VECTOR-VALUED MEASURE THE RANGE OF A VECTOR-VALUED MEASURE J. J. UHL, JR. Liapounoff, in 1940, proved that the range of a countably additive bounded measure with values in a finite dimensional vector space is compact and, in

More information

MATH 426, TOPOLOGY. p 1.

MATH 426, TOPOLOGY. p 1. MATH 426, TOPOLOGY THE p-norms In this document we assume an extended real line, where is an element greater than all real numbers; the interval notation [1, ] will be used to mean [1, ) { }. 1. THE p

More information

FORM 1 Physics 240 í Fall Exam 1 CONTI COULTER GRAFE SKALSEY ZORN. Please æll in the information requested on the scantron answer sheet

FORM 1 Physics 240 í Fall Exam 1 CONTI COULTER GRAFE SKALSEY ZORN. Please æll in the information requested on the scantron answer sheet FORM 1 Physics 240 í Fall 1999 Exam 1 Name: Discussion section instructor and time: ècircle ONE TIMEè CONTI COULTER GRAFE SKALSEY ZORN MW 10-11 èè23è MW 8-9 èè4è MW 2-3 èè7è MW 11-12 èè6è MW 9-10 èè12è

More information

Lecture 4. Chapter 4: Lyapunov Stability. Eugenio Schuster. Mechanical Engineering and Mechanics Lehigh University.

Lecture 4. Chapter 4: Lyapunov Stability. Eugenio Schuster. Mechanical Engineering and Mechanics Lehigh University. Lecture 4 Chapter 4: Lyapunov Stability Eugenio Schuster schuster@lehigh.edu Mechanical Engineering and Mechanics Lehigh University Lecture 4 p. 1/86 Autonomous Systems Consider the autonomous system ẋ

More information

LMI Methods in Optimal and Robust Control

LMI Methods in Optimal and Robust Control LMI Methods in Optimal and Robust Control Matthew M. Peet Arizona State University Lecture 15: Nonlinear Systems and Lyapunov Functions Overview Our next goal is to extend LMI s and optimization to nonlinear

More information

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4 Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Hybrid Systems Course Lyapunov stability

Hybrid Systems Course Lyapunov stability Hybrid Systems Course Lyapunov stability OUTLINE Focus: stability of an equilibrium point continuous systems decribed by ordinary differential equations (brief review) hybrid automata OUTLINE Focus: stability

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

Neural networks. Chapter 20. Chapter 20 1

Neural networks. Chapter 20. Chapter 20 1 Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms

More information

586 D.E. Norton A dynamical system consists of three ingredients: a setting in which the dynamical behavior takes place, such as the real line, a toru

586 D.E. Norton A dynamical system consists of three ingredients: a setting in which the dynamical behavior takes place, such as the real line, a toru Comment.Math.Univ.Carolinae 36,3 è1995è585í597 585 The fundamental theorem of dynamical systems Douglas E. Norton Abstract. We propose the title of The Fundamental Theorem of Dynamical Systems for a theorem

More information

Continuity of convex functions in normed spaces

Continuity of convex functions in normed spaces Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Feedback control for a chemostat with two organisms

Feedback control for a chemostat with two organisms Feedback control for a chemostat with two organisms Patrick De Leenheer and Hal Smith Arizona State University Department of Mathematics and Statistics Tempe, AZ 85287 email: leenheer@math.la.asu.edu,

More information

1 Which sets have volume 0?

1 Which sets have volume 0? Math 540 Spring 0 Notes #0 More on integration Which sets have volume 0? The theorem at the end of the last section makes this an important question. (Measure theory would supersede it, however.) Theorem

More information

CS 294-2, Visual Grouping and Recognition èprof. Jitendra Malikè Sept. 8, 1999

CS 294-2, Visual Grouping and Recognition èprof. Jitendra Malikè Sept. 8, 1999 CS 294-2, Visual Grouping and Recognition èprof. Jitendra Malikè Sept. 8, 999 Lecture è6 èbayesian estimation and MRFè DRAFT Note by Xunlei Wu æ Bayesian Philosophy æ Markov Random Fields Introduction

More information

StrucOpt manuscript No. èwill be inserted by the editorè Detecting the stress æelds in an optimal structure II: Three-dimensional case A. Cherkaev and

StrucOpt manuscript No. èwill be inserted by the editorè Detecting the stress æelds in an optimal structure II: Three-dimensional case A. Cherkaev and StrucOpt manuscript No. èwill be inserted by the editorè Detecting the stress æelds in an optimal structure II: Three-dimensional case A. Cherkaev and _I. Kíucíuk Abstract This paper is the second part

More information

MAXIMA AND MINIMA CHAPTER 7.1 INTRODUCTION 7.2 CONCEPT OF LOCAL MAXIMA AND LOCAL MINIMA

MAXIMA AND MINIMA CHAPTER 7.1 INTRODUCTION 7.2 CONCEPT OF LOCAL MAXIMA AND LOCAL MINIMA CHAPTER 7 MAXIMA AND MINIMA 7.1 INTRODUCTION The notion of optimizing functions is one of the most important application of calculus used in almost every sphere of life including geometry, business, trade,

More information

November 9, Abstract. properties change in response to the application of a magnetic æeld. These æuids typically

November 9, Abstract. properties change in response to the application of a magnetic æeld. These æuids typically On the Eæective Magnetic Properties of Magnetorheological Fluids æ November 9, 998 Tammy M. Simon, F. Reitich, M.R. Jolly, K. Ito, H.T. Banks Abstract Magnetorheological èmrè æuids represent a class of

More information

CONTINUITY AND DIFFERENTIABILITY ASPECTS OF METRIC PRESERVING FUNCTIONS

CONTINUITY AND DIFFERENTIABILITY ASPECTS OF METRIC PRESERVING FUNCTIONS Real Analysis Exchange Vol. (),, pp. 849 868 Robert W. Vallin, 229 Vincent Science Hall, Slippery Rock University of PA, Slippery Rock, PA 16057. e-mail: robert.vallin@sru.edu CONTINUITY AND DIFFERENTIABILITY

More information

256 Facta Universitatis ser.: Elect. and Energ. vol. 11, No.2 è1998è primarily concerned with narrow range of frequencies near ærst resonance èwhere s

256 Facta Universitatis ser.: Elect. and Energ. vol. 11, No.2 è1998è primarily concerned with narrow range of frequencies near ærst resonance èwhere s FACTA UNIVERSITATIS èni ç Sè Series: Electronics and Energetics vol. 11, No.2 è1998è, 255-261 EFFICIENT CALCULATION OF RADAR CROSS SECTION FOR FINITE STRIP ARRAY ON DIELECTRIC SLAB Borislav Popovski and

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote Real Variables, Fall 4 Problem set 4 Solution suggestions Exercise. Let f be of bounded variation on [a, b]. Show that for each c (a, b), lim x c f(x) and lim x c f(x) exist. Prove that a monotone function

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Converse Lyapunov Functions for Inclusions 2 Basic denitions Given a set A, A stands for the closure of A, A stands for the interior set of A, coa sta

Converse Lyapunov Functions for Inclusions 2 Basic denitions Given a set A, A stands for the closure of A, A stands for the interior set of A, coa sta A smooth Lyapunov function from a class-kl estimate involving two positive semidenite functions Andrew R. Teel y ECE Dept. University of California Santa Barbara, CA 93106 teel@ece.ucsb.edu Laurent Praly

More information

GLOBAL ANALYSIS OF PIECEWISE LINEAR SYSTEMS USING IMPACT MAPS AND QUADRATIC SURFACE LYAPUNOV FUNCTIONS

GLOBAL ANALYSIS OF PIECEWISE LINEAR SYSTEMS USING IMPACT MAPS AND QUADRATIC SURFACE LYAPUNOV FUNCTIONS GLOBAL ANALYSIS OF PIECEWISE LINEAR SYSTEMS USING IMPACT MAPS AND QUADRATIC SURFACE LYAPUNOV FUNCTIONS Jorge M. Gonçalves, Alexandre Megretski y, Munther A. Dahleh y California Institute of Technology

More information

Relationship Between Integration and Differentiation

Relationship Between Integration and Differentiation Relationship Between Integration and Differentiation Fundamental Theorem of Calculus Philippe B. Laval KSU Today Philippe B. Laval (KSU) FTC Today 1 / 16 Introduction In the previous sections we defined

More information

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3 Analysis Math Notes Study Guide Real Analysis Contents Ordered Fields 2 Ordered sets and fields 2 Construction of the Reals 1: Dedekind Cuts 2 Metric Spaces 3 Metric Spaces 3 Definitions 4 Separability

More information

Lecture 3: Semidefinite Programming

Lecture 3: Semidefinite Programming Lecture 3: Semidefinite Programming Lecture Outline Part I: Semidefinite programming, examples, canonical form, and duality Part II: Strong Duality Failure Examples Part III: Conditions for strong duality

More information

Ordinary Differential Equations (ODEs)

Ordinary Differential Equations (ODEs) c01.tex 8/10/2010 22: 55 Page 1 PART A Ordinary Differential Equations (ODEs) Chap. 1 First-Order ODEs Sec. 1.1 Basic Concepts. Modeling To get a good start into this chapter and this section, quickly

More information

Measure and integration

Measure and integration Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Optimization and Gradient Descent

Optimization and Gradient Descent Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function

More information

Iterative Improvement plus Random Restart Stochastic Search by X. Hu, R. Shonkwiler, and M. C. Spruill School of Mathematics Georgia Institute of Tech

Iterative Improvement plus Random Restart Stochastic Search by X. Hu, R. Shonkwiler, and M. C. Spruill School of Mathematics Georgia Institute of Tech Iterative Improvement plus Random Restart Stochastic Search by X. Hu, R. Shonkwiler, and M. C. Spruill School of Mathematics Georgia Institute of Technology Atlanta, GA 0 email: shonkwiler@math.gatech.edu,

More information

8.3 Partial Fraction Decomposition

8.3 Partial Fraction Decomposition 8.3 partial fraction decomposition 575 8.3 Partial Fraction Decomposition Rational functions (polynomials divided by polynomials) and their integrals play important roles in mathematics and applications,

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information