1 Introduction Research on neural networks has led to the investigation of massively parallel computational models that consist of analog computationa

Size: px
Start display at page:

Download "1 Introduction Research on neural networks has led to the investigation of massively parallel computational models that consist of analog computationa"

Transcription

1 A Comparison of the Computational Power of Sigmoid and Boolean Threshold Circuits Wolfgang Maass Institute for Theoretical Computer Science Technische Universitaet Graz Klosterwiesgasse 32/2 A-8010 Graz, Austria Georg Schnitger Fachbereich Mathematik/Informatik Universitat Paderborn D-4790 Paderborn, Germany Eduardo D. Sontag y SYCON { Rutgers Center for Systems and Control Department of Mathematics Rutgers University sontag@hilbert.rutgers.edu Abstract We examine the power of constant depth circuits with sigmoid (i.e. smooth) threshold gates for computing boolean functions. It is shown that, for depth 2, constant size circuits of this type are strictly more powerful than constant size boolean threshold circuits (i.e. circuits with linear threshold gates). On the other hand it turns out that, for any constant depth d, polynomial size sigmoid threshold circuits with polynomially bounded weights compute exactly the same boolean functions as the corresponding circuits with linear threshold gates. Partially supported by NSF-CCR and AFOSR y Partially supported by Siemens Corporate Research and AFOSR and AFOSR

2 1 Introduction Research on neural networks has led to the investigation of massively parallel computational models that consist of analog computational elements. Usually these analog computational elements are assumed to be smooth threshold gates, i.e.-gates for some nondecreasing dierentiable function : R! R. A -gate with weights w 1 ; : : : ; w m 2 R and threshold t 2 R is dened to be a gate that computes the function (x 1 ; : : : ; x m ) 7! ( P m w i x i? t) from R m into R. A -circuit is dened as a directed acyclic circuit that consists of -gates. The most frequently considered special case of a smooth threshold circuit is the sigmoid threshold circuit, which is a -circuit for : R! R dened by 1 (x) = 1 + exp(?x). Smooth threshold circuits (-circuits for \smooth" functions ) have become the standard model for the investigation of learning on multi-layer articial neural nets ([K], [HKP], [RM], [SS1], [SS2], [WK]). In fact, the most common learning algorithm for multi-layer neural nets, the Backwards-Propagation algorithm, can only be implemented on -circuits for dierentiable functions. Another motivation for the investigation of smooth threshold circuits is the desire to explore simple models for the (very complicated) information processing mechanisms in neural systems of living organisms. In a rst approximation one may view the current ring rate of a neuron as its current output ([S], [RM], [K]). The ring rates of neurons are known to change between a few and several hundred rings per second. Hence a smooth threshold gate provides a somewhat better computational model for a neuron than a digital element that has just two dierent output signals. In this paper we examine the power of smooth threshold circuits for computing Boolean functions. In particular, we compare their power with that of boolean threshold circuits (i.e. s-circuits for the \heaviside" function s, with s(x) = 1 if x 0 and s(x) = 0 if x < 0). In the literature one often refers to such \boolean threshold circuits" as \linear threshold circuits", or simply as \threshold circuits". The most surprising result of this paper is the existence of a boolean function F n, that can be computed by a large class of -circuits (containing -circuits) with small weights in depth 2 and size 5 (Theorem 2.2.), but which cannot be computed with any weight size by constant size boolean threshold circuits of depth 2 (Theorem 3.1). A witness for this dierence in computational power is the boolean function F n with F n (~x; ~y) := Majority(~x) Majority(~y); where ~x and ~y are n-bit vectors. The proof of this lower bound result for boolean threshold circuits (Theorem 3.1) is of independent interest. First, this proof demonstrates that the restriction method is not only useful in order to prove lower bounds for AC 0 -circuits, but also for threshold circuits. Secondly, this proof exploits some previously unused potential in a standard tool for the analysis of threshold circuits: the "-Discriminator Lemma of [HMP ST ]. It is essential for our proof that the "-Discriminator Lemma holds not just for the uniform distribution over the input space (as it is stated in [HMPST]), but for any distribution. 2

3 Hence we have the freedom to construct such a distribution in a malicious manner, where we exploit specic \weak points" of the considered threshold circuit. This extra power of the (generalized) "-Discriminator Lemma is crucial: in Remark 3.10 we show that its conventional version is insucient for the proof of Theorem 3.1. Subsequent to the preliminary version [MSS] of this paper, DasGupta and Schnitger [DS] have shown that the boolean function SQ n with SQ n (x 1 ; : : : x n ; y 1 ; : : : ; y n 2) = (( n x i ) 2 leads to a depth-independent separation of boolean threshold circuits and -circuits. In particular, SQ n is computable by a large class of -circuits in constant size, whereas any boolean threshold circuit requires size (log n). However, the gate function has to satisfy more stringent dierentiabilty requirements than are necessary for the constantsize computation of F n. For the case of analog input, [DS] also provides a comparison of the approximation power of various -circuits. In order to compute a boolean function on an analog computational device one has to adopt a suitable output convention (similar to the conventions that are used to carry out digital computations on real-world computers, which consists of non-digital computational elements such as transistors). Denition 1.1 A -circuit C computes a boolean function F : f0; 1g n! f0; 1g with separation " if there is some t C 2 R such that for any input (x 1 ; : : : ; x n ) 2 f0; 1g n the output gate of C outputs a value which is at least t C +" if F (x 1 ; : : : ; x n ) = 1, and at most t C? " otherwise. A computation without separation at the output gate appears to be less interesting, since then an innitesimal change in the output of any -gate in the circuit may invert the output bit. Hence we consider in this paper computations on -circuits C n with 1 separation at least for some polynomial p (where n is the number of input bits of p(n) C n ). One nice feature of this convention is that, for Lipschitz bounded gate functions and polynomial size -circuits C n of constant depth and with polynomially bounded 1 weights, it allows a tolerance of for all -gates in C poly(n) n. We will give in Theorem 4.1 a \separation boosting" result, which says that for any constant depth d one may demand for polynomial size -circuits with polynomially bounded weights just as well a separation of size (1) without changing the class of boolean functions that can be computed. An extended abstract of this paper has previously appeared in [MSS]. In this full version of the paper we have strengthened the claim of Theorem 4.1 by imposing a less stringent condition on. n 2 y i ) 3

4 2 Sigmoid Threshold Circuits for the OR of Majorities We write (NL) for the following property of a function : R! R, (NL) There is some rational number s so that: 1. is dierentiable on some open interval containing s, and (s) exists and is nonzero. Obviously the function satises (NL). Observe that property (NL) is basically the requirement that the function be nonlinear; for instance, if 00 happens to be everywhere dened, then (NL) is precisely equivalent to not being a linear function. The nonlinearity of is obviously a necessary assumption for Theorem 2.2, since otherwise a -circuit can only compute linear functions. Without loss of generality, we will assume that c := 00 (s) 4 for some point s as in the denition. If this value where to be negative, we simply replace by? in what follows. > 0 Lemma 2.1 Assume that satises (NL). Dene the function (x) := (x + s) + (?x + s) : Then, the function is even, and there exists some " > 0 so that the following property holds: (a + h)? (a) ch 2 for all a; h 2 [0; "] : Proof. Note that (?x) = (x) directly from the denition, so is even. Moreover, is dierentiable on some open interval containing x = 0, because is dierentiable in a neighborhood of s, and evenness implies that 0 (0) = 0. Observe also that 00 (0) exists, and in fact 00 (0) = 2 00 (s) = 8c > 0 : By denition of 00 (0) (just write 0 (l) = 0 (0) + 00 r(l) (0)l + r(l) with lim = 0), there is l!0 l some " > 0 so that 0 (l) 00 (0)l = 4cl (1) 2 for all each l 2 [0; 2"]. Because 0 (l) 4cl > 0 for l > 0, it follows that is strictly increasing on [0; 2"]. We are only left to prove that this " is so that the last property holds. 4

5 Pick any a; h 2 [0; "]. Assume that h 6= 0, as otherwise there is nothing to prove. As a and a + h are both in the interval [0; 2"], and is strictly increasing there, it follows that (a + h)? (a) > (a + h)? (a + h 2 ) and by the Mean Value Theorem this last expression equals 0 (l) h 2 for some l 2 (a+ h 2 ; a+ h). Since l < a + h 2", we may apply inequality (1) to obtain (a + h)? (a) > 2clh : The result now follows from the fact that l > a + h 2 h 2. Theorem 2.2 Assume that : R! R satises (NL). Then there exists for every n 2 N a -circuit C n of depth 2 with 5 gates (and rational weights and thresholds of size O(1)) that computes F n with separation (1=n 2 ). Proof. With and " as in Lemma 2.1 one has (a) > (b), jaj > jbj for any a; b 2 [?"; +"]. Hence any two nonzero reals u; v 2 [?"=2; +"=2] have dierent sign if and only if (u? v)? (u + v) > 0. Let x 1 ; : : : ; x n ; y 1 ; : : : ; y n 2 f0; 1g be arbitrary and set Then we obtain Furthermore Lemma 2.1 implies that u := " 2 (4(x 1 + : : : + x n )? 2n + 1 ); 4n v := " 2 (4(y 1 + : : : + y n )? 2n + 1 ): 4n (u? v)? (u + v) > 0, F n (~x; ~y) = 1: j(u? v)? (u + v)j 4c minfu 2 ; v 2 g = (1=n 2 ): Hence, we can achieve separation (1=n 2 ) by using a -gate on level two of circuit C n that checks whether (u? v)? (u + v) > 0. Such a -gate exists: Since 00 (s) 6= 0, there is some t with 0 (t) 6= 0. Now transform (u? v)? (u + v) into a suitable neighborhood of t and choose a suitable rational approximation of (t) as threshold. Corollary 2.3 Assume that : R! R satises (NL) and is monotone. Then there exists for every n 2 N a -circuit C n of depth 2 and size 5 (with rational weights and thresholds of size polynomial in n) that computes F n with separation (1). Proof. Multiply the weights of the -gate on level two of the circuit C n with n 2 and transform the threshold accordingly. In this way we can ensure that the weighted sum computed at the top gate has distance (1) from its threshold. 5

6 Remark 2.4. For computations with real (rather than Boolean) inputs, there has been some work dealing with the dierences in capabilities between sigmoidal and threshold devices; in particular [So] studies questions of interpolation and classication related to learnability (VC dimension). 3 Boolean threshold gates are less powerful Theorem 3.1 No family (C n j n 2 N) of constant size boolean threshold circuits of depth 2 (with unrestricted weights and thresholds) can compute the function F n. Proof. Assume, by way of contradiction, that there exist such circuits C n, each with at most k 0 gates on level one. We can demand that all weights are integers and that the level 2 gate has weights of absolute value at most 2 O(k0 log k 0 ) ([Mu],[MT]). Thus we can assume, after appropriate duplication of level one gates, that the gate on level 2 has only weights from f?1; 1g. Let k be an upper bound on the resulting number of gates. In the next section we use the restriction method to eliminate those gates on level one of C n whose weights for the x i (y i ) have drastically dierent sizes. It turns out that we cannot achieve this goal for all gates. For example, if all the weights w i (for the x i ) are much larger than the weights u i (for the y i ), then we can only limit the variance of the weights w i (see condition b. in Denition 3.2). Nevertheless, the restriction method allows us to \regularize" all bottom gates of C n (see Lemma 3.4). In section 3.2 we show that the resulting regularized gates behave predictably for certain distributions (see Lemma 3.7). The argument concludes in section 3.3 with a non-standard application of the "-Discriminator Lemma. 3.1 The Restriction Method Our goal will be to x certain inputs such that all bottom gates of C n will have a normal form as described in the following denition. Denition 3.2. Let G be a boolean threshold gate (with inputs x 1 ; : : : ; x m ; y 1 ; : : : ; y m ) that outputs 1 if and only if that w i x i + u i y i t. Assume that the numbering is such jw 1 j : : : jw m j and ju 1 j : : : ju m j: We say that G is l-regular if and only if all w i have the same sign (negative, zero, or positive) and all u i have the same sign. Additionally, one of the following conditions has to hold, 6

7 a. G is constant. b. 8i (jw i j m 1=8 ju i j) and jw m j 60jw 1 j. c. 8i (ju i j m 1=8 jw i j) and ju m j 60ju 1 j. d. jw m j 30(1 + l)jw 1 j and ju m j 30(1 + l)ju 1 j. First we will transform a single threshold gate to a regular gate. Lemma 3.3 Let G be an arbitrary threshold gate that outputs 1 if and only if n w i x i + n u i y i t: Then there are sets M x f1; : : : ; ng and M y f1; : : : ; ng of size n 60 assignment A : fx i : i 62 M x g [ fy i : i 62 M y g! f0; 1g such that each and an a. when values are assigned according to A, F n=60 will be obtained as the corresponding subfunction of F n, and b. G, when restricted to the remaining free variables, is n 1=8 -regular. Proof. First we determine a set M 0 x f1; : : : ; ng of size n=3 such that all w i (with i 2 M 0 x) are either all positive, all negative or all zero. A set M 0 y f1; : : : ; ng of size n=3 is chosen analogously to enforce the same property for the coecients u i (with i 2 M 0 y). Set m = n=3. After possibly renumbering the indices, we can assume that M 0 x = M 0 y = f1; : : : ; mg. We can also assume that jw 1 j : : : jw m j as well as ju 1 j : : : ju m j. We dene R := f1; : : : ; m 4 g; S := fm 4 + 1; : : : ; 3m 4 g and T := f3m 4 + 1; : : : ; mg: By assigning 1's to the x i 's with i 2 R and 0's to the x i 's with i 2 T or vice versa, and by assigning 1's to the y i 's with i 2 R and 0's to the y i 's with i 2 T or vice versa, we obtain four partial assignments. Let us now interpret G as a threshold gate of the remaining variables x i (i 2 S) and y i (i 2 S). By choosing one of the four assignments, we can \move" the threshold of the resulting gate over a distance d with d = i2t jw i j? jw i j + ju i j? ju i j: i2r i2t i2r If for none of these four partial assignments the threshold gate G gives constant output, we have d jw i j + ju i j: i2s i2s 7

8 This implies that i2t (jw i j + ju i j) i2r[s (jw i j + ju i j): (2) Set a = P i2r[s (jw i j + ju i j)=(3m=4) and b = P i2t (jw i j + ju i j)=(m=4): Then (2) implies for these \averages" of jw i j + ju i j over R [ S respectively T that b 3a. We subdivide the set S by introducing the sets P = f 3m 4? 2m ; : : : ; 3m 4? m 10 g and Q = f3m 4? m ; : : : ; 3m 4 g: Since jw i j + ju i j is a non-decreasing function of i we have for all i 2 R [ S (and in particular for all i 2 P [ Q) Furthermore, we have for all i 2 P jw i j + ju i j b 3a: (3) jw i j + ju i j a=10; (4) since otherwise jw i j + ju i j < a=10 for all i 2 (R [ S)? (P [ Q), and we would get i2r[s (jw i j + ju i j) = i2(r[s)?(p[q) (jw i j + ju i j) + ( 3 4? 2 10 )m a 2m + 3a = m a( (3? 1 10 )) < 3m a ; 4 which is a contradiction to the denition of a. (3) and (4) jointly imply that i2p[q (jw i j + ju i j) max (jw ij + ju i j) 30 min (jw ij + ju i j): (5) i2p[q i2p[q Case 1: 8 i 2 P(jw i j m 1=8 ju i j _ ju i j m 1=8 jw i j). We can nd a subset P 0 P of size m=20 such that 8i 2 P 0 (jw i j m 1=8 ju i j) or 8i 2 P 0 (ju i j m 1=8 jw i j): 8

9 In the former case, (5) implies that max jw i j 30 min(jw i j + ju i j) i2p 0 i2p 0 30(1 + m?1=8 ) min jw i j i2p 0 60 min jw i j: i2p 0 Set M x = M y = P 0 and x the remaining variables such that exactly half of the x i 's and half of the y i 's are 0. Analogously, in the latter case we obtain max obtained as above. Case 2: Otherwise. i2p 0 ju i j 60 min ju i j. M x and M y are i2p 0 Then 9i 0 2 P (jw i0 j < m 1=8 ju i0 j^ ju i0 j < m 1=8 jw i0 j). We have for all i 2 Q: jw i j + ju i j 30(jw i0 j + ju i0 j) minf30jw i0 j(1 + m 1=8 ); 30ju i0 j(1 + m 1=8 )g: Thus we have max i2q jw i j 30(1 + m 1=8 ) min i2q jw i j and max i2q ju i j 30(1 + m 1=8 ) min i2q ju i j. Choose M x to be an arbitrary subsets of Q of size n 60, set M y = M x and x the remaining variables in the same fashion as before. If we perform the \regularization process" for all bottom gates of C n, then we obtain Lemma 3.4 There are sets M x ; M y f1; : : : ; ng of size m = n 60 k and there is an assignment A : fx i : i 62 M x g [ fy i : i 62 M y g! f0; 1g such that a. when values are assigned according to A, F m will be obtained as the corresponding subfunction of F n, and b. all level one gates of C n, when restricted to the free variables, are n 1=8 -regular. Proof. Apply Lemma 3.3 successively to each of the k level one gates of C n. Let M x be the set of indices of those variables x i which did not receive a value during the processing of all gates by Lemma 3.3. M y is dened analogously. A is the union of all partial assignments that have been made in this process. We write D n for the circuit that results from C n by the restriction of Lemma 3.4. Observe that D n computes the function F m (for m = n ). 60 k 9

10 3.2 The Likely Behavior of a Threshold Gate In this section we will exploit the result of our regularization process. In particular, in Lemma 3.7, we will show that, for the input distribution dened below, a weighted sum with small variance in weight sizes \almost" behaves as if all the weights were identical. For the integer s, 1 s m, set U(s) = f~x 2 f0; 1g m : random variable which assigns to each ~x 2 U(s) the value x i = sg. (s) is the w i x i ; all elements of U(s) are equally likely. Obviously E((s)) = s w i. m In the following, we will assume that the w i 's are either all positive or all negative. Proposition 3.5 Set W = w 2 i and g = maxf jw ij jw j j : 1 i; j mg. Then W m s 2 g2 E((s)) 2 : Proof. Set MIN = minfjw i j : 1 i mg. We get W 1 m (m2 g 2 MIN 2 ): (6) Also, E((s)) 2 = ( s m m w i ) 2 ( s m m MIN)2. Thus m 2 MIN 2 ( m s )2 E((s)) 2 : (7) If we replace m 2 MIN 2 in (6) according to (7), we get W m s 2 g2 E((s)) 2 : Proposition 3.6 V ar((s)) s m W. Proof. We have V ar((s)) = E((s) 2 )? E((s)) 2. Also, m s E((s) 2 ) = 1 m s ~x2u (s) = 1 ( m s = 1 ( ( ~x2u (s) w i 2 w i x i ) 2 w i2 x i + 2 ~x2u (s) 10 x i + 2 ~x2u (s) 1i<jm 1i<jm w i w j w i w j x i x j ) ~x2u (s) x i x j )

11 = s m m w i2 + 2s(s? 1) m(m? 1) 1i<jm w i w j : Furthermore, In summary, we obtain m s E((s)) 2 = ( 1 m s = ( 1 = ( s m = s2 m 2 ~x2u (s) m w i ~x2u (s) w i ) 2 w i2 + 2s2 m 2 w i x i ) 2 x i ) 2 1i<jm w i w j : V ar((s)) ( s m? s2 m m ) w 2 i2 s m W: Lemma 3.7 If maxf jw ij jw j j : 1 i; j mg = O(m1=8 ), then P r(j(s)? E((s))j je((s))j m 1=4 ) = O( m3=4 s ): Proof. By Chebyshev's inequality, we get for any t (t > 0) Thus, for t = je((s))j ), we obtain m 1=4 Proposition 3.6 implies P r(j(s)? E((s))j t) V ar((s) t 2 : P r(j(s)? E((s))j je((s))j m 1=4 ) V ar((s)) m1=2 E((s)) 2 : and with Proposition 3.5 V ar((s)) m 1=2 E((s)) 2 s W m1=2 m E((s)) 2 ; s W m 1=2 m E((s)) 2 m s g2 m?1=2 = O( m s m?1=4 ): 11

12 3.3 A Non-standard Application of the Discriminator Lemma Let G be some boolean threshold gate with weights w 1 ; : : : ; w m, u 1 ; : : : ; u m and threshold t. Set P m P w m i a := m and b := u i m : With G we can thus associate the two-dimensional threshold function ax + by t. Similarly, with F m we associate the two-dimensional function F : f0; : : : ; mg 2! f0; 1g, where F (x; y) = 1 if and only if (x m 2 ^ y < m 2 ) _ (x < m 2 ^ y m 2 ): Let L be the line ax + by = t in R 2 (where t is the threshold of G). Let x 0 (y 0 ) be the x-coordinate (y-coordinate) of the intersection of L and y = m (x = m). Set 2 2 x0 = 1 (y 0 = 1) if the line L is horizontal (vertical). We dene D(G) := minfjx 0? m 2 j; jy0? m 2 jg: and let U r be the uniform distribu- Proposition 3.8 Let r be an integer with 0 r m 2 tion over V r = f m? r; : : : ; m + rg. Then 2 2 P r UrUr[(x; y) 2 V 2 r j ax + by t ^ F (x; y) = 1g D(G) + 1 2r + 1 : Proof. Let be the area enclosed by the two lines ax + by = t and ax + by = (a + b) m 2. (The latter is the line through (m 2 ; m 2 ).) Intersect with the set Vr and call the 2 intersection r. Let us assume that D(G) = jx 0? mj. Then 2 r will contain at most D(G) + 1 points per row of Vr 2. Thus j r j (2r + 1) (D(G) + 1). On the other hand, the halfspace ax + by (a + b) m 2 the elements of the set f(x; y) 2 Vr 2 j F (x; y) = 1g: contains exactly one half of all Let us consider the case that all weights w i are identical and all weights u i are identical. If D(G) is \small", then G will not show any signicant advantage in predicting F for a subcollection of the m + 1 distributions U 2 r mentioned in Proposition 3.8. If on the other hand D(G) is large (say proportional to m), then we can trivialize G by choosing a distribution with a small value for r. Our goal is to carry out a similar argument for arbitrary gates G. Consequently we introduce a collection Q r of distributions over f0; 1g m with Q r (~x) = 8 >< >: (2r + 1) 0 ; if j P m x i? m 2 j > r 1 ; otherwise. m P m x i 12

13 Note that the probability of a string only depends on its number of ones. The appropriate value for the parameter r will be determined later. Finally we dene for the considered threshold gate G with input variables x 1 ; : : : ; x m, y 1 ; : : : ; y m, ADV r (G) := P r QrQr[G(~x; ~y) = 1jF m (~x; ~y) = 1]? P r QrQr[G(~x; ~y) = 1jF m (~x; ~y) = 0]: Lemma 3.9. Set m = n 60 k. Assume that the boolean threshold gate G with input variables x 1 ; : : : ; x m ; y 1 ; : : : y m is n 1=8 -regular, and that n is suciently large. Furthermore assume that the natural number r 2 [m 31=32 ; m 4 ] satises D(G) r=(64k) or D(G) 4r : Then jadv r (G)j 1 2k : Proof. Case 1: D(G) r. 64k We know that G is n 1=8 -regular. We proceed by examining the three dierent cases (see Denition 3.2.). Case 1.1: 8i (jw i j m 1=8 ju i j) and jw m j 60jw 1 j: This implies that jaj m 1=8 jbj. Hence the line L is very \steep". We have in this case, maxfx 2 [0; m] : 9y 2 [0; m] ((x; y) 2 L)g?minfx 2 [0; m] : 9y 2 [0; m] ((x; y) 2 L)g m 7=8 : Thus, the set fx 2 [0; m] : 9x 0 ; y 0 2 [0; m] (jx? x 0 j m 3=4 ^ (x 0 ; y 0 ) 2 Lg is contained in an interval of length m 7=8 + 2m 3= m 7=8. This implies that jf(x; y) 2 f m 2? r; : : : ; m 2 + rg2 : P (x; y)gj 3m 7=8 m = 3m 15=8 ; (8) where P (x; y) is equivalent to (ax + by < t) ^ 9(x 0 ; y 0 ) 2 [0; m] 2 ( (ax 0 + by 0 t) ^ (jx? x 0 j m 3=4 ) ): As a rst step towards estimating ADV r (G) we consider the set S = f(~x; ~y) 2 U : G(~x; ~y) = 1 ^ F m (~x; ~y) = 1g where U := f(~x; ~y) 2 f0; 1g 2m : m 2? r m x i ; y i m 2 + rg: 13

14 One shows that S is contained in the following two sets, S 1 = f(~x; ~y) 2 U : j w i x i? ( x i w i )=mj ( x i w i )=m 5=4 g and S 2 = f(~x; ~y) 2 U : F m (~x; ~y) = 1 ^ Q(~x; ~y)g, where Q(~x; ~y) is equivalent to 9~x 0 ; ~y 0 2 f0; 1g m (j x 0 i?? x i j m 3=4 ^ a x 0 i + b y 0 i t): Intuitively, the set S 1 consists of all those inputs (that are relevant for Q r ) on which our approximation of G by ax + by t fails. We will show later that this set has small probability. S 2 on the other hand is the collection of all relevant inputs on which the approximation (in a quite liberal sense) succeeds. j w i x i? a x i j ( x i jaj)=m 1=4 m 3=4 jaj: (9) We need to nd vectors ~x 0 ; ~y 0 according to the denition of set S 2. If a 0, we pick some ~x 0 such that U 3m=4 [ i=m=4 f0; 1g i. We then have with (9) a If a < 0, we pick some ~x 0 such that a x i? a m 3=4 = a x i + jaj m 3=4 x 0 i = x 0 i = Furthermore, we pick some vector ~y 0 with b x 0 i w i x i. x i + m 3=4. w i x i. This is possible, since x i? m 3=4. We then have a y 0 i x 0 i = u i y i according to the following procedure: if all components of u i are positive or all components are zero, then set ~y 0 = (1; : : : ; 1). Otherwise all components are negative and we set ~y 0 = (0; : : : ; 0). This concludes our proof of inclusion, since property Q(~x; ~y) holds for the pair (~x 0 ; ~y 0 ). It is obvious that P r QrQr[S 1 j F m (~x; ~y) = 1] 2 P r QrQr[S 1 ]: If we apply Lemma 3.7 for s 2 [ m? r; m + r] [ m; 3m], we obtain P r QrQr[S 1 ] = O(m?1=4 ). In order to give an upper bound on P r QrQr[S 2 ] we observe that S 2 S 3 [ S 4, where S 3 = f(~x; ~y) : (F m (~x; ~y) = 1) ^ (a S 4 = f(~x; ~y) : (F m (~x; ~y) = 1) ^ (a R(~x; ~y) is equivalent to 14 x i + b x i + b y i t)g and y i < t) ^ R(~x; ~y)g:

15 9~x 0 ; ~y 0 (j x i? It follows from Proposition 3.8 that x 0 ij m 3=4 ^ a x 0 i + b y 0 i t): Also, it is obvious that P r QrQr[S 3 j F m (~x; ~y) = 1] D(G) + 1 2r + 1 : P r QrQr[S 4 j F m (~x; ~y) = 1] 2 P r QrQr[S 4 ]: Furthermore, by (8), P r QrQr[S 4 ] 3 m15=8 : Thus we have 2 (2r + 1) P r QrQr[S j F m (~x; ~y) = 1] P r QrQr[S 1 [ S 3 [ S 4 j F m (~x; ~y) = 1] O(m?1=4 ) D(G) + 1 2r k + O(m?1=16 ): We will obtain the same upper bound for the probability of S 0 = f(~x; ~y) 2 U : G(~x; ~y) = 0 ^ F m (~x; ~y) = 1g: + 6m15=8 (2r + 1) 2 Thus, since P r QrQr[S j F m = 1] + P r QrQr[S 0 j F m = 1] = 1; we get jp r QrQr[S j F m (~x; ~y) = 1]? 1 2 j 1 64k + O(m?1=16 ): One shows analogously for T = f(~x; ~y) 2 U : G(~x; ~y) = 1 ^ F m (~x; ~y) = 0g that jp r QrQr[T j F m (~x; ~y) = 0]? 1 2 j 1 64k + O(m?1=16 ): Thus, jadv r (G)j 1 32k + O(m?1=16 ). Case 1.2: 8i (ju i j m 1=8 jw i j) and ju m j 60ju 1 j. The argument is analogous to Case 1.1. Case 1.3: jw m j 30(1 + n 1=8 )jw 1 j and ju m j 30(1 + n 1=8 )ju 1 j. We rst observe that the set S is contained in the union of the sets S 0 1 and S 0 2, where S 0 1 = f(~x; ~y) 2 U : P 0 (~x; ~y)g; and P 0 (~x; ~y) is equivalent to 15

16 (j w i x i? a x i j jaj P m x i m 1=4 ) _ (j u i y i? b S 0 2 = f(~x; ~y) 2 U : F m (~x; ~y) = 1 ^ Q 0 (~x; ~y)g, and Q 0 (~x; ~y) is equivalent to j 9~x 0 ; ~y 0 2 f0; 1g m (j y 0 i? y i j m 3=4 x 0 i?? ^ a x i j m 3=4 ^ x 0 i + b y 0 i y i j jbj P m y i m 1=4 ); t): Lemma 3.7 implies that P r QrQr[S 0 1 jf m (~x; ~y) = 1] 2 P r QrQr[S 0 1] = O(m?1=4 ): With an argument analogous to Case 1.1 we get S 0 2 S 3 [ S 0 4 where S 0 4 = f(~x; ~y) : (F m (~x; ~y) = 1) ^ (a R 0 (~x; ~y) is equivalent to j y i? 9~x 0 ; ~y 0 (j We have already shown that Furthermore, it is obvious that y 0 ij m 3=4 x i? ^ a x i + b x 0 ij m 3=4 ^ x 0 i + b y i < t) ^ R 0 (~x; ~y)g: y 0 i t): P r QrQr[S 3 j F m (~x; ~y) = 1] D(G) + 1 2r + 1 : P r QrQr[S 0 4 j F m (~x; ~y) = 1] 4 m m3=4 (2r + 1) 2 : The remaining argument is now analogous to Case 1.1. Case 2: D 4r. The analysis is now far simpler. The probability of the set S 1 (resp. S 0 1) is computed as before. As for S 3 we now get For S 4 we obtain P r QrQr[S 3 j F m (~x; ~y) = 1] 2 f0; 1g: P r QrQr[S 4 j F m (~x; ~y) = 1] = 0: The same applies to S4. 0 This follows, since the set U will be entirely contained in one of y i = tg. the halfspaces of f(~x; ~y) : a P m x i + b P m 16

17 In order to prove Theorem 3.1 we observe that for suciently large n we can nd r such that for each of the at most k gates G on level one of D n : D(G) r 64k or D(G) 4r: (A value for r can be found whenever k is bounded from above by the number of possible \r-intervals". This is the case, provided k c log k (m) for a suitably small constant c. This in turn is satised for k d for a suitably small constant d.) log n log log n The "-Discriminator Lemma of [HMPST] can be generalized to hold for any distribution over the input space. We apply it here to the distribution Q r Q r over the input space f0; 1g 2m of the circuit D n (which computes the function F m ). Since the weights of the gate on level two of D n are from f?1; 1g, we get jadv r (G)j 1 for some gate G on level one of D k n. But this contradicts Lemma 3.5. Thus we get a lower bound of ( log n ) for the size of depth 2 threshold circuits (with log log n weights from f?1; 1g for the top gate) computing F n. For unrestricted threshold circuits log log n our lower bound will be ( ) ([M],[MT]). log log log n Remark 3.10 It is not possible to prove Theorem 3.1 with the customary version of the "-Discriminator Lemma, where one considers the uniform distribution over the input space. Consider for example the threshold gate G dened by G(x 1 ; : : : x n ; y 1 ; : : : y n ) = 1, n x i? n y i c p n: For appropriate c one has ADV (G) = (1) (where ADV (G) is dened like ADV r (G), but with regard to the uniform distribution over f0; 1g 2n ). This happens, because a \large discrepancy" in x-sum and y-sum is more likely if we assume P n x i P n and 2 n y i n than if we assume (say) P n 2 x i n and P n 2 y i n. This phenomenon has 2 been independently observed by Bultman [B]. Corollary 3.7The class of boolean functions computable by constant size boolean threshold circuits of depth 2 with integer weights of polynomial size is properly contained in the class of boolean functions computable by constant size -circuits of depth 2 with polynomial size rational weights (even with common polynomial size denominator) and separation 1. poly The same statement holds if one considers arbitrary real weights for both types of circuits (still with separation 1 ). poly Proof. It is quite easy to simulate boolean threshold circuits of size s and constant depth d by sigmoid threshold circuits of the same size and depth. The containment is proper as a consequence of Theorems 3.1 and

18 4 Simulation Results and Separation Boosting T Cd() 0 is the class of those families (g n j n 2 N) of boolean functions that are computable, 1 with separation ( ), by polynomial size, depth d -circuits whose weights are reals poly(n) of absolute value at most poly(n). T Cd 0 ([HMPST]) is the corresponding class of families of boolean functions computable by polynomial size, depth d boolean threshold circuits whose weights are polynomial size integers. Theorem 4.1 Let : R! [0; 1] be a nondecreasing function that is Lipschitz-bounded and converges fast to 0 (resp. 1) in the following sense: Then the following holds. 9 " > 0 9 x 0 > 0 8 x x 0 (?x) 1 x " ^ 1? (x) 1 x " : (a) For every d 2 N, T C 0 d = T C 0 d(). (b) The class T C 0 d() does not change if we demand separation (1). Observe, that the above class of functions also includes the standard sigmoid. Proof. Assume that (g n jn 2 N) is a family of boolean functions in T Cd(). 0 Thus (g n jn 2 N) can be computed with separation 1 by some family (C p(n) n jn 2 N) of - circuits of depth d with the number of gates and the size of weights bounded by q(n) (for some polynomials p and q). Since is Lipschitz-bounded, and since the depth d of C n is a constant, there exists a polynomial r(n) with the following property: If the gate function of each gate G in C n is replaced by some arbitrary function G : R! R (where the functions G may be dierent for dierent gates G) such that 8x 2 R(j(x)? G (x)j 1 r(n) ); then for each input x 1 ; : : : ; x n of C n the value of the output gate of the new circuit diers from the value of the output gate of C n by at most 1. 2p(n) In order to construct a boolean threshold circuit Cn b that computes g n, one replaces in C n each internal -gate that outputs ( P m j=1 j y j? ) for inputs y 1 ; : : : ; y m 2 [0; 1] (with reals 1 ; : : : ; m ; of polynomial size in n) by a weighted sum S(~y) := l k=1 1 2r(n) H k(y 1 ; : : : ; y m ) of l := 2r(n) boolean threshold gates H 1 ; : : : ; H l (which use the same weights 1 ; : : : ; m as G). The function S is chosen to be a step function which approximates such that for all y 1 ; : : : ; y m 2 [0; 1], j( j y j? )? S(~y)j 1 2r(n) : j=1 18

19 In a second step, one replaces each of the boolean threshold gates H k by a boolean threshold gate H 0 k whose weights and thresholds are integers of polynomial size. We set S 0 (~y) := l k=1 The threshold gates H 0 k are chosen such that 1 2r(n) H0 k(y 1 ; : : : ; y m ): 8y 1 ; : : : ; y m 2 [0; 1](jS(~y)? S 0 (~y)j 1 2r(n) ): Let C 0 n be the circuit that results from C n by replacing in the described manner each internal -gate in C n by an array of boolean threshold gates Hk. 0 For every input, the value of the output gates of C n and C 0 1 n dier by at most. Hence we can replace 2p(n) the output gate of C 0 n by a boolean threshold gate with integer weights and threshold of polynomial size such that the resulting boolean threshold circuit Cn b computes g n. This shows that (g n jn 2 N) 2 T Cd. 0 In order to prove the other inclusion assume that (g n jn 2 N) 2 T Cd 0 is computed by a family (B n jn 2 N) of boolean threshold circuits of depth d, where B n has at most p(n) gates and its weights and thresholds are integers of absolute value at most q(n) (for some polynomials p and q with p(n) q(n) 2 for all n 2 N). Without loss of generality, we assume that for each circuit input the weighted sum at each gate in B n has distance at least 1 from its threshold (if this is not the case, rst multiply all weights and thresholds of gates in B n by 2, and then lower each threshold by 1). In addition we assume for simplicity that x 0 = 1 in the assumption about. By the assumption about there exists some l 2 N such that 8 x 1((?x l ) 1 x and 1? (xl ) 1 x ): Let B 0 n be a boolean threshold circuit that results from B n by multiplying rst all weights and thresholds of gates in B n by 2[2p(n)q(n)] l. It is obvious that B 0 n also computes the boolean function g n. In addition, for each circuit input the weighted sum at each gate in B 0 n has distance at least 2[2p(n)q(n)] l from its threshold. Let C n be the -circuit that results if we replace each boolean threshold gate in B 0 n by a -gate with the same weights and threshold. Then one shows by induction on the depth of a gate G in C n that for every boolean circuit input, the output of G diers by at most n from the output of the corresponding gate in B 0 n, where n := 1 2p(n)q(n). In the induction step one exploits that p(n) q(n) 2 [2p(n)q(n)] l n = [2p(n)q(n)] l : This implies that a change of at most n in each of the at most p(n) inputs of G causes a change of at most [2p(n)q(n)] l in the value of the weighted sum that reaches G, and so this weighted sum has distance at least 2[2p(n)q(n)] l? [2p(n)q(n)] l = [2p(n)q(n)] l 19

20 from the threshold. Therefore the output value of the -gate G diers by at most max n 1? ([2p(n)q(n)] l ); (?[2p(n)q(n)] l ) o 1 2p(n)q(n) = n from the output of the corresponding boolean threshold gate in B 0 n. The preceding argument implies that for any n 2 the -circuit C n threshold 1 computes the boolean function g 2 n with separation 1. 4 with outer Remark 4.2 One can also simulate polynomial size -circuits with weights of absolute value at most 2 poly(n) by polynomial size boolean threshold circuits with 0-1 weights; however in this case the circuit depth increases by a constant factor. This simulation can be extended to the case of real-valued inputs, where we assume that polynomially many bits of each real input are given as inputs to the simulating boolean threshold circuit. Remark 4.3 More recently it has been shown (see the paper by Maass in this volume, or the extended abstract [M]) that for neural nets with arbitrary piecewise polynomial activation functions (with polynomially many polynomial pieces of bounded degree) and arbitrary real weights, the class of boolean functions that can be computed in constant depth and polynomial size (with arbitrarily small separation) is contained in T C 0. 20

21 References [B] [DS] [HKP] [HMPST] [K] [M] [MSS] [MT] W.J. Bultman, \Topics in the Theory of Machine Learning and Neural Computing", Ph.D Dissertation, University of Illinois at Chicago, B. DasGupta and G. Schnitger, \The power of Approximating: a Comparison of Activation Functions", in Advances in Neural Information Processing Systems 5, S.E. Hanson, J.D. Cowan and C.L. Giles eds, pp , J. Hertz, A. Krogh and R.G. Palmer, \Introduction to the Theory of Neural Computation", Addison-Wesley, Redwood City, A.Hajnal, W. Maass, P. Pudlak, M. Szegedy and G. Turan, \Threshold Circuits of Bounded Depth", Proc. of the 28th Annual Symp. on Foundations of Computer Science, pp , To appear in J. Comp. Syst. Sci.. C.C. Klimasauskas, \The 1989 Neuro-Computing Bibliography", MIT Press, Cambridge, W. Maass, \Bounds for the computational power and learning complexity of analog neural nets", Proc. of the 25th ACM Symposium on the Theory of Computing, 1993, W. Maass, G. Schnitger, E. D. Sontag, \On the computational power of sigmoid versus boolean threshold circuits", Proc. of the 32nd Annual IEEE Symp. on Foundations of Computer Science, 1991, W. Maass and G. Turan, \How Fast Can a Threshold Gate Learn?", in: Computational Learning Theory and Natural Learning Systems: Constraints and Prospects, G. Drastal, S.J. Hanson and R. Rivest eds., MIT Press, to appear. [Mu] S. Muroga, \Threshold Logic and its Applications", Wiley, New York, [S] E.L. Schwartz, \Computational Neuroscience", MIT Press, Cambridge, [So] [SS1] [SS2] E.D. Sontag, \Feedforward Nets for Interpolation and Classication", J. Comp. Syst. Sci., 45(1992): E.D. Sontag and H.J. Sussmann, \Backpropagation can give rise to spurious local minima even for networks without hidden layers", Complex Systems 3, pp , E.D. Sontag and H.J. Sussmann, \Backpropagation separates where Perceptrons do", Neural Networks, 4, pp , [WK] S.M. Weiss, C.A. Kulikowsky, \Computer Systems that Learn", Morgan Kaufmann, San Mateo,

Analog Neural Nets with Gaussian or other Common. Noise Distributions cannot Recognize Arbitrary. Regular Languages.

Analog Neural Nets with Gaussian or other Common. Noise Distributions cannot Recognize Arbitrary. Regular Languages. Analog Neural Nets with Gaussian or other Common Noise Distributions cannot Recognize Arbitrary Regular Languages Wolfgang Maass Inst. for Theoretical Computer Science, Technische Universitat Graz Klosterwiesgasse

More information

The Power of Approximating: a Comparison of Activation Functions

The Power of Approximating: a Comparison of Activation Functions The Power of Approximating: a Comparison of Activation Functions Bhaskar DasGupta Department of Computer Science University of Minnesota Minneapolis, MN 55455-0159 email: dasgupta~cs.umn.edu Georg Schnitger

More information

On Learning µ-perceptron Networks with Binary Weights

On Learning µ-perceptron Networks with Binary Weights On Learning µ-perceptron Networks with Binary Weights Mostefa Golea Ottawa-Carleton Institute for Physics University of Ottawa Ottawa, Ont., Canada K1N 6N5 050287@acadvm1.uottawa.ca Mario Marchand Ottawa-Carleton

More information

Vapnik-Chervonenkis Dimension of Neural Nets

Vapnik-Chervonenkis Dimension of Neural Nets P. L. Bartlett and W. Maass: Vapnik-Chervonenkis Dimension of Neural Nets 1 Vapnik-Chervonenkis Dimension of Neural Nets Peter L. Bartlett BIOwulf Technologies and University of California at Berkeley

More information

Vapnik-Chervonenkis Dimension of Neural Nets

Vapnik-Chervonenkis Dimension of Neural Nets Vapnik-Chervonenkis Dimension of Neural Nets Peter L. Bartlett BIOwulf Technologies and University of California at Berkeley Department of Statistics 367 Evans Hall, CA 94720-3860, USA bartlett@stat.berkeley.edu

More information

Theoretical Advances in Neural Computation and Learning, V. P. Roychowdhury, K. Y. Siu, A. Perspectives of Current Research about

Theoretical Advances in Neural Computation and Learning, V. P. Roychowdhury, K. Y. Siu, A. Perspectives of Current Research about to appear in: Theoretical Advances in Neural Computation and Learning, V. P. Roychowdhury, K. Y. Siu, A. Orlitsky, editors, Kluwer Academic Publishers Perspectives of Current Research about the Complexity

More information

ations. It arises in control problems, when the inputs to a regulator are timedependent measurements of plant states, or in speech processing, where i

ations. It arises in control problems, when the inputs to a regulator are timedependent measurements of plant states, or in speech processing, where i Vapnik-Chervonenkis Dimension of Recurrent Neural Networks Pascal Koiran 1? and Eduardo D. Sontag 2?? 1 Laboratoire de l'informatique du Parallelisme Ecole Normale Superieure de Lyon { CNRS 46 allee d'italie,

More information

Nets Hawk Katz Theorem. There existsaconstant C>so that for any number >, whenever E [ ] [ ] is a set which does not contain the vertices of any axis

Nets Hawk Katz Theorem. There existsaconstant C>so that for any number >, whenever E [ ] [ ] is a set which does not contain the vertices of any axis New York Journal of Mathematics New York J. Math. 5 999) {3. On the Self Crossing Six Sided Figure Problem Nets Hawk Katz Abstract. It was shown by Carbery, Christ, and Wright that any measurable set E

More information

1 Introduction It will be convenient to use the inx operators a b and a b to stand for maximum (least upper bound) and minimum (greatest lower bound)

1 Introduction It will be convenient to use the inx operators a b and a b to stand for maximum (least upper bound) and minimum (greatest lower bound) Cycle times and xed points of min-max functions Jeremy Gunawardena, Department of Computer Science, Stanford University, Stanford, CA 94305, USA. jeremy@cs.stanford.edu October 11, 1993 to appear in the

More information

growth rates of perturbed time-varying linear systems, [14]. For this setup it is also necessary to study discrete-time systems with a transition map

growth rates of perturbed time-varying linear systems, [14]. For this setup it is also necessary to study discrete-time systems with a transition map Remarks on universal nonsingular controls for discrete-time systems Eduardo D. Sontag a and Fabian R. Wirth b;1 a Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, b sontag@hilbert.rutgers.edu

More information

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES JAMES READY Abstract. In this paper, we rst introduce the concepts of Markov Chains and their stationary distributions. We then discuss

More information

In: Proc. BENELEARN-98, 8th Belgian-Dutch Conference on Machine Learning, pp 9-46, 998 Linear Quadratic Regulation using Reinforcement Learning Stephan ten Hagen? and Ben Krose Department of Mathematics,

More information

Can PAC Learning Algorithms Tolerate. Random Attribute Noise? Sally A. Goldman. Department of Computer Science. Washington University

Can PAC Learning Algorithms Tolerate. Random Attribute Noise? Sally A. Goldman. Department of Computer Science. Washington University Can PAC Learning Algorithms Tolerate Random Attribute Noise? Sally A. Goldman Department of Computer Science Washington University St. Louis, Missouri 63130 Robert H. Sloan y Dept. of Electrical Engineering

More information

Lower Bounds for Cutting Planes Proofs. with Small Coecients. Abstract. We consider small-weight Cutting Planes (CP ) proofs; that is,

Lower Bounds for Cutting Planes Proofs. with Small Coecients. Abstract. We consider small-weight Cutting Planes (CP ) proofs; that is, Lower Bounds for Cutting Planes Proofs with Small Coecients Maria Bonet y Toniann Pitassi z Ran Raz x Abstract We consider small-weight Cutting Planes (CP ) proofs; that is, Cutting Planes (CP ) proofs

More information

On-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins.

On-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins. On-line Bin-Stretching Yossi Azar y Oded Regev z Abstract We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we x the size of the bins and try

More information

for average case complexity 1 randomized reductions, an attempt to derive these notions from (more or less) rst

for average case complexity 1 randomized reductions, an attempt to derive these notions from (more or less) rst On the reduction theory for average case complexity 1 Andreas Blass 2 and Yuri Gurevich 3 Abstract. This is an attempt to simplify and justify the notions of deterministic and randomized reductions, an

More information

8. Limit Laws. lim(f g)(x) = lim f(x) lim g(x), (x) = lim x a f(x) g lim x a g(x)

8. Limit Laws. lim(f g)(x) = lim f(x) lim g(x), (x) = lim x a f(x) g lim x a g(x) 8. Limit Laws 8.1. Basic Limit Laws. If f and g are two functions and we know the it of each of them at a given point a, then we can easily compute the it at a of their sum, difference, product, constant

More information

REMARKS ON THE TIME-OPTIMAL CONTROL OF A CLASS OF HAMILTONIAN SYSTEMS. Eduardo D. Sontag. SYCON - Rutgers Center for Systems and Control

REMARKS ON THE TIME-OPTIMAL CONTROL OF A CLASS OF HAMILTONIAN SYSTEMS. Eduardo D. Sontag. SYCON - Rutgers Center for Systems and Control REMARKS ON THE TIME-OPTIMAL CONTROL OF A CLASS OF HAMILTONIAN SYSTEMS Eduardo D. Sontag SYCON - Rutgers Center for Systems and Control Department of Mathematics, Rutgers University, New Brunswick, NJ 08903

More information

On the complexity of shallow and deep neural network classifiers

On the complexity of shallow and deep neural network classifiers On the complexity of shallow and deep neural network classifiers Monica Bianchini and Franco Scarselli Department of Information Engineering and Mathematics University of Siena Via Roma 56, I-53100, Siena,

More information

Australian National University. Abstract. number of training examples necessary for satisfactory learning performance grows

Australian National University. Abstract. number of training examples necessary for satisfactory learning performance grows The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs Peter L. Bartlett Robert C. Williamson Department of Systems Engineering Research School of Information Sciences and

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

Upper and Lower Bounds for Tree-like. Cutting Planes Proofs. Abstract. In this paper we study the complexity of Cutting Planes (CP) refutations, and

Upper and Lower Bounds for Tree-like. Cutting Planes Proofs. Abstract. In this paper we study the complexity of Cutting Planes (CP) refutations, and Upper and Lower Bounds for Tree-like Cutting Planes Proofs Russell Impagliazzo Toniann Pitassi y Alasdair Urquhart z UCSD UCSD and U. of Pittsburgh University of Toronto Abstract In this paper we study

More information

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics UNIVERSITY OF CAMBRIDGE Numerical Analysis Reports Spurious Chaotic Solutions of Dierential Equations Sigitas Keras DAMTP 994/NA6 September 994 Department of Applied Mathematics and Theoretical Physics

More information

be any ring homomorphism and let s S be any element of S. Then there is a unique ring homomorphism

be any ring homomorphism and let s S be any element of S. Then there is a unique ring homomorphism 21. Polynomial rings Let us now turn out attention to determining the prime elements of a polynomial ring, where the coefficient ring is a field. We already know that such a polynomial ring is a UFD. Therefore

More information

where c R and the content of f is one. 1

where c R and the content of f is one. 1 9. Gauss Lemma Obviously it would be nice to have some more general methods of proving that a given polynomial is irreducible. The first is rather beautiful and due to Gauss. The basic idea is as follows.

More information

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS LE MATEMATICHE Vol. LVII (2002) Fasc. I, pp. 6382 SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS VITTORINO PATA - ALFONSO VILLANI Given an arbitrary real function f, the set D

More information

Approximation Algorithms for Maximum. Coverage and Max Cut with Given Sizes of. Parts? A. A. Ageev and M. I. Sviridenko

Approximation Algorithms for Maximum. Coverage and Max Cut with Given Sizes of. Parts? A. A. Ageev and M. I. Sviridenko Approximation Algorithms for Maximum Coverage and Max Cut with Given Sizes of Parts? A. A. Ageev and M. I. Sviridenko Sobolev Institute of Mathematics pr. Koptyuga 4, 630090, Novosibirsk, Russia fageev,svirg@math.nsc.ru

More information

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity Universität zu Lübeck Institut für Theoretische Informatik Lecture notes on Knowledge-Based and Learning Systems by Maciej Liśkiewicz Lecture 5: Efficient PAC Learning 1 Consistent Learning: a Bound on

More information

ANALOG COMPUTATION VIA NEURAL NETWORKS. Hava T. Siegelmann. Department of Computer Science. Rutgers University, New Brunswick, NJ 08903

ANALOG COMPUTATION VIA NEURAL NETWORKS. Hava T. Siegelmann. Department of Computer Science. Rutgers University, New Brunswick, NJ 08903 ANALOG COMPUTATION VIA NEURAL NETWORKS Hava T. Siegelmann Department of Computer Science Rutgers University, New Brunswick, NJ 08903 E-mail: siegelma@yoko.rutgers.edu Eduardo D. Sontag Department of Mathematics

More information

ON TRIVIAL GRADIENT YOUNG MEASURES BAISHENG YAN Abstract. We give a condition on a closed set K of real nm matrices which ensures that any W 1 p -grad

ON TRIVIAL GRADIENT YOUNG MEASURES BAISHENG YAN Abstract. We give a condition on a closed set K of real nm matrices which ensures that any W 1 p -grad ON TRIVIAL GRAIENT YOUNG MEASURES BAISHENG YAN Abstract. We give a condition on a closed set K of real nm matrices which ensures that any W 1 p -gradient Young measure supported on K must be trivial the

More information

Lecture 8 - Algebraic Methods for Matching 1

Lecture 8 - Algebraic Methods for Matching 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 1, 2018 Lecture 8 - Algebraic Methods for Matching 1 In the last lecture we showed that

More information

Extremal problems in logic programming and stable model computation Pawe l Cholewinski and Miros law Truszczynski Computer Science Department Universi

Extremal problems in logic programming and stable model computation Pawe l Cholewinski and Miros law Truszczynski Computer Science Department Universi Extremal problems in logic programming and stable model computation Pawe l Cholewinski and Miros law Truszczynski Computer Science Department University of Kentucky Lexington, KY 40506-0046 fpaweljmirekg@cs.engr.uky.edu

More information

STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin.

STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin. STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER Department of Mathematics University of Wisconsin Madison WI 5376 keisler@math.wisc.edu 1. Introduction The Loeb measure construction

More information

290 J.M. Carnicer, J.M. Pe~na basis (u 1 ; : : : ; u n ) consisting of minimally supported elements, yet also has a basis (v 1 ; : : : ; v n ) which f

290 J.M. Carnicer, J.M. Pe~na basis (u 1 ; : : : ; u n ) consisting of minimally supported elements, yet also has a basis (v 1 ; : : : ; v n ) which f Numer. Math. 67: 289{301 (1994) Numerische Mathematik c Springer-Verlag 1994 Electronic Edition Least supported bases and local linear independence J.M. Carnicer, J.M. Pe~na? Departamento de Matematica

More information

X/94 $ IEEE 1894

X/94 $ IEEE 1894 I Implementing Radial Basis Functions Using Bump-Resistor Networks John G. Harris University of Florida EE Dept., 436 CSE Bldg 42 Gainesville, FL 3261 1 harris@j upit er.ee.ufl.edu Abstract- Radial Basis

More information

On the complexity of approximate multivariate integration

On the complexity of approximate multivariate integration On the complexity of approximate multivariate integration Ioannis Koutis Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 USA ioannis.koutis@cs.cmu.edu January 11, 2005 Abstract

More information

Resolution and the Weak Pigeonhole Principle

Resolution and the Weak Pigeonhole Principle Resolution and the Weak Pigeonhole Principle Sam Buss 1 and Toniann Pitassi 2 1 Departments of Mathematics and Computer Science University of California, San Diego, La Jolla, CA 92093-0112. 2 Department

More information

[3] R.M. Colomb and C.Y.C. Chung. Very fast decision table execution of propositional

[3] R.M. Colomb and C.Y.C. Chung. Very fast decision table execution of propositional - 14 - [3] R.M. Colomb and C.Y.C. Chung. Very fast decision table execution of propositional expert systems. Proceedings of the 8th National Conference on Articial Intelligence (AAAI-9), 199, 671{676.

More information

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139 Upper and Lower Bounds on the Number of Faults a System Can Withstand Without Repairs Michel Goemans y Nancy Lynch z Isaac Saias x Laboratory for Computer Science Massachusetts Institute of Technology

More information

Discriminative Learning can Succeed where Generative Learning Fails

Discriminative Learning can Succeed where Generative Learning Fails Discriminative Learning can Succeed where Generative Learning Fails Philip M. Long, a Rocco A. Servedio, b,,1 Hans Ulrich Simon c a Google, Mountain View, CA, USA b Columbia University, New York, New York,

More information

Constructing c-ary Perfect Factors

Constructing c-ary Perfect Factors Constructing c-ary Perfect Factors Chris J. Mitchell Computer Science Department Royal Holloway University of London Egham Hill Egham Surrey TW20 0EX England. Tel.: +44 784 443423 Fax: +44 784 443420 Email:

More information

A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed

A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed Computer Science Technical Reports Computer Science 3-17-1994 A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed Richa Agarwala Iowa State University David Fernández-Baca

More information

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report # Degradable Agreement in the Presence of Byzantine Faults Nitin H. Vaidya Technical Report # 92-020 Abstract Consider a system consisting of a sender that wants to send a value to certain receivers. Byzantine

More information

Formal power series rings, inverse limits, and I-adic completions of rings

Formal power series rings, inverse limits, and I-adic completions of rings Formal power series rings, inverse limits, and I-adic completions of rings Formal semigroup rings and formal power series rings We next want to explore the notion of a (formal) power series ring in finitely

More information

DIVISORS ON NONSINGULAR CURVES

DIVISORS ON NONSINGULAR CURVES DIVISORS ON NONSINGULAR CURVES BRIAN OSSERMAN We now begin a closer study of the behavior of projective nonsingular curves, and morphisms between them, as well as to projective space. To this end, we introduce

More information

2 EBERHARD BECKER ET AL. has a real root. Thus our problem can be reduced to the problem of deciding whether or not a polynomial in one more variable

2 EBERHARD BECKER ET AL. has a real root. Thus our problem can be reduced to the problem of deciding whether or not a polynomial in one more variable Deciding positivity of real polynomials Eberhard Becker, Victoria Powers, and Thorsten Wormann Abstract. We describe an algorithm for deciding whether or not a real polynomial is positive semidenite. The

More information

The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning

The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning REVlEW The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning Yaser S. Abu-Mostafa California Institute of Terhnohgy, Pasadcna, CA 91 125 USA When feasible, learning is a very attractive

More information

Circuit depth relative to a random oracle. Peter Bro Miltersen. Aarhus University, Computer Science Department

Circuit depth relative to a random oracle. Peter Bro Miltersen. Aarhus University, Computer Science Department Circuit depth relative to a random oracle Peter Bro Miltersen Aarhus University, Computer Science Department Ny Munkegade, DK 8000 Aarhus C, Denmark. pbmiltersen@daimi.aau.dk Keywords: Computational complexity,

More information

Extremal Cases of the Ahlswede-Cai Inequality. A. J. Radclie and Zs. Szaniszlo. University of Nebraska-Lincoln. Department of Mathematics

Extremal Cases of the Ahlswede-Cai Inequality. A. J. Radclie and Zs. Szaniszlo. University of Nebraska-Lincoln. Department of Mathematics Extremal Cases of the Ahlswede-Cai Inequality A J Radclie and Zs Szaniszlo University of Nebraska{Lincoln Department of Mathematics 810 Oldfather Hall University of Nebraska-Lincoln Lincoln, NE 68588 1

More information

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

More information

This material is based upon work supported under a National Science Foundation graduate fellowship.

This material is based upon work supported under a National Science Foundation graduate fellowship. TRAINING A 3-NODE NEURAL NETWORK IS NP-COMPLETE Avrim L. Blum MIT Laboratory for Computer Science Cambridge, Mass. 02139 USA Ronald L. Rivest y MIT Laboratory for Computer Science Cambridge, Mass. 02139

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

Math 42, Discrete Mathematics

Math 42, Discrete Mathematics c Fall 2018 last updated 10/10/2018 at 23:28:03 For use by students in this class only; all rights reserved. Note: some prose & some tables are taken directly from Kenneth R. Rosen, and Its Applications,

More information

8.1 Polynomial Threshold Functions

8.1 Polynomial Threshold Functions CS 395T Computational Learning Theory Lecture 8: September 22, 2008 Lecturer: Adam Klivans Scribe: John Wright 8.1 Polynomial Threshold Functions In the previous lecture, we proved that any function over

More information

Alon Orlitsky. AT&T Bell Laboratories. March 22, Abstract

Alon Orlitsky. AT&T Bell Laboratories. March 22, Abstract Average-case interactive communication Alon Orlitsky AT&T Bell Laboratories March 22, 1996 Abstract and Y are random variables. Person P knows, Person P Y knows Y, and both know the joint probability distribution

More information

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable International Journal of Wavelets, Multiresolution and Information Processing c World Scientic Publishing Company Polynomial functions are renable Henning Thielemann Institut für Informatik Martin-Luther-Universität

More information

A Method for the Efficient Design of Boltzmann Machines for Classification Problems

A Method for the Efficient Design of Boltzmann Machines for Classification Problems A Method for the Efficient Design of Boltzmann Machines for Classification Problems Ajay Gupta and Wolfgang Maass Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago

More information

below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing

below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,

More information

MULTIPLICITIES OF MONOMIAL IDEALS

MULTIPLICITIES OF MONOMIAL IDEALS MULTIPLICITIES OF MONOMIAL IDEALS JÜRGEN HERZOG AND HEMA SRINIVASAN Introduction Let S = K[x 1 x n ] be a polynomial ring over a field K with standard grading, I S a graded ideal. The multiplicity of S/I

More information

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers Chapter 3 Real numbers The notion of real number was introduced in section 1.3 where the axiomatic denition of the set of all real numbers was done and some basic properties of the set of all real numbers

More information

The WENO Method for Non-Equidistant Meshes

The WENO Method for Non-Equidistant Meshes The WENO Method for Non-Equidistant Meshes Philip Rupp September 11, 01, Berlin Contents 1 Introduction 1.1 Settings and Conditions...................... The WENO Schemes 4.1 The Interpolation Problem.....................

More information

Introduction to Machine Learning Spring 2018 Note Neural Networks

Introduction to Machine Learning Spring 2018 Note Neural Networks CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,

More information

Lebesgue Measure. Dung Le 1

Lebesgue Measure. Dung Le 1 Lebesgue Measure Dung Le 1 1 Introduction How do we measure the size of a set in IR? Let s start with the simplest ones: intervals. Obviously, the natural candidate for a measure of an interval is its

More information

Graphs with few total dominating sets

Graphs with few total dominating sets Graphs with few total dominating sets Marcin Krzywkowski marcin.krzywkowski@gmail.com Stephan Wagner swagner@sun.ac.za Abstract We give a lower bound for the number of total dominating sets of a graph

More information

CHAPTER 10: POLYNOMIALS (DRAFT)

CHAPTER 10: POLYNOMIALS (DRAFT) CHAPTER 10: POLYNOMIALS (DRAFT) LECTURE NOTES FOR MATH 378 (CSUSM, SPRING 2009). WAYNE AITKEN The material in this chapter is fairly informal. Unlike earlier chapters, no attempt is made to rigorously

More information

Lecture 4: Perceptrons and Multilayer Perceptrons

Lecture 4: Perceptrons and Multilayer Perceptrons Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons

More information

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to Coins with arbitrary weights Noga Alon Dmitry N. Kozlov y Abstract Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to decide if all the m given coins have the

More information

Notes on Computer Theory Last updated: November, Circuits

Notes on Computer Theory Last updated: November, Circuits Notes on Computer Theory Last updated: November, 2015 Circuits Notes by Jonathan Katz, lightly edited by Dov Gordon. 1 Circuits Boolean circuits offer an alternate model of computation: a non-uniform one

More information

Error Empirical error. Generalization error. Time (number of iteration)

Error Empirical error. Generalization error. Time (number of iteration) Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp

More information

Convex Analysis and Economic Theory AY Elementary properties of convex functions

Convex Analysis and Economic Theory AY Elementary properties of convex functions Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analysis and Economic Theory AY 2018 2019 Topic 6: Convex functions I 6.1 Elementary properties of convex functions We may occasionally

More information

16 Chapter 3. Separation Properties, Principal Pivot Transforms, Classes... for all j 2 J is said to be a subcomplementary vector of variables for (3.

16 Chapter 3. Separation Properties, Principal Pivot Transforms, Classes... for all j 2 J is said to be a subcomplementary vector of variables for (3. Chapter 3 SEPARATION PROPERTIES, PRINCIPAL PIVOT TRANSFORMS, CLASSES OF MATRICES In this chapter we present the basic mathematical results on the LCP. Many of these results are used in later chapters to

More information

Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics Processing of Time Series by Neural Circuits with iologically Realistic Synaptic Dynamics Thomas Natschläger & Wolfgang Maass Institute for Theoretical Computer Science Technische Universität Graz, ustria

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

Contents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains

Contents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains Ring Theory (part 4): Arithmetic and Unique Factorization in Integral Domains (by Evan Dummit, 018, v. 1.00) Contents 4 Arithmetic and Unique Factorization in Integral Domains 1 4.1 Euclidean Domains and

More information

ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS. Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain

ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS. Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS FUNCTIONS Michael Baron Received: Abstract We introduce a wide class of asymmetric loss functions and show how to obtain asymmetric-type optimal decision

More information

Lebesgue Measure on R n

Lebesgue Measure on R n CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

Linear Codes, Target Function Classes, and Network Computing Capacity

Linear Codes, Target Function Classes, and Network Computing Capacity Linear Codes, Target Function Classes, and Network Computing Capacity Rathinakumar Appuswamy, Massimo Franceschetti, Nikhil Karamchandani, and Kenneth Zeger IEEE Transactions on Information Theory Submitted:

More information

Counting and Constructing Minimal Spanning Trees. Perrin Wright. Department of Mathematics. Florida State University. Tallahassee, FL

Counting and Constructing Minimal Spanning Trees. Perrin Wright. Department of Mathematics. Florida State University. Tallahassee, FL Counting and Constructing Minimal Spanning Trees Perrin Wright Department of Mathematics Florida State University Tallahassee, FL 32306-3027 Abstract. We revisit the minimal spanning tree problem in order

More information

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension

More information

3. The Sheaf of Regular Functions

3. The Sheaf of Regular Functions 24 Andreas Gathmann 3. The Sheaf of Regular Functions After having defined affine varieties, our next goal must be to say what kind of maps between them we want to consider as morphisms, i. e. as nice

More information

LECTURE NOTES IN CRYPTOGRAPHY

LECTURE NOTES IN CRYPTOGRAPHY 1 LECTURE NOTES IN CRYPTOGRAPHY Thomas Johansson 2005/2006 c Thomas Johansson 2006 2 Chapter 1 Abstract algebra and Number theory Before we start the treatment of cryptography we need to review some basic

More information

Likelihood Ratio Tests and Intersection-Union Tests. Roger L. Berger. Department of Statistics, North Carolina State University

Likelihood Ratio Tests and Intersection-Union Tests. Roger L. Berger. Department of Statistics, North Carolina State University Likelihood Ratio Tests and Intersection-Union Tests by Roger L. Berger Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series Number 2288

More information

HOPFIELD neural networks (HNNs) are a class of nonlinear

HOPFIELD neural networks (HNNs) are a class of nonlinear IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 4, APRIL 2005 213 Stochastic Noise Process Enhancement of Hopfield Neural Networks Vladimir Pavlović, Member, IEEE, Dan Schonfeld,

More information

MATH 614 Dynamical Systems and Chaos Lecture 3: Classification of fixed points.

MATH 614 Dynamical Systems and Chaos Lecture 3: Classification of fixed points. MATH 614 Dynamical Systems and Chaos Lecture 3: Classification of fixed points. Periodic points Definition. A point x X is called a fixed point of a map f : X X if f(x) = x. A point x X is called a periodic

More information

Dempster's Rule of Combination is. #P -complete. Pekka Orponen. Department of Computer Science, University of Helsinki

Dempster's Rule of Combination is. #P -complete. Pekka Orponen. Department of Computer Science, University of Helsinki Dempster's Rule of Combination is #P -complete Pekka Orponen Department of Computer Science, University of Helsinki eollisuuskatu 23, SF{00510 Helsinki, Finland Abstract We consider the complexity of combining

More information

7. F.Balarin and A.Sangiovanni-Vincentelli, A Verication Strategy for Timing-

7. F.Balarin and A.Sangiovanni-Vincentelli, A Verication Strategy for Timing- 7. F.Balarin and A.Sangiovanni-Vincentelli, A Verication Strategy for Timing- Constrained Systems, Proc. 4th Workshop Computer-Aided Verication, Lecture Notes in Computer Science 663, Springer-Verlag,

More information

Techinical Proofs for Nonlinear Learning using Local Coordinate Coding

Techinical Proofs for Nonlinear Learning using Local Coordinate Coding Techinical Proofs for Nonlinear Learning using Local Coordinate Coding 1 Notations and Main Results Denition 1.1 (Lipschitz Smoothness) A function f(x) on R d is (α, β, p)-lipschitz smooth with respect

More information

NEARLY COMONOTONE APPROXIMATION. Abstract. We discuss the degree of approximation by polynomials of a function f that is

NEARLY COMONOTONE APPROXIMATION. Abstract. We discuss the degree of approximation by polynomials of a function f that is NEARLY COMONOTONE APPROIMATION D. Leviatan I. A. Shevchuk Abstract. We discuss the degree of approximation by polynomials of a function f that is piecewise monotone in [?; ]. We would like to approximate

More information

x i2 j i2 ε + ε i2 ε ji1 ε j i1

x i2 j i2 ε + ε i2 ε ji1 ε j i1 T-S Fuzzy Model with Linear Rule Consequence and PDC Controller: A Universal Framewor for Nonlinear Control Systems Hua O. Wang Jing Li David Niemann Laboratory for Intelligent and Nonlinear Control (LINC)

More information

A note on monotone real circuits

A note on monotone real circuits A note on monotone real circuits Pavel Hrubeš and Pavel Pudlák March 14, 2017 Abstract We show that if a Boolean function f : {0, 1} n {0, 1} can be computed by a monotone real circuit of size s using

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

Polynomial Certificates for Propositional Classes

Polynomial Certificates for Propositional Classes Polynomial Certificates for Propositional Classes Marta Arias Ctr. for Comp. Learning Systems Columbia University New York, NY 10115, USA Aaron Feigelson Leydig, Voit & Mayer, Ltd. Chicago, IL 60601, USA

More information

ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, BAHMAN KALANTARI

ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, BAHMAN KALANTARI ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, MATRIX SCALING, AND GORDAN'S THEOREM BAHMAN KALANTARI Abstract. It is a classical inequality that the minimum of

More information

8537-CS, Bonn, 1989.) Society, Providence, R.I., July 1992, (1989):

8537-CS, Bonn, 1989.) Society, Providence, R.I., July 1992, (1989): An example is furnished by f(x) = cos(x), and = 1. This shows that arbitrary (not 1+x 2 exp-ra denable) analytic functions may result in architectures with innite VC dimension. (Moreover, the architecture

More information

Convex Analysis and Optimization Chapter 2 Solutions

Convex Analysis and Optimization Chapter 2 Solutions Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Suppose R is an ordered ring with positive elements P.

Suppose R is an ordered ring with positive elements P. 1. The real numbers. 1.1. Ordered rings. Definition 1.1. By an ordered commutative ring with unity we mean an ordered sextuple (R, +, 0,, 1, P ) such that (R, +, 0,, 1) is a commutative ring with unity

More information

Bounds on the generalised acyclic chromatic numbers of bounded degree graphs

Bounds on the generalised acyclic chromatic numbers of bounded degree graphs Bounds on the generalised acyclic chromatic numbers of bounded degree graphs Catherine Greenhill 1, Oleg Pikhurko 2 1 School of Mathematics, The University of New South Wales, Sydney NSW Australia 2052,

More information

and the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that

and the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that Sampling short lattice vectors and the closest lattice vector problem Miklos Ajtai Ravi Kumar D. Sivakumar IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120. fajtai, ravi, sivag@almaden.ibm.com

More information

2. THE MAIN RESULTS In the following, we denote by the underlying order of the lattice and by the associated Mobius function. Further let z := fx 2 :

2. THE MAIN RESULTS In the following, we denote by the underlying order of the lattice and by the associated Mobius function. Further let z := fx 2 : COMMUNICATION COMLEITY IN LATTICES Rudolf Ahlswede, Ning Cai, and Ulrich Tamm Department of Mathematics University of Bielefeld.O. Box 100131 W-4800 Bielefeld Germany 1. INTRODUCTION Let denote a nite

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information