arXiv: v1 [cs.DM] 12 Jan 2018


SELF-PREDICTING BOOLEAN FUNCTIONS

OFER SHAYEVITZ AND NIR WEINBERGER

Abstract. A Boolean function $g$ is said to be an optimal predictor for another Boolean function $f$ if it minimizes the probability that $f(X^n) \neq g(Y^n)$ among all functions, where $X^n$ is uniform over the Hamming cube and $Y^n$ is obtained from $X^n$ by independently flipping each coordinate with probability $\delta$. This paper is about self-predicting functions, which are those that coincide with their optimal predictor.

1. Introduction

One of the most important properties of a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is its robustness to noise in its inputs. This robustness is traditionally measured by the noise sensitivity of the function
\[ \mathrm{NS}_\delta[f] := \Pr\big(f(X^n) \neq f(Y^n)\big), \tag{1} \]
where $X^n \in \{-1,1\}^n$ is a uniform Bernoulli vector, and $Y^n \in \{-1,1\}^n$ is obtained from $X^n$ by flipping each coordinate independently with probability $0 < \delta < 1/2$. The noise sensitivity of Boolean functions has been extensively investigated in the theory of Boolean functions [O'D14], most often in terms of the equivalent notion of stability
\[ \mathrm{Stab}_\rho[f] := \mathbb{E}\, f(X^n) f(Y^n), \tag{2} \]
where $0 < \rho < 1$ is the correlation parameter, i.e., $\rho := \mathbb{E} X_i Y_i = 1 - 2\delta$.

The noise sensitivity of $f$ can also be interpreted as the error probability of a predictor that tries to guess the value of $f(X^n)$ from the noisy version $Y^n$ of the inputs, by simply applying $f$ to $Y^n$. While this predictor is intuitively appealing and easy to analyze, it is generally suboptimal. As a simple example, think of the case where $f$ is biased and the noise level $\delta$ is sufficiently high; it is easy to see that a constant predictor would result in a lower error probability than $f(Y^n)$ would. The optimal predictor, i.e., the one that minimizes the error probability in predicting $f(X^n)$ from $Y^n$, is clearly given by the sign of $\mathbb{E}(f(X^n) \mid Y^n = y^n)$, a function that in general can be very different from $f$ itself.
While using the optimal predictor is generally superior to using the function itself (albeit, as we shall see, by a factor of two at the most), computing the former is often very difficult, as it depends on the values of the function over the entire Hamming cube. It is therefore interesting to study functions that coincide with their optimal predictor; we call these functions self-predicting (SP). Note that a function can be SP at certain noise levels but not at others. We say that a function is uniformly SP (USP) if it is SP at any noise level. Predicting the value of a USP function by applying it to noisy inputs is always optimal, clearly a desirable property. For example, suppose the function describes a voting rule and the noise represents possible contamination of the votes (e.g., due to fraud). In such a case it is not realistic to assume that the noise level is known, yet if the function is USP it can always be used to obtain the optimal prediction of the true voting result.

The authors are with the Department of EE Systems, Tel Aviv University, Tel Aviv, Israel. Emails: {nir.wein@gmail.com, ofersha@eng.tau.ac.il}. This work was supported by an ERC grant no

In this paper, we introduce and explore self-predictability of Boolean functions. We derive various properties of SP functions, and specifically the following:

- For a monotone function, self-predictability at dominating boundary points is necessary and sufficient for the function to be SP. We use this fact to show that Majority functions are USP.
- High correlation SP: A function with Fourier degree $k$ is SP for any $\rho > 1 - 1/k$, and a polynomial threshold function with sparsity $s$ is SP for any $\rho \ge \exp\left(-\frac{s}{n} \ln \frac{s}{s-1}\right)$. Also, if $f$ is SP for $\rho > 1 - \varepsilon$ and $n = \Omega(1/\varepsilon)$, then each point $x^n$ has a distance-2 neighbor with the same function value.
- A low correlation SP (abbreviated LCSP) function is spectral threshold, i.e., equal to the sign of its lowest Fourier level. This simple fact implies many properties: LCSP functions are either balanced or constant, they have energy at least $1/2$ on their first level (if any), and a monotone LCSP function is $\sqrt{2/(\pi n)}$-close to a linear threshold function.
- Sharp threshold: While all functions are trivially SP for $\rho > 1 - \frac{2\ln 2}{n} + O(n^{-2})$, only a doubly-exponentially small fraction are SP for $\rho = 1 - \frac{2\alpha}{n}$ for any $\alpha > 1$. The same continues to hold in the fixed high-correlation regime.

The paper is organized as follows. Section 2 contains basic notation and Fourier theory facts. The self-predictability problem and some basic properties are introduced in Section 3, including the proof that Majority is USP. In Section 4 we discuss high-correlation sufficient conditions for SP. In Section 5 we discuss low-correlation SP functions. Section 6 contains stability-based necessary conditions for SP. In Section 7 we prove the sharp threshold phenomenon for the SP property. We conclude the paper in Section 8 with a list of open problems.

2. Preliminaries

2.1. Notation and Definitions. We use upper case letters for random variables and random vectors, and their lower case counterparts for specific realizations. For vectors we write $x_i^j = (x_i, \ldots, x_j)$ and omit the subscript whenever $i = 1$. A concatenation of vectors is denoted by $(x_i^j, x_k^m) = (x_i, \ldots, x_j, x_k, \ldots, x_m)$. The cardinality of a set $S$ is denoted by $|S|$. The complement of the set $A$ is denoted by $A^c$. We write $[n]$ for the set $\{1, 2, \ldots, n\}$. The sign function $\mathrm{sgn}(z)$ returns the sign of $z$, and by convention $\mathrm{sgn}(0) = 1$ unless otherwise stated. Throughout, the logarithm $\log(t)$ is base 2, while $\ln(t)$ is the natural logarithm. The binary entropy function is $h(t) := -t\log(t) - (1-t)\log(1-t)$. The Hamming distance between $x^n$ and $y^n$ is $d_H(x^n, y^n)$.

In this paper, $X^n$ is a uniformly distributed binary vector, and $Y^n$ is the binary vector obtained by flipping each coordinate of $X^n$ independently with some given probability $\delta \in [0, 1/2]$. We write $p(x^n, y^n)$ to denote the associated joint probability mass function. As a binary alphabet, for the most part we will find it convenient to work with $\{-1,1\}$, in which case it is more natural to consider the correlation parameter $\rho := \mathbb{E} X_i Y_i = 1 - 2\delta \in [0,1]$ instead of the crossover probability parameter $\delta$. We will use these notations throughout the paper, with the exception of a few proofs where we find it more convenient to work with $\delta$ and the binary alphabet $\{0,1\}$.

2.2. Boolean Functions and Fourier Analysis. In this paper we consider Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$. The distance between two Boolean functions $f$ and $g$ is defined as the fraction of inputs on which they disagree, i.e., $\Pr(f(X^n) \neq g(X^n))$. We say that $f$ and $g$ are $\varepsilon$-close if their distance is at most $\varepsilon$. An inner product between two Boolean functions $f, g$ can be defined as
\[ \langle f, g \rangle := \mathbb{E}\big(f(X^n) g(X^n)\big). \tag{3} \]
A character associated with a set of coordinates $S \subseteq [n]$ is the Boolean function $x^S := \prod_{i \in S} x_i$, where by convention $x^\emptyset = 1$.
It can be shown [O'D14, Chapter 1] that the set of all characters forms an orthonormal basis with respect to (w.r.t.) the inner product (3). Furthermore,
\[ f(x^n) = \sum_{S \subseteq [n]} \hat{f}_S\, x^S, \tag{4} \]
where $\{\hat{f}_S\}_{S \subseteq [n]}$ are the Fourier coefficients of $f$, given by $\hat{f}_S = \langle x^S, f \rangle = \mathbb{E}(X^S f(X^n))$. When $S$ is a singleton $\{i\} \subseteq [n]$, we use the shorthand $\hat{f}_i = \hat{f}_{\{i\}}$. The Fourier weight of $f$ at degree $k$ is
\[ W^k[f] := \sum_{S \subseteq [n] :\, |S| = k} \hat{f}_S^2. \tag{5} \]
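As a concrete illustration of (4) and (5), the following minimal sketch computes all Fourier coefficients of a small Boolean function by brute-force averaging over the cube. The helper names (`fourier_coefficients`, `fourier_weight`) and the test function `maj3` are illustrative choices, not notation from the paper, and the approach is only practical for small $n$.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Map each subset S (a tuple of 0-based indices) to E[X^S f(X^n)]."""
    points = list(product((-1, 1), repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    }


def fourier_weight(coeffs, k):
    """Fourier weight at degree k, as in (5)."""
    return sum(c * c for S, c in coeffs.items() if len(S) == k)


def maj3(x):
    """Majority of three +/-1 bits (illustrative test function)."""
    return 1 if sum(x) > 0 else -1


coeffs = fourier_coefficients(maj3, 3)
weights = [fourier_weight(coeffs, k) for k in range(4)]
```

For the three-bit majority, the expansion is $\frac{1}{2}(x_1 + x_2 + x_3) - \frac{1}{2} x_1 x_2 x_3$, so the weights at degrees 1 and 3 are $3/4$ and $1/4$, and all weights sum to one.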

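Instead of the noise sensitivity defined in (1) it is more common to consider the stability, defined as
\[ \mathrm{Stab}_\rho[f] := \mathbb{E}\big(f(X^n) f(Y^n)\big). \tag{6} \]
Note that the noise sensitivity and stability are trivially related via
\[ \mathrm{Stab}_\rho[f] = 1 - 2\,\mathrm{NS}_{\frac{1-\rho}{2}}[f]. \tag{7} \]
Thus, the stability of a function is directly related to the error probability of the possibly suboptimal predictor $f(Y^n)$ of the function's true value $f(X^n)$. When $X^n$ and $Y^n$ are $\rho$-correlated, it is useful to define the noise operator
\[ T_\rho f(y^n) := \mathbb{E}\big(f(X^n) \mid Y^n = y^n\big). \tag{8} \]
Evidently, since $\{(X_i, Y_i)\}$ is an i.i.d. sequence,
\begin{align}
T_\rho f(y^n) &= \mathbb{E}\Big(\sum_{S \subseteq [n]} \hat{f}_S X^S \,\Big|\, Y^n = y^n\Big) \tag{9} \\
&= \sum_{S \subseteq [n]} \hat{f}_S\, \mathbb{E}\big(X^S \mid Y^n = y^n\big) \tag{10} \\
&= \sum_{S \subseteq [n]} \hat{f}_S \prod_{i \in S} \mathbb{E}\big(X_i \mid Y^n = y^n\big) \tag{11} \\
&= \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}_S\, y^S. \tag{12}
\end{align}
The stability can then be expressed using the Fourier coefficients and the noise operator as
\begin{align}
\mathrm{Stab}_\rho[f] &= \mathbb{E}\big(\mathbb{E}(f(X^n) f(Y^n) \mid Y^n)\big) \tag{13} \\
&= \mathbb{E}\big(f(Y^n)\, \mathbb{E}(f(X^n) \mid Y^n)\big) \tag{14} \\
&= \mathbb{E}\big(f(Y^n)\, T_\rho f(Y^n)\big) \tag{15} \\
&= \langle f, T_\rho f \rangle \tag{16} \\
&\overset{(a)}{=} \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}_S^2 \tag{17} \\
&= \| T_{\sqrt{\rho}} f \|_2^2, \tag{18}
\end{align}
where (a) uses Plancherel's identity $\langle f, g \rangle = \mathbb{E}(f(X^n) g(X^n)) = \sum_{S \subseteq [n]} \hat{f}_S \hat{g}_S$.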
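The two expressions for the noise operator, the conditional expectation (8) and the Fourier formula (12), can be compared numerically on a small example. The sketch below is a brute-force check under the model above (uniform $X^n$, independent flips with probability $\delta = (1-\rho)/2$, so that $p(x^n \mid y^n) = \delta^{d}(1-\delta)^{n-d}$ with $d = d_H(x^n, y^n)$); the function names are hypothetical.

```python
from itertools import combinations, product
from math import prod


def noise_operator_direct(f, n, rho, y):
    """T_rho f(y) = E[f(X^n) | Y^n = y] by summing over all x, as in (8)."""
    delta = (1 - rho) / 2  # crossover probability
    total = 0.0
    for x in product((-1, 1), repeat=n):
        d = sum(a != b for a, b in zip(x, y))  # Hamming distance d_H(x, y)
        total += delta**d * (1 - delta) ** (n - d) * f(x)
    return total


def noise_operator_fourier(f, n, rho, y):
    """T_rho f(y) = sum_S rho^|S| f_hat_S y^S, as in (12)."""
    points = list(product((-1, 1), repeat=n))
    total = 0.0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            f_hat = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            total += rho**k * f_hat * prod(y[i] for i in S)
    return total


def stability(f, n, rho):
    """Stab_rho[f] = E[f(Y^n) T_rho f(Y^n)] with Y^n uniform, as in (15)."""
    points = list(product((-1, 1), repeat=n))
    return sum(f(y) * noise_operator_direct(f, n, rho, y) for y in points) / len(points)


def maj3(x):
    """Majority of three +/-1 bits (illustrative test function)."""
    return 1 if sum(x) > 0 else -1


y = (1, -1, 1)
direct = noise_operator_direct(maj3, 3, 0.6, y)
fourier = noise_operator_fourier(maj3, 3, 0.6, y)
stab = stability(maj3, 3, 0.6)
```

For the three-bit majority, (12) gives $T_\rho f(y^n) = \frac{\rho}{2}(y_1+y_2+y_3) - \frac{\rho^3}{2} y_1 y_2 y_3$ and (17) gives $\mathrm{Stab}_\rho[f] = \frac{3}{4}\rho + \frac{1}{4}\rho^3$, which the two routines reproduce up to floating-point error.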

A Boolean function $f$ is called a linear threshold function (LTF) if there exist coefficients $a_0^n \in \mathbb{R}^{n+1}$ such that
\[ f(x^n) = \mathrm{sgn}\Big(a_0 + \sum_{i=1}^n a_i x_i\Big). \tag{19} \]
Note that if $a_0 = 0$ then $f$ is balanced, i.e., $\Pr(f(X^n) = 1) = 1/2$. More generally, a function $f$ is a polynomial threshold function (PTF) [Bru90] of degree $k$ if there exists $\{\hat{p}_S\}$ such that $\max_{S :\, \hat{p}_S \neq 0} |S| = k$ and
\[ f(x^n) = \mathrm{sgn}\Big(\sum_{S \subseteq [n]} \hat{p}_S\, x^S\Big). \tag{20} \]
A PTF has sparsity $s$ if $\{\hat{p}_S\}$ is supported over exactly $s$ terms. For LTFs and PTFs, we will always assume that the coefficients are chosen such that the polynomial inside the sign operator is never exactly zero.

3. Optimal Prediction and Self Predicting (SP) Functions

Let $f : \{-1,1\}^n \to \{-1,1\}$ be some Boolean function. It is easy to see that the optimal predictor (minimizing the error probability) of $f(X^n)$ given that $Y^n = y^n$ has been observed is simply
\[ \mathrm{sgn}\, \mathbb{E}\big(f(X^n) \mid Y^n = y^n\big) = \mathrm{sgn}\, T_\rho f(y^n). \tag{21} \]
Note that according to our definition $\mathrm{sgn}(0) = 1$, but ties can of course be broken arbitrarily in any other way. We say that a Boolean function $f$ is $\rho$-self-predicting ($\rho$-SP) at $y^n$ if the optimal predictor given $y^n$ at correlation level $\rho$ coincides with the function itself whenever it is not tied, i.e., if
\[ f(y^n) = \mathrm{sgn}\, T_\rho f(y^n), \quad \text{whenever } T_\rho f(y^n) \neq 0. \tag{22} \]
The function $f$ is called $\rho$-SP if it is $\rho$-SP at every $y^n \in \{-1,1\}^n$. We say that $f$ is uniformly self-predicting (USP) if it is $\rho$-SP for any $\rho \in [0,1]$. We also say that $f$ is low-correlation self-predicting (LCSP) if there exists some $\rho^* > 0$ such that $f$ is $\rho$-SP for all $\rho \in [0, \rho^*)$. The following fact follows easily from the definition.

Proposition 3.1. All the characters are USP.

Proof. Let $f(x^n) = x^S$ for some $S \subseteq [n]$. Then for any $y^n$,
\begin{align}
\mathrm{sgn}\, T_\rho f(y^n) &= \mathrm{sgn}\big(\rho^{|S|} y^S\big) \tag{23} \\
&= \mathrm{sgn}\big(y^S\big) \tag{24} \\
&= f(y^n). \tag{25}
\end{align}

We will later see that there are other USP functions besides the characters. How far can a function be from self predicting? We say that a function $f$ is $\varepsilon$-close to $\rho$-SP to mean that $f$ and its optimal predictor $\mathrm{sgn}\, T_\rho f$ are $\varepsilon$-close.

Lemma 3.2. Any function $f$ is $\sum_{S \subseteq [n]} (1 - \rho^{|S|}) \hat{f}_S^2$-close to $\rho$-SP.

Proof. Let $A \subseteq \{-1,1\}^n$ be the set of all $y^n$ at which $f$ is $\rho$-SP. Hence for any $y^n \notin A$ it must be that $f(y^n) \cdot T_\rho f(y^n) < 0$. Recalling that $|T_\rho f(y^n)| \le 1$, we have that
\[ \mathbb{E}\big(f(Y^n) \cdot T_\rho f(Y^n)\big) \le \Pr(Y^n \in A). \tag{26} \]
On the other hand, it also holds that
\[ \mathbb{E}\big(f(Y^n) \cdot T_\rho f(Y^n)\big) = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}_S^2. \tag{27} \]
The proof now follows by recalling that $\sum_S \hat{f}_S^2 = 1$.

For any $n$, functions that depend on all $n$ variables can be found (even balanced ones) whose distance from their optimal predictor is larger than some universal constant. The problem with this measure of closeness to SP is that in many cases the optimal predictor might differ from the function on inputs that are very noisy, i.e., where the posterior probability of the function value is close to uniform. Thus, a more practically motivated way of quantifying closeness to SP is by considering noise sensitivity and stability. Define the strong noise sensitivity of a function $f$ to be
\[ \mathrm{NS}^*_\delta[f] := \Pr\big(f(X^n) \neq \mathrm{sgn}\, T_\rho f(Y^n)\big) \tag{28} \]
and the associated strong stability as
\[ \mathrm{Stab}^*_\rho[f] := \mathbb{E}\big(f(X^n) \cdot \mathrm{sgn}\, T_\rho f(Y^n)\big). \tag{29} \]
Of course, just as for the regular noise sensitivity and stability, we have the trivial connection
\[ \mathrm{Stab}^*_\rho[f] = 1 - 2\,\mathrm{NS}^*_{\frac{1-\rho}{2}}[f], \tag{30} \]
and we can express strong stability in terms of the noise operator:
\begin{align}
\mathrm{Stab}^*_\rho[f] &= \mathbb{E}\big(\mathbb{E}(f(X^n) \cdot \mathrm{sgn}\, T_\rho f(Y^n) \mid Y^n)\big) \tag{31} \\
&= \mathbb{E}\big(T_\rho f(Y^n) \cdot \mathrm{sgn}\, T_\rho f(Y^n)\big) \tag{32} \\
&= \mathbb{E}\, |T_\rho f(Y^n)| \tag{33} \\
&= \| T_\rho f \|_1. \tag{34}
\end{align}
Thus the 1-norm of $T_\rho f$ can be interpreted in terms of the error probability associated with the optimal predictor for $f$. Since the optimal predictor $\mathrm{sgn}\, T_\rho f$ can only do better than $f$ itself, we immediately have:

Proposition 3.3. For any function $f$ and any $\rho$,
\[ \| T_{\sqrt{\rho}} f \|_2^2 \le \| T_\rho f \|_1, \tag{35} \]
with equality if and only if $f$ is $\rho$-SP.

The strong stability can also be upper bounded by a regular stability expression.

Proposition 3.4. $\mathrm{Stab}_\rho[f] \le \mathrm{Stab}^*_\rho[f] \le \sqrt{\mathrm{Stab}_{\rho^2}[f]}$.

Proof. Write
\begin{align}
\mathrm{Stab}^*_\rho[f] &= \langle T_\rho f, \mathrm{sgn}\, T_\rho f \rangle \tag{36} \\
&\overset{(a)}{\le} \| T_\rho f \|_2 \cdot \| \mathrm{sgn}\, T_\rho f \|_2 \tag{37} \\
&= \sqrt{\langle T_\rho f, T_\rho f \rangle} \tag{38} \\
&\overset{(b)}{=} \sqrt{\langle T_{\rho^2} f, f \rangle} \tag{39} \\
&= \sqrt{\mathrm{Stab}_{\rho^2}[f]}, \tag{40}
\end{align}
where (a) is by the Cauchy-Schwarz inequality, and (b) holds since $T_\rho$ is a self-adjoint operator.

An immediate consequence of the above is:

Corollary 3.5. The strong noise sensitivity satisfies
\[ \frac{1 - \sqrt{\mathrm{Stab}_{\rho^2}[f]}}{1 - \mathrm{Stab}_\rho[f]} \cdot \mathrm{NS}_\delta[f] \le \mathrm{NS}^*_\delta[f] \le \mathrm{NS}_\delta[f]. \tag{41} \]
Note that this bound is tight for the characters (and again shows that they are USP). We can easily derive the following weaker statements:

Corollary 3.6. For any $f$,
\[ \frac{\mathrm{NS}_\delta[f]}{2} \le \mathrm{NS}^*_\delta[f] \le \mathrm{NS}_\delta[f]. \tag{42} \]
If $f$ is balanced, then
\[ \frac{\mathrm{NS}_\delta[f]}{1+\rho} \le \mathrm{NS}^*_\delta[f] \le \mathrm{NS}_\delta[f]. \tag{43} \]

We may obtain improved bounds for low correlation values:

Proposition 3.7. Suppose $W^1[f] > 0$. Then:
\[ \max\left\{1,\ \frac{1}{\sqrt{2 W^1[f]}} + O(\rho^2)\right\} \le \frac{\mathrm{Stab}^*_\rho[f]}{\mathrm{Stab}_\rho[f]} \le \frac{1}{\sqrt{W^1[f]}} + O(\rho^2). \tag{44} \]

Proof. We have that
\begin{align}
\mathrm{Stab}^*_\rho[f] &= \mathbb{E}\, |T_\rho f(Y^n)| \tag{45} \\
&= \mathbb{E}\, \Big|\sum_{i=1}^n \rho \hat{f}_i Y_i\Big| + O(\rho^2). \tag{46}
\end{align}
Khintchine's inequality [Haa81] then implies
\[ \frac{1}{\sqrt{2}} \sqrt{W^1[f]} \cdot \rho + O(\rho^2) \le \mathrm{Stab}^*_\rho[f] \le \sqrt{W^1[f]} \cdot \rho + O(\rho^2), \tag{47} \]
and the result follows from [O'D14, Proposition 2.51]
\[ \mathrm{Stab}_\rho[f] = W^1[f] \cdot \rho + O(\rho^2). \tag{48} \]

3.1. Majority is USP. The Majority function (for odd $n$) is given by
\[ \mathrm{Maj}(x^n) := \mathrm{sgn} \sum_{i \in [n]} x_i. \tag{49} \]
In this subsection we show the following:

Theorem 3.8. Majority is USP.

We define the natural partial order over $\mathbb{R}^k$, where $y^k \preceq z^k$ if and only if $y_i \le z_i$ for all coordinates $i$. We write $\prec$ to denote the case of strict inequality in at least one of the coordinates. We say that a function $f$ is monotone on a set of coordinates $S \subseteq [n]$ if $f(y^n) \le f(z^n)$ whenever both $y_S \preceq z_S$ and $y_{[n] \setminus S} = z_{[n] \setminus S}$. A function that is monotone on $[n]$ is simply called monotone.

Lemma 3.9. Let $f : \{-1,1\}^n \to \{-1,1\}$ be monotone on $S \subseteq [n]$, and suppose $f(y^n) = 1$. Let $z^n$ satisfy $y_S \preceq z_S$ and $y_{[n] \setminus S} = z_{[n] \setminus S}$. Then if $f$ is $\rho$-SP at $y^n$, it is also $\rho$-SP at $z^n$.

We note that, as usual, analogous statements immediately hold when the direction of monotonicity on every coordinate is determined separately.

Proof. We prove the statement for a singleton $S$, say $S = \{n\}$. The general case then follows by applying the same argument repeatedly. If $y_n = 1$ the claim is trivial. Assume $y_n = -1$ and let $z^n$ agree with $y^n$ except on the $n$th coordinate. Due to monotonicity we have that $f(z^n) = 1$. Then
\begin{align}
T_\rho f(z^n) &= \sum_{x^n} p(x^n \mid z^n) f(x^n) \tag{50} \\
&= \sum_{x^{n-1}} \sum_{x_n} p(x^{n-1} \mid y^{n-1})\, p(x_n \mid 1)\, f(x^n) \tag{51} \\
&= \sum_{x^{n-1}} p(x^{n-1} \mid y^{n-1}) \big[\delta f(x^{n-1}, -1) + (1-\delta) f(x^{n-1}, 1)\big] \tag{52} \\
&\overset{(a)}{\ge} \sum_{x^{n-1}} p(x^{n-1} \mid y^{n-1}) \big[(1-\delta) f(x^{n-1}, -1) + \delta f(x^{n-1}, 1)\big] \tag{53} \\
&= T_\rho f(y^n) \tag{54} \\
&\overset{(b)}{\ge} 0, \tag{55}
\end{align}
where (a) holds since $f$ is monotone on the $n$th coordinate, and (b) holds by the assumptions that $f(y^n) = 1$ and that $f$ is $\rho$-SP at $y^n$.

Recall that $x^n$ is called a boundary point of $f$ if the value of $f(x^n)$ can be flipped by flipping some single coordinate of $x^n$. We further say that $x^n$ is a dominating boundary point of $f$ if $f(x^n) = 1$ (resp. $= -1$) and $f(y^n) = -1$ (resp. $= 1$) for any $y^n \prec x^n$ (resp. $x^n \prec y^n$). The following corollary follows easily from Lemma 3.9.

Corollary 3.10. A monotone function is $\rho$-SP if and only if it is $\rho$-SP at all its dominating boundary points.

Proof of Theorem 3.8. By Corollary 3.10 it suffices to check only the dominating boundary points, which in the case of Majority are exactly those points $y^n$ for which $|\sum_{i=1}^n y_i| = 1$. Before we proceed with the proof, note that at least in the immediate neighborhood of such a point (say, Hamming distance one or two), there are more neighbors who disagree with $y^n$ on the value of the function than those who agree with it. Due to oddness and symmetry, it suffices to check a single such point, say, a concatenation of $\frac{n-1}{2}$ minus ones followed by $\frac{n+1}{2}$ ones (e.g., $(-1,-1,1,1,1)$ for $n = 5$). Let $y^n$ be that point, and note that $y^{n-1}$ is balanced, i.e., $\sum_{i=1}^{n-1} y_i = 0$. Let us define for each $x^{n-1}$ a conjugate vector $\tilde{x}^{n-1}$ obtained by flipping all the bits of $x^{n-1}$, followed by a cyclic shift of $\frac{n-1}{2}$ symbols (e.g., $x^{n-1} = (1,-1,-1,-1)$ and $\tilde{x}^{n-1} = (1,1,-1,1)$). Let $A_0$, $A_+$ and $A_-$ be the sets of all balanced, positive sum, and negative sum vectors in $\{-1,1\}^{n-1}$, respectively. We note that conjugation is a bijective mapping from $A_+$ to $A_-$ which satisfies $d_H(x^{n-1}, y^{n-1}) = d_H(\tilde{x}^{n-1}, y^{n-1})$, and so also $p(x^{n-1} \mid y^{n-1}) = p(\tilde{x}^{n-1} \mid y^{n-1})$.
Hence,
\begin{align}
T_\rho \mathrm{Maj}(y^n) &= \sum_{x^n} p(x^n \mid y^n)\, \mathrm{Maj}(x^n) \tag{56} \\
&= \sum_{x^{n-1}} \sum_{x_n} p(x^{n-1} \mid y^{n-1})\, p(x_n \mid 1)\, \mathrm{Maj}(x^n) \tag{57} \\
&= \sum_{x^{n-1}} p(x^{n-1} \mid y^{n-1}) \big[(1-\delta)\, \mathrm{Maj}(x^{n-1}, 1) + \delta\, \mathrm{Maj}(x^{n-1}, -1)\big] \tag{58} \\
&\overset{(a)}{=} (1 - 2\delta) \sum_{x^{n-1} \in A_0} p(x^{n-1} \mid y^{n-1}) + \sum_{x^{n-1} \in A_+} p(x^{n-1} \mid y^{n-1}) - \sum_{x^{n-1} \in A_-} p(x^{n-1} \mid y^{n-1}) \tag{59} \\
&\overset{(b)}{=} (1 - 2\delta) \Pr\big(X^{n-1} \in A_0 \mid Y^{n-1} = y^{n-1}\big) \tag{60} \\
&\ge 0, \tag{61}
\end{align}
where (a) holds since $\mathrm{Maj}(x^{n-1}, x) = x$ for any $x^{n-1} \in A_0$, whereas $\mathrm{Maj}(x^{n-1}, x) = -\mathrm{Maj}(\tilde{x}^{n-1}, x) = 1$ for any $x^{n-1} \in A_+$, and (b) is by the properties of the conjugation mapping. Noting that the inequality in (61) is strict for any $\delta \in [0, 1/2)$, we find that Majority is USP at $y^n$, thus concluding the proof.

Majority (and the characters) are not the only USP functions, and not even the only USP LTFs:

Example 3.11. The balanced LTFs with
- $n = 5$ and coefficients $a_1^5 = (1,1,3,3,5)$,
- $n = 7$ and coefficients $a_1^7 = (1,1,3,3,3,5,7)$,
- $n = 9$ and coefficients $a_1^9 = (1,1,3,3,3,5,5,5,7)$,
- $n = 11$ and coefficients $a_1^{11} = (1,1,3,3,3,3,5,5,5,7,7)$
can all be verified by direct computation to be USP.

3.2. SP/USP Preserving Operators. Let us now discuss several operations that preserve self-predictability. First, we note that self-predictability is invariant to negation of inputs. We write $\odot$ for the Hadamard product.

Proposition 3.12. Let $a^n \in \{-1,1\}^n$. Then $f(x^n)$ is $\rho$-SP if and only if $f(a^n \odot x^n)$ is $\rho$-SP.

The straightforward proof is omitted. Next, we consider the case of separable functions.

Proposition 3.13. Let $f(x^n) = g(x^k) \cdot h(x_{k+1}^n)$. Then $f$ is $\rho$-SP if and only if both $g$ and $h$ are $\rho$-SP.

Proof. If $g$ and $h$ are both $\rho$-SP then for any $y^n$,
\begin{align}
\mathrm{sgn}\, T_\rho f(y^n) &= \mathrm{sgn}\, T_\rho \big(g(y^k) \cdot h(y_{k+1}^n)\big) \tag{62} \\
&= \mathrm{sgn}\big(T_\rho g(y^k) \cdot T_\rho h(y_{k+1}^n)\big) \tag{63} \\
&= g(y^k) \cdot h(y_{k+1}^n) \tag{64} \\
&= f(y^n). \tag{65}
\end{align}
Conversely, suppose that $f$ is $\rho$-SP. Note that Lemma 3.2 implies in particular that there must exist at least one point $y_{k+1}^n$ at which $h$ is $\rho$-SP. Without loss of generality, assume that $h(y_{k+1}^n) = 1$. Then for any $y^k$,
\begin{align}
\mathrm{sgn}\, T_\rho g(y^k) &= \mathrm{sgn}\, T_\rho g(y^k) \cdot \mathrm{sgn}\, T_\rho h(y_{k+1}^n) \tag{66} \\
&= \mathrm{sgn}\big(T_\rho g(y^k) \cdot T_\rho h(y_{k+1}^n)\big) \tag{67} \\
&= \mathrm{sgn}\, T_\rho f(y^n) \tag{68} \\
&= f(y^n) \tag{69} \\
&= g(y^k) \cdot h(y_{k+1}^n) \tag{70} \\
&= g(y^k). \tag{71}
\end{align}
Hence $g$ (and symmetrically, also $h$) is $\rho$-SP.

Note that Proposition 3.1 also follows as a simple corollary to Proposition 3.13. Next, we consider functions of equal-size disjoint characters.

Proposition 3.14. Let $\{S_l \subseteq [n]\}_{l \in [m]}$ be disjoint subsets of equal size $|S_l| = w$. Let $f : \{-1,1\}^m \to \{-1,1\}$ be $\rho^w$-SP. Then $f(x^{S_1}, x^{S_2}, \ldots, x^{S_m})$ is $\rho$-SP.

Proof. It is easy to check that the Fourier coefficients of $h(x^n) = f(x^{S_1}, x^{S_2}, \ldots, x^{S_m})$ are given by
\[ \hat{h}_S = \begin{cases} \hat{f}_T, & S = \bigcup_{t \in T} S_t \\ 0, & \text{otherwise.} \end{cases} \tag{72} \]
Hence,
\begin{align}
\mathrm{sgn}\, T_\rho h(y^n) &= \mathrm{sgn} \sum_{S \subseteq [n]} \rho^{|S|} \hat{h}_S\, y^S \tag{73} \\
&= \mathrm{sgn} \sum_{T \subseteq [m]} \rho^{w|T|}\, \hat{h}_{\bigcup_{t \in T} S_t}\, y^{\bigcup_{t \in T} S_t} \tag{74} \\
&= \mathrm{sgn} \sum_{T \subseteq [m]} (\rho^w)^{|T|} \hat{f}_T \prod_{t \in T} y^{S_t} \tag{75} \\
&= \mathrm{sgn}\, T_{\rho^w} f(y^{S_1}, y^{S_2}, \ldots, y^{S_m}) \tag{76} \\
&= f(y^{S_1}, y^{S_2}, \ldots, y^{S_m}) \tag{77} \\
&= h(y^n). \tag{78}
\end{align}
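Definition (22) can be tested exhaustively for small $n$. The sketch below is a brute-force checker written for illustration (the names `is_self_predicting`, `maj3_times_parity` and `all_ones_indicator` are hypothetical, and a small tolerance is an added assumption to treat floating-point ties as ties). It checks Majority on five bits over a grid of correlation values, in line with Theorem 3.8, a product of a three-bit Majority and a two-bit character, in line with Proposition 3.13, and an unbalanced function that fails to be SP at low correlation.

```python
from itertools import product


def noise_operator(f, n, rho, y):
    """T_rho f(y) = E[f(X^n) | Y^n = y], summing over the whole cube."""
    delta = (1.0 - rho) / 2.0
    total = 0.0
    for x in product((-1, 1), repeat=n):
        d = sum(a != b for a, b in zip(x, y))  # Hamming distance
        total += delta**d * (1.0 - delta) ** (n - d) * f(x)
    return total


def is_self_predicting(f, n, rho, tol=1e-12):
    """True if f(y) * T_rho f(y) >= 0 at every y (ties count as SP)."""
    return all(
        f(y) * noise_operator(f, n, rho, y) >= -tol
        for y in product((-1, 1), repeat=n)
    )


def majority(x):
    return 1 if sum(x) > 0 else -1


def maj3_times_parity(x):
    """Majority of the first three bits times the character x_4 * x_5."""
    return majority(x[:3]) * x[3] * x[4]


def all_ones_indicator(x):
    """Equals 1 only at the all-ones point, and -1 elsewhere."""
    return 1 if all(xi == 1 for xi in x) else -1


grid = [i / 20 for i in range(21)]  # rho = 0, 0.05, ..., 1
majority_ok = all(is_self_predicting(majority, 5, rho) for rho in grid)
product_ok = all(is_self_predicting(maj3_times_parity, 5, rho) for rho in grid)
```

A grid check of this kind is only evidence, not a proof, of uniform self-predictability; the cost is exponential in $n$, so it is limited to small examples.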

Example 3.15. Using the fact that characters and Majority are USP functions, together with Propositions 3.12, 3.13 and 3.14, we can construct many distinct USP functions. For example, the function
\[ \mathrm{sgn}\big((x_1 x_2 + x_3 x_4 + x_5 x_6) \cdot (x_7 x_8 x_9 - x_{10} x_{11} x_{12} - x_{13} x_{14} x_{15}) \cdot x_{16}\big) \tag{79} \]
is USP. Nonetheless, there are USP functions that cannot be constructed from characters and Majority this way. For example, none of these functions can be an LTF such as the USP functions in Example 3.11.

We note in passing that several seemingly plausible properties do not hold in general:

Example 3.16. The optimal predictor of a balanced function may not be balanced. For example, the function
\[ \tfrac{1}{4}\big(2x_1 + x_3 - 2x_1 x_2 + x_1 x_3 + x_2 x_3 - x_3 x_4 + x_1 x_2 x_3 + x_1 x_3 x_4 - x_2 x_3 x_4 + x_1 x_2 x_3 x_4\big) \]
is a balanced function, yet $\mathrm{sgn}\, T_\rho f$ is non-balanced when $\rho = 1/2$.

Example 3.17. In the following sections we explore functions that are SP for high or low correlation. However, self-predictability is not necessarily a monotone property in $\rho$; to wit, if a function is $\rho_0$-SP then it might not be $\rho$-SP for some $\rho \ge \rho_0$. Indeed, there are functions that exhibit irregular behavior. For example, the balanced LTF with $n = 11$ and coefficients
\[ a_1^{11} = (13, 43, 67, 67, 67, 117, 153, 165, 165, 179, 179) \tag{80} \]
can be verified by direct computation to be $\rho$-SP only for $\rho \in [0, 0.312] \cup (0.544, 1]$.

4. High Correlation Sufficient Conditions

In this section, we derive sufficient conditions for a function to be SP using various arguments. All our conditions will be high correlation ones, i.e., for $\rho$ larger than some threshold.

Proposition 4.1. Any function is $\rho$-SP for $\rho > 2^{(n-1)/n} - 1$, and there is no better universal guarantee.

Proof. This range corresponds to the values of the crossover probability $\delta \in [0, 1 - 2^{-1/n})$ for which the probability that no bit was flipped, $(1-\delta)^n$, exceeds $1/2$. This bound is achieved with equality by the OR function $\mathrm{OR}(x^n)$. To see this, note that the OR function is monotone and symmetric with two types of dominating boundary points. The first is the all-ones sequence $1^n$. In this case
\[ T_\rho \mathrm{OR}(1^n) = (1-\delta)^n \cdot 1 + [1 - (1-\delta)^n] \cdot (-1), \tag{81} \]
which is non-negative if and only if $\delta \in [0, 1 - 2^{-1/n}]$. The second type is $y^n = (1^{n-1}, -1)$ (or any permutation thereof), in which case
\begin{align}
T_\rho \mathrm{OR}(y^n) &= \delta(1-\delta)^{n-1} \cdot 1 + \big(1 - \delta(1-\delta)^{n-1}\big) \cdot (-1) \tag{82} \\
&\le \tfrac{1}{2}(1-\delta)^{n-2} - 1 \tag{83} \\
&< 0 \tag{84}
\end{align}
for any $\delta \in [0, 1/2)$.

Our next goal is to obtain improved sufficient conditions using specific properties of the function. The extremal property of the OR function noted above may ostensibly be attributed to the fact that it is extremely unbalanced. Hence, it is natural to wonder if the statement in Proposition 4.1 would change if we restricted ourselves to balanced functions. As it turns out, the answer is no.

Proposition 4.2. Any balanced function $f$ is $\rho$-SP for $\rho > 1 - \frac{2\ln(2)}{n} + O(n^{-2})$, and there is no better universal guarantee.

Proof. Note that the above region is essentially the same as the one in Proposition 4.1, hence one direction is clear. We need to show there exists a balanced function that is not $\rho$-SP at any point outside this region. To that end, let us introduce the enlightened dictator (E-DIC) function, defined for $n \ge 3$ to be
\[ \text{E-Dict}(x^n) := \mathrm{sgn}\Big((n-2) x_1 + \sum_{i=2}^n x_i\Big). \tag{85} \]
Evidently, $\text{E-Dict}(x^n)$ is determined by the dictator $x_1$, unless all the subjects $x_2, \ldots, x_n$ disagree. It is easy to verify that $\text{E-Dict}(x^n)$ is a monotone, odd (and hence balanced) function. By Lemma 3.9 we need only check its dominating boundary points to establish self-predictability. Due to oddness, it suffices to check the dominating boundary points for which $\text{E-Dict}(y^n) = 1$. There are two types of such points. The first is $y^n = (-1, 1^{n-1})$. The function is SP at this $y^n$ if and only if
\[ \Pr\big(\text{E-Dict}(X^n) = 1 \mid Y^n = y^n\big) = (1-\delta)^n + \delta(1 - \delta^{n-1}) \ge 1/2. \tag{86} \]
The second derivative of the left-hand side (l.h.s.) above is $n(n-1)\big((1-\delta)^{n-2} - \delta^{n-2}\big)$, which is non-negative for $\delta \in [0, 1/2]$, hence the l.h.s. is convex inside this interval. It is easy to check that equality in (86) holds for $\delta = \frac{\ln(2)}{n} - O(n^{-2})$ and for $\delta = 1/2$, hence by convexity the function is SP at $y^n$ if and only if $\delta < \frac{\ln(2)}{n} - O(n^{-2})$, or equivalently, $\rho > 1 - \frac{2\ln(2)}{n} + O(n^{-2})$. The second type of dominating boundary points is of the form $y^n = (1, 1, -1^{n-2})$ (or any other permutation of the subjects). For this $y^n$ we have
\begin{align}
\Pr\big(\text{E-Dict}(X^n) = 1 \mid Y^n = y^n\big) &= \delta^{n-1}(1-\delta) + (1-\delta)\big(1 - \delta(1-\delta)^{n-2}\big) \tag{87} \\
&= (1-\delta)\big[1 - \delta\big((1-\delta)^{n-2} - \delta^{n-2}\big)\big] \tag{88} \\
&\ge (1-\delta)\big[1 - \delta(1 - 2\delta)\big], \tag{89}
\end{align}
where the inequality follows since $(1-\delta)^{n-2} - \delta^{n-2} \le 1 - 2\delta$ for $\delta \in [0, 1/2]$ and any $n \ge 1$. It is easy to check that (89) is strictly decreasing in $\delta \in [0, 1/2]$ and equals $1/2$ for $\delta = 1/2$. This implies that the function is USP at this $y^n$. Hence we conclude that E-Dict is $\rho$-SP if and only if $\rho > 1 - \frac{2\ln(2)}{n} + O(n^{-2})$, concluding the proof.

4.1. Bounded Degree. Next, we provide a stronger statement that uses the degree $\mathrm{Deg}(f)$ of the function, i.e., the maximal character degree appearing in the Fourier representation of $f$.

Theorem 4.3. Any function $f$ is $\rho$-SP for $\rho \ge 1 - \frac{1}{\mathrm{Deg}(f)}$.

Proof. Fix any $y^n$ and think of $T_\rho f(y^n)$ as a polynomial in $\rho$. Let $\rho_0$ be the largest root of this polynomial in $[0,1]$ (if there is one, otherwise $\rho_0 = 0$). Since $T_\rho f(y^n)$ equals $f(y^n) \in \{-1, 1\}$ for $\rho = 1$, then by continuity $f$ is $\rho$-SP at $y^n$ for any $\rho \ge \rho_0$. Let us now upper bound $\rho_0$ for any $y^n$, in terms of $\mathrm{Deg}(f)$. To that end, recall that Bernstein's inequality [RS02] states that for any polynomial $Q(z)$ of degree $k$,
\[ \max_{|z| \le 1} \left|\frac{dQ(z)}{dz}\right| \le k \cdot \max_{|z| \le 1} |Q(z)|. \tag{90} \]
So, since $|T_\rho f(y^n)| \le 1$ for any $\rho \in (0,1]$, and since the degree (in $\rho$) of $T_\rho f$ equals the (Fourier) degree $\mathrm{Deg}(f)$ of $f$, we have
\[ \max_{\rho \in [0,1]} \left|\frac{d}{d\rho} T_\rho f(y^n)\right| \le \mathrm{Deg}(f). \tag{91} \]
Since $|T_1 f(y^n)| = 1$, it follows that $T_\rho f(y^n)$ cannot vanish for $\rho > 1 - 1/\mathrm{Deg}(f)$, and the claim follows.

Theorem 4.3 significantly improves on Proposition 4.1 whenever $\mathrm{Deg}(f) \ll n$, e.g., for $n$-dimensional functions $f$ that can be computed by a decision tree of depth $k$, in which case $\mathrm{Deg}(f) \le k$ [O'D14, Proposition 3.16].

4.2. Sparse PTFs. Next, we derive a sufficient condition that applies to PTFs of a given sparsity.

Theorem 4.4. Let $f$ be a PTF of sparsity $s$ and character widths $\{w_j\}_{j=1}^s$. Then $f$ is $\rho$-SP for all $\rho \ge \rho_0$, where $\rho_0$ is the (unique) solution to
\[ \sum_{j=1}^s \rho^{w_j} = s - 1. \tag{92} \]

Proof. Let $\zeta_w$ denote the probability that the value of a character of width $w \in [n]$ is flipped over the noisy channel, i.e.,
\begin{align}
\zeta_w &:= \Pr\Big(\prod_{l=1}^w X_l \neq \prod_{l=1}^w Y_l\Big) \tag{93} \\
&= \frac{1}{2}\Big(1 - \mathrm{Stab}_\rho\Big[\prod_{l=1}^w X_l\Big]\Big) \tag{94} \\
&= \frac{1 - \rho^w}{2}. \tag{95}
\end{align}
Also, let $\{\hat{p}_j\}_{j=1}^s$ and $\{S_j\}_{j=1}^s$ denote the coefficients and character sets corresponding to the widths $\{w_j\}_{j=1}^s$, respectively. Assume without loss of generality that $f(y^n) = 1$. Then $T_\rho f(y^n)$ can be expanded as follows:
\begin{align}
T_\rho f(y^n) &= \mathbb{E}\Big(\mathrm{sgn}\Big(\sum_{j=1}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, Y^n = y^n\Big) \tag{96} \\
&= \Pr\big(X^{S_1} = y^{S_1} \mid Y^n = y^n\big) \cdot \mathbb{E}\Big(\mathrm{sgn}\Big(\hat{p}_1 y^{S_1} + \sum_{j=2}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, X^{S_1} = y^{S_1}, Y^n = y^n\Big) \notag \\
&\quad + \Pr\big(X^{S_1} \neq y^{S_1} \mid Y^n = y^n\big) \cdot \mathbb{E}\Big(\mathrm{sgn}\Big(-\hat{p}_1 y^{S_1} + \sum_{j=2}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, X^{S_1} \neq y^{S_1}, Y^n = y^n\Big). \tag{97}
\end{align}
We can add and subtract the same expression with $+\hat{p}_1 y^{S_1}$ in place of $-\hat{p}_1 y^{S_1}$ to the second addend above, noting that the absolute value of each such term is upper bounded by $\Pr(X^{S_1} \neq y^{S_1} \mid Y^n = y^n)$. This yields
\begin{align}
T_\rho f(y^n) &= \mathbb{E}\Big(\mathrm{sgn}\Big(\sum_{j=1}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, Y^n = y^n\Big) \tag{98} \\
&\ge \mathbb{E}\Big(\mathrm{sgn}\Big(\hat{p}_1 y^{S_1} + \sum_{j=2}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, Y^n = y^n\Big) - 2 \Pr\big(X^{S_1} \neq y^{S_1} \mid Y^n = y^n\big) \tag{99} \\
&= \mathbb{E}\Big(\mathrm{sgn}\Big(\hat{p}_1 y^{S_1} + \sum_{j=2}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, Y^n = y^n\Big) - 2\zeta_{w_1}. \tag{100}
\end{align}
Continuing to eliminate terms in this manner, we obtain
\begin{align}
T_\rho f(y^n) &= \mathbb{E}\Big(\mathrm{sgn}\Big(\sum_{j=1}^s \hat{p}_j X^{S_j}\Big) \,\Big|\, Y^n = y^n\Big) \tag{101} \\
&\ge \mathrm{sgn}\Big(\sum_{j=1}^s \hat{p}_j y^{S_j}\Big) - 2 \sum_{j=1}^s \zeta_{w_j} \tag{102} \\
&= 1 - \sum_{j=1}^s (1 - \rho^{w_j}). \tag{103}
\end{align}
Thus, $f$ is $\rho$-SP at $y^n$ for any $\rho$ satisfying $\sum_{j=1}^s \rho^{w_j} \ge s - 1$. The derivation for the case where $f(y^n) = -1$ is similar. The claim now follows since $\sum_{j=1}^s \rho^{w_j}$ is monotonically increasing with $\rho$.

The theorem is useful for moderate values of $n$. Using the convexity of $\rho^t$, the following is easily verified:

Corollary 4.5. Let $f$ be a PTF of sparsity $s$. Then $f$ is $\rho$-SP for all
\[ \rho \ge \exp\left(-\frac{s}{n} \ln \frac{s}{s-1}\right). \tag{104} \]

A simple generalization of Theorem 4.4 is as follows.

Corollary 4.6. $f(x^n) = \mathrm{sgn} \sum_{j=1}^s f_j(x^n)$ is $\rho$-SP for any $\rho > \rho_0$, where $\rho_0$ is the (unique) solution to
\[ \frac{1}{s} \sum_{j=1}^s \mathrm{Stab}_\rho[f_j] = 1 - \frac{1}{s}. \tag{105} \]

4.3. Friendly Neighbors. Given a function $f$, we say that a point $x^n$ has a radius-$d$ friendly neighborhood w.r.t. $f$ if there exists some other point $y^n \neq x^n$ of distance at most $d$ that agrees with $x^n$, namely, where $d_H(x^n, y^n) \le d$ and $f(x^n) = f(y^n)$.

Proposition 4.7. Suppose $f$ is $\rho$-SP for all $\rho > 1 - \varepsilon$, and $n > \max\{2\varepsilon^{-1}, \gamma\}$ where $\gamma$ is a universal constant. Then each point in $\{-1,1\}^n$ has a radius-2 friendly neighborhood w.r.t. $f$.

Proof. Suppose toward contradiction that all the neighbors at Hamming distance 1 and 2 from some $y^n$ disagree with it. This implies that
\begin{align}
\Pr\big(f(X^n) \neq f(y^n) \mid Y^n = y^n\big) &\ge \binom{n}{1} \delta(1-\delta)^{n-1} + \binom{n}{2} \delta^2 (1-\delta)^{n-2} \tag{106} \\
&= (1-\delta)^{n-2} \Big(n\delta(1-\delta) + \frac{n(n-1)}{2} \delta^2\Big). \tag{107}
\end{align}
Choosing $\delta = \frac{\alpha}{n}$, and assuming that $n > \frac{2\alpha}{\varepsilon}$ so that we are in the SP region, yields
\begin{align}
\Pr\big(f(X^n) \neq f(y^n) \mid Y^n = y^n\big) &\ge \Big(1 - \frac{\alpha}{n}\Big)^{n-2} \alpha \Big(1 + \frac{\alpha}{2} - \frac{3\alpha}{2n}\Big) \tag{108} \\
&\ge \Big(1 - \frac{\alpha}{n}\Big)^{n-2} \Big(\alpha + \frac{\alpha^2}{2}\Big) - O\Big(\frac{1}{n}\Big) \tag{109} \\
&= e^{-\alpha} \Big(\alpha + \frac{\alpha^2}{2}\Big) - O\Big(\frac{1}{n}\Big). \tag{110}
\end{align}
One can check that, e.g., for $\alpha = 1$, $(\alpha + \frac{\alpha^2}{2}) e^{-\alpha} > 1/2$, and so $f$ cannot be SP if $n$ is larger than some universal constant, in contradiction.

Hence, for a function to be SP even slightly above the guaranteed high correlation threshold of $\rho > 1 - \frac{2\ln(2)}{n} + O(n^{-2})$, every point must admit a radius-2 friendly neighborhood. The OR function, e.g., does not satisfy this property. Furthermore, this result is tight: For the largest character $x^{[n]} = \prod_{i=1}^n x_i$, which is USP, the distance-1 neighbors of each point do not agree with it.

The following corollary, which is not directly related to self-predictability, is obtained by combining Theorem 4.3 and Proposition 4.7.

Corollary 4.8. If $\mathrm{Deg}\, f < n/2$ and $n$ is larger than a universal constant, then each point in $\{-1,1\}^n$ has a radius-2 friendly neighborhood w.r.t. $f$.

5. Low Correlation Self Predicting (LCSP) Functions

In this section we discuss LCSP functions, i.e., functions that are $\rho$-SP for any $\rho < \rho^*$ for some $\rho^* > 0$. Note that any USP function is trivially also LCSP, hence all our LCSP necessary conditions will apply to USP functions verbatim.

5.1. LCSP and Spectral Threshold Functions. Let the minimal level of a function $f$ be defined as
\[ \mathrm{Lev}(f) := \min\big\{k \in \{0, 1, \ldots, n\} : W^k[f] > 0\big\}, \tag{111} \]
and let
\[ f_{\mathrm{Lev}}(x^n) := \sum_{S :\, |S| = \mathrm{Lev}(f)} \hat{f}_S\, x^S. \tag{112} \]
We say that $f$ is weakly spectral threshold (WST) if $f_{\mathrm{Lev}}(x^n) \cdot f(x^n) \ge 0$ for all $x^n$, i.e., the signs of both functions agree whenever $f_{\mathrm{Lev}} \neq 0$. We say that $f$ is strongly spectral threshold (SST) if it is WST and $f_{\mathrm{Lev}}$ is never zero.

For an LTF $f$, the Fourier coefficients $(\hat{f}_\emptyset, \hat{f}_1, \ldots, \hat{f}_n)$ are known as the Chow parameters [Cho61, Tan61]. In this case, SST functions are exactly the LTFs for which the solution to the Chow-parameters problem [OS11] is exactly the Chow parameters themselves.

Proposition 5.1. SST implies LCSP. Conversely, LCSP implies WST.

Proof. The optimal predictor for $f$ satisfies
\begin{align}
\mathrm{sgn}\, T_\rho f(x^n) &= \mathrm{sgn} \sum_{S :\, |S| \ge \mathrm{Lev}(f)} \rho^{|S| - \mathrm{Lev}(f)} \hat{f}_S\, x^S \tag{113} \\
&= \mathrm{sgn}\big(f_{\mathrm{Lev}}(x^n) + O(\rho)\big). \tag{114}
\end{align}
Thus, $\mathrm{sgn}\, T_\rho f(x^n) = \mathrm{sgn}\, f_{\mathrm{Lev}}(x^n)$ for any $\rho$ small enough whenever $f_{\mathrm{Lev}}(x^n) \neq 0$. If $f$ is SST then $f_{\mathrm{Lev}}(x^n)$ never vanishes, and hence $f(x^n) = \mathrm{sgn}\, f_{\mathrm{Lev}}(x^n) = \mathrm{sgn}\, T_\rho f(x^n)$, implying LCSP. Conversely, if $f$ is LCSP, then $f(x^n) = \mathrm{sgn}\, T_\rho f(x^n) = \mathrm{sgn}\, f_{\mathrm{Lev}}(x^n)$ unless $f_{\mathrm{Lev}}$ vanishes, implying WST.

An immediate consequence of Proposition 5.1 is:

Corollary 5.2. An LCSP function is either balanced or constant.

Proof. Suppose $f$ is LCSP and unbalanced. Then $\mathrm{Lev}[f] = 0$ and $\hat{f}_\emptyset \neq 0$, and by Proposition 5.1 it must be WST. Hence $f = \mathrm{sgn}\, \hat{f}_\emptyset \in \{-1, 1\}$ must be constant.

It is interesting to note that in light of Proposition 5.1, Proposition 3.7 immediately implies the following.

Corollary 5.3. Let $f$ be an LCSP function. Then either $W^1[f] = 0$ or $W^1[f] \ge 1/2$.

This result is very similar to the claim that $W^1[f] \ge 1/2$ for LTFs [O'D14, Theorem 5.2]. Note however that the above claim holds for LCSP functions that are not LTFs but do have energy on the first level. Next, recall that Proposition 3.3 states that a function is $\rho$-SP if and only if $\| T_\rho f \|_1 = \| T_{\sqrt{\rho}} f \|_2^2$. A similar property holds for $f_{\mathrm{Lev}}$ if the function is LCSP.

Corollary 5.4. If $f$ is LCSP then $\| f_{\mathrm{Lev}} \|_1 = \| f_{\mathrm{Lev}} \|_2^2$.

Proof. $f$ must be WST by Proposition 5.1, and Plancherel's identity implies that
\begin{align}
\mathbb{E}\, |f_{\mathrm{Lev}}(X^n)| &= \mathbb{E}\big(f_{\mathrm{Lev}}(X^n) \cdot f(X^n)\big) \tag{115} \\
&= \langle f_{\mathrm{Lev}}, f \rangle \tag{116} \\
&= \sum_{S :\, |S| = \mathrm{Lev}[f]} \hat{f}_S^2 \tag{117} \\
&= \mathbb{E}\big(f_{\mathrm{Lev}}^2(X^n)\big). \tag{118}
\end{align}
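The definitions of $\mathrm{Lev}(f)$, $f_{\mathrm{Lev}}$, WST and SST, and the norm identity used in the proof of Corollary 5.4, can be explored by brute force on small functions. The sketch below is illustrative only: the helper names, the tolerance, and the two test functions (a three-bit Majority and the LTF $\mathrm{sgn}(2x_1 + x_2 + x_3 + x_4)$) are assumptions made here for the example. Since the proof of Corollary 5.4 uses only the WST property, the identity $\|f_{\mathrm{Lev}}\|_1 = \|f_{\mathrm{Lev}}\|_2^2$ is expected to hold for any WST function.

```python
from itertools import combinations, product
from math import prod


def lowest_level_part(f, n, tol=1e-12):
    """Return (Lev(f), f_Lev), with f_Lev given as a dict point -> value."""
    points = list(product((-1, 1), repeat=n))
    for k in range(n + 1):
        coeffs = {
            S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in combinations(range(n), k)
        }
        if any(abs(c) > tol for c in coeffs.values()):
            f_lev = {
                x: sum(c * prod(x[i] for i in S) for S, c in coeffs.items())
                for x in points
            }
            return k, f_lev
    raise ValueError("a +/-1 valued function always has a nonzero coefficient")


def classify(f, n, tol=1e-12):
    """Report the level, the WST/SST properties and the two norms of f_Lev."""
    level, f_lev = lowest_level_part(f, n, tol)
    wst = all(f(x) * v >= -tol for x, v in f_lev.items())
    sst = wst and all(abs(v) > tol for v in f_lev.values())
    l1 = sum(abs(v) for v in f_lev.values()) / len(f_lev)
    l2_sq = sum(v * v for v in f_lev.values()) / len(f_lev)
    zeros = sum(abs(v) <= tol for v in f_lev.values())
    return {"level": level, "wst": wst, "sst": sst,
            "l1": l1, "l2_sq": l2_sq, "zeros": zeros}


def maj3(x):
    return 1 if sum(x) > 0 else -1


def ltf_2111(x):
    """sgn(2*x1 + x2 + x3 + x4); the argument is odd, so never zero."""
    return 1 if 2 * x[0] + x[1] + x[2] + x[3] > 0 else -1


report_maj3 = classify(maj3, 3)
report_ltf = classify(ltf_2111, 4)
```

In this check the three-bit Majority comes out SST, while the second function is WST but not SST, because its level-1 part $\frac{1}{4}(3x_1 + x_2 + x_3 + x_4)$ vanishes on two inputs.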

19 The following two examples show that the distinction between WST and SST in the theorem is necessary. Example 5.5 (LCSP does not imply SST). Consider the balanced LTF with n = 4 and coefficients a 3 1 = (2,1,1,1). This is a Majority function with a tie breaking input. It can be verified by direct computation that this function is USP, hence also LCSP. However, its level-1 Fourier coefficients are ( 3, 1, 1, 1 ). Hence, while it is clearly WST, it is not SST as there are 2 inputs for which f Lev (x n ) = 0. Example 5.6 (WST does not imply LCSP). The balanced LTF with n = 9 and coefficients a 9 1 = (1,5,16,19,25,58,68,91,94) can be verified to be WST, but not LCSP. It is ρ-sp only for ρ > The following example shows that the SST property is limited to the low-correlation regime only. Example 5.7 (SST does not imply USP). The LTF of Example 3.17 is SST, but as was shown there, is not USP. Thus, while an SST is always LCSP, it is not necessarily USP. We note in passing that are SST and WST functions outside Majority that are USP. Example 5.8. The LTF in Example 3.11 is SST and USP, while the balanced LTF with n = 9 and coefficients a 9 1 = (1,1,1,3,3,3,5,5,7) is WST and USP (f Lev = 0 for 30 inputs), but not SST. Next, using Proposition 5.1, we can show that the largest coefficients of an LCSP LTF cannot be too distinct. Proposition 5.9. Let f be an LTF that depends on all its n variables. Let a and b be its first and second largest coefficients in absolute values, respectively, in some representation of f. If f is LCSP then a < nlnn+1. b Proof. Assume without loss of generality that a 1 a 2 a n > 0. Recall also that by Corollary 5.2 we know that a 0 = 0. The level-1 Fourier coefficients are given by (119) (120) (121) ˆf k = E(X k f(x n )) ( n ) = E sgn a i X i X k i=1 ( = E sgn a k + ) a i X i i k

\[ = 2\Pr\Bigl( \sum_{i \ne k} a_i X_i \ge -a_k \Bigr) - 1 \tag{122} \]
\[ = \Pr\Bigl( \sum_{i \ne k} a_i X_i \ge -a_k \Bigr) + \Pr\Bigl( \sum_{i \ne k} a_i X_i \le a_k \Bigr) - 1 \tag{123} \]
\[ = \Pr\Bigl( \Bigl| \sum_{i \ne k} a_i X_i \Bigr| \le a_k \Bigr). \tag{124} \]
Assume without loss of generality that $a_2 = 1$, and write $a := a_1$. For brevity, also write $Z := \sum_{i=3}^n a_i X_i$ and $X := X_1$. Then, from the symmetry of $Z$,
\[ \hat f_1 = \Pr( |X + Z| \le a ) \tag{125} \]
\[ = \Pr( |1 + Z| \le a ) \tag{126} \]
\[ \ge \Pr( |Z| < a - 1 ), \tag{127} \]
and
\[ \hat f_2 = \Pr( |aX + Z| \le 1 ) \tag{128} \]
\[ \le \Pr( a - 1 \le Z \le a + 1 ) \tag{129} \]
\[ \le \Pr( |Z| \ge a - 1 ). \tag{130} \]
Hence,
\[ \frac{\hat f_1}{\hat f_2} \ge \frac{1 - \Pr( |Z| \ge a - 1 )}{\Pr( |Z| \ge a - 1 )}. \tag{131} \]
Since $|a_i| \le 1$ for $3 \le i \le n$, and assuming toward contradiction that $a > \sqrt{n \ln n} + 1$, Hoeffding's inequality implies that
\[ \Pr( |Z| \ge a - 1 ) < 1/n, \tag{132} \]
and so $\hat f_1 / \hat f_2 > n - 1$. Noting that $a_i \ge a_j$ implies $\hat f_i \ge \hat f_j$, we also have that $\hat f_1 / \hat f_i \ge n - 1 + \varepsilon$ for any $i > 1$, for $\varepsilon > 0$ small enough. Since $f$ is WST, $f(x^n) = \operatorname{sgn} \sum_{i=1}^n \hat f_i x_i$ whenever the right-hand side (r.h.s.) is nonzero. For these ratios of coefficients, however, it must then be that $f(x^n) = x_1$, in contradiction to the assumption that $f$ depends on all the variables.

For example, the enlightened dictator function E-Dict($\cdot$) (85) has a first-to-second coefficient ratio of $n - 2$, and thus cannot be LCSP. It should be noted, however, that E-Dict($\cdot$) can also be written as an LTF with coefficients $(n, 1, c, c, \ldots, c)$, where $c = \frac{n - 1 + \varepsilon}{n - 2}$ for some $\varepsilon > 0$. When given in this form, Proposition 5.9 is incapable of ruling it out from being SP. Nonetheless, it is easy to verify that LTFs of coefficients $(c, 1, 1, \ldots, 1)$ for $c < n - 2$ must have a first-to-second coefficient ratio of $\Omega(n)$.
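The level-1 formula (124) can be checked by enumeration on the small LTF of Example 5.5, with coefficients $(2,1,1,1)$. The sketch below is an illustration under the assumption that all coefficients are positive and that the weighted sum is never zero; the function names are hypothetical. It computes the level-1 coefficients once from the definition $\hat f_k = \mathbb{E}(X_k f(X^n))$ and once from the probability in (124), and then lists the inputs on which the level-1 part vanishes.

```python
from fractions import Fraction
from itertools import product


def ltf(a):
    """Balanced LTF x -> sgn(sum_i a_i x_i); assumes the weighted sum is never zero."""
    def f(x):
        s = sum(ai * xi for ai, xi in zip(a, x))
        assert s != 0, "tie: not a valid {-1,1}-valued LTF"
        return 1 if s > 0 else -1
    return f


def level1_by_definition(a):
    """Level-1 coefficients E[X_k f(X)] by full enumeration of the cube."""
    n = len(a)
    f = ltf(a)
    cube = list(product((-1, 1), repeat=n))
    return [Fraction(sum(x[k] * f(x) for x in cube), len(cube)) for k in range(n)]


def level1_by_formula(a):
    """Pr(|sum_{i != k} a_i X_i| <= a_k) for each k, assuming positive coefficients."""
    n = len(a)
    out = []
    for k in range(n):
        rest = [a[i] for i in range(n) if i != k]
        cube = list(product((-1, 1), repeat=n - 1))
        hits = sum(
            1 for x in cube
            if abs(sum(ai * xi for ai, xi in zip(rest, x))) <= a[k]
        )
        out.append(Fraction(hits, len(cube)))
    return out


a = (2, 1, 1, 1)
coeffs = level1_by_definition(a)
# Inputs where the level-1 part sum_k f_hat_k x_k is exactly zero.
zeros = [
    x for x in product((-1, 1), repeat=len(a))
    if sum(c * xi for c, xi in zip(coeffs, x)) == 0
]
print(coeffs, level1_by_formula(a), zeros)
```

The two computations agree, and the level-1 part vanishes on exactly two antipodal inputs, which is why this function is WST but not SST.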

5.2. LTF Approximation. The WST condition can be leveraged to show that an LCSP function can typically be well approximated by an LTF. Specifically:

Theorem 5.10. An LCSP $f$ is $\sqrt{\frac{2}{\pi n_f}}$-close to an LTF, where $n_f := |\{ i \in [n] : \hat f_i \ne 0 \}|$.

Corollary 5.11. A monotone LCSP function that depends on all its coordinates is $\sqrt{\frac{2}{\pi n}}$-close to an LTF.

To prove Theorem 5.10 we first establish the following technical lemma. We state it in a slightly more general form than we actually need.

Lemma 5.12. Let $a^n \in \mathbb{R}^n$ be a vector of nonzero coefficients. Then for any $b \in \mathbb{R}$,
\[ \Pr\Bigl( \Bigl| \sum_{i=1}^n a_i X_i - b \Bigr| < \min_{k \in [n]} |a_k| \Bigr) \le 2^{-n} \binom{n}{\lfloor n/2 \rfloor} \le \sqrt{\frac{2}{\pi n}}. \tag{133} \]

Proof. Write $a = \min_k |a_k|$, assume without loss of generality that all $a_i > 0$ (otherwise flip the sign of the corresponding coordinates), and let
\[ A := \Bigl\{ x^n \in \{-1,1\}^n : \Bigl| \sum_{i=1}^n a_i x_i - b \Bigr| < a \Bigr\}. \tag{134} \]
It is easy to see that $A$ forms an antichain w.r.t. the partial order on $\{-1,1\}^n$, i.e., that there are no two distinct $x^n, y^n \in A$ such that $x^n \le y^n$. This holds simply since for such a pair it must hold that
\[ \sum_{i=1}^n a_i y_i - \sum_{i=1}^n a_i x_i \ge 2a. \tag{135} \]
Such an antichain is called a Sperner family, and Sperner's theorem [AS04, Maximal Antichains, Corollary 2] shows that
\[ |A| \le \binom{n}{\lfloor n/2 \rfloor}, \tag{136} \]
concluding the proof.

Proof of Theorem 5.10. Assume $\mathrm{Lev}[f] = 1$ (trivial otherwise), and define $g(x^n) = \operatorname{sgn}\bigl( \sum_{i=1}^n \hat f_i x_i \bigr)$. Let $A = \{ x^n \in \{-1,1\}^n : g(x^n) = 0 \}$. Using Lemma 5.12, we have that
\[ \Pr( X^n \in A ) \le \sqrt{\frac{2}{\pi n_f}}. \tag{137} \]
Since $f$ is LCSP, by Proposition 5.1 it is also WST, and hence $f(x^n) = g(x^n)$ for any $x^n \notin A$. By slightly perturbing the coefficients of $g$, one can clearly obtain a legal LTF $\tilde g$ that takes values only in $\{-1,1\}$ and still agrees with $f$ for all $x^n \notin A$. The distance between $f$ and $\tilde g$ is therefore at most $|A| / 2^n$.
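The anti-concentration bound of Lemma 5.12 is easy to probe numerically. The sketch below enumerates the cube for a given coefficient vector and offset and compares the resulting probability with the Sperner bound $2^{-n}\binom{n}{\lfloor n/2 \rfloor}$ and with $\sqrt{2/(\pi n)}$. The function names and the two test vectors are illustrative choices, not taken from the paper.

```python
from itertools import product
from math import comb, pi, sqrt


def small_ball_probability(a, b):
    """Pr(|sum_i a_i X_i - b| < min_k |a_k|) for X uniform on {-1,1}^n, by enumeration."""
    n = len(a)
    radius = min(abs(ak) for ak in a)
    hits = sum(
        1 for x in product((-1, 1), repeat=n)
        if abs(sum(ai * xi for ai, xi in zip(a, x)) - b) < radius
    )
    return hits / 2 ** n


def sperner_bound(n):
    """Largest antichain in the n-cube, as a fraction of all 2^n points."""
    return comb(n, n // 2) / 2 ** n


n = 6
p_equal = small_ball_probability((1, 1, 1, 1, 1, 1), 0)    # all-equal weights, b = 0
p_mixed = small_ball_probability((3, -1, 2, 5, 1, 4), 0.5)  # mixed signs and sizes
print(p_equal, p_mixed, sperner_bound(n), sqrt(2 / (pi * n)))
```

With all coefficients equal and $b = 0$ the event is exactly "half of the coordinates are $+1$", so the first inequality in (133) is attained with equality; generic coefficients give a strictly smaller probability.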

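5.3. Chow Distance. The Chow distance between two Boolean functions $f$ and $g$ is defined as
\[ d_{\mathrm{Chow}}(f,g) := \Bigl( \sum_{i \in [n]} ( \hat f_i - \hat g_i )^2 \Bigr)^{1/2}. \tag{138} \]
It was shown in [OS11, Prop. 1.5, Th. 1.6] that for any $f$ and $g$
\[ \frac{1}{4} d_{\mathrm{Chow}}^2(f,g) \le \mathrm{Dist}(f,g) \le \tilde O\Bigl( \frac{1}{\sqrt{\log ( 1 / d_{\mathrm{Chow}}(f,g) )}} \Bigr), \tag{139} \]
where for $q < 1$, $\tilde O(q)$ means $O(q \log^c(1/q))$ for some absolute constant $c$. For LCSP LTF functions, the upper bound can generally be improved. We will state our result for the case where one of the functions is SST, though it can be somewhat cumbersomely extended to the case where neither of them is. Let $\mathrm{Gap}[f]$ be the minimal positive value of $\bigl| \sum_{i=1}^n \hat f_i x_i \bigr|$ over the Hamming cube (with $\mathrm{Gap}[f] = 0$ if all the $\hat f_i$'s are zero).

Theorem 5.13. Let $f$ and $g$ be two balanced LCSP functions that depend on all $n$ variables, and assume that $f$ is SST. Then
\[ \mathrm{Dist}(f,g) \le \frac{d_{\mathrm{Chow}}^2(f,g)}{2\,\mathrm{Gap}[f]}. \tag{140} \]

Proof of Theorem 5.13. Let
\[ B := \{ x^n \in \{-1,1\}^n : f(x^n) \ne g(x^n) \}. \tag{141} \]
Then
\[ d_{\mathrm{Chow}}^2(f,g) = \mathbb{E}\Bigl[ \bigl( f(X^n) - g(X^n) \bigr) \sum_{i \in [n]} ( \hat f_i - \hat g_i ) X_i \Bigr] \tag{142} \]
\[ = 2\,\mathbb{E}\Bigl[ \Bigl| \sum_{i \in [n]} ( \hat f_i - \hat g_i ) X_i \Bigr| \cdot \mathbb{1}( X^n \in B ) \Bigr] \tag{143} \]
\[ \ge 2\,\mathrm{Gap}[f] \cdot \Pr( X^n \in B ), \tag{144} \]
where (142) follows from linearity of expectation and the definition of the Fourier coefficients, (143) holds since both $f$ and $g$ are WST by virtue of Proposition 5.1, and (144) holds since $f$ is SST and $g$ is WST.

Lemma 5.12 implies that $\mathrm{Gap}[\mathrm{Maj}] \ge \sqrt{\frac{2}{\pi n}}$. Since Majority is SST, we have:

Corollary 5.14. For odd $n$ and any LCSP function $g$,
\[ \frac{1}{4} d_{\mathrm{Chow}}^2(\mathrm{Maj},g) \le \mathrm{Dist}(\mathrm{Maj},g) \le \sqrt{\frac{\pi n}{8}}\, d_{\mathrm{Chow}}^2(\mathrm{Maj},g). \tag{145} \]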
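To make the two-sided Chow-distance bound for Majority concrete, the sketch below compares 3-bit Majority with the dictator $g(x^n) = x_1$, which is self-predicting at every correlation level since $T_\rho g = \rho\, g$. It assumes that $\mathrm{Dist}(f,g)$ denotes $\Pr(f(X^n) \ne g(X^n))$, as suggested by the set $B$ in (141), and reads the upper constant in (145) as $\sqrt{\pi n / 8}$; all function names are illustrative.

```python
from itertools import product
from math import pi, sqrt


def chow_parameters(f, n):
    """Level-1 Fourier coefficients E[X_i f(X)], i = 1..n."""
    cube = list(product((-1, 1), repeat=n))
    return [sum(x[i] * f(x) for x in cube) / len(cube) for i in range(n)]


def chow_distance_sq(f, g, n):
    """Squared Chow distance: sum_i (f_hat_i - g_hat_i)^2."""
    pf, pg = chow_parameters(f, n), chow_parameters(g, n)
    return sum((fi - gi) ** 2 for fi, gi in zip(pf, pg))


def dist(f, g, n):
    """Pr(f(X) != g(X)) for X uniform on the cube (assumed meaning of Dist)."""
    cube = list(product((-1, 1), repeat=n))
    return sum(1 for x in cube if f(x) != g(x)) / len(cube)


def majority(x):
    return 1 if sum(x) > 0 else -1


def dictator(x):
    return x[0]


n = 3
d2 = chow_distance_sq(majority, dictator, n)
D = dist(majority, dictator, n)
lower = d2 / 4
upper = sqrt(pi * n / 8) * d2
print(lower, D, upper)
```

For this pair the squared Chow distance is $3/4$ and the two functions disagree on a quarter of the cube, which falls between the two bounds.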

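6. Stability-based Conditions

In this section we provide simple necessary conditions for a function to be $\rho$-SP, in terms of its stability and Fourier coefficients.

Proposition 6.1. If $f$ is $\rho$-SP then
\[ \mathrm{Stab}_\rho[f] \ge \max_{S \subseteq [n]} \rho^{|S|} |\hat f_S|. \tag{146} \]

Proof. If $f$ is $\rho$-SP, then its stability $\mathrm{Stab}_\rho[f]$ equals its strong stability $\mathbb{E}\,|T_\rho f(Y^n)|$. Letting $T \subseteq [n]$, the strong stability can be lower bounded as follows:
\[ \mathbb{E}\,|T_\rho f(Y^n)| = \mathbb{E}\Bigl| \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S Y^S \Bigr| \tag{147} \]
\[ = \mathbb{E}\Bigl( \Bigl| \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S Y^S \Bigr| \cdot |Y^T| \Bigr) \tag{148} \]
\[ = \mathbb{E}\Bigl| \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S Y^S Y^T \Bigr| \tag{149} \]
\[ \ge \Bigl| \mathbb{E} \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S Y^S Y^T \Bigr| \tag{150} \]
\[ = \rho^{|T|} |\hat f_T|. \tag{151} \]
The proof is completed by optimizing over $T$.

Example 6.2. When $f$ is the OR function, we have
\[ \max_{S \subseteq [n]} \rho^{|S|} |\hat f_S| = |\hat f_\emptyset| = 1 - 2^{1-n}. \tag{152} \]
It is easy to verify that
\[ \Pr( f(X^n) = f(Y^n) ) = 1 - 2^{1-n} \bigl( 1 - (1-\delta)^n \bigr), \tag{153} \]
and using $\rho = 1 - 2\delta$,
\[ \mathrm{Stab}_\rho[f] = 2\Pr( f(X^n) = f(Y^n) ) - 1 \tag{154} \]
\[ = 1 - 2^{2-n} \Bigl( 1 - \Bigl( \frac{1+\rho}{2} \Bigr)^n \Bigr). \tag{155} \]
Then, OR is $\rho$-SP only when $\mathrm{Stab}_\rho[\mathrm{OR}] \ge 1 - 2^{1-n}$, which can be seen to be equivalent to $\rho \ge 2^{(n-1)/n} - 1$. This is the same result that can be obtained by direct computation, and so the bound of Proposition 6.1 is tight in this case.

We may deduce again the result of Corollary 5.2: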

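Corollary 6.3. An LCSP function is either balanced or constant.

Proof. If $f$ is $\rho$-SP then
\[ \mathrm{Stab}_\rho[f] = \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S^2 \ge |\hat f_\emptyset|. \tag{156} \]
As $\rho \to 0$, this bound implies that $\hat f_\emptyset^2 \ge |\hat f_\emptyset|$, and as $|\hat f_\emptyset| \le 1$, this is only possible when either $\hat f_\emptyset = 0$ or $|\hat f_\emptyset| = 1$.

More generally, we have the following:

Corollary 6.4. If $f$ is LCSP then
\[ W^{\mathrm{Lev}[f]}[f] \ge \max_{S \subseteq [n]:\,|S| = \mathrm{Lev}[f]} |\hat f_S|. \tag{157} \]
Specifically, if $f$ is also monotone, this bound reads
\[ W^1[f] \ge \max_{i \in [n]} \hat f_i = \mathrm{MaxInf}[f], \tag{158} \]
where the r.h.s. is the so-called maximal influence of $f$.

When $\mathrm{Deg}(f) < n$, another bound of the form of Proposition 6.1 can be derived using the following implication of hypercontractivity [Bon70, Gro75]: when $f : \{-1,1\}^n \to \mathbb{R}$ has $\mathrm{Deg}(f) = k$, then $\|f\|_2 \le e^k \|f\|_1$ [O'D14, Theorem 9.22].

Proposition 6.5. If $f$ is $\rho$-SP and $\mathrm{Deg}(f) = k$ then
\[ \mathrm{Stab}_\rho[f] \ge e^{-k} \sqrt{\mathrm{Stab}_{\rho^2}[f]}. \tag{159} \]

Proof. As in the proof of Proposition 6.1, we lower bound
\[ \mathbb{E}\Bigl| \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S Y^S \Bigr| = \|T_\rho f\|_1 \tag{160} \]
\[ \overset{(a)}{\ge} e^{-k} \|T_\rho f\|_2 \tag{161} \]
\[ = e^{-k} \sqrt{\langle T_\rho f, T_\rho f \rangle} \tag{162} \]
\[ \overset{(b)}{=} e^{-k} \sqrt{\langle T_{\rho^2} f, f \rangle} \tag{163} \]
\[ = e^{-k} \sqrt{\mathrm{Stab}_{\rho^2}[f]}, \tag{164} \]
where (a) holds since $\mathrm{Deg}(f) = \mathrm{Deg}[T_\rho f] = k$, and (b) holds since $T_\rho$ is a self-adjoint operator.

The last proof implies that for a degree-$k$, $\rho$-SP function $f$,
\[ e^{-k} \sqrt{\mathrm{Stab}_{\rho^2}[f]} \le \mathrm{Stab}_\rho[f] \le \sqrt{\mathrm{Stab}_{\rho^2}[f]}. \tag{165} \]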
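The OR function of Example 6.2 gives a convenient test case for the stability condition of Proposition 6.1. The sketch below is a brute-force illustration, not the paper's procedure: it computes $T_\rho f(y^n) = \mathbb{E}(f(X^n) \mid Y^n = y^n)$ directly, declares $f$ to be $\rho$-SP when $f(y^n)\, T_\rho f(y^n) > 0$ at every point, and compares the outcome with the threshold $\rho \ge 2^{(n-1)/n} - 1$ and with the closed form $\mathrm{Stab}_\rho[\mathrm{OR}] = 1 - 2^{2-n}\bigl(1 - ((1+\rho)/2)^n\bigr)$. The encoding of OR (value $+1$ only on the all-$(+1)$ input) and the treatment of ties as failures are assumptions of this sketch.

```python
from itertools import product


def noisy_expectation(f, y, rho):
    """T_rho f(y) = E[f(X) | Y = y] for rho-correlated uniform X, Y on {-1,1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=len(y)):
        weight = 1.0
        for xi, yi in zip(x, y):
            weight *= (1 + rho * xi * yi) / 2   # Pr(X_i = x_i | Y_i = y_i)
        total += weight * f(x)
    return total


def is_self_predicting(f, n, rho):
    """True if f(y) = sgn(T_rho f(y)) at every y; a zero of T_rho f counts as a failure."""
    return all(
        f(y) * noisy_expectation(f, y, rho) > 0
        for y in product((-1, 1), repeat=n)
    )


def stability(f, n, rho):
    """Stab_rho[f] = E[f(Y) T_rho f(Y)]."""
    cube = list(product((-1, 1), repeat=n))
    return sum(f(y) * noisy_expectation(f, y, rho) for y in cube) / len(cube)


def or_function(x):
    # Assumed encoding: -1 means "true", so OR is +1 only when every input is +1.
    return 1 if all(xi == 1 for xi in x) else -1


n = 4
threshold = 2 ** ((n - 1) / n) - 1       # claimed self-prediction threshold for OR
mean_magnitude = 1 - 2 ** (1 - n)        # |f_hat_emptyset| for OR
above, below = threshold + 0.01, threshold - 0.01
print(is_self_predicting(or_function, n, above),
      is_self_predicting(or_function, n, below))
```

Just above the threshold the function is self-predicting and its stability exceeds $|\hat f_\emptyset|$; just below, both properties fail together, which is the sense in which the bound is tight for OR.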

It can be observed that even for a given degree $k$, neither of the bounds in Propositions 6.1 and 6.5 subsumes the other.

7. Sharp Threshold at High Correlation

As we have seen, all functions are $\rho$-SP when $\rho > 1 - \frac{2 \ln 2}{n} + O(n^{-2})$. In this section, we show that when the correlation is reduced ever so slightly, to $\rho$ below $1 - \frac{2}{n}$, the fraction of SP functions becomes double-exponentially small.

Theorem 7.1. For any $\alpha > 1$ and all $n$ sufficiently large, the fraction of $\rho$-SP functions for $\rho = 1 - \frac{2\alpha}{n}$ is at most $\exp\bigl( -2^{\,n \cdot h\left( \frac{\alpha - 1}{2\alpha} \right) + o(n)} \bigr)$.

The fact that $\rho$-SP functions are rare is not limited to the $\rho = 1 - O(\frac{1}{n})$ regime, yet different techniques are needed in order to establish this in other regimes. We next demonstrate how a similar phenomenon holds in a high-correlation regime where $\rho$ is fixed. Let $\eta_\delta$ be the minimal $\eta > 0$ such that
\[ \frac{1}{2} \log \frac{1}{\delta^2 + (1-\delta)^2} < \min\Bigl\{ \log \frac{1}{1-\delta},\; d(\eta \,\|\, \delta) \Bigr\} \tag{166} \]
holds, where $d(p \,\|\, q) := p \log \frac{p}{q} + (1-p) \log \frac{1-p}{1-q}$ is the binary divergence function. It can be verified that $\eta_\delta < 1/4$ for any $\delta < \delta_{\max}$.

Theorem 7.2. For any $\delta \in (0, \delta_{\max})$ and all $n$ sufficiently large, the fraction of $\rho$-SP functions for $\rho = 1 - 2\delta$ is at most $\exp\bigl( -2^{\,n[1 - h(2\eta_\delta)] - o(n)} \bigr)$.

We begin with the proof of Theorem 7.1.

Proof of Theorem 7.1. In this proof we find it more convenient to work with the $\delta$ and $\{0,1\}$ convention. The proof comprises two steps. First, we focus on a specific $y^n$ and derive a sufficient condition for a function not to be $\rho$-SP at $y^n$. This condition is specifically tailored to the regime of $\delta = \Theta(1/n)$, and depends only on local values of the function, up to Hamming distance $\log n$ from $y^n$. We show that for a random choice of function, the probability that this condition is satisfied decays exponentially, but we derive an upper bound on the exponential decay rate. In the second step, we recall that a function is $\rho$-SP only if it is $\rho$-SP at all $2^n$ points of the Hamming cube. This implies that the expected number of non-$\rho$-SP points is exponentially large. While there are statistical dependencies between different points in the Hamming cube, Janson's theorem [AS04, Theorem 8.1.1] and the fact that the condition in the first step is local allow us to prove that the probability that all points in the Hamming cube are $\rho$-SP is only double-exponentially small.
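The quantity $\eta_\delta$ used in Theorem 7.2 is defined only implicitly, through the divergence condition (166). The following sketch shows one way to evaluate it numerically. It rests on two assumptions that are not stated in the text: logarithms are taken to base 2 (matching the $2^{(\cdot)}$ exponents used in the proofs), and the search is restricted to $\eta > \delta$, where $d(\eta \,\|\, \delta)$ is increasing, so that bisection applies. The function names are hypothetical.

```python
from math import log2


def binary_divergence(p, q):
    """d(p||q) = p log(p/q) + (1-p) log((1-p)/(1-q)), with base-2 logarithms."""
    total = 0.0
    if p > 0:
        total += p * log2(p / q)
    if p < 1:
        total += (1 - p) * log2((1 - p) / (1 - q))
    return total


def lhs_166(delta):
    """Left-hand side of (166): (1/2) log(1 / (delta^2 + (1-delta)^2))."""
    return 0.5 * log2(1 / (delta ** 2 + (1 - delta) ** 2))


def eta_delta(delta, tol=1e-12):
    """Smallest eta in (delta, 1) with d(eta||delta) >= lhs_166(delta), by bisection.

    Restricting to eta > delta is an assumption of this sketch.
    """
    target = lhs_166(delta)
    # The first term of the min in (166) always exceeds the left-hand side.
    assert target < log2(1 / (1 - delta))
    lo, hi = delta, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_divergence(mid, delta) < target:
            lo = mid
        else:
            hi = mid
    return hi


delta = 0.01
eta = eta_delta(delta)
print(delta, eta)
```

For a small crossover probability such as $\delta = 0.01$, the computed value lies well below $1/4$, consistent with the claim that $\eta_\delta < 1/4$ for small enough $\delta$.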

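We begin with the first step. To this end, let us denote the shell of radius $d$ around $x^n \in \{0,1\}^n$ by
\[ S(x^n, d) := \{ \tilde x^n : d_H(x^n, \tilde x^n) = d \}. \tag{167} \]
For any function $f : \{0,1\}^n \to \{0,1\}$, let the $d$-shell bias of $f$ be
\[ \beta_{d,f}(x^n) := \frac{1}{|S(x^n, d)|} \sum_{\tilde x^n \in S(x^n, d)} f(\tilde x^n). \tag{168} \]
Fix $\eta > 0$ and some $y^n$. Without loss of generality, below we assume that $f(y^n) = 0$. Define the set of functions
\[ B_\eta(y^n, 1) = \{ f : \beta_{1,f}(y^n) \ge 1 - \eta \}, \tag{169} \]
and for $2 \le d \le l$, the sets
\[ B_\eta(y^n, d) = \{ f : \beta_{d,f}(y^n) \ge 1/2 \}, \tag{170} \]
where $l \ge 3$. We say that $y^n$ is bad for $f$ if $f \in B_\eta(y^n)$, where
\[ B_\eta(y^n) := \bigcap_{d=1}^{l} B_\eta(y^n, d). \tag{171} \]
Now, for any $n > l$, setting $\delta = \frac{\alpha}{n}$, for any $f \in B_\eta(y^n)$:
\[ \Pr( f(X^n) \ne f(y^n) \mid Y^n = y^n ) \ge \sum_{d=1}^{l} \binom{n}{d} \beta_{d,f}(y^n)\, \delta^d (1-\delta)^{n-d} \tag{172} \]
\[ \ge (1-\delta)^n \sum_{d=1}^{l} \binom{n}{d} \beta_{d,f}(y^n)\, \delta^d \tag{173} \]
\[ \ge \Bigl( 1 - \frac{\alpha}{n} \Bigr)^n \Bigl( 1 - \frac{l}{n} \Bigr)^l \sum_{d=1}^{l} \beta_{d,f}(y^n) \frac{\alpha^d}{d!} \tag{174} \]
\[ \ge \Bigl( 1 - \frac{\alpha}{n} \Bigr)^n \Bigl( 1 - \frac{l}{n} \Bigr)^l \Bigl( (1-\eta)\alpha + \frac{1}{2} \sum_{d=2}^{l} \frac{\alpha^d}{d!} \Bigr) \tag{175} \]
\[ = \Bigl( 1 - \frac{\alpha}{n} \Bigr)^n \Bigl( 1 - \frac{l}{n} \Bigr)^l \Bigl( (1-\eta)\alpha + \frac{1}{2} \Bigl( e^\alpha - 1 - \alpha - \sum_{d=l+1}^{\infty} \frac{\alpha^d}{d!} \Bigr) \Bigr). \tag{176} \]
Choosing $l = \log n$, (176) tends to
\[ \frac{1}{2} + \Bigl( \frac{1}{2} - \eta \Bigr) \alpha e^{-\alpha} - \frac{1}{2} e^{-\alpha} \tag{177} \]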

as $n \to \infty$. Let
\[ \eta_\alpha := \frac{\alpha - 1}{2\alpha}. \tag{178} \]
Clearly, $\eta_\alpha$ is monotonically increasing for $\alpha > 0$, where $\lim_{\alpha \to 1} \eta_\alpha = 0$ and $\lim_{\alpha \to \infty} \eta_\alpha = 1/2$. Setting $\eta \in (0, \eta_\alpha)$ guarantees that (177) is larger than $1/2$ for all large enough $n$. Hence, for such a choice,
\[ \Pr( f(X^n) \ne f(y^n) \mid Y^n = y^n ) > 1/2, \tag{179} \]
and so
\[ \{ f \in B_\eta(y^n) \} \subseteq \{ f \text{ is not } \rho\text{-SP at } y^n \}. \tag{180} \]
Let us now choose $f$ uniformly at random over all Boolean functions on $\{0,1\}^n$, and upper bound the probability $\Pr(f \in E)$ of the event $E$ defined in (187) below. This in turn will serve as an upper bound on the probability that the function we draw is $\rho$-SP for the aforementioned $\rho$. To this end, note that Chernoff's bound implies that
\[ \Pr( \beta_{1,f}(y^n) \ge 1 - \eta ) = 2^{-n(1 - h(\eta)) + o(n)}, \tag{181} \]
and symmetry implies that
\[ \Pr( \beta_{d,f}(y^n) > 1/2 ) = 1/2, \tag{182} \]
for $2 \le d \le l = \log n$. By independence,
\[ \Pr( B_\eta(y^n) ) = \prod_{d=1}^{\log n} \Pr( B_\eta(y^n, d) ) \tag{183} \]
\[ = 2^{-n(1 - h(\eta)) + o(n)} \cdot 2^{-(\log n - 1)} \tag{184} \]
\[ = 2^{-n(1 - h(\eta)) + o(n)}, \tag{185} \]
and so
\[ \Pr( f \text{ is not } \rho\text{-SP at } y^n ) \ge 2^{-n(1 - h(\eta)) + o(n)}. \tag{186} \]
This completes the first step of the proof.

For the second step, note that if $f$ is $\rho$-SP for $\rho = 1 - 2\alpha n^{-1}$ then it must be that $f$ has no bad inputs, i.e., that $f \in E$, where
\[ E := \bigcap_{y^n \in \{0,1\}^n} B_\eta^c(y^n). \tag{187} \]
The expected number of bad inputs is given by
\[ \mu := 2^n \Pr( B_\eta(y^n) ) = 2^{n h(\eta) + o(n)}. \tag{188} \]
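The role of $\eta_\alpha$ in the first step can be checked numerically. The sketch below evaluates the finite-$n$ lower bound in the form of (175) and compares it with the limiting expression $\frac{1}{2} + (\frac{1}{2} - \eta)\alpha e^{-\alpha} - \frac{1}{2}e^{-\alpha}$, which exceeds $1/2$ exactly when $\eta < \eta_\alpha = (\alpha - 1)/(2\alpha)$. Taking $l = \lfloor \ln n \rfloor$ with the natural logarithm is an assumption of this sketch, since the text only writes $l = \log n$; the function names are illustrative.

```python
from math import exp, factorial, floor, log


def finite_n_bound(n, alpha, eta):
    """Lower bound on Pr(f(X) != f(y) | Y = y) for a bad input, with delta = alpha / n.

    Uses l = max(3, floor(ln n)); the base of the logarithm is an assumption here.
    """
    l = max(3, floor(log(n)))
    partial_exp = sum(alpha ** d / factorial(d) for d in range(2, l + 1))
    prefactor = (1 - alpha / n) ** n * (1 - l / n) ** l
    return prefactor * ((1 - eta) * alpha + 0.5 * partial_exp)


def limiting_bound(alpha, eta):
    """Limit of the bound as n grows: 1/2 + (1/2 - eta) alpha e^-alpha - (1/2) e^-alpha."""
    return 0.5 + (0.5 - eta) * alpha * exp(-alpha) - 0.5 * exp(-alpha)


def eta_alpha(alpha):
    """Largest eta for which the limiting bound still reaches 1/2."""
    return (alpha - 1) / (2 * alpha)


alpha = 2.0
print(eta_alpha(alpha),
      limiting_bound(alpha, 0.2),
      finite_n_bound(10 ** 6, alpha, 0.2))
```

For $\alpha = 2$ the critical value is $\eta_\alpha = 1/4$: choosing $\eta$ below it pushes the limiting error probability above $1/2$, while choosing $\eta$ above it does not.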

Had the number of bad inputs been distributed according to a Poisson distribution with expected value $\mu$, then the probability of $E$ would have been given by
\[ \Pr(E) = e^{-\mu} = \exp\bigl( -2^{n h(\eta) + o(n)} \bigr). \tag{189} \]
However, the events $B_\eta(x^n)$ and $B_\eta(y^n)$ are dependent whenever $d_H(x^n, y^n) \le 2l$. Nonetheless, Janson's correction [AS04, Theorem 8.1.1] implies that
\[ \Pr(E) \le e^{-\mu + \frac{\Delta}{2}}, \tag{190} \]
where $\Delta$ is a correction term that depends on the joint probability of dependent bad events $\Pr( B_\eta(x^n) \cap B_\eta(y^n) )$. We next show that $\Delta \to 0$ as $n \to \infty$ exponentially fast, as long as $\eta \in (0, h^{-1}(1/2))$. As $\eta_{\max} \le h^{-1}(1/2)$, for any $\eta \in (0, \eta_{\max})$
\[ \Pr(E) \le \exp\bigl( -2^{n h(\eta) + o(n)} \bigr). \tag{191} \]
The statement of the theorem will then follow.

To complete the proof, it remains to show that $\Delta \to 0$ exponentially fast. Let us denote $x^n \sim y^n$ whenever the events $\{ f \in B_\eta(x^n) \}$ and $\{ f \in B_\eta(y^n) \}$ are statistically dependent. For brevity, below we write $B_\eta(y^n)$ to mean the corresponding event. The term $\Delta$ required for Janson's theorem is then given by
\[ \Delta := \sum_{x^n \sim y^n} \Pr( B_\eta(x^n) \cap B_\eta(y^n) ). \tag{192} \]
Let us analyze the probability in (192) under the assumption that $f(x^n) = f(y^n) = 0$. Bayes' rule implies that
\[ \Pr( B_\eta(x^n) \cap B_\eta(y^n) \mid f(x^n) = 0, f(y^n) = 0 ) = \Pr( B_\eta(x^n,1) \mid f(x^n) = 0, f(y^n) = 0 ) \]
\[ \quad \times \Pr( B_\eta(y^n,1) \mid f(x^n) = 0, f(y^n) = 0, B_\eta(x^n,1) ) \]
\[ \quad \times \Pr\Bigl( \bigcap_{d=2}^{l} B_\eta(x^n,d), \bigcap_{d=2}^{l} B_\eta(y^n,d) \Bigm| f(x^n) = 0, f(y^n) = 0, B_\eta(x^n,1), B_\eta(y^n,1) \Bigr). \tag{193} \]
For the first probability on the r.h.s. of (193), we note that if $d_H(x^n, y^n) \ge 2$ then $S(x^n,1) \cap \{x^n, y^n\} = \emptyset$ and (181) holds. Otherwise, if $d_H(x^n, y^n) = 1$ then $S(x^n,1) \cap \{x^n, y^n\} = \{y^n\}$. In that case,
\[ \Pr( B_\eta(x^n,1) \mid f(x^n) = 0, f(y^n) = 0 ) = \Pr( \beta_{1,f}(x^n) \ge 1 - \eta \mid f(x^n) = 0, f(y^n) = 0 ) \tag{194} \]

\[ = \Pr\Bigl( \frac{1}{n} \Bigl( \sum_{\tilde y^n \in S(x^n,1) \setminus \{y^n\}} f(\tilde y^n) + f(y^n) \Bigr) \ge 1 - \eta \Bigm| f(x^n) = 0, f(y^n) = 0 \Bigr) \tag{195} \]
\[ = \Pr\Bigl( \frac{1}{n-1} \sum_{\tilde y^n \in S(x^n,1) \setminus \{y^n\}} f(\tilde y^n) \ge \frac{n}{n-1} (1 - \eta) \Bigr) \tag{196} \]
\[ = 2^{-(n-1)\{ 1 - h(\eta + O(n^{-1})) \} + o(n)} \tag{197} \]
\[ = 2^{-n(1 - h(\eta)) + o(n)}, \tag{198} \]
where the last transition holds since $h(\eta)$ is a smooth function, with bounded derivatives in a neighborhood of any fixed $\eta \in (0,1)$.

For the second probability on the r.h.s. of (193), if $d_H(x^n, y^n) \ge 3$ then $S(y^n,1) \cap \{ \{x^n, y^n\} \cup S(x^n,1) \} = \emptyset$ and (181) holds. Next, if $d_H(x^n, y^n) = 1$ then $S(y^n,1) \cap \{ \{x^n, y^n\} \cup S(x^n,1) \} = \{x^n\}$. A derivation similar to (198) shows that
\[ \Pr( B_\eta(y^n,1) \mid f(x^n) = 0, f(y^n) = 0, B_\eta(x^n,1) ) = 2^{-n(1 - h(\eta)) + o(n)} \tag{199} \]
holds. If $d_H(x^n, y^n) = 2$ then $S(y^n,1) \cap \{ \{x^n, y^n\} \cup S(x^n,1) \}$ contains exactly two points. Again, a derivation similar to (198) (with $n - 2$ replacing $n - 1$) shows that (199) holds. The third probability on the r.h.s. of (193) can be trivially upper bounded by 1. Thus,
\[ \Pr( B_\eta(x^n) \cap B_\eta(y^n) \mid f(x^n) = 0, f(y^n) = 0 ) \le 2^{-2n(1 - h(\eta)) + o(n)}. \tag{200} \]
Evidently, an analogous analysis holds for all other three possibilities of the pair $(f(x^n), f(y^n))$, and so
\[ \Pr( B_\eta(x^n) \cap B_\eta(y^n) ) \le 2^{-2n(1 - h(\eta)) + o(n)}. \tag{201} \]
Since $x^n \sim y^n$ implies $d_H(x^n, y^n) \le 2l$, the number of dependent pairs is upper bounded by $2^n \binom{n}{2l}$. As $l = \log n$ was chosen, $\binom{n}{2l} \le n^{2 \log n} = 2^{2 \log^2 n}$. Then (201) implies that
\[ \Delta \le 2^{n + o(n)} \cdot 2^{-2n(1 - h(\eta)) + o(n)} \tag{202} \]
\[ = 2^{-n(1 - 2h(\eta)) + o(n)}. \tag{203} \]
Thus, $\Delta \to 0$ as $n \to \infty$ exponentially fast, as long as $\eta \in (0, h^{-1}(1/2))$, as was required to be proved.

We move on to the proof of Theorem 7.2.

Proof of Theorem 7.2. The proof again comprises two steps, in the spirit of Theorem 7.1, first focusing on a single point, and then on the entire Hamming cube. However, the arguments in each step are different. Specifically, in the first step we derive a

necessary condition for a function to be $\rho$-SP, which is tailored to the regime of a fixed $\delta$, and now based only on the value of the function at points of Hamming distance of (slightly less than) $\eta n$ with $\eta < 1/4$. We then use a central-limit theorem to show that the probability that this condition is satisfied is close to $1/2$. In the second step, rather than considering all $2^n$ points of the Hamming cube, we consider a subset of the Hamming cube of size about $2^{n[1 - h(2\eta)]}$, which satisfies that the Hamming distance between each two points in the subset is at least $\eta n$. The existence of this set is assured by the Gilbert-Varshamov bound [Rot06, Th. 4.10]. Since the points in this subset are sufficiently far apart, the event that the condition derived in the first step occurs for one point is independent of all other points. Thus, the probability of a function to be $\rho$-SP is about $2^{-2^{n[1 - h(2\eta)]}}$.

For the first step, let $\eta < 1/4$ be given, and let $B_\eta(y^n)$ be a punctured Hamming ball of relative radius $\eta$ around $y^n$, i.e.,
\[ B_\eta(y^n) := \Bigl\{ z^n \in \{-1,1\}^n : 0 < \frac{1}{n} d_H(z^n, y^n) \le \eta \Bigr\}. \tag{204} \]
Then, clearly
\[ | p(y^n \mid y^n) \cdot f(y^n) | \le p(y^n \mid y^n) = 2^{-n \log \frac{1}{1-\delta}}, \tag{205} \]
and by the Chernoff bound (or the method of types [CK11])
\[ \Bigl| \sum_{x^n \in B_\eta^c(y^n) \setminus y^n} p(x^n \mid y^n) \cdot f(x^n) \Bigr| \le \sum_{x^n \in B_\eta^c(y^n) \setminus y^n} p(x^n \mid y^n) \tag{206} \]
\[ = \Pr( d_H(X^n, y^n) > \eta n \mid Y^n = y^n ) \tag{207} \]
\[ \le 2^{-n d(\eta \| \delta) - \Theta(\log n)}. \tag{208} \]
Focusing on a given $y^n$, let us assume without loss of generality that $f(y^n) = 1$. Then,
\[ \mathbb{E}( f(X^n) \mid Y^n = y^n ) = \sum_{x^n} p(x^n \mid y^n) \cdot f(x^n) \tag{209} \]
\[ = p(y^n \mid y^n) \cdot f(y^n) + \sum_{x^n \in B_\eta(y^n)} p(x^n \mid y^n) \cdot f(x^n) + \sum_{x^n \in B_\eta^c(y^n) \setminus y^n} p(x^n \mid y^n) \cdot f(x^n) \tag{210} \]
\[ \le 2^{-n \log \frac{1}{1-\delta}} + \sum_{x^n \in B_\eta(y^n)} p(x^n \mid y^n) \cdot f(x^n) + 2^{-n d(\eta \| \delta) - \Theta(\log n)}, \tag{211} \]
and thus,
\[ \{ f \text{ is } \rho\text{-SP at } y^n \} \subseteq \Bigl\{ \sum_{x^n \in B_\eta(y^n)} p(x^n \mid y^n) \cdot f(x^n) \ge -\Bigl( 2^{-n \log \frac{1}{1-\delta}} + 2^{-n d(\eta \| \delta) - \Theta(\log n)} \Bigr) \Bigr\}. \tag{212} \]


More information

Optimal compression of approximate Euclidean distances

Optimal compression of approximate Euclidean distances Optimal compression of approximate Euclidean distances Noga Alon 1 Bo az Klartag 2 Abstract Let X be a set of n points of norm at most 1 in the Euclidean space R k, and suppose ε > 0. An ε-distance sketch

More information

Quantum algorithms (CO 781/CS 867/QIC 823, Winter 2013) Andrew Childs, University of Waterloo LECTURE 13: Query complexity and the polynomial method

Quantum algorithms (CO 781/CS 867/QIC 823, Winter 2013) Andrew Childs, University of Waterloo LECTURE 13: Query complexity and the polynomial method Quantum algorithms (CO 781/CS 867/QIC 823, Winter 2013) Andrew Childs, University of Waterloo LECTURE 13: Query complexity and the polynomial method So far, we have discussed several different kinds of

More information

Boolean Functions: Influence, threshold and noise

Boolean Functions: Influence, threshold and noise Boolean Functions: Influence, threshold and noise Einstein Institute of Mathematics Hebrew University of Jerusalem Based on recent joint works with Jean Bourgain, Jeff Kahn, Guy Kindler, Nathan Keller,

More information

Compatible Hamilton cycles in Dirac graphs

Compatible Hamilton cycles in Dirac graphs Compatible Hamilton cycles in Dirac graphs Michael Krivelevich Choongbum Lee Benny Sudakov Abstract A graph is Hamiltonian if it contains a cycle passing through every vertex exactly once. A celebrated

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

arxiv: v1 [cs.cc] 16 Mar 2017

arxiv: v1 [cs.cc] 16 Mar 2017 A Nearly Optimal Lower Bound on the Approximate Degree of AC 0 Mark Bun mbun@cs.princeton.edu Justin Thaler justin.thaler@georgetown.edu arxiv:1703.05784v1 [cs.cc] 16 Mar 2017 Abstract The approximate

More information

1 Basic Combinatorics

1 Basic Combinatorics 1 Basic Combinatorics 1.1 Sets and sequences Sets. A set is an unordered collection of distinct objects. The objects are called elements of the set. We use braces to denote a set, for example, the set

More information

Lecture 1 Measure concentration

Lecture 1 Measure concentration CSE 29: Learning Theory Fall 2006 Lecture Measure concentration Lecturer: Sanjoy Dasgupta Scribe: Nakul Verma, Aaron Arvey, and Paul Ruvolo. Concentration of measure: examples We start with some examples

More information

A Comparison of GAs Penalizing Infeasible Solutions and Repairing Infeasible Solutions on the 0-1 Knapsack Problem

A Comparison of GAs Penalizing Infeasible Solutions and Repairing Infeasible Solutions on the 0-1 Knapsack Problem A Comparison of GAs Penalizing Infeasible Solutions and Repairing Infeasible Solutions on the 0-1 Knapsack Problem Jun He 1, Yuren Zhou 2, and Xin Yao 3 1 J. He is with the Department of Computer Science,

More information

MATH 1A, Complete Lecture Notes. Fedor Duzhin

MATH 1A, Complete Lecture Notes. Fedor Duzhin MATH 1A, Complete Lecture Notes Fedor Duzhin 2007 Contents I Limit 6 1 Sets and Functions 7 1.1 Sets................................. 7 1.2 Functions.............................. 8 1.3 How to define a

More information

Lecture 3: Error Correcting Codes

Lecture 3: Error Correcting Codes CS 880: Pseudorandomness and Derandomization 1/30/2013 Lecture 3: Error Correcting Codes Instructors: Holger Dell and Dieter van Melkebeek Scribe: Xi Wu In this lecture we review some background on error

More information

EECS 750. Hypothesis Testing with Communication Constraints

EECS 750. Hypothesis Testing with Communication Constraints EECS 750 Hypothesis Testing with Communication Constraints Name: Dinesh Krithivasan Abstract In this report, we study a modification of the classical statistical problem of bivariate hypothesis testing.

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

How Low Can Approximate Degree and Quantum Query Complexity be for Total Boolean Functions?

How Low Can Approximate Degree and Quantum Query Complexity be for Total Boolean Functions? How Low Can Approximate Degree and Quantum Query Complexity be for Total Boolean Functions? Andris Ambainis Ronald de Wolf Abstract It has long been known that any Boolean function that depends on n input

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Error Correcting Codes Questions Pool

Error Correcting Codes Questions Pool Error Correcting Codes Questions Pool Amnon Ta-Shma and Dean Doron January 3, 018 General guidelines The questions fall into several categories: (Know). (Mandatory). (Bonus). Make sure you know how to

More information

Isomorphisms between pattern classes

Isomorphisms between pattern classes Journal of Combinatorics olume 0, Number 0, 1 8, 0000 Isomorphisms between pattern classes M. H. Albert, M. D. Atkinson and Anders Claesson Isomorphisms φ : A B between pattern classes are considered.

More information

Principle of Mathematical Induction

Principle of Mathematical Induction Advanced Calculus I. Math 451, Fall 2016, Prof. Vershynin Principle of Mathematical Induction 1. Prove that 1 + 2 + + n = 1 n(n + 1) for all n N. 2 2. Prove that 1 2 + 2 2 + + n 2 = 1 n(n + 1)(2n + 1)

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

SANDWICH GAMES. July 9, 2014

SANDWICH GAMES. July 9, 2014 SANDWICH GAMES EHUD LEHRER AND ROEE TEPER July 9, 204 Abstract. The extension of set functions (or capacities) in a concave fashion, namely a concavification, is an important issue in decision theory and

More information

CHAPTER 9. Embedding theorems

CHAPTER 9. Embedding theorems CHAPTER 9 Embedding theorems In this chapter we will describe a general method for attacking embedding problems. We will establish several results but, as the main final result, we state here the following:

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

Lecture 5: February 16, 2012

Lecture 5: February 16, 2012 COMS 6253: Advanced Computational Learning Theory Lecturer: Rocco Servedio Lecture 5: February 16, 2012 Spring 2012 Scribe: Igor Carboni Oliveira 1 Last time and today Previously: Finished first unit on

More information

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +

More information

Higher-order Fourier analysis of F n p and the complexity of systems of linear forms

Higher-order Fourier analysis of F n p and the complexity of systems of linear forms Higher-order Fourier analysis of F n p and the complexity of systems of linear forms Hamed Hatami School of Computer Science, McGill University, Montréal, Canada hatami@cs.mcgill.ca Shachar Lovett School

More information

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,

More information

Chapter 8. P-adic numbers. 8.1 Absolute values

Chapter 8. P-adic numbers. 8.1 Absolute values Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.

More information

Quantum boolean functions

Quantum boolean functions Quantum boolean functions Ashley Montanaro 1 and Tobias Osborne 2 1 Department of Computer Science 2 Department of Mathematics University of Bristol Royal Holloway, University of London Bristol, UK London,

More information

Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities

Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities Maxim Raginsky and Igal Sason ISIT 2013, Istanbul, Turkey Capacity-Achieving Channel Codes The set-up DMC

More information

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel Introduction to Coding Theory CMU: Spring 2010 Notes 3: Stochastic channels and noisy coding theorem bound January 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami We now turn to the basic

More information

Partial cubes: structures, characterizations, and constructions

Partial cubes: structures, characterizations, and constructions Partial cubes: structures, characterizations, and constructions Sergei Ovchinnikov San Francisco State University, Mathematics Department, 1600 Holloway Ave., San Francisco, CA 94132 Abstract Partial cubes

More information

Math 421, Homework #9 Solutions

Math 421, Homework #9 Solutions Math 41, Homework #9 Solutions (1) (a) A set E R n is said to be path connected if for any pair of points x E and y E there exists a continuous function γ : [0, 1] R n satisfying γ(0) = x, γ(1) = y, and

More information

4 Uniform convergence

4 Uniform convergence 4 Uniform convergence In the last few sections we have seen several functions which have been defined via series or integrals. We now want to develop tools that will allow us to show that these functions

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

NOTES FOR MAT 570, REAL ANALYSIS I, FALL Contents

NOTES FOR MAT 570, REAL ANALYSIS I, FALL Contents NOTES FOR MAT 570, REAL ANALYSIS I, FALL 2016 JACK SPIELBERG Contents Part 1. Metric spaces and continuity 1 1. Metric spaces 1 2. The topology of metric spaces 3 3. The Cantor set 6 4. Sequences 7 5.

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours

More information

POWER SERIES AND ANALYTIC CONTINUATION

POWER SERIES AND ANALYTIC CONTINUATION POWER SERIES AND ANALYTIC CONTINUATION 1. Analytic functions Definition 1.1. A function f : Ω C C is complex-analytic if for each z 0 Ω there exists a power series f z0 (z) := a n (z z 0 ) n which converges

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Support weight enumerators and coset weight distributions of isodual codes

Support weight enumerators and coset weight distributions of isodual codes Support weight enumerators and coset weight distributions of isodual codes Olgica Milenkovic Department of Electrical and Computer Engineering University of Colorado, Boulder March 31, 2003 Abstract In

More information

ECE Information theory Final (Fall 2008)

ECE Information theory Final (Fall 2008) ECE 776 - Information theory Final (Fall 2008) Q.1. (1 point) Consider the following bursty transmission scheme for a Gaussian channel with noise power N and average power constraint P (i.e., 1/n X n i=1

More information

P-adic Functions - Part 1

P-adic Functions - Part 1 P-adic Functions - Part 1 Nicolae Ciocan 22.11.2011 1 Locally constant functions Motivation: Another big difference between p-adic analysis and real analysis is the existence of nontrivial locally constant

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

An upper bound on l q norms of noisy functions

An upper bound on l q norms of noisy functions Electronic Collouium on Computational Complexity, Report No. 68 08 An upper bound on l norms of noisy functions Alex Samorodnitsky Abstract Let T ɛ, 0 ɛ /, be the noise operator acting on functions on

More information

6.842 Randomness and Computation April 2, Lecture 14

6.842 Randomness and Computation April 2, Lecture 14 6.84 Randomness and Computation April, 0 Lecture 4 Lecturer: Ronitt Rubinfeld Scribe: Aaron Sidford Review In the last class we saw an algorithm to learn a function where very little of the Fourier coeffecient

More information

Trace Class Operators and Lidskii s Theorem

Trace Class Operators and Lidskii s Theorem Trace Class Operators and Lidskii s Theorem Tom Phelan Semester 2 2009 1 Introduction The purpose of this paper is to provide the reader with a self-contained derivation of the celebrated Lidskii Trace

More information