arXiv: v1 [cs.DM] 12 Jan 2018


SELF-PREDICTING BOOLEAN FUNCTIONS

OFER SHAYEVITZ AND NIR WEINBERGER

Abstract. A Boolean function g is said to be an optimal predictor for another Boolean function f if it minimizes the probability that $f(X^n) \neq g(Y^n)$ among all functions. Here $X^n$ is uniform over the Hamming cube, and $Y^n$ is obtained from $X^n$ by independently flipping each coordinate with probability δ. This paper is about self-predicting functions, which are those that coincide with their optimal predictor.

1. Introduction

One of the most important properties of a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is its robustness to noise in its inputs. This robustness is traditionally measured by the noise sensitivity of the function

$$\mathrm{NS}_\delta[f] := \Pr\big(f(X^n) \neq f(Y^n)\big), \tag{1}$$

where $X^n \in \{-1,1\}^n$ is a uniform Bernoulli vector, and $Y^n \in \{-1,1\}^n$ is obtained from $X^n$ by flipping each coordinate independently with probability $0 < \delta < 1/2$. The noise sensitivity of Boolean functions has been extensively investigated in the theory of Boolean functions [O'D14]. It is most often studied through the equivalent notion of stability

$$\mathrm{Stab}_\rho[f] := \mathbb{E}\big(f(X^n) f(Y^n)\big), \tag{2}$$

where $0 < \rho < 1$ is the correlation parameter, i.e., $\rho = \mathbb{E} X_i Y_i = 1 - 2\delta$.

The noise sensitivity of f can also be interpreted as the error probability of a simple predictor. This predictor guesses the value of $f(X^n)$ from the noisy input $Y^n$ by just applying f to $Y^n$. While this predictor is intuitively appealing and easy to analyze, it is generally suboptimal. As a simple example, suppose f is biased and the noise level δ is sufficiently high. It is easy to see that a constant predictor would then achieve a lower error probability than $f(Y^n)$. The optimal predictor, i.e., the one that minimizes the error probability in predicting $f(X^n)$ from $Y^n$, is clearly given by the sign of $\mathbb{E}(f(X^n) \mid Y^n = y^n)$. In general, this function can be very different from f itself.
Using the optimal predictor is generally superior to using the function itself, although, as we shall see, by a factor of two at most. However, computing the optimal predictor is often very difficult, since it depends on the values of the function over the entire Hamming cube. It is therefore interesting to study functions that coincide with their optimal predictor; we call these functions self-predicting (SP). Note that a function can be SP at certain noise levels but not at others. We say that a function is uniformly SP (USP) if it is SP at every noise level.

Predicting the value of a USP function by applying it to noisy inputs is always optimal, which is clearly a desirable property. For example, suppose the function describes a voting rule and the noise represents possible contamination of the votes (e.g., due to fraud). In such a case it is not realistic to assume that the noise level is known. Yet if the function is USP, it can always be used to obtain the optimal prediction of the true voting result.

In this paper, we introduce and explore self-predictability of Boolean functions. We derive various properties of SP functions, and specifically the following:

- For a monotone function, self-predictability at its dominating boundary points is necessary and sufficient for the function to be SP. We use this fact to show that Majority functions are USP.
- High correlation SP: A function with Fourier degree k is SP for any $\rho > 1 - 1/k$. A polynomial threshold function with sparsity s is SP for any $\rho \ge \exp\big(-\frac{s}{n}\ln\frac{s}{s-1}\big)$. Also, if f is SP for $\rho > 1 - \varepsilon$ and $n = \Omega(1/\varepsilon)$, then each point $x^n$ has a neighbor at Hamming distance at most 2 with the same function value.
- A low correlation SP (abbreviated LCSP) function is spectral threshold, i.e., equal to the sign of its lowest nonzero Fourier level. This simple fact implies many properties:
  - LCSP functions are either balanced or constant.
  - They have energy at least 1/2 on their first level (if any).
  - A monotone LCSP function is $\sqrt{2/(\pi n)}$-close to a linear threshold function.
- Sharp threshold: All functions are trivially SP for $\rho > 1 - \frac{2\ln 2}{n} + O(n^{-2})$. In contrast, only a doubly-exponentially small fraction of functions are SP for $\rho = 1 - \frac{2\alpha}{n}$, for any $\alpha > 1$. The same continues to hold in the fixed high-correlation regime.

The paper is organized as follows. Section 2 contains basic notation and facts from Fourier analysis. Section 3 introduces the self-predictability problem and some basic properties, including the proof that Majority is USP. In Section 4 we discuss high-correlation sufficient conditions for SP. In Section 5 we discuss low-correlation SP functions. Section 6 contains stability-based necessary conditions for SP. In Section 7 we prove the sharp threshold phenomenon for the SP property. We conclude the paper in Section 8 with a list of open problems.

The authors are with the Department of EE Systems, Tel Aviv University, Tel Aviv, Israel. Emails: {nir.wein@gmail.com, ofersha@eng.tau.ac.il}. This work was supported by an ERC grant no

2. Preliminaries

2.1. Notation and Definitions. We use upper case letters for random variables and random vectors, and their lower case counterparts for specific realizations. For vectors we write $x_i^j = (x_i, \ldots, x_j)$, and we omit the subscript whenever $i = 1$. A concatenation of vectors is denoted by $(x_i^j, x_k^m) = (x_i, \ldots, x_j, x_k, \ldots, x_m)$. The cardinality of a set S is denoted by $|S|$. The complement of a set A is denoted by $A^c$. We write $[n]$ for the set $\{1, 2, \ldots, n\}$. The sign function $\operatorname{sgn}(z)$ returns the sign of z, and by convention $\operatorname{sgn}(0) = 1$ unless otherwise stated. Throughout, $\log(t)$ is the base-2 logarithm and $\ln(t)$ is the natural logarithm. The binary entropy function is $h(t) := -t\log(t) - (1-t)\log(1-t)$. The Hamming distance between $x^n$ and $y^n$ is $d_H(x^n, y^n)$.

In this paper, $X^n$ is a uniformly distributed binary vector. $Y^n$ is the binary vector obtained by independently flipping each coordinate of $X^n$ with some given probability $\delta \in [0, 1/2]$. We write $p(x^n, y^n)$ for the associated joint probability mass function. For the most part we find it convenient to use the binary alphabet $\{-1,1\}$. In that case it is more natural to work with the correlation parameter $\rho := \mathbb{E} X_i Y_i = 1 - 2\delta \in [0,1]$ than with the crossover probability δ. We use this notation throughout the paper. The exception is a few proofs where it is more convenient to work with δ and the binary alphabet $\{0,1\}$.

2.2. Boolean Functions and Fourier Analysis. In this paper we consider Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$. The distance between two Boolean functions f and g is the fraction of inputs on which they disagree, i.e., $\Pr(f(X^n) \neq g(X^n))$. We say that f and g are ε-close if their distance is at most ε. The inner product between two Boolean functions f and g is defined as

$$\langle f, g\rangle := \mathbb{E}\big(f(X^n) g(X^n)\big). \tag{3}$$

The character associated with a set of coordinates $S \subseteq [n]$ is the Boolean function $x_S := \prod_{i \in S} x_i$, where by convention $x_\emptyset = 1$.
It can be shown [O'D14, Chapter 1] that the set of all characters forms an orthonormal basis with respect to (w.r.t.) the inner product (3). Furthermore,

$$f(x^n) = \sum_{S \subseteq [n]} \hat f_S\, x_S, \tag{4}$$

where $\{\hat f_S\}_{S \subseteq [n]}$ are the Fourier coefficients of f, given by $\hat f_S = \langle x_S, f\rangle = \mathbb{E}(X_S f(X^n))$. When S is a singleton $\{i\} \subseteq [n]$, we use the shorthand $\hat f_i = \hat f_{\{i\}}$. The Fourier weight of f at degree k is

$$W^k[f] := \sum_{S \subseteq [n] :\, |S| = k} \hat f_S^2. \tag{5}$$
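The Fourier expansion (4) and the weights (5) can be made concrete for small n by enumerating the cube. The sketch below is not the authors' code. It computes every $\hat f_S = \mathbb{E}(X_S f(X^n))$ directly, so its cost is exponential in n. The helper names (`cube`, `fourier_coefficients`, `fourier_weight`, `evaluate`, `maj`) and the 0-based index convention are illustrative assumptions.

```python
import itertools
import math


def cube(n):
    """All points of {-1, 1}^n as tuples."""
    return list(itertools.product((-1, 1), repeat=n))


def fourier_coefficients(f, n):
    """Map each S (a tuple of 0-based indices) to hat f_S = E[X_S f(X^n)]."""
    points = cube(n)
    coeffs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            # math.prod over an empty S is 1, matching the convention x_emptyset = 1.
            total = sum(math.prod(x[i] for i in S) * f(x) for x in points)
            coeffs[S] = total / 2 ** n
    return coeffs


def fourier_weight(coeffs, k):
    """W^k[f] from (5): the sum of squared coefficients of degree k."""
    return sum(c * c for S, c in coeffs.items() if len(S) == k)


def evaluate(coeffs, x):
    """Evaluate the Fourier expansion (4) at the point x."""
    return sum(c * math.prod(x[i] for i in S) for S, c in coeffs.items())


def maj(x):
    """Majority of an odd number of +/-1 inputs."""
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    c = fourier_coefficients(maj, 3)
    print(c[(0,)], c[(0, 1, 2)])                        # 0.5 -0.5
    print(fourier_weight(c, 1), fourier_weight(c, 3))   # 0.75 0.25
```

For three-bit Majority this recovers the familiar expansion $\frac12(x_1+x_2+x_3) - \frac12 x_1x_2x_3$. The weights sum to 1, as orthonormality requires.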

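Instead of the noise sensitivity defined in (1), it is more common to consider the stability, defined as

$$\mathrm{Stab}_\rho[f] := \mathbb{E}\big(f(X^n) f(Y^n)\big). \tag{6}$$

Note that the noise sensitivity and the stability are trivially related via

$$\mathrm{Stab}_\rho[f] = 1 - 2\,\mathrm{NS}_{\frac{1-\rho}{2}}[f]. \tag{7}$$

Thus, the stability of a function is directly related to the error probability of the (possibly suboptimal) predictor $f(Y^n)$ of the function's true value $f(X^n)$. When $X^n$ and $Y^n$ are ρ-correlated, it is useful to define the noise operator

$$T_\rho f(y^n) := \mathbb{E}\big(f(X^n) \mid Y^n = y^n\big). \tag{8}$$

Since $\{(X_i, Y_i)\}$ is an i.i.d. sequence,

$$\begin{aligned}
T_\rho f(y^n) &= \mathbb{E}\Big(\sum_{S \subseteq [n]} \hat f_S X_S \,\Big|\, Y^n = y^n\Big) && (9)\\
&= \sum_{S \subseteq [n]} \hat f_S\, \mathbb{E}\big(X_S \mid Y^n = y^n\big) && (10)\\
&= \sum_{S \subseteq [n]} \hat f_S \prod_{i \in S} \mathbb{E}\big(X_i \mid Y^n = y^n\big) && (11)\\
&= \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S\, y_S, && (12)
\end{aligned}$$

where the last step uses $\mathbb{E}(X_i \mid Y_i = y_i) = \rho\, y_i$. The stability can then be expressed using the Fourier coefficients and the noise operator as

$$\begin{aligned}
\mathrm{Stab}_\rho[f] &= \mathbb{E}\big(\mathbb{E}(f(X^n) f(Y^n) \mid Y^n)\big) && (13)\\
&= \mathbb{E}\big(f(Y^n)\, \mathbb{E}(f(X^n) \mid Y^n)\big) && (14)\\
&= \mathbb{E}\big(f(Y^n)\, T_\rho f(Y^n)\big) && (15)\\
&= \langle f, T_\rho f\rangle && (16)\\
&\overset{(a)}{=} \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S^2 && (17)\\
&= \|T_{\sqrt{\rho}} f\|_2^2, && (18)
\end{aligned}$$

where (a) uses (12) together with Plancherel's identity $\langle f, g\rangle = \mathbb{E}(f(X^n) g(X^n)) = \sum_{S \subseteq [n]} \hat f_S \hat g_S$.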
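The identities (7), (12) and (17) can be checked numerically for a small function. The minimal sketch below (hypothetical helper names, not the authors' code) computes $T_\rho f$ in two ways. The first uses the conditional pmf $p(x^n \mid y^n) = \prod_i (1 + \rho x_i y_i)/2$ directly. The second uses the Fourier formula (12). The sketch then compares the stability with $\sum_S \rho^{|S|}\hat f_S^2$ and with $1 - 2\,\mathrm{NS}_{(1-\rho)/2}$.

```python
import itertools
import math


def cube(n):
    return list(itertools.product((-1, 1), repeat=n))


def cond_prob(x, y, rho):
    """p(x^n | y^n): each coordinate agrees w.p. (1 + rho)/2, disagrees w.p. (1 - rho)/2."""
    return math.prod((1 + rho * xi * yi) / 2 for xi, yi in zip(x, y))


def noise_operator(f, n, rho, y):
    """T_rho f(y) = E[f(X) | Y = y], computed from definition (8)."""
    return sum(cond_prob(x, y, rho) * f(x) for x in cube(n))


def noise_operator_fourier(f, n, rho, y):
    """T_rho f(y) = sum_S rho^{|S|} hat f_S y_S, as in (12)."""
    pts = cube(n)
    total = 0.0
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            coeff = sum(math.prod(x[i] for i in S) * f(x) for x in pts) / 2 ** n
            total += rho ** k * coeff * math.prod(y[i] for i in S)
    return total


def stability(f, n, rho):
    """Stab_rho[f] = E[f(Y) T_rho f(Y)], cf. (15); Y is uniform because X is."""
    return sum(f(y) * noise_operator(f, n, rho, y) for y in cube(n)) / 2 ** n


def noise_sensitivity(f, n, delta):
    """NS_delta[f] = Pr(f(X) != f(Y)) for crossover probability delta."""
    rho = 1 - 2 * delta
    total = 0.0
    for y in cube(n):
        for x in cube(n):
            if f(x) != f(y):
                total += cond_prob(x, y, rho)
    return total / 2 ** n


def maj(x):
    return 1 if sum(x) > 0 else -1
```

For three-bit Majority, $W^1 = 3/4$ and $W^3 = 1/4$, so (17) predicts $\mathrm{Stab}_\rho = \frac34\rho + \frac14\rho^3$. At $\rho = 0.6$ this equals 0.504.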

A Boolean function f is called a linear threshold function (LTF) if there exist coefficients $a_0^n \in \mathbb{R}^{n+1}$ such that

$$f(x^n) = \operatorname{sgn}\Big(a_0 + \sum_{i=1}^n a_i x_i\Big). \tag{19}$$

Note that if $a_0 = 0$ then f is balanced, i.e., $\Pr(f(X^n) = 1) = 1/2$. More generally, f is a polynomial threshold function (PTF) [Bru90] of degree k if there exist coefficients $\{\hat p_S\}$ such that $\max_{S :\, \hat p_S \neq 0} |S| = k$ and

$$f(x^n) = \operatorname{sgn}\Big(\sum_{S \subseteq [n]} \hat p_S\, x_S\Big). \tag{20}$$

A PTF has sparsity s if $\{\hat p_S\}$ is supported on exactly s terms. For LTFs and PTFs, we always assume that the coefficients are chosen so that the polynomial inside the sign operator is never exactly zero.

3. Optimal Prediction and Self-Predicting (SP) Functions

Let $f : \{-1,1\}^n \to \{-1,1\}$ be some Boolean function. Suppose $Y^n = y^n$ has been observed. It is easy to see that the optimal predictor of $f(X^n)$, i.e., the one minimizing the error probability, is simply

$$\operatorname{sgn} \mathbb{E}\big(f(X^n) \mid Y^n = y^n\big) = \operatorname{sgn} T_\rho f(y^n). \tag{21}$$

According to our definition $\operatorname{sgn}(0) = 1$, but ties can of course be broken in any other way. We say that f is ρ-self-predicting (ρ-SP) at $y^n$ if, at correlation level ρ, the optimal predictor given $y^n$ coincides with the function itself whenever it is not tied. That is,

$$f(y^n) = \operatorname{sgn} T_\rho f(y^n) \quad \text{whenever } T_\rho f(y^n) \neq 0. \tag{22}$$

The function f is called ρ-SP if it is ρ-SP at every $y^n \in \{-1,1\}^n$. We say that f is uniformly self-predicting (USP) if it is ρ-SP for every $\rho \in [0,1]$. We say that f is low-correlation self-predicting (LCSP) if there exists some $\rho^* > 0$ such that f is ρ-SP for all $\rho \in [0, \rho^*)$. The following fact follows easily from the definition.

Proposition 3.1. All the characters are USP.

Proof. Let $f(x^n) = x_S$ for some $S \subseteq [n]$. Then for any $y^n$,

$$\begin{aligned}
\operatorname{sgn} T_\rho f(y^n) &= \operatorname{sgn}\big(\rho^{|S|} y_S\big) && (23)\\
&= \operatorname{sgn}(y_S) && (24)
\end{aligned}$$

$$= f(y^n). \tag{25}$$

We will later see that there are other USP functions besides the characters.

How far can a function be from self-predicting? We say that a function f is ε-close to ρ-SP if f and its optimal predictor $\operatorname{sgn} T_\rho f$ are ε-close.

Lemma 3.2. Any function f is $\sum_{S \subseteq [n]} (1 - \rho^{|S|})\, \hat f_S^2$-close to ρ-SP.

Proof. Let $A \subseteq \{-1,1\}^n$ be the set of all $y^n$ at which f is not ρ-SP. Hence for any $y^n \in A$ we must have $f(y^n)\, T_\rho f(y^n) < 0$. Recalling that $|T_\rho f(y^n)| \le 1$, we have

$$\mathbb{E}\big(f(Y^n)\, T_\rho f(Y^n)\big) \le \Pr(Y^n \notin A). \tag{26}$$

On the other hand, it also holds that

$$\mathbb{E}\big(f(Y^n)\, T_\rho f(Y^n)\big) = \sum_{S \subseteq [n]} \rho^{|S|} \hat f_S^2. \tag{27}$$

The proof now follows by recalling that $\sum_S \hat f_S^2 = 1$.

For any n, there are functions that depend on all n variables (even balanced ones) whose distance from their optimal predictor exceeds some universal constant. The problem with this measure of closeness to SP is the following. In many cases, the optimal predictor differs from the function only on inputs that are very noisy, i.e., inputs where the posterior probability of the function value is close to uniform. Thus, a more practically motivated way of quantifying closeness to SP is through noise sensitivity and stability. Define the strong noise sensitivity of a function f to be

$$\mathrm{NS}^*_\delta[f] := \Pr\big(f(X^n) \neq \operatorname{sgn} T_\rho f(Y^n)\big), \tag{28}$$

and the associated strong stability as

$$\mathrm{Stab}^*_\rho[f] := \mathbb{E}\big(f(X^n) \operatorname{sgn} T_\rho f(Y^n)\big). \tag{29}$$

Just as for the regular noise sensitivity and stability, we have the trivial connection

$$\mathrm{Stab}^*_\rho[f] = 1 - 2\,\mathrm{NS}^*_{\frac{1-\rho}{2}}[f], \tag{30}$$

and we can express the strong stability in terms of the noise operator:

$$\begin{aligned}
\mathrm{Stab}^*_\rho[f] &= \mathbb{E}\big(\mathbb{E}(f(X^n) \operatorname{sgn} T_\rho f(Y^n) \mid Y^n)\big) && (31)\\
&= \mathbb{E}\big(T_\rho f(Y^n) \operatorname{sgn} T_\rho f(Y^n)\big) && (32)\\
&= \mathbb{E}\,|T_\rho f(Y^n)| && (33)
\end{aligned}$$

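$$= \|T_\rho f\|_1. \tag{34}$$

Thus the 1-norm of $T_\rho f$ can be interpreted in terms of the error probability of the optimal predictor for f. Since the optimal predictor $\operatorname{sgn} T_\rho f$ can only do better than f itself, we immediately have:

Proposition 3.3. For any function f and any ρ,

$$\|T_{\sqrt{\rho}} f\|_2^2 \le \|T_\rho f\|_1, \tag{35}$$

with equality if and only if f is ρ-SP.

The strong stability can also be upper bounded by a regular stability expression.

Proposition 3.4. $\mathrm{Stab}_\rho[f] \le \mathrm{Stab}^*_\rho[f] \le \sqrt{\mathrm{Stab}_{\rho^2}[f]}$.

Proof. Write

$$\begin{aligned}
\mathrm{Stab}^*_\rho[f] &= \langle T_\rho f, \operatorname{sgn} T_\rho f\rangle && (36)\\
&\overset{(a)}{\le} \|T_\rho f\|_2\, \|\operatorname{sgn} T_\rho f\|_2 && (37)\\
&= \sqrt{\langle T_\rho f, T_\rho f\rangle} && (38)\\
&\overset{(b)}{=} \sqrt{\langle T_{\rho^2} f, f\rangle} && (39)\\
&= \sqrt{\mathrm{Stab}_{\rho^2}[f]}, && (40)
\end{aligned}$$

where (a) is by the Cauchy-Schwarz inequality, and (b) holds since $T_\rho$ is a self-adjoint operator.

An immediate consequence of the above is:

Corollary 3.5. With $\delta = (1-\rho)/2$, the strong noise sensitivity satisfies

$$\frac{1 - \sqrt{\mathrm{Stab}_{\rho^2}[f]}}{1 - \mathrm{Stab}_\rho[f]}\, \mathrm{NS}_\delta[f] \le \mathrm{NS}^*_\delta[f] \le \mathrm{NS}_\delta[f]. \tag{41}$$

Note that this bound is tight for the characters, which again shows that they are USP. We can easily derive the following weaker statements:

Corollary 3.6. For any f,

$$\tfrac{1}{2}\,\mathrm{NS}_\delta[f] \le \mathrm{NS}^*_\delta[f] \le \mathrm{NS}_\delta[f]. \tag{42}$$

If f is balanced, then

$$\tfrac{1}{1+\rho}\,\mathrm{NS}_\delta[f] \le \mathrm{NS}^*_\delta[f] \le \mathrm{NS}_\delta[f]. \tag{43}$$

We may obtain improved bounds for low correlation values: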

Proposition 3.7. Suppose $W^1[f] > 0$. Then

$$\max\Big\{1, \frac{1}{\sqrt{2W^1[f]}}\Big\} + O(\rho^2) \le \frac{\mathrm{Stab}^*_\rho[f]}{\mathrm{Stab}_\rho[f]} \le \frac{1}{\sqrt{W^1[f]}} + O(\rho^2). \tag{44}$$

Proof. We have that

$$\begin{aligned}
\mathrm{Stab}^*_\rho[f] &= \mathbb{E}\,|T_\rho f(Y^n)| && (45)\\
&= \mathbb{E}\,\Big|\sum_{i=1}^n \rho\, \hat f_i Y_i\Big| + O(\rho^2). && (46)
\end{aligned}$$

Khintchine's inequality [Haa81] then implies

$$\frac{1}{\sqrt{2}}\sqrt{W^1[f]}\,\rho + O(\rho^2) \le \mathrm{Stab}^*_\rho[f] \le \sqrt{W^1[f]}\,\rho + O(\rho^2), \tag{47}$$

and the result follows from [O'D14, Proposition 2.51]:

$$\mathrm{Stab}_\rho[f] = W^1[f]\,\rho + O(\rho^2). \tag{48}$$

3.1. Majority is USP. The Majority function (for odd n) is given by

$$\mathrm{Maj}(x^n) := \operatorname{sgn}\Big(\sum_{i \in [n]} x_i\Big). \tag{49}$$

In this subsection we show the following:

Theorem 3.8. Majority is USP.

We use the natural partial order over $\mathbb{R}^k$, where $y^k \preceq z^k$ if and only if $y_i \le z_i$ for all coordinates i. We write $\prec$ when, in addition, the inequality is strict in at least one coordinate. We say that a function f is monotone on a set of coordinates $S \subseteq [n]$ if $f(y^n) \le f(z^n)$ whenever both $y_S \preceq z_S$ and $y_{[n]\setminus S} = z_{[n]\setminus S}$. A function that is monotone on $[n]$ is simply called monotone.

Lemma 3.9. Let $f : \{-1,1\}^n \to \{-1,1\}$ be monotone on $S \subseteq [n]$, and suppose $f(y^n) = 1$. Let $z^n$ satisfy $y_S \preceq z_S$ and $y_{[n]\setminus S} = z_{[n]\setminus S}$. Then if f is ρ-SP at $y^n$, it is also ρ-SP at $z^n$.

As usual, analogous statements immediately hold when the direction of monotonicity is determined separately for each coordinate.

Proof. We prove the statement for a singleton S, say $S = \{n\}$. The general case then follows by applying the same argument repeatedly. If $y_n = 1$ the claim is trivial. Assume $y_n = -1$, and let $z^n$ agree with $y^n$ except on the nth coordinate. Due to monotonicity

we have that $f(z^n) = 1$. Then

$$\begin{aligned}
T_\rho f(z^n) &= \sum_{x^n} p(x^n \mid z^n)\, f(x^n) && (50)\\
&= \sum_{x^{n-1}} \sum_{x_n} p(x^{n-1} \mid y^{n-1})\, p(x_n \mid 1)\, f(x^n) && (51)\\
&= \sum_{x^{n-1}} p(x^{n-1} \mid y^{n-1}) \big[\delta f(x^{n-1}, -1) + (1-\delta) f(x^{n-1}, 1)\big] && (52)\\
&\overset{(a)}{\ge} \sum_{x^{n-1}} p(x^{n-1} \mid y^{n-1}) \big[(1-\delta) f(x^{n-1}, -1) + \delta f(x^{n-1}, 1)\big] && (53)\\
&= T_\rho f(y^n) && (54)\\
&\overset{(b)}{\ge} 0, && (55)
\end{aligned}$$

where (a) holds since f is monotone on the nth coordinate and $\delta \le 1/2$. Step (b) holds by the assumptions that $f(y^n) = 1$ and that f is ρ-SP at $y^n$.

Recall that $x^n$ is called a boundary point of f if the value $f(x^n)$ can be changed by flipping a single coordinate of $x^n$. We further say that $x^n$ is a dominating boundary point of f in one of two cases. Either $f(x^n) = 1$ and $f(y^n) = -1$ for every $y^n \prec x^n$. Or $f(x^n) = -1$ and $f(y^n) = 1$ for every $y^n$ with $x^n \prec y^n$. The following corollary follows easily from Lemma 3.9.

Corollary 3.10. A monotone function is ρ-SP if and only if it is ρ-SP at all its dominating boundary points.

Proof of Theorem 3.8. By Corollary 3.10 it suffices to check only the dominating boundary points. For Majority, these are exactly the points $y^n$ with $\sum_{i=1}^n y_i = \pm 1$. Before proceeding, note the following about the immediate neighborhood of such a point (say, Hamming distance one or two). There, more neighbors disagree with $y^n$ on the value of the function than agree with it. Due to oddness and symmetry, it suffices to check a single such point. We take the concatenation of $\frac{n-1}{2}$ minus ones followed by $\frac{n+1}{2}$ ones, e.g., $(-1,-1,1,1,1)$ for $n = 5$. Let $y^n$ be that point, and note that $y^{n-1}$ is balanced, i.e., $\sum_{i=1}^{n-1} y_i = 0$. For each $x^{n-1}$, define a conjugate vector $\tilde x^{n-1}$ as follows: flip all the bits of $x^{n-1}$, then apply a cyclic shift of $\frac{n-1}{2}$ symbols. For example, $x^{n-1} = (1,-1,-1,-1)$ gives $\tilde x^{n-1} = (1,1,-1,1)$. Let $A_0$, $A_+$ and $A_-$ be the sets of balanced, positive-sum, and negative-sum vectors in $\{-1,1\}^{n-1}$, respectively. Conjugation is a bijective mapping from $A_+$ to $A_-$, and it satisfies $d_H(x^{n-1}, y^{n-1}) = d_H(\tilde x^{n-1}, y^{n-1})$. Hence also $p(x^{n-1} \mid y^{n-1}) = p(\tilde x^{n-1} \mid y^{n-1})$.
Hence,

$$T_\rho \mathrm{Maj}(y^n) = \sum_{x^n} p(x^n \mid y^n)\, \mathrm{Maj}(x^n) \tag{56}$$

$$\begin{aligned}
&= \sum_{x^{n-1}} \sum_{x_n} p(x^{n-1} \mid y^{n-1})\, p(x_n \mid 1)\, \mathrm{Maj}(x^n) && (57)\\
&= \sum_{x^{n-1}} p(x^{n-1} \mid y^{n-1}) \big[(1-\delta)\,\mathrm{Maj}(x^{n-1}, 1) + \delta\, \mathrm{Maj}(x^{n-1}, -1)\big] && (58)\\
&\overset{(a)}{=} (1-2\delta) \sum_{x^{n-1} \in A_0} p(x^{n-1} \mid y^{n-1}) + \sum_{x^{n-1} \in A_+} p(x^{n-1} \mid y^{n-1}) - \sum_{x^{n-1} \in A_-} p(x^{n-1} \mid y^{n-1}) && (59)\\
&\overset{(b)}{=} (1-2\delta) \Pr\big(X^{n-1} \in A_0 \mid Y^{n-1} = y^{n-1}\big) && (60)\\
&\ge 0. && (61)
\end{aligned}$$

Here (a) holds since $\mathrm{Maj}(x^{n-1}, x) = x$ for any $x^{n-1} \in A_0$, whereas $\mathrm{Maj}(x^{n-1}, x) = -\mathrm{Maj}(\tilde x^{n-1}, x) = 1$ for any $x^{n-1} \in A_+$. Step (b) follows from the properties of the conjugation mapping. The inequality in (61) is strict for any $\delta \in [0, 1/2)$, so Majority is SP at $y^n$ at every noise level, which concludes the proof.

Majority and the characters are not the only USP functions, and Majority is not even the only USP LTF:

Example 3.11. Each of the following balanced LTFs can be verified by direct computation to be USP:
- $n = 5$ with coefficients $a_1^5 = (1,1,3,3,5)$;
- $n = 7$ with coefficients $a_1^7 = (1,1,3,3,3,5,7)$;
- $n = 9$ with coefficients $a_1^9 = (1,1,3,3,3,5,5,5,7)$;
- $n = 11$ with coefficients $a_1^{11} = (1,1,3,3,3,3,5,5,5,7,7)$.

3.2. SP/USP Preserving Operators. Let us now discuss several operations that preserve self-predictability. First, self-predictability is invariant to negation of inputs. We write $\odot$ for the Hadamard (entrywise) product.

Proposition 3.12. Let $a^n \in \{-1,1\}^n$. Then $f(x^n)$ is ρ-SP if and only if $f(a^n \odot x^n)$ is ρ-SP.

The straightforward proof is omitted. Next, we consider the case of separable functions.

Proposition 3.13. Let $f(x^n) = g(x_1^k) \cdot h(x_{k+1}^n)$. Then f is ρ-SP if and only if both g and h are ρ-SP.

Proof. If g and h are both ρ-SP, then for any $y^n$,

$$\begin{aligned}
\operatorname{sgn} T_\rho f(y^n) &= \operatorname{sgn} T_\rho\big(g(y^k) \cdot h(y_{k+1}^n)\big) && (62)\\
&= \operatorname{sgn}\big(T_\rho g(y^k) \cdot T_\rho h(y_{k+1}^n)\big) && (63)\\
&= g(y^k) \cdot h(y_{k+1}^n) && (64)
\end{aligned}$$

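$$= f(y^n). \tag{65}$$

Conversely, suppose that f is ρ-SP. Lemma 3.2 implies in particular that there is at least one point $y_{k+1}^n$ at which h is ρ-SP. Without loss of generality, assume that $h(y_{k+1}^n) = 1$. Then for any $y^k$,

$$\begin{aligned}
\operatorname{sgn} T_\rho g(y^k) &= \operatorname{sgn} T_\rho g(y^k) \cdot \operatorname{sgn} T_\rho h(y_{k+1}^n) && (66)\\
&= \operatorname{sgn}\big(T_\rho g(y^k) \cdot T_\rho h(y_{k+1}^n)\big) && (67)\\
&= \operatorname{sgn} T_\rho f(y^n) && (68)\\
&= f(y^n) && (69)\\
&= g(y^k) \cdot h(y_{k+1}^n) && (70)\\
&= g(y^k). && (71)
\end{aligned}$$

Hence g (and, symmetrically, also h) is ρ-SP. Note that Proposition 3.1 also follows as a simple corollary of Proposition 3.13.

Next, we consider functions of equal-size disjoint characters.

Proposition 3.14. Let $\{S_l \subseteq [n]\}_{l \in [m]}$ be disjoint subsets of equal size $|S_l| = w$, and let $f : \{-1,1\}^m \to \{-1,1\}$ be $\rho^w$-SP. Then $f(x_{S_1}, x_{S_2}, \ldots, x_{S_m})$ is ρ-SP.

Proof. It is easy to check that the Fourier coefficients of $h(x^n) = f(x_{S_1}, x_{S_2}, \ldots, x_{S_m})$ are given by

$$\hat h_S = \begin{cases} \hat f_T, & S = \bigcup_{t \in T} S_t, \\ 0, & \text{otherwise}. \end{cases} \tag{72}$$

Hence,

$$\begin{aligned}
\operatorname{sgn} T_\rho h(y^n) &= \operatorname{sgn} \sum_{S \subseteq [n]} \rho^{|S|}\, \hat h_S\, y_S && (73)\\
&= \operatorname{sgn} \sum_{T \subseteq [m]} \rho^{w|T|}\, \hat h_{\cup_{t \in T} S_t} \prod_{t \in T} y_{S_t} && (74)\\
&= \operatorname{sgn} \sum_{T \subseteq [m]} (\rho^w)^{|T|}\, \hat f_T \prod_{t \in T} y_{S_t} && (75)\\
&= \operatorname{sgn} T_{\rho^w} f(y_{S_1}, y_{S_2}, \ldots, y_{S_m}) && (76)\\
&= f(y_{S_1}, y_{S_2}, \ldots, y_{S_m}) && (77)\\
&= h(y^n). && (78)
\end{aligned}$$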
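For small n, the results of this section can be sanity-checked by brute force. The sketch below is illustrative and is not the authors' code; its helper names are hypothetical. It tests definition (22) on a finite grid of ρ values for three cases. The first is a character (Proposition 3.1). The second is Majority with $n = 5$ (Theorem 3.8). The third is a function built as in Proposition 3.14, namely $\mathrm{Maj}(x_1x_2, x_3x_4, x_5x_6)$. The sketch also checks the equality case of Proposition 3.3, comparing $\mathrm{Stab}_\rho$ with $\mathrm{Stab}^*_\rho = \|T_\rho f\|_1$. Two numerical assumptions are involved: the finite ρ grid and a small tie tolerance. This is therefore a check, not a proof.

```python
import itertools
import math


def cube(n):
    return list(itertools.product((-1, 1), repeat=n))


def cond_prob(x, y, rho):
    """p(x^n | y^n) for rho-correlated uniform vectors."""
    return math.prod((1 + rho * xi * yi) / 2 for xi, yi in zip(x, y))


def noise_operator(f, n, rho, y):
    """T_rho f(y) = E[f(X) | Y = y]."""
    return sum(cond_prob(x, y, rho) * f(x) for x in cube(n))


def is_sp(f, n, rho, tol=1e-12):
    """Definition (22): f(y) = sgn T_rho f(y) wherever T_rho f(y) != 0.
    Values with |T_rho f(y)| <= tol are treated as ties (a numerical assumption)."""
    for y in cube(n):
        t = noise_operator(f, n, rho, y)
        if abs(t) > tol and (1 if t > 0 else -1) != f(y):
            return False
    return True


def strong_stability(f, n, rho):
    """Stab*_rho[f] = ||T_rho f||_1, cf. (31)-(34)."""
    return sum(abs(noise_operator(f, n, rho, y)) for y in cube(n)) / 2 ** n


def stability(f, n, rho):
    """Stab_rho[f] = E[f(Y) T_rho f(Y)]."""
    return sum(f(y) * noise_operator(f, n, rho, y) for y in cube(n)) / 2 ** n


def maj(x):
    return 1 if sum(x) > 0 else -1


def composed(x):
    """Hypothetical example for Proposition 3.14: Maj of three disjoint width-2 characters."""
    return maj((x[0] * x[1], x[2] * x[3], x[4] * x[5]))


def and3(x):
    """+1 only on the all-ones input: a heavily biased function."""
    return 1 if all(v == 1 for v in x) else -1


RHOS = [k / 10 for k in range(1, 10)]
```

Take the biased function `and3` at $\rho = 0.1$. There, $T_\rho f$ is negative at every point, so the constant predictor $-1$ beats $f(Y^n)$. This is the situation described in the Introduction. Accordingly, `is_sp(and3, 3, 0.1)` is False and the inequality in Proposition 3.3 is strict.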

Example 3.15. Since the characters and Majority are USP functions, Propositions 3.12, 3.13 and 3.14 let us construct many distinct USP functions. For example, the function

$$\operatorname{sgn}\big((x_1x_2 + x_3x_4 + x_5x_6)\,(x_7x_8x_9 - x_{10}x_{11}x_{12} - x_{13}x_{14}x_{15})\,x_{16}\big) \tag{79}$$

is USP. Nonetheless, some USP functions cannot be constructed from characters and Majority in this way. For example, the USP LTFs of Example 3.11 cannot be obtained like this.

We note in passing that several seemingly plausible properties do not hold in general:

Example 3.16. The optimal predictor of a balanced function need not be balanced. For example, the function

$$f(x^4) = \tfrac{1}{4}\big(2x_1 + x_3 - 2x_1x_2 + x_1x_3 + x_2x_3 - x_3x_4 + x_1x_2x_3 + x_1x_3x_4 - x_2x_3x_4 + x_1x_2x_3x_4\big)$$

is balanced, yet $\operatorname{sgn} T_\rho f$ is not balanced when $\rho = 1/2$.

Example 3.17. In the following sections we explore functions that are SP for high or low correlation. However, self-predictability is not necessarily monotone in ρ: a function that is $\rho_0$-SP might fail to be ρ-SP for some $\rho > \rho_0$. Indeed, some functions exhibit such irregular behavior. For example, consider the balanced LTF with $n = 11$ and coefficients

$$a_1^{11} = (13, 43, 67, 67, 67, 117, 153, 165, 165, 179, 179). \tag{80}$$

It can be verified by direct computation to be ρ-SP only for $\rho \in [0, 0.312] \cup (0.544, 1]$.

4. High Correlation Sufficient Conditions

In this section, we derive sufficient conditions for a function to be SP using various arguments. All our conditions are high-correlation ones, i.e., they guarantee SP for all ρ above some threshold $\rho_0$.

Proposition 4.1. Any function is ρ-SP for $\rho > 2^{(n-1)/n} - 1$, and there is no better universal guarantee.

Proof. This range corresponds to crossover probabilities $\delta \in [0, 1 - 2^{-1/n})$. In this range, the probability $(1-\delta)^n$ that no bit was flipped exceeds 1/2. The bound is achieved with equality by the OR function $\mathrm{OR}(x^n)$. To see this, note that the OR

function is monotone and symmetric, with two types of dominating boundary points. The first is the all-ones sequence $1^n$. In this case

$$T_\rho \mathrm{OR}(1^n) = (1-\delta)^n \cdot 1 + \big[1 - (1-\delta)^n\big] \cdot (-1), \tag{81}$$

which is non-negative if and only if $\delta \in [0, 1 - 2^{-1/n}]$. The second type is $y^n = (1^{n-1}, -1)$ or any permutation of it, in which case

$$\begin{aligned}
T_\rho \mathrm{OR}(y^n) &= \delta(1-\delta)^{n-1} \cdot 1 + \big(1 - \delta(1-\delta)^{n-1}\big) \cdot (-1) && (82)\\
&= 2\delta(1-\delta)^{n-1} - 1 && (83)\\
&\le \tfrac{1}{2}(1-\delta)^{n-2} - 1 < 0 && (84)
\end{aligned}$$

for any $\delta \in [0, 1/2)$.

Our next goal is to obtain improved sufficient conditions using specific properties of the function. The extremal behavior of the OR function noted above might seem to stem from the fact that it is extremely unbalanced. It is therefore natural to ask whether Proposition 4.1 would change if we restricted ourselves to balanced functions. As it turns out, the answer is no.

Proposition 4.2. Any balanced function f is ρ-SP for $\rho > 1 - \frac{2\ln(2)}{n} + O(n^{-2})$, and there is no better universal guarantee.

Proof. The above region is essentially the same as the one in Proposition 4.1, hence one direction is clear. It remains to exhibit a balanced function that is not ρ-SP for any ρ outside this region. To that end, we introduce the enlightened dictator (E-Dict) function, defined for $n \ge 3$ by

$$\text{E-Dict}(x^n) := \operatorname{sgn}\Big((n-2)\,x_1 + \sum_{i=2}^n x_i\Big). \tag{85}$$

Evidently, $\text{E-Dict}(x^n)$ is determined by the dictator $x_1$, unless all the subjects $x_2, \ldots, x_n$ disagree with it. It is easy to verify that E-Dict is a monotone, odd (and hence balanced) function. By Lemma 3.9, we need only check its dominating boundary points to establish self-predictability. Due to oddness, it suffices to check the dominating boundary points with $\text{E-Dict}(y^n) = 1$. There are two types of such points. The first is $y^n = (-1, 1^{n-1})$. The function is SP at this $y^n$ if and only if

$$\Pr\big(\text{E-Dict}(X^n) = 1 \mid Y^n = y^n\big) = (1-\delta)^n + \delta(1 - \delta^{n-1}) \ge 1/2. \tag{86}$$

The second derivative of the left-hand side (l.h.s.) above is $n(n-1)\big((1-\delta)^{n-2} - \delta^{n-2}\big)$. This is non-negative for $\delta \in [0, 1/2]$, so the l.h.s. is convex on this interval. It is easy to check that equality in (86) holds for $\delta = \frac{\ln(2)}{n-1} - O(n^{-2})$ and for $\delta = 1/2$,

hence by convexity f is δ-SP at this $y^n$ if and only if $\delta < \frac{\ln(2)}{n-1} - O(n^{-2})$. Equivalently, $\rho > 1 - \frac{2\ln(2)}{n-1} + O(n^{-2})$. The second type of dominating boundary point has the form $y^n = (1, 1, (-1)^{n-2})$, or any other permutation of the subjects. For this $y^n$ we have

$$\begin{aligned}
\Pr\big(\text{E-Dict}(X^n) = 1 \mid Y^n = y^n\big) &= \delta^{n-1}(1-\delta) + (1-\delta)\big(1 - \delta(1-\delta)^{n-2}\big) && (87)\\
&= (1-\delta)\big[1 - \delta\big((1-\delta)^{n-2} - \delta^{n-2}\big)\big] && (88)\\
&\ge (1-\delta)\big[1 - \delta(1-2\delta)\big], && (89)
\end{aligned}$$

where the inequality follows since $(1-\delta)^{n-2} - \delta^{n-2} \le 1 - 2\delta$ for $\delta \in [0, 1/2]$ and any $n \ge 3$. It is easy to check that (89) is strictly decreasing in $\delta \in [0, 1/2]$ and equals 1/2 at $\delta = 1/2$. Hence the function is SP at this $y^n$ at every noise level. We conclude that E-Dict is ρ-SP if and only if $\rho > 1 - \frac{2\ln(2)}{n-1} + O(n^{-2})$, which completes the proof.

4.1. Bounded Degree. Next, we give a stronger statement that uses the degree $\mathrm{Deg}(f)$ of the function, i.e., the maximal degree of a character appearing in the Fourier representation of f.

Theorem 4.3. Any function f is ρ-SP for $\rho \ge 1 - \frac{1}{\mathrm{Deg}(f)}$.

Proof. Fix any $y^n$ and regard $T_\rho f(y^n)$ as a polynomial in ρ. Let $\rho_0$ be the largest root of this polynomial in $[0,1]$ if there is one, and $\rho_0 = 0$ otherwise. At $\rho = 1$, $T_\rho f(y^n)$ equals $f(y^n) \in \{-1,1\}$. Hence, by continuity, f is ρ-SP at $y^n$ for any $\rho \ge \rho_0$. Let us now upper bound $\rho_0$ for any $y^n$ in terms of $\mathrm{Deg}(f)$. To that end, recall Bernstein's inequality [RS02]: for any polynomial $Q(z)$ of degree k,

$$\max_{|z| \le 1} \Big|\frac{dQ(z)}{dz}\Big| \le k \max_{|z| \le 1} |Q(z)|. \tag{90}$$

Note that $|T_\rho f(y^n)| \le 1$ for any $\rho \in (0,1]$. Moreover, the degree in ρ of $T_\rho f$ equals the Fourier degree $\mathrm{Deg}(f)$ of f. Therefore

$$\max_{\rho \in [0,1]} \Big|\frac{d}{d\rho} T_\rho f(y^n)\Big| \le \mathrm{Deg}(f), \tag{91}$$

and the claim follows.

Theorem 4.3 significantly improves on Proposition 4.1 whenever $\mathrm{Deg}(f) \ll n$. This happens, e.g., for n-dimensional functions f that can be computed by a decision tree of depth k, in which case $\mathrm{Deg}(f) \le k$ [O'D14, Proposition 3.16].

4.2. Sparse PTFs. Next, we derive a sufficient condition that applies to PTFs of a given sparsity.
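As a numerical aside on Proposition 4.2, the E-Dict threshold can be checked by brute force for a small n. The sketch below is illustrative only, and its helper names are hypothetical. It finds the root of (86) with equality by bisection for $n = 7$. It then checks definition (22) over the whole cube, slightly below and slightly above that crossover probability. The bisection bracket $[0, 1/4]$ is an assumption that holds for this particular n; it is not a general recipe. For such small n the root is noticeably larger than the asymptotic value $\ln(2)/(n-1)$.

```python
import itertools
import math


def cube(n):
    return list(itertools.product((-1, 1), repeat=n))


def cond_prob(x, y, delta):
    """p(x^n | y^n) with crossover probability delta per coordinate."""
    return math.prod((1 - delta) if xi == yi else delta for xi, yi in zip(x, y))


def t_op(f, n, delta, y):
    """T_rho f(y) with rho = 1 - 2*delta."""
    return sum(cond_prob(x, y, delta) * f(x) for x in cube(n))


def is_sp(f, n, delta, tol=1e-12):
    """Definition (22), treating |T| <= tol as a tie."""
    for y in cube(n):
        t = t_op(f, n, delta, y)
        if abs(t) > tol and (1 if t > 0 else -1) != f(y):
            return False
    return True


def e_dict(x):
    """Enlightened dictator (85); the weighted sum is always odd, hence never zero."""
    n = len(x)
    return 1 if (n - 2) * x[0] + sum(x[1:]) > 0 else -1


def type1_prob(delta, n):
    """Left-hand side of (86): Pr(E-Dict(X) = 1 | Y = (-1, 1, ..., 1))."""
    return (1 - delta) ** n + delta * (1 - delta ** (n - 1))


def critical_delta(n, lo=0.0, hi=0.25, iters=60):
    """Bisection for equality in (86); assumes the bracket [lo, hi] straddles the root."""
    assert type1_prob(lo, n) > 0.5 > type1_prob(hi, n)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if type1_prob(mid, n) >= 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is enough here because, as shown in the proof, the l.h.s. of (86) is convex on $[0, 1/2]$ and crosses 1/2 only once below $\delta = 1/2$.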

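Theorem 4.4. Let f be a PTF of sparsity s with character widths $\{w_j\}_{j=1}^s$. Then f is ρ-SP for all $\rho \ge \rho_0$, where $\rho_0$ is the (unique) solution to

$$\sum_{j=1}^s \rho^{w_j} = s - 1. \tag{92}$$

Proof. Let $\zeta_w$ denote the probability that the value of a character of width $w \in [n]$ is flipped over the noisy channel, i.e.,

$$\begin{aligned}
\zeta_w &:= \Pr\Big(\prod_{l=1}^w X_l \neq \prod_{l=1}^w Y_l\Big) && (93)\\
&= \frac{1}{2}\Big(1 - \mathrm{Stab}_\rho\Big[\prod_{l=1}^w X_l\Big]\Big) && (94)\\
&= \frac{1 - \rho^w}{2}. && (95)
\end{aligned}$$

Also, let $\{\hat p_j\}_{j=1}^s$ and $\{S_j\}_{j=1}^s$ denote the coefficients and the character sets corresponding to the widths $\{w_j\}_{j=1}^s$, respectively. Assume without loss of generality that $f(y^n) = 1$. Then $T_\rho f(y^n)$ can be expanded as follows:

$$\begin{aligned}
T_\rho f(y^n) &= \mathbb{E}\Big(\operatorname{sgn}\Big(\sum_{j=1}^s \hat p_j X_{S_j}\Big) \,\Big|\, Y^n = y^n\Big) && (96)\\
&= \Pr\big(X_{S_1} = y_{S_1} \mid Y^n = y^n\big)\; \mathbb{E}\Big(\operatorname{sgn}\Big(\hat p_1 y_{S_1} + \sum_{j=2}^s \hat p_j X_{S_j}\Big) \,\Big|\, X_{S_1} = y_{S_1},\, Y^n = y^n\Big)\\
&\quad + \Pr\big(X_{S_1} \neq y_{S_1} \mid Y^n = y^n\big)\; \mathbb{E}\Big(\operatorname{sgn}\Big(-\hat p_1 y_{S_1} + \sum_{j=2}^s \hat p_j X_{S_j}\Big) \,\Big|\, X_{S_1} \neq y_{S_1},\, Y^n = y^n\Big). && (97)
\end{aligned}$$

The absolute value of the second addend above is at most $\Pr(X_{S_1} \neq y_{S_1} \mid Y^n = y^n)$. We can therefore add and subtract a suitable term in the second addend, which yields

$$\begin{aligned}
T_\rho f(y^n) &= \mathbb{E}\Big(\operatorname{sgn}\Big(\sum_{j=1}^s \hat p_j X_{S_j}\Big) \,\Big|\, Y^n = y^n\Big) && (98)\\
&\ge \mathbb{E}\Big(\operatorname{sgn}\Big(\hat p_1 y_{S_1} + \sum_{j=2}^s \hat p_j X_{S_j}\Big) \,\Big|\, Y^n = y^n\Big) - 2\Pr\big(X_{S_1} \neq y_{S_1} \mid Y^n = y^n\big) && (99)\\
&= \mathbb{E}\Big(\operatorname{sgn}\Big(\hat p_1 y_{S_1} + \sum_{j=2}^s \hat p_j X_{S_j}\Big) \,\Big|\, Y^n = y^n\Big) - 2\zeta_{w_1}. && (100)
\end{aligned}$$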

Continuing to eliminate terms in this manner, we obtain

$$\begin{aligned}
T_\rho f(y^n) &= \mathbb{E}\Big(\operatorname{sgn}\Big(\sum_{j=1}^s \hat p_j X_{S_j}\Big) \,\Big|\, Y^n = y^n\Big) && (101)\\
&\ge \operatorname{sgn}\Big(\sum_{j=1}^s \hat p_j y_{S_j}\Big) - 2\sum_{j=1}^s \zeta_{w_j} && (102)\\
&= 1 - \sum_{j=1}^s \big(1 - \rho^{w_j}\big). && (103)
\end{aligned}$$

Thus, f is ρ-SP at $y^n$ for any ρ satisfying $\sum_{j=1}^s \rho^{w_j} \ge s - 1$. The derivation for the case $f(y^n) = -1$ is similar. The claim now follows since $\sum_{j=1}^s \rho^{w_j}$ is monotonically increasing in ρ.

The theorem is useful for moderate values of n. Using the convexity of $\rho^t$, the following is easily verified:

Corollary 4.5. Let f be a PTF of sparsity s. Then f is ρ-SP for all

$$\rho \ge \exp\Big(-\frac{s}{n}\ln\frac{s}{s-1}\Big). \tag{104}$$

A simple generalization of Theorem 4.4 is as follows.

Corollary 4.6. The function $f(x^n) = \operatorname{sgn} \sum_{j=1}^s f_j(x^n)$ is ρ-SP for any $\rho > \rho_0$, where $\rho_0$ is the (unique) solution to

$$\frac{1}{s}\sum_{j=1}^s \mathrm{Stab}_\rho[f_j] = 1 - \frac{1}{s}. \tag{105}$$

4.3. Friendly Neighbors. Given a function f, we say that a point $x^n$ has a radius-d friendly neighborhood w.r.t. f if some other point $y^n$ within distance d agrees with it. That is, there is a $y^n$ with $0 < d_H(x^n, y^n) \le d$ and $f(x^n) = f(y^n)$.

Proposition 4.7. Suppose f is ρ-SP for all $\rho > 1 - \varepsilon$, and $n > \max\{2\varepsilon^{-1}, \gamma\}$, where γ is a universal constant. Then each point in $\{-1,1\}^n$ has a radius-2 friendly neighborhood w.r.t. f.

Proof. Suppose toward contradiction that some $y^n$ disagrees with all its neighbors at Hamming distance 1 and 2. This implies that

$$\begin{aligned}
\Pr\big(f(X^n) \neq f(y^n) \mid Y^n = y^n\big) &\ge \binom{n}{1}\delta(1-\delta)^{n-1} + \binom{n}{2}\delta^2(1-\delta)^{n-2} && (106)\\
&= (1-\delta)^{n-2}\, n\delta\Big((1-\delta) + \frac{(n-1)\,\delta}{2}\Big). && (107)
\end{aligned}$$

Choose $\delta = \frac{\alpha}{n}$, and assume that $n > \frac{2\alpha}{\varepsilon}$ so that we are in the SP region. This yields

$$\begin{aligned}
\Pr\big(f(X^n) \neq f(y^n) \mid Y^n = y^n\big) &\ge \Big(1 - \frac{\alpha}{n}\Big)^{n-2} \alpha\Big(1 + \frac{\alpha}{2} - \frac{3\alpha}{2n}\Big) && (108)\\
&= \Big(1 - \frac{\alpha}{n}\Big)^{n-2}\Big(\alpha + \frac{\alpha^2}{2}\Big) - O\Big(\frac{1}{n}\Big) && (109)\\
&= e^{-\alpha}\Big(\alpha + \frac{\alpha^2}{2}\Big) - O\Big(\frac{1}{n}\Big). && (110)
\end{aligned}$$

One can check that, e.g., for $\alpha = 1$, $\big(\alpha + \frac{\alpha^2}{2}\big)e^{-\alpha} > 1/2$. Hence f cannot be SP if n is larger than some universal constant, which is a contradiction.

Hence, suppose a function is SP even slightly above the guaranteed high-correlation threshold $\rho > 1 - \frac{2\ln(2)}{n} + O(n^{-2})$. Then every point must admit a radius-2 friendly neighborhood. The OR function, for example, does not satisfy this property. Furthermore, this result is tight. The largest character $x_{[n]} = \prod_{i=1}^n x_i$ is USP, yet no distance-1 neighbor of any point agrees with it.

The following corollary is obtained by combining Theorem 4.3 and Proposition 4.7. It is not directly related to self-predictability.

Corollary 4.8. If $\mathrm{Deg}(f) < n/2$ and n is larger than a universal constant, then each point in $\{-1,1\}^n$ has a radius-2 friendly neighborhood w.r.t. f.

5. Low Correlation Self-Predicting (LCSP) Functions

In this section we discuss LCSP functions, i.e., functions that are ρ-SP for all $\rho < \rho^*$, for some $\rho^* > 0$. Any USP function is trivially also LCSP. Hence all our necessary conditions for LCSP apply verbatim to USP functions.

5.1. LCSP and Spectral Threshold Functions. Let the minimal level of a function f be defined as

$$\mathrm{Lev}(f) := \min\big\{k \in \{0, 1, \ldots, n\} : W^k[f] > 0\big\}, \tag{111}$$

and let

$$f_{\mathrm{Lev}}(x^n) := \sum_{S :\, |S| = \mathrm{Lev}(f)} \hat f_S\, x_S. \tag{112}$$

We say that f is weakly spectral threshold (WST) if $f_{\mathrm{Lev}}(x^n)\, f(x^n) \ge 0$ for all $x^n$, i.e., the signs of both functions agree whenever $f_{\mathrm{Lev}} \neq 0$. We say that f is strongly spectral threshold (SST) if it is WST and $f_{\mathrm{Lev}}$ never vanishes. For an LTF f, the Fourier coefficients $(\hat f_\emptyset, \hat f_1, \ldots, \hat f_n)$ are known as the Chow parameters [Cho61, Tan61]. In this case, SST functions are exactly the LTFs for which

the solution to the Chow parameters problem [OS11] is exactly the Chow parameters themselves.

Proposition 5.1. SST implies LCSP. Conversely, LCSP implies WST.

Proof. The optimal predictor for f satisfies

$$\begin{aligned}
\operatorname{sgn} T_\rho f(x^n) &= \operatorname{sgn}\Big(\rho^{\mathrm{Lev}(f)} \sum_{S :\, |S| \ge \mathrm{Lev}(f)} \rho^{|S| - \mathrm{Lev}(f)}\, \hat f_S\, x_S\Big) && (113)\\
&= \operatorname{sgn}\big(f_{\mathrm{Lev}}(x^n) + O(\rho)\big). && (114)
\end{aligned}$$

Thus, whenever $f_{\mathrm{Lev}}(x^n) \neq 0$, we have $\operatorname{sgn} T_\rho f(x^n) = \operatorname{sgn} f_{\mathrm{Lev}}(x^n)$ for all small enough ρ. If f is SST, then $f_{\mathrm{Lev}}(x^n)$ never vanishes. Hence $f(x^n) = \operatorname{sgn} f_{\mathrm{Lev}}(x^n) = \operatorname{sgn} T_\rho f(x^n)$, which implies LCSP. Conversely, if f is LCSP, then $f(x^n) = \operatorname{sgn} T_\rho f(x^n) = \operatorname{sgn} f_{\mathrm{Lev}}(x^n)$ wherever $f_{\mathrm{Lev}}$ does not vanish, which implies WST.

An immediate consequence of Proposition 5.1 is:

Corollary 5.2. An LCSP function is either balanced or constant.

Proof. Suppose f is LCSP and unbalanced. Then $\mathrm{Lev}(f) = 0$ and $\hat f_\emptyset \neq 0$, and by Proposition 5.1 f must be WST. Hence $f = \operatorname{sgn} \hat f_\emptyset \in \{-1,1\}$ must be constant.

It is interesting to note that, in light of Proposition 5.1, Proposition 3.7 immediately implies the following.

Corollary 5.3. Let f be an LCSP function. Then either $W^1[f] = 0$ or $W^1[f] \ge 1/2$.

This result is very similar to the fact that $W^1[f] \ge 1/2$ for LTFs [O'D14, Theorem 5.2]. Note, however, that the above claim also covers LCSP functions that are not LTFs but have energy on the first level.

Next, recall that by Proposition 3.3, a function is ρ-SP if and only if $\|T_\rho f\|_1 = \|T_{\sqrt{\rho}} f\|_2^2$. A similar property holds for $f_{\mathrm{Lev}}$ if the function is LCSP.

Corollary 5.4. If f is LCSP, then $\|f_{\mathrm{Lev}}\|_1 = \|f_{\mathrm{Lev}}\|_2^2$.

Proof. By Proposition 5.1, f must be WST, and Plancherel's identity implies that

$$\begin{aligned}
\mathbb{E}\,|f_{\mathrm{Lev}}(X^n)| &= \mathbb{E}\big(f_{\mathrm{Lev}}(X^n)\, f(X^n)\big) && (115)\\
&= \langle f_{\mathrm{Lev}}, f\rangle && (116)\\
&= \sum_{S :\, |S| = \mathrm{Lev}(f)} \hat f_S^2 && (117)\\
&= \mathbb{E}\big(f_{\mathrm{Lev}}^2(X^n)\big). && (118)
\end{aligned}$$
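The WST and SST conditions, and Corollary 5.4, can be checked exactly for small functions using rational arithmetic. The sketch below uses hypothetical helper names and is not the authors' procedure. It computes $\mathrm{Lev}(f)$ and $f_{\mathrm{Lev}}$ as in (111)-(112) and reports whether f is WST or SST. It also counts the inputs where $f_{\mathrm{Lev}}$ vanishes. The `ltf` helper assumes that the weighted sum is never zero, in line with the convention for LTFs. The test uses the LTF with coefficients $(2,1,1,1)$ discussed in Example 5.5 below.

```python
import itertools
import math
from fractions import Fraction


def cube(n):
    return list(itertools.product((-1, 1), repeat=n))


def chi(x, S):
    """The character x_S (empty product = 1)."""
    return math.prod(x[i] for i in S)


def fourier(f, n):
    """Exact Fourier coefficients hat f_S as Fractions, keyed by tuples of 0-based indices."""
    pts = cube(n)
    return {S: Fraction(sum(chi(x, S) * f(x) for x in pts), 2 ** n)
            for k in range(n + 1) for S in itertools.combinations(range(n), k)}


def lev(coeffs):
    """Lev(f) from (111): the smallest degree carrying nonzero Fourier weight."""
    return min(len(S) for S, c in coeffs.items() if c != 0)


def f_lev(coeffs, x):
    """f_Lev(x) from (112)."""
    k = lev(coeffs)
    return sum(c * chi(x, S) for S, c in coeffs.items() if len(S) == k)


def spectral_threshold_report(f, n):
    """Return (is_WST, is_SST, number of inputs where f_Lev vanishes)."""
    coeffs = fourier(f, n)
    values = [(f(x), f_lev(coeffs, x)) for x in cube(n)]
    wst = all(fx * v >= 0 for fx, v in values)
    zeros = sum(1 for _, v in values if v == 0)
    return wst, wst and zeros == 0, zeros


def ltf(weights):
    """sgn(sum_i a_i x_i) with a_0 = 0; assumes the sum is never zero."""
    return lambda x: 1 if sum(a * xi for a, xi in zip(weights, x)) > 0 else -1
```

For $(2,1,1,1)$ the level-1 coefficients come out as $(3/4, 1/4, 1/4, 1/4)$, and $f_{\mathrm{Lev}}$ vanishes on exactly two inputs. So this function is WST but not SST, whereas five-bit Majority is SST.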

The following two examples show that the distinction between WST and SST in Proposition 5.1 is necessary.

Example 5.5 (LCSP does not imply SST). Consider the balanced LTF with $n = 4$ and coefficients $a_1^4 = (2,1,1,1)$. This is a Majority function with a tie-breaking input. Direct computation shows that this function is USP, hence also LCSP. However, its level-1 Fourier coefficients are $(\frac{3}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$. Hence, while it is clearly WST, it is not SST, since $f_{\mathrm{Lev}}(x^n) = 0$ for 2 inputs.

Example 5.6 (WST does not imply LCSP). The balanced LTF with $n = 9$ and coefficients $a_1^9 = (1,5,16,19,25,58,68,91,94)$ can be verified to be WST, but not LCSP. It is ρ-SP only for ρ >

The following example shows that the SST property is limited to the low-correlation regime.

Example 5.7 (SST does not imply USP). The LTF of Example 3.17 is SST, but, as shown there, it is not USP. Thus, while an SST function is always LCSP, it is not necessarily USP.

We note in passing that there are SST and WST functions other than Majority that are USP.

Example 5.8. The LTF in Example 3.11 is SST and USP. The balanced LTF with $n = 9$ and coefficients $a_1^9 = (1,1,1,3,3,3,5,5,7)$ is WST and USP ($f_{\mathrm{Lev}} = 0$ for 30 inputs), but not SST.

Next, using Proposition 5.1, we show that the two largest coefficients of an LCSP LTF cannot be too far apart.

Proposition 5.9. Let f be an LTF that depends on all its n variables. In some representation of f, let a and b be its largest and second-largest coefficients in absolute value, respectively. If f is LCSP, then $\frac{a}{b} < \sqrt{n \ln n} + 1$.

Proof. Assume without loss of generality that $a_1 \ge a_2 \ge \cdots \ge a_n > 0$. Recall also that $a_0 = 0$ by Corollary 5.2. The level-1 Fourier coefficients are given by

$$\begin{aligned}
\hat f_k &= \mathbb{E}\big(X_k f(X^n)\big) && (119)\\
&= \mathbb{E}\Big(X_k \operatorname{sgn}\Big(\sum_{i=1}^n a_i X_i\Big)\Big) && (120)\\
&= \mathbb{E}\,\operatorname{sgn}\Big(a_k + \sum_{i \neq k} a_i X_i\Big) && (121)
\end{aligned}$$

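\begin{align}
&= 2\Pr\Big(\sum_{i\neq k} a_i X_i \ge -a_k\Big) - 1 \tag{122}\\
&= \Pr\Big(\sum_{i\neq k} a_i X_i \ge -a_k\Big) + \Pr\Big(\sum_{i\neq k} a_i X_i \le a_k\Big) - 1 \tag{123}\\
&= \Pr\Big(\Big|\sum_{i\neq k} a_i X_i\Big| \le a_k\Big), \tag{124}
\end{align}
where (121) holds since $(X_kX_i)_{i\neq k}$ has the same distribution as $(X_i)_{i\neq k}$, and (123) uses the symmetry of $\sum_{i\neq k} a_iX_i$.

Assume without loss of generality that $a_2 = 1$, and write $a := a_1$. For brevity, also write $Z := \sum_{i=3}^n a_i X_i$ and $X := X_1$. Then, from the symmetry of $Z$,
\begin{align}
\hat{f}_1 &= \Pr(|X_2 + Z| \le a) \tag{125}\\
&= \Pr(|1 + Z| \le a) \tag{126}\\
&\ge \Pr(|Z| < a-1), \tag{127}
\end{align}
and
\begin{align}
\hat{f}_2 &= \Pr(|aX + Z| \le 1) \tag{128}\\
&\le \Pr(a-1 \le |Z| \le a+1) \tag{129}\\
&\le \Pr(|Z| \ge a-1). \tag{130}
\end{align}
Hence,
$$\frac{\hat{f}_1}{\hat{f}_2} \ge \frac{1-\Pr(|Z| \ge a-1)}{\Pr(|Z| \ge a-1)}. \tag{131}$$
Since $|a_i| \le 1$ for $3 \le i \le n$, and assuming toward contradiction that $a \ge \sqrt{n\ln n}+1$, Hoeffding's inequality implies that
$$\Pr(|Z| \ge a-1) < 1/n, \tag{132}$$
and so $\hat{f}_1/\hat{f}_2 > n-1$. Noting that $a_i \ge a_j$ implies $\hat{f}_i \ge \hat{f}_j$, we also have that $\hat{f}_1/\hat{f}_i \ge n-1+\varepsilon$ for any $i > 1$, for $\varepsilon > 0$ small enough. Since $f$ is LCSP, it is WST by Proposition 5.1, i.e., $f(x^n) = \operatorname{sgn} \sum_{i=1}^n \hat{f}_i x_i$ whenever the right-hand side (r.h.s.) is nonzero. For these ratios of coefficients, however, $\hat{f}_1 > \sum_{i>1} \hat{f}_i$, so the r.h.s. never vanishes and equals $x_1$. Hence $f(x^n) = x_1$, in contradiction to the assumption that $f$ depends on all the variables.

For example, the enlightened dictator function $\text{E-Dict}(\cdot)$ of (85) has first-to-second coefficient ratio $n-2$, and thus cannot be LCSP. It should be noted, however, that $\text{E-Dict}(\cdot)$ can also be written as an LTF with coefficients $(\sqrt{n}, 1, c, c, \ldots, c)$, where $c = \frac{\sqrt{n}-1+\varepsilon}{n-2}$ for some $\varepsilon > 0$. When given in this form, Proposition 5.9 is incapable of ruling it out from being SP. Nonetheless, it is easy to verify that LTFs of coefficients $(c, 1, 1, \ldots, 1)$ for $c < n-2$ must have first-to-second-coefficient ratio of $\Omega(n)$.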
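Identity (124) and the claims of Example 5.5 lend themselves to a brute-force check. The sketch below (our own helper names; it assumes an LTF representation whose weighted sum never vanishes, which holds for the weight vectors used here) computes $\hat f_k$ both directly from (119) and from the probability in (124), and then locates the zeros of $f_{\mathrm{Lev}}$ for the LTF of Example 5.5 and tests the WST property.

```python
from fractions import Fraction
from itertools import product

def sgn_sum(a, x):
    """sgn(sum_i a_i x_i); the representations used here never produce a tie."""
    s = sum(ai * xi for ai, xi in zip(a, x))
    if s == 0:
        raise ValueError("this representation admits ties")
    return 1 if s > 0 else -1

def level1_direct(a):
    """hat f_k = E[X_k f(X^n)] for f = sgn(sum_i a_i x_i), by enumeration as in (119)."""
    n = len(a)
    cube = list(product((-1, 1), repeat=n))
    return [Fraction(sum(x[k] * sgn_sum(a, x) for x in cube), 2 ** n) for k in range(n)]

def level1_via_124(a):
    """hat f_k = Pr(|sum_{i != k} a_i X_i| <= a_k), as in (124); assumes all a_k > 0."""
    n = len(a)
    out = []
    for k in range(n):
        others = a[:k] + a[k + 1:]
        hits = sum(1 for x in product((-1, 1), repeat=n - 1)
                   if abs(sum(ai * xi for ai, xi in zip(others, x))) <= a[k])
        out.append(Fraction(hits, 2 ** (n - 1)))
    return out

for a in [(1, 1, 1), (2, 1, 1, 1), (3, 1, 1, 1, 1)]:
    print(a, level1_direct(a), level1_direct(a) == level1_via_124(a))

# Example 5.5: f is odd, so Lev[f] = 1 and f_Lev(x) = sum_i c_i x_i.
weights = (2, 1, 1, 1)
c = level1_direct(weights)
cube4 = list(product((-1, 1), repeat=4))
f_lev = {x: sum(ci * xi for ci, xi in zip(c, x)) for x in cube4}
zeros = [x for x in cube4 if f_lev[x] == 0]
wst = all((f_lev[x] > 0) == (sgn_sum(weights, x) > 0) for x in cube4 if f_lev[x] != 0)
print(c, zeros, wst)
```

For $(2,1,1,1)$ both routes give $(\tfrac34,\tfrac14,\tfrac14,\tfrac14)$, and $f_{\mathrm{Lev}}$ vanishes exactly at $\pm(-1,1,1,1)$, i.e., where the tie-breaking variable $x_1$ disagrees with the three other, unanimous, inputs.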

5.2. LTF Approximation. The WST condition can be leveraged to show that an LCSP function can typically be well approximated by an LTF. Specifically:

Theorem 5.10. An LCSP $f$ is $\sqrt{2/(\pi n_f)}$-close to an LTF, where $n_f := |\{i \in [n] : \hat{f}_i \neq 0\}|$.

Corollary 5.11. A monotone LCSP function that depends on all its coordinates is $\sqrt{2/(\pi n)}$-close to an LTF.

To prove Theorem 5.10 we first establish the following technical lemma. We state it in a slightly more general form than we actually need.

Lemma 5.12. Let $a^n \in \mathbb{R}^n$ be a vector of nonzero coefficients. Then for any $b \in \mathbb{R}$
$$\Pr\Big(\Big|\sum_{i=1}^n a_i X_i - b\Big| < \min_{k\in[n]} |a_k|\Big) \le \binom{n}{\lfloor n/2\rfloor} 2^{-n} \le \sqrt{\frac{2}{\pi n}}. \tag{133}$$

Proof. By replacing $X_i$ with $-X_i$ whenever $a_i < 0$ (which does not change the distribution of $X^n$), we may assume that all $a_i > 0$. Write $a = \min_k a_k$ and let
$$A := \Big\{x^n \in \{-1,1\}^n : \Big|\sum_{i=1}^n a_i x_i - b\Big| < a\Big\}. \tag{134}$$
It is easy to see that $A$ forms an antichain w.r.t. the coordinatewise partial order on $\{-1,1\}^n$, i.e., that there are no two distinct $x^n, y^n \in A$ such that $x^n \le y^n$. This holds simply since for such a pair it must hold that
$$\sum_{i=1}^n a_i y_i - \sum_{i=1}^n a_i x_i \ge 2a, \tag{135}$$
whereas both sums lie in the open interval $(b-a, b+a)$, so their difference is smaller than $2a$. Such an antichain is called a Sperner family, and Sperner's theorem [AS04, Maximal Antichains, Corollary 2] shows that
$$|A| \le \binom{n}{\lfloor n/2\rfloor}, \tag{136}$$
concluding the proof.

Proof of Theorem 5.10. Assume $\mathrm{Lev}[f] = 1$ (the claim is trivial otherwise), and define $g(x^n) = \operatorname{sgn}\big(\sum_{i=1}^n \hat{f}_i x_i\big)$. Let $A = \{x^n \in \{-1,1\}^n : g(x^n) = 0\}$. Using Lemma 5.12 (with $b = 0$, applied to the $n_f$ nonzero coefficients), we have that
$$\Pr(X^n \in A) \le \sqrt{\frac{2}{\pi n_f}}. \tag{137}$$
Since $f$ is LCSP, by Proposition 5.1 it is also WST, and hence $f(x^n) = g(x^n)$ for any $x^n \notin A$. By slightly perturbing the coefficients of $g$, one can clearly obtain a legal LTF $\tilde{g}$ that takes values only in $\{-1,1\}$ and still agrees with $f$ for all $x^n \notin A$. The distance between $f$ and $\tilde{g}$ is therefore at most $|A|/2^n$.
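Lemma 5.12 can be probed numerically. In the sketch below (`max_window_mass` is a hypothetical helper name of ours), integer coefficients are used so that all comparisons are exact. For each random coefficient vector it computes the largest number of points $x^n$ whose weighted sum falls in an open window of half-width $\min_k|a_k|$, maximizing over the window position $b$, and checks it against the Sperner bound (136); it also checks the central-binomial estimate at the end of (133). This is an empirical sanity check under these assumptions, not a proof.

```python
import math
import random
from itertools import product

def max_window_mass(a):
    """sup_b #{x : |sum_i a_i x_i - b| < min_k |a_k|}, for integer coefficients a."""
    width = 2 * min(abs(ak) for ak in a)
    sums = sorted(sum(ai * xi for ai, xi in zip(a, x))
                  for x in product((-1, 1), repeat=len(a)))
    # An open window of length `width` can be slid until it covers [s_j, s_j + width)
    # for some attained sum s_j, so it suffices to scan those windows (two pointers).
    best, right = 0, 0
    for left in range(len(sums)):
        while right < len(sums) and sums[right] < sums[left] + width:
            right += 1
        best = max(best, right - left)
    return best

random.seed(0)
for n in range(1, 11):
    bound = math.comb(n, n // 2)
    for _ in range(20):
        a = [random.choice([-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]) for _ in range(n)]
        assert max_window_mass(a) <= bound                    # Sperner step (136)
    assert bound / 2 ** n <= math.sqrt(2 / (math.pi * n))     # last step of (133)
print("Lemma 5.12 bound held on all sampled coefficient vectors")
```

The all-ones vector is a tight case: for $a = (1,1,1)$ the best window captures the three inputs with exactly one $-1$ coordinate, matching $\binom{3}{1} = 3$.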

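5.3. Chow Distance. The Chow distance between two Boolean functions $f$ and $g$ is defined as
$$d_{\mathrm{Chow}}(f,g) := \Big(\sum_{i\in[n]} (\hat{f}_i - \hat{g}_i)^2\Big)^{1/2}. \tag{138}$$
It was shown in [OS11, Prop. 1.5, Th. 1.6] that for any LTF $f$ and any Boolean function $g$
$$\frac{1}{4} d_{\mathrm{Chow}}^2(f,g) \le \mathrm{Dist}(f,g) \le \tilde{O}\Big(\frac{1}{\sqrt{\log(1/d_{\mathrm{Chow}}(f,g))}}\Big), \tag{139}$$
where for $q < 1$, $\tilde{O}(q)$ means $O(q \log^c(1/q))$ for some absolute constant $c$. For LCSP LTFs, the upper bound can generally be improved. We will state our result for the case where one of the functions is SST, though it can be extended, somewhat cumbersomely, to the case where neither of them is. Let $\mathrm{Gap}[f]$ be the minimal positive value of $|\sum_{i=1}^n \hat{f}_i x_i|$ over the Hamming cube (with $\mathrm{Gap}[f] = 0$ if all the $\hat{f}_i$'s are zero).

Theorem 5.13. Let $f$ and $g$ be two balanced LCSP functions that depend on all $n$ variables, and assume that $f$ is SST. Then
$$\mathrm{Dist}(f,g) \le \frac{d_{\mathrm{Chow}}^2(f,g)}{2\,\mathrm{Gap}[f]}. \tag{140}$$

Proof of Theorem 5.13. Let
$$B := \{x^n \in \{-1,1\}^n : f(x^n) \neq g(x^n)\}. \tag{141}$$
Then
\begin{align}
d_{\mathrm{Chow}}^2(f,g) &= \mathbb{E}\Big[\big(f(X^n) - g(X^n)\big)\sum_{i\in[n]} (\hat{f}_i - \hat{g}_i) X_i\Big] \tag{142}\\
&= 2\,\mathbb{E}\Big[\Big|\sum_{i\in[n]} (\hat{f}_i - \hat{g}_i) X_i\Big|\,\mathbb{1}(X^n \in B)\Big] \tag{143}\\
&\ge 2\,\mathrm{Gap}[f]\,\Pr(X^n \in B), \tag{144}
\end{align}
where (142) follows from linearity of expectation and the definition of the Fourier coefficients, (143) holds since both $f$ and $g$ are WST by virtue of Proposition 5.1 (on $B$ we have $f = -g$, while WST gives $f\sum_i \hat{f}_i X_i \ge 0$ and $-f\sum_i \hat{g}_i X_i = g\sum_i \hat{g}_i X_i \ge 0$), and (144) holds since $f$ is SST and $g$ is WST.

Lemma 5.12 implies that $\mathrm{Gap}[\mathrm{Maj}] \ge \sqrt{2/(\pi n)}$. Since Majority is SST, we have:

Corollary 5.14. For odd $n$ and any LCSP function $g$,
$$\frac{1}{4} d_{\mathrm{Chow}}^2(\mathrm{Maj}, g) \le \mathrm{Dist}(\mathrm{Maj}, g) \le \sqrt{\frac{\pi n}{8}}\, d_{\mathrm{Chow}}^2(\mathrm{Maj}, g). \tag{145}$$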
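For Majority, $\mathrm{Gap}[\mathrm{Maj}]$ can be computed exactly for small odd $n$, which makes the constant in Corollary 5.14 concrete. The sketch below (our own helper names) enumerates the cube, computes the level-1 coefficients and the minimal positive value of $|\sum_i \hat f_i x_i|$, and compares it with $\sqrt{2/(\pi n)}$. Since all level-1 coefficients of Majority equal $\binom{n-1}{(n-1)/2}2^{1-n}$ and $|\sum_i x_i| \ge 1$ for odd $n$, the gap should coincide with that coefficient.

```python
import math
from fractions import Fraction
from itertools import product

def majority(x):
    return 1 if sum(x) > 0 else -1

def gap(f, n):
    """Gap[f]: minimal positive |sum_i hat f_i x_i| over the cube (0 if all hat f_i vanish)."""
    cube = list(product((-1, 1), repeat=n))
    coeffs = [Fraction(sum(f(x) * x[i] for x in cube), 2 ** n) for i in range(n)]
    values = {abs(sum(c * xi for c, xi in zip(coeffs, x))) for x in cube}
    positive = [v for v in values if v > 0]
    return min(positive) if positive else Fraction(0)

for n in (3, 5, 7, 9):
    g = gap(majority, n)
    print(n, g, float(g), math.sqrt(2 / (math.pi * n)))
```

Plugging $\mathrm{Gap}[\mathrm{Maj}] \ge \sqrt{2/(\pi n)}$ into (140) gives the factor $1/\big(2\sqrt{2/(\pi n)}\big) = \sqrt{\pi n/8}$ appearing in (145).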

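6. Stability-based Conditions

In this section we provide simple necessary conditions for a function to be $\rho$-SP, in terms of its stability and Fourier coefficients.

Proposition 6.1. If $f$ is $\rho$-SP then
$$\mathrm{Stab}_\rho[f] \ge \max_{S\subseteq[n]} \rho^{|S|}|\hat{f}_S|. \tag{146}$$

Proof. If $f$ is $\rho$-SP, then $\mathrm{Stab}_\rho[f] = \|T_\rho f\|_1$ by Proposition 3.3. Letting $T \subseteq [n]$, this strong stability can be lower bounded as follows:
\begin{align}
\|T_\rho f\|_1 &= \mathbb{E}\Big|\sum_{S\subseteq[n]} \rho^{|S|}\hat{f}_S Y^S\Big| \tag{147}\\
&= \mathbb{E}\Big|\sum_{S\subseteq[n]} \rho^{|S|}\hat{f}_S Y^S Y^T\Big| \tag{148}\\
&= \mathbb{E}\Big|\sum_{S\subseteq[n]} \rho^{|S|}\hat{f}_S Y^{S\triangle T}\Big| \tag{149}\\
&\ge \Big|\mathbb{E}\sum_{S\subseteq[n]} \rho^{|S|}\hat{f}_S Y^{S\triangle T}\Big| \tag{150}\\
&= \rho^{|T|}|\hat{f}_T|, \tag{151}
\end{align}
where (148) holds since $|Y^T| = 1$, (150) is Jensen's inequality, and (151) holds since $\mathbb{E}\,Y^{S\triangle T} = 0$ unless $S = T$. The proof is completed by optimizing over $T$.

Example 6.2. When $f$ is the OR function, we have
$$\max_{S\subseteq[n]} \rho^{|S|}|\hat{f}_S| = |\hat{f}_\emptyset| = 1 - 2^{1-n}. \tag{152}$$
It is easy to verify that
$$\Pr\big(f(X^n) = f(Y^n)\big) = 1 - 2^{1-n}\big(1-(1-\delta)^n\big), \tag{153}$$
and using $\rho = 1-2\delta$,
\begin{align}
\mathrm{Stab}_\rho[f] &= 2\Pr\big(f(X^n) = f(Y^n)\big) - 1 \tag{154}\\
&= 1 - 2^{2-n}\Big(1 - \Big(\frac{1+\rho}{2}\Big)^n\Big). \tag{155}
\end{align}
Then, OR is $\rho$-SP only when $\mathrm{Stab}_\rho[\mathrm{OR}] \ge 1 - 2^{1-n}$, which can be seen to be equivalent to $\rho \ge 2^{(n-1)/n} - 1$. This is the same result that can be obtained by direct computation, and so the bound of Proposition 6.1 is tight in this case.

We may deduce again the result of Corollary 5.2: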

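Corollary 6.3. An LCSP function is either balanced or constant.

Proof. If $f$ is $\rho$-SP then, by Proposition 6.1,
$$\mathrm{Stab}_\rho[f] = \sum_{S\subseteq[n]} \rho^{|S|}\hat{f}_S^2 \ge |\hat{f}_\emptyset|. \tag{156}$$
As $\rho \to 0$, this bound implies that $\hat{f}_\emptyset^2 \ge |\hat{f}_\emptyset|$, and as $|\hat{f}_\emptyset| \le 1$, this is only possible when either $\hat{f}_\emptyset = 0$ or $|\hat{f}_\emptyset| = 1$.

More generally, we have the following:

Corollary 6.4. If $f$ is LCSP then
$$W^{\mathrm{Lev}[f]}[f] \ge \max_{S\subseteq[n]:\,|S| = \mathrm{Lev}[f]} |\hat{f}_S|. \tag{157}$$
Specifically, if $f$ is also monotone, this bound reads
$$W^1[f] \ge \max_{i\in[n]} \hat{f}_i = \mathrm{MaxInf}[f], \tag{158}$$
where the r.h.s. is the so-called maximal influence of $f$.

When $\mathrm{Deg}(f) < n$, another bound of the form of Proposition 6.1 can be derived using the following implication of hypercontractivity [Bon70, Gro75]: when $f : \{-1,1\}^n \to \mathbb{R}$ has $\mathrm{Deg}(f) = k$, then $\|f\|_2 \le e^k \|f\|_1$ [O'D14, Theorem 9.22].

Proposition 6.5. If $f$ is $\rho$-SP and $\mathrm{Deg}(f) = k$ then
$$\mathrm{Stab}_\rho[f] \ge e^{-k}\sqrt{\mathrm{Stab}_{\rho^2}[f]}. \tag{159}$$

Proof. As in the proof of Proposition 6.1, we lower bound
\begin{align}
\mathbb{E}\Big|\sum_{S\subseteq[n]} \rho^{|S|}\hat{f}_S Y^S\Big| &= \|T_\rho f\|_1 \tag{160}\\
&\overset{(a)}{\ge} e^{-k}\|T_\rho f\|_2 \tag{161}\\
&= e^{-k}\sqrt{\langle T_\rho f, T_\rho f\rangle} \tag{162}\\
&\overset{(b)}{=} e^{-k}\sqrt{\langle T_{\rho^2} f, f\rangle} \tag{163}\\
&= e^{-k}\sqrt{\mathrm{Stab}_{\rho^2}[f]}, \tag{164}
\end{align}
where (a) holds since $\mathrm{Deg}(T_\rho f) = \mathrm{Deg}(f) = k$, and (b) holds since $T_\rho$ is a self-adjoint operator with $T_\rho T_\rho = T_{\rho^2}$.

Combined with the Cauchy–Schwarz bound $\mathrm{Stab}_\rho[f] = \langle f, T_\rho f\rangle \le \|T_\rho f\|_2$, the last proof implies that for a degree-$k$, $\rho$-SP function $f$
$$e^{-k}\sqrt{\mathrm{Stab}_{\rho^2}[f]} \le \mathrm{Stab}_\rho[f] \le \sqrt{\mathrm{Stab}_{\rho^2}[f]}. \tag{165}$$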
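Both Example 6.2 and the two-sided bound (165) can be checked by brute-force Fourier expansion. The Python sketch below is illustrative only; the sign convention for OR (value $+1$ only at the all-ones input) and all helper names are our assumptions. It first evaluates $\mathrm{Stab}_\rho[\mathrm{OR}]$ at $\rho^\star = 2^{(n-1)/n} - 1$ and compares it with $\max_S \rho^{|S|}|\hat f_S|$, and then checks (165) for 3-bit Majority (degree $k = 3$), also confirming $\|T_\rho f\|_1 = \mathrm{Stab}_\rho[f]$, which by Proposition 3.3 means that $\mathrm{Maj}_3$ is $\rho$-SP at the sampled values of $\rho$.

```python
import math
from itertools import combinations, product

def fourier(f, n):
    """Brute-force Fourier coefficients hat f_S over the uniform cube (floats)."""
    cube = list(product((-1, 1), repeat=n))
    return {S: sum(f(x) * math.prod(x[i] for i in S) for x in cube) / 2 ** n
            for k in range(n + 1) for S in combinations(range(n), k)}

def stab(coeffs, rho):
    """Stab_rho[f] = sum_S rho^{|S|} hat f_S^2."""
    return sum(rho ** len(S) * c * c for S, c in coeffs.items())

def strong_stab(coeffs, rho, n):
    """||T_rho f||_1 = E |sum_S rho^{|S|} hat f_S X^S|."""
    cube = list(product((-1, 1), repeat=n))
    return sum(abs(sum(rho ** len(S) * c * math.prod(x[i] for i in S)
                       for S, c in coeffs.items())) for x in cube) / 2 ** n

def or_fn(x):
    # Assumed convention: +1 only at the all-ones input, -1 elsewhere.
    return 1 if all(xi == 1 for xi in x) else -1

def maj3(x):
    return 1 if sum(x) > 0 else -1

# Example 6.2: at rho* the stability meets the Proposition 6.1 bound exactly.
n = 4
or_coeffs = fourier(or_fn, n)
rho_star = 2 ** ((n - 1) / n) - 1
bound = max(rho_star ** len(S) * abs(c) for S, c in or_coeffs.items())
print(stab(or_coeffs, rho_star), bound)

# (165) for Maj_3, which has degree k = 3.
maj_coeffs = fourier(maj3, 3)
for rho in (0.1, 0.5, 0.9):
    s, s2 = stab(maj_coeffs, rho), stab(maj_coeffs, rho ** 2)
    print(rho, math.exp(-3) * math.sqrt(s2) <= s <= math.sqrt(s2),
          math.isclose(strong_stab(maj_coeffs, rho, 3), s))
```

At $\rho^\star$ the two printed numbers for OR should agree (both equal $1 - 2^{1-n} = 7/8$ for $n = 4$), which is the tightness noted in Example 6.2.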

It can be observed that even for a given degree $k$, neither of the bounds in Propositions 6.1 and 6.5 subsumes the other.

7. Sharp Threshold at High Correlation

As we have seen, all functions are $\rho$-SP when $\rho > 1 - \frac{2\ln 2}{n} + O(n^{-2})$. In this section, we show that when the correlation is reduced ever so slightly to $\rho < 1 - \frac{2}{n}$, the fraction of SP functions becomes double-exponentially small.

Theorem 7.1. For any $\alpha > 1$ and all $n$ sufficiently large, the fraction of $\rho$-SP functions for $\rho = 1 - \frac{2\alpha}{n}$ is at most $\exp\big(-2^{n h(\frac{\alpha-1}{2\alpha}) + o(n)}\big)$.

The fact that $\rho$-SP functions are rare is not limited to the $\rho = 1 - O(\frac{1}{n})$ regime, yet different techniques are needed in order to establish this in other regimes. We next demonstrate how a similar phenomenon holds in a high-correlation regime where $\rho$ is fixed. Let $\eta_\delta$ be the minimal $\eta > 0$ such that
$$\frac{1}{2}\log\frac{1}{\delta^2 + (1-\delta)^2} < \min\Big\{\log\frac{1}{1-\delta},\, d(\eta\,\|\,\delta)\Big\} \tag{166}$$
holds, where $d(p\,\|\,q) := p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$ is the binary divergence function. It can be verified that $\eta_\delta < 1/4$ for any $\delta < \delta_{\max}$.

Theorem 7.2. For any $\delta \in (0, \delta_{\max})$ and all $n$ sufficiently large, the fraction of $\rho$-SP functions for $\rho = 1 - 2\delta$ is at most $\exp\big(-2^{n[1-h(2\eta_\delta)] - o(n)}\big)$.

We begin with the proof of Theorem 7.1.

Proof of Theorem 7.1. In this proof we find it more convenient to work with the $\delta$ and $\{0,1\}$ conventions. The proof comprises two steps. First, we focus on a specific $y^n$ and derive a sufficient condition for a function not to be $\rho$-SP at $y^n$. This condition is specifically tailored to the regime of $\delta = \Theta(1/n)$, and depends only on local values of the function, up to Hamming distance $\log n$ from $y^n$. We show that for a random choice of function, the probability that this condition is satisfied decays exponentially, and we derive an upper bound on the exponential decay rate. In the second step, we recall that a function is $\rho$-SP only if it is $\rho$-SP at all $2^n$ points of the Hamming cube. This implies that the expected number of non-$\rho$-SP points is exponentially large.
While there are statistical dependencies between different points in the Hamming cube, Janson's theorem [AS04, Theorem 8.1.1], together with the fact that the condition in the first step is local, allows us to prove that the probability that all points in the Hamming cube are $\rho$-SP is double-exponentially small.
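The opening claim of Section 7 has a one-line justification: if $(1-\delta)^n > 1/2$, then $\Pr(X^n = y^n \mid Y^n = y^n) > 1/2$, so the optimal guess at every $y^n$ is $f(y^n)$ itself. Solving $(1-\delta)^n = 1/2$ with $\delta = (1-\rho)/2$ gives the exact threshold $\rho_n = 1 - 2(1-2^{-1/n}) = 1 - \frac{2\ln 2}{n} + O(n^{-2})$. The sketch below (our own helper names, $\{0,1\}$ convention as in the proof) compares $\rho_n$ with its first-order approximation and checks, by brute force on a few random functions with $n = 4$, that every sampled function is $\rho$-SP slightly above $\rho_n$.

```python
import math
import random
from itertools import product

def exact_threshold(n):
    """rho at which (1 - delta)^n = 1/2, with delta = (1 - rho) / 2."""
    return 1 - 2 * (1 - 2 ** (-1 / n))

def is_rho_sp(table, n, rho):
    """table maps x in {0,1}^n to f(x) in {0,1}; test f(y) = optimal guess of f(X^n) from y."""
    delta = (1 - rho) / 2
    cube = list(product((0, 1), repeat=n))
    for y in cube:
        # E[(-1)^{f(X^n)} | Y^n = y]: positive means that guessing 0 is optimal.
        bias = 0.0
        for x in cube:
            d = sum(xi != yi for xi, yi in zip(x, y))
            bias += (-1) ** table[x] * delta ** d * (1 - delta) ** (n - d)
        if bias == 0 or (bias > 0) != (table[y] == 0):
            return False
    return True

for n in (4, 16, 64, 256):
    rho_n = exact_threshold(n)
    approx = 1 - 2 * math.log(2) / n
    print(n, rho_n, approx, n * n * (rho_n - approx))   # last column tends to (ln 2)^2

random.seed(1)
n = 4
rho = exact_threshold(n) + 0.01
for _ in range(5):
    table = {x: random.randint(0, 1) for x in product((0, 1), repeat=n)}
    print(is_rho_sp(table, n, rho))
```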

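We begin with the first step. To this end, let us denote the shell of radius $d$ around $x^n \in \{0,1\}^n$ by
$$S(x^n, d) := \{\tilde{x}^n : d_H(x^n, \tilde{x}^n) = d\}. \tag{167}$$
For any function $f : \{0,1\}^n \to \{0,1\}$, let the $d$-shell bias of $f$ be
$$\beta_{d,f}(x^n) := \frac{1}{|S(x^n,d)|}\sum_{\tilde{x}^n \in S(x^n,d)} f(\tilde{x}^n). \tag{168}$$
Fix $\eta > 0$ and some $y^n$. Without loss of generality, below we assume that $f(y^n) = 0$. Define the set of functions
$$B_\eta(y^n, 1) := \{f : \beta_{1,f}(y^n) \ge 1 - \eta\}, \tag{169}$$
and, for $2 \le d \le l$, the sets
$$B_\eta(y^n, d) := \{f : \beta_{d,f}(y^n) \ge 1/2\}, \tag{170}$$
where $l \ge 3$. We say that $y^n$ is bad for $f$ if $f \in B_\eta(y^n)$, where
$$B_\eta(y^n) := \bigcap_{d=1}^{l} B_\eta(y^n, d). \tag{171}$$
Now, for any $n > l$, setting $\delta = \frac{\alpha}{n}$, for any $f \in B_\eta(y^n)$:
\begin{align}
\Pr\big(f(X^n) \neq f(y^n) \mid Y^n = y^n\big) &\ge \sum_{d=1}^{l}\binom{n}{d}\beta_{d,f}(y^n)\,\delta^d(1-\delta)^{n-d} \tag{172}\\
&\ge \Big(1-\frac{\alpha}{n}\Big)^n \sum_{d=1}^{l}\binom{n}{d}\beta_{d,f}(y^n)\,\delta^d \tag{173}\\
&\ge \Big(1-\frac{\alpha}{n}\Big)^n\Big(1-\frac{l}{n}\Big)^l \sum_{d=1}^{l}\beta_{d,f}(y^n)\frac{\alpha^d}{d!} \tag{174}\\
&\ge \Big(1-\frac{\alpha}{n}\Big)^n\Big(1-\frac{l}{n}\Big)^l \Big((1-\eta)\alpha + \frac{1}{2}\sum_{d=2}^{l}\frac{\alpha^d}{d!}\Big) \tag{175}\\
&= \Big(1-\frac{\alpha}{n}\Big)^n\Big(1-\frac{l}{n}\Big)^l \Big((1-\eta)\alpha + \frac{1}{2}\Big(e^\alpha - 1 - \alpha - \sum_{d=l+1}^{\infty}\frac{\alpha^d}{d!}\Big)\Big), \tag{176}
\end{align}
where (173) uses $(1-\delta)^{n-d} \ge (1-\delta)^n$, (174) uses $\binom{n}{d}n^{-d} \ge (1-\frac{l}{n})^l/d!$ for $d \le l$, and (175) uses the definitions (169) and (170). Choosing $l = \log n$, (176) tends to
$$\Big(\frac{1}{2} - \eta\Big)\alpha e^{-\alpha} + \frac{1}{2}\big(1 - e^{-\alpha}\big) \tag{177}$$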

as $n \to \infty$. Let
$$\eta_\alpha := \frac{\alpha - 1}{2\alpha}. \tag{178}$$
Clearly, $\eta_\alpha$ is monotonically increasing for $\alpha > 0$, with $\lim_{\alpha\to 1}\eta_\alpha = 0$ and $\lim_{\alpha\to\infty}\eta_\alpha = 1/2$. Since (177) exceeds $1/2$ if and only if $(\frac{1}{2} - \eta)\alpha > \frac{1}{2}$, i.e., $\eta < \eta_\alpha$, setting $\eta \in (0, \eta_\alpha)$ guarantees that (176) is larger than $1/2$ for all large enough $n$. Hence, for such a choice,
$$\Pr\big(f(X^n) \neq f(y^n) \mid Y^n = y^n\big) > 1/2, \tag{179}$$
and so
$$\{f \in B_\eta(y^n)\} \subseteq \{f \text{ is not } \rho\text{-SP at } y^n\}. \tag{180}$$
Let us now choose $f$ uniformly at random over all Boolean functions on $\{0,1\}^n$, and upper bound the probability $\Pr(f \in E)$. This in turn will serve as an upper bound on the probability that the function we draw is $\rho$-SP for the aforementioned $\rho$. To this end, note that Chernoff's bound implies that
$$\Pr\big(\beta_{1,f}(y^n) \ge 1 - \eta\big) = 2^{-n(1-h(\eta)) + o(n)}, \tag{181}$$
and symmetry implies that
$$\Pr\big(\beta_{d,f}(y^n) \ge 1/2\big) \ge 1/2 \tag{182}$$
for $2 \le d \le l = \log n$. By independence,
\begin{align}
\Pr\big(B_\eta(y^n)\big) &= \prod_{d=1}^{\log n}\Pr\big(B_\eta(y^n, d)\big) \tag{183}\\
&\ge 2^{-n(1-h(\eta)) + o(n)}\cdot 2^{-(\log n - 1)} \tag{184}\\
&= 2^{-n(1-h(\eta)) + o(n)}, \tag{185}
\end{align}
and so
$$\Pr\big(f \text{ is not } \rho\text{-SP at } y^n\big) \ge 2^{-n(1-h(\eta)) + o(n)}. \tag{186}$$
This completes the first step of the proof. For the second step, note that if $f$ is $\rho$-SP for $\rho = 1 - \frac{2\alpha}{n}$, then $f$ must have no bad inputs, i.e., $f \in E$, where
$$E := \bigcap_{y^n\in\{0,1\}^n} B_\eta^c(y^n). \tag{187}$$
The expected number of bad inputs is given by
$$\mu := 2^n\Pr\big(B_\eta(y^n)\big) = 2^{nh(\eta) + o(n)}. \tag{188}$$
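The quantitative ingredients of the first step can be evaluated numerically. The Python sketch below (helper names are ours, and $l = \lfloor\log_2 n\rfloor$ assumes a base-2 logarithm) computes the finite-$n$ bound (175) and its limit (177), showing that the limit crosses $1/2$ at $\eta_\alpha$; it computes the exact binomial tail behind (181) and compares its normalized exponent with $1-h(\eta)$; and it computes $h^{-1}(1/2) \approx 0.110$, which, together with $\eta_\alpha$, delimits the range of $\eta$ for which the argument (including the bound on $\Delta$ in (203)) goes through.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def finite_bound(n, alpha, eta):
    """Finite-n lower bound (175), with l = floor(log2 n) (base-2 log assumed), l >= 3."""
    l = max(3, int(math.log2(n)))
    series = (1 - eta) * alpha + 0.5 * sum(alpha ** d / math.factorial(d) for d in range(2, l + 1))
    return (1 - alpha / n) ** n * (1 - l / n) ** l * series

def limit_bound(alpha, eta):
    """The n -> infinity limit (177) of the bound (176)."""
    return (0.5 - eta) * alpha * math.exp(-alpha) + 0.5 * (1 - math.exp(-alpha))

def neighbor_exponent(n, eta):
    """-(1/n) log2 Pr(beta_{1,f}(y^n) >= 1 - eta) for uniform f, i.e. a Binomial(n, 1/2) tail."""
    k0 = math.ceil((1 - eta) * n)
    count = sum(math.comb(n, k) for k in range(k0, n + 1))   # exact big integer
    return (n - math.log2(count)) / n

def h_inv(v, tol=1e-12):
    """Inverse of h restricted to [0, 1/2], by bisection (h is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 2.0
eta_alpha = (alpha - 1) / (2 * alpha)            # (178); equals 1/4 for alpha = 2
for eta in (0.2, 0.3):                           # one value below eta_alpha, one above
    print(eta, finite_bound(10 ** 6, alpha, eta), limit_bound(alpha, eta))
print(neighbor_exponent(4000, 0.2), 1 - h(0.2))  # exponent in (181)
print(eta_alpha, h_inv(0.5))                     # eta must lie below both
```

For $\alpha = 2$ we have $\eta_\alpha = 1/4$, so $\eta = 0.2$ should give values above $1/2$ and $\eta = 0.3$ values below it; the admissible range of $\eta$ is then limited by $h^{-1}(1/2)$ rather than by $\eta_\alpha$.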

Had the number of bad inputs been distributed according to a Poisson distribution with expected value $\mu$, then the probability of $E$ would have been given by
$$\Pr(E) = e^{-\mu} = \exp\big(-2^{nh(\eta) + o(n)}\big). \tag{189}$$
However, the events $B_\eta(x^n)$ and $B_\eta(y^n)$ are dependent whenever $d_H(x^n, y^n) \le 2l$. Nonetheless, Janson's inequality [AS04, Theorem 8.1.1] implies that
$$\Pr(E) \le e^{-\mu + \Delta/2}, \tag{190}$$
where $\Delta$ is a correction term that depends on the joint probabilities $\Pr(B_\eta(x^n) \cap B_\eta(y^n))$ of dependent bad events. We next show that $\Delta \to 0$ exponentially fast as $n \to \infty$, as long as $\eta \in (0, h^{-1}(1/2))$. As $\eta_{\max} \le h^{-1}(1/2)$, for any $\eta \in (0, \eta_{\max})$
$$\Pr(E) \le \exp\big(-2^{nh(\eta) + o(n)}\big). \tag{191}$$
The statement of the theorem will then follow. To complete the proof, it remains to show that $\Delta \to 0$ exponentially fast. Let us write $x^n \sim y^n$ whenever the events $\{f \in B_\eta(x^n)\}$ and $\{f \in B_\eta(y^n)\}$ are statistically dependent. For brevity, below we write $B_\eta(y^n)$ to mean the corresponding event. The term required for Janson's theorem is then given by
$$\Delta := \sum_{x^n \sim y^n}\Pr\big(B_\eta(x^n) \cap B_\eta(y^n)\big). \tag{192}$$
Let us analyze the probability in (192) under the assumption that $f(x^n) = f(y^n) = 0$. The chain rule of probability implies that
\begin{align}
\Pr\big(B_\eta(x^n) \cap B_\eta(y^n) \mid f(x^n) = 0, f(y^n) = 0\big) ={}& \Pr\big(B_\eta(x^n,1) \mid f(x^n) = 0, f(y^n) = 0\big)\notag\\
&\times \Pr\big(B_\eta(y^n,1) \mid f(x^n) = 0, f(y^n) = 0, B_\eta(x^n,1)\big)\notag\\
&\times \Pr\Big(\bigcap_{d=2}^{l} B_\eta(x^n,d) \cap \bigcap_{d=2}^{l} B_\eta(y^n,d) \,\Big|\, f(x^n) = 0, f(y^n) = 0, B_\eta(x^n,1), B_\eta(y^n,1)\Big). \tag{193}
\end{align}
For the first probability on the r.h.s. of (193), we note that if $d_H(x^n, y^n) \ge 2$ then $S(x^n,1) \cap \{x^n, y^n\} = \emptyset$ and (181) holds. Otherwise, if $d_H(x^n, y^n) = 1$, then $S(x^n,1) \cap \{x^n, y^n\} = \{y^n\}$. In that case,
$$\Pr\big(B_\eta(x^n,1) \mid f(x^n) = 0, f(y^n) = 0\big) = \Pr\big(\beta_{1,f}(x^n) \ge 1 - \eta \mid f(x^n) = 0, f(y^n) = 0\big) \tag{194}$$

\begin{align}
&= \Pr\Big(\frac{1}{n}\Big[\sum_{\tilde{y}^n \in S(x^n,1)\setminus\{y^n\}} f(\tilde{y}^n) + f(y^n)\Big] \ge 1 - \eta \,\Big|\, f(x^n) = 0, f(y^n) = 0\Big) \tag{195}\\
&= \Pr\Big(\frac{1}{n-1}\sum_{\tilde{y}^n \in S(x^n,1)\setminus\{y^n\}} f(\tilde{y}^n) \ge \frac{n}{n-1}(1-\eta)\Big) \tag{196}\\
&= 2^{-(n-1)\{1 - h(\eta + O(n^{-1}))\} + o(n)} \tag{197}\\
&= 2^{-n(1-h(\eta)) + o(n)}, \tag{198}
\end{align}
where the last transition holds since $h(\cdot)$ is a smooth function with bounded derivatives in a neighborhood of any fixed $\eta \in (0,1)$. For the second probability on the r.h.s. of (193), if $d_H(x^n, y^n) \ge 3$ then $S(y^n,1) \cap (\{x^n, y^n\} \cup S(x^n,1)) = \emptyset$ and (181) holds. Next, if $d_H(x^n, y^n) = 1$, then $S(y^n,1) \cap (\{x^n, y^n\} \cup S(x^n,1)) = \{x^n\}$. A derivation similar to (198) shows that
$$\Pr\big(B_\eta(y^n,1) \mid f(x^n) = 0, f(y^n) = 0, B_\eta(x^n,1)\big) = 2^{-n(1-h(\eta)) + o(n)} \tag{199}$$
holds. If $d_H(x^n, y^n) = 2$, then $S(y^n,1) \cap (\{x^n, y^n\} \cup S(x^n,1))$ contains exactly two points. Again, a derivation similar to (198) (with $n-2$ replacing $n-1$) shows that (199) holds. The third probability on the r.h.s. of (193) can be trivially upper bounded by 1. Thus,
$$\Pr\big(B_\eta(x^n) \cap B_\eta(y^n) \mid f(x^n) = 0, f(y^n) = 0\big) \le 2^{-2n(1-h(\eta)) + o(n)}. \tag{200}$$
Evidently, an analogous analysis holds for the other three possibilities of the pair $(f(x^n), f(y^n))$, and so
$$\Pr\big(B_\eta(x^n) \cap B_\eta(y^n)\big) \le 2^{-2n(1-h(\eta)) + o(n)}. \tag{201}$$
Since $x^n \sim y^n$ implies $d_H(x^n, y^n) \le 2l$, the number of dependent pairs is upper bounded by $2^n (n+1)^{2l}$, and as $l = \log n$ was chosen, $(n+1)^{2l} = 2^{O(\log^2 n)} = 2^{o(n)}$. Then (201) implies that
\begin{align}
\Delta &\le 2^{n + o(n)}\cdot 2^{-2n(1-h(\eta)) + o(n)} \tag{202}\\
&= 2^{-n(1-2h(\eta)) + o(n)}. \tag{203}
\end{align}
Thus, $\Delta \to 0$ exponentially fast as $n \to \infty$, as long as $\eta \in (0, h^{-1}(1/2))$, as was required to be proved.

We move on to the proof of Theorem 7.2.

Proof of Theorem 7.2. The proof again comprises two steps, in the spirit of Theorem 7.1, first focusing on a single point, and then on the entire Hamming cube. However, the arguments in each step are different. Specifically, in the first step we derive a

necessary condition for a function to be $\rho$-SP, which is tailored to the regime of a fixed $\delta$, and is now based only on the values of the function at points within Hamming distance (slightly less than) $\eta n$, with $\eta < 1/4$. We then use a central limit theorem to show that the probability that this condition is satisfied is close to $1/2$. In the second step, rather than considering all $2^n$ points of the Hamming cube, we consider a subset of the Hamming cube of size about $2^{n[1-h(2\eta)]}$, in which the Hamming distance between each two points is at least $2\eta n$. The existence of this set is assured by the Gilbert–Varshamov bound [Rot06, Th. 4.10]. Since the points in this subset are sufficiently far apart, the event that the condition derived in the first step occurs for one point is independent of the corresponding events for all other points. Thus, the probability of a function to be $\rho$-SP is about $2^{-2^{n[1-h(2\eta)]}}$.

For the first step, let $\eta < 1/4$ be given, and let $B_\eta(y^n)$ be a punctured Hamming ball of relative radius $\eta$ around $y^n$, i.e.,
$$B_\eta(y^n) := \Big\{z^n \in \{-1,1\}^n : 0 < \frac{1}{n}d_H(z^n, y^n) \le \eta\Big\}. \tag{204}$$
Then, clearly
$$\big|p(y^n \mid y^n) f(y^n)\big| \le p(y^n \mid y^n) = 2^{-n\log\frac{1}{1-\delta}}, \tag{205}$$
and by the Chernoff bound (or the method of types [CK11])
\begin{align}
\Big|\sum_{x^n \in B_\eta^c(y^n)\setminus\{y^n\}} p(x^n \mid y^n) f(x^n)\Big| &\le \sum_{x^n \in B_\eta^c(y^n)\setminus\{y^n\}} p(x^n \mid y^n) \tag{206}\\
&= \Pr\big(X^n \in B_\eta^c(y^n)\setminus\{y^n\} \mid Y^n = y^n\big) \tag{207}\\
&\le 2^{-nd(\eta\|\delta) - \Theta(\log n)}. \tag{208}
\end{align}
Focusing on a given $y^n$, let us assume without loss of generality that $f(y^n) = 1$. Then,
\begin{align}
\mathbb{E}\big(f(X^n) \mid Y^n = y^n\big) &= \sum_{x^n} p(x^n \mid y^n) f(x^n) \tag{209}\\
&= p(y^n \mid y^n) f(y^n) + \sum_{x^n \in B_\eta(y^n)} p(x^n \mid y^n) f(x^n) + \sum_{x^n \in B_\eta^c(y^n)\setminus\{y^n\}} p(x^n \mid y^n) f(x^n) \tag{210}\\
&\le 2^{-n\log\frac{1}{1-\delta}} + \sum_{x^n \in B_\eta(y^n)} p(x^n \mid y^n) f(x^n) + 2^{-nd(\eta\|\delta) - \Theta(\log n)}, \tag{211}
\end{align}
and thus
$$\{f \text{ is } \rho\text{-SP at } y^n\} \subseteq \Big\{\sum_{x^n \in B_\eta(y^n)} p(x^n \mid y^n) f(x^n) \ge -\Big(2^{-n\log\frac{1}{1-\delta}} + 2^{-nd(\eta\|\delta) - \Theta(\log n)}\Big)\Big\}. \tag{212}$$
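The second step of the proof of Theorem 7.2 uses the Gilbert–Varshamov bound. The standard greedy argument behind it is easy to sketch (a generic illustration, not the construction in [Rot06]; helper names are ours): scanning $\{0,1\}^n$ and keeping every point at distance at least $D$ from all points kept so far yields a maximal code, so the Hamming balls of radius $D-1$ around it cover the cube and the code has at least $2^n/V(n,D-1)$ points, where $V(n,r) = \sum_{j\le r}\binom{n}{j}$. Since $V(n, 2\eta n) \le 2^{nh(2\eta)}$ for $2\eta \le 1/2$, relative distance $2\eta$ yields at least about $2^{n[1-h(2\eta)]}$ points.

```python
import math
from itertools import combinations, product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def greedy_code(n, dist):
    """Scan {0,1}^n and keep each point at distance >= dist from all points kept so far."""
    code = []
    for x in product((0, 1), repeat=n):
        if all(hamming(x, c) >= dist for c in code):
            code.append(x)
    return code

def ball_volume(n, radius):
    """V(n, r): number of points within Hamming distance r of a fixed point."""
    return sum(math.comb(n, j) for j in range(radius + 1))

n, dist = 10, 3
code = greedy_code(n, dist)
# Maximality: every point lies within distance dist - 1 of the code, so |code| * V(n, dist - 1) >= 2^n.
print(len(code), 2 ** n / ball_volume(n, dist - 1))
```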
