Recovering randomness from an asymptotic Hamming distance
Bjørn Kjos-Hanssen
Consultants: Cameron E. Freer and Joseph S. Miller
March 23, 2011, Workshop in Computability Theory @ U. San Francisco
Understanding randomness

Kolmogorov's probability experiment definition:
Ω = sample space,
P(A) ∈ [0, 1] for A ⊆ Ω (A in a σ-algebra),
P is countably additive, and
P(∅) = 0, P(Ω) = 1.

Randomness is introduced by a phrase like: "If x ∈ Ω is chosen at random, then P(x ∈ A) = P(A) = ..."

What aspects of randomness are useful?
A webmaster's use of randomness

7 news items to display on a web page, each with probability 1/7.
Do not want to store a record of what has been displayed, so cycling through the items is not an option.
Do not want to display the same item too often.
The sequence of displayed items should satisfy the strong law of large numbers, and not just overall, but in each user's experience.
A webmaster's use of randomness

var currentnewsid = Math.floor(Math.random() * 7);
makenews(currentnewsid);

Hopefully Math.random() gives random enough output.
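As a sanity check on the strong-law requirement, here is a small sketch of the item-selection step. A simple linear congruential generator stands in for Math.random() so the run is reproducible; the generator constants and the helper names are my own illustration, not from the talk.

```javascript
// Sketch: check that the empirical frequency of each of the 7 news items
// approaches 1/7, as the strong law of large numbers demands.
// A seeded LCG replaces Math.random() so the output is deterministic.
function makeLcg(seed) {
  let state = seed >>> 0;
  return function () {
    // Numerical Recipes LCG constants; returns a float in [0, 1).
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

function itemFrequencies(trials, seed) {
  const rand = makeLcg(seed);
  const counts = new Array(7).fill(0);
  for (let t = 0; t < trials; t++) {
    const currentnewsid = Math.floor(rand() * 7); // as in the slide
    counts[currentnewsid]++;
  }
  return counts.map((c) => c / trials);
}
```

With 700000 simulated page loads, each of the seven frequencies should be close to 1/7 ≈ 0.143.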
Random enough

Definition (Martin-Löf, 1966). x ∈ 2^ω is random enough if for each uniformly Σ⁰₁ sequence of sets U_n ⊆ 2^ω with P(U_n) ≤ 2^(−n), we have x ∉ ⋂_n U_n.

Example. U_n = those outcomes where the AMS meeting news item appears more than 1/7 + ε of the time at some point after time n.
Theorem (Schnorr/Levin, 1970s). x is Martin-Löf random ⇔ K(x ↾ n) ≥ n − c for all n, where c is a constant and K is prefix-free Kolmogorov complexity.

Definition (K, Merkle, Stephan 2006). A real x is complex if its prefixes have Kolmogorov complexity bounded below by an order function (an unbounded, nondecreasing computable function). For example, x is complex if K(x ↾ n) ≥ n + log n...
Stochastic immunity

Definition. A set X is immune if for each N ∈ C, N ⊈ X (where C denotes the collection of all infinite computable sets). If ω \ X is immune then X is co-immune. If X is both immune and co-immune then X is bi-immune.

Definition. A set X is stochastically bi-immune if for each set N ∈ C, X ∩ N satisfies the strong law of large numbers, i.e.,
lim_n |X ∩ N ↾ n| / |N ↾ n| = 1/2.

Could also be called: weakly Mises-Wald-Church stochastic.
Definition. A sequence X ∈ 2^ω is Mises-Wald-Church stochastic if no partial computable monotonic selection rule can select a biased subsequence of X, i.e., a subsequence where the relative frequencies of 0s and 1s do not converge to 1/2.
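To make the definition concrete, here is a toy sketch (my own illustration, not from the talk) of a monotonic selection rule: it scans X in order and selects bit n exactly when bit n−1 was a 1. Applied to the computable alternating sequence 0101..., it selects only 0s, witnessing that this sequence is not Mises-Wald-Church stochastic.

```javascript
// A monotonic selection rule: whether bit n is selected depends
// only on the bits already seen (positions 0..n-1).
function selectAfterOne(bits) {
  const selected = [];
  for (let n = 1; n < bits.length; n++) {
    if (bits[n - 1] === 1) selected.push(bits[n]); // select bit n iff bit n-1 was 1
  }
  return selected;
}

// The computable sequence 0, 1, 0, 1, ... is not MWC stochastic:
// every bit following a 1 is a 0, so the selected subsequence is all 0s,
// and its frequency of 1s converges to 0, not 1/2.
const alternating = Array.from({ length: 1000 }, (_, i) => i % 2);
const picked = selectAfterOne(alternating);
```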
Question. Is every stochastically bi-immune set complex?
Doubtful, but the answer is almost yes.
Question. Is every complex set stochastically bi-immune?
No, consider x.

Question. Can we compute a stochastically bi-immune set from a complex set?
Medvedev reducibility
Theorem (Downey, Greenberg, Jockusch, Milans 2009). MLR ≤_s DNR₃

Corollary (inspection of the proof). MWC-stochastic ≤_s DNR₃

Theorem (K, Merkle, Stephan 2006). Complex ≡_s DNR₃

Corollary. MWC-stochastic ≤_s Complex

Theorem (K, in preparation). weakly MWC-stochastic (stochastically bi-immune) ≤_s Complex
Relationship to Hamming distance

Hamming distance d(σ, τ) is given by d(σ, τ) = |{n : σ(n) ≠ τ(n)}|.
Let the collection of all infinite computable sets be denoted by C. Let p : ω → ω. For X, Y ∈ 2^ω and N ∈ C we write
X ∼_{p,N} Y ⇔ (∀n ∈ N) d(X ↾ n, Y ↾ n) ≤ p(n),
X ∼_p Y ⇔ X ∼_{p,ω} Y,
X ≈_p Y ⇔ X ∼_{p,L} Y for some L ∈ C.
Easy to understand randomness extraction in terms of ∼_p. Seems hard in terms of ≈_p.
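The finite-prefix content of these definitions can be sketched as follows; this is my own toy illustration, checking the defining inequality d(X ↾ n, Y ↾ n) ≤ p(n) only up to some finite length.

```javascript
// Hamming distance between two equal-length 0/1 arrays.
function hamming(sigma, tau) {
  let d = 0;
  for (let n = 0; n < sigma.length; n++) if (sigma[n] !== tau[n]) d++;
  return d;
}

// Check the defining condition of X ~_p Y on all prefixes up to maxLen:
// d(X|n, Y|n) <= p(n) for every n = 1, ..., maxLen.
function closeUpTo(X, Y, p, maxLen) {
  for (let n = 1; n <= maxLen; n++) {
    if (hamming(X.slice(0, n), Y.slice(0, n)) > p(n)) return false;
  }
  return true;
}
```

For example, the all-zeros sequence and a sequence with a 1 in every tenth position stay within p(n) = ⌈√n⌉ of each other on prefixes up to length 100, while the all-ones sequence does not.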
Can (almost exactly) characterize those functions p for which
MLR ≤_s {y : y ∼_p x for some x ∈ MLR}.

If p(n) ≪ √n then yes: majority vote.
If p(n) ≫ √n then no; it is more complicated: p(n) ≫ √n is compatible with y being complex. This way we get stochastically bi-immune ≤_s complex. Can even ensure dim_p(y) = 1 (where dim_p is effective packing dimension).
Definition. Define effective Hausdorff dimension, complex packing dimension, and effective packing dimension of a set A ∈ 2^ω by:
dim_H(A) = lim inf_{n∈ω} K(A ↾ n)/n,
dim_cp(A) = sup_{N∈C} inf_{n∈N} K(A ↾ n)/n,
dim_p(A) = lim sup_{n∈ω} K(A ↾ n)/n.
(0 ≤ dim_H ≤ dim_cp ≤ dim_p ≤ 1)
Main Theorem

Let A and B with P(A) = P(B) = 1 be given by
A = {X : X is weakly 3-random},
B = {X : X is stochastically bi-immune}.

Theorem. Let p : ω → ω be any computable function such that p(n) = ω(√n). Let Φ be a Turing reduction. Then (∃X ∈ A)(∃Y ∼_p X)(Φ^Y ∉ B).

Two cases: Φ maps almost every random to a random, or not.
Hamming distance

For a set of strings A, d(x, A) = min{d(x, y) : y ∈ A}. The r-fold boundary of A ⊆ {0, 1}^n is
{x ∈ {0, 1}^n : 0 < d(x, A) ≤ r}.
Balls centered at 0 are given by
B(p) = {x : d(x, 0) ≤ p},
where 0 is the string of n many zeroes. A Hamming sphere is a set H with B(p) ⊆ H ⊆ B(p + 1).
Hamming spheres

Theorem (Harper, 1966). For each k, n, r, one can find a Hamming sphere that has minimal r-fold boundary among sets of cardinality k in {0, 1}^n.

So a ball is a set having minimal boundary, just like in Euclidean space. The cardinality of B(p) is (n choose 0) + ⋯ + (n choose p). The r-fold boundary of B(p) is just the set B(p + r) \ B(p), which then has cardinality (n choose p+1) + ⋯ + (n choose p+r).
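The cardinalities above are easy to verify directly. The following sketch (my own check, not part of the talk) counts B(p) by brute force over all bitmasks in {0, 1}^n and compares with the binomial sums.

```javascript
// Binomial coefficient C(n, k) for small n.
function choose(n, k) {
  if (k < 0 || k > n) return 0;
  let r = 1;
  for (let i = 0; i < k; i++) r = (r * (n - i)) / (i + 1);
  return Math.round(r);
}

// |B(p)| = number of strings of length n with at most p ones,
// counted by popcount over all 2^n bitmasks.
function ballSize(n, p) {
  let count = 0;
  for (let x = 0; x < 1 << n; x++) {
    let ones = 0;
    for (let i = 0; i < n; i++) if (x & (1 << i)) ones++;
    if (ones <= p) count++;
  }
  return count;
}

// C(n, lo) + ... + C(n, hi).
function binomialSum(n, lo, hi) {
  let s = 0;
  for (let k = lo; k <= hi; k++) s += choose(n, k);
  return s;
}
```

For n = 10 and p = 3 this confirms |B(3)| = C(10,0) + ⋯ + C(10,3) and that the 2-fold boundary B(5) \ B(3) has cardinality C(10,4) + C(10,5).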
Proving the Main Theorem

By Harper's theorem, balls in Hamming space have minimum boundary (and generally, minimum k-fold boundary) for given volume. Fortunately, balls still have pretty large (and known) boundary. So all sets have large boundary. Thus the adversary producing Y from X can often be on the boundary of the set {X : Φ^X(n) = 1} and thus, by making few changes, set Φ^Y(n) = 0 or Φ^Y(n) = 1 as they desire.
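As a toy instance of the adversary's freedom (an illustration I added, with majority standing in for Φ): for 9-bit majority, nearly half of all inputs sit within Hamming distance 1 of the decision boundary, so a single bit flip already changes the output there.

```javascript
// Number of 1-bits in x.
function popcount(x) {
  let c = 0;
  while (x) { c += x & 1; x >>= 1; }
  return c;
}

// For n-bit majority (n odd), an input's output can be flipped by
// changing one bit exactly when its popcount is (n-1)/2 or (n+1)/2.
// Return the fraction of all 2^n inputs with that property.
function fractionFlippableByOneBit(n) {
  let flippable = 0;
  for (let x = 0; x < 1 << n; x++) {
    const ones = popcount(x);
    if (ones === (n - 1) / 2 || ones === (n + 1) / 2) flippable++;
  }
  return flippable / (1 << n);
}
```

For n = 9 the fraction is 2·C(9,4)/2⁹ = 252/512 ≈ 0.49.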
Two kinds of Turing reductions

1. (Easy) λ{X : Φ^X ∉ MLR} = 1. Then for almost every X, the choice Y := X trivially satisfies Y ∼_p X, with Φ^Y ∉ MLR. Unfortunately we need X to be weakly 3-random!

2. (Hard) λ{X : Φ^X ∈ MLR} = 1. Then the averages of the values Φ^X(n) approach 1/2 on each sequence of integers almost surely, and one can use a Harper-style argument. Get for each 1-random set X a Y ∼_p X with Φ^Y ∉ MLR.
The hard case: Φ maps randoms to randoms

The problem is, Φ may have non-disjoint use. If Φ maps all randoms to randoms, then so does Φ_σ, defined by Φ^X_σ(n) = Φ^{σX}(n), where σX = σ(0)σ(1)⋯σ(k−1)X(k)X(k+1)⋯, k = |σ|.

Technical detail: we can computably find infinitely many inputs n where the image measure of Φ_σ is close to Lebesgue measure, in a uniform way over all choices of σ of a fixed length. We then take σ to be the part of Y constructed so far.
Question. Why did you obtain results for ∼_p but not for ≈_p? Lack of problem-solving... or a valid reason?
Justifying ≈_p, Part I

Theorem (Law of the iterated logarithm, Khintchine 1924). Let X = (X_0, X_1, ...) be a random variable on 2^ω having the fair-coin distribution. Let S_n = Σ_{k=0}^{n−1} X_k = d(X ↾ n, 0ⁿ). Then with probability one,
lim sup_n (S_n − n/2) / (φ(n) √n) = 1,
where φ(n) = √((1/2) log log n).

The standard deviation of S_n is simply √n/2, so why the strange φ(n)?

Michel Weber (1990): one can replace φ by an arbitrarily slow-growing function if we take lim sup over n ∈ N for a sparse set N.
Justifying ≈_p, Part II

Theorem. If X is Kurtz random relative to A and S_n := d(X ↾ n, A ↾ n), then
lim sup_n (S_n − n/2) / (φ(n) √n) ≥ 1.
Justifying ≈_p, Part II

Theorem (K, 2007). A has non-DNR Turing degree ⇒ each Martin-Löf random set X is Kurtz random relative to A. (Deeper but less useful here: the converse also holds; Greenberg and J. Miller, 2009.)

Corollary. If A ∼_p X, where p is computable, X is Martin-Löf random, and P({Y : Y ∼_p X}) = 0, then A has DNR degree. The amount of closeness p is sharp because P(Y ∼_p X) = P(Y ∼_p ∅), and ∅ does not have DNR degree.
The end
Review of year-old material
Can we extract randomness from an almost random sequence?

Fix p : N → N. Question: is there a procedure Φ such that if X ∈ {0, 1}^N is random and Y ∼_p X, then Φ(Y) is random? We can call Φ a randomness extractor: Y is almost random, and Φ extracts randomness from Y.
Can we extract randomness from an almost random sequence?

If our adversary must decide in advance which bits to corrupt, then parity will be a randomness extractor. Otherwise, one approach is to take Φ to be the Majority function on disjoint blocks of the input.
Maj_5(0, 0, 0, 1, 0) = 0
Maj_3(x, y, z) = (x ∧ y) ∨ (y ∧ z) ∨ (x ∧ z)
Maj_{2n+1}(x_0, ..., x_{2n}) = ⋁_{I ⊆ {0,...,2n}, |I| = n+1} ⋀_{i∈I} x_i
Does this work if p is sufficiently slow-growing? At least if X is random then Maj(X) is random.
Majority: a randomness extractor?

If X is random and Y ∼_p X, is Φ(Y) still random? Majority is somewhat robust, or stable. Hamming distance on {0, 1}^{2n+1}: d(σ, τ) = cardinality of {k : σ(k) ≠ τ(k)}.
Example: if d(σ, (0, 0, 0, 1, 0)) ≤ 1 then Maj_5(σ) = Maj_5((0, 0, 0, 1, 0)).
We can change roughly √n of the inputs and not alter the value of Majority. For large n, for most σ of length n, for all τ: if d(σ, τ) is significantly less than √n, then Maj(σ) = Maj(τ).
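Both the Boolean formula for Maj_3 and the one-flip stability of the example string can be checked mechanically; the sketch below is my own verification, not from the talk.

```javascript
// Majority of an odd-length 0/1 array.
function maj(bits) {
  const ones = bits.reduce((a, b) => a + b, 0);
  return ones > bits.length / 2 ? 1 : 0;
}

// Maj_3 via the Boolean formula (x ∧ y) ∨ (y ∧ z) ∨ (x ∧ z).
function maj3Formula(x, y, z) {
  return (x & y) | (y & z) | (x & z);
}

// Robustness check: does every single-bit flip of `bits`
// leave the value of majority unchanged?
function oneFlipStable(bits) {
  const v = maj(bits);
  for (let k = 0; k < bits.length; k++) {
    const copy = bits.slice();
    copy[k] = 1 - copy[k];
    if (maj(copy) !== v) return false;
  }
  return true;
}
```

The string (0, 0, 0, 1, 0) is one-flip stable, while a string at the decision threshold, such as (0, 0, 1, 1, 0), is not.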
Let X ∈ {0, 1}^N be a random variable whose distribution is that of infinitely many mutually independent fair coin tosses. The standard deviation of S_n = Σ_{i=1}^n X(i) from its mean n/2 is √n/2. The Central Limit Theorem:
P(S_n ≥ n/2 + a·√n/2) = P((S_n − n/2)/(√n/2) ≥ a) → N(a) := (1/√(2π)) ∫_a^∞ e^{−x²/2} dx as n → ∞.
Thus if a = a(n) → ∞ then this probability → 0, e.g. if a(n) = log n.
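The CLT statement can be checked numerically for moderate n. The sketch below (my own, not from the talk) computes the exact binomial tail P(S_n ≥ n/2 + a·√n/2) for n = 100 and a = 2; it should land near the normal tail value N(2) ≈ 0.0228.

```javascript
// Exact P(S_n >= m) for S_n ~ Binomial(n, 1/2), computed as
// the sum of C(n, k) / 2^n over k >= m.
function binomialUpperTail(n, m) {
  let coeff = 1; // C(n, 0)
  let tail = 0;
  const total = Math.pow(2, n);
  for (let k = 0; k <= n; k++) {
    if (k >= m) tail += coeff / total;
    coeff = (coeff * (n - k)) / (k + 1); // advance to C(n, k+1)
  }
  return tail;
}

// P(S_100 >= 50 + 2 * sqrt(100)/2) = P(S_100 >= 60), about 0.028.
const tail = binomialUpperTail(100, 60);
```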
Can we compute a random sequence Z from an almost-random sequence Y?

Y = 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, ... ∈ {0, 1}^N
X is random, Y ∼_p X, and Z is computed from Y. We can use Majority as follows:
Y = (0), (1, 1, 0), (1, 1, 0, 0, 0), (1, 1, 0, ...
Z = 0, 1, 0, ...
The block sizes here are 1, 3, 5, ...
[The block size must go to ∞. Otherwise our adversary who constructs Y can affect the limiting frequency of 1s in Z on a computably selected sequence of blocks.] Actually, each block should be larger than all previous blocks combined.
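The block construction can be sketched as follows. This is my own implementation: I use block sizes 2^(k+1) − 1 = 1, 3, 7, 15, ..., which are odd and satisfy the bracketed requirement that each block exceed all previous blocks combined (the slide's small example uses sizes 1, 3, 5 instead).

```javascript
// Majority of an odd-length 0/1 array.
function maj(bits) {
  const ones = bits.reduce((a, b) => a + b, 0);
  return ones > bits.length / 2 ? 1 : 0;
}

// Z(k) = majority of the k-th block of Y, where block k has size
// 2^(k+1) - 1 (so 1, 3, 7, 15, ...) and starts where block k-1 ended.
function blockMajority(Y) {
  const Z = [];
  let start = 0;
  for (let k = 0; start + (1 << (k + 1)) - 1 <= Y.length; k++) {
    const size = (1 << (k + 1)) - 1;
    Z.push(maj(Y.slice(start, start + size)));
    start += size;
  }
  return Z;
}
```

For Y = 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0 the blocks are (0), (1, 1, 0), (1, 1, 0, 0, 0, 0, 0), giving Z = 0, 1, 0.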
How close to the random set X must Y be for Majority to work?

lim_n p(n)/√n = 0, i.e., p(n) = o(√n).

If p(n) = o(√n), X is random, and Y ∼_p X, then Z := Majority(Y) is random.
Democracy is the best form of randomness extraction

If X is random and Y ∼_p X where p(n) = o(√n), then Majority applied to Y is random. For any procedure Φ (such as Majority), if X is random and g(n) ∉ O(√n), then there is a Y with Y ∼_g X such that Φ(Y) is not random. This phenomenon, for random variables on {0, 1}^n rather than individual elements of {0, 1}^N, is known in combinatorics and computer science: Harper 1966; Ben-Or and Linial 1989; Lichtenstein, Linial, and Saks 1989.
Positive result: p(n) = o(√n), Φ = majority function on disjoint blocks of increasing size. For all ML-random X and all Y ∼_p X, Φ^Y is ML-random.

Negative result: p(n) ∉ O(√n). Almost optimal. Φ a Turing reduction. For almost all X there is a Y ∼_p X with Φ^Y not ML-random.