CS 683 Spring 07 — Learning, Games, and Electronic Markets
Solution Set for Homework #1

1. Suppose $x$ and $y$ are real numbers and $x > y$. Prove that
$$e^x > \frac{e^x - e^y}{x - y} > e^y.$$

Solution: Let $f(s) = e^s$. By the mean value theorem, there exists a number $z$ such that $x > z > y$ and
$$f'(z) = \frac{f(x) - f(y)}{x - y}.$$
Observe that the left side is equal to $e^z$ and the right side is equal to $(e^x - e^y)/(x - y)$. As $x > z > y$, we have
$$e^x > e^z = \frac{e^x - e^y}{x - y} > e^y.$$

2. A standard 52-card deck (the set $\{2, 3, 4, 5, 6, 7, 8, 9, 10, \text{jack}, \text{queen}, \text{king}, \text{ace}\} \times \{\text{clubs}, \text{diamonds}, \text{hearts}, \text{spades}\}$) is randomly partitioned into four 13-element sets, which are dealt to players named North, South, East, and West.

(a) Calculate $\Pr(\text{South gets exactly 2 aces} \mid \text{North gets exactly 1 ace})$.

Solution: For a player $P$, let $A(P)$ denote the number of aces dealt to $P$. We have
$$\Pr(A(\text{South}) = 2 \mid A(\text{North}) = 1) = \frac{\Pr(A(\text{South}) = 2 \text{ and } A(\text{North}) = 1)}{\Pr(A(\text{North}) = 1)}.$$
The number of ways of dealing a hand to North is $\binom{52}{13}$, and the number of ways of dealing a hand to North with exactly 1 ace is $\binom{4}{1}\binom{48}{12}$, because there are $\binom{4}{1}$ ways to deal an ace to North and $\binom{48}{12}$ ways to deal North's 12 remaining cards from the 48 remaining cards which are not aces. Hence,
$$\Pr(A(\text{North}) = 1) = \frac{\binom{4}{1}\binom{48}{12}}{\binom{52}{13}}.$$
Similarly, there are $\binom{52}{13}\binom{39}{13}$ ways to deal a 13-card hand to each of North and South. (After dealing 13 cards to North, there are 39 cards remaining in the deck and hence $\binom{39}{13}$ ways to deal South a 13-card hand from these remaining cards.) Of all the possible ways to deal North's and South's 13-card hands, there are $\binom{4}{1}\binom{3}{2}\binom{48}{12}\binom{36}{11}$ ways to deal a pair of hands in which North gets one ace and South gets two. (The product of four terms is justified by considering the number of ways to deal one ace to North, the number of ways to deal two of the remaining aces to South, the number of ways to deal 12 more non-ace cards to North, and the number of ways to deal 11 more non-ace cards to South from the remaining 36 non-ace cards which were not dealt to North.) Hence,
$$\Pr(A(\text{South}) = 2 \text{ and } A(\text{North}) = 1) = \frac{\binom{4}{1}\binom{3}{2}\binom{48}{12}\binom{36}{11}}{\binom{52}{13}\binom{39}{13}}.$$
Putting all of this together, we find that
$$\Pr(A(\text{South}) = 2 \mid A(\text{North}) = 1) = \frac{\binom{4}{1}\binom{3}{2}\binom{48}{12}\binom{36}{11}}{\binom{52}{13}\binom{39}{13}} \Big/ \frac{\binom{4}{1}\binom{48}{12}}{\binom{52}{13}} = \frac{\binom{3}{2}\binom{36}{11}}{\binom{39}{13}} = \frac{156}{703} \approx 0.2219.$$

(b) Let $H$ and $D$ denote the number of hearts and diamonds, respectively, dealt to North. Calculate $E(H \mid D)$ as a function of $D$.

Solution: Let $S$ and $C$ denote the number of spades and clubs, respectively, dealt to North. By symmetry, we have $E(H \mid D) = E(S \mid D) = E(C \mid D)$, and we also know that $H + S + C = 13 - D$. Hence
$$E(H \mid D) = \tfrac{1}{3} E(H + S + C \mid D) = \tfrac{1}{3}(13 - D).$$

3. Let $x_1, x_2, \ldots, x_n$ be independent uniformly-distributed random samples from the interval $[0, 1]$. Define the following probabilities:

- $p(n)$ is the probability that $\min_k x_k > 0.01$.
- $q(n)$ is the probability that $\min_{i \neq j} |x_i - x_j| < 1/n^2$.
- $r(n)$ is the probability that $\min_{i \neq j} |x_i - x_j| > 1/(100n)$.
- $s(n)$ is the probability that exactly $\lfloor n/2 \rfloor$ of the numbers $x_1, \ldots, x_n$ lie in $[0, \tfrac{1}{2}]$.

Estimate the asymptotic behavior of each of these probabilities as $n$ tends to infinity. Specifically, for each of the sequences $p(n), q(n), r(n), s(n)$, determine whether the sequence

(A) tends to zero exponentially fast, i.e. is bounded above by $c^n$ for some constant $c < 1$, for all sufficiently large $n$;
(B) tends to zero, but not exponentially fast;
(C) remains bounded away from 0 and 1;
(D) tends to 1.

Solution: Answers: (A) for $p(n)$; (C) for $q(n)$; (A) for $r(n)$; (B) for $s(n)$. Justifications:

(a) We have $p(n) = (0.99)^n$, hence (A) is correct.

(b) First, here's a heuristic for seeing that $q(n)$ should remain bounded away from 0 and 1. For each pair of distinct indices $i, j$, the probability that $|x_i - x_j| < 1/n^2$ is very close to $2/n^2$. Hence the expected number of unordered pairs $\{i, j\}$ such that $i \neq j$ and $|x_i - x_j| < 1/n^2$ is roughly $\binom{n}{2} \cdot \frac{2}{n^2}$, which approaches 1 as $n \to \infty$. Since the expected number of pairs $\{i, j\}$ satisfying $|x_i - x_j| < 1/n^2$ is approaching a constant, it is reasonable to suspect that the probability that one such pair exists is bounded away from 0 and 1.

To make this rigorous, we start by establishing the following lemma.

Lemma 1. Let $\varepsilon > 0$ be a positive real number, $n$ a positive integer, and $x_1, x_2, \ldots, x_n$ independent uniformly-distributed samples from $[0, 1]$. The probability that $\min_{i \neq j} |x_i - x_j| > \varepsilon$ is bounded above by $e^{-(n-1)(n-2)\varepsilon/2}$.

Proof. For a point $x \in [0, 1]$ let $I_x$ denote the interval $[x, x + \varepsilon] \cap [0, 1]$, and for a set $S \subseteq [0, 1]$ let $I_S = \bigcup_{x \in S} I_x$. Let us call a set $S \subseteq [0, 1]$ independent if $|x - y| > \varepsilon$ for all distinct $x, y \in S$. It is clear that if $S$ is an independent set of $k$ elements then the sets $I_x$ ($x \in S$) are pairwise disjoint, and at most one of them has measure less than $\varepsilon$; consequently $I_S$ has measure at least $(k - 1)\varepsilon$. Now let $S_k = \{x_1, x_2, \ldots, x_k\}$ and observe that
$$\Pr(\{x_1, \ldots, x_n\} \text{ is independent}) = \prod_{k=2}^{n} \Pr(S_k \text{ is independent} \mid S_{k-1} \text{ is independent}) \leq \prod_{k=2}^{n} \Pr(x_k \notin I_{S_{k-1}} \mid S_{k-1} \text{ is independent})$$
$$\leq \prod_{k=2}^{n} \left[1 - (k - 2)\varepsilon\right] \leq \prod_{k=2}^{n} e^{-(k-2)\varepsilon} = e^{-(n-1)(n-2)\varepsilon/2}. \qquad \Box$$
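Lemma 1 can be sanity-checked numerically. The following sketch (my addition, not part of the original solution; standard-library Python with invented helper names) estimates the left-hand probability by Monte Carlo and compares it against the bound $e^{-(n-1)(n-2)\varepsilon/2}$:

```python
import math
import random

def min_gap(xs):
    """Smallest pairwise distance |x_i - x_j|; after sorting, it is attained by adjacent points."""
    xs = sorted(xs)
    return min(b - a for a, b in zip(xs, xs[1:]))

def estimate(n, eps, trials=20000, seed=0):
    """Monte Carlo estimate of Pr(min_{i != j} |x_i - x_j| > eps) for n uniform samples."""
    rng = random.Random(seed)
    hits = sum(min_gap([rng.random() for _ in range(n)]) > eps
               for _ in range(trials))
    return hits / trials

n = 10
eps = 1 / n**2
empirical = estimate(n, eps)
bound = math.exp(-(n - 1) * (n - 2) * eps / 2)
print(empirical, "<=", bound)
```

For $n = 10$ and $\varepsilon = 1/n^2$ the bound is about $e^{-0.36} \approx 0.70$, comfortably above the empirical frequency.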
If we apply Lemma 1 to $\{x_1, x_2, \ldots, x_n\}$ with $\varepsilon = 1/n^2$, we find that
$$\Pr\left(\min_{i \neq j} |x_i - x_j| > \tfrac{1}{n^2}\right) \leq e^{-(n-1)(n-2)/2n^2}.$$
The right side converges to $e^{-1/2}$ as $n \to \infty$; since $q(n) \geq 1 - e^{-(n-1)(n-2)/2n^2}$, which converges to $1 - e^{-1/2} > 0$, this proves that $q(n)$ is bounded away from 0.

To prove that $q(n)$ is bounded away from 1, let $Y$ be the random variable which counts the number of unordered pairs $\{i, j\}$ of distinct elements of $[n]$ satisfying $|x_i - x_j| < 1/n^2$. For any particular pair $i, j$, we have
$$\Pr\left(|x_i - x_j| \leq \tfrac{1}{n^2} \,\Big|\, x_i\right) = \tfrac{1}{n^2} + \min\left\{x_i, \tfrac{1}{n^2}, 1 - x_i\right\}.$$
Hence
$$\Pr\left(|x_i - x_j| \leq \tfrac{1}{n^2}\right) = \int_0^1 \left[\tfrac{1}{n^2} + \min\left\{x, \tfrac{1}{n^2}, 1 - x\right\}\right] dx = \tfrac{2}{n^2} - \tfrac{1}{n^4},$$
and consequently $E(Y) = \binom{n}{2}\left(\tfrac{2}{n^2} - \tfrac{1}{n^4}\right)$, which tends to 1 as $n \to \infty$. On the other hand, since $Y$ is a non-negative integer-valued random variable, we have
$$E(Y) = \sum_{m=0}^{\infty} \Pr(Y > m) \geq \Pr(Y > 0) + \Pr(Y > 1) = q(n) + \Pr(Y > 1).$$
The left side tends to 1 as $n \to \infty$. So if we can prove that $\Pr(Y > 1)$ is bounded away from zero, this implies that $q(n)$ is bounded away from 1.

To prove $\Pr(Y > 1)$ is bounded away from zero, we apply Lemma 1 twice, with $\varepsilon = 1/n^2$, using the sets $T = \{x_1, x_2, \ldots, x_{\lfloor n/2 \rfloor}\}$ and $U = \{x_{\lfloor n/2 \rfloor + 1}, \ldots, x_n\}$. We find that $\limsup_{n \to \infty} \Pr(T \text{ is independent})$ and $\limsup_{n \to \infty} \Pr(U \text{ is independent})$ are both bounded above by $e^{-1/8}$. Also, the events "$T$ is independent" and "$U$ is independent" are independent, so $\liminf_{n \to \infty} \Pr(\text{neither } T \text{ nor } U \text{ is independent})$ is greater than or equal to $(1 - e^{-1/8})^2$. The event "neither $T$ nor $U$ is independent" implies that $Y \geq 2$, which completes the proof that $\Pr(Y > 1)$ is bounded away from zero.

(c) Applying Lemma 1 with $\varepsilon = \frac{1}{100n}$, we conclude that $r(n) \leq e^{-(n-1)(n-2)/200n}$, which tends to zero exponentially fast as $n$ tends to infinity.

(d) Let $T = \{i \mid 0 \leq x_i \leq \tfrac{1}{2}\}$. For any given set $U \subseteq [n]$, we have $\Pr(T = U) = 2^{-n}$, since the numbers $x_1, \ldots, x_n$ are independent and each has probability $\tfrac{1}{2}$ of belonging to the interval $[0, \tfrac{1}{2}]$. Hence
$$s(n) = \binom{n}{\lfloor n/2 \rfloor} 2^{-n}. \qquad (1)$$
Using Stirling's approximation to the factorial function, we find that
$$\binom{n}{\lfloor n/2 \rfloor} = \Theta\left(\frac{\sqrt{2\pi n}\,(n/e)^n}{\left[\sqrt{2\pi(n/2)}\,(n/2e)^{n/2}\right]^2}\right) = \Theta\left(n^{-1/2}\, 2^n\right). \qquad (2)$$
Putting together (1) and (2), we find that $s(n) = \Theta(n^{-1/2})$, which implies that $s(n)$ tends to zero, but not exponentially fast.

4. Let $x$ be a real-valued random variable which is exponentially distributed with decay rate 4, i.e. $\Pr(x > r) = e^{-4r}$ for all $r > 0$.

(a) What is the probability density function of $x$?

Solution: The cumulative distribution function of $x$ is
$$F(r) = \begin{cases} 1 - e^{-4r} & \text{if } r \geq 0 \\ 0 & \text{if } r < 0, \end{cases}$$
so the probability density function is
$$f(r) = F'(r) = \begin{cases} 4e^{-4r} & \text{if } r \geq 0 \\ 0 & \text{if } r < 0. \end{cases}$$

(b) What is the expected value of $x$?

Solution: For a non-negative random variable $x$, it holds that $E(x) = \int_0^\infty \Pr(x \geq r)\, dr$. For the given distribution of $x$, this implies
$$E(x) = \int_0^\infty e^{-4r}\, dr = \left[-\tfrac{1}{4} e^{-4r}\right]_0^\infty = \tfrac{1}{4}.$$

(c) Let $y = \sqrt{x}$. What is the probability density function of $y$?

Solution: The cumulative distribution function of $y$ is
$$G(s) = \Pr(y \leq s) = \begin{cases} \Pr(x \leq s^2) & \text{if } s \geq 0 \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1 - e^{-4s^2} & \text{if } s \geq 0 \\ 0 & \text{otherwise.} \end{cases}$$
Therefore the probability density function of $y$ is
$$g(s) = G'(s) = \begin{cases} 8s\, e^{-4s^2} & \text{if } s \geq 0 \\ 0 & \text{otherwise.} \end{cases}$$
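As a quick numerical cross-check of parts (a)–(c) (my addition, not part of the original solution; plain standard-library Python with an integration scheme of my choosing), one can verify that $g$ integrates to 1 and that $E(x) = E(y^2) = 1/4$:

```python
import math

def f(r):
    """Density of x: 4 e^{-4r} for r >= 0."""
    return 4 * math.exp(-4 * r)

def g(s):
    """Claimed density of y = sqrt(x): 8 s e^{-4 s^2} for s >= 0."""
    return 8 * s * math.exp(-4 * s * s)

def integrate(h, lo, hi, steps=100000):
    """Midpoint rule; plenty accurate for these smooth, rapidly decaying integrands."""
    dx = (hi - lo) / steps
    return sum(h(lo + (i + 0.5) * dx) for i in range(steps)) * dx

total_g = integrate(g, 0.0, 10.0)                       # ~1: g is a probability density
mean_x = integrate(lambda r: r * f(r), 0.0, 10.0)       # ~0.25: E(x) = 1/4
mean_y2 = integrate(lambda s: s * s * g(s), 0.0, 10.0)  # ~0.25: E(y^2) = E(x)
print(total_g, mean_x, mean_y2)
```

The upper limit 10 is an assumption that the truncated tails are negligible, which holds here since both densities decay at least as fast as $e^{-4r}$.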
5. (a) You are shopping for a house by observing a sequence of houses one by one, in random order. You have decided to stop and buy a house as soon as you see one which is nicer than the first house you observed. What is the expected number of houses you will have to look at, including the first one?

Mathematically, let's model this process as follows. Let $z_1, z_2, \ldots$ denote an infinite sequence of independent uniformly-distributed random samples from the interval $[0, 1]$. (Interpretation: $z_i$ is the quality of the $i$-th house you observed.) Let $\tau$ be the smallest $i > 1$ such that $z_i > z_1$. What is $E(\tau)$?

Solution: For a non-negative integer-valued random variable $X$, it holds that $E(X) = \sum_{n=0}^{\infty} \Pr(X > n)$. The probability that $\tau > n$ is equal to 1 if $n = 0$, and otherwise it is the probability that $z_1 = \max\{z_1, z_2, \ldots, z_n\}$, which is equal to $1/n$ since each of $z_1, \ldots, z_n$ is equally likely to be the maximum. Hence
$$E(\tau) = \sum_{n=0}^{\infty} \Pr(\tau > n) = 1 + \sum_{n=1}^{\infty} \frac{1}{n} = \infty.$$

(b) Now suppose that you modify your stopping rule. For some fixed predetermined number $k$, you observe the first $k$ houses without purchasing. Let $h$ be the second-best house observed among the first $k$. Your policy is to buy the next house (after the first $k$) which is nicer than $h$. In more precise terms, let $z_1, z_2, \ldots$ be a sequence of independent random variables uniformly distributed in $[0, 1]$ as before, let $z_a > z_b$ be the two largest elements of the set $\{z_1, \ldots, z_k\}$, and let $\rho$ be the smallest $i > k$ such that $z_i > z_b$. What is $E(\rho)$, as a function of $k$?

Solution: As above, we begin by computing $\Pr(\rho > n)$. This is equal to 1 when $n \leq k$. Otherwise, $\rho > n$ if and only if the two largest samples in $\{z_1, \ldots, z_n\}$ belong to the subset $\{z_1, \ldots, z_k\}$. Each of the $\binom{n}{2}$ unordered pairs of samples is equally likely to be the two largest samples, so we find that
$$\Pr(\rho > n) = \binom{k}{2} \Big/ \binom{n}{2} = \frac{k(k-1)}{n(n-1)}.$$
Hence
$$E(\rho) = \sum_{n=0}^{\infty} \Pr(\rho > n) = k + \sum_{n=k}^{\infty} \frac{k(k-1)}{n(n-1)} = k + k(k-1) \sum_{n=k}^{\infty} \left(\frac{1}{n-1} - \frac{1}{n}\right) = k + k(k-1) \cdot \frac{1}{k-1} = 2k.$$
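Both identities used in part (b) are easy to check by machine. The sketch below (my addition; Python, with small hypothetical parameter choices $n = 6$, $k = 3$) verifies $\Pr(\rho > n) = \binom{k}{2}/\binom{n}{2}$ exhaustively over rank orderings, and checks the telescoping sum numerically:

```python
from itertools import permutations
from math import comb, factorial

# Exhaustive check that Pr(rho > n) = C(k,2)/C(n,2) for n > k: only the
# relative ranks of z_1, ..., z_n matter, and the event "rho > n" says the
# two largest ranks (n-1 and n-2 below) both land in the first k positions.
n, k = 6, 3
hits = sum(1 for perm in permutations(range(n))
           if n - 1 in perm[:k] and n - 2 in perm[:k])
assert hits * comb(n, 2) == comb(k, 2) * factorial(n)   # 144/720 = 3/15

# Numerical check of the telescoping sum: E(rho) = k + k(k-1) * sum 1/(n(n-1)) = 2k.
k2 = 7
tail = sum(k2 * (k2 - 1) / (m * (m - 1)) for m in range(k2, 100000))
expected_rho = k2 + tail
print(expected_rho)   # close to 2*k2 = 14
```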
6. (a) Find a non-zero vector $v = (x, y, z)$ such that $v$ is a linear combination of $(0, 1, 1)$ and $(1, 0, 0)$, and $v$ is also a linear combination of $(-1, 2, 0)$ and $(1, 1, 1)$.

Solution: A vector $v = (v_x, v_y, v_z)$ is a linear combination of $(0, 1, 1)$ and $(1, 0, 0)$ if and only if
$$\det \begin{pmatrix} v_x & v_y & v_z \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix} = 0 \iff v_y - v_z = 0. \qquad (3)$$
Similarly, $v$ is a linear combination of $(-1, 2, 0)$ and $(1, 1, 1)$ if and only if
$$\det \begin{pmatrix} v_x & v_y & v_z \\ -1 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix} = 0 \iff 2v_x + v_y - 3v_z = 0. \qquad (4)$$
From (3) we get $v_y = v_z$. Plugging this into (4) we get $2v_x - 2v_z = 0$, which implies $v_x = v_z$. Thus $v = (1, 1, 1)$ is a valid solution; indeed, $(1, 1, 1) = (0, 1, 1) + (1, 0, 0)$ and $(1, 1, 1) = 0 \cdot (-1, 2, 0) + 1 \cdot (1, 1, 1)$.

(b) Let
$$S = \{(x, y, z) \mid x + y + z = 5,\ x \geq 0,\ y \geq 0,\ z \geq 0\}.$$
Let $v = (13, 16, 6)$. Find the point in $S$ which is closest to $v$, i.e. the $w \in S$ which minimizes $\|v - w\|_2$.

Solution: $S$ is a triangular region in the plane $\{x + y + z = 5\}$. It is bounded by the lines $\{x + y = 5, z = 0\}$, $\{x + z = 5, y = 0\}$, $\{y + z = 5, x = 0\}$. Its corners are $(5, 0, 0)$, $(0, 5, 0)$, $(0, 0, 5)$. Therefore $w$ must be one of the following seven points.

i. The point $(5, 0, 0)$.
ii. The point $(0, 5, 0)$.
iii. The point $(0, 0, 5)$.
iv. The point on the line $\{x + y = 5, z = 0\}$ which is closest to $v$. This point $u = (u_x, u_y, u_z)$ satisfies $u_x + u_y = 5$, $u_z = 0$, and $(u - v) \cdot (1, -1, 0) = 0$, i.e. $u_x - 13 = u_y - 16$. Therefore the closest point on that line is $(1, 4, 0)$.
v. The point on the line $\{x + z = 5, y = 0\}$ which is closest to $v$. This point $u = (u_x, u_y, u_z)$ satisfies $u_x + u_z = 5$, $u_y = 0$, and $(u - v) \cdot (1, 0, -1) = 0$, i.e. $u_x - 13 = u_z - 6$. Therefore the closest point on that line is $(6, 0, -1)$.
vi. The point on the line $\{y + z = 5, x = 0\}$ which is closest to $v$. This point $u = (u_x, u_y, u_z)$ satisfies $u_y + u_z = 5$, $u_x = 0$, and $(u - v) \cdot (0, 1, -1) = 0$, i.e. $u_y - 16 = u_z - 6$. Therefore the closest point on that line is $(0, 7.5, -2.5)$.
vii. The point on the plane $\{x + y + z = 5\}$ which is closest to $v$. Denoting this point by $u = (u_x, u_y, u_z)$, we know that the vector $u - v$ must be parallel to the normal vector to this plane, which is $(1, 1, 1)$. Hence $u_x + u_y + u_z = 5$ and $u_x - 13 = u_y - 16 = u_z - 6$. Solving, we obtain $u = (3, 6, -4)$.
Only the first four of these seven points belong to $S$, and one can check that $(1, 4, 0)$ is the closest of these to $v$.

7. Recall the prediction problem we discussed on the first day of class: there are $n$ experts predicting a binary sequence $B_1, B_2, \ldots$, and one of the experts never makes a mistake. (In other words, if $b_{ij}$ denotes the prediction of expert $i$ on the $j$-th trial, we are assuming that there exists some $i$ ($1 \leq i \leq n$) such that $b_{ij} = B_j$ for all $j$.) Assume that both the prediction matrix $(b_{ij})$ and the sequence $(B_j)$ are specified by an oblivious adversary. We saw that there is a deterministic prediction algorithm which makes at most $\log_2(n)$ mistakes, and that this mistake bound is optimal.

(a) Show that there is a randomized prediction algorithm whose expected number of mistakes (against any oblivious adversary) is at most $\tfrac{1}{2}\log_2(n)$.

Solution: For a function $f : [0, 1] \to [0, 1]$, consider the following algorithm.

Alg(f):
    S ← {1, 2, ..., n}    /* S is the set of experts who have not made a mistake yet */
    for t = 1, 2, ...:
        a ← Σ_{i∈S} b_{it}    /* the number of experts in S predicting 1 */
        b ← |S| − a           /* the number of experts in S predicting 0 */
        output prediction A_t = 1 with probability f(a/(a+b)); otherwise output A_t = 0
        observe B_t
        S ← S \ {i : b_{it} ≠ B_t}    /* remove experts who made a mistake */

We will analyze the algorithm for a generic function $f$. From this analysis we will deduce the constraints which $f$ must satisfy in order to ensure at most $\tfrac{1}{2}\log_2(n)$ mistakes in expectation.

Let $W_t$ be the number of experts in $S$ at the beginning of the $t$-th iteration of the main loop. Note that $W_1 = n$ and that $W_t \geq 1$ for all $t$, by assumption. Let $\Phi_t = \log_2(W_t)$, and observe that $\Phi_t \geq 0$ for all $t$, hence
$$\sum_{t=1}^{\infty} (\Phi_t - \Phi_{t+1}) \leq \Phi_1 = \log_2(n).$$
Let $X_t = |A_t - B_t|$, and observe that $\sum_{t=1}^{\infty} X_t$ is the number of mistakes made by the algorithm. So if we can prove that
$$E(X_t) \leq \tfrac{1}{2}(\Phi_t - \Phi_{t+1}),$$
we will be done.
Let $a$ and $b$ be the number of experts in $S$ predicting 1 and 0 (respectively) at time $t$, and let $p = \frac{a}{a+b}$. If $B_t = 0$ we have
$$E(X_t) = f(p), \qquad \Phi_t - \Phi_{t+1} = \log_2\left(\frac{1}{1-p}\right).$$
If $B_t = 1$ we have
$$E(X_t) = 1 - f(p), \qquad \Phi_t - \Phi_{t+1} = \log_2\left(\frac{1}{p}\right).$$
Hence we are looking for a function $f$ that satisfies the following for all $p$:
$$f(p) \leq \tfrac{1}{2} \log_2\left(\frac{1}{1-p}\right) \qquad (5)$$
$$1 - f(p) \leq \tfrac{1}{2} \log_2\left(\frac{1}{p}\right). \qquad (6)$$
We may rewrite (6) as
$$f(p) \geq 1 - \tfrac{1}{2}\log_2\left(\frac{1}{p}\right) = \tfrac{1}{2}\log_2(4p). \qquad (7)$$
Combining all the constraints on $f$, we come up with:
$$\max\left\{0,\ \tfrac{1}{2}\log_2(4p)\right\} \leq f(p) \leq \min\left\{1,\ \tfrac{1}{2}\log_2\left(\frac{1}{1-p}\right)\right\} \quad \text{for all } p \in [0, 1]. \qquad (8)$$
For any function $f$ satisfying (8), the algorithm Alg(f) will make at most $\tfrac{1}{2}\log_2(n)$ mistakes in expectation.

To see that there is at least one function $f$ satisfying (8), observe that $1 - 4p(1-p) = (1 - 2p)^2 \geq 0$, hence the inequality $4p(1-p) \leq 1$ is valid for all $p \in [0, 1]$. Taking the logarithm of both sides, we obtain
$$\log_2(4p) + \log_2(1-p) \leq 0 \implies \tfrac{1}{2}\log_2(4p) \leq \tfrac{1}{2}\log_2\left(\frac{1}{1-p}\right). \qquad (9)$$
For $0 \leq p \leq 1/4$, the left side of (8) is 0 while the right side is non-negative. For $1/4 \leq p \leq 3/4$, the fact that the left side of (8) is bounded above by the right side follows from (9). For $3/4 \leq p \leq 1$, the left side is at most 1 while the right side is equal to 1. So, for example, the following choice of $f$ suffices:
$$f(p) = \begin{cases} 0 & \text{if } 0 \leq p < 1/4 \\ \tfrac{1}{2}\log_2(4p) & \text{if } 1/4 \leq p \leq 3/4 \\ 1 & \text{if } 3/4 < p \leq 1. \end{cases}$$
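To illustrate, here is a sketch of Alg(f) with this choice of $f$ (my code, not from the original solutions; the helper names and the test instance are invented). Since the adversary is oblivious, the expected number of mistakes on a fixed instance can be computed exactly by summing per-trial mistake probabilities instead of flipping coins:

```python
import math

def f(p):
    """The prediction rule exhibited above."""
    if p < 0.25:
        return 0.0
    if p <= 0.75:
        return 0.5 * math.log2(4 * p)
    return 1.0

def expected_mistakes(predictions, outcomes):
    """Expected number of mistakes of Alg(f) on a fixed instance.

    predictions[i][t] is expert i's prediction at trial t.  Because the
    adversary is oblivious, the expectation is just the sum of the
    per-trial mistake probabilities f(p) (if B_t = 0) or 1 - f(p) (if B_t = 1).
    """
    S = set(range(len(predictions)))
    total = 0.0
    for t, B in enumerate(outcomes):
        a = sum(predictions[i][t] for i in S)
        p = a / len(S)                                  # len(S) = a + b
        total += (1.0 - f(p)) if B == 1 else f(p)
        S = {i for i in S if predictions[i][t] == B}    # drop mistaken experts
    return total

# Experts: all k-bit strings, so some expert is always mistake-free.
k = 4
n = 2 ** k
experts = [[(i >> (k - 1 - t)) & 1 for t in range(k)] for i in range(n)]
m = expected_mistakes(experts, [1, 0, 0, 1])
print(m, "<=", 0.5 * math.log2(n))
```

On this instance the surviving pool halves each round with $p = 1/2$ throughout, so the expected number of mistakes is exactly $\tfrac{1}{2}\log_2(16) = 2$, matching the bound.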
(b) Show that no randomized prediction algorithm can make fewer than $\tfrac{1}{2}\log_2(n)$ mistakes, in expectation. In other words, prove that for every randomized prediction algorithm there exists an oblivious adversary such that the expected number of mistakes made by the algorithm against this adversary is at least $\tfrac{1}{2}\log_2(n)$.

Solution: Let $k = \lfloor \log_2(n) \rfloor$. Consider the following random input instance. For $j = 1, 2, \ldots, k$ and $1 \leq i \leq 2^k$, expert $i$ predicts the $j$-th binary digit of the integer $i - 1$ (padded with initial 0's if necessary, so that it is a string of $k$ binary digits). For $i > 2^k$, expert $i$ always predicts 0. Finally, $B_1, \ldots, B_k$ is a string of independent, uniformly-distributed binary digits. Note that the construction guarantees the existence of an expert who makes no mistakes, namely the expert whose digit string is $B_1 B_2 \cdots B_k$.

For any randomized prediction algorithm, the probability of a mistake at time $j$ ($1 \leq j \leq k$) is $\tfrac{1}{2}$, since the algorithm's prediction depends only on the experts' predictions, the previously observed bits $B_1, \ldots, B_{j-1}$, and the algorithm's own random bits, and the random variable $B_j$ is independent of this data. Hence the expected number of mistakes made by the algorithm is $k/2$. It follows that there is at least one sequence $B_1, B_2, \ldots, B_k$ which causes the algorithm to make at least $k/2$ mistakes in expectation.
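The counting behind this construction is easy to verify directly. The following small check (my addition; Python, with the hypothetical choice $k = 3$) confirms that every outcome string is matched by exactly one expert, so the promise that some expert never errs holds for every realization of $B$:

```python
k = 3
n = 2 ** k

# Expert i+1 (1-indexed in the text) predicts the binary digits of i,
# padded to k bits, matching the construction above.
experts = [[(i >> (k - 1 - j)) & 1 for j in range(k)] for i in range(n)]

# Every possible outcome string B_1 ... B_k equals the digit string of
# exactly one expert, so a mistake-free expert always exists.
for b in range(n):
    B = [(b >> (k - 1 - j)) & 1 for j in range(k)]
    assert sum(1 for i in range(n) if experts[i] == B) == 1
print("every outcome sequence has exactly one mistake-free expert")
```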