UC Berkeley Department of Electrical Engineering and Computer Sciences
EECS 126: Probability and Random Processes
Problem Set 3, Spring 2019
Self-Graded Scores Due: February 8, 2019
Submit your self-graded scores via the Google form: https://goo.gl/forms/imaaryozsjxgnsun. Make sure you use your SORTABLE NAME on CalCentral.

1. Graphical Density
Figure 1 shows the joint density f_{X,Y} of the random variables X and Y.

Figure 1: Joint density of X and Y.

(a) Find A and sketch f_X, f_Y, and f_{X | X+Y ≤ 3}.
(b) Find E[X | Y = y] for 0 ≤ y ≤ 3 and E[Y | X = x] for 0 ≤ x ≤ 4.
(c) Find cov(X, Y).

(a) The integral of f_{X,Y} over the shaded region must equal 1. The region consists of three pieces, each of area 2, so splitting the integral into one double integral per piece gives 2A + 2A + 2A = 6A = 1, and hence A = 1/6.
We find the densities as follows. From the figure, X is uniform within each of the three x-intervals covered by the shaded region; each piece has area 2, so the probability of X falling in each of these intervals is 2A = 1/3, and f_X(x) is A times the height of the vertical slice of the region at x. Similarly, Y is uniform within each of the y-intervals covered by the region, and f_Y(y) is A times the width of the horizontal slice at height y, so the probability of each y-interval is proportional to the area of the corresponding horizontal strip. Finally, given that X + Y ≤ 3, (X, Y) is uniform on the part of the shaded region below the line x + y = 3, which is the triangle with vertices (1, 1), (1, 2), and (2, 1); its area is 1/2, so the conditional density equals 2 on the triangle, and integrating out y gives

f_{X | X+Y ≤ 3}(x) = ∫_1^{3−x} 2 dy = 2(2 − x), 1 ≤ x ≤ 2.

Sketching the densities is then straightforward.

(b) Given any value of y in [0, 3], the conditional distribution of X given Y = y is uniform on the horizontal slice at height y, which is symmetric with respect to the vertical line through the center of the figure; thus E[X | Y = y] equals the x-coordinate of that line for all y. To calculate E[Y | X = x], note that Y given X = x is uniform on the vertical slice at x, so E[Y | X = x] is the midpoint of that slice; this gives one constant value for x in the middle of the support and another constant value for x in the two outer intervals, which are the two cases read off from the figure.

(c) Since E[X | Y = y] = E[X] does not depend on y, we have

E[XY] = ∫_0^3 E[XY | Y = y] f_Y(y) dy = ∫_0^3 y E[X | Y = y] f_Y(y) dy = E[X] ∫_0^3 y f_Y(y) dy = E[X] E[Y].

So the covariance is cov(X, Y) = E[XY] − E[X]E[Y] = 0.

2. Conditional Distribution of a Poisson Random Variable with Exponentially Distributed Parameter
Let X have a Poisson distribution with parameter λ > 0. Suppose λ itself is random, having an exponential density with parameter θ > 0.
(a) Show that E(λ^k) = k!/θ^k, k ∈ N.
(b) What is the distribution of X?
(c) Determine the conditional density of λ given X = k, where k ∈ N.

(a) E(λ^k) = ∫_0^∞ x^k θ e^{−θx} dx. Integrating by parts, with proper limits,

E[λ^k] = ∫_0^∞ x^k θ e^{−θx} dx = [−x^k e^{−θx}]_0^∞ + k ∫_0^∞ x^{k−1} e^{−θx} dx = (k/θ) ∫_0^∞ x^{k−1} θ e^{−θx} dx,

so E(λ^k) = (k/θ) E(λ^{k−1}). Continuing, and with the base case E(λ) = 1/θ, we get E(λ^k) = k!/θ^k.

(b) The PDF of λ is f(λ) = θ e^{−θλ} 1{λ > 0}, and the PMF of X conditioned on λ is

P(X = k | λ) = e^{−λ} λ^k / k!, k ∈ N.

Applying the law of total probability yields, for k ∈ N,

P(X = k) = ∫_0^∞ (e^{−λ} λ^k / k!) θ e^{−θλ} dλ = (θ / ((1 + θ) k!)) ∫_0^∞ λ^k (1 + θ) e^{−(1+θ)λ} dλ = θ / (1 + θ)^{k+1},

because the last integral is E[Y^k] for Y ~ Exponential(1 + θ), which is k!/(1 + θ)^k by the computation in part (a). In other words, X is geometrically distributed on {0, 1, 2, ...} with success probability θ/(1 + θ).

(c) Thinking of the analogy to Bayes' rule,

f(λ | X = k) = P(X = k | λ) f(λ) / P(X = k) = e^{−(1+θ)λ} λ^k (1 + θ)^{k+1} / k!, λ > 0.

Remember here that θ is fixed and λ is the argument. You should check that the integral of this over [0, ∞) is 1 (it is the Gamma(k + 1, 1 + θ) density).

3. Gaussian Densities
(a) Let X_1 ~ N(0, σ_1²) and X_2 ~ N(0, σ_2²), where X_1 and X_2 are independent. Convolve the densities of X_1 and X_2 to show that X_1 + X_2 ~ N(0, σ_1² + σ_2²).
(b) Show that all linear combinations of finitely many i.i.d. Gaussians are Gaussian.
(c) Let X ~ N(0, σ²); find E[X^n] for n ∈ N.
(d) Let Z ~ N(0, 1). For a random vector (X_1, ..., X_n), where n is a positive integer and X_1, ..., X_n are real-valued random variables, the expectation of (X_1, ..., X_n) is the vector of elementwise expectations of each random variable, and the covariance matrix of (X_1, ..., X_n) is the n × n matrix whose (i, j) entry is cov(X_i, X_j) for all i, j ∈ {1, ..., n}. Find the mean and covariance matrix of (Z, 1{Z > c}) in terms of φ and Φ, the standard Gaussian PDF and CDF respectively.

(a) We know that the pdf f_Z of Z = X_1 + X_2 is given by the convolution f_{X_1} * f_{X_2}:

f_Z(z) = ∫ f_{X_1}(x) f_{X_2}(z − x) dx = (1 / (2π σ_1 σ_2)) ∫ exp{ −( x²/(2σ_1²) + (z − x)²/(2σ_2²) ) } dx.

Notice that the exponent is a quadratic in x. To solve this we will complete the square and write it in the form

exp{ −( (x − µ)²/(2σ̄²) + c ) }.

Why? This integral can be compared to the integral of a Gaussian pdf (which we know is 1). This is a very useful strategy in general with integrals resembling the Gaussian:

∫ exp{ −( (x − µ)²/(2σ̄²) + c ) } dx = e^{−c} ∫ exp{ −(x − µ)²/(2σ̄²) } dx = e^{−c} √(2π) σ̄ · (1/(√(2π) σ̄)) ∫ exp{ −(x − µ)²/(2σ̄²) } dx = √(2π) σ̄ e^{−c},

since the last factor is the integral of an N(µ, σ̄²) density and so equals 1. So the final answer will be

f_Z(z) = σ̄ e^{−c} / (√(2π) σ_1 σ_2).    (∗)

We find µ, σ̄, and c as follows:

x²/(2σ_1²) + (z² − 2zx + x²)/(2σ_2²) = a² x² − 2ab x + z²/(2σ_2²) = (ax − b)² + ( z²/(2σ_2²) − b² ),

where a = √( (σ_1² + σ_2²)/(2σ_1² σ_2²) ) and b = z/(2σ_2² a). We then see that µ = b/a,

σ̄² = 1/(2a²) = σ_1² σ_2² / (σ_1² + σ_2²),

and

c = z²/(2σ_2²) − b² = (z²/(2σ_2²)) (1 − σ_1²/(σ_1² + σ_2²)) = z² / (2(σ_1² + σ_2²)).

Plugging this back into (∗), and noting that σ̄/(σ_1 σ_2) = 1/√(σ_1² + σ_2²), gives us the answer in all its glory:

f_Z(z) = (1/√(2π(σ_1² + σ_2²))) exp{ −z² / (2(σ_1² + σ_2²)) }.

Phew! This shows that the sum of two independent zero-mean Gaussians is Gaussian, with variance σ_1² + σ_2², as we'd expect from the linearity of variance for independent sums. (Note: if you used MGFs, or showed this for the case σ_1 = σ_2, or followed the procedure with a few calculation mistakes, you may still give yourself full credit.)

(b) Given X ~ N(µ, σ²), one can easily verify that aX ~ N(aµ, a²σ²). Also, the result in (a) can be generalized as follows: if X_1 ~ N(µ_1, σ_1²) and X_2 ~ N(µ_2, σ_2²) are independent, then aX_1 + bX_2 ~ N(aµ_1 + bµ_2, a²σ_1² + b²σ_2²). For a finite collection, using simple induction, we get the result.

(c) If Z ~ N(0, 1), then E[X^n] = σ^n E[Z^n], so it suffices to work with E[Z^n]. For odd n, E[Z^n] = 0 because the density of the Gaussian is symmetric around 0. For even n, by integration by parts,

E[Z^n] = (1/√(2π)) ∫ x^n exp(−x²/2) dx = (n − 1) (1/√(2π)) ∫ x^{n−2} exp(−x²/2) dx = (n − 1) E[Z^{n−2}].

Iterating this, we find that E[Z^n] = (n − 1)!! := ∏_{i=1}^{n/2} (2i − 1). Another way to write this is

E[Z^n] = (n − 1)!! = n! / (2 · 4 ··· n) = n! / (2^{n/2} (n/2)!).

(d) Let W := (Z, 1{Z > c}). E[Z] = 0, and E[1{Z > c}] = P(Z > c) = Φ(−c) by symmetry of the Gaussian; also, var Z = 1 and (using the variance of an indicator) var 1{Z > c} = Φ(−c)(1 − Φ(−c)). It remains to compute

cov(Z, 1{Z > c}) = E[Z 1{Z > c}] = ∫_c^∞ z φ(z) dz = φ(c),

where we have used the fact that φ′(z) = −z φ(z). Hence,

E[W] = [ 0, Φ(−c) ]ᵀ,  cov W = [ 1, φ(c) ; φ(c), Φ(−c)(1 − Φ(−c)) ].
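The mean and covariance found in part (d) are easy to sanity-check numerically. The short Python sketch below (not part of the original solution; the function names are our own) estimates E[W] and cov(W) for W = (Z, 1{Z > c}) by Monte Carlo and compares them against φ(c) and Φ(−c):

```python
import math
import random

def normal_indicator_stats(c, n=200_000, seed=0):
    """Empirical mean/covariance statistics of W = (Z, 1{Z > c}) for Z ~ N(0,1)."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ind = [1.0 if z > c else 0.0 for z in zs]
    mz = sum(zs) / n                    # estimates E[Z] = 0
    mi = sum(ind) / n                   # estimates P(Z > c) = Phi(-c)
    cov_zi = sum((z - mz) * (b - mi) for z, b in zip(zs, ind)) / n  # estimates phi(c)
    var_i = sum((b - mi) ** 2 for b in ind) / n  # estimates Phi(-c)(1 - Phi(-c))
    return mz, mi, cov_zi, var_i

def phi(x):
    """Standard normal PDF."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

c = 0.5
mz, mi, cov_zi, var_i = normal_indicator_stats(c)
print(mz, mi, cov_zi, var_i)                                  # empirical values
print(0.0, Phi(-c), phi(c), Phi(-c) * (1 - Phi(-c)))          # theoretical values
```

With c = 0.5 the two printed lines should agree to about two decimal places; in particular the empirical covariance matches φ(0.5) ≈ 0.352 rather than anything involving the conditional mean, confirming the identity φ′(z) = −zφ(z) used above.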
4. Joint Density for Exponential Distribution
(a) If X ~ Exp(λ) and Y ~ Exp(µ), with X and Y independent, compute P(X < Y).
(b) If X_k, 1 ≤ k ≤ n, are independent and exponentially distributed with parameters λ_1, ..., λ_n, show that

P(X_i = min_{1 ≤ k ≤ n} X_k) = λ_i / ∑_{j=1}^n λ_j.

(a) P(X < Y) = ∫_0^∞ P(X < y | Y = y) f_Y(y) dy. Since X and Y are independent, P(X < y | Y = y) = P(X < y), and since X ~ Exp(λ) and Y ~ Exp(µ), P(X < y) = 1 − e^{−λy} and f_Y(y) = µ e^{−µy}. Plugging in, we get

P(X < Y) = ∫_0^∞ (1 − e^{−λy}) µ e^{−µy} dy = 1 − µ/(λ + µ) = λ/(λ + µ).

(b) We first need to verify a nice fact about a collection of independent exponentially distributed random variables: given Y_i ~ Exp(µ_i), 1 ≤ i ≤ n, independent, the minimum min_{1 ≤ i ≤ n} Y_i is exponentially distributed with parameter ∑_{i=1}^n µ_i. This can be easily checked by considering the CDF of min_i Y_i. (Try it out!) Now,

P(X_i = min_{1 ≤ k ≤ n} X_k) = P(X_i ≤ min_{k ≠ i} X_k).

From the previous argument, min_{k ≠ i} X_k ~ Exp(∑_{j ≠ i} λ_j). Using the result of part (a), the claim follows.

5. Matrix Sketching
Matrix sketching is an important technique in randomized linear algebra for doing large computations efficiently. For example, to compute the product AᵀB of two large matrices A and B, we can use a random sketch matrix S to compute a sketch SA of A and a sketch SB of B. Such a sketching matrix has the property that SᵀS ≈ I, so that the approximate multiplication AᵀSᵀSB is close to AᵀB. In this problem, we will discuss two popular sketching schemes and understand how they help in approximate computation. Let Î = SᵀS, and let the dimensions of the sketch matrix S be d × n (typically d ≪ n).

(a) (Gaussian sketch) Define

S = (1/√d) [ S_11 ··· S_1n ; ⋮ ; S_d1 ··· S_dn ]

where the S_ij are chosen i.i.d. from N(0, 1) for all i ∈ [1, d] and j ∈ [1, n]. Find the element-wise mean and variance of the matrix Î = SᵀS; that is, find E[Î_ij] and Var[Î_ij] for all i, j ∈ [1, n].
(b) (Count sketch) For each column j ∈ [1, n] of S, choose a row i uniformly at random from [1, d], set

S_ij = +1 with probability 0.5, and −1 with probability 0.5,

and assign S_kj = 0 for all k ≠ i. An example of a 3 × 8 count sketch (each column has exactly one nonzero entry, equal to ±1) is

S = [ 0 +1 0 0 −1 0 +1 0 ; −1 0 0 +1 0 0 0 −1 ; 0 0 +1 0 0 −1 0 0 ].

Again, find the element-wise mean and variance of the matrix Î = SᵀS.

Note that for sufficiently large d, the matrix Î is close to the identity matrix in both cases. We will use this fact in the lab to do approximate matrix multiplication.

Let Î = SᵀS.

(a) For the Gaussian sketch, Î_ij = (1/d) ∑_{k=1}^d S_ki S_kj. Thus, by using linearity of expectation and the fact that the S_ki are drawn i.i.d. from N(0, 1), we get

E[Î_ij] = 1 if i = j, and 0 otherwise.

By the definition of variance, we have

Var[Î_ij] = E[ ( (1/d) ∑_k S_ki S_kj )² ] − ( E[ (1/d) ∑_k S_ki S_kj ] )².

Next, we consider two cases: i = j and i ≠ j. When i = j,

Var[Î_ii] = (1/d²) E[ ( ∑_k S_ki² )² ] − 1 = (1/d²) ( ∑_k E[S_ki⁴] + ∑_{k ≠ l} E[S_ki²] E[S_li²] ) − 1 = (1/d²) ( 3d + d(d − 1) ) − 1 = 2/d,

where we use the fact that the fourth moment of a standard Gaussian random variable is 3.
For the case when i ≠ j, we use the fact that S_ki and S_kj are independent and get

Var[Î_ij] = E[ ( (1/d) ∑_k S_ki S_kj )² ] − ( (1/d) ∑_k E[S_ki] E[S_kj] )² = (1/d²) ( ∑_k E[S_ki²] E[S_kj²] + ∑_{k ≠ l} E[S_ki S_li] E[S_kj S_lj] ) = (1/d²)(d + 0) = 1/d.

Thus, we have

Var[Î_ij] = 2/d if i = j, and 1/d otherwise.

(b) Note that for the count sketch, Î_ij = ∑_{k=1}^d S_ki S_kj. By construction of S, the diagonal terms Î_ii = ∑_k S_ki² are always exactly one, since each column contains a single ±1 entry. Thus, we only need to worry about the off-diagonal terms. It is also important to note that in S, entries in a row are independent but entries in a column are dependent (there can be only one nonzero entry in each column, as shown in the example). Also, for i ≠ j, since columns i and j pick their nonzero rows independently and uniformly,

S_ki S_kj = +1 with probability 1/(2d²), −1 with probability 1/(2d²), and 0 with probability 1 − 1/d².

Thus,

E[Î_ij] = 1 if i = j, and 0 otherwise.

The diagonal terms in Î are exactly one, and hence their variance is zero. For the off-diagonal terms, i.e. when i ≠ j, we have

Var[Î_ij] = E[ ( ∑_k S_ki S_kj )² ] − ( E[ ∑_k S_ki S_kj ] )² = ∑_k E[S_ki² S_kj²] + ∑_{k ≠ l} E[S_ki S_li] E[S_kj S_lj] = d · (1/d²) + 0 = 1/d,

where the 0 in the last step comes from the fact that in any column j, the product of two elements S_kj, S_lj (k ≠ l) is 0, since only one of them can be nonzero. Hence, the element-wise variance is

Var[Î_ij] = 0 if i = j, and 1/d otherwise.
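The count-sketch moments above can be checked by simulation. The following Python sketch (our own illustration, separate from the assigned lab) builds many random count-sketch matrices as defined in (b) and estimates the mean and variance of a diagonal and an off-diagonal entry of Î = SᵀS:

```python
import random

def count_sketch(n, d, rng):
    """d x n count-sketch: each column has a single ±1 in a uniformly chosen row."""
    S = [[0] * n for _ in range(d)]
    for j in range(n):
        S[rng.randrange(d)][j] = rng.choice((-1, 1))
    return S

def sketch_entry(S, i, j):
    """(i, j) entry of S^T S, i.e. the inner product of columns i and j."""
    return sum(row[i] * row[j] for row in S)

rng = random.Random(1)
n, d, trials = 6, 4, 20_000
diag = [sketch_entry(count_sketch(n, d, rng), 0, 0) for _ in range(1000)]
off = [sketch_entry(count_sketch(n, d, rng), 0, 1) for _ in range(trials)]
mean_off = sum(off) / trials
var_off = sum((x - mean_off) ** 2 for x in off) / trials
print(all(x == 1 for x in diag))   # diagonal entries are exactly 1 in every trial
print(mean_off, var_off)           # mean ≈ 0, variance ≈ 1/d
```

For d = 4 the off-diagonal variance comes out near 1/d = 0.25; increasing d drives it toward zero, which is exactly why Î ≈ I for large d.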
6. Records
Let n be a positive integer and let X_1, X_2, ..., X_n be a sequence of i.i.d. continuous random variables with common probability density f_X. For any integer 2 ≤ k ≤ n, define X_k to be a record-to-date of the sequence if X_k > X_i for all i = 1, ..., k − 1. (X_1 is automatically a record-to-date.)
(a) Find the probability that X_2 is a record-to-date. Hint: You should be able to do it without rigorous computation.
(b) Find the probability that X_n is a record-to-date.
(c) Find the expected number of records-to-date that occur over the first n trials (Hint: use indicator functions). Compute this when n → ∞.

(a) X_2 is a record-to-date with probability 1/2. The reason is that X_1 and X_2 are i.i.d., so either one is larger than the other with probability 1/2. This uses the fact that they are equal with probability 0, since they have a density.

(b) Now, by the same symmetry argument, each X_i for i = 1, ..., n is equally likely to be the largest, so each is the largest with probability 1/n. Since X_n is a record-to-date exactly when it is the largest among X_1, ..., X_n, it is a record with probability 1/n.

Remark: An incorrect argument proceeds as follows. Since X_1, ..., X_n are i.i.d., the probability of the event {X_n ≥ X_i} is 1/2 for each i = 1, ..., n − 1, and so the probability that X_n is a record is (1/2)^{n−1} by independence. The reason why this argument is incorrect is that the events {X_n ≥ X_i} are not independent for different i; indeed, conditioned on the event {X_n ≥ X_1}, it is reasonable to think that X_n is probably large, which increases the probability of the event {X_n ≥ X_2}.

(c) For i = 1, ..., n, let 1_i be 1 if X_i is a record-to-date, and 0 otherwise. By part (b) applied to the first i trials, E[1_i] = P(1_i = 1) = 1/i is the expected number of records-to-date on trial i. Thus,

E[# records-to-date in n trials] = ∑_{i=1}^n E[1_i] = ∑_{i=1}^n 1/i.

This is the harmonic series, which diverges to ∞ as n → ∞.
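The harmonic-sum answer in (c) is easy to confirm by simulation. The helper below (an illustrative sketch, not part of the original solution) counts records-to-date among n i.i.d. Uniform(0, 1) draws; any continuous distribution gives the same answer, since only the ranks matter:

```python
import random

def expected_records(n, trials, seed=0):
    """Monte Carlo estimate of the expected number of records-to-date in n draws."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        best = float("-inf")
        for _ in range(n):
            x = rng.random()
            if x > best:       # X_k beats everything before it: a record-to-date
                best = x
                total += 1
    return total / trials

n = 20
harmonic = sum(1 / i for i in range(1, n + 1))
est = expected_records(n, 20_000)
print(est, harmonic)   # both ≈ 3.6 for n = 20
```

For n = 20 the estimate lands close to H_20 ≈ 3.598, and rerunning with larger n shows the slow logarithmic growth that makes the limit diverge.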