THEORY OF PROBABILITY

VLADIMIR KOBZAR

Review. Lecture 9 - Covariance, Conditioning

Proposition 1 (Ross, 6.3.2). If $X_1, \dots, X_n$ are independent normal RVs with respective parameters $\mu_i, \sigma_i^2$ for $i = 1, \dots, n$, then $\sum_{i=1}^n X_i$ is $N(\mu_1 + \dots + \mu_n,\ \sigma_1^2 + \dots + \sigma_n^2)$. (Beware that we add the squares of the $\sigma_i$'s!)

Proof. Step 1: Let $n = 2$, $X_1$ be $N(0, 1)$ and $X_2$ be $N(0, \sigma^2)$. Step 2: get the result for any two independent normals by scaling and translating. Step 3: induction on $n$.

More on sums: Variance and covariance (Ross, Secs 7.3 and 7.4):

Proposition 2 (Ross Prop 7.4.1). If $X$ and $Y$ are independent, and $g$ and $h$ are any functions from reals to reals, then
$$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)].$$
(Not true without independence!)

Proof (idea). Split the two-variate integral into a product of single-variate integrals.

Now recall: the variance of a RV $X$ is $Var(X) = E[(X - \mu)^2]$ where $\mu = E[X]$. It gives a useful measure of how spread out $X$ is. We can generalize it to two RVs $X$ and $Y$:

Definition 1. Let $X$ and $Y$ be RVs and let $\mu_X = E[X]$ and $\mu_Y = E[Y]$. The covariance of $X$ and $Y$ is
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],$$
provided that this expectation converges (whether the RVs are discrete, continuous, or otherwise).

Date: August 8, 2016.
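Proposition 1 at the top of this section can be sanity-checked by simulation. The sketch below is our addition, not part of Ross; the parameter values are arbitrary illustrative choices.

```python
import random
import statistics

# Monte Carlo sketch (our addition): the sum of independent N(mu_i, sigma_i^2)
# draws should behave like N(mu_1 + mu_2, sigma_1^2 + sigma_2^2).
# Note that the *squares* of the sigmas add, not the sigmas themselves.
random.seed(0)
mu1, s1 = 1.0, 2.0    # X1 ~ N(1, 4); illustrative values
mu2, s2 = -3.0, 1.5   # X2 ~ N(-3, 2.25); illustrative values

samples = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(200_000)]

print(statistics.mean(samples))      # close to mu1 + mu2 = -2
print(statistics.variance(samples))  # close to s1**2 + s2**2 = 6.25
```

With 200,000 samples the empirical mean and variance land within a few hundredths of the predicted values.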
First properties.

1. Symmetry: $Cov(X, Y) = Cov(Y, X)$.

2. Applying 1. with $Y = X$, we see that Cov generalizes Var: $Cov(X, X) = Var(X)$.

3. Like Var, Cov has a useful alternative formula: $Cov(X, Y) = E[XY] - E[X]E[Y]$.

4. By Proposition 2, if $X$ and $Y$ are independent, then $Cov(X, Y) = E[X - \mu_X]\,E[Y - \mu_Y] = 0$.

Example 1 (Ross, 7.4d). Let $A$ and $B$ be events, and let $I_A$ and $I_B$ be their indicator variables. Since $E[I_A] = P(A)$, $E[I_B] = P(B)$, and $E[I_A I_B] = P(A \cap B)$,
$$Cov(I_A, I_B) = P(A \cap B) - P(A)P(B) = P(B)\,[P(A \mid B) - P(A)].$$
Therefore Cov can be positive, zero, or negative depending on whether $P(A \mid B)$ is, respectively, greater than, equal to, or less than $P(A)$. Also, combining 2. and 3. (and using $I_A^2 = I_A$),
$$Var(I_A) = E[I_A^2] - E[I_A]^2 = P(A) - P(A)^2 = P(A)(1 - P(A)).$$

Here is what makes covariance really useful: how it transforms under sums and scalar multiples:

Proposition 3 (Ross Prop 7.4.2).

1. For any RVs $X$ and $Y$, and any real value $a$, we have $Cov(aX, Y) = Cov(X, aY) = a\,Cov(X, Y)$. (So $Var(aX) = Cov(aX, aX) = a^2\,Cov(X, X) = a^2\,Var(X)$ - consistency check.)

2. For any RVs $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$, we have
$$Cov\Big(\sum_{i=1}^n X_i,\ \sum_{j=1}^m Y_j\Big) = \sum_{i=1}^n \sum_{j=1}^m Cov(X_i, Y_j)$$
(so it behaves just like multiplying out a product of sums of numbers).

In particular, if $m = n$ and $Y_i = X_i$ in part 2 above, we get (Ross, eqn (7.4.1))
$$Var\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n Var(X_i) + 2 \sum_{1 \le i < j \le n} Cov(X_i, X_j).$$
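The variance-of-a-sum formula above is an algebraic identity, so it also holds exactly for sample moments. The short check below is our addition, on made-up paired data.

```python
# Numerical check (our addition, with made-up data) of the identity
# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), applied to sample moments.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 3.0]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # sample covariance with divisor n (population convention)
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def var(v):
    return cov(v, v)  # Property 2: Cov(X, X) = Var(X)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs)  # equal up to floating-point rounding
```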
If $X_1, \dots, X_n$ are independent, then $Cov(X_i, X_j) = 0$ whenever $i < j$, so in this case we're left with $Var(\sum_i X_i) = \sum_i Var(X_i)$.

Example 2 (Ross, 7.4b). If $X$ is binom$(n, p)$, then $Var(X) = np(1 - p)$.

For an independent Bernoulli RV $X_i$, since $X_i^2 = X_i$,
$$Var(X_i) = E[X_i^2] - (E[X_i])^2 = E[X_i] - (E[X_i])^2 = p - p^2.$$
Now recall that $X = X_1 + \dots + X_n$, a sum of independent Bernoulli variables, so $Var(X) = n(p - p^2) = np(1 - p)$.

Note that this is a third proof in Ross of the variance of a binomial random variable. In Lecture 1, we reviewed the proof on p 139 (using the recursive expression for $E[X^k]$). Example 7.3a, p 299, uses moments of a binomial random variable, which we will study tomorrow.

Example 3 (special case of Ross, 7.4f). An experiment has three possible outcomes with respective probabilities $p_1, p_2, p_3$. After $n$ independent trials are performed, we write $X_i$ for the number of times outcome $i$ occurred, $i = 1, 2, 3$. Find $Cov(X_1, X_2)$.

Note that $X_1 = \sum_{k=1}^n I_1(k)$ and $X_2 = \sum_{k=1}^n I_2(k)$, where
$$I_1(k) = 1 \text{ if trial } k \text{ results in outcome } 1, \qquad I_2(k) = 1 \text{ if trial } k \text{ results in outcome } 2,$$
and each indicator is $0$ otherwise, so that
$$Cov(X_1, X_2) = \sum_{l=1}^n \sum_{k=1}^n Cov(I_1(l), I_2(k)).$$
Since the trials are independent, only the diagonal terms survive. A single trial cannot result in both outcomes, so $I_1(l) I_2(l) = 0$ and
$$Cov(I_1(l), I_2(l)) = E[I_1(l) I_2(l)] - E[I_1(l)]\,E[I_2(l)] = -p_1 p_2.$$
Therefore $Cov(X_1, X_2) = -n p_1 p_2$: negative, as expected, since the two counts compete for the same trials.

Now some continuous examples...

Example 4 (Ross, Section 7.2l) (Random walk). A flea starts at the origin of the plane. Each second it jumps one inch. For each jump, it chooses the direction uniformly at random, independently of its previous jumps. Find the expected square of the distance from the origin after $n$ jumps.

Let $(X_i, Y_i)$ denote the change in position at the $i$-th step, $i = 1, \dots, n$, where $X_i = \cos\theta_i$ and $Y_i = \sin\theta_i$, and the $\theta_i$ are independent uniform $(0, 2\pi)$ random variables. Then
$$D^2 = \Big(\sum_i X_i\Big)^2 + \Big(\sum_i Y_i\Big)^2 = \sum_i (X_i^2 + Y_i^2) + \mathop{\sum\sum}_{i \ne j} (X_i X_j + Y_i Y_j) = n + \mathop{\sum\sum}_{i \ne j} (\cos\theta_i \cos\theta_j + \sin\theta_i \sin\theta_j),$$
where we have used the fact that $\cos^2\theta_i + \sin^2\theta_i = 1$. We observe that
$$E[\cos\theta_i] = \frac{1}{2\pi} \int_0^{2\pi} \cos u\, du = 0, \qquad E[\sin\theta_i] = \frac{1}{2\pi} \int_0^{2\pi} \sin u\, du = 0.$$
Now, taking the expectation and using the independence of $\theta_i$ and $\theta_j$ when $i \ne j$ together with the immediately preceding result, we obtain
$$E[D^2] = n.$$
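The random-walk answer $E[D^2] = n$ is easy to confirm by simulation. The sketch below is our addition; the choices of $n$ and the trial count are arbitrary.

```python
import math
import random

# Simulation sketch (our addition) of the flea's random walk: after n
# unit-length jumps in uniformly random directions, E[D^2] should equal n.
random.seed(1)
n, trials = 25, 50_000
total = 0.0
for _ in range(trials):
    x = y = 0.0
    for _ in range(n):
        theta = random.uniform(0.0, 2.0 * math.pi)
        x += math.cos(theta)
        y += math.sin(theta)
    total += x * x + y * y  # squared distance from the origin

print(total / trials)  # close to n = 25
```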
Example 5. If $X_1, \dots, X_n$ are independent normal RVs with respective parameters $\mu_i, \sigma_i^2$ for $i = 1, \dots, n$, then $E[\sum_i X_i] = \sum_i \mu_i$ and $Var(\sum_i X_i) = \sum_i \sigma_i^2$. Therefore, Ross eq. (7.4.1) is consistent with our previous calculation (Ross, Prop 6.3.2) that $\sum_{i=1}^n X_i$ is still normal with those parameters.

Gamma functions. (Special case of Ross, Prop 6.3.1): If $X$ and $Y$ are independent Exp($\lambda$) RVs, then $X + Y$ is continuous with PDF
$$f_{X+Y}(z) = \lambda^2 z e^{-\lambda z}, \quad z > 0.$$
Applying the convolution identity (Ross, 6.3.2), we get
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y)\, dy = \int_0^z \lambda e^{-\lambda(z-y)}\, \lambda e^{-\lambda y}\, dy = \lambda^2 z e^{-\lambda z} \quad \text{if } z > 0,$$
where in setting the limits of integration we observed that $f_Y(y) = 0$ for $y < 0$ and $f_X(z - y) = 0$ for $y > z$.

Now assume that for $X_1, \dots, X_{n-1}$ independent Exp($\lambda$) RVs, their sum has PDF
$$f_{X_1 + \dots + X_{n-1}}(z) = \frac{\lambda^{n-1} z^{n-2} e^{-\lambda z}}{(n-2)!} \quad \text{if } z > 0.$$
Again by the identity (Ross, 6.3.2), we get
$$f_{(X_1 + \dots + X_{n-1}) + X_n}(z) = \int_{-\infty}^{\infty} f_{X_1 + \dots + X_{n-1}}(y)\, f_{X_n}(z - y)\, dy = \int_0^z \frac{\lambda^{n-1} y^{n-2} e^{-\lambda y}}{(n-2)!}\, \lambda e^{-\lambda(z-y)}\, dy = \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!} \quad \text{if } z > 0,$$
where in setting the limits of integration we observed that $f_{X_1 + \dots + X_{n-1}}(y) = 0$ for $y < 0$ and $f_{X_n}(z - y) = 0$ for $y > z$.

Any continuous RV with the PDF above is called Gamma($n, \lambda$), so for instance Gamma($1, \lambda$) = Exp($\lambda$). See Ross subsections 5.3.1 and 6.3.2 for more information on Gamma RVs.

Conditioning with random variables. Our next topic is the use of conditional probability in connection with random variables as well as single events. As with events, this can come up in various ways. Sometimes we know the joint distribution of RVs $X$ and $Y$, and want to compute updated probabilities for the behavior of $X$ in light of some information about $Y$. Sometimes there is a natural modeling choice for the conditional probabilities, and we must reconstruct the unconditioned probabilities from them. The tools we introduce below can give a convenient way to do a calculation even when we start and end with unconditional quantities.

Conditional distributions: discrete case (Ross Sec 6.4). Suppose that $X$ and $Y$ are discrete RVs with joint PMF $p$. We have various events defined in terms of $X$ and others defined in terms of $Y$. Sometimes we need conditional probabilities of $X$-events given $Y$-events and vice-versa. This requires no new ideas beyond conditional probability, but some new notation can be convenient.

Definition 2. The conditional PMF of $X$ given $Y$ is the function
$$p_{X|Y}(x \mid y) = P\{X = x \mid Y = y\} = \frac{p(x, y)}{p_Y(y)}.$$
It is defined for any $x$ and any $y$ for which $p_Y(y) > 0$ (we simply don't use it for other choices of $y$).

Example 6 (Ross, 6.4a). Suppose that $X$ and $Y$ have joint PMF given by
$$p(0, 0) = 0.4, \quad p(0, 1) = 0.2, \quad p(1, 0) = 0.1, \quad p(1, 1) = 0.3.$$
Calculate the conditional PMF of $X$ given that $Y = 1$.
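Example 6 can also be worked mechanically: divide each joint probability with $y = 1$ by the marginal $p_Y(1) = 0.2 + 0.3 = 0.5$. The sketch below is our addition; the helper name is our own.

```python
# Working Example 6 numerically (our addition): conditional PMF of X
# given Y = y, computed from the joint PMF stored as a dict.
p = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}

def conditional_pmf_given_y(joint, y):
    # marginal p_Y(y): sum the joint PMF over all x with this y
    p_y = sum(prob for (x, yy), prob in joint.items() if yy == y)
    # Definition 2: p_{X|Y}(x|y) = p(x, y) / p_Y(y)
    return {x: prob / p_y for (x, yy), prob in joint.items() if yy == y}

print(conditional_pmf_given_y(p, 1))  # {0: 0.4, 1: 0.6}
```

So conditioning on $Y = 1$ shifts the distribution of $X$ toward $1$, compared with the unconditional marginal $p_X(0) = 0.6$, $p_X(1) = 0.4$.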
Example 7 (Ross, 6.4b). Suppose that $X$ and $Y$ are independent Poi($\lambda_1$), Poi($\lambda_2$) RVs. Calculate the conditional PMF of $X$ given that $X + Y = n$.

Conditioning can also involve three or more RVs. This requires care, but no new ideas.

Example 8 (Special case of Ross, 6.4c). An experiment has 3 possible outcomes with respective probabilities $p_1, p_2, p_3$. After $n$ independent trials are performed, we write $X_i$ for the number of times outcome $i$ occurred, $i = 1, 2, 3$. Find the conditional distribution (that is, the conditional joint PMF) of $(X_1, X_2)$ given that $X_3 = m$.

WHAT THE QUESTION IS ASKING FOR:
$$p_{X_1, X_2 | X_3}(k, l \mid m) = P\{X_1 = k,\ X_2 = l \mid X_3 = m\}$$
for all possible values of $k$, $l$ and $m$.

Another fact to note: if $X$ and $Y$ are independent, then
$$p_{X|Y}(x \mid y) = \frac{p_X(x)\, p_Y(y)}{p_Y(y)} = p_X(x).$$
So this is just like for events: if $X$ and $Y$ are independent, then knowing the value taken by $Y$ doesn't influence the probability distribution of $X$.

Conditional distributions: continuous case (Ross, Sec 6.5). When continuous RVs are involved, we do need a new idea.

THE PROBLEM: Suppose that $Y$ is a continuous RV and $E$ is any event. Then $P\{Y = y\} = 0$ for any real value $y$, so we cannot define the conditional probability $P(E \mid \{Y = y\})$ using the usual formula. However, we can sometimes make sense of this in another way.
Instead of assuming that $Y$ takes the value $y$ exactly, let us condition on $Y$ taking a value in a tiny window around $y$:
$$P(E \mid \{y \le Y \le y + dy\}) = \frac{P(E \cap \{y \le Y \le y + dy\})}{P\{y \le Y \le y + dy\}} \approx \frac{P(E \cap \{y \le Y \le y + dy\})}{f_Y(y)\, dy},$$
where we use the infinitesimal interpretation of $f_Y(y)$. This makes sense provided $f_Y(y) > 0$. Now let $dy \to 0$. The conditional probability of $E$ given that $Y = y$ is defined to be
$$P(E \mid Y = y) = \lim_{dy \to 0} \frac{P(E \cap \{y \le Y \le y + dy\})}{f_Y(y)\, dy}.$$

Theorem 4. The above limit always exists except maybe for a negligible set of possible values of $y$, which you can safely ignore.

Negligible sets are another idea from measure theory, so we won't describe them here. Ross doesn't mention this result at all. In this course, you can always assume that the limit exists.

Example 9 (Ross, Sec 6.5). Suppose that $X, Y$ have joint PDF
$$f(x, y) = \frac{e^{-x/y}\, e^{-y}}{y}, \quad 0 < x, y < \infty.$$
Find $P\{X > 1 \mid Y = y\}$.

In this example, $E$ is defined in terms of another RV, jointly continuous with $Y$. In fact there is a useful general tool for this situation.

Definition 3 (Ross, p 250). Suppose $X$ and $Y$ are jointly continuous with joint PDF $f$. The conditional PDF of $X$ given $Y$ is the function
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}.$$
It is defined for all real values $x$ and all $y$ such that $f_Y(y) > 0$. This is not a conditional probability, since it is a ratio of densities, not of probability values.
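Using this definition, Example 9 can be finished: integrating out $x$ gives $f_Y(y) = e^{-y}$, hence $f_{X|Y}(x \mid y) = e^{-x/y}/y$ and $P\{X > 1 \mid Y = y\} = e^{-1/y}$. That closed form is our own computation (the example is worked in Ross, Sec 6.5); the numerical check below is our addition, evaluated at the arbitrary choice $y = 2$.

```python
import math

# Numerical check (our addition) of Example 9: with
# f(x, y) = exp(-x/y) * exp(-y) / y, we expect f_Y(y) = exp(-y) and
# P{X > 1 | Y = y} = exp(-1/y).
def f(x, y):
    return math.exp(-x / y) * math.exp(-y) / y

def integrate(g, a, b, steps=100_000):
    # simple midpoint rule; the upper limit 60 below truncates a tail
    # that is negligibly small for y = 2
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

y = 2.0
f_Y = integrate(lambda x: f(x, y), 0.0, 60.0)          # marginal f_Y(y)
prob = integrate(lambda x: f(x, y) / f_Y, 1.0, 60.0)   # P{X > 1 | Y = y}
print(prob, math.exp(-1.0 / y))  # both about 0.6065
```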
INTUITIVE INTERPRETATION: Instead of conditioning $\{X = x\}$ on $\{Y = y\}$, let's allow a very small window around both $x$ and $y$. We find:
$$P\{x \le X \le x + dx \mid y \le Y \le y + dy\} = \frac{P\{x \le X \le x + dx,\ y \le Y \le y + dy\}}{P\{y \le Y \le y + dy\}} \approx \frac{f(x, y)\, dx\, dy}{f_Y(y)\, dy} = f_{X|Y}(x \mid y)\, dx,$$
so $f_{X|Y}(x \mid y)$ is a PDF, which describes the probability distribution of $X$ given that $Y$ lands in a very small window around $y$.

Also, just as in the discrete case, if $X$ and $Y$ are independent, then $f_{X|Y}(x \mid y) = f_X(x)$, so knowing the value taken by $Y$ (to arbitrary accuracy) doesn't influence the probability distribution of $X$.

The conditional PDF enables us to compute probabilities of $X$-events given that $\{Y = y\}$, without taking a limit:

Proposition 5 (See Ross p. 251). Let $a < b$, or $a = -\infty$ or $b = \infty$. Let $y$ be a real value such that $f_Y(y) > 0$. Then
$$P\{a \le X \le b \mid Y = y\} = \int_a^b f_{X|Y}(x \mid y)\, dx.$$

MORAL: $f_{X|Y}(x \mid y)$ is a new PDF. It describes the probabilities of $X$-events given that $Y = y$. It has all the other properties that we've already seen for PDFs. Finding a conditional PDF is just like finding a conditional PMF.

Example 10 (Ross, 6.5a). The joint PDF of $X$ and $Y$ is
$$f(x, y) = \frac{12}{5}\, x\, (2 - x - y), \quad 0 < x, y < 1.$$
Find $f_{X|Y}(x \mid y)$ for all $x$ and for $0 < y < 1$.

PROCEDURE: Find the marginal $f_Y$ by integrating, then plug into the formula for $f_{X|Y}$.
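The procedure above can be carried out in code. The closed form it should match, $f_{X|Y}(x \mid y) = 6x(2 - x - y)/(4 - 3y)$, is our own evaluation of that recipe (from $f_Y(y) = (8 - 6y)/5$), not a formula stated in the notes; the sketch below is our addition.

```python
# Carrying out the PROCEDURE for Example 10 numerically (our addition):
# integrate f(x, y) = (12/5) x (2 - x - y) over 0 < x < 1 to get f_Y(y),
# then divide to get the conditional PDF, and compare with the closed
# form 6x(2 - x - y)/(4 - 3y) that the recipe yields by hand.
def f(x, y):
    return 12.0 / 5.0 * x * (2.0 - x - y)

def f_Y(y, steps=200_000):
    # midpoint rule over 0 < x < 1
    h = 1.0 / steps
    return sum(f((i + 0.5) * h, y) for i in range(steps)) * h

y, x = 0.5, 0.25  # arbitrary test point
numeric = f(x, y) / f_Y(y)
closed = 6.0 * x * (2.0 - x - y) / (4.0 - 3.0 * y)
print(numeric, closed)  # agree to many decimal places
```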
In some situations it is more natural to assume a conditional PDF and then reconstruct a joint PDF. This is the same idea as the multiplication rule for conditional probabilities of events.

Example 11. A stick of unit length is broken at a uniformly random point. Then the left-hand piece is broken again at a uniformly random point. Let $Y$ be the length of the left-most piece after the second break. Find $f_Y$.

Let $X$ be the length of the left-hand piece after the first break.

Step 1: Obtain $f(x, y) = f_X(x)\, f_{Y|X}(y \mid x)$ (multiplication rule!):
$$f(x, y) = \frac{1}{x} \quad \text{if } 0 < y < x < 1.$$

Step 2: Integrate out $x$ to get $f_Y(y)$:
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_y^1 \frac{1}{x}\, dx = -\log y \quad \text{if } 0 < y < 1.$$

Definition 4 (See Ross, Example 6.5d for the general bivariate normal, which has several other parameters). The random vector $(X, Y)$ is a bivariate standard normal with correlation $-1 < \rho < 1$ if it is jointly continuous with joint PDF
$$f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, \exp\Big(-\frac{1}{2(1 - \rho^2)}\,(x^2 - 2\rho x y + y^2)\Big)$$
for $-\infty < x, y < \infty$. In Lecture 8, we showed that the marginal distributions of $X$ and $Y$ are standard normals.
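The marginal and correlation claims for Definition 4 can be checked by simulation. The construction $Y = \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2$ from independent standard normals is a standard way to sample such a pair; the sketch and the choice $\rho = 0.7$ are our additions.

```python
import math
import random
import statistics

# Sampling sketch (our addition): X = Z1, Y = rho*Z1 + sqrt(1 - rho^2)*Z2
# with Z1, Z2 independent standard normals gives a bivariate standard
# normal pair with correlation rho.  We check that Y is (approximately)
# standard normal and that E[XY], which equals Cov(X, Y) here, is about rho.
random.seed(4)
rho = 0.7
pairs = []
for _ in range(200_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((z1, rho * z1 + math.sqrt(1 - rho * rho) * z2))

ys = [y for _, y in pairs]
print(statistics.mean(ys), statistics.variance(ys))  # about 0 and 1
cov = statistics.mean(x * y for x, y in pairs)
print(cov)  # about rho = 0.7
```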
Therefore, conditioned on $\{Y = y\}$,
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{\frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\big(-\frac{1}{2(1 - \rho^2)}(x^2 - 2\rho x y + y^2)\big)}{\frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}} = \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\, \exp\Big(-\frac{(x - \rho y)^2}{2(1 - \rho^2)}\Big),$$
which is the PDF of $N(\rho y,\ 1 - \rho^2)$.

Mixtures of discrete and continuous. Some situations are best modeled using both some continuous RVs and some discrete RVs.

Example: Suppose electronic components can be more or less durable, and this is measured by a parameter between 0 and 1. If we buy and install a component, its parameter is a continuous RV $X$ lying in that range. Then it survives each use independently with probability $X$. The number of uses before failure is a discrete RV $N$, but it depends on the continuous RV $X$.

This is sometimes called a hybrid model, or said to be of mixed type. Conditional probability plays an especially important role here.

Suppose that $X$ is a RV and $E$ is any event with $P(E) > 0$. Consider any $X$-event of the form $\{a \le X \le b\}$. By definition, its conditional probability given $E$ is
$$P(a \le X \le b \mid E) = \frac{P(\{a \le X \le b\} \cap E)}{P(E)}.$$

KEY FACT:

Proposition 6 (cf Ross, p 255). If $X$ is continuous, and $P(E) > 0$, then there is a conditional PDF $f_{X|E}$ such that
$$P(a \le X \le b \mid E) = \int_a^b f_{X|E}(x)\, dx$$
whenever $a < b$. That is: $X$ continuous $\Rightarrow$ $X$ still continuous after conditioning.

Proof. INTUITION: $P(x \le X \le x + dx \mid E) \approx f_{X|E}(x)\, dx$.
ROUGH IDEA OF THE PROOF: if $X$ is continuous and $P(E) > 0$, then we can define
$$f_{X|E}(x) = \lim_{dx \to 0} \frac{P(\{x \le X \le x + dx\} \mid E)}{dx}.$$
That is, the limit exists and it satisfies the equation for $f_{X|E}$. Ross doesn't mention this point and we will not attempt to make it rigorous.

Example 12. Let $X$ be Exp($\lambda$), and let $E = \{X > t\}$. Then, for $s > t$,
$$f_{X|E}(s) = \lim_{ds \to 0} \frac{P\{s \le X \le s + ds\}}{e^{-\lambda t}\, ds} = \frac{\lambda e^{-\lambda s}}{e^{-\lambda t}} = \lambda e^{-\lambda(s - t)},$$
and $f_{X|E}(s) = 0$ for $s \le t$.

References

[1] Austin, Theory of Probability lecture notes, https://cims.nyu.edu/~tim/tofp
[2] Bernstein, Theory of Probability lecture notes, http://www.cims.nyu.edu/~brettb/probsum205/index.html
[3] Ross, A First Course in Probability (9th ed., 2014)