Lecture 19 - Covariance, Conditioning


THEORY OF PROBABILITY
VLADIMIR KOBZAR
Date: August 8, 2016.

Review.

Proposition 1. (Ross, 6.3.2) If X_1, ..., X_n are independent normal RVs with respective parameters \mu_i, \sigma_i^2 for i = 1, ..., n, then \sum_{i=1}^n X_i is N(\mu_1 + ... + \mu_n, \sigma_1^2 + ... + \sigma_n^2). (Beware that we add the squares of the \sigma_i's!)

Proof. Step 1: Let n = 2, X_1 be N(0, 1) and X_2 be N(0, \sigma^2). Step 2: get the result for any two independent normals by scaling and translating. Step 3: induction on n.

More on sums: variance and covariance (Ross, Secs. 7.3 and 7.4).

Proposition 2. (Ross, Prop. 7.4.1) If X and Y are independent, and g and h are any functions from reals to reals, then

    E[g(X) h(Y)] = E[g(X)] E[h(Y)].

(Not true without independence!)

Proof. (idea) Split the two-variate integral into a product of single-variate integrals.

Now recall: the variance of a RV X is

    Var(X) = E[(X - \mu)^2], where \mu = E[X].

It gives a useful measure of how spread out X is. We can generalize it to two RVs X and Y.

Definition 1. Let X and Y be RVs and let \mu_X = E[X] and \mu_Y = E[Y]. The covariance of X and Y is

    Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],

provided that this expectation converges (as a sum for discrete RVs, an integral for continuous RVs, or otherwise).
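Expanding the product and using linearity of expectation gives an equivalent expression that is often easier to compute (this short step is not written out in the notes):

    Cov(X, Y) = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] = E[XY] - \mu_X \mu_Y = E[XY] - E[X] E[Y].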

First properties.

1. Symmetry: Cov(X, Y) = Cov(Y, X).

2. Taking Y = X, we see that Cov generalizes Var:

    Cov(X, X) = Var(X).

3. Like Var, Cov has a useful alternative formula:

    Cov(X, Y) = E[XY] - E[X] E[Y].

4. If X and Y are independent then, by Proposition 2,

    Cov(X, Y) = E[X - \mu_X] E[Y - \mu_Y] = 0.

Example 1. (Ross, 7.4d) Let A and B be events, and let I_A and I_B be their indicator variables. Since E[I_A] = P(A), E[I_B] = P(B), and E[I_A I_B] = P(A \cap B),

    Cov(I_A, I_B) = P(A \cap B) - P(A) P(B) = P(B) [P(A | B) - P(A)].

Therefore, Cov can be positive, zero, or negative depending on whether P(A | B) is, respectively, greater than, equal to, or less than P(A). Also, by Property 2,

    Var(I_A) = P(A) - P(A)^2 = P(A)(1 - P(A)).

Here is what makes covariance really useful: how it transforms under sums and products.

Proposition 3. (Ross, Prop. 7.4.2)

1. For any RVs X and Y, and any real value a, we have

    Cov(aX, Y) = Cov(X, aY) = a Cov(X, Y)

(so Var(aX) = Cov(aX, aX) = a^2 Cov(X, X) = a^2 Var(X) - consistency check).

2. For any RVs X_1, ..., X_n and Y_1, ..., Y_m, we have

    Cov(\sum_{i=1}^n X_i, \sum_{j=1}^m Y_j) = \sum_{i=1}^n \sum_{j=1}^m Cov(X_i, Y_j)

(so it behaves just like multiplying out a product of sums of numbers).

In particular, if m = n and Y_i = X_i in part 2 above, we get (Ross, eqn (7.4.1))

    Var(\sum_{i=1}^n X_i) = \sum_{i=1}^n Var(X_i) + 2 \sum_{1 \le i < j \le n} Cov(X_i, X_j).
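For concreteness, here is a small numerical illustration (not in the notes). With n = 2, eqn (7.4.1) reads Var(X_1 + X_2) = Var(X_1) + Var(X_2) + 2 Cov(X_1, X_2). Roll a fair die and, in the setting of Example 1, take A = {1, 2} and B = {2, 3}. Then P(A) = P(B) = 1/3 and P(A \cap B) = 1/6, so

    Cov(I_A, I_B) = 1/6 - 1/9 = 1/18 > 0,

consistent with P(A | B) = 1/2 > 1/3 = P(A), and

    Var(I_A + I_B) = 2/9 + 2/9 + 2 (1/18) = 5/9.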

If X_1, ..., X_n are independent, then Cov(X_i, X_j) = 0 whenever i < j, so in this case we're left with

    Var(\sum_{i=1}^n X_i) = \sum_{i=1}^n Var(X_i).

Example 2. (Ross, 7.4b) If X is binom(n, p), then Var(X) = np(1 - p).

For an independent Bernoulli(p) RV X_i (note that X_i^2 = X_i),

    Var(X_i) = E[X_i^2] - (E[X_i])^2 = p - p^2 = p(1 - p).

Now recall that X = X_1 + ... + X_n, a sum of independent Bernoulli variables, so Var(X) = np(1 - p).

Note that this is a third proof in Ross of the variance of a binomial random variable. In an earlier lecture, we reviewed the proof on p. 139 (using the recursive expression for E[X^k]). Example 7.3a, p. 299, uses moments of the binomial random variable, which we will study tomorrow.

Example 3. (special case of Ross, 7.4f) An experiment has three possible outcomes with respective probabilities p_1, p_2, p_3. After n independent trials are performed, we write X_i for the number of times outcome i occurred, i = 1, 2, 3. Find Cov(X_1, X_2).

Note that

    X_1 = \sum_{k=1}^n I_1(k)  and  X_2 = \sum_{k=1}^n I_2(k),

where

    I_1(k) = 1 if trial k results in outcome 1 (and 0 otherwise),
    I_2(k) = 1 if trial k results in outcome 2 (and 0 otherwise).

Then

    Cov(X_1, X_2) = \sum_{l=1}^n \sum_{k=1}^n Cov(I_1(l), I_2(k)).

Since the trials are independent, only the diagonal terms survive. For each l we have I_1(l) I_2(l) = 0 (a trial cannot result in both outcomes), so

    Cov(I_1(l), I_2(l)) = E[I_1(l) I_2(l)] - E[I_1(l)] E[I_2(l)] = -p_1 p_2.

Therefore, Cov(X_1, X_2) = -n p_1 p_2, as expected.

Now some continuous examples.

Example 4. (Ross, Section 7.2l) (Random walk) A flea starts at the origin of the plane. Each second it jumps one inch. For each jump, it chooses the direction uniformly at random, independently of its previous jumps. Find the expected square of the distance from the origin after n jumps.

Let (X_i, Y_i) denote the change in position at the i-th step, i = 1, ..., n, where X_i = \cos \theta_i and Y_i = \sin \theta_i, and the \theta_i are independent uniform (0, 2\pi) random variables. Then

    D^2 = (\sum_{i=1}^n X_i)^2 + (\sum_{i=1}^n Y_i)^2
        = \sum_{i=1}^n (X_i^2 + Y_i^2) + \sum\sum_{i \ne j} (X_i X_j + Y_i Y_j)
        = n + \sum\sum_{i \ne j} (\cos \theta_j \cos \theta_i + \sin \theta_j \sin \theta_i),

where we have used the fact that \cos^2 \theta_i + \sin^2 \theta_i = 1. We observe that

    2\pi E[\cos \theta_i] = \int_0^{2\pi} \cos u \, du = 0,
    2\pi E[\sin \theta_i] = \int_0^{2\pi} \sin u \, du = 0.

Now, taking the expectation and using the independence of \theta_i and \theta_j when i \ne j and the immediately preceding result, we obtain E[D^2] = n.
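As a quick sanity check (not in the notes), take n = 2: then

    D^2 = 2 + 2(\cos \theta_1 \cos \theta_2 + \sin \theta_1 \sin \theta_2) = 2 + 2 \cos(\theta_1 - \theta_2),

and since \theta_1 - \theta_2 (mod 2\pi) is again uniform on (0, 2\pi), E[\cos(\theta_1 - \theta_2)] = 0 and E[D^2] = 2, as the formula predicts.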

Example 5. If X_1, ..., X_n are independent normal RVs with respective parameters \mu_i, \sigma_i^2 for i = 1, ..., n, then

    E[\sum_{i=1}^n X_i] = \sum_{i=1}^n \mu_i  and  Var(\sum_{i=1}^n X_i) = \sum_{i=1}^n \sigma_i^2.

Therefore, Ross eq. (7.4.1) is consistent with our previous calculation (Ross, Prop. 6.3.2) that \sum_{i=1}^n X_i is still normal with those parameters.

Gamma functions. (Special case of Ross, Prop. 6.3.1) If X and Y are independent Exp(\lambda) RVs, then X + Y is continuous with PDF

    f_{X+Y}(z) = \lambda^2 z e^{-\lambda z},  z > 0.

Applying the identity Ross (6.3.2), we get

    f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y) \, dy
               = \int_0^z \lambda e^{-\lambda(z - y)} \lambda e^{-\lambda y} \, dy  (if z > 0)
               = \lambda^2 z e^{-\lambda z}  (if z > 0),

where in setting the limits of integration we observed that f_Y(y) = 0 for y < 0 and f_X(z - y) = 0 for y > z.

Now assume that for X_1, ..., X_{n-1} independent Exp(\lambda) RVs, their sum has PDF

    f_{X_1 + ... + X_{n-1}}(z) = \frac{\lambda^{n-1} z^{n-2} e^{-\lambda z}}{(n-2)!}  (if z > 0).

Again by the identity Ross (6.3.2), we get

    f_{(X_1 + ... + X_{n-1}) + X_n}(z) = \int_{-\infty}^{\infty} f_{X_1 + ... + X_{n-1}}(y) f_{X_n}(z - y) \, dy
        = \int_0^z \frac{\lambda^{n-1} y^{n-2} e^{-\lambda y}}{(n-2)!} \lambda e^{-\lambda(z - y)} \, dy  (if z > 0)
        = \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!}  (if z > 0)

(the exponentials combine into e^{-\lambda z}, which pulls out of the integral),

where in setting the limits of integration we observed that f_{X_1 + ... + X_{n-1}}(y) = 0 for y < 0 and f_{X_n}(z - y) = 0 for y > z.

Any continuous RV with the PDF above is called Gamma(n, \lambda), so for instance Gamma(1, \lambda) = Exp(\lambda). See Ross, Subsections 5.3.1 and 6.3.2, for more information on Gamma RVs.

Conditioning with random variables. Our next topic is the use of conditional probability in connection with random variables as well as single events. As for events, this can come up in various ways. Sometimes we know the joint distribution of RVs X and Y, and want to compute updated probabilities for the behavior of X in light of some information about Y. Sometimes there is a natural modeling choice for the conditional probabilities, and we must reconstruct the unconditioned probabilities from them. The tools we introduce below can give a convenient way to do a calculation even when we start and end with unconditional quantities.

Conditional distributions: discrete case (Ross, Sec. 6.4). Suppose that X and Y are discrete RVs with joint PMF p. We have various events defined in terms of X and others defined in terms of Y. Sometimes we need conditional probabilities of X-events given Y-events and vice versa. This requires no new ideas beyond conditional probability, but some new notation can be convenient.

Definition 2. The conditional PMF of X given Y is the function

    p_{X|Y}(x | y) = P{X = x | Y = y} = \frac{p(x, y)}{p_Y(y)}.

It is defined for any x and any y for which p_Y(y) > 0 (we simply don't use it for other choices of y).

Example 6. (Ross, 6.4a) Suppose that X and Y have joint PMF given by

    p(0, 0) = 0.4,  p(0, 1) = 0.2,  p(1, 0) = 0.1,  p(1, 1) = 0.3.

Calculate the conditional PMF of X given that Y = 1.
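The computation (not written out in the notes) is short: p_Y(1) = p(0, 1) + p(1, 1) = 0.5, so

    p_{X|Y}(0 | 1) = 0.2 / 0.5 = 0.4  and  p_{X|Y}(1 | 1) = 0.3 / 0.5 = 0.6.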

Example 7. (Ross, 6.4b) Let X and Y be independent Poi(\lambda_1) and Poi(\lambda_2) RVs. Calculate the conditional PMF of X given that X + Y = n.

Conditioning can also involve three or more RVs. This requires care, but no new ideas.

Example 8. (Special case of Ross, 6.4c) An experiment has 3 possible outcomes with respective probabilities p_1, p_2, p_3. After n independent trials are performed, we write X_i for the number of times outcome i occurred, i = 1, 2, 3. Find the conditional distribution (that is, the conditional joint PMF) of (X_1, X_2) given that X_3 = m.

WHAT THE QUESTION IS ASKING FOR:

    p_{X_1, X_2 | X_3}(k, l | m) = P{X_1 = k, X_2 = l | X_3 = m}

for all possible values of k, l and m.

Another fact to note: if X and Y are independent, then

    p_{X|Y}(x | y) = \frac{p_X(x) p_Y(y)}{p_Y(y)} = p_X(x).

So this is just like for events: if X and Y are independent, then knowing the value taken by Y doesn't influence the probability distribution of X.

Conditional distributions: continuous case (Ross, Sec. 6.5). When continuous RVs are involved, we do need a new idea.

THE PROBLEM: Suppose that Y is a continuous RV and E is any event. Then P{Y = y} = 0 for any real value y, so we cannot define the conditional probability P(E | {Y = y}) using the usual formula. However, we can sometimes make sense of this in another way.

Instead of assuming that Y takes the value y exactly, let us condition on Y taking a value in a tiny window around y:

    P(E | {y \le Y \le y + dy}) = \frac{P(E \cap {y \le Y \le y + dy})}{P{y \le Y \le y + dy}}
                                \approx \frac{P(E \cap {y \le Y \le y + dy})}{f_Y(y) \, dy},

where we use the infinitesimal interpretation of f_Y(y). This makes sense provided f_Y(y) > 0. Now let dy \to 0. The conditional probability of E given that Y = y is defined to be

    P(E | Y = y) = \lim_{dy \to 0} \frac{P(E \cap {y \le Y \le y + dy})}{f_Y(y) \, dy}.

Theorem 4. The above limit always exists except maybe for a negligible set of possible values of y, which you can safely ignore.

Negligible sets are another idea from measure theory, so we won't describe them here. Ross doesn't mention this result at all. In this course, you can always assume that the limit exists.

Example 9. (Ross, Sec. 6.5) Suppose that X, Y have joint PDF

    f(x, y) = \frac{e^{-x/y} e^{-y}}{y},  0 < x, y < \infty.

Find P{X > 1 | Y = y}.

In this example, E is defined in terms of another RV, jointly continuous with Y. In fact there is a useful general tool for this situation.

Definition 3. (Ross, p. 250) Suppose X and Y are jointly continuous with joint PDF f. The conditional PDF of X given Y is the function

    f_{X|Y}(x | y) = \frac{f(x, y)}{f_Y(y)}.

It is defined for all real values x and all y such that f_Y(y) > 0. This is not a conditional probability, since it is a ratio of densities, not of probability values.
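With this definition in hand, Example 9 can be finished (the computation is not written out in the notes). For y > 0,

    f_Y(y) = \int_0^{\infty} \frac{e^{-x/y} e^{-y}}{y} \, dx = e^{-y},

so f_{X|Y}(x | y) = \frac{1}{y} e^{-x/y} for x > 0, and

    P{X > 1 | Y = y} = \int_1^{\infty} \frac{1}{y} e^{-x/y} \, dx = e^{-1/y}.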

INTUITIVE INTERPRETATION: Instead of conditioning {X = x} on {Y = y}, let's allow a very small window around both x and y. We find:

    P{x \le X \le x + dx | y \le Y \le y + dy} = \frac{P{x \le X \le x + dx, y \le Y \le y + dy}}{P{y \le Y \le y + dy}}
                                               \approx \frac{f(x, y) \, dx \, dy}{f_Y(y) \, dy} = f_{X|Y}(x | y) \, dx,

so f_{X|Y}(x | y) is a PDF which describes the probability distribution of X, given that Y lands in a very small window around y.

Also, just as in the discrete case, if X and Y are independent, then

    f_{X|Y}(x | y) = f_X(x),

so knowing the value taken by Y (to arbitrary accuracy) doesn't influence the probability distribution of X.

The conditional PDF enables us to compute probabilities of X-events given that {Y = y}, without taking a limit:

Proposition 5. (See Ross, p. 251) Let a < b, or a = -\infty or b = \infty. Let y be a real value such that f_Y(y) > 0. Then

    P{a \le X \le b | Y = y} = \int_a^b f_{X|Y}(x | y) \, dx.

MORAL: f_{X|Y}(x | y) is a new PDF. It describes the probabilities of X-events given that Y = y. It has all the other properties that we've already seen for PDFs. Finding a conditional PDF is just like finding a conditional PMF.

Example 10. (Ross, 6.5a) The joint PDF of X and Y is

    f(x, y) = \frac{12}{5} x (2 - x - y),  0 < x, y < 1.

Find f_{X|Y}(x | y) for all x and for 0 < y < 1.

PROCEDURE: Find the marginal f_Y by integrating, then plug into the formula for f_{X|Y}.
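Carrying out this procedure (the details are left to the reader in the notes): for 0 < y < 1,

    f_Y(y) = \int_0^1 \frac{12}{5} x (2 - x - y) \, dx = \frac{12}{5} (1 - \frac{1}{3} - \frac{y}{2}) = \frac{2(4 - 3y)}{5},

so

    f_{X|Y}(x | y) = \frac{(12/5) x (2 - x - y)}{2(4 - 3y)/5} = \frac{6 x (2 - x - y)}{4 - 3y},  0 < x < 1.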

In some situations it is more natural to assume a conditional PDF and then reconstruct a joint PDF. This is the same idea as the multiplication rule for conditional probabilities of events.

Example 11. A stick of unit length is broken at a uniformly random point. Then the left-hand piece is broken again at a uniformly random point. Let Y be the length of the left-most piece after the second break. Find f_Y.

Let X be the length of the left-hand piece after the first break.

Step 1: Obtain f(x, y) = f_X(x) f_{Y|X}(y | x) (multiplication rule!):

    f(x, y) = \frac{1}{x}  if 0 < y < x < 1 (and 0 otherwise).

Step 2: Integrate out x to get f_Y(y):

    f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx = \int_y^1 \frac{1}{x} \, dx = -\log y  if 0 < y < 1.

Definition 4. (See Ross, Example 6.5d for the general bivariate normal, which has several other parameters.) The random vector (X, Y) is a bivariate standard normal with correlation -1 < \rho < 1 if it is jointly continuous with joint PDF

    f(x, y) = \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp(-\frac{1}{2(1 - \rho^2)} (x^2 - 2\rho x y + y^2))

for -\infty < x, y < \infty. In Lecture 18, we showed that the marginal distributions of X and Y are standard normals.

Therefore, conditioned on {Y = y},

    f_{X|Y}(x | y) = \frac{f(x, y)}{f_Y(y)}
                   = \frac{\frac{1}{2\pi \sqrt{1 - \rho^2}} \exp(-\frac{1}{2(1 - \rho^2)} (x^2 - 2\rho x y + y^2))}{\frac{1}{\sqrt{2\pi}} e^{-y^2/2}}
                   = \frac{1}{\sqrt{2\pi (1 - \rho^2)}} \exp(-\frac{(x - \rho y)^2}{2(1 - \rho^2)}),

which is the PDF of N(\rho y, 1 - \rho^2).

Mixtures of discrete and continuous. Some situations are best modeled using both some continuous RVs and some discrete RVs.

Example: Suppose electronic components can be more or less durable, and this is measured by a parameter between 0 and 1. If we buy and install a component, its parameter is a continuous RV X lying in that range. Then it survives each use independently with probability X. The number of uses before failure is a discrete RV N, but it depends on the continuous RV X.

This is sometimes called a hybrid model, or said to be of mixed type. Conditional probability plays an especially important role here.

Suppose that X is a RV and E is any event with P(E) > 0. Consider any X-event of the form {a \le X \le b}. By definition, its conditional probability given E is

    P(a \le X \le b | E) = \frac{P({a \le X \le b} \cap E)}{P(E)}.

KEY FACT:

Proposition 6. (cf. Ross, p. 255) If X is continuous, and P(E) > 0, then there is a conditional PDF f_{X|E} such that

    P(a \le X \le b | E) = \int_a^b f_{X|E}(x) \, dx

whenever a < b. That is, X continuous implies X is still continuous after conditioning.

Proof. INTUITION:

    P(x \le X \le x + dx | E) \approx f_{X|E}(x) \, dx.

ROUGH IDEA OF THE PROOF: if X is continuous and P(E) > 0, then we can define

    f_{X|E}(x) = \lim_{dx \to 0} \frac{P({x \le X \le x + dx} | E)}{dx}.

That is, the limit exists and it satisfies the equation for f_{X|E}. Ross doesn't mention this point and we will not attempt to make it rigorous.

Example 12. Let X be Exp(\lambda), and let E = {X > t}. Then

    f_{X|E}(s) = \lim_{ds \to 0} \frac{P{s \le X \le s + ds | X > t}}{ds} = \frac{\lambda e^{-\lambda s}}{e^{-\lambda t}} = \lambda e^{-\lambda(s - t)}  if s > t

(and 0 otherwise).

References

[1] Austin, Theory of Probability lecture notes, https://cims.nyu.edu/~tim/tofp
[2] Bernstein, Theory of Probability lecture notes, http://www.cims.nyu.edu/~brettb/probsum205/index.html
[3] Ross, A First Course in Probability (9th ed., 2014)