Chapter 5: Two Random Variables

In a practical engineering problem, there is almost always a causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage and current, while some are abstracted from the problem, e.g., the probability of passing a class and the probability of graduating. Whenever we need to handle the relationship between two or more events, we need mathematical tools to describe the probabilistic phenomenon. The objective of this chapter is to present the concepts of joint distributions.

5.1 Joint PMF and Joint PDF

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Definition 1. Let X and Y be two discrete random variables. The joint PMF of X and Y is defined as

    p_{X,Y}(x, y) = P[X = x ∩ Y = y].    (5.1)

The interpretation of a joint PMF is that the sample space is now the Cartesian product Ω_X × Ω_Y, where Ω_X is the sample space of X and Ω_Y is the sample space of Y. Pictorially, this means that the sample space of the joint PMF is a two-dimensional plane (X, Y). We stress the importance of this two-dimensional sample space, because every outcome of a joint variable is a point in the two-dimensional space, i.e., (X, Y). Therefore, P[X ∈ A ∩ Y ∈ B] for sets A and B can be interpreted as

    P[X ∈ A ∩ Y ∈ B] = P[{(ξ, ζ) : ξ ∈ X⁻¹(A), and ζ ∈ Y⁻¹(B)}].    (5.2)

For discrete random variables, the PMF p_{X,Y}(x, y) can be considered as a collection of delta functions in the two-dimensional space.

Example. Let X be a coin flip and Y be the roll of a die. Find the joint PMF of X and Y.

Solution. The joint PMF is

    p_{X,Y}(x, y) = 1/12,   x = 0, 1,   y = 1, 2, 3, 4, 5, 6.

Pictorially, the joint PMF is a table with two rows (X = 0 and X = 1) and six columns (Y = 1, ..., 6), where every entry equals 1/12. In this example, we observe that if X and Y are not interacting (formally, we call them independent, which we will discuss later), then the joint PMF is the product of the two individual probabilities.

The continuous version of the joint PMF is called the joint PDF.

Definition 2. Let X and Y be two continuous random variables. The joint PDF of X and Y is a function f_{X,Y}(x, y) that can be integrated to yield a probability:

    P[a ≤ X ≤ b ∩ c ≤ Y ≤ d] = ∫_c^d ∫_a^b f_{X,Y}(x, y) dx dy.    (5.3)

Like PDFs for single random variables, a joint PDF is a density which can be integrated to obtain a probability. Note also that in this definition, the events {a ≤ X ≤ b} and {c ≤ Y ≤ d} are related using a logical AND.

Example. Consider a uniform joint PDF f_{X,Y}(x, y) defined on [0, 1]², as shown in Figure 5.1. The shaded area corresponds to

    P[a ≤ X ≤ b ∩ c ≤ Y ≤ d] = ∫_c^d ∫_a^b f_{X,Y}(x, y) dx dy = ∫_c^d ∫_a^b 1 dx dy = (d − c)(b − a).

In general, when f_{X,Y}(x, y) is not uniform, we have to integrate f_{X,Y}(x, y) over the interval specified.
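
The coin-and-die example is small enough to verify numerically. The following Python sketch (assuming NumPy is available) builds the 2 × 6 joint PMF table, checks the normalization property discussed next, and confirms that the table is the outer product of the two marginal PMFs.

    import numpy as np

    # Joint PMF of X (fair coin, values 0 and 1) and Y (fair six-sided die, values 1..6).
    # Each of the 12 outcomes is equally likely, so every entry is 1/12.
    p_XY = np.full((2, 6), 1.0 / 12.0)

    # Normalization: the entries of a joint PMF must sum to 1.
    print(p_XY.sum())                              # 1.0

    # Marginals: sum out the other variable.
    p_X = p_XY.sum(axis=1)                         # [1/2, 1/2]
    p_Y = p_XY.sum(axis=0)                         # [1/6, ..., 1/6]

    # Because the coin and the die do not interact, the joint PMF equals
    # the product of the two marginals (the independence discussed later).
    print(np.allclose(p_XY, np.outer(p_X, p_Y)))   # True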

Figure 5.1: The joint PDF f_{X,Y}(x, y) is a two-dimensional function. Integrating over the rectangle [a, b] × [c, d] returns the probability P[a ≤ X ≤ b ∩ c ≤ Y ≤ d]. (Panel (a): a general f_{X,Y}(x, y); panel (b): the example above.)

Normalization

The normalization property of a two-dimensional PMF or PDF states that by enumerating over all outcomes of the sample space we obtain 1.

Theorem 1. All joint PMFs and joint PDFs satisfy

    Σ_x Σ_y p_{X,Y}(x, y) = 1   or   ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1.    (5.4)

Example. Consider a joint uniform PDF defined on a shaded region Ω, with PDF

    f_{X,Y}(x, y) = { c, if (x, y) ∈ Ω;   0, otherwise. }

Find the constant c.

Solution. To find the constant c, we note that ∫∫ f_{X,Y}(x, y) dx dy = 1. The left-hand side of this equation is precisely c times the area of Ω, i.e., c|Ω|. Therefore, we have c = 1/|Ω|.

Marginal PMF and Marginal PDF

If we only sum / integrate with respect to one random variable, we obtain the PMF / PDF of the other random variable. The resulting PMF / PDF is called the marginal PMF / PDF.

Definition 3. The marginal PMF is defined as

    p_X(x) = Σ_y p_{X,Y}(x, y)   and   p_Y(y) = Σ_x p_{X,Y}(x, y).    (5.5)

Definition 4. The marginal PDF is defined as

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy   and   f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx.    (5.6)

Since f_{X,Y}(x, y) is a two-dimensional function, when integrating over y from −∞ to ∞ we project f_{X,Y}(x, y) onto the x-axis. Therefore, the resulting function depends on x only.

Example. Consider the joint uniform PDF f_{X,Y}(x, y) shown in Figure 5.2. Find the marginal PDFs.

Solution. Integrating the joint PDF over y gives f_X(x), and integrating over x gives f_Y(y). Because f_{X,Y}(x, y) is constant on the region shown in Figure 5.2, both marginals are piecewise-constant functions whose values are proportional to the extent of the region in the direction being integrated; they are plotted alongside the joint PDF in Figure 5.2.

Figure 5.2: Example of a joint uniform PDF f_{X,Y}(x, y) and the corresponding marginal PDFs.

Example. Consider a 2D Gaussian PDF as shown in Figure 5.3. The PDF of the joint Gaussian is

    f_{X,Y}(x, y) = (1 / (2πσ²)) exp{ −((x − µ_X)² + (y − µ_Y)²) / (2σ²) }.

Find the marginal PDFs f_X(x) and f_Y(y).

Solution.

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
           = ∫_{−∞}^{∞} (1 / (2πσ²)) exp{ −((x − µ_X)² + (y − µ_Y)²) / (2σ²) } dy
           = (1 / (√(2π) σ)) exp{ −(x − µ_X)² / (2σ²) } ∫_{−∞}^{∞} (1 / (√(2π) σ)) exp{ −(y − µ_Y)² / (2σ²) } dy
           = (1 / (√(2π) σ)) exp{ −(x − µ_X)² / (2σ²) }.

Similarly, we have

    f_Y(y) = (1 / (√(2π) σ)) exp{ −(y − µ_Y)² / (2σ²) }.

The result of this example shows that the marginalization of a 2D Gaussian is a 1D Gaussian along the vertical and the horizontal axes. Thus, we can think of marginalization as a projection.

Figure 5.3: Marginalization is equivalent to projection. A joint PDF shown in this figure can be marginalized onto the x or the y axis.

Independence of Random Variables

Finally, we say that two random variables are independent if the joint PMF or PDF can be factorized as a product of the marginal PMFs / PDFs:

Definition 5. Two random variables X and Y are independent if

    p_{X,Y}(x, y) = p_X(x) p_Y(y),   and   f_{X,Y}(x, y) = f_X(x) f_Y(y).
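
The projection interpretation can be checked numerically. The following Python sketch (NumPy; the values of µ_X, µ_Y, σ and the evaluation point are illustrative assumptions, not from the text) integrates the 2D Gaussian of the example over y at a fixed x and compares the result with the 1D Gaussian predicted by the derivation.

    import numpy as np

    # Illustrative parameters of the isotropic 2D Gaussian.
    mu_X, mu_Y, sigma = 1.0, -0.5, 0.8
    x0 = 0.3                                  # evaluate the marginal at one point x0

    # Fine grid over y covering essentially all of the mass.
    y = np.linspace(mu_Y - 10 * sigma, mu_Y + 10 * sigma, 20001)
    dy = y[1] - y[0]

    joint = (1.0 / (2 * np.pi * sigma**2)) * np.exp(
        -((x0 - mu_X) ** 2 + (y - mu_Y) ** 2) / (2 * sigma**2))

    f_X_numeric = joint.sum() * dy            # Riemann sum: integrate the joint over y
    f_X_exact = (1.0 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(
        -(x0 - mu_X) ** 2 / (2 * sigma**2))

    print(f_X_numeric, f_X_exact)             # the two values agree to high precision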

To see why this definition is coherent with the definition of independence of two events, recall that two events A and B are independent if P[A ∩ B] = P[A]P[B]. Letting A = {X = x} and B = {Y = y}, we see that if A and B are independent then

    P[X = x ∩ Y = y] = P[X = x] P[Y = y].

This is precisely the relationship p_{X,Y}(x, y) = p_X(x) p_Y(y).

Independence is an important statistical property. If there are many random variables X_1, X_2, ..., X_N, the joint PDF f_{X_1,...,X_N}(x_1, ..., x_N) is an N-dimensional function which could be computationally intractable. However, if we assume all these random variables are independent, then the joint PDF becomes

    f_{X_1,...,X_N}(x_1, ..., x_N) = Π_{n=1}^{N} f_{X_n}(x_n),

which is often manageable. As a special case of independent random variables, we define the notion of independent and identically distributed (i.i.d.) random variables.

Definition 6 (Independent and Identically Distributed (i.i.d.)). A collection of random variables X_1, ..., X_N are called independent and identically distributed (i.i.d.) if

  - All X_1, ..., X_N are independent;
  - All X_1, ..., X_N have the same distribution, i.e., f_{X_1}(x) = ... = f_{X_N}(x).

If X_1, ..., X_N are i.i.d., we have that

    f_{X_1,...,X_N}(x, ..., x) = Π_{n=1}^{N} f_{X_n}(x) = [f_{X_1}(x)]^N,    (5.7)

where the particular choice of X_1 is unimportant because f_{X_1}(x) = ... = f_{X_N}(x).

5.2 Joint CDF

As in Chapters 3 and 4, we need to understand the cumulative distribution function (CDF) for the multi-variable case.

Definition 7. Let X and Y be two random variables. The joint CDF of X and Y is the function F_{X,Y}(x, y) such that

    F_{X,Y}(x, y) = P[X ≤ x ∩ Y ≤ y].    (5.8)

From this definition, we can explicitly write out the probability as follows.

Definition 8. If X and Y are discrete, then

    F_{X,Y}(x, y) = Σ_{y' ≤ y} Σ_{x' ≤ x} p_{X,Y}(x', y').    (5.9)

If X and Y are continuous, then

    F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(x', y') dx' dy'.    (5.10)

Note that since F_{X,Y}(x, y) is the integration from −∞ to x (and y), we have

    F_{X,Y}(−∞, y) = ∫_{−∞}^{y} ∫_{−∞}^{−∞} f_{X,Y}(x', y') dx' dy' = ∫_{−∞}^{y} 0 dy' = 0.

Similarly, we have F_{X,Y}(x, −∞) = 0 and F_{X,Y}(−∞, −∞) = 0. The CDF evaluated at x = ∞ and y = ∞ is

    F_{X,Y}(∞, ∞) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x', y') dx' dy' = 1.

If only x or y is at ∞, we obtain the marginal CDF.

Proposition 1. Let X and Y be two random variables. Then the marginal CDFs can be obtained from

    F_X(x) = F_{X,Y}(x, ∞),   F_Y(y) = F_{X,Y}(∞, y).

To see these results, we note that

    F_{X,Y}(x, ∞) = ∫_{−∞}^{x} ( ∫_{−∞}^{∞} f_{X,Y}(x', y') dy' ) dx' = ∫_{−∞}^{x} f_X(x') dx' = F_X(x).
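
As a quick numerical check of these limiting values, consider the uniform joint PDF on the unit square from the earlier example, whose joint CDF has the closed form F_{X,Y}(x, y) = clip(x, 0, 1) · clip(y, 0, 1). A minimal Python sketch (NumPy):

    import numpy as np

    # Closed-form joint CDF of the uniform PDF on [0, 1] x [0, 1]:
    # F(x, y) = P[X <= x and Y <= y] is the area of (-inf, x] x (-inf, y]
    # intersected with the unit square.
    def F_XY(x, y):
        return np.clip(x, 0.0, 1.0) * np.clip(y, 0.0, 1.0)

    print(F_XY(-1.0, 0.5))        # 0.0 : F(-inf, y) = 0
    print(F_XY(0.3, np.inf))      # 0.3 : F(x, +inf) = F_X(x), the marginal CDF
    print(F_XY(np.inf, np.inf))   # 1.0 : total probability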

By the fundamental theorem of calculus, we can derive the PDF from the CDF.

Definition 9. Let F_{X,Y}(x, y) be the joint CDF of X and Y. Then, the joint PDF can be obtained through

    f_{X,Y}(x, y) = ∂²/∂y ∂x F_{X,Y}(x, y).

The order of the partial derivatives can be switched, yielding a symmetric result:

    f_{X,Y}(x, y) = ∂²/∂x ∂y F_{X,Y}(x, y).

5.3 Conditional PMF and PDF

Conditional PMF

Definition 10. Let X and Y be two discrete random variables. The conditional PMF of X given Y is

    p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y).    (5.11)

By the definition of conditional probability, we can also define p_{X|Y}(x | y) = P[X = x | Y = y], because

    p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y) = P[X = x ∩ Y = y] / P[Y = y] = P[X = x | Y = y].

It is important to understand the randomness exhibited in a conditional PMF. In p_{X|Y}(x | y), the random variable Y is fixed to a specific value Y = y. The randomness of Y has been taken care of by the denominator p_Y(y) in Equation (5.11). Therefore, there is no randomness associated with Y; the variable x in p_{X|Y}(x | y) describes the randomness. In particular, we have that

    Σ_x p_{X|Y}(x | y) = Σ_x p_{X,Y}(x, y) / p_Y(y) = p_Y(y) / p_Y(y) = 1,

but

    Σ_y p_{X|Y}(x | y) = Σ_y p_{X,Y}(x, y) / p_Y(y),   which is not 1 in general.

Therefore, p_{X|Y}(x | y) is a probability of X, not Y.

Unlike a marginal PMF, which is a function of either x or y, e.g., p_X(x) or p_Y(y), a conditional PMF can be a function of both x and y.

For example, p_{X|Y}(x | y) is the conditional probability of having the random variable X = x, given that Y is at a fixed value y. Thus p_{X|Y}(x | y) depends on both x and y.

Example. Consider a joint PMF given in the following table. Find the conditional PMF p_{X|Y}(x | 1) and the marginal PMF p_X(x).

    (Table of p_{X,Y}(x, y) for x = 1, 2, 3, 4 and y = 1, 2, 3, 4.)

Solution. To find the marginal PMF, we need to sum over all y for every x:

    x = 1:  p_X(1) = Σ_{y=1}^{4} p_{X,Y}(1, y),
    x = 2:  p_X(2) = Σ_{y=1}^{4} p_{X,Y}(2, y),
    x = 3:  p_X(3) = Σ_{y=1}^{4} p_{X,Y}(3, y),
    x = 4:  p_X(4) = Σ_{y=1}^{4} p_{X,Y}(4, y).

Hence, the marginal PMF p_X(x) is the vector of these four row sums. The conditional PMF p_{X|Y}(x | 1) is

    p_{X|Y}(x | 1) = p_{X,Y}(x, 1) / p_Y(1),

i.e., the y = 1 column of the table normalized by its sum.

Example. Consider two random variables X and Y defined as follows:

    Y = { 10²,  with prob 5/6;
          10⁴,  with prob 1/6, }

and, conditioned on Y,

    X = { 10⁻⁴ Y,  with prob 1/2;
          10⁻³ Y,  with prob 1/3;
          10⁻² Y,  with prob 1/6. }

Find p_{X|Y}(x | y), p_X(x) and p_{X,Y}(x, y).

Solution. Since Y takes two different states, we can enumerate Y = 10² and Y = 10⁴. This gives us

    p_{X|Y}(x | 10²) = { 1/2, if x = 0.01;   1/3, if x = 0.1;   1/6, if x = 1, }

and

    p_{X|Y}(x | 10⁴) = { 1/2, if x = 1;   1/3, if x = 10;   1/6, if x = 100. }

The joint PMF p_{X,Y}(x, y) can be found as

    p_{X,Y}(x, 10²) = p_{X|Y}(x | 10²) p_Y(10²) = { (1/2)(5/6), x = 0.01;   (1/3)(5/6), x = 0.1;   (1/6)(5/6), x = 1, }

    p_{X,Y}(x, 10⁴) = p_{X|Y}(x | 10⁴) p_Y(10⁴) = { (1/2)(1/6), x = 1;   (1/3)(1/6), x = 10;   (1/6)(1/6), x = 100. }

Therefore, the joint PMF is given by the corresponding table. The marginal PMF p_X(x) is

    p_X(x) = Σ_y p_{X,Y}(x, y) = { 5/12, x = 0.01;   5/18, x = 0.1;   2/9, x = 1;   1/18, x = 10;   1/36, x = 100. }

Conditional PDF

Definition 11. Let X and Y be two continuous random variables. The conditional PDF of X given Y is

    f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y).    (5.12)

Example. Let X and Y be two continuous random variables with a joint PDF

    f_{X,Y}(x, y) = { 2 e^{−x} e^{−y},  0 ≤ y ≤ x < ∞;   0, otherwise. }

Find the conditional PDFs f_{X|Y}(x | y) and f_{Y|X}(y | x).

Solution. In order to find the conditional PDFs, we first find the marginal PDFs:

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_{0}^{x} 2 e^{−x} e^{−y} dy = 2 e^{−x} (1 − e^{−x}),

    f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_{y}^{∞} 2 e^{−x} e^{−y} dx = 2 e^{−2y}.

Therefore, the conditional PDFs are

    f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y) = 2 e^{−x} e^{−y} / (2 e^{−2y}) = e^{−(x − y)},   x ≥ y,

    f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x) = 2 e^{−x} e^{−y} / (2 e^{−x} (1 − e^{−x})) = e^{−y} / (1 − e^{−x}),   0 ≤ y < x.

Example. This example considers a classical detection problem. Let X be a random bit such that

    X = { +1, with prob 1/2;   −1, with prob 1/2. }

Suppose that X is transmitted over a noisy channel so that the observed signal is

    Y = X + N,

where N ~ N(0, 1) is noise which is independent of the signal X. Suppose that we observe Y > 0; is the signal more likely to be X = +1 or X = −1?

Solution. First of all, we know that

    f_{Y|X}(y | +1) = (1/√(2π)) e^{−(y − 1)²/2},   f_{Y|X}(y | −1) = (1/√(2π)) e^{−(y + 1)²/2}.

Therefore, given Y > 0, we need to find

    P[X = +1 | Y > 0] = P[Y > 0 | X = +1] P[X = +1] / P[Y > 0].

It holds that

    P[Y > 0 | X = +1] = ∫_{0}^{∞} (1/√(2π)) e^{−(y − 1)²/2} dy = 1 − Φ((0 − 1)/1) = 1 − Φ(−1).

Similarly, we have

    P[Y > 0 | X = −1] = 1 − Φ(+1).

By the law of total probability, we have that

    P[Y > 0] = P[Y > 0 | X = +1] P[X = +1] + P[Y > 0 | X = −1] P[X = −1]
             = (1/2) [ (1 − Φ(−1)) + (1 − Φ(+1)) ] = 1/2,

because Φ(+1) + Φ(−1) = Φ(+1) + 1 − Φ(+1) = 1. Therefore,

    P[X = +1 | Y > 0] = (1 − Φ(−1))(1/2) / (1/2) = Φ(+1) ≈ 0.8413.

The implication is that if Y > 0, the posterior probability P[X = +1 | Y > 0] ≈ 0.8413. The complement of this result gives P[X = −1 | Y > 0] ≈ 0.1587. Therefore, X = +1 is more likely.

5.4 Joint Expectation, Moment, and Covariance

Joint Expectation and Joint Moment

Definition 12. Let X and Y be two random variables. The joint expectation is

    E[XY] = Σ_y Σ_x x y p_{X,Y}(x, y)    (5.13)

if X and Y are discrete, or

    E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_{X,Y}(x, y) dx dy    (5.14)

if X and Y are continuous. Joint expectation is also called correlation.

Theorem 2. If X and Y are independent, then

    E[XY] = E[X] E[Y].    (5.15)

Proof. We only prove the discrete case because the continuous case can be proved similarly. If X and Y are independent, we have p_{X,Y}(x, y) = p_X(x) p_Y(y). Therefore,

    E[XY] = Σ_y Σ_x x y p_{X,Y}(x, y) = Σ_y Σ_x x y p_X(x) p_Y(y)
          = ( Σ_x x p_X(x) ) ( Σ_y y p_Y(y) ) = E[X] E[Y].
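
Theorem 2 is easy to check by simulation. The following Python sketch (NumPy; the Exponential and Gaussian distributions are illustrative choices, not from the text) compares the sample average of XY with the product of the sample averages for an independent pair, and then shows the equality failing for a dependent pair.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10**6

    # Independent X and Y: the sample average of XY matches E[X]E[Y].
    X = rng.exponential(scale=2.0, size=n)       # E[X] = 2
    Y = rng.normal(loc=3.0, scale=1.0, size=n)   # E[Y] = 3
    print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # both close to 6

    # Dependent case, e.g. Y = X: E[X^2] differs from (E[X])^2 in general.
    print(np.mean(X * X), np.mean(X) ** 2)           # approx 8 vs approx 4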

In general, for any two independent random variables and any two functions f and g, it holds that

    E[f(X) g(Y)] = E[f(X)] E[g(Y)].    (5.16)

Of particular interest are the functions f(X) = X^k and g(Y) = Y^l, which give the definition of joint moments.

Definition 13. Let X and Y be two random variables. The joint moment is

    E[X^k Y^l] = Σ_y Σ_x x^k y^l p_{X,Y}(x, y)    (5.17)

if X and Y are discrete, or

    E[X^k Y^l] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^k y^l f_{X,Y}(x, y) dx dy    (5.18)

if X and Y are continuous.

Covariance

The concept of covariance can be considered as a generalization of the concept of variance. Instead of measuring (X − µ_X)², the covariance of two random variables measures (X − µ_X)(Y − µ_Y). Thus, while the variance is always non-negative, a covariance can be negative.

Definition 14. Let X and Y be two random variables. The covariance is

    Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)],    (5.19)

where µ_X = E[X] and µ_Y = E[Y].

The following theorem illustrates a few important properties of the covariance.

Theorem 3. The following results hold:

a. Cov(X, Y) = E[XY] − E[X] E[Y].
b. X and Y are independent ⇒ Cov(X, Y) = 0.
c. Cov(X, Y) = 0 does not imply that X and Y are independent.

Remark: If Y = X, then Cov(X, Y) = E[X²] − E[X]² = Var[X].

Proof. The proof of part (a) is straightforward:

    Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY − X µ_Y − Y µ_X + µ_X µ_Y] = E[XY] − µ_X µ_Y.

The proof of part (b) follows from Equation (5.15). If X and Y are independent, then E[XY] = E[X]E[Y]. In this case,

    Cov(X, Y) = E[XY] − E[X]E[Y] = E[X]E[Y] − E[X]E[Y] = 0.

The proof of part (c) requires a counter-example. Consider a discrete random variable Z with PMF

    p_Z(z) = [1/4, 1/4, 1/4, 1/4]   on   z = 0, 1, 2, 3.

Let X and Y be X = cos(πZ/2) and Y = sin(πZ/2). Then, we can show that E[X] = 0 and E[Y] = 0. The covariance is

    Cov(X, Y) = E[(X − 0)(Y − 0)] = E[ cos(πZ/2) sin(πZ/2) ] = E[ (1/2) sin(πZ) ]
              = (1/2) [ sin(π·0)/4 + sin(π·1)/4 + sin(π·2)/4 + sin(π·3)/4 ] = 0.

Our next goal is to show that X and Y are dependent. To this end, we only need to show that p_{X,Y}(x, y) ≠ p_X(x) p_Y(y). The joint PMF p_{X,Y}(x, y) can be found by noting that

    Z = 0  ⇒  X = 1,  Y = 0,
    Z = 1  ⇒  X = 0,  Y = 1,
    Z = 2  ⇒  X = −1, Y = 0,
    Z = 3  ⇒  X = 0,  Y = −1.

Thus, the joint PMF puts probability 1/4 on each of the four pairs (1, 0), (0, 1), (−1, 0), (0, −1). The marginal PMFs (on the values −1, 0, 1) are

    p_X(x) = [1/4, 1/2, 1/4],   p_Y(y) = [1/4, 1/2, 1/4].

The product p_X(x) p_Y(y) assigns, for example, probability (1/2)(1/2) = 1/4 to the pair (0, 0), whereas p_{X,Y}(0, 0) = 0. Therefore, p_{X,Y}(x, y) ≠ p_X(x) p_Y(y), although E[XY] = E[X]E[Y].
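
The counter-example can be verified by direct enumeration. The following Python sketch (NumPy) lists the four equally likely outcomes of Z, checks that the covariance is zero, and shows that the joint PMF does not factor into the product of the marginals.

    import numpy as np

    # Enumerate the four equally likely outcomes of Z and the induced (X, Y) pairs.
    z_vals = np.array([0, 1, 2, 3])
    X = np.round(np.cos(np.pi * z_vals / 2))   # [ 1,  0, -1,  0]
    Y = np.round(np.sin(np.pi * z_vals / 2))   # [ 0,  1,  0, -1]

    # Since Z is uniform over the four values, expectations are plain averages.
    EX, EY, EXY = X.mean(), Y.mean(), (X * Y).mean()
    print(EXY - EX * EY)        # 0.0 -> Cov(X, Y) = 0

    # Dependence: the joint PMF puts mass 1/4 on (1, 0), but the product of
    # the marginals assigns p_X(1) * p_Y(0) = (1/4) * (1/2) = 1/8 to that point.
    p_X1 = np.mean(X == 1)      # 0.25
    p_Y0 = np.mean(Y == 0)      # 0.5
    print(p_X1 * p_Y0)          # 0.125, which differs from 0.25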

The next theorem is general: it applies to random variables that are not necessarily independent.

Theorem 4. For any X and Y (not necessarily independent),

a. E[X + Y] = E[X] + E[Y].
b. Var[X + Y] = Var[X] + 2 Cov(X, Y) + Var[Y].

Of course, if X and Y are independent, then Cov(X, Y) = 0 and hence Var[X + Y] = Var[X] + Var[Y].

Proof. Proof of (a). Recall the definition of joint expectation:

    E[X + Y] = Σ_y Σ_x (x + y) p_{X,Y}(x, y)
             = Σ_y Σ_x x p_{X,Y}(x, y) + Σ_y Σ_x y p_{X,Y}(x, y)
             = Σ_x x ( Σ_y p_{X,Y}(x, y) ) + Σ_y y ( Σ_x p_{X,Y}(x, y) )
             = Σ_x x p_X(x) + Σ_y y p_Y(y) = E[X] + E[Y].

Proof of (b).

    Var[X + Y] = E[(X + Y)²] − E[X + Y]²
               = E[(X + Y)²] − (µ_X + µ_Y)²
               = E[X² + 2XY + Y²] − (µ_X² + 2 µ_X µ_Y + µ_Y²)
               = E[X²] − µ_X² + E[Y²] − µ_Y² + 2 (E[XY] − µ_X µ_Y)
               = Var[X] + 2 Cov(X, Y) + Var[Y].

Correlation Coefficient

Definition 15. Let X and Y be two random variables. The correlation coefficient is

    ρ = Cov(X, Y) / √(Var[X] Var[Y]).    (5.20)

The correlation coefficient provides a convenient way of assessing the relationship between two random variables. The following theorem outlines its properties.

Theorem 5. The correlation coefficient ρ has the properties that:

  - When X = Y (fully correlated), ρ = +1.
  - When X = −Y (negatively correlated), ρ = −1.
  - When X and Y are independent, ρ = 0. However, ρ = 0 does not imply that X and Y are independent.

Proof. When X = Y,

    ρ = Cov(X, X) / √(Var[X] Var[X]) = Var[X] / Var[X] = 1.

When X = −Y,

    ρ = (E[X(−X)] − E[X]E[−X]) / √(Var[X] Var[−X]) = −Var[X] / Var[X] = −1.

When X and Y are independent, Cov(X, Y) = 0, and so ρ = 0. A counter-example for the converse can be found in Theorem 3(c).

In general, a correlation coefficient is always bounded between −1 and 1.

Theorem 6. The correlation coefficient always satisfies

    −1 ≤ ρ ≤ 1.    (5.21)

Proof. We prove this result by the Cauchy-Schwarz inequality, which states that E[XY]² ≤ E[X²] E[Y²]. Therefore, we have

    Cov(X, Y)² = E[(X − µ_X)(Y − µ_Y)]² ≤ E[(X − µ_X)²] E[(Y − µ_Y)²] = Var[X] Var[Y].

Hence, we have |Cov(X, Y)| ≤ √(Var[X] Var[Y]), and so |ρ| ≤ 1.
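
To see these properties concretely, here is a small Python sketch (NumPy, with illustrative simulated samples) that computes the sample correlation coefficient in the cases of Theorem 5 and for a partially correlated pair, confirming that the value always stays inside [−1, 1].

    import numpy as np

    rng = np.random.default_rng(1)

    def rho(x, y):
        # Sample correlation coefficient: Cov(X, Y) / sqrt(Var[X] Var[Y]).
        return np.cov(x, y)[0, 1] / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

    X = rng.normal(size=100000)
    N = rng.normal(size=100000)

    print(rho(X, X))            # +1 : fully correlated
    print(rho(X, -X))           # -1 : negatively correlated
    print(rho(X, N))            # approx 0 : independent samples
    print(rho(X, X + 0.5 * N))  # strictly between -1 and +1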

5.5 Conditional Expectation

When dealing with two dependent random variables, sometimes we would like to determine the expectation of a random variable when the second random variable takes a particular state. The conditional expectation is a formal way of doing so.

Definition 16. The conditional expectation of X given Y = y is

    E[X | Y = y] = Σ_x x p_{X|Y}(x | y)    (5.22)

for discrete random variables, and

    E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx    (5.23)

for continuous random variables.

There are a few points to note here:

  - In E[X | Y = y], the expectation is taken over X. In other words, we are exploring the randomness of X. To evaluate the conditional expectation, the PDF is f_{X|Y}(x | y).
  - The random variable Y is fixed at Y = y. Thus, there is no randomness associated with Y.
  - The resulting object E[X | Y = y] is a function of y, because the random variable X has been eliminated by the expectation.
  - Conditional expectation is meaningful only when X and Y are dependent. If X and Y are independent, then f_{X|Y}(x | y) = f_X(x) and so E[X | Y = y] = E[X]. That is, the conditional expectation does not really depend on y.
  - If we do not specify a particular value that Y takes, then we write E[X | Y], which is a random variable in Y.

One of the most useful results about conditional expectation is the following theorem.

Theorem 7 (Law of Total Expectation).

    E[X] = Σ_y E[X | Y = y] p_Y(y),   or   E[X] = ∫_{−∞}^{∞} E[X | Y = y] f_Y(y) dy.    (5.24)

Proof. We only prove the discrete case, as the continuous case can be proved by replacing summation with integration.

    E[X] = Σ_x x p_X(x) = Σ_x x ( Σ_y p_{X,Y}(x, y) ) = Σ_x Σ_y x p_{X|Y}(x | y) p_Y(y)
         = Σ_y ( Σ_x x p_{X|Y}(x | y) ) p_Y(y) = Σ_y E[X | Y = y] p_Y(y).

Corollary 1. Let X and Y be two random variables. Then,

    E[X] = E[ E[X | Y] ].    (5.25)

Proof. The previous theorem states that E[X] = Σ_y E[X | Y = y] p_Y(y). If we treat E[X | Y = y] as a function of y, say h(y), then

    E[X] = Σ_y E[X | Y = y] p_Y(y) = Σ_y h(y) p_Y(y) = E[h(Y)] = E[ E[X | Y] ].

Remark: To be slightly more clear, the two expectations in Equation (5.25) are

    E[X] = E_Y[ E_{X|Y}[X | Y] ],

i.e., the inner expectation is taken over f_{X|Y}, whereas the outer expectation is taken over f_Y.

Example. Consider the joint PMF given by the table of the earlier example, where Y ∈ {10², 10⁴}. Find E[X | Y = 10²] and E[X | Y = 10⁴].

Solution. To find the conditional expectations, we first need to know the conditional PMFs:

    p_{X|Y}(x | 10²) = [1/2, 1/3, 1/6]   on   x = 0.01, 0.1, 1,
    p_{X|Y}(x | 10⁴) = [1/2, 1/3, 1/6]   on   x = 1, 10, 100.

Therefore, the conditional expectations are

    E[X | Y = 10²] = (0.01)(1/2) + (0.1)(1/3) + (1)(1/6) = 0.205,
    E[X | Y = 10⁴] = (1)(1/2) + (10)(1/3) + (100)(1/6) = 20.5.

From the conditional expectations we can also find E[X]:

    E[X] = E[X | Y = 10²] p_Y(10²) + E[X | Y = 10⁴] p_Y(10⁴) = (0.205)(5/6) + (20.5)(1/6) ≈ 3.59.

Example. Consider two random variables X and Y. The random variable X is Gaussian distributed with X ~ N(µ, σ²). The random variable Y has a conditional distribution Y | X ~ N(X, X²). Find E[Y].

Solution. We know that the two PDFs are

    f_X(x) = (1/√(2πσ²)) e^{−(x − µ)²/(2σ²)},   and   f_{Y|X}(y | x) = (1/√(2πx²)) e^{−(y − x)²/(2x²)}.

The conditional expectation of Y given X is

    E[Y | X = x] = ∫_{−∞}^{∞} y f_{Y|X}(y | x) dy = ∫_{−∞}^{∞} y (1/√(2πx²)) e^{−(y − x)²/(2x²)} dy = x.

The last equality holds because we are computing the expectation of a Gaussian random variable with mean x. Finally, applying the law of total expectation we can show that

    E[Y] = ∫_{−∞}^{∞} E[Y | X = x] f_X(x) dx = ∫_{−∞}^{∞} x (1/√(2πσ²)) e^{−(x − µ)²/(2σ²)} dx = µ.

Application: MMSE Estimator (Optional). Consider a pair of random variables (X, Y). We observe this pair of random variables. Can we determine the relationship between them? That is, can we design a function g such that we minimize the error

    min_g E[(Y − g(X))²]?

We may assume that we know the distributions f_X(x), f_Y(y), and f_{Y|X}(y | x). The solution to this problem is called the minimum mean squared error (MMSE) estimator.

Theorem 8. The MMSE estimator is a function g* which minimizes the mean squared error:

    g* = argmin_g E[(Y − g(X))²],

and is given by

    g*(x) = E[Y | X = x].    (5.26)

Proof. By the law of total expectation, we have that

    E[(Y − g(X))²] = ∫_{−∞}^{∞} E[(Y − g(X))² | X = x] f_X(x) dx.

Since all terms in this integration are non-negative, we can minimize the overall error by minimizing the inner expectation E[(Y − g(X))² | X = x] for every x. When conditioned on X = x, the function value g(x) is a fixed number independent of Y. Therefore, we can treat g(x) = c for some constant c and try to determine c. This means that we want to find c to minimize

    c* = argmin_c E[(Y − c)² | X = x] = argmin_c ∫_{−∞}^{∞} (y − c)² f_{Y|X}(y | x) dy.

Taking the derivative with respect to c and setting it to zero yields

    d/dc ( ∫_{−∞}^{∞} (y − c)² f_{Y|X}(y | x) dy ) = 0   ⇒   −2 ∫_{−∞}^{∞} (y − c) f_{Y|X}(y | x) dy = 0,

which implies that

    c* = ∫_{−∞}^{∞} y f_{Y|X}(y | x) dy = E[Y | X = x].

Therefore, the inner expectation is minimized when g(x) = E[Y | X = x].

5.6 Sum of Two Random Variables

One typical problem we encounter in engineering is: given two random variables X and Y, what is the PDF of the sum X + Y? Such a problem arises naturally when we want to evaluate the average of a number of random variables, e.g., the sample mean of a collection of data points.

In this section we will discuss a general principle for determining the PDF of the sum of two random variables.

To start, we consider two random variables X and Y with PDFs f_X(x) and f_Y(y), respectively. Let us define the sum as Z = X + Y. Our goal is to determine the PDF of Z.

Theorem 9. Let X and Y be two independent random variables with PDFs f_X(x) and f_Y(y), respectively. Let Z = X + Y. The PDF of Z is given by

    f_Z(z) = (f_X * f_Y)(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy,    (5.27)

where * denotes convolution.

Proof. Let us start by analyzing the CDF of Z. The CDF of Z is

    F_Z(z) = P[Z ≤ z] = ∫_{−∞}^{∞} ∫_{−∞}^{z − y} f_X(x) f_Y(y) dx dy,

where the integration limits can be seen from Figure 5.4. Then, by the fundamental theorem of calculus, we can show that

    f_Z(z) = d/dz F_Z(z) = d/dz ∫_{−∞}^{∞} ∫_{−∞}^{z − y} f_X(x) f_Y(y) dx dy
           = ∫_{−∞}^{∞} ( d/dz ∫_{−∞}^{z − y} f_X(x) dx ) f_Y(y) dy
           = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy = (f_X * f_Y)(z),

where * denotes convolution.

The result of this derivation shows that the PDF of X + Y is the convolution of f_X(x) and f_Y(y). The following example illustrates how we can compute the convolution.

Example. Let X and Y be independent, and let

    f_X(x) = { x e^{−x}, x ≥ 0;   0, x < 0, }   and   f_Y(y) = { y e^{−y}, y ≥ 0;   0, y < 0. }

Figure 5.4: The shaded region highlights the set {X + Y ≤ z}.

Find the PDF of Z = X + Y.

Solution. Using the result derived above, we see that

    f_Z(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy = ∫_{0}^{z} f_X(z − y) f_Y(y) dy,

where the upper limit z comes from the fact that x ≥ 0: since Z = X + Y, we must have Z − Y = X ≥ 0 and so Y ≤ Z. Substituting the PDFs into the integration yields

    f_Z(z) = ∫_{0}^{z} (z − y) e^{−(z − y)} y e^{−y} dy = (z³/6) e^{−z},   z ≥ 0.

For z < 0, f_Z(z) = 0.

In general, a function of two random variables is not limited to a summation. The following example illustrates the case of a product of two random variables.

Example. Let X and Y be two independent random variables such that

    f_X(x) = { 2x, if 0 ≤ x ≤ 1;   0, otherwise, }   and   f_Y(y) = { 1, if 0 ≤ y ≤ 1;   0, otherwise. }

Let Z = XY. Find f_Z(z).

Solution. The CDF of Z can be evaluated as

    F_Z(z) = P[Z ≤ z] = P[XY ≤ z] = ∫_{−∞}^{∞} ∫_{−∞}^{z/y} f_X(x) f_Y(y) dx dy.

Taking the derivative yields

    f_Z(z) = d/dz F_Z(z) = d/dz ∫_{−∞}^{∞} ∫_{−∞}^{z/y} f_X(x) f_Y(y) dx dy  (a)=  ∫ (1/y) f_X(z/y) f_Y(y) dy,

where (a) holds by the fundamental theorem of calculus. The upper and lower limits of this integration can be determined by noting that z ≥ 0 and x = z/y ≤ 1, which implies y ≥ z. Since y ≤ 1, we have z ≤ y ≤ 1. Therefore, the PDF is

    f_Z(z) = ∫_{z}^{1} (1/y) f_X(z/y) f_Y(y) dy = ∫_{z}^{1} (1/y) · 2(z/y) dy = 2z ∫_{z}^{1} (1/y²) dy = 2(1 − z),   0 ≤ z ≤ 1.

For z < 0, f_Z(z) = 0.

5.7 Two-dimensional Gaussian

Covariance Matrix and Joint Gaussian PDF

Among many joint distributions, the joint Gaussian is of particular interest because of its usefulness. To define a joint Gaussian distribution, we first define a few notations:

    X = [ X_1 ]    µ = [ µ_1 ]    Σ = [ Var(X_1)        Cov(X_1, X_2) ]
        [ X_2 ],       [ µ_2 ],       [ Cov(X_2, X_1)   Var(X_2)      ].

The vector µ is called the mean vector, and the matrix Σ is called the covariance matrix. It is not difficult to show that the covariance matrix can be defined in the following way.

Theorem 10. The covariance matrix Σ is equivalent to

    Σ = E[(X − µ)(X − µ)ᵀ].    (5.28)

Proof. For a two-dimensional random vector, the theorem holds because (X − µ)(X − µ)ᵀ is the outer product of the column vector [X_1 − µ_1, X_2 − µ_2]ᵀ with itself, so

    E[(X − µ)(X − µ)ᵀ] = [ E[(X_1 − µ_1)²]             E[(X_1 − µ_1)(X_2 − µ_2)] ]
                         [ E[(X_2 − µ_2)(X_1 − µ_1)]   E[(X_2 − µ_2)²]           ]
                       = [ Var(X_1)        Cov(X_1, X_2) ]
                         [ Cov(X_2, X_1)   Var(X_2)      ].
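
The outer-product form of Theorem 10 translates directly into how a covariance matrix is estimated from data. The following Python sketch (NumPy; the correlated pair and its parameters are illustrative choices of mine, not from the text) estimates µ and Σ by averaging (x − µ̂)(x − µ̂)ᵀ over samples.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200000

    # An illustrative correlated pair: X2 depends linearly on X1 plus noise.
    X1 = rng.normal(loc=1.0, scale=2.0, size=n)
    X2 = 0.5 * X1 + rng.normal(loc=0.0, scale=1.0, size=n)
    X = np.stack([X1, X2], axis=1)              # shape (n, 2)

    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / n       # average of (x - mu)(x - mu)^T

    print(mu_hat)       # approx [1.0, 0.5]
    print(Sigma_hat)    # approx [[4, 2], [2, 2]]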

Clearly, the definition can be extended to random vectors of any finite dimension. We can also prove the following property of the covariance matrix.

Theorem 11. The covariance matrix Σ is symmetric positive semi-definite, i.e., Σᵀ = Σ and vᵀΣv ≥ 0 for all v ∈ R^d.

Proof. Symmetry is immediate from the definition, because Cov(X_i, X_j) = Cov(X_j, X_i). The positive semi-definiteness comes from the fact that

    vᵀΣv = vᵀ E[(X − µ_X)(X − µ_X)ᵀ] v = E[vᵀ(X − µ_X)(X − µ_X)ᵀv] = E[uᵀu]   (letting u = (X − µ_X)ᵀ v)
         = E[|u|²] ≥ 0.

With these tools in hand, we can now define a joint Gaussian. The PDF of a multi-dimensional Gaussian is given by the following definition.

Definition 17. A d-dimensional joint Gaussian has the PDF

    f_X(x) = (1 / √((2π)^d |Σ|)) exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) },    (5.29)

where d denotes the dimensionality of the vector x. In this course, we are mostly interested in the case d = 2.

As a special case, if we assume that X_1 and X_2 are independent, then we can show the following result.

Theorem 12. Let x = [x_1, x_2]ᵀ. If X_1 and X_2 are independent, then

    f_X(x) = (1/√(2π σ_1²)) exp{ −(x_1 − µ_1)²/(2σ_1²) } · (1/√(2π σ_2²)) exp{ −(x_2 − µ_2)²/(2σ_2²) },    (5.30)

i.e., the product of two 1D Gaussians.

Proof. To show this result, we note that if X_1 and X_2 are independent, then

    Σ = [ Var(X_1)        Cov(X_1, X_2) ] = [ σ_1²   0    ]
        [ Cov(X_2, X_1)   Var(X_2)      ]   [ 0      σ_2² ].

The determinant is |Σ| = σ_1² σ_2². Therefore,

    (x − µ)ᵀ Σ⁻¹ (x − µ) = [ x_1 − µ_1,  x_2 − µ_2 ] [ 1/σ_1²   0      ] [ x_1 − µ_1 ]
                                                      [ 0        1/σ_2² ] [ x_2 − µ_2 ]
                         = (x_1 − µ_1)²/σ_1² + (x_2 − µ_2)²/σ_2².

Substituting these results into Equation (5.29) yields the desired result.

Geometric Interpretation

Geometrically, the mean µ and the covariance matrix Σ can be interpreted as the center and the radius of the ellipse representing the Gaussian. Figure 5.5 illustrates three examples. As one can observe in these examples, the mean vector µ controls the center of the Gaussian, while the radius and orientation of the Gaussian are controlled by the covariance matrix.

Figure 5.5: The center and the radius of the ellipse are determined by µ and Σ.

The precise radius and orientation of the Gaussian are determined by the eigenvectors and eigenvalues of Σ.

Definition 18. The covariance matrix Σ can be decomposed as

    Σ = U Λ Uᵀ,    (5.31)

for some unitary matrix U and diagonal matrix Λ. The columns of U are called the eigenvectors, and the diagonal entries of Λ are called the eigenvalues.

If we write out the definition of the eigenvectors and eigenvalues, we can see that (at least for the two-dimensional case):

    Σ = U Λ Uᵀ = [ u_1  u_2 ] [ λ_1   0   ] [ u_1ᵀ ]
                              [ 0     λ_2 ] [ u_2ᵀ ].

The column vector u_1 defines the direction of the major axis, and u_2 defines the direction of the minor axis. The values λ_1 and λ_2 define the radii of the axes, respectively. See Figure 5.6 for an illustration.

Figure 5.6: The center and the radius of the ellipse are determined by µ and Σ.

Maximum-a-Posteriori Classifier

Consider a dataset of two classes C_1 and C_2. We assume that all data within each class follows a Gaussian distribution. More specifically, we assume that

    X | C_1 ~ N(µ_1, Σ_1),   and   X | C_2 ~ N(µ_2, Σ_2).

Suppose we are given a testing data point x; how do we design a classifier to classify this data point? To answer this question, we first need to determine the two PDFs. Assume that the probability of obtaining C_1 is π_1 and the probability of obtaining C_2 is π_2, with π_1 + π_2 = 1; that is, f_C(C_1) = π_1

and f_C(C_2) = π_2. The conditional PDFs are given by

    f_{X|C}(x | C_1) = (1/√((2π)^d |Σ_1|)) exp{ −(1/2) (x − µ_1)ᵀ Σ_1⁻¹ (x − µ_1) },
    f_{X|C}(x | C_2) = (1/√((2π)^d |Σ_2|)) exp{ −(1/2) (x − µ_2)ᵀ Σ_2⁻¹ (x − µ_2) }.

One possible way of designing a classifier is to compare the posterior distributions, and check whether

    f_{C|X}(C_1 | x) ≥ f_{C|X}(C_2 | x).    (5.32)

If f_{C|X}(C_1 | x) ≥ f_{C|X}(C_2 | x), we claim that the class is C_1. Otherwise it is C_2. By Bayes' theorem, we can rewrite the posterior comparison as

    f_{X|C}(x | C_1) f_C(C_1) ≥ f_{X|C}(x | C_2) f_C(C_2).

Substituting the Gaussians, we have

    π_1 (1/√((2π)^d |Σ_1|)) e^{−(1/2)(x − µ_1)ᵀΣ_1⁻¹(x − µ_1)} ≥ π_2 (1/√((2π)^d |Σ_2|)) e^{−(1/2)(x − µ_2)ᵀΣ_2⁻¹(x − µ_2)}.

The comparison defined by this posterior distribution is called maximum-a-posteriori (MAP) classification.

Definition 19. The maximum-a-posteriori (MAP) classification is a test which checks whether

    f_{X|C}(x | C_1) f_C(C_1) ≥ f_{X|C}(x | C_2) f_C(C_2).    (5.33)

To demonstrate how MAP classification can be used in practice, we consider the special case where Σ_1 = Σ_2 and π_1 = π_2 = 1/2.

Theorem 13. Let X | C_1 ~ N(µ_1, Σ_1) and X | C_2 ~ N(µ_2, Σ_2). Suppose that Σ_1 = Σ_2 = Σ, and π_1 = π_2 = 1/2. Then the MAP classifier of C_1 and C_2 is

    wᵀx + x_0 ≥ 0,    (5.34)

where w = Σ⁻¹(µ_1 − µ_2), and x_0 = −(1/2){ µ_1ᵀΣ⁻¹µ_1 − µ_2ᵀΣ⁻¹µ_2 }.

Proof. When Σ_1 = Σ_2 = Σ and π_1 = π_2 = 1/2, the MAP classifier simplifies to

    e^{−(1/2)(x − µ_1)ᵀΣ⁻¹(x − µ_1)} ≥ e^{−(1/2)(x − µ_2)ᵀΣ⁻¹(x − µ_2)},    (5.35)

which implies that

    (x − µ_1)ᵀΣ⁻¹(x − µ_1) ≤ (x − µ_2)ᵀΣ⁻¹(x − µ_2).    (5.36)

Note that the inequality sign is flipped because there is a −1/2 term in the exponential. Rewriting the terms, we obtain an equivalent expression

    xᵀΣ⁻¹(µ_1 − µ_2) ≥ (1/2){ µ_1ᵀΣ⁻¹µ_1 − µ_2ᵀΣ⁻¹µ_2 }.    (5.37)

If we define w = Σ⁻¹(µ_1 − µ_2) and x_0 = −(1/2){ µ_1ᵀΣ⁻¹µ_1 − µ_2ᵀΣ⁻¹µ_2 }, the above expression can be simplified as

    wᵀx + x_0 ≥ 0.    (5.38)

28 which implies that (x µ ) T Σ (x µ ) (x µ ) T Σ (x µ ). (5.36) Note that the sign is flipped because there is a / term in the exponential. Rewriting the terms we obtain an equivalent expression x T Σ (µ µ ) µ T Σ µ + µ T Σ µ }. (5.37) If we define w Σ (µ µ ), and x 0 µ T Σ µ + µ T Σ µ }, the above expression can be simplified as w T x + x 0 0. (5.38) The result above shows a linear classifier. Given a data point x, all we need to do is to project x by w, and then check whether the intercept w T x + x 0 is less than or greater than 0. If it is less than 0, then we claim that the class is C. Figure 5.7: Classifying two classes of data points. c 8 Stanley Chan. All Rights Reserved. 8


More information

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables CDA6530: Performance Models of Computers and Networks Chapter 2: Review of Practical Random Variables Two Classes of R.V. Discrete R.V. Bernoulli Binomial Geometric Poisson Continuous R.V. Uniform Exponential,

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous

More information

Data Analysis and Monte Carlo Methods

Data Analysis and Monte Carlo Methods Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM General Information: - Lectures will be held in English, Mondays 16-18:00 -

More information

Notes for Math 324, Part 19

Notes for Math 324, Part 19 48 Notes for Math 324, Part 9 Chapter 9 Multivariate distributions, covariance Often, we need to consider several random variables at the same time. We have a sample space S and r.v. s X, Y,..., which

More information

Chapter 5,6 Multiple RandomVariables

Chapter 5,6 Multiple RandomVariables Chapter 5,6 Multiple RandomVariables ENCS66 - Probabilityand Stochastic Processes Concordia University Vector RandomVariables A vector r.v. is a function where is the sample space of a random experiment.

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Continuous r.v practice problems

Continuous r.v practice problems Continuous r.v practice problems SDS 321 Intro to Probability and Statistics 1. (2+2+1+1 6 pts) The annual rainfall (in inches) in a certain region is normally distributed with mean 4 and standard deviation

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

CDA5530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables

CDA5530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables CDA5530: Performance Models of Computers and Networks Chapter 2: Review of Practical Random Variables Definition Random variable (R.V.) X: A function on sample space X: S R Cumulative distribution function

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

4 Pairs of Random Variables

4 Pairs of Random Variables B.Sc./Cert./M.Sc. Qualif. - Statistical Theory 4 Pairs of Random Variables 4.1 Introduction In this section, we consider a pair of r.v. s X, Y on (Ω, F, P), i.e. X, Y : Ω R. More precisely, we define a

More information

UCSD ECE153 Handout #27 Prof. Young-Han Kim Tuesday, May 6, Solutions to Homework Set #5 (Prepared by TA Fatemeh Arbabjolfaei)

UCSD ECE153 Handout #27 Prof. Young-Han Kim Tuesday, May 6, Solutions to Homework Set #5 (Prepared by TA Fatemeh Arbabjolfaei) UCSD ECE53 Handout #7 Prof. Young-Han Kim Tuesday, May 6, 4 Solutions to Homework Set #5 (Prepared by TA Fatemeh Arbabjolfaei). Neural net. Let Y = X + Z, where the signal X U[,] and noise Z N(,) are independent.

More information