Multivariate Distributions (Hogg Chapter Two)
STAT 45-1: Mathematical Statistics I, Fall Semester 2015

Contents

1 Multivariate Distributions
  1.1 Random Vectors
    1.1.1 Two Discrete Random Variables
    1.1.2 Two Continuous Random Variables
  1.2 Marginalization
  1.3 Expectation Values
    1.3.1 Moment Generating Function
  1.4 Transformations
    1.4.1 Transformation of Discrete RVs
    1.4.2 Transformation of Continuous RVs
2 Conditional Distributions
  2.1 Conditional Probability
  2.2 Conditional Probability Distributions
    2.2.1 Example
    2.2.2 Conditional Expectations
  2.3 Independence
    2.3.1 Dependent rv example #1
    2.3.2 Dependent rv example #2
    2.3.3 Factoring the joint pdf
    2.3.4 Expectation Values
3 Covariance and Correlation
4 Generalization to Several RVs
  4.1 Linear Algebra: Reminders and Notation
  4.2 Multivariate Probability Distributions
    4.2.1 Marginal and Conditional Distributions
    4.2.2 Mutual vs. Pairwise Independence
  4.3 The Variance-Covariance Matrix
  4.4 Transformations of Several RVs
  4.5 Example #1
  4.6 Example #2 (fewer Y's than X's)

Copyright 2015, John T. Whelan, and all that

Tuesday 8 September 2015. Read Section 2.1 of Hogg.

1 Multivariate Distributions

We introduced a random variable X as a function X(c) which assigned a real number to each outcome c in the sample space C. There's no reason, of course, that we can't define multiple such functions, and we now turn to the formalism for dealing with multiple random variables at the same time.

1.1 Random Vectors

We can think of several random variables X1, X2, ..., Xn as making up the elements of a random vector

    X = (X1, X2, ..., Xn)ᵀ    (1.1)

We can define an event X ∈ A corresponding to the random vector X lying in some region A ⊂ ℝⁿ, and use the probability function to define the probability P(X ∈ A) of this event. We'll focus on n = 2 initially, and define the joint cumulative distribution function of two random variables X1 and X2 as

    F_{X1,X2}(x1, x2) = P([X1 ≤ x1] ∩ [X2 ≤ x2])    (1.2)

As in the case of a single random variable, this can be used as a starting point for defining the probability of any event we like. For example, with a bit of algebra it's possible to show that

    P[(a1 < X1 ≤ b1) ∩ (a2 < X2 ≤ b2)] = F(b1,b2) − F(b1,a2) − F(a1,b2) + F(a1,a2)    (1.3)

where we have suppressed the subscript X1,X2 since there's only one cdf of interest at the moment. We won't really dwell on this, though, since it's a lot easier to work with joint probability mass and density functions.

1.1.1 Two Discrete Random Variables

If both random variables are discrete, i.e., they can take on either a finite set of values, or at most countably many values, the joint cdf will once again be constant aside from discontinuities, and we can describe the situation using a joint probability mass function

    p_{X1,X2}(x1, x2) = P[(X1 = x1) ∩ (X2 = x2)]    (1.4)

We give an example of this, in which we for convenience refer to the random variables as X and Y. Recall the example of three flips of a fair coin, in which we defined X as the number of heads. Now let's define another random variable Y, which is the number of tails we see before the first head is flipped. (If all three flips are tails, then Y is defined to be 3.) We can work out the probabilities by first just enumerating all of the outcomes, which are assumed to have equal probability because the coin is
fair:

    outcome c   P(c)   X value   Y value
    HHH         1/8    3         0
    HHT         1/8    2         0
    HTH         1/8    2         0
    HTT         1/8    1         0
    THH         1/8    2         1
    THT         1/8    1         1
    TTH         1/8    1         2
    TTT         1/8    0         3
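Since the joint pmf comes from straight enumeration, it can be checked mechanically. A minimal sketch (the function name `joint_pmf` is our own, not from the notes):

```python
from itertools import product
from fractions import Fraction

def joint_pmf():
    """Joint pmf of X (number of heads) and Y (number of tails before
    the first head, defined as 3 if there are no heads) for three
    flips of a fair coin."""
    pmf = {}
    for flips in product("HT", repeat=3):  # the 8 equally likely outcomes
        x = flips.count("H")
        y = flips.index("H") if "H" in flips else 3
        pmf[(x, y)] = pmf.get((x, y), Fraction(0)) + Fraction(1, 8)
    return pmf
```

The table above can be read off the returned dictionary, e.g. the two outcomes HHT and HTH both land on (x, y) = (2, 0).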
We can look through and see that

    p_{X,Y}(x,y) = { 1/8,  x = 3, y = 0
                     2/8,  x = 2, y = 0
                     1/8,  x = 1, y = 0
                     1/8,  x = 2, y = 1
                     1/8,  x = 1, y = 1
                     1/8,  x = 1, y = 2
                     1/8,  x = 0, y = 3
                     0,    otherwise }    (1.5)

This is most easily summarized in a table:

    p_{X,Y}(x,y)   y=0   y=1   y=2   y=3
    x = 0           0     0     0    1/8
    x = 1          1/8   1/8   1/8    0
    x = 2          2/8   1/8    0     0
    x = 3          1/8    0     0     0

It takes a fair bit of concentration to work out the joint cdf, but if you do, you get

    F_{X,Y}(x,y) = { 0,    x < 0 or y < 0
                     0,    0 ≤ x < 1, 0 ≤ y < 3
                     1/8,  0 ≤ x < 1, 3 ≤ y
                     1/8,  1 ≤ x < 2, 0 ≤ y < 1
                     2/8,  1 ≤ x < 2, 1 ≤ y < 2
                     3/8,  1 ≤ x < 2, 2 ≤ y < 3
                     4/8,  1 ≤ x < 2, 3 ≤ y
                     3/8,  2 ≤ x < 3, 0 ≤ y < 1
                     5/8,  2 ≤ x < 3, 1 ≤ y < 2
                     6/8,  2 ≤ x < 3, 2 ≤ y < 3
                     7/8,  2 ≤ x < 3, 3 ≤ y
                     4/8,  3 ≤ x, 0 ≤ y < 1
                     6/8,  3 ≤ x, 1 ≤ y < 2
                     7/8,  3 ≤ x, 2 ≤ y < 3
                     8/8,  3 ≤ x, 3 ≤ y }    (1.6)

This is not very enlightening, and not really much more so if you plot it. [Figure: plots of the joint pmf p_{X,Y}(x,y) and the joint cdf F_{X,Y}(x,y) over the (x,y) plane; not recoverable from the transcription.] Note that it's a lot more convenient to work with the joint pmf than the joint cdf.
So instead, we'll work with the joint pmf and use it to calculate probabilities like

    P(X + Y = 2) = p_{X,Y}(2,0) + p_{X,Y}(1,1)    (1.7)

and

    P[(0 < X ≤ 2) ∩ (Y ≤ 1)] = p_{X,Y}(1,0) + p_{X,Y}(1,1) + p_{X,Y}(2,0) + p_{X,Y}(2,1)    (1.8)

In general,

    P[(X,Y) ∈ A] = Σ_{(x,y)∈A} p_{X,Y}(x,y)    (1.9)

1.1.2 Two Continuous Random Variables

On the other hand, we may be dealing with continuous random variables, which means that the joint cdf F_{X,Y}(x,y) is continuous. Then we can proceed as before and define a probability density function by taking derivatives of the cdf. In this case, since F_{X,Y}(x,y) has multiple arguments, we take the partial derivative, so that the joint pdf is

    f_{X,Y}(x,y) = ∂²F_{X,Y}(x,y)/∂x ∂y    (1.10)

We won't worry too much about this derivative for now, since in practice we will start with the joint pdf rather than differentiating the joint cdf to get it. We can use the pdf to assign a probability for the random vector (X,Y) to be in some region A of the (x,y) plane:

    P[(X,Y) ∈ A] = ∬_{(x,y)∈A} f_{X,Y}(x,y) dx dy    (1.11)

For example, for a rectangular region we have

    P[(a < X ≤ b) ∩ (c < Y ≤ d)] = ∫_a^b ( ∫_c^d f_{X,Y}(x,y) dy ) dx    (1.12)

We can connect the joint pdf to the joint cdf by considering the event (−∞ < X ≤ x) ∩ (−∞ < Y ≤ y):

    F_{X,Y}(x,y) = P[(−∞ < X ≤ x) ∩ (−∞ < Y ≤ y)] = ∫_{−∞}^x ( ∫_{−∞}^y f_{X,Y}(t,u) du ) dt    (1.13)

where we have called the integration variables t and u rather than x and y because the latter were already in use. As another example, for the event X + Y < c, the region of integration is the half-plane below the line x + y = c [figure: the region x + y < c, bounded by a line with intercepts (c,0) and (0,c)], and we have

    P(X + Y < c) = ∫_{−∞}^∞ ( ∫_{−∞}^{c−x} f_{X,Y}(x,y) dy ) dx = ∫_{−∞}^∞ ( ∫_{−∞}^{c−y} f_{X,Y}(x,y) dx ) dy    (1.14)
For a more explicit demonstration of why this works, consult your notes from Multivariable Calculus (specifically Fubini's theorem) and/or Probability, and/or edu/~whelan/courses/13_1sp_116_345/notes5pdf

As an example, consider the joint pdf

    f(x,y) = { e^{−x−y},  0 < x < ∞, 0 < y < ∞
               0,         otherwise }    (1.15)

and the event X < 2Y. The region over which we need to integrate is x > 0, y > 0, x < 2y [figure: the part of the first quadrant above the line y = x/2]. If we do the y integral first, the limits will be set by x/2 < y < ∞, and if we do the x integral first, they will be 0 < x < 2y. Doing the y integral first will give us a contribution from only one end of the integral, so let's do it that way:

    P(X < 2Y) = ∫₀^∞ ∫_{x/2}^∞ e^{−x−y} dy dx = ∫₀^∞ e^{−x} [−e^{−y}]_{x/2}^∞ dx = ∫₀^∞ e^{−x} e^{−x/2} dx
              = ∫₀^∞ e^{−3x/2} dx = [−(2/3) e^{−3x/2}]₀^∞ = 2/3    (1.16)

1.2 Marginalization

One of the events we can define given the probability distribution for two random variables X and Y is X = x for some value x. In the case of a pair of discrete random variables, this is

    P(X = x) = Σ_y p_{X,Y}(x,y)    (1.17)

But of course, P(X = x) is just the pmf of X; we call this the marginal pmf p_X(x) and define

    p_X(x) = P(X = x) = Σ_y p_{X,Y}(x,y)    (1.18)
    p_Y(y) = P(Y = y) = Σ_x p_{X,Y}(x,y)    (1.19)

Returning to our coin-flip example, we can write the marginal pmfs for X and Y in the margins of the table:

    p_{X,Y}(x,y)   y=0   y=1   y=2   y=3 | p_X(x)
    x = 0           0     0     0    1/8 |  1/8
    x = 1          1/8   1/8   1/8    0  |  3/8
    x = 2          2/8   1/8    0     0  |  3/8
    x = 3          1/8    0     0     0  |  1/8
    p_Y(y)         4/8   2/8   1/8   1/8 |

For a pair of continuous random variables, we know that P(X = x) = 0, but we can find the marginal cdf

    F_X(x) = P(X ≤ x) = ∫_{−∞}^x ( ∫_{−∞}^∞ f_{X,Y}(t,y) dy ) dt    (1.20)

and then take the derivative to get the marginal pdf

    f_X(x) = F_X′(x) = ∫_{−∞}^∞ f_{X,Y}(x,y) dy    (1.21)

and likewise

    f_Y(y) = ∫_{−∞}^∞ f_{X,Y}(x,y) dx    (1.22)
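The value P(X < 2Y) = 2/3 can be sanity-checked numerically. A rough midpoint-rule sketch (the step size and the truncation of the infinite domain at 10 are arbitrary choices of ours):

```python
import math

def prob_x_less_2y(h=0.01, cutoff=10.0):
    """Midpoint-rule estimate of P(X < 2Y) for the joint pdf
    f(x,y) = exp(-x-y) on the first quadrant, with the infinite
    domain truncated at `cutoff` (the neglected tail is tiny)."""
    n = int(cutoff / h)
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x < 2 * y:  # indicator of the event region
                total += math.exp(-x - y) * h * h
    return total
```

The estimate lands close to 2/3; the residual error comes from the cells straddling the line y = x/2.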
The act of summing or integrating over arguments we don't care about, in order to get a marginal probability distribution, is called marginalizing.

Thursday 10 September 2015. Read Section 2.2 of Hogg.

1.3 Expectation Values

We can define the expectation value of a function of two discrete random variables in the straightforward way

    E[g(X1,X2)] = Σ_{x1,x2} g(x1,x2) p(x1,x2)    (1.23)

and for two continuous random variables

    E[g(X1,X2)] = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x1,x2) f(x1,x2) dx1 dx2    (1.24)

In each case, we only consider the expectation value to be defined if the relevant sum or integral converges absolutely, i.e., if E(|g(X1,X2)|) < ∞. Note that the expectation value is still linear, i.e.,

    E[k1 g1(X1,X2) + k2 g2(X1,X2)] = k1 E[g1(X1,X2)] + k2 E[g2(X1,X2)]    (1.25)

1.3.1 Moment Generating Function

In the case of a pair of random variables, we can define the mgf as

    M(t1,t2) = E(e^{t1 X1 + t2 X2}) = E[exp(tᵀX)]    (1.26)

where tᵀ is the transpose of the column vector t. We can get the mgf for each of the random variables from the joint mgf:

    M_{X1}(t1) = M_{X1,X2}(t1, 0)   and   M_{X2}(t2) = M_{X1,X2}(0, t2)    (1.27)

We'll actually mostly use the mgf as an easy way to identify the distribution, but it can also be used to generate moments in the usual way:

    E[X1^{m1} X2^{m2}] = ∂^{m1+m2} M(t1,t2) / ∂t1^{m1} ∂t2^{m2} |_{(t1,t2)=(0,0)}    (1.28)

1.4 Transformations

We turn now to the question of how to transform the joint distribution function under a change of variables. In order for the distribution of Y1 = u1(X1,X2) and Y2 = u2(X1,X2) to carry the same information as the distribution of X1 and X2, the transformation should be invertible over the space of possible X1 and X2 values, i.e., we should be able to write X1 = w1(Y1,Y2) and X2 = w2(Y1,Y2).

1.4.1 Transformation of Discrete RVs

For the case of a pair of discrete random variables, things are very straightforward, since

    p_{Y1,Y2}(y1,y2) = P([Y1 = y1] ∩ [Y2 = y2]) = P([u1(X1,X2) = y1] ∩ [u2(X1,X2) = y2])
                     = P([X1 = w1(y1,y2)] ∩ [X2 = w2(y1,y2)]) = p_{X1,X2}(w1(y1,y2), w2(y1,y2))    (1.29)
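Equation (1.29) can be exercised numerically. As an illustration (the choice of distribution is ours), take X1 and X2 with joint pmf p(x1,x2) = (1/2)^{x1+x2+2} on the non-negative integers, and push it through Y1 = X1 + X2, Y2 = X1 − X2 by direct enumeration:

```python
from fractions import Fraction

def transformed_pmf(nmax=30):
    """Push p(x1,x2) = (1/2)**(x1+x2+2), x1, x2 = 0, 1, 2, ...,
    through Y1 = X1 + X2, Y2 = X1 - X2 by enumeration, truncating
    at x1, x2 <= nmax (the neglected mass is 2**(-nmax)-sized)."""
    py = {}
    for x1 in range(nmax + 1):
        for x2 in range(nmax + 1):
            p = Fraction(1, 2 ** (x1 + x2 + 2))
            key = (x1 + x2, x1 - x2)  # (y1, y2); the map is one-to-one
            py[key] = py.get(key, Fraction(0)) + p
    return py
```

Because the map is one-to-one, each (y1, y2) key receives exactly one term, and the result depends on y1 alone: (1/2)^{y1+2} whenever y2 has the same parity as y1 and lies between −y1 and y1.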
For example, suppose

    p_{X1,X2}(x1,x2) = { (1/2)^{x1+x2+2},  x1 = 0,1,2,...; x2 = 0,1,2,...
                         0,               otherwise }    (1.30)

If we define Y1 = X1 + X2 and Y2 = X1 − X2, then X1 = (Y1+Y2)/2 and X2 = (Y1−Y2)/2. The only tricky part is figuring out the allowed set of values for Y1 and Y2. We note that (y1+y2)/2 ≥ 0 and (y1−y2)/2 ≥ 0 imply that, for a given y1, −y1 ≤ y2 ≤ y1. That's not quite the whole story, though, since (y1+y2)/2 and (y1−y2)/2 also have to be integers, so if y1 is odd, y2 must be odd, and if y1 is even, y2 must be even. It's easiest to see what combinations are allowed by building a table of (y1,y2) for the first few values:

    (y1,y2)   x2=0    x2=1    x2=2    x2=3
    x1=0     (0,0)   (1,−1)  (2,−2)  (3,−3)
    x1=1     (1,1)   (2,0)   (3,−1)  (4,−2)
    x1=2     (2,2)   (3,1)   (4,0)   (5,−1)
    x1=3     (3,3)   (4,2)   (5,1)   (6,0)

So evidently y1 can be any non-negative integer, and the possible values of y2 count by twos from −y1 to y1:

    p_{Y1,Y2}(y1,y2) = { (1/2)^{y1+2},  y1 = 0,1,2,...; y2 = −y1, −y1+2, ..., y1−2, y1
                         0,             otherwise }    (1.31)

1.4.2 Transformation of Continuous RVs

For the case of the transformation of continuous random variables we have to deal with the fact that f_{X1,X2}(x1,x2) and f_{Y1,Y2}(y1,y2) are probability densities, and the volume (area) element has to be transformed from one set of variables to the other. If we write f_{X1,X2}(x1,x2) = d²P/(dx1 dx2) and f_{Y1,Y2}(y1,y2) = d²P/(dy1 dy2), the transformation we'll need is

    d²P/(dy1 dy2) = |det ∂(x1,x2)/∂(y1,y2)| d²P/(dx1 dx2)    (1.32)

where we use the determinant of the Jacobian matrix

    ∂(y1,y2)/∂(x1,x2) = ( ∂y1/∂x1  ∂y1/∂x2
                          ∂y2/∂x1  ∂y2/∂x2 )    (1.33)

which may be familiar from the transformation of the volume element

    dy1 dy2 = |det ∂(y1,y2)/∂(x1,x2)| dx1 dx2    (1.34)

if we change variables in a double integral. To get a concrete handle on this, consider an example. Let X and Y be continuous random variables with a joint pdf

    f_{X,Y}(x,y) = { (4/π) e^{−x²−y²},  0 < x < ∞, 0 < y < ∞
                     0,                otherwise }    (1.35)

If we want to calculate the probability that X² + Y² < a², we have to integrate over the part of this disc which lies in the first quadrant (x > 0, y > 0), where the pdf is non-zero:
[Figure: the quarter-disc x² + y² < a² in the first quadrant, with radius a marked on both axes.]

The limits of the x integral are determined by 0 < x and x² + y² < a², i.e., x < √(a² − y²); the range of y values represented can be seen from the figure to be 0 < y < a, so we can write the probability as

    P(X² + Y² < a²) = ∫₀^a ∫₀^{√(a²−y²)} (4/π) e^{−x²−y²} dx dy    (1.36)

but we can't really do the integral in this form. However, if we define random variables R = √(X² + Y²) and Φ = tan⁻¹(Y/X),¹ so that X = R cos Φ and Y = R sin Φ, we can write the probability as

    P(X² + Y² < a²) = P(R < a) = ∫₀^{π/2} ∫₀^a f_{R,Φ}(r,φ) dr dφ    (1.38)

if we have the transformed pdf f_{R,Φ}(r,φ). On the other hand, we know that we can write the volume element dx dy = r dr dφ. We can get this either from geometry in this case, or more generally by differentiating the transformation

    ( x )   ( r cos φ )
    ( y ) = ( r sin φ )    (1.39)

to get

    ( dx )   ( cos φ dr − r sin φ dφ )   ( cos φ   −r sin φ ) ( dr )
    ( dy ) = ( sin φ dr + r cos φ dφ ) = ( sin φ    r cos φ ) ( dφ )    (1.40)

and taking the determinant of the Jacobian matrix:

    det ∂(x,y)/∂(r,φ) = | cos φ   −r sin φ |
                        | sin φ    r cos φ | = r cos²φ + r sin²φ = r    (1.41)

so the volume element transforms like

    dx dy = |det ∂(x,y)/∂(r,φ)| dr dφ = r dr dφ    (1.42)

Even if we knew nothing about the transformation of random variables, we could use this to change variables in the integral (1.36) to get

    ∫₀^a ∫₀^{√(a²−y²)} (4/π) e^{−x²−y²} dx dy = ∫₀^{π/2} ∫₀^a (4/π) e^{−r²} r dr dφ    (1.43)

¹ Note that we can only get away with using the arctangent tan⁻¹(y/x) as an expression for φ because x and y are both positive. In general, we need to be careful; (x,y) = (−1,−1) corresponds to φ = −3π/4 even though tan⁻¹([−1]/[−1]) = tan⁻¹(1) = π/4 if we use the principal branch of the arctangent. For a general point in the (x,y) plane, we'd need to use

    atan2(y,x) = { tan⁻¹(y/x) − π,  x < 0 and y < 0
                   −π/2,            x = 0 and y < 0
                   tan⁻¹(y/x),      x > 0
                   π/2,             x = 0 and y > 0
                   tan⁻¹(y/x) + π,  x < 0 and y ≥ 0 }    (1.37)

to get the correct φ ∈ [−π, π).
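Evaluating the right-hand side of (1.43) is elementary (the r integrand is a perfect derivative) and gives 1 − e^{−a²}, which can be checked against a brute-force attack on the Cartesian integral (1.36). A sketch (the grid resolution is an arbitrary choice of ours):

```python
import math

def prob_in_quarter_disc(a, n=400):
    """Midpoint-rule estimate of the integral of (4/pi) exp(-x^2 - y^2)
    over the quarter disc x > 0, y > 0, x^2 + y^2 < a^2."""
    h = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x * x + y * y < a * a:  # indicator of the quarter disc
                total += (4 / math.pi) * math.exp(-x * x - y * y) * h * h
    return total
```

The agreement with 1 − e^{−a²} improves as the grid is refined, exactly as the change-of-variables argument says it should.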
If we compare the integrands of (1.38) and (1.43) we can see that the transformed pdf must be

    f_{R,Φ}(r,φ) = { (4/π) r e^{−r²},  0 < r < ∞, 0 < φ < π/2
                     0,               otherwise }    (1.44)

Incidentally, we can calculate the probability as

    P(R < a) = ∫₀^{π/2} ∫₀^a (4/π) e^{−r²} r dr dφ = 2 ∫₀^a e^{−r²} r dr = [−e^{−r²}]₀^a = 1 − e^{−a²}    (1.45)

To return to the general case, we see there are basically two things to worry about: one is the Jacobian determinant relating the volume elements in the two sets of variables, and the other is transforming the ranges of variables used to describe the event, as well as the allowed range of variables. In general terms, if S is the support of the random variables X1 and X2, i.e., the smallest region of ℝ² such that P[(X1,X2) ∈ S] = 1, and T is the support of Y1 and Y2, we need a transformation of the pdf f_{X1,X2}(x1,x2) defined on S such that

    P[(X1,X2) ∈ A] = ∬_A f_{X1,X2}(x1,x2) dx1 dx2 = ∬_B f_{Y1,Y2}(y1,y2) dy1 dy2 = P[(Y1,Y2) ∈ B]    (1.46)

where B is the image of A under the transformation, i.e., (x1,x2) ∈ A is equivalent to {u1(x1,x2), u2(x1,x2)} ∈ B. Since a change of variables in the integral gives us

    ∬_A f_{X1,X2}(x1,x2) dx1 dx2 = ∬_B f_{X1,X2}(w1(y1,y2), w2(y1,y2)) |det ∂(x1,x2)/∂(y1,y2)| dy1 dy2    (1.47)

we must have, in general,

    f_{Y1,Y2}(y1,y2) = |det ∂(x1,x2)/∂(y1,y2)| f_{X1,X2}(w1(y1,y2), w2(y1,y2))   for (y1,y2) ∈ T    (1.48)

which is the more careful way of writing the easier-to-remember formula we started with:

    d²P/(dy1 dy2) = |det ∂(x1,x2)/∂(y1,y2)| d²P/(dx1 dx2)    (1.49)

Tuesday 15 September 2015. Read Section 2.3 of Hogg.

2 Conditional Distributions

2.1 Conditional Probability

Recall the definition of conditional probability: for events C1 and C2, P(C2|C1) is the probability of C2 given C1. If we recall that P(C) is the fraction of repeated experiments in which C is true, we can think of P(C2|C1) as follows: restrict attention to those experiments in which C1 is true, and take the fraction in which C2 is also true. This conceptual definition leads to the
mathematical definition

    P(C2|C1) = P(C1 ∩ C2) / P(C1)    (2.1)

A consequence of this definition is the multiplication rule for probabilities,

    P(C1 ∩ C2) = P(C2|C1) P(C1)    (2.2)

This means that the probability of C1 and C2 is the probability of C1 times the probability of C2 given C1, which makes logical sense. In fact, since one often has easier access to conditional probabilities in the first place, you could start with the definition of P(C2|C1) as the probability of C2 assuming C1, and then use the multiplication rule (2.2) as one of the basic tenets of probability. An extreme expression of this philosophy says that all probabilities are conditional probabilities, since you have to assume something about a model to calculate them.² One simple consequence of the multiplication rule is that we can write P(C1 ∩ C2) two different ways:

    P(C1|C2) P(C2) = P(C1 ∩ C2) = P(C2|C1) P(C1)    (2.3)

dividing by P(C2) gives us Bayes's theorem

    P(C1|C2) = P(C2|C1) P(C1) / P(C2)    (2.4)

which is useful if you want to calculate conditional probabilities with one condition when you know them with another condition.

² See E. T. Jaynes, Probability Theory: The Logic of Science, for this approach.

2.2 Conditional Probability Distributions

Given a pair of discrete random variables X1 and X2 with joint pmf p_{X1,X2}(x1,x2), we can define in a straightforward way the conditional probability that X2 takes on a value given a value for X1:

    p_{X2|X1}(x2|x1) = P(X2 = x2 | X1 = x1) = p_{X1,X2}(x1,x2) / p_{X1}(x1)    (2.5)

where we've used the marginal pmf

    p_{X1}(x1) = Σ_{x2} p_{X1,X2}(x1,x2)    (2.6)

We often write p_{2|1}(x2|x1) as a shorthand for p_{X2|X1}(x2|x1). Note that conditional probability distributions are normalized just like ordinary ones:

    Σ_{x2} p_{2|1}(x2|x1) = Σ_{x2} p(x1,x2)/p_1(x1) = p_1(x1)/p_1(x1) = 1    (2.7)

If we have a pair of continuous random variables with joint pdf f_{X1,X2}(x1,x2), we'd like to similarly define

    f_{2|1}(x2|x1) = lim_{ξ→0} P(x2 − ξ < X2 ≤ x2 | X1 = x1) / ξ    (2.8)

But there's a problem: since X1 is a continuous random variable, P(X1 = x1) = 0, which means we can't divide by it. So instead, we have to define it as

    f_{2|1}(x2|x1) = lim_{ξ1,ξ2→0} P(x2 − ξ2 < X2 ≤ x2 | x1 − ξ1 < X1 ≤ x1) / ξ2 = f(x1,x2) / f_1(x1)    (2.9)
where f_1(x1) = ∫_{−∞}^∞ f(x1,x2) dx2 is the marginal pdf. Again, the conditional pdf is properly normalized:

    ∫_{−∞}^∞ f_{2|1}(x2|x1) dx2 = ∫_{−∞}^∞ [f(x1,x2)/f_1(x1)] dx2 = f_1(x1)/f_1(x1) = 1    (2.10)

Note that f_{2|1}(x2|x1) is a density in x2, not in x1. This is also important in the continuous equivalent of Bayes's theorem:

    f_{1|2}(x1|x2) = f_{2|1}(x2|x1) f_1(x1) / f_2(x2)    (2.11)

2.2.1 Example

Consider continuous random variables X1 and X2 with joint pdf

    f(x1,x2) = 6x2,   0 < x2 < x1 < 1    (2.12)

The marginal pdf for x1 is

    f_1(x1) = ∫₀^{x1} 6x2 dx2 = 3x1²,   0 < x1 < 1    (2.13)

so the conditional pdf is

    f_{2|1}(x2|x1) = f(x1,x2)/f_1(x1) = 2x2/x1²,   0 < x2 < x1 < 1    (2.14)

which we can see is normalized:

    ∫₀^{x1} (2x2/x1²) dx2 = [x2²/x1²]₀^{x1} = 1    (2.15)

2.2.2 Conditional Expectations

Since conditional pdfs or pmfs are just like regular probability distributions, you can also use them to define expectation values. For discrete random variables X1 and X2 we can define

    E(u(X2)|X1 = x1) = E(u(X2)|x1) = Σ_{x2} u(x2) p_{2|1}(x2|x1)   if   Σ_{x2} |u(x2)| p_{2|1}(x2|x1) < ∞    (2.16)

and for continuous:

    E(u(X2)|X1 = x1) = E(u(X2)|x1) = ∫_{−∞}^∞ u(x2) f_{2|1}(x2|x1) dx2   if   ∫_{−∞}^∞ |u(x2)| f_{2|1}(x2|x1) dx2 < ∞    (2.17)

This is still a linear operation, so

    E(k1 u1(X2) + k2 u2(X2)|x1) = k1 E(u1(X2)|x1) + k2 E(u2(X2)|x1)    (2.18)

We can define a conditional variance by analogy to the usual variance:

    Var(X2|x1) = E({[X2 − E(X2|x1)]²}|x1)    (2.19)

and since the conditional expectation value is linear, we have the usual shortcut

    Var(X2|x1) = E(X2²|x1) − [E(X2|x1)]²    (2.20)

Returning to our example, in which f_{2|1}(x2|x1) = 2x2/x1², 0 < x2 < x1 < 1, we have

    E(X2|x1) = ∫₀^{x1} x2 (2x2/x1²) dx2 = (2/3) x1    (2.21)

and

    E(X2²|x1) = ∫₀^{x1} x2² (2x2/x1²) dx2 = x1²/2    (2.22)
so

    Var(X2|x1) = x1²/2 − [(2/3)x1]² = x1²/18    (2.23)

Note that E(X2|x1) is a function of x1 and not a random variable. But we can insert the random variable X1 into that function, and define a random variable E(X2|X1) which is equal to E(X2|x1) when X1 = x1. This random variable can also be written

    E(X2|X1) = ∫_{−∞}^∞ x2 f_{2|1}(x2|X1) dx2    (2.24)

Note that

    E[E(X2|X1)] = ∫_{−∞}^∞ E(X2|x1) f_1(x1) dx1 = ∬ x2 f_{2|1}(x2|x1) f_1(x1) dx2 dx1 = ∬ x2 f(x1,x2) dx2 dx1 = E[X2]    (2.25)

So E(X2|X1) is an estimator of E(X2). It can be shown that

    Var[E(X2|X1)] ≤ Var(X2)    (2.26)

so E(X2|X1) is potentially a better estimator of the mean E(X2) than X2 itself is. (This isn't exactly a practical procedure, though, since to evaluate the function E(X2|x1) you need the conditional probability density f_{2|1}(x2|x1) for all possible x2.) In our specific example, since E(X2|x1) = (2/3)x1, E(X2|X1) = (2/3)X1. We can work out

    E[E(X2|X1)] = E((2/3)X1) = ∫₀¹ (2/3)x1 (3x1²) dx1 = ∫₀¹ 2x1³ dx1 = 1/2    (2.27)

and

    E[{E(X2|X1)}²] = E((4/9)X1²) = ∫₀¹ (4/9)x1² (3x1²) dx1 = ∫₀¹ (4/3)x1⁴ dx1 = 4/15    (2.28)

so that

    Var[E(X2|X1)] = 4/15 − (1/2)² = 1/60    (2.29)

To get E(X2) and Var(X2) we need the marginal pdf

    f_2(x2) = ∫_{x2}^1 6x2 dx1 = 6x2(1 − x2),   0 < x2 < 1    (2.30)

from which we calculate

    E(X2) = ∫₀¹ x2 · 6x2(1 − x2) dx2 = 6(1/3 − 1/4) = 1/2    (2.31)

and

    E(X2²) = ∫₀¹ x2² · 6x2(1 − x2) dx2 = 6(1/4 − 1/5) = 3/10    (2.32)

so

    Var(X2) = 3/10 − (1/2)² = 1/20    (2.33)

from which we can verify that in this case

    E[E(X2|X1)] = 1/2 = E(X2)    (2.34)

and

    Var[E(X2|X1)] = 1/60 ≤ 1/20 = Var(X2)    (2.35)
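The chain of results above is easy to confirm numerically with one-dimensional midpoint sums (the helper `integrate` and the sample size are ours):

```python
def integrate(g, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

f1 = lambda x1: 3 * x1 ** 2          # marginal pdf of X1
f2 = lambda x2: 6 * x2 * (1 - x2)    # marginal pdf of X2
cond_mean = lambda x1: 2 * x1 / 3    # E(X2 | x1)

# E[E(X2|X1)] and Var[E(X2|X1)], averaging over the distribution of X1
mean_of_cond = integrate(lambda x1: cond_mean(x1) * f1(x1), 0, 1)
var_of_cond = integrate(lambda x1: cond_mean(x1) ** 2 * f1(x1), 0, 1) - mean_of_cond ** 2

# E(X2) and Var(X2) directly from the marginal of X2
mean_x2 = integrate(lambda x2: x2 * f2(x2), 0, 1)
var_x2 = integrate(lambda x2: x2 ** 2 * f2(x2), 0, 1) - mean_x2 ** 2
```

The numbers come out 1/2, 1/60, 1/2 and 1/20, matching (2.27), (2.29), (2.31) and (2.33), and `var_of_cond < var_x2` illustrates (2.26).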
Thursday 17 September 2015. Read Sections 2.4-2.5 of Hogg.

2.3 Independence

Recall the conditional distribution for discrete rvs X1 and X2,

    p_{2|1}(x2|x1) = P(X2 = x2 | X1 = x1) = P([X1 = x1] ∩ [X2 = x2]) / P(X1 = x1) = p(x1,x2)/p_1(x1)    (2.36)

or for continuous rvs X1 and X2,

    f_{2|1}(x2|x1) = f(x1,x2)/f_1(x1)    (2.37)

Consider the example

    f(x1,x2) = 6x1x2²,   0 < x1 < 1, 0 < x2 < 1    (2.38)

The marginal pdf for X1 is

    f_1(x1) = ∫₀¹ 6x1x2² dx2 = 2x1    (2.39)

which makes the conditional pdf

    f_{2|1}(x2|x1) = 6x1x2²/(2x1) = 3x2²,   0 < x1 < 1, 0 < x2 < 1    (2.40)

Note that in this case f_{2|1}(x2|x1) doesn't actually depend on x1, as long as x1 is in the support of the random variable X1. This situation is called independence. In fact, it's easy to show that in this situation f_{2|1}(x2|x1) = f_2(x2), i.e., the conditional pdf for X2 given any possible value of X1 is the marginal pdf for X2:

    f_2(x2) = ∫_{−∞}^∞ f(x1,x2) dx1 = ∫_{−∞}^∞ f_{2|1}(x2|x1) f_1(x1) dx1 = f_{2|1}(x2|x1) ∫_{−∞}^∞ f_1(x1) dx1 = f_{2|1}(x2|x1)   (X1 & X2 indep)    (2.41)

where we can pull f_{2|1}(x2|x1) out of the x1 integral because it doesn't actually depend on x1, and we use the fact that the marginal pdf f_1(x1) is normalized. We thus have the definition

    (X1 & X2 independent) ⟺ ( f_{2|1}(x2|x1) = f_2(x2) for all (x1,x2) ∈ S )    (2.42)

This is not the most symmetric definition, and it's not immediately obvious that f_{2|1}(x2|x1) = f_2(x2) implies f_{1|2}(x1|x2) = f_1(x1). But it does, because of the following result:

    (X1 & X2 independent) iff f(x1,x2) = f_1(x1) f_2(x2) for all (x1,x2)    (2.43)

(We don't need to specify (x1,x2) ∈ S because we can think of f_1(x1) and f_2(x2) as being equal to zero if their arguments are outside their respective support spaces.) It's easy enough to demonstrate (2.43). If we assume f(x1,x2) = f_1(x1) f_2(x2), then f_{2|1}(x2|x1) = f_1(x1) f_2(x2)/f_1(x1) = f_2(x2) as long as f_1(x1) ≠ 0. Conversely, if we assume f_{2|1}(x2|x1) = f_2(x2), then f(x1,x2) = f_{2|1}(x2|x1) f_1(x1) = f_1(x1) f_2(x2). If X1 and X2 are not independent, we call them dependent random variables. We can consider a couple of examples of dependent rvs:

2.3.1 Dependent rv example #1

First, let

    f(x1,x2) = { x1 + x2,  0 < x1 < 1, 0 < x2 < 1
                 0,        otherwise }    (2.44)

Then

    f_1(x1) = ∫₀¹ (x1 + x2) dx2 = x1 + 1/2,   0 < x1 < 1    (2.45)

and

    f_{2|1}(x2|x1) = (x1 + x2)/(x1 + 1/2),   0 < x1 < 1, 0 < x2 < 1    (2.46)

which does depend on x1, so X1 and X2 are dependent.

2.3.2 Dependent rv example #2

Second, return to our example from Tuesday, where

    f(x1,x2) = 6x2,   0 < x2 < x1 < 1    (2.47)

and we saw

    f_{2|1}(x2|x1) = 2x2/x1²,   0 < x2 < x1 < 1    (2.48)

again, this depends on x1, so X1 and X2 are dependent.

2.3.3 Factoring the joint pdf

We don't have to calculate the conditional pdf to tell whether random variables are dependent or independent. We can show that

    (X1 & X2 independent) iff f(x1,x2) = g(x1) h(x2) for all (x1,x2)    (2.49)

for some functions g and h. The "only if" part is trivial; choose g(x1) = f_1(x1) and h(x2) = f_2(x2). We can show the "if" part by assuming a factored form and working out

    f_1(x1) = ∫_{−∞}^∞ g(x1) h(x2) dx2 = g(x1) ∫_{−∞}^∞ h(x2) dx2    (2.50)

The integral ∫ h(x2) dx2 is just a constant, which we can call c, so we have g(x1) = f_1(x1)/c and f(x1,x2) = f_1(x1) h(x2)/c. Then we take

    f_2(x2) = [h(x2)/c] ∫_{−∞}^∞ f_1(x1) dx1 = h(x2)/c    (2.51)

which means that indeed f(x1,x2) = f_1(x1) f_2(x2). Our two examples show ways in which the joint pdf can fail to factor. In (2.44), x1 + x2 can obviously not be written as a product of g(x1) and h(x2). In (2.47), it's a little trickier, since it seems like we could write 6x2 = (1)(6x2). But the problem is the support of (2.47). If we took, for example,

    g(x1) = { 1,  0 < x1 < 1
              0,  otherwise }    (2.52)

and

    h(x2) = { 6x2,  0 < x2 < 1
              0,    otherwise }    (2.53)

we'd end up with

    g(x1)h(x2) = { 6x2,  0 < x1 < 1, 0 < x2 < 1
                   0,    otherwise }    (2.54)

which is not the same as the f(x1,x2) given in (2.47). In general, for the factorization to work, the support of X1 and X2 has to be a product space, i.e., the intersection of a range of possible x1 values with no reference to x2 and a range of possible x2 values with no reference to x1. Some examples of product spaces are

    0 < x1 < 1, 0 < x2 < 1
    −1 < x1 < 1, 0 < x2 < 2
    0 < x1 < ∞, 0 < x2 < ∞
    −∞ < x1 < ∞, 0 < x2 < 1

some examples of non-product spaces are

    0 < x2 < x1 < 1
    0 < x1 < x2 < ∞
    x1 + x2 < 1
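The factorization criterion, including the product-space caveat, can be tested numerically: compute the marginals of a candidate joint pdf and compare f(x1,x2) with f1(x1)f2(x2) at a few sample points. A sketch (the function names, sample points, and tolerance are ours):

```python
def marginals_product_matches(f, pts, n=2000):
    """Check whether f(x1,x2) equals f1(x1)*f2(x2) at the points `pts`,
    computing the marginals over (0,1)x(0,1) by midpoint sums."""
    h = 1.0 / n
    f1 = lambda x1: sum(f(x1, (j + 0.5) * h) for j in range(n)) * h
    f2 = lambda x2: sum(f((i + 0.5) * h, x2) for i in range(n)) * h
    return all(abs(f(x1, x2) - f1(x1) * f2(x2)) < 1e-2 for x1, x2 in pts)

# Square support (eq. 2.38): independent, so the product matches.
indep = lambda x1, x2: 6 * x1 * x2 ** 2 if 0 < x1 < 1 and 0 < x2 < 1 else 0.0
# Triangular support (eq. 2.47): not a product space, so it fails.
dep = lambda x1, x2: 6 * x2 if 0 < x2 < x1 < 1 else 0.0
pts = [(0.3, 0.6), (0.6, 0.3), (0.8, 0.2)]
```

For the triangular support the mismatch shows up immediately: at (0.3, 0.6) the joint pdf vanishes, but neither marginal does.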
2.3.4 Expectation Values

Finally we consider an important result related to expectation values. Let X1 and X2 be independent random variables. Then the expectation value of the product of a function of each random variable is the product of their expectation values:

    E[u1(X1) u2(X2)] = ∬ u1(x1) u2(x2) f_1(x1) f_2(x2) dx1 dx2
                     = ( ∫ u1(x1) f_1(x1) dx1 ) ( ∫ u2(x2) f_2(x2) dx2 ) = E[u1(X1)] E[u2(X2)]   (X1 & X2 indep)    (2.55)

In particular, the joint mgf is such an expectation value, so

    M(t1,t2) = E(e^{t1X1 + t2X2}) = E(e^{t1X1}) E(e^{t2X2}) = M_1(t1) M_2(t2) = M(t1,0) M(0,t2)   (X1 & X2 indep)    (2.56)

It takes a little more work, but you can also prove the converse (see Hogg for details), so

    (X1 & X2 independent) iff M(t1,t2) = M(t1,0) M(0,t2)    (2.57)

You showed on the homework that in a particular case M(t1,0)M(0,t2) ≠ M(t1,t2); in that case the random variables were dependent because their support was not a product space.

3 Covariance and Correlation

Recall the definitions of the means

    μ_X = E(X)   and   μ_Y = E(Y)    (3.1)

and variances

    σ_X² = Var(X) = E([X − μ_X]²)    (3.2a)
    σ_Y² = Var(Y) = E([Y − μ_Y]²)    (3.2b)

We can define the covariance

    Cov(X,Y) = E([X − μ_X][Y − μ_Y])    (3.3)

Dimensionally, μ_X and σ_X have units of X, μ_Y and σ_Y have units of Y, and the covariance Cov(X,Y) has units of XY. It's useful to define a dimensionless quantity called the correlation coëfficient:

    ρ = Cov(X,Y) / (σ_X σ_Y)    (3.4)

On the homework you will show that −1 ≤ ρ ≤ 1. One important result about independent random variables is that they are uncorrelated: if X and Y are independent, then

    Cov(X,Y) = E([X − μ_X][Y − μ_Y]) = E(X − μ_X) E(Y − μ_Y) = (μ_X − μ_X)(μ_Y − μ_Y) = 0   (X & Y indep)    (3.5)

On the other hand, the converse is not true: it is still possible for the covariance of dependent variables to be zero.

Tuesday 22 September 2015. Read Section 2.6 of Hogg.

4 Generalization to Several RVs

Note: this week's material will not be included on Prelim Exam One.
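Before moving on, the quantities of Section 3 can be computed exactly for the coin-flip pair (X, Y) of Section 1.1.1 from its joint pmf (the negative covariance makes sense: more heads means fewer tails before the first head):

```python
import math
from fractions import Fraction

# Joint pmf of X (heads in 3 flips) and Y (tails before the first head)
pmf = {(3, 0): Fraction(1, 8), (2, 0): Fraction(2, 8), (1, 0): Fraction(1, 8),
       (2, 1): Fraction(1, 8), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
       (0, 3): Fraction(1, 8)}

mu_x = sum(p * x for (x, y), p in pmf.items())
mu_y = sum(p * y for (x, y), p in pmf.items())
cov = sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in pmf.items())
var_x = sum(p * (x - mu_x) ** 2 for (x, y), p in pmf.items())
var_y = sum(p * (y - mu_y) ** 2 for (x, y), p in pmf.items())
rho = float(cov) / math.sqrt(float(var_x) * float(var_y))  # eq. (3.4)
```

Exact arithmetic with `Fraction` gives Cov(X,Y) = −11/16, and the resulting ρ indeed lands in [−1, 1].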
4.1 Linear Algebra: Reminders and Notation

If A is an m × n matrix,

    A = [ A11 A12 ... A1n
          A21 A22 ... A2n
          ...
          Am1 Am2 ... Amn ]    (4.1)

and B is an n × p matrix,

    B = [ B11 B12 ... B1p
          B21 B22 ... B2p
          ...
          Bn1 Bn2 ... Bnp ]    (4.2)

then their product C = AB is an m × p matrix, as shown in Figure 1, so that C_ik = Σ_{j=1}^n A_ij B_jk.

[Figure 1: Expansion of the product C = AB, whose i,k element is A_i1 B_1k + A_i2 B_2k + ... + A_in B_nk, showing C_ik = Σ_{j=1}^n A_ij B_jk.    (4.3)]

If A is an m × n matrix, B = Aᵀ is an n × m matrix with elements B_ij = A_ji. If v is an n-element column vector (which is an n × 1 matrix) and A is an m × n matrix, w = Av is an m-element column vector (i.e., an m × 1 matrix):

    w_i = Σ_{j=1}^n A_ij v_j    (4.5)

If u is an n-element column vector, then uᵀ is an n-element row vector (a 1 × n matrix):

    uᵀ = ( u1 u2 ... un )    (4.6)

If u and v are n-element column vectors, uᵀv is a number, known as the inner product:

    uᵀv = u1 v1 + u2 v2 + ... + un vn = Σ_{i=1}^n u_i v_i    (4.7)

If v is an m-element column vector, and w is an n-element column vector, A = vwᵀ is an m × n matrix, so that

    A_ij = v_i w_j    (4.8)

If M and N are n × n matrices, the determinant satisfies det(MN) = det(M) det(N). If M is an n × n matrix (known as a square matrix), the inverse matrix M⁻¹ is defined by

    M⁻¹M = 1_{n×n} = MM⁻¹    (4.9)

where 1_{n×n} is the identity matrix. If M⁻¹ exists, we say M is invertible. If M is a real, symmetric n × n matrix, so that Mᵀ = M, i.e., M_ji = M_ij, there is a set of n orthonormal eigenvectors {v1, v2, ..., vn} with real eigenvalues {λ1, λ2, ..., λn}, so that Mv_i = λ_i v_i. Orthonormal means

    v_iᵀ v_j = δ_ij = { 0,  i ≠ j
                        1,  i = j }    (4.10)

where we have introduced the Kronecker delta symbol δ_ij. The eigenvalue decomposition means

    M = Σ_{i=1}^n λ_i v_i v_iᵀ    (4.11)
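The eigendecomposition facts (4.10) and (4.11), and the determinant and inverse formulas that follow from them, can be checked with numpy (assuming numpy is available; the symmetric matrix M is an arbitrary example of ours):

```python
import numpy as np

# An arbitrary real symmetric (in fact positive definite) matrix
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, V = np.linalg.eigh(M)  # columns of V are orthonormal eigenvectors

# Reconstruct M = sum_i lambda_i v_i v_i^T  (eq. 4.11)
M_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(3))

# Inverse via sum_i (1/lambda_i) v_i v_i^T (valid since no lambda_i is 0)
M_inv = sum((1 / lam[i]) * np.outer(V[:, i], V[:, i]) for i in range(3))
```

`np.linalg.eigh` is the routine for symmetric (Hermitian) matrices, so the returned eigenvalues are real and the eigenvector matrix is orthogonal, exactly the situation described in the text.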
The determinant is det(M) = Π_{i=1}^n λ_i. If none of the eigenvalues {λ_i} are zero, M is invertible, and the inverse matrix is

    M⁻¹ = Σ_{i=1}^n (1/λ_i) v_i v_iᵀ    (4.12)

If all of the eigenvalues {λ_i} are positive, we say M is positive definite. If none of the eigenvalues {λ_i} are negative, we say M is positive semi-definite. Note that these conditions are equivalent to the more common definition: M is positive definite if vᵀMv > 0 for any non-zero n-element column vector v, and positive semi-definite if vᵀMv ≥ 0 for any n-element column vector v.

4.2 Multivariate Probability Distributions

Many of the definitions which we considered in detail for the case of two random variables generalize in a straightforward way to the case of n random variables. It's notationally convenient to use the concept of a random vector X as in (1.1), so for example the joint cdf is

    F_X(x) = P([X1 ≤ x1] ∩ [X2 ≤ x2] ∩ ... ∩ [Xn ≤ xn])    (4.13)

In the discrete case, the joint pmf is

    p_X(x) = P([X1 = x1] ∩ [X2 = x2] ∩ ... ∩ [Xn = xn])    (4.14)

while in the continuous case the joint pdf is

    f_X(x) = ∂ⁿF_X(x)/∂x1 ... ∂xn = lim_{ξ1,...,ξn→0} P([x1−ξ1 < X1 ≤ x1] ∩ ... ∩ [xn−ξn < Xn ≤ xn]) / (ξ1 ... ξn)    (4.15)

The probability for the random vector X to lie in some region A ⊂ ℝⁿ is

    P(X ∈ A) = ∫_A f_X(x) dx1 ... dxn    (4.16)

4.2.1 Marginal and Conditional Distributions

One place where the multivariate case can be more complicated than the bivariate one is marginalization. Given a joint distribution of n random variables, you can in principle marginalize over m of them, and be left with a marginal distribution for the remaining n − m variables. To give a concrete example, consider three random variables {X_i, i = 1, 2, 3} with joint pdf

    f_X(x) = { 6,  0 < x1 < x2 < x3 < 1
               0,  otherwise }    (4.17)

We can marginalize over X3, which gives us a pdf for X1 and X2 (i.e., still a joint pdf):

    f_{X1X2}(x1,x2) = ∫_{−∞}^∞ f_{X1X2X3}(x1,x2,x3) dx3 = ∫_{x2}^1 6 dx3 = { 6(1 − x2),  0 < x1 < x2 < 1
                                                                             0,          otherwise }    (4.18)

by similar calculations, we can marginalize over X2 to get

    f_{X1X3}(x1,x3) = ∫_{−∞}^∞ f_{X1X2X3}(x1,x2,x3) dx2 = ∫_{x1}^{x3} 6 dx2 = { 6(x3 − x1),  0 < x1 < x3 < 1
                                                                                0,           otherwise }    (4.19)
or over X1 to get

    f_{X2X3}(x2,x3) = ∫_{−∞}^∞ f_{X1X2X3}(x1,x2,x3) dx1 = ∫₀^{x2} 6 dx1 = { 6x2,  0 < x2 < x3 < 1
                                                                            0,    otherwise }    (4.20)

If we want the marginal distribution for X1, we marginalize over both X2 and X3:

    f_{X1}(x1) = ∬ f_{X1X2X3}(x1,x2,x3) dx2 dx3 = ∫_{x1}^1 f_{X1X2}(x1,x2) dx2 = ∫_{x1}^1 6(1 − x2) dx2
               = [−3(1 − x2)²]_{x2=x1}^{x2=1} = 3(1 − x1)²,   0 < x1 < 1    (4.21)

of course, if we marginalize in the other order, we get the same result:

    f_{X1}(x1) = ∫_{x1}^1 f_{X1X3}(x1,x3) dx3 = ∫_{x1}^1 6(x3 − x1) dx3 = [3(x3 − x1)²]_{x3=x1}^{x3=1} = 3(1 − x1)²,   0 < x1 < 1    (4.22)

Similarly, we can marginalize over X1 and X3 to get

    f_{X2}(x2) = ∫_{x2}^1 6x2 dx3 = 6x2(1 − x2),   0 < x2 < 1    (4.23)

and

    f_{X3}(x3) = ∫₀^{x3} 6x2 dx2 = 3x3²,   0 < x3 < 1    (4.24)

The various joint and marginal distributions can be combined into conditional distributions such as

    f_{13|2}(x1,x3|x2) = f_{123}(x1,x2,x3)/f_2(x2) = 1/(x2(1 − x2)),   0 < x1 < x2 < x3 < 1    (4.25)

and

    f_{1|23}(x1|x2,x3) = f_{123}(x1,x2,x3)/f_{23}(x2,x3) = 1/x2,   0 < x1 < x2 < x3 < 1    (4.26)

and

    f_{1|2}(x1|x2) = f_{12}(x1,x2)/f_2(x2) = 1/x2,   0 < x1 < x2 < 1    (4.27)

4.2.2 Mutual vs. Pairwise Independence

The notion of independence carries over to multivariate distributions as well. We say that a set of n random variables X1, ..., Xn are mutually independent if we can write

    f(x1, x2, ..., xn) = f_1(x1) f_2(x2) ... f_n(xn)    (4.28)

for all x.³ You can show straightforwardly that this implies f_{ij}(x_i,x_j) = f_i(x_i) f_j(x_j) for each i and j, i.e., any pair of the random variables is independent. This is known as pairwise independence. However, pairwise independence doesn't imply mutual independence of the whole set. There is a simple counterexample from S. N. Bernstein.⁴ Suppose you have a tetrahedron (fair four-sided die) on which one face is painted all black,

³ Except possibly at some points whose total probability is zero.
⁴ Hogg doesn't actually give the Bernstein reference, just says this is attributed to him. The original reference is to a book from 1946 which is in Russian. But there is a more recent summary in Stȩpniak, The College Mathematics Journal 38 (2007), which is available at wwwjstororg/stable/ if you're on campus and jstororgezproxyritedu/stable/ if you're not.
one all blue, one all red, and one painted with all three colors. Throw it and note the color or colors of the face it lands on. Define $X_1$ to be 1 if the face landed on is at least partly black, 0 if it isn't; $X_2$ to be 1 if the face is at least partly blue and 0 if not; and $X_3$ to be 1 if the face is part or all red, 0 if not. Any two of these random variables are independent. For example, of the four faces, one is all black, one is all blue, one is both black and blue, and one is neither black nor blue, so the joint pmf for $X_1$ and $X_2$ is

                          x_2
    p_{12}(x_1, x_2)     0      1    |  p_1(x_1)
    x_1     0           1/4    1/4   |    1/2
            1           1/4    1/4   |    1/2
    --------------------------------------------
    p_2(x_2)            1/2    1/2

On the other hand, it is clear that $p_{123}(x_1, x_2, x_3) \neq p_1(x_1)\,p_2(x_2)\,p_3(x_3)$, since the right-hand side would be $1/8$ for each combination of $x_1 \in \{0, 1\}$, $x_2 \in \{0, 1\}$, $x_3 \in \{0, 1\}$, rather than the true pmf, which is

$$p_{123}(x_1, x_2, x_3) = \begin{cases} 1/4 & x_1 = 0,\ x_2 = 0,\ x_3 = 1 \\ 1/4 & x_1 = 0,\ x_2 = 1,\ x_3 = 0 \\ 1/4 & x_1 = 1,\ x_2 = 0,\ x_3 = 0 \\ 1/4 & x_1 = 1,\ x_2 = 1,\ x_3 = 1 \\ 0 & \text{otherwise} \end{cases} \tag{4.29}$$

Thursday 24 September 2015

Read Sections 2.7-2.8 of Hogg. Note: this week's material will not be included on Prelim Exam One.

4.3 The Variance-Covariance Matrix

Recall that the covariance of random variables $X_i$ and $X_j$ with means $\mu_i = E(X_i)$ and $\mu_j = E(X_j)$ is

$$\operatorname{Cov}(X_i, X_j) = E([X_i - \mu_i][X_j - \mu_j]) \tag{4.30}$$

In particular, the covariance of one of the rvs with itself is its variance:

$$\operatorname{Cov}(X_i, X_i) = E([X_i - \mu_i]^2) = \operatorname{Var}(X_i) \tag{4.31}$$

We can define a matrix $\Sigma$ whose elements[5]

$$\sigma_{ij} = \operatorname{Cov}(X_i, X_j) \tag{4.32}$$

are the covariances between the various rvs, with the diagonal elements being equal to the variances. We can write this in matrix notation by first defining

$$\boldsymbol{\mu} = E(\mathbf{X}) \tag{4.33}$$

i.e., a column vector whose $i$th element is $\mu_i = E(X_i)$. Then $\mathbf{X} - \boldsymbol{\mu}$ is a column vector whose $i$th element is $X_i - \mu_i$. Looking at (4.30) and (4.32), we see that the elements of the matrix $\Sigma$ are the expectation values of the elements of the matrix whose $i,j$ element is $[X_i - \mu_i][X_j - \mu_j]$. This matrix is

$$[\mathbf{X} - \boldsymbol{\mu}][\mathbf{X} - \boldsymbol{\mu}]^T = \begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \\ \vdots \\ X_n - \mu_n \end{pmatrix} \begin{pmatrix} X_1 - \mu_1 & X_2 - \mu_2 & \cdots & X_n - \mu_n \end{pmatrix} \tag{4.34}$$

[5] This is a really unfortunate notational choice by Hogg, if you ask me, since the variance of a single random variable is written $\sigma^2$, not $\sigma$, and e.g., we're defining $\sigma_{ii} = (\sigma_i)^2$. Among other things, if the random vector $\mathbf{X}$ has physical units, $\sigma_i$ and $\sigma_{ij}$ have different units.
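As a numerical aside (this sketch is not part of Hogg's notes, and it assumes NumPy is available), the matrix $\Sigma$ just defined can be estimated from simulated data by averaging the outer products of the centered vectors, and compared against both the exact answer and NumPy's built-in estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build correlated random vectors X = A Z, with Z a standard normal vector,
# so that the exact variance-covariance matrix is Cov(X) = A A^T.
a = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 1.0]])
m = 100_000
z = rng.standard_normal((m, 3))
x = z @ a.T

# Sample version of mu = E(X), then of Sigma = E([X - mu][X - mu]^T):
mu_hat = x.mean(axis=0)
centered = x - mu_hat                    # row i holds (x_i - mu_hat)^T
sigma_hat = centered.T @ centered / m    # average of the outer products

assert np.allclose(sigma_hat, a @ a.T, atol=0.03)      # close to the exact A A^T
assert np.allclose(sigma_hat, np.cov(x.T, bias=True))  # matches numpy's estimator
assert np.linalg.eigvalsh(sigma_hat).min() > -1e-10    # positive semidefinite
```

Note that `np.cov` treats each row of its argument as one variable (hence the transpose), and `bias=True` divides by $m$ rather than $m-1$, matching the plain average above.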
$$[\mathbf{X} - \boldsymbol{\mu}][\mathbf{X} - \boldsymbol{\mu}]^T = \begin{pmatrix} [X_1 - \mu_1]^2 & [X_1 - \mu_1][X_2 - \mu_2] & \cdots & [X_1 - \mu_1][X_n - \mu_n] \\ [X_1 - \mu_1][X_2 - \mu_2] & [X_2 - \mu_2]^2 & \cdots & [X_2 - \mu_2][X_n - \mu_n] \\ \vdots & \vdots & \ddots & \vdots \\ [X_1 - \mu_1][X_n - \mu_n] & [X_2 - \mu_2][X_n - \mu_n] & \cdots & [X_n - \mu_n]^2 \end{pmatrix} \tag{4.35}$$

Figure 2: Elements of the random matrix $[\mathbf{X} - \boldsymbol{\mu}][\mathbf{X} - \boldsymbol{\mu}]^T$

i.e., the outer product of the random vector $\mathbf{X} - \boldsymbol{\mu}$ with itself, whose elements are spelled out in (4.35) in Figure 2. (Note that this is a random matrix, a straightforward generalization of a random vector.) Thus the variance-covariance matrix, which we write as $\operatorname{Cov}(\mathbf{X})$, is

$$\operatorname{Cov}(\mathbf{X}) = \Sigma = E\left([\mathbf{X} - \boldsymbol{\mu}][\mathbf{X} - \boldsymbol{\mu}]^T\right) \tag{4.36}$$

Note that $\operatorname{Cov}(\mathbf{X})$ must be a positive semidefinite matrix. This can be shown by considering

$$\mathbf{v}^T \operatorname{Cov}(\mathbf{X})\,\mathbf{v} = E\left(\mathbf{v}^T [\mathbf{X} - \boldsymbol{\mu}][\mathbf{X} - \boldsymbol{\mu}]^T \mathbf{v}\right) \tag{4.37}$$

Now, the inner product $\mathbf{v}^T [\mathbf{X} - \boldsymbol{\mu}]$ is a single random variable (or, if you like, a $1 \times 1$ matrix), and so if we call that random variable $Y$, the fact that the inner product is symmetric tells us that

$$[\mathbf{X} - \boldsymbol{\mu}]^T \mathbf{v} = Y = \mathbf{v}^T [\mathbf{X} - \boldsymbol{\mu}] \tag{4.38}$$

and so

$$\mathbf{v}^T \operatorname{Cov}(\mathbf{X})\,\mathbf{v} = E(Y^2) \geq 0 \tag{4.39}$$

4.4 Transformations of Several RVs

Consider the case where we have $n$ random variables $X_1, X_2, \ldots, X_n$ with a joint pdf $f_{\mathbf{X}}(\mathbf{x})$ and wish to obtain the joint pdf $f_{\mathbf{Y}}(\mathbf{y})$ for $n$ functions $Y_1 = u_1(\mathbf{X})$, $Y_2 = u_2(\mathbf{X})$, ..., $Y_n = u_n(\mathbf{X})$ of those random variables. (We can summarize this as $\mathbf{Y} = u(\mathbf{X})$.) For simplicity, we assume the transformation is invertible, so that we can write $X_1 = w_1(\mathbf{Y})$, $X_2 = w_2(\mathbf{Y})$, ..., $X_n = w_n(\mathbf{Y})$, or equivalently $\mathbf{X} = w(\mathbf{Y})$. (See Hogg for a discussion of the case where the transformation isn't single-valued.) We refer to the sample space for $\mathbf{X}$ as $S$, and the transformation of that space (the sample space for $\mathbf{Y}$) as $T$. Consider a subset $A \subseteq S$ whose transformation is $B \subseteq T$. Then $\mathbf{X} \in A$ is the same event as $\mathbf{Y} \in B$, so

$$\int_A f_{\mathbf{X}}(\mathbf{x})\,dx_1 \cdots dx_n = P(\mathbf{X} \in A) = P(\mathbf{Y} \in B) = \int_B f_{\mathbf{Y}}(\mathbf{y})\,dy_1 \cdots dy_n \tag{4.40}$$

We can change variables in the first form of the integral, with the result that $A$ is transformed into $B$, and the volume element becomes

$$dx_1 \cdots dx_n = \left|\det\left(\frac{\partial \mathbf{x}}{\partial \mathbf{y}}\right)\right| dy_1 \cdots dy_n \tag{4.41}$$

where $\frac{\partial \mathbf{x}}{\partial \mathbf{y}}$ is the Jacobian matrix

$$\frac{\partial \mathbf{x}}{\partial \mathbf{y}} = \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial y_1} & \cdots & \frac{\partial x_n}{\partial y_n} \end{pmatrix} \tag{4.42}$$

so we can write

$$\int_A f_{\mathbf{X}}(\mathbf{x})\,d^n x = \int_B f_{\mathbf{X}}(w(\mathbf{y})) \left|\det \frac{\partial \mathbf{x}}{\partial \mathbf{y}}\right| d^n y = \int_B f_{\mathbf{Y}}(\mathbf{y})\,d^n y \tag{4.43}$$

In order for this equality to hold for any region $A$, the integrands must be equal everywhere, i.e.,

$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(w(\mathbf{y})) \left|\det \frac{\partial \mathbf{x}}{\partial \mathbf{y}}\right| \tag{4.44}$$

4.5 Example #1

Consider the joint pdf[6]

$$f_{\mathbf{X}}(x_1, x_2, x_3) = \begin{cases} 120\,x_1 x_2 & 0 < x_1,\ 0 < x_2,\ 0 < x_3,\ x_1 + x_2 + x_3 < 1 \\ 0 & \text{otherwise} \end{cases} \tag{4.45}$$

[6] Incidentally, this is a real example, a Dirichlet distribution with parameters $\{2, 2, 1, 1\}$.

and define the transformation

$$Y_1 = \frac{X_1}{X_1 + X_2 + X_3} \tag{4.46a}$$
$$Y_2 = \frac{X_2}{X_1 + X_2 + X_3} \tag{4.46b}$$
$$Y_3 = X_1 + X_2 + X_3 \tag{4.46c}$$

which has the inverse transformation

$$X_1 = Y_1 Y_3 \tag{4.47a}$$
$$X_2 = Y_2 Y_3 \tag{4.47b}$$
$$X_3 = (1 - Y_1 - Y_2) Y_3 \tag{4.47c}$$

The Jacobian matrix of the transformation is

$$\frac{\partial \mathbf{x}}{\partial \mathbf{y}} = \begin{pmatrix} y_3 & 0 & y_1 \\ 0 & y_3 & y_2 \\ -y_3 & -y_3 & 1 - y_1 - y_2 \end{pmatrix} \tag{4.48}$$

with determinant

$$\det \frac{\partial \mathbf{x}}{\partial \mathbf{y}} = y_3^2 (1 - y_1 - y_2) - \left(-y_3^2 y_1 - y_3^2 y_2\right) = y_3^2 \tag{4.49}$$

The sample space for $\mathbf{X}$ is

$$S = \{0 < x_1,\ 0 < x_2,\ 0 < x_3,\ x_1 + x_2 + x_3 < 1\} \tag{4.50}$$

so we need to find the corresponding support space for $\mathbf{Y}$:

$$T = \{0 < y_1 y_3,\ 0 < y_2 y_3,\ 0 < (1 - y_1 - y_2) y_3,\ y_3 < 1\} \tag{4.51}$$

To see how we can satisfy this, first note that the first three conditions imply $0 < y_3$. This is because, in order to satisfy the first two, we have to have $y_1$, $y_2$ and $y_3$ all positive or all negative. But if they are all negative, then $(1 - y_1 - y_2) y_3$ cannot be positive. Since we thus know $0 < y_3$, the first two conditions imply $0 < y_1$ and $0 < y_2$. The third condition then implies $0 < 1 - y_1 - y_2$, i.e., $y_1 + y_2 < 1$. The necessary and sufficient conditions to describe the transformed sample space are thus

$$T = \{0 < y_1,\ 0 < y_2,\ y_1 + y_2 < 1,\ 0 < y_3 < 1\} \tag{4.52}$$

Putting it all together, we have a pdf of

$$f_{\mathbf{Y}}(y_1, y_2, y_3) = f_{\mathbf{X}}(y_1 y_3,\ y_2 y_3,\ [1 - y_1 - y_2] y_3)\,y_3^2 = \begin{cases} 120\,y_1 y_2 y_3^4 & 0 < y_1,\ 0 < y_2,\ y_1 + y_2 < 1,\ 0 < y_3 < 1 \\ 0 & \text{otherwise} \end{cases} \tag{4.53}$$

4.6 Example #2 (fewer Y's than X's)

It's also possible to consider transformations between different numbers of random variables. I.e., we can start with $f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$ and use it to obtain $f_{Y_1 \cdots Y_m}(y_1, \ldots, y_m)$ given transformations $Y_j = u_j(X_1, \ldots, X_n)$ even when $m \neq n$. If $m > n$ it's a bit tricky, since the distribution for the $\{Y_j\}$ is degenerate. But if $m < n$, it's relatively straightforward. You considered the case where $n = 2$ and $m = 1$ by a different approach, using the joint pdf $f_{X_1 X_2}(x_1, x_2)$ to obtain $F_Y(y) = P(u(X_1, X_2) \leq y)$ and differentiating, but we consider a more general technique here. We can make up an additional $n - m$ random variables $\{Y_{m+1}, \ldots, Y_n\}$, use the expanded transformation to obtain $f_{Y_1 \cdots Y_n}(y_1, \ldots, y_n)$, and then integrate out (marginalize over) these $n - m$ variables to obtain the marginal pdf $f_{Y_1 \cdots Y_m}(y_1, \ldots, y_m)$. Any set of additional functions $\{u_{m+1}(x_1, \ldots, x_n), \ldots, u_n(x_1, \ldots, x_n)\}$ will work, as long as the full transformation we end up with is invertible.

As an example, return to the pdf (4.45), but consider the transformation

$$Y_1 = \frac{X_1}{X_1 + X_2 + X_3} \tag{4.54a}$$
$$Y_2 = \frac{X_2}{X_1 + X_2 + X_3} \tag{4.54b}$$

and obtain the joint pdf $f_{Y_1 Y_2}(y_1, y_2)$. Well, in this case we already know a convenient choice, $Y_3 = X_1 + X_2 + X_3$. We have thus already done most of the problem, and just need to marginalize (4.53) over $Y_3$ to obtain the pdf[7]

$$f_{Y_1 Y_2}(y_1, y_2) = \int_{-\infty}^{\infty} f_{Y_1 Y_2 Y_3}(y_1, y_2, y_3)\,dy_3 = \int_0^1 120\,y_1 y_2 y_3^4\,dy_3 = \begin{cases} 24\,y_1 y_2 & 0 < y_1,\ 0 < y_2,\ y_1 + y_2 < 1 \\ 0 & \text{otherwise} \end{cases} \tag{4.55}$$

[7] This turns out to be a Dirichlet distribution as well, this time with parameters $\{2, 2, 1\}$.

Tuesday 29 September 2015

Review for Prelim Exam One. The exam covers material from the first four weeks of the term, i.e., Hogg sections … and 2.1-2.5, and problem sets 1-4.

Thursday 1 October 2015

First Prelim Exam
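As a final numerical aside (again not part of the original notes; a Monte Carlo sketch assuming NumPy is available), the two examples above can be sanity-checked by sampling $(X_1, X_2, X_3)$ from the Dirichlet pdf of Example #1, applying the transformation, and comparing sample moments with those implied by $f_{\mathbf{Y}}(y_1, y_2, y_3) = 120\,y_1 y_2 y_3^4$, which factors as $[24\,y_1 y_2] \cdot [5\,y_3^4]$, so that $(Y_1, Y_2)$ and $Y_3$ should come out independent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw X = (X1, X2, X3) from the pdf 120 x1 x2 on 0 < x1, x2, x3 with
# x1 + x2 + x3 < 1: these are the first three coordinates of a
# Dirichlet({2, 2, 1, 1}) random vector, as noted in the footnote above.
m = 200_000
p = rng.dirichlet([2.0, 2.0, 1.0, 1.0], size=m)
x1, x2, x3 = p[:, 0], p[:, 1], p[:, 2]

# Apply the transformation Y1 = X1/S, Y2 = X2/S, Y3 = S, with S = X1+X2+X3.
s = x1 + x2 + x3
y1, y2, y3 = x1 / s, x2 / s, s

# f_Y factors as [24 y1 y2] * [5 y3^4], so (Y1, Y2) is Dirichlet({2, 2, 1})
# with E(Y1) = E(Y2) = 2/5, Y3 has E(Y3) = 5/6, and Y3 is independent
# of (Y1, Y2).
assert abs(y1.mean() - 2 / 5) < 0.01
assert abs(y2.mean() - 2 / 5) < 0.01
assert abs(y3.mean() - 5 / 6) < 0.01
assert abs(np.corrcoef(y1, y3)[0, 1]) < 0.01  # uncorrelated, as independence requires
```

The zero-correlation check is of course only a necessary condition for the independence of $Y_1$ and $Y_3$, but it is a quick way to catch an error in the Jacobian or the support.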
More informationSection 8.1. Vector Notation
Section 8.1 Vector Notation Definition 8.1 Random Vector A random vector is a column vector X = [ X 1 ]. X n Each Xi is a random variable. Definition 8.2 Vector Sample Value A sample value of a random
More informationIntroduction to bivariate analysis
Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.
More informationDiscrete Probability Refresher
ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory
More informationLecture 13: Simple Linear Regression in Matrix Format
See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/ Lecture 13: Simple Linear Regression in Matrix Format 36-401, Section B, Fall 2015 13 October 2015 Contents 1 Least Squares in Matrix
More information1.1 Review of Probability Theory
1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,
More information2.23 Theorem. Let A and B be sets in a metric space. If A B, then L(A) L(B).
2.23 Theorem. Let A and B be sets in a metric space. If A B, then L(A) L(B). 2.24 Theorem. Let A and B be sets in a metric space. Then L(A B) = L(A) L(B). It is worth noting that you can t replace union
More information