Multivariate Distributions (Hogg Chapter Two)


STAT 405-01: Mathematical Statistics I
Fall Semester 2015

Contents

1 Multivariate Distributions
  1.1 Random Vectors
    1.1.1 Two Discrete Random Variables
    1.1.2 Two Continuous Random Variables
  1.2 Marginalization
  1.3 Expectation Values
    1.3.1 Moment Generating Function
  1.4 Transformations
    1.4.1 Transformation of Discrete RVs
    1.4.2 Transformation of Continuous RVs

2 Conditional Distributions
  2.1 Conditional Probability
  2.2 Conditional Probability Distributions
    2.2.1 Example
    2.2.2 Conditional Expectations
  2.3 Independence
    2.3.1 Dependent rv example #1
    2.3.2 Dependent rv example #2
    2.3.3 Factoring the joint pdf
    2.3.4 Expectation Values

3 Covariance and Correlation

4 Generalization to Several RVs
  4.1 Linear Algebra: Reminders and Notation
  4.2 Multivariate Probability Distributions
    4.2.1 Marginal and Conditional Distributions
    4.2.2 Mutual vs Pairwise Independence
  4.3 The Variance-Covariance Matrix
  4.4 Transformations of Several RVs
  4.5 Example #1
  4.6 Example #2 (fewer Y's than X's)

Copyright 2015, John T. Whelan, and all that

Tuesday 8 September 2015
Read Section 2.1 of Hogg

1 Multivariate Distributions

We introduced a random variable X as a function X(c) which assigned a real number to each outcome c in the sample space C.

There's no reason, of course, that we can't define multiple such functions, and we now turn to the formalism for dealing with multiple random variables at the same time.

1.1 Random Vectors

We can think of several random variables X_1, X_2, ..., X_n as making up the elements of a random vector

    X = (X_1, X_2, ..., X_n)^T                                            (1.1)

We can define an event X ∈ A corresponding to the random vector X lying in some region A ⊂ R^n, and use the probability function to define the probability P(X ∈ A) of this event. We'll focus on n = 2 initially, and define the joint cumulative distribution function of two random variables X_1 and X_2 as

    F_{X_1,X_2}(x_1, x_2) = P([X_1 ≤ x_1] ∩ [X_2 ≤ x_2])                  (1.2)

As in the case of a single random variable, this can be used as a starting point for defining the probability of any event we like. For example, with a bit of algebra it's possible to show that

    P[(a_1 < X_1 ≤ b_1) ∩ (a_2 < X_2 ≤ b_2)]
        = F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2)           (1.3)

where we have suppressed the subscript X_1, X_2 since there's only one cdf of interest at the moment. We won't really dwell on this, though, since it's a lot easier to work with joint probability mass and density functions.

1.1.1 Two Discrete Random Variables

If both random variables are discrete, i.e., they can take on either a finite set of values or at most countably many values, the joint cdf will once again be constant aside from discontinuities, and we can describe the situation using a joint probability mass function

    p_{X_1,X_2}(x_1, x_2) = P[(X_1 = x_1) ∩ (X_2 = x_2)]                  (1.4)

We give an example of this, in which we for convenience refer to the random variables as X and Y. Recall the example of three flips of a fair coin, in which we defined X as the number of heads. Now let's define another random variable Y, which is the number of tails we see before the first head is flipped. (If all three flips are tails, then Y is defined to be 3.) We can work out the probabilities by first just enumerating all of the outcomes, which are assumed to have equal probability because the coin is fair:

    outcome c   P(c)   X value   Y value
    HHH         1/8       3         0
    HHT         1/8       2         0
    HTH         1/8       2         0
    HTT         1/8       1         0
    THH         1/8       2         1
    THT         1/8       1         1
    TTH         1/8       1         2
    TTT         1/8       0         3

We can look through and see that

    p_{X,Y}(x, y) = { 1/8   x = 3, y = 0
                      2/8   x = 2, y = 0
                      1/8   x = 1, y = 0
                      1/8   x = 2, y = 1
                      1/8   x = 1, y = 1
                      1/8   x = 1, y = 2
                      1/8   x = 0, y = 3
                      0     otherwise                                     (1.5)

This is most easily summarized in a table:

                          y
    p_{X,Y}(x,y)    0     1     2     3
           0        0     0     0    1/8
           1       1/8   1/8   1/8    0
    x      2       2/8   1/8    0     0
           3       1/8    0     0     0

Note that it's a lot more convenient to work with the joint pmf than the joint cdf. It takes a fair bit of concentration to work out the joint cdf, but if you do, you get

    F_{X,Y}(x, y) = { 0     x < 0 or y < 0
                      0     0 ≤ x < 1, y < 3
                      1/8   0 ≤ x < 1, 3 ≤ y
                      1/8   1 ≤ x < 2, 0 ≤ y < 1
                      2/8   1 ≤ x < 2, 1 ≤ y < 2
                      3/8   1 ≤ x < 2, 2 ≤ y < 3
                      4/8   1 ≤ x < 2, 3 ≤ y
                      3/8   2 ≤ x < 3, 0 ≤ y < 1
                      5/8   2 ≤ x < 3, 1 ≤ y < 2
                      6/8   2 ≤ x < 3, 2 ≤ y < 3
                      7/8   2 ≤ x < 3, 3 ≤ y
                      4/8   3 ≤ x, 0 ≤ y < 1
                      6/8   3 ≤ x, 1 ≤ y < 2
                      7/8   3 ≤ x, 2 ≤ y < 3
                      8/8   3 ≤ x, 3 ≤ y                                  (1.6)

This is not very enlightening, and not really much more so if you plot it:

[Figure: the values of the joint cdf F_{X,Y}(x,y) plotted as a step function over the (x,y) plane.]
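As a quick computational check of the pmf table above (a Python sketch, not part of Hogg), we can enumerate the eight equally likely outcomes and tabulate X and Y directly:

```python
# Enumerate three fair-coin flips and tabulate X = number of heads and
# Y = number of tails before the first head (3 if there are no heads).
from itertools import product
from collections import Counter

pmf = Counter()
for flips in product("HT", repeat=3):
    x = flips.count("H")                       # number of heads
    y = 3 if x == 0 else flips.index("H")      # tails before the first head
    pmf[(x, y)] += 1 / 8                       # each outcome has probability 1/8

for (x, y), p in sorted(pmf.items()):
    print(f"p_XY({x},{y}) = {p}")              # matches the table: p(2,0) = 2/8, etc.
```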

So instead, we'll work with the joint pmf and use it to calculate probabilities like

    P(X + Y = 2) = p_{X,Y}(2, 0) + p_{X,Y}(1, 1)                          (1.7)

and

    P[(0 < X ≤ 2) ∩ (Y ≤ 1)] = p_{X,Y}(1, 0) + p_{X,Y}(1, 1) + p_{X,Y}(2, 0) + p_{X,Y}(2, 1)   (1.8)

In general,

    P[(X, Y) ∈ A] = Σ_{(x,y)∈A} p_{X,Y}(x, y)                             (1.9)

1.1.2 Two Continuous Random Variables

On the other hand, we may be dealing with continuous random variables, which means that the joint cdf F_{X,Y}(x,y) is continuous. Then we can proceed as before and define a probability density function by taking derivatives of the cdf. In this case, since F_{X,Y}(x,y) has multiple arguments, we take the partial derivative, so that the joint pdf is

    f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x ∂y                                 (1.10)

We won't worry too much about this derivative for now, since in practice we will start with the joint pdf rather than differentiating the joint cdf to get it. We can use the pdf to assign a probability for the random vector (X, Y) to be in some region of the (x,y) plane:

    P[(X, Y) ∈ A] = ∬_{(x,y)∈A} f_{X,Y}(x, y) dx dy                       (1.11)

For example, for a rectangular region we have

    P[(a < X < b) ∩ (c < Y < d)] = ∫_a^b ( ∫_c^d f_{X,Y}(x, y) dy ) dx    (1.12)

We can connect the joint pdf to the joint cdf by considering the event (-∞ < X ≤ x) ∩ (-∞ < Y ≤ y):

    F_{X,Y}(x, y) = P[(-∞ < X ≤ x) ∩ (-∞ < Y ≤ y)] = ∫_{-∞}^x ( ∫_{-∞}^y f_{X,Y}(t, u) du ) dt   (1.13)

where we have called the integration variables t and u rather than x and y because the latter were already in use. As another example, consider the event X + Y < c, where the region of integration looks like this:

[Figure: the region x + y < c in the (x,y) plane, lying below the line that crosses the axes at (c, 0) and (0, c).]

we have

    P(X + Y < c) = ∫_{-∞}^{∞} ( ∫_{-∞}^{c-x} f_{X,Y}(x, y) dy ) dx
                 = ∫_{-∞}^{∞} ( ∫_{-∞}^{c-y} f_{X,Y}(x, y) dx ) dy        (1.14)

For a more explicit demonstration of why this works, consult your notes from Multivariable Calculus (specifically Fubini's theorem) and/or Probability, and/or …edu/~whelan/courses/2013_1sp_1016_345/notes05.pdf

As an example, consider the joint pdf

    f(x, y) = { e^{-x-y}   0 < x < ∞, 0 < y < ∞
                0          otherwise                                      (1.15)

and the event X < 2Y. The region over which we need to integrate is x > 0, y > 0, x < 2y:

[Figure: the region x > 0, y > 0, x < 2y in the (x,y) plane.]

If we do the y integral first, the limits will be set by x/2 < y < ∞, and if we do the x integral first, they will be 0 < x < 2y. Doing the y integral first will give us a contribution from only one end of the integral, so let's do it that way:

    P(X < 2Y) = ∫_0^∞ ∫_{x/2}^∞ e^{-x-y} dy dx = ∫_0^∞ e^{-x} [-e^{-y}]_{x/2}^∞ dx
              = ∫_0^∞ e^{-x} e^{-x/2} dx = ∫_0^∞ e^{-3x/2} dx = [-(2/3) e^{-3x/2}]_0^∞ = 2/3   (1.16)

1.2 Marginalization

One of the events we can define given the probability distribution for two random variables X and Y is X = x for some value x. In the case of a pair of discrete random variables, this is

    P(X = x) = Σ_y p_{X,Y}(x, y)                                          (1.17)

But of course, P(X = x) is just the pmf of X; we call this the marginal pmf p_X(x) and define

    p_X(x) = P(X = x) = Σ_y p_{X,Y}(x, y)                                 (1.18)
    p_Y(y) = P(Y = y) = Σ_x p_{X,Y}(x, y)                                 (1.19)

Returning to our coin-flip example, we can write the marginal pmfs for X and Y in the margins of the table:

                          y
    p_{X,Y}(x,y)    0     1     2     3   | p_X(x)
           0        0     0     0    1/8  |  1/8
           1       1/8   1/8   1/8    0   |  3/8
    x      2       2/8   1/8    0     0   |  3/8
           3       1/8    0     0     0   |  1/8
        p_Y(y)     4/8   2/8   1/8   1/8  |

For a pair of continuous random variables, we know that P(X = x) = 0, but we can find the marginal cdf

    F_X(x) = P(X ≤ x) = ∫_{-∞}^x ( ∫_{-∞}^∞ f_{X,Y}(t, y) dy ) dt         (1.20)

and then take the derivative to get the marginal pdf

    f_X(x) = F_X'(x) = ∫_{-∞}^∞ f_{X,Y}(x, y) dy                          (1.21)

and likewise

    f_Y(y) = ∫_{-∞}^∞ f_{X,Y}(x, y) dx                                    (1.22)
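The probability (1.16) and the marginal pdf are easy to check numerically; here is a short sketch using scipy's quadrature routines (one convenient way to set up the integrals, not part of the notes):

```python
# Integrate f(x,y) = exp(-x-y) over the region x > 0, y > 0, x < 2y, doing the
# y integral first with limits x/2 < y < infinity, and compare with 2/3.
import numpy as np
from scipy import integrate

f = lambda y, x: np.exp(-x - y)        # dblquad integrates over its first argument first

prob, _ = integrate.dblquad(f, 0, np.inf, lambda x: x / 2, lambda x: np.inf)
print(prob, 2 / 3)                     # both approximately 0.6667

# The marginal pdf f_X(x) = integral of f(x,y) over y should come out to e^{-x}:
x0 = 1.3
fx, _ = integrate.quad(lambda y: np.exp(-x0 - y), 0, np.inf)
print(fx, np.exp(-x0))
```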

The act of summing or integrating over arguments we don't care about, in order to get a marginal probability distribution, is called marginalizing.

Thursday 10 September 2015
Read Section 2.2 of Hogg

1.3 Expectation Values

We can define the expectation value of a function of two discrete random variables in the straightforward way

    E[g(X_1, X_2)] = Σ_{x_1,x_2} g(x_1, x_2) p(x_1, x_2)                  (1.23)

and for two continuous random variables

    E[g(X_1, X_2)] = ∫_{-∞}^∞ ∫_{-∞}^∞ g(x_1, x_2) f(x_1, x_2) dx_1 dx_2   (1.24)

In each case, we only consider the expectation value to be defined if the relevant sum or integral converges absolutely, i.e., if E(|g(X_1, X_2)|) < ∞. Note that the expectation value is still linear, i.e.,

    E[k_1 g_1(X_1, X_2) + k_2 g_2(X_1, X_2)] = k_1 E[g_1(X_1, X_2)] + k_2 E[g_2(X_1, X_2)]   (1.25)

1.3.1 Moment Generating Function

In the case of a pair of random variables, we can define the mgf as

    M(t_1, t_2) = E(e^{t_1 X_1 + t_2 X_2}) = E(e^{t^T X})                 (1.26)

where t^T is the transpose of the column vector t. We can get the mgf for each of the random variables from the joint mgf:

    M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0)   and   M_{X_2}(t_2) = M_{X_1,X_2}(0, t_2)   (1.27)

We'll actually mostly use the mgf as an easy way to identify the distribution, but it can also be used to generate moments in the usual way:

    E[X_1^{m_1} X_2^{m_2}] = ∂^{m_1+m_2} M(t_1, t_2) / ∂t_1^{m_1} ∂t_2^{m_2} |_{(t_1,t_2)=(0,0)}   (1.28)

1.4 Transformations

We turn now to the question of how to transform the joint distribution function under a change of variables. In order for the distribution of Y_1 = u_1(X_1, X_2) and Y_2 = u_2(X_1, X_2) to carry the same information as the distribution of X_1 and X_2, the transformation should be invertible over the space of possible X_1 and X_2 values, i.e., we should be able to write X_1 = w_1(Y_1, Y_2) and X_2 = w_2(Y_1, Y_2).

1.4.1 Transformation of Discrete RVs

For the case of a pair of discrete random variables, things are very straightforward, since

    p_{Y_1,Y_2}(y_1, y_2) = P([Y_1 = y_1] ∩ [Y_2 = y_2])
                          = P([u_1(X_1, X_2) = y_1] ∩ [u_2(X_1, X_2) = y_2])
                          = P([X_1 = w_1(y_1, y_2)] ∩ [X_2 = w_2(y_1, y_2)])
                          = p_{X_1,X_2}(w_1(y_1, y_2), w_2(y_1, y_2))     (1.29)

For example, suppose

    p_{X_1,X_2}(x_1, x_2) = { (1/2)^{x_1+x_2+2}   x_1 = 0, 1, 2, ...;  x_2 = 0, 1, 2, ...
                              0                   otherwise              (1.30)

If we define Y_1 = X_1 + X_2 and Y_2 = X_1 - X_2, then X_1 = (Y_1 + Y_2)/2 and X_2 = (Y_1 - Y_2)/2. The only tricky part is figuring out the allowed set of values for Y_1 and Y_2. We note that (y_1 + y_2)/2 ≥ 0 and (y_1 - y_2)/2 ≥ 0 imply that, for a given y_1, -y_1 ≤ y_2 ≤ y_1. That's not quite the whole story, though, since (y_1 + y_2)/2 and (y_1 - y_2)/2 also have to be integers, so if y_1 is odd, y_2 must be odd, and if y_1 is even, y_2 must be even. It's easiest to see what combinations are allowed by building a table of (y_1, y_2) for the first few values:

    x_1 \ x_2      0        1        2        3
       0         (0, 0)   (1, -1)  (2, -2)  (3, -3)
       1         (1, 1)   (2, 0)   (3, -1)  (4, -2)
       2         (2, 2)   (3, 1)   (4, 0)   (5, -1)
       3         (3, 3)   (4, 2)   (5, 1)   (6, 0)

So evidently y_1 can be any non-negative integer, and the possible values for y_2 count by twos from -y_1 to y_1:

    p_{Y_1,Y_2}(y_1, y_2) = { (1/2)^{y_1+2}   y_1 = 0, 1, 2, ...;  y_2 = -y_1, -y_1 + 2, ..., y_1 - 2, y_1
                              0               otherwise              (1.31)
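A short sketch checking (1.30) and (1.31) numerically: tabulate the joint pmf over a truncated range, push it through the transformation, and compare with the claimed form.

```python
# Tabulate p_{X1,X2}(x1,x2) = (1/2)^(x1+x2+2), map each point through
# Y1 = X1 + X2, Y2 = X1 - X2, and compare with p_{Y1,Y2}(y1,y2) = (1/2)^(y1+2).
N = 40                                    # truncate the infinite range for the check
pY = {}
for x1 in range(N):
    for x2 in range(N):
        p = 0.5 ** (x1 + x2 + 2)
        key = (x1 + x2, x1 - x2)
        pY[key] = pY.get(key, 0.0) + p    # each (x1, x2) maps to a distinct (y1, y2)

for (y1, y2), p in sorted(pY.items())[:10]:
    assert (y1 + y2) % 2 == 0 and -y1 <= y2 <= y1
    print((y1, y2), p, 0.5 ** (y1 + 2))   # the two columns agree

print(sum(pY.values()))                   # close to 1 (truncation error only)
```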

1.4.2 Transformation of Continuous RVs

For the case of the transformation of continuous random variables we have to deal with the fact that f_{X_1,X_2}(x_1, x_2) and f_{Y_1,Y_2}(y_1, y_2) are probability densities and the volume (area) element has to be transformed from one set of variables to the other. If we write f_{X_1,X_2}(x_1, x_2) = d²P/(dx_1 dx_2) and f_{Y_1,Y_2}(y_1, y_2) = d²P/(dy_1 dy_2), the transformation we'll need is

    d²P/(dy_1 dy_2) = |det ∂(x_1, x_2)/∂(y_1, y_2)| d²P/(dx_1 dx_2)        (1.32)

where we use the determinant of the Jacobian matrix

    ∂(y_1, y_2)/∂(x_1, x_2) = [ ∂y_1/∂x_1   ∂y_1/∂x_2
                                 ∂y_2/∂x_1   ∂y_2/∂x_2 ]                  (1.33)

which may be familiar from the transformation of the volume element

    dy_1 dy_2 = |det ∂(y_1, y_2)/∂(x_1, x_2)| dx_1 dx_2                    (1.34)

if we change variables in a double integral.

To get a concrete handle on this, consider an example. Let X and Y be continuous random variables with a joint pdf

    f_{X,Y}(x, y) = { (4/π) e^{-x²-y²}   0 < x < ∞, 0 < y < ∞
                      0                  otherwise                        (1.35)

If we want to calculate the probability that X² + Y² < a², we have to integrate over the part of this disc which lies in the first quadrant x > 0, y > 0 (where the pdf is non-zero):

[Figure: the quarter disc x² + y² < a², x > 0, y > 0, in the (x,y) plane.]

The limits of the x integral are determined by 0 < x and x² + y² < a², i.e., x < √(a² - y²); the range of y values represented can be seen from the figure to be 0 < y < a, so we can write the probability as

    P(X² + Y² < a²) = ∫_0^a ∫_0^{√(a²-y²)} (4/π) e^{-x²-y²} dx dy          (1.36)

but we can't really do the integral in this form. However, if we define random variables R = √(X² + Y²) and Φ = tan⁻¹(Y/X) [footnote 1], so that X = R cos Φ and Y = R sin Φ, we can write the probability as

    P(X² + Y² < a²) = P(R < a) = ∫_0^{π/2} ∫_0^a f_{R,Φ}(r, φ) dr dφ       (1.38)

if we have the transformed pdf f_{R,Φ}(r, φ). On the other hand, we know that we can write the volume element dx dy = r dr dφ. We can get this either from geometry in this case, or more generally by differentiating the transformation

    x = r cos φ,   y = r sin φ                                             (1.39)

to get

    dx = cos φ dr - r sin φ dφ,   dy = sin φ dr + r cos φ dφ               (1.40)

and taking the determinant of the Jacobian matrix:

    det ∂(x, y)/∂(r, φ) = | cos φ   -r sin φ |
                          | sin φ    r cos φ |  = r cos²φ + r sin²φ = r    (1.41)

so the volume element transforms like

    dx dy = |det ∂(x, y)/∂(r, φ)| dr dφ = r dr dφ                          (1.42)

Even if we knew nothing about the transformation of random variables, we could use this to change variables in the integral (1.36) to get

    ∫_0^a ∫_0^{√(a²-y²)} (4/π) e^{-x²-y²} dx dy = ∫_0^{π/2} ∫_0^a (4/π) e^{-r²} r dr dφ   (1.43)

Footnote 1: Note that we can only get away with using the arctangent tan⁻¹(y/x) as an expression for φ because x and y are both positive. In general, we need to be careful; (x, y) = (-1, -1) corresponds to φ = -3π/4 even though tan⁻¹([-1]/[-1]) = tan⁻¹(1) = π/4 if we use the principal branch of the arctangent. For a general point in the (x,y) plane, we'd need to use

    atan2(y, x) = { tan⁻¹(y/x) - π   x < 0 and y < 0
                    -π/2             x = 0 and y < 0
                    tan⁻¹(y/x)       x > 0
                    π/2              x = 0 and y > 0
                    tan⁻¹(y/x) + π   x < 0 and y ≥ 0                      (1.37)

to get the correct φ ∈ [-π, π).

If we compare the integrands of (1.36) and (1.43), we can see that the transformed pdf must be

    f_{R,Φ}(r, φ) = { (4/π) r e^{-r²}   0 < r < ∞, 0 < φ < π/2
                      0                 otherwise                          (1.44)

Incidentally, we can calculate the probability as

    P(R < a) = ∫_0^{π/2} ∫_0^a (4/π) e^{-r²} r dr dφ = ∫_0^a 2 e^{-r²} r dr = [-e^{-r²}]_0^a = 1 - e^{-a²}   (1.45)

To return to the general case, we see there are basically two things to worry about: one is the Jacobian determinant relating the volume elements in the two sets of variables, and the other is transforming the ranges of variables used to describe the event, as well as the allowed range of variables. In general terms, if S is the support of the random variables X_1 and X_2, i.e., the smallest region of R² such that P[(X_1, X_2) ∈ S] = 1, and T is the support of Y_1 and Y_2, we need a transformation of the pdf f_{X_1,X_2}(x_1, x_2) defined on S such that

    P[(X_1, X_2) ∈ A] = ∬_A f_{X_1,X_2}(x_1, x_2) dx_1 dx_2 = ∬_B f_{Y_1,Y_2}(y_1, y_2) dy_1 dy_2 = P[(Y_1, Y_2) ∈ B]   (1.46)

where B is the image of A under the transformation, i.e., (x_1, x_2) ∈ A is equivalent to {u_1(x_1, x_2), u_2(x_1, x_2)} ∈ B. Since a change of variables in the integral gives us

    ∬_A f_{X_1,X_2}(x_1, x_2) dx_1 dx_2 = ∬_B f_{X_1,X_2}(w_1(y_1, y_2), w_2(y_1, y_2)) |det ∂(x_1, x_2)/∂(y_1, y_2)| dy_1 dy_2   (1.47)

we must have, in general,

    f_{Y_1,Y_2}(y_1, y_2) = |det ∂(x_1, x_2)/∂(y_1, y_2)| f_{X_1,X_2}(w_1(y_1, y_2), w_2(y_1, y_2)),   (y_1, y_2) ∈ T   (1.48)

which is the more careful way of writing the easier-to-remember formula we started with:

    d²P/(dy_1 dy_2) = |det ∂(x_1, x_2)/∂(y_1, y_2)| d²P/(dx_1 dx_2)        (1.49)
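A Monte Carlo sketch of the result (1.45). It uses the fact that under the pdf (1.35) the joint density factors into independent half-normal marginals with scale 1/√2 (a sampling recipe assumed here, not derived in the notes), so samples are easy to generate:

```python
# Sample X = |Z1|/sqrt(2), Y = |Z2|/sqrt(2) with Z1, Z2 standard normal, which
# reproduces the pdf (4/pi) exp(-x^2 - y^2) on the first quadrant, and estimate
# P(X^2 + Y^2 < a^2) for a few values of a.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = np.abs(rng.standard_normal(n)) / np.sqrt(2)
y = np.abs(rng.standard_normal(n)) / np.sqrt(2)

for a in (0.5, 1.0, 2.0):
    est = np.mean(x**2 + y**2 < a**2)
    print(a, est, 1 - np.exp(-a**2))      # estimate vs. the exact 1 - e^{-a^2}
```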

Tuesday 15 September 2015
Read Section 2.3 of Hogg

2 Conditional Distributions

2.1 Conditional Probability

Recall the definition of conditional probability: for events C_1 and C_2, P(C_2|C_1) is the probability of C_2 given C_1. If we recall that P(C) is the fraction of repeated experiments in which C is true, we can think of P(C_2|C_1) as follows: restrict attention to those experiments in which C_1 is true, and take the fraction in which C_2 is also true. This conceptual definition leads to the mathematical definition

    P(C_2|C_1) = P(C_1 ∩ C_2)/P(C_1)                                       (2.1)

A consequence of this definition is the multiplication rule for probabilities,

    P(C_1 ∩ C_2) = P(C_2|C_1) P(C_1)                                       (2.2)

This means that the probability of C_1 and C_2 is the probability of C_1 times the probability of C_2 given C_1, which makes logical sense. In fact, since one often has easier access to conditional probabilities in the first place, you could start with the definition of P(C_2|C_1) as the probability of C_2 assuming C_1, and then use the multiplication rule (2.2) as one of the basic tenets of probability. [footnote 2] An extreme expression of this philosophy says that all probabilities are conditional probabilities, since you have to assume something about a model to calculate them. One simple consequence of the multiplication rule is that we can write P(C_1 ∩ C_2) two different ways:

    P(C_1|C_2) P(C_2) = P(C_1 ∩ C_2) = P(C_2|C_1) P(C_1)                   (2.3)

dividing by P(C_2) gives us Bayes's theorem

    P(C_1|C_2) = P(C_2|C_1) P(C_1) / P(C_2)                                (2.4)

which is useful if you want to calculate conditional probabilities with one condition when you know them with another condition.

Footnote 2: See E. T. Jaynes, Probability Theory: The Logic of Science, for this approach.

2.2 Conditional Probability Distributions

Given a pair of discrete random variables X_1 and X_2 with joint pmf p_{X_1,X_2}(x_1, x_2), we can define in a straightforward way the conditional probability that X_2 takes on a value given a value for X_1:

    p_{X_2|X_1}(x_2|x_1) = P(X_2 = x_2 | X_1 = x_1) = p_{X_1,X_2}(x_1, x_2) / p_{X_1}(x_1)   (2.5)

where we've used the marginal pmf

    p_{X_1}(x_1) = Σ_{x_2} p_{X_1,X_2}(x_1, x_2)                           (2.6)

We often write p_{2|1}(x_2|x_1) as a shorthand for p_{X_2|X_1}(x_2|x_1). Note that conditional probability distributions are normalized just like ordinary ones:

    Σ_{x_2} p_{2|1}(x_2|x_1) = Σ_{x_2} p(x_1, x_2)/p_1(x_1) = p_1(x_1)/p_1(x_1) = 1   (2.7)

If we have a pair of continuous random variables with joint pdf f_{X_1,X_2}(x_1, x_2), we'd like to similarly define

    f_{2|1}(x_2|x_1) ≟ lim_{ξ_2→0} P(x_2 - ξ_2 < X_2 ≤ x_2 | X_1 = x_1) / ξ_2   (2.8)

But there's a problem: since X_1 is a continuous random variable, P(X_1 = x_1) = 0, which means we can't divide by it. So instead, we have to define it as

    f_{2|1}(x_2|x_1) = lim_{ξ_1→0, ξ_2→0} P(x_2 - ξ_2 < X_2 ≤ x_2 | x_1 - ξ_1 < X_1 ≤ x_1) / ξ_2 = f(x_1, x_2)/f_1(x_1)   (2.9)

where f_1(x_1) = ∫_{-∞}^∞ f(x_1, x_2) dx_2 is the marginal pdf. Again, the conditional pdf is properly normalized:

    ∫_{-∞}^∞ f_{2|1}(x_2|x_1) dx_2 = ∫_{-∞}^∞ f(x_1, x_2) dx_2 / f_1(x_1) = f_1(x_1)/f_1(x_1) = 1   (2.10)

Note that f_{2|1}(x_2|x_1) is a density in x_2, not in x_1. This is also important in the continuous equivalent of Bayes's theorem:

    f_{1|2}(x_1|x_2) = f_{2|1}(x_2|x_1) f_1(x_1) / f_2(x_2)                (2.11)

2.2.1 Example

Consider continuous random variables X_1 and X_2 with joint pdf

    f(x_1, x_2) = 6x_2,   0 < x_2 < x_1 < 1                                (2.12)

The marginal pdf for x_1 is

    f_1(x_1) = ∫_0^{x_1} 6x_2 dx_2 = 3x_1²,   0 < x_1 < 1                  (2.13)

so the conditional pdf is

    f_{2|1}(x_2|x_1) = f(x_1, x_2)/f_1(x_1) = 2x_2/x_1²,   0 < x_2 < x_1 < 1   (2.14)

which we can see is normalized:

    ∫_0^{x_1} f_{2|1}(x_2|x_1) dx_2 = ∫_0^{x_1} (2x_2/x_1²) dx_2 = x_1²/x_1² = 1   (2.15)

2.2.2 Conditional Expectations

Since conditional pdfs or pmfs are just like regular probability distributions, you can also use them to define expectation values. For discrete random variables X_1 and X_2 we can define

    E(u(X_2) | X_1 = x_1) = E(u(X_2)|x_1) = Σ_{x_2} u(x_2) p_{2|1}(x_2|x_1)   if   Σ_{x_2} |u(x_2)| p_{2|1}(x_2|x_1) < ∞   (2.16)

and for continuous:

    E(u(X_2) | X_1 = x_1) = E(u(X_2)|x_1) = ∫_{-∞}^∞ u(x_2) f_{2|1}(x_2|x_1) dx_2   if   ∫_{-∞}^∞ |u(x_2)| f_{2|1}(x_2|x_1) dx_2 < ∞   (2.17)

This is still a linear operation, so

    E(k_1 u_1(X_2) + k_2 u_2(X_2) | x_1) = k_1 E(u_1(X_2)|x_1) + k_2 E(u_2(X_2)|x_1)   (2.18)

We can define a conditional variance by analogy to the usual variance:

    Var(X_2|x_1) = E{[X_2 - E(X_2|x_1)]² | x_1}                            (2.19)

and since the conditional expectation value is linear, we have the usual shortcut

    Var(X_2|x_1) = E(X_2²|x_1) - [E(X_2|x_1)]²                             (2.20)

Returning to our example, in which f_{2|1}(x_2|x_1) = 2x_2/x_1², 0 < x_2 < x_1 < 1, we have

    E(X_2|x_1) = ∫_0^{x_1} x_2 (2x_2/x_1²) dx_2 = (2/3) x_1                (2.21)

and

    E(X_2²|x_1) = ∫_0^{x_1} x_2² (2x_2/x_1²) dx_2 = (1/2) x_1²             (2.22)

so

    Var(X_2|x_1) = (1/2)x_1² - [(2/3)x_1]² = x_1²/18                       (2.23)

Note that E(X_2|x_1) is a function of x_1 and not a random variable. But we can insert the random variable X_1 into that function, and define a random variable E(X_2|X_1) which is equal to E(X_2|x_1) when X_1 = x_1. This random variable can also be written

    E(X_2|X_1) = ∫_{-∞}^∞ x_2 f_{2|1}(x_2|X_1) dx_2                        (2.24)

Note that

    E[E(X_2|X_1)] = ∫_{-∞}^∞ E(X_2|x_1) f_1(x_1) dx_1 = ∫_{-∞}^∞ ∫_{-∞}^∞ x_2 f_{2|1}(x_2|x_1) f_1(x_1) dx_2 dx_1
                  = ∫_{-∞}^∞ ∫_{-∞}^∞ x_2 f(x_1, x_2) dx_2 dx_1 = E[X_2]   (2.25)

So E(X_2|X_1) is an estimator of E(X_2). It can be shown that

    Var[E(X_2|X_1)] ≤ Var(X_2)                                             (2.26)

so E(X_2|X_1) is potentially a better estimator of the mean E(X_2) than X_2 itself is. (This isn't exactly a practical procedure, though, since to evaluate the function E(X_2|x_1) you need the conditional probability density f_{2|1}(x_2|x_1) for all possible x_2.)

In our specific example, since E(X_2|x_1) = (2/3)x_1, E(X_2|X_1) = (2/3)X_1. We can work out

    E[E(X_2|X_1)] = E((2/3)X_1) = ∫_0^1 (2/3)x_1 f_1(x_1) dx_1 = ∫_0^1 (2/3)x_1 (3x_1²) dx_1 = 1/2   (2.27)

and

    E[{E(X_2|X_1)}²] = E((4/9)X_1²) = ∫_0^1 (4/9)x_1² f_1(x_1) dx_1 = ∫_0^1 (4/9)x_1² (3x_1²) dx_1 = 4/15   (2.28)

so that

    Var[E(X_2|X_1)] = 4/15 - (1/2)² = 1/60                                 (2.29)

To get E(X_2) and Var(X_2) we need the marginal pdf

    f_2(x_2) = ∫_{x_2}^1 6x_2 dx_1 = 6x_2(1 - x_2),   0 < x_2 < 1          (2.30)

from which we calculate

    E(X_2) = ∫_0^1 x_2 · 6x_2(1 - x_2) dx_2 = 6(1/3 - 1/4) = 1/2           (2.31)

and

    E(X_2²) = ∫_0^1 x_2² · 6x_2(1 - x_2) dx_2 = 6(1/4 - 1/5) = 3/10        (2.32)

so

    Var(X_2) = 3/10 - (1/2)² = 1/20                                        (2.33)

from which we can verify that in this case

    E[E(X_2|X_1)] = 1/2 = E(X_2)                                           (2.34)

and

    Var[E(X_2|X_1)] = 1/60 ≤ 1/20 = Var(X_2)                               (2.35)
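The numbers in (2.21)-(2.23) and (2.34)-(2.35) can be checked by simulation. The sketch below assumes inverse-cdf sampling for the example pdf f(x_1, x_2) = 6x_2: the marginal cdf of X_1 is x_1³ and the conditional cdf of X_2 given x_1 is (x_2/x_1)², so X_1 = U^{1/3} and X_2 = X_1 √V for independent uniforms U and V.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
u, v = rng.uniform(size=n), rng.uniform(size=n)
x1 = u ** (1 / 3)                 # marginal cdf F_1(x1) = x1^3
x2 = x1 * np.sqrt(v)              # conditional cdf F_{2|1}(x2|x1) = (x2/x1)^2

# conditional mean and variance near a fixed value of x1, here x1 = 0.7:
sel = np.abs(x1 - 0.7) < 0.01
print(x2[sel].mean(), 2 * 0.7 / 3)        # E(X2|x1) = (2/3) x1
print(x2[sel].var(), 0.7**2 / 18)         # Var(X2|x1) = x1^2/18

# law of total expectation and the variance comparison:
cond_mean = 2 * x1 / 3                    # the random variable E(X2|X1)
print(x2.mean(), cond_mean.mean())        # both near E(X2) = 1/2
print(cond_mean.var(), 1 / 60)            # Var[E(X2|X1)] = 1/60
print(x2.var(), 1 / 20)                   # Var(X2) = 1/20, larger as (2.35) says
```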

Thursday 17 September 2015
Read Sections 2.4-2.5 of Hogg

2.3 Independence

Recall the conditional distribution for discrete rvs X_1 and X_2,

    p_{2|1}(x_2|x_1) = P(X_2 = x_2 | X_1 = x_1) = P([X_1 = x_1] ∩ [X_2 = x_2]) / P(X_1 = x_1) = p(x_1, x_2)/p_1(x_1)   (2.36)

or for continuous rvs X_1 and X_2,

    f_{2|1}(x_2|x_1) = f(x_1, x_2)/f_1(x_1)                                (2.37)

Consider the example

    f(x_1, x_2) = 6x_1x_2²,   0 < x_1 < 1, 0 < x_2 < 1                     (2.38)

The marginal pdf for X_1 is

    f_1(x_1) = ∫_0^1 6x_1x_2² dx_2 = 2x_1                                  (2.39)

which makes the conditional pdf

    f_{2|1}(x_2|x_1) = 6x_1x_2² / (2x_1) = 3x_2²,   0 < x_1 < 1, 0 < x_2 < 1   (2.40)

Note that in this case f_{2|1}(x_2|x_1) doesn't actually depend on x_1, as long as x_1 is in the support of the random variable X_1. This situation is called independence. In fact, it's easy to show that in this situation f_{2|1}(x_2|x_1) = f_2(x_2), i.e., the conditional pdf for X_2 given any possible value of X_1 is the marginal pdf for X_2:

    f_2(x_2) = ∫_{-∞}^∞ f(x_1, x_2) dx_1 = ∫_{-∞}^∞ f_{2|1}(x_2|x_1) f_1(x_1) dx_1
             = f_{2|1}(x_2|x_1) ∫_{-∞}^∞ f_1(x_1) dx_1 = f_{2|1}(x_2|x_1)   (X_1 & X_2 indep)   (2.41)

where we can pull f_{2|1}(x_2|x_1) out of the x_1 integral because it doesn't actually depend on x_1, and we use the fact that the marginal pdf f_1(x_1) is normalized. We thus have the definition

    (X_1 & X_2 independent) ⟺ ( f_{2|1}(x_2|x_1) = f_2(x_2) for all (x_1, x_2) ∈ S )   (2.42)

This is not the most symmetric definition, and it's not immediately obvious that f_{2|1}(x_2|x_1) = f_2(x_2) implies f_{1|2}(x_1|x_2) = f_1(x_1). But it does, because of the following result:

    (X_1 & X_2 independent) iff f(x_1, x_2) = f_1(x_1) f_2(x_2) for all (x_1, x_2)   (2.43)

(We don't need to specify (x_1, x_2) ∈ S because we can think of f_1(x_1) and f_2(x_2) as being equal to zero if their arguments are outside their respective support spaces.) It's easy enough to demonstrate (2.43). If we assume f(x_1, x_2) = f_1(x_1)f_2(x_2), then f_{2|1}(x_2|x_1) = f_1(x_1)f_2(x_2)/f_1(x_1) = f_2(x_2) as long as f_1(x_1) ≠ 0. Conversely, if we assume f_{2|1}(x_2|x_1) = f_2(x_2), then f(x_1, x_2) = f_{2|1}(x_2|x_1) f_1(x_1) = f_1(x_1)f_2(x_2).

If X_1 and X_2 are not independent, we call them dependent random variables. We can consider a couple of examples of dependent rvs.

2.3.1 Dependent rv example #1

First, let

    f(x_1, x_2) = { x_1 + x_2   0 < x_1 < 1, 0 < x_2 < 1
                    0           otherwise                                  (2.44)

Then

    f_1(x_1) = ∫_0^1 (x_1 + x_2) dx_2 = x_1 + 1/2,   0 < x_1 < 1           (2.45)

and

    f_{2|1}(x_2|x_1) = (x_1 + x_2)/(x_1 + 1/2),   0 < x_1 < 1, 0 < x_2 < 1   (2.46)

which does depend on x_1, so X_1 and X_2 are dependent.

2.3.2 Dependent rv example #2

Second, return to our example from Tuesday, where

    f(x_1, x_2) = 6x_2,   0 < x_2 < x_1 < 1                                (2.47)

and we saw

    f_{2|1}(x_2|x_1) = 2x_2/x_1²,   0 < x_2 < x_1 < 1                      (2.48)

again, this depends on x_1, so X_1 and X_2 are dependent.

2.3.3 Factoring the joint pdf

We don't have to calculate the conditional pdf to tell whether random variables are dependent or independent. We can show that

    (X_1 & X_2 independent) iff f(x_1, x_2) = g(x_1)h(x_2) for all (x_1, x_2)   (2.49)

for some functions g and h. The "only if" part is trivial; choose g(x_1) = f_1(x_1) and h(x_2) = f_2(x_2). We can show the "if" part by assuming a factored form and working out

    f_1(x_1) = ∫_{-∞}^∞ g(x_1)h(x_2) dx_2 = g(x_1) ∫_{-∞}^∞ h(x_2) dx_2    (2.50)

The integral ∫_{-∞}^∞ h(x_2) dx_2 is just a constant, which we can call c, so we have g(x_1) = f_1(x_1)/c and f(x_1, x_2) = f_1(x_1)h(x_2)/c. Then we take

    f_2(x_2) = [h(x_2)/c] ∫_{-∞}^∞ f_1(x_1) dx_1 = h(x_2)/c                (2.51)

which means that indeed f(x_1, x_2) = f_1(x_1)f_2(x_2).

Our two examples show ways in which the joint pdf can fail to factor. In (2.44), x_1 + x_2 can obviously not be written as a product of g(x_1) and h(x_2). In (2.47), it's a little trickier, since it seems like we could write 6x_2 = (6x_2)(1). But the problem is the support of (2.47). If we took, for example,

    g(x_1) = { 1   0 < x_1 < 1
               0   otherwise                                               (2.52)

and

    h(x_2) = { 6x_2   0 < x_2 < 1
               0      otherwise                                            (2.53)

we'd end up with

    g(x_1)h(x_2) = { 6x_2   0 < x_1 < 1, 0 < x_2 < 1
                     0      otherwise                                      (2.54)

which is not the same as the f(x_1, x_2) given in (2.47). In general, for the factorization to work, the support of X_1 and X_2 has to be a product space, i.e., the intersection of a range of possible x_1 values (with no reference to x_2) and a range of possible x_2 values (with no reference to x_1). Some examples of product spaces are

    0 < x_1 < 1, 0 < x_2 < 1
    -1 < x_1 < 1, 0 < x_2 < ∞
    0 < x_1 < ∞, -∞ < x_2 < ∞
    0 < x_1 < ∞, 0 < x_2 < 1

some examples of non-product spaces are

    0 < x_2 < x_1 < 1
    0 < x_1 < x_2 < ∞
    x_1² + x_2² < 1
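A numeric sketch contrasting the independent example (2.38) with the dependent example (2.47): at a few test points, compare the joint pdf with the product of its marginals computed by quadrature.

```python
# For 6*x1*x2**2 on the unit square the joint pdf equals the product of its
# marginals (independent); for 6*x2 on 0 < x2 < x1 < 1 it does not (dependent).
import numpy as np
from scipy import integrate

def check(f, label):
    f1 = lambda a: integrate.quad(lambda b: f(a, b), 0, 1)[0]   # marginal of X1
    f2 = lambda b: integrate.quad(lambda a: f(a, b), 0, 1)[0]   # marginal of X2
    pts = [(0.3, 0.6), (0.8, 0.2)]
    err = max(abs(f(a, b) - f1(a) * f2(b)) for a, b in pts)
    print(label, "max |f - f1*f2| at test points:", err)

check(lambda a, b: 6 * a * b**2, "6 x1 x2^2 on the unit square")       # ~0
check(lambda a, b: 6 * b if b < a else 0.0, "6 x2 on 0 < x2 < x1 < 1") # clearly nonzero
```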

2.3.4 Expectation Values

Finally we consider an important result related to expectation values. Let X_1 and X_2 be independent random variables. Then the expectation value of the product of a function of each random variable is the product of their expectation values:

    E[u_1(X_1)u_2(X_2)] = ∫∫ u_1(x_1)u_2(x_2) f_1(x_1)f_2(x_2) dx_1 dx_2
                        = ( ∫ u_1(x_1) f_1(x_1) dx_1 ) ( ∫ u_2(x_2) f_2(x_2) dx_2 )
                        = E[u_1(X_1)] E[u_2(X_2)]   (X_1 & X_2 indep)      (2.55)

In particular, the joint mgf is such an expectation value, so

    M(t_1, t_2) = E(e^{t_1X_1 + t_2X_2}) = E(e^{t_1X_1}) E(e^{t_2X_2}) = M_1(t_1)M_2(t_2) = M(t_1, 0)M(0, t_2)   (X_1 & X_2 indep)   (2.56)

It takes a little more work, but you can also prove the converse (see Hogg for details), so

    (X_1 & X_2 independent) iff M(t_1, t_2) = M(t_1, 0)M(0, t_2)           (2.57)

You showed on the homework that in a particular case M(t_1, 0)M(0, t_2) ≠ M(t_1, t_2); in that case the random variables were dependent because their support was not a product space.

3 Covariance and Correlation

Recall the definitions of the means

    μ_X = E(X)   and   μ_Y = E(Y)                                          (3.1)

and variances

    σ_X² = Var(X) = E([X - μ_X]²)                                          (3.2a)
    σ_Y² = Var(Y) = E([Y - μ_Y]²)                                          (3.2b)

We can define the covariance

    Cov(X, Y) = E([X - μ_X][Y - μ_Y])                                      (3.3)

Dimensionally, μ_X and σ_X have units of X, μ_Y and σ_Y have units of Y, and the covariance Cov(X, Y) has units of XY. It's useful to define a dimensionless quantity called the correlation coëfficient:

    ρ = Cov(X, Y) / (σ_X σ_Y)                                              (3.4)

On the homework you will show that -1 ≤ ρ ≤ 1. One important result about independent random variables is that they are uncorrelated. If X and Y are independent, then

    Cov(X, Y) = E([X - μ_X][Y - μ_Y]) = E(X - μ_X)E(Y - μ_Y) = (μ_X - μ_X)(μ_Y - μ_Y) = 0   (X & Y indep)   (3.5)

On the other hand, the converse is not true: it is still possible for the covariance of dependent variables to be zero.
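A short sketch of (3.3)-(3.4): estimate Cov(X, Y) and ρ from samples of the dependent pair of example (2.47), drawn with the same inverse-cdf recipe used earlier; the estimate comes out nonzero and between -1 and 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.uniform(size=n) ** (1 / 3)          # X1, marginal cdf x^3
y = x * np.sqrt(rng.uniform(size=n))        # X2 given X1 = x, conditional cdf (y/x)^2

cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())
print(cov, rho)                             # both positive: X1 and X2 are correlated here
```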

Tuesday 22 September 2015
Read Section 2.6 of Hogg

4 Generalization to Several RVs

Note: this week's material will not be included on Prelim Exam One.

4.1 Linear Algebra: Reminders and Notation

If A is an m × n matrix,

    A = [ A_11  A_12  ⋯  A_1n
          A_21  A_22  ⋯  A_2n
           ⋮     ⋮    ⋱   ⋮
          A_m1  A_m2  ⋯  A_mn ]                                            (4.1)

and B is an n × p matrix,

    B = [ B_11  B_12  ⋯  B_1p
           ⋮     ⋮    ⋱   ⋮
          B_n1  B_n2  ⋯  B_np ]                                            (4.2)

then their product C = AB is an m × p matrix, as shown in Figure 1, so that C_ik = Σ_{j=1}^n A_ij B_jk.

If A is an m × n matrix, B = A^T is an n × m matrix with elements B_ij = A_ji:

    B = A^T = [ A_11  A_21  ⋯  A_m1
                A_12  A_22  ⋯  A_m2
                 ⋮     ⋮    ⋱   ⋮
                A_1n  A_2n  ⋯  A_mn ]                                      (4.4)

If v is an n-element column vector (which is an n × 1 matrix) and A is an m × n matrix, w = Av is an m-element column vector (i.e., an m × 1 matrix):

    w = Av,   w_i = Σ_{j=1}^n A_ij v_j                                     (4.5)

If u is an n-element column vector, then u^T is an n-element row vector (a 1 × n matrix):

    u^T = ( u_1  u_2  ⋯  u_n )                                             (4.6)

If u and v are n-element column vectors, u^T v is a number, known as the inner product:

    u^T v = u_1 v_1 + u_2 v_2 + ⋯ + u_n v_n = Σ_{i=1}^n u_i v_i            (4.7)

[Figure 1: Expansion of the product C = AB into its elements, equation (4.3), showing that C_ik = Σ_{j=1}^n A_ij B_jk; the full element-by-element array is not reproduced here.]

If v is an m-element column vector and w is an n-element column vector, A = vw^T is an m × n matrix,

    A = vw^T = [ v_1 w_1  v_1 w_2  ⋯  v_1 w_n
                   ⋮        ⋮      ⋱    ⋮
                 v_m w_1  v_m w_2  ⋯  v_m w_n ]                            (4.8)

so that A_ij = v_i w_j.

If M and N are n × n matrices, the determinant satisfies det(MN) = det(M) det(N).

If M is an n × n matrix (known as a square matrix), the inverse matrix M^{-1} is defined by M^{-1}M = 1_{n×n} = MM^{-1}, where 1_{n×n} is the identity matrix

    1_{n×n} = [ 1  0  ⋯  0
                0  1  ⋯  0
                ⋮  ⋮  ⋱  ⋮
                0  0  ⋯  1 ]                                               (4.9)

If M^{-1} exists, we say M is invertible.

If M is a real, symmetric n × n matrix, so that M^T = M, i.e., M_ji = M_ij, there is a set of n orthonormal eigenvectors {v_1, v_2, ..., v_n} with real eigenvalues {λ_1, λ_2, ..., λ_n}, so that Mv_i = λ_i v_i. Orthonormal means

    v_i^T v_j = δ_ij = { 0   i ≠ j
                         1   i = j                                         (4.10)

where we have introduced the Kronecker delta symbol δ_ij. The eigenvalue decomposition means

    M = Σ_{i=1}^n λ_i v_i v_i^T                                            (4.11)

The determinant is det(M) = Π_{i=1}^n λ_i. If none of the eigenvalues {λ_i} are zero, M is invertible, and the inverse matrix is

    M^{-1} = Σ_{i=1}^n (1/λ_i) v_i v_i^T                                   (4.12)

If all of the eigenvalues {λ_i} are positive, we say M is positive definite. If none of the eigenvalues {λ_i} are negative, we say M is positive semi-definite. Note that these conditions are equivalent to the more common definition: M is positive definite if v^T M v > 0 for any non-zero n-element column vector v, and positive semi-definite if v^T M v ≥ 0 for any n-element column vector v.
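A numpy sketch of the eigenvalue decomposition (4.11)-(4.12) for a small real symmetric matrix; numpy.linalg.eigh returns the orthonormal eigenvectors as the columns of a matrix, from which M and M^{-1} can be rebuilt.

```python
import numpy as np

M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])             # real and symmetric

lam, V = np.linalg.eigh(M)                  # M @ V[:, i] = lam[i] * V[:, i]
M_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(3))
M_inv = sum((1 / lam[i]) * np.outer(V[:, i], V[:, i]) for i in range(3))

print(np.allclose(M, M_rebuilt))            # True: M = sum of lambda_i v_i v_i^T
print(np.allclose(M_inv, np.linalg.inv(M))) # True: inverse from the eigenvalues
print(lam)                                  # all positive here, so M is positive definite
```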

18 The determinant is det(m) n i1 λ i If none of the eigenvalues λ i } are zero, M is invertible, and the inverse matrix is M 1 n i1 1 λ i v i v T i (41) If all of the eigenvalues λ i } are positive, we say M is positive definite If none of the eigenvalues λ i } are negative, we say M is positive semi-definite Note that these conditions are equivalent to the more common definition: M is positive definite if v T Mv > for any non-zero n-element column vector v and positive semi-definite if v T Mv for any n-element column vector v 4 Multivariate Probability Distributions Many of the definitions which we considered in detail for the case of two random variables generalize in a straightforward way to the case of n random variables It s notationally convenient to use the concept of a random vector X as in (11), so for example, the joint cdf is F X (x) P ([X 1 x 1 ] [X x ] [X n x n ]) (413) In the discrete case, the joint pmf is p X (x) P ([X 1 x 1 ] [X x ] [X n x n ]) (414) while in the continuous case the joint pdf is f X (x) n x 1 x n F X (x) lim ξ1 ξ n P ([x 1 ξ 1 < X 1 x 1 ] [x n ξ n < X n x n ]) ξ 1 ξ n (415) The probability for the random vector X to lie in some region A R n is P (X A) f X (x) dx 1 dx n (416) 41 Marginal and Conditional Distributions A One place where the multivariate case can be more complicated than the bivariate one is marginalization Given a joint distribution n random variables, you can in principle marginalize over m of them, and be left with a marginal distribution for the remaining n m variables To give a concrete example, consider three random variables X i i 1,, 3} with joint pdf 6 < x 1 < x < x 3 < 1 f X (x) (417) otherwise We can marginalize over X 3, which gives us a pdf for X 1 and X (ie, still a joint pdf); f X1 X (x 1, x ) f X1 X X 3 (x 1, x, x 3 ) dx 3 6(1 x ) < x 1 < x < 1 otherwise 1 x 6 dx 3 by similar calculations, we can marginalize over X to get x3 (418) f X1 X 3 (x 1, x 3 ) f X1 X X 3 (x 1, x, x 3 ) dx 6 dx x 1 6(x 3 x 1 ) < x 1 < x 3 < 1 otherwise (419) 18

or over X_1 to get

    f_{X_2X_3}(x_2, x_3) = ∫_{-∞}^∞ f_{X_1X_2X_3}(x_1, x_2, x_3) dx_1 = ∫_0^{x_2} 6 dx_1
                         = { 6x_2   0 < x_2 < x_3 < 1
                             0      otherwise                              (4.20)

If we want the marginal distribution for X_1, we marginalize over both X_2 and X_3:

    f_{X_1}(x_1) = ∫∫ f_{X_1X_2X_3}(x_1, x_2, x_3) dx_2 dx_3 = ∫_{-∞}^∞ f_{X_1X_2}(x_1, x_2) dx_2
                 = ∫_{x_1}^1 6(1 - x_2) dx_2 = [-3(1 - x_2)²]_{x_2=x_1}^{x_2=1} = 3(1 - x_1)²,   0 < x_1 < 1   (4.21)

of course, if we marginalize in the other order, we get the same result:

    f_{X_1}(x_1) = ∫_{-∞}^∞ f_{X_1X_3}(x_1, x_3) dx_3 = ∫_{x_1}^1 6(x_3 - x_1) dx_3
                 = [3(x_3 - x_1)²]_{x_3=x_1}^{x_3=1} = 3(1 - x_1)²,   0 < x_1 < 1   (4.22)

Similarly, we can marginalize over X_1 and X_3 to get

    f_{X_2}(x_2) = ∫_{x_2}^1 6x_2 dx_3 = 6x_2(1 - x_2),   0 < x_2 < 1      (4.23)

and

    f_{X_3}(x_3) = ∫_0^{x_3} 6x_2 dx_2 = 3x_3²,   0 < x_3 < 1              (4.24)

The various joint and marginal distributions can be combined into conditional distributions such as

    f_{13|2}(x_1, x_3|x_2) = f_{123}(x_1, x_2, x_3)/f_2(x_2) = 1/[x_2(1 - x_2)],   0 < x_1 < x_2 < x_3 < 1   (4.25)

and

    f_{1|23}(x_1|x_2, x_3) = f_{123}(x_1, x_2, x_3)/f_{23}(x_2, x_3) = 1/x_2,   0 < x_1 < x_2 < x_3 < 1   (4.26)

and

    f_{1|2}(x_1|x_2) = f_{12}(x_1, x_2)/f_2(x_2) = 1/x_2,   0 < x_1 < x_2 < 1   (4.27)
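A Monte Carlo sketch of the marginalization example (4.17)-(4.21). It relies on identifying the constant density 6 on 0 < x_1 < x_2 < x_3 < 1 with the joint density of the order statistics of three independent uniforms (an identification assumed here, not made in the notes), so sorted uniform triples are samples from it:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(size=(1_000_000, 3)), axis=1)   # each row: x1 < x2 < x3

# Histogram of the first component vs. the marginal f_{X1}(x1) = 3*(1 - x1)**2:
hist, edges = np.histogram(x[:, 0], bins=20, range=(0, 1), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - 3 * (1 - mid) ** 2)))        # small discrepancy only
```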

4.2.2 Mutual vs Pairwise Independence

The notion of independence carries over to multivariate distributions as well. We say that a set of n random variables X_1, ..., X_n are mutually independent if we can write

    f(x_1, x_2, ..., x_n) = f_1(x_1) f_2(x_2) ⋯ f_n(x_n)                   (4.28)

for all x. [footnote 3] You can show straightforwardly that this implies f_{ij}(x_i, x_j) = f_i(x_i) f_j(x_j) for each i and j, i.e., any pair of the random variables is independent. This is known as pairwise independence. However, pairwise independence doesn't imply mutual independence of the whole set. There is a simple counterexample from S. N. Bernstein. [footnote 4] Suppose you have a tetrahedron (fair four-sided die) on which one face is painted all black, one all blue, one all red, and one painted with all three colors. Throw it and note the color or colors of the face it lands on. Define X_1 to be 1 if the face landed on is at least partly black, 0 if it isn't; X_2 to be 1 if the face is at least partly blue and 0 if not; and X_3 to be 1 if the face is part or all red, 0 if not. Any two of these random variables are independent. For example, of the four faces, one is all black, one is all blue, one is both black and blue, and one is neither black nor blue, so the joint pmf for X_1 and X_2 is

                        x_2
    p_{12}(x_1, x_2)    0     1   | p_1(x_1)
    x_1    0           1/4   1/4  |   1/2
           1           1/4   1/4  |   1/2
        p_2(x_2)       1/2   1/2  |

On the other hand, it is clear that p_{123}(x_1, x_2, x_3) ≠ p_1(x_1) p_2(x_2) p_3(x_3), since the right-hand side would be 1/8 for each combination of x_1 ∈ {0, 1}, x_2 ∈ {0, 1}, x_3 ∈ {0, 1}, rather than the true pmf, which is

    p_{123}(x_1, x_2, x_3) = { 1/4   x_1 = 0, x_2 = 0, x_3 = 1
                               1/4   x_1 = 0, x_2 = 1, x_3 = 0
                               1/4   x_1 = 1, x_2 = 0, x_3 = 0
                               1/4   x_1 = 1, x_2 = 1, x_3 = 1
                               0     otherwise                             (4.29)

Footnote 3: Except possibly at some points whose total probability is zero.

Footnote 4: Hogg doesn't actually give the Bernstein reference, just says this is attributed to him. The original reference is to a book from 1946 which is in Russian. But there is a more recent summary in Stępniak, The College Mathematics Journal 38 (2007), which is available at www.jstor.org/stable/ if you're on campus and jstor.org.ezproxy.rit.edu/stable/ if you're not.
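Bernstein's tetrahedron example is easy to verify by enumeration; the sketch below checks that each pair of indicators is independent while the triple pmf is not the product of the three marginals.

```python
from itertools import combinations

faces = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]    # black, blue, red, all three
p_face = 1 / 4

def marg(idx):
    """Marginal pmf of the indicator(s) with the given indices."""
    out = {}
    for f in faces:
        key = tuple(f[i] for i in idx)
        out[key] = out.get(key, 0.0) + p_face
    return out

for i, j in combinations(range(3), 2):
    pij, pi, pj = marg((i, j)), marg((i,)), marg((j,))
    ok = all(abs(pij.get((a, b), 0) - pi[(a,)] * pj[(b,)]) < 1e-12
             for a in (0, 1) for b in (0, 1))
    print(f"X{i+1}, X{j+1} pairwise independent: {ok}")

p123 = marg((0, 1, 2))
prod = {(a, b, c): marg((0,))[(a,)] * marg((1,))[(b,)] * marg((2,))[(c,)]
        for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print("mutually independent:", p123 == prod)             # False: p_123 has only four atoms
```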

Thursday 24 September 2015
Read Sections 2.7-2.8 of Hogg

Note: this week's material will not be included on Prelim Exam One.

4.3 The Variance-Covariance Matrix

Recall that the covariance of random variables X_i and X_j with means μ_i = E(X_i) and μ_j = E(X_j) is

    Cov(X_i, X_j) = E([X_i - μ_i][X_j - μ_j])                              (4.30)

In particular, the covariance of one of the rvs with itself is its variance:

    Cov(X_i, X_i) = E([X_i - μ_i]²) = Var(X_i)                             (4.31)

We can define a matrix Σ whose elements [footnote 5]

    σ_ij = Cov(X_i, X_j)                                                   (4.32)

are the covariances between the various rvs, with the diagonal elements being equal to the variances. We can write this in matrix notation by first defining

    μ = E(X)                                                               (4.33)

i.e., a column vector whose ith element is μ_i = E(X_i). Then X - μ is a column vector whose ith element is X_i - μ_i. Looking at (4.30) and (4.32) we see that the elements of the matrix Σ are the expectation values of the elements of the matrix whose i,j element is [X_i - μ_i][X_j - μ_j]. This matrix is

    [X - μ][X - μ]^T                                                       (4.34)

i.e., the outer product of the random vector X - μ with itself, whose elements are spelled out in (4.35) in Figure 2. (Note that this is a random matrix, a straightforward generalization of a random vector.)

[Figure 2: Elements of the random matrix [X - μ][X - μ]^T, equation (4.35): the i,j element is [X_i - μ_i][X_j - μ_j], with the squares [X_i - μ_i]² on the diagonal; the full array is not reproduced here.]

Thus the variance-covariance matrix, which we write as Cov(X), is

    Cov(X) = Σ = E([X - μ][X - μ]^T)                                       (4.36)

Note that Cov(X) must be a positive semi-definite matrix. This can be shown by considering

    v^T Cov(X) v = E(v^T [X - μ][X - μ]^T v)                               (4.37)

Now, the inner product v^T[X - μ] is a single random variable (or, if you like, a 1 × 1 matrix), and so if we call that random variable Y, the fact that the inner product is symmetric tells us that

    [X - μ]^T v = Y = v^T[X - μ]                                           (4.38)

and so

    v^T Cov(X) v = E(Y²) ≥ 0                                               (4.39)

Footnote 5: This is a really unfortunate notational choice by Hogg, if you ask me, since the variance of a single random variable is written σ², not σ, and e.g., we're defining σ_ii = (σ_i)². Among other things, if the random vector X has physical units, σ_i and σ_ij have different units.
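A numpy sketch of (4.36)-(4.39): estimate the variance-covariance matrix of a made-up, purely illustrative correlated random vector from samples, and check that its eigenvalues are non-negative and that v^T Cov(X) v equals Var(v^T X).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
z = rng.standard_normal((n, 3))
X = z @ np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 1.0]])          # a correlated 3-component random vector (illustrative)

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / n            # sample estimate of E([X - mu][X - mu]^T)

print(np.linalg.eigvalsh(Sigma))             # all eigenvalues >= 0: positive semi-definite
v = np.array([1.0, -2.0, 0.5])
print(v @ Sigma @ v, np.var(X @ v))          # the two agree: v^T Sigma v = Var(v^T X)
```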

4.4 Transformations of Several RVs

Consider the case where we have n random variables X_1, X_2, ..., X_n with a joint pdf f_X(x) and wish to obtain the joint pdf f_Y(y) for n functions Y_1 = u_1(X), Y_2 = u_2(X), ..., Y_n = u_n(X) of those random variables. (We can summarize this as Y = u(X).) For simplicity, we assume the transformation is invertible, so that we can write X_1 = w_1(Y), X_2 = w_2(Y), ..., X_n = w_n(Y), or equivalently X = w(Y). (See Hogg for a discussion of the case where the transformation isn't single-valued.) We refer to the sample space for X as S, and the transformation of that space (the sample space for Y) as T. Consider a subset A ⊂ S whose transformation is B ⊂ T. Then X ∈ A is the same event as Y ∈ B, so

    ∫⋯∫_A f_X(x) dx_1 ⋯ dx_n = P(X ∈ A) = P(Y ∈ B) = ∫⋯∫_B f_Y(y) dy_1 ⋯ dy_n   (4.40)

We can change variables in the first form of the integral, with the result that A is transformed into B, and the volume element becomes

    dx_1 ⋯ dx_n = |det(∂x/∂y)| dy_1 ⋯ dy_n                                 (4.41)

where ∂x/∂y is the Jacobian matrix

    ∂x/∂y = [ ∂x_1/∂y_1  ⋯  ∂x_1/∂y_n
                  ⋮      ⋱      ⋮
              ∂x_n/∂y_1  ⋯  ∂x_n/∂y_n ]                                    (4.42)

so we can write

    ∫_A f_X(x) dⁿx = ∫_B f_X(w(y)) |det(∂x/∂y)| dⁿy = ∫_B f_Y(y) dⁿy       (4.43)

In order for this equality to hold for any region A, the integrands must be equal everywhere, i.e.,

    f_Y(y) = f_X(w(y)) |det(∂x/∂y)|                                        (4.44)

4.5 Example #1

Consider the joint pdf [footnote 6]

    f_X(x_1, x_2, x_3) = { 120 x_1 x_2   0 < x_1, x_2, x_3,  x_1 + x_2 + x_3 < 1
                           0             otherwise                         (4.45)

Footnote 6: Incidentally, this is a real example, a Dirichlet distribution with parameters {2, 2, 1, 1}.

and define the transformation

    Y_1 = X_1/(X_1 + X_2 + X_3)                                            (4.46a)
    Y_2 = X_2/(X_1 + X_2 + X_3)                                            (4.46b)
    Y_3 = X_1 + X_2 + X_3                                                  (4.46c)

which has the inverse transformation

    X_1 = Y_1 Y_3                                                          (4.47a)
    X_2 = Y_2 Y_3                                                          (4.47b)
    X_3 = (1 - Y_1 - Y_2) Y_3                                              (4.47c)

The Jacobian matrix of the transformation is

    ∂x/∂y = [  y_3    0     y_1
                0    y_3    y_2
              -y_3  -y_3   1 - y_1 - y_2 ]                                 (4.48)

with determinant

    det ∂x/∂y = y_3²(1 - y_1 - y_2) - (-y_3² y_1 - y_3² y_2) = y_3²        (4.49)

The sample space for X is

    S = {0 < x_1, 0 < x_2, 0 < x_3, x_1 + x_2 + x_3 < 1}                   (4.50)

so we need to find the corresponding support space for Y:

    T = {0 < y_1 y_3, 0 < y_2 y_3, 0 < (1 - y_1 - y_2) y_3, y_3 < 1}       (4.51)

To see how we can satisfy this, first note that the first three conditions imply 0 < y_3. This is because, in order to satisfy the first two, we have to have y_1, y_2 and y_3 all positive or all negative. But if they are all negative then (1 - y_1 - y_2)y_3 cannot be positive.

Since we thus know 0 < y_3, the first two conditions imply 0 < y_1 and 0 < y_2. The third condition then implies 0 < 1 - y_1 - y_2, i.e., y_1 + y_2 < 1. The necessary and sufficient conditions to describe the transformed sample space are thus

    T = {0 < y_1, 0 < y_2, y_1 + y_2 < 1, 0 < y_3 < 1}                     (4.52)

Putting it all together, we have a pdf of

    f_Y(y_1, y_2, y_3) = f_X(y_1y_3, y_2y_3, [1 - y_1 - y_2]y_3) y_3²
                       = { 120 y_1 y_2 y_3⁴   0 < y_1, 0 < y_2, y_1 + y_2 < 1, 0 < y_3 < 1
                           0                  otherwise                    (4.53)

4.6 Example #2 (fewer Y's than X's)

It's also possible to consider transformations between different numbers of random variables. I.e., we can start with f_{X_1⋯X_n}(x_1, ..., x_n) and use it to obtain f_{Y_1⋯Y_m}(y_1, ..., y_m) given transformations Y_j = u_j(X_1, ..., X_n) even when m ≠ n. If m > n it's a bit tricky, since the distribution for the {Y_j} is degenerate. But if m < n, it's relatively straightforward. You considered the case where n = 2 and m = 1 by a different approach, using the joint pdf f_{X_1X_2}(x_1, x_2) to obtain F_Y(y) = P(u(X_1, X_2) ≤ y) and differentiating, but we consider a more general technique here. We can make up an additional n - m random variables {Y_{m+1}, ..., Y_n}, use the expanded transformation to obtain f_{Y_1⋯Y_n}(y_1, ..., y_n), and then integrate out (marginalize over) these n - m variables to obtain the marginal pdf f_{Y_1⋯Y_m}(y_1, ..., y_m). Any set of additional functions {u_{m+1}(x_1, ..., x_n), ..., u_n(x_1, ..., x_n)} will work, as long as the full transformation we end up with is invertible.

As an example, return to the pdf (4.45), but consider the transformation

    Y_1 = X_1/(X_1 + X_2 + X_3)                                            (4.54a)
    Y_2 = X_2/(X_1 + X_2 + X_3)                                            (4.54b)

and obtain the joint pdf f_{Y_1Y_2}(y_1, y_2). Well, in this case we already know a convenient choice, Y_3 = X_1 + X_2 + X_3. We have thus already done most of the problem, and just need to marginalize (4.53) over Y_3 to obtain the pdf [footnote 7]

    f_{Y_1Y_2}(y_1, y_2) = ∫_{-∞}^∞ f_{Y_1Y_2Y_3}(y_1, y_2, y_3) dy_3
                         = { ∫_0^1 120 y_1 y_2 y_3⁴ dy_3 = 24 y_1 y_2   0 < y_1, 0 < y_2, y_1 + y_2 < 1
                             0                                           otherwise   (4.55)

Footnote 7: This turns out to be a Dirichlet distribution as well, this time with parameters {2, 2, 1}.

Tuesday 29 September 2015
Review for Prelim Exam One
The exam covers materials from the first four weeks of the term, i.e., Hogg sections … and 2.1-2.5, and problem sets 1-4.

Thursday 1 October 2015
First Prelim Exam
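Returning to Examples #1 and #2 above, here is a closing Monte Carlo sketch: footnote 6 identifies (4.45) as a Dirichlet distribution with parameters {2, 2, 1, 1}, so numpy's Dirichlet sampler can generate (X_1, X_2, X_3); forming Y_1 and Y_2 and comparing simple moments with the values implied by the density 24 y_1 y_2 of (4.55) gives a check of the transformation result.

```python
import numpy as np

rng = np.random.default_rng(5)
d = rng.dirichlet([2, 2, 1, 1], size=1_000_000)
x1, x2, x3 = d[:, 0], d[:, 1], d[:, 2]     # (x1, x2, x3) has pdf 120 x1 x2 on the simplex

s = x1 + x2 + x3
y1, y2 = x1 / s, x2 / s

# Under f(y1, y2) = 24 y1 y2 on y1, y2 > 0, y1 + y2 < 1, the Dirichlet integral gives
# E[Y1] = 24 * Gamma(3)Gamma(2)Gamma(1)/Gamma(6) = 0.4 and
# E[Y1 Y2] = 24 * Gamma(3)Gamma(3)Gamma(1)/Gamma(7) = 2/15.
print(y1.mean(), 0.4)
print(np.mean(y1 * y2), 2 / 15)
```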

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

More on Distribution Function

More on Distribution Function More on Distribution Function The distribution of a random variable X can be determined directly from its cumulative distribution function F X. Theorem: Let X be any random variable, with cumulative distribution

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

Joint Distribution of Two or More Random Variables

Joint Distribution of Two or More Random Variables Joint Distribution of Two or More Random Variables Sometimes more than one measurement in the form of random variable is taken on each member of the sample space. In cases like this there will be a few

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Some Special Distributions (Hogg Chapter Three)

Some Special Distributions (Hogg Chapter Three) Some Special Distributions Hogg Chapter Three STAT 45-: Mathematical Statistics I Fall Semester 3 Contents Binomial & Related Distributions The Binomial Distribution Some advice on calculating n 3 k Properties

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Lecture 3: Random variables, distributions, and transformations

Lecture 3: Random variables, distributions, and transformations Lecture 3: Random variables, distributions, and transformations Definition 1.4.1. A random variable X is a function from S into a subset of R such that for any Borel set B R {X B} = {ω S : X(ω) B} is an

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Probability and Distributions (Hogg Chapter One)

Probability and Distributions (Hogg Chapter One) Probability and Distributions (Hogg Chapter One) STAT 405-0: Mathematical Statistics I Fall Semester 05 Contents 0 Preliminaries 0. Administrata.................... 0. Outline........................ Review

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

What is a random variable

What is a random variable OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion IEOR 471: Stochastic Models in Financial Engineering Summer 27, Professor Whitt SOLUTIONS to Homework Assignment 9: Brownian motion In Ross, read Sections 1.1-1.3 and 1.6. (The total required reading there

More information

Probability theory. References:

Probability theory. References: Reasoning Under Uncertainty References: Probability theory Mathematical methods in artificial intelligence, Bender, Chapter 7. Expert systems: Principles and programming, g, Giarratano and Riley, pag.

More information

Bivariate distributions

Bivariate distributions Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

REVIEW OF MAIN CONCEPTS AND FORMULAS A B = Ā B. Pr(A B C) = Pr(A) Pr(A B C) =Pr(A) Pr(B A) Pr(C A B)

REVIEW OF MAIN CONCEPTS AND FORMULAS A B = Ā B. Pr(A B C) = Pr(A) Pr(A B C) =Pr(A) Pr(B A) Pr(C A B) REVIEW OF MAIN CONCEPTS AND FORMULAS Boolean algebra of events (subsets of a sample space) DeMorgan s formula: A B = Ā B A B = Ā B The notion of conditional probability, and of mutual independence of two

More information

Notes on Mathematics Groups

Notes on Mathematics Groups EPGY Singapore Quantum Mechanics: 2007 Notes on Mathematics Groups A group, G, is defined is a set of elements G and a binary operation on G; one of the elements of G has particularly special properties

More information

Random Variables. Statistics 110. Summer Copyright c 2006 by Mark E. Irwin

Random Variables. Statistics 110. Summer Copyright c 2006 by Mark E. Irwin Random Variables Statistics 110 Summer 2006 Copyright c 2006 by Mark E. Irwin Random Variables A Random Variable (RV) is a response of a random phenomenon which is numeric. Examples: 1. Roll a die twice

More information

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Some Special Distributions (Hogg Chapter Three)

Some Special Distributions (Hogg Chapter Three) Some Special Distributions Hogg Chapter Three STAT 45-: Mathematical Statistics I Fall Semester 5 Contents Binomial & Related Distributions. The Binomial Distribution............... Some advice on calculating

More information

Continuous Random Variables

Continuous Random Variables 1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables

More information

Stochastic processes Lecture 1: Multiple Random Variables Ch. 5

Stochastic processes Lecture 1: Multiple Random Variables Ch. 5 Stochastic processes Lecture : Multiple Random Variables Ch. 5 Dr. Ir. Richard C. Hendriks 26/04/8 Delft University of Technology Challenge the future Organization Plenary Lectures Book: R.D. Yates and

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Sometimes the domains X and Z will be the same, so this might be written:

Sometimes the domains X and Z will be the same, so this might be written: II. MULTIVARIATE CALCULUS The first lecture covered functions where a single input goes in, and a single output comes out. Most economic applications aren t so simple. In most cases, a number of variables

More information

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019 Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

Probability (Devore Chapter Two)

Probability (Devore Chapter Two) Probability (Devore Chapter Two) 1016-345-01: Probability and Statistics for Engineers Fall 2012 Contents 0 Administrata 2 0.1 Outline....................................... 3 1 Axiomatic Probability 3

More information

BASICS OF PROBABILITY

BASICS OF PROBABILITY October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,

More information

Jointly Distributed Random Variables

Jointly Distributed Random Variables Jointly Distributed Random Variables CE 311S What if there is more than one random variable we are interested in? How should you invest the extra money from your summer internship? To simplify matters,

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

CME 106: Review Probability theory

CME 106: Review Probability theory : Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:

More information

Topic 15 Notes Jeremy Orloff

Topic 15 Notes Jeremy Orloff Topic 5 Notes Jeremy Orloff 5 Transpose, Inverse, Determinant 5. Goals. Know the definition and be able to compute the inverse of any square matrix using row operations. 2. Know the properties of inverses.

More information

Probability Theory Review Reading Assignments

Probability Theory Review Reading Assignments Probability Theory Review Reading Assignments R. Duda, P. Hart, and D. Stork, Pattern Classification, John-Wiley, 2nd edition, 2001 (appendix A.4, hard-copy). "Everything I need to know about Probability"

More information

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3 Probability Paul Schrimpf January 23, 2018 Contents 1 Definitions 2 2 Properties 3 3 Random variables 4 3.1 Discrete........................................... 4 3.2 Continuous.........................................

More information

X = X X n, + X 2

X = X X n, + X 2 CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 22 Variance Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk

More information

Chapter 4. Multivariate Distributions. Obviously, the marginal distributions may be obtained easily from the joint distribution:

Chapter 4. Multivariate Distributions. Obviously, the marginal distributions may be obtained easily from the joint distribution: 4.1 Bivariate Distributions. Chapter 4. Multivariate Distributions For a pair r.v.s (X,Y ), the Joint CDF is defined as F X,Y (x, y ) = P (X x,y y ). Obviously, the marginal distributions may be obtained

More information

Multivariate Random Variable

Multivariate Random Variable Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate

More information

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

Notes on Random Variables, Expectations, Probability Densities, and Martingales

Notes on Random Variables, Expectations, Probability Densities, and Martingales Eco 315.2 Spring 2006 C.Sims Notes on Random Variables, Expectations, Probability Densities, and Martingales Includes Exercise Due Tuesday, April 4. For many or most of you, parts of these notes will be

More information

Chapter 3: Random Variables 1

Chapter 3: Random Variables 1 Chapter 3: Random Variables 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.

More information

Steve Smith Tuition: Maths Notes

Steve Smith Tuition: Maths Notes Maths Notes : Discrete Random Variables Version. Steve Smith Tuition: Maths Notes e iπ + = 0 a + b = c z n+ = z n + c V E + F = Discrete Random Variables Contents Intro The Distribution of Probabilities

More information

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise. 54 We are given the marginal pdfs of Y and Y You should note that Y gamma(4, Y exponential( E(Y = 4, V (Y = 4, E(Y =, and V (Y = 4 (a With U = Y Y, we have E(U = E(Y Y = E(Y E(Y = 4 = (b Because Y and

More information

Review: mostly probability and some statistics

Review: mostly probability and some statistics Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random

More information

Mathematics 426 Robert Gross Homework 9 Answers

Mathematics 426 Robert Gross Homework 9 Answers Mathematics 4 Robert Gross Homework 9 Answers. Suppose that X is a normal random variable with mean µ and standard deviation σ. Suppose that PX > 9 PX

More information

1 Proof techniques. CS 224W Linear Algebra, Probability, and Proof Techniques

1 Proof techniques. CS 224W Linear Algebra, Probability, and Proof Techniques 1 Proof techniques Here we will learn to prove universal mathematical statements, like the square of any odd number is odd. It s easy enough to show that this is true in specific cases for example, 3 2

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An

More information

Getting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1

Getting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1 1 Rows first, columns second. Remember that. R then C. 1 A matrix is a set of real or complex numbers arranged in a rectangular array. They can be any size and shape (provided they are rectangular). A

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

Lecture 10. (2) Functions of two variables. Partial derivatives. Dan Nichols February 27, 2018

Lecture 10. (2) Functions of two variables. Partial derivatives. Dan Nichols February 27, 2018 Lecture 10 Partial derivatives Dan Nichols nichols@math.umass.edu MATH 233, Spring 2018 University of Massachusetts February 27, 2018 Last time: functions of two variables f(x, y) x and y are the independent

More information

This exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text.

This exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text. TEST #3 STA 5326 December 4, 214 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. (You will have access to

More information

Probability: Terminology and Examples Class 2, Jeremy Orloff and Jonathan Bloom

Probability: Terminology and Examples Class 2, Jeremy Orloff and Jonathan Bloom 1 Learning Goals Probability: Terminology and Examples Class 2, 18.05 Jeremy Orloff and Jonathan Bloom 1. Know the definitions of sample space, event and probability function. 2. Be able to organize a

More information

1.1 Limits and Continuity. Precise definition of a limit and limit laws. Squeeze Theorem. Intermediate Value Theorem. Extreme Value Theorem.

1.1 Limits and Continuity. Precise definition of a limit and limit laws. Squeeze Theorem. Intermediate Value Theorem. Extreme Value Theorem. STATE EXAM MATHEMATICS Variant A ANSWERS AND SOLUTIONS 1 1.1 Limits and Continuity. Precise definition of a limit and limit laws. Squeeze Theorem. Intermediate Value Theorem. Extreme Value Theorem. Definition

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

Topics in Probability and Statistics

Topics in Probability and Statistics Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X

More information

PROBABILITY THEORY REVIEW

PROBABILITY THEORY REVIEW PROBABILITY THEORY REVIEW CMPUT 466/551 Martha White Fall, 2017 REMINDERS Assignment 1 is due on September 28 Thought questions 1 are due on September 21 Chapters 1-4, about 40 pages If you are printing,

More information

Elliptically Contoured Distributions

Elliptically Contoured Distributions Elliptically Contoured Distributions Recall: if X N p µ, Σ), then { 1 f X x) = exp 1 } det πσ x µ) Σ 1 x µ) So f X x) depends on x only through x µ) Σ 1 x µ), and is therefore constant on the ellipsoidal

More information

Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008

Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 1 Review We saw some basic metrics that helped us characterize

More information

How to Use Calculus Like a Physicist

How to Use Calculus Like a Physicist How to Use Calculus Like a Physicist Physics A300 Fall 2004 The purpose of these notes is to make contact between the abstract descriptions you may have seen in your calculus classes and the applications

More information

the time it takes until a radioactive substance undergoes a decay

the time it takes until a radioactive substance undergoes a decay 1 Probabilities 1.1 Experiments with randomness Wewillusethetermexperimentinaverygeneralwaytorefertosomeprocess that produces a random outcome. Examples: (Ask class for some first) Here are some discrete

More information

Stat 206: Sampling theory, sample moments, mahalanobis

Stat 206: Sampling theory, sample moments, mahalanobis Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Notation My notation is different from the book s. This is partly because

More information

Why should you care?? Intellectual curiosity. Gambling. Mathematically the same as the ESP decision problem we discussed in Week 4.

Why should you care?? Intellectual curiosity. Gambling. Mathematically the same as the ESP decision problem we discussed in Week 4. I. Probability basics (Sections 4.1 and 4.2) Flip a fair (probability of HEADS is 1/2) coin ten times. What is the probability of getting exactly 5 HEADS? What is the probability of getting exactly 10

More information

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.] Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the

More information

5. Random Vectors. probabilities. characteristic function. cross correlation, cross covariance. Gaussian random vectors. functions of random vectors

5. Random Vectors. probabilities. characteristic function. cross correlation, cross covariance. Gaussian random vectors. functions of random vectors EE401 (Semester 1) 5. Random Vectors Jitkomut Songsiri probabilities characteristic function cross correlation, cross covariance Gaussian random vectors functions of random vectors 5-1 Random vectors we

More information

2 (Statistics) Random variables

2 (Statistics) Random variables 2 (Statistics) Random variables References: DeGroot and Schervish, chapters 3, 4 and 5; Stirzaker, chapters 4, 5 and 6 We will now study the main tools use for modeling experiments with unknown outcomes

More information

Discrete Mathematics and Probability Theory Fall 2014 Anant Sahai Note 15. Random Variables: Distributions, Independence, and Expectations

Discrete Mathematics and Probability Theory Fall 2014 Anant Sahai Note 15. Random Variables: Distributions, Independence, and Expectations EECS 70 Discrete Mathematics and Probability Theory Fall 204 Anant Sahai Note 5 Random Variables: Distributions, Independence, and Expectations In the last note, we saw how useful it is to have a way of

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Basics on Probability. Jingrui He 09/11/2007

Basics on Probability. Jingrui He 09/11/2007 Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability

More information

Introduction to Stochastic Processes

Introduction to Stochastic Processes Stat251/551 (Spring 2017) Stochastic Processes Lecture: 1 Introduction to Stochastic Processes Lecturer: Sahand Negahban Scribe: Sahand Negahban 1 Organization Issues We will use canvas as the course webpage.

More information

REVIEW OF DIFFERENTIAL CALCULUS

REVIEW OF DIFFERENTIAL CALCULUS REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Bell-shaped curves, variance

Bell-shaped curves, variance November 7, 2017 Pop-in lunch on Wednesday Pop-in lunch tomorrow, November 8, at high noon. Please join our group at the Faculty Club for lunch. Means If X is a random variable with PDF equal to f (x),

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

Appendix A : Introduction to Probability and stochastic processes

Appendix A : Introduction to Probability and stochastic processes A-1 Mathematical methods in communication July 5th, 2009 Appendix A : Introduction to Probability and stochastic processes Lecturer: Haim Permuter Scribe: Shai Shapira and Uri Livnat The probability of

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES VARIABLE Studying the behavior of random variables, and more importantly functions of random variables is essential for both the

More information

3 Multiple Discrete Random Variables

3 Multiple Discrete Random Variables 3 Multiple Discrete Random Variables 3.1 Joint densities Suppose we have a probability space (Ω, F,P) and now we have two discrete random variables X and Y on it. They have probability mass functions f

More information

Stat 5101 Notes: Algorithms (thru 2nd midterm)

Stat 5101 Notes: Algorithms (thru 2nd midterm) Stat 5101 Notes: Algorithms (thru 2nd midterm) Charles J. Geyer October 18, 2012 Contents 1 Calculating an Expectation or a Probability 2 1.1 From a PMF........................... 2 1.2 From a PDF...........................

More information

7 Random samples and sampling distributions

7 Random samples and sampling distributions 7 Random samples and sampling distributions 7.1 Introduction - random samples We will use the term experiment in a very general way to refer to some process, procedure or natural phenomena that produces

More information

MATH Solutions to Probability Exercises

MATH Solutions to Probability Exercises MATH 5 9 MATH 5 9 Problem. Suppose we flip a fair coin once and observe either T for tails or H for heads. Let X denote the random variable that equals when we observe tails and equals when we observe

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance Covariance Lecture 0: Covariance / Correlation & General Bivariate Normal Sta30 / Mth 30 We have previously discussed Covariance in relation to the variance of the sum of two random variables Review Lecture

More information

L2: Review of probability and statistics

L2: Review of probability and statistics Probability L2: Review of probability and statistics Definition of probability Axioms and properties Conditional probability Bayes theorem Random variables Definition of a random variable Cumulative distribution

More information

Section 8.1. Vector Notation

Section 8.1. Vector Notation Section 8.1 Vector Notation Definition 8.1 Random Vector A random vector is a column vector X = [ X 1 ]. X n Each Xi is a random variable. Definition 8.2 Vector Sample Value A sample value of a random

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Discrete Probability Refresher

Discrete Probability Refresher ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory

More information

Lecture 13: Simple Linear Regression in Matrix Format

Lecture 13: Simple Linear Regression in Matrix Format See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/ Lecture 13: Simple Linear Regression in Matrix Format 36-401, Section B, Fall 2015 13 October 2015 Contents 1 Least Squares in Matrix

More information

1.1 Review of Probability Theory

1.1 Review of Probability Theory 1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,

More information

2.23 Theorem. Let A and B be sets in a metric space. If A B, then L(A) L(B).

2.23 Theorem. Let A and B be sets in a metric space. If A B, then L(A) L(B). 2.23 Theorem. Let A and B be sets in a metric space. If A B, then L(A) L(B). 2.24 Theorem. Let A and B be sets in a metric space. Then L(A B) = L(A) L(B). It is worth noting that you can t replace union

More information