2 Conditioning


1 Conditional Distributions

Let A and B be events, and suppose that P(B) > 0. We recall from Section 3 of the Introduction that the conditional probability of A given B is defined as P(A | B) = P(A ∩ B)/P(B) and that P(A | B) = P(A) if A and B are independent. Now, let (X, Y) be a two-dimensional random variable whose components are discrete.

Example 1.1. A symmetric die is thrown twice. Let U₁ be a random variable denoting the number of dots on the first throw, let U₂ be a random variable denoting the number of dots on the second throw, and set X = U₁ + U₂ and Y = min{U₁, U₂}. Suppose we wish to find the distribution of Y for some given value of X, for example, P(Y = 2 | X = 7). Set A = {Y = 2} and B = {X = 7}. From the definition of conditional probabilities we obtain

P(Y = 2 | X = 7) = P(A | B) = P(A ∩ B)/P(B) = (2/36)/(6/36) = 1/3.

With this method one may compute P(Y = y | X = x) for any fixed value of x as y varies for arbitrary, discrete, jointly distributed random variables. This leads to the following definition.

Definition 1.1. Let X and Y be discrete, jointly distributed random variables. For P(X = x) > 0 the conditional probability function of Y given that X = x is

p_{Y|X=x}(y) = P(Y = y | X = x) = p_{X,Y}(x, y)/p_X(x),

and the conditional distribution function of Y given that X = x is

F_{Y|X=x}(y) = Σ_{z ≤ y} p_{Y|X=x}(z).
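Since the sample space in Example 1.1 is finite, Definition 1.1 can be checked by brute-force enumeration of the 36 equally likely outcomes. The following is a minimal Python sketch (not from the text; the variable names are mine), using exact fractions:

```python
from fractions import Fraction
from collections import defaultdict

# Enumerate the 36 equally likely outcomes of two throws of a symmetric die.
joint = defaultdict(Fraction)            # joint[(x, y)] = P(X = x, Y = y)
for u1 in range(1, 7):
    for u2 in range(1, 7):
        x, y = u1 + u2, min(u1, u2)
        joint[(x, y)] += Fraction(1, 36)

# Conditional probability function p_{Y|X=7}(y) = p_{X,Y}(7, y) / p_X(7).
px7 = sum(p for (x, _), p in joint.items() if x == 7)
for y in range(1, 7):
    pxy = joint[(7, y)]
    if pxy:
        print(y, pxy / px7)              # prints 1/3 for y = 1, 2, 3
```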

Exercise 1.1. Show that p_{Y|X=x}(y) is a probability function of a true probability distribution.

It follows immediately (please check) that

p_{Y|X=x}(y) = p_{X,Y}(x, y)/p_X(x) = p_{X,Y}(x, y) / Σ_z p_{X,Y}(x, z)

and that

F_{Y|X=x}(y) = Σ_{z ≤ y} p_{X,Y}(x, z) / p_X(x) = Σ_{z ≤ y} p_{X,Y}(x, z) / Σ_z p_{X,Y}(x, z).

Exercise 1.2. Compute the conditional probability function p_{Y|X=x}(y) and the conditional distribution function F_{Y|X=x}(y) in Example 1.1.

Now let X and Y have a joint continuous distribution. Expressions like P(Y = y | X = x) have no meaning in this case, since the probability that a fixed value is assumed equals zero. However, an examination of how the preceding conditional probabilities are computed makes the following definition very natural.

Definition 1.2. Let X and Y have a joint continuous distribution. For f_X(x) > 0, the conditional density function of Y given that X = x is

f_{Y|X=x}(y) = f_{X,Y}(x, y)/f_X(x),

and the conditional distribution function of Y given that X = x is

F_{Y|X=x}(y) = ∫_{−∞}^{y} f_{Y|X=x}(z) dz.

In analogy with the discrete case, we further have

f_{Y|X=x}(y) = f_{X,Y}(x, y) / ∫_{−∞}^{∞} f_{X,Y}(x, z) dz

and

F_{Y|X=x}(y) = ∫_{−∞}^{y} f_{X,Y}(x, z) dz / ∫_{−∞}^{∞} f_{X,Y}(x, z) dz.

Exercise 1.3. Show that f_{Y|X=x}(y) is a density function of a true probability distribution.

Exercise 1.4. Find the conditional distribution of Y given that X = x in Example and Exercise .

Exercise 1.5. Prove that if X and Y are independent then the conditional distributions and the unconditional distributions are the same. Explain why this is reasonable.

Remark 1.1. Definitions 1.1 and 1.2 can be extended to situations with more than two random variables. How?

2 Conditional Expectation and Conditional Variance

In the same vein as the concepts of expected value and variance are introduced as convenient location and dispersion measures for (ordinary) random variables or distributions, it is natural to introduce analogs to these concepts for conditional distributions. The following example shows how such notions enter naturally.

Example 2.1. A stick of length one is broken at a random point, uniformly distributed over the stick. The remaining piece is broken once more. Find the expected value and variance of the piece that now remains.

In order to solve this problem we let X ∼ U(0, 1) be the first remaining piece. The second remaining piece Y is uniformly distributed on the interval (0, X). This is to be interpreted as follows: Given that X = x, the random variable Y is uniformly distributed on the interval (0, x): Y | X = x ∼ U(0, x), that is, f_{Y|X=x}(y) = 1/x for 0 < y < x and = 0 otherwise. Clearly, E X = 1/2 and Var X = 1/12. Furthermore, intuition suggests that

E(Y | X = x) = x/2  and  Var(Y | X = x) = x²/12.   (2.1)

We wish to determine E Y and Var Y somehow with the aid of the preceding relations. We are now ready to state our first definition.

Definition 2.1. Let X and Y be jointly distributed random variables. The conditional expectation of Y given that X = x is

E(Y | X = x) = Σ_y y · p_{Y|X=x}(y) in the discrete case,
E(Y | X = x) = ∫_{−∞}^{∞} y · f_{Y|X=x}(y) dy in the continuous case,

provided the relevant sum or integral is absolutely convergent.

Exercise 2.1. Let X, Y, Y₁, and Y₂ be random variables, let g be a function, and c a constant. Show that
(a) E(c | X = x) = c,
(b) E(Y₁ + Y₂ | X = x) = E(Y₁ | X = x) + E(Y₂ | X = x),
(c) E(cY | X = x) = c E(Y | X = x),
(d) E(g(X, Y) | X = x) = E(g(x, Y) | X = x),
(e) E(Y | X = x) = E Y if X and Y are independent.

The conditional distribution of Y given that X = x depends on the value of x (unless X and Y are independent). This implies that the conditional expectation E(Y | X = x) is a function of x, that is,

E(Y | X = x) = h(x)   (2.2)

for some function h. (If X and Y are independent, then check that h(x) = E Y, a constant.) An object of considerable interest and importance is the random variable h(X), which we denote by

h(X) = E(Y | X).   (2.3)

This random variable is of interest not only in the context of probability theory (as we shall see later) but also in statistics in connection with estimation. Loosely speaking, it turns out that if Y is a good estimator and X is suitably chosen, then E(Y | X) is a better estimator. Technically, given a so-called unbiased estimator U of a parameter θ, it is possible to construct another unbiased estimator V by considering the conditional expectation of U with respect to what is called a sufficient statistic T; that is, V = E(U | T). The point is that E U = E V = θ (unbiasedness) and that Var V ≤ Var U (this follows essentially from the sufficiency and Theorem 2.3 ahead). For details, we refer to the statistics literature provided in Appendix A.

A natural question at this point is: What is the expected value of the random variable E(Y | X)?

Theorem 2.1. Suppose that E|Y| < ∞. Then

E(E(Y | X)) = E Y.

Proof. We prove the theorem for the continuous case and leave the (completely analogous) proof for the discrete case as an exercise.

E(E(Y | X)) = E h(X) = ∫_{−∞}^{∞} h(x) f_X(x) dx = ∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} y f_{Y|X=x}(y) dy ) f_X(x) dx

= ∫_{−∞}^{∞} ∫_{−∞}^{∞} y (f_{X,Y}(x, y)/f_X(x)) f_X(x) dy dx = ∫_{−∞}^{∞} y ( ∫_{−∞}^{∞} f_{X,Y}(x, y) dx ) dy = ∫_{−∞}^{∞} y f_Y(y) dy = E Y.

Remark 2.1. Theorem 2.1 can be interpreted as an expectation version of the law of total probability.

Remark 2.2. Clearly, E Y must exist in order for Theorem 2.1 to make sense, that is, the corresponding sum or integral must be absolutely convergent. Now, given this assumption, one can show that E(E(Y | X)) exists and is finite and that the computations in the proof, such as reversing orders of integration, are permitted. We shall, in the sequel, permit ourselves at times to be somewhat sloppy about such verifications. Analogous remarks apply to further results ahead. We close this remark by pointing out that the conclusion always holds in case Y is nonnegative, in the sense that if one of the members is infinite, then so is the other.

Exercise 2.2. The object of this exercise is to show that if we do not assume that E|Y| < ∞ in Theorem 2.1, then the conclusion does not necessarily hold. Namely, suppose that X ∼ Γ(1/2, 2) (= χ²(1)) and that

f_{Y|X=x}(y) = (1/√(2π)) x^{1/2} e^{−xy²/2},  −∞ < y < ∞.

(a) Compute E(Y | X = x), E(Y | X), and, finally, E(E(Y | X)).
(b) Show that Y ∼ C(0, 1).
(c) What about E Y?

We are now able to find E Y in Example 2.1.

Example 2.1 (continued). It follows from the definition that the first part of (2.1) holds: E(Y | X = x) = x/2, that is, h(x) = x/2. An application of Theorem 2.1 now yields

E Y = E(E(Y | X)) = E h(X) = E(X/2) = (1/2) E X = 1/4.

We have thus determined E Y without prior knowledge about the distribution of Y.

Exercise 2.3. Find the expectation of the remaining piece after it has been broken off n times.
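The value E Y = 1/4 for Example 2.1 is easy to check by a quick Monte Carlo experiment. A minimal sketch (sample size, seed, and variable names are my choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.uniform(0.0, 1.0, n)          # first remaining piece, X ~ U(0, 1)
y = rng.uniform(0.0, x)               # second piece, Y | X = x ~ U(0, x)

print(y.mean())                       # ~ 0.25 = E(E(Y|X)) = E(X/2)
```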

Remark 2.3. That the result E Y = 1/4 is reasonable can intuitively be seen from the fact that X on average equals 1/2 and that Y on average equals half the value of X, that is, 1/2 of 1/2. The proof of Theorem 2.1 consists, in fact, of a stringent version of this kind of argument.

Theorem 2.2. Let X and Y be random variables and g be a function. We have
(a) E(g(X)Y | X) = g(X) E(Y | X), and
(b) E(Y | X) = E Y if X and Y are independent.

Exercise 2.4. Prove Theorem 2.2.

Remark 2.4. Conditioning with respect to X means that X should be interpreted as known, and, hence, g(X) as a constant that thus may be moved in front of the expectation (recall Exercise 2.1(a)). This explains why Theorem 2.2(a) should hold. Part (b) follows from the fact that the conditional distribution and the unconditional distribution coincide if X and Y are independent; in particular, this should remain true for the conditional expectation and the unconditional expectation (recall Exercises 1.5 and 2.1(e)).

A natural problem is to find the variance of the remaining piece Y in Example 2.1, which, in turn, suggests the introduction of the concept of conditional variance.

Definition 2.2. Let X and Y have a joint distribution. The conditional variance of Y given that X = x is

Var(Y | X = x) = E((Y − E(Y | X = x))² | X = x),

provided the corresponding sum or integral is absolutely convergent.

The conditional variance is (also) a function of x; call it v(x). The corresponding random variable is

v(X) = Var(Y | X).   (2.4)

The following result is fundamental.

Theorem 2.3. Let X and Y be random variables and g a real-valued function. If E Y² < ∞ and E(g(X))² < ∞, then

E(Y − g(X))² = E Var(Y | X) + E(E(Y | X) − g(X))².

Proof. An expansion of the left-hand side yields

E(Y − g(X))² = E(Y − E(Y | X) + E(Y | X) − g(X))²
= E(Y − E(Y | X))² + 2 E(Y − E(Y | X))(E(Y | X) − g(X)) + E(E(Y | X) − g(X))².

Using Theorem 2.1, the right-hand side becomes

E E((Y − E(Y | X))² | X) + 2 E E((Y − E(Y | X))(E(Y | X) − g(X)) | X) + E(E(Y | X) − g(X))²
= E Var(Y | X) + 2 E{(E(Y | X) − g(X)) E(Y − E(Y | X) | X)} + E(E(Y | X) − g(X))²

by Theorem 2.2(a). Finally, since E(Y − E(Y | X) | X) = 0, this equals

E Var(Y | X) + 0 + E(E(Y | X) − g(X))²,

which was to be proved.

The particular choice g(X) = E Y, together with an application of Theorem 2.1, yields the following corollary:

Corollary 2.3.1. Suppose that E Y² < ∞. Then

Var Y = E Var(Y | X) + Var(E(Y | X)).

Example 2.1 (continued). Let us determine Var Y with the aid of Corollary 2.3.1. It follows from the second part of formula (2.1) that Var(Y | X = x) = x²/12, and hence v(X) = X²/12, so that

E Var(Y | X) = E v(X) = E(X²/12) = (1/12)·(1/3) = 1/36.

Furthermore,

Var(E(Y | X)) = Var(h(X)) = Var(X/2) = (1/4) Var X = (1/4)·(1/12) = 1/48.

An application of Corollary 2.3.1 finally yields Var Y = 1/36 + 1/48 = 7/144. We have thus computed Var Y without knowing the distribution of Y.

Exercise 2.5. Find the distribution of Y in Example 2.1, and verify the values of E Y and Var Y obtained above.

A discrete variant of Example 2.1 is the following: Let X be uniformly distributed over the numbers 1, 2, ..., 6 (that is, throw a symmetric die) and let Y be uniformly distributed over the numbers 1, 2, ..., X (that is, then throw a symmetric die with X faces). In this case,

h(x) = E(Y | X = x) = (1 + x)/2,

from which it follows that

E Y = E h(X) = E((1 + X)/2) = (1/2)(1 + E X) = (1/2)(1 + 3.5) = 2.25.

The computation of Var Y is somewhat more elaborate. We leave the details to the reader.
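Both the continuous and the discrete variants can be checked by simulation. A small sketch in the same spirit as before (sample size and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

# Broken stick: Var Y should be close to 1/36 + 1/48 = 7/144 ~ 0.0486.
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, x)
print(y.var(), 7 / 144)

# Discrete variant: X uniform on 1..6, then Y uniform on 1..X; E Y = 2.25.
xd = rng.integers(1, 7, n)
yd = rng.integers(1, xd + 1)          # uniform on {1, ..., X}
print(yd.mean())
```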

3 Distributions with Random Parameters

We begin with two examples:

Example 3.1. Suppose that the density X of red blood corpuscles in humans follows a Poisson distribution whose parameter depends on the observed individual. This means that for Jürg we have X ∼ Po(m_J), where m_J is Jürg's parameter value, while for Alice we have X ∼ Po(m_A), where m_A is Alice's parameter value. For a person selected at random we may consider the parameter value M as a random variable such that, given that M = m, we have X ∼ Po(m); namely,

P(X = k | M = m) = e^{−m} m^k / k!,  k = 0, 1, 2, ....   (3.1)

Thus, if we know that Alice was chosen, then P(X = k | M = m_A) = e^{−m_A} m_A^k / k!, for k = 0, 1, 2, ..., as before. We shall soon see that X itself (unconditioned) need not follow a Poisson distribution.

Example 3.2. A radioactive substance emits α-particles in such a way that the number of emitted particles during an hour, N, follows a Po(λ)-distribution. The particle counter, however, is somewhat unreliable in the sense that an emitted particle is registered with probability p (0 < p < 1), whereas it remains unregistered with probability q = 1 − p. All particles are registered independently of each other. This means that if we know that n particles were emitted during a specific hour, then the number of registered particles X ∼ Bin(n, p), that is,

P(X = k | N = n) = (n choose k) p^k q^{n−k},  k = 0, 1, ..., n   (3.2)

(and N ∼ Po(λ)). If, however, we observe the process during an arbitrarily chosen hour, it follows, as will be seen below, that the number of registered particles does not follow a binomial distribution (but instead a Poisson distribution).

The common feature in these examples is that the random variable under consideration, X, has a known distribution but with a parameter that is a random variable. Somewhat imprecisely, we might say that in Example 3.1 we have X ∼ Po(M), where M follows some distribution, and that in Example 3.2 we have X ∼ Bin(N, p), where N ∼ Po(λ). We prefer, however, to describe these cases as

X | M = m ∼ Po(m) with M ∼ F,   (3.3)

where F is some distribution, and

X | N = n ∼ Bin(n, p) with N ∼ Po(λ),   (3.4)

respectively.

Let us now determine the (unconditional) distributions of X in our examples, where, in Example 3.1, we assume that M ∼ Exp(1).

Example 3.1 (continued). We thus have

X | M = m ∼ Po(m) with M ∼ Exp(1).   (3.5)

By (the continuous version of) the law of total probability, we obtain, for k = 0, 1, 2, ...,

P(X = k) = ∫_0^∞ P(X = k | M = x) f_M(x) dx = ∫_0^∞ (e^{−x} x^k / k!) e^{−x} dx = (1/k!) ∫_0^∞ x^k e^{−2x} dx = (1/k!) · Γ(k + 1)/2^{k+1} = (1/2)^{k+1},

that is, X ∼ Ge(1/2). The unconditional distribution in this case thus is not a Poisson distribution; it is a geometric distribution.

Exercise 3.1. Determine the distribution of X if M has
(a) an Exp(a)-distribution,
(b) a Γ(p, a)-distribution.

Note also that we may use the formulas from Section 2 to compute E X and Var X without knowing the distribution of X. Namely, since E(X | M = m) = m (i.e., h(M) = E(X | M) = M), Theorem 2.1 yields

E X = E(E(X | M)) = E M = 1,

and Corollary 2.3.1 yields

Var X = E Var(X | M) + Var(E(X | M)) = E M + Var M = 1 + 1 = 2.

If, however, the distribution has been determined (as above), the formulas from Section 2 may be used for checking. If applied to Exercise 3.1(a), the latter formulas yield E X = a and Var X = a + a². Since this situation differs from Example 3.1 only by a rescaling of M, one might perhaps guess that the solution is another geometric distribution. If this were true, we would have

E X = q/p = (1 − p)/p = a,  so that  p = 1/(a + 1).

This value of p inserted in the expression for the variance yields

Var X = q/p² = (1 − p)/p² = a(a + 1) = a + a²,

which coincides with our computations above and supports the guess that X ∼ Ge(1/(a + 1)).
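The conclusion X ∼ Ge(1/2) in Example 3.1, as well as E X = 1 and Var X = 2, can be illustrated by simulating the two-stage model directly. A minimal sketch (sample size and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
m = rng.exponential(1.0, n)           # M ~ Exp(1)
x = rng.poisson(m)                    # X | M = m ~ Po(m)

# Compare the empirical pmf with the Ge(1/2) pmf (1/2)^(k+1), k = 0, 1, 2, ...
for k in range(6):
    print(k, (x == k).mean(), 0.5 ** (k + 1))

print(x.mean(), x.var())              # ~ 1 and ~ 2, as computed in the text
```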

Remark 3.1. In Example 3.1 we used the results of Section 2 to confirm our result. In Exercise 3.1(a) they were used to confirm (provide) a guess.

We now turn to the α-particles.

Example 3.2 (continued). Intuitively, the deficiency of the particle counter implies that the radiation actually measured is, on average, a fraction p of the original Poisson stream of particles. We might therefore expect that the number of registered particles during one hour should be a Po(λp)-distributed random variable. That this is actually correct is verified next. The model implies that X | N = n ∼ Bin(n, p) with N ∼ Po(λ). The law of total probability yields, for k = 0, 1, 2, ...,

P(X = k) = Σ_{n=k}^{∞} P(X = k | N = n) P(N = n) = Σ_{n=k}^{∞} (n choose k) p^k q^{n−k} e^{−λ} λ^n / n!
= (p^k/k!) e^{−λ} Σ_{n=k}^{∞} λ^n q^{n−k}/(n − k)! = ((λp)^k/k!) e^{−λ} Σ_{j=0}^{∞} (λq)^j/j!
= ((λp)^k/k!) e^{−λ} e^{λq} = e^{−λp} (λp)^k/k!,

that is, X ∼ Po(λp). The unconditional distribution thus is not a binomial distribution; it is a Poisson distribution.

Remark 3.2. This is an example of a so-called thinned Poisson process. For more details, we refer to Section 8.6.

Exercise 3.2. Use Theorem 2.1 and Corollary 2.3.1 to check the values of E X and Var X.
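A simulation of the thinning mechanism in Example 3.2 illustrates the Po(λp) conclusion. The particular values λ = 4 and p = 0.3 below are arbitrary illustration choices of mine:

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(3)
lam, p, n = 4.0, 0.3, 10**6
emitted = rng.poisson(lam, n)              # N ~ Po(lambda)
registered = rng.binomial(emitted, p)      # X | N = n ~ Bin(n, p)

# X should be Po(lambda * p); compare a few pmf values.
for k in range(5):
    print(k, (registered == k).mean(),
          exp(-lam * p) * (lam * p) ** k / factorial(k))
```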

A family of distributions that is of special interest is the family of mixed normal, or mixed Gaussian, distributions. These are normal distributions with a random variance, namely,

X | Σ² = y ∼ N(µ, y) with Σ² ∼ F,   (3.6)

where F is some distribution (on (0, ∞)). For simplicity we assume in the following that µ = 0.

As an example, consider normally distributed observations with rare disturbances. More specifically, the observations might be N(0, 1)-distributed with probability 0.99 and N(0, 100)-distributed with probability 0.01. We may write this as X ∼ N(0, Σ²), where P(Σ² = 1) = 0.99 and P(Σ² = 100) = 0.01. By Theorem 2.1 it follows immediately that E X = 0. As for the variance, Corollary 2.3.1 tells us that

Var X = E Var(X | Σ²) + Var(E(X | Σ²)) = E Σ² = 0.99 · 1 + 0.01 · 100 = 1.99.
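For this two-point mixture the computation is easy to mimic numerically. A small sketch (sample size and seed are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10**6
# Sigma^2 equals 1 with probability 0.99 and 100 with probability 0.01.
sigma2 = rng.choice([1.0, 100.0], size=n, p=[0.99, 0.01])
x = rng.normal(0.0, np.sqrt(sigma2))       # X | Sigma^2 = y ~ N(0, y)

print(x.mean(), x.var())                   # ~ 0 and ~ E Sigma^2 = 1.99
```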

12 4 Conditioning differentiate (differentiation and integration may be interchanged), and make the change of variable y x/u. This yields I (x) ( x ) u exp x u u} du } exp y x y dy. It follows that I satisfies the differential equation with the initial condition the solution of which is I() I(x) I (x) I(x) e u du π, π e x, x >. (3.9) By inserting (3.9) into the expression for f X (x), and noting that the density is symmetric around x, we finally obtain π f X (x) π e x 1 e x 1 e x, < x <, that is, X L( 1 ); a Laplace distribution. An extra check yields E X and Var X E Σ 1 ( ( 1 ) ), as desired. Exercise 3.3. Show that if X has a normal distribution such that the mean is zero and the inverse of the variance is Γ-distributed, viz., ( n X Σ λ N(, 1/λ) with Σ Γ n),, then X t(n). Exercise 3.4. Sheila has a coin with P (head) p 1 and Betty has a coin with P (head) p. Sheila tosses her coin m times. Each time she obtains heads, Betty tosses her coin (otherwise not). Find the distribution of the total number of heads obtained by Betty. Further, check that mean and variance coincide with the values obtained by Theorem.1 and Corollary.3.1. Alternatively, find mean and variance first and try to guess the desired distribution (and check if your guess was correct). As a hint, observe that the game can be modeled as follows: Let N be the number of heads obtained by Sheila and X be the number of heads obtained by Betty. We thus wish to find the distribution of X, where X N n Bin(n, p ) with N Bin(m, p 1 ), < p 1, p < 1. We shall return to the topic of this section in Section 3.5.
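The mixture representation (3.8) and the L(1/√2) density derived above can be compared by simulation. A rough sketch (the sample size, seed, and the window width 0.1 used as a crude density estimate are my choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10**6
sigma2 = rng.exponential(1.0, n)              # Sigma^2 ~ Exp(1)
x = rng.normal(0.0, np.sqrt(sigma2))          # X | Sigma^2 = y ~ N(0, y)

# Compare with the L(1/sqrt(2)) density f(x) = (1/sqrt(2)) exp(-sqrt(2)|x|).
for t in (0.0, 0.5, 1.0, 2.0):
    empirical = ((x > t - 0.05) & (x < t + 0.05)).mean() / 0.1
    print(t, empirical, np.exp(-np.sqrt(2) * abs(t)) / np.sqrt(2))
```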

4 The Bayesian Approach

A typical problem in probability theory begins with assumptions such as let X ∼ Po(m), let Y ∼ N(µ, σ²), toss a symmetric coin 15 times, and so forth. In the computations that follow, one tacitly assumes that all parameters are known, that the coin is exactly symmetric, and so on. In statistics one assumes (certain) parameters to be unknown, for example, that the coin might be asymmetric, and one searches for methods, devices, and rules to decide whether or not one should believe in certain hypotheses. Two typical illustrations in the Gaussian approach are µ unknown and σ² known and µ and σ² unknown.

The Bayesian approach is a kind of compromise. One claims, for example, that parameters are never completely unknown; one always has some prior opinion or knowledge about them. A probabilistic model describing this approach was given in Example 3.1. The opening statement there was that the density of red blood corpuscles follows a Poisson distribution. One interpretation of that statement could have been that whenever we are faced with a blood sample the density of red blood corpuscles in the sample is Poissonian. The Bayesian approach taken in Example 3.1 is that whenever we know from whom the blood sample has been taken, the density of red blood corpuscles in the sample is Poissonian, however, with a parameter depending on the individual. If we do not know from whom the sample has been taken, then the parameter is unknown; it is a random variable following some distribution. We also found that if this distribution is the standard exponential, then the density of red blood corpuscles is geometric (and hence not Poissonian).

The prior knowledge about the parameters in this approach is expressed in such a way that the parameters are assumed to follow some probability distribution, called the prior (or a priori) distribution. If one wishes to assume that a parameter is completely unknown, one might solve the situation by attributing some uniform distribution to the parameter. In this terminology we may formulate our findings in Example 3.1 as follows: If the parameter in a Poisson distribution has a standard exponential prior distribution, then the random variable under consideration follows a Ge(1/2)-distribution.

Frequently, one performs random experiments in order to estimate (unknown) parameters. The estimates are based on observations from some probability distribution. The Bayesian analog is to determine the conditional distribution of the parameter given the result of the random experiment. Such a distribution is called the posterior (or a posteriori) distribution. Next we determine the posterior distribution in Example 3.1.

Example 4.1. The model in the example was

X | M = m ∼ Po(m) with M ∼ Exp(1).   (4.1)

We further had found that X ∼ Ge(1/2). Now we wish to determine the conditional distribution of M given the value of X. For x > 0, we have

F_{M|X=k}(x) = P(M ≤ x | X = k) = P({M ≤ x} ∩ {X = k}) / P(X = k) = ∫_0^x P(X = k | M = y) f_M(y) dy / P(X = k)
= ∫_0^x (e^{−y} y^k/k!) e^{−y} dy / (1/2)^{k+1} = ∫_0^x (1/Γ(k + 1)) y^k 2^{k+1} e^{−2y} dy,

which, after differentiation, yields

f_{M|X=k}(x) = (1/Γ(k + 1)) x^k 2^{k+1} e^{−2x},  x > 0.

Thus, M | X = k ∼ Γ(k + 1, 1/2) or, in our new terminology, the posterior distribution of M given that X equals k is Γ(k + 1, 1/2).

Remark 4.1. Note that, starting from the distribution of X given M (and from that of M), we have determined the distribution of M given X and that the solution of the problem, in fact, amounted to applying a continuous version of Bayes' formula.

Exercise 4.1. Check that E M and Var M are what they are supposed to be by applying Theorem 2.1 and Corollary 2.3.1 to the posterior distribution.
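The posterior Γ(k + 1, 1/2) can be illustrated by simulating the prior model and conditioning on the observed value of X. A sketch with k = 3 (k, sample size, and seed are my choices); in the text's Γ(p, a) parametrization the mean is pa and the variance pa²:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 10**6, 3
m = rng.exponential(1.0, n)                   # prior M ~ Exp(1)
x = rng.poisson(m)                            # X | M = m ~ Po(m)
post = m[x == k]                              # values of M given X = k

# Gamma(k+1, 1/2) has mean (k+1)/2 and variance (k+1)/4.
print(post.mean(), (k + 1) / 2)
print(post.var(), (k + 1) / 4)
```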

We conclude this section by studying coin tossing from the Bayesian point of view under the assumption that nothing is known about p = P(heads). Let X_n be the number of heads after n coin tosses. One possible model is

X_n | P = p ∼ Bin(n, p) with P ∼ U(0, 1).   (4.2)

The prior distribution of P, thus, is the U(0, 1)-distribution. Models of this kind are called mixed binomial models. For k = 0, 1, 2, ..., n, we now obtain (via some facts about the beta distribution)

P(X_n = k) = ∫_0^1 (n choose k) x^k (1 − x)^{n−k} · 1 dx = (n choose k) ∫_0^1 x^{(k+1)−1} (1 − x)^{(n+1−k)−1} dx
= (n choose k) Γ(k + 1)Γ(n + 1 − k)/Γ(n + 2) = [n!/(k!(n − k)!)] · [k!(n − k)!/(n + 1)!] = 1/(n + 1).

This means that X_n is uniformly distributed over the integers 0, 1, ..., n. A second thought reveals that this is a very reasonable conclusion. Since nothing is known about the coin (in the sense of relation (4.2)), there is nothing that favors a specific outcome, that is, all outcomes should be equally probable.

If p is known, we know that the results in different tosses are independent and that the probability of heads given that we obtained 10 heads in a row (still) equals p. What about these facts in the Bayesian model?

P(X_{n+1} = n + 1 | X_n = n) = P({X_{n+1} = n + 1} ∩ {X_n = n}) / P(X_n = n) = P(X_{n+1} = n + 1) / P(X_n = n) = (1/(n + 2)) / (1/(n + 1)) = (n + 1)/(n + 2) → 1 as n → ∞.

This means that if we know that there were many heads in a row then the (conditional) probability of another head is very large; the results in different tosses are not at all independent. Why is this the case? Let us find the posterior distribution of P.

P(P ≤ x | X_n = k) = ∫_0^x P(X_n = k | P = y) f_P(y) dy / P(X_n = k) = ∫_0^x (n choose k) y^k (1 − y)^{n−k} dy / (1/(n + 1)) = (n + 1) (n choose k) ∫_0^x y^k (1 − y)^{n−k} dy.

Differentiation yields

f_{P|X_n=k}(x) = (Γ(n + 2)/(Γ(k + 1)Γ(n + 1 − k))) x^k (1 − x)^{n−k},  0 < x < 1,

viz., a β(k + 1, n + 1 − k)-distribution. For k = n we obtain in particular (or, by direct computation)

f_{P|X_n=n}(x) = (n + 1) x^n,  0 < x < 1.

It follows that

P(P > 1 − ε | X_n = n) = 1 − (1 − ε)^{n+1} → 1 as n → ∞ for all ε > 0.

This means that if we know that there were many heads in a row then we also know that p is close to 1 and thus that it is very likely that the next toss will yield another head.

Remark 4.2. It is, of course, possible to consider the posterior distribution as a prior distribution for a further random experiment, and so on.
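Both the uniform distribution of X_n and the behavior of the posterior for k = n can be checked numerically. A sketch with n = 10 (n, ε, sample size, and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(7)
nsim, n = 10**6, 10
p = rng.uniform(0.0, 1.0, nsim)               # prior P ~ U(0, 1)
xn = rng.binomial(n, p)                       # X_n | P = p ~ Bin(n, p)

# Each value 0, 1, ..., n should have probability 1/(n + 1).
print(np.bincount(xn, minlength=n + 1) / nsim)

# Posterior given X_n = n is beta(n+1, 1): P(P > 1-eps | X_n = n) = 1 - (1-eps)^(n+1).
eps = 0.1
post = p[xn == n]
print((post > 1 - eps).mean(), 1 - (1 - eps) ** (n + 1))
```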

5 Regression and Prediction

A common statistics problem is to analyze how different (levels of) treatments or treatment combinations affect the outcome of an experiment. The yield of a crop, for example, may depend on variability in watering, fertilization, climate, and other factors in the various areas where the experiment is performed. One problem is that one cannot predict the outcome y exactly, meaning without error, even if the levels of the treatments x₁, x₂, ..., x_n are known exactly. An important function for predicting the outcome is the conditional expectation of the (random) outcome Y given the (random) levels of treatment X₁, X₂, ..., X_n.

Let X₁, X₂, ..., X_n and Y be jointly distributed random variables, and set

h(x) = h(x₁, ..., x_n) = E(Y | X₁ = x₁, ..., X_n = x_n) = E(Y | X = x).

Definition 5.1. The function h is called the regression function Y on X.

Remark 5.1. For n = 1 we have h(x) = E(Y | X = x), which is the ordinary conditional expectation.

Definition 5.2. A predictor (for Y) based on X is a function, d(X). The predictor is called linear if d is linear, that is, if d(X) = a₀ + a₁X₁ + ··· + a_nX_n, where a₀, a₁, ..., a_n are constants.

Predictors are used to predict (as the name suggests). The prediction error is given by the random variable

Y − d(X).   (5.1)

There are several ways to compare different predictors. One suitable measure is defined as follows:

Definition 5.3. The expected quadratic prediction error is E(Y − d(X))². Moreover, if d₁ and d₂ are predictors, we say that d₁ is better than d₂ if

E(Y − d₁(X))² ≤ E(Y − d₂(X))².

In the following we confine ourselves to considering the case n = 1. A predictor is thus a function of X, d(X), and the expected quadratic prediction error is E(Y − d(X))². If the predictor is linear, that is, if d(x) = a + bx, where a and b are constants, the expected quadratic prediction error is E(Y − (a + bX))².

Example 5.1. Pick a point uniformly distributed in the triangle 0 ≤ x, 0 ≤ y, x + y ≤ 1. We wish to determine the regression functions E(Y | X = x) and E(X | Y = y).

To solve this problem we first note that the joint density of X and Y is

f_{X,Y}(x, y) = c for 0 ≤ x, 0 ≤ y, x + y ≤ 1, and = 0 otherwise,

where c is some constant, which is found by noticing that the total mass equals 1. We thus have

1 = ∫∫ f_{X,Y}(x, y) dx dy = c ∫_0^1 ( ∫_0^{1−x} dy ) dx = c ∫_0^1 (1 − x) dx = c [−(1 − x)²/2]_0^1 = c/2,

from which it follows that c = 2. In order to determine the conditional densities we first compute the marginal ones:

f_X(x) = ∫ f_{X,Y}(x, y) dy = ∫_0^{1−x} 2 dy = 2(1 − x),  0 < x < 1,
f_Y(y) = ∫ f_{X,Y}(x, y) dx = ∫_0^{1−y} 2 dx = 2(1 − y),  0 < y < 1.

Incidentally, X and Y have the same distribution for reasons of symmetry. Finally,

f_{Y|X=x}(y) = f_{X,Y}(x, y)/f_X(x) = 2/(2(1 − x)) = 1/(1 − x),  0 < y < 1 − x,

and so

E(Y | X = x) = ∫_0^{1−x} y · (1/(1 − x)) dy = (1/(1 − x)) [y²/2]_0^{1−x} = (1 − x)/2,

and, by symmetry,

E(X | Y = y) = (1 − y)/2.

Remark 5.2. Note also, for example, that Y | X = x ∼ U(0, 1 − x) in the example, that is, the density is, for x fixed, a constant (which is the inverse of the length of the interval (0, 1 − x)). This implies that E(Y | X = x) = (1 − x)/2, which agrees with the previous results. It also provides an alternative solution to the last part of the problem. In this case the gain is marginal, but in a more technically complicated situation it might be more substantial.
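The regression function just obtained can be verified by simulating points in the triangle and averaging Y over narrow vertical strips. A sketch (the rejection-sampling construction, strip width, sample size, and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10**6
# Rejection sampling: keep points uniform in the triangle x, y >= 0, x + y <= 1.
u, v = rng.uniform(0, 1, (2, n))
keep = u + v <= 1
x, y = u[keep], v[keep]

for x0 in (0.2, 0.5, 0.8):
    near = np.abs(x - x0) < 0.01
    print(x0, y[near].mean(), (1 - x0) / 2)   # empirical vs (1 - x)/2
```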

Exercise 5.1. Solve the same problem when

f_{X,Y}(x, y) = cx for 0 < x, y < 1, and = 0 otherwise.

Exercise 5.2. Solve the same problem when

f_{X,Y}(x, y) = e^{−y} for 0 < x < y, and = 0 otherwise.

Theorem 5.1. Suppose that E Y² < ∞. Then h(X) = E(Y | X) (i.e., the regression function Y on X) is the best predictor of Y based on X.

Proof. By Theorem 2.3 we know that for an arbitrary predictor d(X),

E(Y − d(X))² = E Var(Y | X) + E(h(X) − d(X))² ≥ E Var(Y | X),

where equality holds iff d(X) = h(X) (more precisely, iff P(d(X) = h(X)) = 1). The choice d(X) = h(X) thus yields minimal expected quadratic prediction error.

Example 5.2. In Example 5.1 we found the regression function of Y based on X to be (1 − X)/2. By Theorem 5.1 it is the best predictor of Y based on X. A simple calculation shows that the expected quadratic prediction error is

E(Y − (1 − X)/2)² = E Var(Y | X) = E(1 − X)²/12 = 1/24.

We also noted that X and Y have the same marginal distribution. A (very) naive suggestion for another predictor therefore might be X itself. The expected quadratic prediction error for this predictor is E(Y − X)² = 1/6 > 1/24, which shows that the regression function is indeed a better predictor.
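A Monte Carlo comparison of the two predictors in Example 5.2 is immediate under the same sampling scheme as before; the numerical targets 1/24 and 1/6 are the values computed above (sample size and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10**6
u, v = rng.uniform(0, 1, (2, n))
keep = u + v <= 1
x, y = u[keep], v[keep]

mse_best = np.mean((y - (1 - x) / 2) ** 2)    # best predictor (1 - X)/2
mse_naive = np.mean((y - x) ** 2)             # naive predictor X
print(mse_best, 1 / 24)
print(mse_naive, 1 / 6)
```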

Sometimes it is difficult to determine regression functions explicitly. In such cases one might be satisfied with the best linear predictor. This means that one wishes to minimize E(Y − (a + bX))² as a function of a and b, which leads to the well-known method of least squares. The solution of this problem is given in the following result.

Theorem 5.2. Suppose that E X² < ∞ and E Y² < ∞. Set µ_x = E X, µ_y = E Y, σ_x² = Var X, σ_y² = Var Y, σ_xy = Cov(X, Y), and ρ = σ_xy/(σ_x σ_y). The best linear predictor of Y based on X is L(X) = α + βX, where

α = µ_y − (σ_xy/σ_x²) µ_x = µ_y − ρ (σ_y/σ_x) µ_x  and  β = σ_xy/σ_x² = ρ σ_y/σ_x.

The best linear predictor thus is

µ_y + ρ (σ_y/σ_x)(X − µ_x).   (5.2)

Definition 5.4. The line y = µ_y + ρ (σ_y/σ_x)(x − µ_x) is called the regression line Y on X. The slope, ρ σ_y/σ_x, of the line is called the regression coefficient.

Remark 5.3. Note that y = L(x), where L(X) is the best linear predictor of Y based on X.

Remark 5.4. If, in particular, (X, Y) has a joint Gaussian distribution, it turns out that the regression function is linear, that is, for this very important case the best linear predictor is, in fact, the best predictor. For details, we refer the reader to Section 5.6.

Example 5.1 (continued). The regression function Y on X turned out to be linear in this example; y = (1 − x)/2. It follows in particular that the regression function coincides with the regression line Y on X. The regression coefficient equals −1/2.

The expected quadratic prediction error of the best linear predictor of Y based on X is obtained as follows:

Theorem 5.3. E(Y − L(X))² = σ_y²(1 − ρ²).

Proof.

E(Y − L(X))² = E(Y − µ_y − ρ (σ_y/σ_x)(X − µ_x))²
= E(Y − µ_y)² + ρ² (σ_y²/σ_x²) E(X − µ_x)² − 2ρ (σ_y/σ_x) E(Y − µ_y)(X − µ_x)
= σ_y² + ρ² σ_y² − 2ρ (σ_y/σ_x) σ_xy = σ_y² + ρ² σ_y² − 2ρ² σ_y² = σ_y²(1 − ρ²).

Definition 5.5. The quantity σ_y²(1 − ρ²) is called the residual variance.

Exercise 5.3. Check via Theorem 5.3 that the residual variance in Example 5.1 equals 1/24 as was claimed in Example 5.2.

The regression line X on Y is determined similarly. It is

x = µ_x + ρ (σ_x/σ_y)(y − µ_y),

which can be rewritten as

y = µ_y + (1/ρ)(σ_y/σ_x)(x − µ_x)

if ρ ≠ 0. The regression lines Y on X and X on Y are thus, in general, different. They coincide iff they have the same slope, that is, iff

ρ σ_y/σ_x = (1/ρ) σ_y/σ_x  iff  ρ² = 1,

that is, iff there exists a linear relation between X and Y.

Example 5.1 (continued). The regression function X on Y was also linear (and coincides with the regression line X on Y). The line has the form x = (1 − y)/2, that is, y = 1 − 2x. In particular, we note that the slopes of the regression lines are −1/2 and −2, respectively.
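The formulas of Theorems 5.2 and 5.3 translate directly into sample moments. A sketch, again for Example 5.1 (the use of biased sample covariances, so that the plug-in formulas match exactly, as well as sample size and seed, are my choices):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 10**5
u, v = rng.uniform(0, 1, (2, n))
keep = u + v <= 1
x, y = u[keep], v[keep]

# Sample versions of beta = rho*sigma_y/sigma_x and alpha = mu_y - beta*mu_x.
beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha = y.mean() - beta * x.mean()
rho = np.corrcoef(x, y)[0, 1]
print(alpha, beta)                    # ~ 1/2 and ~ -1/2: the line y = (1 - x)/2
print(np.var(y) * (1 - rho ** 2))     # residual variance, ~ 1/24
```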

6 Problems

1. Let X and Y be independent Exp(1)-distributed random variables. Find the conditional distribution of X given that X + Y = c (c is a positive constant).

2. Let X and Y be independent Γ(2, a)-distributed random variables. Find the conditional distribution of X given that X + Y = 2.

3. The life of a repairing device is Exp(1/a)-distributed. Peter wishes to use it on n different, independent, Exp(1/na)-distributed occasions.
(a) Compute the probability P_n that this is possible.
(b) Determine the limit of P_n as n → ∞.

4. The life T (hours) of the lightbulb in an overhead projector follows an Exp(1)-distribution. During a normal week it is used a Po(1)-distributed number of lectures lasting exactly one hour each. Find the probability that a projector with a newly installed lightbulb functions throughout a normal week (without replacing the lightbulb).

5. The random variables N, X₁, X₂, ... are independent, N ∼ Po(λ), and X_k ∼ Be(1/2), k ≥ 1. Set

Y₁ = Σ_{k=1}^{N} X_k and Y₂ = N − Y₁ (Y₁ = 0 for N = 0).

Show that Y₁ and Y₂ are independent, and determine their distributions.

6. Suppose that X ∼ N(0, 1) and Y ∼ Exp(1) are independent random variables. Prove that X·√(2Y) has a standard Laplace distribution.

7. Let N ∼ Ge(p) and set X = (−1)^N. Compute
(a) E X and Var X,
(b) the distribution (probability function) of X.

8. The density function of the two-dimensional random variable (X, Y) is

f_{X,Y}(x, y) = x y³ e^{−xy} for 0 < x < ∞, 0 < y < 1, and = 0 otherwise.

(a) Determine the distribution of Y.
(b) Find the conditional distribution of X given that Y = y.
(c) Use the results from (a) and (b) to compute E X and Var X.

9. The density of the random vector (X, Y) is

f_{X,Y}(x, y) = cx for x ≥ 0, y ≥ 0, x + y ≤ 1, and = 0 otherwise.

Compute
(a) c,
(b) the conditional expectations E(Y | X = x) and E(X | Y = y).

10. Suppose X and Y have a joint density function given by

f_{X,Y}(x, y) = cx for 0 < x < y < 1, and = 0 otherwise.

Find c, the marginal density functions, E X, E Y, and the conditional expectations E(Y | X = x) and E(X | Y = y).

11. Suppose X and Y have a joint density function given by

f_{X,Y}(x, y) = cx²y for 0 < y < x < 1, and = 0 otherwise.

Compute c, the marginal densities, E X, E Y, and the conditional expectations E(Y | X = x) and E(X | Y = y).

12. Let X and Y have joint density

f_{X,Y}(x, y) = cxy for 0 < y < x < 1, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

13. Let X and Y have joint density

f_{X,Y}(x, y) = cy for 0 < y < x < 2, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

14. Suppose that X and Y are random variables with joint density

f_{X,Y}(x, y) = c(x + y) for 0 < x < y < 1, and = 0 otherwise.

Compute the regression functions E(Y | X = x) and E(X | Y = y).

15. Suppose that X and Y are random variables with a joint density

f_{X,Y}(x, y) = (2/5)(2x + 3y) for 0 < x, y < 1, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

16. Let X and Y be random variables with a joint density

f_{X,Y}(x, y) = (4/5)(x + 3y) e^{−x−2y} for x, y > 0, and = 0 otherwise.

Compute the regression functions E(Y | X = x) and E(X | Y = y).

17. Suppose that the joint density of X and Y is given by

f_{X,Y}(x, y) = x e^{−x−xy} for x > 0, y > 0, and = 0 otherwise.

Determine the regression functions E(Y | X = x) and E(X | Y = y).

18. Let the joint density function of X and Y be given by

f_{X,Y}(x, y) = c(x + y) for 0 < x < y < 1, and = 0 otherwise.

Determine c, the marginal densities, E X, E Y, and the conditional expectations E(Y | X = x) and E(X | Y = y).

19. Let the joint density of X and Y be given by

f_{X,Y}(x, y) = c for 0 ≤ x ≤ 1, x² ≤ y ≤ x, and = 0 otherwise.

Compute c, the marginal densities, and the conditional expectations E(Y | X = x) and E(X | Y = y).

20. Suppose that X and Y are random variables with joint density

f_{X,Y}(x, y) = cx for 0 < x < 1, x³ < y < x^{1/3}, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

21. Suppose that X and Y are random variables with joint density

f_{X,Y}(x, y) = cy for 0 < x < 1, x⁴ < y < x^{1/4}, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

22. Let the joint density function of X and Y be given by

f_{X,Y}(x, y) = cx³y for x, y > 0, x + y ≤ 1, and = 0 otherwise.

Compute c, the marginal densities, and the conditional expectations E(Y | X = x) and E(X | Y = y).

23. The joint density function of X and Y is given by

f_{X,Y}(x, y) = cxy for x, y > 0, 4x + y ≤ 1, and = 0 otherwise.

Compute c, the marginal densities, and the conditional expectations E(Y | X = x) and E(X | Y = y).

24. Let X and Y have joint density

f_{X,Y}(x, y) = c/(x³y) for 1 < y < x, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

25. Let X and Y have joint density

f_{X,Y}(x, y) = c/(x⁴y) for 1 < y < x, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

26. Suppose that X and Y are random variables with a joint density

f_{X,Y}(x, y) = c/(1 + x − y)² for 0 < y < x < 1, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

27. Suppose that X and Y are random variables with a joint density

f_{X,Y}(x, y) = c cos x for 0 < y < x < π/2, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

28. Let X and Y have joint density

f_{X,Y}(x, y) = c log(1/y) for 0 < y < x < 1, and = 0 otherwise.

Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

29. The random vector (X, Y) has the following joint distribution:

P(X = m, Y = n) = (m choose n) (1/2)^m · m/15, where m = 1, 2, ..., 5 and n = 0, 1, ..., m.

Compute E(Y | X = m).

30. Show that a suitable power of a Weibull-distributed random variable whose parameter is gamma-distributed is Pareto-distributed. More precisely, show that if

X | A = a ∼ W(1/a, 1/b) with A ∼ Γ(p, θ),

then X^b has a (translated) Pareto distribution.

31. Show that an exponential random variable such that the inverse of the parameter is gamma-distributed is Pareto-distributed. More precisely, show that if

X | M = m ∼ Exp(m) with M^{−1} ∼ Γ(p, a),

then X has a (translated) Pareto distribution.

32. Let X and Y be random variables such that

Y | X = x ∼ Exp(1/x) with X ∼ Γ(2, 1).

(a) Show that Y has a translated Pareto distribution.
(b) Compute E Y.
(c) Check the value in (b) by recomputing it via our favorite formula for conditional means.

33. Suppose that the random variable X is uniformly distributed symmetrically around zero, but in such a way that the parameter is uniform on (0, 1); that is, suppose that

X | A = a ∼ U(−a, a) with A ∼ U(0, 1).

Find the distribution of X, E X, and Var X.

34. In Section 4 we studied the situation when a coin, such that p = P(head) is considered to be a U(0, 1)-distributed random variable, is tossed, and found (i.a.) that if X_n = # heads after n tosses, then X_n is uniformly distributed over the integers 0, 1, ..., n. Suppose instead that p is considered to be β(2, 2)-distributed. What then? More precisely, consider the following model:

X_n | Y = y ∼ Bin(n, y) with f_Y(y) = 6y(1 − y), 0 < y < 1.

(a) Compute E X_n and Var X_n.
(b) Determine the distribution of X_n.

35. Let X and Y be jointly distributed random variables such that

Y | X = x ∼ Bin(n, x) with X ∼ U(0, 1).

Compute E Y, Var Y, and Cov(X, Y) (without using what is known from Section 4 about the distribution of Y).

36. Let X and Y be jointly distributed random variables such that

Y | X = x ∼ Fs(x) with f_X(x) = 3x², 0 ≤ x ≤ 1.

Compute E Y, Var Y, Cov(X, Y), and the distribution of Y.

37. Let X be the number of coin tosses until heads is obtained. Suppose that the probability of heads is unknown in the sense that we consider it to be a random variable Y ∼ U(0, 1).
(a) Find the distribution of X (cf. Problem ).
(b) The expected value of an Fs-distributed random variable exists, as is well known. What about E X?
(c) Suppose that the value X = n has been observed. Find the posterior distribution of Y, that is, the distribution of Y | X = n.

38. Let p be the probability that the tip points downward after a person throws a drawing pin once. Annika throws a drawing pin until it points downward for the first time. Let X be the number of throws for this to happen. She then throws the drawing pin another X times. Let Y be the number of times the drawing pin points downward in the latter series of throws. Find the distribution of Y (cf. Problem ).

39. A point P is chosen uniformly in an n-dimensional sphere of radius 1. Next, a point Q is chosen uniformly within the concentric sphere, centered at the origin, going through P. Let X and Y be the distances of P and Q, respectively, to the common center. Find the joint density function of X and Y and the conditional expectations E(Y | X = x) and E(X | Y = y).

Hint 1. Begin by trying the case n = 2.
Hint 2. The volume of an n-dimensional sphere of radius r is equal to c_n r^n, where c_n is some constant (which is of no interest for the problem).
Remark. For n = 1 we rediscover the stick from Example 2.1.

40. Let X and Y be independent random variables. The conditional distribution of Y given that X = x then does not depend on x. Moreover, E(Y | X = x) is independent of x; recall Theorem 2.2(b) and Remark 2.4. Now, suppose instead that E(Y | X = x) is independent of x (i.e., that E(Y | X) = E Y). We say that Y has constant regression with respect to X. However, it does not necessarily follow that X and Y are independent. Namely, let the joint density of X and Y be given by

f_{X,Y}(x, y) = 1/π for x² + y² ≤ 1, and = 0 otherwise.

Show that Y has constant regression with respect to X and/but that X and Y are not independent.


More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y. CS450 Final Review Problems Fall 08 Solutions or worked answers provided Problems -6 are based on the midterm review Identical problems are marked recap] Please consult previous recitations and textbook

More information

Week 1 Quantitative Analysis of Financial Markets Distributions A

Week 1 Quantitative Analysis of Financial Markets Distributions A Week 1 Quantitative Analysis of Financial Markets Distributions A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October

More information

MATH 151, FINAL EXAM Winter Quarter, 21 March, 2014

MATH 151, FINAL EXAM Winter Quarter, 21 March, 2014 Time: 3 hours, 8:3-11:3 Instructions: MATH 151, FINAL EXAM Winter Quarter, 21 March, 214 (1) Write your name in blue-book provided and sign that you agree to abide by the honor code. (2) The exam consists

More information

University of Chicago Graduate School of Business. Business 41901: Probability Final Exam Solutions

University of Chicago Graduate School of Business. Business 41901: Probability Final Exam Solutions Name: University of Chicago Graduate School of Business Business 490: Probability Final Exam Solutions Special Notes:. This is a closed-book exam. You may use an 8 piece of paper for the formulas.. Throughout

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES Contents 1. Continuous random variables 2. Examples 3. Expected values 4. Joint distributions

More information

6 The normal distribution, the central limit theorem and random samples

6 The normal distribution, the central limit theorem and random samples 6 The normal distribution, the central limit theorem and random samples 6.1 The normal distribution We mentioned the normal (or Gaussian) distribution in Chapter 4. It has density f X (x) = 1 σ 1 2π e

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

X 1 ((, a]) = {ω Ω : X(ω) a} F, which leads us to the following definition:

X 1 ((, a]) = {ω Ω : X(ω) a} F, which leads us to the following definition: nna Janicka Probability Calculus 08/09 Lecture 4. Real-valued Random Variables We already know how to describe the results of a random experiment in terms of a formal mathematical construction, i.e. the

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 17-27 Review Scott Sheffield MIT 1 Outline Continuous random variables Problems motivated by coin tossing Random variable properties 2 Outline Continuous random variables Problems

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction

More information

MAT 271E Probability and Statistics

MAT 271E Probability and Statistics MAT 7E Probability and Statistics Spring 6 Instructor : Class Meets : Office Hours : Textbook : İlker Bayram EEB 3 ibayram@itu.edu.tr 3.3 6.3, Wednesday EEB 6.., Monday D. B. Bertsekas, J. N. Tsitsiklis,

More information

Physics 6720 Introduction to Statistics April 4, 2017

Physics 6720 Introduction to Statistics April 4, 2017 Physics 6720 Introduction to Statistics April 4, 2017 1 Statistics of Counting Often an experiment yields a result that can be classified according to a set of discrete events, giving rise to an integer

More information

February 26, 2017 COMPLETENESS AND THE LEHMANN-SCHEFFE THEOREM

February 26, 2017 COMPLETENESS AND THE LEHMANN-SCHEFFE THEOREM February 26, 2017 COMPLETENESS AND THE LEHMANN-SCHEFFE THEOREM Abstract. The Rao-Blacwell theorem told us how to improve an estimator. We will discuss conditions on when the Rao-Blacwellization of an estimator

More information

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1). Name M362K Final Exam Instructions: Show all of your work. You do not have to simplify your answers. No calculators allowed. There is a table of formulae on the last page. 1. Suppose X 1,..., X 1 are independent

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

Lecture 2 : CS6205 Advanced Modeling and Simulation

Lecture 2 : CS6205 Advanced Modeling and Simulation Lecture 2 : CS6205 Advanced Modeling and Simulation Lee Hwee Kuan 21 Aug. 2013 For the purpose of learning stochastic simulations for the first time. We shall only consider probabilities on finite discrete

More information

T has many other desirable properties, and we will return to this example

T has many other desirable properties, and we will return to this example 2. Introduction to statistics: first examples 2.1. Introduction. The basic problem of statistics is to draw conclusions about unknown distributions of random variables from observed values. These conclusions

More information

1 Exercises for lecture 1

1 Exercises for lecture 1 1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )

More information

The Delta Method and Applications

The Delta Method and Applications Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Lecture 13 and 14: Bayesian estimation theory

Lecture 13 and 14: Bayesian estimation theory 1 Lecture 13 and 14: Bayesian estimation theory Spring 2012 - EE 194 Networked estimation and control (Prof. Khan) March 26 2012 I. BAYESIAN ESTIMATORS Mother Nature conducts a random experiment that generates

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

1 Random variables and distributions

1 Random variables and distributions Random variables and distributions In this chapter we consider real valued functions, called random variables, defined on the sample space. X : S R X The set of possible values of X is denoted by the set

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

The expected value E[X] of discrete random variable X is defined by. xp X (x), (6.1) E[X] =

The expected value E[X] of discrete random variable X is defined by. xp X (x), (6.1) E[X] = Chapter 6 Meeting Expectations When a large collection of data is gathered, one is typically interested not necessarily in every individual data point, but rather in certain descriptive quantities such

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Discrete Mathematics and Probability Theory Fall 2015 Note 20. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2015 Note 20. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 215 Note 2 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces Ω, where the number

More information

5. Conditional Distributions

5. Conditional Distributions 1 of 12 7/16/2009 5:36 AM Virtual Laboratories > 3. Distributions > 1 2 3 4 5 6 7 8 5. Conditional Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an

More information

1 Basic continuous random variable problems

1 Basic continuous random variable problems Name M362K Final Here are problems concerning material from Chapters 5 and 6. To review the other chapters, look over previous practice sheets for the two exams, previous quizzes, previous homeworks and

More information

Chapter 2 Random Variables

Chapter 2 Random Variables Stochastic Processes Chapter 2 Random Variables Prof. Jernan Juang Dept. of Engineering Science National Cheng Kung University Prof. Chun-Hung Liu Dept. of Electrical and Computer Eng. National Chiao Tung

More information

Gaussian random variables inr n

Gaussian random variables inr n Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

ECE 313: Conflict Final Exam Tuesday, May 13, 2014, 7:00 p.m. 10:00 p.m. Room 241 Everitt Lab

ECE 313: Conflict Final Exam Tuesday, May 13, 2014, 7:00 p.m. 10:00 p.m. Room 241 Everitt Lab University of Illinois Spring 1 ECE 313: Conflict Final Exam Tuesday, May 13, 1, 7: p.m. 1: p.m. Room 1 Everitt Lab 1. [18 points] Consider an experiment in which a fair coin is repeatedly tossed every

More information

For a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t,

For a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t, CHAPTER 2 FUNDAMENTAL CONCEPTS This chapter describes the fundamental concepts in the theory of time series models. In particular, we introduce the concepts of stochastic processes, mean and covariance

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information