2 Conditioning

1 Conditional Distributions
Let $A$ and $B$ be events, and suppose that $P(B) > 0$. We recall from Section 3 of the Introduction that the conditional probability of $A$ given $B$ is defined as $P(A \mid B) = P(A \cap B)/P(B)$ and that $P(A \mid B) = P(A)$ if $A$ and $B$ are independent.

Now, let $(X, Y)$ be a two-dimensional random variable whose components are discrete.

Example 1.1. A symmetric die is thrown twice. Let $U_1$ be a random variable denoting the number of dots on the first throw, let $U_2$ be a random variable denoting the number of dots on the second throw, and set $X = U_1 + U_2$ and $Y = \min\{U_1, U_2\}$. Suppose we wish to find the distribution of $Y$ for some given value of $X$, for example, $P(Y = 2 \mid X = 7)$. Set $A = \{Y = 2\}$ and $B = \{X = 7\}$. From the definition of conditional probabilities we obtain

$$P(Y = 2 \mid X = 7) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/36}{6/36} = \frac{1}{3}.$$

With this method one may compute $P(Y = y \mid X = x)$ for any fixed value of $x$ as $y$ varies, for arbitrary, discrete, jointly distributed random variables. This leads to the following definition.

Definition 1.1. Let $X$ and $Y$ be discrete, jointly distributed random variables. For $P(X = x) > 0$ the conditional probability function of $Y$ given that $X = x$ is

$$p_{Y|X=x}(y) = P(Y = y \mid X = x) = \frac{p_{X,Y}(x, y)}{p_X(x)},$$

and the conditional distribution function of $Y$ given that $X = x$ is

A. Gut, An Intermediate Course in Probability, Springer Texts in Statistics, Springer Science + Business Media, LLC 2009
$$F_{Y|X=x}(y) = \sum_{z \le y} p_{Y|X=x}(z).$$

Exercise 1.1. Show that $p_{Y|X=x}(y)$ is the probability function of a true probability distribution.

It follows immediately (please check) that

$$p_{Y|X=x}(y) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{p_{X,Y}(x, y)}{\sum_z p_{X,Y}(x, z)}$$

and that

$$F_{Y|X=x}(y) = \frac{\sum_{z \le y} p_{X,Y}(x, z)}{p_X(x)} = \frac{\sum_{z \le y} p_{X,Y}(x, z)}{\sum_z p_{X,Y}(x, z)}.$$

Exercise 1.2. Compute the conditional probability function $p_{Y|X=x}(y)$ and the conditional distribution function $F_{Y|X=x}(y)$ in Example 1.1.

Now let $X$ and $Y$ have a joint continuous distribution. Expressions like $P(Y \le y \mid X = x)$ have no meaning in this case, since the probability that a fixed value is assumed equals zero. However, an examination of how the preceding conditional probabilities are computed makes the following definition very natural.

Definition 1.2. Let $X$ and $Y$ have a joint continuous distribution. For $f_X(x) > 0$, the conditional density function of $Y$ given that $X = x$ is

$$f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)},$$

and the conditional distribution function of $Y$ given that $X = x$ is

$$F_{Y|X=x}(y) = \int_{-\infty}^{y} f_{Y|X=x}(z)\,dz.$$

In analogy with the discrete case, we further have

$$f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{\int_{-\infty}^{\infty} f_{X,Y}(x, z)\,dz}$$

and

$$F_{Y|X=x}(y) = \frac{\int_{-\infty}^{y} f_{X,Y}(x, z)\,dz}{\int_{-\infty}^{\infty} f_{X,Y}(x, z)\,dz}.$$
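Exercise 1.2 asks for exactly this kind of computation. As a quick self-check (my own, not from the text), the 36 equally likely outcomes of Example 1.1 can be enumerated and the conditional probability function of $Y = \min\{U_1, U_2\}$ given $X = 7$ tabulated directly:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two throws of a fair die
# and tabulate P(Y = y | X = 7) for Y = min(U1, U2), X = U1 + U2.
outcomes = list(product(range(1, 7), repeat=2))
B = [uv for uv in outcomes if sum(uv) == 7]          # the event {X = 7}
p_B = Fraction(len(B), 36)                           # P(X = 7) = 6/36

def cond_pmf(y):
    # P(Y = y | X = 7) = P({Y = y} ∩ {X = 7}) / P(X = 7)
    a_and_b = [uv for uv in B if min(uv) == y]
    return Fraction(len(a_and_b), 36) / p_B

pmf = {y: cond_pmf(y) for y in range(1, 7)}
print(pmf)   # y = 1, 2, 3 each get conditional probability 1/3
```

The outcomes in $\{X = 7\}$ are $(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)$, with minima $1, 2, 3, 3, 2, 1$, so the conditional distribution is uniform on $\{1, 2, 3\}$, in agreement with the computation in Example 1.1.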
Exercise 1.3. Show that $f_{Y|X=x}(y)$ is the density function of a true probability distribution.

Exercise 1.4. Find the conditional distribution of $Y$ given that $X = x$ in the example and exercises above.

Exercise 1.5. Prove that if $X$ and $Y$ are independent, then the conditional distributions and the unconditional distributions are the same. Explain why this is reasonable.

Remark 1.1. Definitions 1.1 and 1.2 can be extended to situations with more than two random variables. How?

2 Conditional Expectation and Conditional Variance

In the same vein as the concepts of expected value and variance are introduced as convenient location and dispersion measures for (ordinary) random variables or distributions, it is natural to introduce analogs of these concepts for conditional distributions. The following example shows how such notions enter naturally.

Example 2.1. A stick of length one is broken at a random point, uniformly distributed over the stick. The remaining piece is broken once more. Find the expected value and variance of the piece that now remains.

In order to solve this problem we let $X \sim U(0, 1)$ be the first remaining piece. The second remaining piece $Y$ is uniformly distributed on the interval $(0, X)$. This is to be interpreted as follows: Given that $X = x$, the random variable $Y$ is uniformly distributed on the interval $(0, x)$:

$$Y \mid X = x \sim U(0, x),$$

that is, $f_{Y|X=x}(y) = 1/x$ for $0 < y < x$, and $0$ otherwise. Clearly, $E X = 1/2$ and $\operatorname{Var} X = 1/12$. Furthermore, intuition suggests that

$$E(Y \mid X = x) = \frac{x}{2} \quad\text{and}\quad \operatorname{Var}(Y \mid X = x) = \frac{x^2}{12}. \tag{2.1}$$

We wish to determine $E Y$ and $\operatorname{Var} Y$ somehow with the aid of the preceding relations.

We are now ready to state our first definition.

Definition 2.1. Let $X$ and $Y$ be jointly distributed random variables. The conditional expectation of $Y$ given that $X = x$ is

$$E(Y \mid X = x) = \begin{cases} \sum_y y\, p_{Y|X=x}(y) & \text{in the discrete case},\\[4pt] \int_{-\infty}^{\infty} y\, f_{Y|X=x}(y)\,dy & \text{in the continuous case}, \end{cases}$$

provided the relevant sum or integral is absolutely convergent.
4 34 Conditioning Exercise.1. Let X, Y, Y 1, and Y be random variables, let g be a function, and c a constant. Show that (a) E(c X x) c, (b) E(Y 1 + Y X x) E(Y 1 X x) + E(Y X x), (c) E(cY X x) c E(Y X x), (d) E(g(X, Y ) X x) E(g(x, Y ) X x), (e) E(Y X x) E Y if X and Y are independent. The conditional distribution of Y given that X x depends on the value of x (unless X and Y are independent). This implies that the conditional expectation E(Y X x) is a function of x, that is, E(Y X x) h(x) (.) for some function h. (If X and Y are independent, then check that h(x) E Y, a constant.) An object of considerable interest and importance is the random variable h(x), which we denote by h(x) E(Y X). (.3) This random variable is of interest not only in the context of probability theory (as we shall see later) but also in statistics in connection with estimation. Loosely speaking, it turns out that if Y is a good estimator and X is suitably chosen, then E(Y X) is a better estimator. Technically, given a so-called unbiased estimator U of a parameter θ, it is possible to construct another unbiased estimator V by considering the conditional expectation of U with respect to what is called a sufficient statistic T ; that is, V E(U T ). The point is that E U E V θ (unbiasedness) and that Var V Var U (this follows essentially from the sufficiency and Theorem.3 ahead). For details, we refer to the statistics literature provided in Appendix A. A natural question at this point is: What is the expected value of the random variable E(Y X)? Theorem.1. Suppose that E Y <. Then E ( E(Y X) ) E Y. Proof. We prove the theorem for the continuous case and leave the (completely analogous) proof for the discrete case as an exercise. E ( E(Y X) ) E h(x) h(x) f X (x) dx E(Y X x) f X (x) dx ( ) y f Y Xx (y) dy f X (x) dx
$$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, \frac{f_{X,Y}(x, y)}{f_X(x)}\, f_X(x)\,dy\,dx = \int_{-\infty}^{\infty} y \Big( \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \Big)\,dy = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy = E Y. \qquad \Box$$

Remark 2.1. Theorem 2.1 can be interpreted as an expectation version of the law of total probability.

Remark 2.2. Clearly, $E Y$ must exist in order for Theorem 2.1 to make sense; that is, the corresponding sum or integral must be absolutely convergent. Now, given this assumption, one can show that $E(E(Y \mid X))$ exists and is finite and that the computations in the proof, such as reversing orders of integration, are permitted. We shall, in the sequel, permit ourselves at times to be somewhat sloppy about such verifications. Analogous remarks apply to further results ahead. We close this remark by pointing out that the conclusion always holds in case $Y$ is nonnegative, in the sense that if one of the members is infinite, then so is the other.

Exercise 2.2. The object of this exercise is to show that if we do not assume that $E|Y| < \infty$ in Theorem 2.1, then the conclusion does not necessarily hold. Namely, suppose that $X \sim \Gamma(1/2, 2)\ (= \chi^2(1))$ and that

$$f_{Y|X=x}(y) = \frac{1}{\sqrt{2\pi}}\, x^{1/2} e^{-\frac{1}{2} x y^2}, \quad -\infty < y < \infty.$$

(a) Compute $E(Y \mid X = x)$, $E(Y \mid X)$, and, finally, $E(E(Y \mid X))$.
(b) Show that $Y \sim C(0, 1)$.
(c) What about $E Y$?

We are now able to find $E Y$ in Example 2.1.

Example 2.1 (continued). It follows from the definition that the first part of (2.1) holds: $E(Y \mid X = x) = x/2$, that is, $h(x) = x/2$. An application of Theorem 2.1 now yields

$$E Y = E\big(E(Y \mid X)\big) = E\,h(X) = E\Big(\frac{X}{2}\Big) = \frac{1}{2}\, E X = \frac{1}{4}.$$

We have thus determined $E Y$ without prior knowledge about the distribution of $Y$.

Exercise 2.3. Find the expectation of the remaining piece after it has been broken off $n$ times.
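A quick simulation (my own check, not part of the text) confirms $E Y = E\big(E(Y \mid X)\big) = 1/4$ for the broken stick: draw $X \sim U(0,1)$, then $Y \mid X = x \sim U(0, x)$, and average.

```python
import random

# Monte Carlo check of E Y = 1/4 in Example 2.1:
# X ~ U(0,1) is the first remaining piece; given X = x, Y ~ U(0, x).
random.seed(1)
n = 200_000
total = 0.0
for _ in range(n):
    x = random.random()              # first remaining piece
    total += random.uniform(0.0, x)  # second remaining piece, U(0, x)
mean_y = total / n
print(mean_y)  # close to 0.25
```

Note that the simulation never needs the (unconditional) distribution of $Y$, which is exactly the point of Theorem 2.1.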
6 36 Conditioning Remark.3. That the result E Y 1/4 is reasonable can intuitively be seen from the fact that X on average equals 1/ and that Y on average equals half the value of X, that is 1/ of 1/. The proof of Theorem.1 consists, in fact, of a stringent version of this kind of argument. Theorem.. Let X and Y be random variables and g be a function. We have (a) E ( g(x)y X ) g(x) E(Y X), and (b) E(Y X) E Y if X and Y are independent. Exercise.4. Prove Theorem.. Remark.4. Conditioning with respect to X means that X should be interpreted as known, and, hence, g(x) as a constant that thus may be moved in front of the expectation (recall Exercise.1(a)). This explains why Theorem.(a) should hold. Part (b) follows from the fact that the conditional distribution and the unconditional distribution coincide if X and Y are independent; in particular, this should remain true for the conditional expectation and the unconditional expectation (recall Exercises 1.5 and.1(e)). A natural problem is to find the variance of the remaining piece Y in Example.1, which, in turn, suggests the introduction of the concept of conditional variance. Definition.. Let X and Y have a joint distribution. The conditional variance of Y given that X x is Var(Y X x) E ( (Y E(Y X x)) X x ), provided the corresponding sum or integral is absolutely convergent. The conditional variance is (also) a function of x; call it v(x). The corresponding random variable is The following result is fundamental. v(x) Var(Y X). (.4) Theorem.3. Let X and Y be random variables and g a real-valued function. If E Y < and E ( g(x) ) <, then E ( Y g(x) ) E Var(Y X) + E ( E(Y X) g(x) ). Proof. An expansion of the left-hand side yields E ( Y g(x) ) E ( Y E(Y X) + E(Y X) g(x) ) E ( Y E(Y X) ) + E ( Y E(Y X) )( E(Y X) g(x) ) + E ( E(Y X) g(x) ).
7 Conditional Expectation and Conditional Variance 37 Using Theorem.1, the right-hand side becomes E E ( (Y E(Y X)) X ) + E E ( (Y E(Y X)) (E(Y X) g(x)) X ) + E ( E(Y X) g(x) ) E Var(Y X) + E (E(Y X) g(x)) E(Y E(Y X) X) } + E ( E(Y X) g(x) ) by Theorem.(a). Finally, since E(Y E(Y X) X), this equals E Var(Y X) + E (E(Y X) g(x)) } + E ( E(Y X) g(x) ), which was to be proved. The particular choice g(x) E Y, together with an application of Theorem.1, yields the following corollary: Corollary.3.1. Suppose that E Y <. Then Var Y E Var (Y X) + Var ( E(Y X) ). Example.1 (continued). Let us determine Var Y with the aid of Corollary.3.1. It follows from second part of formula (.1) that Var(Y X x) 1 1 x, and hence, v(x) 1 1 X, so that ( 1 E Var(Y X) E v(x) E 1 X) Furthermore, Var ( E(Y X) ) ( 1 ) Var(h(X)) Var X 1 4 Var(X) An application of Corollary.3.1 finally yields Var Y 1/36 + 1/48 7/144. We have thus computed Var Y without knowing the distribution of Y. Exercise.5. Find the distribution of Y in Example.1, and verify the values of E Y and Var Y obtained above. A discrete variant of Example.1 is the following: Let X be uniformly distributed over the numbers 1,,..., 6 (that is, throw a symmetric die) and let Y be uniformly distributed over the numbers 1,,..., X (that is, then throw a symmetric die with X faces). In this case, h(x) E(Y X x) 1 + x, from which it follows that ( ) 1 + X E Y E h(x) E 1 (1 + E X) 1 ( ).5. The computation of Var Y is somewhat more elaborate. We leave the details to the reader.
3 Distributions with Random Parameters

We begin with two examples.

Example 3.1. Suppose that the density $X$ of red blood corpuscles in humans follows a Poisson distribution whose parameter depends on the observed individual. This means that for Jürg we have $X \sim Po(m_J)$, where $m_J$ is Jürg's parameter value, while for Alice we have $X \sim Po(m_A)$, where $m_A$ is Alice's parameter value. For a person selected at random we may consider the parameter value $M$ as a random variable such that, given that $M = m$, we have $X \sim Po(m)$; namely,

$$P(X = k \mid M = m) = e^{-m}\, \frac{m^k}{k!}, \quad k = 0, 1, 2, \ldots. \tag{3.1}$$

Thus, if we know that Alice was chosen, then $P(X = k \mid M = m_A) = e^{-m_A} m_A^k / k!$, for $k = 0, 1, 2, \ldots$, as before. We shall soon see that $X$ itself (unconditioned) need not follow a Poisson distribution.

Example 3.2. A radioactive substance emits $\alpha$-particles in such a way that the number of emitted particles during an hour, $N$, follows a $Po(\lambda)$-distribution. The particle counter, however, is somewhat unreliable in the sense that an emitted particle is registered with probability $p$ ($0 < p < 1$), whereas it remains unregistered with probability $q = 1 - p$. All particles are registered independently of each other. This means that if we know that $n$ particles were emitted during a specific hour, then the number of registered particles $X \sim Bin(n, p)$, that is,

$$P(X = k \mid N = n) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \ldots, n \tag{3.2}$$

(and $N \sim Po(\lambda)$). If, however, we observe the process during an arbitrarily chosen hour, it follows, as will be seen below, that the number of registered particles does not follow a binomial distribution (but instead a Poisson distribution).

The common feature in these examples is that the random variable under consideration, $X$, has a known distribution but with a parameter that is a random variable. Somewhat imprecisely, we might say that in Example 3.1 we have $X \sim Po(M)$, where $M$ follows some distribution, and that in Example 3.2 we have $X \sim Bin(N, p)$, where $N \sim Po(\lambda)$.
We prefer, however, to describe these cases as

$$X \mid M = m \sim Po(m) \quad\text{with}\quad M \sim F, \tag{3.3}$$

where $F$ is some distribution, and

$$X \mid N = n \sim Bin(n, p) \quad\text{with}\quad N \sim Po(\lambda), \tag{3.4}$$

respectively.
Let us now determine the (unconditional) distributions of $X$ in our examples, where, in Example 3.1, we assume that $M \sim Exp(1)$.

Example 3.1 (continued). We thus have

$$X \mid M = m \sim Po(m) \quad\text{with}\quad M \sim Exp(1). \tag{3.5}$$

By (the continuous version of) the law of total probability, we obtain, for $k = 0, 1, 2, \ldots$,

$$P(X = k) = \int_0^\infty P(X = k \mid M = x)\, f_M(x)\,dx = \int_0^\infty e^{-x}\, \frac{x^k}{k!}\, e^{-x}\,dx = \frac{1}{k!} \int_0^\infty x^k e^{-2x}\,dx = \frac{1}{k!} \cdot \frac{\Gamma(k+1)}{2^{k+1}} \int_0^\infty \frac{2^{k+1}}{\Gamma(k+1)}\, x^{(k+1)-1} e^{-2x}\,dx = \Big(\frac{1}{2}\Big)^{k+1},$$

that is, $X \sim Ge(1/2)$. The unconditional distribution in this case thus is not a Poisson distribution; it is a geometric distribution.

Exercise 3.1. Determine the distribution of $X$ if $M$ has
(a) an $Exp(a)$-distribution,
(b) a $\Gamma(p, a)$-distribution.

Note also that we may use the formulas from Section 2 to compute $E X$ and $\operatorname{Var} X$ without knowing the distribution of $X$. Namely, since $E(X \mid M = m) = m$ (i.e., $h(M) = E(X \mid M) = M$), Theorem 2.1 yields

$$E X = E\big(E(X \mid M)\big) = E M = 1,$$

and Corollary 2.3.1 yields

$$\operatorname{Var} X = E \operatorname{Var}(X \mid M) + \operatorname{Var}\big(E(X \mid M)\big) = E M + \operatorname{Var} M = 1 + 1 = 2.$$

If, however, the distribution has been determined (as above), the formulas from Section 2 may be used for checking.

If applied to Exercise 3.1(a), the latter formulas yield $E X = a$ and $\operatorname{Var} X = a + a^2$. Since this situation differs from Example 3.1 only by a rescaling of $M$, one might perhaps guess that the solution is another geometric distribution. If this were true, we would have

$$E X = \frac{q}{p} = \frac{1 - p}{p} = a \implies p = \frac{1}{a + 1}.$$

This value of $p$ inserted into the expression for the variance yields
$$\frac{q}{p^2} = \frac{1 - p}{p^2} = \frac{a}{a + 1} \cdot (a + 1)^2 = a(a + 1) = a^2 + a,$$

which coincides with our computations above and supports the guess that $X \sim Ge(1/(a + 1))$.

Remark 3.1. In Example 3.1 we used the results of Section 2 to confirm our result. In Exercise 3.1(a) they were used to confirm (provide) a guess.

We now turn to the $\alpha$-particles.

Example 3.2 (continued). Intuitively, the deficiency of the particle counter implies that the radiation actually measured is, on average, a fraction $p$ of the original Poisson stream of particles. We might therefore expect that the number of registered particles during one hour should be a $Po(\lambda p)$-distributed random variable. That this is actually correct is verified next. The model implies that

$$X \mid N = n \sim Bin(n, p) \quad\text{with}\quad N \sim Po(\lambda).$$

The law of total probability yields, for $k = 0, 1, 2, \ldots$,

$$P(X = k) = \sum_{n=k}^{\infty} P(X = k \mid N = n)\, P(N = n) = \sum_{n=k}^{\infty} \binom{n}{k} p^k q^{n-k}\, e^{-\lambda} \frac{\lambda^n}{n!} = \frac{p^k}{k!}\, e^{-\lambda} \sum_{n=k}^{\infty} \frac{\lambda^n}{(n-k)!}\, q^{n-k} = \frac{(\lambda p)^k}{k!}\, e^{-\lambda} \sum_{j=0}^{\infty} \frac{(\lambda q)^j}{j!} = \frac{(\lambda p)^k}{k!}\, e^{-\lambda} e^{\lambda q} = e^{-\lambda p}\, \frac{(\lambda p)^k}{k!},$$

that is, $X \sim Po(\lambda p)$. The unconditional distribution thus is not a binomial distribution; it is a Poisson distribution.

Remark 3.2. This is an example of a so-called thinned Poisson process. For more details, we refer to Section 8.6.

Exercise 3.2. Use Theorem 2.1 and Corollary 2.3.1 to check the values of $E X$ and $\operatorname{Var} X$.

A family of distributions that is of special interest is the family of mixed normal, or mixed Gaussian, distributions. These are normal distributions with a random variance, namely,

$$X \mid \Sigma^2 = y \sim N(\mu, y) \quad\text{with}\quad \Sigma^2 \sim F, \tag{3.6}$$
where $F$ is some distribution (on $(0, \infty)$). For simplicity we assume in the following that $\mu = 0$.

As an example, consider normally distributed observations with rare disturbances. More specifically, the observations might be $N(0, 1)$-distributed with probability $0.99$ and $N(0, 100)$-distributed with probability $0.01$. We may write this as $X \sim N(0, \Sigma^2)$, where $P(\Sigma^2 = 1) = 0.99$ and $P(\Sigma^2 = 100) = 0.01$. By Theorem 2.1 it follows immediately that $E X = 0$. As for the variance, Corollary 2.3.1 tells us that

$$\operatorname{Var} X = E \operatorname{Var}(X \mid \Sigma^2) + \operatorname{Var}\big(E(X \mid \Sigma^2)\big) = E \Sigma^2.$$

If $\Sigma^2$ has a continuous distribution, computations such as those above yield

$$F_X(x) = \int_0^\infty \Phi\Big(\frac{x}{\sqrt{y}}\Big)\, f_{\Sigma^2}(y)\,dy,$$

from which the density function of $X$ is obtained by differentiation:

$$f_X(x) = \int_0^\infty \frac{1}{\sqrt{y}}\,\varphi\Big(\frac{x}{\sqrt{y}}\Big)\, f_{\Sigma^2}(y)\,dy = \int_0^\infty \frac{1}{\sqrt{2\pi y}}\, e^{-x^2/2y}\, f_{\Sigma^2}(y)\,dy. \tag{3.7}$$

Mean and variance can be found via the results of Section 2:

$$E X = E\big(E(X \mid \Sigma^2)\big) = 0, \qquad \operatorname{Var} X = E \operatorname{Var}(X \mid \Sigma^2) + \operatorname{Var}\big(E(X \mid \Sigma^2)\big) = E \Sigma^2.$$

Next, we determine the distribution of $X$ under the particular assumption that $\Sigma^2 \sim Exp(1)$. We are thus faced with the situation

$$X \mid \Sigma^2 = y \sim N(0, y) \quad\text{with}\quad \Sigma^2 \sim Exp(1). \tag{3.8}$$

By (3.7),

$$f_X(x) = \int_0^\infty \frac{1}{\sqrt{2\pi y}}\, e^{-x^2/2y}\, e^{-y}\,dy = [\,\text{set } y = u^2\,] = \sqrt{\frac{2}{\pi}} \int_0^\infty \exp\Big\{-\frac{x^2}{2u^2} - u^2\Big\}\,du.$$

In order to solve this integral, the following device may be of use: Let $x > 0$, set

$$I(x) = \int_0^\infty \exp\Big\{-\frac{x^2}{u^2} - u^2\Big\}\,du,$$
differentiate (differentiation and integration may be interchanged), and make the change of variable $y = x/u$. This yields

$$I'(x) = \int_0^\infty \Big(-\frac{2x}{u^2}\Big) \exp\Big\{-\frac{x^2}{u^2} - u^2\Big\}\,du = -2 \int_0^\infty \exp\Big\{-y^2 - \frac{x^2}{y^2}\Big\}\,dy = -2\,I(x).$$

It follows that $I$ satisfies the differential equation $I'(x) = -2 I(x)$ with the initial condition

$$I(0) = \int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2},$$

the solution of which is

$$I(x) = \frac{\sqrt{\pi}}{2}\, e^{-2x}, \quad x > 0. \tag{3.9}$$

By inserting (3.9) into the expression for $f_X(x)$ (the exponent there equals $-(x/\sqrt{2})^2/u^2 - u^2$, so that $f_X(x) = \sqrt{2/\pi}\, I(x/\sqrt{2})$), and noting that the density is symmetric around $x = 0$, we finally obtain

$$f_X(x) = \sqrt{\frac{2}{\pi}} \cdot \frac{\sqrt{\pi}}{2}\, e^{-\sqrt{2}\,|x|} = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|x|}, \quad -\infty < x < \infty,$$

that is, $X \sim L(1/\sqrt{2})$; a Laplace distribution. An extra check yields $E X = 0$ and $\operatorname{Var} X = E \Sigma^2 = 1 = 2\big(\tfrac{1}{\sqrt{2}}\big)^2$, as desired.

Exercise 3.3. Show that if $X$ has a normal distribution such that the mean is zero and the inverse of the variance is $\Gamma$-distributed, viz.,

$$X \mid \Sigma^{-2} = \lambda \sim N(0, 1/\lambda) \quad\text{with}\quad \Sigma^{-2} \sim \Gamma\Big(\frac{n}{2}, \frac{2}{n}\Big),$$

then $X \sim t(n)$.

Exercise 3.4. Sheila has a coin with $P(\text{head}) = p_1$ and Betty has a coin with $P(\text{head}) = p_2$. Sheila tosses her coin $m$ times. Each time she obtains heads, Betty tosses her coin (otherwise not). Find the distribution of the total number of heads obtained by Betty. Further, check that the mean and variance coincide with the values obtained by Theorem 2.1 and Corollary 2.3.1. Alternatively, find the mean and variance first and try to guess the desired distribution (and check if your guess was correct). As a hint, observe that the game can be modeled as follows: Let $N$ be the number of heads obtained by Sheila and $X$ be the number of heads obtained by Betty. We thus wish to find the distribution of $X$, where

$$X \mid N = n \sim Bin(n, p_2) \quad\text{with}\quad N \sim Bin(m, p_1), \quad 0 < p_1, p_2 < 1.$$

We shall return to the topic of this section in Section 3.5.
4 The Bayesian Approach

A typical problem in probability theory begins with assumptions such as "let $X \sim Po(m)$," "let $Y \sim N(\mu, \sigma^2)$," "toss a symmetric coin 15 times," and so forth. In the computations that follow, one tacitly assumes that all parameters are known, that the coin is exactly symmetric, and so on. In statistics one assumes (certain) parameters to be unknown, for example, that the coin might be asymmetric, and one searches for methods, devices, and rules to decide whether or not one should believe in certain hypotheses. Two typical illustrations in the Gaussian case are "$\mu$ unknown and $\sigma^2$ known" and "$\mu$ and $\sigma^2$ unknown."

The Bayesian approach is a kind of compromise. One claims, for example, that parameters are never completely unknown; one always has some prior opinion or knowledge about them. A probabilistic model describing this approach was given in Example 3.1. The opening statement there was that the density of red blood corpuscles follows a Poisson distribution. One interpretation of that statement could have been that whenever we are faced with a blood sample, the density of red blood corpuscles in the sample is Poissonian. The Bayesian approach taken in Example 3.1 is that whenever we know from whom the blood sample has been taken, the density of red blood corpuscles in the sample is Poissonian, however, with a parameter depending on the individual. If we do not know from whom the sample has been taken, then the parameter is unknown; it is a random variable following some distribution. We also found that if this distribution is the standard exponential, then the density of red blood corpuscles is geometric (and hence not Poissonian).

The prior knowledge about the parameters in this approach is expressed in such a way that the parameters are assumed to follow some probability distribution, called the prior (or a priori) distribution.
If one wishes to assume that a parameter is completely unknown, one might solve the situation by attributing some uniform distribution to the parameter. In this terminology we may formulate our findings in Example 3.1 as follows: If the parameter in a Poisson distribution has a standard exponential prior distribution, then the random variable under consideration follows a $Ge(1/2)$-distribution.

Frequently, one performs random experiments in order to estimate (unknown) parameters. The estimates are based on observations from some probability distribution. The Bayesian analog is to determine the conditional distribution of the parameter given the result of the random experiment. Such a distribution is called the posterior (or a posteriori) distribution. Next we determine the posterior distribution in Example 3.1.

Example 4.1. The model in the example was

$$X \mid M = m \sim Po(m) \quad\text{with}\quad M \sim Exp(1). \tag{4.1}$$
We further had found that $X \sim Ge(1/2)$. Now we wish to determine the conditional distribution of $M$ given the value of $X$. For $x > 0$, we have

$$F_{M|X=k}(x) = P(M \le x \mid X = k) = \frac{P(\{M \le x\} \cap \{X = k\})}{P(X = k)} = \frac{\int_0^x P(X = k \mid M = y)\, f_M(y)\,dy}{P(X = k)} = \frac{\int_0^x e^{-y}\, \frac{y^k}{k!}\, e^{-y}\,dy}{\big(\frac{1}{2}\big)^{k+1}} = \int_0^x \frac{2^{k+1}}{\Gamma(k+1)}\, y^k e^{-2y}\,dy,$$

which, after differentiation, yields

$$f_{M|X=k}(x) = \frac{2^{k+1}}{\Gamma(k+1)}\, x^k e^{-2x}, \quad x > 0.$$

Thus, $M \mid X = k \sim \Gamma(k + 1, \frac{1}{2})$ or, in our new terminology, the posterior distribution of $M$ given that $X$ equals $k$ is $\Gamma(k + 1, \frac{1}{2})$.

Remark 4.1. Note that, starting from the distribution of $X$ given $M$ (and from that of $M$), we have determined the distribution of $M$ given $X$ and that the solution of the problem, in fact, amounted to applying a continuous version of Bayes' formula.

Exercise 4.1. Check that $E M$ and $\operatorname{Var} M$ are what they are supposed to be by applying Theorem 2.1 and Corollary 2.3.1 to the posterior distribution.

We conclude this section by studying coin tossing from the Bayesian point of view under the assumption that nothing is known about $p = P(\text{heads})$. Let $X_n$ be the number of heads after $n$ coin tosses. One possible model is

$$X_n \mid P = p \sim Bin(n, p) \quad\text{with}\quad P \sim U(0, 1). \tag{4.2}$$

The prior distribution of $P$, thus, is the $U(0, 1)$-distribution. Models of this kind are called mixed binomial models. For $k = 0, 1, 2, \ldots, n$, we now obtain (via some facts about the beta distribution)

$$P(X_n = k) = \int_0^1 \binom{n}{k} x^k (1 - x)^{n-k} \cdot 1\,dx = \binom{n}{k} \int_0^1 x^{(k+1)-1} (1 - x)^{(n+1-k)-1}\,dx = \binom{n}{k} \frac{\Gamma(k+1)\,\Gamma(n+1-k)}{\Gamma(n+2)} = \frac{n!}{k!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{(n+1)!} = \frac{1}{n+1}.$$
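The beta-integral computation above can be reproduced exactly in rational arithmetic (a check of mine, not part of the text):

```python
from fractions import Fraction
from math import comb, factorial

# P(X_n = k) = C(n,k) * integral_0^1 x^k (1-x)^{n-k} dx
#            = C(n,k) * k!(n-k)!/(n+1)!  (beta integral)  = 1/(n+1).
def pmf(n, k):
    return Fraction(comb(n, k) * factorial(k) * factorial(n - k),
                    factorial(n + 1))

for n in (1, 4, 10):
    assert all(pmf(n, k) == Fraction(1, n + 1) for k in range(n + 1))
print("X_n is uniform on {0, 1, ..., n}")
```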
This means that $X_n$ is uniformly distributed over the integers $0, 1, \ldots, n$. A second thought reveals that this is a very reasonable conclusion. Since nothing is known about the coin (in the sense of relation (4.2)), there is nothing that favors a specific outcome; that is, all outcomes should be equally probable.

If $p$ is known, we know that the results in different tosses are independent and that the probability of heads, given that we obtained 10 heads in a row, (still) equals $p$. What about these facts in the Bayesian model?

$$P(X_{n+1} = n + 1 \mid X_n = n) = \frac{P(\{X_{n+1} = n+1\} \cap \{X_n = n\})}{P(X_n = n)} = \frac{P(X_{n+1} = n + 1)}{P(X_n = n)} = \frac{\frac{1}{n+2}}{\frac{1}{n+1}} = \frac{n+1}{n+2} \to 1 \quad\text{as } n \to \infty.$$

This means that if we know that there were many heads in a row, then the (conditional) probability of another head is very large; the results in different tosses are not at all independent. Why is this the case? Let us find the posterior distribution of $P$.

$$P(P \le x \mid X_n = k) = \frac{\int_0^x P(X_n = k \mid P = y)\, f_P(y)\,dy}{P(X_n = k)} = \frac{\int_0^x \binom{n}{k} y^k (1 - y)^{n-k}\,dy}{\frac{1}{n+1}} = (n + 1) \binom{n}{k} \int_0^x y^k (1 - y)^{n-k}\,dy.$$

Differentiation yields

$$f_{P|X_n=k}(x) = \frac{\Gamma(n+2)}{\Gamma(k+1)\,\Gamma(n+1-k)}\, x^k (1 - x)^{n-k}, \quad 0 < x < 1,$$

viz., a $\beta(k + 1, n + 1 - k)$-distribution. For $k = n$ we obtain in particular (or, by direct computation)

$$f_{P|X_n=n}(x) = (n + 1)\, x^n, \quad 0 < x < 1.$$

It follows that

$$P(P > 1 - \varepsilon \mid X_n = n) = 1 - (1 - \varepsilon)^{n+1} \to 1 \quad\text{as } n \to \infty \quad\text{for all } \varepsilon > 0.$$

This means that if we know that there were many heads in a row, then we also know that $p$ is close to 1 and thus that it is very likely that the next toss will yield another head.

Remark 4.2. It is, of course, possible to consider the posterior distribution as a prior distribution for a further random experiment, and so on.
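The two computations fit together: the probability of yet another head, given $n$ heads in a row, is the mean of the posterior density $(n+1)x^n$, and a direct numerical integration (my own cross-check, not from the text) recovers $(n+1)/(n+2)$:

```python
# Numerical check that the posterior mean of f(x) = (n+1) x^n on (0,1),
# i.e. integral_0^1 x (n+1) x^n dx, equals (n+1)/(n+2).
def posterior_mean(n, steps=200_000):
    h = 1.0 / steps
    s = 0.0
    for i in range(steps):
        x = (i + 0.5) * h        # midpoint rule
        s += x * (n + 1) * x ** n
    return s * h

for n in (1, 5, 20):
    assert abs(posterior_mean(n) - (n + 1) / (n + 2)) < 1e-6
print("P(another head | n heads in a row) = (n+1)/(n+2)")
```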
5 Regression and Prediction

A common statistics problem is to analyze how different (levels of) treatments or treatment combinations affect the outcome of an experiment. The yield of a crop, for example, may depend on variability in watering, fertilization, climate, and other factors in the various areas where the experiment is performed. One problem is that one cannot predict the outcome $y$ exactly, meaning without error, even if the levels of the treatments $x_1, x_2, \ldots, x_n$ are known exactly. An important function for predicting the outcome is the conditional expectation of the (random) outcome $Y$ given the (random) levels of treatment $X_1, X_2, \ldots, X_n$.

Let $X_1, X_2, \ldots, X_n$ and $Y$ be jointly distributed random variables, and set

$$h(\mathbf{x}) = h(x_1, \ldots, x_n) = E(Y \mid X_1 = x_1, \ldots, X_n = x_n) = E(Y \mid \mathbf{X} = \mathbf{x}).$$

Definition 5.1. The function $h$ is called the regression function $Y$ on $\mathbf{X}$.

Remark 5.1. For $n = 1$ we have $h(x) = E(Y \mid X = x)$, which is the ordinary conditional expectation.

Definition 5.2. A predictor (for $Y$) based on $\mathbf{X}$ is a function $d(\mathbf{X})$. The predictor is called linear if $d$ is linear, that is, if $d(\mathbf{X}) = a_0 + a_1 X_1 + \cdots + a_n X_n$, where $a_0, a_1, \ldots, a_n$ are constants.

Predictors are used to predict (as the name suggests). The prediction error is given by the random variable

$$Y - d(\mathbf{X}). \tag{5.1}$$

There are several ways to compare different predictors. One suitable measure is defined as follows:

Definition 5.3. The expected quadratic prediction error is $E\big(Y - d(\mathbf{X})\big)^2$. Moreover, if $d_1$ and $d_2$ are predictors, we say that $d_1$ is better than $d_2$ if $E(Y - d_1(\mathbf{X}))^2 \le E(Y - d_2(\mathbf{X}))^2$.

In the following we confine ourselves to considering the case $n = 1$. A predictor is thus a function of $X$, $d(X)$, and the expected quadratic prediction error is $E(Y - d(X))^2$. If the predictor is linear, that is, if $d(X) = a + bX$, where $a$ and $b$ are constants, the expected quadratic prediction error is $E(Y - (a + bX))^2$.
Example 5.1. Pick a point uniformly distributed in the triangle $x \ge 0$, $y \ge 0$, $x + y \le 1$. We wish to determine the regression functions $E(Y \mid X = x)$ and $E(X \mid Y = y)$.

To solve this problem we first note that the joint density of $X$ and $Y$ is

$$f_{X,Y}(x, y) = \begin{cases} c, & \text{for } x \ge 0,\ y \ge 0,\ x + y \le 1,\\ 0, & \text{otherwise}, \end{cases}$$

where $c$ is some constant, which is found by noticing that the total mass equals 1. We thus have

$$\int\!\!\int f_{X,Y}(x, y)\,dx\,dy = c \int_0^1 \Big( \int_0^{1-x} dy \Big)\,dx = c \int_0^1 (1 - x)\,dx = c \Big[-\frac{(1 - x)^2}{2}\Big]_0^1 = \frac{c}{2},$$

from which it follows that $c = 2$. In order to determine the conditional densities we first compute the marginal ones:

$$f_X(x) = \int f_{X,Y}(x, y)\,dy = \int_0^{1-x} 2\,dy = 2(1 - x), \quad 0 < x < 1,$$
$$f_Y(y) = \int f_{X,Y}(x, y)\,dx = \int_0^{1-y} 2\,dx = 2(1 - y), \quad 0 < y < 1.$$

Incidentally, $X$ and $Y$ have the same distribution for reasons of symmetry. Finally,

$$f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{2}{2(1 - x)} = \frac{1}{1 - x}, \quad 0 < y < 1 - x,$$

and so

$$E(Y \mid X = x) = \int_0^{1-x} y\,\frac{1}{1 - x}\,dy = \frac{1}{1 - x} \Big[\frac{y^2}{2}\Big]_0^{1-x} = \frac{(1 - x)^2}{2(1 - x)} = \frac{1 - x}{2}$$

and, by symmetry,

$$E(X \mid Y = y) = \frac{1 - y}{2}.$$

Remark 5.2. Note also, for example, that $Y \mid X = x \sim U(0, 1 - x)$ in the example; that is, the density is, for $x$ fixed, a constant (which is the inverse of the length of the interval $(0, 1 - x)$). This implies that $E(Y \mid X = x) = (1 - x)/2$, which agrees with the previous results. It also provides an alternative solution to the last part of the problem. In this case the gain is marginal, but in a more technically complicated situation it might be more substantial.
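A rejection-sampling simulation (my own sketch, not from the text) makes the regression function visible: sample uniformly from the triangle, condition on $X$ lying in a thin slab around $x_0 = 0.3$ (the slab width is an arbitrary choice of mine), and compare the conditional average of $Y$ with $(1 - x_0)/2$.

```python
import random

# Monte Carlo check of E(Y | X = x) = (1 - x)/2 in Example 5.1.
random.seed(2)
pts = []
while len(pts) < 400_000:
    x, y = random.random(), random.random()
    if x + y <= 1.0:                 # accept points uniformly in the triangle
        pts.append((x, y))

x0, half_width = 0.3, 0.01          # condition on X being near 0.3
slab = [y for (x, y) in pts if abs(x - x0) < half_width]
cond_mean = sum(slab) / len(slab)
print(cond_mean)                    # close to (1 - 0.3)/2 = 0.35
```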
Exercise 5.1. Solve the same problem when

$$f_{X,Y}(x, y) = \begin{cases} cx, & \text{for } 0 < x, y < 1,\\ 0, & \text{otherwise}. \end{cases}$$

Exercise 5.2. Solve the same problem when

$$f_{X,Y}(x, y) = \begin{cases} e^{-y}, & \text{for } 0 < x < y,\\ 0, & \text{otherwise}. \end{cases}$$

Theorem 5.1. Suppose that $E Y^2 < \infty$. Then $h(X) = E(Y \mid X)$ (i.e., the regression function $Y$ on $X$) is the best predictor of $Y$ based on $X$.

Proof. By Theorem 2.3 we know that for an arbitrary predictor $d(X)$,

$$E\big(Y - d(X)\big)^2 = E \operatorname{Var}(Y \mid X) + E\big(h(X) - d(X)\big)^2 \ge E \operatorname{Var}(Y \mid X),$$

where equality holds iff $d(X) = h(X)$ (more precisely, iff $P(d(X) = h(X)) = 1$). The choice $d(X) = h(X)$ thus yields minimal expected quadratic prediction error. $\Box$

Example 5.2. In Example 5.1 we found the regression function of $Y$ based on $X$ to be $(1 - X)/2$. By Theorem 5.1 it is the best predictor of $Y$ based on $X$. A simple calculation shows that the expected quadratic prediction error is $E\big(Y - (1 - X)/2\big)^2 = 1/24$. We also noted that $X$ and $Y$ have the same marginal distribution. A (very) naive suggestion for another predictor therefore might be $X$ itself. The expected quadratic prediction error for this predictor is $E(Y - X)^2 = 1/6 > 1/24$, which shows that the regression function is indeed a better predictor.

Sometimes it is difficult to determine regression functions explicitly. In such cases one might be satisfied with the best linear predictor. This means that one wishes to minimize $E(Y - (a + bX))^2$ as a function of $a$ and $b$, which leads to the well-known method of least squares. The solution of this problem is given in the following result.

Theorem 5.2. Suppose that $E X^2 < \infty$ and $E Y^2 < \infty$. Set $\mu_x = E X$, $\mu_y = E Y$, $\sigma_x^2 = \operatorname{Var} X$, $\sigma_y^2 = \operatorname{Var} Y$, $\sigma_{xy} = \operatorname{Cov}(X, Y)$, and $\rho = \sigma_{xy}/\sigma_x \sigma_y$. The best linear predictor of $Y$ based on $X$ is $L(X) = \alpha + \beta X$, where

$$\alpha = \mu_y - \frac{\sigma_{xy}}{\sigma_x^2}\,\mu_x = \mu_y - \rho\,\frac{\sigma_y}{\sigma_x}\,\mu_x \quad\text{and}\quad \beta = \frac{\sigma_{xy}}{\sigma_x^2} = \rho\,\frac{\sigma_y}{\sigma_x}.$$
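For the triangle of Example 5.1, all the moments entering Theorem 5.2 can be computed exactly; the bookkeeping below (my own check, using $E X^2 = 1/6$ and $E XY = 1/12$, both obtained by hand from the density $2$ on the triangle) also reproduces both prediction errors of Example 5.2:

```python
from fractions import Fraction

# Exact moments for Example 5.1: f_X(x) = f_Y(y) = 2(1-x) on (0,1).
mu   = Fraction(1, 3)               # E X = E Y
EX2  = Fraction(1, 6)               # E X^2 = integral x^2 * 2(1-x) dx
var  = EX2 - mu**2                  # sigma^2 = 1/18 (same for X and Y)
cov  = Fraction(1, 12) - mu**2      # Cov(X,Y) = E XY - (E X)(E Y) = -1/36
rho2 = cov**2 / (var * var)         # rho^2 = 1/4

best_error  = var * (1 - rho2)      # sigma_y^2 (1 - rho^2): error of (1-X)/2
naive_error = 2 * var - 2 * cov     # E(Y-X)^2 = Var(Y-X), since E(Y-X) = 0
print(best_error, naive_error)      # 1/24 and 1/6
```

Since the regression function is linear here, $\sigma_y^2(1 - \rho^2)$ coincides with $E \operatorname{Var}(Y \mid X)$, so the best predictor and the best linear predictor have the same error.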
The best linear predictor thus is

$$\mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(X - \mu_x). \tag{5.2}$$

Definition 5.4. The line $y = \mu_y + \rho \frac{\sigma_y}{\sigma_x}(x - \mu_x)$ is called the regression line $Y$ on $X$. The slope, $\rho \frac{\sigma_y}{\sigma_x}$, of the line is called the regression coefficient.

Remark 5.3. Note that $y = L(x)$, where $L(X)$ is the best linear predictor of $Y$ based on $X$.

Remark 5.4. If, in particular, $(X, Y)$ has a joint Gaussian distribution, it turns out that the regression function is linear; that is, for this very important case the best linear predictor is, in fact, the best predictor. For details, we refer the reader to Section 5.6.

Example 5.1 (continued). The regression function $Y$ on $X$ turned out to be linear in this example; $y = (1 - x)/2$. It follows in particular that the regression function coincides with the regression line $Y$ on $X$. The regression coefficient equals $-1/2$.

The expected quadratic prediction error of the best linear predictor of $Y$ based on $X$ is obtained as follows:

Theorem 5.3. $E\big(Y - L(X)\big)^2 = \sigma_y^2 (1 - \rho^2)$.

Proof.

$$E\big(Y - L(X)\big)^2 = E\Big(Y - \mu_y - \rho\,\frac{\sigma_y}{\sigma_x}\,(X - \mu_x)\Big)^2 = E(Y - \mu_y)^2 + \rho^2\,\frac{\sigma_y^2}{\sigma_x^2}\, E(X - \mu_x)^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}\, E(Y - \mu_y)(X - \mu_x) = \sigma_y^2 + \rho^2 \sigma_y^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}\,\sigma_{xy} = \sigma_y^2 (1 - \rho^2). \qquad \Box$$

Definition 5.5. The quantity $\sigma_y^2 (1 - \rho^2)$ is called the residual variance.

Exercise 5.3. Check via Theorem 5.3 that the residual variance in Example 5.1 equals $1/24$, as was claimed in Example 5.2.

The regression line $X$ on $Y$ is determined similarly. It is

$$x = \mu_x + \rho\,\frac{\sigma_x}{\sigma_y}\,(y - \mu_y),$$

which can be rewritten as

$$y = \mu_y + \frac{1}{\rho}\,\frac{\sigma_y}{\sigma_x}\,(x - \mu_x)$$
20 5 Conditioning if ρ. The regression lines Y on X and X on Y are thus, in general, different. They coincide iff they have the same slope iff ρ σy σ x 1 ρ σy σ x ρ 1, that is, iff there exists a linear relation between X and Y. Example 5.1 (continued). The regression function X on Y was also linear (and coincides with the regression line X on Y ). The line has the form x (1 y)/, that is, y 1 x. In particular, we note that the slopes of the regression lines are 1/ and, respectively. 6 Problems 1. Let X and Y be independent Exp(1)-distributed random variables. Find the conditional distribution of X given that X + Y c (c is a positive constant).. Let X and Y be independent Γ(, a)-distributed random variables. Find the conditional distribution of X given that X + Y. 3. The life of a repairing device is Exp(1/a)-distributed. Peter wishes to use it on n different, independent, Exp(1/na)-distributed occasions. (a) Compute the probability P n that this is possible. (b) Determine the limit of P n as n. 4. The life T (hours) of the lightbulb in an overhead projector follows an Exp(1)-distribution. During a normal week it is used a Po(1)- distributed number of lectures lasting exactly one hour each. Find the probability that a projector with a newly installed lightbulb functions throughout a normal week (without replacing the lightbulb). 5. The random variables N, X 1, X,... are independent, N Po(λ), and X k Be(1/), k 1. Set Y 1 N X k and Y N Y 1 k1 (Y 1 for N ). Show that Y 1 and Y are independent, and determine their distributions. 6. Suppose that X N(, 1) and Y Exp(1) are independent random variables. Prove that X Y has a standard Laplace distribution. 7. Let N Ge(p) and set X ( 1) N. Compute (a) E X and Var X, (b) the distribution (probability function) of X. 8. The density function of the two-dimensional random variable (X, Y ) is f X,Y (x, y) x y 3 e x y, for < x <, < y < 1,, otherwise.
(a) Determine the distribution of Y.
(b) Find the conditional distribution of X given that Y = y.
(c) Use the results from (a) and (b) to compute E X and Var X.

9. The density of the random vector (X, Y) is

    f_{X,Y}(x, y) = cx, for x, y ≥ 0, x + y ≤ 1, and 0 otherwise.

Compute
(a) c,
(b) the conditional expectations E(Y | X = x) and E(X | Y = y).

10. Suppose X and Y have a joint density function given by cx, for 0 < x < y < 1, and 0 otherwise. Find c, the marginal density functions, E X, E Y, and the conditional expectations E(Y | X = x) and E(X | Y = y).

11. Suppose X and Y have a joint density function given by cx²y², for 0 < y < x < 1, and 0 otherwise. Compute c, the marginal densities, E X, E Y, and the conditional expectations E(Y | X = x) and E(X | Y = y).

12. Let X and Y have joint density cxy, when 0 < y < x < 1, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

13. Let X and Y have joint density cy, when 0 < y < x < 2, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

14. Suppose that X and Y are random variables with joint density c(x + y), when 0 < x < y < 1, and 0 otherwise. Compute the regression functions E(Y | X = x) and E(X | Y = y).
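Conditional expectations of the kind asked for in the problems above can be sanity-checked by simulation: sample from the joint density by rejection, keep the pairs whose X-coordinate falls near the chosen x, and average the corresponding Y-values. The density used in this sketch, f(x, y) = 4xy on the unit square, is a hypothetical stand-in (deliberately not one of the problems); for it X and Y are independent, so E(Y | X = x) = E Y = 2/3 for every x:

```python
import random

# Monte Carlo sanity check for a conditional expectation E(Y | X = x).
# The density f(x, y) = 4xy on 0 < x, y < 1 is a hypothetical stand-in:
# X and Y are independent with marginal densities 2x and 2y, so
# E(Y | X = x) = 2/3 for every x.
random.seed(2)

def sample_pair():
    """Rejection sampling from f(x, y) = 4xy on the unit square (max f = 4)."""
    while True:
        x, y = random.random(), random.random()
        if random.random() * 4.0 <= 4.0 * x * y:
            return x, y

x0, width = 0.5, 0.05          # estimate E(Y | X = x0) from samples with X near x0
near = []
for _ in range(200_000):
    x, y = sample_pair()
    if abs(x - x0) < width:
        near.append(y)

estimate = sum(near) / len(near)
print(round(estimate, 2))      # should be close to 2/3
```

The same two-step recipe (rejection-sample, then condition by binning on X) applies verbatim to any of the bounded densities in the problem list once the constant c is known.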
15. Suppose that X and Y are random variables with a joint density (2/5)(2x + 3y), when 0 < x, y < 1, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

16. Let X and Y be random variables with a joint density (4/5)(x + 3y)e^{−x−2y}, when x, y > 0, and 0 otherwise. Compute the regression functions E(Y | X = x) and E(X | Y = y).

17. Suppose that the joint density of X and Y is given by xe^{−x−xy}, when x > 0, y > 0, and 0 otherwise. Determine the regression functions E(Y | X = x) and E(X | Y = y).

18. Let the joint density function of X and Y be given by c(x + y), for 0 < x < y < 1, and 0 otherwise. Determine c, the marginal densities, E X, E Y, and the conditional expectations E(Y | X = x) and E(X | Y = y).

19. Let the joint density of X and Y be given by

    f_{X,Y}(x, y) = c, for 0 ≤ x ≤ 1, x² ≤ y ≤ x, and 0 otherwise.

Compute c, the marginal densities, and the conditional expectations E(Y | X = x) and E(X | Y = y).

20. Suppose that X and Y are random variables with joint density cx, when 0 < x < 1, x³ < y < x^{1/3}, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

21. Suppose that X and Y are random variables with joint density cy, when 0 < x < 1, x⁴ < y < x^{1/4}, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).
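Several of the problems above ask for regression functions E(Y | X = x); for the linear theory, the residual-variance formula of Theorem 5.3 can itself be checked by simulation in the same spirit. In the sketch below all distribution and parameter values are our own illustrative choices:

```python
import math, random

# Monte Carlo check of Theorem 5.3: for the best linear predictor
# L(X) = mu_y + rho*(sigma_y/sigma_x)*(X - mu_x), the expected quadratic
# prediction error E(Y - L(X))^2 equals sigma_y^2 * (1 - rho^2).
# All parameter values below are illustrative choices, not from the text.
random.seed(1)
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 3.0, 0.6

n = 200_000
errors = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = mu_x + sigma_x * z1
    # construct Y so that Corr(X, Y) = rho
    y = mu_y + sigma_y * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    L = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    errors.append((y - L) ** 2)

empirical = sum(errors) / n
theoretical = sigma_y**2 * (1 - rho**2)
print(round(empirical, 2), round(theoretical, 2))  # the two should nearly agree
```

Because the simulated pair is jointly Gaussian, L(X) is here in fact the best predictor overall (Remark 5.4), but the identity E(Y − L(X))² = σ_y²(1 − ρ²) only uses the first two moments and so holds for any joint distribution with finite variances.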
22. Let the joint density function of X and Y be given by cx³y, for x, y > 0, x + y ≤ 1, and 0 otherwise. Compute c, the marginal densities, and the conditional expectations E(Y | X = x) and E(X | Y = y).

23. The joint density function of X and Y is given by cxy, for x, y > 0, 4x + y ≤ 1, and 0 otherwise. Compute c, the marginal densities, and the conditional expectations E(Y | X = x) and E(X | Y = y).

24. Let X and Y have joint density c/(x³y), when 1 < y < x, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

25. Let X and Y have joint density c/(x⁴y), when 1 < y < x, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

26. Suppose that X and Y are random variables with a joint density c/(1 + x − y)², when 0 < y < x < 1, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

27. Suppose that X and Y are random variables with a joint density c cos x, when 0 < y < x < π/2, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).

28. Let X and Y have joint density −c log y, when 0 < y < x < 1, and 0 otherwise. Compute the conditional expectations E(Y | X = x) and E(X | Y = y).
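The normalizing constants c asked for in problems of this type can always be cross-checked numerically: integrate the unnormalized density over its support and invert. The sketch below uses a hypothetical stand-in density, f(x, y) = c·e^{−x−y} on 0 < y < x (not one of the problems above), for which c = 2 exactly, since ∫₀^∞ e^{−x}(1 − e^{−x}) dx = 1/2:

```python
import math

# Numerically determining a normalizing constant c for a joint density.
# The density f(x, y) = c * exp(-x - y) on 0 < y < x is a hypothetical
# stand-in with known answer c = 2.
def integrate(f, ax, bx, ay_of_x, by_of_x, nx=400, ny=400):
    """Midpoint rule for a double integral over ax < x < bx, ay(x) < y < by(x)."""
    total, hx = 0.0, (bx - ax) / nx
    for i in range(nx):
        x = ax + (i + 0.5) * hx
        ay, by = ay_of_x(x), by_of_x(x)
        hy = (by - ay) / ny
        for j in range(ny):
            y = ay + (j + 0.5) * hy
            total += f(x, y) * hx * hy
    return total

# truncate the unbounded x-range at 30; the tail mass beyond is negligible
mass = integrate(lambda x, y: math.exp(-x - y), 0.0, 30.0,
                 lambda x: 0.0, lambda x: x)
c = 1.0 / mass
print(round(c, 3))   # close to 2.0
```

For the bounded supports in the problem list the truncation step is unnecessary; only the limits `ay_of_x` and `by_of_x` change.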
29. The random vector (X, Y) has the following joint distribution:

    P(X = m, Y = n) = (m choose n) (1/2)^m · (m/15),

where m = 1, 2, ..., 5 and n = 0, 1, ..., m. Compute E(Y | X = m).

30. Show that a suitable power of a Weibull-distributed random variable whose parameter is gamma-distributed is Pareto-distributed. More precisely, show that if X | A = a ∈ W(1/a, 1/b) with A ∈ Γ(p, θ), then X^b has a (translated) Pareto distribution.

31. Show that an exponential random variable such that the inverse of the parameter is gamma-distributed is Pareto-distributed. More precisely, show that if X | M = m ∈ Exp(m) with M^{−1} ∈ Γ(p, a), then X has a (translated) Pareto distribution.

32. Let X and Y be random variables such that Y | X = x ∈ Exp(1/x) with X ∈ Γ(2, 1).
(a) Show that Y has a translated Pareto distribution.
(b) Compute E Y.
(c) Check the value in (b) by recomputing it via our favorite formula for conditional means.

33. Suppose that the random variable X is uniformly distributed symmetrically around zero, but in such a way that the parameter is uniform on (0, 1); that is, suppose that X | A = a ∈ U(−a, a) with A ∈ U(0, 1). Find the distribution of X, E X, and Var X.

34. In Section 4 we studied the situation when a coin, such that p = P(head) is considered to be a U(0, 1)-distributed random variable, is tossed, and found (i.a.) that if X_n = # heads after n tosses, then X_n is uniformly distributed over the integers 0, 1, ..., n. Suppose instead that p is considered to be β(2, 2)-distributed. What then? More precisely, consider the following model: X_n | Y = y ∈ Bin(n, y) with f_Y(y) = 6y(1 − y), 0 < y < 1.
(a) Compute E X_n and Var X_n.
(b) Determine the distribution of X_n.

35. Let X and Y be jointly distributed random variables such that Y | X = x ∈ Bin(n, x) with X ∈ U(0, 1). Compute E Y, Var Y, and Cov(X, Y) (without using what is known from Section 4 about the distribution of Y).
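The fact quoted in Problem 34 — that the number of heads in n tosses of a coin whose success probability p is U(0, 1)-distributed is uniform over {0, 1, ..., n} — is easy to watch in a simulation. In this sketch, n and the number of repetitions are our own illustrative choices; the two-stage sampling draws p first and then tosses the coin n times:

```python
import random
from collections import Counter

# Simulation of the fact quoted in Problem 34: if p ~ U(0, 1) and, given p,
# X_n ~ Bin(n, p), then X_n is uniform on {0, 1, ..., n}.
# n and the sample size are illustrative choices.
random.seed(3)
n, reps = 5, 120_000

counts = Counter()
for _ in range(reps):
    p = random.random()                                  # p ~ U(0, 1)
    heads = sum(random.random() < p for _ in range(n))   # X_n | p ~ Bin(n, p)
    counts[heads] += 1

freqs = [counts[k] / reps for k in range(n + 1)]
print([round(f, 2) for f in freqs])   # each frequency near 1/(n+1) ≈ 0.17
```

Replacing `p = random.random()` by a draw from the β(2, 2) density 6y(1 − y) gives an empirical answer to Problem 34(b) against which a computed distribution can be checked.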
36. Let X and Y be jointly distributed random variables such that Y | X = x ∈ Fs(x) with f_X(x) = 3x², 0 ≤ x ≤ 1. Compute E Y, Var Y, Cov(X, Y), and the distribution of Y.

37. Let X be the number of coin tosses until heads is obtained. Suppose that the probability of heads is unknown in the sense that we consider it to be a random variable Y ∈ U(0, 1).
(a) Find the distribution of X (cf. Problem …).
(b) The expected value of an Fs-distributed random variable exists, as is well known. What about E X?
(c) Suppose that the value X = n has been observed. Find the posterior distribution of Y, that is, the distribution of Y | X = n.

38. Let p be the probability that the tip points downward after a person throws a drawing pin once. Annika throws a drawing pin until it points downward for the first time. Let X be the number of throws for this to happen. She then throws the drawing pin another X times. Let Y be the number of times the drawing pin points downward in the latter series of throws. Find the distribution of Y (cf. Problem …).

39. A point P is chosen uniformly in an n-dimensional sphere of radius 1. Next, a point Q is chosen uniformly within the concentric sphere, centered at the origin, going through P. Let X and Y be the distances of P and Q, respectively, to the common center. Find the joint density function of X and Y and the conditional expectations E(Y | X = x) and E(X | Y = y).
Hint 1. Begin by trying the case n = 2.
Hint 2. The volume of an n-dimensional sphere of radius r is equal to c_n r^n, where c_n is some constant (which is of no interest for the problem).
Remark. For n = 1 we rediscover the stick from Example … .

40. Let X and Y be independent random variables. The conditional distribution of Y given that X = x then does not depend on x. Moreover, E(Y | X = x) is independent of x; recall Theorem 2.2(b) and Remark 2.4. Now, suppose instead that E(Y | X = x) is independent of x (i.e., that E(Y | X) = E Y). We say that Y has constant regression with respect to X.
However, it does not necessarily follow that X and Y are independent. Namely, let the joint density of X and Y be given by

    f_{X,Y}(x, y) = 1/2, for |x| + |y| ≤ 1, and 0 otherwise.

Show that Y has constant regression with respect to X and/but that X and Y are not independent.
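The phenomenon in this last problem — constant regression without independence — can also be watched in a simulation. The density below, the uniform distribution on the unit disk x² + y² ≤ 1, is our own standard illustrative example of such a law (not necessarily the density intended in the problem): E(Y | X = x) = 0 for every x, yet the conditional spread of Y shrinks as |x| grows, since Var(Y | X = x) = (1 − x²)/3:

```python
import random

# Uniform distribution on the unit disk: E(Y | X = x) = 0 for every x
# (constant regression), yet X and Y are dependent, because the
# conditional variance of Y given X = x is (1 - x^2)/3, which varies with x.
random.seed(4)

def sample_disk():
    """Rejection sampling: uniform point in the unit disk x^2 + y^2 <= 1."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            return x, y

center, edge = [], []          # Y-values for |X| small vs |X| large
for _ in range(100_000):
    x, y = sample_disk()
    if abs(x) < 0.2:
        center.append(y)
    elif abs(x) > 0.8:
        edge.append(y)

mean_center = sum(center) / len(center)
mean_edge = sum(edge) / len(edge)
var_center = sum(y * y for y in center) / len(center)
var_edge = sum(y * y for y in edge) / len(edge)
print(round(mean_center, 2), round(mean_edge, 2))  # both near 0: constant regression
print(var_center > 2 * var_edge)                   # conditional spreads differ: dependence
```

The same experiment run on the density of the problem itself would show the identical pattern: conditional means constant, conditional variances not.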
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More informationECE 313: Conflict Final Exam Tuesday, May 13, 2014, 7:00 p.m. 10:00 p.m. Room 241 Everitt Lab
University of Illinois Spring 1 ECE 313: Conflict Final Exam Tuesday, May 13, 1, 7: p.m. 1: p.m. Room 1 Everitt Lab 1. [18 points] Consider an experiment in which a fair coin is repeatedly tossed every
More informationFor a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t,
CHAPTER 2 FUNDAMENTAL CONCEPTS This chapter describes the fundamental concepts in the theory of time series models. In particular, we introduce the concepts of stochastic processes, mean and covariance
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More information