Module 3. Function of a Random Variable and its distribution
1. Function of a Random Variable

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X$ be a random variable defined on $(\Omega, \mathcal{F}, P)$. Further, let $h: \mathbb{R} \to \mathbb{R}$ be a given function and let $Z: \Omega \to \mathbb{R}$ be the function of the random variable $X$ defined by $Z(\omega) = h(X(\omega)),\ \omega \in \Omega$. In many situations it is of interest to study the probabilistic properties of $Z$, which is a function of the random variable $X$. Since $Z$ takes values in $\mathbb{R}$, to study the probabilistic properties of $Z$ it is necessary that $Z^{-1}(B) \in \mathcal{F}$ for every $B \in \mathcal{B}_1$, i.e., that $Z$ is a random variable. Throughout, for a positive integer $k$, $\mathbb{R}^k$ will denote the $k$-dimensional Euclidean space and $\mathcal{B}_k$ will denote the Borel sigma-field in $\mathbb{R}^k$.

Definition 1.1
Let $p$ and $q$ be positive integers. A function $h: \mathbb{R}^p \to \mathbb{R}^q$ is said to be a Borel function if $h^{-1}(B) \in \mathcal{B}_p$ for every $B \in \mathcal{B}_q$.

The following lemma will be useful in deriving conditions on the function $h: \mathbb{R} \to \mathbb{R}$ under which $Z: \Omega \to \mathbb{R}$, defined by $Z(\omega) = h(X(\omega)),\ \omega \in \Omega$, is a random variable. Recall that, for a function $\Psi: D \to E$ and $A \subseteq E$, $\Psi^{-1}(A) = \{x \in D: \Psi(x) \in A\}$.

Lemma 1.1
Let $X: \Omega \to \mathbb{R}$ and $h: \mathbb{R} \to \mathbb{R}$ be given functions. Define $Z: \Omega \to \mathbb{R}$ by $Z(\omega) = h(X(\omega)),\ \omega \in \Omega$. Then, for any $A \subseteq \mathbb{R}$, $Z^{-1}(A) = X^{-1}(h^{-1}(A))$.

Proof. Fix $A \subseteq \mathbb{R}$. Note that $h^{-1}(A) = \{x \in \mathbb{R}: h(x) \in A\}$. Therefore
$$Z^{-1}(A) = \{\omega \in \Omega: Z(\omega) \in A\} = \{\omega \in \Omega: h(X(\omega)) \in A\} = \{\omega \in \Omega: X(\omega) \in h^{-1}(A)\} = X^{-1}(h^{-1}(A)).$$
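For a finite sample space, the set identity of Lemma 1.1 can be checked mechanically. The sketch below (Python; the sample space, random variable and function are all hypothetical) computes both sides of $Z^{-1}(A) = X^{-1}(h^{-1}(A))$ and compares them.

```python
# Hypothetical finite illustration of Lemma 1.1: Z^{-1}(A) = X^{-1}(h^{-1}(A)).
omega = ["w1", "w2", "w3", "w4"]               # sample space
X = {"w1": -2, "w2": -1, "w3": 0, "w4": 1}     # a random variable on omega
h = lambda x: x * x                            # h(x) = x^2, a Borel function
Z = {w: h(X[w]) for w in omega}                # Z = h(X)

def preimage(f, points, A):
    # {p : f(p) is in A}
    return {p for p in points if f[p] in A}

A = {0, 1, 4}                                   # a target set of values of Z
h_inv_A = {x for x in set(X.values()) if h(x) in A}  # h^{-1}(A), restricted to the range of X
lhs = preimage(Z, omega, A)                     # Z^{-1}(A)
rhs = preimage(X, omega, h_inv_A)               # X^{-1}(h^{-1}(A))
print(lhs == rhs)  # True: the two preimages coincide
```

The same comparison succeeds for any choice of $A$, which is exactly what the lemma asserts.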
Theorem 1.1
Let $X$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $h: \mathbb{R} \to \mathbb{R}$ be a Borel function. Then the function $Z: \Omega \to \mathbb{R}$, defined by $Z(\omega) = h(X(\omega)),\ \omega \in \Omega$, is a random variable.

Proof. Fix $B \in \mathcal{B}_1$. Since $h$ is a Borel function, we have $h^{-1}(B) \in \mathcal{B}_1$. Now, using the fact that $X$ is a random variable, it follows that
$$Z^{-1}(B) = X^{-1}(h^{-1}(B)) \in \mathcal{F}.$$
This proves the result.

Remark 1.1
(i) Let $h: \mathbb{R} \to \mathbb{R}$ be a continuous function. According to a standard result in calculus, the inverse image of any open interval $(a, b)$, $-\infty \le a < b \le \infty$, under a continuous function $h$ is a countable union of disjoint open intervals. Since $\mathcal{B}_1$ contains all open intervals and is closed under countable unions, it follows that $h^{-1}((a, b)) \in \mathcal{B}_1$ whenever $-\infty \le a < b \le \infty$. On employing arguments similar to the ones used in proving Theorem 1.1 of Module 2 (also see Theorem 1.2, Module 2), we conclude that $h^{-1}(B) \in \mathcal{B}_1$ for every $B \in \mathcal{B}_1$. It follows that any continuous function $h: \mathbb{R} \to \mathbb{R}$ is a Borel function and thus, in view of Theorem 1.1, any continuous function of a random variable is a random variable. In particular, if $X$ is a random variable, then $X^2$, $|X|$, $\max\{X, 0\}$, $\sin X$ and $\cos X$ are random variables.

(ii) Let $h: \mathbb{R} \to \mathbb{R}$ be a strictly monotone function. Then, for $-\infty \le a < b \le \infty$, $h^{-1}((a, b))$ is an interval, and therefore $h^{-1}((a, b)) \in \mathcal{B}_1$; i.e., $h$ is a Borel function. It follows that if $X$ is a random variable and $h: \mathbb{R} \to \mathbb{R}$ is strictly monotone, then $h(X)$ is a random variable.

A random variable takes values in various Borel sets according to some probability law, called the probability distribution of the random variable. Clearly, the probability distribution of a random variable of absolutely continuous/discrete type is described by its distribution function (d.f.) and/or by its probability density function/probability mass function (p.d.f./p.m.f.). For a given Borel function $h: \mathbb{R} \to \mathbb{R}$, in the following section we derive the probability distribution of $h(X)$ using the probability distribution of the random variable $X$.
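Once $h(X)$ is known to be a random variable (Theorem 1.1), its distribution can be computed from that of $X$; for a discrete $X$ this amounts to grouping probability mass by the values of $h$. A small sketch of that grouping idea (Python; the pmf below is hypothetical):

```python
from collections import defaultdict

def pmf_of_function(pmf_x, h):
    """pmf of Y = h(X) for discrete X: f_Y(y) = sum of f_X(x) over {x : h(x) = y}."""
    pmf_y = defaultdict(float)
    for x, p in pmf_x.items():
        pmf_y[h(x)] += p
    return dict(pmf_y)

# hypothetical pmf of X on {-2, -1, 0, 1, 2}
pmf_x = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
print(pmf_of_function(pmf_x, lambda x: x * x))   # pmf of Y = X^2 on {0, 1, 4}
```

Note how the two points $x = -1$ and $x = 1$ pool their mass at $y = 1$; this is precisely the formula proved as Theorem 2.1 below.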
2. Probability Distribution of a Function of a Random Variable

In our future discussions, when we refer to a random variable, unless otherwise stated, it will be either of discrete type or of absolutely continuous type. The probability distribution of a discrete type random variable will be referred to as a discrete (probability) distribution, and the probability distribution of a random variable of absolutely continuous type will be referred to as an absolutely continuous (probability) distribution. The following theorem deals with discrete probability distributions.

Theorem 2.1
Let $X$ be a random variable of discrete type with support $S_X$ and p.m.f. $f_X$. Let $h: \mathbb{R} \to \mathbb{R}$ be a Borel function and let $Y: \Omega \to \mathbb{R}$ be defined by $Y(\omega) = h(X(\omega)),\ \omega \in \Omega$. Then $Y$ is a random variable of discrete type with support $S_Y = \{h(x): x \in S_X\}$ and p.m.f.
$$f_Y(y) = \begin{cases} \sum_{x \in A_y} f_X(x), & \text{if } y \in S_Y \\ 0, & \text{otherwise,} \end{cases}$$
where $A_y = \{x \in S_X: h(x) = y\}$.

Proof. Since $h$ is a Borel function, using Theorem 1.1 it follows that $Y$ is a random variable. Also, $X$ being of discrete type implies that $S_X$ is countable, which in turn implies that $S_Y = \{h(x): x \in S_X\}$ is countable. Fix $y \in S_Y$, so that $y = h(x)$ for some $x \in S_X$. Then
$$P(\{Y = y\}) = P(\{\omega \in \Omega: h(X(\omega)) = y\}) = P(\{X \in A_y\}) = \sum_{x \in A_y} f_X(x) > 0,$$
since $A_y \cap S_X \neq \emptyset$, and
$$\sum_{y \in S_Y} P(\{Y = y\}) = P\left(\left\{X \in \bigcup_{y \in S_Y} A_y\right\}\right) = P(\{X \in S_X\}) = 1.$$
It follows that $S_Y$ is countable, $P(\{Y = y\}) > 0$ for every $y \in S_Y$, and $\sum_{y \in S_Y} P(\{Y = y\}) = 1$, i.e., $Y$ is a discrete type random variable with support $S_Y$. Moreover, for $y \in S_Y$,
$$f_Y(y) = P(\{Y = y\}) = \sum_{x \in A_y} f_X(x).$$
Hence the result follows.

The following corollary is an immediate consequence of the above theorem.

Corollary 2.1
Under the notation and assumptions of Theorem 2.1, suppose that $h: \mathbb{R} \to \mathbb{R}$ is one-one with inverse function $h^{-1}: T \to \mathbb{R}$, where $T = \{h(x): x \in \mathbb{R}\}$. Then $Y = h(X)$ is a discrete type random variable with support $S_Y = \{h(x): x \in S_X\}$ and p.m.f.
$$f_Y(y) = \begin{cases} f_X(h^{-1}(y)), & \text{if } y \in S_Y \\ 0, & \text{otherwise.} \end{cases}$$

Example 2.1
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \frac{1}{7}, & \text{if } x \in \{-2, -1, 0, 1\} \\ \frac{3}{14}, & \text{if } x \in \{2, 3\} \\ 0, & \text{otherwise.} \end{cases}$$
Show that $Y = X^2$ is a random variable. Find its p.m.f. and distribution function.

Solution. Since $h(x) = x^2,\ x \in \mathbb{R}$, is a continuous function and $X$ is a random variable, using Remark 1.1 (i) it follows that $Y = h(X) = X^2$ is a random variable. Clearly $S_X = \{-2, -1, 0, 1, 2, 3\}$ and $S_Y = \{0, 1, 4, 9\}$. Moreover,
$$P(\{Y = 0\}) = P(\{X = 0\}) = \frac{1}{7},$$
$$P(\{Y = 1\}) = P(\{X \in \{-1, 1\}\}) = \frac{2}{7},$$
$$P(\{Y = 4\}) = P(\{X \in \{-2, 2\}\}) = \frac{1}{7} + \frac{3}{14} = \frac{5}{14},$$
$$P(\{Y = 9\}) = P(\{X = 3\}) = \frac{3}{14}.$$
Therefore the p.m.f. of $Y$ is
$$f_Y(y) = \begin{cases} \frac{1}{7}, & \text{if } y = 0 \\ \frac{2}{7}, & \text{if } y = 1 \\ \frac{5}{14}, & \text{if } y = 4 \\ \frac{3}{14}, & \text{if } y = 9 \\ 0, & \text{otherwise,} \end{cases}$$
and the distribution function of $Y$ is
$$F_Y(y) = \begin{cases} 0, & \text{if } y < 0 \\ \frac{1}{7}, & \text{if } 0 \le y < 1 \\ \frac{3}{7}, & \text{if } 1 \le y < 4 \\ \frac{11}{14}, & \text{if } 4 \le y < 9 \\ 1, & \text{if } y \ge 9. \end{cases}$$

Example 2.2
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \frac{|x|}{2550}, & \text{if } x \in \{\pm 1, \pm 2, \ldots, \pm 50\} \\ 0, & \text{otherwise.} \end{cases}$$
Show that $Y = |X|$ is a random variable. Find its p.m.f. and distribution function.
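Before working through the solution analytically, the pmf of $Y$ can be previewed numerically with the same grouping idea as in Theorem 2.1. A sketch (Python; this assumes the pmf just stated, with exact rational arithmetic so the check is not blurred by rounding):

```python
from collections import defaultdict
from fractions import Fraction

# pmf of X from Example 2.2: f_X(x) = |x|/2550 on {±1, ..., ±50}
pmf_x = {x: Fraction(abs(x), 2550) for x in range(-50, 51) if x != 0}
assert sum(pmf_x.values()) == 1          # sanity check: a genuine pmf

pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[abs(x)] += p                   # pool mass at each value of Y = |X|

print(pmf_y[1], pmf_y[50])               # 1/1275 and 2/51 (i.e., 50/1275)
```

Each $y \in \{1, \ldots, 50\}$ collects the mass of the two points $\pm y$, which is where the factor $2$ (and the denominator $1275 = 2550/2$) in the solution below comes from.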
Solution. As $h(x) = |x|,\ x \in \mathbb{R}$, is a continuous function and $X$ is a random variable, using Remark 1.1 (i), $Y = |X|$ is a random variable. We have $S_X = \{\pm 1, \pm 2, \ldots, \pm 50\}$ and $S_Y = \{1, 2, \ldots, 50\}$. Moreover, for $y \in S_Y$,
$$P(\{Y = y\}) = P(\{X \in \{-y, y\}\}) = \frac{2y}{2550} = \frac{y}{1275}.$$
Therefore the p.m.f. of $Y$ is
$$f_Y(y) = \begin{cases} \frac{y}{1275}, & \text{if } y \in \{1, 2, \ldots, 50\} \\ 0, & \text{otherwise,} \end{cases}$$
and the distribution function of $Y$ is
$$F_Y(y) = \begin{cases} 0, & \text{if } y < 1 \\ \frac{j(j+1)}{2550}, & \text{if } j \le y < j + 1,\ j = 1, 2, \ldots, 49 \\ 1, & \text{if } y \ge 50. \end{cases}$$

Example 2.3
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x}, & \text{if } x \in \{0, 1, \ldots, n\} \\ 0, & \text{otherwise,} \end{cases}$$
where $n$ is a positive integer and $p \in (0, 1)$. Show that $Y = n - X$ is a random variable. Find its p.m.f. and distribution function.

Solution. Note that $S_X = \{0, 1, \ldots, n\}$ and $h(x) = n - x,\ x \in \mathbb{R}$, is a continuous function. Therefore $Y = n - X$ is a random variable. Clearly $S_Y = \{0, 1, \ldots, n\}$ and, for $y \in S_Y$,
$$P(\{Y = y\}) = P(\{X = n - y\}) = \binom{n}{n - y} p^{n - y} (1 - p)^{y} = \binom{n}{y} (1 - p)^{y} p^{n - y}.$$
Thus the p.m.f. of $Y$ is
$$f_Y(y) = \begin{cases} \binom{n}{y} (1 - p)^{y} p^{n - y}, & \text{if } y \in \{0, 1, \ldots, n\} \\ 0, & \text{otherwise,} \end{cases}$$
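The pmf just derived says that $n - X$ is again binomial, with the roles of $p$ and $1 - p$ exchanged. A quick numerical check (Python; the values $n = 5$, $p = 0.3$ are arbitrary choices for illustration):

```python
from math import comb

n, p = 5, 0.3
pmf_x = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
# pmf of Y = n - X, obtained by transporting mass as in Theorem 2.1
pmf_y = {n - x: prob for x, prob in pmf_x.items()}
# claim: Y has the Binomial(n, 1 - p) pmf
claim = {y: comb(n, y) * (1 - p)**y * p**(n - y) for y in range(n + 1)}
print(all(abs(pmf_y[y] - claim[y]) < 1e-12 for y in range(n + 1)))  # True
```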
and the distribution function of $Y$ is
$$F_Y(y) = \begin{cases} 0, & \text{if } y < 0 \\ \sum_{j=0}^{k} \binom{n}{j} (1 - p)^{j} p^{n - j}, & \text{if } k \le y < k + 1,\ k = 0, 1, \ldots, n - 1 \\ 1, & \text{if } y \ge n. \end{cases}$$

The following theorem deals with probability distributions of absolutely continuous type random variables.

Theorem 2.2
Let $X$ be a random variable of absolutely continuous type with p.d.f. $f_X$ and support $S_X$. Let $S_1, S_2, \ldots, S_k$ be open intervals in $\mathbb{R}$ such that $S_i \cap S_j = \emptyset$ if $i \neq j$ and $P(\{X \in \cup_{i=1}^{k} S_i\}) = 1$. Let $h: \mathbb{R} \to \mathbb{R}$ be a Borel function such that, on each $S_i$ ($i = 1, \ldots, k$), $h$ is strictly monotone and continuously differentiable with inverse function $h_i^{-1}$. Let $h(S_i) = \{h(x): x \in S_i\}$, so that each $h(S_i)$ ($i = 1, \ldots, k$) is an open interval in $\mathbb{R}$. Then the random variable $Y = h(X)$ is of absolutely continuous type with p.d.f.
$$f_Y(y) = \sum_{i=1}^{k} f_X(h_i^{-1}(y)) \left| \frac{d}{dy} h_i^{-1}(y) \right| I_{h(S_i)}(y).$$

Proof. We will provide an outline of the proof, which may not be rigorous. Let $F_Y$ be the distribution function of $Y$. For $y \in \mathbb{R}$ and $\Delta > 0$,
$$\frac{F_Y(y + \Delta) - F_Y(y)}{\Delta} = \frac{P(\{y < h(X) \le y + \Delta\})}{\Delta} = \sum_{i=1}^{k} \frac{P(\{y < h(X) \le y + \Delta,\ X \in S_i\})}{\Delta}.$$
Fix $i \in \{1, \ldots, k\}$. First suppose that $h$ is strictly decreasing on $S_i$. Note that $h_i^{-1}$ is strictly decreasing on $h(S_i)$, and $h(S_i)$ is an open interval. Thus, for $y$ belonging to the exterior of $h(S_i)$ and sufficiently small $\Delta > 0$, we have $P(\{y < h(X) \le y + \Delta,\ X \in S_i\}) = 0$. Also, for $y \in h(S_i)$ and sufficiently small $\Delta > 0$,
$$P(\{y < h(X) \le y + \Delta,\ X \in S_i\}) = P(\{h_i^{-1}(y + \Delta) \le X < h_i^{-1}(y)\}).$$
Thus, for all $y \in \mathbb{R}$, we have
$$\lim_{\Delta \to 0^+} \frac{P(\{y < h(X) \le y + \Delta,\ X \in S_i\})}{\Delta} = \lim_{\Delta \to 0^+} \frac{F_X(h_i^{-1}(y)) - F_X(h_i^{-1}(y + \Delta))}{\Delta} = -f_X(h_i^{-1}(y)) \frac{d}{dy} h_i^{-1}(y)\, I_{h(S_i)}(y). \quad (2.1)$$
Similarly, if $h$ is strictly increasing on $S_i$ then, for all $y \in \mathbb{R}$, we have
$$\lim_{\Delta \to 0^+} \frac{P(\{y < h(X) \le y + \Delta,\ X \in S_i\})}{\Delta} = \lim_{\Delta \to 0^+} \frac{F_X(h_i^{-1}(y + \Delta)) - F_X(h_i^{-1}(y))}{\Delta} = f_X(h_i^{-1}(y)) \frac{d}{dy} h_i^{-1}(y)\, I_{h(S_i)}(y). \quad (2.2)$$
Note that if $h$ is strictly decreasing (increasing) on $S_i$ then $\frac{d}{dy} h_i^{-1}(y) < 0\ (> 0)$ on $h(S_i)$. On combining (2.1) and (2.2) we get, for all $y \in \mathbb{R}$,
$$\lim_{\Delta \to 0^+} \frac{F_Y(y + \Delta) - F_Y(y)}{\Delta} = \sum_{i=1}^{k} f_X(h_i^{-1}(y)) \left| \frac{d}{dy} h_i^{-1}(y) \right| I_{h(S_i)}(y).$$
Similarly one can show that, for all $y \in \mathbb{R}$,
$$\lim_{\Delta \to 0^+} \frac{F_Y(y) - F_Y(y - \Delta)}{\Delta} = \sum_{i=1}^{k} f_X(h_i^{-1}(y)) \left| \frac{d}{dy} h_i^{-1}(y) \right| I_{h(S_i)}(y). \quad (2.3)$$
It follows that the distribution function $F_Y$ is differentiable everywhere on $\mathbb{R}$, except possibly at a finite number of points (the boundary points of the intervals $h(S_1), \ldots, h(S_k)$). Now the result follows from Remark 4.2 (vii) of Module 2 and (2.3).

The following corollary to the above theorem is immediate.
Corollary 2.2
Let $X$ be a random variable of absolutely continuous type with p.d.f. $f_X$ and support $S_X$. Suppose that $S_X$ is a finite union of disjoint open intervals in $\mathbb{R}$, and let $h: \mathbb{R} \to \mathbb{R}$ be a Borel function such that $h$ is differentiable and strictly monotone on $S_X$ (i.e., either $h'(x) < 0$ for all $x \in S_X$, or $h'(x) > 0$ for all $x \in S_X$). Let $T = \{h(x): x \in S_X\}$. Then $Y = h(X)$ is a random variable of absolutely continuous type with p.d.f.
$$f_Y(y) = \begin{cases} f_X(h^{-1}(y)) \left| \frac{d}{dy} h^{-1}(y) \right|, & \text{if } y \in T \\ 0, & \text{otherwise.} \end{cases}$$

It may be worth mentioning here that, in view of Remark 4.2 (vii) of Module 2, Theorem 2.2 and Corollary 2.2 can be applied even in situations where the function $h$ is differentiable everywhere on $S_X$ except possibly at a finite number of points.

Example 2.4
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \begin{cases} e^{-x}, & \text{if } x > 0 \\ 0, & \text{otherwise,} \end{cases}$$
and let $Y = X^2$.
(i) Show that $Y$ is a random variable of absolutely continuous type;
(ii) Find the distribution function of $Y$ and hence find its p.d.f.;
(iii) Find the p.d.f. of $Y$ directly (i.e., without finding the distribution function of $Y$).

Solution. (i) and (iii). Clearly $Y = X^2$ is a random variable (being a continuous function of the random variable $X$). We have $S_X = T = (0, \infty)$. Also $h(x) = x^2,\ x \in (0, \infty)$, is strictly increasing on $S_X$ with inverse function $h^{-1}(y) = \sqrt{y},\ y \in (0, \infty)$. Using Corollary 2.2, it follows that $Y = X^2$ is a random variable of absolutely continuous type with p.d.f.
$$f_Y(y) = \begin{cases} f_X(\sqrt{y}) \frac{1}{2\sqrt{y}}, & \text{if } y > 0 \\ 0, & \text{otherwise} \end{cases} = \begin{cases} \frac{e^{-\sqrt{y}}}{2\sqrt{y}}, & \text{if } y > 0 \\ 0, & \text{otherwise.} \end{cases}$$

(ii) We have $F_Y(y) = P(\{Y \le y\}),\ y \in \mathbb{R}$. Clearly, for $y < 0$, $F_Y(y) = 0$. For $y \ge 0$,
$$F_Y(y) = P(\{X^2 \le y\}) = P(\{X \le \sqrt{y}\}) = \int_0^{\sqrt{y}} e^{-x}\, dx = 1 - e^{-\sqrt{y}}.$$
Therefore the distribution function of $Y$ is
$$F_Y(y) = \begin{cases} 0, & \text{if } y < 0 \\ 1 - e^{-\sqrt{y}}, & \text{if } y \ge 0. \end{cases}$$
Clearly $F_Y$ is differentiable everywhere except at $y = 0$. Therefore, using Remark 4.2 (vii) of Module 2, we conclude that the random variable $Y$ is of absolutely continuous type with p.d.f. $f_Y(y) = \frac{e^{-\sqrt{y}}}{2\sqrt{y}}$ for $y > 0$ and $f_Y(y) = 0$ for $y < 0$. At $y = 0$ we may assign any arbitrary non-negative value to $f_Y(0)$. Thus a p.d.f. of $Y$ is
$$f_Y(y) = \begin{cases} \frac{e^{-\sqrt{y}}}{2\sqrt{y}}, & \text{if } y > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Example 2.5
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \begin{cases} \frac{|x|}{2}, & \text{if } -1 < x < 1 \\ \frac{x}{3}, & \text{if } 1 \le x < 2 \\ 0, & \text{otherwise,} \end{cases}$$
and let $Y = X^2$.
(i) Show that $Y$ is a random variable of absolutely continuous type;
(ii) Find the distribution function of $Y$ and hence find its p.d.f.;
(iii) Find the p.d.f. of $Y$ directly (i.e., without finding the distribution function of $Y$).
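Before solving (ii) analytically, the distribution function of $Y = X^2$ can be checked by direct numerical integration of $f_X$ over $\{x: x^2 \le y\}$. A rough Riemann-sum sketch (Python, assuming the pdf stated above):

```python
def f_X(x):
    # pdf of X from Example 2.5 (as stated above)
    if -1 < x < 1:
        return abs(x) / 2
    if 1 <= x < 2:
        return x / 3
    return 0.0

def F_Y(y, n=100_000):
    # P(Y <= y) = integral of f_X over {x : x^2 <= y}; midpoint Riemann sum on (-2, 2)
    total, dx = 0.0, 4.0 / n
    for i in range(n):
        x = -2.0 + (i + 0.5) * dx
        if x * x <= y:
            total += f_X(x) * dx
    return total

print(round(F_Y(1.0), 3), round(F_Y(2.5), 3))   # approximately 0.5 and 0.75
```

These values agree with the closed-form distribution function derived in the solution below ($y/2$ at $y = 1$, and $(y + 2)/6 = 0.75$ at $y = 2.5$).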
Solution. (i) and (iii). Clearly $Y = X^2$ is a random variable (being a continuous function of the random variable $X$). We have $P(\{X \in (-1, 0) \cup (0, 2)\}) = 1$; take $S_1 = (-1, 0)$ and $S_2 = (0, 2)$. Also $h(x) = x^2$ is strictly decreasing on $S_1 = (-1, 0)$, with inverse function $h_1^{-1}(y) = -\sqrt{y}$, and strictly increasing on $S_2 = (0, 2)$, with inverse function $h_2^{-1}(y) = \sqrt{y}$; $h(S_1) = (0, 1)$ and $h(S_2) = (0, 4)$. Using Theorem 2.2, it follows that $Y = X^2$ is a random variable of absolutely continuous type with p.d.f.
$$f_Y(y) = f_X(-\sqrt{y}) \frac{1}{2\sqrt{y}} I_{(0,1)}(y) + f_X(\sqrt{y}) \frac{1}{2\sqrt{y}} I_{(0,4)}(y) = \begin{cases} \frac{1}{2}, & \text{if } 0 < y < 1 \\ \frac{1}{6}, & \text{if } 1 \le y < 4 \\ 0, & \text{otherwise.} \end{cases}$$

(ii) We have $F_Y(y) = P(\{Y \le y\}),\ y \in \mathbb{R}$. Since $P(\{-1 < X < 2\}) = 1$, we have $P(\{0 \le Y < 4\}) = 1$. Therefore, for $y < 0$, $F_Y(y) = 0$ and, for $y \ge 4$, $F_Y(y) = 1$. For $y \in [0, 4)$, we have
$$F_Y(y) = P(\{X^2 \le y\}) = \begin{cases} \int_{-\sqrt{y}}^{\sqrt{y}} \frac{|x|}{2}\, dx = \frac{y}{2}, & \text{if } 0 \le y < 1 \\ \frac{1}{2} + \int_{1}^{\sqrt{y}} \frac{x}{3}\, dx = \frac{y + 2}{6}, & \text{if } 1 \le y < 4. \end{cases}$$
Therefore the distribution function of $Y$ is
$$F_Y(y) = \begin{cases} 0, & \text{if } y < 0 \\ \frac{y}{2}, & \text{if } 0 \le y < 1 \\ \frac{y + 2}{6}, & \text{if } 1 \le y < 4 \\ 1, & \text{if } y \ge 4. \end{cases}$$
Clearly $F_Y$ is differentiable everywhere except on the set $\{0, 1, 4\}$; also $F_Y(1^-) = F_Y(1) = \frac{1}{2}$, so $F_Y$ is continuous everywhere. Using Remark 4.2 (vii) of Module 2, it follows that the random variable $Y$ is of absolutely continuous type with a p.d.f.
$$f_Y(y) = \begin{cases} \frac{1}{2}, & \text{if } 0 < y < 1 \\ \frac{1}{6}, & \text{if } 1 < y < 4 \\ 0, & \text{otherwise.} \end{cases}$$

Note that a Borel function of a discrete type random variable is a random variable of discrete type (see Theorem 2.1). Theorem 2.2 provides sufficient conditions under which a Borel function of an absolutely continuous type random variable is of absolutely continuous type. The following example illustrates that, in general, a Borel function of an absolutely continuous type random variable may not be of absolutely continuous type.

Example 2.6
Let $X$ be a random variable of absolutely continuous type with p.d.f.
$$f_X(x) = \begin{cases} e^{-x}, & \text{if } x > 0 \\ 0, & \text{otherwise,} \end{cases}$$
and let $Y = [X]$, where, for $x \in \mathbb{R}$, $[x]$ denotes the largest integer not exceeding $x$. Show that $Y$ is a random variable of discrete type and find its p.m.f.

Solution. For $y \in \mathbb{R}$, we have $\{Y \le y\} = \{X < [y] + 1\} \in \mathcal{F}$. It follows that $Y$ is a random variable. Also $S_X = (0, \infty)$. Since $P(\{X > 0\}) = 1$, we have $P(\{Y \in \{0, 1, 2, \ldots\}\}) = 1$. Also, for $y \in \{0, 1, 2, \ldots\}$,
$$P(\{Y = y\}) = P(\{y \le X < y + 1\}) = \int_y^{y+1} e^{-x}\, dx = e^{-y}(1 - e^{-1}) > 0.$$
Consequently the random variable $Y$ is of discrete type with support $S_Y = \{0, 1, 2, \ldots\}$ and p.m.f.
$$f_Y(y) = \begin{cases} e^{-y}(1 - e^{-1}), & \text{if } y \in \{0, 1, 2, \ldots\} \\ 0, & \text{otherwise.} \end{cases}$$

3. Expectation and Moments of a Random Variable

Suppose that $X$ is a discrete type random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ associated with a random experiment. Let $S_X$ and $f_X$ denote, respectively, the support and the p.m.f. of $X$. Suppose that the random experiment is repeated a large number of times. Let $f_{n, x}$ ($x \in S_X$) denote the frequency of the event $\{X = x\}$ in the first $n$ repetitions of the random experiment. Then, according to the relative frequency approach to probability,
$$P(\{X = x\}) = \lim_{n \to \infty} \frac{f_{n, x}}{n},\ x \in S_X.$$
Note that $\sum_{x \in S_X} x \frac{f_{n, x}}{n}$ represents the mean observed value (or expected value) of the random variable $X$ in the first $n$ repetitions of the random experiment. Therefore, in line with the axiomatic approach to probability, one may define the mean value (or expected value) of the random variable $X$ as
$$E(X) = \lim_{n \to \infty} \sum_{x \in S_X} x \frac{f_{n, x}}{n} = \sum_{x \in S_X} x \lim_{n \to \infty} \frac{f_{n, x}}{n} = \sum_{x \in S_X} x f_X(x),$$
provided the involved limits exist and the interchange of the order of summation and limit is admissible. A similar discussion can be provided for defining the expected value of an absolutely continuous type random variable $X$, having p.d.f. $f_X$, as
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx,$$
provided the integral is defined. The above discussion leads to the following definitions.

Definition 3.1
(i) Let $X$ be a discrete type random variable with p.m.f. $f_X$ and support $S_X$. We say that the expected value of $X$ (denoted by $E(X)$) is finite and equals
$$E(X) = \sum_{x \in S_X} x f_X(x),$$
provided $\sum_{x \in S_X} |x| f_X(x) < \infty$.
(ii) Let $X$ be an absolutely continuous type random variable with p.d.f. $f_X$. We say that the expected value of $X$ (denoted by $E(X)$) is finite and equals
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx,$$
provided $\int_{-\infty}^{\infty} |x| f_X(x)\, dx < \infty$.

The following observations on the above definitions are immediate.

Remark 3.1
(i) Since $\left| \sum_{x \in S_X} x f_X(x) \right| \le \sum_{x \in S_X} |x| f_X(x)$ and $\left| \int_{-\infty}^{\infty} x f_X(x)\, dx \right| \le \int_{-\infty}^{\infty} |x| f_X(x)\, dx$, it follows that if the expected value of a random variable $X$ is finite, then $|E(X)| \le E(|X|) < \infty$.
(ii) If $X$ is a random variable of discrete type with finite support $S_X$, then $\sum_{x \in S_X} |x| f_X(x) < \infty$. Consequently the expected value of $X$ is finite.
(iii) Suppose that $X$ is a random variable of absolutely continuous type with p.d.f. $f_X$ and support $S_X \subseteq [-a, a]$, for some $a \in (0, \infty)$. Then
$$\int_{-\infty}^{\infty} |x| f_X(x)\, dx \le a \int_{-\infty}^{\infty} f_X(x)\, dx = a < \infty.$$
Consequently the expected value of $X$ is finite.

Example 3.1
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \frac{1}{2^x}, & \text{if } x \in \{1, 2, 3, \ldots\} \\ 0, & \text{otherwise.} \end{cases}$$
Show that the expected value of $X$ is finite and find its value.

Solution. We have $S_X = \{1, 2, 3, \ldots\}$ and
$$\sum_{x \in S_X} |x| f_X(x) = \sum_{x=1}^{\infty} \frac{x}{2^x} = \sum_{x=1}^{\infty} a_x, \text{ say}.$$
Clearly $a_x = \frac{x}{2^x} > 0$, $x = 1, 2, \ldots$, and
$$\frac{a_{x+1}}{a_x} = \frac{x + 1}{2x} \to \frac{1}{2} < 1, \text{ as } x \to \infty.$$
By the ratio test, $\sum_{x \in S_X} |x| f_X(x) < \infty$, and therefore the expected value of $X$ is finite. Moreover,
$$E(X) = \sum_{x=1}^{\infty} \frac{x}{2^x} = \lim_{n \to \infty} s_n, \text{ where } s_n = \sum_{x=1}^{n} \frac{x}{2^x}. \quad (3.1)$$
Also,
$$\frac{s_n}{2} = \sum_{x=1}^{n} \frac{x}{2^{x+1}} = \sum_{x=2}^{n+1} \frac{x - 1}{2^x}. \quad (3.2)$$
On subtracting (3.2) from (3.1), we get
$$\frac{s_n}{2} = \frac{1}{2} + \sum_{x=2}^{n} \frac{1}{2^x} - \frac{n}{2^{n+1}} = 1 - \frac{1}{2^n} - \frac{n}{2^{n+1}} \to 1, \text{ as } n \to \infty.$$
Therefore $E(X) = \lim_{n \to \infty} s_n = 2$.

Example 3.2
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \frac{3}{\pi^2 x^2}, & \text{if } x \in \{\pm 1, \pm 2, \pm 3, \ldots\} \\ 0, & \text{otherwise.} \end{cases}$$
Show that the expected value of $X$ is not finite.

Solution. We have $S_X = \{\pm 1, \pm 2, \pm 3, \ldots\}$ (note that $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$, so that $f_X$ is indeed a p.m.f.) and
$$\sum_{x \in S_X} |x| f_X(x) = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k} = \infty.$$
Thus the expected value of $X$ is not finite.

Example 3.3
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \frac{e^{-|x|}}{2},\ -\infty < x < \infty.$$
Show that the expected value of $X$ is finite and find its value.

Solution. We have
$$\int_{-\infty}^{\infty} |x| f_X(x)\, dx = \int_{-\infty}^{\infty} \frac{|x| e^{-|x|}}{2}\, dx = \int_0^{\infty} x e^{-x}\, dx = 1 < \infty.$$
Thus the expected value of $X$ is finite and, the integrand being an odd function,
$$E(X) = \int_{-\infty}^{\infty} \frac{x e^{-|x|}}{2}\, dx = 0.$$

Example 3.4
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \frac{1}{\pi(1 + x^2)},\ -\infty < x < \infty.$$
Show that the expected value of $X$ is not finite.

Solution. We have
$$\int_{-\infty}^{\infty} |x| f_X(x)\, dx = \frac{2}{\pi} \int_0^{\infty} \frac{x}{1 + x^2}\, dx = \frac{1}{\pi} \lim_{t \to \infty} \ln(1 + t^2) = \infty.$$
Therefore the expected value of $X$ is not finite.

Theorem 3.1
Let $X$ be a random variable of absolutely continuous or discrete type with finite expected value. Then
(i) $E(X) = \int_0^{\infty} P(\{X > x\})\, dx - \int_0^{\infty} P(\{X < -x\})\, dx$;
(ii) $E(X) = \int_0^{\infty} P(\{X > x\})\, dx$, provided $P(\{X \ge 0\}) = 1$;
(iii) $E(X) = \sum_{n=1}^{\infty} P(\{X \ge n\}) - \sum_{n=1}^{\infty} P(\{X \le -n\})$, provided $P(\{X \in \{0, \pm 1, \pm 2, \ldots\}\}) = 1$;
(iv) $E(X) = \sum_{n=1}^{\infty} P(\{X \ge n\})$, provided $P(\{X \in \{0, 1, 2, \ldots\}\}) = 1$.

Proof. (i) Case I. $X$ is of absolutely continuous type.
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx = \int_0^{\infty} \left( \int_0^x dt \right) f_X(x)\, dx - \int_{-\infty}^0 \left( \int_x^0 dt \right) f_X(x)\, dx.$$
On changing the order of integration in the two integrals above, we get
$$E(X) = \int_0^{\infty} \left( \int_t^{\infty} f_X(x)\, dx \right) dt - \int_0^{\infty} \left( \int_{-\infty}^{-t} f_X(x)\, dx \right) dt = \int_0^{\infty} P(\{X > t\})\, dt - \int_0^{\infty} P(\{X < -t\})\, dt.$$

Case II. $X$ is of discrete type. We will illustrate the idea of the proof by considering a special case where $S_X = \{x_j: j \in \mathbb{Z}\}$ with $\cdots < x_{-2} < x_{-1} < x_0 \le 0 < x_1 < x_2 < \cdots$, $\lim_{j \to \infty} x_j = \infty$ and $\lim_{j \to \infty} x_{-j} = -\infty$. For $t \ge 0$, $P(\{X > t\})$ is a step function of $t$, so that
$$\int_0^{\infty} P(\{X > t\})\, dt = x_1 P(\{X \ge x_1\}) + \sum_{j=1}^{\infty} (x_{j+1} - x_j) P(\{X \ge x_{j+1}\}) = \sum_{j=1}^{\infty} x_j P(\{X = x_j\}),$$
where the last equality follows on rearranging the (absolutely convergent) double series. Similarly,
$$\int_0^{\infty} P(\{X < -t\})\, dt = \sum_{j=0}^{\infty} |x_{-j}| P(\{X = x_{-j}\}) = -\sum_{j=0}^{\infty} x_{-j} P(\{X = x_{-j}\}).$$
Therefore,
$$\int_0^{\infty} P(\{X > t\})\, dt - \int_0^{\infty} P(\{X < -t\})\, dt = \sum_{j \in \mathbb{Z}} x_j P(\{X = x_j\}) = E(X).$$

(ii) Suppose that $P(\{X \ge 0\}) = 1$. Then $P(\{X < -x\}) = 0$ for every $x \ge 0$, and therefore, by (i), $E(X) = \int_0^{\infty} P(\{X > x\})\, dx$.

(iii) Suppose that $P(\{X \in \{0, \pm 1, \pm 2, \ldots\}\}) = 1$. Then, for $n \in \mathbb{Z}$ (the set of integers), $n \ge 1$, we have $P(\{X > t\}) = P(\{X \ge n\})$ for $n - 1 \le t < n$, and $P(\{X < -t\}) = P(\{X \le -n\})$ for $n - 1 < t < n$. Therefore
$$\int_0^{\infty} P(\{X > t\})\, dt = \sum_{n=1}^{\infty} P(\{X \ge n\}), \qquad \int_0^{\infty} P(\{X < -t\})\, dt = \sum_{n=1}^{\infty} P(\{X \le -n\}),$$
and the assertion follows on using (i).

(iv) Suppose that $P(\{X \in \{0, 1, 2, \ldots\}\}) = 1$. Then $P(\{X \le -n\}) = 0$, $n = 1, 2, \ldots$, and the result follows from (iii).

Theorem 3.2
(i) Let $X$ be a random variable of discrete type with support $S_X$ and p.m.f. $f_X$. Let $h: \mathbb{R} \to \mathbb{R}$ be a Borel function and let $Y = h(X)$. Then
$$E(Y) = \sum_{x \in S_X} h(x) f_X(x),$$
provided it is finite.
(ii) Let $X$ be a random variable of absolutely continuous type with p.d.f. $f_X$. Let $h: \mathbb{R} \to \mathbb{R}$ be a Borel function and let $Y = h(X)$. Then
$$E(Y) = \int_{-\infty}^{\infty} h(x) f_X(x)\, dx,$$
provided it is finite.

Proof. (i) By Theorem 2.1, $Y = h(X)$ is a random variable of discrete type with support $S_Y = \{h(x): x \in S_X\}$ and p.m.f.
$$f_Y(y) = \begin{cases} \sum_{x \in A_y} f_X(x), & \text{if } y \in S_Y \\ 0, & \text{otherwise,} \end{cases}$$
where $A_y = \{x \in S_X: h(x) = y\}$, $y \in S_Y$, so that $\{A_y: y \in S_Y\}$ forms a partition of $S_X$ (i.e., $A_{y_1} \cap A_{y_2} = \emptyset$ if $y_1 \neq y_2$, and $\cup_{y \in S_Y} A_y = S_X$). Therefore
$$E(Y) = \sum_{y \in S_Y} y f_Y(y) = \sum_{y \in S_Y} \sum_{x \in A_y} y f_X(x) = \sum_{y \in S_Y} \sum_{x \in A_y} h(x) f_X(x) = \sum_{x \in S_X} h(x) f_X(x),$$
where we used that $h(x) = y$ for $x \in A_y$, and that $\{A_y: y \in S_Y\}$ is a partition of $S_X$.

(ii) Define $A_t = \{x \in \mathbb{R}: h(x) > t\}$, $t \ge 0$, and $B_t = \{x \in \mathbb{R}: h(x) < -t\}$, $t \ge 0$. For simplicity we assume that, for every $t \ge 0$, $A_t$ and $B_t$ are intervals. Then, using Theorem 3.1 (i),
$$E(Y) = \int_0^{\infty} P(\{h(X) > t\})\, dt - \int_0^{\infty} P(\{h(X) < -t\})\, dt = \int_0^{\infty} \left( \int_{A_t} f_X(x)\, dx \right) dt - \int_0^{\infty} \left( \int_{B_t} f_X(x)\, dx \right) dt.$$
On interchanging the order of integration in the above two integrals and using the following two immediate observations:
(a) for $t \ge 0$, $x \in A_t$ if, and only if, $0 \le t < h(x)$; and
(b) for $t \ge 0$, $x \in B_t$ if, and only if, $0 \le t < -h(x)$,
we get
$$E(Y) = \int_{\{x: h(x) > 0\}} h(x) f_X(x)\, dx + \int_{\{x: h(x) < 0\}} h(x) f_X(x)\, dx = \int_{-\infty}^{\infty} h(x) f_X(x)\, dx,$$
since $\int_{\{x: h(x) = 0\}} h(x) f_X(x)\, dx = 0$.

Remark 3.2
Recall that the probability density function of an absolutely continuous type random variable is not unique; however, the distribution function of any random variable is unique. Theorem 3.1 (i) implies that the expected value of an absolutely continuous type random variable is unique (i.e., it does not depend on the version of the p.d.f. used), although the probability density function may not be unique.

Some special kinds of expectations, which will be frequently used, are defined below.

Definition 3.2
Let $X$ be a random variable defined on some probability space.
(i) $\mu = E(X)$, provided it is finite, is called the mean of the (distribution of the) random variable $X$;
(ii) For $r \in \{1, 2, \ldots\}$, $\mu_r' = E(X^r)$, provided it is finite, is called the $r$-th moment of the (distribution of the) random variable $X$;
(iii) For $r \in \{1, 2, \ldots\}$, $E(|X|^r)$, provided it is finite, is called the $r$-th absolute moment of the (distribution of the) random variable $X$;
(iv) For $r \in \{1, 2, \ldots\}$, $\mu_r = E((X - \mu)^r)$, provided it is finite, is called the $r$-th central moment of the (distribution of the) random variable $X$;
(v) $\mu_2 = E((X - \mu)^2)$, provided it is finite, is called the variance of the (distribution of the) random variable $X$, denoted by $\mathrm{Var}(X)$;
(vi) The quantity $\sigma = \sqrt{\mathrm{Var}(X)}$ is called the standard deviation of the (distribution of the) random variable $X$.

Suppose that the distribution function of a random variable $X$ can be decomposed as
$$F_X(x) = \alpha F_d(x) + (1 - \alpha) F_c(x),\ x \in \mathbb{R},$$
for some $\alpha \in [0, 1]$, where $F_d$ is the distribution function of a discrete type random variable (say $X_d$) and $F_c$ is the distribution function of an absolutely continuous type random variable (say $X_c$). Then, for a Borel function $h: \mathbb{R} \to \mathbb{R}$, the expectation of $h(X)$ is defined by
$$E(h(X)) = \alpha E(h(X_d)) + (1 - \alpha) E(h(X_c)),$$
provided $E(h(X))$, $E(h(X_d))$ and $E(h(X_c))$ are finite.

Theorem 3.3
Let $X$ be a random variable.
(i) If $h_1$ and $h_2$ are Borel functions such that $P(\{h_1(X) \le h_2(X)\}) = 1$, then $E(h_1(X)) \le E(h_2(X))$, provided the involved expectations are finite;
(ii) If, for real constants $a$ and $b$ with $a \le b$, $P(\{a \le X \le b\}) = 1$, then $a \le E(X) \le b$;
(iii) If $P(\{X \ge 0\}) = 1$ and $E(X) = 0$, then $P(\{X = 0\}) = 1$;
(iv) If $E(X)$ is finite, then $|E(X)| \le E(|X|)$;
(v) For real constants $a$ and $b$, $E(aX + b) = aE(X) + b$, provided the involved expectations are finite;
(vi) If $h_1, \ldots, h_k$ are Borel functions, then $E\left( \sum_{i=1}^{k} h_i(X) \right) = \sum_{i=1}^{k} E(h_i(X))$, provided the involved expectations are finite.

Proof. We will provide the proof for the situation when $X$ is of absolutely continuous type. The proof for the discrete case is analogous and is left as an exercise. Also, assertions (iv)-(vi) follow directly from the definition of the expectation of a random variable, using elementary properties of integrals (sums/series). Therefore we will provide the proofs of only the first three assertions.

(i) Define $A = \{x \in \mathbb{R}: h_1(x) \le h_2(x)\}$, so that $P(\{X \in A\}) = 1$, and define
$$g(x) = \begin{cases} f_X(x), & \text{if } x \in A \\ 0, & \text{otherwise.} \end{cases}$$
Then $g(x) \ge 0$ for every $x \in \mathbb{R}$,
$$\int_{-\infty}^{\infty} g(x)\, dx = \int_A f_X(x)\, dx = P(\{X \in A\}) = 1,$$
and, for any $B \in \mathcal{B}_1$,
$$\int_B g(x)\, dx = \int_{B \cap A} f_X(x)\, dx = P(\{X \in B \cap A\}) = P(\{X \in B\}),$$
since $P(\{X \in A\}) = 1$. It follows that $g$ is also a p.d.f. of $X$. The above discussion suggests that, without loss of generality, we may take $S_X \subseteq A$ (otherwise replace $f_X$ by $g$). Then $h_1(x) \le h_2(x)$ for every $x \in S_X$, and
$$E(h_1(X)) = \int_{S_X} h_1(x) f_X(x)\, dx \le \int_{S_X} h_2(x) f_X(x)\, dx = E(h_2(X)).$$

(ii) Since $P(\{a \le X \le b\}) = 1$, as in (i), without loss of generality we may assume that $S_X \subseteq [a, b]$. Then
$$a = a \int_{S_X} f_X(x)\, dx \le \int_{S_X} x f_X(x)\, dx \le b \int_{S_X} f_X(x)\, dx = b,$$
i.e., $a \le E(X) \le b$.

(iii) Since $P(\{X \ge 0\}) = 1$, without loss of generality we may take $S_X \subseteq [0, \infty)$. Then, for $n \in \{1, 2, \ldots\}$,
$$0 = E(X) = \int_0^{\infty} x f_X(x)\, dx \ge \int_{1/n}^{\infty} x f_X(x)\, dx \ge \frac{1}{n} P\left(\left\{X \ge \frac{1}{n}\right\}\right) \ge 0,$$
so that $P(\{X \ge \frac{1}{n}\}) = 0$ for every $n \in \{1, 2, \ldots\}$. Consequently
$$P(\{X > 0\}) = \lim_{n \to \infty} P\left(\left\{X \ge \frac{1}{n}\right\}\right) = 0,$$
and therefore $P(\{X = 0\}) = P(\{X \ge 0\}) - P(\{X > 0\}) = 1$.

Corollary 3.1
Let $X$ be a random variable with finite first two moments and let $\mu = E(X)$. Then
(i) $\mathrm{Var}(X) = E(X^2) - \mu^2$;
(ii) $\mathrm{Var}(X) \ge 0$. Moreover, $\mathrm{Var}(X) = 0$ if, and only if, $P(\{X = \mu\}) = 1$;
(iii) $(E(X))^2 \le E(X^2)$ (Cauchy-Schwarz inequality);
(iv) For real constants $a$ and $b$, $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$.

Proof. (i) Note that $\mu = E(X)$ is a fixed real number. Therefore, using Theorem 3.3 (v)-(vi), we have
$$\mathrm{Var}(X) = E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$
(ii) Since $P(\{(X - \mu)^2 \ge 0\}) = 1$, using Theorem 3.3 (i), we have $\mathrm{Var}(X) = E((X - \mu)^2) \ge 0$. Also, using Theorem 3.3 (iii), if $\mathrm{Var}(X) = E((X - \mu)^2) = 0$ then $P(\{(X - \mu)^2 = 0\}) = 1$, i.e., $P(\{X = \mu\}) = 1$. Conversely, if $P(\{X = \mu\}) = 1$, then $E(X) = \mu$ and $E(X^2) = \mu^2$. Now using (i), we get $\mathrm{Var}(X) = E(X^2) - \mu^2 = 0$.

(iii) Follows from (i) and (ii).

(iv) Let $Y = aX + b$. Then $E(Y) = aE(X) + b = a\mu + b$ (using Theorem 3.3 (v)), and
$$\mathrm{Var}(Y) = E((Y - E(Y))^2) = E(a^2 (X - \mu)^2) = a^2 E((X - \mu)^2) = a^2 \mathrm{Var}(X).$$

Example 3.5
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \begin{cases} \frac{1}{2}, & \text{if } -2 < x < -1 \\ \frac{x}{9}, & \text{if } 0 < x < 3 \\ 0, & \text{otherwise.} \end{cases}$$
(i) If $Y = \max\{X, 0\}$, find the mean and variance of $Y$;
(ii) If $Z = 2X + 3\max\{X, 0\} + 4$, find $E(Z)$.

Solution. (i) On using Theorem 3.2 (ii) we get, for $r > 0$,
$$E(Y^r) = E\left((\max\{X, 0\})^r\right) = \int_{-\infty}^{\infty} (\max\{x, 0\})^r f_X(x)\, dx = \int_0^3 \frac{x^{r+1}}{9}\, dx = \frac{3^r}{r + 2}.$$
It follows that $E(Y) = 1$, $E(Y^2) = \frac{9}{4}$ and $\mathrm{Var}(Y) = E(Y^2) - (E(Y))^2 = \frac{5}{4}$.

(ii) We have
$$E(X) = \int_{-2}^{-1} \frac{x}{2}\, dx + \int_0^3 \frac{x^2}{9}\, dx = -\frac{3}{4} + 1 = \frac{1}{4},$$
and, from (i), $E(\max\{X, 0\}) = 1$. Therefore, using Theorem 3.3 (v)-(vi),
$$E(Z) = 2E(X) + 3E(\max\{X, 0\}) + 4 = \frac{1}{2} + 3 + 4 = \frac{15}{2}.$$

Example 3.6
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \binom{n}{x} p^x q^{n-x}, & \text{if } x \in \{0, 1, \ldots, n\} \\ 0, & \text{otherwise,} \end{cases}$$
where $n \in \{1, 2, \ldots\}$, $p \in (0, 1)$ and $q = 1 - p$.
(i) For $r \in \{1, 2, \ldots, n\}$, find $E(X(X-1)\cdots(X-r+1))$; the quantity $E(X(X-1)\cdots(X-r+1))$ is called the $r$-th factorial moment of $X$, $r = 1, 2, \ldots$;
(ii) Find the mean and variance of $X$;
(iii) For fixed $t \in \mathbb{R}$, let $Y = e^{tX}$. Find $E(Y)$.

Solution. (i) Fix $r \in \{1, 2, \ldots, n\}$. Then
$$E(X(X-1)\cdots(X-r+1)) = \sum_{x=r}^{n} x(x-1)\cdots(x-r+1) \binom{n}{x} p^x q^{n-x} = \sum_{x=r}^{n} \frac{n!}{(x-r)!\,(n-x)!}\, p^x q^{n-x}$$
$$= n(n-1)\cdots(n-r+1)\, p^r \sum_{j=0}^{n-r} \binom{n-r}{j} p^j q^{n-r-j} = n(n-1)\cdots(n-r+1)\, p^r (p + q)^{n-r} = n(n-1)\cdots(n-r+1)\, p^r.$$

(ii) Using (i), we get $E(X) = np$ and $E(X(X-1)) = n(n-1)p^2$. Therefore
$$E(X^2) = E(X(X-1)) + E(X) = n(n-1)p^2 + np$$
and
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = n(n-1)p^2 + np - n^2 p^2 = np(1 - p) = npq.$$

(iii) For $t \in \mathbb{R}$, we have
$$E(e^{tX}) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x} = (q + pe^t)^n.$$

We are familiar with the Laplace transform of a given real-valued function defined on $\mathbb{R}$. We also know that, under certain conditions, the Laplace transform of a function determines the function almost uniquely. In probability theory, the Laplace transform of the p.d.f./p.m.f. of a random variable plays an important role and is referred to as the moment generating function (of the probability distribution) of the random variable.

Definition 3.3
Let $X$ be a random variable and let $A = \{t \in \mathbb{R}: E(e^{tX}) \text{ is finite}\}$. Define $M_X: A \to \mathbb{R}$ by
$$M_X(t) = E(e^{tX}),\ t \in A.$$
(i) We call the function $M_X$ the moment generating function (m.g.f.) (of the probability distribution) of the random variable $X$;
(ii) We say that the m.g.f. of a random variable $X$ exists if there exists a positive real number $a$ such that $(-a, a) \subseteq A$ (i.e., if $M_X(t) = E(e^{tX})$ is finite in an interval containing $0$).

Note that $M_X(0) = 1$ and, therefore, $0 \in A = \{t \in \mathbb{R}: E(e^{tX}) \text{ is finite}\}$. Moreover, using Theorem 3.3 (ii)-(iii), we have $M_X(t) > 0$ for every $t \in A$. Also, if the m.g.f. $M_X$ of $X$ exists and is finite on an interval $(-a, a)$, $a > 0$, then for any real constants $b$ and $c$ the m.g.f. of $Y = bX + c$ also exists and
$$M_Y(t) = E(e^{t(bX + c)}) = e^{ct} E(e^{(bt)X}) = e^{ct} M_X(bt),\ t \in \left( -\frac{a}{|b|}, \frac{a}{|b|} \right),$$
with the convention that $\frac{a}{0} = \infty$.

The name "moment generating function" for the transform $M_X$ derives from the fact that $M_X$ can be used to generate moments of the random variable $X$, as illustrated in the following theorem.

Theorem 3.4
Let $X$ be a random variable with m.g.f. $M_X$ that is finite on an interval $(-a, a)$, for some $a > 0$ (i.e., the m.g.f. of $X$ exists). Then
(i) for each $r \in \{1, 2, \ldots\}$, $\mu_r' = E(X^r)$ is finite;
(ii) for each $r \in \{1, 2, \ldots\}$, the derivative $M_X^{(r)}$ exists on $(-a, a)$ and
$$\mu_r' = E(X^r) = M_X^{(r)}(0),$$
where $M_X^{(r)}(0)$ denotes the $r$-th derivative of $M_X$ at the point $0$;
(iii) $M_X(t) = \sum_{r=0}^{\infty} \frac{\mu_r' t^r}{r!},\ t \in (-a, a)$, where $\mu_0' = 1$.

Proof. We will provide the proof for the case where $X$ is of absolutely continuous type. The proof for the discrete case follows in a similar fashion, with integral signs replaced by summation signs.

(i) We have $E(e^{tX}) < \infty$ for every $t \in (-a, a)$. Therefore, for $t \in (0, a)$, both $E(e^{tX}) < \infty$ and $E(e^{-tX}) < \infty$, and consequently
$$E(e^{t|X|}) \le E(e^{tX}) + E(e^{-tX}) < \infty,\ t \in (0, a).$$
Fix $r \in \{1, 2, \ldots\}$ and $t_0 \in (0, a)$. Then $\lim_{|x| \to \infty} |x|^r e^{-t_0 |x|} = 0$, and therefore there exists a positive real number $c$ such that $|x|^r \le e^{t_0 |x|}$ whenever $|x| > c$. Thus we have
$$E(|X|^r) = \int_{\{|x| \le c\}} |x|^r f_X(x)\, dx + \int_{\{|x| > c\}} |x|^r f_X(x)\, dx \le c^r + E(e^{t_0 |X|}) < \infty.$$

(ii) Fix $r \in \{1, 2, \ldots\}$. Then, for $t \in (-a, a)$,
$$\frac{d^r}{dt^r} M_X(t) = \frac{d^r}{dt^r} \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx.$$
Under the assumption that $E(|X|^r e^{tX}) < \infty$ for every $t \in (-a, a)$, using arguments of advanced calculus it can be shown that the derivative exists and can be passed through the integral sign. Therefore
$$M_X^{(r)}(t) = \int_{-\infty}^{\infty} x^r e^{tx} f_X(x)\, dx,\ t \in (-a, a),$$
and, in particular, $M_X^{(r)}(0) = E(X^r) = \mu_r'$.

(iii) Fix $t \in (-a, a)$. Then
$$M_X(t) = E(e^{tX}) = E\left( \sum_{r=0}^{\infty} \frac{t^r X^r}{r!} \right).$$
Under the assumption that $E(e^{t|X|}) < \infty$ for $t \in (-a, a)$, using arguments of advanced calculus it can be shown that the expectation (integral) sign can be passed through the summation sign, i.e.,
$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^r E(X^r)}{r!} = \sum_{r=0}^{\infty} \frac{\mu_r' t^r}{r!}.$$

Corollary 3.2
Under the notation and assumptions of Theorem 3.4, define $\psi_X: (-a, a) \to \mathbb{R}$ by $\psi_X(t) = \ln M_X(t),\ t \in (-a, a)$. Then $E(X) = \psi_X^{(1)}(0)$ and $\mathrm{Var}(X) = \psi_X^{(2)}(0)$, where $\psi_X^{(r)}$ denotes the $r$-th derivative of $\psi_X$, $r \in \{1, 2\}$.

Proof. We have, for $t \in (-a, a)$,
$$\psi_X^{(1)}(t) = \frac{M_X^{(1)}(t)}{M_X(t)} \quad \text{and} \quad \psi_X^{(2)}(t) = \frac{M_X(t) M_X^{(2)}(t) - (M_X^{(1)}(t))^2}{(M_X(t))^2}.$$
Using the facts that $M_X(0) = 1$ and $M_X^{(r)}(0) = E(X^r)$, $r \in \{1, 2\}$, we get
$$\psi_X^{(1)}(0) = \frac{M_X^{(1)}(0)}{M_X(0)} = E(X)$$
and
$$\psi_X^{(2)}(0) = M_X^{(2)}(0) - (M_X^{(1)}(0))^2 = E(X^2) - (E(X))^2 = \mathrm{Var}(X).$$

Example 3.7
Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \frac{e^{-\lambda} \lambda^x}{x!}, & \text{if } x \in \{0, 1, 2, \ldots\} \\ 0, & \text{otherwise,} \end{cases}$$
where $\lambda > 0$.
(i) Find the m.g.f. $M_X(t),\ t \in A = \{s \in \mathbb{R}: E(e^{sX}) < \infty\}$, of $X$. Show that $X$ possesses moments of all orders. Find the mean and variance of $X$;
(ii) Find $\psi_X(t) = \ln M_X(t),\ t \in A$. Hence find the mean and variance of $X$;
(iii) What are the first four terms in the power series expansion of $M_X(t)$ around the point $0$?

Solution. (i) We have
$$M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{\lambda(e^t - 1)},\ t \in \mathbb{R}.$$
Since $A = \{t \in \mathbb{R}: E(e^{tX}) < \infty\} = \mathbb{R}$, by Theorem 3.4 (i), for every $r \in \{1, 2, \ldots\}$, $\mu_r' = E(X^r)$ is finite. Clearly
$$M_X^{(1)}(t) = \lambda e^t e^{\lambda(e^t - 1)} \quad \text{and} \quad M_X^{(2)}(t) = \lambda e^t (1 + \lambda e^t) e^{\lambda(e^t - 1)},\ t \in \mathbb{R}.$$
Therefore
$$E(X) = M_X^{(1)}(0) = \lambda, \quad E(X^2) = M_X^{(2)}(0) = \lambda(1 + \lambda),$$
and $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \lambda$.

(ii) We have, for $t \in \mathbb{R}$,
$$\psi_X(t) = \ln M_X(t) = \lambda(e^t - 1), \quad \psi_X^{(1)}(t) = \psi_X^{(2)}(t) = \lambda e^t.$$
Therefore $E(X) = \psi_X^{(1)}(0) = \lambda$ and $\mathrm{Var}(X) = \psi_X^{(2)}(0) = \lambda$.

(iii) Differentiating once more, $M_X^{(3)}(0) = \lambda + 3\lambda^2 + \lambda^3$. Since $A = \{t \in \mathbb{R}: E(e^{tX}) < \infty\} = \mathbb{R}$, by Theorem 3.4 (iii) we have
$$M_X(t) = 1 + \lambda t + (\lambda + \lambda^2) \frac{t^2}{2!} + (\lambda + 3\lambda^2 + \lambda^3) \frac{t^3}{3!} + \cdots,\ t \in \mathbb{R}.$$

Example 3.8
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \begin{cases} e^{-x}, & \text{if } x > 0 \\ 0, & \text{otherwise.} \end{cases}$$
(i) Find the m.g.f. $M_X(t),\ t \in A = \{s \in \mathbb{R}: E(e^{sX}) < \infty\}$, of $X$. Show that $X$ possesses moments of all orders. Find the mean and variance of $X$;
(ii) Find $\psi_X(t) = \ln M_X(t),\ t \in A$. Hence find the mean and variance of $X$;
(iii) Expand $M_X(t)$ as a power series around the point $0$ and hence find $\mu_r' = E(X^r),\ r \in \{1, 2, \ldots\}$.

Solution. (i) We have
$$M_X(t) = E(e^{tX}) = \int_0^{\infty} e^{tx} e^{-x}\, dx = \frac{1}{1 - t} < \infty, \text{ if } t < 1.$$
Clearly $A = \{t \in \mathbb{R}: E(e^{tX}) < \infty\} = (-\infty, 1) \supseteq (-1, 1)$ and $M_X(t) = (1 - t)^{-1},\ t < 1$. By Theorem 3.4 (i), for every $r \in \{1, 2, \ldots\}$, $\mu_r'$ is finite. Clearly
$$M_X^{(1)}(t) = (1 - t)^{-2} \quad \text{and} \quad M_X^{(2)}(t) = 2(1 - t)^{-3},\ t < 1,$$
so that $E(X) = M_X^{(1)}(0) = 1$, $E(X^2) = M_X^{(2)}(0) = 2$, and $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 1$.

(ii) We have $\psi_X(t) = \ln M_X(t) = -\ln(1 - t),\ t < 1$, and
$$\psi_X^{(1)}(t) = \frac{1}{1 - t}, \quad \psi_X^{(2)}(t) = \frac{1}{(1 - t)^2},\ t < 1,$$
so that $E(X) = \psi_X^{(1)}(0) = 1$ and $\mathrm{Var}(X) = \psi_X^{(2)}(0) = 1$.

(iii) We have
$$M_X(t) = \frac{1}{1 - t} = \sum_{r=0}^{\infty} t^r = \sum_{r=0}^{\infty} \frac{r!\, t^r}{r!},\ t \in (-1, 1).$$
Since $(-1, 1) \subseteq A$, using Theorem 3.4 (iii) we conclude that $\mu_r'$ equals $r!$ times the coefficient of $t^r$ in the power series expansion of $M_X(t)$ around $0$, i.e.,
$$\mu_r' = E(X^r) = r!,\ r \in \{1, 2, \ldots\}.$$

Example 3.9
Let $X$ be a random variable with p.d.f.
$$f_X(x) = \frac{1}{\pi(1 + x^2)},\ -\infty < x < \infty.$$
Show that the m.g.f. of $X$ does not exist.

Solution. From Example 3.4 we know that the expected value of $X$ is not finite. Therefore, using Theorem 3.4 (i), we conclude that the m.g.f. of $X$ does not exist.

4. Properties of Random Variables Having the Same Distribution

We begin this section with the following definition.
Definition 4.1
Two random variables $X$ and $Y$, defined on the same probability space $(\Omega, \mathcal{F}, P)$, are said to have the same distribution (written $X \stackrel{d}{=} Y$) if they have the same distribution function, i.e., if $F_X(x) = F_Y(x)$ for every $x \in \mathbb{R}$.

Theorem 4.1
(i) Let $X$ and $Y$ be random variables of discrete type with p.m.f.s $f_X$ and $f_Y$, respectively. Then $X \stackrel{d}{=} Y$ if, and only if, $f_X(x) = f_Y(x)$ for every $x \in \mathbb{R}$;
(ii) Let $X$ and $Y$ be random variables of absolutely continuous type. Then $X \stackrel{d}{=} Y$ if, and only if, there exist versions of p.d.f.s $f_X$ and $f_Y$ of $X$ and $Y$, respectively, such that $f_X(x) = f_Y(x)$ for every $x \in \mathbb{R}$.

Proof. (i) Suppose that $f_X(x) = f_Y(x)$ for every $x \in \mathbb{R}$. Then, clearly, $F_X(x) = F_Y(x)$ for every $x \in \mathbb{R}$, and therefore $X \stackrel{d}{=} Y$. Conversely, suppose that $X \stackrel{d}{=} Y$, i.e., $F_X(x) = F_Y(x) = F(x)$, say, for every $x \in \mathbb{R}$. Then
$$S_X = \{x \in \mathbb{R}: F(x) - F(x^-) > 0\} = S_Y = S, \text{ say.}$$
Moreover,
$$f_X(x) = f_Y(x) = \begin{cases} F(x) - F(x^-), & \text{if } x \in S \\ 0, & \text{otherwise.} \end{cases}$$

(ii) Suppose that $f_X(x) = f_Y(x)$ for every $x \in \mathbb{R}$, for some versions $f_X$ and $f_Y$ of the p.d.f.s of $X$ and $Y$, respectively. Then, clearly, $F_X(x) = F_Y(x)$ for every $x \in \mathbb{R}$, and therefore $X \stackrel{d}{=} Y$. Conversely, suppose that $X \stackrel{d}{=} Y$, i.e., suppose that $F_X(x) = F_Y(x) = F(x)$, say, for every $x \in \mathbb{R}$. For simplicity, suppose that $F$ is differentiable everywhere except possibly on a countable set. Using Remark 4.2 (vii), Module 2, it follows that both $X$ and $Y$ are of absolutely continuous type with a common (version of) p.d.f.
$$f(x) = \begin{cases} F'(x), & \text{if } F \text{ is differentiable at } x \\ 0, & \text{otherwise.} \end{cases}$$
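The discrete criterion of Theorem 4.1 (i) can be exercised on a small example (Python; the six-point probability space is hypothetical): $X$ and $Y = -X$ below are different functions on $\Omega$, yet have identical pmfs and hence identical distribution functions.

```python
from fractions import Fraction

omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}   # uniform probability on a 6-point space
vals = [-1, -1, 0, 0, 1, 1]
X = {w: vals[w] for w in omega}             # X is uniform on {-1, 0, 1}
Y = {w: -vals[w] for w in omega}            # Y = -X: a different function on omega

def pmf(rv):
    out = {}
    for w, p in prob.items():
        out[rv[w]] = out.get(rv[w], Fraction(0)) + p
    return out

def cdf(rv, x):
    return sum(p for w, p in prob.items() if rv[w] <= x)

print(pmf(X) == pmf(Y))                                           # True: same pmf
print(all(cdf(X, x) == cdf(Y, x) for x in [-2, -1, 0, 0.5, 1]))   # True: same d.f.
```

Note that equality in distribution says nothing about the two variables being equal as functions on $\Omega$: here $X(\omega) \neq Y(\omega)$ on four of the six sample points.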
As a consequence of the above theorem we have the following result.

Theorem 4.2
Let $X$ and $Y$ be two random variables, either both of discrete type or both of absolutely continuous type, with $X \stackrel{d}{=} Y$. Then
(i) for any Borel function $h$, $E(h(X)) = E(h(Y))$, provided the expectations are finite;
(ii) for any Borel function $h$, $h(X) \stackrel{d}{=} h(Y)$.

Proof. (i) Since $X \stackrel{d}{=} Y$, we have $F_X(x) = F_Y(x) = F(x)$, say, for every $x \in \mathbb{R}$.
Case I. $X$ is of discrete type. Since $X \stackrel{d}{=} Y$, using Theorem 4.1 (i) it follows that $S_X = S_Y = S$, say, and $f_X(x) = f_Y(x) = f(x)$, say, for every $x \in \mathbb{R}$. Therefore
$$E(h(X)) = \sum_{x \in S} h(x) f(x) = E(h(Y)).$$
Case II. $X$ is of absolutely continuous type. For simplicity, assume that $F$ is differentiable everywhere except possibly on a finite set. Using Remark 4.2 (vii), Module 2, we may take
$$f_X(x) = f_Y(x) = \begin{cases} F'(x), & \text{if } F \text{ is differentiable at } x \\ 0, & \text{otherwise.} \end{cases}$$
Therefore
$$E(h(X)) = \int_{-\infty}^{\infty} h(x) f_X(x)\, dx = \int_{-\infty}^{\infty} h(x) f_Y(x)\, dx = E(h(Y)).$$

(ii) Fix $z \in \mathbb{R}$. On taking, in (i),
$$h_z(x) = \begin{cases} 1, & \text{if } h(x) \le z \\ 0, & \text{if } h(x) > z, \end{cases}$$
we get
$$F_{h(X)}(z) = P(\{h(X) \le z\}) = E(h_z(X)) = E(h_z(Y)) = P(\{h(Y) \le z\}) = F_{h(Y)}(z),\ z \in \mathbb{R},$$
i.e., $h(X) \stackrel{d}{=} h(Y)$.

Example 4.1
(i) Let $X$ be a random variable with p.m.f.
$$f_X(x) = \begin{cases} \binom{n}{x} \left( \frac{1}{2} \right)^n, & \text{if } x \in \{0, 1, \ldots, n\} \\ 0, & \text{otherwise,} \end{cases}$$
where $n$ is a given positive integer. Let $Y = n - X$. Show that $X \stackrel{d}{=} Y$ and hence show that $E(X) = \frac{n}{2}$.
(ii) Let $X$ be a random variable with p.d.f.
$$f_X(x) = \frac{e^{-|x|}}{2},\ -\infty < x < \infty,$$
and let $Y = -X$. Show that $X \stackrel{d}{=} Y$ and hence show that $E(X^{2r+1}) = 0,\ r \in \{0, 1, 2, \ldots\}$.

Solution. (i) Clearly $E(X)$ is finite. Using Example 2.3, it follows that the p.m.f. of $Y = n - X$ is given by
$$f_Y(y) = \begin{cases} \binom{n}{y} \left( \frac{1}{2} \right)^n, & \text{if } y \in \{0, 1, \ldots, n\} \\ 0, & \text{otherwise} \end{cases} = f_X(y),\ y \in \mathbb{R},$$
i.e., $X \stackrel{d}{=} Y$. Hence, using Theorem 4.2 (i), $E(X) = E(Y) = E(n - X) = n - E(X)$, so that $E(X) = \frac{n}{2}$.

(ii) Using Corollary 2.2, it can be shown that $Y = -X$ is a random variable of absolutely continuous type with p.d.f.
$$f_Y(y) = f_X(-y) = \frac{e^{-|y|}}{2} = f_X(y),\ -\infty < y < \infty.$$
It follows that $X \stackrel{d}{=} Y$. It can be shown that $E(|X|^r)$ is finite for every $r \in \{1, 2, \ldots\}$. Therefore, using Theorem 4.2 (i),
$$E(X^{2r+1}) = E(Y^{2r+1}) = E((-X)^{2r+1}) = -E(X^{2r+1}),$$
and hence $E(X^{2r+1}) = 0,\ r \in \{0, 1, 2, \ldots\}$.

Definition 4.2
A random variable $X$ is said to have a distribution that is symmetric about a point $\mu \in \mathbb{R}$ if $X - \mu \stackrel{d}{=} \mu - X$.

Theorem 4.3
Let $X$ be a random variable having p.d.f./p.m.f. $f_X$ and distribution function $F_X$. Let $\mu \in \mathbb{R}$.
(i) The distribution of $X$ is symmetric about $\mu$ if, and only if, $f_X(\mu + x) = f_X(\mu - x)$ for every $x \in \mathbb{R}$;
(ii) The distribution of $X$ is symmetric about $\mu$ if, and only if, $F_X(\mu + x) + F_X((\mu - x)^-) = 1$ for every $x \in \mathbb{R}$, i.e., if, and only if, $F_X(\mu + x) = 1 - F_X((\mu - x)^-)$ for every $x \in \mathbb{R}$;
(iii) The distribution of $X$ is symmetric about $\mu$ if, and only if, the distribution of $Y = X - \mu$ is symmetric about $0$;
(iv) If the distribution of $X$ is symmetric about $\mu$, then $F_X(\mu) \ge \frac{1}{2}$;
(v) If the distribution of $X$ is symmetric about $\mu$ and the expected value of $X$ is finite, then $E(X) = \mu$;
(vi) If the distribution of $X$ is symmetric about $0$, then $E(X^{2r+1}) = 0,\ r \in \{0, 1, 2, \ldots\}$, provided the involved expectations exist.

Proof. For simplicity, we will assume that if $X$ is of absolutely continuous type then its distribution function is differentiable everywhere except possibly on a countable set.

(i) Let $Y = X - \mu$ and $Z = \mu - X$. Then the p.d.f.s/p.m.f.s of $Y$ and $Z$ are given by $f_Y(x) = f_X(\mu + x),\ x \in \mathbb{R}$, and $f_Z(x) = f_X(\mu - x),\ x \in \mathbb{R}$, respectively. Now, under the hypothesis, the distribution of $X$ is symmetric about $\mu$ if, and only if, $Y \stackrel{d}{=} Z$, i.e., if, and only if, $f_X(\mu + x) = f_X(\mu - x)$ for every $x \in \mathbb{R}$.

(ii) Let $Y = X - \mu$ and $Z = \mu - X$, so that the distribution functions of $Y$ and $Z$ are given by $F_Y(x) = F_X(\mu + x),\ x \in \mathbb{R}$, and $F_Z(x) = P(\{X \ge \mu - x\}) = 1 - F_X((\mu - x)^-),\ x \in \mathbb{R}$. Therefore, under the hypothesis, the distribution of $X$ is symmetric about $\mu$ if, and only if, $F_X(\mu + x) + F_X((\mu - x)^-) = 1$ for every $x \in \mathbb{R}$.

(iii) Clearly, the distribution of $X$ is symmetric about $\mu$ if, and only if, $X - \mu \stackrel{d}{=} \mu - X$, i.e., if, and only if, $Y \stackrel{d}{=} -Y$, where $Y = X - \mu$, i.e., if, and only if, the distribution of $Y$ is symmetric about $0$.

(iv) Using (ii) with $x = 0$, we have
$$F_X(\mu) + F_X(\mu^-) = 1 \implies F_X(\mu) \ge \frac{1}{2},$$
since $F_X(\mu^-) \le F_X(\mu)$.

(v) Suppose that $X - \mu \stackrel{d}{=} \mu - X$ and $E(|X|) < \infty$. Then $E(X - \mu) = E(\mu - X)$, and therefore $E(X) = \mu$.

(vi) Suppose that $X \stackrel{d}{=} -X$. Then $E(X^{2r+1}) = E((-X)^{2r+1}) = -E(X^{2r+1}),\ r \in \{0, 1, 2, \ldots\}$, provided the expectations exist. Therefore $E(X^{2r+1}) = 0,\ r \in \{0, 1, 2, \ldots\}$.

Theorem 4.4
Let $X$ and $Y$ be random variables having m.g.f.s $M_X$ and $M_Y$, respectively. Suppose that there exists a positive real number $a$ such that $M_X(t) = M_Y(t)$ for every $t \in (-a, a)$. Then $X \stackrel{d}{=} Y$.
42 Proof. We will provide the proof for the special case where and are of discrete type and = 0, 1, 2,, as the proof for general and is involved. We have =,, = = =,, = = =,,. We know that if two power series or polynomials match over an interval then they have the same coefficients. It follows that = = =, 0,1,2,, i. e., and have the same p.m.f.. Now the result follows on using Theorem 4.1 (i). Example 4.2 Let R and > 0 be real constants and let, be a random variable having a p.d.f., = (i) Show that, is a p.d.f.; 1 σ 2 e, < < 4.1 (ii) Show that the probability distribution function of, is symmetric about. Hence find, ; (iii) Find the m.g.f. of, and hence find the mean and variance of, ; (iv) Let, =, +, where 0 and are real constants. Using the m.g.f. of,, show Solution. that the p.d.f. of, is, = 1 a σ 2 e, < <. (i) Clearly, 0, R. Also,, = e = 1 2 e on making the transformation μ = Clearly 0 and =, say. 42
43 = 1 2 e = 1 2π 1 2. On making the transformation = cos, = sin, > 0, 0 < 2 (so that the Jacobian of the transformation is r) we get = 1 2 = = = 1. e Since 0, it follows that = 1 and thus, is a p.d.f.. (ii) Clearly,, μ =, μ + = 1 σ 2 e, R. Using Theorem 4.3 (i) and (v) it follows that the distribution of, is symmetric about μ and E, = μ. (iii) For R, =, = e 1 σ 2 e d = 1 2 e e d = = e, e dt 43
44 since, by (i), e d = σ 2π, R and > 0. Thus, for R, ψ, t = ln M, t = μt + σ t EX =ψ, 0 = μ and VarX =ψ, 0 = σ. (iv) From the discussion following Definition 3.3 we have, for R, M, t =, =, = =, 2 Therefore the p.d.f. of, is given by, =,., =, = 1 2 R. In the statistical literature, the probability distribution of the random variable, having a p.d.f.,, defined by (4.1), is called the normal distribution (or Gaussian distribution) with mean and variance (denoted by,, ). Various properties of this distribution are further discussed in Module 5. Example 4.3 Let 0,1 and let be a random variable with p.m.f. =, if 0, 1,,, 4.2 0, otherwise where is a given positive integer and = 1. 44
45 (i) Find the m.g.f. of and hence find the mean and variance of, 0,1; (ii) Let =, p 0,1. Using the m.g.f. of show that the p.m.f. of is Solution. = 1, if 0, 1,,. 0, otherwise (i) From the solution of Example 3.6 (iii), it is clear that the m.g.f. of is given by = 1 +, R. Therefore, for t R, = ln = ln1 +, R, =, R, 1 + = , R = 0 = and Var = 0 = 1. (ii) For t R = = = 1 + = + 1 =, i.e., =, 0,1. Therefore the p.m.f. of is given by = = 1, if 0, 1,,. 0, otherwise The probability distribution of the random variable having p.m.f., defined by (4.2), is called a binomial distribution with trials and success probability (denoted by 45
Bin$(n, p)$). The binomial distribution and other related distributions are discussed in more detail in Module 5.

5. Probability and Moment Inequalities

Let $X$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $B$ be a Borel set. In many situations $P(X \in B)$ cannot be explicitly evaluated and therefore some estimate of this probability may be desired. For example, if a random variable $X$ has the p.d.f.

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \quad -\infty < x < \infty,$$

then

$$P(X > 2) = \int_2^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx \tag{5.1}$$

cannot be explicitly evaluated and, therefore, an estimate of this probability may be desired. To estimate this probability one has to either resort to numerical integration or use some other estimation procedure. Inequalities are popular estimation procedures and they play an important role in probability theory.

Theorem 5.1

Let $X$ be a random variable and let $g: \mathbb{R} \to [0, \infty)$ be a non-negative and non-decreasing function such that the expected value of $g(X)$ is finite. Then, for any $c > 0$ for which $g(c) > 0$,

$$P(X \geq c) \leq \frac{E(g(X))}{g(c)}.$$

Proof. We will provide the proof for the case when $X$ is of absolutely continuous type with p.d.f. $f$. The proof for the discrete case follows in a similar fashion with integral signs replaced by summation signs. Fix $c > 0$ such that $g(c) > 0$. Define $A_c = \{x \in \mathbb{R} : x \geq c\}$, so that $g(x) \geq g(c)$ for $x \in A_c$. Then

$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\, dx \geq \int_{A_c} g(x) f(x)\, dx,$$

since $g(x) f(x) \geq 0$, $\forall x \in \mathbb{R}$,
and, as $g(x) \geq g(c)$ for $x \in A_c$,

$$\int_{A_c} g(x) f(x)\, dx \geq g(c) \int_{A_c} f(x)\, dx = g(c)\, P(X \geq c).$$

Combining the two displays proves the result.

Corollary 5.1

Let $X$ be a random variable.

(i) (Markov Inequality) Suppose that $E|X|^r < \infty$, for some $r > 0$. Then, for any $c > 0$,

$$P(|X| \geq c) \leq \frac{E|X|^r}{c^r}.$$

(ii) (Chebyshev Inequality) Suppose that $X$ has finite first two moments. Let $\mu = E(X)$ and $\sigma^2 = \operatorname{Var}(X)$. Then, for any $c > 0$,

$$P(|X - \mu| \geq c) \leq \frac{\sigma^2}{c^2}.$$

Proof.

(i) Fix $c > 0$ and $r > 0$ and let $g(x) = x^r$, $x \geq 0$, and $g(x) = 0$, $x < 0$. Clearly $g$ is a non-negative and non-decreasing function. Using Theorem 5.1 on the random variable $|X|$, we get

$$P(|X| \geq c) \leq \frac{E\big(g(|X|)\big)}{g(c)} = \frac{E|X|^r}{c^r}.$$

(ii) Using (i) on $Y = X - \mu$, for $r = 2$, we get

$$P(|X - \mu| \geq c) \leq \frac{E\big((X - \mu)^2\big)}{c^2} = \frac{\sigma^2}{c^2}.$$

Example 5.1

Let us revisit the problem of estimating $P(X > 2)$, defined by (5.1). Using Example 4.2 (iii), we have $\mu = E(X) = 0$ and $\sigma^2 = \operatorname{Var}(X) = 1$. Moreover, using Example 4.2 (ii), the distribution of $X$ is symmetric about $0$. Consequently $P(X > 2) = P(-X > 2) = P(X < -2)$, i.e.,

$$P(X > 2) = \frac{P(X > 2) + P(X < -2)}{2} = \frac{P(|X| > 2)}{2}.$$

Now, using the Chebyshev inequality we have
$$P(X > 2) = \frac{P(|X| > 2)}{2} \leq \frac{1}{2} \cdot \frac{\sigma^2}{2^2} = \frac{1}{8} = 0.125.$$

The exact value of $P(X > 2)$, obtained using numerical integration, is $0.0228$. The following example illustrates that the bounds provided in Theorem 5.1 and Corollary 5.1 are tight, i.e., the upper bounds provided therein may be attained.

Example 5.2

Let $X$ be a random variable with p.m.f.

$$f(x) = \begin{cases} \frac{1}{8}, & \text{if } x \in \{-1, 1\}, \\ \frac{3}{4}, & \text{if } x = 0, \\ 0, & \text{otherwise.} \end{cases}$$

Clearly $E|X| = \frac{1}{4}$ and, therefore, using the Markov inequality (with $r = 1$ and $c = 1$) we have

$$P(|X| \geq 1) \leq \frac{E|X|}{1} = \frac{1}{4}.$$

The exact probability is $P(|X| \geq 1) = P(X \in \{-1, 1\}) = \frac{1}{4}$, so the bound is attained.

Definition 5.1

A random variable $X$ is said to be degenerate at a point $c \in \mathbb{R}$ if $P(X = c) = 1$. Suppose that a random variable $X$ is degenerate at $c \in \mathbb{R}$. Then clearly $X$ is of discrete type with support $S_X = \{c\}$, distribution function

$$F(x) = \begin{cases} 0, & \text{if } x < c, \\ 1, & \text{if } x \geq c, \end{cases}$$

and p.m.f.

$$f(x) = \begin{cases} 1, & \text{if } x = c, \\ 0, & \text{otherwise.} \end{cases}$$

Note that a random variable $X$ is degenerate at a point $c \in \mathbb{R}$ if, and only if, $E(X) = c$ and $\operatorname{Var}(X) = 0$.
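The bounds in Examples 5.1 and 5.2 are easy to check numerically. The following sketch (our own illustration, not part of the module) computes the exact value of (5.1) through the complementary error function and compares it with the Chebyshev-based bound $1/8$, and then verifies that the Markov bound of Example 5.2 is attained exactly.

```python
import math

# Exact value of P(X > 2) for X ~ N(0, 1), using P(X > c) = erfc(c / sqrt(2)) / 2,
# compared with the Chebyshev-based bound 1/8 derived above.
exact = math.erfc(2.0 / math.sqrt(2.0)) / 2.0
chebyshev_bound = 1.0 / 8.0
print(round(exact, 4))  # about 0.0228, well below the bound 0.125

# Example 5.2: the Markov bound P(|X| >= 1) <= E|X| is attained for this p.m.f.
pmf = {-1: 0.125, 0: 0.75, 1: 0.125}
e_abs = sum(abs(x) * p for x, p in pmf.items())            # E|X| = 1/4
p_exact = sum(p for x, p in pmf.items() if abs(x) >= 1.0)  # P(|X| >= 1) = 1/4
```

The Chebyshev bound is valid but loose here (0.125 against 0.0228), while in Example 5.2 the bound and the exact probability coincide.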
Theorem 5.2 (Jensen Inequality)

Let $I \subseteq \mathbb{R}$ be an interval and let $\phi: I \to \mathbb{R}$ be a twice differentiable function such that its second order derivative is continuous on $I$ and $\phi''(x) \geq 0$, $\forall x \in I$. Let $X$ be a random variable with support $S_X \subseteq I$ and finite expectation. Then

$$E(\phi(X)) \geq \phi(E(X)).$$

If $\phi''(x) > 0$, $\forall x \in I$, then the inequality above is strict unless $X$ is a degenerate random variable.

Proof. Let $\mu = E(X)$. On expanding $\phi(x)$ into a Taylor series about $\mu$ we get

$$\phi(x) = \phi(\mu) + (x - \mu)\phi'(\mu) + \frac{(x - \mu)^2}{2!}\,\phi''(\xi),$$

for some $\xi$ between $x$ and $\mu$. Since $\phi''(x) \geq 0$, $\forall x \in I$, it follows that

$$\phi(x) \geq \phi(\mu) + (x - \mu)\phi'(\mu), \quad \forall x \in I, \tag{5.2}$$

and therefore

$$E(\phi(X)) \geq \phi(\mu) + E(X - \mu)\,\phi'(\mu) = \phi(\mu) = \phi(E(X)).$$

Clearly the inequality in (5.2) is strict unless $P(X = \mu) = 1$ (using Corollary 3.1 (ii)).

Example 5.3

Let $X$ be a random variable with support $S_X \subseteq I$, an interval in $\mathbb{R}$. Then

(i) $E(X^2) \geq (E(X))^2$, taking $\phi(x) = x^2$, $x \in \mathbb{R}$, in Theorem 5.2;

(ii) $E(\ln X) \leq \ln(E(X))$, provided $P(X > 0) = 1$, taking $\phi(x) = -\ln x$, $x \in (0, \infty)$, in Theorem 5.2;

(iii) $E(e^X) \geq e^{E(X)}$, taking $\phi(x) = e^x$, $x \in \mathbb{R}$, in Theorem 5.2;

(iv) $E\big(\tfrac{1}{X}\big) \geq \tfrac{1}{E(X)}$, if $P(X > 0) = 1$, taking $\phi(x) = \tfrac{1}{x}$, $x \in (0, \infty)$, in Theorem 5.2;

provided the involved expectations exist.

Definition 5.2
Let $X$ be a random variable with support $S_X \subseteq (0, \infty)$. Then, provided they are finite, $E(X)$ is called the arithmetic mean of $X$, $e^{E(\ln X)}$ is called the geometric mean of $X$, and $\frac{1}{E(1/X)}$ is called the harmonic mean of $X$.

Example 5.4

(i) (AM-GM-HM inequality) Let $X$ be a random variable with support $S_X \subseteq (0, \infty)$. Then

$$\frac{1}{E\big(\frac{1}{X}\big)} \leq e^{E(\ln X)} \leq E(X),$$

provided the expectations are finite.

(ii) Let $a_1, \ldots, a_n$ be positive real constants and let $w_1, \ldots, w_n$ be another set of positive real constants such that $\sum_{i=1}^n w_i = 1$. Then

$$\frac{1}{\sum_{i=1}^n \frac{w_i}{a_i}} \leq \prod_{i=1}^n a_i^{w_i} \leq \sum_{i=1}^n w_i a_i.$$

Solution.

(i) From Example 5.3 (ii) we have

$$E(\ln X) \leq \ln(E(X)), \quad \text{i.e.,} \quad e^{E(\ln X)} \leq E(X). \tag{5.3}$$

Using (5.3) on $\frac{1}{X}$, we get

$$e^{E(\ln \frac{1}{X})} \leq E\Big(\frac{1}{X}\Big), \quad \text{i.e.,} \quad \frac{1}{E\big(\frac{1}{X}\big)} \leq e^{E(\ln X)}. \tag{5.4}$$

The assertion now follows on combining (5.3) and (5.4).

(ii) Let $X$ be a discrete type random variable with support $S_X = \{a_1, \ldots, a_n\}$ and $P(X = a_i) = w_i$, $i = 1, \ldots, n$. Clearly $P(X > 0) = 1$ and

$$E(X) = \sum_{i=1}^n w_i a_i, \quad E(\ln X) = \sum_{i=1}^n w_i \ln a_i, \quad E\Big(\frac{1}{X}\Big) = \sum_{i=1}^n \frac{w_i}{a_i}.$$

On using (i), we get the asserted inequalities.
6. Descriptive Measures of Probability Distributions

Let $X$ be a random variable, defined on a probability space $(\Omega, \mathcal{F}, P)$, associated with a random experiment $\mathcal{E}$. Let $F$ and $f$ denote, respectively, the distribution function and the p.d.f./p.m.f. of $X$. The probability distribution (i.e., the distribution function/p.d.f./p.m.f.) of $X$ describes the manner in which the random variable $X$ takes values in different Borel sets. It may be desirable to have a set of numerical measures that provide a summary of the prominent features of the probability distribution of $X$. We call these measures descriptive measures. Four prominently used descriptive measures are: (i) measures of central tendency (or location), also referred to as averages; (ii) measures of dispersion; (iii) measures of skewness; and (iv) measures of kurtosis.

I. Measures of Central Tendency

A measure of central tendency or location (also called an average) gives us an idea about the central value of the probability distribution around which values of the random variable are clustered. Three commonly used measures of central tendency are mean, median and mode.

I (a). Mean. Recall (Definition 3.2 (i)) that the mean (of the probability distribution) of a random variable $X$ is given by $\mu = E(X)$. We have seen that the mean of a probability distribution gives us an idea about the average observed value of $X$ in the long run (i.e., the average of observed values of $X$ when the random experiment is repeated a large number of times). The mean seems to be the best suited average if the distribution is symmetric about a point $\mu$ (i.e., $f(\mu - x) = f(\mu + x)$, $\forall x \in \mathbb{R}$, in which case $E(X) = \mu$, provided it is finite), values in the neighborhood of $\mu$ occur with high probabilities, and $f(x)$ decreases as we move away from $\mu$ in either direction.
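As a toy illustration (the distributions below are our own, not from the module), the following sketch computes the mean of a discrete distribution and shows how a single extreme value carrying a small probability can shift it dramatically:

```python
# Mean of a discrete distribution, and its sensitivity to one extreme value
# that occurs with small positive probability.

def mean(pmf):
    # pmf: dict mapping values of X to their probabilities
    return sum(x * p for x, p in pmf.items())

base = {0: 0.25, 1: 0.50, 2: 0.25}                 # symmetric about 1
shifted = {0: 0.25, 1: 0.50, 2: 0.20, 100: 0.05}   # small mass far to the right

print(mean(base), mean(shifted))
```

A mass of only $0.05$ placed at the value $100$ moves the mean from $1.0$ to $5.9$.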
Because of its simplicity the mean is the most commonly used average (especially for symmetric or nearly symmetric distributions). Some of the demerits of this measure are that in some situations it may not be defined (Examples 3.2 and 3.4) and that it is very sensitive to the presence of a few extreme values of $X$ which are different from other values of $X$ (even though they may occur with small positive probabilities). So this measure should be used with caution if the probability distribution assigns positive probabilities to a few Borel sets having some extreme values.

I (b). Median. A real number $m$ satisfying

$$P(X \leq m) \geq \frac{1}{2} \quad \text{and} \quad P(X \geq m) \geq \frac{1}{2}$$

is called the median (of the probability distribution) of $X$. Clearly if $m$ is the median of a probability distribution then, in the long run (i.e., when the random experiment $\mathcal{E}$ is repeated a large number of times), the values of $X$ on either side of $m$ are observed with the same frequency. Thus the median of a probability distribution, in some sense, divides the distribution into two equal parts each having the same probability of occurrence. It is evident that if $X$ is of continuous type then the median $m$ is given by $F(m) = 1/2$, and it may not be unique. Also, for some distributions (especially for distributions of discrete type random variables) it may happen that $F(x) < 1/2$, $\forall x < a$, and $\{x \in \mathbb{R} : F(x) = 1/2\} = [a, b)$, for some $-\infty < a < b < \infty$, so that the median is again not unique. In such situations we define the median to be $m = \inf\{x \in \mathbb{R} : F(x) \geq 1/2\}$. For random variables having a symmetric probability distribution it is easy to verify that the mean and the median coincide (see Problem 33).

Unlike the mean, the median of a probability distribution is always defined. Moreover the median is not affected by a few extreme values as it takes into account only the probabilities with which different values occur and not their numerical values. As a measure of central tendency the median is preferred over the mean if the distribution is asymmetric and a few extreme observations are assigned positive probabilities.
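The convention $m = \inf\{x \in \mathbb{R} : F(x) \geq 1/2\}$ is straightforward to implement for a discrete distribution. A minimal sketch (the support and probabilities are our own illustration):

```python
# Median of a discrete distribution via the convention m = inf{ x : F(x) >= 1/2 }.

support = [1, 2, 3, 4]            # sorted support of X
probs = [0.2, 0.25, 0.35, 0.2]    # P(X = x) for each x in support

def cdf(x):
    # distribution function F(x) = P(X <= x)
    return sum(p for v, p in zip(support, probs) if v <= x)

median = next(v for v in support if cdf(v) >= 0.5)
```

Here $F(2) = 0.45 < 1/2$ while $F(3) = 0.8 \geq 1/2$, so the convention gives $m = 3$.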
However the fact that the median does not at all take into account the numerical values of $X$ is one of its demerits. Another disadvantage with the median is that for many probability distributions it is not easy to evaluate (especially for distributions whose distribution functions do not have closed forms).

I (c). Mode. Roughly speaking the mode of a probability distribution is the value that occurs with the highest probability; it is a value $m_0$ defined by

$$f(m_0) = \sup\{f(x) : x \in \mathbb{R}\}.$$

Clearly if $m_0$ is the mode of the probability distribution of $X$ then, in the long run, either $m_0$ or a value in the neighborhood of $m_0$ is observed with maximum frequency. The mode is easy to understand and easy to calculate. Normally, it can be found by just inspection. Note that a probability distribution may have more than one mode, and the modes may be far apart. Moreover the mode does not take into account the numerical values of $X$ and it also does not take into account the
probabilities associated with all the values of $X$. These are crucial deficiencies of the mode which make it less preferable than the mean and the median. A probability distribution with one (two/three) mode(s) is called a unimodal (bimodal/trimodal) distribution. A distribution with multiple modes is called a multimodal distribution.

II. Measures of Dispersion

Measures of central tendency give us an idea about the location of only the central part of the distribution. Other measures are often needed to describe a probability distribution. The values assumed by a random variable usually differ from each other. The usefulness of the mean or the median as an average is very much dependent on the variability (or dispersion) of values of $X$ around the mean or median. A probability distribution (or the corresponding random variable $X$) is said to have high dispersion if its support contains many values that are significantly higher or lower than the mean or median value. Some of the commonly used measures of dispersion are standard deviation, quartile deviation (or semi-inter-quartile range) and coefficient of variation.

II (a). Standard Deviation. Recall (Definition 3.2 (v)) that the variance (of the probability distribution) of a random variable $X$ is defined by

$$\sigma^2 = E\big((X - \mu)^2\big),$$

where $\mu = E(X)$ is the mean (of the probability distribution) of $X$. The standard deviation (of the probability distribution) of $X$ is defined by

$$\sigma = \sqrt{\sigma^2} = \sqrt{E\big((X - \mu)^2\big)}.$$

Clearly the variance and the standard deviation give us an idea about the average spread of values of $X$ around the mean. However, unlike the variance, the unit of measurement of the standard deviation is the same as that of $X$. Because of its simplicity and intuitive appeal, the standard deviation is the most widely used measure of dispersion. Some of the demerits of the standard deviation are that in many situations it may not be defined (distributions for which the second moment is not finite) and that it is sensitive to the presence of a few extreme values of $X$ which are different from other values.
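A small sketch (our own numbers) computing $\sigma^2$ and $\sigma$ for a discrete distribution, and checking numerically the standard fact that $c = \mu$ minimizes the mean squared deviation $E\big((X - c)^2\big)$:

```python
import math

# Variance and standard deviation of a discrete distribution, together with a
# numerical check that c = mu minimizes the mean squared deviation E((X - c)^2).

pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 10: 0.1}

mu = sum(x * p for x, p in pmf.items())

def msd(c):
    # mean squared deviation about c
    return sum((x - c) ** 2 * p for x, p in pmf.items())

var = msd(mu)
sd = math.sqrt(var)

# E((X - c)^2) = var + (c - mu)^2, so any c != mu gives a larger value.
others = [msd(mu + d) for d in (-1.0, -0.5, 0.5, 1.0)]
```

The identity $E\big((X - c)^2\big) = \sigma^2 + (c - \mu)^2$ in the comment makes the minimization transparent.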
A justification for having the mean $\mu$, in place of the median or any other average, in the definition of $\sigma^2 = E\big((X - \mu)^2\big)$ is that

$$E\big((X - \mu)^2\big) \leq E\big((X - c)^2\big), \quad \forall c \in \mathbb{R} \quad \text{(Problem 32)}.$$

II (b). Mean Deviation. Let $A$ be a suitable average. The mean deviation (of the probability distribution) of $X$ around the average $A$ is defined by

$$\mathrm{MD}(A) = E\big(|X - A|\big).$$

Among various mean deviations, the mean deviation about the median is more preferable than the others. A reason for this preference is the fact that, for any random variable $X$ with median $m$,

$$\mathrm{MD}(m) = E\big(|X - m|\big) \leq E\big(|X - c|\big), \quad \forall c \in \mathbb{R} \quad \text{(Problem 24)}.$$

Since a natural distance between $X$ and an average $A$ is $|X - A|$, as a measure of dispersion the mean deviation about the median seems to be more appealing than the standard deviation. Although the mean deviation about the median (or mean) has more intuitive appeal than the standard deviation, in most situations it is not easy to evaluate. Some of the other demerits of mean deviations are that in many situations they may
not be defined and that they are sensitive to the presence of a few extreme values of $X$ which are different from other values.

II (c). Quartile Deviation. A common drawback of the standard deviation and the mean deviations, as measures of dispersion, is that they are sensitive to the presence of a few extreme values of $X$. The quartile deviation measures the spread in the middle half of the distribution and is therefore not influenced by extreme values. Let $q_1$ and $q_3$ be real numbers such that

$$P(X \leq q_1) \geq \frac{1}{4} \ \text{ and } \ P(X \geq q_1) \geq \frac{3}{4}, \qquad P(X \leq q_3) \geq \frac{3}{4} \ \text{ and } \ P(X \geq q_3) \geq \frac{1}{4}.$$

The quantities $q_1$ and $q_3$ are called, respectively, the lower and upper quartiles of the probability distribution of the random variable $X$. Clearly if $q_1$, $m$ and $q_3$ are respectively the lower quartile, the median and the upper quartile of a probability distribution then they divide the probability distribution into four parts so that, in the long run (i.e., when the random experiment $\mathcal{E}$ is repeated a large number of times), twenty five percent of the observed values of $X$ are expected to be less than $q_1$, fifty percent of the observed values of $X$ are expected to be less than $m$ and seventy five percent of the observed values of $X$ are expected to be less than $q_3$. The quantity

$$\mathrm{IQR} = q_3 - q_1$$

is called the inter-quartile range of the probability distribution of $X$, and the quantity

$$\mathrm{QD} = \frac{q_3 - q_1}{2}$$

is called the quartile deviation or the semi-interquartile range of the probability distribution of $X$. It can be seen that if $X$ is of absolutely continuous type then $q_1$ and $q_3$ are given by $F(q_1) = 1/4$ and $F(q_3) = 3/4$, and they may not be unique. Also, for some distributions (especially for distributions of discrete type random variables) it may happen that $\{x \in \mathbb{R} : F(x) = 1/4\} = [a_1, b_1)$ and/or $\{x \in \mathbb{R} : F(x) = 3/4\} = [a_3, b_3)$, for some $-\infty < a_1 < b_1 < \infty$ and $-\infty < a_3 < b_3 < \infty$, so that $q_1$ (respectively $q_3$) is again not uniquely defined. In such situations we define $q_1 = \inf\{x \in \mathbb{R} : F(x) \geq 1/4\}$ and $q_3 = \inf\{x \in \mathbb{R} : F(x) \geq 3/4\}$. For random variables having a symmetric probability distribution it is easy to verify that $m = (q_1 + q_3)/2$ (Problem 33).
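The same inf-based convention used for the median extends to the quartiles. A sketch for a small discrete distribution (our own numbers):

```python
# Quartiles via q1 = inf{x : F(x) >= 1/4} and q3 = inf{x : F(x) >= 3/4},
# and the derived inter-quartile range and quartile deviation.

support = [0, 1, 2, 3, 4, 5]
probs = [0.1, 0.2, 0.2, 0.2, 0.2, 0.1]

def cdf(x):
    return sum(p for v, p in zip(support, probs) if v <= x)

def quantile(alpha):
    # smallest support point x with F(x) >= alpha
    return next(v for v in support if cdf(v) >= alpha)

q1, q3 = quantile(0.25), quantile(0.75)
iqr = q3 - q1          # inter-quartile range
qd = iqr / 2           # quartile deviation
```

Here $F(1) = 0.3 \geq 1/4$ and $F(4) = 0.9 \geq 3/4$, so $q_1 = 1$, $q_3 = 4$, $\mathrm{IQR} = 3$ and $\mathrm{QD} = 1.5$.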
Although, unlike the standard deviation and the mean deviation, the quartile deviation is not sensitive to the presence of some extreme values of $X$, a major drawback of the quartile deviation is that it ignores the tails of the probability distribution (which constitute 50% of the probability distribution). Note that the quartile deviation depends on the units of measurement of the random variable $X$ and thus it may not be an appropriate measure for comparing the dispersions of two different probability distributions. For comparing the dispersions of two different probability distributions a normalized measure such as
$$\mathrm{CQD} = \frac{\mathrm{QD}}{(q_1 + q_3)/2} = \frac{q_3 - q_1}{q_3 + q_1}$$

seems to be more appropriate. The quantity CQD is called the coefficient of quartile deviation of the probability distribution of $X$. Clearly the coefficient of quartile deviation is independent of the units of measurement and thus it can be used to compare the dispersions of two different probability distributions.

II (d). Coefficient of Variation. Like the quartile deviation, the standard deviation also depends on the units of measurement of the random variable $X$ and thus it is not an appropriate measure for comparing the dispersions of two different probability distributions. Let $\mu$ and $\sigma$, respectively, be the mean and the standard deviation of the distribution of $X$. Suppose that $\mu \neq 0$. The coefficient of variation of the probability distribution of $X$ is defined by

$$\mathrm{CV} = \frac{\sigma}{\mu}.$$

Clearly the coefficient of variation measures the variation per unit of mean and is independent of units. Therefore it seems to be an appropriate measure to compare the dispersions of two different probability distributions. A disadvantage of the coefficient of variation is that when the mean $\mu$ is close to zero it is very sensitive to small changes in the mean.

III. Measures of Skewness

Skewness of a probability distribution is a measure of asymmetry (or lack of symmetry). Recall that the probability distribution of a random variable $X$ is said to be symmetric about a point $\mu$ if $f(\mu - x) = f(\mu + x)$, $\forall x \in \mathbb{R}$. In that case $E(X) = \mu$ (provided it exists) and $F(\mu + x) + F\big((\mu - x)-\big) = 1$, $\forall x \in \mathbb{R}$. Evidently, for symmetric distributions, the shape of the p.d.f./p.m.f. on the left of $\mu$ is the mirror image of that on the right side of $\mu$. It can be shown that, for symmetric distributions, the mean and the median coincide (Problem 33). We say that a probability distribution is positively skewed if the tail on the right side of the p.d.f./p.m.f. is longer than that on the left side of the p.d.f./p.m.f. and the bulk of the values lie on the left side of the mean.
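The two unit-free dispersion measures defined above, CQD and CV, can be computed directly; a sketch with our own numbers for a positive-valued discrete distribution:

```python
import math

# Coefficient of quartile deviation and coefficient of variation for a
# positive-valued discrete distribution; both quantities are dimensionless.

support = [1.0, 2.0, 4.0, 8.0]
probs = [0.25, 0.25, 0.25, 0.25]

mu = sum(x * p for x, p in zip(support, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(support, probs)))
cv = sigma / mu                      # coefficient of variation

def cdf(x):
    return sum(p for v, p in zip(support, probs) if v <= x)

q1 = next(v for v in support if cdf(v) >= 0.25)
q3 = next(v for v in support if cdf(v) >= 0.75)
cqd = (q3 - q1) / (q3 + q1)          # coefficient of quartile deviation
```

Rescaling the support by any positive constant leaves both `cv` and `cqd` unchanged, which is exactly why they are suitable for comparing different distributions.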
Clearly a positively skewed distribution indicates the presence of a few high values of $X$ which pull up the value of the mean, resulting in a mean larger than the median and the mode. For a unimodal and positively skewed distribution we normally have

$$\text{Mode} < \text{Median} < \text{Mean}.$$

Similarly we say that a probability distribution is negatively skewed if the tail on the left side of the p.d.f./p.m.f. is longer than that on the right side of the p.d.f./p.m.f. and the bulk of the values lie on the right side of the mean. Clearly a negatively skewed distribution indicates the presence of a few low values of $X$ which pull down the value of the mean, resulting in a mean smaller than the median and the mode. For a unimodal and negatively skewed distribution we normally have

$$\text{Mean} < \text{Median} < \text{Mode}.$$

Let $\mu$
and $\sigma$, respectively, be the mean and the standard deviation of $X$ and let $Z = (X - \mu)/\sigma$ be the standardized variable (independent of units of measurement). A measure of skewness of the probability distribution of $X$ is defined by

$$\beta_1 = E(Z^3) = \frac{E\big((X - \mu)^3\big)}{\sigma^3}.$$

The quantity $\beta_1$ is simply called the coefficient of skewness. Clearly, for symmetric distributions, $\beta_1 = 0$ (Theorem 4.3 (vi)). However the converse may not be true, i.e., there are examples of skewed probability distributions for which $\beta_1 = 0$. A large positive value of $\beta_1$ indicates that the distribution is positively skewed and a large negative value of $\beta_1$ indicates that the distribution is negatively skewed.

A measure of skewness can also be based on quartiles. Let $q_1$, $m$, $q_3$ and $\mu$ denote respectively the lower quartile, the median, the upper quartile and the mean of the probability distribution of $X$. We know that for random variables having a symmetric probability distribution $q_3 - m = m - q_1$, i.e., $q_3 + q_1 - 2m = 0$. For a positively (negatively) skewed distribution we will have $q_3 + q_1 - 2m > 0$ ($q_3 + q_1 - 2m < 0$). Thus one may also define a measure of skewness based on $q_3 + q_1 - 2m$. To make this quantity independent of the units of measurement one may consider

$$S = \frac{q_3 + q_1 - 2m}{q_3 - q_1}$$

as a measure of skewness. The quantity $S$ is called the Yule coefficient of skewness.

IV. Measures of Kurtosis

For real constants $\mu \in \mathbb{R}$ and $\sigma > 0$, let $X_{\mu,\sigma}$ be a random variable having the p.d.f.

$$f_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$

i.e., $X_{\mu,\sigma} \sim N(\mu, \sigma^2)$ (see Example 4.2 and the discussion following it). We have seen (Example 4.2 (iii)) that $\mu$ and $\sigma^2$ are respectively the mean and the variance of the distribution of $X_{\mu,\sigma}$. We know that the $N(\mu, \sigma^2)$ distribution is symmetric about $\mu$ (cf. Example 4.2 (ii)). Also it is easy to verify that the $N(\mu, \sigma^2)$ distribution is unimodal with $\mu$ as the common value of the mean, median and mode. Kurtosis of the probability distribution of $X$ is a measure of the peakedness and the thickness of tails of the p.d.f. of $X$ relative to the peakedness and the thickness of tails of the p.d.f. of a normal distribution.
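The two skewness coefficients $\beta_1$ and $S$ defined above can be computed directly for any discrete distribution; a sketch with our own numbers for a positively skewed example:

```python
import math

# Moment coefficient of skewness E(Z^3) and Yule coefficient of skewness
# (q3 + q1 - 2m)/(q3 - q1) for a positively skewed discrete distribution.

support = [0, 1, 2, 3, 10]
probs = [0.2, 0.3, 0.2, 0.2, 0.1]

mu = sum(x * p for x, p in zip(support, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(support, probs)))
beta1 = sum(((x - mu) / sigma) ** 3 * p for x, p in zip(support, probs))

def cdf(x):
    return sum(p for v, p in zip(support, probs) if v <= x)

def quantile(alpha):
    # smallest support point x with F(x) >= alpha
    return next(v for v in support if cdf(v) >= alpha)

q1, m, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
yule = (q3 + q1 - 2 * m) / (q3 - q1)
```

Both measures come out positive here, as expected for a distribution whose long tail lies to the right of the mean.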
A distribution is said to have higher (lower) kurtosis than the normal distribution if its p.d.f., in comparison with the p.d.f. of a normal distribution, has a sharper (more rounded) peak and longer, fatter (shorter, thinner) tails. Let $\mu$ and
More informationSet, functions and Euclidean space. Seungjin Han
Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationStatistics for Economists. Lectures 3 & 4
Statistics for Economists Lectures 3 & 4 Asrat Temesgen Stockholm University 1 CHAPTER 2- Discrete Distributions 2.1. Random variables of the Discrete Type Definition 2.1.1: Given a random experiment with
More informationStandard forms for writing numbers
Standard forms for writing numbers In order to relate the abstract mathematical descriptions of familiar number systems to the everyday descriptions of numbers by decimal expansions and similar means,
More informationConstruction of a general measure structure
Chapter 4 Construction of a general measure structure We turn to the development of general measure theory. The ingredients are a set describing the universe of points, a class of measurable subsets along
More information1 + lim. n n+1. f(x) = x + 1, x 1. and we check that f is increasing, instead. Using the quotient rule, we easily find that. 1 (x + 1) 1 x (x + 1) 2 =
Chapter 5 Sequences and series 5. Sequences Definition 5. (Sequence). A sequence is a function which is defined on the set N of natural numbers. Since such a function is uniquely determined by its values
More informationExpectation of Random Variables
1 / 19 Expectation of Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 13, 2015 2 / 19 Expectation of Discrete
More informationTheorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers
Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower
More informationNotes 1 : Measure-theoretic foundations I
Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,
More informationMORE ON CONTINUOUS FUNCTIONS AND SETS
Chapter 6 MORE ON CONTINUOUS FUNCTIONS AND SETS This chapter can be considered enrichment material containing also several more advanced topics and may be skipped in its entirety. You can proceed directly
More informationWe denote the space of distributions on Ω by D ( Ω) 2.
Sep. 1 0, 008 Distributions Distributions are generalized functions. Some familiarity with the theory of distributions helps understanding of various function spaces which play important roles in the study
More informationMath 324 Summer 2012 Elementary Number Theory Notes on Mathematical Induction
Math 4 Summer 01 Elementary Number Theory Notes on Mathematical Induction Principle of Mathematical Induction Recall the following axiom for the set of integers. Well-Ordering Axiom for the Integers If
More informationChapter 4.notebook. August 30, 2017
Sep 1 7:53 AM Sep 1 8:21 AM Sep 1 8:21 AM 1 Sep 1 8:23 AM Sep 1 8:23 AM Sep 1 8:23 AM SOCS When describing a distribution, make sure to always tell about three things: shape, outliers, center, and spread
More information3.1 Measure of Center
3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects
More informationSpring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures
36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1
More informationRandom Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R
In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample
More informationExamples of Dual Spaces from Measure Theory
Chapter 9 Examples of Dual Spaces from Measure Theory We have seen that L (, A, µ) is a Banach space for any measure space (, A, µ). We will extend that concept in the following section to identify an
More informationBrief Review of Probability
Maura Department of Economics and Finance Università Tor Vergata Outline 1 Distribution Functions Quantiles and Modes of a Distribution 2 Example 3 Example 4 Distributions Outline Distribution Functions
More information4 Sums of Independent Random Variables
4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables
More informationMATH 117 LECTURE NOTES
MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set
More informationProduct measure and Fubini s theorem
Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω
More informationStat 5101 Lecture Slides Deck 4. Charles J. Geyer School of Statistics University of Minnesota
Stat 5101 Lecture Slides Deck 4 Charles J. Geyer School of Statistics University of Minnesota 1 Existence of Integrals Just from the definition of integral as area under the curve, the integral b g(x)
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More informationChapter 1. Sets and probability. 1.3 Probability space
Random processes - Chapter 1. Sets and probability 1 Random processes Chapter 1. Sets and probability 1.3 Probability space 1.3 Probability space Random processes - Chapter 1. Sets and probability 2 Probability
More informationTABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1
TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 1.1 The Probability Model...1 1.2 Finite Discrete Models with Equally Likely Outcomes...5 1.2.1 Tree Diagrams...6 1.2.2 The Multiplication Principle...8
More informationExercises for Unit VI (Infinite constructions in set theory)
Exercises for Unit VI (Infinite constructions in set theory) VI.1 : Indexed families and set theoretic operations (Halmos, 4, 8 9; Lipschutz, 5.3 5.4) Lipschutz : 5.3 5.6, 5.29 5.32, 9.14 1. Generalize
More informationNumerical Measures of Central Tendency
ҧ Numerical Measures of Central Tendency The central tendency of the set of measurements that is, the tendency of the data to cluster, or center, about certain numerical values; usually the Mean, Median
More informationMath 180A. Lecture 16 Friday May 7 th. Expectation. Recall the three main probability density functions so far (1) Uniform (2) Exponential.
Math 8A Lecture 6 Friday May 7 th Epectation Recall the three main probability density functions so far () Uniform () Eponential (3) Power Law e, ( ), Math 8A Lecture 6 Friday May 7 th Epectation Eample
More informationStatistics 3657 : Moment Generating Functions
Statistics 3657 : Moment Generating Functions A useful tool for studying sums of independent random variables is generating functions. course we consider moment generating functions. In this Definition
More informationA matrix over a field F is a rectangular array of elements from F. The symbol
Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted
More informationClass 11 Maths Chapter 15. Statistics
1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class
More informationMetric Spaces and Topology
Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies
More informationSimple Integer Recourse Models: Convexity and Convex Approximations
Simple Integer Recourse Models: Convexity and Convex Approximations Willem K. Klein Haneveld Leen Stougie Maarten H. van der Vlerk November 8, 005 We consider the objective function of a simple recourse
More informationconvergence theorem in abstract set up. Our proof produces a positive integrable function required unlike other known
https://sites.google.com/site/anilpedgaonkar/ profanilp@gmail.com 218 Chapter 5 Convergence and Integration In this chapter we obtain convergence theorems. Convergence theorems will apply to various types
More informationLecture 11. Probability Theory: an Overveiw
Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the
More informationSTATISTICS SYLLABUS UNIT I
STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution
More information8 Laws of large numbers
8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable
More informationFIXED POINT ITERATIONS
FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in
More informationExpectation, variance and moments
Expectation, variance and moments John Appleby Contents Expectation and variance Examples 3 Moments and the moment generating function 4 4 Examples of moment generating functions 5 5 Concluding remarks
More informationSTAT/MATH 395 PROBABILITY II
STAT/MATH 395 PROBABILITY II Chapter 6 : Moment Functions Néhémy Lim 1 1 Department of Statistics, University of Washington, USA Winter Quarter 2016 of Common Distributions Outline 1 2 3 of Common Distributions
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.
Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 017 Nadia S. Larsen 17 November 017. 1. Construction of the product measure The purpose of these notes is to prove the main
More informationLimiting Distributions
Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the
More informationLectures 2 3 : Wigner s semicircle law
Fall 009 MATH 833 Random Matrices B. Való Lectures 3 : Wigner s semicircle law Notes prepared by: M. Koyama As we set up last wee, let M n = [X ij ] n i,j=1 be a symmetric n n matrix with Random entries
More information1.1. MEASURES AND INTEGRALS
CHAPTER 1: MEASURE THEORY In this chapter we define the notion of measure µ on a space, construct integrals on this space, and establish their basic properties under limits. The measure µ(e) will be defined
More informationM4L5. Expectation and Moments of Functions of Random Variable
M4L5 Expectation and Moments of Functions of Random Variable 1. Introduction This lecture is a continuation of previous lecture, elaborating expectations, moments and moment generating functions of the
More informationHartogs Theorem: separate analyticity implies joint Paul Garrett garrett/
(February 9, 25) Hartogs Theorem: separate analyticity implies joint Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ (The present proof of this old result roughly follows the proof
More information