MAS113 Introduction to Probability and Statistics
School of Mathematics and Statistics, University of Sheffield, 2018–19
Identically distributed
Suppose we have $n$ random variables $X_1, X_2, \ldots, X_n$. If these each have the same probability distribution, which is the same thing as their cumulative distribution functions being equal,
$$P(X_1 \le a) = P(X_2 \le a) = \cdots = P(X_n \le a) \quad \text{for all } a,$$
then we say that they are identically distributed.
Independent and identically distributed
We are particularly interested in the case where $X_1, X_2, \ldots, X_n$ are not only identically distributed, but also independent, so that for all $1 \le i, j \le n$ with $i \ne j$,
$$P\{(X_i \le a) \cap (X_j \le b)\} = P(X_i \le a)\,P(X_j \le b).$$
We then say that $X_1, X_2, \ldots, X_n$ are independent and identically distributed, or i.i.d. for short.
Independent and identically distributed cont.
We can, if we like, regard $X_1, X_2, \ldots, X_n$ as independent copies of some given random variable. I.i.d. random variables are very important in applications, as they describe repeated experiments carried out under identical conditions, in which the outcome of each experiment does not affect the others.
Sums
Now define $S(n)$ to be the sum and $\bar{X}(n)$ to be the mean:
$$S(n) = \sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{X}(n) = \frac{S(n)}{n}.$$
Both $S(n)$ and $\bar{X}(n)$ are also random variables, as they are functions of the random variables $X_1, \ldots, X_n$.
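As a concrete illustration, here is a minimal simulation sketch in Python, taking rolls of a fair six-sided die as the i.i.d. experiment (an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000
# n i.i.d. rolls of a fair six-sided die: identically distributed,
# and each roll is independent of the others.
x = rng.integers(1, 7, size=n)

s_n = x.sum()       # S(n), the sum of the observations
xbar_n = s_n / n    # the sample mean

print(s_n, xbar_n)  # the sample mean should be close to E(X_i) = 3.5
```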
Mean and variance of sums
If we write $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, it is straightforward to derive the mean and variance of $S(n)$ and $\bar{X}(n)$ in terms of $\mu$ and $\sigma^2$.
Theorem. We have:
1. $E(S(n)) = n\mu$;
2. $\mathrm{Var}(S(n)) = n\sigma^2$;
3. $E(\bar{X}(n)) = \mu$;
4. $\mathrm{Var}(\bar{X}(n)) = \dfrac{\sigma^2}{n}$.
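In brief, parts 1 and 3 follow from linearity of expectation, while parts 2 and 4 use the fact that variances of independent random variables add, together with $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$:
$$E(S(n)) = \sum_{i=1}^{n} E(X_i) = n\mu, \qquad \mathrm{Var}(S(n)) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n\sigma^2,$$
$$E(\bar{X}(n)) = \frac{1}{n}E(S(n)) = \mu, \qquad \mathrm{Var}(\bar{X}(n)) = \frac{1}{n^2}\mathrm{Var}(S(n)) = \frac{\sigma^2}{n}.$$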
Standard error
The standard deviation of $\bar{X}(n)$ plays an important role. It is called the standard error, and we denote it by $\mathrm{SE}(\bar{X}(n))$, so that
$$\mathrm{SE}(\bar{X}(n)) = \sqrt{\mathrm{Var}(\bar{X}(n))} = \frac{\sigma}{\sqrt{n}}.$$
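For example, if $\sigma = 2$ and $n = 100$, then $\mathrm{SE}(\bar{X}(100)) = 2/\sqrt{100} = 0.2$. A quick Monte Carlo check of the formula (a sketch, with an arbitrary choice of normal data and sample size):

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, n = 0.0, 2.0, 100
# Draw 10,000 independent samples of size n, and compute each sample mean.
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(sample_means.std())   # empirical standard deviation of the sample mean
print(sigma / np.sqrt(n))   # theoretical SE = sigma / sqrt(n) = 0.2
```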
Application in statistics
These results have important applications in statistics. Suppose we are able to observe i.i.d. random variables $X_1, \ldots, X_n$, but we don't know the value of $E(X_i) = \mu$. Theorem 28 part 4 tells us that as $n$ increases, the variance of $\bar{X}(n)$ gets smaller, and the smaller the variance is, the closer we expect $\bar{X}(n)$ to be to its mean value. Theorem 28 part 3 tells us that the mean value of $\bar{X}(n)$ is $\mu$ (for any value of $n$). In other words, as $n$ gets larger, we expect $\bar{X}(n)$ to be increasingly close to the unknown quantity $\mu$, so we can use the observed value of $\bar{X}(n)$ to estimate $\mu$.
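To see this in action, a small sketch (assuming exponentially distributed observations, an arbitrary choice, with true mean $\mu = 5$):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = 5.0  # the "unknown" mean we are trying to estimate
for n in [10, 100, 1_000, 10_000]:
    x = rng.exponential(scale=mu, size=n)  # n i.i.d. observations
    print(n, x.mean())  # the sample mean drifts closer to mu as n grows
```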
Illustration
The four plots show the density functions of $\bar{X}(n)$ for $n = 1, 10, 20$ and $100$. In each case, $X_1, \ldots, X_n \sim N(0, 1)$, so $E(X_i) = \mu = 0$.
[Figure: four density plots, one per panel ($n = 1$, $n = 10$, $n = 20$, $n = 100$), each with $x$ from $-4$ to $4$ on the horizontal axis; the densities become increasingly concentrated around 0 as $n$ grows.]
Examples
Here are two key examples of sums of i.i.d. random variables:
If $X_i \sim \mathrm{Bernoulli}(p)$ for $1 \le i \le n$, then $S(n) \sim \mathrm{Bin}(n, p)$.
If $X_i \sim N(\mu, \sigma^2)$ for $1 \le i \le n$, then $S(n) \sim N(n\mu, n\sigma^2)$.
The first of these is how we defined the Binomial; we will prove the second later on, using moment generating functions.
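A quick simulation sketch of the first example (with arbitrary choices of $n$ and $p$), checking that $S(n)$ has the Binomial mean $np$ and variance $np(1-p)$:

```python
import numpy as np

rng = np.random.default_rng(3)

n, p, reps = 20, 0.3, 100_000
# Each row is one realisation of X_1, ..., X_n with X_i ~ Bernoulli(p).
trials = rng.random((reps, n)) < p
s_n = trials.sum(axis=1)  # one value of S(n) per row

print(s_n.mean(), n * p)           # both approximately 6.0
print(s_n.var(), n * p * (1 - p))  # both approximately 4.2
```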
Chebyshev's inequality
We will now derive an important result regarding the behaviour of $\bar{X}(n)$ for large $n$. We first prove a useful inequality. It is true whether $X$ is discrete or continuous.
Theorem (Chebyshev's inequality). Let $X$ be a random variable for which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Then for any $c > 0$,
$$P(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2}.$$
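In outline, the standard proof runs as follows (continuous case; the discrete case is identical with sums in place of integrals). Since the integrand is non-negative, restricting the integral to the region $\{x : |x - \mu| \ge c\}$ can only decrease it, and on that region $(x - \mu)^2 \ge c^2$:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \ge \int_{|x - \mu| \ge c} (x - \mu)^2 f(x)\,dx \ge c^2 \int_{|x - \mu| \ge c} f(x)\,dx = c^2\,P(|X - \mu| \ge c).$$
Dividing through by $c^2$ gives the result.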
Chebyshev's inequality cont.
The same inequality holds if $P(|X - \mu| \ge c)$ is replaced by $P(|X - \mu| > c)$, as $P(|X - \mu| > c) \le P(|X - \mu| \ge c)$.
What does Chebyshev's inequality tell us? We expect a random variable to be less likely to take values far from its mean, the further from the mean we look. But a large variance may counteract this a little, as it tells us that the values which have high probability are more spread out. Chebyshev's inequality makes this precise.
Example
A random variable $X$ has mean 1 and variance 0.5. What can you say about $P(X > 6)$?
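One possible answer, using Chebyshev's inequality: if $X > 6$ then $|X - 1| \ge 5$, so taking $c = 5$,
$$P(X > 6) \le P(|X - 1| \ge 5) \le \frac{0.5}{5^2} = 0.02.$$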
Weak Law of Large Numbers
Theorem (The Weak Law of Large Numbers). Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables, each with mean $\mu$ and variance $\sigma^2$. Then for all $\epsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}(n) - \mu| > \epsilon) = 0.$$
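The proof is a direct application of Chebyshev's inequality to $\bar{X}(n)$: by Theorem 28, $E(\bar{X}(n)) = \mu$ and $\mathrm{Var}(\bar{X}(n)) = \sigma^2/n$, so for any $\epsilon > 0$,
$$P(|\bar{X}(n) - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty.$$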
Strong Law
It is possible to prove a stronger result, which is that
$$P\left(\lim_{n \to \infty} \bar{X}(n) = \mu\right) = 1,$$
but the proof is outside the scope of this module. This result is known as the strong law of large numbers.
Informal law of large numbers
In Section 4 we introduced the following informal version:
The law of large numbers (an informal version). Suppose we do a sequence of independent experiments, so that the outcome in one experiment has no effect on the outcome in another experiment. Now suppose that in experiment $i$ there is an event $E_i$ that has probability $p$ of occurring, for $i = 1, 2, \ldots$. The proportion of the events $E_1, E_2, \ldots$ which actually occur will typically get closer and closer to $p$ as the number of experiments increases.
Relationship to informal law
To see the relationship between this and Theorem 30, for each of the events $E_i$ define a random variable $X_i$ which takes the value 1 if $E_i$ occurs and 0 if it does not. Then $X_i$ is a Bernoulli random variable with $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$. As the $E_i$ are independent, the $X_i$ will be independent too, and it is easy to see that the mean is $\mu = E(X_i) = p$.
Relationship to informal law cont.
Hence Theorem 30 tells us that, for any $\epsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}(n) - p| > \epsilon) = 0,$$
and $\bar{X}(n)$ here is precisely the proportion of the events $E_1, E_2, \ldots, E_n$ which actually occur. So Theorem 30 tells us that if $n$ is large, there is high probability that the proportion of the events $E_1, E_2, \ldots, E_n$ which occur is close to $p$.
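A final simulation sketch, taking $E_i$ to be "heads on flip $i$" of a fair coin ($p = 0.5$, an arbitrary choice), showing the running proportion of heads settling towards $p$:

```python
import numpy as np

rng = np.random.default_rng(4)

p = 0.5
flips = rng.random(100_000) < p  # X_i = 1 if event E_i (heads) occurs
# Running proportion of events that have occurred after each flip.
proportions = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in [10, 100, 1_000, 100_000]:
    print(n, proportions[n - 1])  # the proportion approaches p as n grows
```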