
Statistics 1 - Lecture Notes Chapter 1

Caio Ibsen
Graduate School of Economics - Getulio Vargas Foundation

April 28, 2009

We want to establish a formal mathematical theory for working with the results of experiments that can change randomly. For example, imagine the experiment of rolling a die: there are six possible outcomes, {1, 2, 3, 4, 5, 6}. But we can also think of other events; for instance, an event can be "the result is an even number". Moreover, we want to know the chances of each event occurring, that is, the probabilities of the possible results of an experiment (what exactly is a possible result? And a probability?). How we model this kind of situation is the goal of this chapter.

1 Basic Model

Let Ω be the space of outcomes of an experiment, called the sample space; in the example above, Ω = {1, 2, 3, 4, 5, 6}. We call the elements of Ω the simple or elementary events. When we discuss the possible results of an experiment we are, in general, interested in more results than just the elements of Ω, such as the result of the die being even or odd. Let us define what we will consider as a possible result of an experiment.

Definition 1.1. Let Ω be a nonempty set of elementary results of an experiment. We define a field (or an algebra) of events of Ω as a set A of subsets of Ω such that

i - Ω ∈ A;

ii - if A ∈ A then A^c ∈ A; and

iii - for any finite family {A_n} of elements of A we have ∪_n A_n ∈ A.

It is interesting to note the reasons for this definition. Intuitively, A represents the set of all events that we can identify at the end of the experiment. Thus, the first property tells us that we always know when Ω happens; the second, that if we can tell when an event A occurs then we can also tell when it does not occur. The last property tells us that if we can identify when an event A occurs and when an event B occurs then, defining the event D as "A or B", we must also be able to identify when D occurs.
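The three axioms of Definition 1.1 can be checked mechanically on a finite sample space. The sketch below is an illustrative aside in Python, not part of the original notes (the names `power_set` and `is_field` are my own): it brute-forces properties (i)-(iii), checking closure under pairwise unions, which on a finite family is equivalent to closure under all finite unions.

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega, as frozensets: the largest algebra P(Omega)."""
    omega = list(omega)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1))}

def is_field(omega, events):
    """Brute-force check of Definition 1.1 on a finite sample space."""
    omega = frozenset(omega)
    if omega not in events:                # (i)   Omega itself is an event
        return False
    for a in events:
        if omega - a not in events:        # (ii)  closure under complement
            return False
        for b in events:
            if a | b not in events:        # (iii) closure under pairwise,
                return False               #       hence finite, unions
    return True

omega = {1, 2, 3, 4, 5, 6}
trivial = {frozenset(), frozenset(omega)}
print(is_field(omega, trivial))           # True: the trivial field
print(is_field(omega, power_set(omega)))  # True: P(Omega)
# Not a field: it contains {2, 4, 6} but neither its complement nor the empty set.
print(is_field(omega, {frozenset(omega), frozenset({2, 4, 6})}))  # False
```

Checking only pairwise unions suffices because any finite union can be built two sets at a time.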

Exercise 1.1. Show that property (iii) of the definition of a field can be exchanged for: for any finite family {A_n} of elements of A we have ∩_n A_n ∈ A.

Solution 1.1. We need to prove that the two statements are equivalent. First assume that for any finite family {A_n} of elements of A we have ∪_n A_n ∈ A. By item (ii), for any finite family {A_n} of elements of A, {A_n^c} is also a finite family of elements of A; thus ∪_n A_n^c ∈ A and, by (ii) again, (∪_n A_n^c)^c ∈ A. But (∪_n A_n^c)^c = ∩_n A_n, so we have the result. On the other hand, assuming that for any finite family {A_n} of elements of A we have ∩_n A_n ∈ A, the result follows likewise.

Example 1.1. For the die example we have Ω = {1, 2, 3, 4, 5, 6}, and a field for this Ω can be {∅, Ω}. We call this kind of field a trivial field of events; it is the simplest field on any set Ω. The largest possible algebra is the set of all subsets of Ω, denoted by P(Ω).

Exercise 1.2. Prove that for any nonempty set Ω, P(Ω) is an algebra of events of Ω.

Solution 1.2. Let us check the three properties:

i - Ω ⊆ Ω ⟹ Ω ∈ P(Ω);

ii - A ⊆ Ω ⟹ A^c ⊆ Ω ⟹ A^c ∈ P(Ω); and

iii - {A_n} ⊆ P(Ω) ⟹ A_n ⊆ Ω ∀n ⟹ ∪_n A_n ⊆ Ω ⟹ ∪_n A_n ∈ P(Ω).

Then we can conclude that P(Ω) is an algebra of events.

Without loss of generality¹ we can work with a more useful class of events, called a σ-field. The advantage of this definition is that it allows us to work with limits of unions and intersections, which will be very useful in the discussion of convergence of random variables.

Definition 1.2. Let Σ be a field of a nonempty set Ω. We say that Σ is a σ-field of events of Ω if

iii' - for any countable family {A_n} of elements of Σ we have ∪_n A_n ∈ Σ.

A pair (Ω, Σ) with a nonempty set Ω and a σ-field Σ is called a measurable space, and an element of Σ is called a Σ-measurable set (or simply a measurable set if Σ is implicit in the context).

Exercise 1.3. Prove that (iii') implies (iii).
Moreover, show that, with Ω = [0, 1], the family A defined as the set of all finite unions of intervals contained in [0, 1] is a field of Ω but not a σ-field.

¹ See Billingsley, P., Probability and Measure, for a more detailed discussion of field extensions.

Solution 1.3. To see that (iii') implies (iii) it is only necessary to define, for any finite family {A_n}_{n=1}^N, the infinite family {A_n}_{n=1}^∞ with A_n = ∅ for all n ≥ N + 1; then ∪_{n=1}^N A_n = ∪_{n=1}^∞ A_n ∈ Σ. For the second part it is easy to check the three properties of an algebra. To see that A is not a σ-field, note that for every n ∈ N the set (1 - 1/2^n, 1 - 1/2^(n+1)) belongs to A, but the infinite union of these sets does not: each of them is nonempty and they are pairwise disjoint, so the union cannot be written as a finite union of intervals.

Given a nonempty set Ω and a nonempty family C of subsets of Ω, we define the σ-algebra generated by C, denoted σ(C), as the smallest σ-field that contains all elements of C. An equivalent definition is that σ(C) is the intersection of all σ-fields containing C.

Exercise 1.4. Let {Σ_λ}_{λ∈Λ} be a family of σ-fields of a nonempty set Ω, where Λ is any nonempty family of indices. Show that Σ = ∩_{λ∈Λ} Σ_λ is a σ-field of Ω.

Solution 1.4. Let us check the three properties of a σ-field:

i - Ω ∈ Σ_λ ∀λ ⟹ Ω ∈ ∩_{λ∈Λ} Σ_λ;

ii - A ∈ ∩_{λ∈Λ} Σ_λ ⟹ A ∈ Σ_λ ∀λ ⟹ A^c ∈ Σ_λ ∀λ ⟹ A^c ∈ ∩_{λ∈Λ} Σ_λ; and

iii - {A_n} ⊆ ∩_{λ∈Λ} Σ_λ ⟹ {A_n} ⊆ Σ_λ ∀λ ⟹ ∪_n A_n ∈ Σ_λ ∀λ ⟹ ∪_n A_n ∈ ∩_{λ∈Λ} Σ_λ.

Exercise 1.5. Prove that for any nonempty set Ω, P(Ω) is a σ-algebra of events of Ω. Conclude that σ(C), as defined above, is well defined.

Solution 1.5. The first part is exactly the same proof as in Exercise 1.2. To see that σ(C) is well defined, note that we can write σ(C) as ∩_{Σ∈S} Σ, where S is the set of all σ-fields containing C². As we saw in the previous exercise, this intersection is a σ-field. Therefore, σ(C) is well defined.

Taking Ω to be the real numbers, we call the Borel σ-field, denoted B, the smallest σ-field that contains all intervals in R. The elements of B are called Borel sets.

Exercise 1.6. Let (Ω, Σ) be a measurable space. Prove that for any nonempty set S ∈ Σ, the family Σ_S = {A ∈ Σ; A ⊆ S} is a σ-field of S.

Solution 1.6. Let us check the three properties of a σ-field:

i - S ∈ Σ and S ⊆ S, so S ∈ Σ_S;

ii - A ∈ Σ_S ⟹ A ∈ Σ ⟹ A^c ∈ Σ ⟹ A^c ∩ S ∈ Σ ⟹ A^c ∩ S ∈ Σ_S; and

² It is nonempty by the first part of the exercise.

iii - {A_n} ⊆ Σ_S ⟹ ({A_n} ⊆ Σ) and (A_n ⊆ S ∀n) ⟹ (∪_n A_n ∈ Σ) and (∪_n A_n ⊆ S) ⟹ ∪_n A_n ∈ Σ_S.

Thus, we have the proof.

The last object necessary for a complete mathematical model of an experiment is the probability of an event. The definition is given below.

Definition 1.3. A probability on (Ω, Σ) is a function P : Σ → R_+ such that

i - P(Ω) = 1; and

ii - for any countable family {A_n} of disjoint elements of Σ we have P(∪_n A_n) = Σ_n P(A_n).

The triple (Ω, Σ, P) is called a probability space. All the theory developed in these notes will be based on a probability space.

Example 1.2. Imagine an experiment of heads and tails with a fair coin. We can model this experiment with Ω = {H, T}, Σ = {∅, Ω, {H}, {T}}, and the probability P such that P(∅) = 0, P(Ω) = 1 and P({H}) = P({T}) = 1/2.

In the proposition below we give some useful properties that follow from the definition of probability. The reader should be familiar with the basic results on set operations to understand the following properties; if you are not confident about them, it is advisable to take a look at a basic book on set theory before continuing.

Proposition 1.1. Let (Ω, Σ, P) be a probability space. Then we have the following properties:

1. P(∅) = 0;

2. P(A) ≤ 1 ∀A ∈ Σ;

3. P(A^c) = 1 - P(A) ∀A ∈ Σ;

4. P(B ∩ A) = P(B) - P(B ∩ A^c) ∀A, B ∈ Σ;

5. P(A ∪ B) = P(A) + P(B) - P(A ∩ B) ∀A, B ∈ Σ; and

6. A, B ∈ Σ, A ⊆ B ⟹ P(A) ≤ P(B).

Exercise 1.7. Prove Proposition 1.1.

Solution 1.7. See Barry James.

Proposition 1.2. Let (Ω, Σ, P) be a probability space and {C_i}_i a partition of Ω such that {C_i}_i ⊆ Σ. Then for all A ∈ Σ we have P(A) = Σ_i P(A ∩ C_i). In addition, for any countable family of sets {A_i} in Σ, we have P(∪_i A_i) ≤ Σ_i P(A_i).
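Before proving Proposition 1.2, note that the six properties of Proposition 1.1 can be verified exhaustively on the fair-die space from Section 1. The check below is an illustrative Python aside of mine (not part of the notes), using P(A) = |A|/6 on Σ = P(Ω) with exact rational arithmetic over all 64 × 64 pairs of events.

```python
from itertools import chain, combinations
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})
# All 64 events of Sigma = P(Omega)
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def P(a):
    """Uniform probability on the fair die: P(A) = |A| / |Omega|."""
    return Fraction(len(a), len(omega))

assert P(frozenset()) == 0                               # property 1
for A in subsets:
    assert P(A) <= 1                                     # property 2
    assert P(omega - A) == 1 - P(A)                      # property 3
    for B in subsets:
        assert P(B & A) == P(B) - P(B & (omega - A))     # property 4
        assert P(A | B) == P(A) + P(B) - P(A & B)        # property 5
        if A <= B:
            assert P(A) <= P(B)                          # property 6
print("Proposition 1.1 holds on the fair-die space")
```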

Proof: To see the first result it is only necessary to note that, since A ∈ Σ and {C_i}_i ⊆ Σ, we have A ∩ C_i ∈ Σ ∀i. Moreover, {A ∩ C_i} is a countable family of disjoint sets, so

P(A) = P(A ∩ Ω) = P(A ∩ (∪_i C_i)) = P(∪_i (A ∩ C_i)) = Σ_i P(A ∩ C_i).

For the last result define B_i = A_i ∩ (∪_{j<i} A_j)^c ∈ Σ. Note that B_i ⊆ A_i ∀i and ∪_i B_i = ∪_i A_i. By construction, the family {B_i} is countable and disjoint, thus

P(∪_i A_i) = P(∪_i B_i) = Σ_i P(B_i) ≤ Σ_i P(A_i),

where the last inequality comes from B_i ⊆ A_i ∀i, and we have the wanted result.

Exercise 1.8. Let (Ω, Σ, P) be a probability space and {A_n} a family of elements of Σ such that either A_{n+1} ⊆ A_n ∀n and ∩_n A_n = A, or A_n ⊆ A_{n+1} ∀n and ∪_n A_n = A. Prove that in both cases lim P(A_n) = P(A).

Solution 1.8. See Barry James.

2 Conditional Probability and Independence

Definition 2.1. Let (Ω, Σ, P) be a probability space and B an element of Σ with P(B) > 0. For all A ∈ Σ the conditional probability of A given B is

P(A|B) = P(A ∩ B) / P(B).    (1)

For the case P(B) = 0 we define P(A|B) = P(A) ∀A ∈ Σ.

Intuitively, the conditional probability of a set A given another set B with positive probability is the relative measure of A inside B; that is, the conditional probability is the probability of the intersection, weighted by the probability of the set on which we are conditioning.

Exercise 2.1. In a talk show, the guest must choose one door among three, A, B and C; behind one of the doors there is a prize. After the guest chooses door A, the host opens door B and the prize is not there. What is the probability that the prize is behind door A? And behind door C?

Solution 2.1. Write CI for the event "the guest chose door I" and OI for "the host opened door I", and let A, B, C also denote the events that the prize is behind the corresponding door. First note that P(A) = P(B) = P(C) = 1/3. Moreover,

P(OB|CA) = P(OB ∩ A|CA) + P(OB ∩ C|CA)
         = P(A)P(OB|A ∩ CA) + P(C)P(OB|C ∩ CA)
         = (1/3)(1/2) + (1/3)(1) = 1/2.

Then,

P(A|OB ∩ CA) = P(A ∩ OB ∩ CA) / P(OB ∩ CA)
             = P(A)P(CA|A)P(OB|A ∩ CA) / [P(OB|CA)P(CA)]
             = [(1/3)(1/3)(1/2)] / [(1/2)(1/3)] = 1/3.

Analogously,

P(C|OB ∩ CA) = P(C ∩ OB ∩ CA) / P(OB ∩ CA)
             = P(C)P(CA|C)P(OB|C ∩ CA) / [P(OB|CA)P(CA)]
             = [(1/3)(1/3)(1)] / [(1/2)(1/3)] = 2/3.

Proposition 2.1. Let (Ω, Σ, P) be a probability space and A_1, ..., A_{n+1} any finite family of measurable sets. Then

P(A_1 ∩ ... ∩ A_{n+1}) = P(A_1)P(A_2|A_1)P(A_3|A_1 ∩ A_2) ... P(A_{n+1}|A_1 ∩ ... ∩ A_n).

Proof: Let us prove it by induction.

1 - For n = 1 we have by definition that P(A_2|A_1) = P(A_1 ∩ A_2)/P(A_1); then, multiplying both sides by P(A_1), we get P(A_1 ∩ A_2) = P(A_1)P(A_2|A_1).

2 - Assume that the statement is true for n = k. Using the previous result applied to A_1 ∩ ... ∩ A_{k+1} and A_{k+2}, we have

P(A_1 ∩ ... ∩ A_{k+1} ∩ A_{k+2}) = P(A_1 ∩ ... ∩ A_{k+1})P(A_{k+2}|A_1 ∩ ... ∩ A_{k+1}).

But by the induction hypothesis

P(A_1 ∩ ... ∩ A_{k+1}) = P(A_1)P(A_2|A_1)P(A_3|A_1 ∩ A_2) ... P(A_{k+1}|A_1 ∩ ... ∩ A_k),

thus

P(A_1 ∩ ... ∩ A_{k+2}) = P(A_1)P(A_2|A_1) ... P(A_{k+1}|A_1 ∩ ... ∩ A_k)P(A_{k+2}|A_1 ∩ ... ∩ A_{k+1})

and the statement is true for k + 1.

3 - The statement is true for n = 1 and, if it is true for n = k, then it is true for n = k + 1. Therefore, by induction, it is true for all natural numbers. And we have the proof.

The following theorem is known as the Bayes Rule. The Bayesian method of estimation comes from this theorem and works with the idea of a priori distributions. That is, imagine that you have beliefs about the probabilities of some events {A_n} that form a partition of Ω. The question is how you should change your beliefs after the realization of some event B. This is the answer that the Bayes Rule gives us.

Theorem 2.1. Let (Ω, Σ, P) be a probability space and {A_n} ⊆ Σ a countable partition of Ω. Then, for every measurable set B and every i, we have

P(A_i|B) = P(B|A_i)P(A_i) / Σ_n P(B|A_n)P(A_n).    (2)

Proof: By definition, P(A_i|B) = P(A_i ∩ B)/P(B). Using the previous proposition, we know that P(A_i ∩ B) = P(B|A_i)P(A_i), so it is only necessary to prove that P(B) = Σ_n P(B|A_n)P(A_n).
Note that the family {B ∩ A_n} is a countable family of disjoint events and B = ∪_n (B ∩ A_n); hence P(B) = Σ_n P(B ∩ A_n). Again using the previous proposition, P(B ∩ A_n) = P(B|A_n)P(A_n). Therefore, we have P(A_i|B) = P(B|A_i)P(A_i) / Σ_n P(B|A_n)P(A_n).
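Exercise 2.1 above is the classic Monty Hall problem, and its answers (1/3 for door A, 2/3 for door C) can be approximated by simulation. The sketch below is my own illustrative Python, not part of the original notes (the function name and the 100,000-trial count are arbitrary choices); it conditions on the runs in which the host opens door B.

```python
import random

def monty_hall_trials(n, seed=0):
    """Simulate Exercise 2.1: the guest always picks door A, then the host
    opens a prize-free door among {B, C} (at random when both are free).
    Returns the conditional frequencies of prize-at-A and prize-at-C,
    given that the host opened door B."""
    rng = random.Random(seed)
    opened_b = prize_a = prize_c = 0
    for _ in range(n):
        prize = rng.choice("ABC")
        if prize == "A":
            host = rng.choice("BC")              # both doors are empty
        else:
            host = "B" if prize == "C" else "C"  # host must avoid the prize
        if host == "B":
            opened_b += 1
            prize_a += (prize == "A")
            prize_c += (prize == "C")
    return prize_a / opened_b, prize_c / opened_b

pa, pc = monty_hall_trials(100_000)
print(pa, pc)   # close to 1/3 and 2/3 respectively
```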

Exercise 2.2. In a cellphone factory you know that five percent of the cellphones have a problem in the screen and that in six percent of them the sound does not work. Moreover, you know that in half of the cellphones with a screen problem the sound does not work. If you take one in which the sound does not work, what is the probability that the screen has a problem?

Solution 2.2. Define VP and SP as the events of a screen problem and a sound problem, respectively. Then

P(VP) = 5/100,  P(SP) = 6/100,  P(SP|VP) = 1/2,

and

P(SP|VP^c) = P(SP ∩ VP^c) / P(VP^c) = [P(SP) - P(SP|VP)P(VP)] / [1 - P(VP)]
           = (6/100 - 5/200) / (95/100) = (7/200) / (95/100) = 7/190.

Using the Bayes Rule,

P(VP|SP) = P(SP|VP)P(VP) / [P(SP|VP)P(VP) + P(SP|VP^c)P(VP^c)]
         = (1/2)(5/100) / [(1/2)(5/100) + (7/190)(95/100)]
         = (5/200) / (12/200) = 5/12.

Now we will introduce the concept of independent events. We need to formally capture the idea that two events are independent when the occurrence of one does not add any information about the chances of the other occurring.

Definition 2.2. Let (Ω, Σ, P) be a probability space and A, B ∈ Σ two events. We say that the events A and B are independent if P(A ∩ B) = P(A)P(B).

Note that for two sets A, B ∈ Σ with positive probability this definition implies that P(A|B) = P(A) and P(B|A) = P(B). The concept of independence is that information about one event does not tell us anything about the chances of the other. When we work with many events this idea changes a little: to assert that the events are independent, it is necessary that no combination of some of the events gives any piece of information about the others. The following definition formalizes this idea.

Definition 2.3. Let (Ω, Σ, P) be a probability space and {A_i}_{i∈I} ⊆ Σ a finite family of events. We say that these events are independent if for every set of indices J ⊆ I we have

P(∩_{j∈J} A_j) = Π_{j∈J} P(A_j).
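The arithmetic of Solution 2.2 can be double-checked with exact rational arithmetic. This is an illustrative sketch of mine (not part of the notes), plugging the solution's numbers P(VP) = 5/100, P(SP|VP) = 1/2 and P(SP|VP^c) = 7/190 into the Bayes Rule of Theorem 2.1 with the partition {VP, VP^c}:

```python
from fractions import Fraction as F

p_vp = F(5, 100)                 # P(VP): screen problem
p_sp_given_vp = F(1, 2)          # P(SP | VP)
p_sp_given_not_vp = F(7, 190)    # P(SP | VP^c), as derived in Solution 2.2

# Bayes Rule, equation (2), with the partition {VP, VP^c}:
numerator = p_sp_given_vp * p_vp
evidence = numerator + p_sp_given_not_vp * (1 - p_vp)   # total probability P(SP)
p_vp_given_sp = numerator / evidence
print(p_vp_given_sp)   # 5/12
```

The `evidence` line is exactly the denominator of equation (2), i.e. the law of total probability from Proposition 1.2.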

Example 2.1. Consider a fair die. The probability space is (Ω, Σ, P) with Ω = {1, 2, 3, 4, 5, 6}, Σ = P(Ω) and P the probability constructed from P({ω}) = 1/6 for all ω ∈ Ω. Defining the events A_1 = {1, 2, 3, 4} and A_2 = A_3 = {4, 5, 6}, we have

P(A_1)P(A_2)P(A_3) = 1/6 = P(A_1 ∩ A_2 ∩ A_3).

But the events are not independent, since

P(A_1 ∩ A_2) = 1/6 ≠ 1/3 = P(A_1)P(A_2) and P(A_2 ∩ A_3) = 1/2 ≠ 1/4 = P(A_2)P(A_3).

Exercise 2.3. Consider a probability space (Ω, Σ, P) and two events A, B ∈ Σ. Mark each of the following items true or false. When the item is true, give a proof; when it is false, give a counterexample:

1 - ( ) If A, B are independent then A^c, B^c are independent events.

2 - ( ) If A ∩ B = ∅ then A, B are independent.

3 - ( ) P(A ∪ B) = 1 - P(A^c)P(B^c) when A, B are independent.

4 - ( ) Let σ(A) and σ(B) be the sigma fields generated by A and B respectively. If A, B are independent then C, D ∈ Σ are independent for all C ∈ σ(A) and D ∈ σ(B).
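The computations in Example 2.1 can be replayed mechanically. The small Python check below is mine (illustrative, not from the notes): the three-way product identity holds, yet two pairwise products fail, so the family {A_1, A_2, A_3} does not satisfy Definition 2.3.

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})

def P(a):
    """Uniform probability on the fair die: P(A) = |A| / 6."""
    return Fraction(len(a), len(omega))

A1 = frozenset({1, 2, 3, 4})
A2 = A3 = frozenset({4, 5, 6})

# The three-way product identity from Example 2.1 holds:
print(P(A1 & A2 & A3) == P(A1) * P(A2) * P(A3))   # True (both are 1/6)
# ...but the pairwise conditions of Definition 2.3 fail:
print(P(A1 & A2), P(A1) * P(A2))                  # 1/6 vs 1/3
print(P(A2 & A3), P(A2) * P(A3))                  # 1/2 vs 1/4
```

This is the standard warning that checking only the full product P(∩_i A_i) = Π_i P(A_i) is not enough: Definition 2.3 quantifies over every subset J of indices.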