INSTITUTO POLITÉCNICO NACIONAL CENTRO DE INVESTIGACION EN COMPUTACION Laboratorio de Ciberseguridad Probability, Random Processes and Inference Dr. Ponciano Jorge Escamilla Ambrosio pescamilla@cic.ipn.mx http://www.cic.ipn.mx/~pescamilla/
Probability, Random Processes and Inference Instructor: Dr. Ponciano Jorge Escamilla Ambrosio, pescamilla@cic.ipn.mx, http://www.cic.ipn.mx/~pescamilla/ Class meetings: Mondays and Wednesdays, 12:00 to 14:00 hrs. Classroom: Aula A3
Course web site: http://www.cic.ipn.mx/~pescamilla/academy.html Reading material, homework exercises, etc.
Course Objective The student will learn the fundamentals of probability theory: probabilistic models, discrete and continuous random variables, multiple random variables and limit theorems as well as an introduction to more advanced topics such as random processes and statistical inference. At the end of the course the student will be able to develop and analyse probabilistic models in a manner that combines intuitive understanding and mathematical precision. 4
Course content 1. Probability 1.1. What is Probability? 1.1.1. Statistical Probability 1.1.2. Probability as a Measure of Uncertainty 1.2. Sample Space and Probability 1.2.1. Probabilistic Models 1.2.2. Conditional Probability 1.2.3. Total Probability Theorem and Bayes' Rule 1.2.4. Independence 1.2.5. Counting 1.2.6. The Probabilistic Method
Course content 1.3. Discrete Random Variables 1.3.1. Basic Concepts 1.3.2. Probability Mass Functions 1.3.3. Functions of Random Variables 1.3.4. Expectation and Variance 1.3.5. Joint PMFs of Multiple Random Variables 1.3.6. Conditioning 1.3.7. Independence
Course content 1.4. General Random Variables 1.4.1. Continuous Random Variables and PDFs 1.4.2. Cumulative Distribution Function 1.4.3. Normal Random Variables 1.4.4. Joint PDFs of Multiple Random Variables 1.4.5. Conditioning 1.4.6. The Continuous Bayes' Rule 1.4.7. The Strong Law of Large Numbers
Course content 2. Introduction to Random Processes 2.1. Markov Chains 2.1.1. Discrete-Time Markov Chains 2.1.2. Classification of States 2.1.3. Steady-State Behavior 2.1.4. Absorption Probabilities and Expected Time to Absorption 2.1.5. Continuous-Time Markov Chains 2.1.6. Ergodic Theorem for Discrete Markov Chains 2.1.7. Markov Chain Monte Carlo Method 2.1.8. Queuing Theory
Course content 3. Statistics 3.2. Classical Statistical Inference 3.2.1. Classical Parameter Estimation 3.2.2. Linear Regression 3.2.3. Analysis of Variance and Regression 3.2.4. Binary Hypothesis Testing 3.2.5. Significance Testing 9
Course text books Dimitri P. Bertsekas and John N. Tsitsiklis. Introduction to Probability, 2nd Edition, Athena Scientific, 2008. http://athenasc.com/probbook.html Joseph Blitzstein, Jessica Hwang. Introduction to Probability, CRC Press, 2014. https://www.crcpress.com/introduction-to-probability/blitzstein-hwang/9781466575578
Course text books William Feller. An introduction to probability theory and its applications, Vol. 1, 3rd Edition, Wiley, 1968. http://www.wiley.com/wileycda/wileytitle/productcd-0471257087.html Géza Schay, Introduction to probability with statistical applications, Birkhauser, Boston, 2007. http://link.springer.com/book/10.1007/978-0-8176-4591-5 11
Grading Midterm exam 15% Final exam 15% Homework assignments 20% One written departmental exam 50% 12
Course Schedule A-17 http://www.cic.ipn.mx/~pescamilla/academy.html 13
Probability 1. What is Probability? 1.1.1. Statistical Probability 1.1.2. Probability as a Measure of Uncertainty 14
What is Probability? 15
What is Probability? In everyday language, people use the concept of probability to discuss uncertain situations: luck, coincidence, randomness, uncertainty, risk, doubt, fortune, chance. The word is used in a vague, casual way! A first approach is to define probability in terms of frequency of occurrence, as a percentage of success.
What is Probability? For example, if we toss a coin, and observe whether it lands head (H) or tail (T) up What is the probability of either result? Why? 17
What is Probability? P(A) = (# favorable outcomes) / (# possible outcomes) Example: Flip a coin twice
Sample space Definition 1 (Sample space and event). The sample space S of an experiment is the set of all possible outcomes of an experiment. An event A is a subset of the sample space S, and we say that A occurred if the actual outcome is in A. 19
Sample space Example: the experiment of tossing a coin twice
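The two-toss sample space can be enumerated directly; here is a small Python sketch (not part of the course text) listing S and the event "the first coin shows H":

```python
from itertools import product

# Sample space of tossing a coin twice: all ordered pairs of H/T.
S = [''.join(p) for p in product('HT', repeat=2)]
print(S)          # ['HH', 'HT', 'TH', 'TT']

# The event "the first coin shows H" is the subset {HH, HT}.
A = [s for s in S if s[0] == 'H']
print(A)          # ['HH', 'HT']
```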
What is Probability? "Probability is a logical framework for quantifying uncertainty and randomness" [Blitzstein and Hwang, 2014]. "Probability theory is a branch of mathematics that deals with repetitive events whose occurrence or nonoccurrence is subject to chance variation" [Schay, 2007].
What is Probability? Provides tools for understanding and explaining variation, separating signal from noise, and modeling complex phenomena. (engineer definition) 22
What is Probability? There are situations where the frequency interpretation is not appropriate. Example: a scholar asserts that the Iliad and the Odyssey were composed by the same person, with probability 90%. This is based on the scholar's subjective belief.
What is Probability? The theory of probability is useful in a broad variety of contexts and applications: Statistics, Physics, Biology, Computer Science, Meteorology, Gambling, Finance, Political Science, Medicine, Life. Assignment 1a: Give an example of the application of probability theory in each area Assignment 1b: Read math review: http://projects.iq.harvard.edu/files/stat110/files/math_rev iew_handout.pdf 24
Probabilistic Model 25
Elements of a Probabilistic Model The sample space S, which is the set of all possible outcomes of an experiment. The probability law, which assigns to a set A of possible outcomes (also called an event) a nonnegative number P(A) (called the probability of A) that encodes our knowledge or belief about the collective likelihood of the elements of A. The probability law must satisfy certain properties. 26
Experiments and events The experiment will produce exactly one out of several possible outcomes. A subset of the sample space, that is, a collection of possible outcomes, is called an event. This means that any collection of possible outcomes, including the entire sample space S and its complement, the empty set, may qualify as an event. Strictly speaking, however, some sets have to be excluded. In particular, when dealing with probabilistic models involving an uncountably infinite sample space, there are certain unusual subsets to which one cannot assign meaningful probabilities.
Experiments and events There is no restriction on what constitutes an experiment. The events to be considered can be described by such statements as a toss of a given coin results in head, a card drawn at random from a regular 52 card deck is an Ace, or this book is green. Associated with each statement there is a set S of possibilities, or possible outcomes. 28
Experiments and events Examples of experiments and events: Tossing a Coin. For a coin toss, S may be taken to consist of two possible outcomes, which we may abbreviate as H and T for head and tail. We say that H and T are the members, elements or points of S, and write S = {H, T}. Tossing Two Coins. In this case S = {HH, HT, TH, TT}, and, for instance, the outcome "the first coin shows H" is represented by the set {HH, HT}; that is, this statement is true if we obtain HH or HT and false if we obtain TH or TT.
Experiments and events Tossing a Coin Until an H is Obtained. If we toss a coin until an H is obtained, we cannot say in advance how many tosses will be required, and so the natural sample space is S = {H, TH, TTH, TTTH,... }, an infinite set. We can use, of course, many other sample spaces as well, for instance, we may be interested only in whether we had to toss the coin more than twice or not, in which case S = {1 or 2, more than 2} is adequate. Selecting a Number from an Interval. Sometimes, we need an uncountable set for a sample space. For instance, if the experiment consists of choosing a random number between 0 and 1, we may use S = {x : 0 < x < 1}. 30
The probability law Specifies the likelihood of any outcome, or of any set of possible outcomes. Assigns to every event A, a number P(A), called the probability of A. 31
Probability Space [Schay 2007] Given a sample space S and a certain collection F of its subsets, called events, an assignment P of a number P(A) to each event A in F is called a probability measure, and P(A) the probability of A, if P has the following properties: 1. P(A) ≥ 0 for every A, 2. P(S) = 1, and 3. P(A1 ∪ A2 ∪ ···) = P(A1) + P(A2) + ··· for any finite or countably infinite set of mutually exclusive events A1, A2, ... Then the sample space S together with F and P is called a probability space.
Probability Axioms [Bertsekas and Tsitsiklis, 2008] 1. (Nonnegativity) P(A) ≥ 0 for every event A. 2. (Additivity) If A and B are two disjoint events, then P(A ∪ B) = P(A) + P(B), and more generally, the same holds for any countable sequence of disjoint events. 3. (Normalization) The probability of the entire sample space is 1: P(S) = 1.
Probability Space [Blitzstein and Hwang, 2015] Definition 1.6.1 (General definition of probability). A probability space consists of a sample space S and a probability function P which takes an event A ⊆ S as input and returns P(A), a real number between 0 and 1, as output. The function P must satisfy the following axioms: 1. P(∅) = 0, P(S) = 1. 2. If A1, A2, ... are disjoint events, then P(A1 ∪ A2 ∪ ···) = P(A1) + P(A2) + ··· (Saying that these events are disjoint means that they are mutually exclusive: Ai ∩ Aj = ∅ for i ≠ j.)
Properties of probabilities The Probability of the Empty Set Is 0. In any probability space, P(∅) = 0. Proof: since S and ∅ are disjoint, 1 = P(S) = P(S ∪ ∅) = P(S) + P(∅) = 1 + P(∅), hence P(∅) = 0.
Properties of probabilities The Probability of the Union of Two Events. For any two events A and B: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Proof sketch: write A ∪ B as the disjoint union of A and B ∩ A^c, and B as the disjoint union of A ∩ B and B ∩ A^c, then apply additivity and eliminate P(B ∩ A^c).
Properties of probabilities Probability of Complements. For any event A, P(A^c) = 1 − P(A). Proof: A^c ∩ A = ∅ and A^c ∪ A = S, by the definition of A^c. Thus, by Axiom 3, P(S) = P(A^c ∪ A) = P(A^c) + P(A). Now Axiom 2 says that P(S) = 1, and so, comparing these two values of P(S), we obtain P(A^c) + P(A) = 1.
Properties of probabilities Probability of Subsets. If A ⊆ B, then P(A) ≤ P(B). Proof: if A ⊆ B, then we can write B as the union of A and B ∩ A^c, where B ∩ A^c is the part of B not also in A. Since A and B ∩ A^c are disjoint, we can apply the additivity axiom: P(B) = P(A ∪ (B ∩ A^c)) = P(A) + P(B ∩ A^c). Probability is nonnegative, so P(B ∩ A^c) ≥ 0, proving that P(B) ≥ P(A).
Properties of probabilities Inclusion-exclusion. For any events A1, ..., An: P(A1 ∪ ··· ∪ An) = Σi P(Ai) − Σi<j P(Ai ∩ Aj) + Σi<j<k P(Ai ∩ Aj ∩ Ak) − ··· + (−1)^(n+1) P(A1 ∩ ··· ∩ An).
Discrete Probability Law If the sample space consists of a finite number of possible outcomes, the probability law is specified by the probabilities of the events that consist of a single element. In particular, the probability of any event {s1, s2, ..., sn} is the sum of the probabilities of its elements: P({s1, s2, ..., sn}) = P(s1) + P(s2) + ··· + P(sn).
Discrete Uniform Probability Law In the special case where the probabilities P(s1), ..., P(sn) are all the same, by necessity equal to 1/n in view of the normalization axiom, we obtain: P(A) = (number of elements of A) / n.
Counting The calculation of probabilities often involves counting the number of outcomes in various events. When the sample space S has a finite number of equally likely outcomes, the discrete uniform probability law applies, and the probability of any event A is given by: P(A) = (number of elements of A) / (number of elements of S) = k/n. When we want to calculate the probability of an event A with a finite number of equally likely outcomes, each of which has an already known probability p, the probability of A is given by: P(A) = p · (number of elements of A).
Basic Counting Principle In how many ways can you dress today if you find 4 shirts, 3 ties and 2 jackets in your closet?
The Multiplication Principle Consider a process that consists of r stages. Suppose that: a) There are n1 possible results at the first stage. b) For every possible result at the first stage, there are n2 possible results at the second stage. c) More generally, for any sequence of possible results at the first i − 1 stages, there are ni possible results at the ith stage. Then, the total number of possible results of the r-stage process is: n1 · n2 ··· nr
The Multiplication Principle Example 1. The number of telephone numbers. A local telephone company number is a 7-digit sequence, but the first digit has to be different from 0 or 1. How many distinct telephone numbers are there? By the multiplication principle: 8 · 10^6 = 8,000,000.
The Multiplication Principle Example 2. The number of subsets of an n-element set. Consider an n-element set {s1, s2, ..., sn}. How many subsets does it have, including itself and the empty set? For example, the set {1, 2, 3}?
The Multiplication Principle This is a sequential process where we take in turn each of the n elements and decide whether to include it in the desired subset or not. Thus we have n steps, and in each step two choices, namely yes or no to the question of whether the element belongs to the desired subset. Therefore the number of subsets is: 2 · 2 ··· 2 = 2^n. For n = 1, the set {s1} has 2^1 = 2 subsets: ∅ and {s1}.
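The 2^n count can be verified by brute-force enumeration; a Python sketch (an aside, not part of the slides):

```python
from itertools import combinations

# Enumerate every subset of {1, 2, 3} by choosing each size k = 0, ..., n.
elements = [1, 2, 3]
subsets = [set(c) for k in range(len(elements) + 1)
           for c in combinations(elements, k)]
print(len(subsets))   # 8, i.e. 2**3
```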
The Multiplication Principle Example 3. Drawing three cards. What is the number of ways three cards can be drawn one after the other from a regular 52-card deck without replacement? n1 = 52, n2 = 51, n3 = 50, giving 52 · 51 · 50 = 132,600. What is this number if we replace each card before the next one is drawn? n1 = n2 = n3 = 52, giving 52^3 = 140,608.
Permutation and Combination Involve the selection of k objects out of a collection of n objects. If the order of selection matters, the selection is called a permutation. If the order of selection does not matter, the selection is called a combination. 54
Permutation k-permutations. Assume there are n distinct objects, and let k be some positive integer with k ≤ n. We want to count the number of different ways that we can pick k out of these n objects and arrange them in a sequence, i.e., the number of distinct k-object sequences.
Permutation In place 1 we can put any of the n objects, which we can write as n = n − 1 + 1; in place 2 we can put n − 1 = n − 2 + 1 objects; and so on. Thus the kth factor will be n − k + 1, and so, for any two positive integers n and k ≤ n: P(n,k) = n(n − 1)(n − 2) ··· (n − k + 1). In the special case where k = n: n(n − 1)(n − 2) ··· 3 · 2 · 1 = n!. The resulting sequences are simply called permutations.
Permutation From the definitions of n!, (n − k)! and P(n,k) we can obtain the following relation: n! = [n(n − 1)(n − 2) ··· (n − k + 1)][(n − k)(n − k − 1) ··· 2 · 1] = P(n,k) · (n − k)!, and so: P(n,k) = n! / (n − k)!, with 0! = 1.
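The product formula and the factorial formula can be cross-checked numerically; a Python sketch (an illustration, not part of the course text):

```python
import math

# k-permutations: P(n, k) = n (n-1) ... (n-k+1) = n! / (n-k)!
def num_permutations(n, k):
    result = 1
    for factor in range(n, n - k, -1):   # the k factors n, n-1, ..., n-k+1
        result *= factor
    return result

# Cross-check against the factorial formula and math.perm (Python 3.8+).
assert num_permutations(10, 3) == math.factorial(10) // math.factorial(7)
assert num_permutations(10, 3) == math.perm(10, 3)
print(num_permutations(10, 3))   # 720
```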
Probability calculation Example 4. Six rolls of a die. Find the probability that six rolls of a (six-sided) die all give different numbers. Assume all outcomes are equally likely. P(all six rolls give different numbers) = ? Recall: P(A) = (number of elements of A) / (number of elements of S) = k/n, or P(A) = p · (number of elements of A), where p is the probability of each equally likely outcome in A.
Probability calculation Example 4. Six rolls of a die (solution). The number of outcomes where all six rolls differ is the number of permutations of six elements, |A| = P(6,6) = 6!, and the number of elements in S is 6^6. Hence P(A) = 6!/6^6 ≈ 0.0154. Equivalently, P(A) = p · (number of elements of A) = (1/6^6) · 6!, where p = 1/6^6 is the probability of each equally likely outcome in A.
Permutation Example 5. Dealing Three Cards. In how many ways can three cards be dealt from a regular deck of 52 cards? 60
Permutation Example 5. Dealing Three Cards. In how many ways can three cards be dealt from a regular deck of 52 cards? P(52,3) = n!/(n − k)! = 52 · 51 · 50 = 132,600.
Permutation Example 6. Birthday problem. There are k people in a room. Assume each person's birthday is equally likely to be any of the 365 days of the year (we exclude February 29), and that people's birthdays are independent (we assume there are no twins in the room). What is the probability that two or more people in the group have the same birthday?
Permutation This amounts to sampling the 365 days of the year without replacement, so the number of assignments with all birthdays distinct is: 365 · 364 · 363 ··· (365 − k + 1), for k ≤ 365. Therefore the probability of no birthday matches in a group of k people is: 365 · 364 ··· (365 − k + 1) / 365^k, and the probability of at least one birthday match is: 1 − 365 · 364 ··· (365 − k + 1) / 365^k.
Permutation Probability that in a room of k people, at least two were born on the same day. This probability first exceeds 0.5 when k = 23. 64
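The k = 23 threshold is easy to confirm numerically; a Python sketch (an aside, not part of the slides):

```python
# P(at least one shared birthday among k people), 365 equally likely days.
def p_match(k):
    p_no_match = 1.0
    for i in range(k):                  # factors 365/365, 364/365, ...
        p_no_match *= (365 - i) / 365
    return 1 - p_no_match

# Smallest k for which the match probability exceeds 0.5.
first_k = next(k for k in range(1, 366) if p_match(k) > 0.5)
print(first_k, round(p_match(23), 4))   # 23 0.5073
```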
Combinations The number of possible unordered selections of k different things out of n different ones is denoted by C(n,k), and each such selection is called a combination of the given things. If we select k things out of n without regard to order, this can be done in C(n,k) ways. In each case we have k things, which can be ordered in k! ways. Thus, by the multiplication principle, the number of ordered selections is C(n,k) · k!. On the other hand, this number is, by definition, P(n,k). Therefore C(n,k) · k! = P(n,k), and so: C(n,k) = P(n,k)/k! = n!/(k!(n − k)!).
Combinations The quantity on the right-hand side is usually abbreviated as (n choose k), and is called a binomial coefficient. Thus, for any positive integer n and k = 1, 2, ..., n: C(n,k) = (n choose k) = n(n − 1)(n − 2) ··· (n − k + 1)/k! = n!/(k!(n − k)!), using n! = [n(n − 1)(n − 2) ··· (n − k + 1)][(n − k)(n − k − 1) ··· 2 · 1].
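The identity C(n,k) = P(n,k)/k! = n!/(k!(n − k)!) can be checked directly; a Python sketch (an illustration, not part of the course text):

```python
import math

# C(n, k) = P(n, k) / k! = n! / (k! (n - k)!)
n, k = 52, 3
c = math.perm(n, k) // math.factorial(k)
assert c == math.comb(n, k)
assert c == math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
print(c)   # 22100 unordered 3-card hands, vs. 132,600 ordered deals
```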
Binomial probabilities Binomial coefficient (n choose k). Consider n ≥ 1 independent coin tosses with P(H) = p. P(k heads) = ? Example: P(HTTTHH) = p^3(1 − p)^3. More generally, P(particular sequence) = p^(# heads)(1 − p)^(# tails), so P(particular k-head sequence) = p^k(1 − p)^(n − k). Since there are (n choose k) such sequences: P(k heads) = (n choose k) p^k (1 − p)^(n − k).
Partitions A combination can be seen as a partition of the set in two: one part contains k elements and the other contains the remaining n − k elements. Given an n-element set and nonnegative integers n1, n2, ..., nr whose sum is equal to n, consider partitions of the set into r disjoint subsets, with the ith subset containing exactly ni elements. In how many ways can this be done?
Partitions There are (n choose n1) ways of forming the first subset. Having formed the first subset, there are n − n1 elements left. We need to choose n2 of them in order to form the second subset, and we have (n − n1 choose n2) choices, and so on. Thus, using the Counting Principle, the total number of partitions is: (n choose n1)(n − n1 choose n2) ··· (n − n1 − ··· − n(r−1) choose nr).
Partitions As several terms cancel, this product reduces to: n!/(n1! n2! ··· nr!). This is called the multinomial coefficient and is usually denoted by: (n choose n1, n2, ..., nr).
Partitions Example 7. Each person gets an ace. There is a 52-card deck, dealt (fairly) to four players. What is the probability of each player getting an ace?
Partitions Example 7. Each person gets an ace (solution). The size of the sample space is the number of ways of partitioning the 52 cards into four 13-card hands: 52!/(13! 13! 13! 13!). Constructing an outcome with one ace for each person: the number of different ways of distributing the 4 aces to the 4 players is 4!, and the number of ways of distributing the remaining 48 cards, 12 to each player, is 48!/(12! 12! 12! 12!). So the probability is 4! · 48!/(12! 12! 12! 12!) divided by 52!/(13! 13! 13! 13!).
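Evaluating this ratio of multinomial coefficients; a Python sketch (an aside, not part of the slides):

```python
from math import factorial

# Sample space: all ways to deal 52 cards into four 13-card hands.
deals = factorial(52) // factorial(13) ** 4
# Favorable: distribute the 4 aces (4! ways), then the other 48 cards.
favorable = factorial(4) * (factorial(48) // factorial(12) ** 4)
p = favorable / deals
print(round(p, 4))   # 0.1055
```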
Summary of Counting Results Permutations of n objects: n!. k-permutations of n objects: n!/(n − k)!. Combinations of k out of n objects: (n choose k) = n!/(k!(n − k)!). Partitions of n objects into r groups, with the ith group having ni objects: (n choose n1, n2, ..., nr) = n!/(n1! n2! ··· nr!).
Conditional Probability Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information. Examples: A) In an experiment involving two successive rolls of a die, you are told that the sum of the two rolls is 9. How likely is it that the first roll was a 6? B) In a word-guessing game, the first letter of the word is a "t". What is the likelihood that the second letter is an "h"?
Conditional Probability C) How likely is it that a person has certain disease given that a medical test was negative? D) A spot shows up on a radar screen. How likely is it to correspond to an aircraft? 77
Conditional Probability Given: An experiment A corresponding sample space A probability law We know that the outcome is within some given event B. Quantify the likelihood that the outcome also belongs to some other given event A. 78
Conditional Probability Construct a new probability law that takes into account the available knowledge. A probability law that for any event A, specifies the conditional probability of A given B, P(A B). The conditional probabilities P(A B) of different events A should satisfy the probability axioms. 79
Conditional Probability Example: Suppose that all six possible outcomes of a fair die roll are equally likely. If the outcome is even, then there are only three possible outcomes: 2, 4 and 6. What is the probability of the outcome being 6 given that the outcome is even? 80
Conditional Probability If all possible outcomes are equally likely: P(A|B) = (number of elements of A ∩ B) / (number of elements of B). Conditional probability definition: P(A|B) = P(A ∩ B)/P(B), with P(B) > 0. Out of the total probability of the elements of B, P(A|B) is the fraction that is assigned to possible outcomes that also belong to A.
Conditional Probability The probability law formed by conditional probabilities satisfies the three axioms: 1. P(A|B) ≥ 0 for every event A, 2. P(S|B) = 1, 3. P(A1 ∪ A2 ∪ ··· |B) = P(A1|B) + P(A2|B) + ··· for any finite or countably infinite number of mutually exclusive events A1, A2, ...
Conditional Probability Proofs: 1. In the definition of P(A|B) the numerator is nonnegative by Axiom 1, and the denominator is positive by assumption. Thus, the fraction is nonnegative. 2. Taking A = S in the definition of P(A|B), we get: P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1.
Conditional Probability 3. For mutually exclusive events A1, A2, ..., the events A1 ∩ B, A2 ∩ B, ... are also mutually exclusive, so P(A1 ∪ A2 ∪ ··· |B) = P((A1 ∪ A2 ∪ ···) ∩ B)/P(B) = P((A1 ∩ B) ∪ (A2 ∩ B) ∪ ···)/P(B) = [P(A1 ∩ B) + P(A2 ∩ B) + ···]/P(B) = P(A1|B) + P(A2|B) + ···
Conditional Probability Knowledge that event B has occurred implies that the outcome of the experiment is in the set B. In computing P(A|B) we can therefore view the experiment as now having the reduced sample space B. The event A occurs in the reduced sample space if and only if the outcome ζ is in A ∩ B. The equation simply renormalizes the probability of events that occur jointly with B.
Conditional Probability Suppose that we learn that B occurred. Upon obtaining this information, we get rid of all the pebbles in B^c because they are incompatible with the knowledge that B has occurred. Then P(A ∩ B) is the total mass of the pebbles remaining in A. Finally, we renormalize, that is, divide all the masses by a constant so that the new total mass of the remaining pebbles is 1. This is achieved by dividing by P(B), the total mass of the pebbles in B. The updated mass of the outcomes corresponding to event A is the conditional probability P(A|B) = P(A ∩ B)/P(B).
Conditional Probability If we interpret probability as relative frequency: P(A|B) should be the relative frequency of the event A ∩ B in experiments where B occurred. Suppose that the experiment is performed n times, that event B occurs nB times, and that event A ∩ B occurs n(A∩B) times. The relative frequency of interest is then: n(A∩B)/nB = (n(A∩B)/n)/(nB/n) ≈ P(A ∩ B)/P(B), where we have implicitly assumed that P(B) > 0.
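This frequency interpretation can be checked by simulation, e.g. for the earlier die question P(6 | even) = 1/3; a Python sketch (an aside, not part of the slides):

```python
import random

# Estimate P(roll is 6 | roll is even) for a fair die by the
# relative frequency n_{A and B} / n_B over repeated experiments.
random.seed(0)
n = 100_000
n_B = n_AB = 0
for _ in range(n):
    roll = random.randint(1, 6)
    if roll % 2 == 0:          # event B: outcome is even
        n_B += 1
        if roll == 6:          # event A ∩ B: even and equal to 6
            n_AB += 1
print(round(n_AB / n_B, 3))    # close to 1/3
```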
Conditional Probability Example 1. Given the figure below, obtain P(A B) 88
Conditional Probability Example 2. A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls, numbered 3 and 4. The number and color of the ball is noted, so the sample space is {(1,b),(2,b), (3,w), (4,w)}. Assuming that the four outcomes are equally likely, find P(A B) and P(A C), where A, B, and C are the following events: 89
Conditional Probability Example 3. From all families with three children, we select one family at random. What is the probability that the children are all boys, if we know that a) the first one is a boy, and b) at least one is a boy? (Assume that each child is a boy or a girl with probability 1/2, independently of each other.) 90
Conditional Probability Example 4. A card is drawn at random from a deck of 52 cards. What is the probability that it is a King or a 2, given that it is a face card (J, Q, K)? 91
Total Probability Theorem and Bayes' Rule If we multiply both sides of the definition of P(A|B) by P(B) we obtain: P(A ∩ B) = P(A|B) P(B). Similarly, if we multiply both sides of the definition of P(B|A) by P(A) we obtain: P(A ∩ B) = P(B|A) P(A).
Total Probability Theorem and Bayes' Rule Joint Probability of Two Events. For any events A and B with positive probabilities: P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A). Joint Probability of Three Events: P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B), i.e., P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2).
Total Probability Theorem and Bayes' Rule Applying this repeatedly, we can generalise to the intersection of n events: P(A1 ∩ A2 ∩ ··· ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) ··· P(An|A1 ∩ ··· ∩ A(n−1)).
Total Probability Theorem Let A1, ..., An be disjoint events that form a partition of the sample space (each possible outcome is included in exactly one of the events A1, ..., An) and assume that P(Ai) > 0 for all i. Then, for any event B: P(B) = P(A1 ∩ B) + ··· + P(An ∩ B) = P(A1) P(B|A1) + ··· + P(An) P(B|An).
Total Probability Theorem P(B) = P(A1) P(B|A1) + ··· + P(An) P(B|An). The probability that B occurs is a weighted average of its conditional probability under each scenario, where each scenario is weighted according to its (unconditional) probability. Since the Ai partition the sample space, every way for B to occur passes through exactly one of the scenarios Ai.
Total Probability Theorem Example 1. Radar detection. If an aircraft is present in certain area, a radar detects it and generates an alarm signal with probability 0.99. If an aircraft is not present, the radar generates a (false) alarm, with probability 0.10. We assume that an aircraft is present with probability 0.05. What is the probability of no aircraft presence and false alarm? What is the probability of aircraft presence and no detection? 99
Total Probability Theorem Sequential representation in a tree diagram 100
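The two requested probabilities for the radar example follow from the multiplication rule along each branch of the tree; a Python sketch (an aside, not part of the slides):

```python
# Radar example: A = {aircraft present}, B = {alarm generated}.
p_present = 0.05
p_alarm_given_present = 0.99
p_alarm_given_absent = 0.10

# Multiplication rule along each branch of the tree:
p_false_alarm = (1 - p_present) * p_alarm_given_absent      # not A and B
p_missed = p_present * (1 - p_alarm_given_present)          # A and not B
print(round(p_false_alarm, 4), round(p_missed, 6))   # 0.095 0.0005
```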
Total Probability Theorem Example 2. Picking Balls from Urns. Suppose we have two urns, with the first one containing 2 white and 6 black balls, and the second one containing 2 white and 2 black balls. We pick an urn at random, and then pick a ball from the chosen urn at random. What is the probability of picking a white ball? 102
Total Probability Theorem Tree diagram What is the probability of picking a black ball? 103
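The urn example is a direct application of the total probability theorem; a Python sketch using exact fractions (an aside, not part of the slides):

```python
from fractions import Fraction

# Urn 1: 2 white + 6 black; urn 2: 2 white + 2 black; each urn picked with prob 1/2.
# Total probability: P(white) = P(urn1) P(white|urn1) + P(urn2) P(white|urn2)
p_white = Fraction(1, 2) * Fraction(2, 8) + Fraction(1, 2) * Fraction(2, 4)
p_black = 1 - p_white
print(p_white, p_black)   # 3/8 5/8
```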
Total Probability Theorem Dealing Three Cards. From a deck of 52 cards, three are drawn without replacement. What is the probability of the event E of getting two Aces and one King in any order? Denote the relevant outcomes by A, K and O (for "other").
Total Probability Theorem Multiplying the conditional probabilities along each branch of the tree: P(E) = P(AAK) + P(AKA) + P(KAA) = 3 · (4 · 3 · 4)/(52 · 51 · 50) = 144/132,600 ≈ 0.0011.
Bayes Rule 107
Bayes Rule To verify Bayes' rule, use the definition of conditional probability twice: P(Ai|B) = P(Ai ∩ B)/P(B) = P(Ai) P(B|Ai)/P(B), where P(B) follows from the total probability theorem: P(B) = P(A1) P(B|A1) + ··· + P(An) P(B|An).
Bayes Rule Example 1. Rare disease. A test for a rare disease is assumed to be correct 95% of the time: if a person has the disease, the test results are positive with probability 0.95, and if the person does not have the disease, the test results are negative with probability 0.95. A random person drawn from a certain population has probability 0.001 of having the disease. Given that the person just tested positive, what is the probability of having the disease? A = {the person has the disease}, B = {the test results are positive}, P(A|B) = ?
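Combining Bayes' rule with the total probability theorem gives the (surprisingly small) posterior; a Python sketch (an aside, not part of the slides):

```python
# Bayes' rule for the rare-disease test.
# A = {person has the disease}, B = {test is positive}.
P_A = 0.001
P_B_given_A = 0.95          # true-positive probability
P_B_given_notA = 0.05       # false-positive probability

P_B = P_A * P_B_given_A + (1 - P_A) * P_B_given_notA   # total probability
P_A_given_B = P_A * P_B_given_A / P_B
print(round(P_A_given_B, 4))   # 0.0187
```

Despite the positive result, the disease is so rare that a positive test is still far more likely to be a false positive.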
Bayes Rule For a rare disease we need a much more accurate test: the probability of a false positive result must be of a lower order of magnitude than the fraction of people with the disease.
Bayes Rule Example 2. Random coin. You have one fair coin, and one biased coin which lands Heads with probability 3/4. You pick one of the coins at random and flip it three times. It lands Heads all three times. Given this information, what is the probability that the coin you picked is the fair one? 113
Bayes Rule Before flipping the coin, we thought we were equally likely to have picked the fair coin as the biased coin: P(F) = P(F^c) = 1/2. Upon observing three Heads, however, it becomes more likely that we've chosen the biased coin than the fair coin, so P(F|A) is only about 0.23.
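The exact posterior can be computed with Bayes' rule; a Python sketch using exact fractions (an aside, not part of the slides):

```python
from fractions import Fraction

# F = {picked the fair coin}, A = {three Heads in three flips}.
P_F = Fraction(1, 2)
P_A_given_F = Fraction(1, 2) ** 3      # fair coin: (1/2)^3
P_A_given_Fc = Fraction(3, 4) ** 3     # biased coin: (3/4)^3

P_A = P_F * P_A_given_F + (1 - P_F) * P_A_given_Fc   # total probability
P_F_given_A = P_F * P_A_given_F / P_A                # Bayes' rule
print(P_F_given_A, round(float(P_F_given_A), 4))     # 8/35 0.2286
```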
Independence Independence of two events. Events A and B are independent if P(A ∩ B) = P(A) P(B). If P(A) > 0 and P(B) > 0, then this is equivalent to: P(A|B) = P(A), and also equivalent to: P(B|A) = P(B).
Independence Two events are independent if we can obtain the probability of their intersection by multiplying their individual probabilities. Alternatively, A and B are independent if learning that B occurred gives us no information that would change our probabilities for A occurring (and vice versa). Independence is a symmetric relation: if A is independent of B, then B is independent of A. 116
Independence Independence is completely different from disjointness. If A and B are disjoint, then P(A B) = 0, so disjoint events can be independent only if P(A) = 0 or P(B) = 0. Knowing that A occurs tells us that B definitely did not occur, so A clearly conveys information about B, meaning the two events are not independent (except if A or B already has zero probability). 117
Independence If A and B are independent, then A and B^c are independent, A^c and B are independent, and A^c and B^c are independent. Proof: let A and B be independent. Then P(B^c|A) = 1 − P(B|A) = 1 − P(B) = P(B^c), so A and B^c are independent. Swapping the roles of A and B, we have that A^c and B are independent. Using the fact that A, B independent implies A, B^c independent, with A^c playing the role of A, we also have that A^c and B^c are independent.
Independence Independence of three events. Events A, B, and C are said to be independent if all of the following equations hold: P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C), P(A ∩ B ∩ C) = P(A)P(B)P(C).
Independence 120
Independence Independence of many events. For n events A1, A2, ..., An to be independent, we require any pair to satisfy: P(Ai ∩ Aj) = P(Ai)P(Aj) (for i ≠ j), any triplet to satisfy: P(Ai ∩ Aj ∩ Ak) = P(Ai)P(Aj)P(Ak) (for i, j, k distinct), and similarly for all quadruplets, quintuplets, and so on. For infinitely many events, we say that they are independent if every finite subset of the events is independent.
Conditional independence Given an event C, the events A and B are said to be conditionally independent if: P(A ∩ B|C) = P(A|C) P(B|C).
Conditional independence The previous relation states that if C is known to have occurred, the additional knowledge that B also occurred does not change the probability of A. The independence of two events A and B with respect to the unconditional probability law, does not imply conditional independence, and vice versa. 123
Independence Example 2. Reliability. pi: probability that unit i is up; ui: event that the ith unit is up, with u1, u2, ..., un independent; fi: event that the ith unit is down, with the fi independent. P(system is up) = ?