
STAT:5100 (22S:193) Statistical Inference I
Week 3
Luke Tierney
University of Iowa
Fall 2015

Recap
- Matching problem
- Generalized birthday problem
- Inclusion-exclusion formula
- Countably infinite sample spaces

Countably Infinite Sample Spaces

Example (continued)
Suppose two players, A and B, take turns rolling a die. The first player who rolls a six wins the game. Player A rolls first. We can use as a sample space $S = \{1, 2, 3, \dots\}$, the number of the roll on which the game ends.

Example (continued)
What is the probability that A wins? A wins if the game ends on an odd roll: 1, 3, 5, .... Let $F_n$ be the event that the game ends on roll $n$. The $F_n$ are pairwise disjoint. It is tempting to write

$$P(\text{A wins}) = P\left(\bigcup_{k=1}^\infty F_{2k-1}\right) = \sum_{k=1}^\infty P(F_{2k-1}) = \sum_{k=1}^\infty p_{2k-1} = \sum_{k=1}^\infty \frac{1}{6}\left(\frac{5}{6}\right)^{2k-2} = \frac{1}{6}\sum_{k=1}^\infty \left(\frac{25}{36}\right)^{k-1} = \frac{1}{6} \cdot \frac{1}{1 - 25/36} = \frac{6}{11}.$$
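As a quick check (not part of the lecture), both the series and the answer can be verified numerically. A minimal Python sketch that sums the geometric series exactly and also estimates $P(\text{A wins})$ by simulating the game:

```python
from fractions import Fraction
import random

# Exact: sum of the geometric series (1/6) * (25/36)**(k-1) for k >= 1.
p_win = Fraction(1, 6) / (1 - Fraction(25, 36))
print(p_win)  # 6/11

# Simulation: play the game many times; A rolls on the odd-numbered turns.
random.seed(0)
trials = 200_000
a_wins = 0
for _ in range(trials):
    roll = 0
    while True:
        roll += 1
        if random.randint(1, 6) == 6:
            a_wins += roll % 2  # game ended on an odd roll, so A won
            break
print(a_wins / trials)  # close to 6/11 ≈ 0.5455
```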

Example (continued)
This is in fact the right answer. But there is a flaw in the argument. For infinite sample spaces our definition of probability does not guarantee that

$$P\left(\bigcup_{k=1}^\infty F_{2k-1}\right) = \sum_{k=1}^\infty P(F_{2k-1})$$

for pairwise disjoint $F_n$. This identity does hold in situations like ours where $\sum_{k=1}^\infty p_k = 1$.

Countable Additivity

Theorem
Let $P$ be a probability on a countably infinite sample space $S = \{s_1, s_2, \dots\}$, let $p_k = P(\{s_k\})$, and suppose that $\sum_{k=1}^\infty p_k = 1$. Then for any pairwise disjoint events $A_i \subseteq S$,

$$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i).$$

This property is known as countable additivity.

Theorem
Let $P$ be a countably additive probability, let $A_1 \supseteq A_2 \supseteq A_3 \supseteq \dots$ be a decreasing sequence of events, and let $A = \bigcap_{i=1}^\infty A_i$. Then

$$\lim_{i \to \infty} P(A_i) = P(A).$$

This is called the continuity property of countably additive probability.

- Knowing that a probability is countably additive can make many calculations easier.
- All probabilities on finite sample spaces are countably additive.
- Are there probabilities that are not countably additive? Yes; one example is a uniform probability distribution on the integers.
- For all practical purposes, the probabilities used in practice are countably additive. As a result, the standard approach is to require countable additivity as part of the definition of a probability. The definition we have used so far is then called finitely additive probability.
- There is a school of probability and statistics based entirely on the use of finitely additive probabilities. Bruno de Finetti was a major proponent of the finitely additive approach.

Countably Additive Probability

Definition
A probability, or probability function, is a real-valued function $P$ defined on all subsets of a sample space $S$ such that
(i) $P(A) \ge 0$ for all $A \subseteq S$;
(ii) $P(S) = 1$;
(iii) if $A_1, A_2, \dots \subseteq S$ are pairwise disjoint, then
$$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i).$$

Notes
This definition does not add the countable additivity axiom; it replaces the finite additivity axiom with the countably additive version. We need to check that finite additivity still holds.

Theorem
Suppose $A, B \subseteq S$ are disjoint and $P$ is a (countably additive) probability on $S$. Then $P(A \cup B) = P(A) + P(B)$.

Proof.
Let $C_1 = A$, $C_2 = B$, and $C_i = \emptyset$ for $i = 3, 4, \dots$. Then the $C_i$ are mutually exclusive, and $A \cup B = \bigcup_{i=1}^\infty C_i$. So by the countable additivity axiom and the fact that $P(\emptyset) = 0$,

$$P(A \cup B) = P\left(\bigcup_{i=1}^\infty C_i\right) = \sum_{i=1}^\infty P(C_i) = P(A) + P(B) + 0 + 0 + \dots = P(A) + P(B).$$

There is a gap in the proof: our earlier proof that $P(\emptyset) = 0$ used finite additivity! To fill the hole we need to prove $P(\emptyset) = 0$ using only the three axioms for countably additive probability. Here is one way: let $A_1 = S$ and let $A_i = \emptyset$ for $i = 2, 3, \dots$. Then the $A_i$ are pairwise disjoint, and

$$P(S) = P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i) = P(S) + \sum_{i=2}^\infty P(\emptyset).$$

This is only possible if $P(\emptyset) = 0$.

We have already seen that a countably additive probability satisfies a certain continuity property. A finitely additive probability that has a continuity property is countably additive:

Theorem
Suppose $P$ is a finitely additive probability on a sample space $S$ and that for any events $A_1 \supseteq A_2 \supseteq \dots$ with $\bigcap_{i=1}^\infty A_i = \emptyset$,
$$\lim_{i \to \infty} P(A_i) = 0.$$
Then $P$ is a countably additive probability on $S$.

So, in this sense, a countably additive probability is a finitely additive probability that is also continuous.

Proof.
Outline: Using finite additivity, write

$$P\left(\bigcup_{i=1}^\infty A_i\right) = P\left(A_1 \cup A_2 \cup \dots \cup A_n \cup \bigcup_{i=n+1}^\infty A_i\right) = P(A_1) + P(A_2) + \dots + P(A_n) + P\left(\bigcup_{i=n+1}^\infty A_i\right).$$

Then argue that continuity implies
$$P\left(\bigcup_{i=n+1}^\infty A_i\right) \to 0$$
as $n \to \infty$.

Building Probability Models

The main tool we have so far for building probability models is equally likely outcomes. A lot can be done with this framework: much of classical probability is based on equally likely outcomes, and the box models used in some introductory textbooks are another example. But more flexible tools are needed for more complex situations.

Example
About 50% of decided Iowa voters say they will vote Democratic, 50% Republican. About 10% of all people are left-handed. If we select a decided voter at random, what is the chance the person will be a left-handed Democratic voter?

It seems reasonable to assume that about 10% of Iowa decided voters are left-handed, about 10% of decided Republican voters are left-handed, and about 10% of decided Democratic voters are left-handed. So the probability should be about $0.5 \times 0.1 = 0.05$.

The classifications of party and handedness seem unrelated, or independent. What if instead of handedness the second classification was male/female? Urban/rural?

Independence

Definition
Two events A and B are independent if $P(A \cap B) = P(A)P(B)$.

Notes
- In model building we often assume (at least approximate) independence. This allows complex probabilities to be derived from simple ones. Like any assumption, you have to think it through carefully before relying on it. It is rarely exactly true but often close enough to be useful.
- In some cases independence is derived.
- If A and B are disjoint and have positive probability, then they are not independent; they are dependent.

Example
A remote sensing device can be equipped with one or two power supplies. With a single power supply the device fails if the power supply fails. With two power supplies the device fails only once both power supplies have failed. Suppose the chance of one power supply failing in a particular time frame is 10%. If we assume independent failures, then the chance that both fail is $0.1 \times 0.1 = 0.01$, or 1%. Is the independence assumption reasonable?

Example
A fair die is rolled. Let A be the event that an even number is rolled, and let B be the event that the number rolled is at most 2. Then

$$P(A) = \frac{3}{6} = \frac{1}{2}, \quad P(B) = \frac{2}{6} = \frac{1}{3}, \quad P(A \cap B) = \frac{1}{6} = \frac{1}{2} \cdot \frac{1}{3} = P(A)P(B).$$

So A and B are independent, even though they relate to the same roll of a die. This is an example of derived independence.
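Since the sample space here is tiny, the independence check can be done by direct enumeration. A minimal Python sketch (an illustration, not from the slides):

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
S = range(1, 7)
A = {s for s in S if s % 2 == 0}   # even roll
B = {s for s in S if s <= 2}       # roll is at most 2

def prob(event):
    return Fraction(len(event), 6)  # equally likely outcomes

print(prob(A & B) == prob(A) * prob(B))  # True: A and B are independent
```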

If A and B are independent, then

$$P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A)P(B^c).$$

So A and $B^c$ are also independent. Similarly, $A^c$ and B are independent, and $A^c$ and $B^c$ are independent. In words: knowing whether A has or has not occurred tells us nothing about whether B has or has not occurred.

Definition
Events $A_1, A_2, \dots, A_n$ are mutually independent if
$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n)$$
and every sub-collection of $n - 1$ of the $A_i$ is mutually independent.

Notes
Three events A, B, and C are mutually independent if
$$P(A \cap B) = P(A)P(B)$$
$$P(A \cap C) = P(A)P(C)$$
$$P(B \cap C) = P(B)P(C)$$
$$P(A \cap B \cap C) = P(A)P(B)P(C)$$
It is possible to have the first three but not the fourth (a standard example of this is sketched below); it is possible to have the fourth but not the first three.
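For the first possibility, a standard example (not from the slides) uses two independent fair coin tosses with A = {first toss is heads}, B = {second toss is heads}, and C = {the two tosses agree}: the three pairwise conditions hold, but the triple product fails. A small enumeration in Python:

```python
from fractions import Fraction
from itertools import product

# Sample space: two independent fair coin tosses, all 4 outcomes equally likely.
S = list(product("HT", repeat=2))

A = {s for s in S if s[0] == "H"}    # first toss is heads
B = {s for s in S if s[1] == "H"}    # second toss is heads
C = {s for s in S if s[0] == s[1]}   # the two tosses agree

def prob(event):
    return Fraction(len(event), len(S))

# Pairwise independent:
print(prob(A & B) == prob(A) * prob(B))  # True
print(prob(A & C) == prob(A) * prob(C))  # True
print(prob(B & C) == prob(B) * prob(C))  # True

# But not mutually independent:
print(prob(A & B & C) == prob(A) * prob(B) * prob(C))  # False: 1/4 != 1/8
```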

Example (remote sensing power supplies, continued)
Suppose we can use $n$ power supplies in our remote sensing device. The device fails only once all power supplies have failed. Suppose the chance of a particular power supply failing is again 10%, and suppose failures of these devices are mutually independent. Then the chance of all power supplies failing is $(0.1)^n$. By making $n$ large we can make the chance of failure as small as we like. Is the independence assumption likely to be reasonable for very large $n$?

Examples
- A fair coin is tossed independently $n$ times. All $2^n$ outcomes are equally likely.
- A fair die is rolled independently $n$ times. All $6^n$ outcomes are equally likely.
- A biased coin with probability $p$ of heads is tossed $n$ times. The probability of a particular outcome with $k$ heads and $n - k$ tails is $p^k(1-p)^{n-k}$. The probability of $k$ heads is
$$\binom{n}{k} p^k (1-p)^{n-k}.$$
This is an important basic model for binary data.
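This binomial probability is a one-line computation; here is a Python sketch with illustrative values (n = 10, k = 3, p = 0.5 are my own choices, not from the slides):

```python
from math import comb

def prob_k_heads(n, k, p):
    """P(exactly k heads in n independent tosses of a p-biased coin)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(prob_k_heads(10, 3, 0.5))  # 120/1024 ≈ 0.117
```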

Friday, September 11, 2015

Recap
- Countably additive probability
- Independence

Example
Reliability block diagrams are sometimes used to analyze the reliability of complex systems.

[Diagram: between terminals A and B there are two parallel paths; components 1 and 2 in series form the top path, and components 3 and 4 in series form the bottom path.]

Failure modes are usually assumed independent. Let
F = {whole system fails},
$F_i$ = {component $i$ fails},
$p_i = P(F_i)$.

Example (continued)
We can often break larger systems into smaller sub-systems. The system of components 1 and 2 is a series system (A — 1 — 2 — B). The sub-system fails if either 1 or 2 fails, so the sub-system failure event is $F_{12} = F_1 \cup F_2$ and the sub-system failure probability is

$$P(F_{12}) = P(F_1 \cup F_2) = P(F_1) + P(F_2) - P(F_1 \cap F_2) = p_1 + p_2 - p_1 p_2.$$

Similarly, for the second sub-system, $P(F_{34}) = p_3 + p_4 - p_3 p_4$.

Example (continued)
The whole system is a parallel system of these two sub-systems (sub-system 1,2 and sub-system 3,4 on parallel paths between A and B). The whole system fails only if both sub-systems fail, $F = F_{12} \cap F_{34}$. So the failure probability for the whole system is

$$P(F) = P(F_{12} \cap F_{34}) = P(F_{12})P(F_{34}) = (p_1 + p_2 - p_1 p_2)(p_3 + p_4 - p_3 p_4).$$

The process of generating system failure probabilities from reliability block diagrams can be automated.
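That kind of automation might look like the following Python sketch, which computes the exact failure probability for this series-parallel diagram and checks it by Monte Carlo (the function names and the numeric p values are my own illustration):

```python
import random

def series_fail(p1, p2):
    """A series sub-system fails if either component fails."""
    return p1 + p2 - p1 * p2

def system_fail(p1, p2, p3, p4):
    """The parallel combination of the two series sub-systems fails
    only if both sub-systems fail (independent failures assumed)."""
    return series_fail(p1, p2) * series_fail(p3, p4)

p = (0.1, 0.2, 0.1, 0.2)
print(system_fail(*p))  # exact: 0.28 * 0.28 = 0.0784

# Monte Carlo check: simulate independent component failures.
random.seed(1)
trials = 200_000
fails = sum(
    (random.random() < p[0] or random.random() < p[1])
    and (random.random() < p[2] or random.random() < p[3])
    for _ in range(trials)
)
print(fails / trials)  # close to 0.0784
```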

Updating Probabilities: Conditional Probability

Example
Suppose I roll a die. The probability of an even roll is 1/2. Suppose I tell you the roll is $\le 3$. Does this change the probability that the roll is even? Using relative frequencies, we can look at cases where the roll is $\le 3$:

$$\text{new } P(\text{even}) \approx \frac{\#(\text{even and} \le 3)}{\#(\text{roll} \le 3)} \approx \frac{\text{prop}(\text{even and} \le 3)}{\text{prop}(\le 3)} = \frac{P(\text{even and} \le 3)}{P(\le 3)} = \frac{1/6}{1/2} = \frac{1}{3}.$$

You adjust your probability to a new sample space that only allows rolls $\le 3$.
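This relative-frequency view of conditioning is easy to see in simulation; a minimal Python sketch (my illustration, not from the slides):

```python
import random

# Estimate P(even | roll <= 3) by restricting attention to the
# simulated rolls where the conditioning event occurred.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]

at_most_3 = [r for r in rolls if r <= 3]
even_and_at_most_3 = [r for r in at_most_3 if r % 2 == 0]

print(len(even_and_at_most_3) / len(at_most_3))  # close to 1/3
```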

Definition
For events A and B with $P(B) > 0$, the conditional probability of A given that B has occurred is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Notes
- If A and B are independent, then $P(A \mid B) = P(A)$.
- Sometimes we compute conditional probabilities. Other times we use them to build up more complicated probabilities:
  - using the multiplication formula: $P(A \cap B) = P(A \mid B)P(B)$;
  - using the Law of Total Probability: $P(A) = P(A \cap B) + P(A \cap B^c) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c)$.

Example
About half of potential Iowa voters are male and half female. About 40% of female potential voters would vote Republican, and about 60% of male potential voters would vote Republican. If a voter is selected at random, then

$$P(F) = 0.5, \quad P(M) = 0.5, \quad P(R \mid F) = 0.4, \quad P(R \mid M) = 0.6.$$

The probability of selecting a Republican potential voter is

$$P(R \cap F) = P(R \mid F)P(F) = 0.4 \times 0.5 = 0.2$$
$$P(R \cap M) = P(R \mid M)P(M) = 0.6 \times 0.5 = 0.3$$
$$P(R) = P(R \cap F) + P(R \cap M) = 0.2 + 0.3 = 0.5.$$

Mosaic plots can be used to illustrate joint probabilities of two or more classifications.

Example
10% of a population has a disease. A test has the properties
$$P(+ \mid D) = 0.96 \quad \text{(sensitivity)}, \qquad P(- \mid D^c) = 0.95 \quad \text{(specificity)}.$$
If a person is chosen at random and tested, then the probability of a positive result is

$$P(+) = P(+ \cap D) + P(+ \cap D^c) = P(+ \mid D)P(D) + P(+ \mid D^c)P(D^c) = P(+ \mid D)P(D) + (1 - P(- \mid D^c))P(D^c) = 0.96 \times 0.1 + (1 - 0.95) \times 0.9 = 0.096 + 0.045 = 0.141.$$

Bayes Theorem

It is often useful to reverse the calculation: given that a randomly chosen person has a positive result, what is $P(D \mid +)$, the probability that the person has the disease?

$$P(D \mid +) = \frac{P(D \cap +)}{P(+)} = \frac{0.096}{0.141} \approx 0.681$$

This calculation is very easy to do from scratch: just rewrite the probability you want in terms of the inputs you have. But this calculation is very important, so it has a formal theorem attached to it.
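The whole computation, from prevalence, sensitivity, and specificity to $P(D \mid +)$, fits in a few lines. A Python sketch (the function name and structure are my own, not from the slides):

```python
def bayes_positive(prevalence, sensitivity, specificity):
    """P(D | +) from prevalence P(D), sensitivity P(+|D), and
    specificity P(-|D^c), via the law of total probability."""
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_pos

print(bayes_positive(0.1, 0.96, 0.95))  # 0.096 / 0.141 ≈ 0.681
```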

Theorem (Bayes Theorem)
Let the events $B_1, B_2, \dots$ partition $S$; i.e., the $B_i$ are pairwise disjoint and collectively exhaustive ($\bigcup_i B_i = S$). Then for any event A with $P(A) > 0$,

$$P(B_i \mid A) = \frac{P(A \cap B_i)}{P(A)} = \frac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}.$$

Notes
- Bayes theorem is the foundation of the Bayesian approach to drawing conclusions from data.
- The denominator follows from the law of total probability.

Conditioning Examples

Example
Another reliability block diagram:

[Diagram: between terminals A and B, components 1 and 2 in series form the top path, components 3 and 4 in series form the bottom path, and component 5 bridges the midpoints of the two paths.]

One possible approach is to condition on whether component 5 has failed or not.

Example (continued)
If component 5 has failed, then the system is like the one we analyzed previously, so
$$P(F \mid F_5) = (p_1 + p_2 - p_1 p_2)(p_3 + p_4 - p_3 p_4).$$
If component 5 has not failed, then the system is a series of two parallel systems, so
$$P(F \mid F_5^c) = p_1 p_3 + p_2 p_4 - p_1 p_3 p_2 p_4.$$
The system failure probability is then
$$P(F) = P(F \mid F_5)p_5 + P(F \mid F_5^c)(1 - p_5).$$
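A Python sketch of this conditioning argument (the function names and example p values are my own illustration; it reuses the series/parallel failure formulas from the earlier diagram):

```python
def series(q1, q2):
    """Failure probability of two independent components in series."""
    return q1 + q2 - q1 * q2

def parallel(q1, q2):
    """Failure probability of two independent components in parallel."""
    return q1 * q2

def bridge_fail(p1, p2, p3, p4, p5):
    """Condition on component 5: if 5 has failed, the network is a
    parallel pair of series paths; if 5 works, it is a series pair
    of parallel groups (independent failures assumed)."""
    given_f5 = parallel(series(p1, p2), series(p3, p4))   # P(F | F_5)
    given_ok = series(parallel(p1, p3), parallel(p2, p4)) # P(F | F_5^c)
    return given_f5 * p5 + given_ok * (1 - p5)

print(bridge_fail(0.1, 0.1, 0.1, 0.1, 0.1))  # small failure probability
```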

Example
The counting multiplication rule can often be replaced by a conditioning argument. In one approach to the matching problem we needed to compute $P(A_{i_1} \cap \dots \cap A_{i_k})$, with $A_i$ the event that person $i$ receives their own hat:

$$P(A_{i_1}) = \frac{1}{n}$$
$$P(A_{i_1} \cap A_{i_2}) = P(A_{i_1})P(A_{i_2} \mid A_{i_1}) = \frac{1}{n} \cdot \frac{1}{n-1} = \frac{1}{n(n-1)}$$
$$P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) = P(A_{i_1} \cap A_{i_2})P(A_{i_3} \mid A_{i_1} \cap A_{i_2}) = \frac{1}{n(n-1)} \cdot \frac{1}{n-2}$$
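For small n these probabilities can be checked by brute force over all n! hat assignments; a minimal Python sketch (my illustration, with n = 5):

```python
from fractions import Fraction
from itertools import permutations

n = 5
perms = list(permutations(range(n)))  # all n! equally likely assignments

def p_match(*people):
    """P(each listed person receives their own hat) under a
    uniformly random assignment of the n hats."""
    hits = sum(all(perm[i] == i for i in people) for perm in perms)
    return Fraction(hits, len(perms))

print(p_match(0) == Fraction(1, n))                          # 1/n
print(p_match(0, 1) == Fraction(1, n * (n - 1)))             # 1/(n(n-1))
print(p_match(0, 1, 2) == Fraction(1, n * (n - 1) * (n - 2)))  # 1/(n(n-1)(n-2))
```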

Example
In analyzing processes based on a sequence of independent trials it is often useful to condition on the result of the first trial. Suppose a die is rolled until a six occurs. Let E be the event that a six eventually occurs, and let A be the event that the first roll is a six. Then

$$P(E) = P(E \mid A)P(A) + P(E \mid A^c)P(A^c) = P(E \mid A)\frac{1}{6} + P(E \mid A^c)\frac{5}{6}.$$

Now $P(E \mid A) = 1$. If A does not occur, then the process starts over independently, and $P(E \mid A^c) = P(E)$. So $P(E)$ satisfies
$$P(E) = \frac{1}{6} + \frac{5}{6}P(E),$$
and therefore $P(E) = 1$.