Joint, Conditional, & Marginal Probabilities
Statistics 110, Summer 2006
Copyright © 2006 by Mark E. Irwin

The three axioms for probability don't discuss how to create probabilities for combined events such as P[A ∩ B], or for the likelihood of an event A given that you know event B occurs.

Example: Let A be the event it rains today and B be the event that it rains tomorrow. Does knowing whether it rains today change our belief that it will rain tomorrow? That is, is P[B], the probability that it rains tomorrow ignoring information on whether it rains today, different from P[B | A], the probability that it rains tomorrow given that it rains today?

P[B | A] is known as the conditional probability of B given A. It is quite likely that P[B] and P[B | A] are different.

Example: ELISA (Enzyme-Linked Immunosorbent Assay) test for HIV

ELISA is a common screening test for HIV. However, it is not perfect, as

P[+test | HIV] = 0.98
P[-test | Not HIV] = 0.93

So among people with HIV infections, 98% have positive tests (sensitivity), whereas among people without HIV infections, 93% have negative tests (specificity). These give the two error rates

P[-test | HIV] = 0.02 = 1 - P[+test | HIV]
P[+test | Not HIV] = 0.07 = 1 - P[-test | Not HIV]

(Note: the complement rule holds for conditional probabilities.)

When this test was evaluated in the early 90s, for a randomly selected American,

P[HIV] = 0.01

The questions of real interest are: what are

P[HIV | +test] and P[Not HIV | -test]?

To figure these out, we need a bit more information.

Definition: Let A and B be two events with P[B] > 0. The conditional probability of A given B is defined to be

P[A | B] = P[A ∩ B] / P[B]

One way to think about this is that if we are told that event B occurs, the sample space of interest is now B instead of Ω, and conditional probability is a probability measure on B.
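
As a quick numerical check of the definition (not part of the original notes; the die and events below are made-up examples), we can enumerate a small sample space and compute P[A | B] directly:

# A made-up example: roll one fair die, A = "result is even", B = "result is at least 4".
from fractions import Fraction

outcomes = range(1, 7)                      # sample space Omega for one fair die
A = {o for o in outcomes if o % 2 == 0}     # even outcomes {2, 4, 6}
B = {o for o in outcomes if o >= 4}         # outcomes {4, 5, 6}

p = Fraction(1, 6)                          # each outcome equally likely
P_B = len(B) * p
P_A_and_B = len(A & B) * p
P_A_given_B = P_A_and_B / P_B               # definition: P[A | B] = P[A ∩ B] / P[B]
print(P_A_given_B)                          # 2/3: two of the three outcomes in B are even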

Since conditional probability is just ordinary probability on a reduced sample space, the usual axioms hold, e.g.

P[B | B] = 1 (Boundedness)
P[A | B] ≥ 0 (Positivity)
If A1 and A2 are disjoint, then P[A1 ∪ A2 | B] = P[A1 | B] + P[A2 | B] (Additivity)

The usual theorems also hold. For example

P[A^c | B] = 1 - P[A | B]
P[A ∪ B | C] = P[A | C] + P[B | C] - P[A ∩ B | C]
P[A ∩ B | C] ≥ P[A | C] + P[B | C] - 1

We can use the definition of conditional probability to get the

Multiplication Rule: Let A and B be events. Then

P[A ∩ B] = P[A | B] P[B]

The relationship also holds with the other ordering, i.e.

P[A ∩ B] = P[B | A] P[A]

Note that P[A ∩ B] is sometimes known as the joint probability of A and B.

Back to the ELISA example. To get P[HIV | +test] and P[Not HIV | -test] we need the following quantities:

P[HIV ∩ +test], P[Not HIV ∩ -test], P[+test], P[-test]

The joint probabilities can be calculated using the multiplication rule

P[HIV ∩ +test] = P[HIV] P[+test | HIV] = 0.01 × 0.98 = 0.0098
P[HIV ∩ -test] = P[HIV] P[-test | HIV] = 0.01 × 0.02 = 0.0002
P[Not HIV ∩ +test] = P[Not HIV] P[+test | Not HIV] = 0.99 × 0.07 = 0.0693
P[Not HIV ∩ -test] = P[Not HIV] P[-test | Not HIV] = 0.99 × 0.93 = 0.9207

The marginal probabilities on test status are

P[+test] = P[HIV ∩ +test] + P[Not HIV ∩ +test] = 0.0098 + 0.0693 = 0.0791
P[-test] = P[HIV ∩ -test] + P[Not HIV ∩ -test] = 0.0002 + 0.9207 = 0.9209

It is often convenient to display the joint and marginal probabilities in a two-way table as follows

            HIV       Not HIV
  +Test     0.0098    0.0693    0.0791
  -Test     0.0002    0.9207    0.9209
            0.0100    0.9900    1.0000

Note that the calculation of the test status probabilities is an example of the Law of Total Probability.

Law of Total Probability: Let B1, B2, ..., Bn be such that ∪_{i=1}^n Bi = Ω and Bi ∩ Bj = ∅ for i ≠ j, with P[Bi] > 0 for all i. Then for any event A,

P[A] = Σ_{i=1}^n P[A | Bi] P[Bi]
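
A short sketch of these calculations (my own Python, using the prevalence, sensitivity, and specificity quoted above) reproduces the joint probabilities and the test-status marginals in the table:

# ELISA example: joint probabilities from the multiplication rule and
# marginals from the law of total probability.
p_hiv = 0.01            # P[HIV] (prevalence)
sens  = 0.98            # P[+test | HIV] (sensitivity)
spec  = 0.93            # P[-test | Not HIV] (specificity)

joint = {
    ("HIV", "+test"):     p_hiv * sens,              # 0.0098
    ("HIV", "-test"):     p_hiv * (1 - sens),        # 0.0002
    ("Not HIV", "+test"): (1 - p_hiv) * (1 - spec),  # 0.0693
    ("Not HIV", "-test"): (1 - p_hiv) * spec,        # 0.9207
}

# Marginals on test status: sum the joints over disease status
p_pos = joint[("HIV", "+test")] + joint[("Not HIV", "+test")]   # 0.0791
p_neg = joint[("HIV", "-test")] + joint[("Not HIV", "-test")]   # 0.9209
print(joint)
print(p_pos, p_neg)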

Proof

P[A] = P[A ∩ Ω] = P[A ∩ (∪_{i=1}^n Bi)] = P[∪_{i=1}^n (A ∩ Bi)]

Since the events A ∩ Bi are disjoint,

P[∪_{i=1}^n (A ∩ Bi)] = Σ_{i=1}^n P[A ∩ Bi] = Σ_{i=1}^n P[A | Bi] P[Bi]

Note: If a set of events Bi satisfies the conditions above, they are said to form a partition of the sample space.

One way to think of this theorem: to find the probability of A, we can sum the conditional probabilities of A given Bi, weighted by P[Bi].

Now we can get the two desired conditional probabilities.

P[HIV | +test] = P[HIV ∩ +test] / P[+test] = 0.0098 / 0.0791 = 0.124
P[Not HIV | -test] = P[Not HIV ∩ -test] / P[-test] = 0.9207 / 0.9209 = 0.99978

These numbers may appear surprising. What is happening here is that most of the people who test positive are actually uninfected, and they swamp the people who actually are infected.

One key thing to remember is that P[A | B] and P[B | A] are completely different things. In the example, P[HIV | +test] and P[+test | HIV] describe two completely different concepts.
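
Continuing the sketch above, the two posteriors follow directly from the definition of conditional probability; the numbers printed below match the hand calculation up to rounding:

# Posterior probabilities for the ELISA example via the definition of
# conditional probability: P[A | B] = P[A ∩ B] / P[B].
p_hiv, sens, spec = 0.01, 0.98, 0.93

p_hiv_and_pos     = p_hiv * sens                 # 0.0098
p_not_hiv_and_neg = (1 - p_hiv) * spec           # 0.9207
p_pos = p_hiv * sens + (1 - p_hiv) * (1 - spec)  # 0.0791 (law of total probability)
p_neg = 1 - p_pos                                # 0.9209

print(p_hiv_and_pos / p_pos)        # P[HIV | +test]     = 0.1239...
print(p_not_hiv_and_neg / p_neg)    # P[Not HIV | -test] = 0.99978...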

The above calculations are an example of Bayes' Rule.

Bayes' Rule: Let A and B1, B2, ..., Bn be events where the Bi are disjoint, ∪_{i=1}^n Bi = Ω, and P[Bi] > 0 for all i. Then

P[Bj | A] = P[A | Bj] P[Bj] / Σ_{i=1}^n P[A | Bi] P[Bi]

Proof. The numerator is just P[A ∩ Bj] by the multiplication rule, and the denominator is P[A] by the law of total probability. Now just apply the definition of conditional probability.

One way of thinking of Bayes' theorem is that it allows the direction of conditioning to be switched. In the ELISA example, it allowed switching from conditioning on disease status to conditioning on test status.

While it is possible to directly apply Bayes' theorem, it is usually safer, particularly early on, to apply the definition of conditional probability and calculate the necessary pieces separately, as I did in the ELISA example.

There is another way of looking at Bayes' theorem. Instead of probabilities, we can look at odds. An equivalent statement is

P[Bi | A] / P[Bj | A]  =  ( P[A | Bi] / P[A | Bj] ) × ( P[Bi] / P[Bj] )
   (Posterior Odds)          (Likelihood Ratio)         (Prior Odds)

This approach is useful when B1, B2, ..., Bn is a set of competing hypotheses and A is information (data) that we want to use to try to help pick the correct hypothesis.

This suggests another way of thinking of Bayes' theorem. It tells us how to update probabilities in the presence of new evidence.
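
For the ELISA example, the odds form gives the same answer as before; here is a minimal check (the variable names are mine, not from the notes):

# Odds form of Bayes' theorem for the ELISA example:
# posterior odds = likelihood ratio * prior odds.
prior_odds = 0.01 / 0.99            # P[HIV] / P[Not HIV]
likelihood_ratio = 0.98 / 0.07      # P[+test | HIV] / P[+test | Not HIV]
posterior_odds = likelihood_ratio * prior_odds

# Convert odds back to a probability: p = odds / (1 + odds)
p_hiv_given_pos = posterior_odds / (1 + posterior_odds)
print(posterior_odds, p_hiv_given_pos)   # 0.1414..., 0.1239... (same posterior as before)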

Who is this man?

This is reportedly the only known picture of the Reverend Thomas Bayes, F.R.S., 1701?-1761. The picture was taken from the 1936 History of Life Insurance (by Terence O'Donnell, American Conservation Co., Chicago). As no source is given, the authenticity of this portrait is open to question.

So what is the probability that this is actually the Reverend Thomas Bayes? However, there is some additional information. How does this change our belief about who this is?

Many details about Bayes are sketchy. Much of his work was unpublished, and what was published was often anonymous. According to Steve Stigler, "The date of his birth is not known: Bayes's posterior is better known than his prior."

Actually, his date of death isn't that well known either. It is generally considered to be April 7, 1761, though it has also been reported as April 17th of the same year.

There is some additional information we can get from this picture to help us decide whether this is Bayes or not.

1. The caption under the photo in O'Donnell's book was "Rev. T. Bayes: Improver of the Columnar Method developed by Barrett." There are some problems with this claim. First, Barrett was born in 1752 and would have been about 9 years old when Bayes died. In addition, the method that Bayes allegedly improved was apparently developed between 1788 and 1811 and read to the Royal Society in 1812, long after Bayes' death. So there is a problem here, but whether it is a problem with just the caption or with the picture as well isn't clear.

2. Bayes was a Nonconformist (Presbyterian) minister. Does the clothing in the picture match that of a Nonconformist minister in the 1740s and 1750s? The picture has been compared to those of three other ministers: Joshua Bayes (1671-1746), Bayes' father; Richard Price (1723-1791, portrait dated 1776), the person who read Bayes' paper to the Royal Society; and Philip Doddridge (1702-1751), a friend of Bayes' brother-in-law.

[Portraits: Joshua Bayes, Richard Price, Philip Doddridge]

Two things stand out in the comparisons.

(a) No wig. It is likely that Bayes should have been wearing a wig similar to Doddridge's, which was going out of fashion in the 1740s, or similar to Price's, which was coming into style at the time.

(b) Bayes appears to be wearing either a clerical gown like his father's or a larger frock coat with a high collar. On viewing the other two pictures, we can see that the gown is not in the style of Bayes' generation, and the frock coat with a large collar is definitely anachronistic.

(Interpretation of David Bellhouse from IMS Bulletin 17, No. 1, page 49)

Question: How do we incorporate this information to adjust our probability that this is actually a picture of Bayes?

Answer: We want P[This is Bayes | Data], which can be determined by Bayes' Theorem:

P[B | Data] = P[B] P[Data | B] / ( P[B] P[Data | B] + P[B^c] P[Data | B^c] )

Note that this is not easy to do, as assigning probabilities here is difficult. For an example of how this can be done, see the paper (available on the course web site)

Stigler SM (1983). Who Discovered Bayes's Theorem? American Statistician 37: 290-296.

In this paper, Stigler examines whether Bayes was the first person to discover what is now known as Bayes' Theorem. There is evidence that the result was known in 1749, 12 years before Bayes' death and 15 years before Bayes' paper was published. In Stigler's analysis

P[Bayes 1st discovered | Data] = 0.25
P[Saunderson 1st discovered | Data] = 0.75

Note that Bayes actually proved a special case of Bayes' Theorem involving inference on a Bernoulli success probability.

Monty Hall Problem

There are three doors. One has a car behind it and the other two have farm animals behind them. You pick a door; then Monty will have the lovely Carol Merrill open another door, show you some farm animals, and allow you to switch. You then win whatever is behind your final door choice.

You choose door 1, and then Monty opens door 2 and shows you the farm animals. Should you switch to door 3?

Answer: It depends.

There are three competing hypotheses D1, D2, and D3, where Di = {Car is behind door i}.

What should our observation A be? A = {Door 2 is empty}? Or A = {The opened door is empty}? Or something else?

We want to condition on all the available information, implying we should use

A = {After door 1 was selected, Monty chose door 2 to be opened}

Prior probabilities on car location: P[Di] = 1/3, i = 1, 2, 3

Likelihoods:

P[A | D1] = 1/2   (*)
P[A | D2] = 0
P[A | D3] = 1

P[A] = (1/2)(1/3) + 0 × (1/3) + 1 × (1/3) = 1/2

Posterior probabilities on car location:

P[D1 | A] = (1/2)(1/3) / (1/2) = 1/3   (No change!)
P[D2 | A] = 0 × (1/3) / (1/2) = 0
P[D3 | A] = 1 × (1/3) / (1/2) = 2/3   (Bigger!)

If you are willing to assume that when two empty doors are available, Monty will randomly choose one of them to open (with equal probability) (assumption *), then you should switch. You'll win the car 2/3 of the time.
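
A quick simulation (my own sketch, not from the notes) confirms these posteriors under assumption (*); it conditions on exactly the event A above, namely that Monty opened door 2 after door 1 was picked:

# Monty Hall simulation under assumption (*): you pick door 1; when Monty has a
# choice of two animal doors he opens one of them at random.
import random

random.seed(0)
stay_wins = switch_wins = n_door2_opened = 0
for _ in range(100_000):
    car = random.randint(1, 3)                 # car placed uniformly at random
    if car == 1:
        opened = random.choice([2, 3])         # Monty picks one of the two animal doors
    else:
        opened = 2 if car == 3 else 3          # Monty must open the only animal door left
    if opened != 2:
        continue                               # condition on "Monty opened door 2"
    n_door2_opened += 1
    stay_wins   += (car == 1)
    switch_wins += (car == 3)

print(stay_wins / n_door2_opened)    # approximately 1/3
print(switch_wins / n_door2_opened)  # approximately 2/3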

Now instead, assume Monty opens the door based on the assumption

P[A | D1] = 1   (**)

i.e., Monty will always choose door 2 when both doors 2 and 3 have animals behind them. (The other two likelihoods are the same.) Now

P[A] = 1 × (1/3) + 0 × (1/3) + 1 × (1/3) = 2/3

Now the posterior probabilities are

P[D1 | A] = 1 × (1/3) / (2/3) = 1/2
P[D2 | A] = 0
P[D3 | A] = 1 × (1/3) / (2/3) = 1/2
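
The same update can be written as a direct application of Bayes' Rule; the short sketch below (not from the notes) plugs in the likelihoods under assumption (**) and reproduces the 1/2-1/2 split:

# Variant (**): Monty always opens door 2 when both doors 2 and 3 hide animals.
priors      = {1: 1/3, 2: 1/3, 3: 1/3}    # P[D_i]
likelihoods = {1: 1.0, 2: 0.0, 3: 1.0}    # P[A | D_i] under assumption (**)

p_A = sum(likelihoods[d] * priors[d] for d in priors)              # 2/3
posteriors = {d: likelihoods[d] * priors[d] / p_A for d in priors}
print(posteriors)   # {1: 0.5, 2: 0.0, 3: 0.5} -- switching neither helps nor hurts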

So in this situation, switching doesn't help (it doesn't hurt either).

Note: This is an extremely subtle problem that people have discussed for years (go do a Google search on "Monty Hall Problem"). Some of the discussion goes back to before the show Let's Make a Deal ever showed up on TV. The solution depends on precise statements about how doors are chosen to be opened. Changing the assumptions can lead to situations where switching can't help, and I believe there are situations where switching can actually be worse. They can also lead to situations where there are advantages to picking certain doors initially.

General Multiplication Rule

The multiplication rule discussed earlier can be extended to an arbitrary number of events as follows:

P[A1 ∩ A2 ∩ ... ∩ An] = P[A1] P[A2 | A1] P[A3 | A1 ∩ A2] ... P[An | A1 ∩ A2 ∩ ... ∩ A(n-1)]

This rule can be used to build complicated probability models.
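
For instance, the chain rule gives the probability of dealing three hearts in a row from a shuffled 52-card deck (a made-up illustration, not from the notes):

# Chain rule illustration:
# P[H1 ∩ H2 ∩ H3] = P[H1] * P[H2 | H1] * P[H3 | H1 ∩ H2]
from fractions import Fraction

p = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
print(p, float(p))   # 11/850, approximately 0.0129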

Example: Jury selection

When a jury is selected for a trial, it is possible that a prospective juror can be excused from duty for cause. The following model describes a possible situation.

Bias of randomly selected juror:
- B1: Unbiased
- B2: Biased against the prosecution
- B3: Biased against the defence

R: Existing bias revealed during questioning
E: Juror excused for cause

The probability of any combination of these three factors can be determined by multiplying the correct conditional probabilities. The probabilities for all twelve possibilities are

                  R                     R^c
             E        E^c          E        E^c
  B1        0        0            0        0.5000
  B2        0.0595   0.0255       0        0.0150
  B3        0.2380   0.1020       0        0.0600

For example, P[B2 ∩ R ∩ E^c] = 0.1 × 0.85 × 0.3 = 0.0255.
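
The conditional probabilities behind this table are not all stated explicitly, but they can be inferred from the entries: P[B1] = 0.5, P[B2] = 0.1, P[B3] = 0.4; P[R | B1] = 0 and P[R | B2] = P[R | B3] = 0.85; P[E | Bi ∩ R] = 0.7 for a biased juror; and P[E | R^c] = 0. Under those inferred values, this sketch reproduces the table via the general multiplication rule and then the conditional bias probabilities that follow:

# Jury example via the general multiplication rule:
# P[B_i ∩ R ∩ E] = P[B_i] * P[R | B_i] * P[E | B_i ∩ R], etc.
# The parameter values below are inferred from the table, not stated in the notes.
p_bias     = {"B1": 0.5, "B2": 0.1, "B3": 0.4}    # P[B_i]
p_revealed = {"B1": 0.0, "B2": 0.85, "B3": 0.85}  # P[R | B_i]: an unbiased juror has no bias to reveal
p_excused_given_R  = 0.7                          # P[E | B_i ∩ R] for a biased juror
p_excused_given_Rc = 0.0                          # P[E | R^c]: unrevealed bias cannot lead to excusal

joint = {}
for b, pb in p_bias.items():
    pr = p_revealed[b]
    joint[(b, "R",  "E")]  = pb * pr * p_excused_given_R
    joint[(b, "R",  "Ec")] = pb * pr * (1 - p_excused_given_R)
    joint[(b, "Rc", "E")]  = pb * (1 - pr) * p_excused_given_Rc
    joint[(b, "Rc", "Ec")] = pb * (1 - pr) * (1 - p_excused_given_Rc)

p_E = sum(v for (b, r, e), v in joint.items() if e == "E")      # 0.2975
for b in p_bias:
    p_b_and_Ec = joint[(b, "R", "Ec")] + joint[(b, "Rc", "Ec")]
    print(b, p_b_and_Ec / (1 - p_E))   # P[B_i | E^c]: 0.7117, 0.0577, 0.2306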

From this we can get the following probabilities

P[E] = 0.2975      P[E^c] = 0.7025
P[R] = 0.4250      P[R^c] = 0.5750

P[B1 ∩ E] = 0        P[B2 ∩ E] = 0.0595      P[B3 ∩ E] = 0.2380
P[B1 ∩ E^c] = 0.5    P[B2 ∩ E^c] = 0.0405    P[B3 ∩ E^c] = 0.1620

From these we can get the probabilities of bias status given whether or not a person was excused for cause from the jury.

P[B1 | E^c] = 0.5 / 0.7025 = 0.7117        P[B1 | E] = 0 / 0.2975 = 0
P[B2 | E^c] = 0.0405 / 0.7025 = 0.0577     P[B2 | E] = 0.0595 / 0.2975 = 0.2
P[B3 | E^c] = 0.1620 / 0.7025 = 0.2306     P[B3 | E] = 0.2380 / 0.2975 = 0.8