Bayesian Updating: Discrete Priors
18.05 Spring 2017
http://xkcd.com/1236/

Learning from experience

Which treatment would you choose?
1. Treatment 1: cured 100% of patients in a trial.
2. Treatment 2: cured 95% of patients in a trial.
3. Treatment 3: cured 90% of patients in a trial.

Which treatment would you choose?
1. Treatment 1: cured 3 out of 3 patients in a trial.
2. Treatment 2: cured 19 out of 20 patients treated in a trial.
3. Standard treatment: cured 90000 out of 100000 patients in clinical practice.

Which die is it?

I have a bag containing dice of two types: 4-sided and 10-sided. Suppose I pick a die at random and roll it. Based on what I rolled, which type would you guess I picked?

Suppose you find out that the bag contained one 4-sided die and one 10-sided die. Does this change your guess?

Suppose you find out that the bag contained one 4-sided die and 100 10-sided dice. Does this change your guess?

Board Question: learning from data

A certain disease has a prevalence of 0.005. A screening test has 2% false positives and 1% false negatives. Suppose a patient is screened and has a positive test.
1. Represent this information with a tree and use Bayes' theorem to compute the probabilities that the patient does and doesn't have the disease.
2. Identify the data, hypotheses, likelihoods, prior probabilities and posterior probabilities.
3. Make a full likelihood table containing all hypotheses and possible test data.
4. Redo the computation using a Bayesian update table. Match the terms in your table to the terms in your previous calculation.
Solution on next slides.

Solution

1. Tree-based Bayes computation

Let H+ mean the patient has the disease and H- mean they don't. Let T+ mean they test positive and T- mean they test negative.

We can organize this in a tree. The first branching is on the disease state: P(H+) = 0.005 and P(H-) = 0.995. The second branching is on the test result: P(T+ | H+) = 0.99, P(T- | H+) = 0.01, P(T+ | H-) = 0.02, P(T- | H-) = 0.98.

Bayes' theorem says

P(H+ | T+) = P(T+ | H+) P(H+) / P(T+).

Using the tree, the total probability is

P(T+) = P(T+ | H+) P(H+) + P(T+ | H-) P(H-) = 0.99 × 0.005 + 0.02 × 0.995 = 0.02485.

Solution continued on next slide.

Solution continued

So

P(H+ | T+) = P(T+ | H+) P(H+) / P(T+) = (0.99 × 0.005) / 0.02485 ≈ 0.199
P(H- | T+) = P(T+ | H-) P(H-) / P(T+) = (0.02 × 0.995) / 0.02485 ≈ 0.801

The positive test greatly increases the probability of H+, but it is still much less probable than H-.

Solution continued on next slide.
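
As a quick numerical check of the computation above, here is a minimal sketch in Python (ours, not part of the original slides; the variable names are illustrative):

p_disease = 0.005           # prior P(H+)
p_healthy = 0.995           # prior P(H-)
p_pos_given_disease = 0.99  # likelihood P(T+ | H+) = 1 - false negative rate
p_pos_given_healthy = 0.02  # likelihood P(T+ | H-) = false positive rate

# Law of total probability: P(T+) = P(T+ | H+)P(H+) + P(T+ | H-)P(H-)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy

# Bayes' theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
p_healthy_given_pos = p_pos_given_healthy * p_healthy / p_pos

print(p_pos)                # 0.02485
print(p_disease_given_pos)  # about 0.199
print(p_healthy_given_pos)  # about 0.801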

Solution continued

2. Terminology

Data: the data are the results of the experiment. In this case, the positive test.

Hypotheses: the hypotheses are the possible answers to the question being asked. In this case they are H+, the patient has the disease, and H-, they don't.

Likelihoods: the likelihood given a hypothesis is the probability of the data given that hypothesis. In this case there are two likelihoods, one for each hypothesis: P(T+ | H+) = 0.99 and P(T+ | H-) = 0.02. We repeat: the likelihood is a probability given the hypothesis, not a probability of the hypothesis.

Continued on next slide.

Solution continued

Prior probabilities of the hypotheses: the priors are the probabilities of the hypotheses prior to collecting data. In this case, P(H+) = 0.005 and P(H-) = 0.995.

Posterior probabilities of the hypotheses: the posteriors are the probabilities of the hypotheses given the data. In this case, P(H+ | T+) = 0.199 and P(H- | T+) = 0.801.

Bayes' theorem with each term labeled:

posterior = likelihood × prior / total probability of the data
P(H+ | T+) = P(T+ | H+) P(H+) / P(T+)

Solution continued

3. Full likelihood table

The table holds the likelihoods P(D | H) for every possible hypothesis and data combination.

hypothesis H    P(T+ | H)    P(T- | H)
H+              0.99         0.01
H-              0.02         0.98

Notice in the next slide that the P(T+ | H) column is exactly the likelihood column in the Bayesian update table.

Solution continued

4. Calculation using a Bayesian update table

Hypotheses: H+ (patient has the disease); H- (they don't).
Data: D = T+ (positive screening test).

hypothesis   prior    likelihood   Bayes numerator    posterior
H            P(H)     P(T+ | H)    P(T+ | H)P(H)      P(H | T+)
H+           0.005    0.99         0.00495            0.199
H-           0.995    0.02         0.0199             0.801
total        1        (no sum)     P(T+) = 0.02485    1

Total probability of the data: P(T+) = sum of the Bayes numerator column = 0.02485.

Bayes' theorem: P(H | T+) = P(T+ | H)P(H) / P(T+) = likelihood × prior / total probability of the data.
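
The update table translates directly into code. Below is a small sketch, assuming a generic bayes_update helper of our own invention (not part of the course materials), applied to the screening-test numbers above.

def bayes_update(priors, likelihoods):
    """Return (posteriors, total probability of the data) for dicts keyed by hypothesis."""
    numerators = {h: likelihoods[h] * priors[h] for h in priors}  # Bayes numerator column
    p_data = sum(numerators.values())                             # total probability of the data
    posteriors = {h: numerators[h] / p_data for h in priors}
    return posteriors, p_data

post, p_data = bayes_update(
    priors={"H+": 0.005, "H-": 0.995},
    likelihoods={"H+": 0.99, "H-": 0.02},  # P(T+ | H)
)
print(p_data)  # 0.02485
print(post)    # {'H+': 0.199..., 'H-': 0.800...}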

Board Question: Dice

Five dice: 4-sided, 6-sided, 8-sided, 12-sided, 20-sided. Suppose I picked one at random and, without showing it to you, rolled it and reported a 13.
1. Make the full likelihood table (be smart about identical columns).
2. Make a Bayesian update table and compute the posterior probabilities that the chosen die is each of the five dice.
3. Same question if I rolled a 5.
4. Same question if I rolled a 9.
(Keep the tables for 5 and 9 handy! Do not erase!)

Tabular solution

D = rolled a 13

hypothesis   prior   likelihood   Bayes numerator   posterior
H            P(H)    P(D | H)     P(D | H)P(H)      P(H | D)
H4           1/5     0            0                 0
H6           1/5     0            0                 0
H8           1/5     0            0                 0
H12          1/5     0            0                 0
H20          1/5     1/20         1/100             1
total        1                    1/100             1

Tabular solution

D = rolled a 5

hypothesis   prior   likelihood   Bayes numerator   posterior
H            P(H)    P(D | H)     P(D | H)P(H)      P(H | D)
H4           1/5     0            0                 0
H6           1/5     1/6          1/30              0.392
H8           1/5     1/8          1/40              0.294
H12          1/5     1/12         1/60              0.196
H20          1/5     1/20         1/100             0.118
total        1                    0.085             1

Tabular solution

D = rolled a 9

hypothesis   prior   likelihood   Bayes numerator   posterior
H            P(H)    P(D | H)     P(D | H)P(H)      P(H | D)
H4           1/5     0            0                 0
H6           1/5     0            0                 0
H8           1/5     0            0                 0
H12          1/5     1/12         1/60              0.625
H20          1/5     1/20         1/100             0.375
total        1                    0.0267            1
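
The three tables above can be reproduced with the bayes_update sketch from the screening-test example; the roll_likelihoods helper below is our own illustrative name. A roll of r has likelihood 1/n on an n-sided die if r <= n, and 0 otherwise.

dice = [4, 6, 8, 12, 20]
priors = {n: 1/5 for n in dice}  # uniform prior over the five dice

def roll_likelihoods(r):
    # P(roll = r | n-sided die) = 1/n if r <= n, else 0
    return {n: (1/n if r <= n else 0) for n in dice}

post_13, _ = bayes_update(priors, roll_likelihoods(13))  # {20: 1.0, all others 0}
post_5, _  = bayes_update(priors, roll_likelihoods(5))   # about {6: 0.392, 8: 0.294, 12: 0.196, 20: 0.118}
post_9, _  = bayes_update(priors, roll_likelihoods(9))   # about {12: 0.625, 20: 0.375}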

Iterated Updates

Suppose I rolled a 5 and then a 9. Update in two steps:
First update for the 5.
Then update the update for the 9.

Tabular solution

D1 = rolled a 5, D2 = rolled a 9

Bayes numerator 1 = likelihood 1 × prior.
Bayes numerator 2 = likelihood 2 × Bayes numerator 1.

hyp.   prior   likel. 1    Bayes     likel. 2    Bayes      posterior
H      P(H)    P(D1 | H)   num. 1    P(D2 | H)   num. 2     P(H | D1, D2)
H4     1/5     0           0         0           0          0
H6     1/5     1/6         1/30      0           0          0
H8     1/5     1/8         1/40      0           0          0
H12    1/5     1/12        1/60      1/12        1/720      0.735
H20    1/5     1/20        1/100     1/20        1/2000     0.265
total  1                                         0.0019     1

Board Question

Suppose I rolled a 9 and then a 5.
1. Do the Bayesian update in two steps: first update for the 9, then update the update for the 5.
2. Do the Bayesian update in one step. The data is D = 9 followed by 5.

Tabular solution: two steps

D1 = rolled a 9, D2 = rolled a 5

Bayes numerator 1 = likelihood 1 × prior.
Bayes numerator 2 = likelihood 2 × Bayes numerator 1.

hyp.   prior   likel. 1    Bayes     likel. 2    Bayes      posterior
H      P(H)    P(D1 | H)   num. 1    P(D2 | H)   num. 2     P(H | D1, D2)
H4     1/5     0           0         0           0          0
H6     1/5     0           0         1/6         0          0
H8     1/5     0           0         1/8         0          0
H12    1/5     1/12        1/60      1/12        1/720      0.735
H20    1/5     1/20        1/100     1/20        1/2000     0.265
total  1                                         0.0019     1

Tabular solution: one step

D = rolled a 9 then a 5

hypothesis   prior   likelihood   Bayes numerator   posterior
H            P(H)    P(D | H)     P(D | H)P(H)      P(H | D)
H4           1/5     0            0                 0
H6           1/5     0            0                 0
H8           1/5     0            0                 0
H12          1/5     1/144        1/720             0.735
H20          1/5     1/400        1/2000            0.265
total        1                    0.0019            1
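
A quick numerical check that the two-step and one-step updates agree, again reusing the bayes_update and roll_likelihoods sketches above (our own helpers, for illustration only):

# Two steps: the posterior after the 9 becomes the prior for the 5.
post_after_9, _ = bayes_update(priors, roll_likelihoods(9))
post_two_step, _ = bayes_update(post_after_9, roll_likelihoods(5))

# One step: the likelihood of "9 then 5" is the product of the individual likelihoods.
likelihood_9_then_5 = {n: roll_likelihoods(9)[n] * roll_likelihoods(5)[n] for n in dice}
post_one_step, _ = bayes_update(priors, likelihood_9_then_5)

print(post_two_step)  # about {12: 0.735, 20: 0.265}, 0 for the rest
print(post_one_step)  # same values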

Board Question: probabilistic prediction

Along with finding posterior probabilities of hypotheses, we might want to make posterior predictions about the next roll. With the same setup as before, let:
D1 = result of the first roll
D2 = result of the second roll
(a) Find P(D1 = 5).
(b) Find P(D2 = 4 | D1 = 5).

Solution

D1 = rolled a 5, D2 = rolled a 4

hyp.   prior   likel. 1    Bayes     post. 1     likel. 2        post. 1 × likel. 2
H      P(H)    P(D1 | H)   num. 1    P(H | D1)   P(D2 | H, D1)   P(D2 | H, D1)P(H | D1)
H4     1/5     0           0         0           1/4             0
H6     1/5     1/6         1/30      0.392       1/6             0.392 × 1/6
H8     1/5     1/8         1/40      0.294       1/8             0.294 × 1/8
H12    1/5     1/12        1/60      0.196       1/12            0.196 × 1/12
H20    1/5     1/20        1/100     0.118       1/20            0.118 × 1/20
total  1                   0.085     1                           0.124

The law of total probability tells us P(D1 = 5) is the sum of the Bayes numerator 1 column: P(D1 = 5) = 0.085.

The law of total probability tells us P(D2 = 4 | D1 = 5) is the sum of the last column: P(D2 = 4 | D1 = 5) = 0.124.
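
The predictive probabilities can be checked the same way, reusing the bayes_update and roll_likelihoods sketches above (illustrative helpers, not from the slides):

# P(D1 = 5): total probability of the data in the first update.
post_1, p_d1 = bayes_update(priors, roll_likelihoods(5))
print(p_d1)  # 0.085

# P(D2 = 4 | D1 = 5) = sum over H of P(D2 = 4 | H) * P(H | D1 = 5)
p_d2_given_d1 = sum(roll_likelihoods(4)[n] * post_1[n] for n in dice)
print(p_d2_given_d1)  # about 0.124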