Probability Intro Part II: Bayes Rule


Probability Intro Part II: Bayes Rule. Jonathan Pillow, Mathematical Tools for Neuroscience (NEU 314), Spring 2016, Lecture 13.

Quick recap: a random variable X takes on different values according to a probability distribution (discrete: probability mass function, pmf; continuous: probability density function, pdf). Marginalization: summing ("splatting"). Conditionalization: slicing.
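As a concrete illustration (not part of the original slides), here is a minimal numpy sketch of marginalization and conditionalization on a small, made-up discrete joint distribution:

```python
import numpy as np

# Hypothetical 3x3 joint pmf P(x, y) over x in {0,1,2}, y in {0,1,2}
P_xy = np.array([[0.10, 0.05, 0.05],
                 [0.05, 0.20, 0.15],
                 [0.05, 0.15, 0.20]])
assert np.isclose(P_xy.sum(), 1.0)

# Marginalization ("splatting"): sum over the other variable
P_x = P_xy.sum(axis=1)   # P(x) = sum_y P(x, y)
P_y = P_xy.sum(axis=0)   # P(y) = sum_x P(x, y)

# Conditionalization ("slicing"): take a slice of the joint, divide by the marginal
y0 = 1
P_x_given_y0 = P_xy[:, y0] / P_y[y0]   # P(x | y = y0)
print(P_x, P_y, P_x_given_y0, P_x_given_y0.sum())  # last value is 1.0
```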

conditionalization ("slicing"): the conditional is the joint divided by the marginal. [Figure: a 2D joint density with a slice taken at a fixed value, the resulting conditional, and the marginal P(x); axes run from -3 to 3.]

conditional densities. [Figure: conditional density slices of the same 2D joint; axes run from -3 to 3.]

Bayes Rule. For conditional densities, Bayes rule reads P(x|y) = P(y|x) P(x) / P(y): P(y|x) is the likelihood, P(x) is the prior, P(x|y) is the posterior, and P(y) is the marginal probability of y (the "normalizer").

A little math: Bayes rule is a very simple formula for manipulating probabilities: P(B|A) = P(A|B) P(B) / P(A), where P(B|A) is the conditional probability of B given that A occurred, P(B) is the probability of B, and P(A) is the probability of A. Simplified (unnormalized) form: P(B|A) ∝ P(A|B) P(B).

A little math: Bayes rule, P(B|A) ∝ P(A|B) P(B). Example: 2 coins. One coin is fake: heads on both sides (H/H). One coin is standard: (H/T). You grab one of the coins at random and flip it. It comes up heads. What is the probability that you're holding the fake? P(Fake|H) ∝ P(H|Fake) P(Fake) = (1)(½) = ½; P(Nrml|H) ∝ P(H|Nrml) P(Nrml) = (½)(½) = ¼. The posterior probabilities must sum to 1, so P(Fake|H) = (½)/(¾) = 2/3 and P(Nrml|H) = 1/3. [Figure: tree diagram: start, then either the fake coin (H/H) or the normal coin (H/T).]

A little math: Bayes rule, P(B|A) ∝ P(A|B) P(B). Example: 2 coins, Experiment #2: the flip comes up tails. What is the probability that you're holding the fake? P(Fake|T) ∝ P(T|Fake) P(Fake) = (0)(½) = 0; P(Nrml|T) ∝ P(T|Nrml) P(Nrml) = (½)(½) = ¼. Since the probabilities must sum to 1, P(Fake|T) = 0 and P(Nrml|T) = 1.
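Below is a minimal numerical sketch of this two-coin update (not from the original slides); the hypothesis ordering and the helper function name are just illustrative.

```python
import numpy as np

def posterior(prior, likelihood):
    """Unnormalized Bayes update, then normalize so probabilities sum to 1."""
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

# Hypotheses: [Fake (H/H), Nrml (H/T)], each picked with probability 1/2
prior = np.array([0.5, 0.5])

# Likelihood of observing heads under each hypothesis
lik_heads = np.array([1.0, 0.5])
print(posterior(prior, lik_heads))   # [0.667, 0.333]

# Likelihood of observing tails under each hypothesis
lik_tails = np.array([0.0, 0.5])
print(posterior(prior, lik_tails))   # [0.0, 1.0]
```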

Is the middle circle popping out or in?

P(image | OUT & light is above) = 1 and P(image | IN & light is below) = 1: the image is equally likely to be OUT or IN given the sensory data alone. What we want to know: P(OUT | image) vs. P(IN | image). Apply Bayes rule: P(OUT | image) ∝ P(image | OUT & light above) P(OUT) P(light above), and P(IN | image) ∝ P(image | IN & light below) P(IN) P(light below). Which of these is greater?

Bayesian Models for Perception. Bayes rule, P(B|A) ∝ P(A|B) P(B), is a formula for computing P(what's in the world | sensory data), which is what our brain wants to know! With B = world and A = sense data: P(world | sense data) ∝ P(sense data | world) P(world). Posterior: our resulting beliefs about the world. Likelihood: given by the laws of physics; ambiguous, because many world states could give rise to the same sense data. Prior: given by past experience.
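As a sketch of this recipe (not from the slides, and with made-up numbers for the priors), the popping-out vs. popping-in circle example becomes a two-hypothesis Bayes update:

```python
import numpy as np

# Hypothetical world states for the shaded-circle example: OUT vs IN.
# The image is equally likely under both interpretations...
likelihood = np.array([1.0, 1.0])    # P(image | OUT), P(image | IN)

# ...so the prior (e.g., "light usually comes from above") decides.
# These prior values are invented for illustration.
prior = np.array([0.8, 0.2])         # P(OUT), P(IN)

posterior = likelihood * prior
posterior /= posterior.sum()         # normalize: P(world | image)
print(posterior)                     # [0.8, 0.2] -> OUT wins
```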

Helmholtz: perception as optimal inference. Perception is our best guess as to what is in the world, given our current sensory evidence and our prior experience. (Helmholtz, 1821-1894.) P(world | sense data) ∝ P(sense data | world) P(world). Posterior: our resulting beliefs about the world. Likelihood: given by the laws of physics; ambiguous, because many world states could give rise to the same sense data. Prior: given by past experience.

Many different 3D scenes can give rise to the same 2D retinal image: the Ames Room. Consider two candidate scenes, A and B. How does our brain decide which interpretation to adopt? P(image | A) and P(image | B) are equal, since both A and B could have generated this image. Let's use Bayes rule: P(A | image) = P(image | A) P(A) / Z and P(B | image) = P(image | B) P(B) / Z. With equal likelihoods, the priors P(A) and P(B) decide.

Hollow Face Illusion http://www.richardgregory.org/experiments/

Hollow Face Illusion. Hypothesis #1: the face is concave. Hypothesis #2: the face is convex. P(convex | video) ∝ P(video | convex) P(convex) and P(concave | video) ∝ P(video | concave) P(concave) (posterior ∝ likelihood × prior). Because P(convex) > P(concave), the posterior probability of convex is higher, which determines our percept.

The prior belief that objects are convex is SO strong that we can't override it, even when we know it's wrong! (So your brain knows Bayes rule even if you don't!)

Terminology question: when do we call P(y|x) a likelihood? A: when it is considered as a function of x (i.e., with y held fixed); note that as a function of x it doesn't integrate to 1. What is it called as a function of y, for fixed x? A conditional distribution (or sampling distribution).
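A small numerical check of this point (my own example, not from the slides, with scipy assumed available): take a binomial conditional p(y|x). As a function of y it sums to 1, but as a function of x (the likelihood) it integrates to 1/(n+1), not 1.

```python
import numpy as np
from scipy.stats import binom
from scipy.integrate import quad

n = 10  # number of coin flips (example value)

# As a function of y for fixed x: a proper conditional distribution (sums to 1)
x_fixed = 0.3
print(sum(binom.pmf(y, n, x_fixed) for y in range(n + 1)))   # 1.0

# As a function of x for fixed y: the likelihood (need not integrate to 1)
y_fixed = 7
area, _ = quad(lambda x: binom.pmf(y_fixed, n, x), 0.0, 1.0)
print(area)   # 1/(n+1) ~ 0.0909, not 1
```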

independence. Definition: x, y are independent iff P(x, y) = P(x) P(y) for all x, y. In linear algebra terms, the joint (viewed as a matrix) is the outer product of the two marginal distributions. [Figure: a 2D joint density that factorizes into its marginals; axes run from -3 to 3.]
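A minimal sketch of the outer-product view (not from the slides; the marginal values are made up):

```python
import numpy as np

# Marginals for two independent discrete variables (example values)
P_x = np.array([0.2, 0.5, 0.3])
P_y = np.array([0.6, 0.4])

# Independence: the joint is the outer product of the marginals
P_xy = np.outer(P_x, P_y)          # P(x, y) = P(x) P(y)

# Check: marginalizing the joint recovers the original marginals
print(np.allclose(P_xy.sum(axis=1), P_x))   # True
print(np.allclose(P_xy.sum(axis=0), P_y))   # True
```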

Summary: marginalization ("splatting"), conditionalization ("slicing"), Bayes rule (prior, likelihood, posterior), independence.