Probability Review - Bayes Introduction


Statistics 220, Spring 2005
Copyright © 2005 by Mark E. Irwin

Advantages of Bayesian Analysis

- Answers the questions that researchers are usually interested in: "What is the probability that...?"
- Formal method for combining prior beliefs with observed (quantitative) information.
- Natural way of combining information from multiple studies.
- General approach for model identification.
- Approach for comparing non-nested models (Bayes factors).
- Can fit very realistic but complicated models.

Disadvantages of Bayesian Analysis

- Often computationally more demanding than classical (e.g. frequentist) inference.
- Software availability: no general-purpose software packages like SAS, SPSS, or S-Plus/R. This is getting better, though, with programs like BUGS.
- Requires at least one of:
  - elicitation of a real subjective probability distribution of prior beliefs;
  - sensitivity analysis to show that the choice of prior doesn't strongly affect inference.
- Can fit overly complicated, but realistic, models.

Bayesian Analysis

Want to make probabilistic statements about parameters θ, functions of parameters g(θ), and processes (which I will throw into the parameters for now), given the probability model and the observed data y.

Need to determine the posterior distribution, p(θ | y).

Information available: the prior, p(θ), and the data model, p(y | θ). Together these give the full probability model, describing the randomness in the parameters and the data.

Want to go from p(y | θ) to p(θ | y).

Bayes' Rule

p(θ | y) = p(θ, y) / p(y) = p(y | θ) p(θ) / p(y) = p(y | θ) p(θ) / ∫_Θ p(y | θ) p(θ) dθ

The above is written assuming that θ is a continuous random variable with a density. However it could be discrete, giving

p(θ | y) = p(y | θ) p(θ) / Σ_i p(y | θ_i) p(θ_i)

Generally I'll just write the continuous version, as the discrete version is analogous (replace integration with summation).
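
As a quick illustration of the discrete form of Bayes' rule, here is a small sketch (not part of the original slides; the prior, data, and candidate values are made up for illustration) that updates a uniform prior over three candidate success probabilities using binomial data:

```python
import numpy as np
from scipy.stats import binom

# Hypothetical example: discrete prior over three candidate success
# probabilities theta, updated with y successes in n Bernoulli trials.
theta = np.array([0.25, 0.50, 0.75])     # candidate parameter values
prior = np.array([1/3, 1/3, 1/3])        # p(theta): uniform prior
n, y = 10, 7                             # observed data (illustrative)

likelihood = binom.pmf(y, n, theta)      # p(y | theta) for each theta
posterior = prior * likelihood           # numerator of Bayes' rule
posterior /= posterior.sum()             # divide by p(y) = sum_i p(y | theta_i) p(theta_i)

for t, p in zip(theta, posterior):
    print(f"p(theta = {t} | y) = {p:.3f}")
```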

Bayes' Rule is often written as

p(θ | y) ∝ p(θ) p(y | θ)

Note that p(y | θ), sometimes referred to as the measurement model, when treated as a function of θ for a fixed y, is just the likelihood L(θ | y). So Bayes' Rule can be thought of as

Posterior ∝ Prior × Likelihood

One consequence of this is that Bayesian analysis satisfies the Likelihood Principle, which states that two data sets with the same likelihood function should lead to the same inferences. For example, suppose you had two sequences of independent Ber(p) trials of length n. If those two sequences had the same number of successes, then you would want to make the same statements about p from both analyses.

Odds Ratios: Bayes' rule has a nice form in terms of odds ratios:

p(θ_1 | y) / p(θ_2 | y) = [p(θ_1) / p(θ_2)] × [p(y | θ_1) / p(y | θ_2)] = [p(θ_1) / p(θ_2)] × [L(θ_1 | y) / L(θ_2 | y)]

i.e. the posterior odds are the prior odds times the likelihood ratio.
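
A quick numeric sketch of this identity (the two candidate values, prior odds, and data are invented for illustration), comparing two values of a binomial success probability after observing y successes in n trials:

```python
from scipy.stats import binom

# Hypothetical comparison of theta1 vs theta2 after observing y = 7 successes in n = 10 trials.
theta1, theta2 = 0.7, 0.5
prior_odds = 1.0                          # p(theta1) / p(theta2): equal prior probabilities
n, y = 10, 7

likelihood_ratio = binom.pmf(y, n, theta1) / binom.pmf(y, n, theta2)
posterior_odds = prior_odds * likelihood_ratio   # posterior odds = prior odds * likelihood ratio

print(f"likelihood ratio = {likelihood_ratio:.3f}")
print(f"posterior odds   = {posterior_odds:.3f}")
```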

What is Probability?

If we are going to use probability to describe our levels of belief about a parameter or a process, we need to have an idea of what we mean by probability.

Example 1: I have two dice in my pocket, one yellow and one purple. What is

- the probability that the yellow one rolls a 6?
- the probability that the purple one rolls a 6?
- the probability that the sum of the two rolls is 12?

Here is a picture of the two dice for those in the back. So the probabilities are

P[Yellow = 6] = 0
P[Purple = 6] = 1/20
P[Sum = 12] = 1/20

When determining probabilities and probability models there are two things that need to be considered:

1. What assumptions are you making (e.g. each outcome equally likely for each die, and the dice are independent)?
2. What information are you conditioning on? All probabilities are effectively conditional.

Example 2: I have another two dice in my pocket (Blue (die 1) and Yellow (die 2)). What is the probability that they both roll the same number?

Let Y_i = roll on die i. Then

P[All the same] = P[All 1] + P[All 2] + ... = P[Y_1 = Y_2 = 1] + P[Y_1 = Y_2 = 2] + ...

Considerations:

1. Any dependency between the rolls on each die? Let's assume not.
2. What should we condition on? "Fool me once, shame on you; fool me twice, shame on me," or will he only try to fool us once?

3. What could each die look like? There are 5 different dice based on the Platonic solids (4, 6, 8, 12, 20 sides). I've heard of somebody trying to develop a gambling game based on a 7-sided die. Let D_i = number of faces on die i.
4. What numbers could be on each die? 1, 2, ..., D_i? All 5's? Is each side equally likely?

5. Any dependency between which dice are chosen? If the Blue die is 6-sided, can the other be 6-sided as well? Or are the D_i's independent?
6. If the D_i are independent, what are the probabilities?

Assumption 1 gives

P[Y_1 = Y_2 = 1] = Σ_i Σ_j P[Y_1 = 1, D_1 = i, Y_2 = 1, D_2 = j]
                 = Σ_i Σ_j P[Y_1 = 1 | D_1 = i] P[Y_2 = 1 | D_2 = j] P[D_1 = i, D_2 = j]

If D_1 and D_2 are independent, then this reduces to

P[Y_1 = Y_2 = 1] = Σ_i Σ_j P[Y_1 = 1 | D_1 = i] P[Y_2 = 1 | D_2 = j] P[D_1 = i] P[D_2 = j]
                 = {Σ_i P[Y_1 = 1 | D_1 = i] P[D_1 = i]} {Σ_j P[Y_2 = 1 | D_2 = j] P[D_2 = j]}
                 = P[Y_1 = 1] P[Y_2 = 1]

So we can get an answer, let's assume that

1. P[D_i = 4] = P[D_i = 6] = P[D_i = 8] = P[D_i = 12] = P[D_i = 20] = 1/5
2. P[Y_i = k | D_i] = 1/D_i;  k = 1, ..., D_i

Under these assumptions

k          | 1-4    | 5-6    | 7-8    | 9-12   | 13-20
P[Y_i = k] | 81/600 | 51/600 | 31/600 | 16/600 | 6/600

P[All the same] = 0.0963
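
These numbers can be checked directly. The sketch below (not part of the original slides) marginalizes over the five equally likely die sizes and then sums P[Y_1 = k] P[Y_2 = k] over k:

```python
from fractions import Fraction

sizes = [4, 6, 8, 12, 20]                      # the five Platonic-solid dice

def p_roll(k):
    """P[Y_i = k]: average of 1/D over the dice that have a face labelled k."""
    return sum(Fraction(1, 5) * Fraction(1, d) for d in sizes if k <= d)

# Reproduce the table: P[Y_i = k] is constant within each band of k.
for k in (1, 5, 7, 9, 13):
    # equal to 81/600, 51/600, 31/600, 16/600, 6/600 (printed in lowest terms)
    print(f"P[Y_i = {k}] = {p_roll(k)}")

# P[all the same] = sum_k P[Y_1 = k] P[Y_2 = k], assuming the two dice are independent.
p_same = sum(p_roll(k) ** 2 for k in range(1, 21))
print(f"P[all the same] = {float(p_same):.4f}")  # 0.0963
```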

Where do probabilities come from?

Long run relative frequencies

If an experiment of independent trials is repeated over and over, the relative frequency of an event will converge to the probability of the event.

Let A be the event of interest and let p = P[A]. Then for a sequence of independent trials, let

Y_i = I(A occurs in trial i);  X_n = Σ_{i=1}^{n} Y_i

Then by the law of large numbers, the sample proportion of successes X_n / n → p as n → ∞.
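
A small simulation sketch (not from the notes) makes this convergence visible for a fair coin; the running proportion of heads settles near p = 0.5 as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                   # true probability of the event A (heads)
n = 100_000

flips = rng.random(n) < p                 # Y_i = I(A occurs in trial i)
running_prop = np.cumsum(flips) / np.arange(1, n + 1)   # X_n / n

for n_check in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n_check:>6}: proportion of heads = {running_prop[n_check - 1]:.4f}")
```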

For example, three different experiments looked at the probability of getting a head when flipping a coin:

- The French naturalist Count Buffon: 4040 tosses, 2048 heads (p̂ = 0.5069).
- While imprisoned during WWII, the South African statistician John Kerrich: 10,000 tosses, 5067 heads (p̂ = 0.5067).
- The statistician Karl Pearson: 24,000 tosses, 12,012 heads (p̂ = 0.5005).

Subjective beliefs

Can be used for experiments that can't be repeated exactly, such as a sporting event. For example, what is the probability that the Patriots will win the Super Bowl next year?

Can be done through comparison (e.g. is it more or less likely than getting a head on a single flip of a coin, than getting a 6 when rolling a fair 6-sided die, etc.).

Can also be done by comparing different possible outcomes (1.5 times more likely than the Eagles, 10 times more likely than the Jets, 1,000,000 times more likely than the 49ers, etc.).

Often expressed in terms of odds:

Odds = Prob / (1 − Prob);  Prob = Odds / (1 + Odds)
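
As a tiny sanity check of the odds-probability conversion (the numbers are just illustrative):

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

# Example: a subjective probability of 0.75 corresponds to odds of 3 (3-to-1).
p = 0.75
odds = prob_to_odds(p)
print(odds)                  # 3.0
print(odds_to_prob(odds))    # 0.75
```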

The idea of subjective probability fits into the idea of a fair bet. Let p ∈ [0, 1] be the amount that you are willing to bet for a return of $1 if the event E occurs, i.e. you gain $(1 − p) if E occurs and lose $p if the complement occurs. If you want this to be a fair bet (E[gain] = 0), then p is your subjective probability of the event E occurring.

Let p* be the probability of success. Then

E[gain] = (1 − p) p* − p (1 − p*) = p* − p

For this to be 0, p = p*.
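
A one-line numeric check of the fair-bet condition (the values are chosen only for illustration):

```python
def expected_gain(p, p_star):
    """E[gain] for a bet of p returning $1 if E occurs, when P(E) = p_star."""
    return (1 - p) * p_star - p * (1 - p_star)

print(expected_gain(0.3, 0.3))   # 0.0: the bet is fair exactly when p equals p_star
print(expected_gain(0.2, 0.3))   # 0.1: betting less than your true probability gives positive expected gain
```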

Models, physical understanding, etc.

The structure of the problem will often suggest a probability model. For example, the physics of rolling a die suggests that no one side should be favoured (equally likely outcomes), giving the uniform model used in the earlier example. However, this could be verified by looking at the long run frequencies.

Example: Genetics - Mendel's breeding experiments. The expected fractions of observed phenotypes in one of Mendel's experiments are given by the following model:

Round/Yellow     | Round/Green | Wrinkled/Yellow | Wrinkled/Green
(2 + (1 − θ)²)/4 | θ(2 − θ)/4  | θ(2 − θ)/4      | (1 − θ)²/4
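
A quick numeric sketch (not from the notes) evaluating these phenotype fractions as a function of θ; writing ψ = (1 − θ)², the four cells are (2 + ψ)/4, (1 − ψ)/4, (1 − ψ)/4, ψ/4 and sum to 1 for any θ:

```python
import numpy as np

def phenotype_probs(theta):
    """Expected phenotype fractions as a function of the recombination fraction theta."""
    psi = (1 - theta) ** 2
    return np.array([
        (2 + psi) / 4,   # Round/Yellow
        (1 - psi) / 4,   # Round/Green
        (1 - psi) / 4,   # Wrinkled/Yellow
        psi / 4,         # Wrinkled/Green
    ])

for theta in (0.0, 0.25, 0.5):
    probs = phenotype_probs(theta)
    print(f"theta = {theta}: {np.round(probs, 4)}, sum = {probs.sum():.1f}")
```

At θ = 0.5 (genes on different chromosomes) this reduces to the familiar 9:3:3:1 Mendelian ratio.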

The value θ, known as the recombination fraction, is a measure of the distance between the two genes which regulate the two traits. θ must satisfy 0 ≤ θ ≤ 0.5 (under some assumptions about the process of meiosis), and when the two genes are on different chromosomes, θ = 0.5.

The probabilities used to describe Mendel's experiments come from current beliefs about the underlying processes of meiosis (actually approximations to the processes) and about the relationship between genes (genotype) and expressed traits (phenotype).

Actually, the model is not right if we analyze Mendel's data: the model assumes that Mendel knew the genotypes of the plants he was crossing in his experiment, but his mechanism for determining this was not perfect.

Note that, effectively, all probabilities and probability models are subjective. We must make assumptions about independence, exchangeability, and models whenever we analyze data. This also, I feel, makes moot the idea of an objective analysis; we are always making assumptions when building models. Even if we agree on a model, say linear regression, different people may proceed differently. Two people could look at a residual plot and one may decide that it supports the standard linear regression model while the other may decide that it supports some curvature and non-homogeneity of variance.

"Since all models are wrong the scientist cannot obtain a correct one by excessive elaboration. ... Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity. Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad."
George E. P. Box, 1976

"All models are wrong but some are useful."
George E. P. Box

Single Parameter Models

- Binomial model
- Prior choice: conjugate vs non-conjugate priors
- Summarizing the posterior
- Sensitivity analysis
- Prediction
- Normal, Poisson, and Exponential models

Binomial Model

Punxsutawney Phil and Wiarton Willie from the first class.

Observe Willie for n years and record y, the number of times Willie correctly predicts winter's finish. Assume that each year is independent and that the outcome each year is a Bernoulli trial with success probability π. This implies that

y | π ∼ Bin(n, π)

The sample size n is fixed and the success probability π can be any number in [0, 1].

The measurement model is

p(y | π) = (n choose y) π^y (1 − π)^(n−y);  π ∈ [0, 1]

Since n is fixed in this analysis, we'll drop it as a conditioning argument for ease of notation.

Want to make inference on π given y and n. Need a prior distribution p(π) for π.

Bayes' choice: π ∼ U(0, 1), i.e.

p(π) = 1 for 0 ≤ π ≤ 1, and 0 otherwise.

So Bayes' Rule gives

p(y, π) = (n choose y) π^y (1 − π)^(n−y)

p(y) = ∫_0^1 (n choose y) π^y (1 − π)^(n−y) dπ = 1/(n + 1)

p(π | y) = (n + 1) (n choose y) π^y (1 − π)^(n−y)

In the Wiarton Willie case, n = 41 and y = 37. So the posterior density is

p(π | y) = 42 (41 choose 37) π^37 (1 − π)^4 = [42!/(37! 4!)] π^37 (1 − π)^4

Where did p(y) = 1/(n + 1) come from?
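
A quick numerical sketch (not from the slides) confirming that p(y) = 1/(n + 1) under the uniform prior, using the Wiarton Willie numbers:

```python
from scipy.stats import binom
from scipy.integrate import quad

n, y = 41, 37                      # Wiarton Willie: 37 correct predictions in 41 years

# p(y) = integral over [0, 1] of p(y | pi) * p(pi) d(pi), with p(pi) = 1 (uniform prior)
marginal, _ = quad(lambda pi: binom.pmf(y, n, pi), 0, 1)
print(marginal, 1 / (n + 1))       # both ~0.0238 = 1/42

# The stated posterior p(pi | y) = (n + 1) * C(n, y) * pi^y * (1 - pi)^(n - y)
# should integrate to 1 over [0, 1].
post_mass, _ = quad(lambda pi: (n + 1) * binom.pmf(y, n, pi), 0, 1)
print(post_mass)                   # ~1.0
```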

It is often easier to deal with the proportional form of Bayes' Rule, so

p(π | y) ∝ p(π) p(y | π)
p(π | y) ∝ 1 × π^y (1 − π)^(n−y)

This is proportional to the density of a Beta(y + 1, n − y + 1) distribution. The PDF of the beta distribution Beta(α, β) is

f(x) = [Γ(α + β) / (Γ(α) Γ(β))] x^(α−1) (1 − x)^(β−1)

In the example last class, the properties of the posterior distribution used the fact that for a Beta(α, β) RV,

E[X] = α / (α + β)
Var(X) = αβ / [(α + β)² (α + β + 1)]
Mode(X) = (α − 1) / (α + β − 2)
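
For the Wiarton Willie posterior, Beta(y + 1, n − y + 1) = Beta(38, 5), these formulas give the following summaries (a small sketch, not from the slides):

```python
from scipy.stats import beta

alpha, b = 38, 5                      # posterior Beta(y + 1, n - y + 1) with n = 41, y = 37

post_mean = alpha / (alpha + b)                                   # ~0.884
post_var = alpha * b / ((alpha + b) ** 2 * (alpha + b + 1))       # ~0.00234
post_mode = (alpha - 1) / (alpha + b - 2)                         # ~0.902

print(post_mean, post_var, post_mode)
print(beta.mean(alpha, b), beta.var(alpha, b))                    # matches the formulas above
```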