BEGINNING BAYES IN R. Bayes with discrete models


Beginning Bayes in R Survey on eating out What is your favorite day for eating out?

Construct a prior for p Define p: proportion of all students who answer "Friday or Saturday" Form opinion about p: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 are plausible values; 0.5 and 0.6 are most likely, each twice as likely as any of the other values

Construct a prior for p
> bayes_df <- data.frame(p = seq(0.3, 0.8, by = 0.1),
+                        Weight = c(1, 1, 2, 2, 1, 1),
+                        Prior = c(1, 1, 2, 2, 1, 1) / 8)
> bayes_df
    p Weight Prior
1 0.3      1 0.125
2 0.4      1 0.125
3 0.5      2 0.250
4 0.6      2 0.250
5 0.7      1 0.125
6 0.8      1 0.125

Collect data Survey a random sample of 20 students; 12 out of 20 say "Friday or Saturday". This is a binomial experiment

Likelihood
Chance of 12 successes out of 20, where p is the probability of success:
Likelihood = choose(20, 12) * p^12 * (1 - p)^8
> # Add likelihood to the table:
> bayes_df$Likelihood <- dbinom(12, size = 20, prob = bayes_df$p)
> # Drop the Weight column:
> bayes_df <- bayes_df[, c(1, 3, 4)]
> round(bayes_df, 3)
    p Prior Likelihood
1 0.3 0.125      0.004
2 0.4 0.125      0.035
3 0.5 0.250      0.120
4 0.6 0.250      0.180
5 0.7 0.125      0.114
6 0.8 0.125      0.022

Turn the Bayesian crank
> library(teachbayes)
> bayes_df <- bayesian_crank(bayes_df)
> round(bayes_df, 3)
    p Prior Likelihood Product Posterior
1 0.3 0.125      0.004   0.000     0.005
2 0.4 0.125      0.035   0.004     0.046
3 0.5 0.250      0.120   0.030     0.310
4 0.6 0.250      0.180   0.045     0.463
5 0.7 0.125      0.114   0.014     0.147
6 0.8 0.125      0.022   0.003     0.029
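A base-R sketch of what the crank does, assuming (as the table above suggests) that bayesian_crank() multiplies each prior probability by the likelihood and then normalizes the products to sum to 1:

```r
# Base-R sketch of the "Bayesian crank" (assumed behavior of
# teachbayes::bayesian_crank): Product = Prior * Likelihood,
# Posterior = Product normalized to sum to 1.
p          <- seq(0.3, 0.8, by = 0.1)
Prior      <- c(1, 1, 2, 2, 1, 1) / 8
Likelihood <- dbinom(12, size = 20, prob = p)
Product    <- Prior * Likelihood
Posterior  <- Product / sum(Product)
round(data.frame(p, Prior, Likelihood, Product, Posterior), 3)
```

The rounded Posterior column should match the table above.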

Compare prior and posterior
> library(teachbayes)
> prior_post_plot(bayes_df)

Statistical inference
What is the probability that p is larger than 0.5?
> round(bayes_df[, c("p", "Posterior")], 3)
    p Posterior
1 0.3     0.005
2 0.4     0.046
3 0.5     0.310
4 0.6     0.463
5 0.7     0.147
6 0.8     0.029
Add the posterior probabilities of the values larger than 0.5: Prob(p > 0.5) = 0.463 + 0.147 + 0.029 = 0.639

BEGINNING BAYES IN R Let's practice!

BEGINNING BAYES IN R Continuous prior

Dining survey example Define p: proportion of all students who answer "Friday or Saturday" Form opinion about p: continuous on (0, 1) Represent prior probabilities by a beta curve, with density proportional to p^(a - 1) * (1 - p)^(b - 1), where a and b are the shape parameters of the beta curve

Example beta curve Beta curve with shape parameters 4 and 12. The area under the curve represents probability; the dotted line marks the quantile corresponding to the shaded area

Probability for beta(7, 10) curve
> # Probability that p is between 0.4 and 0.8
> library(teachbayes)
> beta_area(0.4, 0.8, c(7, 10))
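The same probability can be computed in base R, assuming beta_area() reports the area P(0.4 < p < 0.8) under the Beta(7, 10) curve:

```r
# Base-R equivalent of beta_area(0.4, 0.8, c(7, 10)):
# difference of the Beta(7, 10) CDF at the two endpoints.
area <- pbeta(0.8, 7, 10) - pbeta(0.4, 7, 10)
area  # a bit more than one half
```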

Quantile for beta(7, 10) curve Quantile: value of p such that area to its left is a given number

Quantile for beta(7, 10) curve
> # Finding the 0.50 quantile:
> library(teachbayes)
> beta_quantile(0.5, c(7, 10))
Shaded area is 0.5; the quantile is p = 0.408
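The base-R qbeta() function gives the same quantile directly, assuming beta_quantile() inverts the Beta(7, 10) CDF:

```r
# Base-R equivalent of beta_quantile(0.5, c(7, 10)):
# the median of the Beta(7, 10) curve.
q50 <- qbeta(0.5, 7, 10)
q50  # about 0.408, matching the beta_quantile() output above
```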

Constructing a prior Use beta curve to represent prior opinion about p Hard to pick values of a and b directly Instead, think of characteristics of the curve that are easier to specify (e.g. quantiles of the curve)

Finding a beta prior for p Specify the 0.5 and 0.9 quantiles Use beta.select() to find the parameters of the beta curve that match this information

Specify 0.50 quantile p50 represents the 0.50 quantile for p, meaning p is equally likely to be smaller or larger than p50. After some thought, decide that p50 = 0.55. p50 is also known as the median

Specify 0.90 quantile p90 represents the 0.90 quantile for p, meaning p is likely (with probability 0.90) to be smaller than p90. After more thought, decide that p90 = 0.80

Find the matching beta curve
beta.select() finds the shape parameters of the beta curve that has the same two quantiles
> # Specify 0.50 and 0.90 quantiles
> p50 <- list(x = 0.55, p = 0.5)
> p90 <- list(x = 0.80, p = 0.9)
> # Find the matching beta curve
> library(teachbayes)
> beta.select(p50, p90)
[1] 4.91 3.38
The two values correspond to the a and b shape parameters, respectively

My beta(4.91, 3.38) curve for p
> # Plot the beta prior obtained from beta.select()
> library(teachbayes)
> beta_draw(c(4.91, 3.38))

Reasonable prior?
> # Compute 50% probability interval
> library(teachbayes)
> beta_interval(0.5, c(4.91, 3.38))

Reasonable prior?
> # Compute probability p is smaller than 0.4
> library(teachbayes)
> beta_area(0, 0.4, c(4.91, 3.38))

BEGINNING BAYES IN R Let's practice!

BEGINNING BAYES IN R Updating the beta prior

Recall dining survey example Interested in p: proportion of all students who answer "Friday or Saturday" Formed opinion about p: continuous on (0, 1) Represented prior probabilities by a beta curve: with 4.91 and 3.38 as a and b Survey results: 12 of 20 say "Friday or Saturday"

Likelihood for binomial sampling Chance of 12 successes in a sample of 20: Likelihood = choose(20, 12) * p^12 * (1 - p)^8, where p is the probability of success

Bayes rule (a little math) Compute posterior: posterior ∝ prior × likelihood, where ∝ means "is proportional to"

Combine beta prior and binomial sampling Posterior is proportional to (beta prior) x (binomial likelihood). Recall that beta curves have the functional form p^(a - 1) * (1 - p)^(b - 1), and the binomial likelihood has the form p^s * (1 - p)^f, so the posterior is also a beta curve
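Written out, the proportionality argument behind this slide is:

```latex
\begin{align*}
\text{prior:}\quad g(p) &\propto p^{a-1}(1-p)^{b-1} \\
\text{likelihood:}\quad L(p) &\propto p^{s}(1-p)^{f} \\
\text{posterior:}\quad g(p \mid \text{data})
  &\propto p^{a-1}(1-p)^{b-1} \cdot p^{s}(1-p)^{f}
   = p^{(a+s)-1}(1-p)^{(b+f)-1}
\end{align*}
```

The last line is the density of a beta curve with shape parameters a + s and b + f, which is exactly the updating rule on the next slide.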

From a beta prior to a beta posterior
             Beta shape 1          Beta shape 2
Prior        a                     b
Data         # of successes (s)    # of failures (f)
Posterior    a + s                 b + f

Example: beta prior to beta posterior
             Beta shape 1          Beta shape 2
Prior        4.91                  3.38
Data         12                    8
Posterior    4.91 + 12 = 16.91     3.38 + 8 = 11.38
> # Calculate shape parameters for beta posterior:
> prior_par <- c(4.91, 3.38)
> data <- c(12, 8)
> post_par <- prior_par + data
> post_par
[1] 16.91 11.38
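The conjugate update can be checked numerically: if the Beta(16.91, 11.38) posterior really is proportional to prior times likelihood, the ratio of the two sides is the same constant at every value of p. A small sketch:

```r
# Check that dbeta(p, 16.91, 11.38) is proportional to
# dbeta(p, 4.91, 3.38) * dbinom(12, 20, p): the ratio of
# posterior density to (prior * likelihood) is constant in p.
ratio <- function(p) {
  dbeta(p, 16.91, 11.38) / (dbeta(p, 4.91, 3.38) * dbinom(12, 20, p))
}
c(ratio(0.4), ratio(0.6))  # the two values agree
```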

Compare prior and posterior curves
> # Overlay posterior on prior curve:
> library(teachbayes)
> beta_prior_post(prior_par, post_par)
The blue prior curve is wider than the red posterior curve, reflecting greater uncertainty before seeing the data

BEGINNING BAYES IN R Let's practice!

BEGINNING BAYES IN R Bayesian inference

Recall dining survey example Interested in p: proportion of all students who answer "Friday or Saturday" Current opinion about p: posterior beta curve with 16.91 and 11.38 as a and b

Bayesian inference Based on summarizing posterior beta curve Summary depends on type of inference: Testing problem: interested in plausibility of values of p Interval estimation: interested in interval likely to contain p

Fellow worker makes claim "At least 75% of the college students prefer to eat on Friday or Saturday." Is this a reasonable claim? Hypothesis: H: p >= 0.75

Bayesian approach The hypothesis is just an interval of values: p >= 0.75. Find the area under the posterior curve where p >= 0.75. If this probability is small, reject the claim

Compute Prob(0.75 < p < 1)
> # Graph region where p >= 0.75 on posterior beta curve:
> library(teachbayes)
> beta_area(0.75, 1, c(16.91, 11.38))
Note: 1 is the maximum value for beta curves
The area 0.041 is small, so we reject the worker's claim
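The same tail probability is available from base R, assuming beta_area() reports the area under the posterior beta curve between its two endpoints:

```r
# P(p >= 0.75) under the Beta(16.91, 11.38) posterior,
# computed as one minus the CDF at 0.75.
prob_claim <- 1 - pbeta(0.75, 16.91, 11.38)
prob_claim  # about 0.04
```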

Interval estimate A 90% probability interval is an interval that contains 90% of the posterior probability Convenient to choose an equal-tails interval where 5% of the probability is in each tail

Compute 90% probability interval
> # Compute the 90% probability interval:
> library(teachbayes)
> beta_interval(0.90, c(16.91, 11.38))
(0.445, 0.743) is the interval; the shaded area is 0.90
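The equal-tails interval can also be read off from qbeta(), assuming beta_interval() places 5% of the posterior probability in each tail:

```r
# Equal-tails 90% interval: the 0.05 and 0.95 quantiles
# of the Beta(16.91, 11.38) posterior curve.
interval <- qbeta(c(0.05, 0.95), 16.91, 11.38)
interval  # about (0.445, 0.743)
```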

Interpreting Bayesian interval The probability that p is in (0.44, 0.74) is exactly 0.90. This differs from the interpretation of a classical confidence interval: one does not know whether p is in any particular 90% confidence interval; the confidence refers to repeated sampling

Compare with one classical method
Add 2 successes and 2 failures: the method of Agresti and Coull
Given y successes in a sample of size n, the 90% interval is p_tilde ± 1.645 * sqrt(p_tilde * (1 - p_tilde) / (n + 4)), where p_tilde = (y + 2) / (n + 4)

Classical CI for our example
> y <- 12; n <- 20
> # Add 2 successes and 2 failures (Agresti-Coull):
> p_tilde <- (y + 2) / (n + 4)
> se <- sqrt(p_tilde * (1 - p_tilde) / (n + 4))
> round(se, 4)
[1] 0.1006
> round(c(p_tilde - 1.645 * se, p_tilde + 1.645 * se), 3)
[1] 0.418 0.749

Compare two intervals The Bayesian interval is shorter because the Bayesian approach combines data with prior information
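A quick numeric comparison of the two interval widths, using the Agresti-Coull form described above for the classical side (a sketch, not the course's exact code):

```r
# Width of the Bayesian 90% interval vs. the classical
# add-2-successes-2-failures (Agresti-Coull) 90% interval.
bayes_int <- qbeta(c(0.05, 0.95), 16.91, 11.38)
y <- 12; n <- 20
p_tilde <- (y + 2) / (n + 4)
se <- sqrt(p_tilde * (1 - p_tilde) / (n + 4))
classical_int <- c(p_tilde - 1.645 * se, p_tilde + 1.645 * se)
diff(bayes_int) < diff(classical_int)  # TRUE: Bayesian interval is shorter
```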

BEGINNING BAYES IN R Let's practice!

BEGINNING BAYES IN R Posterior simulation

Recall dining survey example Interested in p: proportion of all students who answer "Friday or Saturday" Current beliefs about p: posterior beta curve with 16.91 and 11.38 as a and b

Bayesian inference using simulation So far we summarized the beta posterior density by computing probabilities and quantiles. An alternative: simulate a large number of values from the posterior, then summarize the posterior sample to do inference

Simulate using rbeta() function
> # Draw 1000 values from the posterior:
> sim_p <- rbeta(1000, 16.91, 11.38)
> # Check that the simulations reflect the posterior distribution
> hist(sim_p, freq = FALSE)
> curve(dbeta(x, 16.91, 11.38), add = TRUE, col = "red", lwd = 3)

Compute probabilities
> # Probability p < 0.5 approximated using simulation
> (prob <- sum(sim_p < 0.5) / 1000)
[1] 0.145
Recall you drew 1000 samples from the posterior curve
> # Exact answer from the Beta(16.91, 11.38) posterior curve
> pbeta(0.5, 16.91, 11.38)
[1] 0.1448413

Compute quantiles
> # Find the sample quantiles of the simulated values of p:
> quantile(sim_p, c(0.05, 0.95))
       5%       95% 
0.4491962 0.7404108 

Why simulate? We can compute exact posterior summaries using the pbeta() and qbeta() functions, but exact calculations are difficult for the posteriors of many Bayesian models. Simulation also lets us learn about functions of the parameter of interest

Example: posterior of log odds ratio
Interested in a 90% probability interval for the logit, log(p / (1 - p))
Useful parameter for categorical data analysis
Easy to use simulation
> sim_p <- rbeta(1000, 16.91, 11.38)
> sim_logit <- log(sim_p / (1 - sim_p))
sim_logit is a sample from the posterior distribution of the logit

Posterior of log odds ratio > hist(sim_logit, freq = FALSE)

Posterior summaries
> # 90% probability interval for the log odds ratio
> quantile(sim_logit, c(0.10, 0.90))
       10%        90% 
-0.1027612  0.8877401 
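Because log(p / (1 - p)) is an increasing function of p, the exact quantiles of the logit come from transforming the exact beta quantiles; a sketch to check the simulated interval above:

```r
# Exact 10% and 90% quantiles of the logit: transform the
# corresponding quantiles of the Beta(16.91, 11.38) posterior
# (valid because the logit is monotone increasing in p).
q <- qbeta(c(0.10, 0.90), 16.91, 11.38)
logit_q <- log(q / (1 - q))
logit_q  # close to the simulated quantiles above
```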

BEGINNING BAYES IN R Let's practice!