Bayesian Inference. STA 121: Regression Analysis Artin Armagan


Bayes Rule...s! Reverend Thomas Bayes. p(θ|y) = p(y|θ)p(θ)/p(y), where p(θ|y) is the posterior, p(θ) is the prior, p(y|θ) is the likelihood (the sampling distribution), and the normalizing constant is p(y) = ∫ p(y|θ)p(θ) dθ.
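To see how the normalizing constant works in practice, here is a minimal sketch (my own illustration, not part of the original slides) that evaluates likelihood x prior on a grid of θ values and normalizes numerically; it assumes the Binomial coin example that follows, with y = 80 Heads in n = 100 flips and a uniform prior.

    # Sketch (illustrative, not from the slides): Bayes' rule approximated on a grid.
    import numpy as np
    from scipy.stats import binom

    y, n = 80, 100                          # observed Heads, number of flips
    theta = np.linspace(0.0, 1.0, 1001)     # grid over the parameter

    prior = np.ones_like(theta)             # uniform prior p(theta)
    like = binom.pmf(y, n, theta)           # likelihood p(y | theta)
    unnorm = like * prior                   # numerator of Bayes' rule
    dtheta = theta[1] - theta[0]
    posterior = unnorm / (unnorm.sum() * dtheta)   # divide by p(y) = ∫ p(y|θ)p(θ) dθ

    print(theta[np.argmax(posterior)])      # posterior mode, ≈ 0.8 here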

Flipping the Coin One of your friends - who's not necessarily the most reliable one - approaches you with a coin for a betting game. He says he wins every time Heads comes up after a flip. After a hundred flips, you lose all your money. At the end of those hundred flips (n = 100), you have observed eighty Heads (y = 80). Something doesn't look right... You remember that statistics class you took last semester! Who thought you'd ever use that stuff.

Bad Coin? You remember from your statistics class that the number of successes, Y, in an experiment with n Bernoulli trials is Binomially distributed with probability of success θ: P(Y = y | θ, n) = (n choose y) θ^y (1-θ)^(n-y). It looks like an inverse problem, since you know how many successes (Heads) you observed but don't know the underlying proportion (θ) that generated them. The data you observed definitely suggest a potential value for θ. You remember that the maximum likelihood estimator for θ in this case is 80/100 = 0.8, which is far from fair. Is this sufficient information to conclude that this is not a fair coin, though?
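As a quick check, a few lines of Python (an illustrative sketch, not part of the slides) can evaluate this Binomial likelihood and confirm that the maximizing value of θ is y/n = 0.8.

    # Sketch: Binomial likelihood for the coin data (y = 80 Heads in n = 100 flips).
    from scipy.stats import binom
    from scipy.optimize import minimize_scalar

    y, n = 80, 100

    # Maximize P(Y = y | theta, n) over theta by minimizing the negative log-likelihood.
    neg_loglik = lambda theta: -binom.logpmf(y, n, theta)
    res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x)                     # ≈ 0.8, i.e. y / n
    print(binom.pmf(y, n, 0.5))      # probability of the observed data under a fair coin (tiny)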

Likelihood x Prior You remember that you could make probabilistic statements about θ using some Bayesian machinery from your statistics class. You have your sampling distribution (Binomial) and need a prior distribution over θ. Since you don't have much of an idea about what θ may be a priori, you allow it to be any value in [0,1]. To assert your ignorance about θ, you use a uniform distribution over [0,1] as its prior. You remember that the product of the likelihood and the prior is proportional to the posterior distribution of the parameter of interest: p(θ|y) ∝ θ^y (1-θ)^(n-y) x constant ∝ θ^y (1-θ)^(n-y).

Is it a Beta? You remember that if you can normalize this posterior distribution so that it integrates to one, you're all set: p(θ|y) = θ^y (1-θ)^(n-y) / p(y), where p(y) = ∫ θ^y (1-θ)^(n-y) dθ. Somehow the expression θ^y (1-θ)^(n-y) looks familiar. You suspect you've seen a density function that looks like this, so you go back and check some of your many statistics books. Now you remember! It looks like a Beta distribution. If a random variable θ is Beta distributed with parameters α and β, its pdf is Γ(α+β)/[Γ(α)Γ(β)] θ^(α-1) (1-θ)^(β-1). Matching exponents, the posterior here is a Beta(y+1, n-y+1) distribution.
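Since the exponents match those of a Beta density, the posterior under the uniform prior is Beta(y+1, n-y+1); the sketch below (illustrative, not from the slides) uses scipy.stats.beta to work with it directly.

    # Sketch: with a Uniform(0,1) prior, the posterior is Beta(y + 1, n - y + 1).
    from scipy.stats import beta

    y, n = 80, 100
    posterior = beta(y + 1, n - y + 1)   # Beta(81, 21)

    print(posterior.mean())              # posterior mean ≈ 0.794
    print(posterior.pdf(0.8))            # posterior density at theta = 0.8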

Posterior (Figure: the dashed line is the prior distribution and the solid line is the posterior distribution of θ.)

Credible Sets (Figure: posterior density with shaded credible regions.) 99% credible set = [0.6817036, 0.8851065]; 99.99% credible set = [0.6167157, 0.9192412]; P(θ > 0.5) ≈ 1. Victory!!!
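The intervals quoted above are posterior quantiles; assuming they are equal-tailed intervals from the Beta(81, 21) posterior, a short sketch (illustrative, not from the slides) reproduces numbers of the same order.

    # Sketch: equal-tailed credible sets and P(theta > 0.5) from the Beta(81, 21) posterior.
    from scipy.stats import beta

    posterior = beta(81, 21)             # y = 80, n = 100, uniform prior

    print(posterior.interval(0.99))      # 99% credible set
    print(posterior.interval(0.9999))    # 99.99% credible set
    print(posterior.sf(0.5))             # P(theta > 0.5), essentially 1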

Conclusion We can obtain actual probabilistic statements about θ using credible sets, unlike with confidence intervals in frequentist inference. Here we observed that there is a 99.99% chance that θ lies in the interval [0.6167157, 0.9192412]. Note that this interval excludes the value θ = 0.5. In fact, we can go ahead and calculate the probability that θ > 0.5: P(θ > 0.5) ≈ 1. Thus we have very strong evidence that this coin is not a fair one.

A Different Scenario This time, one of your really reliable friends comes to you for a little betting game with a coin. After a hundred coin tosses, you again lose all your money. However, since this is a good, reliable friend, you want to use some personal judgement that he wouldn't cheat with an adjusted coin. You need to find a way to assert some type of prior belief about the situation. Due to your trust in your friend, you think that the underlying θ should be centered around 0.5. How concentrated it is around 0.5 is really up to how much you trust him.

Potential Priors (Figure: Beta prior densities over θ ∈ [0,1] for (α, β) = (1,1), (2,2), (5,5), (10,10), and (50,50); larger values of α = β concentrate the prior more tightly around 0.5.)

Likelihood x Prior p(θ|y) ∝ θ^y (1-θ)^(n-y) x θ^(α-1) (1-θ)^(β-1) = θ^(y+α-1) (1-θ)^(n-y+β-1). Thus θ is again Beta distributed, with parameters α* = y+α and β* = n-y+β. Depending on our trust level (equivalently, how much weight we want to assign to the prior belief), the posterior is going to change.
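A small sketch (illustrative, not from the slides) shows this conjugate update for several symmetric priors of the kind plotted earlier; as α = β grows, the posterior mean moves from the MLE toward 0.5.

    # Sketch: conjugate Beta-Binomial update for several prior strengths (alpha = beta).
    from scipy.stats import beta

    y, n = 80, 100
    for a in (1, 2, 5, 10, 50):
        post = beta(y + a, n - y + a)     # alpha* = y + alpha, beta* = n - y + beta
        print(a, round(post.mean(), 3))   # posterior mean shrinks toward 0.5 as the prior strengthens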

Posterior Here we used a very special prior distribution. The choice of prior led to a posterior of the same type, i.e. both the prior and the posterior were Beta distributions. Such priors are called conjugate priors. The stronger our belief that the coin is a fair one, the more the posterior shifts towards 0.5. Eventually, if we assign enough weight to the prior, it will overwhelm the observed data and center the posterior at 0.5. (Figure: priors and the corresponding posteriors.)

Inference on the Population Mean This may be one of the first cases you see in an introductory-level statistics course. You have n observations coming from a normal population with an unknown mean μ and a known variance σ²: Y_i ~ N(μ, σ²). You have learnt that the maximum likelihood estimator of μ is the arithmetic mean of the observations, μ_ML.

A Conjugate Prior We briefly mentioned conjugate priors. If possible, we'd like a prior that would give rise to a posterior of its own kind. A natural idea that comes to mind is the normal distribution (over μ). We know that our observations come from a normal distribution. We also know that the product of two or more normal densities is proportional to another normal density.

Likelihood x Prior p(μ|y_1, ..., y_n) ∝ p(y_1, ..., y_n|μ) x p(μ) ∝ N(y_1|μ, σ²) x ... x N(y_n|μ, σ²) x N(μ|μ_0, τ²) ∝ N(μ|μ_n, τ_n²), where μ_n = (μ_0/τ² + n μ_ML/σ²) / (1/τ² + n/σ²) and 1/τ_n² = 1/τ² + n/σ².
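A minimal sketch of this update (illustrative; the simulated data and the helper name normal_posterior are my own, not from the slides):

    # Sketch: posterior for a normal mean with known variance and a Normal(mu0, tau^2) prior.
    import numpy as np

    def normal_posterior(y, sigma2, mu0, tau2):
        """Return the posterior mean mu_n and variance tau_n^2 of mu given data y."""
        n = len(y)
        mu_ml = np.mean(y)                               # maximum likelihood estimate
        prec = 1.0 / tau2 + n / sigma2                   # posterior precision 1 / tau_n^2
        mu_n = (mu0 / tau2 + n * mu_ml / sigma2) / prec  # posterior mean (weighted average)
        return mu_n, 1.0 / prec

    # Hypothetical data: 20 draws from N(2, 1), prior N(0, 1).
    rng = np.random.default_rng(0)
    y = rng.normal(2.0, 1.0, size=20)
    print(normal_posterior(y, sigma2=1.0, mu0=0.0, tau2=1.0))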

What If No Prior Info? The posterior mean looks like a weighted average of the maximum likelihood estimator and the prior mean. How much weight we assign to the prior mean is up to how much we trust our prior belief. Note that as the variance of a normal distribution increases, it becomes flatter and flatter. As τ² → ∞, the prior specified earlier becomes a uniform distribution over the whole domain of μ. This imposes NO prior information on top of the knowledge coming from the observed data.

What If Data Contradicts the Prior? Also notice that if we keep τ² constant and let n → ∞, the posterior again approaches a normal distribution with mean μ_ML and variance σ²/n. This is because the information coming from the observed data overwhelms our prior belief. That said, although we Bayesians are statisticians with attitudes, our attitudes dwindle with observed evidence. How quickly they diminish depends on how much trust we place in our prior knowledge.
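Both limits are easy to check numerically; the sketch below reuses the hypothetical normal_posterior helper from the earlier sketch and is likewise illustrative rather than part of the slides.

    # Sketch: the two limits discussed above (vague prior, growing data), reusing normal_posterior.
    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(2.0, 1.0, size=20)

    # As tau^2 grows, the posterior mean approaches the sample mean (the MLE).
    for tau2 in (1.0, 100.0, 1e6):
        print(tau2, normal_posterior(y, sigma2=1.0, mu0=0.0, tau2=tau2)[0])

    # With tau^2 fixed, growing n pulls the posterior toward N(mu_ML, sigma^2 / n).
    for n in (10, 1000, 100000):
        y_n = rng.normal(2.0, 1.0, size=n)
        print(n, normal_posterior(y_n, sigma2=1.0, mu0=0.0, tau2=1.0))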

Bayesian yet...?