Introduction to Applied Bayesian Modeling. ICPSR Day 4


Simple Priors Remember Bayes' Law: P(A | B) = P(B | A) P(A) / P(B), where P(A) is the prior probability of A.

Simple prior Recall the disease-testing example, where we specified the prior probability of having the disease as a point estimate. In that example we said the probability of having the disease was 0.01 and then computed the probability of having the disease given a positive test result.
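To make that calculation concrete, here is a minimal sketch in Python. The 0.01 prior is the value from the slide; the test's sensitivity and specificity are hypothetical numbers chosen only for illustration.

```python
# A minimal sketch of the point-prior disease-test calculation.
# The prior P(disease) = 0.01 is the value from the slide; the test's
# sensitivity and specificity below are hypothetical numbers for illustration.
prior = 0.01          # P(disease)
sensitivity = 0.95    # assumed P(test positive | disease)
specificity = 0.90    # assumed P(test negative | no disease)

p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
posterior = sensitivity * prior / p_positive   # Bayes' law: P(disease | positive)
print(round(posterior, 3))                     # about 0.088 under these assumptions
```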

Too simple? Considering the prior probability of having a disease (or being pregnant, or winning an election, or winning a war, etc.) to be a point estimate is probably unreasonable. Conditional on age (for disease), money raised (for elections), or economic strength (for conflicts), wouldn't we expect the prior probabilities to vary?

Distributions of priors Even without conditioning on covariates, we would likely find different sources of information providing slightly different unconditional prior probabilities. That is, our prior beliefs would be derived from different samples and are not known, fixed population quantities.

More realistic priors So rather than assigning a single value for P(A), it makes sense, from a Bayesian perspective, to assign a probability distribution to P(A) that captures prior uncertainty about its true value. The result is a posterior that is also a probability distribution, rather than a point estimate.

How do we do this? As we've seen from earlier lectures, we must specify the form of the prior probability distribution. It must be an appropriate distribution; that is, the characteristics of the parameter help dictate which distribution(s) are acceptable.

Appropriate priors For example, if we want to specify a prior for a probability (i.e., p in a binomial), we need a distribution that varies between 0 and 1. The Beta distribution is perfect for this: p(p | α, β) = [Γ(α + β) / (Γ(α) Γ(β))] p^(α−1) (1 − p)^(β−1)
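Here is a minimal sketch of a Beta prior paired with binomial data; the hyperparameters and data below are made up for illustration, and the conjugate-update rule it relies on is the standard one discussed later in these slides.

```python
# A sketch of a Beta prior for a binomial probability p (hypothetical numbers).
from scipy.stats import beta

a, b = 2, 2                 # assumed prior: mild belief that p is near 0.5
y, n = 7, 10                # assumed data: 7 successes in 10 trials

posterior = beta(a + y, b + n - y)   # conjugate update: Beta prior -> Beta posterior
print(posterior.mean())              # (a + y) / (a + b + n) = 9/14, about 0.643
print(posterior.interval(0.95))      # equal-tailed 95% credible interval
```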

More priors If we are modeling counts with a Poisson, then we want our prior distribution to allow only positive values (negative counts make no sense). The Gamma distribution fits the bill nicely: p(θ) = [β^α / Γ(α)] θ^(α−1) e^(−βθ)
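And a matching sketch for the Gamma prior on a Poisson rate; again, the hyperparameters and counts are hypothetical values chosen only to illustrate the update.

```python
# A sketch of a Gamma prior for a Poisson rate theta (hypothetical numbers).
from scipy.stats import gamma

alpha, rate = 2.0, 1.0           # assumed prior hyperparameters (shape, rate)
counts = [3, 5, 4, 6, 2]         # assumed observed counts

# Conjugate update: Gamma(alpha, rate) prior + Poisson data
#   -> Gamma(alpha + sum(y), rate + n) posterior.
# scipy parameterizes the gamma by shape and scale, where scale = 1 / rate.
post = gamma(a=alpha + sum(counts), scale=1.0 / (rate + len(counts)))
print(post.mean())               # (2 + 20) / (1 + 5), about 3.67
```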

And more priors What about normally distributed data? What do you think would be a good prior for μ in a normal model?

Hmmm? How about Normal? p(μ) = [1 / √(2πτ²)] exp[−(μ − M)² / (2τ²)] How about a prior for the variance?
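A minimal sketch of the Normal prior on μ, with the data variance treated as known; every number below is hypothetical, and the update formulas are the standard normal-normal conjugate results.

```python
# Normal prior on mu with known data variance (all numbers hypothetical).
import numpy as np

M, tau2 = 0.0, 10.0                   # assumed prior mean and prior variance
sigma2 = 4.0                          # assumed known data variance
y = np.array([1.2, 2.3, 1.8, 2.9])    # assumed data

n, ybar = len(y), y.mean()
# Conjugate update: posterior precision = prior precision + data precision,
# posterior mean = precision-weighted average of M and the sample mean.
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (M / tau2 + n * ybar / sigma2)
print(post_mean, post_var)
```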

Prior for variance The standard prior for the variance in the normal model is the inverse gamma distribution: p(σ²) = [β^α / Γ(α)] (σ²)^(−(α+1)) e^(−β/σ²)
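And a sketch of the inverse-gamma update for the variance, with the mean treated as known; all numbers are hypothetical, and the update rule is the usual conjugate result for this case.

```python
# Inverse-gamma prior for the variance, mean treated as known (hypothetical numbers).
import numpy as np
from scipy.stats import invgamma

alpha, beta_ = 3.0, 2.0                 # assumed prior hyperparameters
mu = 2.0                                # assumed known mean
y = np.array([1.2, 2.3, 1.8, 2.9])      # assumed data

# Conjugate update: InvGamma(alpha, beta) prior ->
#   InvGamma(alpha + n/2, beta + sum((y - mu)^2) / 2) posterior.
post = invgamma(a=alpha + len(y) / 2, scale=beta_ + np.sum((y - mu) ** 2) / 2)
print(post.mean())                      # posterior mean of the variance
```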

So, what does this all mean? Well, it means that we can specify uncertainty about our prior beliefs by using appropriate probability distributions. It also means that our answers are distributions rather than point estimates.

Do I have to use these priors? NO. The examples on the previous slides are all what we call conjugate priors. Conjugacy has some very nice properties

How nice? If a prior is conjugate to a given likelihood function, then we are guaranteed to have a closed-form solution for the moments of the posterior. HUH? This means that we know the distribution of the posterior and can easily solve for means, medians, variances, quantiles, etc.

We know the form of the posterior? YES. If a prior is conjugate to the likelihood, then the posterior comes from the same distributional family as the prior. It just has slightly different parameters, which are weighted averages of the prior and the likelihood. We saw this yesterday.
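As a quick check of that weighted-average claim, take the Beta-Binomial case from a few slides back (standard algebra, not anything specific to these notes): the posterior is Beta(α + y, β + n − y), so E[p | y] = (α + y)/(α + β + n) = [(α + β)/(α + β + n)] · [α/(α + β)] + [n/(α + β + n)] · [y/n]. The posterior mean mixes the prior mean α/(α + β) and the sample proportion y/n, with weights given by the prior "sample size" α + β and the actual sample size n.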

Conjugacy

Table 1: Some Exponential Family Forms and Conjugate Priors

Form              | Conjugate Prior Distribution | Hyperparameters
Bernoulli         | Beta                         | α > 0, β > 0
Binomial          | Beta                         | α > 0, β > 0
Multinomial       | Dirichlet                    | θ_j > 0, Σ θ_j = θ_0
Negative Binomial | Beta                         | α > 0, β > 0
Poisson           | Gamma                        | α > 0, β > 0
Exponential       | Gamma                        | α > 0, β > 0
Gamma (incl. χ²)  | Gamma                        | α > 0, β > 0
Normal (for μ)    | Normal                       | μ ∈ R, σ² > 0
Normal (for σ²)   | Inverse Gamma                | α > 0, β > 0
Pareto (for α)    | Gamma                        | α > 0, β > 0
Pareto (for β)    | Pareto                       | α > 0, β > 0
Uniform           | Pareto                       | α > 0, β > 0

Non-informative Priors
- A noninformative prior is one that, by design, provides little new information about the unknown parameter.
- Noninformative priors are very useful from the perspective of traditional Bayesianism, which sought to mitigate frequentist criticisms of intentional subjectivity.
- Fisher was characteristically negative on the subject: "...how are we to avoid the staggering falsity of saying that however extensive our knowledge of the values of x may be, yet we know nothing and can know nothing about the values of θ?"

Uniform Priors
- An obvious choice for the noninformative prior is the uniform distribution.
- Uniform priors are particularly easy to specify in the case of a parameter with bounded support.
- Proper uniform priors can be specified for parameters defined over unbounded space if we are willing to impose prior restrictions. Thus, if it is reasonable to restrict the range of values for a variance parameter in a normal model, instead of specifying it over [0, ∞) we restrict it to [0, ν] and can now articulate it as p(σ) = 1/ν, 0 ≤ σ ≤ ν.
- Improper uniform priors do not possess bounded integrals, yet, surprisingly, they can result in fully proper posteriors under some circumstances (although this is far from guaranteed).
- Consider the common case of a noninformative uniform prior for the mean of a normal distribution. It would necessarily have uniform mass over the interval (−∞, ∞): p(θ) = c for all θ. Therefore, giving any nonzero probability value to this support, p(θ) = ε > 0, would lead to a prior with infinite total mass: ∫ p(θ) dθ = ∞.
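A standard illustration of the last two points (general textbook algebra rather than anything specific to these notes): if y_1, ..., y_n ~ Normal(θ, σ²) with σ² known and we use the improper flat prior p(θ) ∝ 1, the posterior is proportional to the likelihood alone and works out to Normal(ȳ, σ²/n), a perfectly proper distribution even though the prior is not.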

Elicited Priors
- Clinical priors: elicited from substantive experts who are taking part in the research project.
- Skeptical priors: built on the assumption that the hypothesized effect does not actually exist, and typically operationalized through a probability function (PMF or PDF) with mean zero.
- Enthusiastic priors: the opposite of the skeptical prior, built around the positions of partisan experts or advocates and generally assuming the existence of the hypothesized effect.
- Reference priors: produced from expert opinion as a way to express informational uncertainty, but somewhat misguided in this context, since the purpose of elicitation is to glean information that can be described formally.

So, can I use any prior? Well, pretty much, but there are some things to consider: Some priors could lead to nonsensical results. Some priors make life easier. Some priors make publication easier, unfortunately. Some priors lead to intractable results.

What does that mean? Imagine the following prior: p(θ) = arctan(θ) · sin(π² + 2) · ∏_{i=1}^{N} (x_i) · cos(e^π). You could do this if you REALLY wanted to, but it leads to the following result...

Weird prior result [slide shows the resulting posterior expression, which has no tidy closed form] + Math = HARD

So, then what? If we have a posterior for which there is no analytical solution, or the analytical solution is REALLY hard, don't worry: there is an answer. It's called MCMC.

Next week: Summarizing posterior distributions. Introducing covariates. MCMC/Gibbs Sampling. See you at the picnic!!