STAT 425: Introduction to Bayesian Analysis


Marina Vannucci, Rice University, USA
Fall 2017

Lecture 7: Prior Types
- Subjective and objective priors
- Non-informative priors
- Jeffreys priors
- Diffuse priors
- Bayes factors for hypothesis testing

Choice of Prior Distribution

What is the interpretation of probability?
- Objective probability: normative, objective representations of what it is rational to believe about a parameter, usually in a state of ignorance (dice, coins, etc.).
- Frequentist probability: the long-run frequency when a process is repeated.
- Subjective probability: a personal judgment about an event; one must, however, be coherent.

Two general types of priors:
- Subjective priors: chosen to reflect expert opinion or personal beliefs.
- Objective priors: chosen to let the data (i.e., the likelihood) dominate the posterior distribution, and hence the inference. These are generally determined by the sampling model in use.

Subjective Priors

Ways of specifying:
- For single events: the relative probability of A to not-A (as in lotteries/betting).
- For continuous quantities: divide the range into intervals and assign a probability to each (as for events above), or specify percentiles; then fit a smooth probability density.

Examples:
(i) θ ∼ Beta(α, β). From an expert: mean = 0.4 and s.d. = 0.1. Solving for α and β gives Beta(9.2, 13.8).
(ii) θ ∼ N(µ, σ²). From an expert: a best guess of 25 and 95% confidence that θ lies between 10 and 40. Then σ ≈ (40 − 10)/4 = 7.5, so θ ∼ N(25, 7.5²).

Subjective priors may be classified into two classes: conjugate priors and non-conjugate priors.
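The moment-matching in example (i) can be sketched in a few lines; the function name and the expert numbers are illustrative, not part of the course material:

```python
def beta_from_mean_sd(mean, sd):
    """Moment-match a Beta(alpha, beta) prior to an expert's mean and s.d.

    Uses mean = alpha/(alpha+beta) and var = mean(1-mean)/(alpha+beta+1).
    """
    nu = mean * (1 - mean) / sd**2 - 1  # nu = alpha + beta, the "prior sample size"
    return mean * nu, (1 - mean) * nu

alpha, beta = beta_from_mean_sd(0.4, 0.1)
print(alpha, beta)  # approximately 9.2 and 13.8, matching the slide
```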

Non-informative or Objective Priors

Goal: choose priors that reflect a state of no information about θ. The central problem is specifying a prior distribution for a parameter about which nothing is known. For example, if θ can take only a finite set of values, it seems natural to assume all values equally likely a priori. Such specifications fall into the general class of non-informative priors.

Roughly speaking, a prior is non-informative if it has minimal impact on the posterior, i.e., it does not change much over the region where the likelihood is appreciable. This notion was first used by Bayes (1763) and by Laplace (1812) in astronomy. A formal development is given in Box and Tiao (1973) and in Kass and Wasserman (1996, JASA). Still, the notion is not adequately well defined.

Two types: reference (or default) priors, and diffuse priors.

Reference Priors

Example: x = (x_1, ..., x_n) from N(θ, σ²), θ unknown, σ known. What is the prior that reflects a state of no information about θ and lets the likelihood dominate the posterior? The flat prior: π(θ) = 1 for −∞ < θ < ∞.

Some features of the flat prior:
- Non-informative: all values of θ are equally likely. No information.
- The likelihood dominates the posterior: p(θ|x) ∝ L(θ)π(θ) = L(θ).
- It may not be a probability distribution, i.e., ∫π(θ)dθ = ∞ (an "improper prior"). This is not a major concern if the posterior is proper.

Ex: X ∼ Poisson(λ), 0 < λ < ∞, λ ∼ Ga(α, β). Taking α = 1 and β → 0 gives π(λ) ∝ 1, but with a proper posterior λ|x ∼ Ga(α + t, n) = Ga(1 + t, n), where t = Σ x_i (likewise in the example above). In complex models, however, the posterior may also be improper, and then no inference can be made.

Other default priors: π(λ) ∝ 1/λ^{1/2} [i.e., λ ∼ Ga(1/2, β → 0)] and π(λ) ∝ 1/λ [i.e., λ ∼ Ga(0, β → 0)] (both with proper posteriors).
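The Poisson example can be checked directly: with the flat prior, the posterior is Ga(1 + t, n) and its mean and variance follow from the gamma distribution. A minimal sketch with made-up counts (the data below are hypothetical):

```python
# Flat prior pi(lambda) ∝ 1 (improper) for Poisson data: the posterior is
# Ga(1 + t, n), with t the sum of the observations, proper as soon as n >= 1.
data = [3, 5, 2, 4, 6]            # hypothetical Poisson counts
t, n = sum(data), len(data)
shape, rate = 1 + t, n            # posterior Ga(shape, rate)
post_mean = shape / rate          # gamma mean: shape/rate
post_var = shape / rate**2        # gamma variance: shape/rate^2
print(post_mean, post_var)        # 4.2 and 0.84 for these counts
```

Note how the posterior mean (1 + t)/n is essentially the sample mean t/n for even moderate t, as expected when the likelihood dominates.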

Change of Variable

Another concern with reference priors: the constant/flat prior is not transformation invariant. If τ = g(θ) is a monotone function of θ with inverse θ = g⁻¹(τ), then the density of τ is obtained from the density of θ as p(τ) = p(g⁻¹(τ)) |J|, with Jacobian J = dg⁻¹(τ)/dτ.

Ex: π(θ) = 1 and τ = e^θ. The implied prior is π(τ) = 1/τ, no longer uniform.

Binomial model:
- θ ∼ Beta(1, 1) is uniform for θ, 0 ≤ θ ≤ 1.
- θ ∼ Beta(0, 0) corresponds to a uniform prior on the log odds, τ = log[θ/(1 − θ)].
- A uniform prior on the odds, τ = θ/(1 − θ), corresponds to π(θ) ∝ (1 − θ)⁻².

A density cannot be simultaneously uniform in all transformations. This is the problem with the uniform distribution as a non-informative prior: no information about a transformation of θ does not imply no information about θ. Remedy: Jeffreys priors.
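The e^θ example can be verified by simulation. If θ is uniform on (0, 1) and τ = e^θ, the change-of-variable formula gives τ the density 1/τ on (1, e), so P(τ ≤ c) = log c rather than the uniform CDF (c − 1)/(e − 1). A Monte Carlo sketch (sample size and cutoff are arbitrary choices):

```python
import math, random

# Flat priors are not transformation invariant: theta ~ Uniform(0, 1) and
# tau = exp(theta) gives tau the density 1/tau on (1, e), not a uniform one.
# Check: P(tau <= c) should match log(c), not the uniform CDF (c - 1)/(e - 1).
random.seed(42)
n = 200_000
taus = [math.exp(random.random()) for _ in range(n)]
c = 2.0
frac = sum(tau <= c for tau in taus) / n
print(frac, math.log(c))  # both close to 0.693; uniform CDF would give 0.582
```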

Jeffreys Priors

Let p(x|θ) be a (single-parameter) probability model. The Fisher information is defined as

I(θ) = −E_{x|θ}[ ∂² log p(x|θ) / ∂θ² ]

and the Jeffreys prior is π(θ) ∝ √I(θ).

Properties:
- Locally uniform, i.e., flat in the region of the likelihood.
- Transformation invariant, i.e., all parameterizations lead to the same prior: with π(θ) ∝ √I(θ) and π(τ) ∝ √I(τ) for τ = g(θ), the same prior is obtained whether one (i) applies Jeffreys' rule to get π(θ) and then transforms to get π(τ), or (ii) transforms to τ first and then applies Jeffreys' rule to get π(τ).

Caution:
- The posterior may not be proper.
- It does not work well for multi-parameter models θ = (θ_1, ..., θ_k), where I(θ) is the Fisher information matrix and π(θ) ∝ √det I(θ).
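The invariance property can be checked numerically for a Bernoulli(θ) model, where I(θ) = 1/[θ(1 − θ)] (the n = 1 case of the binomial example below). For τ = log[θ/(1 − θ)], θ = 1/(1 + e^{−τ}) and dθ/dτ = θ(1 − θ), so routes (i) and (ii) should agree. A sketch with the two routes as separate functions (the function names are mine):

```python
import math

def theta_of(tau):
    """Inverse logit: theta as a function of the log odds tau."""
    return 1.0 / (1.0 + math.exp(-tau))

def route_i(tau):
    """(i) Jeffreys prior in theta, then change of variable to tau."""
    th = theta_of(tau)
    # pi(theta) ∝ [theta(1-theta)]^(-1/2), Jacobian dtheta/dtau = theta(1-theta)
    return (th * (1 - th)) ** -0.5 * th * (1 - th)

def route_ii(tau):
    """(ii) Fisher information computed directly in tau, then sqrt."""
    th = theta_of(tau)
    i_tau = (1 / (th * (1 - th))) * (th * (1 - th)) ** 2  # I(theta)*(dtheta/dtau)^2
    return math.sqrt(i_tau)

for tau in (-2.0, 0.0, 1.5):
    print(route_i(tau), route_ii(tau))  # identical at every tau
```

Both routes reduce to [θ(1 − θ)]^{1/2}, which is the invariance claim in this special case.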

Examples

Binomial: x ∼ Binomial(n, θ),
log p(x|θ) = x log θ + (n − x) log(1 − θ) + const, E[x|θ] = nθ,
−∂² log p(x|θ)/∂θ² = x/θ² + (n − x)/(1 − θ)²,
I(θ) = nθ/θ² + (n − nθ)/(1 − θ)² = n / [θ(1 − θ)].
The Jeffreys prior is π(θ) ∝ θ^{−1/2}(1 − θ)^{−1/2}, i.e., θ ∼ Beta(1/2, 1/2) (conjugate, proper).

Poisson: x ∼ Poisson(λ),
log p(x|λ) = −λ + x log λ − log(x!) (single observation), E[x|λ] = λ,
−∂² log p(x|λ)/∂λ² = x/λ², I(λ) = 1/λ.
The Jeffreys prior is π(λ) ∝ λ^{−1/2}, i.e., λ ∼ Ga(1/2, β → 0) (conjugate, improper, with a proper posterior).
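The binomial derivation above can be sanity-checked numerically: average a finite-difference second derivative of the log-likelihood over the binomial pmf and compare with n/[θ(1 − θ)]. A sketch (n, θ, and the step size h are arbitrary choices):

```python
import math

def loglik(x, n, theta):
    """Binomial log-likelihood, dropping the constant log C(n, x) term."""
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

def fisher_info(n, theta, h=1e-4):
    """I(theta) = E[-d2/dtheta2 log p(x|theta)], via central differences."""
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * theta**x * (1 - theta) ** (n - x)
        d2 = (loglik(x, n, theta + h) - 2 * loglik(x, n, theta)
              + loglik(x, n, theta - h)) / h**2
        total += pmf * (-d2)
    return total

n, theta = 10, 0.3
print(fisher_info(n, theta), n / (theta * (1 - theta)))  # both about 47.62
```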

Diffuse Priors

The motivation is to use a proper prior carrying little information about the parameter. Conjugate priors are proper, and the spread of a prior is a measure of the amount of uncertainty about the parameter that the prior expresses. So: choose a conjugate prior with a large standard deviation as a diffuse, or weakly informative, prior.

Examples:
(i) X ∼ Poisson(λ), 0 < λ < ∞, λ ∼ Ga(α, β). A diffuse prior is Ga(α, 1), with α the prior expectation.
(ii) X ∼ N(θ, σ²) with σ² known. A diffuse prior for θ is N(µ_0, 1000).

HTTP://WWW.AIACCESS.NET/E GM.HTM
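Example (ii) can be illustrated with the standard conjugate normal update: the posterior mean is a precision-weighted average of the prior mean and the sample mean, so a large prior variance lets the data dominate. A sketch with hypothetical numbers throughout (the data and σ² are made up):

```python
def normal_posterior(xs, sigma2, mu0, tau02):
    """Conjugate update for theta ~ N(mu0, tau02), data x_i ~ N(theta, sigma2)."""
    n = len(xs)
    xbar = sum(xs) / n
    prec = 1 / tau02 + n / sigma2                     # posterior precision
    mean = (mu0 / tau02 + n * xbar / sigma2) / prec   # precision-weighted average
    return mean, 1 / prec                             # posterior mean and variance

xs = [11.2, 9.8, 10.5, 10.1]                          # hypothetical observations
mean_diffuse, _ = normal_posterior(xs, sigma2=4.0, mu0=0.0, tau02=1000.0)
print(mean_diffuse, sum(xs) / len(xs))  # posterior mean is close to the sample mean
```

With the diffuse prior variance 1000 the posterior mean essentially equals the sample mean 10.4, despite the prior being centered at 0; shrinking τ₀² toward 0 would pull the posterior mean toward µ₀ instead.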