Introduction to Bayesian Statistics. James Swain, University of Alabama in Huntsville, ISEEM Department


Author Introduction. James J. Swain is Professor of Industrial and Systems Engineering Management at UAH. His BA, BS, and MS are from Notre Dame, and his PhD in Industrial Engineering is from Purdue University. His research interests include statistical estimation, Monte Carlo and discrete-event simulation, and analysis. He is celebrating 25 years at UAH, with prior experience at Georgia Tech, Purdue, the University of Miami, and Air Products and Chemicals.

Outline: views of probability; the likelihood principle; classical maximum likelihood; Bayes' theorem; the Bayesian formulation; applications (proportions, means, and an approach to hypothesis tests); conclusion; references.

"It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel." (Savage, 1954.) Bayesian statistics is based on subjective rather than objective probability.

Classical View of Probability. Physical or objective probability, also known as frequentist probability: a description of physical random processes such as dice, roulette wheels, and radioactive decay. Probability is the limiting ratio of occurrences over repeated trials. Philosophically: Popper, von Mises, etc.

Subjective Probability. Subjective or evidential probability, also called Bayesian probability: an assessment of uncertainty, the plausibility of possible outcomes. It may be applied to uncertain but fixed quantities; Bayesian analysis treats unknown parameters as random. It is sometimes elucidated in terms of wagers on outcomes. Philosophically: de Finetti, Savage, Carnap, etc.

Note on Notation. Samples: X denotes either a single sample or multiple samples. Distributions: p(x) for either discrete or continuous; N(µ, σ²) for the normal with mean and variance parameters (the variance is sometimes written φ). Parameters: in the binomial examples, π is the population proportion of success; in the continuous case, π is simply the mathematical constant.

Likelihood Principle. If X = 8 is observed, which binomial is more likely to have produced it? Example: X = 8 from a binomial with n = 20 and p = 0.2 or p = 0.4. [Figure: the two binomial pmfs, n = 20.] Evaluating each pmf at the observation, b(8; 0.4, 20) = 0.18 while b(8; 0.2, 20) = 0.02. Either source is possible, but the first is more likely.
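
A minimal numerical check of this comparison, using scipy.stats (the counts and candidate values of p are the ones from the slide):

    # Compare the likelihood of X = 8 under two candidate binomials (n = 20).
    from scipy.stats import binom

    x, n = 8, 20
    for p in (0.2, 0.4):
        print(f"b({x}; {p}, {n}) = {binom.pmf(x, n, p):.3f}")
    # b(8; 0.2, 20) = 0.022
    # b(8; 0.4, 20) = 0.180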

Likelihood (Discrete Sample). The likelihood function is $L(\pi; X)$, where π is the unknown proportion of success and $x_i = 1$ for a success, 0 for a failure. Here the sample X is given:
$L(\pi; X) = \prod_{i=1}^{n} \pi^{x_i}(1-\pi)^{1-x_i} = \pi^{x}(1-\pi)^{n-x}$, where $x = \sum_i x_i$.
The parameter π is the variable to solve for. Taking logs,
$l(\pi; X) = \ln L(\pi; X) = x \ln \pi + (n-x)\ln(1-\pi)$.
Setting $dl/d\pi = 0$ yields the MLE (maximum likelihood estimator) $p = \hat{\pi} = x/n$.

Likelihood for a Continuous Sample. For an observed sample x = 13: under Normal(12.5, 4), f(13; 12.5, 4) = 0.19; under Normal(10, 4), f(13; 10, 4) = 0.065. Again the first distribution is the more likely source.

MLE for Continuous (Normal). With multiple samples (n), the likelihood is
$L(\mu, \sigma^2; X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i-\mu)^2/2\sigma^2}$.
Parameters are chosen to maximize L, via ln L:
$l(\mu, \sigma^2; X) = \ln L(\mu, \sigma^2; X) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$.
Setting $dl/d\mu = 0$ and $dl/d\sigma^2 = 0$ gives the MLEs
$\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n} = \frac{(n-1)s^2}{n}$.
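
A short sketch verifying the closed-form MLEs on simulated data (the sample size and true parameters are illustrative, not from the slides):

    # Closed-form normal MLEs vs. the usual sample statistics.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10.0, scale=2.0, size=500)

    mu_hat = x.mean()                        # MLE of the mean: the sample mean
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of the variance divides by n
    s2 = x.var(ddof=1)                       # unbiased sample variance s^2
    print(mu_hat, sigma2_hat, (len(x) - 1) / len(x) * s2)  # last two agree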

Classical Statistics: Summary. The theory is well developed: in most cases estimators are asymptotically unbiased and normal, with variance parameters based on derivatives of the log likelihood. Confidence intervals: what does the confidence level actually mean? Hypothesis tests: the pesky p-value.

"There's no theorem like Bayes' theorem / Like no theorem we know / Everything about it is appealing / Everything about it is a wow / Let out all that a priori feeling / You've been concealing right up to now!" (George E. P. Box.) Bayes' theorem is useful, was controversial (in its day), and is the bane of introductory probability students. It is sometimes known as inverse probability.

Bayes Theorem I: Sets and a Partition of S. Consider a sample space S and an event $B \subseteq S$. A partition of the sample space, $A_1, A_2, \ldots, A_k$, is exhaustive and mutually exclusive:
$S = A_1 \cup A_2 \cup \cdots \cup A_k$, with $A_i \cap A_j = \emptyset$ for $i \neq j$.
Then $B = B \cap S = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_k)$, a decomposition of B into the disjoint subsets $B \cap A_i$. By the axioms of probability,
$P(B) = \sum_{i=1}^{k} P(B \cap A_i)$.

Bayes Theorem II: Conditional Probability. Conditional probability: $P(B \mid A_i) = P(A_i \cap B)/P(A_i)$. Multiplication rule: $P(A_i \cap B) = P(A_i)\,P(B \mid A_i)$. Total probability:
$P(B) = \sum_{i=1}^{k} P(B \cap A_i) = \sum_{i=1}^{k} P(A_i)\,P(B \mid A_i)$.

Bayes III: The Theorem. A direct result of the previous definitions (the multiplication rule and total probability):
$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{k} P(A_j)\,P(B \mid A_j)}$.
The conditionals are reversed, or inverted.

Quick Bayes Theorem Example. Let C = conforming. A part is sampled, tested, and found to be conforming: $P(A_3 \mid C) = ?$

Vendor   P(C | Vendor) (%)   Proportion (%)
A_1             88                30
A_2             85                20
A_3             95                50
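
A minimal sketch of the computation (the inputs are from the table; the answer, about 0.52, is computed here rather than quoted from the slide):

    # P(A3 | C) by Bayes' theorem, using the vendor table above.
    priors = {"A1": 0.30, "A2": 0.20, "A3": 0.50}   # P(A_i): share of parts
    p_conf = {"A1": 0.88, "A2": 0.85, "A3": 0.95}   # P(C | A_i)

    p_c = sum(priors[v] * p_conf[v] for v in priors)           # total probability
    posterior = {v: priors[v] * p_conf[v] / p_c for v in priors}
    print(posterior["A3"])   # 0.475 / 0.909, about 0.523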

Bayes Statistical Formulation. The parameter θ is uncertain, with prior distribution p(θ); conditional on the sample, the likelihood is $L(\theta; X)$. The posterior distribution is
$p(\theta \mid X) = \frac{L(\theta; X)\,p(\theta)}{p(X)}$,
where $p(X) = \sum_{i} L(\theta_i; X)\,p(\theta_i)$ in the discrete case and $p(X) = \int L(\theta; X)\,p(\theta)\,d\theta$ in the continuous case. The posterior is proportional to the numerator:
$p(\theta \mid X) \propto L(\theta; X)\,p(\theta)$.

Proportionality. The denominator is complex, but it is only a constant; the key information is in the numerator:
$p(\theta \mid X) = \frac{L(\theta; X)\,p(\theta)}{p(X)} \propto L(\theta; X)\,p(\theta)$.
Standardize numerically if needed; Bayes methods are frequently computer intensive.
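
A minimal sketch of numerical standardization on a grid, assuming a binomial likelihood (x successes in n trials) and a uniform prior; all of these choices are illustrative:

    # Normalize an unnormalized posterior numerically on a grid.
    import numpy as np

    x, n = 8, 20
    grid = np.linspace(0.001, 0.999, 999)      # candidate values of pi
    prior = np.ones_like(grid) / len(grid)     # uniform prior weights
    like = grid**x * (1 - grid)**(n - x)       # L(pi; X) up to a constant
    post = like * prior
    post /= post.sum()                         # standardize: divide by the sum
    print((grid * post).sum())                 # posterior mean, about 9/22 = 0.409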

Updating the Posterior. For two independent samples X, Y: the posterior from X, $p(\theta \mid X) = L(\theta; X)\,p(\theta)/p(X)$, serves as the prior for Y. The updated posterior is
$p(\theta \mid X, Y) = \frac{L(\theta; Y)\,p(\theta \mid X)}{p(Y)} = \frac{L(\theta; Y)\,L(\theta; X)\,p(\theta)}{p(Y)\,p(X)}$.

High Density Intervals (HDI). Summarizing the posterior distribution: a high density interval has a given probability content (e.g., 95%) and consists of the region with the largest density values. It coincides with the usual equal-tail interval for symmetric distributions (e.g., normal) but may differ for skewed ones (e.g., chi-squared), where special tables may be required (e.g., inverse log F). Source: Kruschke, p. 41.
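
A numerical sketch of an HDI for a unimodal posterior, finding the narrowest interval with the requested mass; the Beta(45, 75) here is just an illustrative posterior (it is the one from the poll example below):

    # Narrowest interval containing `mass` probability, for a unimodal dist.
    from scipy import stats
    from scipy.optimize import minimize_scalar

    def hdi(dist, mass=0.95):
        width = lambda lo: dist.ppf(lo + mass) - dist.ppf(lo)
        res = minimize_scalar(width, bounds=(0.0, 1.0 - mass), method="bounded")
        return dist.ppf(res.x), dist.ppf(res.x + mass)

    print(hdi(stats.beta(45, 75)))   # roughly (0.29, 0.46)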

Bayes Analysis: Binomial Sample (Beta Prior). Estimating the proportion of success π. As noted, with $x_i = 1$ for a success and 0 for a failure, the likelihood is
$L(\pi; X) = \prod_{i=1}^{n} \pi^{x_i}(1-\pi)^{1-x_i} = \pi^{x}(1-\pi)^{n-x}$.
What prior? A beta distribution is typical.

Beta Prior for Proportion of Success. Represents uncertainty about the proportion of success (π); chosen for its similarity to the likelihood. (The uniform, uninformative prior is the special case α = β = 1.)
$p(\pi) = \frac{\pi^{\alpha-1}(1-\pi)^{\beta-1}}{B(\alpha, \beta)}$, where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.

Beta Prior: Mean and Variance Formulas.
$E(\Pi) = \frac{\alpha}{\alpha+\beta}$, $\mathrm{Var}(\Pi) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

Example: Small Poll. Prior: worth a sample of $n_0 = 20$, with estimated mean $\pi_0 = 25\%$. Sample: size $n = 100$, with $x = 40$ favorable votes (40%). Set $\alpha_0 = \pi_0 n_0$ and $\beta_0 = (1-\pi_0) n_0$, so $\pi_0 = \alpha_0/(\alpha_0+\beta_0)$. The posterior mean is
$\pi_1 = \frac{x+\alpha_0}{n+\alpha_0+\beta_0} = \frac{x}{n}\cdot\frac{n}{n+\alpha_0+\beta_0} + \frac{\alpha_0}{\alpha_0+\beta_0}\cdot\frac{\alpha_0+\beta_0}{n+\alpha_0+\beta_0}$,
a weighted average of the sample proportion and the prior mean.
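
A sketch of the poll computation with these numbers (the posterior mean, 0.375, follows from the formula above; conjugacy, covered next, gives the full beta posterior):

    # Beta-binomial update for the small poll.
    n0, pi0 = 20, 0.25       # prior "worth" and prior mean
    n, x = 100, 40           # poll size and favorable votes

    a0, b0 = pi0 * n0, (1 - pi0) * n0   # Beta(5, 15) prior
    a1, b1 = a0 + x, b0 + n - x         # Beta(45, 75) posterior
    print(a1 / (a1 + b1))               # posterior mean = 45/120 = 0.375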

Conjugate Priors. Prior and posterior distributions of the same form are called conjugate. Here
$p(\pi) \propto \pi^{\alpha-1}(1-\pi)^{\beta-1}$ and $L(\pi; X) = \pi^{x}(1-\pi)^{n-x}$,
so the posterior is
$p(\pi \mid X) \propto \pi^{\alpha+x-1}(1-\pi)^{\beta+n-x-1}$,
again a beta distribution; identification of the distributional constants is simplified.

Simple Normal-Normal Case for the Mean. Simplified case: the variance φ is fixed. The mean μ has prior parameters $\mu_0, \varphi_0$:
$\mu \sim N(\mu_0, \varphi_0)$, $x \sim N(\mu, \varphi)$.
Derivation steps omitted (ref: Lee):
$p(\mu \mid x) \propto \exp\left\{-\frac{\mu^2}{2}\left(\varphi_0^{-1}+\varphi^{-1}\right) + \mu\left(\frac{\mu_0}{\varphi_0}+\frac{x}{\varphi}\right)\right\}$.

Results: Normal-Normal. Summarized results: the posterior mean is a weighted value.
$\varphi_1 = \frac{1}{\varphi_0^{-1}+\varphi^{-1}}$, $\mu_1 = \varphi_1\left(\frac{\mu_0}{\varphi_0}+\frac{x}{\varphi}\right)$, with $\mu \sim N(\mu_1, \varphi_1)$ and $x \sim N(\mu, \varphi)$. Equivalently,
$\mu_1 = \mu_0\,\frac{\varphi_0^{-1}}{\varphi_0^{-1}+\varphi^{-1}} + x\,\frac{\varphi^{-1}}{\varphi_0^{-1}+\varphi^{-1}}$.

Example: Radiometric Dating I. The Ennerdale granophyre. The earliest dating, by the K/Ar method in the 1960s and 70s, gives the prior: 370 ± 20 M-yr. A later Rb/Sr measurement gives 421 ± 8 M-yr. Combining the two yields the posterior estimate.

Radiometric Dating II. Substitute an alternative expert prior: 400 ± 50 M-yr. The revised posterior is roughly the same; the data dominate.
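
A sketch of the normal-normal update applied to both priors, using the formulas above (treating the ± values as standard deviations is an assumption; the posterior numbers are computed here, not quoted from the slides):

    # Normal-normal update: precision-weighted combination of prior and datum.
    def update(mu0, sd0, x, sd):
        prec = 1 / sd0**2 + 1 / sd**2            # posterior precision
        mu1 = (mu0 / sd0**2 + x / sd**2) / prec  # precision-weighted mean
        return mu1, prec**-0.5                   # posterior mean and sd

    print(update(370, 20, 421, 8))   # about (414.0, 7.4) M-yr
    print(update(400, 50, 421, 8))   # about (420.5, 7.9) M-yr: roughly the same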

Normal-Normal (Multiple Samples). A similar arrangement with multiple samples; the variances are constant:
$\mu \sim N(\mu_0, \varphi_0)$, $x_i \sim N(\mu, \varphi)$.
$p(\mu \mid x) \propto \exp\left\{-\frac{\mu^2}{2}\left(\varphi_0^{-1}+n\varphi^{-1}\right) + \mu\left(\frac{\mu_0}{\varphi_0}+\frac{\sum x_i}{\varphi}\right)\right\}$.

Results: Multiple Samples. A similar analysis; as n increases, the posterior converges to $\bar{x}$.
$\varphi_1 = \frac{1}{\varphi_0^{-1}+n\varphi^{-1}}$, $\mu_1 = \varphi_1\left(\frac{\mu_0}{\varphi_0}+\frac{\bar{x}}{\varphi/n}\right)$, with $\mu \sim N(\mu_1, \varphi_1)$ and $\bar{x} \sim N(\mu, \varphi/n)$. Equivalently,
$\mu_1 = \mu_0\,\frac{\varphi}{\varphi+n\varphi_0} + \bar{x}\,\frac{n\varphi_0}{\varphi+n\varphi_0}$.

Example: Men's Fittings. Modified from Lee (p. 46). Experience: chest measurements are N(38, 9). Shop data: $\bar{x} = 39.8$, ±2, on n = 890 samples.
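
A sketch applying the multiple-sample formulas to these numbers (treating ±2 as the sampling standard deviation is an assumption; with n = 890 the data dominate the prior):

    # Multiple-sample normal-normal update for the fittings example.
    mu0, phi0 = 38.0, 9.0            # prior N(38, 9)
    xbar, phi, n = 39.8, 2.0**2, 890

    phi1 = 1 / (1 / phi0 + n / phi)
    mu1 = phi1 * (mu0 / phi0 + xbar / (phi / n))
    print(mu1, phi1**0.5)            # about 39.80 and 0.067: posterior hugs the data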

Normal: Prior on the Variance. The default prior on the mean is uniform, which is improper over an infinite range. For the variance: is uniformity in value (as for the mean) appropriate? Transform instead: uniform in log φ gives $p(\varphi) \propto 1/\varphi$. The conjugate prior for the variance is the inverse chi-squared distribution:
$p(\varphi) \propto \varphi^{-\nu/2-1}\exp\left(-\frac{S_0}{2\varphi}\right)$, with posterior $p(\varphi \mid X) \propto \varphi^{-(\nu+n)/2-1}\exp\left(-\frac{S_0+S}{2\varphi}\right)$.

Conjugate Prior for the Normal Variance. The inverse chi-squared distribution; an exact match is not critical (the likelihood will dominate).
$\varphi \sim S_0\,\chi^{-2}_{\nu}$, $E(\varphi) = \frac{S_0}{\nu-2}$, $\mathrm{Var}(\varphi) = \frac{2S_0^2}{(\nu-2)^2(\nu-4)}$.
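
A sketch of the conjugate variance update under the formulas above, assuming S is the sum of squared deviations about a known mean; all numbers here are illustrative:

    # Scaled-inverse-chi-squared update for a normal variance, known mean.
    import numpy as np

    S0, nu = 20.0, 10            # prior: phi ~ S0 * inv-chi-sq(nu)
    mu = 5.0                     # known mean
    x = np.array([4.1, 6.3, 5.2, 3.8, 7.0])
    S = ((x - mu) ** 2).sum()    # data sum of squares about the known mean

    S1, nu1 = S0 + S, nu + len(x)   # posterior: (S0 + S) * inv-chi-sq(nu + n)
    print(S1 / (nu1 - 2))           # posterior mean of phi, E = S1/(nu1 - 2)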

General Approach to Hypothesis Tests. A hypothesis test is a choice between two alternatives, disjoint and exhaustive of course:
$H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$.
Priors: $\pi_0 = P(\theta \in \Theta_0)$ and $\pi_1 = P(\theta \in \Theta_1)$, with $\pi_0 + \pi_1 = 1$.
Posteriors: $p_0 = P(\theta \in \Theta_0 \mid X)$ and $p_1 = P(\theta \in \Theta_1 \mid X)$, with $p_0 + p_1 = 1$.

Hypothesis Tests: General. Prior odds: $\pi_0/\pi_1 = \pi_0/(1-\pi_0)$. Posterior odds: $p_0/p_1 = p_0/(1-p_0)$. The Bayes factor B gives the odds favoring $H_0$ against $H_1$:
$B = \frac{p_0/p_1}{\pi_0/\pi_1}$.
For two simple alternatives, $B = \frac{p(x \mid \theta_0)}{p(x \mid \theta_1)}$; larger B denotes favor for $H_0$.
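
A small sketch for two simple hypotheses, reusing the earlier binomial example; taking H0: p = 0.2 vs. H1: p = 0.4 with even prior odds is an illustrative choice:

    # Bayes factor and posterior odds for two simple binomial hypotheses.
    from scipy.stats import binom

    x, n = 8, 20
    B = binom.pmf(x, n, 0.2) / binom.pmf(x, n, 0.4)   # about 0.12: data favor H1
    prior_odds = 1.0                                  # pi0/pi1 = 1 (even odds)
    posterior_odds = B * prior_odds
    print(B, posterior_odds / (1 + posterior_odds))   # Bayes factor and P(H0 | x)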

Summary and Conclusion. Bayesian estimation: since parameters are random, it provides probability statements about their values and a mechanism for incorporating non-sample information, and it avoids technical difficulties with the likelihood alone. More advanced analysis is computationally intensive and tied to computer software, e.g., open-source R with the BUGS application. Machine learning is heavily dependent on Bayesian methods.

References. Lee, P. M. (2012), Bayesian Statistics: An Introduction, 4th ed., Wiley. Lindley, D. V. (1972), Bayesian Statistics: A Review, SIAM. Kruschke, J. K. (2011), Doing Bayesian Data Analysis: A Tutorial with R and BUGS, Academic Press/Elsevier.