An example to illustrate frequentist and Bayesian approaches


This is a trivial example that illustrates the fundamentally different points of view of the frequentist and Bayesian approaches.

Consider a data set of measurements: $\{x_i,\ i = 1, \dots, N\}$. Assume that the $x_i$ are independently and identically distributed (i.i.d.) draws from a random variable $X$ with some probability density function $f_X(x)$.

The sample mean is

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

This might be something as simple as the average of the heights of everyone in the room. Often we aren't really interested in the average of the particular $N$ entries, but rather what it tells us about some "true" mean of a large population of people. Or we might want to compare heights of one sample with another. E.g., are astronomers taller on average than, say, engineers? (Hypothesis testing.)

Frequentist view:

The sample mean $\bar{x}$ is considered to be the average of a particular realization of the data values. If we had a different set of people we would get a different but statistically similar $\bar{x}$. The underlying notion is that there is an infinite ensemble of realizations, and if we repeated an experiment ("obtain N height values") enough times (possibly infinitely many), we would learn about the ensemble average. The name 'frequentist' is given because the frequency of occurrence of $\bar{x}$ among realizations is an estimate of the ensemble average (with caveats).

The ensemble is described by the probability density function (PDF) $f_X(x)$, which is normalized to unity:

$$\int dx\, f_X(x) = 1.$$

The cumulative distribution function (CDF) is the integral

$$F_X(x) = \int_{-\infty}^{x} dx'\, f_X(x'),$$

which ranges between 0 and 1.

The ensemble average of any of the $x_i$ is

$$\langle x \rangle = \int dx\, x\, f_X(x).$$

How does the sample mean relate to the ensemble average? In the following we will also need the second moment

$$\langle x^2 \rangle = \int dx\, x^2 f_X(x),$$

from which we have the variance

$$\mathrm{Var}(x) = \langle x^2 \rangle - \langle x \rangle^2.$$

Using those definitions we can calculate the ensemble mean and variance of the sample mean $\bar{x}$. We will also use the variance of an individual measurement,

$$\mathrm{Var}(x) \equiv \sigma_X^2 = \int dx\, (x - \langle x \rangle)^2 f_X(x).$$

The ensemble mean of the sample mean is

$$\langle \bar{x} \rangle = \frac{1}{N}\sum_{i=1}^{N} \langle x_i \rangle = \langle x \rangle$$

(you can show this). We also want to know the variance of $\bar{x}$; it can be shown that

$$\mathrm{Var}(\bar{x}) \equiv \sigma_{\bar{x}}^2 = \frac{\sigma_X^2}{N}.$$

What this says is that the standard deviation of the sample mean is $1/\sqrt{N}$ smaller than the standard deviation of an individual data point. We have rediscovered the ubiquitous "$1/\sqrt{N}$" law!
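A minimal sketch (added here, not a cell from the original notebook) that checks the $1/\sqrt{N}$ law by brute force: draw many realizations of $N = 10$ Gaussian measurements and compare the spread of the sample means with $\sigma/\sqrt{N}$. The ensemble size and the values of $N$, $\mu$, and $\sigma$ are arbitrary choices for illustration.

import numpy as np

N, mu, sigma = 10, 1.2, 1.0
realizations = sigma * np.random.randn(100000, N) + mu   # 100000 realizations of N measurements
sample_means = realizations.mean(axis=1)                 # one sample mean per realization
print(sample_means.std(), sigma / np.sqrt(N))            # the two numbers should nearly agree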

Bayesian view:

A Bayesian says, basically, you only have one data set (the particular $N$ data points), so live with it and figure out what your knowledge is about the ensemble average. That is, the Bayesian approach deals with probability as a statement of what you know about a parameter, not as a frequency of occurrence over repeated experiments. This may seem like a subtle difference, but in practice, while the frequentist approach provides point estimates like $\bar{x}$, the Bayesian approach gives a PDF for the "true mean" $\mu$. This same difference applies to much more complicated situations of model fitting and hypothesis testing.

Thus, we now use the sample mean to infer knowledge about the true mean. The fundamental equation for Bayesian inference is based on (not surprisingly) Bayes' theorem, which can be derived from conditional probabilities. Consider events A and B. The conditional probability that B occurs given that A occurs is

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

We also have (inverting A and B)

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$

which means

$$P(B \mid A)\,P(A) = P(A \mid B)\,P(B)$$

or

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$

OK, now let's get back to the sample mean and the true mean. We want to know $\mu$ given that we have data that give us $\bar{x}$. So we make the following assignments:

B = the true mean = $\mu$
A = 'data' = $\bar{x}$

So now we have

$$P(\mu \mid \bar{x}) = \frac{P(\mu)\,P(\bar{x} \mid \mu)}{P(\bar{x})}.$$

To be useful we need to say what we mean by the various probabilities:

The left-hand side is the posterior probability of $\mu$ given the data.

$P(\mu)$ is the prior probability of $\mu$ (i.e. what we knew about it before acquiring any data; maybe we knew nothing, or maybe we know that peoples' heights are bracketed and so there are constraints on $\mu$ that can be made).

$P(\bar{x} \mid \mu)$ is the probability of having gotten the data given some value of the parameter $\mu$. It is an assumption in the setup that all the data derive from the same PDF with true mean $\mu$.

The denominator of the right-hand side is the probability of the data given all possible values of $\mu$. That's a bit confusing. What we really need is for the probabilities to all fall between 0 and 1, so the denominator in this context is really a normalization.

Now, we extend our point of view so that the entries in Bayes' theorem can be probability density functions. I prefer to use notation like $f_X(x)$ or $f(\mu \mid \bar{x})$ for PDFs, but the Bayesian literature typically uses $P$ regardless of whether it means a probability or a probability density function. So for the problem at hand we could use PDFs as

$$f(\mu \mid \bar{x}) = \frac{f(\mu)\, f(\bar{x} \mid \mu)}{\int d\mu\, f(\mu)\, f(\bar{x} \mid \mu)}.$$

Usually the PDF of the data on the right-hand side is called the likelihood function, so we will rename it as $\mathcal{L}(\mu) = f(\bar{x} \mid \mu)$, giving

$$f(\mu \mid \bar{x}) = \frac{f(\mu)\, \mathcal{L}(\mu)}{\int d\mu\, f(\mu)\, \mathcal{L}(\mu)}.$$

Thus we have a plausible expression that says the posterior PDF of the true mean is given by the product of its PDF prior to acquiring data multiplied by the likelihood function, which includes data that presumably (hopefully!) increase our knowledge about $\mu$.

A particular case is where each data point is distributed with a Gaussian PDF

$$f_X(x) = (2\pi\sigma^2)^{-1/2}\, e^{-(x - \mu)^2/2\sigma^2}.$$
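To make the roles of the prior, the likelihood, and the normalizing denominator concrete, here is a small illustrative sketch (added here, not from the original notebook; the prior width, measurement value, and grid are arbitrary assumptions). It computes the posterior for $\mu$ from a single Gaussian measurement $x_1$ combined with a Gaussian prior, and shows that the denominator simply rescales the product so that it integrates to 1.

import numpy as np

x1, sigma = 1.0, 1.0                                 # one measurement with an assumed noise level
mugrid = np.arange(-5.0, 5.0, 0.01)
dmu = mugrid[1] - mugrid[0]

prior = np.exp(-mugrid**2 / (2 * 2.0**2))            # f(mu): Gaussian prior centered at 0, width 2
like = np.exp(-(x1 - mugrid)**2 / (2 * sigma**2))    # f(x1 | mu): likelihood of the single point
post = prior * like
post /= post.sum() * dmu                             # divide by the evidence = normalization

print(post.sum() * dmu)                              # integrates to 1 by construction
print(mugrid[post.argmax()])                         # the peak lies between the prior center and x1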

Frequentist_Bayesian_Eample () = f ( ) = (2πσ = (2πσ ) 1/2 ( e ) 2 /2σ 2 ) /2 e =1 We can manipulate the eponent of the last epression by using = ( ) + ( ) which then gives the likelihood as () = (2πσ) /2 e σˆ2 /2σ 2 ( e 2 )/2σ 2 where the sample variance is ) 2 σˆ2 = 1 (. ( /2 ) 2 σ 2 Only the factor involving matters because normalization of the right hand side of the posterior PDF causes other factors to cancel. We then have ( ) 2 /2σ 2 f( ) f () e ote that the data appear in the likelihood function only via. ote also that the posterior PDF gives us a functional form for the PDF of, as opposed to the point estimate from the frequentist approach. We can obtain a point estimate from the posteriod PDF by finding the maimum of. For a flat prior where we really don't know anything about before acquiring data, the maimum of the posterior PDF is at =. But the posterior also tells us that there is uncertainty about determined by and the number of data points: σ = σ/. The maimum likelihood estimate for is simply the value where maimizes. This is ust. f( ) σ = In [71]: %matplotlib inline from numpy import * import scipy import matplotlib import matplotlib.pyplot as plt import astropy from scipy import constants as spconstants from scipy.special import gamma randn = random.randn In [72]: = 10 mu = 1.2 sigma = 1 muvec = arange(0., 3, 0.01) vec = randn() + mu bar = vec.mean() posterior_flat_prior = ep(-*(bar - muvec)**2/(2*sigma**2)) http://localhost:8888/nbconvert/html/frequentist_bayesian_eample.ipynb?download=false Page 4 of 6

In [73]:

print(N)
print(xbar)

10
1.10537446678

In [74]:

plt.plot(muvec, posterior_flat_prior)
plt.plot((xbar, xbar), (0., 1.), '--', label=r'$\overline{x}$')
plt.plot((mu, mu), (0., 1.), '--', label=r'$\rm \mu = true \ mean$')
plt.xlabel(r'$\mu$', fontsize=18)
plt.ylabel(r'$\rm \propto \ posterior \ PDF \ of \ \mu$', fontsize=18)
plt.title('N = %d samples' % (N))
plt.legend(loc=1)
plt.show()

Note that:

- For small $N$, the sample mean and the true mean differ substantially.
- In the frequentist view we expect the typical difference to be $\sigma/\sqrt{N}$. For $N = 10$, this is about a 30% error.
- In the Bayesian approach, the width of the posterior PDF reflects this error.

For either approach one can infer the same thing about $\mu$:

- Frequentist: $\mu = \bar{x} \pm \sigma_{\bar{x}} = \bar{x} \pm \sigma/\sqrt{N}$.
- Bayesian: our knowledge of $\mu$ is contained in the posterior PDF, which can be integrated to give the CDF, from which we can establish a confidence interval for $\mu$ such as $\mu = \bar{x}^{\,+\delta_+}_{\,-\delta_-}$ (a sketch of this follows below).
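A minimal sketch of that integration (added here, not a cell from the original notebook; it assumes muvec, posterior_flat_prior, xbar, sigma, and N from the cells above are in scope): normalize the flat-prior posterior on the grid, build the CDF, and read off a 68% credible interval to compare with $\bar{x} \pm \sigma/\sqrt{N}$.

dmu = muvec[1] - muvec[0]
post = posterior_flat_prior / (posterior_flat_prior.sum() * dmu)   # normalize to unit area
cdf = post.cumsum() * dmu                                          # CDF on the grid
lo = muvec[searchsorted(cdf, 0.16)]                                # 16th percentile
hi = muvec[searchsorted(cdf, 0.84)]                                # 84th percentile
print(lo, hi)                                                      # 68% credible interval for mu
print(xbar - sigma/sqrt(N), xbar + sigma/sqrt(N))                  # frequentist 1-sigma interval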

Extensions to higher dimensions

This is a simple one-dimensional case (one parameter, $\mu$) where we have pretended that we know $\sigma$ (for the individual $x_i$). More realistically, both parameters would be unknown. In that case the likelihood function is

$$\mathcal{L}(\mu, \sigma) = (2\pi\sigma^2)^{-N/2}\, e^{-N\hat{\sigma}^2/2\sigma^2}\, e^{-N(\bar{x} - \mu)^2/2\sigma^2}$$

and the posterior PDF for both parameters is

$$f(\mu, \sigma \mid \bar{x}, \hat{\sigma}) \propto f(\mu, \sigma)\, \sigma^{-N}\, e^{-N\hat{\sigma}^2/2\sigma^2}\, e^{-N(\bar{x} - \mu)^2/2\sigma^2}.$$

Now we have a two-dimensional PDF from which to make our conclusions.

Real-world problems can extend to hundreds of parameters. Navigating the posterior PDF to make conclusions is a big challenge. That is why methods like simulated annealing and Markov Chain Monte Carlo (MCMC) have been developed.
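To illustrate, here is a minimal Metropolis MCMC sketch (an added example, not part of the original notebook) that samples the two-parameter posterior above for a simulated data set, assuming flat priors on $\mu$ and on $\sigma > 0$; the step size, chain length, and burn-in are arbitrary choices.

import numpy as np

np.random.seed(0)
N = 10
xvec = np.random.randn(N) + 1.2                      # simulated data with true mean 1.2, sigma 1
xbar = xvec.mean()
sighat2 = ((xvec - xbar)**2).mean()                  # sample variance (1/N convention)

def log_post(m, s):
    # log of f(mu, sigma | data) up to a constant, with a flat prior restricted to sigma > 0
    if s <= 0:
        return -np.inf
    return -N*np.log(s) - N*(sighat2 + (xbar - m)**2)/(2*s**2)

mu_c, sig_c = xbar, np.sqrt(sighat2)                 # start the chain at the sample values
chain = []
for _ in range(20000):
    mu_p = mu_c + 0.2*np.random.randn()              # symmetric Gaussian proposal in mu
    sig_p = sig_c + 0.2*np.random.randn()            # and in sigma
    if np.log(np.random.rand()) < log_post(mu_p, sig_p) - log_post(mu_c, sig_c):
        mu_c, sig_c = mu_p, sig_p                    # accept; otherwise keep the current point
    chain.append((mu_c, sig_c))

mu_s, sig_s = np.array(chain[2000:]).T               # discard burn-in, then summarize
print(mu_s.mean(), mu_s.std())                       # posterior mean and width for mu
print(sig_s.mean())                                  # posterior mean for sigma

The spread of the $\mu$ samples should be comparable to the $\sigma/\sqrt{N}$ width found in the one-parameter case, and the same chain also yields the marginal posterior for $\sigma$.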