Classical and Bayesian inference

AMS 132. Claudia Wehrhahn (UCSC). January 8.

The Prior Distribution

Definition. Suppose that one has a statistical model with parameter θ. If one treats θ as random, then the distribution that one assigns to θ before observing the other random variables of interest is called its prior distribution. If the parameter space is at most countable, the prior distribution is discrete and its probability function (p.f.) is called the prior p.f. of θ. If the prior distribution is continuous, its probability density function (p.d.f.) is called the prior p.d.f. of θ. The prior p.f. or p.d.f. will be denoted by ξ(θ).

When θ is random, the prior distribution is simply another name for the marginal distribution of the parameter. The prior distribution of a parameter θ must be a probability distribution over the parameter space Ω. We assume that, before the experimental data have been collected or observed, the experimenter's past experience and knowledge lead them to believe that θ is more likely to lie in certain regions of Ω than in others.

The Prior Distribution

Example (Fair or Two-Headed Coin). Consider the experiment of tossing a coin, and let θ denote the probability of obtaining a head in each trial.
a) Describe the statistical model.
b) Define a prior distribution for θ (See R).
c) Suppose that it is known that the coin either is fair or has a head on each side. Write the prior distribution for θ when the prior probability that the coin is fair is 0.8.
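The course's R code is not reproduced here; the following is a minimal sketch of part (c). It assumes the parameter space for that part is {1/2, 1}, with prior probability 0.8 on θ = 1/2 (fair coin) and 0.2 on θ = 1 (two-headed coin); the function name xi is purely illustrative.

```r
# Prior p.f. for the fair-or-two-headed coin, part (c): a sketch, not the course's R file.
# Parameter space: theta in {1/2, 1}; xi(1/2) = 0.8, xi(1) = 0.2, and 0 elsewhere.
xi <- function(theta) {
  ifelse(theta == 0.5, 0.8,
         ifelse(theta == 1, 0.2, 0))
}

xi(c(0.5, 1, 0.3))    # 0.8 0.2 0.0
sum(xi(c(0.5, 1)))    # the prior p.f. sums to 1 over the parameter space
```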

The Prior Distribution

Some terminology. In what follows we consider statistical problems in which parameters, denoted θ, are random variables of interest; θ can denote one or more parameters. When θ is random, f(x | θ) is the conditional p.f. or p.d.f. of an observation X given θ. When θ is random, we say that X_1, ..., X_n are conditionally i.i.d. random variables given θ with conditional p.f. or p.d.f. f(x | θ). This is denoted by

X_i | θ ~ i.i.d. F(· | θ),

where F is the conditional distribution of each X_i. The statement X_i | θ ~ i.i.d. F(· | θ) will be understood as equivalent to saying that X_1, ..., X_n form a random sample with p.f. or p.d.f. f(x | θ).

Sensitivity analysis

A sensitivity analysis compares the posterior distributions that arise from several different prior distributions, in order to see how much effect the choice of prior has on the answers to the important questions. If there are a lot of data, then disagreement among experimenters about the prior distribution becomes less important. Likewise, if the prior distributions being compared are all very spread out, it will not matter much which one is specified.

Improper priors

There are calculations available that attempt to capture the idea that the data contain much more information than is available a priori. These calculations result in a prior "distribution" for θ such that

∫_Ω ξ(θ) dθ = ∞,

which violates the definition of a prior distribution. Such priors are called improper priors.
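The slides give no example, but a standard illustration (not from the deck) is the flat prior on a rate parameter: take Ω = (0, ∞) and ξ(θ) = 1 for all θ > 0. Then

∫_Ω ξ(θ) dθ = ∫_0^∞ 1 dθ = ∞,

so ξ is not a p.d.f. An improper prior can still yield a proper posterior: multiplying this ξ by an exponential-sample likelihood gives something proportional to θ^n e^(−θ Σ x_i), which is, up to normalization, a Gamma(n + 1, Σ x_i) density and therefore integrates to a finite constant.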

The Posterior Distribution

Definition (Posterior Distribution). Consider a statistical inference problem with parameter θ and random variables X_1, ..., X_n to be observed. The conditional distribution of θ given X_1, ..., X_n is called the posterior distribution of θ. The conditional p.f. or p.d.f. of θ given X_1 = x_1, ..., X_n = x_n is called the posterior p.f. or p.d.f. of θ and is denoted ξ(θ | x_1, ..., x_n).

When parameters are random, the posterior distribution is simply another name for the conditional distribution of the parameter given the data. Bayes' theorem tells us how to compute the posterior p.f. or p.d.f. of θ after observing the data.

The Posterior Distribution

Theorem (Posterior Distribution). Suppose that the n random variables X_1, ..., X_n form a random sample from a distribution for which the p.d.f. or p.f. is f(x | θ). Suppose also that the value of the parameter θ is unknown and the prior p.d.f. or p.f. of θ is ξ(θ). Then the posterior p.d.f. or p.f. of θ is

ξ(θ | x_1, ..., x_n) = [ ∏_{i=1}^{n} f(x_i | θ) ] ξ(θ) / g_n(x_1, ..., x_n), for θ ∈ Ω,

where g_n is the marginal joint p.d.f. or p.f. of X_1, ..., X_n. Proof: ...

Using vector notation, with x = (x_1, ..., x_n),

ξ(θ | x) = [ ∏_{i=1}^{n} f(x_i | θ) ] ξ(θ) / g_n(x).

The Posterior Distribution

Example (Lifetime of components). Suppose that the distribution of the lifetimes of certain components is the exponential distribution with parameter θ, and that the prior distribution of θ is the gamma distribution with parameters a and b. Suppose also that the lifetimes x_1 = 0.92, x_2 = 0.72, x_3 = 1.27, x_4 = 0.69, x_5 = 1.97 of a random sample of n = 5 components are observed.
a) Find the posterior distribution of θ.
b) Plot the prior and posterior distributions of θ when a = 5, b = 1, and when a = 1, b = 0.2 (See R).
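The course's R file is not included here, but the sketch below reproduces part (b) under the usual conjugate update, assuming the gamma prior has shape a and rate b; with that convention the posterior is Gamma(a + n, b + Σ x_i). If the scale parameterization is intended instead, the update changes accordingly.

```r
# Lifetime example: X_i | theta ~ Exponential(theta), theta ~ Gamma(a, b) with rate b.
# A minimal sketch (not the course's R code); posterior is Gamma(a + n, b + sum(x_i)).
lifetimes <- c(0.92, 0.72, 1.27, 0.69, 1.97)
n <- length(lifetimes)
s <- sum(lifetimes)    # 5.57

plot_prior_post <- function(a, b) {
  theta <- seq(0.01, 4, length.out = 400)
  prior <- dgamma(theta, shape = a, rate = b)
  post  <- dgamma(theta, shape = a + n, rate = b + s)
  matplot(theta, cbind(prior, post), type = "l", lty = c(2, 1), col = 1,
          xlab = expression(theta), ylab = "density",
          main = paste0("Gamma(", a, ", ", b, ") prior"))
  legend("topright", c("prior", "posterior"), lty = c(2, 1))
}

plot_prior_post(a = 5, b = 1)
plot_prior_post(a = 1, b = 0.2)
```

Comparing the two resulting plots is itself a small sensitivity analysis of the kind described earlier.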

The Likelihood Function

The posterior p.f. or p.d.f. of θ can be written as

ξ(θ | x) ∝ f_n(x | θ) ξ(θ),

where ∝ is the proportionality symbol. The normalizing constant can be determined at any time.

Definition. When the joint p.d.f. or the joint p.f. f_n(x | θ) of the observations in a random sample is regarded as a function of θ for given values of x_1, ..., x_n, it is called the likelihood function.

Example (Lifetime of components, cont.). Plot the likelihood function (See R). Find the posterior distribution of θ (See R).
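As a further sketch (again not the course's R file): for the exponential lifetimes the likelihood is f_n(x | θ) = θ^n exp(−θ Σ x_i), and the normalizing constant in ξ(θ | x) ∝ f_n(x | θ) ξ(θ) can be recovered numerically whenever it is needed.

```r
# Likelihood and unnormalized posterior for the lifetime data (sketch).
lifetimes <- c(0.92, 0.72, 1.27, 0.69, 1.97)
n <- length(lifetimes)
s <- sum(lifetimes)

lik <- function(theta) theta^n * exp(-theta * s)   # f_n(x | theta) as a function of theta
curve(lik(x), from = 0, to = 4,
      xlab = expression(theta), ylab = "likelihood")

# Unnormalized posterior with an (assumed) Gamma(a, b) prior, a = 5, b = 1,
# normalized numerically; it coincides with the Gamma(a + n, b + s) posterior.
a <- 5; b <- 1
unnorm <- function(theta) lik(theta) * dgamma(theta, shape = a, rate = b)
const  <- integrate(unnorm, lower = 0, upper = Inf)$value
curve(unnorm(x) / const, from = 0, to = 4,
      xlab = expression(theta), ylab = "posterior density")
```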

Sequential Observations and Prediction

In many experiments the observations X_1, ..., X_n that form the random sample must be obtained sequentially, that is, one at a time. In that case it can be shown that

ξ(θ | x) ∝ f(x_n | θ) ξ(θ | x_1, ..., x_{n−1}).

Alternatively, after all n values x_1, ..., x_n have been observed, we could calculate the posterior p.d.f. in the usual way, and the posterior distribution would be the same (check it as an exercise!). When the observations are sampled sequentially, the proportionality constant has a useful interpretation: it equals 1 over the conditional p.d.f. or p.f. of X_n given X_1 = x_1, ..., X_{n−1} = x_{n−1}.

We can also find the p.d.f. or p.f. of a new random variable X_{n+1} given that we have observed X_1 = x_1, ..., X_n = x_n.

Example (Lifetime of components, cont.). Let X_6 be the lifetime of a new component. Compute P(X_6 > 1.5 | x_1, ..., x_5) (See R).
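One last sketch, under the same assumed shape/rate Gamma(a, b) prior with a = 5 and b = 1: the predictive probability is

P(X_6 > 1.5 | x_1, ..., x_5) = ∫ P(X_6 > 1.5 | θ) ξ(θ | x) dθ = ∫ e^{−1.5 θ} ξ(θ | x) dθ,

which can be computed by numerical integration against the Gamma(a + n, b + s) posterior, or in closed form as ((b + s) / (b + s + 1.5))^{a + n}.

```r
# Predictive probability P(X6 > 1.5 | x1, ..., x5) for the lifetime example (sketch).
# Assumes the Gamma(a, b) prior with shape a = 5 and rate b = 1 from part (b).
lifetimes <- c(0.92, 0.72, 1.27, 0.69, 1.97)
n <- length(lifetimes)
s <- sum(lifetimes)
a <- 5; b <- 1

# Numerical integration of exp(-1.5 * theta) against the posterior density:
pred_numeric <- integrate(function(theta) exp(-1.5 * theta) * dgamma(theta, shape = a + n, rate = b + s),
                          lower = 0, upper = Inf)$value

# Closed form: the Gamma(a + n, b + s) moment generating function evaluated at -1.5.
pred_closed <- ((b + s) / (b + s + 1.5))^(a + n)

c(pred_numeric, pred_closed)   # both approximately 0.128
```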