Basics of Inference. Lecture 21: Bayesian Inference


Basics of Inference
Lecture 21
Sta230 / Mth230
Colin Rundel
April 16, 2014

Up until this point in the class you have almost exclusively been presented with problems where we are using a probability model whose parameters are given. In the real world this almost never happens. A much more common situation is that you have collected some data and have an idea about what type of probability model might be appropriate, but you do not know (or only have a guess / belief about) the values of the model parameters.

Basic setup:
x - an observed data point
θ - parameter (or vector of parameters) of the distribution producing the data points
X - set of observed data points

Sta230 / Mth230 (Colin Rundel) Lecture 21 April 16, 2014 1 / 21

Review - Example - Defective Parts

Suppose that a certain machine produces defective and nondefective parts, but we do not know what proportion of defectives we would find among all parts that could be produced by the machine. The distribution of X, assuming that we know P = p, is the binomial distribution with parameters n and p. Given no other information we might believe that P has a continuous distribution with a pdf such as f_P(p) = 1 for p in (0, 1). What is the joint probability f(x, p)? What is the marginal distribution of X?

$$ f(x \mid p) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n $$

$$ f(x, p) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \text{ and } 0 \le p \le 1 $$

$$ f_X(x) = \int_0^1 f(x, p)\, dp = \binom{n}{x} \int_0^1 p^{x} (1-p)^{n-x}\, dp = \binom{n}{x} B(x+1,\, n-x+1) = \binom{n}{x} \frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)} $$

Review - Example - Defective Parts, cont.

Based on the preceding results, what is the conditional distribution of P given X = 5 and n = 10?

$$ f(p \mid x) = \frac{f(x, p)}{f_X(x)} = \frac{\binom{n}{x} p^{x} (1-p)^{n-x}}{\binom{n}{x} \frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)}} = \frac{\Gamma(n+2)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^{x} (1-p)^{n-x}, \quad 0 \le p \le 1 $$

This is a Beta distribution with parameters α = x + 1 and β = n - x + 1. Therefore, if X = 5 and n = 10, then

$$ f(p \mid x = 5, n = 10) \sim \text{Beta}(6, 6), \qquad E(P \mid X = 5, n = 10) = \frac{6}{6 + 6} = \frac{1}{2} $$
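The marginal distribution of X found above simplifies further, which makes a quick sanity check on the derivation. The worked step below is an addition to the slides; it uses only the identity Γ(k + 1) = k! for non-negative integers k and the Beta mean α / (α + β).

```latex
\begin{aligned}
f_X(x) &= \binom{n}{x}\,\frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)}
        = \frac{n!}{x!\,(n-x)!}\cdot\frac{x!\,(n-x)!}{(n+1)!}
        = \frac{1}{n+1}, \qquad x = 0, 1, \ldots, n, \\[4pt]
E(P \mid X = x) &= \frac{\alpha}{\alpha+\beta} = \frac{x+1}{n+2}.
\end{aligned}
```

So under the uniform prior every count from 0 to n is equally likely before any data are seen (probability 1/11 each when n = 10), and the posterior mean for x = 5, n = 10 is 6/12 = 1/2, in agreement with the Beta(6, 6) result.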

Review - Example - Defective Parts, cont.

[Figure: posterior densities f(p | x) plotted against p in four panels: x = 5, n = 10 gives Beta(6, 6); x = 1, n = 10 gives Beta(2, 10); x = 50, n = 100 gives Beta(51, 51); x = 9, n = 10 gives Beta(10, 2).]

As you might expect, this approach to inference is based on Bayes' Theorem, which states

$$ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} $$

We are interested in estimating the model parameters based on the observed data and any prior belief about the parameters, which we set up as follows:

$$ P(\theta \mid X) = \frac{P(X \mid \theta)}{P(X)}\, \pi(\theta) \propto P(X \mid \theta)\, \pi(\theta) $$

Terminology

Elements of the Bayesian Model:
π(θ) - Prior distribution - This distribution reflects any preexisting information / belief about the distribution of the parameter(s).
P(X | θ) - Likelihood / Sampling distribution - Distribution of the data given the parameters, which is the probability model believed to have generated the data.
P(X) - Marginal distribution of the data - Distribution of the observed data marginalized over all possible values of the parameter(s).
P(θ | X) - Posterior distribution - Distribution of the parameter(s) after taking the observed data into account.

Example - Defective Parts, in Bayesian Terms

For the Defective Parts example we found the joint, marginal and conditional distributions. In terms of Bayesian inference:
Data - X - Number of defective parts
Parameters - p - Proportion of parts that are defective
Prior distribution - π(p) = 1, for p in (0, 1)
Likelihood / Sampling distribution - $f(x \mid p) = \binom{n}{x} p^{x} (1-p)^{n-x}$
Marginal distribution of the data - $f_X(x) = \binom{n}{x} \frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)}$
Posterior distribution - $f(p \mid x) = \frac{\Gamma(n+2)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^{x} (1-p)^{n-x}$, for $0 \le p \le 1$
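To see how the four elements of the Bayesian model fit together numerically, the sketch below approximates the posterior on a grid: multiply likelihood by prior, integrate to get the marginal, and divide. It is an illustration only, not something from the slides; the function names, the grid size, and the use of a midpoint rule are all assumptions made for the example.

```python
import math

def grid_posterior(x, n, prior, m=2000):
    """Approximate the posterior of p on a midpoint grid over (0, 1)."""
    width = 1.0 / m
    grid = [(i + 0.5) * width for i in range(m)]
    # Likelihood f(x | p): Binomial(n, p) pmf evaluated at the observed x.
    like = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for p in grid]
    # Unnormalized posterior: likelihood times prior density.
    unnorm = [l * prior(p) for l, p in zip(like, grid)]
    # Marginal f_X(x): integrate the joint density over p (midpoint rule).
    marginal = sum(unnorm) * width
    post = [u / marginal for u in unnorm]
    return grid, post, marginal

def beta_pdf(p, a, b):
    """Beta(a, b) density, computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

# Uniform prior, 5 defectives in 10 parts: the grid result should track Beta(6, 6).
grid, post, marginal = grid_posterior(5, 10, prior=lambda p: 1.0)
max_err = max(abs(d - beta_pdf(p, 6, 6)) for p, d in zip(grid, post))
print(f"marginal f_X(5) ~ {marginal:.5f}")
print(f"max |grid posterior - Beta(6, 6) pdf| = {max_err:.2e}")
```

Because `prior` is just a function of p, the same sketch works for any prior density on (0, 1), which is why grid approximation is a common first check before reaching for analytic or sampling-based answers.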

Example - Defective Parts, Redux

When we last worked through this problem I claimed that, since we didn't know the proportion of defective parts, we should use a uniform prior (all values between 0 and 1 equally likely). What could we do if we believed that the proportion was close to 0?

Remember that Unif(0, 1) is a special case of the Beta distribution where α = 1 and β = 1. We can try tweaking α and β to better represent this belief.

Example - Defective Parts, Redux

Let's find the posterior distribution of p for a prior p ∼ Beta(α, β).

Example - Defective Parts, Redux

Consequently, if we a priori believed that the proportion of defective parts was close to zero, we might use a Beta(1, 3) prior, which would give us the following posteriors for 1, 5, or 9 defective parts in 10.

[Figure: posterior densities f(p | x) plotted against p in three panels: P | X = 1, N = 10 ∼ Beta(2, 12); P | X = 5, N = 10 ∼ Beta(6, 8); P | X = 9, N = 10 ∼ Beta(10, 4).]

Conjugate Distributions / Priors

In the case of a Binomial likelihood we have just seen that any Beta prior we pick will result in a posterior that is also a Beta distribution. For a particular likelihood, when the prior and the posterior belong to the same distribution family, the prior is referred to as a conjugate prior. In this case the Beta distribution is a conjugate prior for the Binomial likelihood.

Conjugate priors are immensely useful as they provide a simple analytic solution to this type of inference problem, but they are also somewhat limiting since our prior belief may not be representable using the conjugate family's parameterization.
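The three posteriors quoted above follow from multiplying the Beta(α, β) prior kernel p^(α-1) (1-p)^(β-1) by the Binomial likelihood kernel p^x (1-p)^(n-x), which gives a Beta(α + x, β + n - x) posterior. A minimal sketch of that update rule follows; the function names are illustrative and not from the slides.

```python
def beta_binomial_update(alpha, beta, x, n):
    """Return posterior (alpha, beta) after observing x defectives in n parts."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    # Defectives add to alpha, non-defectives add to beta.
    return alpha + x, beta + (n - x)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1, 3)  # the Beta(1, 3) prior from the example above
for x in (1, 5, 9):
    a, b = beta_binomial_update(*prior, x, 10)
    print(f"x = {x}: Beta({a}, {b}), posterior mean = {beta_mean(a, b):.3f}")
```

The posterior means (2/14, 6/14, 10/14) all sit slightly below the corresponding means under the uniform prior (2/12, 6/12, 10/12), which is the effect of a prior that favors small p.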

Binomial and a Non-conjugate Prior

Let's consider a situation where we do not use a Beta prior, and instead opt for a truncated Normal distribution on (0, 1). What do we do then?

This kind of situation happens all the time in Bayesian inference: we set up a model which results in a (seemingly) intractable posterior distribution.

Instead of an analytic solution we make use of numerical Monte Carlo methods to generate samples from the distribution, which can be used to estimate the distribution and its properties.

These methods are effective but computationally intensive. This is the reason why Bayesian methods have become popular in the last 30 years, as sufficient computational power has become available to make use of them.

More on this if you take Sta 250 or 360.

Sequential Updates

Example - Defective Parts - Sequential Updates

We have already shown that if we have a Beta(1, 1) prior on the proportion of defective parts and we observe that 5 of 10 parts are defective, then we would have a Beta(6, 6) posterior for the proportion. If we were to then inspect 10 more parts and found that 5 were defective, how should we update our posterior?

If we consider this as two iid data points (x_1, x_2), there are two options:

Take both into account at the same time when calculating the posterior:

$$ f(p \mid x) = \frac{f(x \mid p)}{f_X(x)}\, \pi(p) = \frac{f(x_1 \mid p)\, f(x_2 \mid p)}{f_X(x)}\, \pi(p) \propto p^{5}(1-p)^{5} \cdot p^{5}(1-p)^{5} = p^{10}(1-p)^{10}, \quad \text{so } p \mid x \sim \text{Beta}(11, 11) $$

First update the prior using x_1 and then use f(p | x_1) as the prior when updating using x_2 (normalizing constants that do not depend on p are dropped):

$$ f(p \mid x_2, x_1) \propto f(x_2 \mid p)\, f(p \mid x_1) \propto f(x_2 \mid p)\, f(x_1 \mid p)\, \pi(p) \propto p^{5}(1-p)^{5} \cdot p^{5}(1-p)^{5}, \quad \text{so } p \mid x_1, x_2 \sim \text{Beta}(11, 11) $$

Both routes give the same posterior.
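For the non-conjugate case described above (Binomial likelihood with a truncated Normal prior on (0, 1)), one of the simplest Monte Carlo methods is rejection sampling: propose p from the prior and accept it with probability proportional to the likelihood. The slides do not say which sampling method the course has in mind, so this is only an illustrative sketch. The prior location and scale (0.2 and 0.1), the data (5 defectives in 10), the sample size, and all function names are hypothetical choices made for the example.

```python
import random

def sample_truncated_normal(rng, mu, sigma):
    """Draw from Normal(mu, sigma^2) restricted to (0, 1) by redrawing."""
    while True:
        p = rng.gauss(mu, sigma)
        if 0.0 < p < 1.0:
            return p

def sample_posterior(x, n, mu, sigma, size, seed=0):
    """Rejection sampler: propose from the prior, accept in proportion to the likelihood."""
    rng = random.Random(seed)
    p_hat = x / n  # the Binomial likelihood is maximized at p = x / n
    max_like = p_hat**x * (1 - p_hat) ** (n - x)
    draws = []
    while len(draws) < size:
        p = sample_truncated_normal(rng, mu, sigma)
        # The binomial coefficient cancels in the ratio, so it is omitted.
        like = p**x * (1 - p) ** (n - x)
        if rng.random() < like / max_like:
            draws.append(p)
    return draws

# Hypothetical prior belief: p is near 0.2. Data: 5 defectives in 10 parts.
draws = sample_posterior(x=5, n=10, mu=0.2, sigma=0.1, size=5000)
post_mean = sum(draws) / len(draws)
print(f"Monte Carlo posterior mean of p: {post_mean:.3f}")
```

The accepted draws are samples from the posterior, so any summary (mean, quantiles, a histogram) can be estimated from them. Rejection sampling becomes inefficient when the prior and likelihood disagree strongly, which is one reason more elaborate methods are used in practice.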

Example - Defective Parts - k lots

We can generalize our results to k lots with different lot sizes. Let X_1, ..., X_k be the number of defective parts in each lot (which are independent) and n_1, ..., n_k the number of parts examined in each lot. What is the posterior distribution of p for a prior p ∼ Beta(α, β)?

Example - Exponential Distribution

Let X be the lifespan of a fluorescent lamp, which is modeled by an exponential distribution with parameter λ, where our prior belief on λ is given by a Gamma distribution with parameters k and θ. If the failures of the lamps are independent and we observe the lifespans of n lamps (x_1, ..., x_n), what should our posterior distribution for λ be?

Likelihood of Multiple Normal Data Points

If we are collecting data from a process that follows a normal distribution with mean µ and variance σ², and where each observation is iid, what is the likelihood of n of these observations (x_1, x_2, ..., x_n)?

Conjugate Prior for the Normal Distribution

Let's consider a Normal distribution with mean µ and variance σ², and assume that σ² is known but µ is not. What is the posterior distribution of µ if the prior is µ ∼ N(λ, τ²)?
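The questions posed on these slides all have closed-form conjugate answers, and the sketch below collects the standard results in code. It rests on assumptions the slides do not spell out: λ is taken to be the rate of the exponential distribution, the Gamma prior is taken to be parameterized by shape k and scale θ (if θ is a rate instead, replace 1/θ by θ), and σ² is treated as known. All function names are illustrative.

```python
def beta_update_lots(alpha, beta, lots):
    """Beta(alpha, beta) prior on p; lots is a list of (x_i, n_i) pairs.

    Updating lot by lot gives the same answer as pooling all lots,
    because each lot only adds its counts to the parameters.
    """
    for x, n in lots:
        alpha, beta = alpha + x, beta + (n - x)
    return alpha, beta

def gamma_exponential_update(k, theta, lifespans):
    """Gamma(shape k, scale theta) prior on the Exponential rate lambda.

    Returns the posterior (shape, scale).
    """
    shape = k + len(lifespans)
    rate = 1.0 / theta + sum(lifespans)
    return shape, 1.0 / rate

def normal_known_variance_update(lam, tau2, sigma2, data):
    """N(lam, tau2) prior on mu; data iid N(mu, sigma2) with sigma2 known.

    Returns the posterior (mean, variance) of mu.
    """
    n = len(data)
    precision = 1.0 / tau2 + n / sigma2  # precisions (inverse variances) add
    mean = (lam / tau2 + sum(data) / sigma2) / precision
    return mean, 1.0 / precision

# Two lots of 10 parts with 5 defectives each, starting from Beta(1, 1).
print(beta_update_lots(1, 1, [(5, 10), (5, 10)]))
# Made-up lifespans and prior values, for illustration only.
print(gamma_exponential_update(2.0, 0.5, [1.0, 3.0]))
print(normal_known_variance_update(0.0, 1.0, 1.0, [2.0, 4.0]))
```

In each case the posterior parameters are the prior parameters plus a summary of the data (counts, a sum of lifespans, or a precision-weighted sum of observations). For the Normal case the posterior mean is a precision-weighted average of the prior mean λ and the sample mean.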

Conjugate Prior for the Normal Distribution, cont.

Where to go from here?

Hierarchical Models:

[Diagram: a hierarchy with Θ at the top, the parameters θ_11, θ_21, θ_31 below it, and the observations y_11, y_12, y_13, y_21, y_22, y_23, y_31, y_32, y_33 at the bottom.]