Basics of Inference. Lecture 21: Bayesian Inference.

Basics of Inference
Lecture 21: Sta230 / Mth230
Colin Rundel
April 16, 2014

Up until this point in the class you have almost exclusively been presented with problems where we are using a probability model whose parameters are given. In the real world this almost never happens. A much more common situation is that you have collected some data and have an idea about what type of probability model might be appropriate, but you do not know (or only have a guess / belief about) the values of the model parameters.

Basic setup:
$x$ - an observed data point
$\theta$ - parameter (or vector of parameters) of the distribution producing the data points
$X$ - set of observed data points

Review - Example - Defective Parts
Suppose that a certain machine produces defective and nondefective parts, but we do not know what proportion of defectives we would find among all parts that could be produced by the machine. The distribution of $X$, assuming that we know $P = p$, is the binomial distribution with parameters $n$ and $p$. Given no other information we might believe that $P$ has a continuous distribution with pdf such as $f_P(p) = 1$ for $p \in (0, 1)$.

What is the joint probability $f(x, p)$? What is the marginal distribution of $X$?

$$f(x \mid p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$
$$f(x, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \text{ and } 0 \le p \le 1$$
$$f_X(x) = \int_0^1 f(x, p)\,dp = \binom{n}{x} \int_0^1 p^x (1-p)^{n-x}\,dp = \binom{n}{x} B(x+1, n-x+1) = \binom{n}{x} \frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)}$$

Review - Example - Defective Parts, cont.
Based on the preceding results, what is the conditional distribution of $P$ given $X = 5$ and $N = 10$?

$$f(p \mid x) = \frac{f(x, p)}{f_X(x)} = \frac{\binom{n}{x} p^x (1-p)^{n-x}}{\binom{n}{x} \frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)}} = \frac{\Gamma(n+2)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^x (1-p)^{n-x}, \quad 0 \le p \le 1$$

This is a Beta distribution with parameters $\alpha = x + 1$ and $\beta = n - x + 1$; therefore if $X = 5$ and $N = 10$ then

$$f(p \mid x = 5, n = 10) \sim \text{Beta}(6, 6), \qquad E(P \mid X = 5, N = 10) = \frac{6}{6 + 6} = \frac{1}{2}$$
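As a quick numerical sanity check of the formulas above, the following Python sketch (standard library only; the helper names are hypothetical, not from the lecture) evaluates $f_X(x)$ through log-gamma functions and reports the mean of the Beta$(x + 1, n - x + 1)$ posterior. It also makes one fact visible: under the uniform prior the marginal simplifies to $f_X(x) = 1/(n+1)$ for every $x$, because $\binom{n}{x}\frac{x!\,(n-x)!}{(n+1)!} = \frac{1}{n+1}$.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log B(a, b), using B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal_pmf(x: int, n: int) -> float:
    """f_X(x) = C(n, x) B(x + 1, n - x + 1): Binomial likelihood averaged over a Unif(0, 1) prior."""
    return math.comb(n, x) * math.exp(log_beta(x + 1, n - x + 1))

def posterior_params(x: int, n: int) -> tuple:
    """Posterior of P given X = x under the uniform prior: Beta(x + 1, n - x + 1)."""
    return (x + 1, n - x + 1)

def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

if __name__ == "__main__":
    n = 10
    print([round(marginal_pmf(x, n), 4) for x in range(n + 1)])  # every entry is 1/11
    a, b = posterior_params(5, n)
    print(a, b, beta_mean(a, b))  # Beta(6, 6) with mean 0.5
```

Running it prints eleven copies of 0.0909 (that is, 1/11) and then `6 6 0.5`.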

Review - Example - Defective Parts, cont.
[Figure: posterior densities f(p | x) for x = 5, n = 10, Beta(6, 6); x = 1, n = 10, Beta(2, 10); x = 50, n = 100, Beta(51, 51); x = 9, n = 10, Beta(10, 2)]

As you might expect, this approach to inference is based on Bayes' Theorem, which states

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

We are interested in estimating the model parameters based on the observed data and any prior belief about the parameters, which we set up as follows:

$$P(\theta \mid X) = \frac{P(X \mid \theta)}{P(X)}\,\pi(\theta) \propto P(X \mid \theta)\,\pi(\theta)$$

Terminology
Elements of the Bayesian Model:
$\pi(\theta)$ - Prior distribution - This distribution reflects any preexisting information / belief about the distribution of the parameter(s).
$P(X \mid \theta)$ - Likelihood / Sampling distribution - Distribution of the data given the parameters, which is the probability model believed to have generated the data.
$P(X)$ - Marginal distribution of the data - Distribution of the observed data marginalized over all possible values of the parameter(s).
$P(\theta \mid X)$ - Posterior distribution - Distribution of the parameter(s) after taking the observed data into account.

Example - Defective Parts, in Bayesian Terms
For the Defective Parts example we found the joint, marginal and conditional distributions. In terms of Bayesian inference:
Data - $X$ - Number of defective parts
Parameters - $p$ - Proportion of parts that are defective
Prior distribution - $\pi(p) = 1$, for $p \in (0, 1)$
Likelihood / Sampling distribution - $f(x \mid p) = \binom{n}{x} p^x (1-p)^{n-x}$
Marginal distribution of the data - $f_X(x) = \binom{n}{x} \frac{\Gamma(x+1)\,\Gamma(n-x+1)}{\Gamma(n+2)}$
Posterior distribution - $f(p \mid x) = \frac{\Gamma(n+2)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^x (1-p)^{n-x}$, for $0 \le p \le 1$
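Bayes' theorem for the parameter can also be applied numerically, which is handy when no closed form is available. Below is a minimal grid-approximation sketch in Python, assuming a midpoint rule with `m` cells on (0, 1); the function names are hypothetical. With the uniform prior, $x = 5$ and $n = 10$, the normalized grid values should agree closely with the Beta(6, 6) density, and the approximate evidence $P(X)$ with 1/11.

```python
import math

def binom_likelihood(p: float, x: int, n: int) -> float:
    """Likelihood P(X = x | p) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def beta_pdf(p: float, a: float, b: float) -> float:
    """Beta(a, b) density, computed on the log scale for numerical stability."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_b)

def grid_posterior(x, n, prior, m=1000):
    """Approximate P(p | X = x) = P(X = x | p) pi(p) / P(X = x) on a midpoint grid."""
    h = 1.0 / m
    grid = [(i + 0.5) * h for i in range(m)]
    # Numerator of Bayes' theorem: likelihood times prior at each grid point.
    unnorm = [binom_likelihood(p, x, n) * prior(p) for p in grid]
    # Marginal P(X = x): integral of likelihood * prior over p (midpoint rule).
    evidence = sum(unnorm) * h
    posterior = [u / evidence for u in unnorm]
    return grid, posterior, evidence

def uniform_prior(p: float) -> float:
    """pi(p) = 1 on (0, 1)."""
    return 1.0

if __name__ == "__main__":
    grid, post, evidence = grid_posterior(5, 10, uniform_prior)
    max_err = max(abs(f - beta_pdf(p, 6, 6)) for p, f in zip(grid, post))
    print(evidence, max_err)  # evidence close to 1/11, max_err close to 0
```

Swapping `uniform_prior` for any other density on (0, 1) reuses the same code, which is the main appeal of the grid approach for a one-dimensional parameter.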

Example - Defective Parts, Redux
When we last worked through this problem I claimed that since we didn't know what the proportion of defective parts was, we should use a uniform prior (all values between 0 and 1 equally likely). What could we do if we believed that the proportion was close to 0?

Example - Defective Parts, Redux
Let's find the posterior distribution of $p$ for a prior $p \sim \text{Beta}(\alpha, \beta)$.
Remember that Unif(0, 1) is a special case of the Beta distribution where $\alpha = 1$, $\beta = 1$; we can try tweaking $\alpha$ and $\beta$ to better represent this belief.

Example - Defective Parts, Redux
Consequently, if we a priori believed that the proportion of defective parts was close to zero, we might use a Beta(1, 3) prior, which would give us the following posteriors for 1, 5, or 9 defective parts in 10.
[Figure: posterior densities of P for X = 1, N = 10, Beta(2, 12); X = 5, N = 10, Beta(6, 8); X = 9, N = 10, Beta(10, 4)]

Conjugate Distributions / Priors
In the case of a Binomial likelihood we have just seen that any Beta prior we pick will result in a posterior that is also a Beta distribution.
For a particular likelihood, when the prior and posterior belong to the same distribution family, this family is referred to as a conjugate prior. In this case the Beta distribution is a conjugate prior for the Binomial likelihood.
Conjugate priors are immensely useful as they provide simple analytic solutions to this type of inference problem, but they are also somewhat limiting, since our prior belief may not be representable using the conjugate family's parameterization.
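The posterior asked for on the Redux slide can be worked out as follows. This sketch uses only the Binomial likelihood and the Beta$(\alpha, \beta)$ density, dropping every factor that does not depend on $p$:

```latex
\begin{aligned}
f(p \mid x) &\propto f(x \mid p)\,\pi(p)
  = \binom{n}{x} p^{x}(1-p)^{n-x} \cdot \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} \\
&\propto p^{x+\alpha-1}(1-p)^{n-x+\beta-1}
  \quad\Longrightarrow\quad P \mid X = x \sim \text{Beta}(\alpha + x,\ \beta + n - x)
\end{aligned}
```

Setting $\alpha = \beta = 1$ recovers Beta$(x + 1, n - x + 1)$ from the uniform-prior case, and the Beta(1, 3) prior with $n = 10$ gives Beta(2, 12), Beta(6, 8) and Beta(10, 4) for $x = 1, 5, 9$, matching the posteriors quoted on the Redux slide.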

Binomial and a Non-conjugate Prior
Let's consider a situation where we do not use a Beta prior, and instead opt for a truncated Normal distribution on (0, 1). What do we do then?
This kind of situation happens all the time in Bayesian inference: we set up a model which results in a (seemingly) intractable posterior distribution.
Instead of an analytic solution, we make use of numerical Monte Carlo methods to generate samples from the distribution, which can be used to estimate the distribution and its properties.
These methods are effective but computationally intensive, which is why Bayesian methods have become popular only in the last 30 years, as sufficient computational power has become available to make use of them.
More on this if you take Sta 250 or 360.

Sequential Updates

Example - Defective Parts - Sequential Updates
We have already shown that if we have a Beta(1, 1) prior on the proportion of defective parts and we observe that 5 of 10 parts are defective, then we would have a Beta(6, 6) posterior for the proportion.
If we were to then inspect 10 more parts and found that 5 were defective, how should we update our posterior?
If we consider this as two iid data points $(x_1, x_2)$, there are two options:
Take both into account at the same time when calculating the posterior:

$$f(p \mid x) = \frac{f(x \mid p)}{f_X(x)}\,\pi(p) = \frac{f(x_1 \mid p)\,f(x_2 \mid p)}{f_X(x)}\,\pi(p) \propto p^5(1-p)^5\,p^5(1-p)^5 \;\Rightarrow\; p \mid x_1, x_2 \sim \text{Beta}(11, 11)$$

First update the prior using $x_1$ and then use $f(p \mid x_1)$ as the prior when updating using $x_2$.
$$f(p \mid x_2, x_1) = \frac{f(x_2 \mid p)}{f(x_2 \mid x_1)}\,f(p \mid x_1) = \frac{f(x_2 \mid p)\,f(x_1 \mid p)}{f(x_2 \mid x_1)\,f_X(x_1)}\,\pi(p) \propto p^5(1-p)^5\,p^5(1-p)^5 \;\Rightarrow\; p \mid x_1, x_2 \sim \text{Beta}(11, 11)$$

Here $f(x_2 \mid x_1)$ is the marginal distribution of $x_2$ under the updated prior $f(p \mid x_1)$; it does not depend on $p$, so it only affects the normalizing constant, and both options give the same Beta(11, 11) posterior.
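Because the Beta prior is conjugate, both options reduce to adding counts to the Beta parameters. A small Python sketch (the helpers `update_beta` and `update_sequentially` are hypothetical, not from the lecture) checks that updating with both lots at once and updating one lot at a time give the same Beta(11, 11):

```python
def update_beta(prior, x, n):
    """Conjugate Beta-Binomial update: Beta(a, b) + x defectives in n parts -> Beta(a + x, b + n - x)."""
    a, b = prior
    return (a + x, b + (n - x))

def update_sequentially(prior, lots):
    """Use each lot's posterior as the prior for the next lot."""
    current = prior
    for x, n in lots:
        current = update_beta(current, x, n)
    return current

if __name__ == "__main__":
    prior = (1, 1)                               # Beta(1, 1) = Unif(0, 1)
    lots = [(5, 10), (5, 10)]                    # (defectives, parts inspected)
    after_first = update_beta(prior, 5, 10)      # posterior after the first lot
    batch = update_beta(prior, 5 + 5, 10 + 10)   # both lots at once
    sequential = update_sequentially(prior, lots)
    print(after_first, batch, sequential)        # (6, 6) (11, 11) (11, 11)
```

Since each update only adds counts, the order in which lots arrive does not change the final posterior either.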

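For the truncated Normal prior from the "Binomial and a Non-conjugate Prior" slide, one of the simplest Monte Carlo methods is rejection sampling: propose $p$ from the prior and accept it with probability $L(p)/L(\hat{p})$, where $\hat{p} = x/n$ maximizes the binomial likelihood. The lecture does not say which Monte Carlo method it has in mind (Markov chain Monte Carlo is the usual workhorse), and the prior mean 0.2 and standard deviation 0.1 below are hypothetical values chosen only for illustration. A quadrature-based posterior mean is included as an independent check.

```python
import math
import random

# Hypothetical prior: Normal(0.2, 0.1^2) truncated to (0, 1).
# These values are illustrative assumptions; the lecture does not specify them.
PRIOR_MEAN, PRIOR_SD = 0.2, 0.1

def sample_prior(rng: random.Random) -> float:
    """Draw from the truncated Normal prior by resampling until p lands in (0, 1)."""
    while True:
        p = rng.gauss(PRIOR_MEAN, PRIOR_SD)
        if 0.0 < p < 1.0:
            return p

def log_lik(p: float, x: int, n: int) -> float:
    """Binomial log-likelihood up to the constant log C(n, x); requires 0 < p < 1."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def max_log_lik(x: int, n: int) -> float:
    """Log-likelihood at p_hat = x / n, using the convention 0 * log(0) = 0."""
    p_hat = x / n
    total = 0.0
    if x > 0:
        total += x * math.log(p_hat)
    if x < n:
        total += (n - x) * math.log(1.0 - p_hat)
    return total

def posterior_samples(x: int, n: int, size: int, seed: int = 0) -> list:
    """Rejection sampling: accept a prior draw p with probability L(p) / L(p_hat) <= 1."""
    rng = random.Random(seed)
    log_max = max_log_lik(x, n)
    samples = []
    while len(samples) < size:
        p = sample_prior(rng)
        if rng.random() < math.exp(log_lik(p, x, n) - log_max):
            samples.append(p)
    return samples

def grid_posterior_mean(x: int, n: int, m: int = 20000) -> float:
    """Independent check: posterior mean by midpoint quadrature (normalizing constants cancel)."""
    num = den = 0.0
    for i in range(m):
        p = (i + 0.5) / m
        log_prior = -0.5 * ((p - PRIOR_MEAN) / PRIOR_SD) ** 2
        w = math.exp(log_lik(p, x, n) + log_prior)
        num += p * w
        den += w
    return num / den

if __name__ == "__main__":
    draws = posterior_samples(5, 10, size=5000)
    print(sum(draws) / len(draws), grid_posterior_mean(5, 10))
```

Rejection sampling is exact but wastes proposals when prior and likelihood disagree, which is one reason iterative samplers are preferred for harder models.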
Example - Defective Parts - k lots
We can generalize our results to $k$ lots with different lot sizes. Let $X_1, \ldots, X_k$ be the number of defective parts in each lot (which are independent given $p$) and $n_1, \ldots, n_k$ the number of parts examined in each lot. What is the posterior of $p$ for a Beta$(\alpha, \beta)$ prior?

Example - Exponential Distribution
Let $X$ be the lifespan of a fluorescent lamp, which is modeled by an exponential distribution with parameter $\lambda$, where our prior belief on $\lambda$ is given by a Gamma distribution with parameters $k$ and $\theta$. If the failures of the lamps are independent and we observe the lifespans of $n$ lamps $(x_1, \ldots, x_n)$, what should our posterior distribution for $\lambda$ be?

Likelihood of Multiple Normal Data Points
If we are collecting data from a process that follows a normal distribution with mean $\mu$ and variance $\sigma^2$, where each observation is iid, what is the likelihood of $n$ of these observations $(x_1, x_2, \ldots, x_n)$?

Conjugate Prior for the Normal Distribution
Let's consider a Normal distribution with mean $\mu$ and variance $\sigma^2$, and assume that $\sigma^2$ is known but $\mu$ is not. What is the posterior distribution of $\mu$ if the prior is $\mu \sim N(\lambda, \tau^2)$?
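Worked answers to the four questions on the k-lots, Exponential and Normal slides can be sketched with the same "likelihood times prior, drop constants" recipe. Two assumptions are made here: the Exponential is parameterized by its rate, $f(x \mid \lambda) = \lambda e^{-\lambda x}$, and the Gamma prior uses shape $k$ and scale $\theta$, $\pi(\lambda) \propto \lambda^{k-1} e^{-\lambda/\theta}$. In the Normal case $\bar{x}$ is the sample mean, and $\lambda$ is the prior mean of $\mu$, unrelated to the Exponential rate.

```latex
% k lots, Beta(alpha, beta) prior
f(p \mid x_1,\ldots,x_k) \propto \prod_{i=1}^{k} p^{x_i}(1-p)^{n_i - x_i}\; p^{\alpha-1}(1-p)^{\beta-1}
  \;\Longrightarrow\; \text{Beta}\Big(\alpha + \sum_{i} x_i,\ \beta + \sum_{i}(n_i - x_i)\Big)

% Exponential lifespans, Gamma(shape k, scale theta) prior
f(\lambda \mid x_1,\ldots,x_n) \propto \lambda^{n} e^{-\lambda \sum_i x_i}\; \lambda^{k-1} e^{-\lambda/\theta}
  = \lambda^{k+n-1} e^{-\lambda\,(1/\theta + \sum_i x_i)}
  \;\Longrightarrow\; \text{Gamma}\Big(k + n,\ \frac{\theta}{1 + \theta \sum_i x_i}\Big)

% Likelihood of n iid Normal observations
f(x_1,\ldots,x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}
  = (2\pi\sigma^2)^{-n/2} \exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big)

% Normal prior mu ~ N(lambda, tau^2), sigma^2 known
f(\mu \mid x) \propto \exp\Big(-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2} - \frac{(\mu-\lambda)^2}{2\tau^2}\Big)
  \propto \exp\Big(-\frac{1}{2}\Big(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\Big)\mu^2 + \Big(\frac{n\bar{x}}{\sigma^2} + \frac{\lambda}{\tau^2}\Big)\mu\Big)
  \;\Longrightarrow\; \mu \mid x \sim N\!\left(\frac{n\bar{x}/\sigma^2 + \lambda/\tau^2}{n/\sigma^2 + 1/\tau^2},\ \Big(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\Big)^{-1}\right)
```

The last step completes the square: a density proportional to $\exp(-\tfrac{1}{2}A\mu^2 + B\mu)$ is $N(B/A,\ 1/A)$.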

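The conjugate updates for the k-lot, Exponential-Gamma and Normal-Normal questions can also be coded directly. This is a small Python sketch with hypothetical function names; the Gamma prior is assumed to use shape `k` and scale `theta`, and the Normal case assumes a known variance `sigma2`.

```python
def beta_update_lots(alpha, beta, lots):
    """Beta(alpha, beta) prior + lots of (x_i defectives, n_i inspected)
    -> Beta(alpha + sum x_i, beta + sum (n_i - x_i))."""
    total_x = sum(x for x, _ in lots)
    total_n = sum(n for _, n in lots)
    return alpha + total_x, beta + (total_n - total_x)

def gamma_exponential_update(k, theta, lifespans):
    """Gamma(shape k, scale theta) prior on an Exponential rate lambda (assumed parameterization).
    Posterior is Gamma(k + n, theta / (1 + theta * sum x_i)); returns (shape, scale)."""
    n = len(lifespans)
    s = sum(lifespans)
    return k + n, theta / (1.0 + theta * s)

def normal_known_variance_update(lam, tau2, sigma2, data):
    """N(lam, tau2) prior on mu; data iid N(mu, sigma2) with sigma2 known and data non-empty.
    Posterior precision is 1/tau2 + n/sigma2; returns (posterior mean, posterior variance)."""
    n = len(data)
    xbar = sum(data) / n
    precision = 1.0 / tau2 + n / sigma2
    mean = (lam / tau2 + n * xbar / sigma2) / precision
    return mean, 1.0 / precision

if __name__ == "__main__":
    print(beta_update_lots(1, 1, [(5, 10), (5, 10)]))          # (11, 11)
    print(gamma_exponential_update(2, 1.0, [1.0, 2.0, 3.0]))   # shape 5, scale 1/7
    print(normal_known_variance_update(0.0, 1.0, 1.0, [2.0, 2.0]))  # mean 4/3, variance 1/3
```

The Normal update shows the usual pattern: precisions add, and the posterior mean is a precision-weighted average of the prior mean and the sample mean.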
Conjugate Prior for the Normal Distribution, cont.

Where to go from here?
Hierarchical Models:
[Figure: hierarchical model diagram with a top-level parameter Θ, parameters θ_11, θ_21, θ_31, and observations y_11, y_12, y_13, y_21, y_22, y_23, y_31, y_32, y_33]