STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2

1 From calibrating a procedure to quantifying uncertainty

We saw that the central idea of classical testing is to provide a rigorous calibration of how a testing rule "reject H0 if T(x) > c" will perform under different scenarios. Once a rule with good performance (small size, large power at alternatives) has been chosen, it is applied to the data in question to make an accept/reject decision on H0. No quantification is provided as to how certain we are about H0 being correct.

The p-value may seem to provide one such quantification, but it is wrong to interpret the p-value as the probability of H0 being true. For observed data x, the p-value may be interpreted as the maximum probability of {T(X) > T(x)} under the null. If the p-value is .02, then we are right to say "either H0 is false or something rare (only a 2% chance) has happened." Don't be fooled into thinking that this statement means H0 has a 2% chance of being true. In fact, the statement says nothing as to what odds we are willing to assign to its "either" and "or" parts.

A similarly structured statement can be made in the following situation. Suppose you flip a coin 3000 times and count 1498 heads. Then it is correct to state "either the coin is biased or something rare (only a 1.5% chance) has happened" (because, when X ~ Binomial(3000, .5), P(X = 1498) = .015). But in this case it is quite insane to suspect that the coin may be biased (certainly not with 985-to-15 odds).

At the other end of the spectrum of statistical practice is the Bayesian approach, where one cares more about quantifying uncertainty in statements made about the data and problem at hand. Below is a simple example you might be familiar with.

Example (Clinical diagnosis). A clinical test for a relatively rare disease (only 1% of the population is affected) has been tested to have a 99% accuracy rate on patients who have the disease, and a 2% failure rate on patients who do not have it. A patient takes the test and gets a positive result. What are the chances that he has the disease?

Let D denote the event that this patient has the disease; then P(D) = .01. Let T+ denote the event that the test results in a positive; then P(T+ | D) = .99 and P(T+ | D^c) = .02, where D^c is the complement of D, i.e., the event that the patient does not have the disease. We want to evaluate P(D | T+). This is an instance of the inverse probability calculation that we learn as the Bayes theorem in a probability course:

P(D | T+) = P(D) P(T+ | D) / [P(D) P(T+ | D) + P(D^c) P(T+ | D^c)] = (.01)(.99) / [(.01)(.99) + (.99)(.02)] = 1/3.
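This arithmetic is easy to check in R; here is a minimal sketch using the numbers from the example (the variable names are just illustrative):

# Clinical diagnosis example: P(D | T+) by the Bayes theorem
prior.D   <- 0.01   # P(D), prevalence of the disease
sens      <- 0.99   # P(T+ | D), positive rate among the diseased
false.pos <- 0.02   # P(T+ | D^c), positive rate among the healthy
prior.D * sens / (prior.D * sens + (1 - prior.D) * false.pos)   # 0.3333..., i.e. 1/3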

So the patient has a one in three chance of having the disease. In other words, his odds of not having the disease are two to one.

2 Inference via inverse probability reasoning

The formal components of the above analysis are:

1. Plausibility scores attached to the two possible states of the unknown disease status (P(D) = .01, and consequently, P(D^c) = .99).
2. Plausibility scores attached to the test outcomes for each state of disease status (P(T+ | D) = .99 = 1 - P(T- | D), where T- is the event that the test gives a negative; similarly, P(T+ | D^c) = .02 = 1 - P(T- | D^c)).
3. Combining the above two via the Bayes theorem to update the plausibility scores of D and D^c once an outcome of the test has been observed.

Component 2 above is like a probability model X ~ f(x | θ), with θ ∈ Θ an unobservable quantity of interest (the disease status of the patient, with Θ = {D, D^c}), while X ∈ S is data to be observed (the outcome of the clinical test, S = {T+, T-}). One learns f(x | θ) through experience and laboratory experimentation.

Component 1 is the novel feature of the Bayesian approach, where one needs to attach plausibility scores to the possible states of the unobservable quantity of interest before any observation is made. These plausibility scores represent one's prior belief: the belief that precedes the observation process.

Component 3 is pure mathematics, and results straight out of the Bayes theorem once components 1 and 2 have been specified and an observation has been made for the observable quantity.

Prior belief is not a singular quantity and cannot be learned. Prior belief combines the current understanding of the unknown quantity of interest with what one is willing to assume about it. It may vary from one person to another. It may require more than a single set of plausibility scores to represent one's prior belief.

Example (Clinical diagnosis, contd.). For our clinical diagnosis example, the fact that the patient has been recommended to take the test may persuade us to put P(D) between 1% and 3%. In this case, P(D | T+) ranges between 33% and 61%.
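The dependence of the answer on the assumed prior is easy to trace in R; here is a small sketch that recomputes P(D | T+) over the 1% to 3% range of priors mentioned above (the grid of values is arbitrary):

# Posterior P(D | T+) as a function of the prior P(D)
post.disease <- function(prior.D, sens = 0.99, false.pos = 0.02) {
  prior.D * sens / (prior.D * sens + (1 - prior.D) * false.pos)
}
round(post.disease(c(0.01, 0.02, 0.03)), 2)   # 0.33 0.50 0.61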

3 Bayesian analysis: from prior to posterior

In the general setting, a Bayesian analysis of data X combines a statistical model X ~ f(x | θ), x ∈ S, θ ∈ Θ, with a prior pdf/pmf ξ(θ) on Θ. This pdf/pmf represents the pre-observation (or a priori) plausibility scores of the parameter values. The function h(x, θ) = f(x | θ) ξ(θ) is simply a pdf/pmf on S × Θ giving a pre-observation joint description of X and θ. So, once X = x is observed, the post-observation description of θ given the observed data X = x is simply the conditional pdf/pmf, which can be written in the form of Bayes rule as:

ξ(θ | x) = f(x | θ) ξ(θ) / ∫_A f(x | θ') ξ(θ') dθ',   θ ∈ A,   if f(x | θ)ξ(θ) > 0 on some interval A,

ξ(θ | x) = f(x | θ) ξ(θ) / Σ_{θ' ∈ B} f(x | θ') ξ(θ'),   θ ∈ B,   if f(x | θ)ξ(θ) > 0 on some discrete set B.

We shall call this pdf/pmf ξ(θ | x) the posterior pdf/pmf of θ (on Θ) based on the model X ~ f(x | θ), the prior ξ(θ) and the observation X = x.

4 Likelihood function and posterior pdf/pmf

Note that, post the observation X = x, the relative plausibility of θ = θ1 against θ = θ2 is given by

ξ(θ1 | x) / ξ(θ2 | x) = [f(x | θ1) ξ(θ1)] / [f(x | θ2) ξ(θ2)] = [L_x(θ1) / L_x(θ2)] × [ξ(θ1) / ξ(θ2)].

Therefore the scores given by the posterior combine the scores given by the prior (pre-observation beliefs) with the scores given by the likelihood function (evidence/support from the observation):

ξ(θ | x) = L_x(θ) ξ(θ) / ∫_Θ L_x(θ') ξ(θ') dθ' = a(θ) ξ(θ) / ∫_Θ a(θ') ξ(θ') dθ'

when ξ(θ) is a pdf, and the same holds with sums in place of integrals when ξ(θ) is a pmf; here a(θ) is any function proportional to the likelihood L_x(θ), so the posterior requires the likelihood only up to a constant multiple.

5 An example: female birth rate in 18th century Paris

The great scholar Pierre-Simon, marquis de Laplace (1749-1827) was interested in learning the rate p ∈ [0, 1] of female birth in Paris in the 18th century. He had access to a large body of birth records in Paris between 1745 and 1770 with n entries. From these he could extract the total number of entries X which recorded a female birth. A reasonable model for X is X ~ Binomial(n, p), p ∈ [0, 1]. For a prior pdf on p, Laplace decided that he had no reason to believe that, for any two p1, p2 ∈ [0, 1], the case p = p1 was more plausible than the case p = p2. In other words, Laplace believed all possible values of p ∈ [0, 1] to be equally plausible. A pdf that ensures this is the Uniform(0, 1) pdf with ξ(p) = 1, p ∈ [0, 1]. For an observation X = x, where x ∈ {0, 1, ..., n}, the likelihood function is L_x(p) = const. × p^x (1 - p)^(n-x). Therefore, the posterior pdf ξ(p | x) takes the form:

ξ(p | x) = p^x (1 - p)^(n-x) / ∫_0^1 q^x (1 - q)^(n-x) dq = p^x (1 - p)^(n-x) / B(x + 1, n - x + 1),   p ∈ [0, 1],

where B(a, b) denotes the beta function, defined for any a > 0, b > 0 as

B(a, b) = ∫_0^1 q^(a-1) (1 - q)^(b-1) dq = Γ(a) Γ(b) / Γ(a + b),

and Γ(a) = ∫_0^∞ x^(a-1) exp(-x) dx is the gamma function, defined for every a > 0.
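The normalizing constant B(x + 1, n - x + 1) can be checked numerically: the following R sketch compares the hand-normalized posterior with R's built-in dbeta() on made-up toy numbers (n = 20 and x = 14 are arbitrary):

# Check that p^x (1 - p)^(n - x) / B(x + 1, n - x + 1) is the Beta(x + 1, n - x + 1) density
n <- 20; x <- 14
p <- seq(0.01, 0.99, by = 0.01)
by.hand  <- p^x * (1 - p)^(n - x) / beta(x + 1, n - x + 1)   # beta() is the beta function B(a, b)
built.in <- dbeta(p, x + 1, n - x + 1)
max(abs(by.hand - built.in))   # essentially 0, up to floating-point round-off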

Figure 1: Posterior pdfs ξ(p | x) for the model X | p ~ Binomial(n, p) and p ~ Uniform(0, 1). Here n = 20 and the posterior ξ(p | x) is shown for each of x = 0, 1, ..., 20.

The pdf g(y) = y^(a-1) (1 - y)^(b-1) / B(a, b), y ∈ [0, 1], is called the beta pdf with parameters a, b (both must be positive), and is denoted Beta(a, b). Therefore, ξ(p | x) equals Beta(x + 1, n - x + 1). In R, you can use dbeta(), pbeta(), qbeta() and rbeta() to get, respectively, the density function, the cumulative distribution function, the quantile function and random observations from a beta distribution.

Figure 1 shows ξ(p | x) against p for a toy setting with n = 20. Each curve in the figure corresponds to one x in the range 0, 1, 2, ..., n. As x increases from 0 to n, the peak of the ξ(p | x) curve shifts from left to right.
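A plot in the style of Figure 1 can be produced in a few lines of R; a minimal sketch (the line style and the grid of p values are arbitrary choices):

# Posterior curves Beta(x + 1, n - x + 1) for the toy setting n = 20, x = 0, 1, ..., 20
n <- 20
p <- seq(0, 1, length.out = 501)
post <- sapply(0:n, function(x) dbeta(p, x + 1, n - x + 1))   # one column per value of x
matplot(p, post, type = "l", lty = 1, col = "grey30",
        xlab = "p", ylab = "posterior density")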

The data Laplace had contained n = 493472 records, of which x = 241945 recorded a female birth. This leads to a ξ(p | x) = Beta(241946, 251528) posterior distribution for p. Figure 2 shows this posterior pdf.

Figure 2: Posterior pdf ξ(p | x) for the female birth analysis by Laplace. The 50% rate (p = .5) is highlighted with a dotted vertical line in the middle. The posterior concentrates at a value lower than this mark. The right panel shows the same, but zooms into the range [.48, .50].

Below are some of the several possible summaries of the posterior.

Laplace was concerned with whether the female birth rate was smaller than the commonly held figure of 50%. The plausibility of this event, based on Laplace's model and observed data, is simply

P(p ≤ .5 | X = x) = ∫_0^{.5} ξ(p | x) dp,

which in R can be computed as

> pbeta(.5, 241946, 251528) = 1

Keep in mind that this number is close to 1, but not exactly 1. In fact, it is more useful to look at the converse:

P(p > .5 | X = x) = 1 - P(p ≤ .5 | X = x) = ∫_{.5}^1 ξ(p | x) dp,

which equals

> pbeta(.5, 241946, 251528, lower.tail = FALSE) = 1.146e-42

Laplace concluded that he was morally certain that p is smaller than .5.

If we are interested in a single-number summary of p, we could try to extract a single-number summary of the pdf ξ(p | x). An attractive choice is the mean (expectation) under this pdf:

p̂(x) = E[p | X = x] = ∫_0^1 p ξ(p | x) dp.

The mean of a Beta(a, b) pdf equals a / (a + b); therefore the posterior mean of the female birth rate is

> 241946 / (241946 + 251528) = 0.490

If we are interested in reporting a range of values of p, we can look for an interval such that the pdf ξ(p | x) assigns only a small probability outside this interval. This is best represented by the quantiles q_u(x) of ξ(p | x), defined for any u ∈ (0, 1) as the point a such that

P(p ≤ a | X = x) = ∫_0^a ξ(p | x) dp = u.

In particular, for any α ∈ (0, 1), the interval A(x) = [q_{α/2}(x), q_{1-α/2}(x)] satisfies:

P(p ∉ A(x) | X = x) = P(p < q_{α/2}(x) | X = x) + P(p > q_{1-α/2}(x) | X = x) = α/2 + α/2 = α.

For α = 5%, the end-points of the interval [q_{α/2}(x), q_{1-α/2}(x)] equal

> lower.end <- qbeta(.05 / 2, 241946, 251528) = .489
> upper.end <- qbeta(1 - .05 / 2, 241946, 251528) = .491

and, indeed, we can say that the (posterior) probability of {.489 ≤ p ≤ .491} equals 95%.
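The exact summaries above can also be checked by simulation with rbeta(); a small sketch, assuming the Beta(241946, 251528) posterior used above (the number of draws is an arbitrary choice):

# Monte Carlo check of the posterior summaries for the female birth rate
set.seed(1)
draws <- rbeta(1e5, 241946, 251528)
mean(draws)                          # close to the exact posterior mean, about 0.490
quantile(draws, c(0.025, 0.975))     # close to the qbeta() end-points reported above
mean(draws > 0.5)                    # 0: none of the draws exceed 0.5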
