1 Hypothesis Testing and Model Selection


A Short Course on Bayesian Inference
(based on "An Introduction to Bayesian Analysis: Theory and Methods" by Ghosh, Delampady and Samanta)

Module 6: From Chapter 6 of GDS

1 Hypothesis Testing and Model Selection

For Bayesians, model selection and model criticism are extremely important inference problems. These problems are usually more difficult than point estimation or credible set construction.

Suppose $X|\theta$ has the density $f(x|\theta)$, with $\theta$ an unknown element of the parameter space $\Theta$. Suppose that we are interested in comparing two models $M_0$ and $M_1$, given by

$M_0$: $X$ has density $f(x|\theta)$ where $\theta \in \Theta_0$;
$M_1$: $X$ has density $f(x|\theta)$ where $\theta \in \Theta_1$.   (1)

For $i = 0, 1$, let $g_i(\theta)$ be the prior density of $\theta$, conditional on $M_i$ being the true model. Then to compare models $M_0$ and $M_1$ on the basis of a random sample $x = (x_1, \ldots, x_n)$ we use the Bayes factor

$B_{01}(x) = m_0(x)/m_1(x)$, where $m_i(x) = \int_{\Theta_i} f(x|\theta) g_i(\theta)\,d\theta$, $i = 0, 1$.   (2)

We also often use the notation $BF_{01}$ to denote this Bayes factor. Recall from Module 1 that if $\pi_0 = P^\pi(M_0) = P^\pi(\Theta_0)$ and $\pi_1 = 1 - \pi_0 = P^\pi(M_1)$, then the posterior probability of $M_0$ is

$P(M_0|x) = \left\{1 + \frac{\pi_1}{\pi_0}\, B_{01}^{-1}(x)\right\}^{-1}$.   (3)

Thus, if conditional prior densities $g_0$ and $g_1$ can be specified, we simply use the Bayes factor $B_{01}$ for model selection. If $\pi_0$ is also specified, we can use the posterior probability of $M_i$, or the posterior odds ratio of $M_0$ to $M_1$, for model selection.

Consider now a different model checking problem, that of testing for normality. In its simplest form, the problem can be stated as checking whether a given random sample $X_1, \ldots, X_n$ arose from a population having the normal distribution. In the setup given above in (1), we may write it as

$M_0$: $X$ is $N(\mu, \sigma^2)$ with arbitrary $\mu$ and $\sigma^2 > 0$;
$M_1$: $X$ does not have the normal distribution.   (4)
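Equation (3) is easy to evaluate in practice. The following minimal Python sketch (not from GDS; the numbers are purely illustrative) converts a Bayes factor and a prior probability into a posterior probability of $M_0$:

```python
def posterior_prob_M0(B01, pi0=0.5):
    """Posterior probability of M0 from the Bayes factor B01 = m0/m1 and the
    prior probability pi0 = P(M0), via equation (3):
    P(M0 | x) = {1 + (pi1/pi0) * B01^{-1}}^{-1}."""
    pi1 = 1.0 - pi0
    return 1.0 / (1.0 + (pi1 / pi0) / B01)

# With equal prior odds, B01 = 3 gives posterior probability 3/4 for M0,
# and B01 = 1 leaves the evidence balanced at 1/2.
print(posterior_prob_M0(3.0))        # 0.75
print(posterior_prob_M0(1.0, 0.5))   # 0.5
```

With $\pi_0 = 1/2$ the posterior odds equal the Bayes factor itself, which is why $B_{01}$ alone often serves as the model-selection summary.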

The above model selection problem looks quite different from (1), because $M_1$ does not constitute a parametric alternative. Here the Bayes factor or the posterior odds will not work; we will later introduce the Bayesian P-value, which does.

Laplace approximation of the Bayes factor: We approximate the Bayes factor $B_{01}$ by applying the Laplace approximation to the two marginal densities. Recall that the marginal density under model $M_i$ is $m_i(x) = \int f(x|\theta_i) g_i(\theta_i)\,d\theta_i$. If $\tilde{\theta}_i$ denotes the posterior mode of $\theta_i$, then by Taylor expansion

$\log\{f(x|\theta_i) g_i(\theta_i)\} \approx \log\{f(x|\tilde{\theta}_i) g_i(\tilde{\theta}_i)\} - \frac{1}{2}(\theta_i - \tilde{\theta}_i)^T H_{\tilde{\theta}_i} (\theta_i - \tilde{\theta}_i)$,

where $H_{\tilde{\theta}_i}$ is the negative of the second-derivative matrix of $\log\{f(x|\theta_i) g_i(\theta_i)\}$ with respect to $\theta_i$, evaluated at $\tilde{\theta}_i$. Then by the Laplace approximation we get

$m_i(x) \approx f(x|\tilde{\theta}_i)\, g_i(\tilde{\theta}_i) \int \exp\left\{-\frac{1}{2}(\theta_i - \tilde{\theta}_i)^T H_{\tilde{\theta}_i}(\theta_i - \tilde{\theta}_i)\right\} d\theta_i = f(x|\tilde{\theta}_i)\, g_i(\tilde{\theta}_i)\,(2\pi)^{p_i/2}\, |H_{\tilde{\theta}_i}|^{-1/2}$.   (5)

The Bayes factor is often reported on the log scale, and $2\log(B_{01})$ is used as an evidential measure of the support provided by the data $x$ for $M_0$ relative to $M_1$. Using the above approximation we get

$2\log(B_{01}) \approx 2\log\left\{\frac{f(x|\tilde{\theta}_0)}{f(x|\tilde{\theta}_1)}\right\} + 2\log\left\{\frac{g_0(\tilde{\theta}_0)}{g_1(\tilde{\theta}_1)}\right\} + (p_0 - p_1)\log(2\pi) + \log\left\{\frac{|H_{\tilde{\theta}_1}|}{|H_{\tilde{\theta}_0}|}\right\}$.   (6)

If $\hat{\theta}_i$ denotes the MLE of $\theta_i$ under model $M_i$, then since $\tilde{\theta}_i - \hat{\theta}_i = O(1/n)$, by ignoring all terms of order $O(n^{-1})$ we have the following alternative approximation:

$2\log(B_{01}) \approx 2\log\left\{\frac{f(x|\hat{\theta}_0)}{f(x|\hat{\theta}_1)}\right\} + 2\log\left\{\frac{g_0(\hat{\theta}_0)}{g_1(\hat{\theta}_1)}\right\} + (p_0 - p_1)\log(2\pi) + \log\left\{\frac{|H_{\hat{\theta}_1}|}{|H_{\hat{\theta}_0}|}\right\}$.   (7)

If $\hat{H}_{\hat{\theta}_i}$ denotes the per-observation observed Fisher information matrix for $\theta_i$ in model $M_i$, then using $\log(|H_{\hat{\theta}_i}|) = p_i \log(n) + \log(|\hat{H}_{\hat{\theta}_i}|)$, an approximation to (7) correct to $O(1)$

is

$2\log(B_{01}) \approx 2\log\left\{\frac{f(x|\hat{\theta}_0)}{f(x|\hat{\theta}_1)}\right\} - (p_0 - p_1)\log n$.   (8)

This is the approximate Bayes factor based on the Bayesian information criterion (BIC) due to Schwarz (1978). The term $(p_0 - p_1)\log n$ can be considered a penalty for using a more complex model. A related criterion is

$2\log(B_{01}) \approx 2\log\left\{\frac{f(x|\hat{\theta}_0)}{f(x|\hat{\theta}_1)}\right\} - 2(p_0 - p_1)$,   (9)

which is based on the Akaike information criterion (AIC), namely, $AIC = 2\log f(x|\hat{\theta}) - 2p$ for a model $f(x|\theta)$. Its penalty for using a complex model is not as drastic as that of BIC.

2 P-value and Posterior Probability of $H_0$ as Measures of Evidence Against the Null

One particular tool from classical statistics that is very widely used in the applied sciences for model checking or hypothesis testing is the P-value. The basic idea behind R. A. Fisher's original (1925) definition of the P-value (see Fisher, 1973) has a great deal of appeal: it is the probability, under a (simple) null hypothesis, of obtaining a value of a test statistic at least as extreme as that observed in the sample data. Suppose that it is desired to test

$H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$,   (10)

and that a classical significance test is available, based on a test statistic $T(X)$, large values of which are deemed to provide evidence against the null hypothesis. If data $X = x$ is observed, with corresponding $t = T(x)$, the P-value then is

$\alpha = P_{\theta_0}\{T(X) \geq T(x)\}$.

Remark 1. Fisher meant the P-value to be used as a measure of the degree of surprise in the data relative to $H_0$. This use of the P-value as a post-experimental or conditional measure of statistical evidence seems to have some intuitive justification. However, Bayesians

have raised various objections to the use of the P-value as evidence against $H_0$. The P-value appears to be too strict against $H_0$. To a Bayesian, the posterior probability of $H_0$ summarizes the evidence against $H_0$. In many common testing problems, the P-value is smaller than the posterior probability of $H_0$ by an order of magnitude. The reason for this is that the P-value ignores the likelihood of the data under the alternative, and it takes into account not only the observed deviation of the data from the null hypothesis, as measured by the test statistic, but also more extreme deviations. We give a simple example below where the P-value can be quite different from the posterior probability.

Example 1. Suppose, for known $\sigma^2$, $\bar{X}|\theta \sim N(\theta, \sigma^2/n)$. We wish to test $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Let $t = \sqrt{n}(\bar{x} - \theta_0)/\sigma$ be the observed value of the test statistic. It can be checked that the P-value is given by $\alpha = 2[1 - \Phi(|t|)]$, where $\Phi(\cdot)$ is the standard normal cdf. For the point null hypothesis, on the set $\{\theta \neq \theta_0\}$, let $\theta$ have the density ($g_1$) of $N(\mu, \tau^2)$. Then under $H_1$, marginally, $\bar{X} \sim N(\mu, \tau^2 + n^{-1}\sigma^2)$. If $\rho = (\sigma/\sqrt{n})/\tau$ and $\eta = (\theta_0 - \mu)/\tau$, then

$B_{01} = \frac{(2\pi n^{-1}\sigma^2)^{-1/2}\exp[-n(\bar{x} - \theta_0)^2/(2\sigma^2)]}{(2\pi(\tau^2 + n^{-1}\sigma^2))^{-1/2}\exp[-(\bar{x} - \mu)^2/(2(\tau^2 + n^{-1}\sigma^2))]}$

$= (1 + \rho^{-2})^{1/2}\exp\left[-\frac{t^2}{2} + \frac{(\rho t + \eta)^2}{2(1 + \rho^2)}\right]$,

using $\bar{x} - \mu = (\bar{x} - \theta_0) + (\theta_0 - \mu) = \tau(\rho t + \eta)$, so that $(\bar{x} - \mu)^2/(\tau^2 + n^{-1}\sigma^2) = (\rho t + \eta)^2/(1 + \rho^2)$. Completing the square in $t$ gives

$B_{01} = (1 + \rho^{-2})^{1/2}\exp\left\{-\frac{1}{2}\left[\frac{(t - \rho\eta)^2}{1 + \rho^2} - \eta^2\right]\right\}$.

Now, if we choose $\mu = \theta_0$, $\tau = \sigma$ and $\pi_0 = 1/2$, then $\eta = 0$ and $\rho^2 = 1/n$, and we get

$B_{01} = (1 + n)^{1/2}\exp\left[-\frac{t^2}{2}\,\frac{n}{n + 1}\right]$ and $P(H_0|x) = (1 + B_{01}^{-1})^{-1}$.

For various values of $t$ and $n$, the different measures of evidence, $\alpha$ = P-value, $B$ = Bayes factor $B_{01}$, and $P = P(H_0|x)$, are displayed in the table below (taken from GDS).
It may be noted that the posterior probability of $H_0$ varies between 4 and 5 times the corresponding P-value, which is an indication of how different these two measures of evidence can be.
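The three measures in Example 1 are straightforward to compute. The following Python sketch (not from GDS; the values of $t$ and $n$ are illustrative) uses the closed forms derived above with $\mu = \theta_0$, $\tau = \sigma$, $\pi_0 = 1/2$:

```python
from math import sqrt, exp
from statistics import NormalDist

def evidence_measures(t, n):
    """P-value, Bayes factor B01, and P(H0|x) from Example 1, with
    mu = theta0, tau = sigma (so rho^2 = 1/n, eta = 0) and pi0 = 1/2."""
    Phi = NormalDist().cdf
    alpha = 2.0 * (1.0 - Phi(abs(t)))                    # two-sided P-value
    B01 = sqrt(1.0 + n) * exp(-0.5 * t * t * n / (n + 1.0))
    P_H0 = 1.0 / (1.0 + 1.0 / B01)                       # posterior prob, pi0 = 1/2
    return alpha, B01, P_H0

for n in (10, 100, 1000):
    a, B, P = evidence_measures(1.96, n)
    print(f"n={n:5d}  P-value={a:.4f}  B01={B:.3f}  P(H0|x)={P:.3f}")
```

At the conventional "significant" value $t = 1.96$ the P-value stays at about $.05$ for every $n$, while $P(H_0|x)$ is far larger and grows with $n$, anticipating the Jeffreys-Lindley paradox discussed below.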

Table 1: Normal example: measures of evidence $\alpha$, $B$, $P$ for various $n$ and $t$. (Entries omitted in this transcription; see GDS.)

2.1 Interval null hypotheses and one-sided tests

Closely related to a sharp null hypothesis $H_0: \theta = \theta_0$ is an interval null hypothesis $H_0: |\theta - \theta_0| \leq \epsilon$. The conflict between the P-values and the posterior probabilities remains if $\epsilon$ is small. Clearly, the disagreement also depends on the sample size $n$.

However, the situation is somewhat different for one-sided null and alternative hypotheses. If $\theta$ is the normal mean, then with a uniform prior a direct calculation shows that the P-value for testing $H_0: \theta \leq \theta_0$ versus $H_1: \theta > \theta_0$ is equal to the posterior probability of $H_0$. In general, these two values are not the same, and one may be higher or lower than the other depending on the family of densities in the model.

2.2 Jeffreys-Lindley Paradox

Suppose $X_1, \ldots, X_n$ are iid $N(\theta, \sigma^2)$, $\sigma^2$ known, and consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. We now show that, for a fixed prior density, the conflict between the P-value and the posterior probability of $H_0$ is enhanced as $n$ goes to infinity. This is known as the Jeffreys-Lindley paradox. Without loss of generality take $\theta_0 = 0$. Consider a uniform prior density over some interval $(-a, a)$. The posterior probability of $H_0$ given $\bar{X}$ is

$P(H_0|\bar{X}) = \frac{\pi_0 \exp[-n\bar{X}^2/(2\sigma^2)]}{K}$,   (11)

where $\pi_0$ is the specified prior probability of $H_0$ and

$K = \pi_0 \exp[-n\bar{X}^2/(2\sigma^2)] + \frac{1 - \pi_0}{2a}\int_{-a}^{a}\exp[-n(\bar{X} - \theta)^2/(2\sigma^2)]\,d\theta$.

Suppose the data are such that $\bar{X} = z_{\alpha/2}\,\sigma/\sqrt{n}$, where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$% quantile of the standard normal. Then $\bar{X}$ is just significant at level $\alpha$. Also, for sufficiently large $n$, $\bar{X}$ is well within $(-a, a)$, because $\bar{X}$ tends to zero as $n$ increases. This leads to

$\int_{-a}^{a}\exp[-n(\bar{X} - \theta)^2/(2\sigma^2)]\,d\theta \approx \sigma\sqrt{2\pi/n}$

and hence, from (11),

$P(H_0|\bar{X}) \approx \frac{\pi_0\exp(-z_{\alpha/2}^2/2)}{\pi_0\exp(-z_{\alpha/2}^2/2) + (1 - \pi_0)\,\dfrac{\sigma\sqrt{2\pi/n}}{2a}}$.

Thus $P(H_0|\bar{X}) \to 1$ as $n \to \infty$, whereas the P-value is equal to $\alpha$ for all $n$. This is known as the Jeffreys-Lindley paradox. The phenomenon continues to hold with any sufficiently flat prior in place of the uniform. Indeed, P-values cannot be compared across sample sizes or across experiments. Even a frequentist tends to agree that the conventional values of the significance level $\alpha$, like $\alpha = .05$ or $.01$, are too large for large sample sizes. The Jeffreys-Lindley paradox shows that, for inference about $\theta$, P-values and Bayes factors may provide contradictory evidence and hence can lead to opposite decisions. The evidence against $H_0$ suggested by the P-value seems unrealistically high.

3 Bayesian P-value

Bayes factors and, more appropriately, posterior probabilities of hypotheses are in principle the correct tools to measure evidence for or against hypotheses. But they are often hard, or even impossible, to compute when the alternative is vaguely specified or not specified at all. Bayesian P-values have been proposed to deal with such problems.

Let $M_0$ be a target model, departure from which is of interest. If, under this model, $X$ has density $f(x|\eta)$, $\eta \in \Xi$, then for a Bayesian with prior $\pi$ on $\eta$, the prior predictive distribution

$m_\pi(x) = \int_\Xi f(x|\eta)\,\pi(\eta)\,d\eta$

is the actual predictive distribution of $X$. Therefore, if a model departure statistic $T(X)$ is available, one can define the prior predictive P-value (the tail area under the predictive distribution) as

$p = P^{m_\pi}\{T(X) \geq T(x_{obs}) \mid M_0\}$,

where $x_{obs}$ is the observed value of $X$. This quantity is heavily influenced by the prior distribution.

Example 2. Let $X_1, \ldots, X_n$ be a random sample from $N(\theta, \sigma^2)$. Suppose $H_0: \theta = 0$.

Case (a): If $\sigma^2$ is assumed to be known, equal to $\sigma_0^2$, then using the statistic $\bar{X}$ the predictive P-value is

$p = P[\sqrt{n}|\bar{X}| > \sqrt{n}|\bar{x}_{obs}| \mid \theta = 0, \sigma_0^2] = 2\Phi\left(-\frac{\sqrt{n}|\bar{x}_{obs}|}{\sigma_0}\right)$.

If $\sigma_0^2$ highly underestimates the actual model variance, then $p$ is very small and the evidence against $H_0$ is overestimated.
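The sensitivity of case (a) to the assumed variance is easy to see numerically; a minimal Python sketch (not from GDS; all numbers are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def prior_pred_pvalue_known_var(xbar_obs, n, sigma0):
    """Case (a): p = 2*Phi(-sqrt(n)*|xbar_obs|/sigma0), the prior predictive
    P-value when sigma^2 is taken as known, equal to sigma0^2."""
    return 2.0 * NormalDist().cdf(-sqrt(n) * abs(xbar_obs) / sigma0)

# The same data, with sigma0 increasingly underestimating the spread:
# the "evidence" against H0 is inflated by orders of magnitude.
for sigma0 in (1.0, 0.5, 0.25):
    print(sigma0, prior_pred_pvalue_known_var(0.5, 16, sigma0))
```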

Case (b): If $\sigma^2$ is unknown and assigned the improper prior $\pi(\sigma^2) = 1/\sigma^2$, then

$m_\pi(x) = \int f_X(x|\sigma^2)\,\pi(\sigma^2)\,d\sigma^2 \propto \int \exp\left(-\frac{1}{2\sigma^2}\sum x_i^2\right)(\sigma^2)^{-n/2}\,\frac{d\sigma^2}{\sigma^2} \propto \left(\sum x_i^2\right)^{-n/2}$,

which is an improper density, thus completely disallowing computation of the prior predictive P-value. [This is not surprising, since with any improper prior the prior predictive density is improper.]

Case (c): Consider an inverse gamma prior $IG(\nu, \beta)$ with density

$\pi(\sigma^2|\nu, \beta) = \frac{\beta^\nu}{\Gamma(\nu)}(\sigma^2)^{-(\nu+1)}\exp(-\beta/\sigma^2)$,

where $\nu, \beta$ are specified positive constants. Because $T = \sqrt{n}\bar{X}$ satisfies $T|\sigma^2 \sim N(0, \sigma^2)$, under this prior the predictive density of $T$ is

$m_\pi(t) = \int f_T(t|\sigma^2)\,\pi(\sigma^2|\nu, \beta)\,d\sigma^2 \propto \int \exp\left(-\frac{1}{\sigma^2}\left(\beta + \frac{t^2}{2}\right)\right)(\sigma^2)^{-(\nu+1+1/2)}\,d\sigma^2 \propto (2\beta + t^2)^{-(2\nu+1)/2}$.

If $2\nu$ is an integer, the prior predictive distribution of $T$ is that of $T/\sqrt{\beta/\nu} \sim t_{2\nu}$. Then

$p = P^{m_\pi}(|\bar{X}| \geq |\bar{x}_{obs}| \mid M_0) = P^{m_\pi}\left(\frac{|T|}{\sqrt{\beta/\nu}} \geq \frac{\sqrt{n}|\bar{x}_{obs}|}{\sqrt{\beta/\nu}}\,\Big|\, M_0\right) = 2\left[1 - F_{2\nu}\left(\frac{\sqrt{n}|\bar{x}_{obs}|}{\sqrt{\beta/\nu}}\right)\right]$,

where $F_{2\nu}$ is the cdf of $t_{2\nu}$. For $\sqrt{n}|\bar{x}_{obs}| = 1.96$ and various values of $\nu$ and $\beta$, the corresponding values of the prior predictive P-value are displayed in Table 2.

Table 2: Normal example: prior predictive P-values for various $\nu$ and $\beta$. (Entries omitted in this transcription; see GDS.)

Further, note that $p \to 1$ as $\beta \to \infty$ for any fixed $\nu > 0$. Thus the prior predictive P-value in this example depends crucially on the values of $\beta$ and $\nu$.
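The dependence of case (c) on $(\nu, \beta)$ can be checked by simulation. The following sketch (not from GDS; the Monte Carlo scheme and the hyperparameter values are illustrative) draws $\sigma^2 \sim IG(\nu, \beta)$ and then $T|\sigma^2 \sim N(0, \sigma^2)$:

```python
import random
from math import sqrt

def prior_predictive_pvalue(t_obs, nu, beta, n_sim=100_000, seed=1):
    """Monte Carlo estimate of p = P(|T| >= |t_obs| | M0), where
    T | sigma^2 ~ N(0, sigma^2) and sigma^2 ~ IG(nu, beta)
    (equivalently, T / sqrt(beta/nu) ~ t_{2*nu})."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # 1 / Gamma(shape=nu, scale=1/beta) is an IG(nu, beta) draw
        sigma2 = 1.0 / rng.gammavariate(nu, 1.0 / beta)
        hits += abs(rng.gauss(0.0, sqrt(sigma2))) >= abs(t_obs)
    return hits / n_sim

# sqrt(n)|xbar_obs| = 1.96: the P-value moves substantially with (nu, beta)
for nu, beta in [(1.0, 1.0), (1.0, 10.0), (5.0, 5.0)]:
    print(nu, beta, prior_predictive_pvalue(1.96, nu, beta))
```

Increasing $\beta$ inflates the prior predictive spread of $T$, which drags $p$ toward 1 exactly as the limit above indicates.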

What we learn from this example is that if the prior $\pi$ used is a poor choice, even an excellent model can come under suspicion on the basis of the prior predictive P-value. We have also seen that an improper prior always produces an improper predictive density, which is an undesirable feature. To rectify these problems, a modification has been suggested: replace $\pi$ in $m_\pi$ by $\pi(\eta|x_{obs})$ to define the posterior predictive density and the posterior predictive P-value:

$m^*(x|x_{obs}) = \int f(x|\eta)\,\pi(\eta|x_{obs})\,d\eta$,
$p = P^{m^*(\cdot|x_{obs})}(T(X) \geq T(x_{obs}))$.

Example 3. (Example 2 continued.) We consider the noninformative prior $\pi(\sigma^2) \propto 1/\sigma^2$ again. Then, as before, $T|\sigma^2 \sim N(0, \sigma^2)$, and

$\pi(\sigma^2|x_{obs}) \propto \exp\left(-\frac{1}{2\sigma^2}\sum x_i^2\right)(\sigma^2)^{-\frac{n+2}{2}}$,

leading to the posterior predictive density of $T$:

$m^*(t|x_{obs}) = \int f_T(t|\sigma^2)\,\pi(\sigma^2|x_{obs})\,d\sigma^2 \propto \int (\sigma^2)^{-1/2}\exp\left(-\frac{t^2}{2\sigma^2}\right)\exp\left(-\frac{1}{2\sigma^2}\sum x_i^2\right)(\sigma^2)^{-\frac{n+2}{2}}\,d\sigma^2 \propto \left(1 + \frac{t^2}{\sum x_i^2}\right)^{-(n+1)/2}$.

This implies that the posterior predictive distribution of $T$ is that of $T/\sqrt{\sum x_i^2/n} \sim t_n$. Thus the posterior predictive P-value is

$p = P^{m^*(\cdot|x_{obs})}(|\bar{X}| \geq |\bar{x}_{obs}| \mid M_0) = 2\left[1 - F_n\left(\frac{\sqrt{n}|\bar{x}_{obs}|}{\sqrt{\sum x_i^2/n}}\right)\right]$,

where $F_n$ is the cdf of the $t_n$ distribution. This definition of a Bayesian P-value does not seem satisfactory. Let $|\bar{x}_{obs}| \to \infty$. Note that then $p \to 2(1 - F_n(\sqrt{n}))$.
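This limiting behavior can be seen by simulation. A sketch (not from GDS; the data and Monte Carlo scheme are illustrative) draws $\sigma^2$ from its posterior and $T$ from $N(0, \sigma^2)$:

```python
import random
from math import sqrt

def posterior_predictive_pvalue(x, n_sim=100_000, seed=1):
    """Monte Carlo posterior predictive P-value for T = sqrt(n)*|Xbar|
    under M0: theta = 0 with prior pi(sigma^2) proportional to 1/sigma^2.
    The posterior is sigma^2 | x ~ IG(n/2, sum(x_i^2)/2)."""
    rng = random.Random(seed)
    n, ss = len(x), sum(xi * xi for xi in x)
    t_obs = sqrt(n) * abs(sum(x) / n)
    hits = 0
    for _ in range(n_sim):
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ss)  # IG(n/2, ss/2) draw
        hits += abs(rng.gauss(0.0, sqrt(sigma2))) >= t_obs
    return hits / n_sim

# Even for wildly discrepant data the P-value stays near 2(1 - F_n(sqrt(n))):
print(posterior_predictive_pvalue([100.0] * 10))
```

For this extreme data set the estimate stabilizes near one percent rather than tending to zero: however discrepant the observations, the P-value cannot fall below the $2(1 - F_n(\sqrt{n}))$ floor.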

Table 3: Values of $p_n = 2(1 - F_n(\sqrt{n}))$ for various $n$. (Entries omitted in this transcription; see GDS.)

Note that these limiting values have no serious relationship with the observations and hence cannot really be used for model checking. Bayarri and Berger (1998) attributed this behavior to the double use of the data: $x$ has been used both to compute the posterior distribution and to compute the tail-area probability of the posterior predictive distribution.

In an effort to combine the desirable features of the prior predictive and the posterior predictive P-values while eliminating the undesirable ones, Bayarri and Berger introduced the conditional predictive P-value. This quantity is based on a predictive distribution of the form $m_\pi$, but it is more heavily influenced by the model than by the prior. Noninformative priors can be used, and there is no double use of the data. In this approach an appropriate statistic $U(X)$, not involving the model departure statistic $T(X)$, is identified. The conditional predictive density $m_\pi(t|u)$ is derived, and the conditional predictive P-value is defined as

$p_c = P^{m(\cdot|u_{obs})}(T(X) \geq T(x_{obs}))$,

where $u_{obs} = U(x_{obs})$. They considered the following example.

Example 4. (Last example continued.) Here $T = \sqrt{n}\bar{X}$ is the model departure statistic for checking discrepancy of the mean in the normal model. Let $U(X) = s^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/n$. Note that $nU|\sigma^2 \sim \sigma^2\chi^2_{n-1}$. Consider the noninformative prior $\pi(\sigma^2) \propto 1/\sigma^2$. Then

$\pi(\sigma^2|s^2) \propto (\sigma^2)^{-(n-1)/2 - 1}\exp(-ns^2/(2\sigma^2))$

is an inverse gamma density with shape parameter $(n-1)/2$. It can be checked that the conditional predictive density of $T$ given $s^2_{obs}$ is

$m_\pi(t|s^2_{obs}) = \int f_T(t|\sigma^2)\,\pi(\sigma^2|s^2_{obs})\,d\sigma^2 \propto \left(n + \frac{t^2}{s^2_{obs}}\right)^{-n/2}$.

Then the conditional predictive P-value is

$p_c = 2\left[1 - F_{n-1}\left(\frac{\sqrt{n-1}\,|\bar{x}_{obs}|}{s_{obs}}\right)\right]$.

We have found a Bayesian interpretation of the classical P-value from the usual t-test. Note that $s^2_{obs}$ was used to produce the posterior distribution that eliminates $\sigma^2$, and that

$\bar{x}_{obs}$ was then used to compute the tail-area probability. In this example it is easy to identify $U(X)$. In some problems it is not as easy, and the computation of the conditional predictive density may not be straightforward. An alternative possibility is to use the partial posterior predictive P-value, defined by

$p^* = P^{m^*(\cdot)}(T(X) \geq T(x_{obs}))$,

where the predictive density $m^*$ is obtained from a partial posterior density calculated from the conditional likelihood of $X$ given $T(X) = T(x_{obs})$. The partial posterior is

$\pi^*(\eta) \propto f_{X|T}(x_{obs}|t_{obs}, \eta)\,\pi(\eta)$.

For the normal example, check that

$f_{X|\bar{X}}(x_{obs}|\bar{x}_{obs}, \sigma^2) \propto (\sigma^2)^{-(n-1)/2}\exp\left(-\frac{n}{2\sigma^2}s^2_{obs}\right)$.

Thus, for $\pi(\sigma^2) \propto 1/\sigma^2$, the partial posterior is

$\pi^*(\sigma^2) \propto (\sigma^2)^{-(n-1)/2 - 1}\exp\left(-\frac{n s^2_{obs}}{2\sigma^2}\right)$.

In this example, the partial posterior predictive P-value is therefore the same as the conditional predictive P-value.

4 Nonsubjective Bayes factors

Consider two models $M_0$ and $M_1$: under model $M_i$ the density of the data $X$ is $f_i(x|\theta_i)$, $\theta_i$ being an unknown parameter of dimension $p_i$, $i = 0, 1$. Given proper prior densities $g_i(\theta_i)$ for the parameters $\theta_i$, the Bayes factor for $M_1$ relative to $M_0$ is given by

$B_{10} = \frac{m_1(x)}{m_0(x)} = \frac{\int f_1(x|\theta_1)\,g_1(\theta_1)\,d\theta_1}{\int f_0(x|\theta_0)\,g_0(\theta_0)\,d\theta_0}$,   (12)

where $m_i(x)$ is the marginal density of $X$ under $M_i$. If the priors $g_i(\theta_i)$ cannot be subjectively specified, one tends to use a noninformative prior. There are difficulties with (12) for noninformative priors, which are typically improper. If $g_i$ is improper, it is defined only up to a positive multiplicative constant $c_i$, and $c_i g_i$ has as much validity as $g_i$. This means that $(c_1/c_0)B_{10}$ has as much validity as $B_{10}$ as a Bayes factor. Thus the Bayes factor remains indeterminate under improper priors. This indeterminacy has been the main motivation for the new objective methods. We will confine attention to the nested case, where $f_0$ and $f_1$ are of the same functional form and $f_0(x|\theta_0)$ is the same as $f_1(x|\theta_1)$ with some of the coordinates of $\theta_1$ specified.
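The indeterminacy under an improper prior can be made concrete numerically. A sketch (not from GDS; the data, the truncation of the "uniform" prior to a wide grid, and the quadrature scheme are all illustrative) computes $B_{10}$ for $M_0$: $N(0,1)$ versus $M_1$: $N(\theta, 1)$ with $g_1(\theta) = c$ and shows that $B_{10}$ scales linearly in the arbitrary constant $c$:

```python
from math import exp, pi, sqrt

def B10_uniform_prior(x, c, half_width=30.0, steps=12_001):
    """B10 = m1/m0 for M0: N(0,1) vs M1: N(theta,1) with g1(theta) = c on a
    wide grid (a stand-in for the improper uniform prior), by trapezoidal rule.
    The common (2*pi)^(-n/2) factor cancels and is dropped from m0 and m1."""
    sx2 = sum(xi * xi for xi in x)
    def lik(theta):
        return exp(-0.5 * sum((xi - theta) ** 2 for xi in x))
    h = 2.0 * half_width / (steps - 1)
    grid = [-half_width + i * h for i in range(steps)]
    m1 = c * h * (sum(lik(t) for t in grid) - 0.5 * (lik(grid[0]) + lik(grid[-1])))
    m0 = exp(-0.5 * sx2)
    return m1 / m0

x = [0.4, -0.2, 1.1, 0.3]
b1 = B10_uniform_prior(x, c=1.0)
b2 = B10_uniform_prior(x, c=10.0)
print(b1, b2, b2 / b1)   # the ratio is 10: B10 is only defined up to c
```

The numerical value with $c = 1$ agrees with the closed form $c\sqrt{2\pi/n}\,\exp(n\bar{X}^2/2)$ derived in Example 5 below, but any other $c$ is equally defensible, which is exactly the indeterminacy at issue.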
We show below that the use of a diffuse proper prior in place of an improper prior does not rectify the above-mentioned deficiency of the Bayes factor.

Example 5. (Testing a normal mean with known variance.) Suppose we observe $X = (X_1, \ldots, X_n)$. Under $M_0$ the $X_i$ are iid $N(0, 1)$, and under $M_1$ the $X_i$ are iid $N(\theta, 1)$, $\theta$ real and

unknown. With the uniform noninformative prior $g_1^N(\theta) = c$ for $\theta$ under $M_1$, the Bayes factor is

$B_{10}^N = c\sqrt{\frac{2\pi}{n}}\exp\left(\frac{n\bar{X}^2}{2}\right)$.

For a uniform prior for $\theta$ over $[-K, K]$ with $K$ large, the new Bayes factor $B_{10}^K$ satisfies

$B_{10}^K = \frac{B_{10}^N}{2Kc}$.

Thus for large $K$ the Bayes factor $B_{10}^K$ is biased against $M_1$. A similar conclusion is obtained if we use a diffuse proper prior $N(0, \tau^2)$ with $\tau^2$ large. The Bayes factor is then

$B_{10}^{norm} = (n\tau^2 + 1)^{-1/2}\exp\left[\frac{n\tau^2}{n\tau^2 + 1}\,\frac{n\bar{X}^2}{2}\right]$,

which is approximately $(n\tau^2)^{-1/2}\exp[n\bar{X}^2/2]$ for large $n\tau^2$. This can be made arbitrarily small by taking $\tau^2$ arbitrarily large, which is clearly undesirable.

A solution to this instability of the Bayes factor under a noninformative prior is to use part of the data as a training sample, dividing $X = (X_1, X_2)$; we assume independence of $X_1$ and $X_2$. The first subset $X_1$ is treated as a training sample to convert a diffuse (improper) prior into a proper posterior distribution for the parameters given $X_1$. For the prior $g_i(\theta_i)$, the (training) posterior is

$g_i(\theta_i|X_1) = \frac{f_i(X_1|\theta_i)\,g_i(\theta_i)}{\int f_i(X_1|\theta_i)\,g_i(\theta_i)\,d\theta_i}$, $i = 0, 1$.

These proper posteriors are then used as priors to compute the Bayes factor with the remaining data $X_2$. The conditional Bayes factor $B_{10}(X_1)$, conditioned on $X_1$, can be expressed as

$B_{10}(X_1) = \frac{\int f_1(X_2|\theta_1)\,g_1(\theta_1|X_1)\,d\theta_1}{\int f_0(X_2|\theta_0)\,g_0(\theta_0|X_1)\,d\theta_0} = \frac{\int f_1(X_2|\theta_1)f_1(X_1|\theta_1)\,g_1(\theta_1)\,d\theta_1\,/\,m_1(X_1)}{\int f_0(X_2|\theta_0)f_0(X_1|\theta_0)\,g_0(\theta_0)\,d\theta_0\,/\,m_0(X_1)} = \frac{m_1(X)}{m_0(X)}\cdot\frac{m_0(X_1)}{m_1(X_1)} = B_{10}\,\frac{m_0(X_1)}{m_1(X_1)}$,   (13)

where $m_i(X_1)$ is the marginal density of $X_1$ under $M_i$, $i = 0, 1$. Note that if the priors $c_i g_i$, $i = 0, 1$, are used to compute $B_{10}(X_1)$, the arbitrary constant multiplier $c_1/c_0$ of $B_{10}$ is cancelled by the factor $c_0/c_1$ in $m_0(X_1)/m_1(X_1)$, so that the indeterminacy of the Bayes factor is removed in (13).
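The instability seen in Example 5 under a diffuse proper prior can be checked numerically; a minimal sketch (not from GDS; the data values are illustrative):

```python
from math import sqrt, exp

def bayes_factor_normal_prior(xbar, n, tau2):
    """B10 for M0: N(0,1) vs M1: N(theta,1), prior theta ~ N(0, tau^2):
    B10 = (n*tau^2 + 1)^(-1/2) * exp((n*tau^2/(n*tau^2 + 1)) * n*xbar^2/2)."""
    c = n * tau2
    return exp(c / (c + 1.0) * n * xbar * xbar / 2.0) / sqrt(c + 1.0)

# The same data under an increasingly diffuse prior: B10 -> 0 regardless of x.
n, xbar = 20, 0.5
for tau2 in (1.0, 100.0, 10_000.0, 1_000_000.0):
    print(tau2, bayes_factor_normal_prior(xbar, n, tau2))
```

The exponential factor converges to the fixed value $\exp(n\bar{X}^2/2)$ while the $(n\tau^2 + 1)^{-1/2}$ factor keeps shrinking, so evidence apparently mounts against $M_1$ purely as an artifact of the prior's diffuseness.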

It follows from the preceding discussion that $X_1$ may be used as a training sample if the corresponding posteriors $g_i(\theta_i|X_1)$, $i = 0, 1$, are proper or, equivalently, if the marginal densities $m_i(X_1)$ of $X_1$ under $M_i$, $i = 0, 1$, are finite. Clearly, one should use a minimal amount of data as the training sample and keep most of the data for model comparison. Berger and Pericchi called $X_1$ a minimal training sample if $0 < m_i(X_1) < \infty$, $i = 0, 1$, while the marginals are not all finite for any proper subset of $X_1$.

Example 6. Consider testing that a normal mean equals zero, with known variance. Under the uniform noninformative prior $g_1(\theta_1) = 1$ under $M_1$, the minimal training samples are the subsamples of size 1, with

$m_0(X_i) = \frac{1}{\sqrt{2\pi}}\exp(-X_i^2/2)$ and $m_1(X_i) = \int \frac{1}{\sqrt{2\pi}}\exp(-(X_i - \theta)^2/2)\,d\theta = 1$.

4.1 The intrinsic Bayes factor and the fractional Bayes factor

We have described above the conditional Bayes factor $B_{10}(X_1)$ corresponding to an improper prior and a minimal training sample $X_1$. However, this choice depends on $X_1$, for which there may be many possibilities. To remove the arbitrariness, Berger and Pericchi suggested combining the conditional Bayes factors over all possible minimal training samples to define an intrinsic Bayes factor. If $X(l)$, $l = 1, \ldots, L$, denotes the list of all possible minimal training samples, they defined the arithmetic intrinsic Bayes factor (AIBF) as

$AIBF_{10} = B_{10}\,\frac{1}{L}\sum_{l=1}^{L}\frac{m_0(X(l))}{m_1(X(l))}$.   (14)

The geometric intrinsic Bayes factor is

$GIBF_{10} = B_{10}\left(\prod_{l=1}^{L}\frac{m_0(X(l))}{m_1(X(l))}\right)^{1/L}$.

A different solution to the model selection problem with improper priors is due to O'Hagan. He proposed using a fractional power of the likelihood to convert an improper prior into a proper posterior, and then combining this posterior with the remaining fraction of the likelihood to obtain the marginal densities under the models and hence the Bayes factor.
The resulting partial Bayes factor, called the fractional Bayes factor (FBF), is given by

$FBF_{10} = \frac{m_1(X, b)}{m_0(X, b)}$,

where $0 < b < 1$ is appropriately chosen and

$m_i(X, b) = \int f_i^{1-b}(X|\theta_i)\,\frac{f_i^b(X|\theta_i)\,g_i(\theta_i)}{\int f_i^b(X|\theta_i)\,g_i(\theta_i)\,d\theta_i}\,d\theta_i = \frac{\int f_i(X|\theta_i)\,g_i(\theta_i)\,d\theta_i}{\int f_i^b(X|\theta_i)\,g_i(\theta_i)\,d\theta_i}$.

Note that $FBF_{10}$ can also be written as

$FBF_{10} = B_{10}\,\frac{m_0^b(X)}{m_1^b(X)}$,

13 where m b i(x) = f b i (X θ i )g i (θ i )dθ i, i =, 1. To make FBF comparable with the IBF, we can take b = m/n, where m is the size of a minimal training sample. O Hagan also recommends other choices of b such as n/n or log n/n. Example 7. Consider testing the normal mean equal to zero for known variance. The Bayes factor with the noninformative prior g 1 (θ 1 ) = 1 is 2π 2 X B 1 = exp(n n 2 ). Hence, B 1 (X i ) = B 1 m (X i )/m 1 (X i ) = B 1 (1/ 2π) exp( X i 2 2 ). Thus AIBF 1 = n 1 B 1 (X i ) = n 3/2 exp(n X 2 /2) exp( X i 2 2 ), GIBF 1 = n 1/2 exp[n X 2 /2 (1/2n) Xi 2 ]. Note that for a fraction < b < 1, ( 1 ) bn m b b X (X) = exp[ i 2 ] 2π 2 ( 1 ) bn m b b (X 1(X) = i X) 2 2π exp[ ] 2π 2 bn m b 2 nb X nb = exp[ ] 2 2π. Hence the FBF is m b 1 F BF 1 = b 1/2 exp[n(1 b) X 2 /2] = n 1/2 exp[(n 1) X 2 /2], if b = 1/n. See Chapter 6, pp , of GDS for more examples. Exercise: Suppose X 1, X 2 are iid with a location-scale pdf f(x µ, σ) = 1 σ f(x µ ), < µ <, σ >. σ Show that 1 σ f(x 1 µ )f( x 2 µ 1 )dµdσ = 3 σ σ 2 x 1 x 2. Note: This result was discovered through simulations. 13


Bayes Factors for Goodness of Fit Testing

Bayes Factors for Goodness of Fit Testing Carnegie Mellon University Research Showcase @ CMU Department of Statistics Dietrich College of Humanities and Social Sciences 10-31-2003 Bayes Factors for Goodness of Fit Testing Fulvio Spezzaferri University

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Bayesian Inference: Concept and Practice

Bayesian Inference: Concept and Practice Inference: Concept and Practice fundamentals Johan A. Elkink School of Politics & International Relations University College Dublin 5 June 2017 1 2 3 Bayes theorem In order to estimate the parameters of

More information

P Values and Nuisance Parameters

P Values and Nuisance Parameters P Values and Nuisance Parameters Luc Demortier The Rockefeller University PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics CERN, Geneva, June 27 29, 2007 Definition and interpretation of p values;

More information

Overall Objective Priors

Overall Objective Priors Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University

More information

Default priors and model parametrization

Default priors and model parametrization 1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide

More information

Unified Frequentist and Bayesian Testing of a Precise Hypothesis

Unified Frequentist and Bayesian Testing of a Precise Hypothesis Statistical Science 1997, Vol. 12, No. 3, 133 160 Unified Frequentist and Bayesian Testing of a Precise Hypothesis J. O. Berger, B. Boukai and Y. Wang Abstract. In this paper, we show that the conditional

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective

More information

Integrated Objective Bayesian Estimation and Hypothesis Testing

Integrated Objective Bayesian Estimation and Hypothesis Testing Integrated Objective Bayesian Estimation and Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es 9th Valencia International Meeting on Bayesian Statistics Benidorm

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors

A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Journal of Data Science 6(008), 75-87 A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Scott Berry 1 and Kert Viele 1 Berry Consultants and University of Kentucky

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models D. Fouskakis, I. Ntzoufras and D. Draper December 1, 01 Summary: In the context of the expected-posterior prior (EPP) approach

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the

More information

STAT 740: Testing & Model Selection

STAT 740: Testing & Model Selection STAT 740: Testing & Model Selection Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 34 Testing & model choice, likelihood-based A common way to

More information

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

Bayesian Model Comparison

Bayesian Model Comparison BS2 Statistical Inference, Lecture 11, Hilary Term 2009 February 26, 2009 Basic result An accurate approximation Asymptotic posterior distribution An integral of form I = b a e λg(y) h(y) dy where h(y)

More information

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010 Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various

More information

Bayesian Asymptotics

Bayesian Asymptotics BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation

Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation So far we have discussed types of spatial data, some basic modeling frameworks and exploratory techniques. We have not discussed

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence

Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham NC 778-5 - Revised April,

More information

Model comparison. Christopher A. Sims Princeton University October 18, 2016

Model comparison. Christopher A. Sims Princeton University October 18, 2016 ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research

More information

M. J. Bayarri and M. E. Castellanos. University of Valencia and Rey Juan Carlos University

M. J. Bayarri and M. E. Castellanos. University of Valencia and Rey Juan Carlos University 1 BAYESIAN CHECKING OF HIERARCHICAL MODELS M. J. Bayarri and M. E. Castellanos University of Valencia and Rey Juan Carlos University Abstract: Hierarchical models are increasingly used in many applications.

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology

40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology Singapore University of Design and Technology Lecture 9: Hypothesis testing, uniformly most powerful tests. The Neyman-Pearson framework Let P be the family of distributions of concern. The Neyman-Pearson

More information

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible

More information

Harrison B. Prosper. CMS Statistics Committee

Harrison B. Prosper. CMS Statistics Committee Harrison B. Prosper Florida State University CMS Statistics Committee 08-08-08 Bayesian Methods: Theory & Practice. Harrison B. Prosper 1 h Lecture 3 Applications h Hypothesis Testing Recap h A Single

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

A noninformative Bayesian approach to domain estimation

A noninformative Bayesian approach to domain estimation A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Lecture 6: Model Checking and Selection

Lecture 6: Model Checking and Selection Lecture 6: Model Checking and Selection Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 27, 2014 Model selection We often have multiple modeling choices that are equally sensible: M 1,, M T. Which

More information

Inference for a Population Proportion

Inference for a Population Proportion Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist

More information

A REVERSE TO THE JEFFREYS LINDLEY PARADOX

A REVERSE TO THE JEFFREYS LINDLEY PARADOX PROBABILITY AND MATHEMATICAL STATISTICS Vol. 38, Fasc. 1 (2018), pp. 243 247 doi:10.19195/0208-4147.38.1.13 A REVERSE TO THE JEFFREYS LINDLEY PARADOX BY WIEBE R. P E S T M A N (LEUVEN), FRANCIS T U E R

More information

Bayesian Inference. Chapter 2: Conjugate models

Bayesian Inference. Chapter 2: Conjugate models Bayesian Inference Chapter 2: Conjugate models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

On the Bayesianity of Pereira-Stern tests

On the Bayesianity of Pereira-Stern tests Sociedad de Estadística e Investigación Operativa Test (2001) Vol. 10, No. 2, pp. 000 000 On the Bayesianity of Pereira-Stern tests M. Regina Madruga Departamento de Estatística, Universidade Federal do

More information

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December

More information

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: ntzoufras@aueb.gr.

More information

Modern Methods of Statistical Learning sf2935 Auxiliary material: Exponential Family of Distributions Timo Koski. Second Quarter 2016

Modern Methods of Statistical Learning sf2935 Auxiliary material: Exponential Family of Distributions Timo Koski. Second Quarter 2016 Auxiliary material: Exponential Family of Distributions Timo Koski Second Quarter 2016 Exponential Families The family of distributions with densities (w.r.t. to a σ-finite measure µ) on X defined by R(θ)

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

On the use of non-local prior densities in Bayesian hypothesis tests

On the use of non-local prior densities in Bayesian hypothesis tests J. R. Statist. Soc. B (2010) 72, Part 2, pp. 143 170 On the use of non-local prior densities in Bayesian hypothesis tests Valen E. Johnson M. D. Anderson Cancer Center, Houston, USA and David Rossell Institute

More information

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted

More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

Bayes Factors for Discovery

Bayes Factors for Discovery Glen Cowan RHUL Physics 3 April, 22 Bayes Factors for Discovery The fundamental quantity one should use in the Bayesian framework to quantify the significance of a discovery is the posterior probability

More information

Lecture 10: Generalized likelihood ratio test

Lecture 10: Generalized likelihood ratio test Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 10: Generalized likelihood ratio test Lecturer: Art B. Owen October 25 Disclaimer: These notes have not been subjected to the usual

More information

Divergence Based Priors for Bayesian Hypothesis testing

Divergence Based Priors for Bayesian Hypothesis testing Divergence Based Priors for Bayesian Hypothesis testing M.J. Bayarri University of Valencia G. García-Donato University of Castilla-La Mancha November, 2006 Abstract Maybe the main difficulty for objective

More information

Uncertain Inference and Artificial Intelligence

Uncertain Inference and Artificial Intelligence March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini

More information

A Bayesian solution for a statistical auditing problem

A Bayesian solution for a statistical auditing problem A Bayesian solution for a statistical auditing problem Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 July 2002 Research supported in part by NSF Grant DMS 9971331 1 SUMMARY

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information