1 Hypothesis Testing and Model Selection
- Drusilla Robertson
A Short Course on Bayesian Inference
(based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta)
Module 6: From Chapter 6 of GDS

1 Hypothesis Testing and Model Selection

For Bayesians, model selection and model criticism are extremely important inference problems. These problems are usually more difficult than point estimation or the construction of credible sets.

Suppose $X \mid \theta$ has density $f(x \mid \theta)$, with $\theta$ an unknown element of the parameter space $\Theta$. Suppose that we are interested in comparing two models $M_0$ and $M_1$, given by

$$M_0: X \text{ has density } f(x \mid \theta), \ \theta \in \Theta_0; \qquad M_1: X \text{ has density } f(x \mid \theta), \ \theta \in \Theta_1. \qquad (1)$$

For $i = 0, 1$, let $g_i(\theta)$ be the prior density of $\theta$, conditional on $M_i$ being the true model. Then, to compare $M_0$ and $M_1$ on the basis of a random sample $x = (x_1, \ldots, x_n)$, we use the Bayes factor

$$B_{01}(x) = \frac{m_0(x)}{m_1(x)}, \qquad m_i(x) = \int_{\Theta_i} f(x \mid \theta)\, g_i(\theta)\, d\theta, \quad i = 0, 1. \qquad (2)$$

We also often write $BF_{01}$ for this Bayes factor. Recall from Module 1 that if $\pi_0 = P^\pi(M_0) = P^\pi(\Theta_0)$ and $\pi_1 = 1 - \pi_0 = P^\pi(M_1)$, then the posterior probability of $M_0$ is

$$P(M_0 \mid x) = \left\{ 1 + \frac{\pi_1}{\pi_0}\, B_{01}^{-1}(x) \right\}^{-1}. \qquad (3)$$

Thus, if the conditional prior densities $g_0$ and $g_1$ can be specified, we can simply use the Bayes factor $B_{01}$ for model selection. If $\pi_0$ is also specified, we can use the posterior probability of $M_i$, or the posterior odds ratio of $M_0$ to $M_1$, for model selection.

Consider now a different model-checking problem, that of testing for normality. In its simplest form, the problem is to check whether a given random sample $X_1, \ldots, X_n$ arose from a normal population. In the setup of (1), we may write it as

$$M_0: X \sim N(\mu, \sigma^2) \text{ with arbitrary } \mu \text{ and } \sigma^2 > 0; \qquad M_1: X \text{ does not have a normal distribution}. \qquad (4)$$
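As a minimal numerical sketch of (2) and (3), the code below assumes a concrete setup that is not part of the notes: the data are iid $N(\theta, 1)$, $\Theta_0 = \{0\}$ (so $g_0$ is a point mass and $m_0(x) = f(x \mid 0)$), and under $M_1$ the prior is $g_1 = N(0, \tau^2)$. The marginal $m_1(x)$ is computed by brute-force trapezoidal integration on a fixed grid, which is adequate for one parameter and moderate $\tau^2$ but is not a general-purpose method. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def likelihood(xs, theta):
    """f(x | theta) for iid N(theta, 1) observations."""
    return math.prod(normal_pdf(x, theta, 1.0) for x in xs)


def marginal_m0(xs, theta0=0.0):
    """Theta_0 = {theta0}: g_0 is a point mass, so m_0(x) = f(x | theta0)."""
    return likelihood(xs, theta0)


def marginal_m1(xs, tau2=1.0, lo=-20.0, hi=20.0, steps=20000):
    """m_1(x) = int f(x | theta) g_1(theta) dtheta with g_1 = N(0, tau2),
    approximated by the composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        theta = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * likelihood(xs, theta) * normal_pdf(theta, 0.0, tau2)
    return total * h


def posterior_prob_m0(b01, pi0):
    """Equation (3): P(M0 | x) = {1 + (pi1 / pi0) / B01}^(-1)."""
    pi1 = 1.0 - pi0
    return 1.0 / (1.0 + (pi1 / pi0) / b01)


if __name__ == "__main__":
    xs = [0.3, -0.1, 0.8, 0.5, 0.2]  # illustrative data
    b01 = marginal_m0(xs) / marginal_m1(xs, tau2=1.0)
    print("B01 =", b01, " P(M0|x) =", posterior_prob_m0(b01, 0.5))
```

Because $\bar x$ is sufficient here, the numerical $B_{01}$ should agree with the closed form derived in Example 1 below (with $\sigma = 1$, $\mu = \theta_0 = 0$).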
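The model selection problem (4) looks quite different from (1), because $M_1$ does not constitute a parametric alternative. Here the Bayes factor and the posterior odds will not work, but the Bayesian P-value introduced later does.

Laplace approximation of the Bayes factor. We approximate the Bayes factor $B_{01}$ by applying the Laplace approximation to the two marginal densities. Recall that the marginal density under model $M_i$ is $m_i(x) = \int f(x \mid \theta_i)\, g_i(\theta_i)\, d\theta_i$. If $\tilde\theta_i$ denotes the posterior mode of $\theta_i$, then by a second-order Taylor expansion (the first-order term vanishes because $\tilde\theta_i$ is a mode)

$$\log\{f(x \mid \theta_i)\, g_i(\theta_i)\} \approx \log\{f(x \mid \tilde\theta_i)\, g_i(\tilde\theta_i)\} - \frac{1}{2} (\theta_i - \tilde\theta_i)^T H_{\tilde\theta_i} (\theta_i - \tilde\theta_i),$$

where $H_{\tilde\theta_i}$ is the negative of the matrix of second derivatives of $\log\{f(x \mid \theta_i)\, g_i(\theta_i)\}$ with respect to $\theta_i$, evaluated at $\tilde\theta_i$. Then, with $p_i$ the dimension of $\theta_i$, the Laplace approximation gives

$$m_i(x) \approx f(x \mid \tilde\theta_i)\, g_i(\tilde\theta_i) \int \exp\left\{ -\frac{1}{2} (\theta_i - \tilde\theta_i)^T H_{\tilde\theta_i} (\theta_i - \tilde\theta_i) \right\} d\theta_i = f(x \mid \tilde\theta_i)\, g_i(\tilde\theta_i)\, (2\pi)^{p_i/2}\, |H_{\tilde\theta_i}|^{-1/2}. \qquad (5)$$

The Bayes factor is often reported on the log scale, and $2 \log B_{01}$ is used as an evidential measure comparing the support provided by the data $x$ for $M_0$ relative to $M_1$. Using the above approximation we get

$$2 \log B_{01} \approx 2 \log\left\{ \frac{f(x \mid \tilde\theta_0)}{f(x \mid \tilde\theta_1)} \right\} + 2 \log\left\{ \frac{g_0(\tilde\theta_0)}{g_1(\tilde\theta_1)} \right\} + (p_0 - p_1) \log(2\pi) + \log\left\{ \frac{|H_{\tilde\theta_1}|}{|H_{\tilde\theta_0}|} \right\}. \qquad (6)$$

If $\hat\theta_i$ denotes the MLE of $\theta_i$ under model $M_i$, then, since $\tilde\theta_i - \hat\theta_i = O(1/n)$, ignoring all terms of order $O(n^{-1})$ gives the alternative approximation

$$2 \log B_{01} \approx 2 \log\left\{ \frac{f(x \mid \hat\theta_0)}{f(x \mid \hat\theta_1)} \right\} + 2 \log\left\{ \frac{g_0(\hat\theta_0)}{g_1(\hat\theta_1)} \right\} + (p_0 - p_1) \log(2\pi) + \log\left\{ \frac{|H_{\hat\theta_1}|}{|H_{\hat\theta_0}|} \right\}. \qquad (7)$$

If $\bar H_{\hat\theta_i}$ denotes the per-unit-observation observed Fisher information matrix for $\theta_i$ in model $M_i$, then, using $\log |H_{\hat\theta_i}| = p_i \log n + \log |\bar H_{\hat\theta_i}|$, an approximation to (7) correct to $O(1)$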
is

$$2 \log B_{01} \approx 2 \log\left\{ \frac{f(x \mid \hat\theta_0)}{f(x \mid \hat\theta_1)} \right\} - (p_0 - p_1) \log n. \qquad (8)$$

This is the approximate Bayes factor based on the Bayesian information criterion (BIC) due to Schwarz (1978). The term $(p_0 - p_1) \log n$ can be considered a penalty for using a more complex model. A related criterion is

$$2 \log B_{01} \approx 2 \log\left\{ \frac{f(x \mid \hat\theta_0)}{f(x \mid \hat\theta_1)} \right\} - 2 (p_0 - p_1), \qquad (9)$$

which is based on the Akaike information criterion (AIC), namely $\mathrm{AIC} = 2 \log f(x \mid \hat\theta) - 2p$ for a model $f(x \mid \theta)$ with $p$ parameters. The penalty for using a more complex model is not as drastic as in BIC, since $2 < \log n$ once $n \ge 8$.

2 P-value and Posterior Probability of $H_0$ as Measures of Evidence Against the Null

One particular tool from classical statistics that is very widely used in the applied sciences for model checking or hypothesis testing is the P-value. The basic idea behind R.A. Fisher's (1973; original 1925) definition of the P-value has a great deal of appeal. It is the probability, under a (simple) null hypothesis, of obtaining a value of a test statistic at least as extreme as that observed in the sample data. Suppose that it is desired to test

$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta \neq \theta_0, \qquad (10)$$

and that a classical significance test is available, based on a test statistic $T(X)$, large values of which are deemed to provide evidence against the null hypothesis. If data $X = x$ are observed, with corresponding $t = T(x)$, the P-value is $\alpha = P_{\theta_0}\{T(X) \ge T(x)\}$.

Remark 1. Fisher meant the P-value to be used as a measure of the degree of surprise in the data relative to $H_0$. This use of the P-value as a post-experimental, or conditional, measure of statistical evidence seems to have some intuitive justification. However, Bayesians
have raised various objections to the use of the P-value as evidence against $H_0$. The P-value appears to be too strict against $H_0$, that is, it tends to overstate the evidence against it. To a Bayesian, the posterior probability of $H_0$ summarizes the evidence against $H_0$. In many common testing problems, the P-value is smaller than the posterior probability of $H_0$ by an order of magnitude. The reason is that the P-value ignores the likelihood of the data under the alternative, and takes into account not only the observed deviation of the data from the null hypothesis, as measured by the test statistic, but also more significant deviations. We give a simple example below where the P-value can be quite different from the posterior probability.

Example 1. Suppose that, for known $\sigma^2$, $\bar X \mid \theta \sim N(\theta, \sigma^2/n)$. We wish to test $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Let $T = \sqrt{n}(\bar X - \theta_0)/\sigma$ be the test statistic, with observed value $t = \sqrt{n}(\bar x - \theta_0)/\sigma$. It can be checked that the P-value is $\alpha = 2[1 - \Phi(|t|)]$, where $\Phi(\cdot)$ is the standard normal cdf. For the point null hypothesis, on the set $\{\theta \neq \theta_0\}$, let $\theta$ have the $N(\mu, \tau^2)$ density $g_1$. Then under $H_1$, marginally, $\bar X \sim N(\mu, \tau^2 + \sigma^2/n)$. If $\rho = (\sigma/\sqrt{n})/\tau$ and $\eta = (\theta_0 - \mu)/\tau$, then

$$B_{01} = \frac{(2\pi\sigma^2/n)^{-1/2} \exp\left[ -\frac{n(\bar x - \theta_0)^2}{2\sigma^2} \right]}{\{2\pi(\tau^2 + \sigma^2/n)\}^{-1/2} \exp\left[ -\frac{(\bar x - \mu)^2}{2(\tau^2 + \sigma^2/n)} \right]} = \left( \frac{\sigma^2 + n\tau^2}{\sigma^2} \right)^{1/2} \exp\left[ -\frac{n}{2} \left\{ \frac{(\bar x - \theta_0)^2}{\sigma^2} - \frac{(\bar x - \mu)^2}{\sigma^2 + n\tau^2} \right\} \right].$$

Writing $\bar x - \mu = (\bar x - \theta_0) + (\theta_0 - \mu) = \sigma t/\sqrt{n} + \tau\eta$ and $\tau^2 + \sigma^2/n = \tau^2(1 + \rho^2)$, this simplifies to

$$B_{01} = (1 + \rho^{-2})^{1/2} \exp\left\{ -\frac{1}{2} \left[ \frac{(t - \rho\eta)^2}{1 + \rho^2} - \eta^2 \right] \right\}.$$

Now, if we choose $\mu = \theta_0$, $\tau = \sigma$ and $\pi_0 = 1/2$, so that $\eta = 0$ and $\rho^2 = 1/n$, we get

$$B_{01} = (1 + \rho^{-2})^{1/2} \exp\left\{ -\frac{t^2}{2(1 + \rho^2)} \right\} = (1 + n)^{1/2} \exp\left\{ -\frac{n t^2}{2(n + 1)} \right\} \quad \text{and} \quad P(H_0 \mid x) = (1 + B_{01}^{-1})^{-1}.$$

For various values of $t$ and $n$, the different measures of evidence, $\alpha$ = P-value, $B$ = Bayes factor, and $P = P(H_0 \mid x)$, are displayed in the table below (taken from GDS).
It may be noted that the posterior probability of $H_0$ is several times, and often an order of magnitude, larger than the corresponding P-value, which is an indication of how different these two measures of evidence can be.
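The measures of evidence in Example 1 can be recomputed directly from the closed forms above. The sketch below assumes $\mu = \theta_0$, $\tau = \sigma$ and $\pi_0 = 1/2$, as in the text, and also evaluates the BIC approximation (8). Here $p_0 = 0$, $p_1 = 1$ and $2\log\{f(x \mid \theta_0)/f(x \mid \bar x)\} = -t^2$, so (8) reads $2\log B_{01} \approx -t^2 + \log n$. The function names are illustrative, and the printout is not laid out like Table 1.

```python
import math


def p_value(t):
    """Two-sided P-value alpha = 2[1 - Phi(|t|)] = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def bayes_factor_01(t, n):
    """Exact B01 with mu = theta0 and tau = sigma, i.e. eta = 0 and rho^2 = 1/n."""
    rho2 = 1.0 / n
    return math.sqrt(1.0 + 1.0 / rho2) * math.exp(-t * t / (2.0 * (1.0 + rho2)))


def posterior_prob_h0(b01):
    """P(H0 | x) = (1 + 1/B01)^(-1) when pi0 = 1/2."""
    return 1.0 / (1.0 + 1.0 / b01)


def bic_bayes_factor_01(t, n):
    """BIC approximation (8) with p0 = 0, p1 = 1: 2 log B01 ~ -t^2 + log n."""
    return math.exp(0.5 * (-t * t + math.log(n)))


if __name__ == "__main__":
    t = 1.96
    for n in (1, 5, 10, 20, 50, 100):
        b = bayes_factor_01(t, n)
        print(f"n={n:3d}  alpha={p_value(t):.4f}  B={b:.3f}  "
              f"P={posterior_prob_h0(b):.3f}  B_BIC={bic_bayes_factor_01(t, n):.3f}")
```

For fixed $t$ the P-value does not change with $n$, while $P(H_0 \mid x)$ eventually grows. The exact-to-BIC ratio is $\sqrt{(n+1)/n}\,\exp\{t^2/(2(n+1))\}$, which tends to 1 as $n$ increases.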
Table 1: Normal Example: Measures of Evidence (columns: $t$, $\alpha$, then $B$ and $P$ for each of six values of $n$)

Interval null hypotheses and one-sided tests

Closely related to a sharp null hypothesis $H_0: \theta = \theta_0$ is an interval null hypothesis $H_0: |\theta - \theta_0| \le \epsilon$. The conflict between P-values and posterior probabilities remains if $\epsilon$ is small. Clearly, the disagreement also depends on the sample size $n$. However, the situation is somewhat different for one-sided null and alternative hypotheses. If $\theta$ is the normal mean, then with a uniform prior a direct calculation shows that the P-value for testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$ is equal to the posterior probability of $H_0$ (the posterior is $\theta \mid \bar x \sim N(\bar x, \sigma^2/n)$, so both equal $1 - \Phi(t)$). In general these two values are not the same, and either may be larger than the other, depending on the family of densities in the model.

2.2 Jeffreys-Lindley Paradox

Suppose $X_1, \ldots, X_n$ are iid $N(\theta, \sigma^2)$, $\sigma^2$ known, and consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. We now show that, for a fixed prior density, the conflict between P-values and posterior probabilities of $H_0$ grows as $n \to \infty$. This is known as the Jeffreys-Lindley paradox. Without loss of generality take $\theta_0 = 0$, and under $H_1$ consider a uniform prior density over some interval $(-a, a)$. The posterior probability of $H_0$ given $X$ is

$$P(H_0 \mid X) = \frac{\pi_0 \exp[-n\bar X^2/(2\sigma^2)]}{K}, \qquad (11)$$

where $\pi_0$ is the specified prior probability of $H_0$ and

$$K = \pi_0 \exp[-n\bar X^2/(2\sigma^2)] + \frac{1 - \pi_0}{2a} \int_{-a}^{a} \exp[-n(\bar X - \theta)^2/(2\sigma^2)]\, d\theta.$$

Suppose the data are such that $\bar X = z_\alpha \sigma/\sqrt{n}$, where $z_\alpha$ is the $(1 - \alpha/2)$th quantile of the standard normal. Then $\bar X$ is significant at level $\alpha$ (its two-sided P-value is exactly $\alpha$). Also, for sufficiently large $n$, $\bar X$ is well within $(-a, a)$ because $\bar X$ tends to zero as $n$ increases. This leads to

$$\int_{-a}^{a} \exp[-n(\bar X - \theta)^2/(2\sigma^2)]\, d\theta \approx \sigma\sqrt{2\pi/n}$$
and hence, from (11),

$$P(H_0 \mid X) \approx \frac{\pi_0 \exp(-z_\alpha^2/2)}{\pi_0 \exp(-z_\alpha^2/2) + (1 - \pi_0)\, \dfrac{\sigma\sqrt{2\pi/n}}{2a}}.$$

Thus $P(H_0 \mid X) \to 1$ as $n \to \infty$, whereas the P-value equals $\alpha$ for all $n$; this is the Jeffreys-Lindley paradox. The phenomenon would continue to hold with a flat enough prior in place of the uniform. Indeed, P-values cannot be compared across sample sizes or across experiments. Even a frequentist tends to agree that the conventional significance levels, such as $\alpha = 0.05$ or $0.01$, are too large for large sample sizes. The Jeffreys-Lindley paradox shows that, for inference about $\theta$, P-values and Bayes factors may provide contradictory evidence and hence can lead to opposite decisions. The evidence against $H_0$ conveyed by the P-values seems unrealistically high.

3 Bayesian P-value

Bayes factors, and more appropriately posterior probabilities of hypotheses, are in principle the correct tools for measuring evidence for or against hypotheses. But they are often hard, or even impossible, to compute when the alternative is vaguely specified or not specified at all. Bayesian P-values have been proposed to deal with such problems.

Let $M_0$ be a target model, and suppose departures from this model are of interest. If, under this model, $X$ has density $f(x \mid \eta)$, $\eta \in \Xi$, then for a Bayesian with prior $\pi$ on $\eta$, the prior predictive distribution

$$m_\pi(x) = \int_\Xi f(x \mid \eta)\, \pi(\eta)\, d\eta$$

is the actual predictive distribution of $X$. Therefore, if a model departure statistic $T(X)$ is available, one can define the prior predictive P-value (the tail area under the predictive distribution) as

$$p = P^{m_\pi}\{T(X) \ge T(x_{obs}) \mid M_0\},$$

where $x_{obs}$ is the observed value of $X$. This quantity is heavily influenced by the prior distribution.

Example 2. Let $X_1, \ldots, X_n$ be a random sample from $N(\theta, \sigma^2)$, and suppose $H_0: \theta = 0$.

Case (a): If $\sigma^2$ is assumed known, equal to $\sigma_0^2$, then using the statistic $\bar X$ the predictive P-value is

$$p = P\left[\sqrt{n}|\bar X| > \sqrt{n}|\bar x_{obs}| \mid \theta = 0, \sigma_0^2\right] = 2\Phi\left( -\frac{\sqrt{n}|\bar x_{obs}|}{\sigma_0} \right).$$

If $\sigma_0^2$ seriously underestimates the actual model variance, then $p$ is very small and the evidence against $H_0$ is overestimated.
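Case (b): If $\sigma^2$ is unknown and assigned the improper prior $\pi(\sigma^2) = 1/\sigma^2$, then

$$m_\pi(x) = \int f_X(x \mid \sigma^2)\, \pi(\sigma^2)\, d\sigma^2 \propto \int \exp\left( -\frac{1}{2\sigma^2} \sum x_i^2 \right) (\sigma^2)^{-n/2}\, \frac{d\sigma^2}{\sigma^2} \propto \left( \sum x_i^2 \right)^{-n/2},$$

which is an improper density, thus completely ruling out computation of the prior predictive P-value. (This is not surprising, since with any improper prior the prior predictive density is improper.)

Case (c): Consider an inverse gamma prior $IG(\nu, \beta)$ with density $\pi(\sigma^2 \mid \nu, \beta) = \{\beta^\nu/\Gamma(\nu)\}\, (\sigma^2)^{-(\nu+1)} \exp(-\beta/\sigma^2)$, where $\nu$ and $\beta$ are specified positive constants. Because $T = \sqrt{n}\bar X$ satisfies $T \mid \sigma^2 \sim N(0, \sigma^2)$, under this prior the predictive density of $T$ is

$$m_\pi(t) = \int f_T(t \mid \sigma^2)\, \pi(\sigma^2 \mid \nu, \beta)\, d\sigma^2 \propto \int \exp\left\{ -\frac{1}{\sigma^2}\left( \beta + \frac{t^2}{2} \right) \right\} (\sigma^2)^{-(\nu + 1 + 1/2)}\, d\sigma^2 \propto (2\beta + t^2)^{-(2\nu+1)/2}.$$

If $2\nu$ is an integer, the prior predictive distribution of $T$ is given by $T/\sqrt{\beta/\nu} \sim t_{2\nu}$. Then

$$p = P^{m_\pi}(|\bar X| \ge |\bar x_{obs}| \mid M_0) = P^{m_\pi}\left( \frac{|T|}{\sqrt{\beta/\nu}} \ge \frac{\sqrt{n}|\bar x_{obs}|}{\sqrt{\beta/\nu}} \,\Big|\, M_0 \right) = 2\left\{ 1 - F_{2\nu}\left( \frac{\sqrt{n}|\bar x_{obs}|}{\sqrt{\beta/\nu}} \right) \right\},$$

where $F_{2\nu}$ is the cdf of $t_{2\nu}$. For $\sqrt{n}|\bar x_{obs}| = 1.96$ and various values of $\nu$ and $\beta$, the corresponding prior predictive P-values are displayed in Table 2.

Table 2: Normal Example: Prior Predictive P-values (columns: $\nu$, $\beta$, $p$)

Further, note that $p \to 1$ as $\beta \to \infty$ for any fixed $\nu > 0$. Thus the prior predictive P-value in this example depends crucially on the values of $\beta$ and $\nu$.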
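The dependence of the prior predictive P-value on $(\nu, \beta)$ in Case (c) can be explored with a short sketch. To stay within the Python standard library, it evaluates the $t_{2\nu}$ cdf by Simpson's rule rather than calling a statistics package; that is an implementation choice, not part of the notes. `t_cdf` and `prior_predictive_p` are hypothetical helper names, and `z_obs` stands for $\sqrt{n}|\bar x_{obs}|$.

```python
import math


def t_pdf(x, df):
    """Student-t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """F_df(x) = 1/2 + int_0^x t_pdf, via composite Simpson's rule (steps even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + s * h / 3


def prior_predictive_p(z_obs, nu, beta):
    """Case (c): under IG(nu, beta), T / sqrt(beta / nu) ~ t_{2 nu}, so
    p = 2[1 - F_{2 nu}(z_obs / sqrt(beta / nu))]."""
    return 2.0 * (1.0 - t_cdf(z_obs / math.sqrt(beta / nu), 2 * nu))


if __name__ == "__main__":
    for nu, beta in [(1, 1), (1, 10), (1, 100), (5, 1), (5, 10)]:
        print(f"nu={nu} beta={beta:5}  p={prior_predictive_p(1.96, nu, beta):.4f}")
```

Raising `beta` with `nu` fixed pushes the argument of $F_{2\nu}$ toward 0, so `p` drifts toward 1, which is the behaviour noted in the text.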
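What we learn from this example is that if the prior $\pi$ is a poor choice, even an excellent model can come under suspicion on the basis of the prior predictive P-value. We have also seen that an improper prior always produces an improper predictive density, which is an undesirable feature. To rectify these problems, it has been suggested to replace $\pi$ in $m_\pi$ by the posterior $\pi(\eta \mid x_{obs})$, which defines the posterior predictive density and the posterior predictive P-value:

$$m^*(x \mid x_{obs}) = \int f(x \mid \eta)\, \pi(\eta \mid x_{obs})\, d\eta, \qquad p^* = P^{m^*(\cdot \mid x_{obs})}\big(T(X) \ge T(x_{obs})\big).$$

Example 3 (Example 2 continued). We again consider the noninformative prior $\pi(\sigma^2) \propto 1/\sigma^2$. Then, as before, $T \mid \sigma^2 \sim N(0, \sigma^2)$, and

$$\pi(\sigma^2 \mid x_{obs}) \propto \exp\left( -\frac{1}{2\sigma^2} \sum x_i^2 \right) (\sigma^2)^{-(n+2)/2},$$

leading to the posterior predictive density of $T$

$$m^*(t \mid x_{obs}) = \int f_T(t \mid \sigma^2)\, \pi(\sigma^2 \mid x_{obs})\, d\sigma^2 \propto \int (\sigma^2)^{-1/2} \exp\left( -\frac{t^2}{2\sigma^2} \right) \exp\left( -\frac{1}{2\sigma^2} \sum x_i^2 \right) (\sigma^2)^{-(n+2)/2}\, d\sigma^2 \propto \left( 1 + \frac{t^2}{\sum x_i^2} \right)^{-(n+1)/2}.$$

This implies that the posterior predictive distribution of $T$ is given by $T/\sqrt{\sum x_i^2/n} \sim t_n$. Thus the posterior predictive P-value is

$$p^* = P^{m^*(\cdot \mid x_{obs})}(|\bar X| \ge |\bar x_{obs}| \mid M_0) = P^{m^*(\cdot \mid x_{obs})}\left( \frac{|T|}{\sqrt{\sum x_i^2/n}} \ge \frac{\sqrt{n}|\bar x_{obs}|}{\sqrt{\sum x_i^2/n}} \,\Big|\, M_0 \right) = 2\left\{ 1 - F_n\left( \frac{\sqrt{n}|\bar x_{obs}|}{\sqrt{\sum x_i^2/n}} \right) \right\},$$

where $F_n$ is the cdf of the $t_n$ distribution. This definition of a Bayesian P-value does not seem satisfactory. Let $|\bar x_{obs}| \to \infty$ with the spread of the observations held fixed. Since $\sum x_i^2/n = \bar x_{obs}^2 + \sum (x_i - \bar x_{obs})^2/n$, it follows that $p^* \to 2\{1 - F_n(\sqrt{n})\}$.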
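The limiting value that makes this P-value unsatisfactory can be checked numerically. The sketch below again uses a standard-library Simpson's-rule $t$ cdf (an implementation choice, not part of the notes). It computes $p^*$ for a data set and the limit $p_n = 2\{1 - F_n(\sqrt{n})\}$ tabulated in Table 3. The helper names and the shifted illustrative data are hypothetical.

```python
import math


def t_pdf(x, df):
    """Student-t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """F_df(x) = 1/2 + int_0^x t_pdf, via composite Simpson's rule (steps even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + s * h / 3


def posterior_predictive_p(xs):
    """Example 3: p* = 2[1 - F_n(sqrt(n)|xbar| / sqrt(sum x_i^2 / n))]."""
    n = len(xs)
    xbar = sum(xs) / n
    scale = math.sqrt(sum(x * x for x in xs) / n)
    return 2.0 * (1.0 - t_cdf(math.sqrt(n) * abs(xbar) / scale, n))


def limit_p(n):
    """p_n = 2[1 - F_n(sqrt(n))], the limit of p* as |xbar_obs| -> infinity."""
    return 2.0 * (1.0 - t_cdf(math.sqrt(n), n))


if __name__ == "__main__":
    deviations = [-1.0, 0.0, 1.0, 0.5, -0.5]  # spread held fixed
    for shift in (0.0, 1.0, 10.0, 1e3):
        xs = [shift + d for d in deviations]
        print(f"shift={shift:8}  p*={posterior_predictive_p(xs):.4f}  p_n={limit_p(len(xs)):.4f}")
```

Because $\sum x_i^2/n \ge \bar x_{obs}^2$, the argument of $F_n$ never exceeds $\sqrt{n}$, so $p^*$ can never fall below $p_n$, however far the data lie from $\theta = 0$.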
Table 3: Values of $p_n = 2\{1 - F_n(\sqrt{n})\}$ (columns: $n$, $p_n$)

Note that these values have no real relationship with the observations and hence cannot be used for model checking. Bayarri and Berger (1998) attributed this behavior to the double use of the data: here $x$ is used both in computing the posterior distribution and in computing the tail area probability of the posterior predictive distribution.

In an effort to combine the desirable features of the prior predictive and posterior predictive P-values while eliminating their undesirable features, Bayarri and Berger introduced the conditional predictive P-value. This quantity is based on the prior predictive distribution $m_\pi$ but is more heavily influenced by the model than by the prior. Noninformative priors can be used, and there is no double use of the data. In this approach an appropriate statistic $U(X)$, not involving the model departure statistic $T(X)$, is identified. The conditional predictive density $m_\pi(t \mid u)$ is derived, and the conditional predictive P-value is defined as

$$p_c = P^{m_\pi(\cdot \mid u_{obs})}\big(T(X) \ge T(x_{obs})\big),$$

where $u_{obs} = U(x_{obs})$. They considered the following example.

Example 4 (Last example continued). Here $T = \sqrt{n}\bar X$ is the model departure statistic for checking discrepancy of the mean in the normal model. Let $U(X) = s^2 = \sum_{i=1}^n (X_i - \bar X)^2/n$. Note that $nU/\sigma^2 \mid \sigma^2 \sim \chi^2_{n-1}$. Consider the noninformative prior $\pi(\sigma^2) \propto 1/\sigma^2$. Then

$$\pi(\sigma^2 \mid s^2) \propto (\sigma^2)^{-(n-1)/2 - 1} \exp\{-n s^2/(2\sigma^2)\}$$

is an inverse gamma density with shape parameter $(n-1)/2$. It can be checked that the conditional predictive density of $T$ given $s^2_{obs}$ is

$$m_\pi(t \mid s^2_{obs}) = \int f_T(t \mid \sigma^2)\, \pi(\sigma^2 \mid s^2_{obs})\, d\sigma^2 \propto \left( 1 + \frac{t^2}{n s^2_{obs}} \right)^{-n/2}.$$

Then the conditional predictive P-value is

$$p_c = 2\left\{ 1 - F_{n-1}\left( \frac{\sqrt{n-1}\,|\bar x_{obs}|}{s_{obs}} \right) \right\}.$$

We have found a Bayesian interpretation of the classical P-value from the usual t-test. Note that $s^2_{obs}$ was used to produce the posterior distribution to eliminate $\sigma^2$, and that
$\bar x_{obs}$ was then used to compute the tail area probability. In this example it is easy to identify $U(X)$. In some problems it is not as easy, and the computation of the conditional predictive density is also not straightforward. An alternative is the partial posterior predictive P-value, defined by

$$p^{**} = P^{m^{**}(\cdot)}\big(T(X) \ge T(x_{obs})\big),$$

where the predictive density $m^{**}$ is obtained from a partial posterior density calculated from the conditional likelihood of $X$ given $T(X) = T(x_{obs})$. The partial posterior is

$$\pi^{**}(\eta) \propto f_{X \mid T}(x_{obs} \mid t_{obs}, \eta)\, \pi(\eta).$$

For the normal example, one can check that

$$f_{X \mid \bar X}(x_{obs} \mid \bar x_{obs}, \sigma^2) \propto (\sigma^2)^{-(n-1)/2} \exp\left( -\frac{n}{2\sigma^2} s^2_{obs} \right).$$

Thus, for $\pi(\sigma^2) \propto 1/\sigma^2$, the partial posterior is

$$\pi^{**}(\sigma^2) \propto (\sigma^2)^{-(n-1)/2 - 1} \exp\left( -\frac{n s^2_{obs}}{2\sigma^2} \right),$$

which coincides with $\pi(\sigma^2 \mid s^2_{obs})$ of Example 4. Hence, in this example, the partial posterior predictive P-value is the same as the conditional predictive P-value.

4 Nonsubjective Bayes factors

Consider two models $M_0$ and $M_1$; under model $M_i$ the density of the data $X$ is $f_i(x \mid \theta_i)$, $\theta_i$ being an unknown parameter of dimension $p_i$, $i = 0, 1$. Given a (proper) prior density $g_i(\theta_i)$ for $\theta_i$, the Bayes factor for $M_1$ relative to $M_0$ is

$$B_{10} = \frac{m_1(x)}{m_0(x)} = \frac{\int f_1(x \mid \theta_1)\, g_1(\theta_1)\, d\theta_1}{\int f_0(x \mid \theta_0)\, g_0(\theta_0)\, d\theta_0}, \qquad (12)$$

where $m_i(x)$ is the marginal density of $X$ under $M_i$. If the priors $g_i(\theta_i)$ cannot be specified subjectively, one tends to use noninformative priors. There are difficulties with (12) for noninformative priors, which are typically improper. If $g_i$ is improper, it is defined only up to a positive multiplicative constant $c_i$, and $c_i g_i$ has as much validity as $g_i$. This means that $(c_1/c_0) B_{10}$ has as much validity as $B_{10}$ as a Bayes factor. Thus the Bayes factor remains indeterminate under improper priors. This indeterminacy has been the main motivation for new objective methods. We confine attention to the nested case, where $f_0$ and $f_1$ have the same functional form and $f_0(x \mid \theta_0)$ is the same as $f_1(x \mid \theta_1)$ with some of the coordinates of $\theta_1$ specified.
We show below that using a diffuse proper prior in place of an improper prior does not rectify the deficiency of the Bayes factor described above.

Example 5 (Testing a normal mean with known variance). Suppose we observe $X = (X_1, \ldots, X_n)$. Under $M_0$, the $X_i$ are iid $N(0, 1)$, and under $M_1$, the $X_i$ are iid $N(\theta, 1)$, with $\theta$ real and
unknown. With the uniform noninformative prior $g_1^N(\theta) = c$ for $\theta$ under $M_1$, the Bayes factor is

$$B_{10}^N = c \sqrt{\frac{2\pi}{n}} \exp\left( \frac{n\bar X^2}{2} \right).$$

For a uniform prior for $\theta$ over $[-K, K]$ with large $K$, the new Bayes factor $B_{10}^K$ satisfies

$$B_{10}^K \approx \frac{B_{10}^N}{2Kc}.$$

Thus, for large $K$, the Bayes factor $B_{10}^K$ is biased against $M_1$. A similar conclusion is obtained with a diffuse proper prior $N(0, \tau^2)$ with $\tau^2$ large. The Bayes factor is then

$$B_{10}^{norm} = (n\tau^2 + 1)^{-1/2} \exp\left[ \frac{n\tau^2}{n\tau^2 + 1} \cdot \frac{n\bar X^2}{2} \right],$$

which is approximately $(n\tau^2)^{-1/2} \exp[n\bar X^2/2]$ for large $n\tau^2$. This can be made arbitrarily small by taking $\tau^2$ arbitrarily large, which is clearly undesirable.

A solution to this instability of a Bayes factor with a noninformative prior is to use part of the data as a training sample, dividing $X = (X_1, X_2)$, where $X_1$ and $X_2$ are assumed independent. The first subset $X_1$ is treated as a training sample to convert a diffuse (improper) prior into a proper posterior distribution for the parameters given $X_1$. For the prior $g_i(\theta_i)$, the (training) posterior is

$$g_i(\theta_i \mid X_1) = \frac{f_i(X_1 \mid \theta_i)\, g_i(\theta_i)}{\int f_i(X_1 \mid \theta_i)\, g_i(\theta_i)\, d\theta_i}, \quad i = 0, 1.$$

These proper posteriors are then used as priors to compute the Bayes factor with the remaining data $X_2$. The conditional Bayes factor $B_{10}(X_1)$, conditioned on $X_1$, can be expressed as

$$B_{10}(X_1) = \frac{\int f_1(X_2 \mid \theta_1)\, g_1(\theta_1 \mid X_1)\, d\theta_1}{\int f_0(X_2 \mid \theta_0)\, g_0(\theta_0 \mid X_1)\, d\theta_0} = \frac{\int f_1(X_2 \mid \theta_1)\, f_1(X_1 \mid \theta_1)\, g_1(\theta_1)\, d\theta_1 / m_1(X_1)}{\int f_0(X_2 \mid \theta_0)\, f_0(X_1 \mid \theta_0)\, g_0(\theta_0)\, d\theta_0 / m_0(X_1)} = \frac{m_1(X)}{m_0(X)} \cdot \frac{m_0(X_1)}{m_1(X_1)} = B_{10} \frac{m_0(X_1)}{m_1(X_1)}, \qquad (13)$$

where $m_i(X_1)$ is the marginal density of $X_1$ under $M_i$, $i = 0, 1$. Note that if the priors $c_i g_i$, $i = 0, 1$, are used to compute $B_{10}(X_1)$, the arbitrary constant multiplier $c_1/c_0$ of $B_{10}$ is cancelled by the factor $c_0/c_1$ in $m_0(X_1)/m_1(X_1)$, so that the indeterminacy of the Bayes factor is removed in (13).
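The instability in Example 5, and its removal by (13), can be illustrated numerically. The sketch below uses the closed forms from Example 5 and takes a single observation $x_l$ as the training sample $X_1$ (for this model one observation already yields a proper posterior). As a cross-check, it also computes the conditional Bayes factor directly: it uses the training posterior $\theta \mid x_l \sim N(x_l, 1)$ as a proper prior for the remaining observations and integrates by the trapezoidal rule. All function names and the data are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def b10_flat(xs, c=1.0):
    """Improper g_1(theta) = c: B10^N = c sqrt(2 pi / n) exp(n xbar^2 / 2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return c * math.sqrt(2.0 * math.pi / n) * math.exp(n * xbar * xbar / 2.0)


def b10_normal(xs, tau2):
    """Diffuse proper g_1 = N(0, tau2):
    B10 = (n tau2 + 1)^(-1/2) exp[(n tau2 / (n tau2 + 1)) n xbar^2 / 2]."""
    n = len(xs)
    xbar = sum(xs) / n
    k = n * tau2
    return math.exp(k / (k + 1.0) * n * xbar * xbar / 2.0) / math.sqrt(k + 1.0)


def conditional_b10(xs, l, c=1.0):
    """Equation (13) with training sample X_1 = x_l:
    m_0(x_l) = N(x_l; 0, 1) and m_1(x_l) = int N(x_l; theta, 1) c dtheta = c."""
    return b10_flat(xs, c) * normal_pdf(xs[l], 0.0, 1.0) / c


def conditional_b10_direct(xs, l, lo=-20.0, hi=20.0, steps=20000):
    """Same quantity, using theta | x_l ~ N(x_l, 1) as the prior for the
    remaining observations X_2 (trapezoidal rule in theta)."""
    rest = [x for j, x in enumerate(xs) if j != l]
    m0 = math.prod(normal_pdf(x, 0.0, 1.0) for x in rest)
    h = (hi - lo) / steps
    m1 = 0.0
    for k in range(steps + 1):
        theta = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        lik = math.prod(normal_pdf(x, theta, 1.0) for x in rest)
        m1 += weight * lik * normal_pdf(theta, xs[l], 1.0)
    return m1 * h / m0


if __name__ == "__main__":
    xs = [0.9, 1.4, 0.2, 1.1, 0.7]
    for tau2 in (1.0, 1e2, 1e4, 1e6):
        print("tau2 =", tau2, " B10 =", b10_normal(xs, tau2))  # small once tau2 is large
    for c in (1.0, 10.0, 1000.0):
        print("c =", c, " B10^N =", b10_flat(xs, c), " B10(x_1) =", conditional_b10(xs, 0, c))
```

$B_{10}^N$ scales with the arbitrary constant `c`, whereas the conditional Bayes factor does not, and it matches the direct computation with a proper training posterior.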
It follows from the preceding discussion that $X_1$ may be used as a training sample if the corresponding posteriors $g_i(\theta_i \mid X_1)$, $i = 0, 1$, are proper or, equivalently, if the marginal densities $m_i(X_1)$ of $X_1$ under $M_i$, $i = 0, 1$, are finite. Clearly, one should use the minimal amount of data as a training sample and use most of the data for model comparison. Berger and Pericchi defined $X_1$ to be a minimal training sample if $0 < m_i(X_1) < \infty$, $i = 0, 1$, and the marginal is not finite for any proper subset of $X_1$.

Example 6. Consider testing that a normal mean equals zero when the variance is known. Under the uniform noninformative prior $g_1(\theta_1) = 1$ under $M_1$, the minimal training samples are subsamples of size 1, with $m_0(X_i) = (1/\sqrt{2\pi}) \exp(-X_i^2/2)$ and $m_1(X_i) = 1$.

The intrinsic Bayes factor and the fractional Bayes factor

We have described above the conditional Bayes factor $B_{10}(X_1)$ corresponding to an improper prior, obtained through a minimal training sample $X_1$. However, this choice depends on $X_1$, for which there may be many possibilities. To remove the arbitrariness, Berger and Pericchi suggested considering the conditional Bayes factors for all possible minimal training samples to define an intrinsic Bayes factor. If $X(l)$, $l = 1, \ldots, L$, denote the list of all possible minimal training samples, they defined the arithmetic intrinsic Bayes factor (AIBF) as

$$AIBF_{10} = B_{10} \frac{1}{L} \sum_{l=1}^{L} \frac{m_0(X(l))}{m_1(X(l))}, \qquad (14)$$

and the geometric intrinsic Bayes factor as

$$GIBF_{10} = B_{10} \left( \prod_{l=1}^{L} \frac{m_0(X(l))}{m_1(X(l))} \right)^{1/L}.$$

A different solution to the model selection problem with improper priors is due to O'Hagan. He proposed using a fractional power of the likelihood to convert an improper prior into a proper posterior, and then combining this posterior with the remaining fraction of the likelihood to obtain the marginal densities under the models, from which the Bayes factor is formed.
The resulting partial Bayes factor, called the fractional Bayes factor (FBF), is given by

$$FBF_{10} = \frac{m_1(X, b)}{m_0(X, b)},$$

where $0 < b < 1$ is appropriately chosen and

$$m_i(X, b) = \int f_i^{1-b}(X \mid \theta_i)\, \frac{f_i^b(X \mid \theta_i)\, g_i(\theta_i)}{\int f_i^b(X \mid \theta_i)\, g_i(\theta_i)\, d\theta_i}\, d\theta_i = \frac{\int f_i(X \mid \theta_i)\, g_i(\theta_i)\, d\theta_i}{\int f_i^b(X \mid \theta_i)\, g_i(\theta_i)\, d\theta_i}.$$

Note that $FBF_{10}$ can also be written as

$$FBF_{10} = B_{10} \frac{m_0^b(X)}{m_1^b(X)},$$
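where $m_i^b(X) = \int f_i^b(X \mid \theta_i)\, g_i(\theta_i)\, d\theta_i$, $i = 0, 1$. To make the FBF comparable with the IBF, we can take $b = m/n$, where $m$ is the size of a minimal training sample. O'Hagan also recommends other choices of $b$, such as $\sqrt{n}/n$ or $\log n/n$.

Example 7. Consider testing that a normal mean equals zero when the variance is known. The Bayes factor with the noninformative prior $g_1(\theta_1) = 1$ is

$$B_{10} = \sqrt{\frac{2\pi}{n}} \exp\left( \frac{n\bar X^2}{2} \right).$$

Hence

$$B_{10}(X_i) = B_{10} \frac{m_0(X_i)}{m_1(X_i)} = B_{10} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{X_i^2}{2} \right).$$

Thus

$$AIBF_{10} = \frac{1}{n} \sum_{i=1}^n B_{10}(X_i) = n^{-3/2} \exp(n\bar X^2/2) \sum_{i=1}^n \exp(-X_i^2/2), \qquad GIBF_{10} = n^{-1/2} \exp\left[ \frac{n\bar X^2}{2} - \frac{1}{2n} \sum_{i=1}^n X_i^2 \right].$$

Note that for a fraction $0 < b < 1$,

$$m_0^b(X) = \left( \frac{1}{\sqrt{2\pi}} \right)^{bn} \exp\left[ -\frac{b \sum X_i^2}{2} \right], \qquad m_1^b(X) = \left( \frac{1}{\sqrt{2\pi}} \right)^{bn} \exp\left[ -\frac{b \sum (X_i - \bar X)^2}{2} \right] \frac{\sqrt{2\pi}}{\sqrt{nb}},$$

so that

$$\frac{m_0^b(X)}{m_1^b(X)} = \exp\left[ -\frac{nb\bar X^2}{2} \right] \frac{\sqrt{nb}}{\sqrt{2\pi}}.$$

Hence the FBF is

$$FBF_{10} = b^{1/2} \exp[n(1 - b)\bar X^2/2] = n^{-1/2} \exp[(n - 1)\bar X^2/2] \quad \text{if } b = 1/n.$$

See Chapter 6 of GDS for more examples.

Exercise: Suppose $X_1, X_2$ are iid with a location-scale pdf $f(x \mid \mu, \sigma) = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right)$, $-\infty < \mu < \infty$, $\sigma > 0$. Show that

$$\int_0^\infty \int_{-\infty}^{\infty} \frac{1}{\sigma^3} f\left( \frac{x_1 - \mu}{\sigma} \right) f\left( \frac{x_2 - \mu}{\sigma} \right) d\mu\, d\sigma = \frac{1}{2|x_1 - x_2|}.$$

Note: This result was discovered through simulations.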
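The closed forms in Example 7 can be checked against the definitions they come from. The sketch below computes $B_{10}(X_i)$ for each single-observation training sample, averages these arithmetically and geometrically to get the AIBF and GIBF, and builds the FBF as $B_{10}\, m_0^b(X)/m_1^b(X)$. The function names are hypothetical and the data in the usage lines are made up for illustration.

```python
import math


def b10(xs):
    """B10 under g_1(theta) = 1: sqrt(2 pi / n) exp(n xbar^2 / 2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(2.0 * math.pi / n) * math.exp(n * xbar * xbar / 2.0)


def partial_b10(xs, i):
    """B10(X_i) = B10 * m0(X_i) / m1(X_i), with m1(X_i) = 1."""
    m0 = math.exp(-xs[i] ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
    return b10(xs) * m0


def aibf10(xs):
    """Arithmetic mean of B10(X_i) over the n minimal training samples."""
    return sum(partial_b10(xs, i) for i in range(len(xs))) / len(xs)


def gibf10(xs):
    """Geometric mean of B10(X_i), computed on the log scale."""
    return math.exp(sum(math.log(partial_b10(xs, i)) for i in range(len(xs))) / len(xs))


def fbf10(xs, b):
    """FBF10 = B10 * m0^b(X) / m1^b(X), with
    m0^b / m1^b = exp(-n b xbar^2 / 2) sqrt(n b) / sqrt(2 pi)."""
    n = len(xs)
    xbar = sum(xs) / n
    ratio = math.exp(-n * b * xbar * xbar / 2.0) * math.sqrt(n * b) / math.sqrt(2.0 * math.pi)
    return b10(xs) * ratio


if __name__ == "__main__":
    xs = [0.5, -0.2, 1.3, 0.8]
    n = len(xs)
    print("B10  =", b10(xs))
    print("AIBF =", aibf10(xs), " GIBF =", gibf10(xs), " FBF(b=1/n) =", fbf10(xs, 1.0 / n))
```

With $b = 1/n$, which equals $m/n$ for minimal training samples of size $m = 1$, the FBF reduces to $n^{-1/2}\exp[(n-1)\bar X^2/2]$, matching the last display of Example 7.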
More informationUnified Frequentist and Bayesian Testing of a Precise Hypothesis
Statistical Science 1997, Vol. 12, No. 3, 133 160 Unified Frequentist and Bayesian Testing of a Precise Hypothesis J. O. Berger, B. Boukai and Y. Wang Abstract. In this paper, we show that the conditional
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective
More informationIntegrated Objective Bayesian Estimation and Hypothesis Testing
Integrated Objective Bayesian Estimation and Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es 9th Valencia International Meeting on Bayesian Statistics Benidorm
More informationHypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationA Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors
Journal of Data Science 6(008), 75-87 A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Scott Berry 1 and Kert Viele 1 Berry Consultants and University of Kentucky
More informationStandard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j
Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )
More informationPower-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models D. Fouskakis, I. Ntzoufras and D. Draper December 1, 01 Summary: In the context of the expected-posterior prior (EPP) approach
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the
More informationSTAT 740: Testing & Model Selection
STAT 740: Testing & Model Selection Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 34 Testing & model choice, likelihood-based A common way to
More informationMODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY
ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference
More informationBayesian Model Comparison
BS2 Statistical Inference, Lecture 11, Hilary Term 2009 February 26, 2009 Basic result An accurate approximation Asymptotic posterior distribution An integral of form I = b a e λg(y) h(y) dy where h(y)
More informationStat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010
Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture
More informationST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks
(9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate
More informationSTA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources
STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various
More informationBayesian Asymptotics
BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {
More informationA union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling
A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics
More informationSpatial Statistics Chapter 4 Basics of Bayesian Inference and Computation
Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation So far we have discussed types of spatial data, some basic modeling frameworks and exploratory techniques. We have not discussed
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationStable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence
Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham NC 778-5 - Revised April,
More informationModel comparison. Christopher A. Sims Princeton University October 18, 2016
ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research
More informationM. J. Bayarri and M. E. Castellanos. University of Valencia and Rey Juan Carlos University
1 BAYESIAN CHECKING OF HIERARCHICAL MODELS M. J. Bayarri and M. E. Castellanos University of Valencia and Rey Juan Carlos University Abstract: Hierarchical models are increasingly used in many applications.
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More information40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology
Singapore University of Design and Technology Lecture 9: Hypothesis testing, uniformly most powerful tests. The Neyman-Pearson framework Let P be the family of distributions of concern. The Neyman-Pearson
More informationRecall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n
Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible
More informationHarrison B. Prosper. CMS Statistics Committee
Harrison B. Prosper Florida State University CMS Statistics Committee 08-08-08 Bayesian Methods: Theory & Practice. Harrison B. Prosper 1 h Lecture 3 Applications h Hypothesis Testing Recap h A Single
More informationChapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1
Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in
More informationA noninformative Bayesian approach to domain estimation
A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationLecture 6: Model Checking and Selection
Lecture 6: Model Checking and Selection Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 27, 2014 Model selection We often have multiple modeling choices that are equally sensible: M 1,, M T. Which
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationA REVERSE TO THE JEFFREYS LINDLEY PARADOX
PROBABILITY AND MATHEMATICAL STATISTICS Vol. 38, Fasc. 1 (2018), pp. 243 247 doi:10.19195/0208-4147.38.1.13 A REVERSE TO THE JEFFREYS LINDLEY PARADOX BY WIEBE R. P E S T M A N (LEUVEN), FRANCIS T U E R
More informationBayesian Inference. Chapter 2: Conjugate models
Bayesian Inference Chapter 2: Conjugate models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationOn the Bayesianity of Pereira-Stern tests
Sociedad de Estadística e Investigación Operativa Test (2001) Vol. 10, No. 2, pp. 000 000 On the Bayesianity of Pereira-Stern tests M. Regina Madruga Departamento de Estatística, Universidade Federal do
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationPower-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: ntzoufras@aueb.gr.
More informationModern Methods of Statistical Learning sf2935 Auxiliary material: Exponential Family of Distributions Timo Koski. Second Quarter 2016
Auxiliary material: Exponential Family of Distributions Timo Koski Second Quarter 2016 Exponential Families The family of distributions with densities (w.r.t. to a σ-finite measure µ) on X defined by R(θ)
More informationEco517 Fall 2004 C. Sims MIDTERM EXAM
Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationBayesian Econometrics
Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationPart 2: One-parameter models
Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes
More informationOn the use of non-local prior densities in Bayesian hypothesis tests
J. R. Statist. Soc. B (2010) 72, Part 2, pp. 143 170 On the use of non-local prior densities in Bayesian hypothesis tests Valen E. Johnson M. D. Anderson Cancer Center, Houston, USA and David Rossell Institute
More informationNoninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions
Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted
More informationCh. 5 Hypothesis Testing
Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,
More informationBayes Factors for Discovery
Glen Cowan RHUL Physics 3 April, 22 Bayes Factors for Discovery The fundamental quantity one should use in the Bayesian framework to quantify the significance of a discovery is the posterior probability
More informationLecture 10: Generalized likelihood ratio test
Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 10: Generalized likelihood ratio test Lecturer: Art B. Owen October 25 Disclaimer: These notes have not been subjected to the usual
More informationDivergence Based Priors for Bayesian Hypothesis testing
Divergence Based Priors for Bayesian Hypothesis testing M.J. Bayarri University of Valencia G. García-Donato University of Castilla-La Mancha November, 2006 Abstract Maybe the main difficulty for objective
More informationUncertain Inference and Artificial Intelligence
March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini
More informationA Bayesian solution for a statistical auditing problem
A Bayesian solution for a statistical auditing problem Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 July 2002 Research supported in part by NSF Grant DMS 9971331 1 SUMMARY
More informationLecture 2: Basic Concepts of Statistical Decision Theory
EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture
More information