Objective Bayesian and fiducial inference: some results and comparisons

Piero Veronese and Eugenio Melilli
Bocconi University, Milano, Italy


Abstract

Objective Bayesian analysis and fiducial inference are both attempts to derive probabilities for statements concerning the value of an unknown parameter, so it is natural to inquire into their relationship. While results in this direction exist for univariate-parameter models, where the objective analysis is usually based on the Jeffreys prior, little work has been done for the multivariate case, in which reference priors are typically employed. In this paper we propose a way to construct fiducial distributions in quite general models and show that they have many similarities with objective posteriors. First, fiducial distributions and reference priors are both constructed through a step-by-step conditional procedure which induces an order on the inferential importance of the components of the parameter. Second, in most cases fiducial distributions and reference posteriors coincide, as happens for location-scale models. The proposed procedure becomes simpler when the model belongs to a particular subclass of the natural exponential family, called conditionally reducible, which includes the multinomial and negative-multinomial models. In this class we characterize the models for which the fiducial distribution can be seen as a posterior, and show that the corresponding prior belongs to the enriched conjugate family and coincides with the reference prior. Finally, the asymptotic normality of the fiducial distribution is proved, matching the well-known result holding for regular Bayesian posteriors.

Keywords: conditional reducibility, confidence distribution, Jeffreys prior, location-scale parameter model, multinomial model, natural exponential family, quadratic variance function, reference prior.

1 Introduction

Objective Bayesian analysis, see e.g. Berger (2006) and Berger, Bernardo & Sun (2015), fiducial inference, see e.g. Fisher (1973) and Hannig (2009, 2013), and confidence distribution theory, see e.g. Schweder & Hjort (2002, 2015), Singh et al. (2005) and Xie & Singh (2013), are in some sense all attempts to face the same problem: to construct a (possibly formal) distribution on the parameter space depending only on the observed data.

Undoubtedly the leading and most natural approach to this problem is the first one. Indeed, objective Bayesian analysis is a recent name for a very old problem: how to perform good Bayesian inference, especially for moderate sample sizes, when one is unwilling or unable to assess a subjective prior. Under this approach the prior distribution is derived directly from the model and is therefore labeled objective. The reference prior, introduced by Bernardo (1979) and developed by Berger & Bernardo (1992), is the most successful default prior proposed in the literature. It has been deeply studied and its good properties, in particular with respect to the frequentist coverage of confidence sets, are well known. Typically, for a real parameter indexing a regular model, the reference prior coincides with the prior obtained by the Jeffreys rule, which is based on the Fisher information. For a multidimensional parameter the reference prior depends on the grouping and ordering of the components of the parameter and, in general, no longer coincides with the Jeffreys prior, which is known to be unsatisfactory in this context. Unfortunately, the reference prior is generally not simple to derive.

Fiducial distributions, after having been introduced by Fisher (1930, 1935) and widely discussed (and criticized) in the subsequent years, were de facto brushed aside for a long time and have regained vitality only recently. Originally Fisher considered a continuous sufficient statistic S with distribution function F_θ(s), depending on a real parameter θ. Let q_α(θ) denote the quantile of order α of F_θ and let s be a realization of S. If q_α(θ) is increasing in θ (i.e., F_θ(s) is decreasing in θ), the statement s < q_α(θ) is equivalent to θ > q_α^{-1}(s), and thus Fisher takes q_α^{-1}(s) as the quantile of order 1 − α of the fiducial distribution. The set of all quantiles q_α^{-1}(s), α ∈ (0, 1), establishes the fiducial distribution function H_s(θ) and the corresponding density h_s(θ), given by

H_s(θ) = 1 − F_θ(s)  and  h_s(θ) = −∂F_θ(s)/∂θ.  (1)

Of course, H_s and h_s must be suitably modified if F_θ(s) is increasing in θ.
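As a quick numerical illustration of (1) (our sketch, not part of the paper): for an i.i.d. exponential sample with rate λ, the statistic S = Σ_i X_i is sufficient with a Ga(n, λ) distribution; F_λ(s) is increasing in λ, and differentiating it yields a Ga(n, s) fiducial density for λ. The sample size and observed value below are made up.

```python
# Numerical check of Fisher's construction (1) for an exponential sample:
# S = sum(X_i) ~ Ga(n, rate = lambda); since F_lambda(s) is increasing in
# lambda, the fiducial density is d/d(lambda) F_lambda(s), and it should
# equal the Ga(n, rate = s) density.
import numpy as np
from scipy import stats

n, s = 10, 4.2                       # illustrative values
lam = np.linspace(0.1, 8.0, 400)
eps = 1e-6

F = lambda l: stats.gamma.cdf(s, a=n, scale=1.0 / l)   # F_lambda(s)
h_num = (F(lam + eps) - F(lam - eps)) / (2 * eps)      # numerical derivative
h_exact = stats.gamma.pdf(lam, a=n, scale=1.0 / s)     # Ga(n, s) density

print(np.max(np.abs(h_num - h_exact)))                 # ~0: they agree
```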

Fisher (1973, ch. VI) provided some examples of continuous multivariate fiducial distributions obtained by a step-by-step procedure, but he never developed a general and rigorous theory. This fact, together with the difficulty of covering discrete and/or multiparameter models, some inconsistencies of the fiducial distribution (e.g. the marginalization paradox, see Dawid & Stone, 1982) and the difficulties in its interpretation, gave rise to a quite strong negative attitude towards Fisher's proposal. It is interesting to notice that one of these inconsistencies, namely the lack of invariance of the fiducial distribution with respect to reparameterizations of the model, see Dempster (1963), can be explained by introducing the notion of inferential importance of the parameters, similarly to what happens for reference priors, see e.g. Bernardo & Smith (1994).

In the renewed interest in the fiducial approach, a relevant role is played by the generalized fiducial inference introduced and developed by Hannig (2009, 2013). He provides a formal and mathematically rigorous definition with quite general applicability. The crucial element of his definition is a data-generating equation X = G(U, θ), linking the unknown parameter θ and the observed data X through a random element U with known distribution. Roughly speaking, by shifting the randomness induced by U from X to θ (i.e., by inverting G with respect to θ after fixing X = x), the distribution prescribed by the statistical model leads to a distribution for the parameter θ. Contrary to the original idea of Fisher, the generalized fiducial distribution is non-unique, and Hannig widely discusses this point. Applications to different statistical models can be found, for instance, in Hannig et al. (2007), Hannig & Iyer (2008) and Wandler & Hannig (2012). Other recent contributions to the topic of fiducial distributions are given by Taraldsen & Lindqvist (2013), who discuss optimality of procedures, by Martin & Liu (2013), who attempt to define a quite general framework for inference with satisfactory long-run behavior, and by Veronese & Melilli (2015), henceforth V&M. In this last paper the authors, following the original idea of Fisher, derive fiducial distributions for both discrete and continuous real natural exponential families (NEFs) and discuss some of their properties, with particular emphasis on the frequentist coverage of fiducial intervals.

Historically, confidence distributions have typically been constructed by inverting the upper limits of lower one-sided confidence intervals, and they have often been associated with a fiducial interpretation. Recently, in Schweder & Hjort (2002) and Singh et al. (2005), a modern definition has been proposed. Confidence distribution theory can be seen as a frequentist setting in which both objective Bayesian posteriors and fiducial distributions can be studied and compared. Despite their general importance, confidence distributions are not crucial for the specific aims tackled here; we will return to this topic in Section 5 in connection with possible further developments.

The present paper does not discuss the philosophical bases of the different approaches, but aims to show how and when they lead to identical, or similar, conclusions in several standard situations. Our first goal is to suggest a simple way to construct a (unique) fiducial distribution for sufficiently general discrete and continuous multiparameter models. The proposal combines the step-by-step idea in Fisher (1973) with the results on fiducial distributions for real NEFs proved in V&M. The key point of the construction is the conditioning procedure: the distribution of the data is factorized as a product of one-dimensional laws and, for each of these, the fiducial density of a real parameter, possibly conditional on other parameters, is obtained. The joint fiducial distribution of the overall parameter is then defined as the product of the (conditional) one-dimensional fiducial laws. Assumptions on the statistical model needed to implement this procedure are discussed.

To clarify the underlying idea, consider the very standard example of the normal model with both mean µ and variance σ² unknown. The joint distribution function of the sufficient statistics X̄ = Σ_{i=1}^n X_i/n and S² = Σ_{i=1}^n (X_i − X̄)²/n can be factorized as

F_{X̄,S²}(x̄, s²; µ, σ²) = F_{X̄|S²}(x̄ | s²; µ, σ²) F_{S²}(s²; σ²),

and the fiducial distribution for (µ, σ²) can be constructed as the product of the two univariate fiducial distributions for σ² and for µ given σ², derived from F_{S²} and F_{X̄|S²}, respectively. Of course, in this case X̄ and S² are independent, but this is not crucial in our context. This joint fiducial distribution coincides with the posterior obtained with the reference prior, but is different from that derived from the Jeffreys rule.
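A minimal sampling sketch of this construction (ours, with made-up data; it relies only on the standard distributional results for the normal model): with ss = Σ_i (x_i − x̄)², the step-by-step procedure gives σ² ~ In-Ga((n−1)/2, ss/2) and µ | σ² ~ N(x̄, σ²/n), which is exactly the posterior under the reference prior π(µ, σ²) ∝ 1/σ².

```python
# Step-by-step fiducial sampler for the N(mu, sigma^2) model; it coincides
# with the reference posterior under pi(mu, sigma^2) proportional to 1/sigma^2.
import numpy as np
rng = np.random.default_rng(5)

x = rng.normal(2.0, 3.0, size=20)                 # illustrative data
n, xbar = x.size, x.mean()
ss = ((x - xbar) ** 2).sum()                      # raw sum of squares

draws = 100_000
# sigma^2 ~ In-Ga((n-1)/2, ss/2): draw G ~ Ga((n-1)/2, 1) and take (ss/2)/G
sigma2 = (ss / 2) / rng.gamma((n - 1) / 2, 1.0, draws)
mu = rng.normal(xbar, np.sqrt(sigma2 / n))        # mu | sigma^2 ~ N(xbar, sigma^2/n)
print(mu.mean(), np.median(sigma2))
```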

One of the novel aspects of our construction is that it explicitly recognizes the relevance of the ordering of the components of the parameter in terms of their inferential importance, as happens in the theory of reference priors. The second goal of the paper is thus to investigate the relationships between objective Bayesian posteriors and the suggested fiducial distributions. Lindley (1958) was the first to discuss these connections. Specifically, he proved that for models with a real parameter θ and a real sufficient statistic S, the fiducial distribution and the posterior coincide if and only if there exist transformations of S and θ which allow one to rewrite the sampling distribution as a location model. V&M extend this result, within the real NEFs, to discrete models, characterizing all families admitting a fiducial prior, i.e. a prior leading to a posterior equal to the fiducial distribution. This prior, when properly defined for discrete families, coincides with the Jeffreys prior. We show here, through several standard examples involving a nuisance parameter, that a similar relationship exists with the reference priors instead of the Jeffreys ones. In particular, we study models belonging to the class of the so-called conditionally reducible NEFs, see Consonni & Veronese (2001). This class of models can be indexed by a suitable parameter which strongly simplifies our procedure, because its components are independent under the fiducial distribution. Furthermore, we characterize the conditionally reducible NEFs for which a fiducial prior exists and show that it belongs to the enriched conjugate family. Finally, we prove the asymptotic normality of the fiducial distributions for this class, obtaining the same well-known result holding for Bayesian posteriors under regularity conditions.

The paper is structured as follows. Section 2 collects some basic properties and results on NEFs, including a brief review of the conditionally reducible NEFs. After recalling in Section 3.1 the main results on fiducial distributions for real NEFs stated in V&M, Section 3.2 presents a proposal for constructing a multivariate fiducial distribution in a quite general context. In Section 3.3 we discuss the relationships between fiducial distributions and reference posteriors, which are illustrated in Section 3.4 through several classical examples, including a general result on location-scale parameter models. Section 4 establishes fiducial distributions for conditionally reducible NEFs and provides their expression for a particular subclass, including the multinomial and negative-multinomial models. Families which admit a fiducial prior are characterized and relationships with the reference prior are shown. Section 4.3 deals with asymptotic normality. Section 5 highlights the role of confidence distributions as a broad setting in which both objective Bayesian and fiducial inference can be considered and suggests other possible developments. Finally, Appendix A1 collects some useful technical results on conditionally reducible NEFs, Appendix A2 briefly presents the notion of enriched conjugate family, and Appendix A3 contains the proofs of all the propositions and theorems stated in the paper.

2 Preliminaries on natural exponential families

2.1 Basic results

This subsection reviews some basic facts about exponential families; for a general treatment see Barndorff-Nielsen (1978), Brown (1986) and, for a Bayesian perspective, Gutiérrez-Peña & Smith (1997). Let ν be a σ-finite positive measure on the Borel sets of R^d, not concentrated on an affine subspace of R^d, and consider a family F of distributions whose densities with respect to ν are of the form

p_θ(x) = exp{ Σ_{k=1}^d θ_k x_k − M(θ) },  θ = (θ_1, …, θ_d) ∈ Θ,  x = (x_1, …, x_d) ∈ R^d,  (2)

with Θ ⊆ R^d nonempty. When the natural parameter space N = {θ ∈ R^d : ∫ exp{Σ_{k=1}^d θ_k x_k} ν(dx) < ∞} is open and Θ = N, the family F is said to be a regular natural exponential family (NEF). In the sequel we consider only regular NEFs. Any NEF can be reparameterized in terms of the mean parameter µ = (µ_1, …, µ_d), where µ = µ(θ) = ∂M(θ)/∂θ, because µ(·) is a one-to-one differentiable map from Θ onto Ω = µ(Θ). Notice that, for a regular NEF, Ω coincides with the interior of the convex hull of the support of ν. The matrix-valued function V(µ), whose ij-th element is ∂²M(θ)/∂θ_i∂θ_j evaluated at θ = θ(µ), is called the variance function of the family F. The pair (V(·), Ω) characterizes the NEF F.

When the family F is real (d = 1), its variance function is said to be quadratic if V(µ) = Qµ² + Lµ + C for some Q, L, C ∈ R such that V(µ) > 0 for all µ ∈ Ω. The class of real NEFs with quadratic variance function includes some of the most widely used families of distributions, such as the normal (with known variance), binomial, Poisson, gamma (with known shape parameter) and negative-binomial families, see Morris (1982, 1983).
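As a concrete one-dimensional instance (a standard fact, spelled out here for convenience): the Poisson family with mean µ can be written in the form (2) with respect to the measure ν(dx) = Σ_{j≥0} (1/j!) δ_j(dx), giving

p_θ(x) = exp{θx − e^θ},  M(θ) = e^θ,  θ = log µ,  N = R.

Hence µ = M′(θ) = e^θ, Ω = (0, ∞) and V(µ) = M″(θ(µ)) = µ, a quadratic variance function with Q = 0, L = 1 and C = 0.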

When F is defined on R^d, the notion of quadratic variance function can be extended in various ways (Letac, 1991). An important case is given by the simple quadratic variance function (SQVF), whose ij-th element is

V_ij(µ) = q µ_i µ_j + Σ_{k=1}^d µ_k L^{(k)}_ij + C_ij,

where q is a real constant and L^{(k)}, k = 1, …, d, and C are constant d×d symmetric matrices. Casalis (1996) showed that any NEF-SQVF can be obtained, via a nonsingular affine transformation, from one of the basic families: Poisson/normal, multinomial, negative-multinomial, negative-multinomial/gamma/normal and negative-multinomial/hyperbolic-secant. For a basic NEF-SQVF the linear part of the variance function is determined by a single d×d constant symmetric matrix L^{(0)}, so that it can be written as

V(µ) = q µµᵀ + L(µ) + C,  (3)

where q ∈ R, C is a d×d constant symmetric matrix and L(µ) is a symmetric matrix whose entries are linear in µ with coefficients given by L^{(0)}.

Consider now n random vectors X_1, …, X_n independent and identically distributed (i.i.d.) according to the density p_θ(x) in (2). Then S_n = Σ_{i=1}^n X_i is the minimal sufficient statistic for the sample and its density, with respect to the n-fold convolution ν_n of ν, is

p_{n,θ}(s) = exp{ Σ_{k=1}^d θ_k s_k − n M(θ) },  θ ∈ Θ.  (4)

Thus S_n is still distributed according to a NEF with natural parameter θ; the corresponding distribution function will be denoted by F_{n,θ}.

2.2 Conditionally reducible natural exponential families

A relevant role in our development of fiducial inference for multivariate parameters is played by the so-called conditionally reducible NEFs (in the sequel cr-NEFs), introduced in Consonni & Veronese (2001). We give here a brief overview, considering only a particular case of cr-NEFs, and refer to Appendix A1 and to the cited paper for more technical and general results. In the following, given a vector y = (y_1, …, y_d), we denote by y_{[k]} the sub-vector (y_1, …, y_k), k = 1, …, d−1.

Let X be a random vector distributed according to a NEF F on R^d, whose density with respect to ν is given in (2). F is a cr-NEF if, for each k = 1, …, d, the conditional distribution of X_k given X_{[k−1]} = x_{[k−1]} is a real exponential family with respect to a suitable transition kernel.

Notice that, for k = 1, this is the marginal distribution of X_1. Thus the joint density of a cr-NEF can be factorized as

p_θ(x_1, …, x_d) = ∏_{k=1}^d p_{ϕ_k(θ)}(x_k | x_{[k−1]}) = ∏_{k=1}^d exp{ ϕ_k(θ) x_k − M_k(ϕ_k(θ); x_{[k−1]}) },  (5)

where ϕ = (ϕ_1, …, ϕ_d) is a one-to-one function from Θ onto Φ = ϕ(Θ). Furthermore, it can be shown that Φ = Φ_1 × ⋯ × Φ_d, with ϕ_k ∈ Φ_k, k = 1, …, d, so that the ϕ_k's are variation independent. The parameter ϕ_k represents the natural parameter of the k-th conditional distribution. All basic NEF-SQVFs are cr-NEFs; their structure and the relationships among the different parameterizations are given in Consonni & Veronese (2001, Table 1).

Example 1 (Multinomial family). Consider a vector X distributed according to the multinomial distribution on R^d,

p_θ(x) = [ N! / (x_1! ⋯ x_{d+1}!) ] exp{ Σ_{k=1}^d θ_k x_k − M(θ) },  (6)

where M(θ) = N log(1 + Σ_{k=1}^d e^{θ_k}), Θ = R^d, x_{d+1} = N − Σ_{k=1}^d x_k, each x_k is a non-negative integer with Σ_{k=1}^d x_k ≤ N, and θ_k = log(p_k/(1 − Σ_{r=1}^d p_r)), with p_k the probability of the k-th outcome, k = 1, …, d. It is well known that the conditional distribution of X_k given X_{[k−1]} = x_{[k−1]}, k = 2, …, d, is Bi(N − Σ_{j=1}^{k−1} x_j, p_k/(1 − Σ_{j=1}^{k−1} p_j)), whereas the marginal distribution of X_1 is Bi(N, p_1), where Bi(n, p) denotes the binomial distribution with n trials and success probability p. Since the binomial family is a NEF whose natural parameter is the logit of the success probability, the family can be factorized as in (5) with

ϕ_k = log( p_k / (1 − Σ_{j=1}^k p_j) ),  ϕ_k ∈ Φ_k = R,  k = 1, …, d.  (7)

Notice that the parameter ϕ is specific to a given order of the vector components. Considering a permutation of (X_1, …, X_d), the resulting distribution is still multinomial, and thus conditionally reducible, but with a different ϕ-parameterization. This aspect is important in the construction of both reference priors and fiducial distributions, as we will see.
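A small simulation (ours; N, p and the conditioning value are made up) of the conditional-binomial structure of Example 1 and of the ϕ-parameterization (7):

```python
# Check that X_2 | X_1 = x_1 behaves as Bi(N - x_1, p_2/(1 - p_1)) in a
# multinomial model, and compute phi as in (7).
import numpy as np
rng = np.random.default_rng(0)

N, p = 20, np.array([0.2, 0.3, 0.4])              # last cell gets 0.1
X = rng.multinomial(N, np.append(p, 1 - p.sum()), size=200_000)

x1 = 5
cond = X[X[:, 0] == x1, 1]                         # X_2 given X_1 = 5
print(cond.mean(), (N - x1) * p[1] / (1 - p[0]))   # both ~ 5.625

phi = np.log(p / (1 - np.cumsum(p)))               # the parameters in (7)
print(phi)
```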

2.3 Objective Bayesian inference for natural exponential families

As recalled in Section 1, the reference prior is the most successful prior used in objective Bayesian inference. Typically, for a real parameter admitting an asymptotically normal posterior distribution, it coincides with the prior obtained by the well-known Jeffreys rule. This happens for a real NEF F with density (2), for which the Jeffreys prior π^J for θ is

π^J(θ) ∝ I(θ)^{1/2} = [M″(θ)]^{1/2},

where I(θ) = E_θ[ −∂² log p_θ(X)/∂θ² ] = M″(θ) is the Fisher information. The Jeffreys prior is invariant under reparameterizations of the model and, for real NEFs with quadratic variance function, it belongs to the standard conjugate family, π^J(θ) ∝ exp{θ s_0 − n_0 M(θ)}, with s_0 = L/2 and n_0 = −Q, where L and Q are the coefficients of V(µ), see Gutiérrez-Peña & Smith (1997).

For a multidimensional parameter the reference prior π^R depends on the grouping and ordering of its components and, in general, no longer coincides with π^J, which is known to be unsatisfactory in this context. The reference prior is not simple to derive in general, but for cr-NEF-SQVFs Consonni et al. (2004, Prop. 2) found a simple and general formula for π^R(ϕ). Furthermore, this prior is invariant with respect to the ordering of the groups and belongs to the enriched standard conjugate family defined in Consonni & Veronese (2001). This aspect is important for several reasons, not least because the posterior can be obtained by directly updating the hyperparameters of the family; see Appendix A2 for a basic review of this topic. If F is one of the basic NEF-SQVFs, then the d-group reference prior for ϕ is

π^R(ϕ) ∝ exp{ Σ_{k=1}^d ( (1/2) z_k ϕ_k + q B_k(ϕ_k) ) },  (8)

where z_k = L^{(0)}_{kk} is the k-th diagonal element of the matrix L^{(0)} appearing in (3), and B_k(ϕ_k) is defined in (39).

The following proposition, besides being useful to simplify the computation of reference priors for cr-NEFs, also establishes a connection between reference and Jeffreys priors and, in our context, between objective Bayesian and fiducial inference.

Proposition 1 Let F be a cr-NEF on R^d, with the k-th diagonal element of the Fisher information matrix given by I_kk(ϕ) = a_k(ϕ_k) b_k(ϕ_{[k−1]}). Then the d-group (order-invariant) reference prior π^R for ϕ = (ϕ_1, …, ϕ_d) is

π^R(ϕ) = ∏_{k=1}^d π^J_k(ϕ_k) ∝ ∏_{k=1}^d (a_k(ϕ_k))^{1/2},  (9)

where π^J_k(ϕ_k) is the Jeffreys prior obtained from the conditional distribution of X_k given X_{[k−1]} = x_{[k−1]}.

Finally, we recall that reference priors are not invariant under arbitrary reparameterizations of the model. Thus, given the reference prior for ϕ, in general it is not possible to recover the reference prior for an alternative parameter of the model, λ say, via the standard change-of-variable technique. The procedure is correct only when the Jacobian matrix of the transformation from ϕ to λ is lower triangular, i.e. λ_1 = g_1(ϕ_1), λ_2 = g_2(ϕ_1, ϕ_2), and so on; see Yang (1995) and Datta & Ghosh (1996). This is reasonable because such transformations preserve the order of inferential importance of the parameters, which is a crucial aspect of reference priors. Notice that the transformation from ϕ to µ is lower triangular, as shown in (40), and thus π^R(µ) can easily be derived from π^R(ϕ). However, the reference prior for µ is not order-invariant, unlike that for ϕ.

Example 1 (ctd.). As previously noted, a multinomial distribution can be factorized as a product of (conditional) binomial distributions with natural parameters ϕ_k. Thus it is immediate to verify that the Jeffreys prior π^J_k for ϕ_k is

π^J_k(ϕ_k) ∝ e^{ϕ_k/2} / (1 + e^{ϕ_k}),  ϕ_k ∈ R.

The reference prior for ϕ, obtained from (8) noting that z_k = 1 and qB_k(ϕ_k) = −log(1 + e^{ϕ_k}), see Consonni & Veronese (2001, Table 1), is

π^R(ϕ) ∝ exp{ Σ_{k=1}^d ( ϕ_k/2 − log(1 + e^{ϕ_k}) ) } = ∏_{k=1}^d e^{ϕ_k/2} / (1 + e^{ϕ_k}),  ϕ ∈ R^d,  (10)

and coincides with ∏_{k=1}^d π^J_k(ϕ_k), as stated in Proposition 1. Because the cell-probability parameter is proportional to the mean parameter, µ = Np, we can compute directly the reference prior on p = (p_1, …, p_d), with this order, from (10). Using (7), and noting that the Jacobian of the transformation is J_ϕ(p) = ∏_{i=1}^d p_i^{−1} · (1 − Σ_{i=1}^d p_i)^{−1}, we have

π^R(p) ∝ ∏_{k=1}^d [ p_k (1 − Σ_{j=1}^k p_j) ]^{−1/2}.  (11)

The reference prior π^R for p is a generalized Dirichlet distribution and belongs to the enriched conjugate family for the multinomial model, see Consonni & Veronese (2001).
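Under (10) the ϕ_k's are independent and each conditional probability v_k = e^{ϕ_k}/(1 + e^{ϕ_k}) has a Be(1/2, 1/2) distribution, so the generalized Dirichlet (11) can be sampled by stick-breaking. A short sketch (ours; the dimension and number of draws are arbitrary):

```python
# Sampling p = (p_1, ..., p_d) from the reference prior (11) by drawing
# independent Be(1/2, 1/2) conditional probabilities v_k and stick-breaking.
import numpy as np
rng = np.random.default_rng(6)

d, draws = 3, 100_000
v = rng.beta(0.5, 0.5, size=(draws, d))   # v_k = p_k / (1 - p_1 - ... - p_{k-1})
p = np.empty_like(v)
rest = np.ones(draws)
for k in range(d):
    p[:, k] = rest * v[:, k]
    rest = rest * (1 - v[:, k])
print(p.mean(axis=0))                      # Monte Carlo means of the p_k
```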

3 Fiducial distributions: a proposal of construction and some examples

In this section we introduce and discuss, with a particular eye towards connections and similarities with objective Bayesian analysis, a way to construct fiducial distributions for some multiparameter statistical models. First we recall some results holding for real NEFs.

3.1 Fiducial distributions for real NEFs

The fiducial distributions for real NEFs have been constructed in a quite simple and direct way by V&M, starting from a result in Petrone & Veronese (2010). Consider a sufficient statistic S_n with distribution function F_{n,θ}(s), belonging to a real NEF F with density (4); let a_n and b_n be the infimum and the supremum of its support, and define S*_n = [a_n, b_n) if ν_n({a_n}) > 0 and S*_n = (a_n, b_n) otherwise. Then, for s ∈ S*_n,

H_{n,s}(θ) = 0 for θ ≤ inf Θ;  H_{n,s}(θ) = 1 − F_{n,θ}(s) for inf Θ < θ < sup Θ;  H_{n,s}(θ) = 1 for θ ≥ sup Θ  (12)

is a fiducial distribution function (according to Fisher's idea) for the natural parameter θ. It follows that the fiducial density of θ is

h_{n,s}(θ) = ∂H_{n,s}(θ)/∂θ = −∂F_{n,θ}(s)/∂θ = ∫_{(−∞,s]} (n M′(θ) − t) p_{n,θ}(t) ν_n(dt).  (13)

It is simple to verify that the distribution function H_{n,s} is also a (possibly asymptotic) confidence distribution, according to its modern definition given in Schweder & Hjort (2002) and Singh et al. (2005).

For discrete NEFs, F_{n,θ}(s) = Pr_θ{S_n ≤ s} and Pr_θ{S_n < s} do not coincide and thus, besides H_{n,s} in (12), one can define a left fiducial distribution as

H^l_{n,s}(θ) = 1 − Pr_θ{S_n < s} = 1 − Pr_θ{S_n ≤ s⁻} = H_{n,s⁻}(θ),  (14)

where s⁻ denotes the point preceding s in the support of S_n. For convenience, H_{n,s} will sometimes be called the right fiducial distribution. A natural way to overcome this non-uniqueness might be to consider their arithmetic mean, i.e. the mixture

H^A_{n,s}(θ) = (H_{n,s}(θ) + H^l_{n,s}(θ))/2 = Pr_θ{S_n > s} + Pr_θ{S_n = s}/2,

whose density is the arithmetic mean of h_{n,s}(θ) and h^l_{n,s}(θ). Remarkably, H^A_{n,s} coincides with the approximate confidence distribution proposed for discrete data by Schweder & Hjort (2002); see also Hannig & Xie (2012).

The distribution H_{n,s} in (12) is well defined for each s ∈ S*_n, but it fails for s = b_n, with b_n finite and ν_n({b_n}) > 0, since H_{n,b_n}(θ) = 1 − F_{n,θ}(b_n) = 0 for each θ ∈ Θ. A similar problem exists for H^l_{n,s}, as occurs for instance in the binomial model. Thus both H_{n,s} and H^l_{n,s}, and hence their mixture, can be undefined in specific cases. A possible solution is to consider, instead of the arithmetic mean of h_{n,s} and h^l_{n,s}, their (suitably normalized) geometric mean, defined as

h^G_{n,s}(θ) = ( h_{n,s}(θ) h^l_{n,s}(θ) )^{1/2} / ∫ ( h_{n,s}(θ) h^l_{n,s}(θ) )^{1/2} dθ.

We denote by H^G_{n,s} the distribution function corresponding to h^G_{n,s}. Recently, Berger, Bernardo & Sun (2015) suggested using the geometric mean as a reasonable way to average different reference priors, because it is not affected by the normalizing constants, which often do not exist for reference priors. Furthermore, they mention its property of minimizing the Kullback-Leibler divergence, attributing this remark to Gauri Datta. We give a simple proof of this fact, without resorting to the calculus of variations. First recall that, given two densities p and q with the same support, the Kullback-Leibler divergence of p from q is defined as KL(q | p) = ∫ q(x) log(q(x)/p(x)) dx.

Proposition 2 Consider two densities p_1 and p_2 with the same support. Then the density q = p^G ∝ (p_1 p_2)^{1/2} minimizes the sum of the Kullback-Leibler divergences of p_1 and p_2 from q.

The previous proposition, together with other advantages which will be clarified later on, justifies the preference given to H^G_{n,s} over H^A_{n,s} as a way to combine H_{n,s} and H^l_{n,s}.

Real fiducial distributions are invariant under monotone continuous reparameterizations of the model. More precisely, if λ = λ(θ) is an increasing differentiable function of θ, then the fiducial distribution function of λ is H^λ_{n,s}(λ) = 1 − F_{n,θ(λ)}(s) = H_{n,s}(θ(λ)). The corresponding density h^λ_{n,s}(λ) coincides with that obtained directly via a change of variable from h_{n,s}(θ). A similar result holds if λ is decreasing in θ. Table 1 provides the fiducial distributions for some important NEFs obtained in V&M and used in the sequel.

Table 1: Fiducial distributions for some real NEFs

Model                   Sufficient statistic       Fiducial distributions
N(µ, σ²), σ² known      S_n = Σ_i X_i              H_{n,s}(µ): N(s/n, σ²/n)
N(µ, σ²), µ known       S_n = Σ_i (X_i − µ)²       H_{n,s}(σ²): In-Ga(n/2, s/2)
Ga(α, λ), α known       S_n = Σ_i X_i              H_{n,s}(λ): Ga(nα, s)
Pa(λ, x_0), x_0 known   S_n = Σ_i log(X_i/x_0)     H_{n,s}(λ): Ga(n, s)
We(λ, c), c known       S_n = Σ_i X_i^c            H_{n,s}(λ): Ga(n, s)
Bi(m, p), m known       S_n = Σ_i X_i              H_{n,s}(p): Be(s+1, nm−s); H^l_{n,s}(p): Be(s, nm−s+1); H^G_{n,s}(p): Be(s+1/2, nm−s+1/2)
Po(µ)                   S_n = Σ_i X_i              H_{n,s}(µ): Ga(s+1, n); H^l_{n,s}(µ): Ga(s, n); H^G_{n,s}(µ): Ga(s+1/2, n)
Ne-Bi(m, p), m known    S_n = Σ_i X_i              H_{n,s}(p): Be(nm, s+1); H^l_{n,s}(p): Be(nm, s); H^G_{n,s}(p): Be(nm, s+1/2)

Notation: Ga(α, λ) denotes a gamma distribution with shape α and mean α/λ; In-Ga(α, λ) an inverse-gamma distribution (if X ~ Ga(α, λ) then 1/X ~ In-Ga(α, λ)); Be(α, β) a beta distribution with parameters α and β; Bi(m, p) a binomial distribution with m trials and success probability p; Ne-Bi(m, p) a negative-binomial distribution with m successes and success probability p; Po(µ) the Poisson distribution with mean µ; Pa(λ, x_0) a Pareto distribution with density λ x_0^λ x^{−λ−1}, x > x_0 > 0, λ > 0; We(λ, c) a Weibull distribution with density cλ x^{c−1} exp(−λx^c), x > 0, λ > 0, c > 0.
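A quick numerical verification (ours) of the binomial rows of Table 1: the normalized geometric mean of the right and left fiducial densities Be(s+1, nm−s) and Be(s, nm−s+1) is exactly Be(s+1/2, nm−s+1/2). The values of nm and s below are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.integrate import simpson

nm, s = 25, 7
p = np.linspace(1e-4, 1 - 1e-4, 2000)
g = np.sqrt(stats.beta.pdf(p, s + 1, nm - s) * stats.beta.pdf(p, s, nm - s + 1))
g /= simpson(g, x=p)                       # normalize the geometric mean
print(np.max(np.abs(g - stats.beta.pdf(p, s + 0.5, nm - s + 0.5))))  # ~0
```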

3.1.1 Connections with Jeffreys priors

As mentioned in Section 1, Lindley (1958) was the first to study when there exists a prior (called by V&M a fiducial prior) whose corresponding posterior coincides with a given fiducial distribution. In particular, he proved that, within continuous NEFs, a fiducial prior exists only for the Gaussian (with known variance) and the gamma (with known shape) models. A full characterization of all real NEFs admitting a fiducial prior is given by V&M. The following proposition summarizes their results, which will be useful later on for comparison purposes.

Proposition 3 Let F be a real NEF with natural parameter θ.

i) If a fiducial prior for θ exists, then F has quadratic variance function.

ii) A fiducial prior exists if and only if F is an affine transformation of one of the following families: normal with known variance, gamma with known shape parameter, binomial, Poisson and negative-binomial. For the three discrete families, the fiducial prior exists for all of H_{n,s}, H^l_{n,s} and H^G_{n,s}.

iii) When a fiducial prior exists, it belongs to the family of conjugate distributions. Moreover, it coincides with the Jeffreys prior for continuous F, and also for discrete F if H^G_{n,s} is chosen as the fiducial distribution.

iv) The fiducial distribution H_{n,s} (or H^A_{n,s} in the discrete case) and the Bayesian posterior corresponding to the Jeffreys prior have the same Edgeworth expansion up to terms of order n^{−1}.

The previous results establish a strong connection, in the setting of real NEFs, between Jeffreys posteriors and fiducial distributions, so that the two approaches lead, in some sense, to the same objective inference. For a discussion of the coverage of fiducial and Jeffreys intervals and of their good frequentist properties, in particular when compared with the standard Wald intervals for real NEFs, see V&M and the references therein.

3.2 Fiducial distributions in a more general context

In this section we suggest a possible and natural way to extend the previous results to more complex models. As mentioned in Section 1, the idea is to construct a joint fiducial distribution as a product of (conditional) fiducial distributions obtained from the one-dimensional conditional densities in which the sampling distribution has been factorized. This construction mimics the one used to obtain reference priors in a multivariate context and, as a consequence, our fiducial distributions also depend on the chosen order of conditioning. As far as we know, this is the first time that this aspect is considered in a fiducial context. In the following we always assume that the one-dimensional conditional distribution functions F_θ(x | y) of the random quantities involved in the analysis satisfy the conditions under which −∂F_θ(x | y)/∂θ is a density for θ, for fixed x and y. Of course this is true, by (12), if F_θ(x | y) belongs to a NEF.

Consider a statistic T = (T_1, …, T_m) with density p_θ(t), θ = (θ_1, …, θ_d), d ≤ m, which summarizes the data X without losing information on θ; T can be a sufficient statistic or a one-to-one transformation of X. Suppose that T can be split as (T_{[d]}, T_{[−d]}), where T_{[d]} = (T_1, …, T_d) and T_{[−d]} = (T_{d+1}, …, T_m) is ancillary for θ. Then the inference on θ can be performed using the conditional distribution of T_{[d]} given T_{[−d]}. Assume now that there exists a one-to-one smooth reparameterization from θ to ϕ, with ϕ_1, …, ϕ_d ordered by inferential importance, such that

p_ϕ(t_{[d]} | t_{[−d]}) = ∏_{k=1}^d p_{ϕ_{d−k+1}}(t_k | t_{[k−1]}, t_{[−d]}; ϕ_{[d−k]}).  (15)

The density p_{ϕ_{d−k+1}}(t_k | t_{[k−1]}, t_{[−d]}; ϕ_{[d−k]}), with corresponding distribution function F_{ϕ_{d−k+1}}(t_k | t_{[k−1]}, t_{[−d]}; ϕ_{[d−k]}), must be interpreted as the conditional distribution of T_k given T_{[k−1]} = t_{[k−1]} and T_{[−d]} = t_{[−d]}, parameterized by ϕ_{d−k+1}, assuming ϕ_{[d−k]} known. Then we can construct the fiducial density of ϕ as

h_t(ϕ) = ∏_{k=1}^d h_{t_{[k]},t_{[−d]}}(ϕ_{d−k+1} | ϕ_{[d−k]}),  (16)

where t = (t_{[d]}, t_{[−d]}) and

h_{t_{[k]},t_{[−d]}}(ϕ_{d−k+1} | ϕ_{[d−k]}) = −∂F_{ϕ_{d−k+1}}(t_k | t_{[k−1]}, t_{[−d]}; ϕ_{[d−k]}) / ∂ϕ_{d−k+1}.  (17)

Before providing several applications of this procedure, it is convenient to discuss some general points related to the fiducial distribution (16).

The existence of an ancillary statistic T_{[−d]} is not required if m = d, i.e. when the dimensions of T and ϕ coincide.

The fiducial distribution (16) is essentially invariant with respect to one-to-one transformations of the statistic T. More precisely, because all the distributions are conditional on the ancillary statistic, any one-to-one transformation of T_{[−d]} induces the same conditioning. Moreover, the choice of T_{[d]} does not affect the resulting fiducial distribution either, if we consider a one-to-one transformation T* = (T*_{[d]}, T_{[−d]}), where T*_{[d]} = g(T_{[d]}, T_{[−d]}) is a lower triangular transformation of T_{[d]} for fixed T_{[−d]}, that is, T*_k = g_k(T_{[k]}, T_{[−d]}) for k = 1, …, d. Indeed, for t*_k = g_k(t_{[k]}, t_{[−d]}), with g_k increasing in t_k say, we have

Pr_{ϕ_{d−k+1}}(T*_k ≤ t*_k | T*_{[k−1]} = t*_{[k−1]}, T_{[−d]} = t_{[−d]}; ϕ_{[d−k]})
= Pr_{ϕ_{d−k+1}}(g_k(T_{[k]}, T_{[−d]}) ≤ g_k(t_{[k]}, t_{[−d]}) | T_{[k−1]} = t_{[k−1]}, T_{[−d]} = t_{[−d]}; ϕ_{[d−k]})
= Pr_{ϕ_{d−k+1}}(T_k ≤ t_k | T_{[k−1]} = t_{[k−1]}, T_{[−d]} = t_{[−d]}; ϕ_{[d−k]}),

so that T and T* lead to the same fiducial distribution.

If one is interested only in ϕ_1, it follows from (15) that it is enough to consider

h_{t_{[d]},t_{[−d]}}(ϕ_1) = −∂F_{ϕ_1}(t_d | t_{[d−1]}, t_{[−d]}) / ∂ϕ_1,

which depends on all the observations. Similarly, if one is interested in (ϕ_1, ϕ_2), it is enough to consider h_{t_{[d]},t_{[−d]}}(ϕ_1) h_{t_{[d−1]},t_{[−d]}}(ϕ_2 | ϕ_1), and so on.

If (T_{[k−1]}, T_{[−d]}) is sufficient for ϕ_{[d−k]} for each k, then the conditional distribution of T_k given T_{[k−1]} = t_{[k−1]} and T_{[−d]} = t_{[−d]} does not depend on ϕ_{[d−k]}, and the fiducial distribution (16) becomes the product of the marginal fiducial distributions of the ϕ_k's. As a consequence, the k-th factor in (16) can be used alone to make inference on ϕ_{d−k+1}, and the fiducial distribution does not depend on the inferential ordering of the parameters. An important case in which this happens will be discussed in Section 4.

As seen in Section 3.1, for discrete NEFs it is possible to define a right and a left fiducial distribution, starting from

Pr_{ϕ_{d−k+1}}{T_k ≤ t_k | T_{[k−1]} = t_{[k−1]}, T_{[−d]} = t_{[−d]}; ϕ_{[d−k]}} = F_{ϕ_{d−k+1}}(t_k | t_{[k−1]}, t_{[−d]}; ϕ_{[d−k]})

and from Pr_{ϕ_{d−k+1}}{T_k < t_k | T_{[k−1]} = t_{[k−1]}, T_{[−d]} = t_{[−d]}; ϕ_{[d−k]}}, respectively. Thus we could define 2^d different fiducial distributions, taking for each factor of (16) one of the two previous choices. In the following we consider only four natural cases: the fiducial distribution function H_t(ϕ), obtained as the product of all the right univariate conditional fiducial distributions; H^l_t(ϕ), obtained as the product of all the left ones; and two more motivated by the considerations developed in Section 3.1. The first, H^A_t(ϕ), is obtained as the product of the d mixtures H^A_{t_{[k]},t_{[−d]}} = (H_{t_{[k]},t_{[−d]}} + H^l_{t_{[k]},t_{[−d]}})/2, while the second, H^G_t(ϕ), corresponds to the density h^G_t(ϕ) obtained as the product of the d geometric means h^G_{t_{[k]},t_{[−d]}} ∝ (h_{t_{[k]},t_{[−d]}} h^l_{t_{[k]},t_{[−d]}})^{1/2}. Notice that h^G_t(ϕ) also coincides with the geometric mean of all the 2^d left and right joint fiducial densities.

3.3 Connections with reference priors

The construction by successive conditioning is the key point of the fiducial distribution h_t(ϕ) in (16), and it makes h_t(ϕ) depend on the order of the components of ϕ. This is why we assumed ϕ_1, …, ϕ_d to be ordered according to their inferential importance. Something similar happens in objective Bayesian analysis, where it is well known that the reference priors, and thus the reference posteriors, of a multidimensional parameter generally depend on the ordering; see Bernardo & Smith (1994). Indeed, the reference prior π^R(ϕ) for a parameter ϕ = (ϕ_1, …, ϕ_d) is generated by successive conditioning as

π^R(ϕ) = π^R(ϕ_d | ϕ_{[d−1]}) ⋯ π^R(ϕ_2 | ϕ_1) π^R(ϕ_1) = ∏_{k=1}^d π^R(ϕ_{d−k+1} | ϕ_{[d−k]}),

and our proposal (16) mimics this construction. We observe that in an objective Bayesian context this aspect is seen as a positive feature of the procedure: "the dependence of the reference prior on the quantity of interest has proved necessary to obtain posteriors with appropriate properties, in particular, to have good frequentist coverage properties (when attainable) and to avoid marginalization paradoxes and strong inconsistencies" (Berger, Bernardo & Sun, 2015).

Thus the reparameterization from θ to ϕ used in (15) has a double aim. On the one hand it allows the parameter of interest to appear explicitly, if it is not directly a component of θ; on the other hand it can be needed to achieve the structure of the conditional distributions in (15).

The fiducial distribution (16) is in general not invariant under reparameterizations. Note that this is true also for reference posteriors, unless the transformation from ϕ to λ = (λ_1, …, λ_d), say, maintains the same ordering of importance in the components of the two vectors and λ_k is a function of ϕ_1, …, ϕ_k for each k = 1, …, d, i.e. ϕ(λ) is a lower triangular transformation; see Yang (1995) and Datta & Ghosh (1996). This result holds also for fiducial distributions, as the following proposition shows.

Proposition 4 If ϕ = ϕ(λ) is a one-to-one lower triangular continuously differentiable function from Λ to Φ, then the fiducial distribution h^ϕ_t(ϕ), obtained applying (16) to the model p_ϕ(t), and the fiducial distribution h^λ_t(λ), obtained applying (16) to the model p_λ(t) = p_{ϕ(λ)}(t), are such that, for each measurable set A ⊆ Φ,

∫_A h^ϕ_t(ϕ) dϕ = ∫_{ϕ^{−1}(A)} h^λ_t(λ) dλ.  (18)

3.4 Examples

This section shows how the suggested procedure for constructing fiducial distributions can be fruitfully applied to several classical problems. For discrete models we always choose H^G_t as the fiducial distribution. We will see that, in many cases, the fiducial distribution and the reference posterior coincide; the latter is generated by a reference prior which can be seen as the product of the Jeffreys priors derived from the conditional distributions of the data, as seen in Proposition 1.

3.4.1 Location-scale parameter models

Consider first the case in which only one parameter, θ, is unknown. These models admit an ancillary statistic Z; in particular, we take Z_i = X_i − X_1 or Z_i = X_i/X_1, i = 2, …, n, according to whether θ is a location or a scale parameter. The following proposition characterizes the fiducial distribution for θ and establishes the equivalence with objective Bayesian inference.

Proposition 5 Let X = (X_1, …, X_n) be an i.i.d. sample of size n from a density p_θ, θ ∈ Θ ⊆ R. If θ is a location or a scale parameter, then the fiducial distribution coincides with the Bayesian posterior obtained with the Jeffreys prior π^J(θ) ∝ 1 or π^J(θ) ∝ 1/θ, respectively.
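The idea behind the location case can be sketched in one line for n = 1 (our illustration; the general proof, which conditions on the ancillary differences, is in Appendix A3): if F_θ(x) = F_0(x − θ), then

h_x(θ) = −∂F_0(x − θ)/∂θ = f_0(x − θ) ∝ p_θ(x),

so the fiducial density is proportional to the likelihood and hence coincides with the posterior under π^J(θ) ∝ 1. The scale case reduces to the location one by working with log X and log θ, which turns the flat prior on log θ into π^J(θ) ∝ 1/θ.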

Example 2. Let X be an i.i.d. sample from a uniform distribution on (0, θ), θ > 0, so that θ is a scale parameter. First notice that S = max(X_1, …, X_n) is a sufficient statistic for θ, and thus we can obtain the fiducial distribution directly:

h_s(θ) = ∂H_s(θ)/∂θ = −∂F_θ(s)/∂θ = −∂(s/θ)^n/∂θ = n s^n / θ^{n+1},  θ > s.  (19)

Alternatively, set w = max(z_2, …, z_n) and consider the distribution function of X_1 given the ancillary statistic Z = (X_2/X_1, …, X_n/X_1):

F_θ(x_1 | z) = (x_1/θ)^n for 0 < x_1 < θ, if 0 < w ≤ 1;  F_θ(x_1 | z) = (x_1 w/θ)^n for 0 < x_1 < θ/w, if w > 1.  (20)

Now, because w ≤ 1 means x_1 = max(x_1, …, x_n), while for w > 1 we have x_1 w = max(x_2, …, x_n), expression (20), as a function of θ, is equivalent to that in (19) and thus provides the same fiducial distribution, which trivially coincides with the Jeffreys posterior. Note that if a one-dimensional sufficient statistic does not exist, only the second route is available.

Example 3. Let X be an i.i.d. sample from a uniform distribution on (θ, θ+1), θ ∈ R, so that θ is a location parameter. There exists a sufficient statistic S = (S_1, S_2) = (min(X_1, …, X_n), max(X_1, …, X_n)), but it is not one-dimensional. Thus we compute the fiducial distribution starting from the distribution function of S_2 given the ancillary statistic Z = S_2 − S_1. Specifically, we have

h_s(θ) = −∂F_θ(s_2 | z)/∂θ = −∂[(s_2 − z − θ)/(1 − z)]/∂θ = 1/(1 − z),  s_2 − 1 < θ < s_2 − z = s_1,

which coincides with the Jeffreys posterior π^J(θ | x).

Example 4. Let X be an i.i.d. sample from a logistic distribution (or a Cauchy distribution) with known scale parameter σ. In this case a sufficient statistic does not exist, and the procedure for constructing the fiducial distribution cannot be simplified. Using Proposition 5, however, we can write it as a Bayesian posterior obtained from a constant prior. Thus for the logistic distribution we have

h_x(θ) ∝ ∏_{i=1}^n exp{−(x_i − θ)/σ} / [ σ (1 + exp{−(x_i − θ)/σ})² ],

whose normalizing constant can easily be computed numerically or via simulation. We can proceed similarly for the Cauchy distribution.
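A minimal numerical sketch (ours; the data are made up) of the computation in Example 4 for the logistic model:

```python
# Normalize the fiducial density of the logistic location parameter on a
# grid; h_x(theta) is proportional to the likelihood (constant prior).
import numpy as np
from scipy.integrate import simpson

x = np.array([-0.4, 1.1, 0.3, 2.0, 0.7])     # illustrative sample
sigma = 1.0
theta = np.linspace(-5.0, 7.0, 2001)

z = (x[:, None] - theta[None, :]) / sigma
lik = np.prod(np.exp(-z) / (sigma * (1 + np.exp(-z)) ** 2), axis=0)
h = lik / simpson(lik, x=theta)              # normalized fiducial density
print(simpson(h, x=theta))                    # ~1.0
```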

Consider now a model with a location parameter θ and a scale parameter σ, both unknown. Given an i.i.d. sample of size n, an ancillary statistic is, for example, Z = (Z_3, …, Z_n), with Z_j = (X_j − X_1)/Z_2, j = 3, …, n, where Z_2 = X_2 − X_1 is marginally ancillary for θ. Then the one-to-one transformation from X to (X_1, Z_2, Z) allows us to write the sampling distribution as p_{θ,σ}(x_1 | z_2, z) p_σ(z_2 | z) p(z), so that the fiducial distribution for (σ, θ) can be obtained using only the first two factors of the product. Note that in specific contexts other transformations can be more appropriate. For example, in a normal model it is more natural to use (X̄ = Σ_i X_i/n, S² = Σ_i (X_i − X̄)², Z), with Z_j = (X_j − X̄)/S, j = 3, …, n, so that the factorization becomes p_{θ,σ}(x̄ | s², z) p_σ(s² | z) p(z). Because (X̄, S²) is a complete sufficient statistic and X̄ and S² are independent, the fiducial distribution is in this case obtained starting from p_{θ,σ}(x̄) p_σ(s²), as suggested in Section 1.

Proposition 6 Let X = (X_1, …, X_n) be an i.i.d. sample from a density p_{θ,σ}, where θ and σ represent a location and a scale parameter, respectively. Then the fiducial distribution for (σ, θ) coincides with the Bayesian posterior obtained with the reference prior π^R(σ, θ) ∝ 1/σ.

Notice that π^R(σ, θ) ∝ 1/σ is different from the prior π^J(σ, θ) ∝ 1/σ² obtained by the Jeffreys rule. Furthermore, π^R does not depend on the order of θ and σ, while our procedure applies only to the ordering (σ, θ). However, our fiducial distribution corresponds to that derived by Fisher (1973, Sec. 6.8) for the normal model and to those obtained through other symmetric fiducial approaches, see Hannig (2009) and Fraser (1961). Thus in this model the inferential order of importance seems irrelevant.

3.4.2 Examples concerning normal models

i) Difference of means. Consider two independent normal i.i.d. samples, each of size n, with known common variance σ² and means µ_1 and µ_2, respectively. The sufficient statistics are the sample sums S_1 and S_2, with S_i ~ N(nµ_i, nσ²), i = 1, 2. To make inference on δ = µ_2 − µ_1, we reparameterize the joint density of (S_1, S_2) in terms of (δ = µ_2 − µ_1, λ = µ_1), obtaining

p_{δ,λ}(s_1, s_2) = (2πnσ²)^{−1} exp{ −(s_1² + s_2²)/(2nσ²) } exp{ δ s_2/σ² + λ(s_1 + s_2)/σ² − n(2λ² + δ² + 2λδ)/(2σ²) }.

It follows that the conditional distribution of S_2 given S_1 + S_2 is N((nδ + s_1 + s_2)/2, nσ²/2). From Table 1, the fiducial distribution of δ/2 + (s_1 + s_2)/(2n) is N(s_2/n, σ²/(2n)), so that δ is N(x̄_2 − x̄_1, 2σ²/n), where x̄_i = s_i/n, while the fiducial distribution of λ given δ, derived from the marginal distribution of S_2, is N(x̄_2 − δ, σ²/n). The joint fiducial distribution of δ and λ then follows.

The same fiducial distribution for δ = µ_2 − µ_1 is obtained starting from the two independent marginal fiducial distributions of µ_1 and µ_2. Indeed, from Table 1, the fiducial distribution of µ_i, obtained from S_i, is N(x̄_i, σ²/n), i = 1, 2, so that a direct transformation implies that δ is N(x̄_2 − x̄_1, 2σ²/n), as before. It is worth remarking that this phenomenon (the coincidence of the two fiducial distributions of the same parameter, obtained starting from different parameters of interest) is specific to this and a few other examples, but it does not hold in general, as shown for instance in Section 3.4.3.
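A Monte Carlo sketch of the agreement just described (ours; all numbers are made up): draws of δ = µ_2 − µ_1 under the two independent marginal fiducials N(x̄_i, σ²/n) reproduce N(x̄_2 − x̄_1, 2σ²/n).

```python
import numpy as np
rng = np.random.default_rng(4)

n, sigma, xbar1, xbar2 = 25, 1.5, 0.8, 1.7
mu1 = rng.normal(xbar1, sigma / np.sqrt(n), 100_000)
mu2 = rng.normal(xbar2, sigma / np.sqrt(n), 100_000)
delta = mu2 - mu1

print(delta.mean(), delta.std())             # ~ xbar2 - xbar1, ~ sigma*sqrt(2/n)
print(xbar2 - xbar1, sigma * np.sqrt(2 / n))
```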

ii) Many normal means (Neyman & Scott, 1948). Consider n samples of size two, (X_{i1}, X_{i2}), with each X_{ij} independently distributed according to a N(µ_i, σ²), i = 1, …, n. The aim is to make inference on the common variance σ², with nuisance parameters (µ_1, …, µ_n). Let X̄_i = (X_{i1} + X_{i2})/2 and W = Σ_{i=1}^n (X_{i1} − X_{i2})². This well-known example is used to show that the standard maximum likelihood estimator of σ², as well as the one based on the profile likelihood, equals σ̂² = W/(4n), which is inconsistent because W/(4n) → σ²/2 as n → ∞. To obtain the fiducial distribution of σ², first notice that the joint distribution of the sufficient statistic (X̄ = (X̄_1, …, X̄_n), W) can be factorized as

( ∏_{i=1}^n p_{µ_i,σ²}(x̄_i) ) p_{σ²}(w),  (21)

thanks to the independence of the X̄_i's and W. From the results in Table 1, it follows that the fiducial distribution of each µ_i given σ², derived from p_{µ_i,σ²}(x̄_i), is N(x̄_i, σ²/2), and that of σ², obtained from p_{σ²}(w), is In-Ga(n/2, w/4). As a consequence,

h_{x̄,w}(µ, σ²) = ( ∏_{i=1}^n h_{x̄_i}(µ_i | σ²) ) h_w(σ²).  (22)

It is interesting to observe that the fiducial distribution (22) is equal to the posterior obtained from the reference prior, independent of the ordering and grouping, π^R(σ², µ_1, …, µ_n) ∝ 1/σ². This distribution does not present the inconsistency of the likelihood estimator, which instead occurs using the Jeffreys prior π^J(σ², µ_1, …, µ_n) ∝ 1/σ^{n+2}.
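An illustrative simulation (ours) of the contrast between the inconsistent likelihood estimator and the fiducial distribution (22), whose mean (w/4)/(n/2 − 1) converges to σ²:

```python
import numpy as np
rng = np.random.default_rng(1)

sigma2, n = 2.0, 5000
x1 = rng.normal(0.0, np.sqrt(sigma2), n)     # taking mu_i = 0 for simplicity
x2 = rng.normal(0.0, np.sqrt(sigma2), n)
w = np.sum((x1 - x2) ** 2)

print(w / (4 * n))                # MLE: ~ sigma^2 / 2 = 1.0 (inconsistent)
print((w / 4) / (n / 2 - 1))      # mean of In-Ga(n/2, w/4): ~ sigma^2 = 2.0
```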

3.4.3 Comparison of two Poisson rates

The comparison of the rates µ_1 and µ_2 of two independent Poisson distributions is a classical problem arising in many contexts. For instance, a uniformly most powerful unbiased test for the ratio η = µ_2/µ_1 is discussed in Lehmann (2005). Given two i.i.d. samples of size n from two independent Poisson distributions, the sufficient statistics are the sample sums S_1 and S_2, with S_i ~ Po(nµ_i), i = 1, 2. Reparameterizing the joint density of (S_1, S_2) in terms of (η = µ_2/µ_1, λ = µ_1), we have

p_{η,λ}(s_1, s_2) = [ n^{s_1+s_2} / (s_1! s_2!) ] exp{ s_2 log η + (s_1 + s_2) log λ − nλ(1 + η) }.

The conditional distribution of S_2 given S_1 + S_2 is Bi(s_1 + s_2, η/(1+η)), and the marginal distribution of S_1 + S_2 is Po(nλ(1+η)). Using Table 1, the fiducial density h^G of η/(1+η), derived from the conditional distribution, is Be(s_2 + 1/2, s_1 + 1/2), which induces on η

h^G_{s_1,s_2}(η) = [ 1 / B(s_2 + 1/2, s_1 + 1/2) ] η^{s_2−1/2} (1 + η)^{−s_1−s_2−1},  η > 0,  (23)

where B(·,·) denotes the beta function. From the marginal distribution of S_1 + S_2, using again Table 1, it follows that h^G_{s_1,s_2}(λ | η) is Ga(s_1 + s_2 + 1/2, n(1+η)), and thus the joint fiducial distribution of η and λ is h^G_{s_1,s_2}(η, λ) = h^G_{s_1,s_2}(λ | η) h^G_{s_1,s_2}(η).

The fiducial distribution (23) of η coincides with both the reference and the Jeffreys posterior distributions. However, it is interesting, and easy to verify, that h^G_{s_1,s_2}(η, λ) is equal to the reference posterior on (η, λ), but is different from the Jeffreys posterior, which instead coincides with the fiducial distribution induced on (η, λ) by the two independent marginal fiducial densities of µ_1 and µ_2.
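A short check (ours; s_1 and s_2 are made up) that draws from (23) can be generated through the Be(s_2 + 1/2, s_1 + 1/2) representation of η/(1+η):

```python
import numpy as np
from scipy import stats
rng = np.random.default_rng(3)

s1, s2 = 14, 8
psi = rng.beta(s2 + 0.5, s1 + 0.5, 200_000)   # fiducial draws of eta/(1+eta)
eta = psi / (1 - psi)                          # draws from density (23)

m = 1.0                                        # compare CDFs at eta = m
print((eta <= m).mean())                                  # empirical
print(stats.beta.cdf(m / (1 + m), s2 + 0.5, s1 + 0.5))    # exact
```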

3.4.4 Bivariate binomial

A Bayesian analysis of the bivariate binomial model was discussed by Crowder & Sweeting (1989) in connection with a microbiological application. Consider m spores, each with probability p of germinating, and denote by R the random number of germinating spores, so that R is Bi(m, p). If q is the probability that one of the germinating spores bends in a particular direction and S is the random number of them, the distribution of S given R = r is Bi(r, q). The joint distribution of (R, S) is called bivariate binomial. Crowder and Sweeting observe that the Jeffreys prior π^J(p, q) ∝ p^{−1}(1−p)^{−1/2} q^{−1/2}(1−q)^{−1/2} is not satisfactory because of its asymmetry in p and 1−p, while Polson & Wasserman (1990) show that this drawback does not occur using the (order-invariant) reference prior π^R(p, q) ∝ p^{−1/2}(1−p)^{−1/2} q^{−1/2}(1−q)^{−1/2}, which is the product of the two independent Jeffreys priors for p and q. The symmetry condition is also satisfied by the (order-invariant) reference prior for the alternative parameterization (η = pq, λ = p(1−q)/(1−pq)), which is π^R(η, λ) ∝ η^{−1/2}(1−η)^{−1/2} λ^{−1/2}(1−λ)^{−1/2}.

The joint fiducial density h^G_{r,s}(q, p) can be obtained, as usual, as the product of h^G_{r,s}(q), derived from the conditional model Bi(r, q), and h^G_r(p | q), derived from the marginal model Bi(m, p) of R, which does not depend on q, so that p and q are independent under h^G_{r,s}. This fact makes the fiducial distribution order-invariant, as seen for the reference prior. Because for the binomial model there exists a fiducial prior Be(1/2, 1/2), equivalent to the Jeffreys prior (see Proposition 3), it follows immediately that the fiducial distribution of (p, q) coincides with the reference posterior. If the parameter of interest is η, or λ, we can reparameterize the model in terms of the distribution of R − S given S = s, which is Bi(m − s, λ), and that of S, which is Bi(m, η), and prove again that the joint fiducial distribution and the reference posterior are both order-invariant and coincide.

3.4.5 Ratio of parameters of a trinomial distribution

Bernardo & Ramon (1998) perform the Bayesian reference analysis of the ratio of two multinomial parameters, discussing in detail the case of the trinomial distribution and presenting some applications. Consider n observations, each belonging to one of three categories, and denote by X_i, i = 1, 2, 3, the number of occurrences in category i. Then the joint distribution of X_1 and X_2 is trinomial, with parameters p_1 and p_2 representing the probabilities of the first two categories. Bernardo & Ramon (1998, formulas (7) and (8)) show that, if the parameter of interest is η = p_1/p_2 and λ = p_2 is the nuisance parameter, then the (proper) reference prior for (η, λ) is

π^R(η, λ) ∝ [ η(1+η) λ (1 − λ(1+η)) ]^{−1/2},  0 < λ < (1+η)^{−1},  0 < η < ∞,  (24)

so that the marginal reference posterior of η is

π^R(η | x_1, x_2) ∝ η^{x_1−1/2} (1 + η)^{−x_1−x_2−1}.  (25)

To find the fiducial distribution of η, let us reparameterize the trinomial model in terms of (η, λ):

p_{η,λ}(x_1, x_2) = [ n! / (x_1! x_2! x_3!) ] exp{ x_1 log η + (x_1 + x_2) log(λ/(1 − ηλ − λ)) + n log(1 − ηλ − λ) }.

The conditional distribution of X_1 given T = X_1 + X_2 = t is Bi(t, η/(1+η)), while the distribution of T is Bi(n, λ(1+η)). Thus, Table 1 shows that the fiducial density h^G_{x_1,t} of η/(1+η) is Be(x_1 + 1/2, t − x_1 + 1/2), which induces a fiducial density on η equal to (25). Of course, the joint fiducial density of (η, λ) is given by h^G_{x_1,x_2}(η, λ) = h^G_{x_1,t}(η) h^G_t(λ | η), which coincides with the reference posterior obtained from (24).

4 Fiducial distributions for conditionally reducible NEFs

4.1 Properties and examples

The construction of the fiducial distribution proposed in Section 3.2 becomes much simpler when the model belongs to a cr-NEF. In this case there exists a sufficient statistic of the same dimension as the parameter, so that no ancillary statistic is needed, while the ϕ-parameterization, indexing each conditional distribution by a real parameter, implies the independence of the ϕ_k's under the fiducial distribution.
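As a concrete sketch of this independence (ours; trinomial case d = 2 with made-up counts, using the H^G entries of Table 1 for each conditional binomial):

```python
# Fiducial sampler for the phi-parameterization of a trinomial model:
# v_1 ~ Be(x_1 + 1/2, N - x_1 + 1/2) from X_1 ~ Bi(N, v_1), and independently
# v_2 ~ Be(x_2 + 1/2, N - x_1 - x_2 + 1/2) from X_2 | X_1 ~ Bi(N - x_1, v_2).
import numpy as np
rng = np.random.default_rng(2)

N, x1, x2 = 30, 12, 9
v1 = rng.beta(x1 + 0.5, N - x1 + 0.5, 10_000)
v2 = rng.beta(x2 + 0.5, N - x1 - x2 + 0.5, 10_000)
phi1, phi2 = np.log(v1 / (1 - v1)), np.log(v2 / (1 - v2))

p1, p2 = v1, (1 - v1) * v2                 # recover the cell probabilities
print(np.corrcoef(phi1, phi2)[0, 1])       # ~0: the phi_k are independent
```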


More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Default priors and model parametrization

Default priors and model parametrization 1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Uncertain Inference and Artificial Intelligence

Uncertain Inference and Artificial Intelligence March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Conjugate Predictive Distributions and Generalized Entropies

Conjugate Predictive Distributions and Generalized Entropies Conjugate Predictive Distributions and Generalized Entropies Eduardo Gutiérrez-Peña Department of Probability and Statistics IIMAS-UNAM, Mexico Padova, Italy. 21-23 March, 2013 Menu 1 Antipasto/Appetizer

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Monday, September 26, 2011 Lecture 10: Exponential families and Sufficient statistics Exponential Families Exponential families are important parametric families

More information

The binomial model. Assume a uniform prior distribution on p(θ). Write the pdf for this distribution.

The binomial model. Assume a uniform prior distribution on p(θ). Write the pdf for this distribution. The binomial model Example. After suspicious performance in the weekly soccer match, 37 mathematical sciences students, staff, and faculty were tested for the use of performance enhancing analytics. Let

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Monte Carlo conditioning on a sufficient statistic

Monte Carlo conditioning on a sufficient statistic Seminar, UC Davis, 24 April 2008 p. 1/22 Monte Carlo conditioning on a sufficient statistic Bo Henry Lindqvist Norwegian University of Science and Technology, Trondheim Joint work with Gunnar Taraldsen,

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13

The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13 The Jeffreys Prior Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) The Jeffreys Prior MATH 9810 1 / 13 Sir Harold Jeffreys English mathematician, statistician, geophysicist, and astronomer His

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Inferential models: A framework for prior-free posterior probabilistic inference

Inferential models: A framework for prior-free posterior probabilistic inference Inferential models: A framework for prior-free posterior probabilistic inference Ryan Martin Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago rgmartin@uic.edu

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various

More information

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 1.1 The Probability Model...1 1.2 Finite Discrete Models with Equally Likely Outcomes...5 1.2.1 Tree Diagrams...6 1.2.2 The Multiplication Principle...8

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information

Generalized Fiducial Inference

Generalized Fiducial Inference Generalized Fiducial Inference Parts of this short course are joint work with T. C.M Lee (UC Davis), H. Iyer (NIST) Randy Lai (U of Maine), J. Williams (UNC), Y. Cui (UNC), BFF 2018 Jan Hannig a University

More information

Multinomial Data. f(y θ) θ y i. where θ i is the probability that a given trial results in category i, i = 1,..., k. The parameter space is

Multinomial Data. f(y θ) θ y i. where θ i is the probability that a given trial results in category i, i = 1,..., k. The parameter space is Multinomial Data The multinomial distribution is a generalization of the binomial for the situation in which each trial results in one and only one of several categories, as opposed to just two, as in

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Confidence Distribution

Confidence Distribution Confidence Distribution Xie and Singh (2013): Confidence distribution, the frequentist distribution estimator of a parameter: A Review Céline Cunen, 15/09/2014 Outline of Article Introduction The concept

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Carl N. Morris. University of Texas

Carl N. Morris. University of Texas EMPIRICAL BAYES: A FREQUENCY-BAYES COMPROMISE Carl N. Morris University of Texas Empirical Bayes research has expanded significantly since the ground-breaking paper (1956) of Herbert Robbins, and its province

More information

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

arxiv: v1 [math.st] 7 Jan 2014

arxiv: v1 [math.st] 7 Jan 2014 Three Occurrences of the Hyperbolic-Secant Distribution Peng Ding Department of Statistics, Harvard University, One Oxford Street, Cambridge 02138 MA Email: pengding@fas.harvard.edu arxiv:1401.1267v1 [math.st]

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition Preface Preface to the First Edition xi xiii 1 Basic Probability Theory 1 1.1 Introduction 1 1.2 Sample Spaces and Events 3 1.3 The Axioms of Probability 7 1.4 Finite Sample Spaces and Combinatorics 15

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Objective Bayesian Hypothesis Testing

Objective Bayesian Hypothesis Testing Objective Bayesian Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Statistical Science and Philosophy of Science London School of Economics (UK), June 21st, 2010

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

The exponential family: Conjugate priors

The exponential family: Conjugate priors Chapter 9 The exponential family: Conjugate priors Within the Bayesian framework the parameter θ is treated as a random quantity. This requires us to specify a prior distribution p(θ), from which we can

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

1 Introduction. P (n = 1 red ball drawn) =

1 Introduction. P (n = 1 red ball drawn) = Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist

More information

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours MATH2750 This question paper consists of 8 printed pages, each of which is identified by the reference MATH275. All calculators must carry an approval sticker issued by the School of Mathematics. c UNIVERSITY

More information

THE MINIMAL BELIEF PRINCIPLE: A NEW METHOD FOR PARAMETRIC INFERENCE

THE MINIMAL BELIEF PRINCIPLE: A NEW METHOD FOR PARAMETRIC INFERENCE 1 THE MINIMAL BELIEF PRINCIPLE: A NEW METHOD FOR PARAMETRIC INFERENCE Chuanhai Liu and Jianchun Zhang Purdue University Abstract: Contemporary very-high-dimensional (VHD) statistical problems call attention

More information

Lecture 2. (See Exercise 7.22, 7.23, 7.24 in Casella & Berger)

Lecture 2. (See Exercise 7.22, 7.23, 7.24 in Casella & Berger) 8 HENRIK HULT Lecture 2 3. Some common distributions in classical and Bayesian statistics 3.1. Conjugate prior distributions. In the Bayesian setting it is important to compute posterior distributions.

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Australian & New Zealand Journal of Statistics

Australian & New Zealand Journal of Statistics Australian & New Zealand Journal of Statistics Aust.N.Z.J.Stat.51(2), 2009, 115 126 doi: 10.1111/j.1467-842X.2009.00548.x ROUTES TO HIGHER-ORDER ACCURACY IN PARAMETRIC INFERENCE G. ALASTAIR YOUNG 1 Imperial

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 Suggested Projects: www.cs.ubc.ca/~arnaud/projects.html First assignement on the web: capture/recapture.

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Confidence distributions in statistical inference

Confidence distributions in statistical inference Confidence distributions in statistical inference Sergei I. Bityukov Institute for High Energy Physics, Protvino, Russia Nikolai V. Krasnikov Institute for Nuclear Research RAS, Moscow, Russia Motivation

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Advanced topics from statistics

Advanced topics from statistics Advanced topics from statistics Anders Ringgaard Kristensen Advanced Herd Management Slide 1 Outline Covariance and correlation Random vectors and multivariate distributions The multinomial distribution

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Statistical Theory MT 2007 Problems 4: Solution sketches

Statistical Theory MT 2007 Problems 4: Solution sketches Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Principles of Statistics

Principles of Statistics Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 81 Paper 4, Section II 28K Let g : R R be an unknown function, twice continuously differentiable with g (x) M for

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Jonathan Marchini Department of Statistics University of Oxford MT 2013 Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 27 Course arrangements Lectures M.2

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Harrison B. Prosper. CMS Statistics Committee

Harrison B. Prosper. CMS Statistics Committee Harrison B. Prosper Florida State University CMS Statistics Committee 08-08-08 Bayesian Methods: Theory & Practice. Harrison B. Prosper 1 h Lecture 3 Applications h Hypothesis Testing Recap h A Single

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Improper mixtures and Bayes s theorem

Improper mixtures and Bayes s theorem and Bayes s theorem and Han Han Department of Statistics University of Chicago DASF-III conference Toronto, March 2010 Outline Bayes s theorem 1 Bayes s theorem 2 Bayes s theorem Non-Bayesian model: Domain

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information