ISyE8843A, Brani Vidakovic, Handout 6

1 Priors, Contd.


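1.1 Nuisance Parameters

Assume that the unknown parameter is $\theta = (\theta_1, \theta_2)$, but we are interested only in $\theta_1$. The parameter $\theta_2$ is called a nuisance parameter. How do we handle nuisance parameters? Suppose $\pi(\theta_1, \theta_2)$ is the joint prior for $(\theta_1, \theta_2)$. The posterior is
$$\pi(\theta_1, \theta_2 \mid x) \propto f(x \mid \theta_1, \theta_2)\, \pi(\theta_1, \theta_2).$$
The marginal posterior of interest is obtained by averaging over the nuisance parameter,
$$\pi(\theta_1 \mid x) = \int \pi(\theta_1, \theta_2 \mid x)\, d\theta_2, \quad \text{or} \quad \pi(\theta_1 \mid x) = \int \pi(\theta_1 \mid \theta_2, x)\, \pi(\theta_2 \mid x)\, d\theta_2.$$

Example: Let $X = (X_1, \dots, X_n)$ be a sample from the normal $N(\mu, \sigma^2)$ distribution. Assume $\pi(\mu, \sigma^2) = 1/\sigma^2$. The posterior distribution is (with slight abuse of notation)
$$\pi(\mu, \sigma^2 \mid X) \propto \frac{1}{\sigma^2} \cdot \frac{1}{\sigma^n} \exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2\Big\} = \sigma^{-n-2} \exp\Big\{-\frac{1}{2\sigma^2}\big[(n-1)s^2 + n(\bar X - \mu)^2\big]\Big\},$$
where $s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$.

Show that $\pi(\sigma^2 \mid X)$ is $\text{IGamma}\big(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\big)$. This distribution is sometimes referred to as the scaled inverse $\chi^2$, $\text{Inv-}\chi^2(n-1, s^2)$; see Handout 0.

The joint posterior can be represented as $\pi(\mu, \sigma^2 \mid X) = \pi(\mu \mid \sigma^2, X)\, \pi(\sigma^2 \mid X)$:
$$\pi(\mu, \sigma^2 \mid X) \propto \underbrace{(\sigma^2)^{-\frac{n+1}{2}} \exp\Big\{-\frac{(n-1)s^2}{2\sigma^2}\Big\}}_{\text{IGamma}\left(\frac{n-1}{2},\, \frac{(n-1)s^2}{2}\right)} \times \underbrace{(\sigma^2/n)^{-1/2} \exp\Big\{-\frac{n(\mu - \bar X)^2}{2\sigma^2}\Big\}}_{N(\bar X,\, \sigma^2/n)},$$
that is, $\sigma^2 \mid X \sim \text{IGamma}\big(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\big)$ and $\mu \mid \sigma^2, X \sim N(\bar X, \sigma^2/n)$.

Exercise: Derive the prior and posterior predictive distributions for the above model. See page 76 of the text.

Exercise: Consider the Normal-Inverse-Gamma prior. This prior is conjugate. Find the parameters of the posterior. See Section 3.3 of the text, page 78.

Example: What if the prior is
$$\pi(\theta, \sigma) = \frac{1}{\sigma}\,? \qquad (1)$$
This prior is a noninformative independence prior, since it is the product of the noninformative priors for the location $\theta$ (flat, translation invariant) and for the scale $\sigma$ ($1/\sigma$, scale invariant). This prior, although not the Jeffreys prior for the problem, was ultimately recommended by Jeffreys in his book (1961). For the mathematically thirsty, the prior in (1) is the right invariant Haar density for the problem; see Berger (1985), Chapter 6.

Recall that the posterior for the parameter of interest was obtained by averaging over the nuisance parameter,
$$\pi(\theta_1 \mid x) = \int \pi(\theta_1 \mid \theta_2, x)\, \pi(\theta_2 \mid x)\, d\theta_2.$$
Alternatively, one can first find the marginal likelihood, if this is convenient, and then use the prior on the parameter of interest to arrive at the same marginal posterior. Suppose $\pi(\theta_1, \theta_2) = \pi(\theta_1)\, \pi(\theta_2 \mid \theta_1)$. The marginal likelihood is
$$f(x \mid \theta_1) = \int_{\Theta_2} f(x \mid \theta_1, \theta_2)\, \pi(\theta_2 \mid \theta_1)\, d\theta_2.$$
Now, $\pi(\theta_1 \mid x) \propto f(x \mid \theta_1)\, \pi(\theta_1)$.

Example: The following example is a simplification of a model from Vidakovic and Ruggeri (2001). The observations, denoted by $d$, are observed wavelet coefficients. The model for $d$ is normal, with the mean $\theta$ being the parameter of interest and the variance $\sigma^2$ being a nuisance parameter. If $[d \mid \theta, \sigma^2] \sim N(\theta, \sigma^2)$ and the prior on $\sigma^2$ is independent of $\theta$, $[\sigma^2] \sim \mathcal{E}(\mu)$, with $\mu$ positive and known and with density $f(\sigma^2 \mid \mu) = \mu e^{-\mu\sigma^2}$, the resulting marginal likelihood is
$$[d \mid \theta] \sim \mathcal{DE}\Big(\theta, \frac{1}{\sqrt{2\mu}}\Big), \quad \text{with density} \quad f(d \mid \theta) = \frac{\sqrt{2\mu}}{2}\, e^{-\sqrt{2\mu}\,|d - \theta|}.$$
Now, if the prior on $\theta$ is $[\theta] \sim \mathcal{DE}(0, \tau)$, with density $\frac{1}{2\tau} e^{-|\theta|/\tau}$ and $\tau$ known, then the (prior) predictive distribution of $d$ is
$$[d] \sim m(d) = \frac{\tau e^{-|d|/\tau} - \frac{1}{\sqrt{2\mu}}\, e^{-\sqrt{2\mu}\,|d|}}{2\tau^2 - 1/\mu}, \qquad 2\tau^2 \neq 1/\mu.$$
Find the posterior distribution for $[\theta \mid d]$.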
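The two integrals over nuisance parameters in Section 1.1 can be checked numerically. The sketch below is a minimal illustration using only the Python standard library and a plain composite Simpson rule; the data vector and the values of $\mu$, $\tau$, $d$ and $\theta$ are hypothetical choices, not taken from the handout. Part (i) integrates the joint posterior kernel of the normal example over $\mu$ (written with the raw sum $\sum_i (X_i - \mu)^2$, so the decomposition into $(n-1)s^2 + n(\bar X - \mu)^2$ is checked too) and divides by the IGamma kernel; the ratio should be the constant $\sqrt{2\pi/n}$ for every $\sigma^2$. Part (ii) integrates the normal likelihood against the exponential prior on $\sigma^2$ to recover the $\mathcal{DE}(\theta, 1/\sqrt{2\mu})$ density, and convolves the two double-exponential densities to recover $m(d)$.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule for the integral of f over [a, b] (n is rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def de_pdf(x, loc, scale):
    """Double-exponential (Laplace) density DE(loc, scale)."""
    return math.exp(-abs(x - loc) / scale) / (2 * scale)


# (i) Normal model with prior pi(mu, sigma^2) = 1/sigma^2; the data are hypothetical.
X = [4.1, 5.3, 3.8, 6.0, 5.1]
n = len(X)
xbar = sum(X) / n
s2 = sum((x - xbar) ** 2 for x in X) / (n - 1)


def joint_kernel(mu, sigma2):
    # sigma^(-n-2) * exp{-sum_i (X_i - mu)^2 / (2 sigma^2)}: unnormalized joint posterior
    return sigma2 ** (-(n + 2) / 2) * math.exp(-sum((x - mu) ** 2 for x in X) / (2 * sigma2))


def igamma_kernel(sigma2):
    # kernel of IGamma((n-1)/2, (n-1) s^2 / 2) evaluated at sigma^2
    return sigma2 ** (-(n + 1) / 2) * math.exp(-(n - 1) * s2 / (2 * sigma2))


def marginal_ratio(sigma2):
    """Integrate the joint kernel over mu, then divide by the IGamma kernel."""
    sd = math.sqrt(sigma2 / n)  # sd of mu given sigma^2 and X
    integral = simpson(lambda m: joint_kernel(m, sigma2), xbar - 12 * sd, xbar + 12 * sd)
    return integral / igamma_kernel(sigma2)


ratios = [marginal_ratio(v) for v in (0.5, 1.0, 2.0, 4.0)]

# (ii) DE example with hypothetical mu = 1 (rate of the exponential prior) and tau = 2.
mu_rate, tau = 1.0, 2.0
c = math.sqrt(2 * mu_rate)  # [d | theta] ~ DE(theta, 1/c)


def marginal_lik_numeric(d, theta):
    """Integral over v > 0 of N(d; theta, v) * mu exp(-mu v), truncated at v = 60."""
    def integrand(v):
        return normal_pdf(d, theta, v) * mu_rate * math.exp(-mu_rate * v)
    return simpson(integrand, 1e-12, 60.0, n=40000)


def predictive_closed(d):
    """m(d) from the text; requires 2 tau^2 != 1/mu."""
    return (tau * math.exp(-abs(d) / tau) - math.exp(-c * abs(d)) / c) / (2 * tau ** 2 - 1 / mu_rate)


def predictive_numeric(d, L=60.0):
    """Integral of f(d | theta) pi(theta) over theta, split at the kinks theta = 0 and theta = d."""
    def integrand(t):
        return de_pdf(d, t, 1 / c) * de_pdf(t, 0.0, tau)
    lo, hi = min(0.0, d), max(0.0, d)
    return simpson(integrand, -L, lo) + simpson(integrand, lo, hi) + simpson(integrand, hi, L)


print(ratios, math.sqrt(2 * math.pi / n))
print(marginal_lik_numeric(1.0, 0.3), de_pdf(1.0, 0.3, 1 / c))
print(predictive_numeric(1.3), predictive_closed(1.3))
```

The integration limits (12 posterior standard deviations for $\mu$, $v \le 60$ and $|\theta| \le 60$) are truncation choices for this sketch; splitting the $\theta$ integral at the kinks of the double-exponential densities keeps Simpson's rule accurate there.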

1.2 MaxEnt Priors

For a discrete probability distribution $p_1, \dots, p_n$, $\sum_i p_i = 1$, the entropy is defined as
$$\mathcal{E}(p) = -\sum_i p_i \log p_i.$$
Assume that the following restrictions (better said, information) on the prior $\pi$ are available:
$$E^\pi[g_k(\theta)] = \sum_i g_k(\theta_i)\, \pi(\theta_i) = \mu_k, \quad k = 1, 2, \dots, m.$$
The maxent prior is given by
$$\pi^*(\theta_i) = \frac{\exp\{\sum_k \lambda_k g_k(\theta_i)\}}{\sum_i \exp\{\sum_k \lambda_k g_k(\theta_i)\}}.$$
The multipliers $\lambda_k$ are the Lagrange multipliers of the constrained optimization problem; they are chosen so that $\pi^*$ satisfies the $m$ constraints.

Example: Assume $\Theta = \{0, 1, 2, \dots\}$. Suppose that $E^\pi \theta = 5$. Here $g_1(\theta) = \theta$ and $\mu_1 = 5$. Thus (with $\lambda_1 < 0$ so that the series converges),
$$\pi^*(\theta) = \frac{e^{\lambda_1\theta}}{\sum_{\theta=0}^\infty e^{\lambda_1\theta}} = (1 - e^{\lambda_1})\,(e^{\lambda_1})^\theta.$$
This is the $\text{Geom}(1 - e^{\lambda_1})$ density, and solving $\frac{e^{\lambda_1}}{1 - e^{\lambda_1}} = 5$ gives $e^{\lambda_1} = 5/6$. Hence, the maxent prior is Geom(1/6).

If the problem requires a continuous prior, the maxent approach becomes complicated. First, what should the definition of entropy be for a continuous distribution $\pi$? Jaynes (1968) argues that it should be defined via the Kullback-Leibler divergence between $\pi$ and some invariant noninformative prior $\pi_0$ for the problem,
$$\mathcal{E}(\pi) = -E^{\pi}\Big[\log \frac{\pi(\theta)}{\pi_0(\theta)}\Big] = -\int \log\frac{\pi(\theta)}{\pi_0(\theta)}\; \pi(\theta)\, d\theta.$$
The maxent prior, under constraints as in the discrete case, is given by
$$\pi^*(\theta) = \frac{\exp\{\sum_k \lambda_k g_k(\theta)\}\, \pi_0(\theta)}{\int \exp\{\sum_k \lambda_k g_k(\theta)\}\, \pi_0(\theta)\, d\theta}.$$
The reference measure $\pi_0$ is thus instrumental in defining the maxent prior.

Exercise: If $E^\pi \theta = \mu$ and $\pi_0$ is the flat prior (Lebesgue measure), show that the maxent solution is $\pi^*(\theta) \propto e^{\lambda\theta}$, which cannot be normalized to a proper density on the real line. If, in addition, $\text{Var}(\theta) = \sigma^2$, then the maxent solution is the normal $N(\mu, \sigma^2)$ distribution.

1.3 Multivariate Priors

As a fundamental multivariate model, we consider the multivariate normal (MVN) likelihood and overview the MVN/MVN Bayes model. This model is important not only as an educational example that mimics the normal/normal case, but also as a useful modeling tool.
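In the geometric example of Section 1.2 the multiplier has a closed form, but in general $\lambda_k$ must be found numerically from the moment constraints. The sketch below handles a single constraint: it assumes a finite support (the infinite support $\{0, 1, 2, \dots\}$ is truncated at a hypothetical cutoff $N = 2000$, which is not part of the handout) and solves $E_\lambda[g(\theta)] = \mu_1$ by bisection. Bisection works because $E_\lambda[g]$ is increasing in $\lambda$: its derivative is $\text{Var}_\lambda[g] \ge 0$. The recovered $e^{\lambda_1}$ should match $5/6$, i.e., the Geom(1/6) prior.

```python
import math


def maxent_discrete(support, g, target, lo=-10.0, hi=-1e-9, tol=1e-12):
    """Maxent pmf on a finite support under one constraint E[g(theta)] = target.

    The solution has the form pi*(theta) proportional to exp{lam * g(theta)}.
    E_lam[g] is increasing in lam (its derivative is Var_lam[g]), so lam is
    found by bisection; [lo, hi] must bracket the root.
    """
    gs = [g(t) for t in support]

    def weights(lam):
        top = max(lam * x for x in gs)  # shift exponents to avoid overflow
        return [math.exp(lam * x - top) for x in gs]

    def mean_g(lam):
        w = weights(lam)
        return sum(wi * x for wi, x in zip(w, gs)) / sum(w)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_g(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = weights(lam)
    total = sum(w)
    return lam, [wi / total for wi in w]


# Example of Section 1.2: Theta = {0, 1, 2, ...}, truncated at a hypothetical N = 2000,
# with the single constraint E theta = 5 (g_1(theta) = theta, mu_1 = 5).
support = range(2001)
lam, pmf = maxent_discrete(support, lambda t: t, 5.0)
print(math.exp(lam))  # approximately 5/6
print(pmf[0])         # approximately 1/6, the Geom(1/6) success probability
```

With several constraints the same idea applies, but the scalar bisection would have to be replaced by a multivariate root finder or by direct maximization of the concave dual function.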

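Assume that $X_1, \dots, X_n$ and their location $\theta$ are $p$-dimensional, and that the $X_i$ are distributed as $\text{MVN}_p(\theta, \Sigma)$. The covariance matrix $\Sigma$ is assumed known. The likelihood of $\theta$ based on $X_1, \dots, X_n$ is
$$f(x \mid \theta) \propto |\Sigma|^{-n/2} \exp\Big\{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)' \Sigma^{-1} (x_i - \theta)\Big\},$$
which, after some matrix algebra, becomes
$$f(x \mid \theta) \propto |\Sigma|^{-n/2} \exp\Big\{-\frac{1}{2}\, \text{tr}(\Sigma^{-1} S)\Big\}, \qquad (2)$$
where $S = \sum_{i=1}^n (x_i - \theta)(x_i - \theta)'$ is the sum-of-squares matrix. If the prior on $\theta$ is also multivariate normal $\text{MVN}_p(\mu, \Pi)$, with mean vector $\mu$ and covariance matrix $\Pi$ known, the posterior is
$$\pi(\theta \mid x) \propto \exp\Big(-\frac{1}{2}\Big[\sum_{i=1}^n (x_i - \theta)' \Sigma^{-1} (x_i - \theta) + (\theta - \mu)' \Pi^{-1} (\theta - \mu)\Big]\Big),$$
which, after standard matrix algebra, becomes
$$\pi(\theta \mid x) \propto \exp\Big(-\frac{1}{2}(\theta - \mu^*)' (\Pi^*)^{-1} (\theta - \mu^*)\Big),$$
i.e., $\text{MVN}_p(\mu^*, \Pi^*)$. The posterior parameters are matrix reformulations of the univariate case,
$$\mu^* = (\Pi^{-1} + n\Sigma^{-1})^{-1}(\Pi^{-1}\mu + n\Sigma^{-1}\bar x), \qquad \Pi^* = (\Pi^{-1} + n\Sigma^{-1})^{-1}.$$

As an illustration, we discuss the estimation of the multivariate mean of student test scores. The subjects are mechanics, vectors, algebra, analysis, and statistics, and $n = 88$ records are available (data set and description in Mardia, Kent and Bibby, 1979). The likelihood is $\text{MVN}_p(\theta, \Sigma)$ with $p = 5$ and $\Sigma = 56 I + 56 J$, where $I$ is the identity matrix and $J$ is the matrix of ones. Part of the MATLAB code that calculates the Bayes estimator of $\theta$ is

    p = 5; n = 88;                      % number of subjects, number of records
    % scores: n x p matrix of test scores (defined in bayes6.m)
    Sigma = 56*eye(p) + 56*ones(p,p);   % known covariance of the likelihood
    mu = 50*ones(p,1);                  % prior mean
    Pi = 4*eye(p) + 4*ones(p,p);        % prior covariance
    mu_ = inv(inv(Pi) + n*inv(Sigma)) * ...
          (inv(Pi)*mu + n*inv(Sigma)*mean(scores)');   % posterior mean (Bayes estimator)
    Pi_ = inv(inv(Pi) + n*inv(Sigma));                  % posterior covariance

and the complete m-file producing the figure and containing the data set is bayes6.m.

References

[1] Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis, Second Edition. Springer-Verlag.
[2] Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis. London: Academic Press.
[3] Robert, C. (2001). The Bayesian Choice, Second Edition. Springer-Verlag.
[4] Vidakovic, B. and Ruggeri, F. (2001). BAMS Method: Theory and Simulations. Sankhyā, Series B, 63 (Special Issue on Wavelets).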

[Figure 1: Multivariate Bayes estimator for the student scores from Mardia, Kent and Bibby (1979). The subjects are mechanics, vectors, algebra, analysis, and statistics.]
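The MVN/MVN posterior update of Section 1.3 can also be sketched outside MATLAB. The code below is a minimal pure-Python illustration of $\Pi^* = (\Pi^{-1} + n\Sigma^{-1})^{-1}$ and $\mu^* = \Pi^*(\Pi^{-1}\mu + n\Sigma^{-1}\bar x)$; the Gauss-Jordan inverse, the helper names and all numbers in the usage lines are hypothetical choices for illustration (they are not the student-scores data, which live in bayes6.m). The helper `compound_symmetry` builds matrices of the form $aI + bJ$, the structure used for $\Sigma$ and $\Pi$ in the illustration.

```python
def mat_inv(A):
    """Inverse of a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(A)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[p:] for row in M]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(c, A):
    return [[c * a for a in row] for row in A]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def compound_symmetry(p, a, b):
    """a*I + b*J (J = matrix of ones), the form of Sigma and Pi in the student-scores illustration."""
    return [[(a if i == j else 0.0) + b for j in range(p)] for i in range(p)]


def mvn_posterior(xbar, n, Sigma, mu, Pi):
    """Posterior MVN(mu_star, Pi_star) for the MVN/MVN model with Sigma, mu, Pi known.

    Pi_star = (Pi^-1 + n Sigma^-1)^-1,  mu_star = Pi_star (Pi^-1 mu + n Sigma^-1 xbar).
    """
    Pi_inv, Sigma_inv = mat_inv(Pi), mat_inv(Sigma)
    Pi_star = mat_inv(mat_add(Pi_inv, mat_scale(n, Sigma_inv)))
    rhs = [a + n * b for a, b in zip(mat_vec(Pi_inv, mu), mat_vec(Sigma_inv, xbar))]
    return mat_vec(Pi_star, rhs), Pi_star


# Hypothetical numbers (not the Mardia, Kent and Bibby data): p = 3, n = 20.
p, n = 3, 20
mu_star, Pi_star = mvn_posterior([55.0, 48.0, 60.0], n, compound_symmetry(p, 10.0, 5.0),
                                 [50.0] * p, compound_symmetry(p, 2.0, 1.0))
print(mu_star)  # posterior mean, the Bayes estimator under squared error loss
```

For $p = 1$ the function reduces to the familiar normal/normal update, which is a quick sanity check. Forming explicit inverses mirrors the formulas and the MATLAB fragment; in production code, solving the linear systems directly is numerically preferable.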
