Measuring nuisance parameter effects in Bayesian inference
Alastair Young, Imperial College London. WHOA-PSI.
Acknowledgements: Tom DiCiccio, Cornell University; Daniel Garcia Rasines, Imperial College London; Todd Kuffner, WUSTL.
Background. Inferences drawn from a Bayesian data analysis depend on the assumed prior distribution. There is a substantial Bayesian robustness literature quantifying the degree to which posterior quantities depend on the prior. The issue is especially important with hierarchical models, where interpretability deeper in the hierarchy becomes challenging.
Approaches. Global: all possible inferences arising from a class of priors are considered, in the hope that the resulting class of inferences is small, so that the inference is robust to the prior; this is limited by the difficulty of computing the range of a posterior quantity as the prior varies. Local: based on infinitesimal perturbations to the prior, that is, differentiation of posterior quantities with respect to the prior.
Typical inferential problem. Let $Y = \{Y_1, \ldots, Y_n\}$ be a random sample from an underlying distribution $F_\theta$, indexed by a $(d+1)$-dimensional parameter $\theta = (\theta_1, \ldots, \theta_{d+1}) = (\psi, \chi)$. Here $\psi = \theta_1$ is a scalar parameter of interest and $\chi$ is a $d$-dimensional nuisance parameter.
Key question. How sensitive is the (marginal) posterior inference on $\psi$ to the presence in the model of the nuisance parameter $\chi$ and to the assumed form of the prior distribution?
Purpose. Motivated by corresponding frequentist ideas, we provide machinery, based on Bayesian asymptotics, to measure the extent of the influence that nuisance parameters have on the inference, and offer a direct, practical interpretation of the measures of influence in terms of posterior probabilities (in the full and reduced models).
Key (frequentist) statistic. For testing $H_0: \psi = \psi_0$ against a one-sided alternative, the signed root likelihood ratio statistic is
$$R \equiv R(\psi_0) = \mathrm{sgn}(\hat\psi - \psi_0)\,\big[\,2\,\{\,l(\hat\theta) - l(\psi_0, \hat\chi_0)\,\}\,\big]^{1/2},$$
where $l(\theta) = l(\psi, \chi)$ is the log-likelihood function, $\hat\theta = (\hat\psi, \hat\chi)$ is the MLE of $\theta$, and $\hat\chi_0$ is the constrained MLE of $\chi$ given the value $\psi = \psi_0$ of the interest parameter $\psi$.
Bayesian posterior approximation: full model. For a continuous prior density $\pi(\theta) = \pi(\psi, \chi)$, the posterior tail probability given the data $Y = (Y_1, \ldots, Y_n)$ is
$$P(\psi \ge \psi_0 \mid Y) = \Phi(R^*) + O(n^{-3/2}),$$
where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution,
$$R^* \equiv R^*(\psi_0) = R + R^{-1}\log(T/R), \qquad T \equiv T(\psi_0) = l_\psi(\psi_0, \hat\chi_0)\,\frac{|l_{\chi\chi}(\psi_0, \hat\chi_0)|^{1/2}}{|l_{\theta\theta}(\hat\theta)|^{1/2}}\,\frac{\pi(\hat\theta)}{\pi(\psi_0, \hat\chi_0)},$$
and $\hat\psi - \psi_0$ is of order $O(n^{-1/2})$.
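As a concrete illustration (not from the talk), the following sketch evaluates $R$, $T$ and the tail-probability approximation $\Phi(R^*)$ for a normal model with interest parameter $\psi$ (the mean) and nuisance parameter $\chi$ (the variance); the model, prior and all numerical choices are assumptions made purely for the sketch.

```python
# Minimal numerical sketch (illustrative assumptions, not from the talk):
# the tail-area approximation P(psi >= psi0 | Y) ~= Phi(R*) for a normal
# model with interest parameter psi = mean and nuisance chi = variance.
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=30)   # simulated placeholder data
n, ybar = y.size, y.mean()

def log_prior(psi, chi):
    # Assumed prior: psi ~ N(0, 10^2) independent of chi ~ Inv-Gamma(2, scale=2)
    return norm.logpdf(psi, 0.0, 10.0) + invgamma.logpdf(chi, 2.0, scale=2.0)

def loglik(psi, chi):
    # Log-likelihood up to an additive constant
    return -0.5 * n * np.log(chi) - 0.5 * np.sum((y - psi) ** 2) / chi

psi_hat, chi_hat = ybar, np.mean((y - ybar) ** 2)   # full MLE (closed form)

def chi_hat0(psi0):
    # Constrained MLE of chi for fixed psi = psi0
    return np.mean((y - psi0) ** 2)

def signed_root(psi0):
    # R(psi0) = sgn(psi_hat - psi0) [2{l(theta_hat) - l(psi0, chi_hat0)}]^(1/2)
    c0 = chi_hat0(psi0)
    return np.sign(psi_hat - psi0) * np.sqrt(
        2.0 * (loglik(psi_hat, chi_hat) - loglik(psi0, c0)))

def T_stat(psi0):
    # T(psi0) = l_psi |l_chichi(psi0, chi_hat0)|^(1/2) |l_thetatheta(theta_hat)|^(-1/2)
    #           * pi(theta_hat) / pi(psi0, chi_hat0)
    c0 = chi_hat0(psi0)
    l_psi = np.sum(y - psi0) / c0
    l_chichi = n / (2.0 * c0 ** 2) - np.sum((y - psi0) ** 2) / c0 ** 3
    det_full = (n / chi_hat) * (n / (2.0 * chi_hat ** 2))   # cross term vanishes at theta_hat
    prior_ratio = np.exp(log_prior(psi_hat, chi_hat) - log_prior(psi0, c0))
    return l_psi * np.sqrt(abs(l_chichi)) / np.sqrt(det_full) * prior_ratio

def tail_prob(psi0):
    # Phi(R*) with R* = R + log(T/R)/R
    R, T = signed_root(psi0), T_stat(psi0)
    return norm.cdf(R + np.log(T / R) / R)

print("P(psi >= ybar - 0.3 | Y) ~=", tail_prob(ybar - 0.3))
print("P(psi >= ybar + 0.3 | Y) ~=", tail_prob(ybar + 0.3))
```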
A factorization. We may factorize $T = T_{\mathrm{INF}}\, T_{\mathrm{NP}}$, with
$$T_{\mathrm{INF}} \equiv T_{\mathrm{INF}}(\psi_0) = \frac{\pi_\psi(\hat\psi)}{\pi_\psi(\psi_0)}\, l_\psi(\psi_0, \hat\chi_0)\, \{-l_{\psi\psi}(\hat\theta)\}^{-1/2},$$
$$T_{\mathrm{NP}} \equiv T_{\mathrm{NP}}(\psi_0) = \frac{|l_{\chi\chi}(\psi_0, \hat\chi_0)|^{1/2}}{|l_{\chi\chi}(\hat\theta)|^{1/2}}\, \frac{\pi_{\chi\mid\psi}(\hat\theta)}{\pi_{\chi\mid\psi}(\psi_0, \hat\chi_0)},$$
with $\pi_\psi(\psi)$ the marginal prior density for $\psi$ and $\pi_{\chi\mid\psi}(\theta)$ the conditional prior density for $\chi$ given $\psi$.
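Continuing the illustrative normal-model sketch above (same assumed model and prior; again not from the talk), the factorization can be checked numerically. In this example $l_{\psi\chi}(\hat\theta) = 0$, so the $\{-l_{\psi\psi}(\hat\theta)\}^{-1/2}$ term coincides with the profile curvature and $T_{\mathrm{INF}} T_{\mathrm{NP}}$ reproduces $T$ exactly.

```python
# Numerical check (illustrative assumptions as in the previous sketch) that
# T = T_INF * T_NP for the normal model with psi = mean, chi = variance,
# psi ~ N(0, 10^2) and chi ~ Inv-Gamma(2, scale=2) independently a priori.
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=30)
n, ybar = y.size, y.mean()
psi_hat, chi_hat = ybar, np.mean((y - ybar) ** 2)

def pieces(psi0):
    c0 = np.mean((y - psi0) ** 2)                     # constrained MLE of chi
    l_psi = np.sum(y - psi0) / c0                     # d l / d psi at (psi0, c0)
    l_cc0 = n / (2 * c0 ** 2) - np.sum((y - psi0) ** 2) / c0 ** 3   # l_chichi at (psi0, c0)
    l_cc_hat = -n / (2 * chi_hat ** 2)                # l_chichi at theta_hat
    l_pp_hat = -n / chi_hat                           # l_psipsi at theta_hat
    # Marginal prior for psi and conditional prior for chi given psi
    # (independent here, so the conditional equals the marginal of chi).
    T_inf = (norm.pdf(psi_hat, 0, 10) / norm.pdf(psi0, 0, 10)
             * l_psi * (-l_pp_hat) ** -0.5)
    T_np = (np.sqrt(abs(l_cc0) / abs(l_cc_hat))
            * invgamma.pdf(chi_hat, 2, scale=2) / invgamma.pdf(c0, 2, scale=2))
    # T computed directly from the full-model formula
    det_full = abs(l_pp_hat * l_cc_hat)               # cross term is zero at theta_hat
    prior_hat = norm.pdf(psi_hat, 0, 10) * invgamma.pdf(chi_hat, 2, scale=2)
    prior_0 = norm.pdf(psi0, 0, 10) * invgamma.pdf(c0, 2, scale=2)
    T = l_psi * np.sqrt(abs(l_cc0) / det_full) * prior_hat / prior_0
    return T, T_inf, T_np

T, T_inf, T_np = pieces(ybar - 0.3)
print(T, T_inf * T_np)     # the two values agree in this example
```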
Properties. For Bayesian inference, $R^* = R + Z_{\mathrm{NP}} + Z_{\mathrm{INF}}$, where $Z_{\mathrm{NP}} = R^{-1}\log T_{\mathrm{NP}}$ and $Z_{\mathrm{INF}} = R^{-1}\log(T_{\mathrm{INF}}/R)$. This decomposition has two important properties. First, both $T_{\mathrm{INF}}$ and $T_{\mathrm{NP}}$ are invariant under interest-respecting transformations, hence $Z_{\mathrm{NP}}$ and $Z_{\mathrm{INF}}$ also have this property. Second, $T_{\mathrm{NP}}$ does not appear in the tail probability formula when nuisance parameters are absent, so $Z_{\mathrm{NP}}$ also vanishes in this case.
Relationship with posterior mean. The posterior mean of $R$ satisfies
$$E\{R(\psi) \mid Y\} = -\{g_{\mathrm{NP}} + g_{\mathrm{INF}}\} + O(n^{-1}),$$
where
$$g_{\mathrm{NP}} = \lim_{\psi_0 \to \hat\psi} Z_{\mathrm{NP}}(\psi_0), \qquad g_{\mathrm{INF}} = \lim_{\psi_0 \to \hat\psi} Z_{\mathrm{INF}}(\psi_0).$$
Standard calculations show that $Z_{\mathrm{NP}}(\psi_0) = g_{\mathrm{NP}} + O(n^{-1})$ and $Z_{\mathrm{INF}}(\psi_0) = g_{\mathrm{INF}} + O(n^{-1})$, for $\psi_0$ in an $O(n^{-1/2})$ neighborhood of $\hat\psi$.
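A numerical sketch (illustrative assumptions, not from the talk): the limits defining $g_{\mathrm{NP}}$ and $g_{\mathrm{INF}}$ can be approximated by evaluating $Z_{\mathrm{NP}}$ and $Z_{\mathrm{INF}}$ just off $\hat\psi$. The normal likelihood is as in the earlier sketches, but the assumed prior now makes $\chi$ depend on $\psi$ so that the nuisance-parameter contribution is non-trivial.

```python
# Illustrative sketch (not from the talk): g_NP and g_INF as the limits of
# Z_NP(psi0) and Z_INF(psi0) as psi0 -> psi_hat, approximated numerically
# by evaluating at psi0 = psi_hat + eps for small eps.  The prior below is
# an assumption chosen so that chi depends on psi.
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=30)
n, ybar = y.size, y.mean()
psi_hat, chi_hat = ybar, np.mean((y - ybar) ** 2)

def loglik(psi, chi):
    return -0.5 * n * np.log(chi) - 0.5 * np.sum((y - psi) ** 2) / chi

prior_psi = lambda psi: norm.pdf(psi, 0.0, 2.0)                  # marginal prior for psi
prior_chi_given_psi = lambda chi, psi: invgamma.pdf(chi, 2.0, scale=2.0 * np.exp(0.5 * psi))

def Z_terms(psi0):
    c0 = np.mean((y - psi0) ** 2)                                # constrained MLE of chi
    R = np.sign(psi_hat - psi0) * np.sqrt(
        2.0 * (loglik(psi_hat, chi_hat) - loglik(psi0, c0)))
    l_psi = np.sum(y - psi0) / c0
    l_cc0 = n / (2 * c0 ** 2) - np.sum((y - psi0) ** 2) / c0 ** 3
    l_cc_hat = -n / (2 * chi_hat ** 2)
    T_inf = prior_psi(psi_hat) / prior_psi(psi0) * l_psi * (n / chi_hat) ** -0.5
    T_np = (np.sqrt(abs(l_cc0) / abs(l_cc_hat))
            * prior_chi_given_psi(chi_hat, psi_hat) / prior_chi_given_psi(c0, psi0))
    return np.log(T_np) / R, np.log(T_inf / R) / R               # (Z_NP, Z_INF)

g_np, g_inf = Z_terms(psi_hat + 1e-4)                            # numerical limits
print("g_NP =", g_np, " g_INF =", g_inf, " ratio =", g_np / g_inf)
```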
Key observation. $Z_{\mathrm{INF}}$ agrees, to error of order $O(n^{-1})$, with the value it would take in the reduced problem in which the nuisance parameter $\chi$ is set equal to its maximum likelihood estimate $\hat\chi$ and Bayesian inference is considered only for the scalar parameter $\psi$, based on the prior density $\pi_\psi(\psi)$. It is apparent that $Z_{\mathrm{INF}}$, at least to error of order $O(n^{-1})$, does not account for the presence of nuisance parameters. Nuisance parameters are accounted for by $Z_{\mathrm{NP}}$.
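To make the "reduced problem" concrete, the following sketch (same assumed normal likelihood and independent prior as in the first two sketches; not from the talk) computes the reduced-model and full-model marginal posterior tail probabilities for $\psi$ by direct numerical integration, so the two can be compared.

```python
# Illustrative sketch (assumed normal model and independent prior as above):
# the reduced problem fixes chi at its MLE and uses the marginal prior for
# psi; the full problem integrates chi out of the joint posterior.
import numpy as np
from scipy.stats import norm, invgamma
from scipy.integrate import quad, dblquad

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=30)
n, ybar = y.size, y.mean()
psi_hat, chi_hat = ybar, np.mean((y - ybar) ** 2)

def loglik(psi, chi):
    return -0.5 * n * np.log(chi) - 0.5 * np.sum((y - psi) ** 2) / chi

l_hat = loglik(psi_hat, chi_hat)         # keeps integrands well scaled

def reduced_integrand(psi):              # chi fixed at chi_hat, prior pi_psi only
    return norm.pdf(psi, 0.0, 10.0) * np.exp(loglik(psi, chi_hat) - l_hat)

def full_integrand(chi, psi):            # joint prior times likelihood
    return (norm.pdf(psi, 0.0, 10.0) * invgamma.pdf(chi, 2.0, scale=2.0)
            * np.exp(loglik(psi, chi) - l_hat))

lo, hi = ybar - 3.0, ybar + 3.0          # effectively the whole posterior mass for psi
psi0 = ybar + 0.3

red_tail = (quad(reduced_integrand, lo, psi0)[0]
            / quad(reduced_integrand, lo, hi)[0])
full_tail = (dblquad(full_integrand, lo, psi0, lambda p: 1e-9, lambda p: np.inf)[0]
             / dblquad(full_integrand, lo, hi, lambda p: 1e-9, lambda p: np.inf)[0])

print("Reduced-model P(psi <= psi0 | Y):", red_tail)
print("Full-model    P(psi <= psi0 | Y):", full_tail)
```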
Proposal. Candidate measures for assessing the extent of the influence that the nuisance parameters have on the Bayesian inference are the ratios $g_{\mathrm{NP}}/g_{\mathrm{INF}}$ and $Z_{\mathrm{NP}}/Z_{\mathrm{INF}}$. We offer a practical interpretation of this ratio in terms of posterior probabilities, with the asymptotics guiding the relevant finite-sample quantity to examine.
Details. Let the reference value $\hat\psi_{1-\alpha}$ be the asymptotic $1-\alpha$ posterior percentage point of $\psi$, given by $R(\hat\psi_{1-\alpha}) = z_\alpha$, where $z_\alpha$ is the $\alpha$-quantile of the standard normal distribution.
The relative difference in the percentile bias of the asymptotic percentage point $\hat\psi_{1-\alpha}$ between the full and the reduced problems is
$$\frac{\{P(\psi \le \hat\psi_{1-\alpha} \mid Y) - (1-\alpha)\} - \{P_{\mathrm{Reduced}}(\psi \le \hat\psi_{1-\alpha} \mid Y) - (1-\alpha)\}}{P_{\mathrm{Reduced}}(\psi \le \hat\psi_{1-\alpha} \mid Y) - (1-\alpha)} = \frac{\exp\{z_{1-\alpha}\, Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})\} - 1}{1 - \exp\{-z_{1-\alpha}\, Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})\}},$$
to error of order $O(n^{-1})$. In examples, such as high-dimensional linear regression, this approximation is found to be highly accurate.
Simplifications. To error of order $O(n^{-1})$ we have
$$\frac{\exp\{z_{1-\alpha}\, Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})\} - 1}{1 - \exp\{-z_{1-\alpha}\, Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})\}} = \frac{Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})}{Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})} = \frac{g_{\mathrm{NP}}}{g_{\mathrm{INF}}}.$$
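Continuing the same illustrative setup (normal likelihood with the $\psi$-dependent prior for $\chi$ used in the earlier sketch; all choices are assumptions, and the sign convention in the exponential form follows the reconstruction above), the reference value $\hat\psi_{1-\alpha}$ can be located by root-finding on $R$, after which the exponential-form measure and the simple ratio $Z_{\mathrm{NP}}/Z_{\mathrm{INF}}$ can be compared.

```python
# Illustrative sketch (assumptions as in the preceding sketches): find the
# reference value by solving R(psi_{1-alpha}) = z_alpha, then evaluate the
# exponential form of the relative percentile-bias difference and Z_NP/Z_INF.
import numpy as np
from scipy.stats import norm, invgamma
from scipy.optimize import brentq

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=30)
n, ybar = y.size, y.mean()
psi_hat, chi_hat = ybar, np.mean((y - ybar) ** 2)

def loglik(psi, chi):
    return -0.5 * n * np.log(chi) - 0.5 * np.sum((y - psi) ** 2) / chi

prior_psi = lambda psi: norm.pdf(psi, 0.0, 2.0)
prior_chi_given_psi = lambda chi, psi: invgamma.pdf(chi, 2.0, scale=2.0 * np.exp(0.5 * psi))

def signed_root(psi0):
    c0 = np.mean((y - psi0) ** 2)
    return np.sign(psi_hat - psi0) * np.sqrt(
        2.0 * (loglik(psi_hat, chi_hat) - loglik(psi0, c0)))

def Z_terms(psi0):
    c0 = np.mean((y - psi0) ** 2)
    R = signed_root(psi0)
    l_psi = np.sum(y - psi0) / c0
    l_cc0 = n / (2 * c0 ** 2) - np.sum((y - psi0) ** 2) / c0 ** 3
    l_cc_hat = -n / (2 * chi_hat ** 2)
    T_inf = prior_psi(psi_hat) / prior_psi(psi0) * l_psi * (n / chi_hat) ** -0.5
    T_np = (np.sqrt(abs(l_cc0) / abs(l_cc_hat))
            * prior_chi_given_psi(chi_hat, psi_hat) / prior_chi_given_psi(c0, psi0))
    return np.log(T_np) / R, np.log(T_inf / R) / R

alpha = 0.05
z_alpha, z_1ma = norm.ppf(alpha), norm.ppf(1 - alpha)
se = np.sqrt(chi_hat / n)
# R is decreasing in psi, so the 1-alpha point lies above psi_hat
psi_ref = brentq(lambda p: signed_root(p) - z_alpha, psi_hat, psi_hat + 10 * se)

z_np, z_inf = Z_terms(psi_ref)
measure = (np.exp(z_1ma * z_np) - 1) / (1 - np.exp(-z_1ma * z_inf))
print("reference value psi_{1-alpha}:", psi_ref)
print("exponential-form measure:", measure, "  simple ratio Z_NP/Z_INF:", z_np / z_inf)
```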
Hierarchical specification. The prior distribution of the parameter $(\psi, \chi)$ that determines the distribution of $Y$ may itself depend on a hyperparameter $\beta$, which is given a prior distribution. The entire prior density is of the form $\pi(\psi, \chi, \beta) = \pi_{\psi,\chi \mid \beta}(\psi, \chi \mid \beta)\, \pi_\beta(\beta)$, while the log-likelihood function depends only on $(\psi, \chi)$. A factorization $T = T_{\mathrm{INF}}\, T_{\mathrm{NP}}$ can be obtained as before by first integrating the complete prior density $\pi(\psi, \chi, \beta)$ with respect to $\beta$ to obtain the marginal density $\pi_{\psi,\chi}(\psi, \chi) = \int \pi(\psi, \chi, \beta)\, d\beta$.
Example: Poisson distribution. $Y_1, \ldots, Y_{d+1}$ are independent, where $Y_i$ has the Poisson distribution with mean $\lambda_i t_i$ and $t_i$ is known, $i = 1, \ldots, d+1$. Thus the model parameter is $\lambda = (\lambda_1, \ldots, \lambda_{d+1})$; suppose the scalar parameter of interest is a component of $\lambda$, say $\psi = \lambda_1$, so the nuisance parameter is $\chi = (\lambda_2, \ldots, \lambda_{d+1})$.
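For reference, a routine calculation not spelled out on the slides: the log-likelihood and maximum likelihood estimates in this model are
$$l(\lambda) = \sum_{i=1}^{d+1} \{\, y_i \log \lambda_i - \lambda_i t_i \,\} + \text{const}, \qquad \hat\lambda_i = y_i / t_i,$$
and, because the likelihood factorizes over components, the constrained MLE of $\chi$ given $\psi = \lambda_1$ coincides with $(\hat\lambda_2, \ldots, \hat\lambda_{d+1})$.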
Priors. The prior distribution involves a scalar hyperparameter $\beta$: the model parameters $\lambda_1, \ldots, \lambda_{d+1}$ are assumed to be a sample from the gamma distribution with shape parameter $\omega_0$ and scale parameter $\beta$, and $\beta$ is assumed to have the inverse gamma distribution with shape parameter $\gamma_0$ and scale parameter $\delta_0$; thus
$$\pi_{\lambda_1,\ldots,\lambda_{d+1} \mid \beta}(\lambda_1, \ldots, \lambda_{d+1} \mid \beta) = \{\Gamma(\omega_0)\,\beta^{\omega_0}\}^{-(d+1)} \Big(\prod_{i=1}^{d+1} \lambda_i\Big)^{\omega_0 - 1} \exp\Big(-\sum_{i=1}^{d+1} \lambda_i / \beta\Big),$$
and
$$\pi_\beta(\beta) = \frac{1}{\Gamma(\gamma_0)\,\delta_0^{\gamma_0}}\, \beta^{-(\gamma_0+1)} \exp\Big(-\frac{1}{\beta\delta_0}\Big).$$
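Integrating out $\beta$ is a standard gamma-type integral; the explicit form is not shown on the slides, so the following is offered only as a check of the construction:
$$\pi_{\lambda}(\lambda_1,\ldots,\lambda_{d+1}) = \frac{\Gamma\{(d+1)\omega_0 + \gamma_0\}}{\Gamma(\omega_0)^{d+1}\,\Gamma(\gamma_0)\,\delta_0^{\gamma_0}} \Big(\prod_{i=1}^{d+1}\lambda_i\Big)^{\omega_0-1} \Big(\sum_{i=1}^{d+1}\lambda_i + \frac{1}{\delta_0}\Big)^{-\{(d+1)\omega_0+\gamma_0\}},$$
from which the marginal prior $\pi_\psi$ for $\psi = \lambda_1$ and the conditional prior $\pi_{\chi\mid\psi}$ follow by further marginalization, for use in $T_{\mathrm{INF}}$ and $T_{\mathrm{NP}}$.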
Data example. Here the $\lambda_i$ are failure rates of pumps at a nuclear power plant. There are 10 observed Poisson variables, so $d = 9$. [Table of observed counts $Y_i$ and operating times $t_i$, $i = 1, \ldots, 10$, not reproduced in this transcription.] Fix the prior parameter values as $(\omega_0, \gamma_0, \delta_0) = (1.802, 0.01, 1)$, as suggested in previous analyses.
[Table of values of $g_{\mathrm{NP}}$, $g_{\mathrm{INF}}$, and the ratio $g_{\mathrm{NP}}/g_{\mathrm{INF}}$ for different interest parameters $\lambda_1, \lambda_3, \lambda_5, \lambda_7, \lambda_9$; values not reproduced in this transcription.]
Dependence on prior. Investigate the susceptibility of the ratio $g_{\mathrm{NP}}/g_{\mathrm{INF}}$ to the choice of $(\omega_0, \gamma_0, \delta_0)$, at least locally, by considering the derivatives of the ratio with respect to the components. [Table of these derivatives with respect to $\omega_0$, $\gamma_0$ and $\delta_0$, evaluated at $(\omega_0, \gamma_0, \delta_0) = (1.802, 0.01, 1)$ for interest parameters $\lambda_1, \lambda_3, \lambda_5, \lambda_7, \lambda_9$; values not reproduced in this transcription.] The influence of the nuisance parameters is relatively unaffected by the choice of $\omega_0$, $\gamma_0$ and $\delta_0$ for $\lambda_1, \ldots, \lambda_6$, moderately affected for $\lambda_7$ and $\lambda_8$, and significantly affected for $\lambda_9$ and $\lambda_{10}$.
[Figure: Comparisons, $\lambda_1$ — posterior tail probability against $\psi$ for the full posterior, the reduced posterior, and the asymptotic approximation.]
[Figure: Comparisons, $\lambda_5$ — posterior tail probability against $\psi$ for the full posterior, the reduced posterior, and the asymptotic approximation.]
[Figure: Comparisons, $\lambda_7$ — posterior tail probability against $\psi$ for the full posterior, the reduced posterior, and the asymptotic approximation.]
[Figure: Varying $\delta$, interest parameter $\lambda_1$ — posterior tail probability against $\lambda_1$ for two values of $\delta$, including $\delta = 1.0$.]
[Figure: Varying $\delta$, interest parameter $\lambda_9$ — posterior tail probability against $\lambda_9$ for two values of $\delta$, including $\delta = 1.0$.]
Summary/Conclusions. The decomposition of the Bayesian version of the adjusted signed root likelihood ratio statistic provides simple computational machinery for analysing the effects of prior assumptions about nuisance parameters on the marginal inference for an interest parameter. The measures have a direct interpretation in terms of posterior probabilities.
The measures provide a practically useful, readily computed means of assessing nuisance parameter effects in Bayesian analysis. Calibration: what constitutes a large ratio? This is best considered by comparing the measures for different priors. Calculation across a range of priors allows a straightforward approach to sensitivity analysis and evaluation of robustness, and permits identification of which components of the prior specification have a substantial effect on the posterior marginal inference of interest. This is especially valuable within a hierarchical framework.
Some relevant references. Bayesian asymptotics: DiCiccio & Martin (1993), JRSS B, 55; DiCiccio & Young (2010), Biometrika, 97; DiCiccio, Kuffner & Young (2012), Biometrika, 99. Frequentist asymptotics: DiCiccio, Kuffner & Young (2015), JSPI, 165; DiCiccio, Kuffner, Young & Zaretzki (2015), Statistica Sinica, 25; DiCiccio, Kuffner & Young (2017), JSPI, to appear.