
1 Calibrated Bayes: spanning the divide between frequentist and Bayesian inference. Roderick J. Little

2 Outline: the Census Bureau's new Research & Methodology Directorate; the prevailing philosophy of sample survey inference (the design-model compromise) and an alternative (calibrated Bayes); why I prefer the alternative. UNC Calibrated Bayes for surveys

4 What is the R&M Directorate?

5 Strategic objectives: building a Research & Methodology Directorate that fosters innovation and plays a strategic role in Bureau activities; increasing collaborations across Census Bureau directorates ("breaking down the silos"); porting research on new products and processes to program areas; establishing more robust collaborations with external researchers and agencies; finding ways to leverage the Bureau's competitive advantages (Title 13, access to administrative data) to produce products that are in high demand; increasing the statistical literacy of Census Bureau data users.

6 Some challenges: recruiting the best researchers; building better links between research and production; institutionalizing research excellence; letting people know that the Census Bureau has a new research directorate with exciting plans!

8 Design-based vs model-based inference. Design-based (frequentist) inference: the survey variables Y are fixed, and inference is based on the sampling distribution. Model-based inference: the survey variables Y are also random and are assigned a statistical model. Two variants: (a) superpopulation: frequentist inference based on repeated sampling of both the sample and the superpopulation; (b) Bayes: add a prior for the parameters; inference is based on the posterior distribution of finite population quantities. Bayes is superior to superpopulation modeling in small-sample problems, but requires the choice of a prior.

9 Design-based survey inference
$Y = (Y_1, \ldots, Y_N)$ = population values, treated as fixed
$Q = Q(Y)$ = target finite population quantity
$I = (I_1, \ldots, I_N)$ = sample inclusion indicators, random: $I_i = 1$ if unit $i$ is included in the sample, $0$ otherwise
$Y_{inc}$ = part of $Y$ included in the survey
$\hat q(Y_{inc}, I)$ = sample estimate of $Q$
$\hat v(Y_{inc}, I)$ = sample estimate of the variance of $\hat q$
$\hat q \pm 1.96\sqrt{\hat v}$ = 95% CI for $Q$ with respect to the distribution of $I$
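The design-based logic can be checked by simulation. Below is a minimal Python sketch, assuming simple random sampling without replacement and a hypothetical gamma-distributed population (neither is specified on the slide): Y is generated once and then held fixed, only the inclusion indicators I change across replicates, and the coverage of $\hat q \pm 1.96\sqrt{\hat v}$ is counted with respect to that distribution of I. The helper names `srs_mean_ci` and `design_coverage` are illustrative.

```python
import math
import random
import statistics

def srs_mean_ci(y_inc, N):
    """Sample mean q_hat and 95% CI under simple random sampling without replacement."""
    n = len(y_inc)
    q_hat = statistics.fmean(y_inc)
    v_hat = (1 - n / N) * statistics.variance(y_inc) / n  # includes finite population correction
    half = 1.96 * math.sqrt(v_hat)
    return q_hat, (q_hat - half, q_hat + half)

def design_coverage(Y, n, reps, seed=1):
    """Share of repeated samples (new draws of I, same fixed Y) whose CI covers Q."""
    rng = random.Random(seed)
    Q = statistics.fmean(Y)  # target: the finite population mean, fixed
    covered = 0
    for _ in range(reps):
        y_inc = rng.sample(Y, n)  # one realisation of the inclusion indicators I
        _, (lo, hi) = srs_mean_ci(y_inc, len(Y))
        covered += lo <= Q <= hi
    return covered / reps

# Hypothetical fixed population of N = 1000 values
pop_rng = random.Random(0)
Y = [pop_rng.gammavariate(2.0, 3.0) for _ in range(1000)]
print(design_coverage(Y, n=200, reps=2000))
```

With n = 200 out of N = 1000, coverage should come out close to the nominal 0.95; the finite population correction matters here because the sampling fraction is 20%.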

10 Example 1: stratified sampling
$Q(Y) = \bar Y = \sum_{j=1}^{J} P_j \bar Y_j$ = population mean
$P_j = N_j / N$ = population proportion in stratum $Z = j$; $\bar Y_j$ = population mean in stratum $j$
Design: $\Pr(I_{ji} = 1) = n_j / N_j$, where only samples with $\sum_{i=1}^{N_j} I_{ji} = n_j$ have nonzero probability
$\hat q(Y_{inc}, I) = \bar y_{st} = \sum_{j=1}^{J} P_j \bar y_j$, where $\bar y_j$ = sample mean of $Y$ in stratum $j$
$\hat v_{st}(Y_{inc}, I) = \sum_{j=1}^{J} P_j^2 (1 - n_j/N_j)\, s_j^2 / n_j$, where $s_j^2$ = sample variance of $Y$ in stratum $j$ and $(1 - n_j/N_j)$ = finite population correction
$\bar y_{st} \pm 1.96\sqrt{\hat v_{st}}$ = 95% CI for $\bar Y$
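The stratified estimator and its variance translate directly into code. This is a minimal sketch of the slide's formulas; the function name `stratified_estimate` and the toy data in the check are hypothetical.

```python
import math
import statistics

def stratified_estimate(samples, N_strata):
    """Return ybar_st, v_st and the 95% CI ybar_st +/- 1.96 sqrt(v_st).

    samples:  list of per-stratum samples (stratum j -> list of y values)
    N_strata: list of stratum population sizes N_j
    """
    N = sum(N_strata)
    ybar_st, v_st = 0.0, 0.0
    for y_j, N_j in zip(samples, N_strata):
        n_j = len(y_j)
        P_j = N_j / N                                   # population proportion in stratum j
        ybar_st += P_j * statistics.fmean(y_j)          # P_j * sample mean
        s2_j = statistics.variance(y_j)                 # s_j^2 with divisor n_j - 1
        v_st += P_j**2 * (1 - n_j / N_j) * s2_j / n_j   # (1 - n_j/N_j) = finite population correction
    half = 1.96 * math.sqrt(v_st)
    return ybar_st, v_st, (ybar_st - half, ybar_st + half)

ybar, v, ci = stratified_estimate([[2.0, 4.0, 6.0], [10.0, 12.0]], [30, 20])
```

For this toy input, $P = (0.6, 0.4)$, the stratum means are 4 and 11, so $\bar y_{st} = 6.8$ and $\hat v_{st} = 0.36 \cdot 0.9 \cdot 4/3 + 0.16 \cdot 0.9 \cdot 2/2 = 0.576$.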

11 Bayesian model-based inference
With ignorable (probability) sample designs:
Model $M$: $p(Y \mid Z)$ = prior distribution for $Y$; $Z$ = design variables (important to include in the model)
$p(Q(Y) \mid Z, Y_{inc})$ = posterior predictive distribution of $Q$ given $Z$ and $Y_{inc}$; inferences about $Q$ are based on this posterior distribution
Estimate is the posterior mean $\hat q = E(Q \mid Z, Y_{inc})$; SE is the posterior standard deviation $\sqrt{Var(Q \mid Z, Y_{inc})}$
Large samples: 95% credibility interval = $\hat q \pm 1.96\,SE$
Small samples: 95% credibility interval = 2.5th to 97.5th percentile of the posterior distribution
This plays the role of a confidence interval, but has a simpler interpretation.

12 Parametric models
Usually the prior is specified via parametric models: $p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$
$p(Y \mid Z, \theta)$ = parametric model, as in the superpopulation approach; $p(\theta \mid Z)$ = prior distribution for $\theta$
Superpopulation models treat $\theta$ as a fixed parameter, with inference by repeated sampling from the superpopulation.

13 Ex. 1 continued: Bayes for stratified samples
Inference for $Q = \bar Y = \sum_{j=1}^{J} P_j \bar Y_j$, the population mean; $Y_{inc}$ = data selected by stratified random sampling
Model: $[y_i \mid z_i = j, \theta] \sim_{iid} N(\mu_j, \sigma_j^2)$, $\theta = \{\mu_j, \sigma_j^2\}$; prior $p(\mu_j, \log\sigma_j^2) = \text{const.}$
Bayes' theorem yields:
$E(\bar Y \mid Z, Y_{inc}, I) = \bar y_{st} = \sum_{j=1}^{J} P_j \bar y_j$
$Var(\bar Y \mid Z, Y_{inc}, I) \approx \hat v_{st} = \sum_{j=1}^{J} P_j^2 (1 - n_j/N_j)\, s_j^2 / n_j$
In large samples the posterior distribution is normal, yielding the same interval as the 95% design-based CI. In small samples the posterior distribution is a mixture of $t$ distributions, a useful small-sample correction.
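The posterior under this model can be simulated directly, which also yields the small-sample percentile interval. A minimal sketch, assuming the standard conjugate results for the prior $p(\mu_j, \log\sigma_j^2) = \text{const.}$: $\sigma_j^2 \mid \text{data} \sim (n_j - 1)s_j^2/\chi^2_{n_j-1}$, $\mu_j \mid \sigma_j^2 \sim N(\bar y_j, \sigma_j^2/n_j)$, and the mean of the $N_j - n_j$ nonsampled values is drawn from $N(\mu_j, \sigma_j^2/(N_j - n_j))$. The function name and the two-stratum data are hypothetical.

```python
import random
import statistics

def stratified_posterior_draws(samples, N_strata, D=4000, seed=0):
    """Posterior draws of the population mean Ybar under
    y_i | z_i = j ~ N(mu_j, sigma_j^2) iid, prior p(mu_j, log sigma_j^2) = const."""
    rng = random.Random(seed)
    N = sum(N_strata)
    strata = [(len(y), statistics.fmean(y), statistics.variance(y), N_j)
              for y, N_j in zip(samples, N_strata)]
    draws = []
    for _ in range(D):
        total = 0.0
        for n_j, ybar_j, s2_j, N_j in strata:
            # sigma_j^2 | data ~ (n_j - 1) s_j^2 / chi^2_{n_j - 1}; chi^2_k = Gamma(k/2, scale 2)
            sigma2 = (n_j - 1) * s2_j / rng.gammavariate((n_j - 1) / 2, 2.0)
            # mu_j | sigma_j^2, data ~ N(ybar_j, sigma_j^2 / n_j)
            mu = rng.gauss(ybar_j, (sigma2 / n_j) ** 0.5)
            # mean of the N_j - n_j nonsampled values, predicted from the model
            m = N_j - n_j
            ybar_ns = rng.gauss(mu, (sigma2 / m) ** 0.5) if m > 0 else 0.0
            total += (n_j * ybar_j + m * ybar_ns) / N  # = P_j * (draw of Ybar_j)
        draws.append(total)
    return draws

# Hypothetical data: two strata with N_1 = 200, N_2 = 100
samples = [[float(v) for v in range(1, 21)], [float(v) for v in range(30, 60, 2)]]
draws = stratified_posterior_draws(samples, [200, 100])
cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
print(statistics.fmean(draws), (cuts[0], cuts[-1]))
```

The posterior mean should sit near $\bar y_{st}$; the draws have slightly heavier tails than a normal with variance $\hat v_{st}$, which is the $t$-type small-sample correction mentioned on the slide.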

14 The status quo for survey statistics: the design-model compromise (DMC). Design-based inference for large samples and descriptive statistics, but often model-assisted, e.g. regression calibration, where model estimates are adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992). Model-based inference for small area estimation and nonresponse. In my view, this is a form of inferential schizophrenia.

15 Some manifestations of inferential schizophrenia in the current survey philosophy

16 1. Statistical standards. Census statistical standards are built from a design-based perspective, whereas economists and other substantive researchers build models. I suspect one reason people bridle at the standards is that they have a different statistical philosophy! [Economists generally don't think of themselves as Bayesian, but to my mind they act like Bayesians in important respects.]

17 1. Statistical standards and the Bayes/Frequentist Gorilla. B/F Gorilla: "Follow my (frequentist) statistical standards." Reply: "Why? I am an economist, I build models!"

18 Which weights? When I was little (ha ha!) I learnt: in multiple linear regression, if the variance is not constant, weight by the inverse of the residual variance, i.e. $Var(y_i) = \sigma^2 / u_i$ leads to weighted least squares with weight $u_i$. Survey sampling class: OLS is wrong; weight by the inverse of the probability of selection, $w_i = 1/\pi_i$. Model: $u_i$. Design: $w_i$. Which is right? See e.g. Brewer and Mellor (1973), DuMouchel and Duncan (1983).

19 2. When is an area small?
[Figure: "n-ometer" scale of sample size n, with design-based inference above $n_0$ and model-based inference below it; $n_0$ = point of inferential schizophrenia.]
How do I choose $n_0$? If $n_0 = 35$, should my entire statistical philosophy be different when $n = 34$ and $n = 36$?

20 Towards the alternative: calibrated Bayes

21 Strengths of frequentist inference: the focus on repeated sampling properties tends to yield inferences with good frequentist properties (inferences that are well calibrated); e.g. in the survey sampling setting, it automatically takes survey design features into account; there is no need to specify prior distributions; it offers a flexible range of procedures: come up with a method (even Bayes), and we can assess its frequentist properties.

22 Weaknesses of the frequentist paradigm: it is not prescriptive: it is a set of principles for assessing the properties of inference procedures rather than an inferential system (where do estimates come from?); it is ambiguous about conditioning and violates the likelihood principle, which is based on compelling arguments (Birnbaum 1962); design-based survey inference is largely asymptotic, with no exact frequentist answers for many small-sample problems. "Mom, where do estimates come from?"

23 Bayes is catching on (esp. for hard problems!)
Most-cited mathematicians in science (Science Watch '02):
2. D. L. Donoho, Stanford Stat
3. A. F. M. Smith, London Stat
4. E. A. Thompson, Washington Biostat
5. I. M. Johnstone, Stanford Stat
6. J. Fan, Hong Kong Stat
7. D. B. Rubin, Harvard Stat
9. A. E. Raftery, Washington Stat
10. A. E. Gelfand, U. Conn Stat
11. S-W Guo, Med. Coll. Wisc Biostat
12. S. L. Zeger, JHU Biostat
13. P. J. Green, Bristol Stat
14. B. P. Carlin, Minnesota Biostat
15. J. S. Marron, UNC Stat
16. D. G. Clayton, Cambridge Biostat
16. G. O. Roberts, Lancaster Stat
20. X-L Meng, Chicago Stat
21. M. P. Wand, Harvard Biostat
22. W. R. Gilks, MRC Biostat
23. M. Chris Jones, Open U Stat
25. N. E. Breslow, Washington Biostat
The people highlighted in red on the original slide are all Bayesians.

24 Strengths of Bayes 1: conceptual simplicity. Bayes' theorem is direct and completely general; it is prescriptive for inferences and automatically optimal under the model; it is conceptually simple: predict the quantities you don't know, with measures of uncertainty. Bayes applies to complex problems: once the model is specified, the difficulties are purely computational. Distinguish between the posterior probability interval (the inference) and the confidence interval (an operating characteristic of the inference).

25 Strengths of Bayes: avoids ancillarity angst. Should the frequentist (F) reference distribution condition on ancillary statistics? On approximate ancillary statistics? Example: tests for independence in a 2x2 table (Little 1989): fixing one margin leads to the Pearson chi-squared test; fixing both margins leads to the Fisher exact test and the continuity-corrected (CC) chi-squared test. Which is right? A survey example: sample stratum counts in poststratification. F theory is ambiguous about the appropriate choice of reference distribution. Bayes (B) avoids this problem by conditioning on the entire data set. Conditionality leads to the likelihood principle (Birnbaum 1962), which is satisfied by B but not by F.
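To see that the choice of reference distribution changes the answer, here is a small stdlib sketch of the two tests for the same 2x2 table. The implementations are written for this illustration (not taken from Little 1989), and the table [[7, 3], [2, 8]] is hypothetical. The Pearson p-value uses the chi-squared distribution with 1 df, whose upper tail equals erfc(sqrt(x/2)); the Fisher test conditions on both margins and sums hypergeometric probabilities of tables no more likely than the observed one.

```python
import math

def pearson_chi2_pvalue(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # upper tail of chi-squared with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def fisher_exact_pvalue(a, b, c, d):
    """Two-sided Fisher exact test: condition on both margins (hypergeometric)."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(k):  # P(top-left cell = k | both margins fixed)
        return math.comb(r1, k) * math.comb(n - r1, c1 - k) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # small relative tolerance so tables exactly as likely as the observed one are included
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

print(pearson_chi2_pvalue(7, 3, 2, 8), fisher_exact_pvalue(7, 3, 2, 8))
```

For this table the two p-values land on opposite sides of 0.05 (about 0.025 for Pearson, about 0.070 for Fisher), which is exactly the kind of ambiguity the slide describes.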

26 Strengths of Bayes: nails nuisance parameters! Integrating over nuisance parameters is clearly the right approach. It is better than: maximum likelihood (misses uncertainty); profile likelihood (better, but still misses uncertainty); conditional likelihood to eliminate them (OK, but works only for a limited set of problems); strict likelihoodist inference (not general enough). Bayes transitions smoothly between problems that are weakly identified (e.g. the Heckman model) and unidentified.

27 Strengths of Bayes: escape from asymptotia! Maximum likelihood is a large-sample approximation of Bayes: observed, not expected, information; the prior distribution washes out. Bayes works better in small samples: Student t-type corrections are automatic. Harder problems, e.g. inference for the second-largest eigenvalue in a principal component analysis of 30 observations: for Bayes this is no problem; F???!

28 [Figure: map showing the "Asymptotia Highlands" beyond the "murky subasymptotial forests".] How many more to reach the promised land of asymptotia?

29 The standard error error. Design-based survey methods assume large samples, and often report estimates and standard errors (or margins of error, coefficients of variation). This implicitly assumes that estimate ± z × se is a valid confidence interval (e.g. z = 1.96 for a 95% interval). But in small samples this is not true, so the goal should be confidence intervals that have approximately the nominal coverage, not estimates and standard errors. As a calibrated Bayesian I would say probability intervals with the correct confidence coverage, but since regular people interpret confidence intervals like probability intervals, the distinction is practically moot.
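The "standard error error" is easy to demonstrate. A minimal sketch, assuming hypothetical normal samples of size n = 5 with true mean 0: it compares the coverage of $\bar y \pm 1.96\,se$ with $\bar y \pm t_{0.975,4}\,se$, where 2.776 is the 97.5th percentile of Student's t with 4 degrees of freedom.

```python
import math
import random
import statistics

def coverage(n, crit, reps=20000, seed=42):
    """Fraction of N(0, 1) samples of size n whose interval ybar +/- crit * se covers 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se = statistics.stdev(y) / math.sqrt(n)
        hits += abs(statistics.fmean(y)) <= crit * se
    return hits / reps

z_cov = coverage(5, 1.96)   # "estimate +/- 1.96 se"
t_cov = coverage(5, 2.776)  # Student t critical value with 4 df
print(f"z: {z_cov:.3f}  t: {t_cov:.3f}")
```

The z-based interval falls well short of 95% at this sample size, while the t-based interval is close to nominal; the t correction is the kind of adjustment a Bayesian analysis under a normal model produces automatically.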

30 Weakness of Bayes: where do models come from? Bayes is less effective for model formulation and assessment than for inference under a model. For example, Bayesian hypothesis testing for comparing models of different dimension is tricky: it is sensitive to the choice of priors, and you can't just slap down a reference prior. Hard-line subjective Bayesians claim they can make pure Bayesian model selection work, but this approach is a hard sell for scientific inference; most people use the data for model selection, in some form. Model formulation and assessment will never achieve the degree of clarity of Bayesian inference under an agreed model.

31 Calibrated Bayes: combines the strengths of design- and model-based inference. All inferences are model-based, but we select models that have good frequentist properties (e.g. design consistency) in repeated samples, i.e. models that are well calibrated. This capitalizes on the strengths of both paradigms! Box (1980), Rubin (1984), Little (2006, 2011).
Activity | Model-based | Design-based
Inference under assumed model | Strong | Weak
Model formulation / assessment | Weak | Strong

32 Bayes/frequentist compromises
"The applied statistician should be Bayesian in principle and calibrated to the real world in practice: appropriate frequency calculations help to define such a tie."
"... frequency calculations are useful for making Bayesian statements scientific, scientific in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events."
Rubin (1984)

33 Applications of calibrated Bayes: small area estimation (SAIPE); inference for a proportion from PPS samples; survey weights derived from a Bayes model.

34 Hierarchical Bayes models for small areas
Fixed-effects models have distinct parameters (means, variances) for small areas, e.g. $y_{ai} \mid \mu_a, \sigma_a^2 \sim N(\mu_a, \sigma_a^2)$ for unit $i$ in area $a$.
Hierarchical Bayes models assign distributions to the parameters for each area: $y_{ai} \mid \mu_a, \sigma_a^2 \sim N(\mu_a, \sigma_a^2)$, $\mu_a \sim N(\beta z_a, \tau^2)$.
Treating the parameters as random effects achieves shrinkage between the direct area estimate and the model prediction. Area-level models can also be fitted (see below). Fully Bayes inference adds priors for the variances, with improved frequentist performance (Ganesh & Lahiri 2008).

35 Multilevel models
[Figure: "n-ometer" scale of sample size n, running from $w_a = 0$ (model estimate) to $w_a = 1$ (direct estimate).]
$\tilde\mu_a = w_a \bar y_{a\pi} + (1 - w_a)\hat\mu_a$, where $\bar y_{a\pi}$ is the direct estimate and $\hat\mu_a$ is the model estimate.
Bayesian multilevel model estimates borrow strength increasingly from the model as n decreases.
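The n-ometer can be made concrete under the hierarchical model of the previous slide. A minimal sketch, assuming a common within-area variance $\sigma^2$, known $\beta$ and $\tau^2$, and equal-probability sampling within each area (so the direct estimate is the area sample mean): the posterior mean of $\mu_a$ is then $w_a \bar y_a + (1 - w_a)\beta z_a$ with $w_a = \tau^2 / (\tau^2 + \sigma^2/n_a)$. A fully Bayes analysis would also integrate over the variances; the function names and parameter values are hypothetical.

```python
def shrinkage_weight(n_a, sigma2, tau2):
    """Weight w_a on the direct estimate when y_ai ~ N(mu_a, sigma2) and mu_a ~ N(beta*z_a, tau2)."""
    return tau2 / (tau2 + sigma2 / n_a)

def small_area_estimate(ybar_a, model_pred_a, n_a, sigma2, tau2):
    """Posterior mean of mu_a: w_a * direct estimate + (1 - w_a) * model prediction beta*z_a."""
    w = shrinkage_weight(n_a, sigma2, tau2)
    return w * ybar_a + (1 - w) * model_pred_a

# With hypothetical sigma2 = 4 and tau2 = 1, the weight on the direct estimate rises toward 1 as n grows
for n in (1, 4, 16, 64, 256):
    print(n, round(shrinkage_weight(n, sigma2=4.0, tau2=1.0), 3))
```

There is no threshold $n_0$ here: the weight moves smoothly from the model prediction to the direct estimate as the area sample size grows.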

36 Ex 1: SAIPE project. Objective: provide estimates of poverty for various age groups, and of median household income, for all states, counties, and school districts in the U.S. Problem: direct survey estimates (from the CPS or, later, the ACS) are too unreliable for many areas: the CPS sample is small for most states, with no sample in two-thirds of counties; the single-year ACS sample is small for many counties and most school districts. Solution: use a Bayesian form of the small area model of Fay & Herriot (1979) to integrate survey data with data from administrative records (IRS, SNAP program) and the previous census long form.

37 Posterior variances from the state model for 2004 CPS poverty rates, ages 5-17: results for four states (CA, NC, IN, MS). Table columns: State | $n_i$ | $v_i$ | $Var(Y_i \mid \text{data})$ | approx. weight on $y_i$ in $E(Y_i \mid \text{data})$.
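The table's last two columns follow from the area-level Fay-Herriot structure. A minimal sketch, assuming $y_i = Y_i + e_i$ with known sampling variance $v_i$, $Y_i = x_i^\top\beta + u_i$ with $u_i \sim N(0, \sigma_u^2)$, and $\beta$, $\sigma_u^2$ treated as known (the production SAIPE model is richer, and a fully Bayes version integrates over these). Then the weight on $y_i$ is $\gamma_i = \sigma_u^2/(\sigma_u^2 + v_i)$ and $Var(Y_i \mid \text{data}) = \gamma_i v_i$. The function name and inputs are hypothetical.

```python
def fay_herriot_posterior(y_i, v_i, reg_pred_i, sigma2_u):
    """Posterior mean, posterior variance and weight on the direct estimate y_i for one area."""
    gamma = sigma2_u / (sigma2_u + v_i)           # approx. weight on y_i in E(Y_i | data)
    mean = gamma * y_i + (1 - gamma) * reg_pred_i
    var = gamma * v_i                              # = 1 / (1/v_i + 1/sigma2_u), always below v_i
    return mean, var, gamma

# Hypothetical large state (small v_i) vs small state (large v_i), same regression prediction
print(fay_herriot_posterior(20.0, 1.0, 16.0, 3.0))
print(fay_herriot_posterior(20.0, 9.0, 16.0, 3.0))
```

Areas with large samples (small $v_i$) put most of the weight on their direct estimate; areas with small samples lean on the regression prediction and gain the most variance reduction.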

38 Ex 2: Estimating a proportion from a PPS sample (Chen et al. 2010)
[Figure: population units $1, \ldots, N$ with inclusion probabilities $\pi_1, \ldots, \pi_N$; units $1, \ldots, n$ form the sample $s$ and units $n+1, \ldots, N$ the nonsampled set $ns$.]
$\pi_i$: probability of inclusion for unit $i$, assumed known for all units in the finite population before a sample is drawn
$I_i$: binary variable indicating which units are included in the sample
$Y_i$: binary survey variable of interest for unit $i$
$s$: an unequal probability random sample
Proportion of the population for which $Y = 1$: $p = N^{-1}\sum_{i=1}^{N} Y_i$

39 Bayesian p-spline prediction (BPSP) estimator
Probit penalized polynomial spline model with $m$ truncated power bases:
$\Phi^{-1}\big(E(y_i \mid \beta, b, \pi_i)\big) = \beta_0 + \sum_{k=1}^{p} \beta_k \pi_i^k + \sum_{l=1}^{m} b_l (\pi_i - k_l)_+^p$, with $b_l \sim N(0, \tau^2)$, $l = 1, \ldots, m$; $i = 1, \ldots, n$
The constants $k_1 < \cdots < k_m$ are $m$ selected fixed knots, and $(u)_+^p = u^p I(u \ge 0)$ for any real number $u$.
Gibbs sampling is used to obtain draws from the posterior distributions of the parameters.
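The covariates entering the probit spline model can be built as follows. This sketch only constructs the truncated power basis for one unit; it does not implement the probit link or the Gibbs sampler. Placing the knots at sample quantiles of $\pi$ is an assumed, commonly used choice, not something stated on the slide, and the function names are hypothetical.

```python
import statistics

def truncated_power(u, p):
    """(u)_+^p = u^p * I(u >= 0)."""
    return u ** p if u >= 0 else 0.0

def pspline_row(pi_i, knots, p):
    """Covariates for unit i: fixed part [1, pi_i, ..., pi_i^p] (coefficients beta_0..beta_p)
    and penalized part [(pi_i - k_1)_+^p, ..., (pi_i - k_m)_+^p] (coefficients b_l ~ N(0, tau^2))."""
    fixed = [pi_i ** k for k in range(p + 1)]
    penalized = [truncated_power(pi_i - k_l, p) for k_l in knots]
    return fixed, penalized

# Hypothetical choice: m = 3 knots at sample quantiles of pi, quadratic basis (p = 2)
pi_sample = [0.01 * i for i in range(1, 21)]
knots = statistics.quantiles(pi_sample, n=4)  # 3 cut points
fixed, penalized = pspline_row(0.12, knots, p=2)
```

Only the knot terms carry the $N(0, \tau^2)$ penalty, so $\tau^2$ controls how far the fit can bend away from the global polynomial in $\pi$.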

40 BPSP estimator (cont.)
The posterior distribution of the population proportion can be simulated by generating a large number $D$ of draws of the form $p^{(d)} = N^{-1}\big(\sum_{i \in s} y_i + \sum_{j \notin s} \hat y_j^{(d)}\big)$, where $\hat y_j^{(d)}$ is a draw from the posterior predictive distribution of the $j$th observation among the nonsampled units.
BPSP estimator: the average of these draws. The $100(1-\alpha)\%$ posterior probability interval splits the tail area $\alpha$ equally between the upper and lower endpoints.
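Turning posterior draws into the BPSP point estimate and interval is mechanical once draws are available. A minimal sketch, assuming the sampler has already produced, for each draw $d$, the predicted probabilities $\Pr(y_j = 1)$ for the nonsampled units (the input `prob_draws` is hypothetical): each $\hat y_j^{(d)}$ is drawn as a Bernoulli, $p^{(d)}$ is formed as on the slide, and the interval uses equal-tailed order statistics.

```python
import random
import statistics

def bpsp_summary(y_sample, prob_draws, N, alpha=0.05, seed=0):
    """Posterior mean and equal-tailed 100(1 - alpha)% interval for p.

    y_sample:   observed binary y_i for the n sampled units
    prob_draws: prob_draws[d][j] = posterior draw d of Pr(y_j = 1) for nonsampled unit j
                (assumed to come from a fitted model, e.g. a Gibbs sampler)
    """
    rng = random.Random(seed)
    n_ones = sum(y_sample)
    p_draws = []
    for probs in prob_draws:
        assert len(y_sample) + len(probs) == N
        # posterior predictive draw of each nonsampled y_j, then p^(d)
        y_ns = sum(1 for q in probs if rng.random() < q)
        p_draws.append((n_ones + y_ns) / N)
    p_draws.sort()
    D = len(p_draws)
    lower = p_draws[int(D * alpha / 2)]
    upper = p_draws[int(D * (1 - alpha / 2)) - 1]
    return statistics.fmean(p_draws), (lower, upper)
```

Because the sampled $y_i$ enter every draw unchanged, only the nonsampled units contribute posterior uncertainty, which is how the finite population correction arises automatically in the Bayesian approach.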

41 Other estimators
Horvitz-Thompson estimator: $\hat p_{HT} = \sum_{i \in s} y_i/\pi_i \,\big/\, \sum_{i \in s} 1/\pi_i$
Prediction estimator: $\hat p_M = N^{-1}\big(\sum_{i \in s} y_i + \sum_{j \notin s} \hat y_j\big)$, where $\hat y_j$ = prediction based on a linear probit model
Generalized regression (GR) estimator: $\hat p_{GR} = N^{-1}\big[\sum_{i=1}^{N} \hat y_i + \sum_{i \in s} (y_i - \hat y_i)/\pi_i\big]$, where $\hat y_i$ = prediction from a linear probit model
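The three comparison estimators can be written in a few lines each. In this sketch the model predictions $\hat y$ are taken as inputs (fitting the linear probit model is not shown), and the HT function follows the slide's ratio form, which normalises by $\sum 1/\pi_i$ (sometimes called the Hájek form). Function names and the toy numbers in the check are hypothetical.

```python
def p_ht(y_s, pi_s):
    """Slide's HT form: sum(y_i / pi_i) / sum(1 / pi_i) over the sample."""
    return sum(y / p for y, p in zip(y_s, pi_s)) / sum(1 / p for p in pi_s)

def p_prediction(y_s, yhat_ns, N):
    """Prediction estimator: observed y for sampled units, model predictions for the rest."""
    return (sum(y_s) + sum(yhat_ns)) / N

def p_gr(yhat_all, y_s, yhat_s, pi_s):
    """GR estimator: population sum of predictions plus design-weighted sample residuals,
    divided by N = len(yhat_all)."""
    N = len(yhat_all)
    resid = sum((y - yh) / p for y, yh, p in zip(y_s, yhat_s, pi_s))
    return (sum(yhat_all) + resid) / N
```

The GR estimator shows the model-assisted idea: predictions do the work, and the weighted residual term protects against model misspecification.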

42 Design of simulation study
Unequal probability sampling design: PPS sampling, in which units are selected with probability proportional to a given size variable related to the survey variable under study.
Population and sample: N = 2000 with sampling rates of 5% and 10% (n = 100 or 200); N = 5000 with a sampling rate of 10% (n = 500).
The size variable X takes the values 71, 72, ..., 2070 for N = 2000, and 171, 172, ..., 5170 for N = 5000. The inclusion probabilities π were proportional to X.
Simulations: 1000 replicates. Compared: empirical bias, width of the posterior probability interval / CI, root mean squared error (RMSE), noncoverage rate of the 95% CI.
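The design setup can be reproduced as follows. The slide does not say which PPS selection scheme was used; this sketch assumes systematic PPS sampling (random start, unit spacing along the cumulated $\pi_i$), which gives a fixed sample size and inclusion probabilities exactly $\pi_i$ when every $\pi_i \le 1$. Function names are hypothetical.

```python
import random

def pps_inclusion_probs(x, n):
    """pi_i proportional to size x_i, scaled so that sum(pi) = n (requires every pi_i <= 1)."""
    total = sum(x)
    pi = [n * xi / total for xi in x]
    assert max(pi) <= 1, "some units would be certainty selections"
    return pi

def systematic_pps_sample(pi, rng):
    """Systematic PPS: random start u in [0, 1); select the unit whose cumulative-pi
    interval contains each of the points u, u + 1, ..., u + n - 1."""
    n = round(sum(pi))
    u = rng.random()
    sample, cum, j = [], 0.0, 0
    for i, p in enumerate(pi):
        cum += p
        if j < n and u + j < cum:  # pi_i < 1, so at most one point falls in this unit's interval
            sample.append(i)
            j += 1
    return sample

x = list(range(71, 2071))  # size variable X = 71, ..., 2070 (N = 2000)
pi = pps_inclusion_probs(x, n=100)
s = systematic_pps_sample(pi, random.Random(7))
```

With n = 100 the largest inclusion probability is about 0.097, so no unit is a certainty selection; the same check matters for n = 200 and for the N = 5000 population.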

43 Population data
Continuous data: $Z \sim N(f(\pi_i), 0.2^2)$, with
NULL (no association): $f(\pi_i)$ constant
LINUP (linear association): $f(\pi_i) = k_1 \pi_i$
QUAD (quadratic association): $f(\pi_i) = k_2 (\pi_i - k_3)^2$
Binary outcomes $Y_1, \ldots, Y_5$ were created by using the superpopulation 10th, 25th, 50th, 75th and 90th percentiles of Z as cut-off values. For instance, $Y_1 = 1$ if Z is less than its superpopulation 10th percentile, and 0 otherwise. These correspond to true proportions p = 0.1, 0.25, 0.5, 0.75, 0.9.
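For the NULL population the superpopulation percentiles of Z are just normal quantiles, so the binary outcomes can be generated directly. This sketch covers the NULL case only and assumes a hypothetical constant c = 0.5 and that 0.2 is the standard deviation of Z; for LINUP and QUAD the percentiles depend on the distribution of $\pi$ and are not shown.

```python
import random
from statistics import NormalDist

def null_population_binary(N, c, seed=0):
    """NULL population: Z_i ~ N(c, 0.2^2), independent of pi_i (c is a hypothetical constant).
    Y_k = 1 if Z is below the superpopulation percentile of Z for p_k."""
    rng = random.Random(seed)
    dist = NormalDist(mu=c, sigma=0.2)
    cutoffs = [dist.inv_cdf(q) for q in (0.10, 0.25, 0.50, 0.75, 0.90)]
    Z = [rng.gauss(c, 0.2) for _ in range(N)]
    return [[1 if z < cut else 0 for z in Z] for cut in cutoffs]  # Y_1, ..., Y_5

Ys = null_population_binary(100_000, c=0.5)
props = [sum(y) / len(y) for y in Ys]
```

Because the cut-offs are superpopulation (not finite population) percentiles, the realised finite population proportions scatter slightly around 0.1, ..., 0.9.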

44 RMSEs (low = good)
Table columns: Population | Sample size | True prop. | HT | BPSP | PR | GR; rows for the NULL, LINUP and QUAD populations with N = 2000, n = 100.

45 Interval noncoverages (nominal = 5%)
Table columns: Population | Sample size | True prop. | HT | BPSP | PR | GR; rows for the NULL, LINUP and QUAD populations with N = 2000, n = 100.

46 Ex 3. Back to weights in regression. Z = weight stratifier, within which the weights are constant. If Z is included in the covariates, design weighting is not needed, but correct modeling of the relationship between Y and Z is key. If Z is not included in the covariates, assume the target quantities are the OLS slopes of Y on X fitted to the full population. The working model then needs to condition on Z, with different regressions in the weight strata, and the resulting model-based inference for the targets includes the design weights! (Little 1991)
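A quick simulation shows why the design weights reappear. This does not reproduce Little's (1991) model-based derivation; it only illustrates, under hypothetical settings, that when the regressions differ across weight strata Z and Z is left out of the covariates, the unweighted OLS slope misses the population OLS target, while weighting by $N_z/n_z$ (the inverse selection probability) recovers it. Stratum sizes, slopes and sample sizes are assumptions.

```python
import random

def wls_slope(x, y, w):
    """Slope of the weighted least-squares line of y on x with weights w."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

rng = random.Random(3)
# Hypothetical population: stratum Z = 0 (N = 9000, slope 1) and Z = 1 (N = 1000, slope 3)
pop = []
for z, N_z, slope in ((0, 9000, 1.0), (1, 1000, 3.0)):
    for _ in range(N_z):
        x = rng.gauss(0, 1)
        pop.append((z, x, slope * x + rng.gauss(0, 1)))
target = wls_slope([p[1] for p in pop], [p[2] for p in pop], [1.0] * len(pop))  # population OLS slope

# Disproportionate stratified sample: 500 units from each stratum
sample, weights = [], []
for z, N_z in ((0, 9000), (1, 1000)):
    sample += rng.sample([p for p in pop if p[0] == z], 500)
    weights += [N_z / 500] * 500  # design weight 1/pi = N_z / n_z

xs = [p[1] for p in sample]
ys = [p[2] for p in sample]
unweighted = wls_slope(xs, ys, [1.0] * len(xs))
weighted = wls_slope(xs, ys, weights)
print(round(target, 2), round(unweighted, 2), round(weighted, 2))
```

The population slope here is roughly 1.2, the unweighted sample slope is pulled toward 2 by the oversampled stratum, and the design-weighted slope lands close to the target.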

47 Summary. Philosophies of inference matter! A cohesive philosophy of statistics would be nice! Bayes and frequentist ideas are both important for good statistical inference. The calibrated Bayes compromise capitalizes on the strengths of the Bayes and frequentist paradigms. I have focused on survey inference, but these ideas are for me a roadmap for statistics in general.

48 References
Birnbaum, A. (1962). On the Foundations of Statistical Inference. JASA, 57.
Box, G. E. P. (1980). Sampling and Bayes inference in scientific modelling and robustness (with discussion). JRSSA, 143.
Brewer, K. R. W. & Mellor, R. W. (1973). The effect of sample structure on analytical surveys. Australian J. Statist., 15.
Chen, Q., Elliott, M. R. & Little, R. J. (2010). Bayesian Penalized Spline Model-Based Estimation of the Finite Population Proportion for Probability-Proportional-to-Size Samples. Surv. Meth., 36.
DuMouchel, W. H. & Duncan, G. J. (1983). Using survey weights in multiple regression analysis of stratified samples. JASA, 78.
Ganesh, N. & Lahiri, P. (2008). A new class of average moment matching priors. Biometrika, 95, 2.
Little, R. J. (1989). On testing the equality of two independent binomial proportions. Am. Statist., 43.
Little, R. J. (1991). Inference with survey weights. JOS, 7.
Little, R. J. (2006). Calibrated Bayes: A Bayes/Frequentist Roadmap. Am. Statist., 60, 3.
Little, R. J. (2011). Calibrated Bayes, for Statistics in General, and Missing Data in Particular (with discussion and rejoinder). In press, Statist. Sci.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist., 12.
Särndal, C.-E., Swensson, B. & Wretman, J. H. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

49 And thanks to my recent students: Hyonggin An, Qi Long, Ying Yuan, Guangyu Zhang, Xiaoxi Zhang, Di An, Yan Zhou, Rebecca Andridge, Qixuan Chen, Ying Guo, Chia-Ning Wang, Nanhua Zhang. UNC 2011 SSIL
