Calibrated Bayes: spanning the divide between frequentist and Bayesian inference. Roderick J. Little
1 Calibrated Bayes: spanning the divide between frequentist and Bayesian inference Roderick J. Little
2 Outline
- Census Bureau's new Research & Methodology Directorate
- The prevailing philosophy of sample survey inference: design-model compromise
- An alternative: calibrated Bayes
- Why I prefer the alternative
UNC Calibrated Bayes for surveys
4 What is the R&M Directorate?
5 Strategic objectives
- Building a Research & Methodology Directorate that fosters innovation and plays a strategic role in Bureau activities
- Increasing collaborations across Census Bureau directorates ("breaking down the silos")
- Porting research on new products and processes to program areas
- Establishing more robust collaborations with external researchers and agencies
- Finding ways to leverage the competitive advantages of the Bureau (Title 13, access to admin data) to produce products that are in high demand
- Increasing the statistical literacy of Census Bureau data users
6 Some challenges
- Recruiting the best researchers
- Building better links between research and production
- Institutionalizing research excellence
- Letting people know that the Census Bureau has a new research directorate with exciting plans!
8 Design-based vs model-based inference
- Design-based (frequentist) inference: survey variables Y are fixed; inference is based on the sampling distribution
- Model-based inference: survey variables Y are also random and are assigned a statistical model. Two variants:
  - Superpopulation: frequentist inference based on repeated samples from the sample and superpopulation
  - Bayes: add a prior for the parameters; inference is based on the posterior distribution of finite population quantities
- Bayes is superior to superpopulation modeling in small-sample problems, but requires a choice of prior
9 Design-based survey inference
- $Y = (Y_1, \ldots, Y_N)$ = population values, treated as fixed
- $Q = Q(Y)$ = target finite population quantity
- $I = (I_1, \ldots, I_N)$ = sample inclusion indicators, random: $I_i = 1$ if unit $i$ is included in the sample, $0$ otherwise
- $Y_{inc}$ = part of $Y$ included in the survey
- $\hat q(Y_{inc}, I)$ = sample estimate of $Q$
- $\hat v(Y_{inc}, I)$ = sample estimate of the variance of $\hat q$
- $\hat q \pm 1.96\sqrt{\hat v}$ = 95% CI for $Q$ with respect to the distribution of $I$
10 Example 1: stratified sampling
- $Q(Y) = \bar Y = \sum_{j=1}^{J} P_j \bar Y_j$, the population mean
- $P_j = N_j / N$ = population proportion, $\bar Y_j$ = population mean in stratum $Z = j$
- $\Pr(I_j) = \binom{N_j}{n_j}^{-1}$ if $\sum_{i=1}^{N_j} I_{ji} = n_j$, and 0 otherwise
- $\hat q(Y_{inc}, I) = \bar y_{st} = \sum_{j=1}^{J} P_j \bar y_j$, where $\bar y_j$ = sample mean of $Y$ in stratum $j$
- $\hat v_{st}(Y_{inc}, I) = \sum_{j=1}^{J} P_j^2 (1 - n_j/N_j)\, s_j^2 / n_j$, where $s_j^2$ = sample variance of $Y$ in stratum $j$ and $(1 - n_j/N_j)$ is the finite population correction
- $\bar y_{st} \pm 1.96\sqrt{\hat v_{st}}$ = 95% CI for $\bar Y$
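The stratified estimator, its variance estimate and the 95% interval on this slide can be computed directly. The following is a minimal sketch, not code from the talk; the function name `stratified_mean_ci` and its argument layout are illustrative assumptions, and each stratum needs at least two sampled values.

```python
import math
from statistics import mean, variance

def stratified_mean_ci(samples, pop_sizes, z=1.96):
    """samples[j]: sampled y values in stratum j; pop_sizes[j]: N_j (hypothetical inputs)."""
    N = sum(pop_sizes)
    est = 0.0
    var = 0.0
    for y, N_j in zip(samples, pop_sizes):
        n_j = len(y)
        P_j = N_j / N                      # population proportion of stratum j
        est += P_j * mean(y)               # contributes P_j * ybar_j
        # P_j^2 * (1 - n_j/N_j) * s_j^2 / n_j, with finite population correction
        var += P_j ** 2 * (1 - n_j / N_j) * variance(y) / n_j
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)
```

For example, `stratified_mean_ci([[1, 2, 3], [10, 12, 14, 16]], [30, 70])` returns the estimate, the variance estimate, and the interval endpoints as a tuple.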
11 Bayesian model-based inference
- With ignorable (probability) sample designs:
  - Model $M$: $p(Y \mid Z)$ = prior distribution for $Y$
  - $Z$ = design variables (important to include in the model)
  - $p(Q(Y) \mid Z, Y_{inc})$ = posterior predictive distribution of $Q$ given $Z$, $Y_{inc}$
- Inferences about $Q$ are based on this posterior distribution
- With large samples: 95% credibility interval = $\hat q \pm 1.96\,\mathrm{SE}$
  - Estimate is the posterior mean, $\hat q = E(Q \mid Z, Y_{inc})$
  - SE is the posterior standard deviation, $\sqrt{\mathrm{Var}(Q \mid Z, Y_{inc})}$
- With small samples: 95% credibility interval = 2.5th to 97.5th percentile of the posterior distribution
- Plays the role of a confidence interval, but with a simpler interpretation
12 Parametric models
- Usually the prior is specified via parametric models: $p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$
- $p(Y \mid Z, \theta)$ = parametric model, as in the superpopulation approach
- $p(\theta \mid Z)$ = prior distribution for $\theta$
- Superpopulation models treat $\theta$ as a fixed parameter, with inference by repeated sampling from the superpopulation
13 Ex. 1 continued: Bayes for stratified samples
- Inference for $Q = \bar Y = \sum_{j=1}^{J} P_j \bar Y_j$, the population mean
- $Y_{inc}$ = data selected by stratified random sampling
- Model: $[y_i \mid z_i = j, \theta] \sim_{iid} N(\mu_j, \sigma_j^2)$; $\theta = \{\mu_j, \sigma_j^2\}$; $p(\mu_j, \log \sigma_j^2) = \text{const.}$
- Bayes theorem yields:
  - $E(\bar Y \mid Z, Y_{inc}, I) = \bar y_{st} = \sum_{j=1}^{J} P_j \bar y_j$
  - $\mathrm{Var}(\bar Y \mid Z, Y_{inc}, I) = \hat v_{st} = \sum_{j=1}^{J} P_j^2 (1 - n_j/N_j)\, s_j^2 / n_j$
- In large samples, the posterior distribution is normal, yielding the same 95% posterior probability interval as the design-based CI
- In small samples, the posterior distribution is a mixture of t's, a useful small-sample correction
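The small-sample posterior can be simulated rather than derived. The sketch below assumes the normal model and flat prior on $(\mu_j, \log \sigma_j^2)$ stated on the slide, and uses the standard results that $\sigma_j^2 \mid \text{data} \sim (n_j - 1)s_j^2/\chi^2_{n_j-1}$ and that, given $\sigma_j^2$, the finite-population stratum mean is normal with mean $\bar y_j$ and variance $(1 - n_j/N_j)\sigma_j^2/n_j$. The function names and default settings are illustrative, and each stratum needs $n_j \ge 2$.

```python
import random
from statistics import mean, variance

def draw_population_mean(samples, pop_sizes, rng):
    """One posterior draw of the population mean (inputs are hypothetical)."""
    N = sum(pop_sizes)
    total = 0.0
    for y, N_j in zip(samples, pop_sizes):
        n_j = len(y)
        # sigma_j^2 | data ~ (n_j - 1) s_j^2 / chi^2_{n_j - 1}
        chi2 = rng.gammavariate((n_j - 1) / 2, 2.0)
        sigma2 = (n_j - 1) * variance(y) / chi2
        # Ybar_j | sigma_j^2, data ~ N(ybar_j, (1 - n_j/N_j) sigma_j^2 / n_j)
        sd = ((1 - n_j / N_j) * sigma2 / n_j) ** 0.5
        total += (N_j / N) * rng.gauss(mean(y), sd)
    return total

def posterior_interval(samples, pop_sizes, draws=4000, seed=0):
    """Equal-tailed 95% interval from sorted posterior draws."""
    rng = random.Random(seed)
    sims = sorted(draw_population_mean(samples, pop_sizes, rng) for _ in range(draws))
    return sims[int(0.025 * draws)], sims[int(0.975 * draws) - 1]
```

With only a few units per stratum the simulated interval should be noticeably wider than $\bar y_{st} \pm 1.96\sqrt{\hat v_{st}}$, which is the t-type correction the slide refers to.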
14 The status quo for survey statistics
- Design-model compromise (DMC)
  - Design-based inference for large samples, descriptive statistics
  - But often model assisted, e.g. regression calibration: model estimates adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992)
  - Model-based for small area estimation, nonresponse
- In my view, this is a form of inferential schizophrenia
15 Some manifestations of inferential schizophrenia in the current survey philosophy
16 1. Statistical standards
- Census statistical standards are built from a design-based perspective
- Economists and other substantive researchers build models
- I suspect a reason why people bridle at the standards is that they have a different statistical philosophy!
- [Economists generally don't think of themselves as Bayesian, but to my mind they act like Bayesians in important respects]
17 1. Statistical standards and the Bayes/frequentist gorilla
- [Cartoon: the B/F Gorilla]
- "Follow my (frequentist) statistical standards"
- "Why? I am an economist, I build models!"
18 Which weights?
- When I was little (ha ha!) I learnt: in multiple linear regression, if the variance is not constant, weight by the inverse of the residual variance: $\mathrm{Var}(y_i) = \sigma^2 / u_i$ implies weighted LS with weight $u_i$
- Survey sampling class: OLS is wrong; weight by the inverse of the probability of selection, $w_i = 1/\pi_i$
- Model: $u_i$. Design: $w_i$. Which is right? See e.g. Brewer and Mellor (1973), DuMouchel and Duncan (1983).
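The tension is easy to see numerically: the same weighted least squares routine gives different slopes depending on whether it is fed model weights $u_i$ or design weights $w_i = 1/\pi_i$. The sketch below is illustrative only; the data, the weights and the function name `weighted_slope` are made up for the example.

```python
def weighted_slope(x, y, w):
    """Slope of the weighted least squares line of y on x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 4.0]            # deliberately not linear in x
u = [1.0, 1.0, 1.0]            # hypothetical model weights (constant variance)
pi = [0.5, 0.5, 0.05]          # hypothetical selection probabilities
w = [1.0 / p for p in pi]      # design weights 1 / pi_i

slope_model = weighted_slope(x, y, u)
slope_design = weighted_slope(x, y, w)
```

When the regression is exactly linear the two slopes agree for any weights; they diverge only when the working model is misspecified, which is where the choice of weights matters.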
19 2. When is an area small?
- [Figure: an "n-o-meter" scale of sample size n, with model-based inference below $n_0$ and design-based inference above it; $n_0$ = point of inferential schizophrenia]
- How do I choose $n_0$?
- If $n_0 = 35$, should my entire statistical philosophy be different when n = 34 and n = 36?
20 Towards the alternative: calibrated Bayes
21 Strengths of frequentist inference
- Focus on repeated sampling properties tends to yield inferences with good frequentist properties (that are well calibrated)
  - E.g. in the survey sampling setting, automatically takes into account survey design features
- No need to specify prior distributions
- Flexible range of procedures
  - Come up with a method (even Bayes), and we can assess its frequentist properties
22 Weaknesses of the frequentist paradigm
- Not prescriptive: a set of principles for assessing properties of inference procedures rather than an inferential system. Where do estimates come from? ("Mom, where do estimates come from?")
- Ambiguous about conditioning; violates the likelihood principle, which is based on compelling arguments (Birnbaum 1962)
- Design-based survey inference is largely asymptotic: no exact frequentist answers for many small-sample problems
23 Bayes is catching on (esp. for hard problems!)
Most-cited mathematicians in science (Science Watch 02):
- 2 D. L. Donoho, Stanford, Stat
- 3 A. F. M. Smith, London, Stat
- 4 E. A. Thompson, Washington, Biostat
- 5 I. M. Johnstone, Stanford, Stat
- 6 J. Fan, Hong Kong, Stat
- 7 D. B. Rubin, Harvard, Stat
- 9 A. E. Raftery, Washington, Stat
- 10 A. E. Gelfand, U. Conn, Stat
- 11 S-W Guo, Med. Coll. Wisc, Biostat
- 12 S. L. Zeger, JHU, Biostat
- 13 P. J. Green, Bristol, Stat
- 14 B. P. Carlin, Minnesota, Biostat
- 15 J. S. Marron, UNC, Stat
- 16 D. G. Clayton, Cambridge, Biostat
- 16 G. O. Roberts, Lancaster, Stat
- 20 X-L Meng, Chicago, Stat
- 21 M. P. Wand, Harvard, Biostat
- 22 W. R. Gilks, MRC, Biostat
- 23 M. Chris Jones, Open U, Stat
- 25 N. E. Breslow, Washington, Biostat
People in red are all Bayesians
24 Strengths of Bayes 1: conceptual simplicity
- Bayes theorem is direct and completely general
- Prescriptive for inferences
- Automatically optimal under the model
- Conceptually simple: predict the quantities you don't know, with measures of uncertainty
- B applies to complex problems: once the model is specified, difficulties are purely computational
- Distinguish between:
  - Posterior probability interval: the inference
  - Confidence interval: operating characteristic of the inference
25 Strengths of Bayes: avoids ancillarity angst
- Should the F reference distribution condition on ancillary statistics? Approximate ancillary statistics?
- Example: tests for independence in a 2x2 table (Little 1989)
  - Fixing one margin leads to the Pearson chi-squared test
  - Fixing two margins leads to the Fisher exact test, CC
  - Which is right?
- A survey example: sample stratum counts in poststratification
- F theory is ambiguous about the appropriate choice of reference distribution
- B avoids this problem by conditioning on the entire data set
- Conditionality leads to the likelihood principle (Birnbaum 1962), satisfied by B but not by F
26 Strengths of Bayes: nails nuisance parameters!
- Integrating over nuisance parameters is clearly the right approach; better than:
  - Maximum likelihood (misses uncertainty)
  - Profile likelihood (better, but still misses uncertainty)
  - Conditional likelihood to eliminate them (ok, but works for a limited set of problems)
  - Strict likelihoodist inference (not general enough)
- Bayes transitions smoothly between problems that are weakly identified (e.g. Heckman model) and unidentified
27 Strengths of Bayes: escape from asymptotia!
- Maximum likelihood is a large-sample approximation of Bayes
  - Observed, not expected, information
  - Prior distribution washes out
- Bayes works better in small samples
  - Student t-type corrections are automatic
  - Harder problems, e.g. inference for the second largest eigenvalue in a principal component analysis of 30 observations
  - For Bayes this is no problem; F???!
28 [Cartoon map: Asymptotia Highlands; murky subasymptotial forests]
- How many more to reach the promised land of asymptotia?
29 The standard error error
- Design-based survey methods assume large samples, and often report estimates and standard errors (or margins of error, coefficients of variation)
- This implicitly assumes that estimate ± z × se is a valid confidence interval (e.g. z = 1.96 for a 95% interval)
- But in small samples this is not true, so
- The goal is confidence intervals that have approximately the nominal coverage, not estimates and standard errors
- As a calibrated Bayesian I would say probability intervals with the correct confidence coverage, but since regular people interpret confidence intervals like probability intervals, the distinction is practically moot.
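A quick simulation illustrates the point about small samples. This is a sketch under an assumed setting that is not in the talk: independent normal data and the interval mean ± 1.96 × se, with `z_interval_coverage` a hypothetical helper name.

```python
import random
from statistics import mean, stdev

def z_interval_coverage(n, reps=5000, seed=0, z=1.96):
    """Fraction of simulated samples whose mean +/- z*se interval covers the true mean 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        half = z * stdev(y) / n ** 0.5     # z times the estimated standard error
        if abs(mean(y)) <= half:           # interval covers the true mean
            hits += 1
    return hits / reps
```

Comparing `z_interval_coverage(5)` with `z_interval_coverage(100)` shows coverage well below the nominal 95% for n = 5 and close to it for n = 100, so the estimate and standard error alone do not deliver a calibrated interval in small samples.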
30 Weakness of B: where do models come from?
- B is less effective for model formulation and assessment than for inference under a model
- For example, Bayesian hypothesis testing for comparing models of different dimension is tricky: it is sensitive to the choice of priors, and you can't just slap down a reference prior
- Hard-line subjective Bayesians claim they can make pure Bayesian model selection work, but this approach is a hard sell for scientific inference
- Most use the data for model selection, in some form
- Model formulation and assessment will never achieve the degree of clarity of Bayesian inference under an agreed model
31 Calibrated Bayes: combines strengths of design-based and model-based inference
- All inferences are model-based, but
- Select models that have good frequentist properties (e.g. design consistency) in repeated samples (are well calibrated)
- Capitalizes on strengths of both paradigms!
- Box (1980), Rubin (1984), Little (2006, 2011)

Activity | Model-based | Design-based
Inference under assumed model | Strong | Weak
Model formulation / assessment | Weak | Strong
32 Bayes/frequentist compromises
"The applied statistician should be Bayesian in principle and calibrated to the real world in practice - appropriate frequency calculations help to define such a tie. ... frequency calculations are useful for making Bayesian statements scientific, scientific in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events." Rubin (1984)
33 Applications of calibrated Bayes
- Small area estimation: SAIPE
- Inference for a proportion from PPS samples
- Survey weights derived from a Bayes model
34 Hierarchical Bayes models for small areas
- Fixed-effects models have distinct parameters (means, variances) for small areas, e.g. $y_{ai} \mid \mu_a, \sigma_a^2 \sim N(\mu_a, \sigma_a^2)$ for unit $i$ in area $a$
- Hierarchical Bayes models assign distributions to the parameters for each area: $y_{ai} \mid \mu_a, \sigma_a^2 \sim N(\mu_a, \sigma_a^2)$, $\mu_a \sim N(\beta z_a, \tau^2)$
- Treating parameters as random effects achieves shrinkage between the direct area estimate and the model prediction
- Area-level models can also be fitted (see below)
- Fully Bayes inference adds priors for variances, with improved frequentist performance (Ganesh & Lahiri 2008)
35 Multilevel models
- $\tilde\mu_a = w_a \bar y_{\pi a} + (1 - w_a)\hat\mu_a$
- [Figure: "n-o-meter" showing the weight $w_a$ running from 1 (direct estimate) to 0 (model estimate) as the sample size n decreases]
- Bayesian multilevel model estimates borrow strength increasingly from the model as n decreases
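The shrinkage behaviour can be made concrete with the usual normal-normal weight. The slide does not give a formula for $w_a$, so the form $w_a = \tau^2 / (\tau^2 + \sigma^2/n_a)$ used below is an assumption (it is the standard posterior-mean weight when the variances are treated as known), and the function names are illustrative.

```python
def shrinkage_weight(n, sigma2, tau2):
    """Weight on the direct estimate: tau^2 / (tau^2 + sigma^2 / n) (assumed form)."""
    return tau2 / (tau2 + sigma2 / n)

def composite_estimate(direct, model_pred, n, sigma2, tau2):
    """w * direct estimate + (1 - w) * model prediction."""
    w = shrinkage_weight(n, sigma2, tau2)
    return w * direct + (1 - w) * model_pred

# The weight on the direct estimate grows with the area sample size n.
weights = [shrinkage_weight(n, sigma2=4.0, tau2=1.0) for n in (1, 4, 16, 400)]
```

Under this form there is no threshold $n_0$ at which the method switches: the estimate moves continuously from the model prediction towards the direct estimate as n grows.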
36 Ex 1: SAIPE project
- Objective: provide estimates of poverty for various age groups and median household income for all states, counties, and school districts in the U.S.
- Problem: direct survey estimates (from CPS or, later, ACS) are too unreliable for many areas
  - CPS sample small for most states; no sample in 2/3 of counties
  - ACS (single year) sample small for many counties and most school districts
- Solution: use a Bayesian form of the small area model (Fay & Herriot 1979) to integrate survey data with data from admin records (IRS, SNAP program) and the previous census long form
37 Posterior variances from state model for 2004 CPS 5-17 poverty rates
Results for four states
Columns: State | $n_i$ | $v_i$ | $\mathrm{Var}(Y_i \mid \text{data})$ | approx. wt. on $y_i$ in $E(Y_i \mid \text{data})$
Rows (values incomplete in the source): CA 5, NC 1, IN MS
38 Ex 2: Estimating a proportion from a PPS sample
- [Figure: data layout for units 1, ..., n (sampled, s) and n+1, ..., N (nonsampled, ns), with columns π, I, Y]
- $\pi_i$: probability of inclusion for unit $i$, which is assumed to be known for all units in the finite population before a sample is drawn
- $I_i$: binary variable indicating which units are included in the sample
- $Y_i$: binary survey variable of interest for unit $i$
- $s$: an unequal probability random sample
- Proportion of the population for which $Y = 1$: $p = N^{-1}\sum_{i=1}^{N} Y_i$
- (Chen et al. 2010)
39 Bayesian p-spline prediction (BPSP) estimator
- Probit penalized polynomial spline model with m truncated power bases:
  $\Phi^{-1}\big(E(y_i \mid \beta, b, \pi_i)\big) = \beta_0 + \sum_{k=1}^{p} \beta_k \pi_i^k + \sum_{l=1}^{m} b_l (\pi_i - k_l)_+^p$, with $b_l \sim N(0, \tau^2)$, $l = 1, \ldots, m$, $i = 1, \ldots, n$
- The constants $k_1 < \cdots < k_m$ are m selected fixed knots
- $(u)_+^p = \{u\, I(u \ge 0)\}^p$ for any real number $u$
- Gibbs sampling is used to obtain draws from the posterior distributions of the parameters
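The linear predictor in this model is built from a polynomial part in $\pi_i$ plus m truncated power terms. A minimal sketch of one design-matrix row follows; the function name `truncated_power_basis` and the example knots are illustrative assumptions, and the Gibbs sampler itself is not shown.

```python
def truncated_power_basis(pi_i, knots, p=1):
    """Row [1, pi, ..., pi^p, (pi - k_1)_+^p, ..., (pi - k_m)_+^p] for one unit."""
    row = [pi_i ** k for k in range(p + 1)]                # polynomial terms
    row += [max(pi_i - k_l, 0.0) ** p for k_l in knots]    # truncated power terms
    return row

# Hypothetical example: quadratic spline with two knots.
row = truncated_power_basis(0.5, knots=[0.25, 0.75], p=2)
```

The coefficients of the first p + 1 entries play the role of $\beta_0, \ldots, \beta_p$, and the remaining m entries carry the penalized coefficients $b_l$.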
40 BPSP estimator (cont.)
- The posterior distribution of the population proportion can be simulated by generating a large number D of draws of the form $p^{(d)} = N^{-1}\big(\sum_{i \in s} y_i + \sum_{j \notin s} \hat y_j^{(d)}\big)$, where $\hat y_j^{(d)}$ is a draw from the posterior predictive distribution of the j-th observation in the non-sampled units
- BPSP estimator: average of these draws
- The $\alpha$ posterior probability interval splits the tail area $1 - \alpha$ equally between the upper and lower endpoints
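Given posterior draws of the success probabilities for the non-sampled units (from any sampler), the draws $p^{(d)}$, the estimator and the equal-tailed interval follow mechanically. The sketch below assumes those probability draws are already available as a list of lists; `bpsp_summary` and its arguments are hypothetical names, not the authors' code.

```python
import random

def bpsp_summary(y_sampled, prob_draws, level=0.95, seed=0):
    """prob_draws[d][j]: draw d of Pr(y_j = 1) for non-sampled unit j (assumed input)."""
    rng = random.Random(seed)
    N = len(y_sampled) + len(prob_draws[0])
    p_draws = []
    for probs in prob_draws:
        # posterior predictive draw of the non-sampled binary outcomes
        y_hat_total = sum(1 for q in probs if rng.random() < q)
        p_draws.append((sum(y_sampled) + y_hat_total) / N)
    p_draws.sort()
    D = len(p_draws)
    tail = (1 - level) / 2                 # equal tail area on each side
    lo = p_draws[int(tail * D)]
    hi = p_draws[min(D - 1, int((1 - tail) * D))]
    return sum(p_draws) / D, (lo, hi)
```

The first returned value is the average of the draws, and the second is the interval obtained by cutting equal tail areas from the sorted draws.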
41 Other estimators
- The Horvitz-Thompson estimator: $\hat p_{HT} = \sum_{i \in s} (y_i / \pi_i) \big/ \sum_{i \in s} (1 / \pi_i)$
- The prediction estimator: $\hat p_M = N^{-1}\big(\sum_{i \in s} y_i + \sum_{j \notin s} \hat y_j\big)$, where $\hat y_j$ = prediction based on a linear probit model
- The generalized regression (GR) estimator: $\hat p_{GR} = N^{-1}\big(\sum_{i=1}^{N} \hat y_i + \sum_{i \in s} (y_i - \hat y_i)/\pi_i\big)$, where $\hat y_i$ = prediction from a linear probit model
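The three comparison estimators are simple functions of the sampled outcomes, the inclusion probabilities and model predictions. In this sketch the predictions are taken as given inputs (fitting the probit model is out of scope), and all function and argument names are illustrative.

```python
def ht_estimator(y_s, pi_s):
    """Ratio form: sum(y_i / pi_i) / sum(1 / pi_i) over sampled units."""
    return sum(y / p for y, p in zip(y_s, pi_s)) / sum(1 / p for p in pi_s)

def prediction_estimator(y_s, yhat_nonsampled):
    """Observed outcomes plus model predictions for the non-sampled units."""
    N = len(y_s) + len(yhat_nonsampled)
    return (sum(y_s) + sum(yhat_nonsampled)) / N

def gr_estimator(y_s, yhat_s, pi_s, yhat_all):
    """Predictions for all N units plus design-weighted sample residuals."""
    N = len(yhat_all)
    correction = sum((y - yh) / p for y, yh, p in zip(y_s, yhat_s, pi_s))
    return (sum(yhat_all) + correction) / N
```

The GR estimator reduces to the average prediction when the weighted residuals cancel, which is how it protects against model misspecification while still using the model.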
42 Design of simulation study
- Unequal probability sampling design: PPS sampling, in which units are selected with probability proportional to a given size variable related to the survey variable under study
- Population and sample:
  - N = 2000 with sampling rates of 5% and 10% (n = 100 or 200)
  - N = 5000 with a sampling rate of 10% (n = 500)
  - The size variable X takes the values 71, 72, ..., 2070 for N = 2000, and 171, 172, ..., 5170 for N = 5000
  - The inclusion probabilities π were proportional to X
- Simulations: 1000 replicates
- Compare: empirical bias; width of posterior probability interval / CI; root mean squared error (RMSE); noncoverage rate of 95% CI
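The inclusion probabilities for this design follow from $\pi_i = n x_i / \sum_k x_k$. The slide does not say which PPS selection scheme was used, so the systematic PPS draw below is an assumption chosen for simplicity; the function names are illustrative.

```python
import random

def inclusion_probs(sizes, n):
    """pi_i = n * x_i / sum(x); valid as probabilities when every value is below 1."""
    total = sum(sizes)
    return [n * x / total for x in sizes]

def systematic_pps(sizes, n, rng):
    """Systematic PPS sample (assumed scheme): unit indices hit by points u, u+1, ..., u+n-1."""
    pi = inclusion_probs(sizes, n)
    sample = []
    cum = 0.0
    target = rng.random()              # random start in [0, 1)
    for i, p in enumerate(pi):
        cum += p                       # running total of inclusion probabilities
        if target < cum and len(sample) < n:
            sample.append(i)
            target += 1.0              # next selection point
    return sample

sizes = list(range(71, 2071))          # X = 71, ..., 2070 for N = 2000
pi = inclusion_probs(sizes, 200)
sample = systematic_pps(sizes, 200, random.Random(1))
```

Because every $\pi_i$ is below 1 here, no unit can be selected twice, and the probabilities sum to the sample size n.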
43 Population data
- Continuous data: $Z \sim N(f(\pi), 0.2^2)$
  - NULL (no association): $f(\pi_i)$ constant in $\pi_i$
  - LINUP (linear association): $f(\pi_i) = k_1 \pi_i$
  - QUAD (quadratic association): $f(\pi_i) = k_2 (\pi_i - k_3)^2$
- Binary outcomes $Y_1, Y_2, Y_3, Y_4, Y_5$ created by using the superpopulation 10th, 25th, 50th, 75th and 90th percentiles of Z as cut-off values. For instance, $Y_1$ equals 1 if Z is less than its superpopulation 10th percentile, and 0 otherwise.
- These correspond to true proportions p = 0.1, 0.25, 0.5, 0.75, 0.9
44 RMSEs (low = good)
Columns: Population | Sample size | True prop. | HT | BPSP | PR | GR
Rows (values incomplete in the source): NULL, N=200, n=100; LINUP, N=200, n=100; QUAD, N=200, n=
45 Interval noncoverages (nominal = 5)
Columns: Population | Sample size | True prop. | HT | BPSP | PR | GR
Rows (values incomplete in the source): NULL, N=200, n=100; LINUP, N=200, n=100; QUAD, N=200, n=
46 Ex 3. Back to weights in regression
- Z = weight stratifier, within which weights are constant
- If Z is included in the covariates, design weighting is not needed, but correct modeling of the relationship between Y and Z is key
- If Z is not included in the covariates, assume:
  - Target quantities are OLS slopes of Y on X fitted to the full population
  - Working model needs to condition on Z: different regressions in weight strata
  - Resulting model-based inference for the targets includes design weights! (Little, 1991)
47 Summary
- Philosophies of inference matter!
- A cohesive philosophy of statistics would be nice!
- Bayes and frequentist ideas are both important for good statistical inference
- The calibrated Bayes compromise capitalizes on the strengths of the Bayes and frequentist paradigms
- Focused on survey inference, but these ideas are for me a roadmap for statistics in general
48 References
- Birnbaum, A. (1962). On the Foundations of Statistical Inference. JASA, 57.
- Box, GEP (1980). Sampling and Bayes inference in scientific modelling and robustness (with discussion). JRSSA, 143.
- Brewer, KRW & Mellor, RW (1973). The effect of sample structure on analytical surveys. Australian J. Statist., 15.
- Chen, Q, Elliott, MR & Little, RJ (2010). Bayesian Penalized Spline Model-Based Estimation of the Finite Population Proportion for Probability-Proportional-to-Size Samples. Surv. Meth., 36.
- DuMouchel, WH & Duncan, GJ (1983). Using survey weights in multiple regression analysis of stratified samples. JASA, 78.
- Ganesh, N & Lahiri, P (2008). A new class of average moment matching priors. Biometrika, 95, 2.
- Little, RJ (1989). On testing the equality of two independent binomial proportions. Am. Statist., 43.
- Little, RJ (1991). Inference with survey weights. JOS, 7.
- Little, RJ (2006). Calibrated Bayes: A Bayes/Frequentist Roadmap. Am. Statist., 60, 3.
- Little, RJ (2011). Calibrated Bayes, for Statistics in General, and Missing Data in Particular (with discussion and rejoinder). In press, Statist. Sci.
- Rubin, DB (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist., 12.
- Särndal, C-E, Swensson, B & Wretman, JH (1992). Model Assisted Survey Sampling. Springer Verlag: New York.
49 and thanks to my recent students: Hyonggin An, Qi Long, Ying Yuan, Guangyu Zhang, Xiaoxi Zhang, Di An, Yan Zhou, Rebecca Andridge, Qixuan Chen, Ying Guo, Chia-Ning Wang, Nanhua Zhang
UNC 2011 SSIL
Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Matt Williams National Agricultural Statistics Service United States Department of Agriculture Matt.Williams@nass.usda.gov
More informationBayesian Econometrics
Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationCarl N. Morris. University of Texas
EMPIRICAL BAYES: A FREQUENCY-BAYES COMPROMISE Carl N. Morris University of Texas Empirical Bayes research has expanded significantly since the ground-breaking paper (1956) of Herbert Robbins, and its province
More informationIntroduction to Econometrics. Review of Probability & Statistics
1 Introduction to Econometrics Review of Probability & Statistics Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com Introduction 2 What is Econometrics? Econometrics consists of the application of mathematical
More informationBAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS
BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA
More informationCausal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions
Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationBayesian spatial quantile regression
Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationarxiv: v1 [math.st] 28 Feb 2017
Bridging Finite and Super Population Causal Inference arxiv:1702.08615v1 [math.st] 28 Feb 2017 Peng Ding, Xinran Li, and Luke W. Miratrix Abstract There are two general views in causal analysis of experimental
More informationSPRING 2007 EXAM C SOLUTIONS
SPRING 007 EXAM C SOLUTIONS Question #1 The data are already shifted (have had the policy limit and the deductible of 50 applied). The two 350 payments are censored. Thus the likelihood function is L =
More informationBootstrap and Parametric Inference: Successes and Challenges
Bootstrap and Parametric Inference: Successes and Challenges G. Alastair Young Department of Mathematics Imperial College London Newton Institute, January 2008 Overview Overview Review key aspects of frequentist
More informationMonte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics
Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,
More informationShort Questions (Do two out of three) 15 points each
Econometrics Short Questions Do two out of three) 5 points each ) Let y = Xβ + u and Z be a set of instruments for X When we estimate β with OLS we project y onto the space spanned by X along a path orthogonal
More informationMiscellanea A note on multiple imputation under complex sampling
Biometrika (2017), 104, 1,pp. 221 228 doi: 10.1093/biomet/asw058 Printed in Great Britain Advance Access publication 3 January 2017 Miscellanea A note on multiple imputation under complex sampling BY J.
More informationVARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA
Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University
More informationData Integration for Big Data Analysis for finite population inference
for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationDiscussion of Papers on the Extensions of Propensity Score
Discussion of Papers on the Extensions of Propensity Score Kosuke Imai Princeton University August 3, 2010 Kosuke Imai (Princeton) Generalized Propensity Score 2010 JSM (Vancouver) 1 / 11 The Theme and
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationMulti-level Models: Idea
Review of 140.656 Review Introduction to multi-level models The two-stage normal-normal model Two-stage linear models with random effects Three-stage linear models Two-stage logistic regression with random
More informationBayesian methods in economics and finance
1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent
More informationA note on Reversible Jump Markov Chain Monte Carlo
A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction
More informationIntroduction to Survey Data Integration
Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5
More informationDiscussion on Fygenson (2007, Statistica Sinica): a DS Perspective
1 Discussion on Fygenson (2007, Statistica Sinica): a DS Perspective Chuanhai Liu Purdue University 1. Introduction In statistical analysis, it is important to discuss both uncertainty due to model choice
More informationCombining multiple observational data sources to estimate causal eects
Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,
More informationData Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We
More informationPropensity Score Weighting with Multilevel Data
Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative
More informationHST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007
MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationGENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University
GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,