Random permutation models with auxiliary variables. Design-based random permutation models with auxiliary information. Wenjun Li
Running head: Random permutation models with auxiliary variables

Design-based random permutation models with auxiliary information

Wenjun Li
Division of Preventive and Behavioral Medicine, University of Massachusetts Medical School, Shaw Building SH2-230, 55 Lake Avenue North, Worcester, MA 01655, USA. Telephone: (508) Fax: (508)

Edward J. Stanek
Department of Public Health, University of Massachusetts, 715 N. Pleasant Street, Amherst, MA 01003, USA. Telephone: (413) Fax: (413)

Julio da Motta Singer
Departamento de Estatística, Universidade de São Paulo, Caixa Postal 6628, São Paulo, SP, Brazil. Phone: Fax:

LiW_06.doc 9/27/2006 3:03 PM Page 1 of 15
Abstract

We extend the random permutation model proposed by Stanek, Singer and Lencina (2004) to obtain best linear unbiased estimators of a finite population mean under simple random sampling without replacement in situations where auxiliary information is available. The procedure provides a systematic design-based justification for well-known results involving common estimators and may serve as the basis for extending such a minimum-assumption theory to more complicated sample designs.

Keywords: auxiliary variable; design-based inference; prediction; finite sampling; random permutation model; simultaneous permutation
1. Introduction

Improvements in the precision of estimates of population parameters based on random samples can be made by accounting for auxiliary information (e.g., age, gender). Many estimators with such features have been proposed, but they either require assumptions beyond those pertaining to the sample design or lack an integrated theory. For example, model-based approaches (Ghosh and Rao 1994, Rao 1997) that generate best linear unbiased predictors (BLUP) ignore the sample design but require a postulated model. Model-assisted approaches that lead to generalized regression (GREG) estimators (Särndal, Swensson and Wretman 1992) require additional superpopulation model assumptions. Calibration estimators, on the other hand, optimize benchmark weights by adjusting them to known population quantities on some set of auxiliary variables, but lack an integrated theory. Other design-based approaches consider the finite population as a sample realization from an infinite population, and thus make additional assumptions beyond the sampling design (Fuller 2002).

We develop a design-based estimator of a linear function of the response that accounts for auxiliary information and requires no assumptions beyond those defining simple random sampling. The development extends the use of the random permutation model (Stanek and Singer 2004, Stanek, Singer and Lencina 2004) to account for auxiliary variables. Under this minimal-assumption setup, the results establish that the commonly used estimator (Cochran 1977) is
BLUE. In addition, the development highlights novel ideas that emerge from the random permutation model framework. The first is the expression of population parameters as sums of random variables. Another is the classification of the underlying random variables into those that will be realized and those that need to be predicted.

This paper is organized as follows. We first present definitions and notation, and introduce the random permutation model. We next include multiple auxiliary variables, and use the model to derive the best linear unbiased estimator (BLUE) of the population mean. We conclude with an example and discussion.

2. A Design-Based Model for Simple Random Sampling

We represent sampling formally by a set of indicator random variables whose partial realization specifies a selected sample. These random variables permute the units in the population, and hence we refer to the underlying stochastic model as a random permutation model. Elements of the population, including the response of interest and the auxiliary variables, are non-stochastic, but not necessarily observed. The population is represented as a vector of random variables. We use the stochastic model for these random variables to develop an optimal estimator of a parameter defined in the population, assuming the population mean is known for the auxiliary variables. Unlike
previous work (Fuller 2002), our definition does not require the population to be a random sample from some infinite population.

Let the population consist of $N$ subjects, indexed by $s = 1, 2, \ldots, N$, with noninformative labels. A non-stochastic, potentially observable vector $z_s = (z_{ks})$, $k = 0, 1, \ldots, p$, is associated with subject $s$, where $z_{0s} = y_s$ denotes the outcome of interest, and $z_{ks} = x_{ks} - \mu_k$ for $k = 1, \ldots, p$ denote auxiliary variables (centered at zero), with $x_s = (x_{ks})$ and $\mu_x = (\mu_k) = \frac{1}{N}\sum_{s=1}^{N} x_s$. The mean of the auxiliary variables is assumed known in the population. We represent the population vector of means by $\mu_z = (\mu\ \ 0_p')'$, where $\mu = \frac{1}{N}\sum_{s=1}^{N} y_s$, and the population variance by

$$\Sigma = \frac{1}{N-1}\sum_{s=1}^{N}(z_s - \mu_z)(z_s - \mu_z)' = \begin{pmatrix} \sigma^2 & \sigma_X' \\ \sigma_X & \Sigma_X \end{pmatrix},$$

where $\sigma_X = (\sigma_{x_1 y}\ \sigma_{x_2 y}\ \cdots\ \sigma_{x_p y})'$ and $\Sigma_X = (\sigma_{x_k x_{k^*}})$.

We define the random permutation model as the set of all possible equally likely permutations of subjects in the population. Following Stanek, Singer and Lencina (2004), we explicitly define a set of indicator random variables $U_{is}$, $i, s = 1, 2, \ldots, N$, that have a value of one if subject $s$ is in position $i$ in a permutation, and zero otherwise. Using this notation, we define $Z_i' = \sum_{s=1}^{N} U_{is} z_s' = U_i' z$, where $U_i = (U_{i1}\ U_{i2}\ \cdots\ U_{iN})'$ and $z = (z_1\ z_2\ \cdots\ z_N)'$, and
$Z = Uz$, where $U = (U_1\ U_2\ \cdots\ U_N)'$. We refer to $U$ as a random permutation matrix, and require each realization to be equally likely, subject to the constraints that one unit is allocated to each position, $U 1_N = 1_N$, and all units are assigned to a position, $1_N' U = 1_N'$, where $1_N$ is an $N \times 1$ vector with all elements equal to 1. Taking the expectation over all possible permutations, $E(Z) = 1_N \mu_z'$ and $\operatorname{cov}(\operatorname{vec}(Z)) = \Sigma \otimes P_{N,N}$, where $P_{a,b} = I_a - \frac{1}{b} J_a$, $I_a$ is an $a \times a$ identity matrix, and $J_a = 1_a 1_a'$. The random variables in $Z$ represent a full permutation of the subjects in the population, with the first column, $Y$, representing the response, and the remaining columns, $X_k$, $k = 1, \ldots, p$, representing auxiliary variables. Note that subjects are not identifiable in this representation.

Without loss of generality, we assume that the sample corresponds to the random variables in rows $i = 1, \ldots, n$, i.e., $Z_I$, with the remainder in the rows corresponding to $i = n+1, \ldots, N$, i.e., $Z_{II}$, where $Z = (Z_I'\ \ Z_{II}')'$. This notation explicitly represents the process of simple random sampling by a stochastic model. We simplify estimation by defining a column expansion of the random variables for the sample, $\mathbf{Z}_I = \operatorname{vec}(Z_I)$, and for the remainder, $\mathbf{Z}_{II} = \operatorname{vec}(Z_{II})$. The expected values of the sample and remaining random variables are $E(\mathbf{Z}_I) = G_I \mu$ and
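$E(\mathbf{Z}_{II}) = G_{II}\mu$, where $G_I = (1_n'\ \ 0_{np}')'$ and $G_{II} = (1_{N-n}'\ \ 0_{(N-n)p}')'$. The covariance structure is

$$\operatorname{var}\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix} = \begin{pmatrix} V_I & V_{I,II} \\ V_{II,I} & V_{II} \end{pmatrix},$$

where $V_I = \Sigma \otimes P_{n,N}$, $V_{II} = \Sigma \otimes P_{(N-n),N}$, and $V_{I,II} = V_{II,I}' = -\frac{1}{N}\,\Sigma \otimes J_{n(N-n)}$, with $J_{n(N-n)} = 1_n 1_{N-n}'$. Consequently, the partitioned model that reflects simple random sampling can be represented as

$$\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix} = \begin{pmatrix} G_I \\ G_{II} \end{pmatrix}\mu + \mathbf{E}. \qquad (1)$$

3. BLUE of a linear function

Our interest lies in linear functions of the permuted response variate, namely $\theta = \sum_{i=1}^{N} c_i Y_i = c'Y$, or equivalently,

$$\theta = C_I \mathbf{Z}_I + C_{II} \mathbf{Z}_{II}, \qquad (2)$$

where $c = (c_I'\ \ c_{II}')'$, $C_I = (c_I'\ \ 0_{np}')$, and $C_{II} = (c_{II}'\ \ 0_{(N-n)p}')$. For example, when $c_i = 1/N$ for all $i = 1, \ldots, N$, $\theta = \mu$; when $c_i = 1$ for all $i = 1, \ldots, N$, $\theta$ is the population total; or when $c_i = 1/n^*$ for $i = 1, \ldots, n^* < n$ and $c_i = 0$ otherwise, $\theta$ may correspond to the mean response for an interviewer of the first $n^*$ sample subjects. After sampling, only $C_{II}\mathbf{Z}_{II}$ will be unknown; thus, estimating $\theta$ is equivalent to predicting $C_{II}\mathbf{Z}_{II}$. Following Royall's prediction approach (Royall 1976), we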
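develop the best linear unbiased predictor (BLUP) of $C_{II}\mathbf{Z}_{II}$ which, when added to $C_I\mathbf{Z}_I$, generates the BLUE of $\theta$. We require the predictor to be a linear function of the sample, i.e., $w'\mathbf{Z}_I$, to be an unbiased predictor of $C_{II}\mathbf{Z}_{II}$, i.e., $E(w'\mathbf{Z}_I) = E(C_{II}\mathbf{Z}_{II})$, and to have minimum expected mean squared error (MSE). As a result, the estimator of $\theta$ can be expressed as $P = C_I\mathbf{Z}_I + w'\mathbf{Z}_I$. The unbiasedness constraint implies that $w'G_I - C_{II}G_{II} = 0$. The variance of $P$ (that is, of the prediction error $P - \theta$) is given by

$$\operatorname{var}(P) = w'V_I w - 2w'V_{I,II}C_{II}' + C_{II}V_{II}C_{II}'.$$

We then apply Royall's prediction theorem (Royall 1976) to find the value of $w$ that minimizes

$$\Phi(w, \lambda) = w'V_I w - 2w'V_{I,II}C_{II}' + 2\left(w'G_I - C_{II}G_{II}\right)\lambda,$$

where $\lambda$ is a Lagrange multiplier. The unique solution is

$$\hat w = V_I^{-1}\left\{V_{I,II} + G_I\left(G_I'V_I^{-1}G_I\right)^{-1}\left(G_{II}' - G_I'V_I^{-1}V_{I,II}\right)\right\}C_{II}' = \frac{\bar c_{II}}{f}\begin{pmatrix} (1-f)\,1_n \\ -\beta \otimes 1_n \end{pmatrix}, \qquad (3)$$

where $\bar c_{II} = \frac{1}{N-n}\sum_{i=n+1}^{N} c_i$, $f = n/N$, and $\beta = \Sigma_X^{-1}\sigma_X = (\beta_1\ \beta_2\ \cdots\ \beta_p)'$. Consequently,

$$\hat P = c_I'Y_I + (N-n)\,\bar c_{II}\left[\bar Y - \frac{1}{1-f}\sum_{k=1}^{p}\beta_k\left(\bar X_k - \mu_k\right)\right], \qquad (4)$$

where $\bar Y$ and $\bar X_k$, $k = 1, \ldots, p$, are sample means. The variance is given by

$$\operatorname{var}(\hat P) = \left[c_{II}'c_{II} - (N-n)\,\bar c_{II}^{\,2}\right]\sigma^2 + \frac{(N-n)^2\,\bar c_{II}^{\,2}}{n(1-f)}\left(1-\rho_X^2\right)\sigma^2, \qquad (5)$$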
where $\rho_X^2 = \sigma_X'\Sigma_X^{-1}\sigma_X / \sigma^2$ is the squared multiple correlation coefficient of $Y$ on $X$. In practical applications, $\beta$, $\sigma^2$, and $\rho_X^2$ are not known, and must be replaced by sample estimators.

4. Example

As an example, suppose we are interested in estimating the mean response given by $\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$ based on a simple random sample, accounting for auxiliary information. Since $c_i = 1/N$ for $i = 1, \ldots, N$, the BLUE is

$$\hat P = f\,\bar Y + (1-f)\left[\bar Y - \frac{1}{1-f}\sum_{k=1}^{p}\beta_k\left(\bar X_k - \mu_k\right)\right], \qquad (6)$$

or equivalently, $\hat P = \bar Y - \sum_{k=1}^{p}\beta_k(\bar X_k - \mu_k)$, with variance

$$\operatorname{var}(\hat P) = \frac{1-f}{n}\left(1-\rho_X^2\right)\sigma^2. \qquad (7)$$

As a practical application, suppose there is interest in estimating the smoking rate $\mu = \pi_y$ in a population based on a simple random sample with both smoking status (1 = smoker, 0 = non-smoker) and gender (1 = male, 0 = female) recorded on the sample subjects. We assume that the proportion of males in the population, $\mu_x = \pi_x$, is known, and represent the sample estimate of the proportion smoking as $\bar Y = \hat\pi_y$, the proportion of males in the sample as $\bar X = \hat\pi_x$, and the proportion of sample subjects who are male smokers as $\hat\pi_{xy}$. With this notation, $(n-1)\hat\sigma^2 = n\hat\pi_y(1-\hat\pi_y)$, $(n-1)\hat\sigma_x^2 = n\hat\pi_x(1-\hat\pi_x)$, and
$(n-1)\hat\sigma_{xy} = n(\hat\pi_{xy} - \hat\pi_x\hat\pi_y)$. Using these estimators, we estimate $\beta$ by

$$b = \frac{\hat\pi_{xy} - \hat\pi_x\hat\pi_y}{\hat\pi_x(1-\hat\pi_x)} = \hat\pi_{[\text{male}]} - \hat\pi_{[\text{female}]},$$

which is the estimated difference in male and female prevalences based on the sample. Substituting these expressions into the estimator in (6) and (7) results in

$$\hat P = \hat\pi_y - b\,(\hat\pi_x - \pi_x) = \hat\pi_y - \left(\hat\pi_{[\text{male}]} - \hat\pi_{[\text{female}]}\right)(\hat\pi_x - \pi_x),$$

which, since $\hat\pi_y = \hat\pi_x\hat\pi_{[\text{male}]} + (1-\hat\pi_x)\hat\pi_{[\text{female}]}$, equals $\pi_x\hat\pi_{[\text{male}]} + (1-\pi_x)\hat\pi_{[\text{female}]}$: the well-known post-stratified estimator. Its estimated variance is $\widehat{\operatorname{var}}(\hat P) = \frac{1-f}{n}\left(1-\hat\rho^2\right)\hat\sigma^2$, where

$$\hat\rho^2 = \frac{(\hat\pi_{xy} - \hat\pi_x\hat\pi_y)^2}{\hat\pi_x(1-\hat\pi_x)\,\hat\pi_y(1-\hat\pi_y)}.$$

5. Discussion

We have shown that the estimator (4) is the best linear unbiased estimator (BLUE) of a linear combination of the response under simple random sampling without replacement. The results establish that the commonly used estimator developed under alternative frameworks is BLUE. The estimator has the same form as those commonly seen in multiple linear regression models that do not account for the finite population (Graybill 1976), but includes a finite population correction factor in the variance. Results (6) and (7) are also identical to those for difference estimators with optimal coefficients (Montanari 1987), to the GREG estimator (Särndal, Swensson and Wretman 1992), and to the multiple regression estimator developed under a superpopulation model (Fuller 2002).
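As a numerical illustration of the Section 4 identity (added here, not part of the original analysis), the Python sketch below computes the plug-in regression form $\hat\pi_y - b(\hat\pi_x - \pi_x)$ and the post-stratified form from the same binary sample, confirms they agree, and evaluates the plug-in version of the variance (7). The ten-subject sample, the known proportion $\pi_x = 0.48$, the population size $N = 1000$, and all function names are hypothetical; NumPy is assumed.

```python
import numpy as np

def regression_estimate(y, x, pi_x):
    """Plug-in form of (6) with one binary auxiliary: ybar - b (xbar - pi_x)."""
    s = np.cov(y, x)                  # 2 x 2 sample covariance matrix (divisor n - 1)
    b = s[0, 1] / s[1, 1]             # sample regression slope of y on x
    return y.mean() - b * (x.mean() - pi_x), b

def post_stratified_estimate(y, x, pi_x):
    """pi_x * (male smoking rate) + (1 - pi_x) * (female smoking rate)."""
    p_male, p_female = y[x == 1].mean(), y[x == 0].mean()
    return pi_x * p_male + (1 - pi_x) * p_female, p_male - p_female

def variance_estimate(y, x, N):
    """Plug-in version of (7): (1 - f)(1 - rho^2) sigma^2 / n."""
    n = len(y)
    s = np.cov(y, x)
    rho2 = s[0, 1] ** 2 / (s[0, 0] * s[1, 1])
    return (1 - n / N) * (1 - rho2) * s[0, 0] / n

# Hypothetical sample: smoking status (1 = smoker) and gender (1 = male).
y = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0], dtype=float)
x = np.array([1, 1, 0, 1, 0, 0, 0, 1, 1, 0], dtype=float)
pi_x, N = 0.48, 1000

reg, b = regression_estimate(y, x, pi_x)
ps, diff = post_stratified_estimate(y, x, pi_x)
print(round(reg, 6), round(ps, 6), round(b, 6), round(diff, 6))  # 0.392 0.392 0.4 0.4
print(round(variance_estimate(y, x, N), 6))                       # 0.022
```

With a single binary auxiliary, the fitted slope is exactly the difference in subgroup prevalences, which is why the two forms coincide; as Section 3 notes, $\beta$, $\sigma^2$ and $\rho_X^2$ are replaced by sample estimators here.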
The survey sampling literature has struggled to reconcile design-based and model-based theories of estimation/prediction. Model-based methods recently popularized by Valliant, Dorfman and Royall (2000) stem from the prediction approach developed by Royall (1973, 1976). The underlying theoretical structure is important, since it allows such methods to be extended relatively easily to different applications of increasing complexity. The limitation of such theory is that it does not account for the sample design. A similar unifying theory has not been developed for design-based methods. Cochran's original approach (Cochran 1977) was to postulate a linear regression model, and then determine the regression coefficients by minimizing the variance. Other approaches, such as the GREG or the calibration approaches (Särndal, Swensson and Wretman 1992), have combined model-based and design-based ideas, or begun with ad hoc functional forms of estimators and optimized them in special settings. These approaches have been successful in addressing many practical problems in a design-based framework (Särndal, Swensson and Wretman 1992, Brewer 2002). However, they have not provided a consistent conceptual and theoretical basis that can be readily extended to more complex applications.
We believe that representing the sample design via a random permutation model, and then predicting functions of unobserved subjects in a systematic way, provides an appealing, straightforward foundation for finite population inference. There are steps in this process that break with tradition, such as expressing a parameter as a sum of random variables. Focusing attention on predicting unobserved quantities is certainly intuitively satisfying, but unusual in the context of estimation. The development also blurs the distinction between the traditional use of the terms predictor (for random variables) and estimator (for parameters).

We have illustrated how the design-based random permutation model theory can be extended to include auxiliary variables in a straightforward manner. These results extend the scope of previous results (Stanek and Singer 2004, Stanek, Singer and Lencina 2004) to a broader class of problems. The previous developments of the theory have identified subtleties in interpreting random effects in simple random sampling (Stanek, Singer and Lencina 2004) and developed predictors of realized random effects in balanced two-stage sampling problems with response error (Stanek and Singer 2004). Current research is extending these results to clustered population settings where clusters are of different sizes and there is unequal probability sampling, and to settings where there are missing data. In each case, a similar approach is considered, with estimators (or predictors) developed via a clear optimization theory.
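To make the random permutation representation concrete, the following Python sketch (an illustration added here, not code from the paper) enumerates all $N!$ equally likely permutations of a hypothetical six-subject population with one auxiliary variable, treats positions $1, \ldots, n$ as the sample, and checks three claims exactly: $\operatorname{cov}(Y) = \sigma^2 P_{N,N}$, unbiasedness of the estimator (4), and the variance expression (5). The population values, the weights $c$, and the function name `blue` are assumptions; $\beta$, $\sigma^2$ and $\rho_X^2$ are taken as the known population values (divisor $N-1$).

```python
import itertools
import numpy as np

# Hypothetical population of N = 6 subjects: response y and one auxiliary x.
y = np.array([3.0, 5.0, 2.0, 8.0, 6.0, 4.0])
x = np.array([[1.0], [2.0], [1.0], [4.0], [3.0], [2.0]])
N, n = len(y), 3
f = n / N
c = np.array([0.10, 0.30, 0.20, 0.10, 0.20, 0.10])  # arbitrary weights on positions

# Population quantities, with divisor N - 1 as in Section 2.
mu_x = x.mean(axis=0)
S = np.cov(np.column_stack([y, x]), rowvar=False)
sigma2, sigma_X, Sigma_X = S[0, 0], S[1:, 0], S[1:, 1:]
beta = np.linalg.solve(Sigma_X, sigma_X)
rho2 = sigma_X @ beta / sigma2

def blue(y_s, x_s):
    """Estimator (4) from the first n positions, using the population beta."""
    adj = y_s.mean() - beta @ (x_s.mean(axis=0) - mu_x) / (1 - f)
    return c[:n] @ y_s + c[n:].sum() * adj

# Enumerate all N! equally likely permutations; positions 1..n form the sample.
Y_perm, errors = [], []
for perm in itertools.permutations(range(N)):
    idx = list(perm)
    Y_perm.append(y[idx])
    errors.append(blue(y[idx][:n], x[idx][:n]) - c @ y[idx])
Y_perm, errors = np.array(Y_perm), np.array(errors)

# Exact moments over the permutation distribution (bias=True: divisor N!).
cov_Y = np.cov(Y_perm, rowvar=False, bias=True)
P_NN = np.eye(N) - np.ones((N, N)) / N
c_II = c[n:]
cbar = c_II.mean()
mse_5 = ((c_II @ c_II - (N - n) * cbar**2) * sigma2
         + (N - n)**2 * cbar**2 / (n * (1 - f)) * (1 - rho2) * sigma2)

print(np.allclose(cov_Y, sigma2 * P_NN))     # covariance of Y is sigma^2 P_{N,N}
print(abs(errors.mean()) < 1e-9)             # (4) is unbiased for theta
print(np.isclose((errors**2).mean(), mse_5)) # mean squared error matches (5)
```

Brute-force enumeration is only feasible for very small $N$ (here $6! = 720$ permutations); for realistic populations one would draw permutations at random, which gives Monte Carlo rather than exact checks.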
In practice, covariances in the expressions for the estimators need to be estimated. Some simulation study results on the impact of such estimation are given by Li (2003). The resulting estimator coincides with those developed by GREG or calibration approaches, and strengthens the appeal of the random permutation model. Still, much more work is needed to extend the methods to more complex settings, including two-stage designs with cluster and unit covariates, longitudinal studies, and settings where units are randomized to treatments. We consider the basic results developed here to provide a foundation for additional work in these directions.

6. Acknowledgements

This research was partially supported by an NIH grant (NIH-PHS-R01-HD36848). The authors wish to thank Drs. John Buonaccorsi and Carol Bigelow for their constructive comments. The content of this article is part of the first author's dissertation conducted at the Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts.
7. References

Brewer, K. R. W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, London; New York: Arnold; distributed in the United States of America by Oxford University Press.
Cochran, W. G. (1977), Sampling Techniques (Third ed.), New York: John Wiley and Sons.
Fuller, W. A. (2002), "Regression Estimation for Survey Samples," Survey Methodology, 28.
Ghosh, M., and Rao, J. N. K. (1994), "Small Area Estimation: An Appraisal," Statistical Science, 9.
Graybill, F. A. (1976), Theory and Application of the Linear Model, Belmont, CA: Wadsworth Publishing Company, Inc.
Li, W. (2003), "Use of Random Permutation Model in Rate Estimation and Standardization," Ph.D. Dissertation, University of Massachusetts, Department of Biostatistics and Epidemiology.
Montanari, G. E. (1987), "Post-Sampling Efficient QR-Prediction in Large-Sample Surveys," International Statistical Review, 55.
Rao, J. N. K. (1997), "Developments in Sample Survey Theory: An Appraisal," Canadian Journal of Statistics, 25, 1-21.
Royall, R. M. (1973), "The Prediction Approach to Finite Population Sampling Theory: Application to the Hospital Discharge Survey," Technical report, National Center for Health Statistics, Office of Statistical Methods.
Royall, R. M. (1976), "The Linear Least-Squares Prediction Approach to Two-Stage Sampling," Journal of the American Statistical Association, 71.
Särndal, C. E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, New York: Springer-Verlag.
Stanek, E. J., and Singer, J. M. (2004), "Predicting Random Effects from Finite Population Clustered Samples with Response Error," Journal of the American Statistical Association, 99.
Stanek, E. J., Singer, J. M., and Lencina, V. B. (2004), "A Unified Approach to Estimation and Prediction under Simple Random Sampling," Journal of Statistical Planning and Inference, 121.
Valliant, R., Dorfman, A. H., and Royall, R. M. (2000), Finite Population Sampling and Inference: A Prediction Approach, New York: John Wiley & Sons.
More informationFrom the help desk: It s all about the sampling
The Stata Journal (2002) 2, Number 2, pp. 90 20 From the help desk: It s all about the sampling Allen McDowell Stata Corporation amcdowell@stata.com Jeff Pitblado Stata Corporation jsp@stata.com Abstract.
More informationCombining data from two independent surveys: model-assisted approach
Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,
More informationarxiv: v1 [math.st] 28 Feb 2017
Bridging Finite and Super Population Causal Inference arxiv:1702.08615v1 [math.st] 28 Feb 2017 Peng Ding, Xinran Li, and Luke W. Miratrix Abstract There are two general views in causal analysis of experimental
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationMATH 680 Fall November 27, Homework 3
MATH 680 Fall 208 November 27, 208 Homework 3 This homework is due on December 9 at :59pm. Provide both pdf, R files. Make an individual R file with proper comments for each sub-problem. Subgradients and
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75
More informationSociedad de Estadística e Investigación Operativa
Sociedad de Estadística e Investigación Operativa Test Volume 14, Number 2. December 2005 Estimation of Regression Coefficients Subject to Exact Linear Restrictions when Some Observations are Missing and
More informationMultiple Comparison Testing for Experimental Chemotherapy Based on Multivariate Covariance Analysis
Journal of Statistical and Econometric Methods, vol., no., 0, -0 ISSN: 9-0 (print), 9-99 (online) Scienpress Ltd, 0 Multiple Comparison Testing for Experimental Chemotherap Based on Multivariate Covariance
More informationSupplement-Sample Integration for Prediction of Remainder for Enhanced GREG
Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG Abstract Avinash C. Singh Division of Survey and Data Sciences American Institutes for Research, Rockville, MD 20852 asingh@air.org
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More informationPooling multiple imputations when the sample happens to be the population.
Pooling multiple imputations when the sample happens to be the population. Gerko Vink 1,2, and Stef van Buuren 1,3 arxiv:1409.8542v1 [math.st] 30 Sep 2014 1 Department of Methodology and Statistics, Utrecht
More informationObnoxious lateness humor
Obnoxious lateness humor 1 Using Bayesian Model Averaging For Addressing Model Uncertainty in Environmental Risk Assessment Louise Ryan and Melissa Whitney Department of Biostatistics Harvard School of
More informationarxiv: v1 [math.st] 22 Dec 2018
Optimal Designs for Prediction in Two Treatment Groups Rom Coefficient Regression Models Maryna Prus Otto-von-Guericke University Magdeburg, Institute for Mathematical Stochastics, PF 4, D-396 Magdeburg,
More informationIncorporating published univariable associations in diagnostic and prognostic modeling
Incorporating published univariable associations in diagnostic and prognostic modeling Thomas Debray Julius Center for Health Sciences and Primary Care University Medical Center Utrecht The Netherlands
More informationA flexible two-step randomised response model for estimating the proportions of individuals with sensitive attributes
A flexible two-step randomised response model for estimating the proportions of individuals with sensitive attributes Anne-Françoise Donneau, Murielle Mauer Francisco Sartor and Adelin Albert Department
More informationTopic 3 Populations and Samples
BioEpi540W Populations and Samples Page 1 of 33 Topic 3 Populations and Samples Topics 1. A Feeling for Populations v Samples 2 2. Target Populations, Sampled Populations, Sampling Frames 5 3. On Making
More informationESTP course on Small Area Estimation
ESTP course on Small Area Estimation Statistics Finland, Helsinki, 29 September 2 October 2014 Topic 1: Introduction to small area estimation Risto Lehtonen, University of Helsinki Lecture topics: Monday
More informationDependence and scatter-plots. MVE-495: Lecture 4 Correlation and Regression
Dependence and scatter-plots MVE-495: Lecture 4 Correlation and Regression It is common for two or more quantitative variables to be measured on the same individuals. Then it is useful to consider what
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationComparison of Two Ratio Estimators Using Auxiliary Information
IOR Journal of Mathematics (IOR-JM) e-in: 78-578, p-in: 39-765. Volume, Issue 4 Ver. I (Jul. - Aug.06), PP 9-34 www.iosrjournals.org omparison of Two Ratio Estimators Using Auxiliar Information Bawa, Ibrahim,
More informationChapter 3: Element sampling design: Part 1
Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014 Simple random sampling 1 Simple random sampling 2 SRS with replacement 3 Systematic sampling Kim Ch. 3: Element sampling design: Part
More informationPrediction of New Observations
Statistic Seminar: 6 th talk ETHZ FS2010 Prediction of New Observations Martina Albers 12. April 2010 Papers: Welham (2004), Yiang (2007) 1 Content Introduction Prediction of Mixed Effects Prediction of
More informationAn analytic proof of the theorems of Pappus and Desargues
Note di Matematica 22, n. 1, 2003, 99 106. An analtic proof of the theorems of Pappus and Desargues Erwin Kleinfeld and Tuong Ton-That Department of Mathematics, The Universit of Iowa, Iowa Cit, IA 52242,
More informationResearch Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D.
Research Design - - Topic 15a Introduction to Multivariate Analses 009 R.C. Gardner, Ph.D. Major Characteristics of Multivariate Procedures Overview of Multivariate Techniques Bivariate Regression and
More informationContextual Effects in Modeling for Small Domains
University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Contextual Effects in
More informationCompare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method
Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School
More informationUnbiased estimation of exposure odds ratios in complete records logistic regression
Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology
More informationGeneralized Pseudo Empirical Likelihood Inferences for Complex Surveys
The Canadian Journal of Statistics Vol.??, No.?,????, Pages???-??? La revue canadienne de statistique Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys Zhiqiang TAN 1 and Changbao
More informationEstimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method
Journal of Physics: Conference Series PAPER OPEN ACCESS Estimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method To cite this article: Syahril Ramadhan et al 2017
More informationLongitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois
Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationThree-Level Modeling for Factorial Experiments With Experimentally Induced Clustering
Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering John J. Dziak The Pennsylvania State University Inbal Nahum-Shani The University of Michigan Copyright 016, Penn State.
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and can be printed and given to the
More informationJob Training Partnership Act (JTPA)
Causal inference Part I.b: randomized experiments, matching and regression (this lecture starts with other slides on randomized experiments) Frank Venmans Example of a randomized experiment: Job Training
More informationBIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke
BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart
More informationSAMPLING BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :55. BIOS Sampling
SAMPLIG BIOS 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-11-14 15:55 BIOS 662 1 Sampling Outline Preliminaries Simple random sampling Population mean Population
More informationDevelopment of methodology for the estimate of variance of annual net changes for LFS-based indicators
Development of methodology for the estimate of variance of annual net changes for LFS-based indicators Deliverable 1 - Short document with derivation of the methodology (FINAL) Contract number: Subject:
More informationInference about the Slope and Intercept
Inference about the Slope and Intercept Recall, we have established that the least square estimates and 0 are linear combinations of the Y i s. Further, we have showed that the are unbiased and have the
More informationSpecification testing in panel data models estimated by fixed effects with instrumental variables
Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions
More informationESTIMATION OF CONFIDENCE INTERVALS FOR QUANTILES IN A FINITE POPULATION
Mathematical Modelling and Analysis Volume 13 Number 2, 2008, pages 195 202 c 2008 Technika ISSN 1392-6292 print, ISSN 1648-3510 online ESTIMATION OF CONFIDENCE INTERVALS FOR QUANTILES IN A FINITE POPULATION
More informationMuch ado about nothing: the mixed models controversy revisited
Much ado about nothing: the mixed models controversy revisited Viviana eatriz Lencina epartamento de Investigación, FM Universidad Nacional de Tucumán, Argentina Julio da Motta Singer epartamento de Estatística,
More informationCasual Mediation Analysis
Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction
More information6. Vector Random Variables
6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following
More informationScatter Plot Quadrants. Setting. Data pairs of two attributes X & Y, measured at N sampling units:
Geog 20C: Phaedon C Kriakidis Setting Data pairs of two attributes X & Y, measured at sampling units: ṇ and ṇ there are pairs of attribute values {( n, n ),,,} Scatter plot: graph of - versus -values in
More informationChapter 2: Describing Contingency Tables - I
: Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]
More information