Data Integration for Big Data Analysis for finite population inference
1 Data Integration for Big Data Analysis for finite population inference
Jae-kwang Kim, ISU
January 23
2 What is big data?
3 Data do not speak for themselves
Diagram labels: Knowledge, Reproducibility, Information, Interpretation, Data.
4 Population and Sample
Diagram labels: Population, Parameter, Generalization, Inference, Sample, Estimator.
5 Survey Sampling
Survey: Measurement
Sampling: Representation
Table: Survey Methodology and Sampling Statistics
Survey Methodology            | Sampling Statistics
Psychology, Cognitive Science | Statistics
Studies nonsampling error     | Studies sampling error
Questionnaire design          | Sampling design, estimation
6 Two wings of survey data
7 Big Data
Big Data era: Freeconomics
8 Big Data: Survey sample data vs Big Data
Table: Features
Feature            | Survey sample data  | Big Data
Cost function      | $C = C_0 + C_1 n$   | $C$ is not linear in $n$
Representativeness | Bias = 0            | Bias ≠ 0
Variance           | Variance = $K/n$    | Variance = 0
9 Big Data: Selection Bias
Finite population: $U = \{1, \ldots, N\}$.
Parameter of interest: $\bar{Y}_N = N^{-1} \sum_{i=1}^{N} y_i$.
Big data sample: $B \subset U$, with indicator $\delta_i = 1$ if $i \in B$ and $\delta_i = 0$ otherwise.
Estimator: $\bar{y}_B = N_B^{-1} \sum_{i=1}^{N} \delta_i y_i$, where $N_B = \sum_{i=1}^{N} \delta_i$ is the big data sample size ($N_B < N$).
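10 Big Data: MSE of Big Data Estimator
MSE formula:
$$E_\delta (\bar{y}_B - \bar{Y}_N)^2 = E_\delta(\rho_{\delta,y}^2) \times \sigma^2 \times \frac{1 - f_B}{f_B},$$
where $\rho_{\delta,y} = \mathrm{Corr}(\delta, Y)$, $\sigma^2 = \mathrm{Var}(Y)$, $f_B = N_B/N$, and $E_\delta(\cdot)$ is the expectation with respect to the big data sampling mechanism, which is generally unknown.
If $E_\delta(\rho_{\delta,y}) = 0$, then $E_\delta(\rho_{\delta,y}^2) = O(N_B^{-1})$ and the MSE is of order $1/N_B$.
If $E_\delta(\rho_{\delta,y}) \neq 0$, then $E_\delta(\rho_{\delta,y}^2) = O(1)$ and the MSE is of order $(1 - f_B)/f_B = 1/f_B - 1$.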
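The MSE formula rests on an exact algebraic identity for any realized sample (Meng 2018): $\bar{y}_B - \bar{Y}_N = \rho_{\delta,y}\,\sigma\,\sqrt{(1-f_B)/f_B}$, with $\rho_{\delta,y}$ and $\sigma$ computed as finite-population quantities (divisor $N$). The following is a minimal numerical check of that identity. The population, the seed, and the logistic selection rule are illustrative assumptions, not taken from the slides.

```python
import math
import random


def meng_identity(y, delta):
    """Return (actual error, rho * sigma * sqrt((1 - f) / f)) for one realized delta."""
    N = len(y)
    n_b = sum(delta)
    f = n_b / N
    y_bar = sum(y) / N
    y_bar_b = sum(yi for yi, d in zip(y, delta) if d) / n_b
    # Finite-population moments (divisor N, not N - 1).
    var_y = sum((yi - y_bar) ** 2 for yi in y) / N
    cov = sum((d - f) * (yi - y_bar) for yi, d in zip(y, delta)) / N
    rho = cov / math.sqrt(f * (1 - f) * var_y)
    return y_bar_b - y_bar, rho * math.sqrt(var_y) * math.sqrt((1 - f) / f)


rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Hypothetical selection mechanism: units with larger y are more likely to be in B.
delta = [1 if rng.random() < 1.0 / (1.0 + math.exp(-yi)) else 0 for yi in y]
error, rhs = meng_identity(y, delta)
print(error, rhs)  # the two numbers agree up to floating-point rounding
```

Squaring both sides and taking the expectation over $\delta$ gives the MSE formula on the slide.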
11 Big Data: Effective sample size
$$n_{\mathrm{eff}} = \frac{f_B}{1 - f_B} \times \frac{1}{E_\delta(\rho_{\delta,y}^2)}.$$
If $\rho_{\delta,y} = 0.05$ and $f_B = 1/2$, then $n_{\mathrm{eff}} = 400$.
For example, suppose that the population size is $N = 10,000,000$ and 50% of the population is collected in the big data. If $\rho_{\delta,y} = 0.05$, then the MSE of the big data sample mean equals that of an SRS mean with size $n = 400$.
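As a quick check of the arithmetic in the example above, the effective sample size can be evaluated directly. This is a minimal sketch; the function name is hypothetical, and, as in the slide's example, $\rho_{\delta,y}^2$ is plugged in for $E_\delta(\rho_{\delta,y}^2)$.

```python
def effective_sample_size(mean_rho_squared, f_b):
    """n_eff = f_B / (1 - f_B) / E(rho^2), for 0 < f_B < 1."""
    return f_b / (1.0 - f_b) / mean_rho_squared


# rho = 0.05 and f_B = 1/2, as in the example on the slide.
n_eff = effective_sample_size(0.05 ** 2, 0.5)
print(round(n_eff))
```

Note that $N$ does not enter the formula: under these assumptions, five million records carry about as much information as a simple random sample of a few hundred.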
12 Big Data: Paradox of Big data (Meng 2018)
Confidence interval using the big data sample (ignoring the selection bias):
$$CI = \left( \bar{y}_B - 1.96 \sqrt{(1 - f_B) S^2 / N_B},\ \bar{y}_B + 1.96 \sqrt{(1 - f_B) S^2 / N_B} \right)$$
As $N_B \to \infty$, we have $\Pr(\bar{Y}_N \in CI) \to 0$.
Paradox: If one ignores the bias and applies the standard method of estimation, then the bigger the dataset, the more misleading it is for valid statistical inference.
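A stylized calculation illustrates why the coverage collapses. Assume, purely for illustration, that $f_B$ and $\rho_{\delta,y}$ are held fixed, that $S^2 \approx \sigma^2$, and that $\bar{y}_B$ is approximately normal around $\bar{Y}_N$ plus its bias with standard deviation equal to the naive standard error. The bias $\rho_{\delta,y}\sigma\sqrt{(1-f_B)/f_B}$ divided by the naive standard error is then $\rho_{\delta,y}\sqrt{N_B/f_B}$, and the coverage follows from the normal distribution function. The function names below are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def naive_ci_coverage(rho, f_b, n_b):
    """Approximate coverage of the naive 95% CI under the stated assumptions."""
    # Standardized bias: rho * sigma * sqrt((1-f)/f) / sqrt((1-f) * sigma^2 / N_B).
    shift = rho * math.sqrt(n_b / f_b)
    return normal_cdf(1.96 - shift) - normal_cdf(-1.96 - shift)


sizes = [100, 1000, 10000, 100000]
coverages = [naive_ci_coverage(0.05, 0.5, n_b) for n_b in sizes]
for n_b, cov in zip(sizes, coverages):
    print(n_b, cov)
```

Under these assumptions the coverage is close to nominal only when the big data sample is small, and it falls toward zero as $N_B$ grows.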
13 Salvation of Big Data
Data Integration
14 Data integration: Basic Idea
Two data sets: big data and survey data.
Big data may be subject to selection bias.
For simplicity, assume a binary $Y$ variable.
        | $\delta = 1$ | $\delta = 0$ | Total
$Y = 1$ | $N_{B1}$     | $N_{C1}$     | $N_1$
$Y = 0$ | $N_{B0}$     | $N_{C0}$     | $N_0$
Total   | $N_B$        | $N_C$        | $N$
where $\delta_i = 1$ if unit $i$ belongs to the big data sample and $\delta_i = 0$ otherwise.
Parameter of interest: $P = P(Y = 1)$.
15 Data integration: Basic Idea (Cont'd)
In addition, we have survey data of size $n$ obtained by SRS, with the following observations at the sample level:
        | $\delta = 1$ | $\delta = 0$ | Total
$Y = 1$ | $n_{B1}$     | $n_{C1}$     | $n_1$
$Y = 0$ | $n_{B0}$     | $n_{C0}$     | $n_0$
Total   |              |              | $n$
How to combine the two data sources?
16 Data Integration: Combined estimation
Note that
$$P(Y = 1) = P(Y = 1 \mid \delta = 1) P(\delta = 1) + P(Y = 1 \mid \delta = 0) P(\delta = 0).$$
Three components:
1. $P(\delta = 1)$: big data proportion (known).
2. $P(Y = 1 \mid \delta = 1) = N_{B1}/N_B$: obtained from the big data.
3. $P(Y = 1 \mid \delta = 0)$: estimated by $n_{C1}/(n_{C0} + n_{C1})$ from the survey data.
Final estimator:
$$\hat{P} = P_B W_B + \hat{P}_C (1 - W_B) \quad (1)$$
where $W_B = N_B/N$, $P_B = N_{B1}/N_B$, and $\hat{P}_C = n_{C1}/(n_{C0} + n_{C1})$.
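The combined estimator in (1) is a two-term weighted average and can be sketched in a few lines. The function name and the counts in the usage example are hypothetical; they are chosen only so that the result is easy to verify by hand.

```python
def combined_estimator(N, N_B1, N_B0, n_C1, n_C0):
    """Estimator (1): big data proportion for delta = 1, survey proportion for delta = 0."""
    N_B = N_B1 + N_B0
    W_B = N_B / N                      # known big data share of the population
    P_B = N_B1 / N_B                   # P(Y = 1 | delta = 1), exact from the big data
    P_C_hat = n_C1 / (n_C1 + n_C0)     # P(Y = 1 | delta = 0), estimated from the survey
    return P_B * W_B + P_C_hat * (1.0 - W_B)


# Made-up counts: W_B = 0.6, P_B = 0.5, P_C_hat = 0.25.
p_hat = combined_estimator(N=1000, N_B1=300, N_B0=300, n_C1=10, n_C0=30)
print(p_hat)
```

Only the $\delta = 0$ cells of the survey table enter the estimator; the survey units that fall inside the big data are not used by (1).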
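17 Remark 1
Variance:
$$V(\hat{P}) = (1 - W_B)^2 V(\hat{P}_C) \approx (1 - W_B) \frac{1}{n} P_C (1 - P_C).$$
The approximation uses $V(\hat{P}_C) \approx P_C(1 - P_C)/n_C$ with $n_C = n_{C0} + n_{C1} \approx n(1 - W_B)$.
If $W_B$ is close to one, then the above variance is very small.
Instead of using $\hat{P}_C = n_{C1}/(n_{C0} + n_{C1})$, we can construct a ratio estimator of $P_C$ to improve the efficiency. That is, use
$$\hat{P}_{C,r} = \frac{1}{1 + \hat{\theta}_C}, \quad \text{where} \quad \hat{\theta}_C = \frac{N_{B0}/N_{B1}}{n_{B0}/n_{B1}} \times \frac{n_{C0}}{n_{C1}}.$$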
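The ratio estimator rescales the sample odds $n_{C0}/n_{C1}$ by how far the sample odds inside the big data, $n_{B0}/n_{B1}$, are from the known population odds $N_{B0}/N_{B1}$. A minimal sketch follows, with a hypothetical function name and made-up counts; it assumes all four sample cells and $N_{B1}$ are nonzero.

```python
def ratio_estimator_pc(N_B1, N_B0, n_B1, n_B0, n_C1, n_C0):
    """Ratio estimator of P_C = P(Y = 1 | delta = 0) via the adjusted odds theta_C."""
    theta_hat = (N_B0 / N_B1) / (n_B0 / n_B1) * (n_C0 / n_C1)
    return 1.0 / (1.0 + theta_hat)


# When the sample odds inside B match the population odds, no adjustment is made.
p_same = ratio_estimator_pc(N_B1=300, N_B0=300, n_B1=30, n_B0=30, n_C1=10, n_C0=30)

# When the sample over-represents Y = 0 inside B, the odds of Y = 0 are scaled down.
p_adjusted = ratio_estimator_pc(N_B1=300, N_B0=300, n_B1=25, n_B0=35, n_C1=10, n_C0=30)
print(p_same, p_adjusted)
```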
18 Remark 2
The combined estimator is essentially a post-stratified estimator using $\delta$ as the post-stratification variable.
The post-stratification idea applies directly to a continuous $Y$ variable.
Practical issues:
- $\delta$ may be obtained inaccurately (due to imperfect matching).
- There may be measurement errors in $y$ in the big data.
- The survey sample may not observe $y$ at all.
19 Two setups (A: survey sample data, B: big data)
Parameter of interest: $\theta = \sum_{i \in U} y_i$
Table: Setup One (columns: Data, X, Y, Represent?; rows: A, B). The probability sample does not observe the study variable.
Table: Setup Two (columns: Data, X, Y; rows: A, B). The probability sample does observe the study variable.
20 Data Integration for Setup One: Rivers (2007) idea
1. Use $X$ to create a nearest neighbor imputation for each unit $i \in A$.
2. Compute $\hat{\theta} = \sum_{i \in A} w_i y_i^*$, where $y_i^*$ is the imputed value of $y_i$ for $i \in A$.
Based on the MAR (missing at random) assumption: $f(y \mid x, \delta = 1) = f(y \mid x)$.
Bias may not be negligible if the dimension of $x$ is high (due to the curse of dimensionality).
The naive variance estimator works well. (The estimation error is asymptotically negligible.)
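The two steps above can be sketched as follows. This is an illustrative brute-force version, not the implementation behind the slides: the function name is hypothetical, Euclidean distance on $x$ is an assumption (the slide does not name a metric), and a real application with a large $B$ would need a faster neighbor search.

```python
def nearest_neighbor_total(sample_a, big_data):
    """Mass imputation: give each unit in A the y of its nearest neighbor in B.

    sample_a: list of (x_vector, weight) pairs; y is not observed in A.
    big_data: list of (x_vector, y) pairs.
    """
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    total = 0.0
    for x_a, weight in sample_a:
        # Donor = big data record whose x is closest to this sample unit.
        _, y_star = min(big_data, key=lambda record: squared_distance(record[0], x_a))
        total += weight * y_star
    return total


# Made-up data: two sample units with weight 10 each, three big data records.
sample_a = [((0.0,), 10.0), ((1.0,), 10.0)]
big_data = [((0.1,), 2.0), ((0.9,), 4.0), ((5.0,), 100.0)]
theta_hat = nearest_neighbor_total(sample_a, big_data)
print(theta_hat)
```

The outlying record at $x = 5$ is never chosen as a donor, which is how the sample design in A corrects the composition of B.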
21 Data Integration for Setup One: Proposed method 1
1. Obtain $\delta_i$ from A, by matching or by asking about membership in the big data.
2. Fit a model for $P(\delta = 1 \mid x)$ using sample A.
3. Use $\hat{\theta} = \sum_{i \in B} \hat{\pi}_i^{-1} y_i$, where $\hat{\pi}_i = \hat{P}(\delta_i = 1 \mid x_i)$ and the $\hat{\pi}_i$ are adjusted to satisfy $\sum_{i \in B} \hat{\pi}_i^{-1} = N$.
Based on the MAR assumption.
Requires correct specification of the model for $\pi(x) = P(\delta = 1 \mid x)$.
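A minimal sketch of the three steps, under simplifying assumptions that are not in the slides: $x$ is a single categorical variable, the propensity model is saturated (one weighted inclusion rate per category, estimated from sample A), and the adjustment to $\sum_{i \in B} \hat{\pi}_i^{-1} = N$ is done by a common rescaling factor. The function name and the data are hypothetical.

```python
from collections import defaultdict


def ps_estimator(sample_a, big_data, N):
    """Propensity score estimator of the population total of y.

    sample_a: list of (x_category, design_weight, delta) from the probability sample.
    big_data: list of (x_category, y) for the units in B.
    """
    # Step 2: saturated model, pi_hat(x) = weighted share of delta = 1 within x.
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for x, weight, delta in sample_a:
        numerator[x] += weight * delta
        denominator[x] += weight
    pi_hat = {x: numerator[x] / denominator[x] for x in denominator}

    # Step 3: inverse propensity weights, rescaled so that they sum to N.
    inverse = [1.0 / pi_hat[x] for x, _ in big_data]
    scale = N / sum(inverse)
    return sum(scale * inv * y for inv, (_, y) in zip(inverse, big_data))


# Made-up example: category "a" is over-covered by B, category "b" under-covered.
sample_a = ([("a", 10.0, 1)] * 4 + [("a", 10.0, 0)]
            + [("b", 10.0, 1)] + [("b", 10.0, 0)] * 4)
big_data = [("a", 1.0)] * 40 + [("b", 3.0)] * 10
theta_hat = ps_estimator(sample_a, big_data, N=100)
print(theta_hat)
```

The sketch fails if a category in B has no matched units in A (the fitted propensity is then zero or undefined); a parametric model such as logistic regression avoids this at the price of the correct-specification requirement noted above.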
22 Data Integration for Setup One: Proposed method 2, doubly robust (DR) estimation
1. Fit a working model for $E(Y \mid x)$ to get $\hat{y}_i = \hat{E}(Y_i \mid x_i)$ for each $i \in A$ and $i \in B$.
2. Fit a working model for $P(\delta = 1 \mid x)$ to get $\hat{\pi}_i = \hat{P}(\delta_i = 1 \mid x_i)$ for each $i \in B$.
3. Use
$$\hat{\theta}_{DR} = \sum_{i \in A} w_i \hat{y}_i + \sum_{i \in B} \hat{\pi}_i^{-1} (y_i - \hat{y}_i),$$
where $\hat{\pi}_i = \hat{P}(\delta_i = 1 \mid x_i)$.
Based on the MAR assumption.
Requires that one of the two models be correctly specified.
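Once the two working models have been fitted, the DR estimator is a sum of a prediction term over A and a weighted residual term over B. The sketch below takes the fitted values $\hat{y}_i$ and $\hat{\pi}_i$ as given inputs; the function name and the numbers are hypothetical, and how the models are fitted is left open.

```python
def dr_estimator(sample_a, big_data):
    """Doubly robust estimator of the population total of y.

    sample_a: list of (w_i, y_hat_i) for units in the probability sample A.
    big_data: list of (pi_hat_i, y_i, y_hat_i) for units in the big data B.
    """
    prediction_part = sum(w * y_hat for w, y_hat in sample_a)
    residual_part = sum((y - y_hat) / pi_hat for pi_hat, y, y_hat in big_data)
    return prediction_part + residual_part


# Made-up fitted values: predictions total 200, weighted residuals total -0.4.
sample_a = [(50.0, 1.0), (50.0, 3.0)]
big_data = [(0.5, 1.2, 1.0), (0.25, 2.8, 3.0)]
theta_dr = dr_estimator(sample_a, big_data)
print(theta_dr)
```

If the outcome model fits perfectly, the residual term vanishes and the estimator reduces to the prediction total over A.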
23 Justification for DR estimation
Let $\hat{\theta}_{HT} = \sum_{i \in A} w_i y_i$ be the Horvitz-Thompson estimator that could be used if $y_i$ were observed in sample A. Note that
$$\hat{\theta}_{DR} - \hat{\theta}_{HT} = -\sum_{i \in A} w_i \hat{e}_i + \sum_{i \in B} \hat{\pi}_i^{-1} \hat{e}_i,$$
where $\hat{e}_i = y_i - \hat{y}_i$.
Double robustness:
1. If the model for $P(\delta = 1 \mid x)$ is correctly specified, then
$$E_\delta\{\hat{\theta}_{DR} - \hat{\theta}_{HT}\} = -\sum_{i \in A} w_i \hat{e}_i + \sum_{i \in U} \hat{e}_i,$$
which is design-unbiased for zero.
2. If the model for $E(Y \mid x)$ is correctly specified, then $E(\hat{e}_i) = 0$ under MAR.
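The step behind the first claim can be written out as follows. This is a heuristic sketch that treats the residuals $\hat{e}_i$ as fixed and replaces $\hat{\pi}_i$ by the true $\pi_i = P(\delta_i = 1 \mid x_i)$; $E_d$ denotes expectation under the sampling design for A, a symbol introduced here for the illustration.

```latex
\begin{align*}
E_\delta\Big\{ \sum_{i \in B} \pi_i^{-1} \hat{e}_i \Big\}
  &= E_\delta\Big\{ \sum_{i \in U} \delta_i \pi_i^{-1} \hat{e}_i \Big\}
   = \sum_{i \in U} \frac{E_\delta(\delta_i)}{\pi_i}\, \hat{e}_i
   = \sum_{i \in U} \hat{e}_i , \\
E_d\Big\{ \sum_{i \in A} w_i \hat{e}_i \Big\}
  &= \sum_{i \in U} \hat{e}_i
  \quad \text{(design weights } w_i \text{ are inverse inclusion probabilities)} , \\
\text{so}\quad
E_d\, E_\delta\big\{ \hat{\theta}_{DR} - \hat{\theta}_{HT} \big\}
  &= -\sum_{i \in U} \hat{e}_i + \sum_{i \in U} \hat{e}_i = 0 .
\end{align*}
```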
24 Data Integration for Setup Two
Table: Setup Two (columns: Data, X, Y; rows: A, B).
We are interested in estimating $\theta = \sum_{i \in U} y_i$ from the two data sources.
25 Data Integration for Setup Two
Note that we can compute $\hat{\theta}_A = \sum_{i \in A} w_i y_i$ from sample A. Thus, unlike Setup One, the goal of data integration is to improve the efficiency (i.e., reduce the variance), not to reduce the selection bias.
How to incorporate the partial auxiliary information in data B?
1. If $B = U$, then it is an easy problem: calibration weighting.
2. For $B \subset U$, we can treat B as a sub-population and apply the same calibration weighting for $A \cap B$.
26 Calibration weighting in survey sampling
Initial (design) weight: $w_i$
Final weight: $w_i^*$ satisfying
$$\sum_{i \in A} w_i^* (1, x_i) = \sum_{i \in U} (1, x_i). \quad (2)$$
Calibration weighting problem: Find $w_i^*$ that minimize
$$D(w, w^*) = \sum_{i \in A} w_i \left( \frac{w_i^*}{w_i} - 1 \right)^2$$
subject to (2).
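For this chi-square distance the minimizer has a closed form: $w_i^* = w_i (1 + \lambda^\top z_i)$ with $z_i = (1, x_i)$ and $\lambda = (\sum_{i \in A} w_i z_i z_i^\top)^{-1} (\sum_{i \in U} z_i - \sum_{i \in A} w_i z_i)$. The sketch below implements it for a scalar $x$ by solving the 2x2 system directly; the function name and the numbers are hypothetical.

```python
def calibrate_weights(w, x, N, x_total):
    """Chi-square distance calibration to the totals (N, x_total) for z = (1, x)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    # Gaps between the known population totals and the design-weighted totals.
    r0, r1 = N - s0, x_total - s1
    # Solve [[s0, s1], [s1, s2]] @ (lam0, lam1) = (r0, r1) by Cramer's rule.
    det = s0 * s2 - s1 * s1
    lam0 = (s2 * r0 - s1 * r1) / det
    lam1 = (s0 * r1 - s1 * r0) / det
    return [wi * (1.0 + lam0 + lam1 * xi) for wi, xi in zip(w, x)]


# Made-up sample: four units with design weight 10, known totals N = 50 and 130.
w = [10.0, 10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0, 4.0]
w_star = calibrate_weights(w, x, N=50.0, x_total=130.0)
print(w_star)
print(sum(w_star), sum(ws * xi for ws, xi in zip(w_star, x)))
```

The closed form does not constrain the final weights to be positive, and it needs at least two distinct $x$ values in the sample so that the 2x2 system is nonsingular.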
27 Calibration weighting for big data integration
The auxiliary variables $x_i$ are observed only when $\delta_i = 1$.
The calibration equation is changed to
$$\sum_{i \in A} w_i^* (1 - \delta_i, \delta_i, \delta_i x_i) = \sum_{i \in U} (1 - \delta_i, \delta_i, \delta_i x_i). \quad (3)$$
If $y_i = x_i$, it reduces to the post-stratification estimator in (1).
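Under the chi-square distance, the calibration in (3) separates into two blocks because the calibration variables for $\delta_i = 0$ and $\delta_i = 1$ units never overlap: the $\delta_i = 0$ weights are scaled by a common ratio to sum to $N - N_B$, and the $\delta_i = 1$ weights are linearly calibrated to $(N_B, \sum_{i \in B} x_i)$. The sketch below relies on that observation and assumes the chi-square distance, a scalar $x$, and a hypothetical function name and data.

```python
def big_data_calibration_total(sample_a, N, N_B, x_total_B):
    """Estimate the total of y with weights calibrated as in equation (3).

    sample_a: list of (w_i, delta_i, x_i, y_i); x_i is ignored when delta_i == 0.
    """
    inside = [(w, x, y) for w, d, x, y in sample_a if d == 1]
    outside = [(w, y) for w, d, x, y in sample_a if d == 0]

    # delta = 0 block: one constraint, sum of weights = N - N_B (ratio adjustment).
    scale = (N - N_B) / sum(w for w, _ in outside)
    total = sum(scale * w * y for w, y in outside)

    # delta = 1 block: linear calibration on (1, x) to the totals (N_B, x_total_B).
    s0 = sum(w for w, _, _ in inside)
    s1 = sum(w * x for w, x, _ in inside)
    s2 = sum(w * x * x for w, x, _ in inside)
    r0, r1 = N_B - s0, x_total_B - s1
    det = s0 * s2 - s1 * s1
    lam0 = (s2 * r0 - s1 * r1) / det
    lam1 = (s0 * r1 - s1 * r0) / det
    total += sum(w * (1.0 + lam0 + lam1 * x) * y for w, x, y in inside)
    return total


# Made-up sample with y = x inside B; x is unobserved (None) outside B.
sample_a = [(10.0, 1, 1.0, 1.0), (10.0, 1, 2.0, 2.0), (10.0, 1, 3.0, 3.0),
            (10.0, 1, 2.5, 2.5), (10.0, 0, None, 5.0), (10.0, 0, None, 7.0)]
theta_hat = big_data_calibration_total(sample_a, N=100.0, N_B=60.0, x_total_B=120.0)
print(theta_hat)
```

With $y_i = x_i$ inside B, the $\delta = 1$ part reproduces the known big data total exactly and only the $\delta = 0$ part is estimated from the survey, which is the post-stratification structure of (1).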
28 Simulation Study: Setup One
Goal: compare four estimators.
1. Naive estimator: mean of sample B.
2. Rivers estimator.
3. Proposed estimator 1 (PS estimator), using propensity score weighting.
4. Proposed estimator 2 (DR estimator), using a working model for $E(Y \mid x)$ and a working model for $P(\delta = 1 \mid x)$.
Three scenarios for the simulation study:
1. Both models are correct.
2. Only the model for $E(Y \mid x)$ is correct (i.e., the true distribution for $P(\delta = 1 \mid x)$ is different from the working model).
3. Only the model for $P(\delta = 1 \mid x)$ is correct.
29 Simulation study one: Setup
Outcome regression model:
1. Linear model. That is, $y_i = 1 + x_{1,i} + x_{2,i} + \epsilon_i$ for $i = 1, \ldots, N$, where $x_{1,i} \sim N(1, 1)$, $x_{2,i} \sim \mathrm{Exp}(1)$, $\epsilon_i \sim N(0, 1)$, $N = 1,000,000$, and $(x_{1,i}, x_{2,i}, \epsilon_i)$ are pairwise independent.
2. Nonlinear model. That is, $y_i = 0.5 (x_{1,i} - 1.5)^2 + x_{2,i} + \epsilon_i$, where $(x_{1,i}, x_{2,i}, \epsilon_i)$ are the same as those in the linear model.
Big data sampling mechanism:
1. Linear logistic model. $\delta_i \mid p_i \sim \mathrm{Ber}(p_i)$ for $i = 1, \ldots, N$, where $\mathrm{logit}(p_i) = x_{2,i}$.
2. Nonlinear logistic model. $\delta_i \mid p_i \sim \mathrm{Ber}(p_i)$ for $i = 1, \ldots, N$, where $\mathrm{logit}(p_i) = (x_{2,i} - 2) \ldots$
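The linear outcome model combined with the linear logistic selection mechanism can be generated as in the sketch below, which also shows the selection bias of the naive big data mean. It is an illustration only: the population size is reduced from the slide's $N = 1,000,000$ to keep the run short, the seed and function name are arbitrary, and the nonlinear variants are omitted.

```python
import math
import random


def simulate_linear_setup(N, seed):
    """Generate y and the big data indicator delta for the linear / linear-logistic case."""
    rng = random.Random(seed)
    y, delta = [], []
    for _ in range(N):
        x1 = rng.gauss(1.0, 1.0)          # x1 ~ N(1, 1)
        x2 = rng.expovariate(1.0)         # x2 ~ Exp(1)
        eps = rng.gauss(0.0, 1.0)         # eps ~ N(0, 1)
        y.append(1.0 + x1 + x2 + eps)
        p = 1.0 / (1.0 + math.exp(-x2))   # logit(p) = x2
        delta.append(1 if rng.random() < p else 0)
    return y, delta


y, delta = simulate_linear_setup(N=20000, seed=1)
population_mean = sum(y) / len(y)
naive_mean = sum(yi for yi, d in zip(y, delta) if d) / sum(delta)
big_data_fraction = sum(delta) / len(delta)
print(population_mean, naive_mean, big_data_fraction)
```

Because inclusion in B increases with $x_{2,i}$ and $y_i$ also increases with $x_{2,i}$, the mean of sample B overstates the population mean even though B covers well over half of the population.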
30 Simulation Result
Table: Bias, S.E., and C.R. of the Naive, Rivers, PS, and DR estimators under Scenarios I, II, and III, for $n = 500$ and $n = 1000$.
31 Simulation Study: Setup Two
Finite population of size $N = 1,000,000$.
$x_i \sim N(2, 1)$
$y_i = (x_i - 2) + e_i$
$y_i^* = (y_i - 3) + u_i$
where $e_i \sim N(0, 0.51)$ and $u_i$ is normally distributed with mean 0. Note that $y_i^*$ is an inaccurate measurement of $y_i$.
Sampling mechanism for A: SRS of size $n = 500$.
Big data sampling mechanism: stratified random sampling.
1. Create two strata using $x_i \le 2$ and $x_i > 2$.
2. Within each stratum, we select $n_h$ elements by SRS independently, where $n_1 = 300,000$ and $n_2 = 200,\ldots$
3. The stratum information is not available to the data analyst.
32 Simulation Study: Setup Two
In sample A, we observe $y_i$.
Two scenarios for sample B:
1. Observe $y_i$: big data is subject to selection bias.
2. Observe $y_i^*$: big data is subject to selection bias and measurement error.
We can identify the elements in $A \cap B$.
Three estimators for $\theta = E(Y)$:
1. Mean of sample A (Mean A).
2. Mean of sample B (Mean B).
3. Proposed data integration (DI) method using calibration weighting. In scenario one, we calibrate on $(1 - \delta_i, \delta_i y_i)$. In scenario two, we calibrate on $(1 - \delta_i, \delta_i y_i^*)$.
33 Simulation Result
Table: Monte Carlo results of the mean, variance ($\times 10^{-4}$), and MSE ($\times 10^{-4}$) of the three estimators (Mean A, Mean B, Proposed DI) under the two scenarios.
34 Discussion
- Big data should not be analyzed naively. (Big data paradox!)
- Data integration is a useful tool for harnessing big data for finite population inference.
- Two setups are considered. In Setup One, both the Rivers method and the DR method are promising. In Setup Two, the calibration weighting method is useful.
- In Setup One, the MAR assumption is used. In Setup Two, we do not need the MAR assumption.
- Promising area of research.
Comparing MLE, MUE and Firth Estimates for Logistic Regression Nitin R Patel, Chairman & Co-founder, Cytel Inc. Research Affiliate, MIT nitin@cytel.com Acknowledgements This presentation is based on joint
More informationVARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA
Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University
More informationChapter 4: Imputation
Chapter 4: Imputation Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Basic Theory for imputation 3 Variance estimation after imputation 4 Replication variance estimation
More informationOrdered Designs and Bayesian Inference in Survey Sampling
Ordered Designs and Bayesian Inference in Survey Sampling Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu Siamak Noorbaloochi Center for Chronic Disease
More informationCausal Inference with Measurement Error
Causal Inference with Measurement Error by Di Shu A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Statistics Waterloo,
More informationHigh Dimensional Propensity Score Estimation via Covariate Balancing
High Dimensional Propensity Score Estimation via Covariate Balancing Kosuke Imai Princeton University Talk at Columbia University May 13, 2017 Joint work with Yang Ning and Sida Peng Kosuke Imai (Princeton)
More informationWhat is Survey Weighting? Chris Skinner University of Southampton
What is Survey Weighting? Chris Skinner University of Southampton 1 Outline 1. Introduction 2. (Unresolved) Issues 3. Further reading etc. 2 Sampling 3 Representation 4 out of 8 1 out of 10 4 Weights 8/4
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationOpening Theme: Flexibility vs. Stability
Opening Theme: Flexibility vs. Stability Patrick Breheny August 25 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction We begin this course with a contrast of two simple, but very different,
More informationStatistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes
Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationFlexible Estimation of Treatment Effect Parameters
Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both
More informationBootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.
Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling
More informationCorrelated and Interacting Predictor Omission for Linear and Logistic Regression Models
Clemson University TigerPrints All Dissertations Dissertations 8-207 Correlated and Interacting Predictor Omission for Linear and Logistic Regression Models Emily Nystrom Clemson University, emily.m.nystrom@gmail.com
More informationIntroduction to Survey Data Integration
Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5
More informationMain sampling techniques
Main sampling techniques ELSTAT Training Course January 23-24 2017 Martin Chevalier Department of Statistical Methods Insee 1 / 187 Main sampling techniques Outline Sampling theory Simple random sampling
More informationTargeted Maximum Likelihood Estimation in Safety Analysis
Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35
More informationGeneralized Pseudo Empirical Likelihood Inferences for Complex Surveys
The Canadian Journal of Statistics Vol.??, No.?,????, Pages???-??? La revue canadienne de statistique Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys Zhiqiang TAN 1 and Changbao
More information