Combining Non-probability and Probability Survey Samples Through Mass Imputation
1 Combining Non-probability and Probability Survey Samples Through Mass Imputation
Jae-Kwang Kim, Iowa State University & KAIST
October 27,
Joint work with Seho Park, Yilin Chen, and Changbao Wu
2 Outline
1. Introduction
2. Proposed method
3. Variance estimation
4. Replication variance estimation
5. A real data application
6. Conclusion
Kim (ISU) Survey Data Integration October 27, / 26
3 1. Introduction
We are interested in combining information from two samples: sample A, selected by probability sampling, and sample B, selected by non-probability sampling (such as a voluntary sample).
We observe X in the probability sample A and (X, Y) in the non-probability sample B.
The sampling mechanism for sample B is unknown.

Table: Data Structure
Data   X   Y   Representativeness
A      ✓       Yes
B      ✓   ✓   No
4 Mass Imputation
Rivers (2007) idea:
1. Use X to find the nearest neighbor for each unit $i \in A$.
2. Compute $\hat\theta = \sum_{i \in A} w_i y_i^*$, where $y_i^*$ is the imputed value of $y_i$ for $i \in A$ using nearest neighbor imputation.
Based on the MAR (missing at random) assumption
\[ f(y \mid x, \delta = 1) = f(y \mid x), \]
where $\delta_i = 1$ if $i \in B$ and $\delta_i = 0$ if $i \notin B$.
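The nearest neighbor idea above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `nn_mass_imputation`, the Euclidean distance, and the normalisation of the weights by their sum are choices made here, not details given on the slide.

```python
import numpy as np

def nn_mass_imputation(x_A, w_A, x_B, y_B):
    """Nearest neighbor mass imputation for the population mean.

    x_A: (n_A, p) covariates in sample A, w_A: (n_A,) design weights,
    x_B: (n_B, p) covariates in sample B, y_B: (n_B,) responses in sample B.
    """
    # Squared Euclidean distance between each unit in A and each unit in B.
    d2 = ((x_A[:, None, :] - x_B[None, :, :]) ** 2).sum(axis=2)
    donor = d2.argmin(axis=1)      # nearest neighbor in B for each i in A
    y_star = y_B[donor]            # imputed values y_i^*
    # Weighted mean; weights are normalised by their sum (an assumption).
    theta_hat = np.sum(w_A * y_star) / np.sum(w_A)
    return theta_hat, y_star

# Tiny hand-checkable example with one covariate (made-up numbers).
x_B = np.array([[0.0], [1.0], [2.0], [3.0]])
y_B = np.array([10.0, 20.0, 30.0, 40.0])
x_A = np.array([[0.1], [2.2], [2.9]])
w_A = np.array([2.0, 1.0, 1.0])

theta_hat, y_star = nn_mass_imputation(x_A, w_A, x_B, y_B)
print(y_star, theta_hat)
```

In the toy data the three units of sample A borrow the responses 10, 30 and 40 from their closest donors in sample B, so the weighted mean is (2·10 + 30 + 40) / 4 = 22.5.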
5 Mass Imputation for data integration
Basic Steps
1. Use sample B to estimate the conditional distribution $f(y \mid x)$.
2. Predict y-values for sample A using the estimated conditional distribution.
If $f(y \mid x)$ is correctly specified and MAR holds, then the mass imputation estimator is unbiased.
Question: How to estimate the variance of the mass imputation estimator?
6 2. Proposed method
Assume
\[ Y_i = m(x_i; \beta) + e_i \tag{1} \]
for some $\beta$ with known function $m(\cdot)$, with $E(e_i \mid x_i) = 0$.
We assume that $\hat\beta$ is the unique solution to
\[ \hat U(\beta) \equiv \sum_{i \in B} \{y_i - m(x_i; \beta)\}\, h(x_i; \beta) = 0 \tag{2} \]
for some p-dimensional vector $h(x_i; \beta)$.
Thus, we use the observations in sample B to obtain $\hat\beta$ and then use it to construct $\hat y_i = m(x_i; \hat\beta)$ for all $i \in A$.
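As an illustration of solving the estimating equation (2), the sketch below uses a logistic mean function $m(x; \beta) = \mathrm{expit}(x^{\top}\beta)$ with $h(x; \beta) = x$ and a Newton-Raphson iteration. These choices, the simulated data, and the helper name `solve_estimating_equation` are assumptions made for the example; the slides leave $m$ and $h$ generic.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def solve_estimating_equation(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for sum_B {y_i - m(x_i; beta)} h(x_i; beta) = 0,
    with m(x; beta) = expit(x'beta) and h(x; beta) = x (illustrative choice)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        m = expit(X @ beta)
        U = X.T @ (y - m)                            # estimating function at beta
        dU = -(X * (m * (1.0 - m))[:, None]).T @ X   # derivative of U in beta
        step = np.linalg.solve(dU, U)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated sample B (x and binary y observed) and sample A (x only).
rng = np.random.default_rng(1)
n_B = 1000
x_B = rng.normal(0.0, 1.0, n_B)
X_B = np.column_stack([np.ones(n_B), x_B])
y_B = (rng.random(n_B) < expit(-0.5 + 1.0 * x_B)).astype(float)

beta_hat = solve_estimating_equation(X_B, y_B)

x_A = rng.normal(0.0, 1.0, 200)
X_A = np.column_stack([np.ones(200), x_A])
y_hat_A = expit(X_A @ beta_hat)   # y_hat_i = m(x_i; beta_hat) for i in A
```

With this choice of $h$ the equation coincides with the logistic regression score equation, so any standard logistic fitting routine would give the same $\hat\beta$.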
7 Theorem
Suppose that model (1) and the MAR condition hold. Under some regularity conditions, the mass imputation estimator
\[ \bar y_I = \frac{1}{N} \sum_{i \in A} w_i\, m(x_i; \hat\beta) \tag{3} \]
satisfies
\[ \bar y_I = \tilde y_I(\beta_0) + o_p(n_B^{-1/2}), \tag{4} \]
where
\[ \tilde y_I(\beta) = N^{-1} \sum_{i \in A} w_i\, m(x_i; \beta) + n_B^{-1} \sum_{i \in B} \{y_i - m(x_i; \beta)\}\, h(x_i; \beta)^{\top} c, \]
\[ c = \Big[ n_B^{-1} \sum_{i \in B} \dot m(x_i; \beta_0)\, h^{\top}(x_i; \beta_0) \Big]^{-1} N^{-1} \sum_{i=1}^{N} \dot m(x_i; \beta_0), \]
$\beta_0$ is the true value of $\beta$ in (1), and $\dot m(x; \beta) = \partial m(x; \beta) / \partial \beta$.
8 Also,
\[ E\{\tilde y_I(\beta_0) - \bar y_N\} = 0, \tag{5} \]
and
\[ V\{\tilde y_I(\beta_0) - \bar y_N\} = V\Big\{ N^{-1} \sum_{i \in A} w_i\, m(x_i; \beta_0) - N^{-1} \sum_{i \in U} m(x_i; \beta_0) \Big\} + E\Big[ n_B^{-2} \sum_{i \in B} E(e_i^2 \mid x_i)\, \{h(x_i; \beta_0)^{\top} c\}^2 \Big], \tag{6} \]
where $e_i = y_i - m(x_i; \beta_0)$.
9 Example
Under the special case of the linear model $y_i = x_i^{\top}\beta + e_i$ with $e_i \sim (0, \sigma_e^2)$, we can use $\hat y_i = x_i^{\top}\hat\beta$ with
\[ \hat\beta = \Big( \sum_{i \in B} x_i x_i^{\top} \Big)^{-1} \sum_{i \in B} x_i y_i \]
to construct regression mass imputation.
If we assume SRS for sample A, the asymptotic variance in (6) reduces to
\[ V(\hat\theta_{I,reg}) = \frac{1}{n_A} \beta_1^2 \sigma_x^2 + \frac{1}{n_B} \sigma_e^2 + E\Big\{ \frac{(\bar x_N - \bar x_B)^2}{\sum_{i \in B} (x_i - \bar x_B)^2} \Big\} \sigma_e^2. \]
If sample B were an independent random sample of size $n_B$, then the third term would be of order $O(n_B^{-2})$ and negligible. However, as sample B is a non-probability sample, the third term is not negligible.
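The regression mass imputation in this example can be tried on simulated data. The sketch below is not the authors' code: the population model $y = 1 + 2x + e$, the sample sizes, and the logistic selection mechanism for sample B (which depends on $x$ only, so that MAR holds) are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical finite population following the linear model y = 1 + 2x + e.
N = 20_000
x = rng.normal(2.0, 1.0, N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, N)
ybar_N = y.mean()

# Sample A: simple random sample; only x is observed, w_i = N / n_A.
n_A = 500
idx_A = rng.choice(N, size=n_A, replace=False)
x_A = x[idx_A]
w_A = np.full(n_A, N / n_A)

# Sample B: inclusion depends on x only (MAR), so B over-represents large x.
p_B = 0.2 / (1.0 + np.exp(-(x - 2.0)))
in_B = rng.random(N) < p_B
x_B, y_B = x[in_B], y[in_B]

# beta_hat = (sum_B x_i x_i')^{-1} sum_B x_i y_i, with x_i = (1, x_i)'.
X_B = np.column_stack([np.ones(x_B.size), x_B])
beta_hat = np.linalg.solve(X_B.T @ X_B, X_B.T @ y_B)

# Mass imputation: y_hat_i = x_i' beta_hat for every unit in sample A.
X_A = np.column_stack([np.ones(n_A), x_A])
y_hat_A = X_A @ beta_hat
ybar_I = np.sum(w_A * y_hat_A) / N

ybar_naive = y_B.mean()   # unweighted mean of sample B, ignores selection
print(f"population {ybar_N:.3f}, mass imputation {ybar_I:.3f}, naive {ybar_naive:.3f}")
```

Under this setup the naive mean of sample B is biased upward because units with large $x$ are more likely to be included, while the mass imputation estimate should land close to the population mean.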
10 3. Variance estimation
For variance estimation of the mass imputation estimator (3), we only need to estimate the variance of the linearized estimator $\tilde y_I(\beta_0)$ in (4).
Since the variance formula can be written as
\[ V\{\tilde y_I(\beta_0) - \bar y_N\} = V_A + V_B, \]
where
\[ V_A = V\Big\{ N^{-1} \sum_{i \in A} w_i\, m(x_i; \beta_0) - N^{-1} \sum_{i \in U} m(x_i; \beta_0) \Big\}, \]
\[ V_B = E\Big[ n_B^{-2} \sum_{i \in B} E(e_i^2 \mid x_i)\, \{h(x_i; \beta_0)^{\top} c\}^2 \Big], \]
we can estimate $V_A$ and $V_B$ separately.
11 To estimate $V_A$, we can use
\[ \hat V_A = N^{-2} \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}}\, w_i\, m(x_i; \hat\beta)\, w_j\, m(x_j; \hat\beta), \]
where $\pi_{ij}$ is the joint inclusion probability for units i and j, which is assumed to be positive.
To estimate $V_B$, we can use
\[ \hat V_B = n_B^{-2} \sum_{i \in B} \hat e_i^2\, \{h(x_i; \hat\beta)^{\top} \hat c\}^2, \tag{7} \]
where $\hat e_i = y_i - m(x_i; \hat\beta)$ and
\[ \hat c = \Big[ n_B^{-1} \sum_{i \in B} \dot m(x_i; \hat\beta)\, h^{\top}(x_i; \hat\beta) \Big]^{-1} N^{-1} \sum_{i \in A} w_i\, \dot m(x_i; \hat\beta). \]
Hence, the variance of $\bar y_{I,reg}$ can be estimated by $\hat V(\bar y_{I,reg}) = \hat V_A + \hat V_B$.
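A minimal sketch of the linearization variance estimator for the linear-model case follows. It assumes sample A is a simple random sample without replacement, in which case the double sum over $\pi_{ij}$ collapses to the familiar $(1 - n_A/N)\, s^2 / n_A$ form, and it takes $h(x; \beta) = \dot m(x; \beta) = (1, x)^{\top}$. The function name and the simulated data are hypothetical.

```python
import numpy as np

def linearization_variance(x_A, N, x_B, y_B):
    """Return (V_A_hat, V_B_hat) for regression mass imputation with
    m(x; beta) = beta0 + beta1 * x, assuming SRS for sample A."""
    n_A, n_B = x_A.size, x_B.size
    H_A = np.column_stack([np.ones(n_A), x_A])
    H_B = np.column_stack([np.ones(n_B), x_B])
    beta_hat = np.linalg.solve(H_B.T @ H_B, H_B.T @ y_B)

    # V_A_hat: SRS form of the design variance estimator of the weighted mean.
    y_hat_A = H_A @ beta_hat
    v_A = (1.0 - n_A / N) * y_hat_A.var(ddof=1) / n_A

    # c_hat = [n_B^{-1} sum_B m_dot h']^{-1} N^{-1} sum_A w_i m_dot_i.
    w_A = np.full(n_A, N / n_A)
    J_hat = H_B.T @ H_B / n_B
    c_hat = np.linalg.solve(J_hat, (w_A @ H_A) / N)

    # V_B_hat = n_B^{-2} sum_B e_i^2 (h_i' c_hat)^2, as in (7).
    resid = y_B - H_B @ beta_hat
    v_B = np.sum(resid**2 * (H_B @ c_hat) ** 2) / n_B**2
    return v_A, v_B

# Simulated data: SRS sample A, MAR non-probability sample B.
rng = np.random.default_rng(5)
N, n_A = 20_000, 500
x = rng.normal(2.0, 1.0, N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, N)
x_A = x[rng.choice(N, size=n_A, replace=False)]
in_B = rng.random(N) < 0.2 / (1.0 + np.exp(-(x - 2.0)))

v_A, v_B = linearization_variance(x_A, N, x[in_B], y[in_B])
v_total = v_A + v_B
print(f"V_A_hat = {v_A:.5f}, V_B_hat = {v_B:.5f}, total = {v_total:.5f}")
```

Because sample B is about four times larger than sample A in this setup, $\hat V_B$ should come out well below $\hat V_A$.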
12 Remark
If $n_A / n_B = o(1)$, then $V_B$ is of smaller order than $V_A$ and the total variance is dominated by $V_A$. Otherwise, both variances contribute to the total variance.
If sample B is a big data source, $n_B$ is huge and $V_B$ can be safely ignored.
However, to compute $\hat V_B$ in (7), we use individual observations of $(x_i, y_i)$ in sample B, which are not necessarily available when only sample A with mass imputation is released to the public.
Note that the goal of mass imputation is to produce a representative sample with synthetic observations using sample B as training data. Once the mass imputation is performed, the training data is no longer necessary for computing the point estimate.
So, it is desirable to develop a variance estimation method that does not require access to the sample B observations.
13 4. Replication variance estimation
We consider a bootstrap method for variance estimation that creates replicated synthetic data $\{\hat y_i^{(k)}, i \in A\}$ corresponding to each set of bootstrap weights $\{w_i^{(k)}, i \in A\}$ associated with sample A only.
This method enables the user to correctly estimate the variance of the mass imputation estimator $\bar y_{I,reg}$ without access to the training data $\{(y_i, x_i) : i \in B\}$ from sample B.
The data file will contain additional columns of $\{\hat y_i^{(k)} : i \in A\}$ associated with the columns of weights $\{w_i^{(k)} : i \in A\}$ $(k = 1, \ldots, L)$, where L is the number of replicates created from sample A.
14 Kim & Rao (2012) developed a replication method for survey integration when sample B is also a probability sample.
1. Obtain $\hat\beta^{(k)}$, the k-th replicate of $\hat\beta$, by solving the same estimating equation for $\beta$ using the replication weights for sample B.
2. The k-th replicate of the mass imputation estimator $\bar y_{I,reg} = \sum_{i \in A} w_i \hat y_i$ is
\[ \bar y_{I,reg}^{(k)} = \sum_{i \in A} w_i^{(k)} \hat y_i^{(k)}, \]
where $\hat y_i^{(k)} = m(x_i; \hat\beta^{(k)})$.
How to modify the method of Kim & Rao (2012) when sample B is a non-probability sample?
15 In order to develop a valid bootstrap method for the mass imputation estimator $\bar y_{I,reg}$ in (3), it is critical to develop a valid bootstrap method for estimating $V(\hat\beta)$ when $\hat\beta$ is computed from (2).
Note that, under assumption (1) and MAR, we can obtain
\[ V(\hat\beta) \doteq J^{-1}\, \Omega\, (J^{-1})^{\top}, \tag{8} \]
where
\[ J = E\Big\{ n_B^{-1} \sum_{i \in B} \dot m_i h_i^{\top} \Big\}, \qquad \Omega = E\Big\{ n_B^{-2} \sum_{i \in B} E(e_i^2 \mid x_i)\, h_i h_i^{\top} \Big\}, \]
with $\dot m_i = \dot m(x_i; \beta_0)$ and $h_i = h(x_i; \beta_0)$.
In (8), the reference distribution is the joint distribution of the superpopulation model and the unknown sampling mechanism for sample B.
16 Interestingly, the variance formula in (8) is exactly equal to the variance of $\hat\beta$ when sample B is selected by simple random sampling (SRS).
That is, even though the sample design for sample B is not SRS, its effect on the variance of $\hat\beta$ is essentially equal to that under SRS.
Roughly speaking, the MAR assumption makes the effect of the sampling design ignorable for estimating $\beta$, even though it is still not ignorable for estimating $\bar Y_N$.
Therefore, we can safely ignore the sampling design for sample B when estimating $\beta$ and develop a valid bootstrap method for variance estimation of $\hat\beta$ using the bootstrap method for SRS.
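The claim above can be checked numerically in a simple case. The sketch below compares a plug-in version of the sandwich formula (8) with a bootstrap that resamples sample B as if it were an SRS. The linear model with $h_i = \dot m_i = (1, x_i)^{\top}$, the selection mechanism, and the number of replicates are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population and MAR non-probability sample B
# (selection depends on x only).
N = 20_000
x = rng.normal(2.0, 1.0, N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, N)
in_B = rng.random(N) < 0.2 / (1.0 + np.exp(-(x - 2.0)))
y_B = y[in_B]
n_B = y_B.size
H = np.column_stack([np.ones(n_B), x[in_B]])   # h_i = m_dot_i = (1, x_i)'

def fit(H, y):
    """Solve sum_B {y_i - h_i' beta} h_i = 0 for the linear model."""
    return np.linalg.solve(H.T @ H, H.T @ y)

beta_hat = fit(H, y_B)
resid = y_B - H @ beta_hat

# Plug-in version of (8): J^{-1} Omega (J^{-1})'.
J_hat = H.T @ H / n_B
Omega_hat = (H * resid[:, None] ** 2).T @ H / n_B**2
J_inv = np.linalg.inv(J_hat)
V_sandwich = J_inv @ Omega_hat @ J_inv.T

# Bootstrap that treats sample B as a simple random sample.
L = 500
boot = np.empty((L, 2))
for k in range(L):
    idx = rng.integers(0, n_B, size=n_B)
    boot[k] = fit(H[idx], y_B[idx])
V_boot = np.cov(boot, rowvar=False)

ratio = np.diag(V_boot) / np.diag(V_sandwich)
print("bootstrap / sandwich variance ratio:", np.round(ratio, 2))
```

Ratios near one for both coefficients would be in line with the statement that the SRS bootstrap is adequate for $\hat\beta$ even though sample B was not drawn by SRS.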
17 Thus, the proposed bootstrap method can be described in the following steps:
1. Treating sample B as a simple random sample, generate the k-th bootstrap sample from sample B to compute $\hat\beta^{(k)}$, the k-th bootstrap replicate of $\hat\beta$, using the same estimation formula (2) applied to the bootstrap sample.
2. Using $\hat\beta^{(k)}$ from Step 1, compute $\hat y_i^{(k)} = m(x_i; \hat\beta^{(k)})$ for each $i \in A$. Using the $\hat y_i^{(k)}$, we obtain the replicated mass imputation estimator $\bar y_{I,reg}^{(k)} = \sum_{i \in A} w_i^{(k)} \hat y_i^{(k)}$.
3. The resulting bootstrap variance estimator of $\bar y_{I,reg}$ is then
\[ \hat V_b(\bar y_{I,reg}) = \frac{1}{L} \sum_{k=1}^{L} \big( \bar y_{I,reg}^{(k)} - \bar y_{I,reg} \big)^2. \tag{9} \]
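The three steps above can be sketched as follows for the linear model. Several pieces are assumptions made for the illustration rather than details from the slides: the simulated population, the use of with-replacement bootstrap multiplicities to build the replicate weights $w_i^{(k)}$ for an SRS sample A, and the division by $N$ so that the estimator matches the mean in (3).

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical population, SRS sample A (x only) and MAR sample B.
N, n_A = 20_000, 500
x = rng.normal(2.0, 1.0, N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, N)
x_A = x[rng.choice(N, size=n_A, replace=False)]
w_A = np.full(n_A, N / n_A)
in_B = rng.random(N) < 0.2 / (1.0 + np.exp(-(x - 2.0)))
x_B, y_B = x[in_B], y[in_B]
n_B = x_B.size

def fit(xb, yb):
    """Least squares solution of the estimating equation (2)."""
    H = np.column_stack([np.ones(xb.size), xb])
    return np.linalg.solve(H.T @ H, H.T @ yb)

def predict(beta, xa):
    return beta[0] + beta[1] * xa

# Point estimate (3).
ybar_I = np.sum(w_A * predict(fit(x_B, y_B), x_A)) / N

L = 300
reps = np.empty(L)
for k in range(L):
    # Step 1: resample B as if it were an SRS and refit beta.
    idx = rng.integers(0, n_B, size=n_B)
    beta_k = fit(x_B[idx], y_B[idx])
    # Replicate weights for A from bootstrap multiplicities (an assumption
    # suited to SRS; a real survey would supply its own w^(k)).
    mult = rng.multinomial(n_A, np.full(n_A, 1.0 / n_A))
    w_k = w_A * mult
    # Step 2: replicated synthetic values and replicated estimator.
    reps[k] = np.sum(w_k * predict(beta_k, x_A)) / N

# Step 3: bootstrap variance estimator (9).
V_boot = np.mean((reps - ybar_I) ** 2)
print(f"ybar_I = {ybar_I:.3f}, bootstrap variance = {V_boot:.5f}")
```

In a released data file only the columns $\hat y_i^{(k)}$ and $w_i^{(k)}$ for $i \in A$ would be needed to repeat Steps 2 and 3; the sample B records enter only through $\hat\beta^{(k)}$.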
18 5. A real data application
Pew Research Center (PRC) data in 2015: a non-probability sample of size n = 9,301 with 56 variables, provided by eight different vendors with unknown sampling and data collection strategies.
The PRC dataset aims to study the relation between people and community.
We choose 9 variables as response variables in our analysis; 8 are binary and 1 is continuous.
We consider two probability samples with common auxiliary variables. The first is the Behavioral Risk Factor Surveillance System (BRFSS) survey data and the second is the Volunteer Supplement survey data from the Current Population Survey (CPS), both collected in
19 Comparison of covariates from three datasets
Table: Estimated Population Mean of Covariates from the Three Samples
Columns: $\hat X_{PRC}$, $\hat X_{BRFSS}$, $\hat X_{CPS}$
Rows:
Age category: < >=30,< >=50,< >=
Gender: Female
Race: White only
Race: Black only
Origin: Hispanic/Latino
Region: Northeast
Region: South
Region: West
Marital status: Married
Employment: Working
Employment: Retired
20 Comparison of covariates from three datasets (Cont'd)
Columns: $\hat X_{PRC}$, $\hat X_{BRFSS}$, $\hat X_{CPS}$
Rows:
Education: High school or less
Education: Bachelor's degree and above
Education: Bachelor's degree (NA)
Education: Postgraduate (NA)
Household: Presence of child in household (NA)
Household: Home ownership (NA)
Health: Smoke everyday (NA)
Health: Smoke never (NA)
Financial status: No money to see doctors (NA)
Financial status: Having medical insurance (NA)
Financial status: Household income < 20K (NA)
Financial status: Household income > 100K (NA)
Volunteer works: Volunteered (NA)
21 Remark
There are noticeable differences between the naive estimates from the PRC sample and the estimates from the two probability samples for covariates such as Origin (Hispanic/Latino), Education (High school or less), Household (with children), Health (Smoking) and Volunteer works.
This is strong evidence that the PRC dataset is not a representative sample of the population.
22 Mass Imputation using a single set of common covariates
Table: Estimated Population Mean Using A Single Set of Common Covariates
Columns: Binary response $y$, Sample (PRC, BRFSS, CPS), $\hat\theta$, $v_l$ ($\times 10^{-5}$), $v_b$ ($\times 10^{-5}$)
Rows:
Talked with neighbours frequently
Tended to trust neighbours
Expressed opinions at a government level
Voted in local elections
23 Mass Imputation using a single set of common covariates
Table: Estimated Population Mean Using A Single Set of Common Covariates
Columns: Binary response $y$, Sample (PRC, BRFSS, CPS), $\hat\theta$, $v_l$ ($\times 10^{-5}$), $v_b$ ($\times 10^{-5}$)
Rows:
Participated in school groups
Participated in service organizations
Participated in sports organizations
No money to buy food
24 Mass Imputation using a single set of common covariates
Table: Estimated Population Mean Using A Single Set of Common Covariates
Columns: Continuous response $y$, Sample (PRC, BRFSS, CPS), $\hat\theta_I$, $v_l$ ($\times 10^{-2}$), $v_b$ ($\times 10^{-2}$)
Rows:
Days had at least one drink last month
25 Discussion
There are substantial discrepancies between the mass imputation estimator and the naive estimator in most cases.
The mass imputation estimates obtained with two different probability samples are comparable for all cases.
The two variance estimators obtained by using the linearization and the bootstrap methods generally agree with each other.
26 6. Conclusion
Using non-probability sample data as a training set for prediction, we can implement mass imputation for survey sample data.
Nonparametric models can be used for the imputation model.
Machine learning algorithms can also be used for mass imputation.
A promising area of research.
27 REFERENCES
Kim, J. K. & Rao, J. N. K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika 99.
Rivers, D. (2007). Sampling for web surveys. In Proceedings of the Survey Research Methods Section of the American Statistical Association.
More informationLow-Income African American Women's Perceptions of Primary Care Physician Weight Loss Counseling: A Positive Deviance Study
Thomas Jefferson University Jefferson Digital Commons Master of Public Health Thesis and Capstone Presentations Jefferson College of Population Health 6-25-2015 Low-Income African American Women's Perceptions
More informationApplied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid
Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?
More informationFlexible Estimation of Treatment Effect Parameters
Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both
More informationTruncation and Censoring
Truncation and Censoring Laura Magazzini laura.magazzini@univr.it Laura Magazzini (@univr.it) Truncation and Censoring 1 / 35 Truncation and censoring Truncation: sample data are drawn from a subset of
More informationMedical GIS: New Uses of Mapping Technology in Public Health. Peter Hayward, PhD Department of Geography SUNY College at Oneonta
Medical GIS: New Uses of Mapping Technology in Public Health Peter Hayward, PhD Department of Geography SUNY College at Oneonta Invited research seminar presentation at Bassett Healthcare. Cooperstown,
More informationCLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition
CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data
More informationGlobally Estimating the Population Characteristics of Small Geographic Areas. Tom Fitzwater
Globally Estimating the Population Characteristics of Small Geographic Areas Tom Fitzwater U.S. Census Bureau Population Division What we know 2 Where do people live? Difficult to measure and quantify.
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationAdvanced Topics in Survey Sampling
Advanced Topics in Survey Sampling Jae-Kwang Kim Wayne A Fuller Pushpal Mukhopadhyay Department of Statistics Iowa State University World Statistics Congress Short Course July 23-24, 2015 Kim & Fuller
More informationemerge Network: CERC Survey Survey Sampling Data Preparation
emerge Network: CERC Survey Survey Sampling Data Preparation Overview The entire patient population does not use inpatient and outpatient clinic services at the same rate, nor are racial and ethnic subpopulations
More informationStatistical Education - The Teaching Concept of Pseudo-Populations
Statistical Education - The Teaching Concept of Pseudo-Populations Andreas Quatember Johannes Kepler University Linz, Austria Department of Applied Statistics, Johannes Kepler University Linz, Altenberger
More informationGenerating Private Synthetic Data: Presentation 2
Generating Private Synthetic Data: Presentation 2 Mentor: Dr. Anand Sarwate July 17, 2015 Overview 1 Project Overview (Revisited) 2 Sensitivity of Mutual Information 3 Simulation 4 Results with Real Data
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationAN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE
Statistica Sinica 24 (2014), 1097-1116 doi:http://dx.doi.org/10.5705/ss.2012.074 AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Sheng Wang 1, Jun Shao
More informationMight using the Internet while travelling affect car ownership plans of Millennials? Dr. David McArthur and Dr. Jinhyun Hong
Might using the Internet while travelling affect car ownership plans of Millennials? Dr. David McArthur and Dr. Jinhyun Hong Introduction Travel habits among Millennials (people born between 1980 and 2000)
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationDOUBLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR IGNORABLE MISSING DATA
Statistica Sinica 22 (2012), 149-172 doi:http://dx.doi.org/10.5705/ss.2010.069 DOUBLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR IGNORABLE MISSING DATA Qi Long, Chiu-Hsieh Hsu and Yisheng Li Emory University,
More informationGov 2002: 4. Observational Studies and Confounding
Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015 Where are we? Where are we going? Last two weeks: randomized experiments. From here on: observational studies. What
More informationTreatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison
Treatment Effects Christopher Taber Department of Economics University of Wisconsin-Madison September 6, 2017 Notation First a word on notation I like to use i subscripts on random variables to be clear
More informationSuccessive Difference Replication Variance Estimation in Two-Phase Sampling
Successive Difference Replication Variance Estimation in Two-Phase Sampling Jean D. Opsomer Colorado State University Michael White US Census Bureau F. Jay Breidt Colorado State University Yao Li Colorado
More informationChapter 2 The Simple Linear Regression Model: Specification and Estimation
Chapter The Simple Linear Regression Model: Specification and Estimation Page 1 Chapter Contents.1 An Economic Model. An Econometric Model.3 Estimating the Regression Parameters.4 Assessing the Least Squares
More informationGender Recognition Act Survey ONLINE Fieldwork: 19th-21st October 2018
Recognition Act Survey ONLINE Fieldwork: thst October 0 Table Q. Do you think that those who wish to legally change their gender on official documentation (e.g. birth certificate, passport) should or should
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationBias-Variance in Machine Learning
Bias-Variance in Machine Learning Bias-Variance: Outline Underfitting/overfitting: Why are complex hypotheses bad? Simple example of bias/variance Error as bias+variance for regression brief comments on
More informationGraybill Conference Poster Session Introductions
Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary
More information