Imputation for Missing Data under PPSWR Sampling

Size: px
Start display at page:

Download "Imputation for Missing Data under PPSWR Sampling"

Transcription

1 July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23

2 () Outline () Imputation method under PPSWR sampling () Special case: SRS sampling and uniform response () () 2 23

3 () Item nonresponse: occurs frequently in sample surveys. Example In sample survey on transportation, some vehicles may not be found, but their tonnage or seat capacity is known to us. Solutions: (1) Increasing response probability; (2) Imputing the missing values of the sampled units. Imputation methods: Ratio imputation, regression imputation, random imputation etc. Shortcoming: Uniform response (often), simple random sampling PPSWR sampling: the sampling with probability proportional to size with replacement which is often used in the first stage of multistage sampling. 3 23

4 () Imputation method under PPSWR sampling 1. Notation Survey population U: consist of N distinct units identified through the labels i = 1,..., N. Suppose that the auxiliary variable, X, is available for each unit of the population, but the variable of interest, Y, is missing for some of the sampled units. s n : sample with size n drawn from U by PPSWR sampling. s r : the respondent set with size r( 1) s n r : the nonrespondent set with size n r.

5 Uniform response mechanism: independent response across sample units and equal response probability, p (q 1 p). Non-uniform response mechanism: independent response across sample units and possibly unequal response probabilities, p i (q i 1 p i ). Response indicator on y i : { 1, if the unit i responds to y i, I i = 0, otherwise. 5 23

6 2. Imputation method We first let p i be known. For the missing Y values, we suggest the following imputation method: y i = 1 n r q j y j x i, i s n r. p j s j x j r 6 23

7 It is interesting to note that yi is an approximation of the weighted least squares predictor under the following superpopulation model: y i = βx i + e i, ε(e i ) = 0, ε(e 2 i ) = σ2 x 2 i, ε(e ie j ) = 0 (i j). In fact, it can be seen that under the above superpopulation model, the weighted least squares estimator of β with the weights w i q i /(p i x 2 i ) is given by s ˆβ = r (q i y i )/(p i x i ) s r 1/p i r. Further, the expectation of s r 1/p i with respect to the response mechanism is n. 7 23

8 3. Estimator of population mean and its variance Applying the above imputation method to the PPSWR sampling, the corresponding Hansen-Hurwitz estimator of the population mean Ȳ is ˆȲ P P S = X n i s r y i = X p i x i n i s n y i p i x i I i. is design-unbiased under the non- Theorem 1. The estimator ˆȲ P P S uniform response mechanism. Theorem 2. Under the non-uniform response, the variance of ˆȲ P P S is given by V ( ˆȲ P P S) = 1 N 2 where Z i = X i /X. { 1 n N i=1 ( ) 2 Yi Z i Y + 1 Z i n N i=1 } q i Yi 2, p i Z i 8 23

9 4. Jackknife variance estimator Define the imputed values as follows: For i s n r, ( 1 q k y ) k x i, j s r, n r p k x k yi a k s (j) = r {j} ( 1 q k y ) k x i, j s n r, n r 1 p k x k k s r when the j-th sample unit is deleted. Based on these imputed values, the estimator of Ȳ can be obtained as X y i, j s r, n 1 p i x i ˆȲ P a i s P S(j) = r {j} X y i, j s n r, n 1 p i x i i s r when the j-th sample unit is deleted. 9 23

10 Define the j-th pseudovalue as follows: ˆȲ j = n ˆȲ P a P S (n 1) ˆȲ P P S(j). A jackknife variance estimator of ˆȲ P P S v J ( ˆȲ P P S ) = n 1 n = [ ˆȲ P P S j s n X 2 n(n 1) i s r is then given by y2 i p 2 i x2 i ˆȲ a P P S(j)] 2 1 n ( i s r y i p i x i ) 2. Theorem 3. Let n > 1. Then under the non-uniform response, we have E[v J ( ˆȲ P P S)] = V ( ˆȲ P P S). Theorem 3 shows that the jackknife variance estimator v J ( ˆȲ P P S ) is designunbiased under the non-uniform response

11 5. Extension to the case of unknown response probability In practice, the response probability p i is rare to be known. Here we model the response probability p i by a parametric model p i = g(x i ; θ), where g is a known smooth function and θ is an unknown finitedimensional parameter. An example of such a model is the logistic regression model. The estimator ˆθ of the parameter θ can be obtained by the maximum likelihood approach. Let ˆp i = g(x i ; ˆθ) be the estimated response probability. Then the corresponding estimator of the population mean Ȳ and its jackknife variance estimator are given by ˆȲ e P P S = X n i s r y i ˆp i x i, 11 23

12 and respectively. v J ( ˆȲ e P P S) = X 2 n(n 1) i s r y2 i ˆp 2 i x2 i 1 n ( i s r y i ˆp i x i ) 2, For the relationship between the estimators with known and unknown p i, under some regularity conditions, we have and ˆȲ e P P S = ˆȲ P P S + O p ( 1 n ), ( ) v J ( ˆȲ P e P S) = v J ( ˆȲ 1 P P S) + O p. n 3/

13 () Special case: SRS sampling and uniform response Imputed value: y i = Imputed estimator: 1 r y j x j s j r ˆȲ m SRS= ū r x n + where ū r = 1 u j with u j = y j /x j. r j s r x i, i s n r. r(n 1) (r 1)n (ȳ r ū r x r ), 13 23

14 The estimator ˆȲ SRS m is a design-unbiased estimator under uniform response. It is interesting that it is just the version of Hartley-Ross estimator (Hartley and Ross 1954, Nature) for estimating the population mean under two-phase sampling. Variance: V ( ˆȲ SRS) m = 1 ( ) np [S2 Y + (1 p)(2ȳ Ū X + Ū 2 T Ū 2 X2 2Ū Z)] 1 + O n 2, where T = 1 N and N T i with T i = Xi 2, and Z = 1 N i=1 S 2 Y = 1 N 1 N (Y i Ȳ )2. i=1 N Z i with Z i = Y i X i, i=

15 Jackknife variance estimator: v J ( ˆȲ SRS m ) = 1 { (n 1)(r 1)ū 2 n(n 1)(r 1) r s 2 x(n) + 1 (r 2) 2[n(r 2) x n (nr r 2) x r ] 2 s 2 u(r) + (n 2)2 r 2 (r 2) 2 s2 y(r) where s y ν 1 u ν 2x ν 3(r) = 1 r 1 + (n 2)r(nr 2r2 + 4r 4) (r 2) 2 ū 2 r s 2 x(r) 2(n 2)r + (r 2) 2 [n(r 2) x n (nr r 2) x r ] s yu (r) 2 (r 2) 2[(n2 r n(r 2 r + 2) (r 1)(r 2))(r 2) x n (n 2 r 2 nr(r 2 + 4) + 2(3r 2 4r + 4)) x r ]ū r s ux (r) 2(n 2)r(nr r2 + r 2) (r 2) 2 ū r s yx (r) 2 r 2 [n(r 2) x n (nr r 2) x r ]s u 2 x(r) + 2(nr r2 + r 2) ū r s ux 2(r) r 2 2r +s u 2 x2(r) n (ȳ r ū r x r )s ux (r) + 2(n 2)r s yux (r) r 2 } r2 n(r 1) (ȳ r ū r x r ) 2, j s r (y j ȳ r ) ν 1 (u j ū r ) ν 2 (x j x r ) ν

16 Two approximate Jackknife variance estimators: v J ( ˆȲ SRS m ) = nū2 1 rs 2 x(n) + 1 r [( x r x n ) 2 s 2 u(r) 2( x r x n )s yu (r) + s 2 y(r)] ( 1 + r 2 ) ( 1 ū 2 n rs 2 x(r) + 2 r 1 ) ū r [( x r x n )s ux (r) s yx (r)], n and v J( ˆȲ SRS) m = nū2 1 rs 2 x(n) + 1 ( 1 r s2 y(r) + r 2 ) ( 1 ū 2 n rs 2 x(r) 2 r 1 ) ū r s yx (r). n 16 23

17 Asymptotic design-unbiasedness: (i) E[v J ( ˆȲ SRS m m )] = V ( ˆȲ SRS ) + O ( ) 1 n. 2 (ii) E[v J ( ˆȲ SRS m m )] = V ( ˆȲ SRS ) + O ( ) 1 n. 2 (iii) E[v J ( ˆȲ SRS m m )] = V ( ˆȲ SRS ) + O ( ) 1 n. 2 Remark: We should note that the (approximate) design-unbiasedness is the main requirement for a good estimator in survey sampling. The approximate design-unbiasedness of the Jackknife variance estimators has been found first in Zou and Feng (1998), and then in Skinner and Rao (2002) and Zou et al. (2002). Such a property universally holds from our subsequent analysis

18 () The data are generated from the three ratio models which are different only in the auxiliary variables: y i = 3.9x i + x i ε i with x i U(0.1, 2.1), N(1, 1), and N(20, 16), respectively, ε i N(0, 1), and x i and ε i are assumed to be independent. In the case of uniform response, we set p = 0.76; in the case of nonuniform response, the unequal response probability p i for the unit i follows the logistic model p i = exp( x i) 1 + exp( x i )

19 We first generate a finite population with the size of N = 10, 000. Then the samples with n = 100 and n = 500 are drawn from the finite population by the PPSWR sampling, respectively. We repeat the process B = 5, 000 times. For the b-th run, denote the estimators of the population mean under the uniform (b) and non-uniform responses as and ˆȲ P P S, respectively. We calculate the ˆȲ I(b) P P S simulated means and variances as follows: and E ( ˆȲ I P P S) = 1 B V ( ˆȲ I P P S) = 1 B B ( b=1 B b=1 ˆȲ I(b) P P S, ˆȲ I(b) P P S Ȳ )2, E ( ˆȲ P P S) = 1 B V ( ˆȲ P P S) = 1 B B b=1 B ( b=1 ˆȲ (b) P P S ; ˆȲ (b) P P S Ȳ )2. Similarly, the corresponding jackknife variance estimators are calculated as E [v J ( ˆȲ I P P S)] = 1 B B b=1 v (b) J ( ˆȲ I P P S), and E [v J ( ˆȲ P P S)] = 1 B B b=1 v (b) J ( ˆȲ P P S)

20 20 23

21 Table 1 summarizes the results on the simulated mean, variance and jackknife variance estimate. It can be seen from the table that both of the estimators ˆȲ P I P S and ˆȲ P P S are very close to the true population means for the three distributions of auxiliary variable. Also, the jackknife variance estimators perform very well

22 To study the effect of the response probability, we set various response probabilities: p = 0.5 for uniform response, and p i follows p i = exp{0.3(x i X)} 1 + exp{0.3(x i X)} for non-uniform response. The results are presented in Table 2. It is observed that the approximate design-unbiasedness of the proposed estimators still holds. On the other hand, it is also clear that the variances become larger for low response probability. For some other settings of response probability, we obtain the similar results but omit them here for saving space

23 23 23

24 () Incomplete auxiliary information and multiple auxiliary information Estimation of response probability: the use of non-parametric approach. Other unequal probability sampling 24 23

25 Thank you! 25 23

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

6. Fractional Imputation in Survey Sampling

6. Fractional Imputation in Survey Sampling 6. Fractional Imputation in Survey Sampling 1 Introduction Consider a finite population of N units identified by a set of indices U = {1, 2,, N} with N known. Associated with each unit i in the population

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics

More information

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nanhua Zhang Division of Biostatistics & Epidemiology Cincinnati Children s Hospital Medical Center (Joint work

More information

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris Jackknife Variance Estimation of the Regression and Calibration Estimator for Two 2-Phase Samples Jong-Min Kim* and Jon E. Anderson jongmink@morris.umn.edu Statistics Discipline Division of Science and

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MODEL-CALIBRATED PSEUDO EMPIRICAL LIKELIHOOD METHOD

ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MODEL-CALIBRATED PSEUDO EMPIRICAL LIKELIHOOD METHOD Statistica Sinica 12(2002), 1223-1239 ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MOD-CALIBRATED PSEUDO EMPIRICAL LIKIHOOD METHOD Jiahua Chen and Changbao Wu University of Waterloo Abstract:

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Inference with Imputed Conditional Means

Inference with Imputed Conditional Means Inference with Imputed Conditional Means Joseph L. Schafer and Nathaniel Schenker June 4, 1997 Abstract In this paper, we develop analytic techniques that can be used to produce appropriate inferences

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Graybill Conference Poster Session Introductions

Graybill Conference Poster Session Introductions Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS

ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties

More information

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information

A decision theoretic approach to Imputation in finite population sampling

A decision theoretic approach to Imputation in finite population sampling A decision theoretic approach to Imputation in finite population sampling Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 August 1997 Revised May and November 1999 To appear

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

What is Survey Weighting? Chris Skinner University of Southampton

What is Survey Weighting? Chris Skinner University of Southampton What is Survey Weighting? Chris Skinner University of Southampton 1 Outline 1. Introduction 2. (Unresolved) Issues 3. Further reading etc. 2 Sampling 3 Representation 4 out of 8 1 out of 10 4 Weights 8/4

More information

Introduction to Survey Data Integration

Introduction to Survey Data Integration Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

On the bias of the multiple-imputation variance estimator in survey sampling

On the bias of the multiple-imputation variance estimator in survey sampling J. R. Statist. Soc. B (2006) 68, Part 3, pp. 509 521 On the bias of the multiple-imputation variance estimator in survey sampling Jae Kwang Kim, Yonsei University, Seoul, Korea J. Michael Brick, Westat,

More information

Chapter 4: Imputation

Chapter 4: Imputation Chapter 4: Imputation Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Basic Theory for imputation 3 Variance estimation after imputation 4 Replication variance estimation

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Machine Learning Basics: Maximum Likelihood Estimation

Machine Learning Basics: Maximum Likelihood Estimation Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n

More information

A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning Saharon Rosset Data Analytics Research Group IBM T.J. Watson Research Center Yorktown Heights, NY 1598 srosset@us.ibm.com Hui

More information

A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning Saharon Rosset Data Analytics Research Group IBM T.J. Watson Research Center Yorktown Heights, NY 1598 srosset@us.ibm.com Hui

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

Statistics II. Management Degree Management Statistics IIDegree. Statistics II. 2 nd Sem. 2013/2014. Management Degree. Simple Linear Regression

Statistics II. Management Degree Management Statistics IIDegree. Statistics II. 2 nd Sem. 2013/2014. Management Degree. Simple Linear Regression Model 1 2 Ordinary Least Squares 3 4 Non-linearities 5 of the coefficients and their to the model We saw that econometrics studies E (Y x). More generally, we shall study regression analysis. : The regression

More information

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Statistica Sinica 13(2003), 613-623 TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Hansheng Wang and Jun Shao Peking University and University of Wisconsin Abstract: We consider the estimation

More information

University of Michigan School of Public Health

University of Michigan School of Public Health University of Michigan School of Public Health The University of Michigan Department of Biostatistics Working Paper Series Year 003 Paper Weighting Adustments for Unit Nonresponse with Multiple Outcome

More information

ABHELSINKI UNIVERSITY OF TECHNOLOGY

ABHELSINKI UNIVERSITY OF TECHNOLOGY Cross-Validation, Information Criteria, Expected Utilities and the Effective Number of Parameters Aki Vehtari and Jouko Lampinen Laboratory of Computational Engineering Introduction Expected utility -

More information

Cluster Sampling 2. Chapter Introduction

Cluster Sampling 2. Chapter Introduction Chapter 7 Cluster Sampling 7.1 Introduction In this chapter, we consider two-stage cluster sampling where the sample clusters are selected in the first stage and the sample elements are selected in the

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

2018 2019 1 9 sei@mistiu-tokyoacjp http://wwwstattu-tokyoacjp/~sei/lec-jhtml 11 552 3 0 1 2 3 4 5 6 7 13 14 33 4 1 4 4 2 1 1 2 2 1 1 12 13 R?boxplot boxplotstats which does the computation?boxplotstats

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

The Effective Use of Complete Auxiliary Information From Survey Data

The Effective Use of Complete Auxiliary Information From Survey Data The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial

More information

Link lecture - Lagrange Multipliers

Link lecture - Lagrange Multipliers Link lecture - Lagrange Multipliers Lagrange multipliers provide a method for finding a stationary point of a function, say f(x, y) when the variables are subject to constraints, say of the form g(x, y)

More information

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy.

RESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy. CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2014 www.csgb.dk RESEARCH REPORT Ina Trolle Andersen, Ute Hahn and Eva B. Vedel Jensen Vanishing auxiliary variables in PPS sampling with applications

More information

in Survey Sampling Petr Novák, Václav Kosina Czech Statistical Office Using the Superpopulation Model for Imputations and Variance

in Survey Sampling Petr Novák, Václav Kosina Czech Statistical Office Using the Superpopulation Model for Imputations and Variance Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling Czech Statistical Office Introduction Situation Let us have a population of N units: n sampled (sam) and N-n

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

J.N.K. Rao, Carleton University Department of Mathematics & Statistics, Carleton University, Ottawa, Canada

J.N.K. Rao, Carleton University Department of Mathematics & Statistics, Carleton University, Ottawa, Canada JACKKNIFE VARIANCE ESTIMATION WITH IMPUTED SURVEY DATA J.N.K. Rao, Carleton University Department of Mathematics & Statistics, Carleton University, Ottawa, Canada KEY WORDS" Adusted imputed values, item

More information

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy. 5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing

More information

Petter Mostad Mathematical Statistics Chalmers and GU

Petter Mostad Mathematical Statistics Chalmers and GU Petter Mostad Mathematical Statistics Chalmers and GU Solution to MVE55/MSG8 Mathematical statistics and discrete mathematics MVE55/MSG8 Matematisk statistik och diskret matematik Re-exam: 4 August 6,

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data

A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data Phillip S. Kott RI International Rockville, MD Introduction: What Does Fitting a Regression Model with Survey Data Mean?

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Discussing Effects of Different MAR-Settings

Discussing Effects of Different MAR-Settings Discussing Effects of Different MAR-Settings Research Seminar, Department of Statistics, LMU Munich Munich, 11.07.2014 Matthias Speidel Jörg Drechsler Joseph Sakshaug Outline What we basically want to

More information

Bootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.

Bootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University  babu. Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling

More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

This paper is not to be removed from the Examination Halls

This paper is not to be removed from the Examination Halls ~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 1 The General Bootstrap This is a computer-intensive resampling algorithm for estimating the empirical

More information

Graduate Econometrics I: Unbiased Estimation

Graduate Econometrics I: Unbiased Estimation Graduate Econometrics I: Unbiased Estimation Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Unbiased Estimation

More information

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2 STA 248 H1S MIDTERM TEST February 26, 2008 SURNAME: SOLUTIONS GIVEN NAME: STUDENT NUMBER: INSTRUCTIONS: Time: 1 hour and 50 minutes Aids allowed: calculator Tables of the standard normal, t and chi-square

More information

Multidimensional Control Totals for Poststratified Weights

Multidimensional Control Totals for Poststratified Weights Multidimensional Control Totals for Poststratified Weights Darryl V. Creel and Mansour Fahimi Joint Statistical Meetings Minneapolis, MN August 7-11, 2005 RTI International is a trade name of Research

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is Example The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is log p 1 p = β 0 + β 1 f 1 (y 1 ) +... + β d f d (y d ).

More information

Workpackage 5 Resampling Methods for Variance Estimation. Deliverable 5.1

Workpackage 5 Resampling Methods for Variance Estimation. Deliverable 5.1 Workpackage 5 Resampling Methods for Variance Estimation Deliverable 5.1 2004 II List of contributors: Anthony C. Davison and Sylvain Sardy, EPFL. Main responsibility: Sylvain Sardy, EPFL. IST 2000 26057

More information

Admissible Estimation of a Finite Population Total under PPS Sampling

Admissible Estimation of a Finite Population Total under PPS Sampling Research Journal of Mathematical and Statistical Sciences E-ISSN 2320-6047 Admissible Estimation of a Finite Population Total under PPS Sampling Abstract P.A. Patel 1* and Shradha Bhatt 2 1 Department

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) The Simple Linear Regression Model based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #2 The Simple

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

L2: Two-variable regression model

L2: Two-variable regression model L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014 What we have learned last time...

More information

Nonstationary Panels

Nonstationary Panels Nonstationary Panels Based on chapters 12.4, 12.5, and 12.6 of Baltagi, B. (2005): Econometric Analysis of Panel Data, 3rd edition. Chichester, John Wiley & Sons. June 3, 2009 Agenda 1 Spurious Regressions

More information

Gibbs Sampling in Endogenous Variables Models

Gibbs Sampling in Endogenous Variables Models Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take

More information

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

Bessel s Equation. MATH 365 Ordinary Differential Equations. J. Robert Buchanan. Fall Department of Mathematics

Bessel s Equation. MATH 365 Ordinary Differential Equations. J. Robert Buchanan. Fall Department of Mathematics Bessel s Equation MATH 365 Ordinary Differential Equations J. Robert Buchanan Department of Mathematics Fall 2018 Background Bessel s equation of order ν has the form where ν is a constant. x 2 y + xy

More information

Topic 16 Interval Estimation

Topic 16 Interval Estimation Topic 16 Interval Estimation Additional Topics 1 / 9 Outline Linear Regression Interpretation of the Confidence Interval 2 / 9 Linear Regression For ordinary linear regression, we have given least squares

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

2017 Financial Mathematics Orientation - Statistics

2017 Financial Mathematics Orientation - Statistics 2017 Financial Mathematics Orientation - Statistics Written by Long Wang Edited by Joshua Agterberg August 21, 2018 Contents 1 Preliminaries 5 1.1 Samples and Population............................. 5

More information

Economics 582 Random Effects Estimation

Economics 582 Random Effects Estimation Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics Multiple Regression Analysis: Heteroskedasticity Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information