TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS


TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS

By Hansheng Wang

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) at the UNIVERSITY OF WISCONSIN-MADISON, 2006

Abstract

We consider estimating the cell probabilities and testing hypotheses in a two-way contingency table where two-dimensional categorical data have nonrespondents imputed using either conditional imputation or marginal imputation. Under simple random sampling, we derive asymptotic distributions for the cell probability estimators based on the imputed data. Under conditional imputation, we also show that these estimators are more efficient than those obtained by ignoring nonrespondents when the proportion of nonrespondents is large. A Wald type test and a Rao and Scott type corrected chi-square test for goodness-of-fit are derived. We show that the naive chi-square test for independence, which treats imputed values as observed data, is still asymptotically valid under marginal imputation. With a simple adjustment, multiplication by an appropriate factor, the naive chi-square test for independence is also valid under conditional imputation. We present simulation results that examine the size and compare the power of these tests. Some of the results are extended to stratified sampling with imputation within each stratum or across strata. Asymptotics are studied under two types of stratified sampling: (1) the number of strata is fixed and the stratum sizes are large, and (2) the number of strata is large and the stratum sizes are small.

Acknowledgements

First, I want to express my deepest gratitude to my Ph.D. adviser, Prof. Jun Shao. It was he who suggested my thesis topic and led me into the field of sample surveys and imputation. For the first time in my life, I found the world of statistics so exciting! My curiosity, enthusiasm, and ambition were always encouraged and appreciated there. It is his endless help and encouragement that made my academic life in Madison so challenging and productive. Prof. Shao also helped me build up my own research style, which emphasizes both theory and application. It was also Prof. Shao who introduced me to Dr. Shein-Chung Chow, another respected researcher to whom I am very grateful. Although Dr. Chow did not help me with sample surveys and imputation, it was he who led me into the field of pharmaceutical statistics, where I believe I will build my career. The most important thing I learned from Dr. Chow is practical sense, which gives me a unique understanding of what statistics is and what statistics should do. Statistics is neither science nor mathematics; instead, it is a way of reasoning and a philosophy of understanding when unexplained variation exists in the data. I believe this understanding will play an important role in guiding my future career and research. Next, I want to thank all my friends for their help and support. I want to thank Landon Sego, David Dahl, Emmily Chow, and JoAnne Pinto for their

careful proofreading of my thesis. Without their help, I could not have finished my thesis writing in such a short time. I also want to thank Bing Chen, Quan Hong, and Yuefeng Lu for their help and support when I was defending my thesis in Madison. I also want to thank my college classmates Xuan Liu and Xiaohuang Hong for their long-time support and encouragement whenever I encountered difficulty. Furthermore, I want to thank my Ph.D. defense committee, which includes Prof. Richard Johnson, Prof. Kam-Wah Tsui, Prof. Yi Lin, and Prof. Jun Zhu, for their careful reading and constructive comments. Finally, I want to give special thanks to my parents. As the only child of the family, I was given all the love, blessings, and wishes they could give. They are my support and motivation whenever I want to give up. During the past three years of study in the USA, I missed them very much. I hope my Ph.D. degree brings them happiness and pride!

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Background
  1.2 An Outline
2 Imputation Under Simple Random Sampling
  2.1 Introduction
    2.1.1 Statistical Model for Nonresponse
  2.2 Marginal and Conditional Imputation
  2.3 Asymptotic Distribution
    2.3.1 The Case Where A and B Are Independent
    2.3.2 The Case Where A and B Are Dependent
  2.4 Weighted Mean Squared Error
  2.5 Testing for Goodness-of-Fit
  2.6 Testing for Independence
3 Simulation Study Under Simple Random Sampling
  3.1 Introduction
  3.2 Asymptotic Normality
  3.3 Weighted Mean Squared Error
  3.4 Testing for Goodness-of-Fit
  3.5 Testing for Independence
    3.5.1 Marginal Imputation
    3.5.2 Conditional Imputation
    3.5.3 Relative Efficiency
  3.6 Conclusion
4 Imputation Under Stratified Sampling
  4.1 Introduction
  4.2 Imputation Within Each Stratum
    4.2.1 Asymptotic Distribution
    4.2.2 Rao's Test for Goodness-of-Fit
  4.3 Imputation Across Strata with Small H
    4.3.1 Asymptotic Distribution
    4.3.2 Rao's Test for Goodness-of-Fit
  4.4 Imputation Across Strata with Large H
    4.4.1 Asymptotic Distribution
    4.4.2 Asymptotic Covariance and Estimation
5 Simulation Study Under Stratified Sampling
  5.1 Introduction
  5.2 Imputation Within Each Stratum
    5.2.1 Wald's Test for Goodness-of-Fit
    5.2.2 Rao's Test for Goodness-of-Fit
  5.3 Imputation Across Strata with Small H
    5.3.1 Wald's Test for Goodness-of-Fit
    5.3.2 Rao's Test for Goodness-of-Fit
  5.4 Imputation Across Strata with Large H
  5.5 Conclusion
6 Real Data Study
  6.1 The Beaver Dam Eye Study
  6.2 Victimization Incidents Study
Bibliography

Chapter 1

Introduction

1.1 Background

Two-way contingency tables are widely used for summarizing two-dimensional categorical data. Each cell in a two-way contingency table is a category defined by the two categorical variables. Sample cell frequencies are often computed from the observed responses (of the two-dimensional categorical variable) in a sample of units (subjects). Statistical inferences, including estimating cell probabilities, testing the hypothesis of independence, and testing goodness-of-fit, are then carried out. In sample surveys or medical studies, it is not uncommon for one or both of the categorical responses to be missing. Sampled units for which both components are missing (unit nonrespondents) can be handled by a suitable adjustment of the sampling weights. In practice, however, many sampled units may have exactly one missing component in their responses (item nonrespondents). Ignoring the data from sampled units with exactly one missing component is not acceptable, because throwing away observed data may result in a serious loss of efficiency in the analysis.

A popular method for handling item nonresponse is imputation, which inserts values for the unobserved items. Justification for the use of imputation, with practical considerations, can be found in Kalton and Kasprzyk (1986). After imputation, statistical inferences can be made by treating the imputed values as observed data and using formulas designed for the case of no nonresponse. Various imputation methods have been proposed and studied by different authors (Little and Rubin, 1987; Schafer, 1997). Imputation methods can be roughly divided into two categories: model-based imputation methods and nonparametric imputation methods. Model-based imputation methods assume a parametric or semiparametric model for the responses and the missingness. The most typical example is regression imputation, which assumes a linear model between the response and the observed covariates. The situation where the random errors in the linear model are normally distributed was studied by Srivastava and Carter (1986); Shao and Wang (2001) extended the results to the case in which no parametric assumption is made on the random error. Nonparametric imputation methods make no parametric assumption on the distribution of the responses and the missingness. Typical approaches in this category include hot deck imputation, cold deck imputation, and nearest neighbor imputation (Chen and Shao, 2000, 2001). However, all the above methods address either continuous data or one-dimensional categorical data. Imputation methods for multi-dimensional

categorical data are not well studied. For example, for a two-way contingency table, which is essentially a multi-dimensional categorical data problem, the statistical problems of interest include the following: How does one impute the data? How does the relative efficiency of imputation compare with that of other methods (e.g., the re-weighting method)? How can tests be performed in a valid way? Another important problem for imputation is variance/covariance estimation. It is well known that the variance/covariance of estimators based on imputed data may differ from that of estimators computed from complete data sets. As a result, variance and covariance estimators designed for complete data sets may not be valid for estimators generated by imputation. There are three commonly used approaches to estimating the variance and covariance of estimators based on imputed data. One is linearization, which uses Taylor's expansion to obtain an explicit theoretical formula for the covariance structure of the estimators and then replaces all unknown quantities by consistent point estimators. The merit of the linearization method is that it requires less computation than, e.g., resampling methods; however, it is not uncommon for the theoretical formula to be too complex to use. As an alternative, resampling methods such as the jackknife and the bootstrap (Rao and Shao, 1992) are commonly used to obtain variance/covariance estimators. The third approach is multiple imputation (Rubin, 1987), which imputes the same
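The multiple-imputation combining rule mentioned above can be sketched generically; this is an illustration of Rubin's (1987) rule for a scalar parameter, not code from this thesis, and the m point estimates and their within-imputation variances are assumed to be already computed:

```python
import numpy as np

def rubin_combine(estimates, within_vars):
    """Combine m point estimates from m imputed data sets with Rubin's
    (1987) rule: the total variance is the average within-imputation
    variance W plus (1 + 1/m) times the between-imputation variance B."""
    estimates = np.asarray(estimates, dtype=float)
    within_vars = np.asarray(within_vars, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()          # combined point estimate
    W = within_vars.mean()            # within-imputation variability
    B = estimates.var(ddof=1)         # between-imputation variability
    return q_bar, W + (1.0 + 1.0 / m) * B

q_bar, total_var = rubin_combine([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number m of imputations.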

data set more than once and then obtains the variance estimator by combining the between- and within-imputation variability in an appropriate way. The main purpose of this thesis is to investigate the statistical properties of a conditional imputation method, which imputes nonrespondents using estimated conditional probabilities. More specifically, we study (i) the consistency of estimators of cell probabilities based on imputed data; (ii) the asymptotic variances and covariances of estimators of cell probabilities, which lead to consistent variance and covariance estimators; and (iii) the validity of chi-square type tests for goodness-of-fit or independence. For testing independence of the two components of the categorical variable, we also study a marginal imputation method, which imputes nonrespondents using estimated marginal probabilities.

1.2 An Outline

The rest of this thesis is organized as follows. In Chapter 2, we study both conditional and marginal imputation under simple random sampling. In Chapter 3, extensive simulations are performed to evaluate the finite-sample performance of the procedures described in Chapter 2. In Chapter 4, we study conditional imputation for stratified sampling, including imputation within each stratum and imputation across strata. For imputation across strata, two different types of asymptotics are considered. One deals with a small number of strata with large stratum sizes. The other deals with a large number of strata

with small stratum sizes. Extensive simulations are carried out in Chapter 5 to evaluate the finite-sample performance of the procedures obtained in Chapter 4. Finally, in Chapter 6, several real data sets are presented to illustrate the proposed imputation methods.

Chapter 2

Imputation Under Simple Random Sampling

2.1 Introduction

In this chapter, we introduce two imputation methods under simple random sampling: marginal imputation and conditional imputation. Our results show that the point estimators obtained by conditional imputation are consistent, while those obtained by marginal imputation usually are not, unless the two components of the categorical variable are independent of each other. The asymptotic distributions of the point estimators under both imputation methods are derived where appropriate. In order to evaluate the statistical performance of the point estimators, we propose a measure called the weighted mean squared error (WMSE). The estimators given by conditional imputation and by the re-weighting method are then compared in terms of WMSE. The results show that conditional imputation can improve efficiency when the proportion of complete units is small. Testing goodness-of-fit is also considered. We first propose a

Wald type statistic, which is asymptotically valid. We then show that the naive method of treating the imputed values as observed and applying Pearson's chi-square test is not valid, and we propose a Rao type correction to the naive method. Finally, the performances of the Wald type and Rao type statistics are compared; the results show that they are comparable in terms of Type I error. Testing independence is also considered. We show that the naive method of applying Pearson's chi-square statistic directly is still asymptotically valid under marginal imputation but not under conditional imputation. After division by a simple constant, the naive method is also asymptotically correct under conditional imputation.

2.1.1 Statistical Model for Nonresponse

Consider a two-dimensional response vector $(A, B)$, where $A$ and $B$ are categorical responses taking values in $\{1, \dots, a\}$ and $\{1, \dots, b\}$, respectively. In practice, imputation is carried out by first creating imputation classes and then imputing nonrespondents within each imputation class. Imputation classes are sub-populations of the original population and are usually formed using an auxiliary variable observed without nonresponse. For example, in many business surveys, imputation classes are strata or unions of strata. In medical studies, if data are obtained under several different treatments, the treatment groups are imputation classes. Throughout this chapter, we make the following assumption:

Assumption A. For each sampled unit within an imputation class, $\pi_A$ denotes the probability of observing $A$ and missing $B$, $\pi_B$ denotes the probability of observing $B$ and missing $A$, and $\pi_C$ denotes the probability of observing both $A$ and $B$.

As discussed in the previous chapter, units with both responses missing (unit nonrespondents) can be ignored after suitably adjusting the sampling weights. As a result, we assume there is no unit nonresponse, i.e., $\pi_A + \pi_B + \pi_C = 1$. Note that the probabilities $\pi_A$, $\pi_B$, and $\pi_C$ may differ across imputation classes. For simplicity and without loss of generality, we consider only simple random sampling with replacement. In practice, the data may be obtained by sampling without replacement; our results remain valid if the sampling fraction is negligible. For convenience, we assume that there is only one imputation class, since the extension to multiple imputation classes is straightforward.

2.2 Marginal and Conditional Imputation

Suppose there are $n$ sampled units, indexed by $k$ (i.e., $(A_k, B_k)$, $k = 1, \dots, n$). To simplify notation, $(A_k, B_k)$ may also be referred to as $(A, B)$. Let $C_A$, $C_B$, and $C_C$ be the collections of indices of the units with $B$ missing, with $A$ missing, and with neither $A$ nor $B$ missing, respectively. Let $n_A = |C_A|$, $n_B = |C_B|$,
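To make the missingness mechanism of Assumption A concrete, the following minimal Python sketch simulates a sample under simple random sampling with replacement; the cell probabilities and the values of $\pi_A$ and $\pi_B$ are hypothetical choices for illustration, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sample(n, p, pi_A, pi_B):
    """Draw n units (A, B) from an a x b table of cell probabilities p,
    then mask components per Assumption A: with probability pi_A only A
    is observed, with probability pi_B only B is observed, and with
    probability pi_C = 1 - pi_A - pi_B both are observed.
    A value of -1 marks a missing item."""
    a, b = p.shape
    cells = rng.choice(a * b, size=n, p=p.ravel())
    A, B = np.divmod(cells, b)
    u = rng.random(n)
    A_obs = np.where(u < pi_B, -1, A)                         # A missing
    B_obs = np.where((u >= pi_B) & (u < pi_A + pi_B), -1, B)  # B missing
    return A_obs, B_obs

# hypothetical 2 x 2 table with independent margins; 30% item nonresponse
p = np.outer([0.6, 0.4], [0.5, 0.5])
A_obs, B_obs = simulate_sample(10_000, p, pi_A=0.15, pi_B=0.15)
```

Under this mechanism no unit has both components missing, matching the assumption $\pi_A + \pi_B + \pi_C = 1$.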

and $n_C = |C_C|$, where $|S|$ denotes the number of elements in a finite set $S$. In other words, $n_A$, $n_B$, and $n_C$ are the numbers of units with $B$ missing, with $A$ missing, and with neither $A$ nor $B$ missing, respectively. Therefore, the total sample size is $n = n_A + n_B + n_C$. Let $n^C_{ij}$ denote the total number of completers with $(A, B) = (i, j)$. Let $p_{ij} = P((A, B) = (i, j))$, $p_{i\cdot} = P(A = i)$, and $p_{\cdot j} = P(B = j)$. Typical estimators of $p_{i\cdot}$ and $p_{\cdot j}$ based on the completers are
\[
\hat p^C_{i\cdot} = \frac{\sum_{j=1}^{b} n^C_{ij}}{\sum_{ij} n^C_{ij}} = \frac{n^C_{i\cdot}}{n^C}
\qquad \text{and} \qquad
\hat p^C_{\cdot j} = \frac{\sum_{i=1}^{a} n^C_{ij}}{\sum_{ij} n^C_{ij}} = \frac{n^C_{\cdot j}}{n^C},
\]
where $n^C_{i\cdot} = \sum_{j=1}^{b} n^C_{ij}$, $n^C_{\cdot j} = \sum_{i=1}^{a} n^C_{ij}$, and $n^C = \sum_{ij} n^C_{ij}$. Given an incompleter $(A, B) = (i, *)$, where $*$ denotes the missing value, marginal imputation imputes the missing value $j$ ($1 \le j \le b$) with probability $\hat p^C_{\cdot j}$. That is, the missing value is imputed according to its estimated marginal distribution, without conditioning on the observed item ($A = i$). Missing values of different incompleters are imputed independently. Intuitively, parameters such as $p_{\cdot j}$ and $p_{i\cdot}$ can be estimated consistently from marginally imputed data, but parameters such as $p_{ij}$ cannot, since the relationship between $A$ and $B$ is not preserved by marginal imputation. The conditional probability $P(B = j \mid A = i)$ is denoted by $p_{ij|A} = p_{ij}/p_{i\cdot}$.

Thus, an estimator of $p_{ij|A}$ based on the completers is
\[
\hat p^C_{ij|A} = \frac{\hat p^C_{ij}}{\hat p^C_{i\cdot}} = \frac{n^C_{ij}/n^C}{n^C_{i\cdot}/n^C} = \frac{n^C_{ij}}{n^C_{i\cdot}}.
\]
Conditional imputation imputes the missing value $j$ ($1 \le j \le b$) with probability $\hat p^C_{ij|A}$. In other words, given the completers, conditional imputation imputes a missing value according to its estimated conditional distribution given the observed component. Imputation for an incompleter with $A$ missing is similar, and incompleters are imputed independently, conditioning on the completers and their observed components. After imputation, estimators of $p_{ij}$ are obtained using the standard formulas for a two-way contingency table, treating the imputed values as observed data. We denote these estimators (based on either marginal or conditional imputation) by $\hat p^I_{ij}$. Recall that $C_A$ is the collection of indices of the units with $A$ observed and $B$ missing. Let
\[
\hat p^A_{ij} = \frac{1}{n_A} \sum_{k \in C_A} I\{(A_k, B_k) = (i, j)\},
\]
where $B_k$ is the value obtained by imputation (either marginal or conditional) for $k \in C_A$. $\hat p^B_{ij}$ and $\hat p^C_{ij}$ are defined similarly. The relationship between $\hat p^I_{ij}$ and $\hat p^A_{ij}$, $\hat p^B_{ij}$, $\hat p^C_{ij}$ is
\[
\hat p^I_{ij} = \frac{n_A \hat p^A_{ij} + n_B \hat p^B_{ij} + n_C \hat p^C_{ij}}{n}.
\]
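The two imputation schemes and the resulting estimator $\hat p^I_{ij}$ can be sketched as follows. This is an illustrative Python implementation under the setup above (with -1 encoding a missing component), not code from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

def impute(A, B, a, b, method="conditional"):
    """Impute item nonrespondents (-1 = missing) and return the table of
    estimates p^I. Marginal imputation draws a missing item from its
    estimated marginal distribution among completers; conditional
    imputation draws it from the estimated conditional distribution
    given the observed item."""
    A, B = A.copy(), B.copy()
    comp = (A >= 0) & (B >= 0)
    nC = np.zeros((a, b))
    np.add.at(nC, (A[comp], B[comp]), 1)          # completer counts n^C_ij
    pC = nC / nC.sum()                            # estimated p^C_ij
    for k in np.flatnonzero((A >= 0) & (B < 0)):  # B missing, A observed
        w = pC[A[k]] / pC[A[k]].sum() if method == "conditional" \
            else pC.sum(axis=0)                   # conditional vs marginal
        B[k] = rng.choice(b, p=w)
    for k in np.flatnonzero((B >= 0) & (A < 0)):  # A missing, B observed
        w = pC[:, B[k]] / pC[:, B[k]].sum() if method == "conditional" \
            else pC.sum(axis=1)
        A[k] = rng.choice(a, p=w)
    counts = np.zeros((a, b))
    np.add.at(counts, (A, B), 1)
    return counts / len(A)                        # estimated p^I_ij

# tiny hypothetical data set: the last two units are item nonrespondents
A = np.array([0, 0, 1, 1, 0, -1])
B = np.array([0, 1, 0, 1, -1, 1])
p_I = impute(A, B, a=2, b=2, method="conditional")
```

Each incompleter is imputed independently given the completers, mirroring the description above.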

For convenience, we define
\[
p = (p_{11}, \dots, p_{1b}, \dots, p_{a1}, \dots, p_{ab})', \qquad
p_A = (p_{1\cdot}, \dots, p_{a\cdot})', \qquad
p_B = (p_{\cdot 1}, \dots, p_{\cdot b})',
\]
\[
\hat p^I = (\hat p^I_{11}, \dots, \hat p^I_{1b}, \dots, \hat p^I_{a1}, \dots, \hat p^I_{ab})', \tag{2.1}
\]
\[
\hat p^A = (\hat p^A_{11}, \dots, \hat p^A_{ab})', \qquad
\hat p^B = (\hat p^B_{11}, \dots, \hat p^B_{ab})', \qquad
\hat p^C = (\hat p^C_{11}, \dots, \hat p^C_{ab})'.
\]

2.3 Asymptotic Distribution

In order to obtain the limiting distribution of $\hat p^I_{ij}$, Lemma 1 is given here without proof. Lemma 1 is also used when we study stratified sampling in Chapter 4. A more general form of the lemma can be found in Schenker and Welsh (1988).

Lemma 1 Let $X_n$ be a sample, and let $U_n(X_n)$ and $W_n$ be two random vectors such that $U_n \to_d N(0, \Sigma_1)$ and $W_n \mid X_n \to_d N(0, \Sigma_2)$. Then $U_n + W_n \to_d N(0, \Sigma_1 + \Sigma_2)$.

2.3.1 The Case Where A and B Are Independent

When $A$ and $B$ are independent, the estimators produced by both marginal and conditional imputation are consistent. However, their variances and covariances differ from those of the standard estimator of $p_{ij}$ when there is no nonresponse. The following theorem establishes the asymptotic distributions and covariance matrices of $\hat p^I$, defined in (2.1), under both conditional and marginal imputation.

Theorem 1 Assume that $A$ and $B$ are independent. If $\pi_C > 0$, then, as $n \to \infty$, $\sqrt n(\hat p^I - p) \to_d N(0, \Sigma)$, where

(a) under marginal imputation,
\[
\Sigma = P_A \otimes P_B + \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_Ap_A') \otimes P_B + \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_Bp_B');
\]

(b) under conditional imputation,
\[
\Sigma = \Big(\frac{1}{\pi_C} + 1 - \pi_C\Big) P_A \otimes P_B + \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_Ap_A') \otimes P_B + \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_Bp_B').
\]
Here $\otimes$ is the Kronecker product; $p_A$ and $p_B$ are given in (2.1); $P_A = \mathrm{diag}(p_A) - p_Ap_A'$, where $\mathrm{diag}(p_A)$ denotes the diagonal matrix of the same dimension as $p_A$ whose $i$th ($1 \le i \le a$) diagonal element is the $i$th component of $p_A$; and

$P_B = \mathrm{diag}(p_B) - p_Bp_B'$.

Proof: After imputation, each unit is complete. For a given unit $(A_k, B_k)$, define $I_k^A$ to be the $a$-dimensional vector whose $i$th element is 1 and whose other elements are 0 when $A_k = i$; similarly, define $I_k^B$ to be the $b$-dimensional vector whose $j$th element is 1 and whose other elements are 0 when $B_k = j$. Under the hypothesis that $A$ and $B$ are independent, $I_k^A$ and $I_k^B$ are independent. According to (2.1), note that
\[
\hat p^I = \frac{1}{n}\sum_{k=1}^{n} I_k^A \otimes I_k^B, \qquad
\hat p^A = \frac{1}{n_A}\sum_{k \in C_A} I_k^A \otimes I_k^B, \qquad
\hat p^B = \frac{1}{n_B}\sum_{k \in C_B} I_k^A \otimes I_k^B, \qquad
\hat p^C = \frac{1}{n_C}\sum_{k \in C_C} I_k^A \otimes I_k^B.
\]
It follows that
\[
\sqrt n(\hat p^I - p)
= \frac{n_A}{\sqrt n}(\hat p^A - p) + \frac{n_B}{\sqrt n}(\hat p^B - p) + \frac{n_C}{\sqrt n}(\hat p^C - p)
\]
\[
= \Big[\frac{n_A}{\sqrt n}\big(E(\hat p^A \mid \sigma(C)) - p\big) + \frac{n_B}{\sqrt n}\big(E(\hat p^B \mid \sigma(C)) - p\big) + \frac{n_C}{\sqrt n}(\hat p^C - p)\Big]
\]

\[
{} + \Big[\frac{n_A}{\sqrt n}\big(\hat p^A - E(\hat p^A \mid \sigma(C))\big) + \frac{n_B}{\sqrt n}\big(\hat p^B - E(\hat p^B \mid \sigma(C))\big)\Big],
\]
where $E(\cdot \mid \sigma(C))$ denotes $E(\cdot \mid n_A, n_B, n_C, (A_k, B_k), k \in C_C)$; in other words, $E(\cdot \mid \sigma(C))$ denotes the expectation conditional on the completers and the numbers of incompleters. Let
\[
U_n = \frac{n_A}{\sqrt n}\big(E(\hat p^A \mid \sigma(C)) - p\big) + \frac{n_B}{\sqrt n}\big(E(\hat p^B \mid \sigma(C)) - p\big) + \frac{n_C}{\sqrt n}(\hat p^C - p), \tag{2.2}
\]
and
\[
W_n = \frac{n_A}{\sqrt n}\big(\hat p^A - E(\hat p^A \mid \sigma(C))\big) + \frac{n_B}{\sqrt n}\big(\hat p^B - E(\hat p^B \mid \sigma(C))\big). \tag{2.3}
\]

(a) Marginal imputation. Given $\sigma(C)$ (i.e., $n_A$, $n_B$, $n_C$, and the completers), $\{I_k^A \otimes I_k^B\}_{k \in C_A}$ are i.i.d. random vectors with mean $E(\hat p^A \mid \sigma(C))$. By the Central Limit Theorem,
\[
\sqrt{n_A}\big(\hat p^A - E(\hat p^A \mid \sigma(C))\big) \,\big|\, \sigma(C) \to_d N(0, \Sigma_W),
\qquad
\Sigma_W = \mathrm{diag}\{E(\hat p^A \mid \sigma(C))\} - E(\hat p^A \mid \sigma(C))\,E(\hat p^A \mid \sigma(C))'.
\]
On the other hand, $E(\hat p^A_{ij} \mid \sigma(C)) = p_{i\cdot}\,\hat p^C_{\cdot j} \to_{a.s.} p_{i\cdot}\,p_{\cdot j} = p_{ij}$ as $n_C \to \infty$. Therefore,
\[
\Sigma_W \to_{a.s.} \mathrm{diag}\{p\} - pp' \stackrel{.}{=} P, \tag{2.4}
\]

where $\stackrel{.}{=}$ denotes "defined to be". Consequently,
\[
W_n = \sqrt{\tfrac{n_A}{n}}\,\sqrt{n_A}\big(\hat p^A - E(\hat p^A \mid \sigma(C))\big) + \sqrt{\tfrac{n_B}{n}}\,\sqrt{n_B}\big(\hat p^B - E(\hat p^B \mid \sigma(C))\big)
\]
\[
= \sqrt{\pi_A}\,\sqrt{n_A}\big(\hat p^A - E(\hat p^A \mid \sigma(C))\big) + \sqrt{\pi_B}\,\sqrt{n_B}\big(\hat p^B - E(\hat p^B \mid \sigma(C))\big) + o_p(1)
\to_d \sqrt{\pi_A}\,N(0, P) + \sqrt{\pi_B}\,N(0, P) = N(0, (1 - \pi_C)P).
\]
Under the assumption that $A$ and $B$ are independent, the $(i,j)$th component of $E(\hat p^A \mid \sigma(C)) - p$ is $p_{i\cdot}\hat p^C_{\cdot j} - p_{ij} = p_{i\cdot}(\hat p^C_{\cdot j} - p_{\cdot j})$, so that
\[
E(\hat p^A \mid \sigma(C)) - p = p_A \otimes \Big[\frac{1}{n_C}\sum_{k \in C_C}(I_k^B - p_B)\Big] = \frac{1}{n_C}\sum_{k \in C_C} p_A \otimes (I_k^B - p_B).
\]

Similarly,
\[
E(\hat p^B \mid \sigma(C)) - p = \frac{1}{n_C}\sum_{k \in C_C} (I_k^A - p_A) \otimes p_B.
\]
Thus, we have
\[
U_n = \frac{n_A}{\sqrt n\,n_C}\sum_{k \in C_C} p_A \otimes (I_k^B - p_B) + \frac{n_B}{\sqrt n\,n_C}\sum_{k \in C_C} (I_k^A - p_A) \otimes p_B + \frac{n_C}{\sqrt n}(\hat p^C - p)
\]
\[
= \frac{1}{\sqrt{n_C}}\sum_{k \in C_C}\Big[\sqrt{\pi_C}\,(I_k^A - p_A) \otimes (I_k^B - p_B) + \frac{\pi_C + \pi_A}{\sqrt{\pi_C}}\,p_A \otimes (I_k^B - p_B) + \frac{\pi_C + \pi_B}{\sqrt{\pi_C}}\,(I_k^A - p_A) \otimes p_B\Big] + o_p(1)
\to_d N(0, \Sigma_U),
\]
where
\[
\Sigma_U = \mathrm{var}\Big(\sqrt{\pi_C}\,(I_k^A - p_A) \otimes (I_k^B - p_B) + \frac{\pi_C + \pi_A}{\sqrt{\pi_C}}\,p_A \otimes (I_k^B - p_B) + \frac{\pi_C + \pi_B}{\sqrt{\pi_C}}\,(I_k^A - p_A) \otimes p_B\Big).
\]

Let $P_A = \mathrm{diag}\{p_A\} - p_Ap_A'$ and $P_B = \mathrm{diag}\{p_B\} - p_Bp_B'$. Then
\[
\mathrm{var}\big[(I_k^A - p_A) \otimes (I_k^B - p_B)\big]
= E\big\{[(I_k^A - p_A)(I_k^A - p_A)'] \otimes [(I_k^B - p_B)(I_k^B - p_B)']\big\}
= P_A \otimes P_B,
\]
where the last equality holds because $I_k^A$ and $I_k^B$ are independent. Similarly,
\[
\mathrm{var}\big[p_A \otimes (I_k^B - p_B)\big] = (p_Ap_A') \otimes P_B, \qquad
\mathrm{var}\big[(I_k^A - p_A) \otimes p_B\big] = P_A \otimes (p_Bp_B'),
\]
\[
\mathrm{cov}\big[(I_k^A - p_A) \otimes (I_k^B - p_B),\; p_A \otimes (I_k^B - p_B)\big]
= E\big\{[(I_k^A - p_A)p_A'] \otimes [(I_k^B - p_B)(I_k^B - p_B)']\big\} = 0 \otimes P_B = 0,
\]
\[
\mathrm{cov}\big[(I_k^A - p_A) \otimes (I_k^B - p_B),\; (I_k^A - p_A) \otimes p_B\big]
= E\big\{[(I_k^A - p_A)(I_k^A - p_A)'] \otimes [(I_k^B - p_B)p_B']\big\} = P_A \otimes 0 = 0,
\]

\[
\mathrm{cov}\big[p_A \otimes (I_k^B - p_B),\; (I_k^A - p_A) \otimes p_B\big]
= E\big\{[p_A(I_k^A - p_A)'] \otimes [(I_k^B - p_B)p_B']\big\} = 0.
\]
As a result, $\Sigma_U$ is given by
\[
\Sigma_U = \pi_C\, P_A \otimes P_B + \frac{(\pi_C + \pi_A)^2}{\pi_C}\,(p_Ap_A') \otimes P_B + \frac{(\pi_C + \pi_B)^2}{\pi_C}\,P_A \otimes (p_Bp_B').
\]
Therefore, $U_n \to_d N(0, \Sigma_U)$. Since $W_n \to_d N(0, (1 - \pi_C)P)$ and
\[
P = \mathrm{diag}\{p\} - pp' = \mathrm{diag}\{p_A \otimes p_B\} - (p_Ap_A') \otimes (p_Bp_B')
= P_A \otimes P_B + (p_Ap_A') \otimes P_B + P_A \otimes (p_Bp_B'),
\]
we have
\[
\sqrt n(\hat p^I - p) = W_n + U_n \to_d N\big(0, (1 - \pi_C)P + \Sigma_U\big) = N(0, \Sigma),
\]

where
\[
\Sigma = P_A \otimes P_B + \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_Ap_A') \otimes P_B + \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_Bp_B').
\]

(b) Conditional imputation. Suppose the total sample size is large and $P(n_C = 0)$ is negligible. As in part (a), under conditional imputation we have $W_n \mid \sigma(C) \to_d N(0, (1 - \pi_C)P)$. On the other hand, since $E(\hat p^A_{ij} \mid \sigma(C)) = p_{i\cdot}\,\hat p^C_{ij}/\hat p^C_{i\cdot}$ and $p_{ij} = p_{i\cdot}p_{\cdot j}$ under independence, Taylor's expansion gives
\[
E(\hat p^A_{ij} \mid \sigma(C)) - p_{ij} = (\hat p^C_{ij} - p_{ij}) - p_{\cdot j}(\hat p^C_{i\cdot} - p_{i\cdot}) + o_p(n_C^{-1/2}),
\]
so that
\[
E(\hat p^A \mid \sigma(C)) - p
= \frac{1}{n_C}\sum_{k \in C_C}\big[(I_k^A \otimes I_k^B - p_A \otimes p_B) - (I_k^A - p_A) \otimes p_B\big] + o_p(n_C^{-1/2})
\]
\[
= \frac{1}{n_C}\sum_{k \in C_C}\big[(I_k^A - p_A) \otimes (I_k^B - p_B) + p_A \otimes (I_k^B - p_B)\big] + o_p(n_C^{-1/2}).
\]

Similarly,
\[
E(\hat p^B \mid \sigma(C)) - p = \frac{1}{n_C}\sum_{k \in C_C}\big[(I_k^A - p_A) \otimes (I_k^B - p_B) + (I_k^A - p_A) \otimes p_B\big] + o_p(n_C^{-1/2}).
\]
As a result,
\[
U_n = \frac{1}{\sqrt{n_C}}\sum_{k \in C_C}\Big[\sqrt{\pi_C}\,(I_k^A \otimes I_k^B - p_A \otimes p_B)
+ \frac{\pi_A}{\sqrt{\pi_C}}\big((I_k^A - p_A) \otimes (I_k^B - p_B) + p_A \otimes (I_k^B - p_B)\big)
+ \frac{\pi_B}{\sqrt{\pi_C}}\big((I_k^A - p_A) \otimes (I_k^B - p_B) + (I_k^A - p_A) \otimes p_B\big)\Big] + o_p(1)
\]
\[
= \frac{1}{\sqrt{n_C}}\sum_{k \in C_C}\Big[\frac{1}{\sqrt{\pi_C}}\,(I_k^A - p_A) \otimes (I_k^B - p_B)
+ \frac{\pi_C + \pi_A}{\sqrt{\pi_C}}\,p_A \otimes (I_k^B - p_B)
+ \frac{\pi_C + \pi_B}{\sqrt{\pi_C}}\,(I_k^A - p_A) \otimes p_B\Big] + o_p(1)
\]
\[
\to_d N\Big(0,\; \frac{1}{\pi_C}\,P_A \otimes P_B + \frac{(\pi_C + \pi_A)^2}{\pi_C}\,(p_Ap_A') \otimes P_B + \frac{(\pi_C + \pi_B)^2}{\pi_C}\,P_A \otimes (p_Bp_B')\Big).
\]
Consequently, $W_n + U_n \to_d N(0, \Sigma)$, where
\[
\Sigma = \Big(\frac{1}{\pi_C} + 1 - \pi_C\Big) P_A \otimes P_B + \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_Ap_A') \otimes P_B + \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_Bp_B'). \;\square
\]
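Theorem 1's covariance matrices are straightforward to assemble numerically from Kronecker products. The following sketch uses hypothetical margins and response probabilities; note that the two matrices differ only in the coefficient of $P_A \otimes P_B$, which is 1 under marginal imputation and $1/\pi_C + 1 - \pi_C$ under conditional imputation:

```python
import numpy as np

def sigma_theorem1(pA, pB, pi_A, pi_B, conditional):
    """Assemble the asymptotic covariance matrix of Theorem 1
    (A and B independent) from Kronecker products."""
    pA, pB = np.asarray(pA, float), np.asarray(pB, float)
    pi_C = 1.0 - pi_A - pi_B
    PA = np.diag(pA) - np.outer(pA, pA)
    PB = np.diag(pB) - np.outer(pB, pB)
    kappa = 1.0 / pi_C + 1.0 - pi_C if conditional else 1.0
    cA = (pi_C + 2 * pi_C * pi_A + pi_A ** 2) / pi_C
    cB = (pi_C + 2 * pi_C * pi_B + pi_B ** 2) / pi_C
    return (kappa * np.kron(PA, PB)
            + cA * np.kron(np.outer(pA, pA), PB)
            + cB * np.kron(PA, np.outer(pB, pB)))

# hypothetical margins and response probabilities
S_marg = sigma_theorem1([0.6, 0.4], [0.5, 0.5], 0.15, 0.15, conditional=False)
S_cond = sigma_theorem1([0.6, 0.4], [0.5, 0.5], 0.15, 0.15, conditional=True)
```

Since the components of $\hat p^I$ sum to 1, every row of either covariance matrix sums to zero, which gives a quick sanity check on the construction.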

2.3.2 The Case Where A and B Are Dependent

When $A$ and $B$ are dependent, the point estimators obtained by marginal imputation are not consistent. Therefore, marginal imputation is usually not considered in this case, and the asymptotic results are established for conditional imputation only.

Theorem 2 Assume that $\pi_C > 0$. Under conditional imputation, $\sqrt n(\hat p^I - p) \to_d N(0, \Sigma)$, where $\Sigma = MPM' + (1 - \pi_C)P$,
\[
M = \frac{1}{\sqrt{\pi_C}}\,I_{ab} - \frac{\pi_A}{\sqrt{\pi_C}}\,\mathrm{diag}\{p_{B|A}\}\,(I_a \otimes U_b) - \frac{\pi_B}{\sqrt{\pi_C}}\,\mathrm{diag}\{p_{A|B}\}\,(U_a \otimes I_b), \tag{2.5}
\]
and
\[
p_{A|B} = (p_{11}/p_{\cdot 1}, \dots, p_{1b}/p_{\cdot b}, \dots, p_{a1}/p_{\cdot 1}, \dots, p_{ab}/p_{\cdot b})',
\qquad
p_{B|A} = (p_{11}/p_{1\cdot}, \dots, p_{1b}/p_{1\cdot}, \dots, p_{a1}/p_{a\cdot}, \dots, p_{ab}/p_{a\cdot})'. \tag{2.6}
\]
$I_d$ ($d = ab$, $a$, or $b$) is the $d$-dimensional identity matrix, and $U_d$ is the $d$-dimensional square matrix with all elements equal to 1.

Proof: $U_n$ and $W_n$ are defined in (2.2) and (2.3). Under conditional imputation, we have $W_n \mid \sigma(C) \to_d N(0, (1 - \pi_C)P)$,
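Theorem 2's covariance matrix can be assembled directly from (2.5). A numeric sketch under a hypothetical dependent table (the probabilities are illustrative choices, not from the thesis):

```python
import numpy as np

def sigma_theorem2(p, pi_A, pi_B):
    """Covariance matrix Sigma = M P M' + (1 - pi_C) P of Theorem 2,
    with M built from (2.5); p is the a x b cell-probability table."""
    a, b = p.shape
    pi_C = 1.0 - pi_A - pi_B
    pv = p.ravel()
    P = np.diag(pv) - np.outer(pv, pv)
    p_BgA = (p / p.sum(axis=1, keepdims=True)).ravel()  # p_ij / p_i.
    p_AgB = (p / p.sum(axis=0, keepdims=True)).ravel()  # p_ij / p_.j
    G_A = np.kron(np.eye(a), np.ones((b, b)))           # I_a (x) U_b
    G_B = np.kron(np.ones((a, a)), np.eye(b))           # U_a (x) I_b
    M = (np.eye(a * b)
         - pi_A * np.diag(p_BgA) @ G_A
         - pi_B * np.diag(p_AgB) @ G_B) / np.sqrt(pi_C)
    return M @ P @ M.T + (1.0 - pi_C) * P

# hypothetical dependent 2 x 2 table
Sigma = sigma_theorem2(np.array([[0.3, 0.1], [0.2, 0.4]]), 0.15, 0.15)
```

As with Theorem 1, the rows of the resulting matrix sum to zero because the cell probability estimates sum to 1.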

where $P$ is given in (2.4). On the other hand, since $E(\hat p^A_{ij} \mid \sigma(C)) = p_{i\cdot}\,\hat p^C_{ij}/\hat p^C_{i\cdot}$, Taylor's expansion gives
\[
\sqrt{n_C}\,\big[E(\hat p^A_{ij} \mid \sigma(C)) - p_{ij}\big]
= \sqrt{n_C}\,\Big[(\hat p^C_{ij} - p_{ij}) - \frac{p_{ij}}{p_{i\cdot}}(\hat p^C_{i\cdot} - p_{i\cdot})\Big] + o_p(1).
\]
As a result,
\[
\sqrt{n_C}\,\big[E(\hat p^A \mid \sigma(C)) - p\big]
= \big[I_{ab} - \mathrm{diag}\{p_{B|A}\}(I_a \otimes U_b)\big]\,\big[\sqrt{n_C}(\hat p^C - p)\big] + o_p(1),
\]
where $p_{B|A}$ is defined in (2.6). Similarly, it can be shown that
\[
\sqrt{n_C}\,\big[E(\hat p^B \mid \sigma(C)) - p\big]
= \big[I_{ab} - \mathrm{diag}\{p_{A|B}\}(U_a \otimes I_b)\big]\,\big[\sqrt{n_C}(\hat p^C - p)\big] + o_p(1).
\]

Hence,
\[
U_n = \sqrt{n_C}\Big[\frac{\pi_A}{\sqrt{\pi_C}}\big(E(\hat p^A \mid \sigma(C)) - p\big) + \frac{\pi_B}{\sqrt{\pi_C}}\big(E(\hat p^B \mid \sigma(C)) - p\big) + \sqrt{\pi_C}\,(\hat p^C - p)\Big] + o_p(1)
= M\,\sqrt{n_C}(\hat p^C - p) + o_p(1) \to_d N(0, MPM'),
\]
where $M$ is given in (2.5). Consequently,
\[
\sqrt n(\hat p^I - p) = W_n + U_n \to_d N\big(0, MPM' + (1 - \pi_C)P\big) = N(0, \Sigma). \;\square
\]
The asymptotic covariance matrix can be estimated by replacing $p_{ij}$, $\pi_A$, $\pi_B$, and $\pi_C$ in $\Sigma$ by $\hat p^I_{ij}$, $n_A/n$, $n_B/n$, and $n_C/n$, respectively. The resulting estimator is denoted by $\hat\Sigma$.

2.4 Weighted Mean Squared Error

Let $\hat p$ be an arbitrary estimator of the cell probability vector $p$. To evaluate its performance, we propose a measure called the weighted mean squared error (WMSE), which is defined by
\[
\mathrm{WMSE}(\hat p) = \sum_{ij} \frac{E(\hat p_{ij} - p_{ij})^2}{p_{ij}}.
\]

Theorem 3 Under conditional imputation,
\[
\mathrm{WMSE}(\hat p^I) = \frac{1}{n}\Big[\frac{1}{\pi_C}\big(ab + \pi_A^2 a + \pi_B^2 b - 2\pi_A a - 2\pi_B b + 2\pi_A\pi_B + 2\pi_A\pi_B\delta\big) - \pi_C\,ab + (ab - 1)\Big] + o\Big(\frac{1}{n}\Big),
\]

where $\delta$ is a non-centrality parameter given by
\[
\delta = \sum_{ij} \frac{(p_{ij} - p_{i\cdot}p_{\cdot j})^2}{p_{i\cdot}p_{\cdot j}}.
\]
Intuitively, $\delta$ can be interpreted as a measure of the dependence between $A$ and $B$. When $A$ and $B$ are independent, $\delta$ attains its smallest possible value, 0.

Proof: It has been shown that $\sqrt n(\hat p^I - p) \to_d N(0, \Sigma)$, where $\Sigma$ is given in Theorem 2. For convenience, define
\[
1/\sqrt{p} = \big(1/\sqrt{p_{11}}, \dots, 1/\sqrt{p_{1b}}, \dots, 1/\sqrt{p_{a1}}, \dots, 1/\sqrt{p_{ab}}\big)',
\]
\[
p^2/(p_A \otimes p_B) = \big(p_{11}^2/(p_{1\cdot}p_{\cdot 1}), \dots, p_{1b}^2/(p_{1\cdot}p_{\cdot b}), \dots, p_{a1}^2/(p_{a\cdot}p_{\cdot 1}), \dots, p_{ab}^2/(p_{a\cdot}p_{\cdot b})\big)'.
\]
It follows that $\sqrt n\,\mathrm{diag}\{1/\sqrt p\}\,(\hat p^I - p) \to_d N(0, \Sigma^*)$, where $\Sigma^* = \mathrm{diag}\{1/\sqrt p\}\,\Sigma\,\mathrm{diag}\{1/\sqrt p\}$. On the other hand, $\Sigma = M\,P\,M' + (1 - \pi_C)P$ with $P = \mathrm{diag}\{p\} - pp'$, and
\[
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p\}\,\mathrm{diag}\{1/\sqrt p\}\big] = ab,
\qquad
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,pp'\,\mathrm{diag}\{1/\sqrt p\}\big] = 1.
\]
Moreover, $Mp = \sqrt{\pi_C}\,p$, so $\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,Mpp'M'\,\mathrm{diag}\{1/\sqrt p\}\big] = \pi_C$.

As a result, we only need to evaluate $\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,M\,\mathrm{diag}\{p\}\,M'\,\mathrm{diag}\{1/\sqrt p\}\big]$. It can be verified that
\[
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p_{B|A}\}(I_a \otimes U_b)\,\mathrm{diag}\{p\}\,(I_a \otimes U_b)'\,\mathrm{diag}\{p_{B|A}\}\,\mathrm{diag}\{1/\sqrt p\}\big] = a,
\]
\[
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p_{A|B}\}(U_a \otimes I_b)\,\mathrm{diag}\{p\}\,(U_a \otimes I_b)'\,\mathrm{diag}\{p_{A|B}\}\,\mathrm{diag}\{1/\sqrt p\}\big] = b,
\]
\[
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p\}\,(I_a \otimes U_b)'\,\mathrm{diag}\{p_{B|A}\}\,\mathrm{diag}\{1/\sqrt p\}\big]
= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p_{B|A}\}(I_a \otimes U_b)\,\mathrm{diag}\{p\}\,\mathrm{diag}\{1/\sqrt p\}\big] = a,
\]
\[
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p_{A|B}\}(U_a \otimes I_b)\,\mathrm{diag}\{p\}\,\mathrm{diag}\{1/\sqrt p\}\big]
= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p\}\,(U_a \otimes I_b)'\,\mathrm{diag}\{p_{A|B}\}\,\mathrm{diag}\{1/\sqrt p\}\big] = b.
\]
Noting that
\[
\delta = \sum_{ij}\frac{(p_{ij} - p_{i\cdot}p_{\cdot j})^2}{p_{i\cdot}p_{\cdot j}}
= \sum_{ij}\frac{p_{ij}^2}{p_{i\cdot}p_{\cdot j}} - 2\sum_{ij}p_{ij} + \sum_{ij}p_{i\cdot}p_{\cdot j}
= \mathrm{tr}\big(\mathrm{diag}\{p^2/(p_A \otimes p_B)\}\big) - 1,
\]

it follows that
\[
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt p\}\,\mathrm{diag}\{p_{A|B}\}(U_a \otimes I_b)\,\mathrm{diag}\{p\}\,(I_a \otimes U_b)'\,\mathrm{diag}\{p_{B|A}\}\,\mathrm{diag}\{1/\sqrt p\}\big]
= \mathrm{tr}\big(\mathrm{diag}\{p^2/(p_A \otimes p_B)\}\big) = \delta + 1,
\]
and the same value is obtained with the two factors interchanged. Thus,
\[
\mathrm{tr}(\Sigma^*) = \frac{1}{\pi_C}\big(ab + \pi_A^2 a + \pi_B^2 b - 2\pi_A a - 2\pi_B b + 2\pi_A\pi_B + 2\pi_A\pi_B\delta\big) - \pi_C\,ab + (ab - 1).
\]
The proof is completed by noting that
\[
n\,\mathrm{WMSE}(\hat p^I) = E\big\|\sqrt n\,\mathrm{diag}\{1/\sqrt p\}\,(\hat p^I - p)\big\|^2 = \mathrm{tr}(\Sigma^*) + o(1). \;\square
\]
According to Theorem 3, $\mathrm{WMSE}(\hat p^I)$ depends on the probabilities $\pi_A$, $\pi_B$, and $\pi_C$, and on the cell probabilities through the non-centrality parameter $\delta$; moreover, $\mathrm{WMSE}(\hat p^I)$ is an increasing function of $\delta$. Under Assumption A, $\hat p^C$ (the estimator using the complete units only) is also consistent. The relative efficiency of $\hat p^I$ and $\hat p^C$ can be assessed by the difference between $\mathrm{WMSE}(\hat p^I)$ and $\mathrm{WMSE}(\hat p^C)$. Our simulation results in Chapter 3 show that the estimators given by conditional imputation can increase efficiency when the proportion of completers is small.
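The leading term of Theorem 3 is easy to evaluate numerically. A sketch with hypothetical inputs, which also illustrates that the expression grows with δ when $\pi_A, \pi_B > 0$:

```python
import numpy as np

def wmse_leading_term(p, pi_A, pi_B, n):
    """Leading term of WMSE(p^I) under conditional imputation
    (Theorem 3); p is the a x b table of cell probabilities."""
    a, b = p.shape
    ab = a * b
    pi_C = 1.0 - pi_A - pi_B
    pA, pB = p.sum(axis=1), p.sum(axis=0)
    # non-centrality parameter delta
    delta = (((p - np.outer(pA, pB)) ** 2) / np.outer(pA, pB)).sum()
    core = (ab + pi_A ** 2 * a + pi_B ** 2 * b - 2 * pi_A * a - 2 * pi_B * b
            + 2 * pi_A * pi_B + 2 * pi_A * pi_B * delta)
    return (core / pi_C - pi_C * ab + (ab - 1)) / n

p_dep = np.array([[0.3, 0.1], [0.2, 0.4]])              # delta > 0
p_ind = np.outer(p_dep.sum(axis=1), p_dep.sum(axis=0))  # same margins, delta = 0
w_complete = wmse_leading_term(p_ind, 0.0, 0.0, 100)    # no nonresponse
w_dep = wmse_leading_term(p_dep, 0.15, 0.15, 100)
w_ind = wmse_leading_term(p_ind, 0.15, 0.15, 100)
```

With no nonresponse ($\pi_A = \pi_B = 0$) the expression collapses to $(ab - 1)/n$, the complete-data value.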

2.5 Testing for Goodness-of-Fit

A direct application of Theorem 2 is a Wald type test for goodness-of-fit. Consider the null hypothesis $H_0: p = p_0$, where $p_0$ is a known vector. Under $H_0$,
\[
X_W^2 \stackrel{.}{=} n(\hat p - p_0)'\,\hat\Sigma^{-1}(\hat p - p_0) \to_d \chi^2_{ab-1},
\]
where $\chi^2_v$ denotes a chi-square random variable with $v$ degrees of freedom; here $\hat p$ ($p_0$) is obtained by dropping the last component of $\hat p^I$ ($p_0$), and $\hat\Sigma$ is the estimated asymptotic covariance matrix of this reduced $\hat p$, obtained by dropping the last row and column of the estimated asymptotic covariance matrix of $\hat p^I$. Although $X_W^2$ provides an asymptotically correct chi-square test, the computation of $\hat\Sigma^{-1}$ is complicated. Instead of looking for an asymptotically exact test, we consider a simple correction of the standard Pearson chi-square statistic obtained by matching the first-order moment (Rao and Scott, 1981). When there is no nonresponse, under $H_0$ the Pearson statistic is asymptotically distributed as a chi-square random variable with $ab - 1$ degrees of freedom:
\[
X_P^2 = n\sum_{ij}\frac{(\hat p_{ij} - p_{ij})^2}{p_{ij}} \to_d \chi^2_{ab-1}. \tag{2.7}
\]
Therefore, $E(X_P^2) \approx ab - 1$. Under conditional imputation, however, it follows from Theorem 3 that
\[
E(X_P^2) \approx \frac{1}{\pi_C}\big(ab + \pi_A^2 a + \pi_B^2 b - 2\pi_A a - 2\pi_B b + 2\pi_A\pi_B + 2\pi_A\pi_B\delta\big) - \pi_C\,ab + (ab - 1).
\]

If we let
\[
\lambda = \frac{ab + \pi_A^2 a + \pi_B^2 b - 2\pi_A a - 2\pi_B b + 2\pi_A\pi_B + 2\pi_A\pi_B\delta}{\pi_C(ab - 1)} - \frac{\pi_C\,ab}{ab - 1} + 1, \tag{2.8}
\]
it follows that $E(X_P^2/\lambda) \approx ab - 1$. Thus, we propose the corrected Pearson statistic $X_C^2 = X_P^2/\lambda$. The performance of this corrected chi-square test, the naive chi-square test, and Wald's test is evaluated by a simulation study in Chapter 3.

2.6 Testing for Independence

An application of Theorem 1 is testing the independence of $A$ and $B$. When $\pi_C = 1$ (i.e., there is no nonresponse), the chi-square statistic is given by
\[
X_I^2 \stackrel{.}{=} n\sum_{ij}\frac{(\hat p_{ij} - \hat p_{i\cdot}\hat p_{\cdot j})^2}{\hat p_{i\cdot}\hat p_{\cdot j}} \to_d \chi^2_{(a-1)(b-1)}.
\]
The following theorem establishes the asymptotic behavior of $X_I^2$ under marginal and conditional imputation when $\pi_C > 0$.

Theorem 4 Assume that $\pi_C > 0$ and that $A$ and $B$ are independent.

(a) When marginal imputation is applied to impute nonrespondents,
\[
X_I^2 \to_d \chi^2_{(a-1)(b-1)}.
\]

(b) When conditional imputation is applied to impute nonrespondents,

$$X_I^2/(\pi_C^{-1} + 1 - \pi_C) \to_d \chi^2_{(a-1)(b-1)}.$$

Proof: (a) After marginal imputation, the test statistic is given by

$$X_I^2 = n\sum_{i,j}\frac{(\hat p^I_{ij} - \hat p^I_{i\cdot}\hat p^I_{\cdot j})^2}{\hat p^I_{i\cdot}\hat p^I_{\cdot j}} = \|V_n\|^2,$$

where $V_n$ is an $ab$-dimensional vector with

$$\frac{\sqrt n(\hat p^I_{ij} - \hat p^I_{i\cdot}\hat p^I_{\cdot j})}{\sqrt{\hat p^I_{i\cdot}\hat p^I_{\cdot j}}}$$

as its $d$th component, where $d = (i-1)b + j$. By Taylor expansion and Slutsky's theorem,

$$\frac{\sqrt n(\hat p^I_{ij} - \hat p^I_{i\cdot}\hat p^I_{\cdot j})}{\sqrt{\hat p^I_{i\cdot}\hat p^I_{\cdot j}}} = \frac{\sqrt n(\hat p^I_{ij} - \hat p^I_{i\cdot}p_{\cdot j} - p_{i\cdot}\hat p^I_{\cdot j} + p_{i\cdot}p_{\cdot j})}{\sqrt{p_{i\cdot}p_{\cdot j}}} + o_p(1).$$

Define $U = I_{ab} - (p_A 1_a')\otimes I_b - I_a\otimes(p_B 1_b')$, where $1_a$ and $1_b$ are vectors with all elements equal to 1 and dimension $a$ and $b$, respectively. Let $1/\sqrt{p_A}$ denote the vector $(1/\sqrt{p_{1\cdot}}, \ldots, 1/\sqrt{p_{a\cdot}})'$, and let $1/\sqrt{p_B}$ be defined similarly. Define $S = \mathrm{diag}\{(1/\sqrt{p_A})\otimes(1/\sqrt{p_B})\}$. Note that

$$V_n = SU\left(\sqrt n(\hat p_I - p)\right) + o_p(1) \to_d N(0, SU\Sigma U'S'),$$

where $\Sigma$ is given in Theorem 1 and is of the form

$$\Sigma = \kappa\, P_A\otimes P_B + x\,(p_A p_A')\otimes P_B + y\, P_A\otimes(p_B p_B'),$$

with $\kappa = 1/\pi_C + 1 - \pi_C$ under conditional imputation and $\kappa = 1$ under marginal imputation. Note that

$$P_A(1_a p_A') = (p_A 1_a')P_A = 0, \qquad P_B(1_b p_B') = (p_B 1_b')P_B = 0,$$
$$(p_A p_A')(1_a p_A') = (p_A 1_a')(p_A p_A') = p_A p_A', \qquad (p_B p_B')(1_b p_B') = (p_B 1_b')(p_B p_B') = p_B p_B'.$$

As a result, $U\Sigma U' = \kappa\, P_A\otimes P_B$. Since $S = \mathrm{diag}\{1/\sqrt{p_{i\cdot}}\}\otimes\mathrm{diag}\{1/\sqrt{p_{\cdot j}}\}$, it follows that $SU\Sigma U'S' = \kappa\,\tilde P_A\otimes\tilde P_B$, where

$$\tilde P_A = \mathrm{diag}(1/\sqrt{p_A})\, P_A\, \mathrm{diag}(1/\sqrt{p_A}) = \mathrm{diag}(1/\sqrt{p_A})\left(\mathrm{diag}\{p_A\} - p_A p_A'\right)\mathrm{diag}(1/\sqrt{p_A}) = I_a - \sqrt{p_A}\sqrt{p_A}'.$$

Similarly,

$$\tilde P_B = \mathrm{diag}(1/\sqrt{p_B})\, P_B\, \mathrm{diag}(1/\sqrt{p_B}) = I_b - \sqrt{p_B}\sqrt{p_B}'.$$

Because $\tilde P_A$ and $\tilde P_B$ are projection matrices with rank $a-1$ and $b-1$, respectively, $\tilde P_A\otimes\tilde P_B$ is also a projection matrix, but with rank $(a-1)(b-1)$. Since

$$V_n \to_d N(0, SU\Sigma U'S') = N(0, \kappa\,\tilde P_A\otimes\tilde P_B),$$

it follows that

$$\frac{1}{\kappa}X_I^2 = \frac{1}{\kappa}\|V_n\|^2 \to_d \chi^2_{(a-1)(b-1)}.$$

The proof is completed by recalling that $\kappa = 1$ under marginal imputation and $\kappa = 1/\pi_C + 1 - \pi_C$ under conditional imputation.
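To illustrate Theorem 4 in practice: the naive Pearson statistic is computed from the imputed table and, under conditional imputation, divided by an estimate of $\kappa = 1/\pi_C + 1 - \pi_C$, such as $\hat\kappa = n/n_C + 1 - n_C/n$ with $n_C$ the number of completers. A minimal sketch with hypothetical counts:

```python
import numpy as np

def pearson_independence(counts):
    """Naive Pearson chi-square statistic for independence on an a x b table."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_hat = counts / n
    row = p_hat.sum(axis=1, keepdims=True)   # p_i.
    col = p_hat.sum(axis=0, keepdims=True)   # p_.j
    expected = row * col                     # independence fit
    return n * float(((p_hat - expected) ** 2 / expected).sum())

def corrected_statistic(counts, n_complete):
    """Divide the naive statistic by kappa_hat = n/n_C + 1 - n_C/n
    (the sample analogue of kappa in Theorem 4, conditional imputation)."""
    n = float(np.asarray(counts).sum())
    kappa_hat = n / n_complete + 1.0 - n_complete / n
    return pearson_independence(counts) / kappa_hat

# Hypothetical conditionally imputed 2x2 table: n = 1000, n_C = 600 completers.
table = np.array([[280, 220],
                  [220, 280]])
x2_naive = pearson_independence(table)
x2_corrected = corrected_statistic(table, n_complete=600)
```

Both statistics are referred to the chi-square distribution with $(a-1)(b-1)$ degrees of freedom; the correction only shrinks the naive statistic, since $\hat\kappa \ge 1$.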

Chapter 3

Simulation Study Under Simple Random Sampling

3.1 Introduction

All the results obtained in the previous chapter are based on large-sample theory. In this chapter, extensive simulations are carried out to evaluate the finite-sample performance of the proposed estimators and tests. In Section 3.2, we study the asymptotic normality through chi-square scores, a tool used by Johnson and Wichern (1998) to assess multivariate normality. In Section 3.3, the relative efficiencies of $\hat p_I$ and $\hat p_C$ are compared using the WMSE. In Section 3.4, the two test statistics (Wald type and Rao type) for goodness-of-fit are compared in terms of size (type I error probability). In Section 3.5, testing independence under marginal imputation, conditional imputation, and the re-weighting method is studied, and the relative efficiencies of these tests are compared in terms of power.
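The chi-square scores used below as a normality diagnostic can be sketched as follows. Here multivariate normal draws stand in for the (suitably centered and scaled) estimators, and the dimension and covariance matrix are hypothetical; if the normality holds, the scores behave like chi-square variables with $d$ degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)

def chi_square_scores(X, sigma):
    """Chi-square scores s = x' Sigma^{-1} x for each row x of X."""
    sol = np.linalg.solve(sigma, X.T)        # Sigma^{-1} x for all rows at once
    return np.einsum("ij,ji->i", X, sol)

# Hypothetical d-dimensional normal sample with a non-trivial covariance.
d = 3
A = rng.standard_normal((d, d))
sigma = A @ A.T + d * np.eye(d)              # positive definite by construction
X = rng.multivariate_normal(np.zeros(d), sigma, size=20000)

scores = chi_square_scores(X, sigma)
# If X ~ N(0, Sigma), the scores are chi-square with d degrees of freedom,
# so their sample mean should be close to d.
mean_score = scores.mean()
```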

3.2 Asymptotic Normality

Let $X$ be a random vector and $\Sigma$ a positive definite matrix. The chi-square score of $X$ with respect to $\Sigma$ is given by

$$s = X'\Sigma^{-1}X. \qquad (3.1)$$

Under the assumption that $X$ is normally distributed with mean 0 and covariance matrix $\Sigma$, $s$ is a chi-square random variable with $d$ degrees of freedom, where $d$ is the dimension of $X$. Therefore, chi-square scores can be used as a tool to evaluate the normality of a multivariate random vector. Since $\hat p_I$ has a degenerate covariance matrix, instead of studying $\hat p_I$ we study $\hat p$, which is obtained by dropping the last component of $\hat p_I$.

In each simulation, the total sample size $n$ is fixed at 1,000. Two sizes of contingency tables (2 × 2 and 5 × 5) are considered. For a given table size, it is assumed that $p_{ij} \equiv 1/(ab)$. Thirty-two different missing patterns (i.e., $(\pi_A, \pi_B, \pi_C)$) are considered. For each missing pattern, 10,000 data sets are generated based on the given parameters (i.e., $n, a, b, p_{ij}, \pi_A, \pi_B, \pi_C$). For each data set, conditional imputation is performed and $\hat p_I$ is calculated. The asymptotic covariance matrix is also calculated according to Theorem 2, and the chi-square score is obtained according to (3.1) based on $\hat p$ and $\hat\Sigma$. Asymptotically, the scores are distributed as chi-square random variables with $ab - 1$ degrees of freedom. Therefore, each of the 10,000 chi-square scores is compared with $\chi^2_{0.05}$ and $\chi^2_{0.95}$, where $\chi^2_p$ is the $p$th upper quantile of a chi-square random variable

with the appropriate degrees of freedom, i.e., $P(\chi^2 > \chi^2_p) = p$. The empirical upper tail probabilities, i.e., $P(s > \chi^2_{0.05})$ and $P(s > \chi^2_{0.95})$, are estimated by the proportions of chi-square scores larger than $\chi^2_{0.05}$ and $\chi^2_{0.95}$, respectively. The results are summarized in Table 1.

In order to provide a better understanding of the true distribution of the chi-square scores after conditional imputation, the density of the chi-square scores is estimated in R for selected cases and compared with the chi-square densities with the appropriate degrees of freedom. The results are given in Figure 1 and Figure 2, and show that the true distributions of the chi-square scores are well approximated by their asymptotic distribution.

3.3 Weighted Mean Squared Error

In this section, a simulation study is performed to compare the efficiency of $\hat p_I$ and $\hat p_C$ in terms of the WMSE defined in Section 2.4. Two distributions for 2 × 2 contingency tables are considered: (0.25, 0.25; 0.25, 0.25) and (0.01, 0.49; 0.49, 0.01). The noncentrality parameters of these two distributions are 0 and a positive value, respectively. Based on the same simulation scheme as described in Section 3.2, 10,000 data sets are generated for each parameter setting (i.e., $n = 1{,}000$, $p_{ij}$, $\pi_A$, $\pi_B$, $\pi_C$). For each data set, the WMSEs of $\hat p_I$ and $\hat p_C$ are calculated. In order to compare the efficiency of the imputation and re-weighting methods, the difference of the WMSEs is considered and magnified by the total sample size $n$.

This difference is estimated by its average over the 10,000 data sets and is denoted by $\hat\Delta$. The results are summarized in Table 2, with negative values indicating better performance of $\hat p_I$. It can be seen that when the missing probability is large, conditional imputation can improve the efficiency of the point estimator.

3.4 Testing for Goodness-of-Fit

Two methods are proposed in Chapter 2 for testing goodness-of-fit: a Wald type statistic and a Rao type statistic. Under the null hypothesis, the Wald type statistic is essentially the chi-square score constructed in Section 3.2; therefore, in this section we only study the Rao type statistic. Based on the same simulation scheme as described in Section 3.2, 10,000 data sets are generated for each parameter setting. For each data set, the Rao type statistic is calculated and compared with the appropriate chi-square quantiles. The empirical upper tail probabilities are estimated by the proportions of the Rao type statistics that exceed the quantiles. The results are summarized in Table 3. In addition, the density of the Rao type statistic is estimated in R and compared with the standard chi-square density; the results are given in Figures 3 and 4. All the results show that the performance of the Rao type test is comparable to that of the Wald type test in terms of type I error.
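The Rao type correction studied here rescales Pearson's statistic by the factor λ of (2.8). A minimal sketch, assuming the reading of (2.8) in which the leading factor is $1/\{\pi_C(ab-1)\}$, and with the π's and δ supplied directly (in practice they would be replaced by estimates):

```python
def rao_correction_factor(a, b, pi_a, pi_b, pi_c, delta):
    """Correction factor lambda of (2.8) for Pearson's goodness-of-fit
    statistic under conditional imputation (formula as reconstructed)."""
    ab = a * b
    core = (ab + pi_a**2 * a + pi_b**2 * b - 2 * pi_a * a - 2 * pi_b * b
            + 2 * pi_a * pi_b + 2 * pi_a * pi_b * delta)
    return core / (pi_c * (ab - 1)) - pi_c * ab / (ab - 1) + 1.0

def corrected_pearson(x2_pearson, a, b, pi_a, pi_b, pi_c, delta):
    """X_C^2 = X_P^2 / lambda, referred to the chi-square(ab-1) distribution."""
    return x2_pearson / rao_correction_factor(a, b, pi_a, pi_b, pi_c, delta)

# Sanity check: with full response (pi_c = 1, pi_a = pi_b = 0), lambda = 1
# and no correction is applied.
lam_full = rao_correction_factor(2, 2, 0.0, 0.0, 1.0, 0.0)
```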

3.5 Testing for Independence

Marginal Imputation

Based on the same simulation scheme described in Section 3.2, 10,000 data sets are generated for each parameter setting. For each data set, marginal imputation is applied and the naive Pearson chi-square statistic is calculated. According to our result, the naive Pearson statistic should be approximately chi-square distributed with $(a-1)(b-1)$ degrees of freedom. Therefore, it is compared with the 5% and 95% upper quantiles of the chi-square distribution with $(a-1)(b-1)$ degrees of freedom, and the empirical upper tail probabilities are estimated. The results are summarized in Table 4. In addition, the densities of the naive Pearson statistics are estimated in R and compared with the densities of chi-square random variables with the appropriate degrees of freedom. The results are given in Figures 5 and 6.

Conditional Imputation

Based on the same simulation scheme described in Section 3.2, 10,000 data sets are generated for each parameter setting. For each data set, conditional imputation is applied. The naive Pearson chi-square statistic is calculated and corrected by the appropriate constant given in Theorem 4. According to

our result, the corrected Pearson statistic should be approximately chi-square distributed with $(a-1)(b-1)$ degrees of freedom. Therefore, the corrected Pearson statistics are compared with the 5% and 95% upper quantiles of the chi-square distribution with the appropriate degrees of freedom, and the empirical upper tail probabilities are estimated. The results are reported in Table 5. In addition, the density of the corrected Pearson statistic is estimated in R and compared with the densities of chi-square random variables with the appropriate degrees of freedom. The results are given in Figures 7 and 8.

Relative Efficiency

Let $X_I^{2,r}$, $X_I^{2,c}$, and $X_I^{2,m}$ be the chi-square statistics for testing the independence of A and B based on the completers (re-weighting), conditional imputation, and marginal imputation, respectively. According to our asymptotic theory, they define three asymptotically correct tests with rejection regions given by

$$I\{X_I^{2,r} > \chi^2_{1-\alpha,(a-1)(b-1)}\}, \quad I\{X_I^{2,c}/\kappa > \chi^2_{1-\alpha,(a-1)(b-1)}\}, \quad I\{X_I^{2,m} > \chi^2_{1-\alpha,(a-1)(b-1)}\},$$

respectively, where $\kappa = n n_C^{-1} + 1 - n_C n^{-1}$. Under the null hypothesis that A and B are independent, all three tests have asymptotic size α. Therefore, the

relative efficiency of the three tests becomes a problem of interest. In this section, a simulation study is performed to compare the three tests in terms of power. The simulation is based on a 2 × 2 contingency table with distribution (0.28, 0.22; 0.22, 0.28). Thirty-two different missing patterns are considered. For each parameter setting, 10,000 data sets are generated and three chi-square statistics are calculated: one based on the re-weighting method, one from Pearson's test after marginal imputation, and one from the corrected Pearson test after conditional imputation. The power is estimated by the proportion of the statistics that correctly reject the null hypothesis. The results are summarized in Table 6.

In order to better understand how the power of the three tests changes as a function of $\pi_C$, we perform a simulation based on the same 2 × 2 contingency table with distribution (0.28, 0.22; 0.22, 0.28). For a given probability of completeness $\pi_C$, we set $\pi_A = \pi_B = (1 - \pi_C)/2$. Fifty values of $\pi_C$, evenly spaced between 0.1 and 1.0, are studied. For each parameter setting, 10,000 iterations are carried out. The estimated empirical power is plotted against $\pi_C$; the results are given in Figure 9.

We also study the power of the three tests as a function of the noncentrality parameter δ. In this case, we fix the missing pattern at $(\pi_C, \pi_A, \pi_B) = (0.5, 0.3, 0.2)$.
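As an illustration of the power computations in this section, the sketch below estimates the power of the completers-only (re-weighting) chi-square test at the 2 × 2 alternative (0.28, 0.22; 0.22, 0.28); the missingness mechanism, replication count, and hard-coded critical value are simplifications relative to the actual study:

```python
import numpy as np

rng = np.random.default_rng(2006)

def pearson_2x2(counts):
    """Pearson chi-square statistic for independence in a 2x2 table of counts."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / n
    return float(((counts - expected) ** 2 / expected).sum())

def estimate_power(p, n=1000, pi_c=0.5, reps=500, crit=3.841):
    """Monte Carlo power of the completers-only test: each unit is a completer
    with probability pi_c (MCAR), and only completers enter the statistic.
    crit = upper 5% point of chi-square with 1 degree of freedom."""
    p = np.asarray(p).ravel()
    rejections = 0
    for _ in range(reps):
        cells = rng.multinomial(n, p)            # full-sample cell counts
        completers = rng.binomial(cells, pi_c)   # thin each cell by pi_c
        table = completers.reshape(2, 2)
        if table.sum() > 0 and pearson_2x2(table) > crit:
            rejections += 1
    return rejections / reps

power = estimate_power([0.28, 0.22, 0.22, 0.28])
```

Binomial thinning of the multinomial cell counts is equivalent to marking each unit a completer independently with probability $\pi_C$, which is the missing-completely-at-random setting assumed here.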

Note that the 2 × 2 contingency tables $w_\delta$ considered here have noncentrality parameter equal to δ/16, which is proportional to δ. Therefore, simulations are performed based on this family of 2 × 2 contingency tables. Fifty equally spaced δ values from 0.01 to 0.50 are studied. The estimated empirical power is plotted against δ in Figure 10.

The results in this section suggest that, when testing independence, the greatest power is achieved by using the complete units only, while the chi-square test under marginal imputation has the smallest power. An intuitive explanation is that marginal imputation makes the two categorical responses of the incompleters independent of each other conditional on the completers, which weakens the dependence between the two components; the effect is even more pronounced when the proportion of incompleters is large. As a result, the power of marginal imputation for testing independence is the lowest. On the other hand, since conditional imputation successfully captures the dependence structure of the two responses, its power is significantly higher than that of marginal imputation and comparable to, though not as good as, that of the re-weighting method, because imputation creates additional noise.

However, the merit of marginal imputation is that the naive Pearson test statistic remains valid, which means that the marginally imputed data set can be processed by standard software without modification. This is useful

when the proportion of nonrespondents is not too large. If the proportion of incompleters is relatively large, conditional imputation is recommended; in that case, the naive Pearson statistic should be corrected by a constant that depends only on the proportion of complete units.

3.6 Conclusion

For the selected sample sizes and parameters, our simulation results show that the empirical distributions of all the Wald type statistics are well approximated by the derived asymptotic distributions. In addition, the simulations demonstrate that the empirical distributions of all the Rao type statistics are well approximated by chi-square distributions with the appropriate degrees of freedom. With regard to testing independence by the different methods (re-weighting, marginal imputation, or conditional imputation), the simulation results indicate that the greatest power is achieved by using the complete units only, whereas the chi-square test under marginal imputation has the smallest power.

Table 1: Empirical upper tail probabilities of the Wald type statistic, by missing pattern ($\pi_C$, $\pi_A$, $\pi_B$), with columns $p_{0.05}$ and $p_{0.95}$ for each table size. Number of iterations: 10,000; sample size: 1,000; $p_{ij} = 1/(ab)$; $p_{0.05}$ and $p_{0.95}$: 5% and 95% empirical upper tail probabilities, respectively. [Table entries not recoverable from this transcription.]

Table 2: Efficiency comparison by WMSE: $\hat\Delta$ for each missing pattern ($\pi_C$, $\pi_A$, $\pi_B$) under the two values of δ. Number of iterations: 10,000; sample size: 1,000; δ = 0 corresponds to the distribution p = (0.25, 0.25; 0.25, 0.25) and the positive δ to p = (0.01, 0.49; 0.49, 0.01). [Table entries not recoverable from this transcription.]

Table 3: Empirical upper tail probabilities of the Rao type statistic, by missing pattern ($\pi_C$, $\pi_A$, $\pi_B$), with columns $p_{0.05}$ and $p_{0.95}$ for each table size. Number of iterations: 10,000; sample size: 1,000; $p_{ij} = 1/(ab)$; $p_{0.05}$ and $p_{0.95}$: 5% and 95% empirical upper tail probabilities, respectively. [Table entries not recoverable from this transcription.]


More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Part 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2

Part 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2 Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Negative Multinomial Model and Cancer. Incidence

Negative Multinomial Model and Cancer. Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar,

More information

Chapter 9. Hotelling s T 2 Test. 9.1 One Sample. The one sample Hotelling s T 2 test is used to test H 0 : µ = µ 0 versus

Chapter 9. Hotelling s T 2 Test. 9.1 One Sample. The one sample Hotelling s T 2 test is used to test H 0 : µ = µ 0 versus Chapter 9 Hotelling s T 2 Test 9.1 One Sample The one sample Hotelling s T 2 test is used to test H 0 : µ = µ 0 versus H A : µ µ 0. The test rejects H 0 if T 2 H = n(x µ 0 ) T S 1 (x µ 0 ) > n p F p,n

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator Linearization Method 141 properties that cover the most common types of complex sampling designs nonlinear estimators Approximative variance estimators can be used for variance estimation of a nonlinear

More information

PIRLS 2016 Achievement Scaling Methodology 1

PIRLS 2016 Achievement Scaling Methodology 1 CHAPTER 11 PIRLS 2016 Achievement Scaling Methodology 1 The PIRLS approach to scaling the achievement data, based on item response theory (IRT) scaling with marginal estimation, was developed originally

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

More information

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F. Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Empirical Likelihood Methods for Pretest-Posttest Studies

Empirical Likelihood Methods for Pretest-Posttest Studies Empirical Likelihood Methods for Pretest-Posttest Studies by Min Chen A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in

More information

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication,

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication, STATISTICS IN TRANSITION-new series, August 2011 223 STATISTICS IN TRANSITION-new series, August 2011 Vol. 12, No. 1, pp. 223 230 BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition,

More information

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Valen E. Johnson Texas A&M University February 27, 2014 Valen E. Johnson Texas A&M University Uniformly most powerful Bayes

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

An Approximate Test for Homogeneity of Correlated Correlation Coefficients

An Approximate Test for Homogeneity of Correlated Correlation Coefficients Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

The outline for Unit 3

The outline for Unit 3 The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ ADJUSTED POWER ESTIMATES IN MONTE CARLO EXPERIMENTS Ji Zhang Biostatistics and Research Data Systems Merck Research Laboratories Rahway, NJ 07065-0914 and Dennis D. Boos Department of Statistics, North

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

Hypothesis Testing For Multilayer Network Data

Hypothesis Testing For Multilayer Network Data Hypothesis Testing For Multilayer Network Data Jun Li Dept of Mathematics and Statistics, Boston University Joint work with Eric Kolaczyk Outline Background and Motivation Geometric structure of multilayer

More information