Extensions of the Penalized Spline Propensity Prediction Method of Imputation


by

Guangyu Zhang

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Biostatistics) in The University of Michigan, 2007

Doctoral Committee:
Professor Roderick J. Little, Chair
Professor Susan A. Murphy
Professor Trivellore E. Raghunathan
Associate Professor Bin Nan

Guangyu Zhang 2007

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Roderick Little, for his advice, support and intellectual stimulation. I also want to thank my husband, Huanyuan Sheng, for his love.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER

I. INTRODUCTION

II. EXTENSIONS OF THE PENALIZED SPLINE PROPENSITY PREDICTION (PSPP) METHOD OF IMPUTATION
    Abstract
    2.1 Introduction
    2.2 Penalized Spline of Propensity Prediction (PSPP)
    2.3 PSPP is not doubly robust for subgroup means
    2.4 Stratified Penalized Spline Propensity Prediction for subgroup means
    2.5 A Bivariate PSPP Method for estimating the conditional mean of Y given a continuous covariate
    2.6 An Example: Online Weight Loss Study
    2.7 Discussion
    Appendix

III. A COMPARATIVE STUDY OF THE PENALIZED SPLINE PROPENSITY PREDICTION METHOD WITH ALTERNATIVE DOUBLY ROBUST ESTIMATORS
    Abstract
    3.1 Introduction
    3.2 Doubly robust estimators
    3.3 Simulation studies
    3.4 An Example: Online Weight Loss Study
    3.5 Conclusion

IV. THE PSPP METHOD FOR THE MONOTONE PATTERN MISSING DATA
    4.1 Introduction
    4.2 PSPP for the monotone pattern of missing data
    4.3 Simulation study
    4.4 Example
    4.5 Conclusion

V. CONCLUSION AND THE FUTURE WORK

REFERENCES

LIST OF FIGURES

Figure
% Increase of RMSE of Simulation 1
% Increase of Width of CI of Simulation 1
Non-coverage rate of Simulation 1
% Increase of RMSE of Simulation 2
% Increase of Width of CI of Simulation 2
Non-coverage rate of Simulation 2
% Increase of RMSE of Simulation 3
% Increase of Width of CI of Simulation 3
Non-coverage rate of Simulation 3
Example of monotone missing data structure

LIST OF TABLES

Table
2.1 Example 1: Empirical Bias, Standard Deviation (SD) and Root Mean Squared Error (RMSE) for (A) Marginal mean of Y, and (B) Conditional Mean of Y given $X_1$
2.2 Example 2: Empirical Bias, Root Mean Squared Error (RMSE) and Coverage rate (Cov) for (A) Marginal mean of Y, (B) Conditional Mean of Y given $X_1$, and (C) Intercept and Slopes for Regression of Y on [...]
2.3 BMI reduction within groups
3.1 Simulation 1 classified by degree of misspecification in the mean function and the degree of diversity of the propensity function
3.2 Simulation 2 classified by degree of misspecification in the mean function and the degree of diversity of the propensity score
3.3 Simulation 3 classified by degree of misspecification in the mean function and the degree of diversity of the propensity score
3.4 Empirical bias, empirical standard error and RMSE when the propensity function is wrongly specified
3.5 BMI reduction within groups
4.1 Bias, STD and RMSE for the marginal and conditional means
4.2 Covariates in the propensity model
4.3 The baseline covariates in the g function of model b, d and f
4.4 BMI reduction within groups

ABSTRACT

Little and An (2004) proposed a penalized spline propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated, and a regression model is fit with the spline of the propensity score as a covariate. The predicted marginal mean of the missing variable is doubly robust (DR) under misspecification of the imputation model. In the first part of the thesis, we study properties of a simplified version of the PSPP that does not center the regressors prior to including them in the prediction model. We then extend the PSPP to multivariate data with both continuous and categorical variables so as to yield consistent estimates of both marginal and conditional means. The extended PSPP method is compared with the PSPP method and simple alternatives in a simulation study.

For the second part of the thesis, we compare the PSPP method with several other DR estimators. The PSPP method uses a spline of the propensity score to impute the missing values, and the resulting estimates have a double robustness property. The DR property can also be achieved by modeling the relationship parametrically, as in the linear-in-the-weight method and the calibration method (Firth and Bennett, 1998; Scharfstein, Rotnitzky and Robins, 1999; Robins, Rotnitzky and Zhao, 1994). We compare root mean square error (RMSE), width of confidence interval and non-coverage rate of the PSPP method and these alternatives under different mean functions and propensity score functions. We also study the effects of sample size and misspecification of the propensity scores. The PSPP method yields estimates with smaller RMSE and narrower confidence intervals than the other methods in most situations. It yields estimates with non-coverage rates close to the 5% nominal level.

For the third part of the thesis, we extend the PSPP method to monotone missing data. We propose to impute the missing values with a stepwise PSPP procedure, and simulation studies show that the stepwise PSPP method yields consistent marginal and conditional mean estimates. We illustrate the proposed method by applying it to an online weight loss study conducted by Kaiser Permanente. We finish the thesis with a short discussion and future work.

CHAPTER I

INTRODUCTION

Missing data arise in scientific research for many reasons. For example, in a two-stage clinical study, a subset of subjects is selected for expensive medical tests, and those who are not selected have missing values. Subjects who are selected may also drop out of the study, so that their test results cannot be collected. Whatever the reason, it is important to include the information in the incomplete cases in the analysis to yield efficient estimators and better inferences.

A dataset with missing values can be described by the missing-data pattern, which indicates which observations are present and which are missing. Let $Y = (y_{ij})$ be an $(n \times p)$ rectangular data set with $i$th row $y_i = (y_{i1}, \ldots, y_{ip})$, where $y_{ij}$ is the $j$th observation for subject $i$, $j = 1, \ldots, p$. Let $M = (m_{ij})$ be a missing-data indicator matrix with $i$th row $m_i = (m_{i1}, \ldots, m_{ip})$, such that $m_{ij} = 1$ if $y_{ij}$ is missing and $m_{ij} = 0$ if $y_{ij}$ is present. The matrix $M$ then represents the missing-data pattern of the dataset $Y$. We assume that $(y_i, m_i)$ are independent over $i$ throughout the dissertation.

We first focus on univariate missing data, where missingness is confined to a single variable. Let $(X_1, \ldots, X_p, Y)$ denote a $(p+1)$-dimensional vector of variables with $Y$ subject to missing values and $X_1, \ldots, X_p$ fully observed covariates. We consider the problem of estimating the mean of $Y$, the conditional means of $Y$ in subclasses defined by a categorical variable, and the regression coefficient of $Y$ on a continuous variable.
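As a small illustration of the notation above, the indicator matrix $M$ can be computed directly from a data array. The sketch below is not part of the original text: it assumes that missing entries are coded as NaN in a NumPy array, and the function name `missing_indicator` is hypothetical.

```python
import numpy as np

def missing_indicator(Y):
    """Return M with m_ij = 1 where y_ij is missing (NaN) and 0 where it is present."""
    Y = np.asarray(Y, dtype=float)
    return np.isnan(Y).astype(int)

# Toy 3 x 3 data set; NaN marks a missing observation (an assumed coding).
Y = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, np.nan],
              [7.0, 8.0, 9.0]])
M = missing_indicator(Y)

# Rows of M that sum to zero are the complete cases.
complete_cases = np.flatnonzero(M.sum(axis=1) == 0)
```

Here each row of `M` is the pattern $m_i$ for one subject, and `complete_cases` lists the subjects that a complete-case analysis would retain.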

Estimation of the marginal and conditional means of $Y$ requires assumptions about the missing-data mechanism, which concerns the relationship between missingness and the values of the variables in the data matrix. Rubin (1976) treated $M$ as a random matrix and described the missing-data mechanism by the conditional distribution of $M$ given $Y$, $f(M \mid Y, \phi)$, where $\phi$ denotes unknown parameters. When missingness does not depend on $Y$, missing or observed, that is, $f(M \mid Y, \phi) = f(M \mid \phi)$ for all $Y, \phi$, the data are called missing completely at random (MCAR). If missingness depends only on $Y_{obs}$, the observed part of $Y$, and not on the missing part $Y_{mis}$, that is, $f(M \mid Y, \phi) = f(M \mid Y_{obs}, \phi)$ for all $Y_{mis}, \phi$, then the missing-data mechanism is called missing at random (MAR). If missingness depends on the missing part of the variables, that is, $f(M \mid Y, \phi) = f(M \mid Y_{obs}, Y_{mis}, \phi)$ for all $\phi$, then the data are called not missing at random (NMAR). MAR is a less restrictive assumption than MCAR. In applications, researchers are encouraged to make the MAR assumption plausible by measuring covariates that characterize the nonrespondents (Little and Rubin, 1999). We assume the missing data are MAR throughout the dissertation.

Many methods have been proposed to deal with missing information. A simple approach is complete-case (CC) analysis, which deletes units with $Y$ missing, so information contained in the deleted cases is lost. In the context of our problem, CC analysis yields consistent estimates of the marginal mean of $Y$ if the missing-data mechanism is MCAR, but it yields biased estimates if missingness of $Y$ depends on the observed covariates $X_1, \ldots, X_p$ or on $Y$.

Weighted complete-case analysis is an alternative to CC analysis (Little, 1986; Horvitz and Thompson, 1952; Cochran, 1968). Let $r$ be the number of complete cases, taken to be the first $r$ cases. The marginal mean of $Y$ can be estimated as

$$\hat{\mu} = \Big(\sum_{i=1}^{r} w_i y_i\Big) \Big/ \Big(\sum_{i=1}^{r} w_i\Big) \quad \text{or} \quad \hat{\mu} = \Big(\sum_{i=1}^{r} w_i y_i\Big) \Big/ n.$$

In sample surveys without nonresponse, the weight $w_i$ is the design weight, that is, the inverse of the probability of being selected. For missing data due to nonresponse, the weight is the inverse of the probability of being observed. When the weight is unknown, we can estimate it from a set of observed variables that characterize the respondents and nonrespondents. One way is to group subjects into subclasses based on a small set of observed covariates. Within each subgroup the respondents are treated as a random sample of the subpopulation, and the weight is the inverse of the proportion of respondents. As the number of covariates increases, sub-grouping leads to a large number of subclasses, and in this case propensity weighting is an alternative (Cochran, 1965; Rosenbaum and Rubin, 1983, 1984; Little, 1986). The propensity score is a scalar function of the observed covariates. One can estimate the propensity score by a logistic or probit regression of $M$ on the observed covariates, and the weight is the inverse of the estimated propensity score. The potential drawback of propensity weighting is that it may yield estimates with large variances, because respondents with very small propensity scores are assigned very large weights, which may lead to out-of-range estimates of the means (Little and Rubin, 2002).

Complete-case analysis and weighted complete-case analysis delete subjects with missing values, so information contained in the covariates of the incomplete cases is lost. This loss of information may lead to less efficient estimators. To make full use of the observed information we can use a parametric approach. For example, we can derive the marginal mean of $Y$ from the linear regression model

$$Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i,$$

where $\varepsilon_i$ is the error term with $\varepsilon_i \sim N(0, \sigma^2)$. We can fit this model by maximum likelihood (ML) (Little and Rubin, 2002; Anderson, 1957; Rubin, 1974). The marginal mean of $Y$ can then be estimated as

$$\hat{\bar{Y}} = n^{-1}\Big(\sum_{i=1}^{r} y_i + \sum_{i=r+1}^{n} \hat{y}_i\Big), \quad \text{with} \quad \hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j X_{ij},$$

where $\hat{\beta}_0, \ldots, \hat{\beta}_p$ are the maximum likelihood estimators based on the complete cases.

An alternative to the ML estimators is to add a prior distribution for the parameters $\beta_0, \ldots, \beta_p$ and $\sigma^2$ and derive the posterior distribution of $Y$ given the covariates and the unknown parameters, $(Y \mid X_1, \ldots, X_p, \beta_0, \ldots, \beta_p, \sigma^2)$ (Gelman, Carlin, Stern and Rubin, 1995). Missing values of $Y$ and the unknown parameters $\beta = (\beta_0, \ldots, \beta_p)$ and $\sigma^2$ are drawn iteratively by the Gibbs sampler or another Markov chain Monte Carlo (MCMC) method (Casella and George, 1992; Geman and Geman, 1984). Once the chain has reached its stationary distribution after $N$ iterations, $M$ datasets are created such that within each dataset every missing value is replaced by an independent draw from the posterior distribution. For each dataset a posterior mean of $Y$, $\bar{Y}^{(l)}$, $l = 1, \ldots, M$, is derived as the average of the observed values and the posterior draws. The marginal mean of $Y$ is the average of the posterior means over the $M$ datasets. Usually $M$ needs to be large. However, if we can assume approximate normality for the posterior distribution of $\beta$ and $\sigma^2$ given the observed data, $(\beta_0, \ldots, \beta_p, \sigma^2 \mid Y_{obs}, X_1, \ldots, X_p)$, we only need to create a small number of datasets to estimate the marginal mean of $Y$, which is the idea of multiple imputation (Little and Rubin, 2002; Rubin, 1978). For each dataset the missing values are replaced by independent posterior draws, and the complete-data analysis technique is applied to each imputed dataset. The marginal mean of $Y$ can be derived using Rubin's combination rules (Rubin, 1978, 1987, 1996; Rubin and Schenker, 1986; Barnard and Rubin, 1999). Let $\hat{\mu}_d$ be the estimated marginal mean from the $d$th dataset, $d = 1, \ldots, D$, where $D$ is the total number of imputed datasets. The marginal mean of $Y$ is then estimated as

$$\hat{\mu} = \sum_{d=1}^{D} \hat{\mu}_d / D.$$

The parametric approach described above is efficient and yields consistent estimates if the model assumptions are correct. Its drawback is that it is very sensitive to model misspecification. In practice we can never guarantee that the model assumptions are correct, so robust estimators have recently been gaining attention.
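The regression prediction estimator and the combination of estimates across imputed datasets described above can be sketched as follows. This is an illustrative sketch, not the dissertation's code: the function names are hypothetical, ordinary least squares stands in for the ML fit of the normal linear model, and the total-variance formula (within-imputation variance plus $(1 + 1/D)$ times between-imputation variance) is the standard combining rule from Rubin (1987), which the text cites but does not write out.

```python
import numpy as np

def regression_imputation_mean(X, y, observed):
    """Mean of Y from observed values plus linear-regression predictions.

    X: (n, p) fully observed covariates; y: (n,) outcome, entries for
    missing cases are ignored; observed: (n,) boolean, True if y is observed.
    """
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    Xd = np.column_stack([np.ones(len(y)), X])      # add intercept column
    # Fit on complete cases only (least squares coincides with ML for beta).
    beta, *_ = np.linalg.lstsq(Xd[observed], y[observed], rcond=None)
    y_hat = Xd @ beta
    completed = np.where(observed, y, y_hat)        # impute predicted means
    return completed.mean()

def combine_imputations(estimates, variances):
    """Combine D point estimates and their variances across imputed datasets."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    D = len(estimates)
    mu = estimates.mean()                           # average of the D estimates
    within = variances.mean()                       # average within-imputation variance
    between = estimates.var(ddof=1)                 # between-imputation variance
    total = within + (1.0 + 1.0 / D) * between      # Rubin (1987) total variance
    return mu, total
```

When the outcome is exactly linear in the covariates, `regression_imputation_mean` recovers the full-data mean whatever the missingness pattern, which is the sense in which the approach is efficient under a correct model.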
Robins, Rotnitzky and Zhao (1994) and Rotnitzky, Robins and Scharfstein (1998) proposed a class of augmented orthogonal inverse probability-weighted estimators, which combine the features of parametric prediction with weighted estimating equations (WEE). The marginal mean of $Y$ can be derived by calibrating the predictions from a parametric model, adding the mean of the weighted residuals:

$$\mu_y = E\big(E(Y \mid X_1, \ldots, X_p)\big) + E\Big[\frac{1 - M}{\pi(Y)}\big(Y - E(Y \mid X_1, \ldots, X_p)\big)\Big],$$

where $\pi(Y)$ is the probability that $Y$ is observed. This leads to a calibration estimator of the form

$$\hat{\mu} = n^{-1}\sum_{i=1}^{n} \hat{y}_i + n^{-1}\sum_{i=1}^{r} \hat{w}_i (y_i - \hat{y}_i),$$

where $\hat{w}_i = 1/\widehat{\Pr}(M_i = 0 \mid X_{i1}, \ldots, X_{ip})$ is the estimated weight for the $i$th subject, and $\hat{y}_i$ is the prediction from a parametric model for the $i$th subject.

The calibration method has three steps. First, a parametric model is fit to the complete cases, and predictions are derived for all subjects by substituting their covariates into the regression model. Second, the propensity score is estimated by a logistic or probit regression of $M$ on $X_1, \ldots, X_p$. Third, the marginal mean of $Y$ is estimated by combining the mean of the predictions with the mean of the weighted residuals, where the residuals are the differences between the observed and predicted values for the complete cases. This method has a double robustness property: if either the prediction model is correctly specified or the weight is correctly estimated, the estimate of the marginal mean of $Y$ is consistent. This class of estimators is further extended in Robins and Rotnitzky (2001), Lunceford and Davidian (2004), and Yu and Nan (2006).

An alternative way to achieve robustness is to weaken the model assumptions, for example by fitting models with robust mean functions. One such method is linear in the weight prediction (LWP). It includes the weight as a linear term in the imputation model (Scharfstein, Rotnitzky and Robins, 1999; Bang and Robins, 2005):

$$(Y \mid X_1, \ldots, X_p; \beta) \sim N\big(g(X_1, \ldots, X_p; \beta) + \alpha \hat{W}, \sigma^2\big),$$

where $\hat{W} = 1/\widehat{\Pr}(R = 1 \mid X_1, \ldots, X_p)$ is the inverse of the estimated propensity score of respondents, with $R = 1 - M$ the response indicator. A similar approach has been applied in the sample survey setting, where the weights are due to sampling rather than nonresponse (Särndal, Swensson and Wretman, 2003; Firth and Bennett, 1998).
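The three steps of the calibration method can be sketched in code. The block below is a minimal illustration under stated assumptions, not the dissertation's implementation: the prediction model is taken to be linear, the propensity model is a logistic regression fitted by a hand-written Newton-Raphson routine, and all function names and the simulated data (a quadratic mean function with a logistic response mechanism) are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Logistic regression of a 0/1 vector z on X (intercept added) by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(z)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (z - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def calibration_mean(X, y, observed):
    """Mean of predictions plus mean of inverse-propensity-weighted residuals."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    Xd = np.column_stack([np.ones(len(y)), X])
    # Step 1: fit the prediction model to complete cases, predict for everyone.
    beta, *_ = np.linalg.lstsq(Xd[observed], y[observed], rcond=None)
    y_hat = Xd @ beta
    # Step 2: estimate the probability of being observed, Pr(M = 0 | X).
    gamma = fit_logistic(X, observed.astype(float))
    w = 1.0 + np.exp(-(Xd @ gamma))                 # w_i = 1 / Pr(M_i = 0 | X_i)
    # Step 3: add the weighted residuals of the complete cases, divided by n.
    resid = y[observed] - y_hat[observed]
    return y_hat.mean() + np.sum(w[observed] * resid) / len(y)

# Assumed data-generating process: the linear prediction model is wrong
# (true mean is quadratic) but the logistic propensity model is correct.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + x**2 + rng.normal(size=n)
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))
y = np.where(observed, y_full, np.nan)

calibrated = calibration_mean(x, y, observed)
complete_case = y[observed].mean()
```

In this setup the complete-case mean is pulled upward because subjects with larger $x$ are more likely to respond, while the calibrated estimate should track the full-data mean `y_full.mean()` even though the prediction model omits the quadratic term, illustrating one direction of the double robustness property.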
The linear in the weight prediction method has a double robustness property similar to that of the calibration estimators: if either the mean function of $Y$ given the covariates is correctly specified or the weight is correctly estimated, then the estimate of the marginal mean of the missing variable $Y$ is consistent. As in the calibration method, the first step of fitting a linear in the weight model estimates the propensity score, for example by a logistic or probit regression of $M$ on $X_1, \ldots, X_p$; in the second step, a regression of $Y$ on the weight and the other covariates is fit parametrically.

Semiparametric and nonparametric methods are another way to obtain robust mean functions, by capturing nonlinear relationships between the variables. In particular, with $p = 1$ and a single covariate $X$, one version of this approach is to base imputations on the penalized spline model $y_i = s(x_i) + \varepsilon_i$ with truncated polynomial basis

$$s(x) = \beta_0 + \beta_1 x + \cdots + \beta_q x^q + \sum_{k=1}^{K} \beta_{qk} (x - \kappa_k)_+^q,$$

where $1, x, \ldots, x^q, (x - \kappa_1)_+^q, \ldots, (x - \kappa_K)_+^q$ is known as the truncated power basis of degree $q$; $\kappa_1 < \cdots < \kappa_K$ are selected fixed knots and $K$ is the total number of knots (Eilers and Marx, 1996; Ruppert, Wand and Carroll, 2003; Ngo and Wand, 2004). The penalized least squares estimator $\hat{\beta} = (\hat{\beta}_0, \ldots, \hat{\beta}_q, \hat{\beta}_{q1}, \ldots, \hat{\beta}_{qK})^T$ is obtained by minimizing

$$\sum_{i=1}^{n} \Big\{ y_i - \beta_0 - \sum_{j=1}^{q} \beta_j x_i^j - \sum_{k=1}^{K} \beta_{qk} (x_i - \kappa_k)_+^q \Big\}^2 + \lambda^{2q} \beta^T D \beta,$$

where $\lambda$ is a smoothing parameter and $D = \mathrm{diag}(0_{q+1}, 1_K)$. The fitted values are $\hat{y} = X(X^T X + \lambda^{2q} D)^{-1} X^T y$, where $X$ is the design matrix of the basis functions. This model can be fitted using a number of existing software packages, such as PROC MIXED in SAS (SAS, 1992; Ngo and Wand, 2004; Littell, Milliken, Stroup and Wolfinger, 1996; Ruppert, 2002) and lme() in S-PLUS (Pinheiro and Bates, 2000).

With more than one covariate, one might extend this approach by fitting a multivariate spline. However, such models are subject to the curse of dimensionality when $p$ is large, which relates to the difficulty of fitting nonparametric regression functions when the regressor space has high dimension.
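The penalized least squares fit above has a closed form for a fixed smoothing parameter, which can be sketched directly with NumPy. This is an illustration rather than the mixed-model fitting route (PROC MIXED or lme()) mentioned in the text: $\lambda$ is supplied by the user instead of being estimated, the function names are hypothetical, and the degree $q$ is assumed to be at least 1.

```python
import numpy as np

def truncated_power_basis(x, knots, q=1):
    """Columns 1, x, ..., x^q, (x - kappa_1)_+^q, ..., (x - kappa_K)_+^q (q >= 1)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.vander(x, q + 1, increasing=True)
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** q
    return np.hstack([poly, trunc])

def fit_penalized_spline(x, y, knots, q=1, lam=1.0):
    """Solve (X'X + lam^(2q) D) beta = X'y with D = diag(0_{q+1}, 1_K)."""
    X = truncated_power_basis(x, knots, q)
    D = np.diag(np.r_[np.zeros(q + 1), np.ones(len(knots))])
    return np.linalg.solve(X.T @ X + lam ** (2 * q) * D, X.T @ np.asarray(y, dtype=float))

# Usage: 20 equally spaced interior knots and a truncated linear basis
# (the knot count and lam = 0.1 are arbitrary choices for this illustration).
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)
knots = np.linspace(0.0, 1.0, 22)[1:-1]
beta = fit_penalized_spline(x, y, knots, q=1, lam=0.1)
fitted = truncated_power_basis(x, knots, q=1) @ beta
```

Only the $K$ knot coefficients are penalized, so a function that is exactly a polynomial of degree $q$ is reproduced without shrinkage, while larger values of $\lambda$ pull the fit toward such a polynomial.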
The Penalized Spline of Propensity Prediction (PSPP; Little and An, 2004) method addresses this problem by restricting the spline to the particular function of the covariates most sensitive to model misspecification, namely the propensity score. Little and An show that the PSPP method yields an estimate of the marginal mean of the missing variable with a double robustness (DR) property: the predicted marginal mean of $Y$ is consistent when either the mean function of $Y$ given the covariates or the propensity score function is correctly specified. The robustness feature lies in the fact that the parametric function does not have to be correctly specified. A related approach is given by Zeng (00), who reduces the dimension of the covariates to two, the propensity and a linear predictor, and then models the relationship between the outcome and these two variables by a bivariate nonparametric model.

For the first part of the dissertation, we simplify and extend the PSPP method. Little and An's method requires centering the covariates before adding them to the model parametrically. We show that this centering is not necessary, which simplifies the PSPP method considerably. We prove that the simplified version has the same DR property as the model proposed by Little and An (2004), and it is much easier for practitioners to apply. We then extend the simplified PSPP method to derive the conditional mean(s) of a missing variable given a covariate. A stratified PSPP method is proposed to derive the subgroup means given a categorical covariate. For a continuous covariate, we propose a bivariate PSPP method. Both extensions include the interaction of the propensity score and the covariate. Simulations show that these extensions yield consistent conditional means under different mean and propensity structures. We apply the stratified PSPP method to an online weight loss study conducted by Kaiser Permanente (Couper, Peytchev, Little, Strecher and Rothert, 2005).

For the second part of the dissertation, we compare the PSPP method with several alternative doubly robust estimators.
The PSPP method is based on the balancing property of the propensity score: conditional on the propensity score and assuming MAR, missingness of $Y$ does not depend on the covariates $X_1, \ldots, X_p$ (Rosenbaum and Rubin, 1983). Since we do not know the true relationship between $Y$ and the propensity score, we use a spline of the propensity score to impute the missing values, and the resulting estimates have a double robustness property. The DR property can also be achieved by modeling the relationship parametrically, as in the linear in the weight prediction method and the calibration estimators. However, the emphasis in previous research has been on asymptotic properties of the estimates, namely consistency and achieving the semiparametric efficiency bound (Firth and Bennett, 1998; Scharfstein, Rotnitzky and Robins, 1999; Bang and Robins, 2005; Robins, Rotnitzky and Zhao, 1994). Consistency is a relatively weak property, and does not guarantee good confidence coverage of inferences in small or moderate-sized samples. Semiparametric efficiency is also a fairly weak property, since it is asymptotic and does not necessarily guarantee efficiency in finite samples. For the second part of the dissertation we compare root mean square error (RMSE), width of confidence interval and non-coverage rate of the above approaches for a range of sample sizes when the regression model is misspecified. We also compare these methods when the propensity score is wrongly specified.

In the third part of the dissertation, we extend the PSPP method to the monotone pattern of missing data, where the variables can be arranged so that if $Y_j$ is missing in a unit then $Y_{j+1}, Y_{j+2}, \ldots, Y_p$ are missing as well. A monotone pattern is common in longitudinal studies when some subjects drop out of the study and do not return. We propose to impute the missing values in a stepwise procedure. The marginal propensity score is derived for each part of the missing variables. For the part where the marginal propensity score is zero because the preceding variable(s) are missing, we cannot apply the PSPP method directly, since there are no observed data with this propensity score. In this case we propose to borrow the propensity scores from the previous stages. Imputation of missing values is done in several steps according to the pattern of missing data. The part with the least missing information is imputed first, and the imputed data are then used to predict the missing values for the next part of the data. Simulation studies show that the stepwise procedure yields satisfactory results. We illustrate our method by applying it to an online weight loss study conducted by Kaiser Permanente. We conclude the dissertation with a short discussion and future work in Chapter V.

CHAPTER II

EXTENSIONS OF THE PENALIZED SPLINE PROPENSITY PREDICTION (PSPP) METHOD OF IMPUTATION

Abstract

Little and An (2004) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated, and a regression model is fit that includes the spline of the estimated propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente.

Keywords: missing at random, propensity, penalized spline.

2.1 Introduction

Missing data problems are common in many applications of statistics. In this paper, we consider univariate nonresponse, where the missingness is confined to a single variable. Let $(Y, X_1, \ldots, X_p)$ denote a $(p+1)$-dimensional vector of variables with $Y$ subject to missing values and $X_1, \ldots, X_p$ fully observed covariates. We consider here the problem of estimating the mean of $Y$, the conditional mean of $Y$ in subclasses defined by a categorical $X$-variable, and the regression coefficient of $Y$ on a continuous $X$-variable.

Many statistical methods have been proposed for these problems. A simple approach is complete-case (CC) analysis, which deletes units with $Y$ missing, so information contained in the deleted cases is lost. In the context of our problem, CC analysis yields a consistent estimate of the overall mean of $Y$ if missingness does not depend on any of the variables, and a consistent estimate of the conditional mean of $Y$ given a covariate $X_1$ if the missing-data mechanism depends on $X_1$ but does not depend on $Y$ or $X_2, \ldots, X_p$. Another approach is to impute missing values based on a parametric model, for example a linear regression model

$$Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i,$$

where $\varepsilon_i$ is the error term with $\varepsilon_i \sim N(0, \sigma^2)$. One can estimate $(\beta_0, \ldots, \beta_p)$ based on the complete cases and predict a missing value of $Y$ by substituting $X_1, \ldots, X_p$ for that case into the regression equation. This approach is effective when the data are missing at random (Rubin, 1976; Little and Rubin, 2002) and the regression model assumptions are correct, but can yield biased results when the model is misspecified.

Semiparametric and nonparametric methods weaken the model assumptions and capture nonlinear relationships between the variables. In particular, with $p = 1$ and a single covariate $X$, one version of this approach is to base imputations on the penalized spline model $y_i = s(x_i) + \varepsilon_i$ with truncated polynomial basis

$$s(x) = \beta_0 + \beta_1 x + \cdots + \beta_q x^q + \sum_{k=1}^{K} \beta_{qk} (x - \kappa_k)_+^q, \qquad (1)$$

where $1, x, \ldots, x^q, (x - \kappa_1)_+^q, \ldots, (x - \kappa_K)_+^q$ is known as the truncated power basis of degree $q$; $\kappa_1 < \cdots < \kappa_K$ are selected fixed knots and $K$ is the total number of knots (Eilers and Marx, 1996; Ruppert, Wand and Carroll, 2003; Ngo and Wand, 2004). The penalized least squares estimator $\hat{\beta} = (\hat{\beta}_0, \ldots, \hat{\beta}_q, \hat{\beta}_{q1}, \ldots, \hat{\beta}_{qK})^T$ is obtained by minimizing

$$\sum_{i=1}^{n} \Big\{ y_i - \beta_0 - \sum_{j=1}^{q} \beta_j x_i^j - \sum_{k=1}^{K} \beta_{qk} (x_i - \kappa_k)_+^q \Big\}^2 + \lambda^{2q} \beta^T D \beta,$$

where $\lambda$ is a smoothing parameter and $D = \mathrm{diag}(0_{q+1}, 1_K)$. The fitted values are $\hat{y} = X(X^T X + \lambda^{2q} D)^{-1} X^T y$, where $X$ is the design matrix of the basis functions.
This model can be fitted using a number of existing software packages, such as PROC MIXED in SAS (SAS, 1992; Ngo and Wand, 2004) and lme() in S-PLUS (Pinheiro and Bates, 2000). This imputation model is strictly speaking parametric, but mimics a nonparametric method when $K$ is large, since the form of the relationship between $Y$ and $X$ is very flexible.

With more than one covariate, one might extend this approach by fitting a multivariate spline. However, such models are subject to the curse of dimensionality when $p$ is large, which relates to the difficulty of fitting nonparametric regression functions when the regressor space has high dimension. Penalized Spline of Propensity Prediction (PSPP; Little and An, 2004) addresses this problem by restricting the spline to the particular function of the covariates most sensitive to model misspecification, namely the propensity score. Little and An show that the PSPP method yields an estimate of the marginal mean of the missing variable with a double robustness (DR) property, described below in section 2.2. We propose a simplification of PSPP that does not center the regressors prior to including them in the prediction model.

Little and An (2004) did not consider whether the PSPP yields robust estimates for other parameters, such as conditional means or regression coefficients. In section 2.3 we provide examples to show that the PSPP method does not in general yield estimates of these parameters with the DR property. This motivates robust extensions of the PSPP method for estimating subgroup means and regression coefficients, which are described in sections 2.4 and 2.5. We apply the proposed methods to an online weight loss study in section 2.6, and section 2.7 presents concluding remarks.

2.2 Penalized Spline of Propensity Prediction (PSPP)

Let $(Y, X_1, \ldots, X_p)$ denote a vector of variables with $Y$ subject to missing values and $X_1, \ldots, X_p$ fully observed covariates. The missingness of $Y$ depends only on $X_1, \ldots, X_p$, so the missing-data mechanism is missing at random (Rubin, 1976). Let $M$ be an indicator variable with $M = 1$ when $Y$ is missing and $M = 0$ when $Y$ is observed. Define the logit of the propensity for $Y$ to be observed as

$$P^* = \mathrm{logit}\big(\Pr(M = 0 \mid X_1, \ldots, X_p)\big). \qquad (2)$$
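To make the idea concrete, the following sketch estimates $P^*$ in (2) by logistic regression and then imputes missing values of $Y$ from a penalized spline of $Y$ on the estimated $P^*$, with no additional parametric terms. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the logistic regression is fitted by a hand-written Newton-Raphson routine, the smoothing parameter is fixed instead of being estimated through a mixed model, and the choice of 20 equally spaced knots with a truncated linear basis, like the simulated data, is made only for this example.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Logistic regression of a 0/1 vector z on X (intercept added) by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(z)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (z - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def linear_spline_basis(u, knots):
    """Truncated linear basis: 1, u, (u - kappa_1)_+, ..., (u - kappa_K)_+."""
    return np.column_stack([np.ones(len(u)), u,
                            np.maximum(u[:, None] - knots[None, :], 0.0)])

def pspp_mean(X, y, observed, n_knots=20, lam=1.0):
    """Mean of Y after imputing missing values from a spline on the estimated P*."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    Xd = np.column_stack([np.ones(len(y)), X])
    # Step 1: estimated logit of the propensity to be observed, Pr(M = 0 | X).
    p_star = Xd @ fit_logistic(X, observed.astype(float))
    # Step 2: penalized spline regression of Y on P*, fitted to complete cases.
    knots = np.linspace(p_star.min(), p_star.max(), n_knots + 2)[1:-1]
    B = linear_spline_basis(p_star, knots)
    D = np.diag(np.r_[0.0, 0.0, np.ones(n_knots)])  # penalize knot terms only
    Bo = B[observed]
    coef = np.linalg.solve(Bo.T @ Bo + lam**2 * D, Bo.T @ y[observed])
    # Average the observed values and the predicted means for missing cases.
    return np.where(observed, y, B @ coef).mean()

# Assumed data-generating process for illustration: quadratic mean function,
# logistic response mechanism that depends on the single covariate x.
rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + x**2 + rng.normal(size=n)
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))
y = np.where(observed, y_full, np.nan)

pspp_estimate = pspp_mean(x, y, observed)
complete_case = y[observed].mean()
```

With a single covariate the estimated $P^*$ is a linear function of $x$, so the spline on $P^*$ can pick up the quadratic mean function without it being specified; the complete-case mean, by contrast, is biased upward in this setup because response is more likely for larger $x$.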

21 The key roerty of the roensity score is that, conditioning on the roensity score and assuming MAR, missingness of Y does not deend on,..., (Rosenbaum and Rubin, 983). Thus, the mean of Y can be written as [( ) ] [ ( )] μ y = E M Y + E M E Y P (3) Since the true relationshi of Y and the roensity score is unknown, Little and An (004) roosed to include the roensity score in the imutation model nonarametrically. This motivates the Penalized Sline of Proensity Prediction Method (PSPP), which is based on the following model: (,..., P ) ~ N (( s ( P ),..., s ( P )), Σ) ( Y P,,..., ; β ) ~ N( s( P ) + g( P,,... ; β), σ ) (4) where N ( μ, Σ ) denotes the k-variate normal distribution with mean μ and covariance k matrix Σ, sj( P ) = E( j P ), j =,...,, is a sline for the regression of j on P of the form (); = s ( Y ) j j j is the residual of the sline model and reresents the art in not exlained by the roensity score; sp ( ) is a sline of Y on P of the form () j and g is a arametric function indexed by unknown arameter β with gy (,0,...,0; β ) = 0 for all β. One of the redictors, here, is omitted from the g - function to avoid multicollinearity. The first ste of fitting a PSPP model estimates the roensity score, for examle by a logistic regression model of M on,..., ; in the second ste, the regression of Y on P is fit as a sline model with the other covariates included in the model arametrically in the g - function. The redicted mean of Y from model (4) has the following DR roerty: Theorem. Let ˆ μ y be the rediction estimator for (3) based on model (4), and assume MAR. Then ˆ μ y is a consistent estimator of μ y if either (a) the mean of Y given ( P,,..., ) in model (4) is correctly secified, or (b) the roensity P is correctly secified, and (b) E( P ) = s ( P ) for j =,..., and EY ( P) = s( P ). The j j

robustness feature derives from the fact that the regression function $g$ does not have to be correctly specified (Little and An, 2004). The covariates $X_2, \ldots, X_p$ in this theorem are centered by regressing $X_2, \ldots, X_p$ on splines of $P$ and taking residuals. A simpler method adds $X_2, \ldots, X_p$ directly to the regression, without centering. We now show that this method also has the DR property:

Theorem 2. The PSPP method based on model (4) can be simplified as follows:

$$(Y \mid P, X_1, \ldots, X_p; \beta) \sim N\big(s(P) + g(P, X_2, \ldots, X_p; \beta),\ \sigma^2\big) \qquad (5)$$

that is, the covariates $X_2, \ldots, X_p$ enter the parametric function $g$ without centering. Let $\hat{\mu}_y$ be the prediction estimator for (3) based on model (5), and assume MAR; then $\hat{\mu}_y$ has the same DR property as that derived from model (4) (see appendix for proof).

For this reason, we focus on the uncentered version of the PSPP method for the remainder of the paper.

2.3 PSPP is not doubly robust for subgroup means

The DR property of PSPP for estimating the marginal mean of $Y$ does not extend to estimates of conditional means, such as means in subgroups defined by a categorical covariate. The next two examples illustrate this statement. The first example illustrates the intuitively obvious fact that, for estimating the conditional mean of $Y$ given $X_1$, the PSPP method needs to include $X_1$ as a predictor in the model for $Y$. The second example illustrates that including $X_1$ as a predictor in the model for $Y$ is not sufficient to avoid bias with the PSPP method. This limitation is then addressed with the extended versions of the method.

Example 1. PSPP for estimating a conditional mean: including the subgroup variable in the model for Y is necessary.

We simulate 500 datasets with 500 subjects, with categorical covariate $X_1$, continuous covariate $X_2$ and continuous response variable $Y$, where $X_1$ and $X_2$ are independent with $X_1 \sim \text{multinomial}(0.5, 0.3, 0.2)$, $X_2 \sim N(0, \ldots)$, and

$$(Y \mid X_1, X_2) \sim N\big(\mu(X_1, X_2), \ldots\big), \qquad \mu(X_1, X_2) = I[X_1 = 1] + 3\, I[X_1 = 2] + 5\, I[X_1 = 3] + \ldots$$

where $I[\cdot]$ denotes an indicator for the event in the brackets. We create missing values of $Y$ from the response propensity model:

$$\operatorname{logit} P(M = 0 \mid X_1, X_2) = \ldots I[X_1 = 1] \ldots 0.5\, I[X_1 = 2] \ldots$$

We impute the missing values of $Y$ using predicted means from the following methods:

(a) A correctly specified ANCOVA model of $Y$ given $X_1$, $X_2$, which we denote $[X_1 + X_2]$.
(b) An incorrectly specified regression model for $Y$ that omits $X_1$, namely $[X_2]$.
(c) The PSPP method with null $g$ function, which we denote $[s(P_{correct})]$. The propensity score $P_{correct}$ is modeled as an additive function of $X_1$ and $X_2$, and hence is correctly specified and conditions on $X_1$.
(d) Model (c) with $X_1$ included, namely $[s(P_{correct}) + X_1]$. This model correctly specifies the mean of $Y$ given the covariates, since it includes the main effects of $X_1$ and $X_2$.
(e) The PSPP method with null $g$ function and incorrectly specified propensity score, modeled as a linear function of $X_2$ alone, which we denote $[s(P_{wrong})]$.
(f) Model (e) with $X_1$ included, namely $[s(P_{wrong}) + X_1]$. This model correctly specifies the mean of $Y$ given the covariates, since it includes the main effects of $X_1$ and $X_2$.

For all the penalized spline methods in this paper, we choose 0 equally spaced fixed knots and a truncated linear basis. We estimate the marginal mean of $Y$ and the conditional means of $Y$ given $X_1$ as the average of observed and imputed values from these methods. For comparison purposes, we also show estimates from the data before deletion (BD) and estimates based on the complete cases (CC).

Empirical bias (Bias), empirical standard deviation (STD) and root mean squared error (RMSE) over the 500 replications are summarized in Tables 2.1A and 2.1B. CC analysis yields estimates with large biases and RMSEs. The correctly specified ANCOVA model (a) yields unbiased estimates close to the BD estimates. The wrongly specified ANCOVA model (b) yields biased parameter estimates, with large biases and RMSEs.

For the PSPP method, inclusion of $X_1$ in the model is important for subgroup mean estimation. Without $X_1$ in the model, the PSPP method (c) yields a small empirical bias for the marginal mean estimate and a large empirical bias for the conditional means of $Y$ given $X_1$, even though the propensity score model is correct and conditions on both $X_1$ and $X_2$. Including $X_1$ in the PSPP method (d) yields estimates of the marginal mean of $Y$ and of the conditional means of $Y$ given $X_1$ with small empirical biases, and with STDs and RMSEs very close to those of BD. When neither the propensity score nor the mean function is correctly specified, the PSPP method (e) yields biased results; the bias is removed in model (f) by including $X_1$, since the regression is then correctly specified.

Example 2. PSPP for estimating a conditional mean: including the subgroup variable in the model for Y is not sufficient.

We now generate $X_1$ and $X_2$ as in Example 1, but the mean of $Y$ given $X_1$ and $X_2$ is simulated to include both a quadratic term in $X_2$ and interactions between $X_1$ and $X_2$:

$$(Y \mid X_1, X_2) \sim N\big(\mu(X_1, X_2), \ldots\big), \qquad \mu(X_1, X_2) = I[X_1 = 1] + 3\, I[X_1 = 2] + 5\, I[X_1 = 3] + \ldots$$

The logistic regression of $M$ is additive in $X_1$ and a quadratic function of $X_2$:

$$\operatorname{logit} P(M = 0 \mid X_1, X_2) = 0.5\, I[X_1 = 1] \ldots 0.5\, I[X_1 = 2] \ldots$$

We simulate 500 datasets with sample size of 000 each. We impute the missing $Y$ as predicted means from the following methods:

(a) A correctly specified regression model for $Y$.
(b) The PSPP model with null $g$-function, namely $[s(P)]$. The propensity score $P$ is modeled as an additive function of $X_1$, $X_2$ and $X_2^2$, and hence is correctly specified.
(c) PSPP with $X_1$ included, that is, $[s(P) + X_1]$.
(d) PSPP with $X_1$ and $X_2$ included, namely $[s(P) + X_1 + X_2]$.
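The penalized spline imputation used in both examples can be illustrated with a minimal Python sketch. It is not the authors' implementation: the smoothing parameter `lam` is fixed rather than estimated, the logit propensity is treated as known instead of being fitted by logistic regression, and the toy data-generating values in `demo`, the default of 10 knots, and all function names are assumptions made only for illustration.

```python
import numpy as np

def truncated_linear_basis(p, knots):
    """Design matrix with columns 1, p, and (p - kappa_k)_+ for each knot."""
    p = np.asarray(p, dtype=float)
    hinge = [np.clip(p - k, 0.0, None) for k in knots]
    return np.column_stack([np.ones_like(p), p] + hinge)

def pspp_marginal_mean(y, observed, p, n_knots=10, lam=1.0):
    """Fit a penalized spline of y on p among respondents, impute the
    missing y by predicted means, and average observed and imputed values."""
    p = np.asarray(p, dtype=float)
    # Equally spaced interior knots over the range of the propensity score.
    knots = np.linspace(p.min(), p.max(), n_knots + 2)[1:-1]
    B = truncated_linear_basis(p, knots)
    # Ridge penalty on the knot coefficients only (assumed fixed lam).
    penalty = np.diag([0.0, 0.0] + [1.0] * n_knots)
    Bo, yo = B[observed], y[observed]
    coef = np.linalg.solve(Bo.T @ Bo + lam * penalty, Bo.T @ yo)
    y_filled = np.where(observed, y, B @ coef)
    return y_filled.mean()

def demo(seed=0, n=2000):
    """Toy data (hypothetical values): returns BD, CC and PSPP estimates."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p = 0.5 + x  # hypothetical logit propensity, treated as known here
    y_full = 1.0 + 2.0 * x + rng.normal(size=n)
    observed = rng.random(n) < 1.0 / (1.0 + np.exp(-p))
    y = np.where(observed, y_full, np.nan)
    return y_full.mean(), y[observed].mean(), pspp_marginal_mean(y, observed, p)
```

Calling `demo()` returns the before-deletion mean, the complete-case mean and the spline-imputed mean for one simulated dataset, so the three can be compared in the same way as the BD, CC and PSPP rows of the tables.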

The correctly specified ANCOVA model yields estimates with small empirical bias and RMSE close to BD (Table 2.2). CC analysis and the wrongly specified ANCOVA model yield biased estimates. The PSPP methods (b)-(d) yield estimates for the marginal mean of $Y$ with small empirical bias, but are clearly biased for the conditional means of $Y$ given $X_1$ and $Y$ given $X_2$. In particular, unlike Example 1, adding $X_1$ to the $g$-function does not correct the misspecification of the mean of $Y$ given $X_1$, since the estimates of the conditional means are still biased.

In the second example, the PSPP method $[s(P) + X_1]$ assumes that the splines of $Y$ on $P$ have the same shape for different levels of $X_1$. Since the true model includes the interaction between $X_1$ and $X_2$, this assumption is violated, and it is this fact that leads to bias for the conditional means. One solution is to include the interaction of the propensity score and $X_1$ in the model, yielding a stratified PSPP method discussed in the next section.

2.4 Stratified Penalized Spline Propensity Prediction for subgroup means

Let $I_c = 1$ if $X_1 = c$ and $I_c = 0$ if $X_1 \neq c$, $c = 1, \ldots, C$, where $C$ is the total number of categories of $X_1$. The stratified PSPP method is based on the following model:

$$(Y \mid P, X_1, \ldots, X_p; \beta) \sim N\Big(\sum_{c=1}^{C} I_c\, s_c(P) + g(P, X_1, X_2, \ldots, X_p; \beta),\ \sigma^2\Big) \qquad (6)$$

where $g$ is a parametric function indexed by unknown parameter $\beta$ as before, with one term dropped to avoid multicollinearity, and

$$I_c\, s_c(P) = I_c \Big(\gamma_{0c} + \sum_{j=1}^{q} \gamma_{jc} P^{j} + \sum_{k=1}^{K} \gamma_{qkc} (P - \kappa_k)_+^{q}\Big)$$

is the fitted curve for the $c$th level of $X_1$. Within each level of $X_1$,

$$E(Y \mid P, X_1 = c, X_2, \ldots, X_p; \beta) = s_c(P) + g(P, X_1 = c, X_2, \ldots, X_p; \beta).$$

Note that this method is not the same as applying PSPP within strata defined by $X_1$, since the $g$-function does not necessarily include the interactions of $X_1$ with the other covariates. This method yields consistent estimates for the conditional means of $Y$ given $X_1$. The marginal mean of $Y$ is a weighted average of the conditional means, which again has the double robustness property (see appendix for proof).

Example 2 continued. Row (e) in Table 2.2 shows the results of applying stratified PSPP to the data in Example 2. The empirical bias is small for the marginal mean of $Y$ and the subgroup means of $Y$ given $X_1$, and the RMSE for these parameters is only slightly larger than for BD. Thus stratified PSPP has fixed the bias for the subgroup means in the PSPP methods. On the other hand, the empirical bias remains large for the coefficients of the regression of $Y$ on $X_2$. For those parameters we need another extension of PSPP, which we now describe.

2.5 A Bivariate PSPP Method for estimating the conditional mean of Y given a continuous covariate

In this section we consider estimating the conditional mean of $Y$ given a continuous variable $X_2$, based on a regression model for $Y$ given $X_2$. To estimate the regression coefficients in this case we need to assume that the regression of $Y$ on $X_2$ is correctly specified; for concreteness we assume it is linear in the parameters, with mean $E(Y \mid X_2) = \beta_0 + \beta_1 X_2 + \beta_2 X_2^2$. To yield consistent estimates of the regression coefficients, we now include the interaction of the propensity score and $X_2$ in the model for predicting the missing values of $Y$. Specifically, we propose the following bivariate PSPP method, based on the model:

$$(Y \mid P, X_1, X_2, \ldots, X_p; \beta) \sim N\big(s(P, X_2) + g(P, X_1, X_2, \ldots, X_p; \beta),\ \sigma^2\big) \qquad (7)$$

where $g$ is a parametric function and $s(P, X_2)$ is a bivariate P-spline of $P$ and $X_2$. Estimation of the bivariate smoothing function $s(P, X_2)$ requires bivariate basis functions, which can be derived in several different ways. A natural extension of the truncated linear basis for one dimension is to form all the pairwise products of the basis functions. The resulting bivariate basis is called the tensor product basis (Ruppert, Wand and Carroll, 2003). With this basis, the bivariate function $s(P, X_2)$ can be written as

$$\begin{aligned}
s(P, X_2) = {} & \alpha_0 + \alpha_1 P + \sum_{k=1}^{K} \gamma_{1k} (P - \kappa_k)_+ + \alpha_2 X_2 + \sum_{k'=1}^{K'} \gamma_{2k'} (X_2 - \kappa'_{k'})_+ + \alpha_3 P X_2 \\
& + \sum_{k=1}^{K} \gamma_{3k} X_2 (P - \kappa_k)_+ + \sum_{k'=1}^{K'} \gamma_{4k'} P (X_2 - \kappa'_{k'})_+ + \sum_{k=1}^{K} \sum_{k'=1}^{K'} \gamma_{5kk'} (P - \kappa_k)_+ (X_2 - \kappa'_{k'})_+
\end{aligned}$$

where $\kappa_1 < \ldots < \kappa_K$ and $\kappa'_1 < \ldots < \kappa'_{K'}$ are selected fixed knots for the propensity score and $X_2$ respectively. In this paper we choose 5 equally spaced knots for each variable when fitting the bivariate splines using a tensor product basis.

Example 2 continued. Row (f) in Table 2.2 shows estimates of the parameters when missing values are imputed using the bivariate PSPP method. This method yields estimates of the coefficients of the regression of $Y$ on $X_2$ with small empirical biases and RMSEs only slightly higher than those of the BD analysis. The conditional means of $Y$ given $X_1$ from bivariate PSPP are biased. To get consistent estimates of both the conditional means of $Y$ given $X_1$ and the conditional mean of $Y$ given $X_2$, a model is needed that includes both the interaction between the propensity score and $X_1$ and the interaction between the propensity score and $X_2$. This motivates the following combination of the stratified PSPP and bivariate PSPP models:

$$(Y \mid P, X_1, \ldots, X_p; \beta) \sim N\Big(\sum_{c=1}^{C} I_c\, s_c(P) + s(P, X_2) + g(P, X_1, \ldots, X_p; \beta),\ \sigma^2\Big)$$

where $I_c\, s_c(P)$ and $s(P, X_2)$ are defined as in Sections 2.4 and 2.5 respectively. When we applied this method to the second simulation, a small number (8) of the 500 samples failed to converge, but results for the other samples indicate that the empirical bias from this model is small for both the conditional mean of $Y$ given $X_1$ and the conditional mean of $Y$ given $X_2$ (Table 2.2, row (g)).
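As an illustration of the two spline terms above, the following Python sketch builds the design matrices for the stratified term $\sum_c I_c s_c(P)$ (with a truncated linear basis) and for the tensor product basis of $s(P, X_2)$. The function names and the knot positions in the usage note are hypothetical; fitting and penalization are not shown.

```python
import numpy as np

def truncated_linear_basis(v, knots):
    """Columns 1, v, and (v - kappa_k)_+ for each knot."""
    v = np.asarray(v, dtype=float)
    hinge = [np.clip(v - k, 0.0, None) for k in knots]
    return np.column_stack([np.ones_like(v), v] + hinge)

def tensor_product_basis(p, x, knots_p, knots_x):
    """All pairwise products of the two univariate bases.

    With K knots for p and K' knots for x this gives (2 + K)(2 + K')
    columns, matching the nine groups of terms in s(P, X_2)."""
    Bp = truncated_linear_basis(p, knots_p)
    Bx = truncated_linear_basis(x, knots_x)
    # Column j * (2 + K') + k holds Bp[:, j] * Bx[:, k].
    return np.einsum("ij,ik->ijk", Bp, Bx).reshape(len(Bp), -1)

def stratified_basis(p, group, knots_p):
    """Separate spline basis in p for each level c of a categorical
    variable: the columns are I_c times the univariate basis."""
    Bp = truncated_linear_basis(p, knots_p)
    group = np.asarray(group)
    blocks = [Bp * (group == c)[:, None] for c in np.unique(group)]
    return np.column_stack(blocks)
```

For example, with 5 knots per variable `tensor_product_basis` returns 49 columns, and `stratified_basis` with three categories returns 21 columns, each row being nonzero only in the block of its own category. This makes the cost of the interaction terms visible: the tensor product basis grows multiplicatively in the number of knots.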

2.6 An Example: Online Weight Loss Study

To illustrate our proposed approach, we consider data from an online weight loss study conducted by Kaiser Permanente (Couper et al., 2005). The study randomized approximately 4,000 subjects to the treatment or the control group. For the treatment group, the weight loss information provided online was tailored to the subjects based on their answers to an initial survey, which contained baseline measurements such as baseline weight, motivation for weight loss, etc.; for the control group, the information provided online was the same for all subjects. At 3 months, a second survey was sent to all of the participants, which collected follow-up measurements such as current weight. Our goal is to compare the short-term treatment effects; in particular, we compare the reduction of the body mass index (BMI), defined as the difference between 3-month BMI and baseline BMI.

There were 059 subjects in the treatment group and 956 subjects in the control group at baseline. At 3 months, 63 subjects in the treatment group and 6 subjects in the control group responded to the second survey. We assume the data are missing at random. Subjects in the treatment group who remained in the study have much lower baseline BMI than those who dropped out (P<0.00), but this difference is not seen in the control group (P=0.47). On the other hand, control group subjects who remained in the study have better baseline health, as measured by the number of previous diseases, than those who dropped out of the study (P<0.0); this difference was not seen in the treatment group (P=0.56). These differences suggest that interactions between treatment and baseline covariates need to be included when estimating the propensity scores.

We estimate the propensity scores by a logistic regression, with the inclusive criterion of retaining all variables with P-values less than 0.0. The final model includes the following covariates: baseline BMI; number of previous diseases; baseline self care; which is harder, eating less or being active; baseline exercise support; baseline activity level; baseline eating topology; education; ethnic identity; treatment; interaction of treatment and baseline BMI; interaction of treatment and baseline eating topology; interaction of treatment and baseline activity level; interaction of treatment and number of previous diseases; interaction of treatment and which is harder, eating less or being active.

We apply the PSPP method and the stratified PSPP method to the data as follows:

(a) PSPP method with null $g$-function, denoted as $[s(P)]$, where $P$ is the propensity score defined in Section 2.2.
(b) Model (a) with treatment as a covariate, denoted as $[s(P) + \text{treatment}]$.
(c) Model (b) with baseline covariates, denoted as $[s(P) + \text{treatment} + g(\text{baseline vars})]$.
(d) Stratified PSPP method with null $g$-function, denoted as $[\sum_{c=1}^{2} I_c\, s_c(P)]$.
(e) Model (d) with baseline covariates, denoted as $[\sum_{c=1}^{2} I_c\, s_c(P) + g(\text{baseline vars})]$.

The baseline covariates in the $g$-function of models (c) and (e) include: ethnic identity; baseline medical advice; baseline eating topology; baseline cardio exercise; baseline activity level; baseline BMI; number of previous diseases; number of weight loss methods tried; motivation for weight loss; which is harder, eating less or being active.

Results are summarized in Table 2.3. Empirical standard errors (SE) and the corresponding confidence intervals are obtained from 00 bootstrap samples. Based on the complete case analysis, the treatment group has a larger reduction of BMI after 3 months (-0.9 (0.09)) than the control group (-0.45 (0.0)). The stratified PSPP method (models d and e) and the PSPP method with treatment as a covariate (models b and c) yield similar estimates of the reduction of BMI in each group. For models b, c, d and e, the 95% confidence intervals for the treatment group do not overlap with those for the control group, suggesting a treatment effect on weight loss. On the other hand, the PSPP method without treatment as a covariate does not show the treatment effect (95% CI (-0.96, -0.65) for the treatment group; 95% CI (-0.76, -0.47) for the control group). Adding the $g$ function to the model does not affect bias but improves efficiency (models c and e).
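The bootstrap step used for the standard errors and confidence intervals can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the analysis code of the study: the percentile form of the interval, the default of 200 replicates and the function name are assumptions, and in the actual application `estimator` would rerun the propensity model and the PSPP imputation on each resampled dataset.

```python
import numpy as np

def bootstrap_se_ci(data, estimator, n_boot=200, level=0.95, seed=0):
    """Nonparametric bootstrap: resample rows with replacement, recompute
    the estimator, and summarize the replicates.

    Returns the empirical standard error and a percentile interval
    (the percentile form is an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.array([
        estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    lower, upper = np.quantile(reps, [alpha / 2.0, 1.0 - alpha / 2.0])
    return reps.std(ddof=1), (lower, upper)
```

For instance, `bootstrap_se_ci(bmi_change, np.mean)` with a hypothetical array `bmi_change` of completed BMI reductions for one group would return the empirical SE and interval for that group's mean; intervals for the two groups can then be compared for overlap as in the text.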

2.7 Discussion

We have shown that the PSPP method yields an estimate of the marginal mean of $Y$ with a double robustness property, without the need to center the covariates in the $g$ function. However, the PSPP method lacks this property for conditional mean estimation. We have proposed two extensions of PSPP that extend the double robustness property to conditional means, namely stratified PSPP for a categorical predictor and bivariate PSPP for a continuous predictor. The key property of these extensions is that they include in the prediction model the interaction of the propensity score and the conditioning variable that defines the estimand of interest. Simulations are presented as empirical evidence of the robustness of these extensions.

We estimate the bivariate function $s(P, X_2)$ using a P-spline with a tensor product basis, but other spline fitting methods could also be applied. One choice is to use a thin plate spline (Green and Silverman, 1994; Wood, 1999). To estimate $s(P, X_2)$, we need to find the function $g = g(t) = g(t_1, t_2)$ minimizing

$$\sum_{i=1}^{n} \big(y_i - g(t_{1i}, t_{2i})\big)^2 + \lambda \iint \left[\left(\frac{\partial^2 g}{\partial t_1^2}\right)^2 + 2\left(\frac{\partial^2 g}{\partial t_1\, \partial t_2}\right)^2 + \left(\frac{\partial^2 g}{\partial t_2^2}\right)^2\right] dt_1\, dt_2$$

where the $g$ function has the form

$$g(t) = \theta_0 + \sum_{j=1}^{M} \theta_j \phi_j(t) + \sum_{j=1}^{n} \delta_j E(\lVert t - t_j \rVert)$$

with $E(s) = \frac{1}{2^3 \pi} s^2 \ln(s)$; the $\phi_j(t)$ are linearly independent functions of $t$ with $t \in R^2$, and $\lambda$ is the smoothing parameter. This model can be fit using the tpspline procedure from SAS (SAS, 99; Ngo and Wand, 2004; Wand, 2003). We also fitted thin plate splines for the simulation study in Section 2.5, but found that some samples failed to yield estimates due to negative variance estimates. For the other samples, the results from the tpspline procedure are comparable to those from a P-spline with a tensor product basis.

More generally, a PSPP method that yields doubly robust estimates of the conditional mean of $Y$ given a subset of the covariates $(X_1, \ldots, X_s)$, $s < p$, requires inclusion of the interactions between the propensity score and $(X_1, \ldots, X_s)$; clearly the curse of dimensionality comes increasingly into play as $s$ increases. A natural question is whether these propensity score methods can be extended to yield robust estimates for the regression given the complete set of covariates $(X_1, \ldots, X_p)$. We note that in our setting the cases with $Y$ missing contribute no information to this regression, so there is no gain in developing an imputation model. If it is the covariates rather than the outcome that have missing values, however, then the incomplete cases do include information, and it remains an open question whether propensity methods can be used to increase the robustness of inference in such situations. This question deserves future study.

We use a smoothing spline function to model the relationship between $Y$ and the propensity score, and our method has a DR property. The DR property can also be achieved by modeling the relationship parametrically. One method is to include the inverse of the propensity score as a linear term in the imputation model (Firth and Bennett, 1998; Bang and Robins, 2005). Another approach is to calibrate the predictions from a parametric model by adding means of the weighted residuals, with weights equal to the inverse of the propensity scores (Robins, Rotnitzky and Zhao, 1994; Scharfstein, Rotnitzky and Robins, 1999). We are currently conducting simulations to compare the performance of these methods with the PSPP method, and results will be reported in a future paper.

Acknowledgements: this research is supported by CECCR Center grant P50 CA045. We thank Trivellore Raghunathan for assistance with Theorem 2.

Table 2.1. Example 1: Empirical bias (Bias), standard deviation (STD) and root mean squared error (RMSE) for (A) the marginal mean of Y and (B) the conditional mean of Y given $X_1$. Entries are multiplied by 100.

(A) Marginal mean of Y
Columns: Bias, STD, RMSE
Rows (methods):
BD
CC
(a) Correct ANCOVA $[X_1 + X_2]$
(b) Wrong ANCOVA $[X_2]$
(c) PSPP $[s(P_{correct})]$
(d) PSPP $[s(P_{correct}) + X_1]$
(e) PSPP $[s(P_{wrong})]$
(f) PSPP $[s(P_{wrong}) + X_1]$

(B) Conditional mean of Y given $X_1$
Columns: Bias, STD, RMSE for each of $X_1 = 1$, $X_1 = 2$, $X_1 = 3$
Rows (methods):
BD
CC
(a) Correct ANCOVA $[X_1 + X_2]$
(b) Wrong ANCOVA $[X_2]$
(c) PSPP $[s(P_{correct})]$
(d) PSPP $[s(P_{correct}) + X_1]$
(e) PSPP $[s(P_{wrong})]$
(f) PSPP $[s(P_{wrong}) + X_1]$

Table 2.2. Example 2: Empirical bias (Bias), root mean squared error (RMSE) and coverage rate (Cov) for (A) the marginal mean of Y, (B) the conditional mean of Y given $X_1$, and (C) the intercept and slopes for the regression of Y on $X_2$, $X_2^2$. Entries are multiplied by 100.

Columns: Bias, RMSE, Cov for each of: overall mean; conditional mean given $X_1 = 1$, $X_1 = 2$, $X_1 = 3$; coefficients of the conditional mean given $X_2$ (intercept, $X_2$, $X_2^2$)
Rows (methods):
BD
CC
(a) Correct model
(b) PSPP $[s(P)]$
(c) PSPP $[s(P) + X_1]$
(d) PSPP $[s(P) + X_1 + X_2]$
(e) Stratified PSPP ($Y = I_c\, s_c(P)$)
(f) Bivariate PSPP ($Y = s(P, X_2)$)
(g) Stratified_Bivariate PSPP ($Y = I_c\, s_c(P) + s(P, X_2)$)

University of Michigan School of Public Health

University of Michigan School of Public Health University of Michigan School of Public Health The University of Michigan Deartment of Biostatistics Working Paer Series ear 003 Paer 5 Robust Likelihood-based Analysis of Multivariate Data with Missing

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

STK4900/ Lecture 7. Program

STK4900/ Lecture 7. Program STK4900/9900 - Lecture 7 Program 1. Logistic regression with one redictor 2. Maximum likelihood estimation 3. Logistic regression with several redictors 4. Deviance and likelihood ratio tests 5. A comment

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform Statistics II Logistic Regression Çağrı Çöltekin Exam date & time: June 21, 10:00 13:00 (The same day/time lanned at the beginning of the semester) University of Groningen, Det of Information Science May

More information

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test) Chater 225 Tests for Two Proortions in a Stratified Design (Cochran/Mantel-Haenszel Test) Introduction In a stratified design, the subects are selected from two or more strata which are formed from imortant

More information

Some methods for handling missing values in outcome variables. Roderick J. Little

Some methods for handling missing values in outcome variables. Roderick J. Little Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean

More information

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI ** Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory

More information

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Resonse) Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout

More information

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit Chater 5 Statistical Inference 69 CHAPTER 5 STATISTICAL INFERENCE.0 Hyothesis Testing.0 Decision Errors 3.0 How a Hyothesis is Tested 4.0 Test for Goodness of Fit 5.0 Inferences about Two Means It ain't

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

SAS for Bayesian Mediation Analysis

SAS for Bayesian Mediation Analysis Paer 1569-2014 SAS for Bayesian Mediation Analysis Miočević Milica, Arizona State University; David P. MacKinnon, Arizona State University ABSTRACT Recent statistical mediation analysis research focuses

More information

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015)

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015) On-Line Aendix Matching on the Estimated Proensity Score Abadie and Imbens, 205 Alberto Abadie and Guido W. Imbens Current version: August 0, 205 The first art of this aendix contains additional roofs.

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

Notes on Instrumental Variables Methods

Notes on Instrumental Variables Methods Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error

Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error Semiarametric Efficiency in GMM Models with Nonclassical Measurement Error Xiaohong Chen New York University Han Hong Duke University Alessandro Tarozzi Duke University August 2005 Abstract We study semiarametric

More information

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points. Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

Collaborative Place Models Supplement 1

Collaborative Place Models Supplement 1 Collaborative Place Models Sulement Ber Kaicioglu Foursquare Labs ber.aicioglu@gmail.com Robert E. Schaire Princeton University schaire@cs.rinceton.edu David S. Rosenberg P Mobile Labs david.davidr@gmail.com

More information

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process P. Mantalos a1, K. Mattheou b, A. Karagrigoriou b a.deartment of Statistics University of Lund

More information

Estimating function analysis for a class of Tweedie regression models

Estimating function analysis for a class of Tweedie regression models Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal

More information

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi LOGISTIC REGRESSION VINAANAND KANDALA M.Sc. (Agricultural Statistics), Roll No. 444 I.A.S.R.I, Library Avenue, New Delhi- Chairerson: Dr. Ranjana Agarwal Abstract: Logistic regression is widely used when

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Chapter 13 Variable Selection and Model Building

Chapter 13 Variable Selection and Model Building Chater 3 Variable Selection and Model Building The comlete regsion analysis deends on the exlanatory variables ent in the model. It is understood in the regsion analysis that only correct and imortant

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder Bayesian inference & Markov chain Monte Carlo Note 1: Many slides for this lecture were kindly rovided by Paul Lewis and Mark Holder Note 2: Paul Lewis has written nice software for demonstrating Markov

More information

Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis

Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis HIPAD LAB: HIGH PERFORMANCE SYSTEMS LABORATORY DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING AND EARTH SCIENCES Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis Why use metamodeling

More information

Chapter 3. GMM: Selected Topics

Chapter 3. GMM: Selected Topics Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2

More information
