The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham DH1 3LE UK

The Stata Journal

Editor
H. Joseph Newton
Department of Statistics
Texas A&M University
College Station, Texas 77843
979-845-8817; fax 979-845-6077
jnewton@stata-journal.com

Editor
Nicholas J. Cox
Department of Geography
Durham University
South Road
Durham DH1 3LE UK
n.j.cox@stata-journal.com

Associate Editors
Christopher F. Baum, Boston College
Nathaniel Beck, New York University
Rino Bellocco, Karolinska Institutet, Sweden, and University of Milano-Bicocca, Italy
Maarten L. Buis, Tübingen University, Germany
A. Colin Cameron, University of California Davis
Mario A. Cleves, Univ. of Arkansas for Medical Sciences
William D. Dupont, Vanderbilt University
David Epstein, Columbia University
Allan Gregory, Queen's University
James Hardin, University of South Carolina
Ben Jann, University of Bern, Switzerland
Stephen Jenkins, University of Essex
Ulrich Kohler, WZB, Berlin
Frauke Kreuter, University of Maryland College Park
Peter A. Lachenbruch, Oregon State University
Jens Lauritsen, Odense University Hospital
Stanley Lemeshow, Ohio State University
J. Scott Long, Indiana University
Roger Newson, Imperial College, London
Austin Nichols, Urban Institute, Washington DC
Marcello Pagano, Harvard School of Public Health
Sophia Rabe-Hesketh, University of California Berkeley
J. Patrick Royston, MRC Clinical Trials Unit, London
Philip Ryan, University of Adelaide
Mark E. Schaffer, Heriot-Watt University, Edinburgh
Jeroen Weesie, Utrecht University
Nicholas J. G. Winter, University of Virginia
Jeffrey Wooldridge, Michigan State University

Stata Press Editorial Manager
Lisa Gilmore

Stata Press Copy Editors
Deirdre Patterson and Erin Roberson

The Stata Journal publishes reviewed papers together with shorter notes or comments, regular columns, book reviews, and other material of interest to Stata users. Examples of the types of papers include 1) expository papers that link the use of Stata commands or programs to associated principles, such as those that will serve as tutorials for users first encountering a new field of statistics or a major new technique; 2) papers that go beyond the Stata manual in explaining key features or uses of Stata that are of interest to intermediate or advanced users of Stata; 3) papers that discuss new commands or Stata programs of interest either to a wide spectrum of users (e.g., in data management or graphics) or to some large segment of Stata users (e.g., in survey statistics, survival analysis, panel analysis, or limited dependent variable modeling); 4) papers analyzing the statistical properties of new or existing estimators and tests in Stata; 5) papers that could be of interest or usefulness to researchers, especially in fields that are of practical importance but are not often included in texts or other journals, such as the use of Stata in managing datasets, especially large datasets, with advice from hard-won experience; and 6) papers of interest to those who teach, including Stata with topics such as extended examples of techniques and interpretation of results, simulations of statistical concepts, and overviews of subject areas. 
For more information on the Stata Journal, including information for authors, see the webpage http://www.stata-journal.com

The Stata Journal is indexed and abstracted in the following: CompuMath Citation Index®; Current Contents/Social and Behavioral Sciences®; RePEc: Research Papers in Economics; Science Citation Index Expanded (also known as SciSearch®); Scopus™; Social Sciences Citation Index®.

Copyright Statement: The Stata Journal and the contents of the supporting files (programs, datasets, and help files) are copyright © by StataCorp LP. The contents of the supporting files (programs, datasets, and help files) may be copied or reproduced by any means whatsoever, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

The articles appearing in the Stata Journal may be copied or reproduced as printed copies, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

Written permission must be obtained from StataCorp if you wish to make electronic copies of the insertions. This precludes placing electronic copies of the Stata Journal, in whole or in part, on publicly accessible websites, fileservers, or other locations where the copy may be accessed by anyone other than the subscriber.

Users of any of the software, ideas, data, or other materials published in the Stata Journal or the supporting files understand that such use is made without warranty of any kind, by either the Stata Journal, the author, or StataCorp. In particular, there is no warranty of fitness of purpose or merchantability, nor for special, incidental, or consequential damages such as loss of profits. The purpose of the Stata Journal is to promote free communication among Stata users.

The Stata Journal (ISSN 1536-867X) is a publication of Stata Press. Stata, Mata, NetCourse, and Stata Press are registered trademarks of StataCorp LP.

The Stata Journal (2010) 10, Number 3, pp. 496–499

Stata tip 89: Estimating means and percentiles following multiple imputation

Peter A. Lachenbruch
Oregon State University
Corvallis, OR
peter.lachenbruch@oregonstate.edu

© 2010 StataCorp LP st0205

1 Introduction

In a statistical analysis, I usually want some basic descriptive statistics such as the mean, standard deviation, extremes, and percentiles. See, for example, Pagano and Gauvreau (2000). Stata conveniently provides these descriptive statistics with the summarize command's detail option. Alternatively, I can obtain percentiles with the centile command. For example, with auto.dta, we have

    . sysuse auto
    (1978 Automobile Data)

    . summarize price, detail

                                Price
    -------------------------------------------------------------
          Percentiles      Smallest
     1%         3291           3291
     5%         3748           3299
    10%         3895           3667       Obs                  74
    25%         4195           3748       Sum of Wgt.          74

    50%       5006.5                      Mean           6165.257
                            Largest       Std. Dev.      2949.496
    75%         6342          13466
    90%        11385          13594       Variance        8699526
    95%        13466          14500       Skewness       1.653434
    99%        15906          15906       Kurtosis       4.819188

However, if I have missing values, the summarize command is not supported by mi estimate or by the user-written mim command (Royston 2004, 2005a,b, 2007; Royston, Carlin, and White 2009).

2 Finding means and percentiles when missing values are present

For a general multiple-imputation reference, see Stata 11 Multiple-Imputation Reference Manual (2009). By recognizing that a regression with no independent variables estimates the mean, I can use mi estimate: regress to get multiply imputed means. If I wish to get multiply imputed quantiles, I can use mi estimate: qreg or mi estimate: sqreg for this purpose.
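The principle behind this section can be checked on complete data before any imputation is involved. The following is a minimal sketch rather than part of the original tip: it fits an intercept-only regression to price in auto.dta and compares the constant and its standard error with the mean and the standard error of the mean from summarize. The scalar names b_mean and se_mean are illustrative choices.

```stata
* Illustration only: complete data from auto.dta, no imputation involved
sysuse auto, clear

* Intercept-only regression: _cons is the sample mean of price
regress price
scalar b_mean  = _b[_cons]
scalar se_mean = _se[_cons]

* Compare with the mean and its standard error, sd/sqrt(n), from summarize
summarize price
display "regress:   mean = " b_mean  "   se = " se_mean
display "summarize: mean = " r(mean) "   se = " r(sd)/sqrt(r(N))

* Intercept-only median regression: _cons estimates the median of price
qreg price, quantile(.5)
display "qreg _cons (median) = " _b[_cons]
```

The two display lines should agree, because with no regressors the least-squares constant is the sample mean. The qreg constant is a median estimate that need not match the 50% row reported by summarize, detail exactly.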

I now create a dataset with missing values of price:

    . clonevar newprice = price
    . set seed 19670221
    . replace newprice = . if runiform() < .4
    (32 real changes made, 32 to missing)

The following commands were generated from the multiple-imputation dialog box. I used 20 imputations. Before Stata 11, this could also be done with the user-written commands ice and mim (Royston 2004, 2005a,b, 2007; Royston, Carlin, and White 2009).

    . mi set mlong
    . mi register imputed newprice
    (32 m=0 obs. now marked as incomplete)
    . mi register regular mpg trunk weight length
    . mi impute regress newprice, add(20) rseed(3252010)

    Univariate imputation                     Imputations =       20
    Linear regression                               added =       20
    Imputed: m=1 through m=20                     updated =        0

                 |              Observations per m
        Variable |   complete   incomplete   imputed |     total
    -------------+-----------------------------------+----------
        newprice |         42           32        32 |        74

    (complete + incomplete = total; imputed is the minimum across m
     of the number of filled in observations.)

    . mi estimate: regress newprice

    Multiple-imputation estimates                 Imputations     =         20
    Linear regression                             Number of obs   =         74
                                                  Average RVI     =     1.3880
                                                  Complete DF     =         73
    DF adjustment:   Small sample                 DF:     min     =      19.46
                                                          avg     =      19.46
                                                          max     =      19.46
                                                  F(   0,      .) =          .
    Within VCE type:          OLS                 Prob > F        =          .

        newprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   5693.489   454.9877    12.51   0.000     4742.721    6644.258

From this output, we see that the estimated mean is 5,693 with a standard error of 455 (rounded up) compared with the complete data value of 6,165 with a standard error of 343 (also rounded up). However, we do not have estimates of quantiles. This could also have been done using mi estimate: mean newprice (the mean command is near the bottom of the estimation command list for mi estimate).
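As a sketch of that alternative, the block below rebuilds the imputed data with the same commands and seeds shown above and then runs mi estimate: mean next to the intercept-only regression. It assumes Stata 11 or later. The counts and estimates may differ from those printed above, because the default random-number generator behind runiform() changed in Stata 14.

```stata
* Sketch only: rebuilds the tip's imputed data, then estimates the mean directly
sysuse auto, clear

* Create the partly missing copy of price used in the tip
clonevar newprice = price
set seed 19670221
replace newprice = . if runiform() < .4

* Declare the mi style and variable roles, then add 20 imputations
mi set mlong
mi register imputed newprice
mi register regular mpg trunk weight length
mi impute regress newprice, add(20) rseed(3252010)

* Multiply imputed mean via the mean command
mi estimate: mean newprice

* The same quantity via the intercept-only regression, for comparison
mi estimate: regress newprice
```

Both commands average the 20 per-imputation sample means, so the point estimates should coincide. The mean command has the advantage of labeling the result as a mean rather than as a regression constant.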

We can apply the same principle using qreg. For the 10th percentile, type

    . mi estimate: qreg newprice, quantile(10)

    Multiple-imputation estimates                 Imputations     =         20
    .1 Quantile regression                        Number of obs   =         74
                                                  Average RVI     =     0.2901
                                                  Complete DF     =         73
    DF adjustment:   Small sample                 DF:     min     =      48.05
                                                          avg     =      48.05
                                                          max     =      48.05
                                                  F(   0,      .) =          .
                                                  Prob > F        =          .

        newprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   3495.635     708.54     4.93   0.000     2071.058    4920.212

Compare the value of 3,496 with the value of 3,895 from the full data. We can use the simultaneous estimates command for the full set:

    . mi estimate: sqreg newprice, quantiles(10 25 50 75 90) reps(20)

    Multiple-imputation estimates                 Imputations     =         20
    Simultaneous quantile regression              Number of obs   =         74
                                                  Average RVI     =     0.6085
                                                  Complete DF     =         73
    DF adjustment:   Small sample                 DF:     min     =      23.19
                                                          avg     =      26.97
                                                          max     =      31.65

        newprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    q10    _cons |   3495.635   533.5129     6.55   0.000     2408.434    4582.836
    q25    _cons |   4130.037   237.1932    17.41   0.000     3642.459    4617.614
    q50    _cons |   5200.238    441.294    11.78   0.000     4292.719    6107.757
    q75    _cons |   6620.232   778.8488     8.50   0.000      5025.49    8214.974
    q90    _cons |   8901.985   1417.022     6.28   0.000     5971.962    11832.01

3 Comments and cautions

The qreg command does not give the same result as the centile command when you have complete data. This is because the centile command uses one observation, while the qreg command uses a weighted combination of the observations. It will have somewhat shorter confidence intervals, but with large datasets, the difference will be small.

A second caution is that comparing two medians can be tricky: the difference of two medians is not the median difference of the distributions. I have found it useful to use percentiles because there is a one-to-one relationship between percentiles if data are transformed. In our case, there is plentiful evidence that price is not normally distributed, so it would be good to look for a transformation and impute those values.

This method of using regression commands without an independent variable can provide estimates of quantities that otherwise would be difficult to obtain. For example, it is much faster than finding 20 imputed percentiles and then combining them with Rubin's rules, and it is much less onerous and prone to error.

4 Acknowledgment

This work was supported in part by a grant from the Cure JM Foundation.

References

Pagano, M., and K. Gauvreau. 2000. Principles of Biostatistics. 2nd ed. Belmont, CA: Duxbury.

Royston, P. 2004. Multiple imputation of missing values. Stata Journal 4: 227–241.

Royston, P. 2005a. Multiple imputation of missing values: Update. Stata Journal 5: 188–201.

Royston, P. 2005b. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.

Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445–464.

Royston, P., J. B. Carlin, and I. R. White. 2009. Multiple imputation of missing values: New features for mim. Stata Journal 9: 252–264.

StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station, TX: Stata Press.