The mice package. Stef van Buuren (1,2). amst-r-dam, Oct. 29, 2012. 1 TNO, Leiden. 2 Methodology and Statistics, FSBS, Utrecht University
1 > The mice package. Stef van Buuren (1,2). 1 TNO, Leiden. 2 Methodology and Statistics, FSBS, Utrecht University. amst-r-dam, Oct. 29, 2012
2 > The problem of missing data Consequences of missing data: Less information than planned. Enough statistical power? Statistics are undefined (e.g. mean). Biases in the data analysis. Systematic bias. Representativity. Appropriate confidence intervals, P-values? In general, the presence of missing data can severely complicate interpretation and analysis of the data.
3 > The problem of missing data How to calculate the mean Calculate the mean in R as > y <- c(1,2,4) > mean(y) [1] 2.333333 where y is a vector containing three numbers, and where mean(y) is the R expression that returns their mean.
4 > The problem of missing data How to calculate the mean Now suppose that the last number is missing. R indicates this by the symbol NA, which stands for "not available": > y <- c(1,2,NA) > mean(y) [1] NA The mean is now undefined, and R informs us about this outcome by setting the mean to NA.
5 > The problem of missing data How to calculate the mean It is possible to add an extra argument na.rm = TRUE to the function call. This removes any missing data before calculating the mean: > mean(y, na.rm = TRUE) [1] 1.5 This makes it possible to calculate a result, but of course the set of observations on which the calculations are based has changed. This may cause problems in statistical inference and interpretation.
6 > The problem of missing data Questions about complete-case analysis If numbers of cases change, can we compare the estimates from both models? Should we attribute differences in the estimates to changes in the model, or to changes in the subsample? Do the estimates generalize to the study population? Do we have enough cases to detect the effect of interest? Are we making the best use of the costly collected data?
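The first of these questions can be made concrete with a small sketch in base R. It assumes the built-in airquality data (the slide does not name a data set) and two illustrative models chosen here. Because lm() drops incomplete rows by default, the two fits rest on different subsamples.

```r
# Two nested models fitted to the built-in airquality data (an assumed example).
# With the default na.action (na.omit), lm() applies listwise deletion,
# so each model uses only the rows that are complete on its own variables.
fit1 <- lm(Ozone ~ Wind, data = airquality)
fit2 <- lm(Ozone ~ Wind + Solar.R, data = airquality)

n1 <- nobs(fit1)  # rows with Ozone and Wind observed
n2 <- nobs(fit2)  # rows with Ozone, Wind and Solar.R observed

# Number of cases that silently disappear when Solar.R enters the model
dropped <- n1 - n2
cat("cases in model 1:", n1, "\n")
cat("cases in model 2:", n2, "\n")
cat("cases dropped:   ", dropped, "\n")
```

Any difference between the Wind coefficients of fit1 and fit2 mixes two effects: the added predictor and the change in subsample.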
7 > Single imputation methods > Mean imputation Mean imputation [Figure: two panels, Frequency against Ozone (ppb) and Ozone (ppb) against Solar Radiation (lang)]
8 > Single imputation methods > Regression imputation Regression imputation [Figure: two panels, Frequency against Ozone (ppb) and Ozone (ppb) against Solar Radiation (lang)]
9 > Single imputation methods > Stochastic regression imputation Stochastic regression imputation [Figure: two panels, Frequency against Ozone (ppb) and Ozone (ppb) against Solar Radiation (lang)]
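The three single imputation methods named on these slides can be sketched in base R. This is a minimal illustration, not the code behind the figures: it assumes the built-in airquality data, Solar.R as the only predictor, and a seed chosen here.

```r
set.seed(1)  # arbitrary seed, chosen for this sketch

# Keep rows where the predictor is observed, so every missing Ozone
# value can receive a regression prediction.
aq  <- airquality[!is.na(airquality$Solar.R), ]
mis <- is.na(aq$Ozone)

# 1. Mean imputation: replace every missing value by the observed mean
mean_imp <- aq$Ozone
mean_imp[mis] <- mean(aq$Ozone, na.rm = TRUE)

# 2. Regression imputation: replace by the prediction from a linear model
fit  <- lm(Ozone ~ Solar.R, data = aq)      # fitted on the complete rows
pred <- predict(fit, newdata = aq[mis, ])
reg_imp <- aq$Ozone
reg_imp[mis] <- pred

# 3. Stochastic regression imputation: prediction plus random residual noise
sto_imp <- aq$Ozone
sto_imp[mis] <- pred + rnorm(sum(mis), mean = 0, sd = sigma(fit))

# Compare the spread of the completed variables with the observed spread
sds <- c(observed   = sd(aq$Ozone, na.rm = TRUE),
         mean       = sd(mean_imp),
         regression = sd(reg_imp),
         stochastic = sd(sto_imp))
print(round(sds, 1))
```

Mean imputation necessarily shrinks the standard deviation, since it adds values exactly at the centre. Note that the normal noise in the stochastic variant can produce implausible values, such as negative ozone concentrations.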
10 > Multiple imputation with mice Rising popularity of multiple imputation [Figure: number of publications (log scale) by year; series: early publications, 'multiple imputation' in abstract, 'multiple imputation' in title]
11 > Multiple imputation with mice Workflow in mice: incomplete data (data frame) -> mice() -> imputed data (mids) -> with() -> analysis results (mira) -> pool() -> pooled results (mipo)
12 > Multiple imputation with mice How to do this in R?
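The three workflow steps, mice(), with() and pool(), can be sketched on the nhanes data that ships with the package. The analysis model, the number of imputations and the seed are assumptions made for this illustration. The block is wrapped in a check so that it does nothing when mice is not installed.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # Step 1: impute. data frame -> mids (multiply imputed data set)
  imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

  # Step 2: analyse each completed data set. mids -> mira
  fit <- with(imp, lm(chl ~ age + bmi))

  # Step 3: pool the m sets of estimates. mira -> mipo
  est <- pool(fit)
  print(summary(est))
}
```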
13 > Multiple imputation with mice Philosophy of mice: Imputation is a scientific activity, not a simple technical fix. Imputed values should be plausible. The method should work for any statistic calculated from the completed data.
14 > How to create imputations > Criteria for good imputations Proper imputation in practice The imputation model should account for the process that created the missing data, preserve the relations in the data, and preserve the uncertainty about these relations.
15 > How to create imputations > Incorporating appropriate variation Relation between temperature and gas consumption [Figure: Gas consumption (cubic feet) against Temperature (°C)]
16 > How to create imputations > Incorporating appropriate variation We delete gas consumption of observation 47 [Figure, panel a: Gas consumption (cubic feet) against Temperature (°C), deleted observation marked]
17 > How to create imputations > Incorporating appropriate variation Predict imputed value from regression line [Figure, panel b: Gas consumption (cubic feet) against Temperature (°C)]
18 > How to create imputations > Incorporating appropriate variation Predicted value + noise [Figure, panel c: Gas consumption (cubic feet) against Temperature (°C)]
19 > How to create imputations > Incorporating appropriate variation Predicted value + noise + parameter uncertainty [Figure, panel d: Gas consumption (cubic feet) against Temperature (°C)]
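The progression in panels b, c and d can be sketched in base R. The data below are simulated stand-ins for the temperature and gas consumption data, and the parameter draw in step (d) follows the standard Bayesian normal linear model; both are assumptions of this sketch, not a transcription of the package internals.

```r
set.seed(2012)  # arbitrary seed for this sketch

# Simulated stand-in for the temperature / gas consumption data
n    <- 50
temp <- runif(n, 0, 10)
gas  <- 5.5 - 0.3 * temp + rnorm(n, sd = 0.3)
gas[47] <- NA                       # delete observation 47, as on the slides

obs   <- !is.na(gas)
X     <- cbind(1, temp[obs])        # design matrix of the observed rows
y     <- gas[obs]
x_mis <- c(1, temp[!obs])           # design row of the missing case

# Least-squares fit on the observed rows
XtX_inv  <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)
res      <- y - X %*% beta_hat
df       <- length(y) - ncol(X)

# (b) predicted value from the regression line
imp_b <- sum(x_mis * beta_hat)

# (c) predicted value + residual noise
sigma_hat <- sqrt(sum(res^2) / df)
imp_c <- imp_b + rnorm(1, sd = sigma_hat)

# (d) predicted value + noise + parameter uncertainty:
#     first draw sigma and beta from their posterior, then add noise
sigma_star <- sqrt(sum(res^2) / rchisq(1, df))
beta_star  <- beta_hat + sigma_star * t(chol(XtX_inv)) %*% rnorm(ncol(X))
imp_d <- sum(x_mis * beta_star) + rnorm(1, sd = sigma_star)

print(c(b = imp_b, c = imp_c, d = imp_d))
```

Repeating step (d) m times gives m different imputations whose spread reflects both residual variation and uncertainty about the regression line.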
20 > How to create imputations > Incorporating appropriate variation Imputation based on two predictors [Figure, panel e: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
21 > How to create imputations > Incorporating appropriate variation Predictive mean matching: Y given X [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
22 > How to create imputations > Incorporating appropriate variation Add two regression lines [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
23 > How to create imputations > Incorporating appropriate variation Predicted given 5 °C, after insulation [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
24 > How to create imputations > Incorporating appropriate variation Define a matching range ŷ ± δ [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
25 > How to create imputations > Incorporating appropriate variation Select potential donors [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
26 > How to create imputations > Incorporating appropriate variation Bayesian PMM: Draw a line [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
27 > How to create imputations > Incorporating appropriate variation Define a matching range ŷ ± δ [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
28 > How to create imputations > Incorporating appropriate variation Select potential donors [Figure: Gas consumption (cubic feet) against Temperature (°C); groups: before insulation, after insulation]
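The matching steps on these slides (predict ŷ, define a range ŷ ± δ, select donors, draw one) can be sketched in base R. The data are simulated, the function name pmm_impute and the value of delta are hypothetical, and matching is done on predicted values. This sketch omits the Bayesian step of first drawing the regression line, and it uses a fixed range rather than a fixed number of nearest donors.

```r
set.seed(29)  # arbitrary seed for this sketch

# Simulated stand-in: gas consumption given temperature and insulation
n     <- 60
temp  <- runif(n, 0, 10)
insul <- rep(c(0, 1), each = n / 2)      # 0 = before, 1 = after insulation
gas   <- 6.8 - 0.4 * temp - 2 * insul + 0.1 * temp * insul + rnorm(n, sd = 0.3)
dat   <- data.frame(gas, temp, insul)
dat$gas[47] <- NA                        # the case to impute

# Hypothetical helper: predictive mean matching with a range yhat +/- delta
pmm_impute <- function(dat, delta = 0.25) {
  obs <- !is.na(dat$gas)
  fit <- lm(gas ~ temp * insul, data = dat[obs, ])   # two regression lines
  yhat_obs <- predict(fit, newdata = dat[obs, ])
  yhat_mis <- predict(fit, newdata = dat[!obs, ])
  y_obs    <- dat$gas[obs]
  vapply(yhat_mis, function(yh) {
    dist   <- abs(yhat_obs - yh)
    donors <- which(dist <= delta)                   # potential donors in range
    if (length(donors) == 0) donors <- which.min(dist)  # fall back to the closest
    # draw one donor and borrow its OBSERVED value
    y_obs[donors[sample.int(length(donors), 1)]]
  }, numeric(1))
}

imputed <- pmm_impute(dat)
print(imputed)
```

Because the imputation is a value actually observed on a donor, it always lies within the range of the observed data.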
29 > How to create imputations > Imputation method in mice
Method        Description                         Scale type
pmm           Predictive mean matching            numeric
norm          Bayesian linear regression          numeric
norm.nob      Linear regression, non-Bayesian     numeric
norm.boot     Linear regression with bootstrap    numeric
mean          Unconditional mean imputation       numeric
2L.norm       Two-level linear model              numeric
logreg        Logistic regression                 factor, 2 levels
logreg.boot   Logistic regression with bootstrap  factor, 2 levels
polyreg       Multinomial logit model             factor, > 2 levels
polr          Ordered logit model                 ordered, > 2 levels
lda           Linear discriminant analysis        factor
sample        Simple random sample                any
30 > Multivariate missing data > MICE Multivariate Imputation by Chained Equations (MICE). MICE algorithm: Specify an imputation model for each incomplete column. Fill in starting imputations. And iterate. Model: Fully Conditional Specification (FCS).
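The algorithm outlined above can be sketched in base R for two incomplete numeric columns. The data are simulated and the helper impute_one is hypothetical. For brevity each conditional model is a stochastic regression without a parameter draw, so this is a simplified illustration of the chained-equations idea rather than a proper imputation method.

```r
set.seed(30)  # arbitrary seed for this sketch

# Simulated data: x complete, y and z incomplete
n <- 200
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n, sd = 0.5)
z <- -1 + 0.5 * x + 0.5 * y + rnorm(n, sd = 0.5)
dat <- data.frame(x, y, z)
dat$y[sample(n, 40)] <- NA
dat$z[sample(n, 40)] <- NA

mis_y <- is.na(dat$y)
mis_z <- is.na(dat$z)

# Fill in starting imputations: random draws from the observed values
cur <- dat
cur$y[mis_y] <- sample(dat$y[!mis_y], sum(mis_y), replace = TRUE)
cur$z[mis_z] <- sample(dat$z[!mis_z], sum(mis_z), replace = TRUE)

# Hypothetical helper: one conditional model per incomplete column.
# Fit on the rows where the target is observed, predict the rest, add noise.
impute_one <- function(formula, data, mis) {
  fit <- lm(formula, data = data[!mis, ])
  predict(fit, newdata = data[mis, ]) + rnorm(sum(mis), sd = sigma(fit))
}

# Iterate: visit each incomplete column in turn, always conditioning
# on the most recent imputations of the other columns
for (iter in 1:10) {
  cur$y[mis_y] <- impute_one(y ~ x + z, cur, mis_y)
  cur$z[mis_z] <- impute_one(z ~ x + y, cur, mis_z)
}

summary(cur)
```

Running the whole loop m times from different starting imputations would give m completed data sets.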
31 > Multivariate missing data > MICE Fully Conditional Specification: Cons. Theoretical properties only known in special cases. Cannot use computational shortcuts, like the sweep operator. Care needed in building and checking the model.
32 > Multivariate missing data > MICE Fully Conditional Specification: Pros. Extremely flexible. Close to the data. Subset selection of predictors. Modular, can preserve valuable work. Appears to work very well in practice.
33 > Flexible Imputation of Missing Data Flexible Imputation of Missing Data (FIMD)
34 > Diagnostics > Graphs Standard diagnostic plots in mice. Since mice 2.5, plots for imputed data: one-dimensional scatter: stripplot; box-and-whisker plot: bwplot; densities: densityplot; scattergram: xyplot.
35 > Diagnostics > Graphs Stripplot > library(mice) > imp <- mice(nhanes, seed = 29981) > stripplot(imp, pch = c(1, 19))
36 > Diagnostics > Graphs stripplot(imp, pch=c(1,19)) [Figure: strip plots of age, bmi, hyp and chl by imputation number]
37 > Diagnostics > Graphs A larger data set > imp <- mice(boys, seed = 24331, maxit = 1) > stripplot(imp) > bwplot(imp)
38 > Diagnostics > Graphs stripplot(imp) [Figure: strip plots of age, hgt, wgt, bmi, hc and tv by imputation number]
39 > Diagnostics > Graphs bwplot(imp) [Figure: box-and-whisker plots of age, hgt, wgt, bmi, hc and tv by imputation number]
40 > Diagnostics > Graphs densityplot(imp) [Figure: densities of hgt, wgt, bmi, hc and tv]
41 > Diagnostics > Graphs Imputed by a normal model [Figure: Genital stage against Age]
42 > Diagnostics > Graphs Imputed by a proportional odds model [Figure: Genital stage (G1 to G5) against Age]
43 > Derived variables Derived variables: ratio of two variables; sum score; index variable; quadratic relations; interaction term; conditional imputation; compositions.
44 > Derived variables > Imputing a ratio How to impute a ratio? Weight/height ratio: whr = wgt/hgt (kg/m). Easy if only one of wgt, hgt or whr is missing. Methods: POST: impute wgt and hgt, and calculate whr after imputation. JAV: impute whr as just another variable. PASSIVE1: impute wgt and hgt, and calculate whr during imputation. PASSIVE2: as PASSIVE1, with adapted predictor matrix.
45 > Derived variables > Imputing a ratio Method POST > imp1 <- mice(boys) > long <- complete(imp1, "long", inc = TRUE) > long$whr <- with(long, wgt/(hgt/100)) > imp2 <- long2mids(long)
46 > Derived variables > Imputing a ratio Method JAV: Just another variable > boys$whr <- boys$wgt/(boys$hgt/100) > imp.jav <- mice(boys, m = 1, seed = 32093, maxit = 10)
47 > Derived variables > Imputing a ratio Method JAV [Figure: panels JAV, passive, passive 2; Weight/Height (kg/m) against Height (cm)]
48 > Derived variables > Imputing a ratio Method PASSIVE > meth["whr"] <- "~I(wgt/(hgt/100))"
49 > Derived variables > Imputing a ratio Method PASSIVE [Figure: panels JAV, passive, passive 2; Weight/Height (kg/m) against Height (cm)]
50 > Derived variables > Imputing a ratio Method PASSIVE [Figure: panels JAV, passive, passive 2; Weight/Height (kg/m) against Height (cm)]
51 > Derived variables > Summary Derived variables: summary. Derived variables pose special challenges. Plausible values respect data dependencies. If you can, create derived variables after imputation. If you cannot, use passive imputation. Break up direct feedback loops using the predictor matrix.
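The last two points of the summary, passive imputation combined with an adapted predictor matrix, can be sketched as follows. The dry run with maxit = 0, the values of m and maxit, and the object names dat, ini and max_gap are assumptions of this illustration. The block is skipped when mice is not installed.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  dat <- boys
  dat$whr <- dat$wgt / (dat$hgt / 100)   # weight/height ratio in kg/m

  # Dry run (no iterations) to obtain the default settings
  ini  <- mice(dat, maxit = 0, printFlag = FALSE)
  meth <- ini$method
  pred <- ini$predictorMatrix

  # Passive imputation: whr is recomputed from wgt and hgt, not modelled
  meth["whr"] <- "~I(wgt/(hgt/100))"

  # Break the feedback loop: whr must not predict its own components
  pred[c("wgt", "hgt"), "whr"] <- 0

  imp <- mice(dat, method = meth, predictorMatrix = pred,
              m = 1, maxit = 5, seed = 32093, printFlag = FALSE)

  # Check that the completed data respect the deterministic relation
  done    <- complete(imp, 1)
  max_gap <- max(abs(done$whr - done$wgt / (done$hgt / 100)))
  print(max_gap)
}
```

A value of max_gap at or near zero indicates that the imputed ratio stays consistent with the imputed weight and height.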
52 > Conclusion Why use MICE? 1 State-of-the-art methodology. 2 Addresses all phases of multiple imputation. 3 MICE algorithm is flexible and extendible. 4 Extensive documentation, sample code and real datasets. 5 Light and stable R package. 6 Easy to use, good defaults. 7 Hundreds of applications. 8 Free software, open source.
53 > Conclusion Key documentation 1 Van Buuren, S. and Groothuis-Oudshoorn, C.G.M. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3). 2 Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL.
Handling Missing Data in R with MICE
Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren 1,2 1 Methodology and Statistics, FSBS, Utrecht University 2 Netherlands Organization for Applied Scientific Research
More informationPooling multiple imputations when the sample happens to be the population.
Pooling multiple imputations when the sample happens to be the population. Gerko Vink 1,2, and Stef van Buuren 1,3 arxiv:1409.8542v1 [math.st] 30 Sep 2014 1 Department of Methodology and Statistics, Utrecht
More informationDon t be Fancy. Impute Your Dependent Variables!
Don t be Fancy. Impute Your Dependent Variables! Kyle M. Lang, Todd D. Little Institute for Measurement, Methodology, Analysis & Policy Texas Tech University Lubbock, TX May 24, 2016 Presented at the 6th
More informationStatistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23
1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing
More informationPackage accelmissing
Type Package Package accelmissing April 6, 2018 Title Missing Value Imputation for Accelerometer Data Version 1.4 Date 2018-04-04 Author Jung Ae Lee Maintainer Jung Ae Lee
More informationInteractions and Squares: Don t Transform, Just Impute!
Interactions and Squares: Don t Transform, Just Impute! Philipp Gaffert Volker Bosch Florian Meinfelder Abstract Multiple imputation [Rubin, 1987] is difficult to conduct if the analysis model includes
More informationMultiple Imputation. Paul E. Johnson 1 2. Why Impute? Amelia mice mi aregimpute Making Sense out of All of This. 1 Department of Political Science
Multiple Imputation Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2013 Descriptive 1 / 96 K.U. Overview Why Impute? Amelia
More informationMissing Data and Multiple Imputation
Maximum Likelihood Methods for the Social Sciences POLS 510 CSSS 510 Missing Data and Multiple Imputation Christopher Adolph Political Science and CSSS University of Washington, Seattle Vincent van Gogh
More informationPredictive mean matching imputation of semicontinuous variables
61 Statistica Neerlandica (2014) Vol. 68, nr. 1, pp. 61 90 doi:10.1111/stan.12023 Predictive mean matching imputation of semicontinuous variables Gerko Vink* Department of Methodology and Statistics, Utrecht
More informationFlexible multiple imputation by chained equations of the AVO-95 Survey
PG/VGZ/99.045 Flexible multiple imputation by chained equations of the AVO-95 Survey TNO Prevention and Health Public Health Wassenaarseweg 56 P.O.Box 2215 2301 CE Leiden The Netherlands Tel + 31 71 518
More informationCorrespondence Analysis of Longitudinal Data
Correspondence Analysis of Longitudinal Data Mark de Rooij* LEIDEN UNIVERSITY, LEIDEN, NETHERLANDS Peter van der G. M. Heijden UTRECHT UNIVERSITY, UTRECHT, NETHERLANDS *Corresponding author (rooijm@fsw.leidenuniv.nl)
More informationA comparison of fully Bayesian and two-stage imputation strategies for missing covariate data
A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data Alexina Mason, Sylvia Richardson and Nicky Best Department of Epidemiology and Biostatistics, Imperial College
More informationMULTILEVEL IMPUTATION 1
MULTILEVEL IMPUTATION 1 Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation This document gives technical details of the full conditional distributions used to draw regression
More informationEstimating complex causal effects from incomplete observational data
Estimating complex causal effects from incomplete observational data arxiv:1403.1124v2 [stat.me] 2 Jul 2014 Abstract Juha Karvanen Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä,
More informationComparison of multiple imputation methods for systematically and sporadically missing multilevel data
Comparison of multiple imputation methods for systematically and sporadically missing multilevel data V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon
More informationFlexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.
FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. Modern Modeling Methods (M 3 ) Conference Beatrijs Moerkerke
More informationBayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London
Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline
More informationIntroduction to mtm: An R Package for Marginalized Transition Models
Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationPACKAGE LMest FOR LATENT MARKOV ANALYSIS
PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationPackage hot.deck. January 4, 2016
Type Package Title Multiple Hot-Deck Imputation Version 1.1 Date 2015-11-19 Package hot.deck January 4, 2016 Author Skyler Cranmer, Jeff Gill, Natalie Jackson, Andreas Murr, Dave Armstrong Maintainer Dave
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationLecture Notes: Some Core Ideas of Imputation for Nonresponse in Surveys. Tom Rosenström University of Helsinki May 14, 2014
Lecture Notes: Some Core Ideas of Imputation for Nonresponse in Surveys Tom Rosenström University of Helsinki May 14, 2014 1 Contents 1 Preface 3 2 Definitions 3 3 Different ways to handle MAR data 4 4
More informationMissing values imputation for mixed data based on principal component methods
Missing values imputation for mixed data based on principal component methods Vincent Audigier, François Husson & Julie Josse Agrocampus Rennes Compstat' 2012, Limassol (Cyprus), 28-08-2012 1 / 21 A real
More informationMatching. Quiz 2. Matching. Quiz 2. Exact Matching. Estimand 2/25/14
STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University Frequency 0 2 4 6 8 Quiz 2 Histogram of Quiz2 10 12 14 16 18 20 Quiz2
More informationA Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,
A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type
More informationR Analysis Example Replication C11
R Analysis Example Replication C11 # Chapter 11 Longitudinal Analysis HRS data # Use data sets previously prepared in SAS for this chapter to reduce code burden in R # Complete Case 1 Wave # 11.3.1 Example:
More informationBasics of Modern Missing Data Analysis
Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationMethodology and Statistics for the Social and Behavioural Sciences Utrecht University, the Netherlands
Methodology and Statistics for the Social and Behavioural Sciences Utrecht University, the Netherlands MSc Thesis Emmeke Aarts TITLE: A novel method to obtain the treatment effect assessed for a completely
More informationStatistical View of Least Squares
Basic Ideas Some Examples Least Squares May 22, 2007 Basic Ideas Simple Linear Regression Basic Ideas Some Examples Least Squares Suppose we have two variables x and y Basic Ideas Simple Linear Regression
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationRegression Analysis By Example
Regression Analysis By Example Third Edition SAMPRIT CHATTERJEE New York University ALI S. HADI Cornell University BERTRAM PRICE Price Associates, Inc. A Wiley-Interscience Publication JOHN WILEY & SONS,
More informationKnown unknowns : using multiple imputation to fill in the blanks for missing data
Known unknowns : using multiple imputation to fill in the blanks for missing data James Stanley Department of Public Health University of Otago, Wellington james.stanley@otago.ac.nz Acknowledgments Cancer
More informationThe sbgcop Package. March 9, 2007
The sbgcop Package March 9, 2007 Title Semiparametric Bayesian Gaussian copula estimation Version 0.95 Date 2007-03-09 Author Maintainer This package estimates parameters of
More informationMachine Learning (CS 567) Lecture 2
Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationThe pan Package. April 2, Title Multiple imputation for multivariate panel or clustered data
The pan Package April 2, 2005 Version 0.2-3 Date 2005-3-23 Title Multiple imputation for multivariate panel or clustered data Author Original by Joseph L. Schafer . Maintainer Jing hua
More informationarxiv: v2 [stat.me] 27 Nov 2017
arxiv:1702.00971v2 [stat.me] 27 Nov 2017 Multiple imputation for multilevel data with continuous and binary variables November 28, 2017 Vincent Audigier 1,2,3 *, Ian R. White 4,5, Shahab Jolani 6, Thomas
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationBayesian Multilevel Latent Class Models for the Multiple. Imputation of Nested Categorical Data
Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data Davide Vidotto Jeroen K. Vermunt Katrijn van Deun Department of Methodology and Statistics, Tilburg University
More informationQuantile POD for Hit-Miss Data
Quantile POD for Hit-Miss Data Yew-Meng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection
More informationA discrete semi-markov model for the effect of need-based treatments on the disease states Ekkehard Glimm Novartis Pharma AG
A discrete semi-markov model for the effect of need-based treatments on the disease states Ekkehard Glimm Novartis Pharma AG Basel, Switzerland Lillian Yau Sandoz Biopharmaceuticals Munich, Germany Agenda
More informationBagging During Markov Chain Monte Carlo for Smoother Predictions
Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods
More informationThree-Level Multiple Imputation: A Fully Conditional Specification Approach. Brian Tinnell Keller
Three-Level Multiple Imputation: A Fully Conditional Specification Approach by Brian Tinnell Keller A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts Approved
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationBayesian Methods in Multilevel Regression
Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design
More informationGlobal Sensitivity Analysis for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach
Global for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach Daniel Aidan McDermott Ivan Diaz Johns Hopkins University Ibrahim Turkoz Janssen Research and Development September
More informationUNIVERSITY OF THE PHILIPPINES LOS BAÑOS INSTITUTE OF STATISTICS BS Statistics - Course Description
UNIVERSITY OF THE PHILIPPINES LOS BAÑOS INSTITUTE OF STATISTICS BS Statistics - Course Description COURSE COURSE TITLE UNITS NO. OF HOURS PREREQUISITES DESCRIPTION Elementary Statistics STATISTICS 3 1,2,s
More informationLCA Distal Stata function users guide (Version 1.0)
LCA Distal Stata function users guide (Version 1.0) Liying Huang John J. Dziak Bethany C. Bray Aaron T. Wagner Stephanie T. Lanza Penn State Copyright 2016, Penn State. All rights reserved. Please send
More informationLinear Decision Boundaries
Linear Decision Boundaries A basic approach to classification is to find a decision boundary in the space of the predictor variables. The decision boundary is often a curve formed by a regression model:
More informationIntroduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data
Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington
More informationPackage LBLGXE. R topics documented: July 20, Type Package
Type Package Package LBLGXE July 20, 2015 Title Bayesian Lasso for detecting Rare (or Common) Haplotype Association and their interactions with Environmental Covariates Version 1.2 Date 2015-07-09 Author
More informationTime-Invariant Predictors in Longitudinal Models
Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More informationDescription Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
More informationStatistics 262: Intermediate Biostatistics Model selection
Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.
More informationBayesian Analysis of Multivariate Normal Models when Dimensions are Absent
Bayesian Analysis of Multivariate Normal Models when Dimensions are Absent Robert Zeithammer University of Chicago Peter Lenk University of Michigan http://webuser.bus.umich.edu/plenk/downloads.htm SBIES
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationA Note on Bayesian Inference After Multiple Imputation
A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in
More informationComparison for alternative imputation methods for ordinal data
Comparison for alternative imputation methods for ordinal data Federica Cugnata e Silvia Salini DEMM, Università degli Studi di Milano 22 maggio 2013 Cugnata & Salini (DEMM - Unimi) Imputation methods
More informationR-squared for Bayesian regression models
R-squared for Bayesian regression models Andrew Gelman Ben Goodrich Jonah Gabry Imad Ali 8 Nov 2017 Abstract The usual definition of R 2 (variance of the predicted values divided by the variance of the
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationPackage ARCensReg. September 11, 2016
Type Package Package ARCensReg September 11, 2016 Title Fitting Univariate Censored Linear Regression Model with Autoregressive Errors Version 2.1 Date 2016-09-10 Author Fernanda L. Schumacher, Victor
More informationCMSC858P Supervised Learning Methods
CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors
More informationSTATISTICAL ANALYSIS WITH MISSING DATA
STATISTICAL ANALYSIS WITH MISSING DATA SECOND EDITION Roderick J.A. Little & Donald B. Rubin WILEY SERIES IN PROBABILITY AND STATISTICS Statistical Analysis with Missing Data Second Edition WILEY SERIES
More informationLecture Data Science
Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Regression Analysis JProf. Dr. Last Time How to find parameter of a regression model Normal Equation Gradient Decent
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationLCA_Distal_LTB Stata function users guide (Version 1.1)
LCA_Distal_LTB Stata function users guide (Version 1.1) Liying Huang John J. Dziak Bethany C. Bray Aaron T. Wagner Stephanie T. Lanza Penn State Copyright 2017, Penn State. All rights reserved. NOTE: the
More informationBAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS
BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA
More informationarxiv: v5 [stat.me] 13 Feb 2018
arxiv: arxiv:1602.07933 BOOTSTRAP INFERENCE WHEN USING MULTIPLE IMPUTATION By Michael Schomaker and Christian Heumann University of Cape Town and Ludwig-Maximilians Universität München arxiv:1602.07933v5