The mfp Package. R topics documented: May 18, Title Multivariable Fractional Polynomials. Version Date

Similar documents
The mfp Package. July 27, GBSG... 1 bodyfat... 2 cox... 4 fp... 4 mfp... 5 mfp.object... 7 plot.mfp Index 9

Multivariable Fractional Polynomials

Multivariable Fractional Polynomials

Package CoxRidge. February 27, 2015

Fractional Polynomials and Model Averaging

The nltm Package. July 24, 2006

1 Multiple Regression

The influence of categorising survival time on parameter estimates in a Cox model

Package shrink. August 29, 2013

Towards stratified medicine instead of dichotomization, estimate a treatment effect function for a continuous covariate

Package anoint. July 19, 2015

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

Package JointModel. R topics documented: June 9, Title Semiparametric Joint Models for Longitudinal and Counting Processes Version 1.

Package crrsc. R topics documented: February 19, 2015

Package ARCensReg. September 11, 2016

especially with continuous

Package penalized. February 21, 2018

Survival Regression Models

Package penmsm. R topics documented: February 20, Type Package

Package ICBayes. September 24, 2017

Package blme. August 29, 2016

Package dispmod. March 17, 2018

The SpatialNP Package

Package MiRKATS. May 30, 2016

Generalized Linear Models

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich

The betareg Package. October 6, 2006

Multimodel Inference: Understanding AIC relative variable importance values. Kenneth P. Burnham Colorado State University Fort Collins, Colorado 80523

The lmm Package. May 9, Description Some improved procedures for linear mixed models

Relative-risk regression and model diagnostics. 16 November, 2015

Introduction to logistic regression

Package dispmod. R topics documented: February 15, Version 1.1. Date Title Dispersion models

Bayesian Linear Models

Modelling excess mortality using fractional polynomials and spline functions. Modelling time-dependent excess hazard

Package metafuse. October 17, 2016

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

Lecture 7 Time-dependent Covariates in Cox Regression

Package lmm. R topics documented: March 19, Version 0.4. Date Title Linear mixed models. Author Joseph L. Schafer

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.

Package metansue. July 11, 2018

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Package mmm. R topics documented: February 20, Type Package

Package idmtpreg. February 27, 2018

Package netcoh. May 2, 2016

Package effectfusion

Multiple imputation in Cox regression when there are time-varying effects of covariates

McGill University. Faculty of Science MATH 204 PRINCIPLES OF STATISTICS II. Final Examination

Package covtest. R topics documented:

flexsurv: flexible parametric survival modelling in R. Supplementary examples

Package rnmf. February 20, 2015

Package SpatialNP. June 5, 2018

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved.

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Package My.stepwise. June 29, 2017

Package complex.surv.dat.sim

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Package sklarsomega. May 24, 2018

8 Nominal and Ordinal Logistic Regression

Package SimSCRPiecewise

Package LBLGXE. R topics documented: July 20, Type Package

Basic Medical Statistics Course

Generalized linear models

Package MonoPoly. December 20, 2017

Checking model assumptions with regression diagnostics

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur

Package AID. R topics documented: November 10, Type Package Title Box-Cox Power Transformation Version 2.3 Date Depends R (>= 2.15.

SPH 247 Statistical Analysis of Laboratory Data. April 28, 2015 SPH 247 Statistics for Laboratory Data 1

Package drgee. November 8, 2016

Advanced Quantitative Data Analysis

Package ICGOR. January 13, 2017

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Package spef. February 26, 2018

Multivariate Survival Analysis

The coxvc_1-1-1 package

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Package HGLMMM for Hierarchical Generalized Linear Models

Simple Linear Regression

SSUI: Presentation Hints 2 My Perspective Software Examples Reliability Areas that need work

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Package threg. August 10, 2015

A Practitioner s Guide to Generalized Linear Models

Package InferenceSMR

More Accurately Analyze Complex Relationships

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Two techniques for investigating interactions between treatment and continuous covariates in clinical trials

-A wild house sparrow population case study

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Generalized Additive Models

Package icmm. October 12, 2017

σ σ MLR Models: Estimation and Inference v.3 SLR.1: Linear Model MLR.1: Linear Model Those (S/M)LR Assumptions MLR3: No perfect collinearity

Chapter 1 Statistical Inference

Package jmcm. November 25, 2017

Müller: Goodness-of-fit criteria for survival data

The Stata Journal. Editors H. Joseph Newton Department of Statistics Texas A&M University College Station, Texas

Package generalhoslem

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Transcription:

Title Multivariable Fractional Polynomials Version 1.4.0 Date 2007-03-01 The mfp Package May 18, 2007 Author original by Gareth Ambler <gareth@stats.ucl.ac.uk>, modified by Axel Benner <benner@dkfz.de> Maintainer Axel Benner <benner@dkfz.de> Depends survival, R (>= 2.0.0) Fractional polynomials are used to represent curvature in regression models. A key reference is Royston and Altman, 1994. License GPL version 2 or newer R topics documented: GBSG............................................ 1 bodyfat........................................... 2 cox.............................................. 4 fp.............................................. 4 mfp............................................. 5 mfp.object.......................................... 8 plot.mfp........................................... 9 Index 10 1

2 GBSG GBSG German Breast Cancer Study Group Format A data frame containing the observations from the GBSG study. data(gbsg) This data frame contains the observations of 686 women: id patient id 1...686. htreat hormonal therapy, a factor at two levels 0 (no) and 1 (yes). age of the patients in years. menostat menopausal status, a factor at two levels 1 (premenopausal) and 2 (postmenopausal). tumsize tumor size (in mm). tumgrad tumor grade, a ordered factor at levels 1 < 2 < 3. posnodal number of positive nodes. prm progesterone receptor (in fmol). esm estrogen receptor (in fmol). rfst recurrence free survival time (in days). cens censoring indicator (0 censored, 1 event). References M. Schumacher, G. Basert, H. Bojar, K. Huebner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R.L.A. Neumann and H.F. Rauschecker for the German Breast Cancer Study Group (1994). Randomized 2 2 trial evaluating hormonal treatment and the duration of chemotherapy in nodepositive breast cancer patients. Journal of Clinical Oncology, 12, 2086 2093. W. Sauerbrei and P. Royston (1999). Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistics Society Series A, Volume 162(1), 71 94. Examples data(gbsg) mfp(surv(rfst, cens) ~ fp(age, df = 2, select = 0.05) + fp(prm, df = 4, select = 0.05), family = cox, data = GBSG)

bodyfat 3 bodyfat percentage of body fat determined by underwater weighing A data frame containing the estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. Source: Roger W. Johnson (1996), Fitting Percentage of Body Fat to Simple Body Measurements, Journal of Statistics Education. Original data are from K. Penrose, A. Nelson, and A. Fisher (1985), Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques (abstract), Medicine and Science in Sports and Exercise, 17(2), 189. Data were supplied by Dr. A. Garth Fisher, Human Performance Research Center, Brigham Young University, who gave permission to freely distribute the data and use them for non-commercial purposes. Note, however, that there seem to be a few errors. For instance, body densities for cases 48, 76, and 96 seem to have one digit in error in the two body fat percentage values. Also note a man (case 42) over 200 pounds in weight who is less than 3 feet tall (the height should presumably be 69.5 inches, not 29.5 inches). Percent body fat estimates are truncated to zero when negative (case 182). data(bodyfat) Format This data frame contains the observations of 252 men: case Case number. brozek Percent body fat using Brozek s equation: 457/Density - 414.2 siri Percent body fat using Siri s equation: 495/Density - 450 density Density determined from underwater weighing (gm/cm**3). age Age (years). weight Weight (lbs). height Height (inches). neck Neck circumference (cm). chest Chest circumference (cm). abdomen Abdomen circumference (cm) at the umbilicus and level with the iliac crest. hip Hip circumference (cm). thigh Thigh circumference (cm). knee Knee circumference (cm). ankle Ankle circumference (cm). biceps Biceps (extended) circumference (cm). forearm Forearm circumference (cm). wrist Wrist circumference (cm) distal to the styloid processes.

4 fp Source e.g. http://lib.stat.cmu.edu/datasets/bodyfat References R.W. Johnson (1996). Fitting percentage of body fat to simple body measurements. Journal of Statistics Education [Online], 4(1). K.W. Penrose, A.G. Nelson, A.G. Fisher (1985). Generalized body composition prediction equation for men using simple measurement techniques. Medicine and Science in Sports and Exercise, 17, 189. P. Royston, W. Sauerbrei (2004). Improving the robustness of fractional polynomial models by preliminary covariate transformation. Submitted. Examples data(bodyfat) bodyfat$height[bodyfat$case==42] <- 69.5 # apparent error bodyfat <- bodyfat[-which(bodyfat$case==39),] # cp. Royston $\amp$ Sauerbrei, 2004 mfp(siri ~ fp(age, df = 4, select = 0.1) + fp(weight, df = 4, select = 0.1) + fp(height, df = 4, select = 0.1), family = gaussian, data = bodyfat) cox Family Objects for Cox Proportional Regression Models Family objects provide a convenient way to specify the details of the models used by functions such as glm. See the documentation for glm for details. cox() fp Fractional Polynomial Transformation This function defines a fractional polynomial object for a quantitative input variable x. fp(x, df = 4, select = NA, alpha = NA, scale=true)

mfp 5 Arguments x df select alpha scale quantitative input variable. degrees of freedom of the FP model. df=4: FP model with maximum permitted degree m=2 (default), df=2: FP model with maximum permitted degree m=1, df=1: Linear FP model. sets the variable selection level for the input variable. sets the FP selection level for the input variable. use pre-transformation scaling to avoid numerical problems (default=true). Value Examples ## Not run: fp(x, df = 4, select = 0.05, scale = FALSE) ## End(Not run) mfp Fit a Multiple Fractional Polynomial Model Selects the multiple fractional polynomial (MFP) model which best predicts the outcome. The model may be a generalized linear model or a proportional hazards (Cox) model. mfp(formula, data, family = gaussian, method = c("efron", "breslow"), subse select = 1, maxits = 20, keep = NULL, rescale = TRUE, verbose = FAL Arguments formula a formula object, with the response of the left of a operator, and the terms, separated by + operators, on the right. Fractional polynomial terms are indicated by fp. If a Cox PH model is required then the outcome should be specified using the Surv() notation used by coxph. data family a data frame containing the variables occurring in the formula. If this is missing, the variables should be on the search list. a family object - a list of functions and expressions for defining the link and variance functions, initialization and iterative weights. Families supported are gaussian, binomial, poisson, Gamma, inverse.gaussian and quasi. Additionally Cox models are specified using "cox".

6 mfp method subset na.action init alpha select maxits keep rescale verbose x y a character string specifying the method for tie handling. This argument is used for Cox models only and has no effect for other model families. See coxph for details. expression saying which subset of the rows of the data should be used in the fit. All observations are included by default. function to filter missing data. This is applied to the model.frame after any subset argument has been used. The default (with na.fail) is to create an error if any missing values are found. vector of initial values of the iteration (in Cox models only). sets the FP selection level for all predictors. Values for individual predictors may be changed via the fp function in the formula. sets the variable selection level for all predictors. Values for individual predictors may be changed via the fp function in the formula. maximum number of iterations for the backfitting stage. keep one or more variables in the model. The selection level for these variables will be set to 1. logical; uses re-scaling to show the parameters for covariates on their original scale (default TRUE). logical; run in verbose mode (default FALSE). logical; return the design matrix in the model object? logical; return the response in the model object? Details The estimation algorithm processes the predictors in turn. Initially, mfp silently arranges the predictors in order of increasing P-value (i.e. of decreasing statistical significance) for omitting each predictor from the model comprising all the predictors with each term linear. The aim is to model relatively important variables before unimportant ones. At the initial cycle, the best-fitting FP function for the first predictor is determined, with all the other variables assumed linear. The FP selection procedure is described below. The functional form (but NOT the estimated regression coefficients) for this predictor is kept, and the process is repeated for the other predictors in turn. The first iteration concludes when all the variables have been processed in this way. The next cycle is similar, except that the functional forms from the initial cycle are retained for all variables excepting the one currently being processed. A variable whose functional form is prespecified to be linear (i.e. to have 1 df) is tested only for exclusion within the above procedure when its nominal P-value (selection level) according to select() is less than 1. Updating of FP functions and candidate variables continues until the functions and variables included in the overall model do not change (convergence). Convergence is usually achieved within 1-4 cycles. Model Selection mfp uses a form of backward elimination. It start from a most complex permitted FP model and attempt to simplify it by reducing the df. The selection algorithm is inspired by the so-called

mfp 7 Value "closed test procedure", a sequence of tests in each of which the "familywise error rate" or P-value is maintained at a prespecified nominal value such as 0.05. The "closed test" algorithm for choosing an FP model with maximum permitted degree m=2 (4 df) for a single continuous predictor, x, is as follows: 1. Inclusion: test the FP in x for possible omission of x (4 df test, significance level determined by select). If x is significant, continue, otherwise drop x from the model. 2. Non-linearity: test the FP in x against a straight line in x (3 df test, significance level determined by alpha). If significant, continue, otherwise the chosen model is a straight line. 3. Simplification: test the FP with m=2 (4 df) against the best FP with m=1 (2 df) (2 df test at alpha level). If significant, choose m=2, otherwise choose m=1. All significance tests are carried out using an approximate P-value calculation based on a difference in deviances (-2 x log likelihood) having a chi-squared or F distribution, depending on the regression in use. Therefore, each of the tests in the procedure maintains a significance level only approximately equal to select. The algorithm is thus not truly a closed procedure. However, for a given significance level it does provide some protection against over-fitting, that is against choosing over-complex MFP models. an object of class mfp is returned which either inherits from both glm and lm or coxph. Side Effects details are produced on the screen regarding the progress of the backfitting routine. At completion of the algorithm a table is displayed showing the final powers selected for each variable along with other details. Known Bugs glm models should not be specified without an intercept term as the software does not yet allow for that possibility. Author(s) Gareth Ambler and Axel Benner References Ambler G, Royston P (2001) Fractional polynomial model selection procedures: investigation of Type I error rate. Journal of Statistical Simulation and Computation 69: 89 108. Benner A (2005) mfp: Multivariable fractional polynomials. R News 5(2): 20 23. Royston P, Altman D (1994) Regression using fractional polynomials of continuous covariates. Appl Stat. 3: 429 467. Sauerbrei W, Royston P (1999) Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society (Series A) 162: 71 94.

8 mfp.object See Also mfp.object, fp, plot.fp, glm Examples data(gbsg) f <- mfp(surv(rfst, cens) ~ fp(age, df = 4, select = 0.05) + fp(prm, df = 4, select = 0.05), family = cox, data = GBSG) print(f) survfit(f$fit) # use proposed coxph model fit for survival curve estimation mfp.object Multiple Fractional Polynomial Model Object Value Objects returned by fitting fractional polynomial model objects. These are objects representing fitted mfp models. Class mfp inherits from either glm or coxph depending on the type of model fitted. In addition to the standard glm/coxph components the following components are included in a mfp object. x powers the final FP transformations that are contained in the design matrix x. The predictor "z" with 4 df would have corresponding columns "z.1" and "z.2" in x. a matrix containing the best FP powers for each predictor. If a predictor has less than two powers a NA will fill the appropriate cell of the matrix. pvalues a matrix containing the P-values from the closed tests. Briefly p.null is the P-value for the test of inclusion (see mfp), p.lin corresponds to the test of nonlinearity and p.fp the test of simplification. The best m=1 power (power2) and best m=2 powers (power4.1 and power4.2) are also given. scale df.initial df.final dev dev.lin dev.null fptable all predictors are shifted and rescaled before being power transformed if nonpositive values are encountered or the range of the predictor is reasonably large. If x would be used instead of x where x = (x+a)/b the parameters a (shift) and b (scale) are contained in the matrix scale. a vector containing the degrees of freedom allocated to each predictor. a vector containing the degrees of freedom of each predictor at convergence of the backfitting algorithm. the deviance of the final model. the deviance of the model that has every predictor included with 1 df (i.e. linear). the deviance of the null model. the table of the final fp transformations.

plot.mfp 9 formula fit the proposed formula for a call of glm/coxph. the fitted glm/coxph model using the proposed formula. This component can be used for prediction, etc. See Also mfp, glm.object plot.mfp Plots for mfp objects This function draws two plots: (i) the linear predictor function and (ii) partial residuals together with a lowess smooth. For Cox models also smoothed martingale based residuals of the null model are plotted against the predictor. plot.mfp(x, var=null, ref.zero=true, ask=true,...) Arguments x Value var ref.zero ask object representing a fitted mfp model. the variable for which plots are desired. By default, plots are produced in turn for each variable of a model. subtract a constant from X beta before plotting so that the reference value of the x -variable yields y=0.... further arguments. logical; if TRUE, the user is asked before each plot, see par(ask=.). Examples data(gbsg) f <- mfp(surv(rfst, cens) ~ fp(age, df = 4, select = 0.05) + fp(prm, df = 4, select = 0.05), family = cox, data = GBSG) par(mfrow=c(2,2), mar=c(4,4,1,1), mgp=c(1.5,0.75,0)) plot(f, var="age")

Index Topic classes mfp.object, 8 Topic datasets bodyfat, 2 GBSG, 1 Topic models mfp, 5 bodyfat, 2 cox, 4 fp, 4 GBSG, 1 mfp, 5 mfp.object, 8 plot.mfp, 9 10