Quantile Regression Methods for Reference Growth Charts

Similar documents
ECAS Summer Course. Quantile Regression for Longitudinal Data. Roger Koenker University of Illinois at Urbana-Champaign

Nonparametric Quantile Regression

Regression based principal component analysis for sparse functional data with applications to screening pubertal growth paths.

Mixed Effects Multivariate Adaptive Splines Model for the Analysis of Longitudinal and Growth Curve Data. Heping Zhang.

Reference growth charts for Saudi Arabian children and adolescents

A DYNAMIC QUANTILE REGRESSION TRANSFORMATION MODEL FOR LONGITUDINAL DATA

Quantile Regression: Inference

Reference Growth Charts for Saudi Arabian Children and Adolescents

Quantile Regression for Longitudinal Data

Beyond the Average Man: The Art of Unlikelihood

Quantile Regression for Dynamic Panel Data

Quantile Regression: Inference

ECAS Summer Course. Fundamentals of Quantile Regression Roger Koenker University of Illinois at Urbana-Champaign

Recovering Copulae from Conditional Quantiles

Quantile Regression: Past and Prospects

Quantile Regression: 40 Years On

Lecture 22 Survival Analysis: An Introduction

Quantile regression 40 years on

Motivational Example

mboost - Componentwise Boosting for Generalised Regression Models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Lecture 5 Models and methods for recurrent event data

University of California, Berkeley

Modeling Real Estate Data using Quantile Regression

Issues on quantile autoregression

INFERENCE ON QUANTILE REGRESSION FOR HETEROSCEDASTIC MIXED MODELS

Dynamic Disease Screening

Quantile regression in varying coefficient models

Asymmetric least squares estimation and testing

Quantile Regression for Panel/Longitudinal Data

Recap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis:

Generalized additive modelling of hydrological sample extremes

The Faroese Cohort 2. Structural Equation Models Latent growth models. Multiple indicator growth modeling. Response profiles. Number of children: 182

Impact Evaluation Technical Workshop:

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

APPLICATIONS OF QUANTILE REGRESSION TO ESTIMATION AND DETECTION OF SOME TAIL CHARACTERISTICS YA-HUI HSU DISSERTATION

Approximation of Functions

An Introduction to Nonstationary Time Series Analysis

Part 6: Multivariate Normal and Linear Models

Modelling geoadditive survival data

Remedial Measures Wrap-Up and Transformations Box Cox

Stat 579: Generalized Linear Models and Extensions

Asymptotic standard errors of MLE

Hierarchical Modeling for Univariate Spatial Data

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Quantile Processes for Semi and Nonparametric Regression

arxiv: v2 [stat.me] 4 Jun 2016

Analysing geoadditive regression data: a mixed model approach

Hierarchical Modelling for Univariate Spatial Data

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Mixed models with correlated measurement errors

Quantile Regression: An Introductory Overview

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Modern Statistical Process Control Charts and Their Applications in Analyzing Big Data

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

,..., θ(2),..., θ(n)

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Quantile Regression Computation: From the Inside and the Outside

CTDL-Positive Stable Frailty Model

Quantile Regression Computation: From Outside, Inside and the Proximal

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Introduction to Regression

Introduction: Nonparametric series quantile regression

Unit Root and Cointegration

Sample Size and Power Considerations for Longitudinal Studies

Bayesian Multivariate Logistic Regression

Models for Clustered Data

Models for Clustered Data

Penalty Methods for Bivariate Smoothing and Chicago Land Values

Hierarchical Modelling for Univariate Spatial Data

Likelihood ratio testing for zero variance components in linear mixed models

Longitudinal Modeling with Logistic Regression

Modelling the Covariance

STAT 420: Methods of Applied Statistics

Course topics (tentative) The role of random effects

Lecture 7 Autoregressive Processes in Space

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

PEP-PMMA Technical Note Series (PTN 01) Percentile Weighed Regression

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Bayesian Linear Regression

Introduction to Functional Data Analysis A CSCU Workshop. Giles Hooker Biological Statistics and Computational Biology

Which Quantile is the Most Informative? Maximum Likelihood, Maximum Entropy and Quantile Regression

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

7.1 The Hazard and Survival Functions

Nuisance Parameter Estimation in Survival Models

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Combining Model and Test Data for Optimal Determination of Percentiles and Allowables: CVaR Regression Approach, Part I

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

Supplementary Material to General Functional Concurrent Model

Introduction to Regression

Introduction to Regression

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

Quantile Regression for Residual Life and Empirical Likelihood

Analysis of Longitudinal Data: Comparison between PROC GLM and PROC MIXED.

Gaussian Copula Regression Application

Outline. Overview of Issues. Spatial Regression. Luc Anselin

A general mixed model approach for spatio-temporal regression data

Transcription:

Quantile Regression Methods for Reference Growth Charts 1 Roger Koenker University of Illinois at Urbana-Champaign ASA Workshop on Nonparametric Statistics Texas A&M, January 15, 2005 Based on joint work with: Ying Wei, Columbia Biostatistics Anneli Pere, Children s Hospital, Helsinki Xuming He, UIUC.

Quetelet s (1871) Growth Chart 2

Penalized Maximum Likelihood Estimation 3 References: Cole (1988), Cole and Green (1992), and Carey(2002) Data: {Y i (t i,j ) : j = 1,..., J i, i = 1,..., n.} Model: Z(t) = (Y (t)/µ(t))λ(t) 1 λ(t)σ(t) N (0, 1) Estimation: max l(λ, µ, σ) ν λ (λ (t)) 2 dt ν µ (µ (t)) 2 dt ν σ (σ (t)) 2 dt, l(λ, µ, σ) = n [λ(t i ) log(y (t i )/µ(t i )) log σ(t i ) 1 2 Z2 (t i )], i=1

Quantiles as Argmins 4 The τ th quantile of a random variable Y having distribution function F is: α(τ) = argmin ρ τ (y α)df (y) where ρ τ (u) = u (τ I(u < 0)).

Quantiles as Argmins 4 The τ th quantile of a random variable Y having distribution function F is: α(τ) = argmin ρ τ (y α)df (y) where ρ τ (u) = u (τ I(u < 0)). The τth sample quantile is thus: ˆα(τ) = argmin ρ τ (y α)df n (y) n = argmin n 1 ρ τ (y i α) i=1

Quantile Regression 5 The τth conditional quantile function of Y X = x is g(τ x) = argmin g G ρ τ (y g(x))df A natural estimator of g(τ x) is ĝ(τ x) = argmin g G n i=1 ρ τ (y i g(x i )) with G chosen as a finite dimensional linear space, g(x) = p ϕ j (x)β j. j=1

Choice of Basis 6 There are many possible choices for the basis expansion {ϕ j }. We opt for the (very conventional) cubic B-spline functions: 0 5 10 15 20 Age In R quantile regression models can be estimated with the command fit <- rq(y bs(x,knots=knots),tau = 1:9/10) Similar functionality in SAS is coming real soon now.

Data 7 Longitudinal measurements on height for 2514 Finnish children,

Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births,

Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births, About 20 measurements per child,

Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births, About 20 measurements per child, Two cohorts: 1096 born between 1959-61, 1209 born between 1968-72

Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births, About 20 measurements per child, Two cohorts: 1096 born between 1959-61, 1209 born between 1968-72 Sample constitutes 0.5 percent of Finns born in these periods.

8 100 90 Unconditional Reference Quantiles Boys 0 2.5 Years 0.97 0.9 0.75 0.5 0.25 0.1 0.03 Box Cox Parameter Functions λ(t) 2 1.5 1 0.5 Height (cm) 80 70 µ(t) 90 80 70 60 60 50 LMS edf = (7,10,7) LMS edf = (22,25,22) QR edf = 16 σ(t) 0.046 0.044 0.042 0.04 0.038 0.036 0.034 0 0.5 1 1.5 2 2.5 Age (years) 0 0.5 1 1.5 2 2.5 Age (years)

9

10 180 160 Unconditional Reference Quantiles Boys 2 18 Years 0.97 0.9 0.75 0.5 0.25 0.1 0.03 Box Cox Parameter Functions λ(t) 3 2.5 2 1.5 1 0.5 0 µ(t) Height (cm) 140 160 140 120 120 LMS edf = (7,10,7) σ(t) 100 0.055 100 LMS edf = (22,25,22) QR edf = 16 0.05 0.045 0.04 80 5 10 15 Age (years) 5 10 15 Age (years)

11

12 100 90 Unconditional Reference Quantiles Girls 0 2.5 Years 0.97 0.9 0.75 0.5 0.25 0.1 0.03 Box Cox Parameter Functions λ(t) 1.5 1 0.5 0 80 µ(t) 90 Height (cm) 70 80 70 60 60 LMS edf = (7,10,7) σ(t) 0.044 LMS edf = (22,25,22) 0.042 0.04 50 QR edf = 16 0.038 0.036 0.034 0 0.5 1 1.5 2 2.5 Age (years) 0 0.5 1 1.5 2 2.5 Age (years)

13

14 Unconditional Reference Quantiles Girls 2 18 Years 0.97 0.9 0.75 Box Cox Parameter Functions λ(t) 3 160 0.5 0.25 0.1 0.03 2 1 0 Height (cm) 140 120 µ(t) 160 140 120 100 LMS edf = (7,10,7) σ(t) 100 80 LMS edf = (22,25,22) QR edf = 16 0.046 0.044 0.042 0.04 0.038 0.036 5 10 15 Age (years) 2 4 6 8 10 12 14 16 Age (years)

Growth Velocity Curves Boys 0 2.5 years Girls 0 2.5 years τ Boys 2 18 years Girls 2 18 years 15 40 30 20 LMS (7,10,7) LMS (22,25,22) QR 0.1 8 6 4 2 10 0 40 8 30 20 0.25 6 4 2 Velocity (cm/year) 10 40 30 20 0.5 0 8 6 4 2 10 0 40 8 30 20 0.75 6 4 2 10 0 40 8 30 20 0.9 6 4 2 10 0 0.5 1 1.5 2 0.5 1 1.5 2 5 10 15 Age (years) 5 10 15

16 Estimated Age Specific Density Functions Age = 0.5 Age = 1 Age = 1.5 N= 235 N= 204 N= 67 0.15 Boys 0.10 0.05 Data QR LMS N= 215 N= 191 N= 68 0.00 0.15 0.10 Girls 0.05 0.00 65 70 75 70 75 80 85 75 80 85 90 Height (cm)

17

18 Height Density at Age 1 0.15 Boys 0.10 0.05 Pooled Born in 1960 Born in 1970 0.00 0.20 Girls 0.15 0.10 0.05 0.00 70 72 74 76 78 80 82 84 Height(cm)

Conditioning on Prior Growth 19 It is often important to condition not only on age, but also on prior growth and possibly on other covariates. Autoregressive models are natural, but complicated due to the irregular spacing of typical longitudinal measurements. Data: {Y i (t i,j ) : j = 1,..., J i, i = 1,..., n.} Model: Q Yi (t i,j )(τ t i,j, Y i (t i,j 1 ), x i ) = g τ (t i,j ) + [α(τ) + β(τ)(t i,j t i,j 1 )]Y i (t i,j 1 ) + x i γ(τ).

AR Components of the Infant Conditional Growth Model 20 τ Boys Girls ˆα(τ) ˆβ(τ) ˆγ(τ) ˆα(τ) ˆβ(τ) ˆγ(τ) 0.03 0.845 0.147 0.024 0.809 0.135 (0.020) (0.011) (0.011) (0.024) (0.011) 0.1 0.787 (0.020) 0.25 0.725 (0.019) 0.5 0.635 (0.025) 0.75 0.483 (0.029) 0.9 0.422 (0.024) 0.97 0.383 (0.024) 0.159 (0.007) 0.170 (0.006) 0.173 (0.009) 0.187 (0.009) 0.213 (0.016) 0.214 (0.016) 0.036 (0.007) 0.051 (0.009) 0.060 (0.013) 0.063 (0.017) 0.070 (0.017) 0.077 (0.018) 0.757 (0.022) 0.685 (0.021) 0.612 (0.027) 0.457 (0.027) 0.411 (0.030) 0.400 (0.038) 0.153 (0.007) 0.163 (0.006) 0.175 (0.008) 0.183 (0.012) 0.201 (0.015) 0.232 (0.024) 0.042 (0.010) 0.054 (0.009) 0.061 (0.008) 0.070 (0.009) 0.094 (0.015) 0.100 (0.018) 0.086 (0.027)

AR Components of the Childrens Conditional Growth Model 21 τ Boys Girls ˆα(τ) ˆβ(τ) ˆγ(τ) ˆα(τ) ˆβ(τ) ˆγ(τ) 0.03 0.976 0.036 0.011 0.993 0.033 (0.010) (0.002) (0.013) (0.012) (0.002) 0.1 0.980 (0.005) 0.25 0.978 (0.006) 0.5 0.984 (0.004) 0.75 0.990 (0.004) 0.9 0.987 (0.009) 0.97 0.980 (0.014) 0.039 (0.001) 0.042 (0.001) 0.045 (0.001) 0.047 (0.001) 0.049 (0.001) 0.050 (0.002) 0.022 (0.007) 0.021 (0.006) 0.019 (0.004) 0.014 (0.006) 0.012 (0.009) 0.023 (0.015) 0.989 (0.006) 0.986 (0.005) 0.984 (0.007) 0.985 (0.007) 0.984 (0.008) 0.982 (0.013) 0.039 (0.001) 0.042 (0.001) 0.045 (0.001) 0.050 (0.001) 0.052 (0.001) 0.053 (0.001) 0.006 (0.015) 0.008 (0.007) 0.019 (0.006) 0.022 (0.006) 0.016 (0.006) 0.002 (0.012) 0.021 (0.018)

Transformation to Normality 22 A presumed advantage of univariate (age-specific) transformation to normality is that once observations are transformed to univariate Z-scores they are automatically prepared to longitudinal autoregression: Z t = α 0 + α 1 Z t 1 + U t Premise: Marginal Normality Joint Normality

Transformation to Normality 22 A presumed advantage of univariate (age-specific) transformation to normality is that once observations are transformed to univariate Z-scores they are automatically prepared to longitudinal autoregression: Z t = α 0 + α 1 Z t 1 + U t Premise: Marginal Normality Joint Normality Of course we know it isn t true, but we also think we know that counterexamples are pathological, and don t occur in nature.

23 3 2 1 0 1 2 3 75 80 85 90 Boys age 6.5 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 75 80 85 90 Boys age 6 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 1.5 0.5 0.5 1.5 Boys AR(1) residuals Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 80 85 90 95 100 Girls age 6.5 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 80 85 90 95 100 Girls age 6 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 2.0 1.0 0.0 1.0 Girls AR(1) residuals Theoretical Quantiles Sample Quantiles

24

25 Height (cm) 110 120 130 140 150 Subject 89 Subject 225 0.97 0.9 0.75 0.5 0.25 0.1 0.03 6 7 8 9 10 Age (years)

26 1 Subject 89 Subject 225 1 Quantile 0.8 0.6 0.4 Unconditional Conditional Observed 0.8 0.6 0.4 0.2 0.2 0 120 125 130 135 140 145 Height (cm) 120 125 130 135 140 Height (cm) 0

Conclusions 27 Nonparametric quantile regression using B-splines offers a reasonable alternative to parametric methods for constructing reference growth charts.

Conclusions 27 Nonparametric quantile regression using B-splines offers a reasonable alternative to parametric methods for constructing reference growth charts. The flexibility of quantile regression methods exposes features of the data that are not easily observable with conventional parametric methods. Even for height data.

Conclusions 27 Nonparametric quantile regression using B-splines offers a reasonable alternative to parametric methods for constructing reference growth charts. The flexibility of quantile regression methods exposes features of the data that are not easily observable with conventional parametric methods. Even for height data. Longitudinal data can be easily accomodated into the quantile regression framework by adding covariates, including the use of autoregressive effects for unequally spaced measurements.