Quantile Regression Methods for Reference Growth Charts 1 Roger Koenker University of Illinois at Urbana-Champaign ASA Workshop on Nonparametric Statistics Texas A&M, January 15, 2005 Based on joint work with: Ying Wei, Columbia Biostatistics Anneli Pere, Children s Hospital, Helsinki Xuming He, UIUC.
Quetelet s (1871) Growth Chart 2
Penalized Maximum Likelihood Estimation 3 References: Cole (1988), Cole and Green (1992), and Carey(2002) Data: {Y i (t i,j ) : j = 1,..., J i, i = 1,..., n.} Model: Z(t) = (Y (t)/µ(t))λ(t) 1 λ(t)σ(t) N (0, 1) Estimation: max l(λ, µ, σ) ν λ (λ (t)) 2 dt ν µ (µ (t)) 2 dt ν σ (σ (t)) 2 dt, l(λ, µ, σ) = n [λ(t i ) log(y (t i )/µ(t i )) log σ(t i ) 1 2 Z2 (t i )], i=1
Quantiles as Argmins 4 The τ th quantile of a random variable Y having distribution function F is: α(τ) = argmin ρ τ (y α)df (y) where ρ τ (u) = u (τ I(u < 0)).
Quantiles as Argmins 4 The τ th quantile of a random variable Y having distribution function F is: α(τ) = argmin ρ τ (y α)df (y) where ρ τ (u) = u (τ I(u < 0)). The τth sample quantile is thus: ˆα(τ) = argmin ρ τ (y α)df n (y) n = argmin n 1 ρ τ (y i α) i=1
Quantile Regression 5 The τth conditional quantile function of Y X = x is g(τ x) = argmin g G ρ τ (y g(x))df A natural estimator of g(τ x) is ĝ(τ x) = argmin g G n i=1 ρ τ (y i g(x i )) with G chosen as a finite dimensional linear space, g(x) = p ϕ j (x)β j. j=1
Choice of Basis 6 There are many possible choices for the basis expansion {ϕ j }. We opt for the (very conventional) cubic B-spline functions: 0 5 10 15 20 Age In R quantile regression models can be estimated with the command fit <- rq(y bs(x,knots=knots),tau = 1:9/10) Similar functionality in SAS is coming real soon now.
Data 7 Longitudinal measurements on height for 2514 Finnish children,
Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births,
Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births, About 20 measurements per child,
Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births, About 20 measurements per child, Two cohorts: 1096 born between 1959-61, 1209 born between 1968-72
Data 7 Longitudinal measurements on height for 2514 Finnish children, 1143 boys, 1162 girls all healthy, full-term, singleton births, About 20 measurements per child, Two cohorts: 1096 born between 1959-61, 1209 born between 1968-72 Sample constitutes 0.5 percent of Finns born in these periods.
8 100 90 Unconditional Reference Quantiles Boys 0 2.5 Years 0.97 0.9 0.75 0.5 0.25 0.1 0.03 Box Cox Parameter Functions λ(t) 2 1.5 1 0.5 Height (cm) 80 70 µ(t) 90 80 70 60 60 50 LMS edf = (7,10,7) LMS edf = (22,25,22) QR edf = 16 σ(t) 0.046 0.044 0.042 0.04 0.038 0.036 0.034 0 0.5 1 1.5 2 2.5 Age (years) 0 0.5 1 1.5 2 2.5 Age (years)
9
10 180 160 Unconditional Reference Quantiles Boys 2 18 Years 0.97 0.9 0.75 0.5 0.25 0.1 0.03 Box Cox Parameter Functions λ(t) 3 2.5 2 1.5 1 0.5 0 µ(t) Height (cm) 140 160 140 120 120 LMS edf = (7,10,7) σ(t) 100 0.055 100 LMS edf = (22,25,22) QR edf = 16 0.05 0.045 0.04 80 5 10 15 Age (years) 5 10 15 Age (years)
11
12 100 90 Unconditional Reference Quantiles Girls 0 2.5 Years 0.97 0.9 0.75 0.5 0.25 0.1 0.03 Box Cox Parameter Functions λ(t) 1.5 1 0.5 0 80 µ(t) 90 Height (cm) 70 80 70 60 60 LMS edf = (7,10,7) σ(t) 0.044 LMS edf = (22,25,22) 0.042 0.04 50 QR edf = 16 0.038 0.036 0.034 0 0.5 1 1.5 2 2.5 Age (years) 0 0.5 1 1.5 2 2.5 Age (years)
13
14 Unconditional Reference Quantiles Girls 2 18 Years 0.97 0.9 0.75 Box Cox Parameter Functions λ(t) 3 160 0.5 0.25 0.1 0.03 2 1 0 Height (cm) 140 120 µ(t) 160 140 120 100 LMS edf = (7,10,7) σ(t) 100 80 LMS edf = (22,25,22) QR edf = 16 0.046 0.044 0.042 0.04 0.038 0.036 5 10 15 Age (years) 2 4 6 8 10 12 14 16 Age (years)
Growth Velocity Curves Boys 0 2.5 years Girls 0 2.5 years τ Boys 2 18 years Girls 2 18 years 15 40 30 20 LMS (7,10,7) LMS (22,25,22) QR 0.1 8 6 4 2 10 0 40 8 30 20 0.25 6 4 2 Velocity (cm/year) 10 40 30 20 0.5 0 8 6 4 2 10 0 40 8 30 20 0.75 6 4 2 10 0 40 8 30 20 0.9 6 4 2 10 0 0.5 1 1.5 2 0.5 1 1.5 2 5 10 15 Age (years) 5 10 15
16 Estimated Age Specific Density Functions Age = 0.5 Age = 1 Age = 1.5 N= 235 N= 204 N= 67 0.15 Boys 0.10 0.05 Data QR LMS N= 215 N= 191 N= 68 0.00 0.15 0.10 Girls 0.05 0.00 65 70 75 70 75 80 85 75 80 85 90 Height (cm)
17
18 Height Density at Age 1 0.15 Boys 0.10 0.05 Pooled Born in 1960 Born in 1970 0.00 0.20 Girls 0.15 0.10 0.05 0.00 70 72 74 76 78 80 82 84 Height(cm)
Conditioning on Prior Growth 19 It is often important to condition not only on age, but also on prior growth and possibly on other covariates. Autoregressive models are natural, but complicated due to the irregular spacing of typical longitudinal measurements. Data: {Y i (t i,j ) : j = 1,..., J i, i = 1,..., n.} Model: Q Yi (t i,j )(τ t i,j, Y i (t i,j 1 ), x i ) = g τ (t i,j ) + [α(τ) + β(τ)(t i,j t i,j 1 )]Y i (t i,j 1 ) + x i γ(τ).
AR Components of the Infant Conditional Growth Model 20 τ Boys Girls ˆα(τ) ˆβ(τ) ˆγ(τ) ˆα(τ) ˆβ(τ) ˆγ(τ) 0.03 0.845 0.147 0.024 0.809 0.135 (0.020) (0.011) (0.011) (0.024) (0.011) 0.1 0.787 (0.020) 0.25 0.725 (0.019) 0.5 0.635 (0.025) 0.75 0.483 (0.029) 0.9 0.422 (0.024) 0.97 0.383 (0.024) 0.159 (0.007) 0.170 (0.006) 0.173 (0.009) 0.187 (0.009) 0.213 (0.016) 0.214 (0.016) 0.036 (0.007) 0.051 (0.009) 0.060 (0.013) 0.063 (0.017) 0.070 (0.017) 0.077 (0.018) 0.757 (0.022) 0.685 (0.021) 0.612 (0.027) 0.457 (0.027) 0.411 (0.030) 0.400 (0.038) 0.153 (0.007) 0.163 (0.006) 0.175 (0.008) 0.183 (0.012) 0.201 (0.015) 0.232 (0.024) 0.042 (0.010) 0.054 (0.009) 0.061 (0.008) 0.070 (0.009) 0.094 (0.015) 0.100 (0.018) 0.086 (0.027)
AR Components of the Childrens Conditional Growth Model 21 τ Boys Girls ˆα(τ) ˆβ(τ) ˆγ(τ) ˆα(τ) ˆβ(τ) ˆγ(τ) 0.03 0.976 0.036 0.011 0.993 0.033 (0.010) (0.002) (0.013) (0.012) (0.002) 0.1 0.980 (0.005) 0.25 0.978 (0.006) 0.5 0.984 (0.004) 0.75 0.990 (0.004) 0.9 0.987 (0.009) 0.97 0.980 (0.014) 0.039 (0.001) 0.042 (0.001) 0.045 (0.001) 0.047 (0.001) 0.049 (0.001) 0.050 (0.002) 0.022 (0.007) 0.021 (0.006) 0.019 (0.004) 0.014 (0.006) 0.012 (0.009) 0.023 (0.015) 0.989 (0.006) 0.986 (0.005) 0.984 (0.007) 0.985 (0.007) 0.984 (0.008) 0.982 (0.013) 0.039 (0.001) 0.042 (0.001) 0.045 (0.001) 0.050 (0.001) 0.052 (0.001) 0.053 (0.001) 0.006 (0.015) 0.008 (0.007) 0.019 (0.006) 0.022 (0.006) 0.016 (0.006) 0.002 (0.012) 0.021 (0.018)
Transformation to Normality 22 A presumed advantage of univariate (age-specific) transformation to normality is that once observations are transformed to univariate Z-scores they are automatically prepared to longitudinal autoregression: Z t = α 0 + α 1 Z t 1 + U t Premise: Marginal Normality Joint Normality
Transformation to Normality 22 A presumed advantage of univariate (age-specific) transformation to normality is that once observations are transformed to univariate Z-scores they are automatically prepared to longitudinal autoregression: Z t = α 0 + α 1 Z t 1 + U t Premise: Marginal Normality Joint Normality Of course we know it isn t true, but we also think we know that counterexamples are pathological, and don t occur in nature.
23 3 2 1 0 1 2 3 75 80 85 90 Boys age 6.5 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 75 80 85 90 Boys age 6 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 1.5 0.5 0.5 1.5 Boys AR(1) residuals Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 80 85 90 95 100 Girls age 6.5 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 80 85 90 95 100 Girls age 6 Theoretical Quantiles Sample Quantiles 3 2 1 0 1 2 3 2.0 1.0 0.0 1.0 Girls AR(1) residuals Theoretical Quantiles Sample Quantiles
24
25 Height (cm) 110 120 130 140 150 Subject 89 Subject 225 0.97 0.9 0.75 0.5 0.25 0.1 0.03 6 7 8 9 10 Age (years)
26 1 Subject 89 Subject 225 1 Quantile 0.8 0.6 0.4 Unconditional Conditional Observed 0.8 0.6 0.4 0.2 0.2 0 120 125 130 135 140 145 Height (cm) 120 125 130 135 140 Height (cm) 0
Conclusions 27 Nonparametric quantile regression using B-splines offers a reasonable alternative to parametric methods for constructing reference growth charts.
Conclusions 27 Nonparametric quantile regression using B-splines offers a reasonable alternative to parametric methods for constructing reference growth charts. The flexibility of quantile regression methods exposes features of the data that are not easily observable with conventional parametric methods. Even for height data.
Conclusions 27 Nonparametric quantile regression using B-splines offers a reasonable alternative to parametric methods for constructing reference growth charts. The flexibility of quantile regression methods exposes features of the data that are not easily observable with conventional parametric methods. Even for height data. Longitudinal data can be easily accomodated into the quantile regression framework by adding covariates, including the use of autoregressive effects for unequally spaced measurements.