Simultaneous Confidence Bands for the Coefficient Function in Functional Regression

Similar documents
Regression When the Predictors Are Images

SUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING

Function-on-Scalar Regression with the refund Package

Generalized Additive Models

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K.

Model checking overview. Checking & Selecting GAMs. Residual checking. Distribution checking

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Analysis Methods for Supersaturated Design: Some Comparisons

Outline of GLMs. Definitions

Generalized Linear Models

mgcv: GAMs in R Simon Wood Mathematical Sciences, University of Bath, U.K.

Semiparametric Methods for Mapping Brain Development

Exam Applied Statistical Regression. Good Luck!

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Linear Regression Models P8111

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Generalized linear models

Resampling-Based Information Criteria for Adaptive Linear Model Selection

Theorems. Least squares regression

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Experimental Design and Data Analysis for Biologists

Lawrence D. Brown* and Daniel McCarthy*

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

Generalized Linear Models

Inversion Base Height. Daggot Pressure Gradient Visibility (miles)

Nonparametric Small Area Estimation Using Penalized Spline Regression

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation

Interaction effects for continuous predictors in regression modeling

Statistical Distribution Assumptions of General Linear Models

Binary choice 3.3 Maximum likelihood estimation

Basis Penalty Smoothers. Simon Wood Mathematical Sciences, University of Bath, U.K.

Geographically Weighted Regression as a Statistical Model

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Proteomics and Variable Selection

Simple logistic regression

Statistics 262: Intermediate Biostatistics Model selection

Covariate Balancing Propensity Score for General Treatment Regimes

ISyE 691 Data mining and analytics

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Functional SVD for Big Data

Jianhua Z. Huang, Haipeng Shen, Andreas Buja

Regularization Methods for Additive Models

Using prior knowledge in frequentist tests

APTS course: 20th August 24th August 2018

Multinomial functional regression with application to lameness detection for horses

Semiparametric Generalized Linear Models

Regression, Ridge Regression, Lasso

Marginal Screening and Post-Selection Inference

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Classification. Chapter Introduction. 6.2 The Bayes classifier

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Resampling-Based Information Criteria for Best- Subset Regression

A Confidence Region Approach to Tuning for Variable Selection

Neural networks (not in book)

A Modern Look at Classical Multivariate Techniques

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

Generalized additive modelling of hydrological sample extremes

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

Remedial Measures for Multiple Linear Regression Models

Linear Models in Machine Learning

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6

Statistical Machine Learning I

Semi-Nonparametric Inferences for Massive Data

Modelling Survival Data using Generalized Additive Models with Flexible Link

Modelling with smooth functions. Simon Wood University of Bath, EPSRC funded

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

COMPARING PARAMETRIC AND SEMIPARAMETRIC ERROR CORRECTION MODELS FOR ESTIMATION OF LONG RUN EQUILIBRIUM BETWEEN EXPORTS AND IMPORTS

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Resampling-based information criteria for best-subset regression

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Model Selection. Frank Wood. December 10, 2009

Chapter 10. Additive Models, Trees, and Related Methods

Open Problems in Mixed Models

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines

Graphical Presentation of a Nonparametric Regression with Bootstrapped Confidence Intervals

Akaike Information Criterion

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Bayesian non-parametric model to longitudinally predict churn

MS&E 226: Small Data

Niche Modeling. STAMPS - MBL Course Woods Hole, MA - August 9, 2016

Sequential Monte Carlo Methods for Bayesian Model Selection in Positron Emission Tomography

Checking, Selecting & Predicting with GAMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

Lecture 3: Statistical Decision Theory (Part II)

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Generalized Linear Models. Kurt Hornik

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

Multivariate Bernoulli Distribution 1

Markov Chain Monte Carlo methods

Semi-parametric estimation of non-stationary Pickands functions

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Towards a Regression using Tensors

Exercise 5.4 Solution

Inference For High Dimensional M-estimates. Fixed Design Results

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Transcription:

University of Haifa From the SelectedWorks of Philip T. Reiss August 7, 2008 Simultaneous Confidence Bands for the Coefficient Function in Functional Regression Philip T. Reiss, New York University Available at: https://works.bepress.com/phil_reiss/5/

Simultaneous Confidence Bands for the Coefficient Function in Functional Regression Philip T. Reiss New York University phil.reiss@nyumc.org Joint Statistical Meetings Denver, Colorado August 7, 2008 Joint work with Todd Ogden, Columbia Research supported in part by grant number 5F31MH073379-02, NIMH

A motivating example: Serotonin receptors in depression Major depressive disorder affects 9.5% of the U.S. population each year. Serotonin (5-HT), a neurotransmitter, is believed to play an important role in the disorder, mediated in part by distribution of 5-HT 1A receptors in the brain. 5-HT 1A receptor binding potential (BP), a measure of the receptors availability, can be mapped at each point of the brain via PET imaging. Question: How can we relate these BP maps to a depression-related outcome such as Hamilton Depression Score (HAM-D)? 2

Idea: regress HAM-D (scalar) on BP (image) y i = α + s T i f + ε i 3

Functional principal component regression We model a vector y of n scalar responses as y = Xα + Sf + ε where X is an n p covariate matrix S is an n N matrix: each row represents a signal predictor defined at points v 1,..., v N, and each column has mean zero ε denotes iid errors Since n N, Reiss and Ogden (2007) propose a double restriction of the coefficient function: assume f = BV q β where B is an N K matrix whose columns form a radial B-spline basis V q is a K q matrix whose columns are the leading columns of V, given the singular value decomposition SB = UDV T. In other words, the (second) design matrix in the restricted model y = Xα + SBV q β + ε comprises the first q principal components of SB. 4

The coefficient function estimate ˆf = BV q ˆβ is obtained by taking ( ˆα, ˆβ) which minimize the penalized least squares criterion y Xα SBV q β 2 + λβ T V T q P V q β, where P is chosen so that Z Z " «β T V T 2 2 «f 2 2 f q P V q β + 2 + x 1 x 2 x 2 1 2 f x 2 2 «2 # dx 1 dx 2, a measure of the roughness of f. The choice of λ governs the tradeoff between fidelity to the data and smoothness of the coefficient function (higher λ smoother). Automatic criteria for choosing λ are generalized cross-validation (Craven and Wahba, 1979) and restricted maximum likelihood (Ruppert, Wand, and Carroll, 2003). 5

Why not just use SPM- or FSL-type modeling? Standard mass-univariate approach regresses s on y, separately at each voxel. 1. If s is believed to cause y (e.g., brain chemistry psychiatric state), our model may be more appropriate. But if not (e.g., y=gender), mass-univariate modeling makes more sense. 2. Regressing y on s produces subject-specific predictions. 6

Extension to generalized linear models Functional principal component regression can be extended to generalized linear models g(µ) (g(µ 1 )... g(µ n )) T = Xα + Sf where g is an appropriate link function. Again we apply the restriction f = BV q β to obtain, e.g., the logistic model ( ) exp[(xα + SBV q β) i ] y i Bernoulli. 1 + exp[(xα + SBV q β) i ] 7

In the linear case we have a closed-form solution 0 1 2 @ ˆαˆβ A = 4 XT X X T Z Z T X Z T Z + λv T q P V q where Z = SBV q. 3 1 0 5 @ XT y Z T y 1 A, For GLMs we use penalized iteratively reweighted least squares: 0 1 2 3 @ ˆα(k) ˆβ (k) A = 4 XT W X X T 1 0 W Z 5 @ XT W z Z T W X Z T W Z + λv T q P V q Z T W z 1 A, where W is the weight matrix and z is the working dependent variable vector, both defined as for unpenalized GLMs. The penalized IRLS procedure can often fail to converge, and may be numerically unstable even when it does converge due to numerical rank deficiency of the design matrix. Fortunately, the same problems have been largely solved for fitting of generalized additive models (GAMs) by penalized IRLS (Wood, 2004). The R package mgcv (Wood, 2006), used mainly for GAMs, can also fit functional regression models such as FPCR (as of version 1.4.0). 8

Basic idea: Nonparametric bootstrap-based simultaneous confidence bands for f 1. Draw B random samples with replacement of n data points (y, x, s ) from the n observations. 2. Obtain estimates ˆf 1,..., ˆf B of the coefficient function. 3. Use these to form (say) 95% simultaneous confidence bands ( ˆf (L), ˆf (U) ) for f. 4. For all v such that ˆf (L) (v) > 0 the coefficient function is declared significantly positive at v. 9

Oversimplified example -2-1 0 1 2 3 10

Is this really a form of hypothesis testing? A more standard form of simultaneous inference for function estimates is to form null bands (simulated based on the null hypothesis f = 0) and see whether an observed function estimate stays within them. But Buja and Rolke (2007) argue for extending the concept of adjusted p-values to a general class of resampling-based bands, including simultaneous confidence bands of the kind considered here. 11

In reality, the function estimates cross over each other, so choosing the 95% most central ones is more complicated. Mandel and Betensky (2008): Order the B bootstrap estimates ˆf (1) (v)... ˆf (B) (v) at each point v, then define (for k 1) E(k) = Q v [ ˆf (k) (v), ˆf (B+1 k) (v)], the Cartesian product of symmetric 100(1 2k )% pointwise CIs at each point. E.g., for k = 2: B+1 12

Letting E b (k) be the envelope formed by all except the bth resample, the non-coverage rate of E(k) (i.e., 1 minus the simultaneous coverage rate) is estimated as the proportion of the ˆf b which exit E b(k) at some point. This can be calculated as the proportion of b {1,..., B} such that, for some v, the rank of ˆf b (v) among ˆf 1 (v),..., ˆf B (v) is k or B + 1 k. Problem: For high-dimensional, noisy function estimates, even E(1) may have less than 95% simultaneous coverage! 13

To surmount this difficulty we propose to stretch the bands by a constant factor. Define E c = Y v [ ˆf(v) + c{ ˆf (1) (v) ˆf(v)}, ˆf(v) + c{ ˆf (B) (v) ˆf(v)}]. The modified confidence band is then E c α, where c α > 1 is the smallest constant such that at most 100α% of the ˆf b exit Ec α b. 14

Another concern: smoothing parameter selection Automatic smoothing parameter selection methods tend to undersmooth when applied to case-resampled data sets. This can result in confidence bands that are too wide. To correct for this, we have developed a modified generalized cross-validation criterion accounting for replicate observations, and an iterative algorithm to maximize this criterion using standard software by multiplying the degrees of freedom by a data-dependent factor. For simulations such as those reported here, it seems best just to multiply the df by 1.3. 15

Simulation study To test the performance of our simultaneous inference procedure for logistic functional principal component regression, we used maps of binding potential of 5-HT 1A receptors obtained by Parsey et al. (2006) from 27 subjects with major depressive disorder and 41 controls. We simulated binary outcomes " # exp(s T y i Bernoulli i f) 1 + exp(s T i f) for i = 1,..., 68, where s i denotes subject i s binding potential map (1 slice, 5778 voxels), and f = kf 0. Here f 0 is the artificial coefficient function on the next slide, and k is chosen to attain specified values of the coefficient of determination defined for logistic regression as RL 2 = log(l 0) log(l M ), log(l 0 ) where L M is the likelihood of the given model while L 0 is the likelihood of the model containing only an intercept (Menard, 2000). 16

0.5 0 0.5 1 17

Simulation study, continued For RL 2 =.5,.6,.7,.8,.9, we simulated 20 sets of 68 binary outcomes» exp(s T y i Bernoulli i f) 1 + exp(s T i f) with the true coefficient function f = kf 0 attaining the given R 2 L value. For each set of outcomes, we estimated f by logistic functional principal component regression with 35 PCs, and formed simultaneous confidence bands from 1999 bootstrap resamples. Smoothing parameter chosen by approximate corrected AIC (Hurvich, Simonoff, and Tsai, 1998), with additional correction for bootstrapping. Using these bands, we determined how many times (out of 20) each voxel was significantly positive (or negative). 18

R^2 = 0.5 R^2 = 0.6 R^2 = 0.7 R^2 = 0.8 R^2 = 0.9 Color scale: Times (out of 20) found signific 0 5 10 15 20 19

Discussion Regressing scalar outcomes on entire images is a very challenging problem, but our method seems to do well at detecting salient (f 0) regions. Extension to 3D images will require tackling several practical issues. Another extension: locally adaptive smoothing parameter. Ongoing work (Todd Ogden and Yihong Zhao) uses wavelets rather than splines, and seeks a sparse representation of the coefficient function via thresholding. 20

Thank you! 21

References [1] Buja, A., and Rolke, W. (2007). Calibration for simultaneity: (re)sampling methods for simultaneous inference with applications to function estimation and functional data. Under revision. [2] Craven, P., and Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31, 377 403. [3] Hurvich, C. M., Simonoff, J. S., and Tsai, C.-L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B 60, 271 293. [4] Mandel, M., and Betensky, R. A. (2008). Simultaneous confidence intervals based on the percentile bootstrap approach. Computational Statistics and Data Analysis 52, 2158 2165. [5] Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician 54, 17 24. [6] Parsey, R. V., Oquendo, M. A., Ogden, R. T., Olvet, D. M., Simpson, N., Huang, Y., Van Heertum, R. L., Arango, V., and Mann, J. J. (2006). Altered serotonin 1A binding in major depression: A [carbonyl-c-11]way100635 positron emission tomography study. Biological Psychiatry 59, 106 113. [7] Reiss, P. T., and Ogden, R. T. (2007). Functional principal component regression and functional partial least squares. Journal of the American Statistical Association 102, 984 996. [8] Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge and New York: Cambridge University Press. [9] Wood, S. N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association 99, 673 686. [10] Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Boca Raton, FL: Chapman and Hall/CRC. 22