Wavelet-Based Functional Mixed Models

1 Wavelet-Based Functional Mixed Models
Jeffrey S. Morris, Department of Biostatistics, The University of Texas MD Anderson Cancer Center
http://biostatistics.mdanderson.org/morris
Rice University /7/008 (slides dated 3//008)

2 Linear Mixed Models: General Linear Model
Suppose we have N data points $y_i$, $i = 1, \dots, N$, that we would like to explain by a set of p explanatory variables $x_{ij}$; $i = 1, \dots, N$; $j = 1, \dots, p$. The $x_{ij}$ may be continuous variables or discrete (dummy) variables. A General Linear Model is given by:
$$y_i = \sum_{j=1}^{p} x_{ij} b_j + e_i$$
In many applications, we assume the errors are independent and identically distributed, so $e_i \sim N(0, \sigma_e^2)$.

3 Linear Mixed Models: General Linear Model
This form is very general, with different specifications of $x_{ij}$ corresponding to various classical statistical methods:
Continuous: linear regression
Dummy variables (2 groups): t-test
Dummy variables (>2 groups): ANOVA
Combination of continuous and dummy variables: analysis of covariance
Note: the $b_j$ are partial effects, i.e. each describes the effect of predictor $x_{ij}$ after adjusting for all other effects in the model.

4 Linear Mixed Models: General Linear Model
This model can be written in matrix form:
$$y = Xb + e, \qquad e \sim N(0, \sigma_e^2 I_N)$$
where y is an N × 1 response vector, X is an N × p design matrix, and b is a p × 1 vector of fixed effects.
Typical goal: estimate and perform inference on the fixed effect vector b.
The form of the design matrix X indicates the type of modeling being done.

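5 Linear Mixed Models: Examples
Simple linear regression. To perform a simple linear regression of Y on x, specify:
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$$
$b_0$ = intercept, $b_1$ = slope.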
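The design matrix on this slide can be sketched in a few lines of plain Python. This is an illustrative example only: the function names and the toy data are assumptions, not part of the talk, and the fit uses the textbook closed-form least squares solution for one predictor.

```python
def design_matrix(x):
    """Rows [1, x_i]: an intercept column plus the single predictor."""
    return [[1.0, xi] for xi in x]


def fit_simple_regression(x, y):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Toy data (hypothetical) lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
X = design_matrix(x)
b0, b1 = fit_simple_regression(x, y)
```

The same X, with extra columns, covers the polynomial and multiple regression cases on the following slides.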

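6 Linear Mixed Models: Examples
Polynomial regression. To perform a polynomial regression of Y on x, specify:
$$X = \begin{pmatrix} 1 & (x_1 - \bar{x}) & (x_1 - \bar{x})^2 & \cdots & (x_1 - \bar{x})^p \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & (x_N - \bar{x}) & (x_N - \bar{x})^2 & \cdots & (x_N - \bar{x})^p \end{pmatrix}, \qquad b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}$$
$b_0$ = intercept, $b_j$ = polynomial level j coefficient.
Note: centering reduces collinearity.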

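7 Linear Mixed Models: Examples
Multiple linear regression. To perform multiple regression of y on $\{x_1, x_2, \dots, x_p\}$:
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Np} \end{pmatrix}, \qquad b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}$$
$b_0$ = intercept; $b_1, b_2, \dots, b_p$ = partial slopes for factors $1, \dots, p$.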

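8 Linear Mixed Models: Examples
T-tests. Define a dummy variable (categorical factor): $x_i = 1$ if group 1, $x_i = -1$ if group 2.
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$$
$b_0$ = intercept (overall mean of y); $b_1$ = main effect for group (0 if no group effect).
Note: the F-test for $H_0: b_1 = 0$ is equivalent to the t-test.
What if there are >2 groups we want to compare? 1-way ANOVA.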

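9 Linear Mixed Models: Examples
ANOVA (cell means model, g groups). Let $x_{ij} = 1$ if subject i is in group j (0 otherwise), with no intercept:
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1g} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Ng} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_g \end{pmatrix}$$
$b_j$: overall mean for group j. $b_j = b_{j'}$ implies groups j and j' share a common mean.
Contrasts can be used to set up comparisons of interest.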

10 Linear Mixed Models: Examples
A vector $c = \{c_1, c_2, \dots, c_g\}$ specifies a contrast
$$L = \sum_{j=1}^{g} c_j b_j \quad \text{if} \quad \sum_{j=1}^{g} c_j = 0.$$
It specifies a linear combination of fixed effects that indicates a certain comparison of interest.
E.g. it can be used with the cell means model to test the difference between groups j and j': $c_j = 1$, $c_{j'} = -1$.
Given a set of l contrasts of interest, we can construct an l × g contrast matrix C.
Main effect (ANOVA): C = […]
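A minimal sketch of the cell means model and a pairwise contrast, assuming 0-based group labels and a small made-up data set. The helper names are hypothetical; the only facts used are that least squares for the cell means model returns the per-group sample means and that a contrast is a linear combination whose coefficients sum to zero.

```python
def cell_means_design(groups, g):
    """N x g indicator matrix: x_ij = 1 if subject i is in group j (0-based labels)."""
    return [[1.0 if gi == j else 0.0 for j in range(g)] for gi in groups]


def cell_means_estimates(groups, y, g):
    """Least squares for the cell means model reduces to per-group sample means."""
    sums = [0.0] * g
    counts = [0] * g
    for gi, yi in zip(groups, y):
        sums[gi] += yi
        counts[gi] += 1
    return [s / c for s, c in zip(sums, counts)]


def contrast(c, b):
    """L = sum_j c_j b_j; the coefficients must sum to zero to form a contrast."""
    assert abs(sum(c)) < 1e-12, "contrast coefficients must sum to 0"
    return sum(cj * bj for cj, bj in zip(c, b))


# Hypothetical data: three groups with two subjects each.
groups = [0, 0, 1, 1, 2, 2]
y = [1.0, 3.0, 4.0, 6.0, 10.0, 12.0]
X = cell_means_design(groups, 3)
b_hat = cell_means_estimates(groups, y, 3)
L = contrast([1.0, -1.0, 0.0], b_hat)  # first group mean minus second group mean
```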

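11 Linear Mixed Models: Examples
Analysis of Covariance (ANCOVA), cell means model (g groups, p covariates).
Let $x_{ij} = 1$ if subject i is in group j (0 otherwise), $j = 1, \dots, g$.
Let $x_{ij}$ = covariate (j − g) for subject i, $j = g+1, \dots, g+p$.
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1g} & x_{1(g+1)} & \cdots & x_{1(g+p)} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Ng} & x_{N(g+1)} & \cdots & x_{N(g+p)} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_g \\ b_{g+1} \\ \vdots \\ b_{g+p} \end{pmatrix}$$
$b_j$, $j = 1, \dots, g$: separate intercepts by group j.
$b_j$, $j = g+1, \dots, g+p$: partial slopes for the p covariates.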

12 Linear Mixed Models: Examples
What if we have multiple categorical factors we would like to consider simultaneously?
2-way ANOVA: includes main effects for factors 1 and 2, plus their 2-way interaction.
Interaction: the effect of factor 1 is different for different levels of factor 2 (and vice versa).
Again, different parameterizations can be used: overparameterized model (messy!); main effect/interaction model; cell means model, with contrasts.
Can also interact continuous factors with categorical factors, e.g. growth curves with intercepts and slopes varying by group.

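13 Linear Mixed Models: Examples
Cell means model ($g_1 \times g_2 = g$ groups). Let $x_{ij} = 1$ if subject i is in group j (0 otherwise), with no intercept:
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1g} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Ng} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_g \end{pmatrix}$$
$b_j$: overall mean for group j. $b_j = b_{j'}$ implies groups j and j' share a common mean.
Contrasts can be used to set up comparisons of interest, e.g. main effects and interactions.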

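14 Linear Mixed Models: Examples
Suppose factor 1 has 2 levels and factor 2 has 3 levels, with $b = [b_{11}, b_{12}, b_{13}, b_{21}, b_{22}, b_{23}]'$, and contrast matrices $C_1 = […]$, $C_2 = […]$, $C_3 = […]$.
$L_1 = C_1 b$: main effect for factor 1.
$L_2 = C_2 b$: main effect for factor 2.
$L_3 = C_3 b$: 2-way interaction effect.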

15 Linear Mixed Models: Examples
General Linear Model:
$$y = Xb + e, \qquad e \sim N(0, \sigma_e^2 I_N)$$
What if the errors are not iid?
Subsampling (multiple randomly sampled observations per subject).
Longitudinal data (multiple observations per subject over time, so observations are ordered).
Solutions: assume a parametric error structure, $e \sim N(0, R)$; or include random effect terms in the model.

16 Linear Mixed Models: Examples
Simple random effects model. Suppose we observe r repeated measurements of response y for each of n subjects (total of N = rn observations).
Let $y = [y_{11}, y_{12}, \dots, y_{1r}, y_{21}, \dots, y_{2r}, \dots, y_{nr}]'$.
Let X be a design matrix for the fixed effects b.
Define an N × n random effects design matrix Z with elements $z_{ij} = 1$ if observation i is from subject j, 0 otherwise.
A random effects model can be written as:
$$y = Xb + Zu + e, \qquad u \sim N(0, \sigma_u^2 I_n), \qquad e \sim N(0, \sigma_e^2 I_N)$$

17 Linear Mixed Models: Examples
$$y = Xb + Zu + e, \qquad u \sim N(0, \sigma_u^2 I_n), \qquad e \sim N(0, \sigma_e^2 I_N)$$
$u_k$ = random effect for subject k: $E(y \mid \text{subject } k) = Xb + u_k$.
$\sigma_u^2$ describes subject-to-subject variability; $\sigma_e^2$ describes variability between observations within a subject.
If we integrate the random effects u out of the model, thus marginalizing over u:
$$y = Xb + e^*, \qquad e^* \sim N(0, \Sigma)$$
where $\Sigma = \sigma_u^2 ZZ' + \sigma_e^2 I_N$ has block compound symmetry structure.

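18 Linear Mixed Models: Examples
Block compound symmetry:
$$\Sigma = \begin{pmatrix} \Sigma_0 & & \\ & \ddots & \\ & & \Sigma_0 \end{pmatrix}, \qquad \Sigma_0 = \begin{pmatrix} \sigma_{e^*}^2 & \rho\sigma_{e^*}^2 & \cdots & \rho\sigma_{e^*}^2 \\ \rho\sigma_{e^*}^2 & \sigma_{e^*}^2 & \cdots & \rho\sigma_{e^*}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma_{e^*}^2 & \rho\sigma_{e^*}^2 & \cdots & \sigma_{e^*}^2 \end{pmatrix}$$
n diagonal blocks of size r × r, each having compound symmetry.
Note: $\sigma_{e^*}^2 = \sigma_e^2 + \sigma_u^2$ is the total variability and $\rho = \sigma_u^2 / (\sigma_e^2 + \sigma_u^2)$ is the intraclass correlation.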
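As a quick numerical check of this structure, the sketch below builds Z for a balanced random-intercept design and forms the marginal covariance directly from its definition. The function names and the variance values (3.0 and 1.0) are illustrative assumptions.

```python
def random_intercept_Z(n, r):
    """N x n indicator matrix (N = n*r): z_ij = 1 if observation i is from subject j."""
    return [[1.0 if i // r == j else 0.0 for j in range(n)] for i in range(n * r)]


def marginal_cov(n, r, sigma_u2, sigma_e2):
    """Sigma = sigma_u2 * Z Z' + sigma_e2 * I_N, computed entry by entry."""
    Z = random_intercept_Z(n, r)
    N = n * r
    Sigma = [[0.0] * N for _ in range(N)]
    for a in range(N):
        for b in range(N):
            zz = sum(Z[a][j] * Z[b][j] for j in range(n))
            Sigma[a][b] = sigma_u2 * zz + (sigma_e2 if a == b else 0.0)
    return Sigma


# Hypothetical values: 2 subjects, 3 observations each.
Sigma = marginal_cov(n=2, r=3, sigma_u2=3.0, sigma_e2=1.0)
total_var = Sigma[0][0]           # sigma_u2 + sigma_e2
icc = Sigma[0][1] / Sigma[0][0]   # intraclass correlation rho
```

Entries linking different subjects are zero, while entries within a subject equal the subject-level variance, which is the block pattern described on the slide.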

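19 Linear Mixed Models: Examples
General Linear Mixed Model:
$$\underset{N \times 1}{y} = \underset{N \times p}{X}\,\underset{p \times 1}{b} + \underset{N \times m}{Z}\,\underset{m \times 1}{u} + \underset{N \times 1}{e}, \qquad u \sim N(0, \underset{m \times m}{P}), \qquad e \sim N(0, \underset{N \times N}{R})$$
$$\mathrm{Var}(y) = \Sigma = ZPZ' + R$$
Most common applications use independent random effects and residual errors, $P = \sigma_u^2 I_m$ and $R = \sigma_e^2 I_N$. Other covariance structures can also be used for P and R, as needed.
Generalized least squares estimator: $\hat{b} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1} y$
Best linear unbiased predictor (BLUP): $\hat{u} = PZ'\Sigma^{-1}(y - X\hat{b})$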
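For the balanced random-intercept special case (X a column of ones, P and R proportional to identity matrices), the GLS and BLUP formulas collapse to simple closed forms: the GLS intercept is the grand mean and each BLUP is a shrunken subject deviation. The sketch below illustrates only that special case; the function name and data are assumptions, and the variance components are treated as known.

```python
def blup_random_intercept(y_by_subject, sigma_u2, sigma_e2):
    """GLS intercept and BLUPs for a balanced random-intercept model.

    y_by_subject: list of equal-length lists, one per subject.
    Variance components are taken as known (in practice they are estimated).
    """
    r = len(y_by_subject[0])
    means = [sum(ys) / r for ys in y_by_subject]
    b_hat = sum(means) / len(means)  # GLS estimate equals the grand mean here
    shrink = r * sigma_u2 / (r * sigma_u2 + sigma_e2)  # linear shrinkage factor
    u_hat = [shrink * (m - b_hat) for m in means]
    return b_hat, u_hat, shrink


# Hypothetical data: 2 subjects, 2 observations each.
b_hat, u_hat, shrink = blup_random_intercept(
    [[1.0, 3.0], [5.0, 7.0]], sigma_u2=1.0, sigma_e2=2.0
)
```

The shrinkage factor moves toward 1 as the subject-level variance grows relative to the residual variance, the same mechanism that reappears for the random effect functions later in the talk.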

20 Linear Mixed Models: Examples
Fitting linear mixed models.
Standard software packages are available for fitting linear mixed models: PROC MIXED (SAS), lme/nlme (S+/R).
These are based on EM or Newton-Raphson algorithms: MLE for fixed effect functions (GLS), REML for covariance parameters.
It is straightforward to perform frequentist inference on fixed effect functions or linear combinations (contrasts) using F tests. It is trickier to test whether a variance component (VC) equals 0, but there is recent work in this area.
The model can also be fit as a Bayesian hierarchical model. If standard improper (uninformative) priors are specified for the fixed effects b and the covariance parameters, it can be shown that the posterior mode Bayesian estimators are equivalent to the frequentist MLEs, and the confidence/credible bands are similar.

21 Functional Data Analysis
Functional Data Analysis (Ramsay & Silverman 1997): methods for data where the ideal units of observation are curves.
Sometimes we distinguish between methods for sparsely sampled functional data (few observations per curve, longitudinal) and functions sampled on a fine grid (many observations per curve, functional). Our main concern here is functional data on a fine grid.
Challenge of functional data analysis: we must simultaneously consider regularization and replication.
Regularization: take advantage of functional structure to borrow strength from adjacent observations within a curve.
Replication: combine information across sampled curves to make inferences on the population from which they came.

22 Functional Mixed Model (FMM)
Idea: relate a functional response to a set of scalar predictors through functional coefficients, while adjusting for possible correlation between functions induced by the design.
Suppose we observe a sample of N curves, $Y_i(t)$, $i = 1, \dots, N$, on a closed interval $\mathcal{T}$:
$$Y_i(t) = \sum_{j=1}^{p} X_{ij} B_j(t) + \sum_{k=1}^{m} Z_{ik} U_k(t) + E_i(t), \qquad U_k(t) \sim GP(0, Q), \qquad E_i(t) \sim GP(0, S)$$
Here $Y_i(t)$ are the response functions, $B_j(t)$ the fixed effect functions, $U_k(t)$ the random effect functions, and $E_i(t)$ the residual error functions.
$B_j(t)$ summarizes the partial effect of $X_j$ on Y(t).
$Q(t_1, t_2)$ and $S(t_1, t_2)$ are covariance surfaces on $\mathcal{T} \times \mathcal{T}$ describing the form of the function-to-function deviations.

23 Discrete Version of FMM
Suppose each observed curve is sampled on a common equally spaced grid of length T:
$$\underset{N \times T}{Y} = \underset{N \times p}{X}\,\underset{p \times T}{B} + \underset{N \times m}{Z}\,\underset{m \times T}{U} + \underset{N \times T}{E}, \qquad U_k \sim MVN(0, Q), \qquad E_i \sim MVN(0, S)$$
Rows of B contain the fixed effect functions on the grid.
Q and S are within-curve covariance matrices (T × T) approximating the covariance surfaces on the grid.
For irregular functional data, Q and S typically contain many nonstationarities, yet their dimension is too high to leave them unstructured.

24 FDA: Functional Mixed Models
Growth curve example. Goal: assess the effect of gene A on tumor growth.
N mice, ½ from knockout and ½ from wild type groups.
Xenograft model: microscopic tumor implanted.
Tumor measurements at T = 6 time points, t = {0, 3, 6, 9, 12, 15}.
$Y_i(t)$ = log(tumor size) profile for animal i:
$$Y_i(t) = \sum_{j=1}^{2} X_{ij} B_j(t) + E_i(t), \qquad E_i(t) \sim N\{0, S(t_1, t_2)\}$$
Cell means model used: $X_{i1} = 1$ if animal i is from the knockout group, 0 otherwise; $X_{i2} = 1$ if animal i is from the wild type group, 0 otherwise.
Test: $H_0: B_1(t) = B_2(t)$.

25 FDA: Functional Mixed Models
Growth curves are (log) linear, so we can represent the functions with a parametric form (lines).
No intercepts (tumor size = 0 at time 0 by design).
$$B_j(t) = b_j t; \qquad E_i(t) = u_i t + e_i; \qquad u_i \sim N(0, \sigma_u^2); \qquad e_i \sim N(0, \sigma_e^2 I_6)$$
$$S = t\,t'\,\sigma_u^2 + \sigma_e^2 I_6$$
e.g. if $\sigma_u = 0.…$, $\sigma_e = …$: $S = …$, $\rho_S = …$

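26 FDA: Functional Mixed Models
Mixed model in matrix form. Let $y = \{y_k, k = 1, \dots, NT\}$ stack all observations. Consistent with $B_j(t) = b_j t$ and $E_i(t) = u_i t + e_i$, the entries of X and Z hold observation times: $X_{k1} = t_k$ if observation k is from a knockout animal, $X_{k2} = t_k$ if observation k is from a wild type animal, and $Z_{kl} = t_k$ if observation k is from animal l (0 otherwise).
$$y = Xb + Zu + e, \qquad u \sim N(0, \sigma_u^2 I_N), \qquad e \sim N(0, \sigma_e^2 I_{NT})$$
$$X = \begin{pmatrix} X_{11} & X_{12} \\ \vdots & \vdots \\ X_{NT,1} & X_{NT,2} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \qquad u = \begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix}$$
The group comparison $H_0: b_1 = b_2$ corresponds to the contrast $C = [1, -1]$.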

27 FDA: Functional Mixed Models
What if there is no acceptable parametric representation for the observed functions?
Nonparametric representations for functions: generalizations of nonparametric regression to the multiple function case. There has been much recent work in this area.
Limitations of previous work:
Most focus on smoothing methods like kernels and splines.
Much is intended for functional data on sparse, not fine, grids.
Most models are not the fully general FMM (arbitrary X matrix, nested random effects, general representations for Q or S).
Many focus on estimation, not inference.
Most of this work uses models that are special cases of the functional mixed model.

28 Applications: SELDI Organ-by-Cell-Line
16 mice had 1 of 2 cancer cell lines injected into 1 of 2 organs (lung or brain).
Cell lines: A375P (human melanoma, low metastatic potential) and PC3MM (human prostate, highly metastatic).
Blood serum was extracted and placed on a SELDI chip.
Run at 2 different laser intensities (low/high).
Total of 32 spectra (observed functions), 2 per mouse.
Observations on an equally spaced grid.

29 Applications: SELDI Organ-by-Cell-Line
Goal: find proteins differentially expressed by host organ site (lung/brain), donor cell line (A375P/PC3MM), and organ-by-cell-line interaction.
Combine information across laser intensities. This requires us to include in the modeling: a functional laser intensity effect, and random effect functions to account for correlation between spectra from the same mouse.

30 Applications: SELDI Organ-by-Cell-Line
MALDI-TOF spectrum: observed function g(t) = intensity of the spectrum at m/z value t.
The intensity at a peak (roughly) estimates the abundance of some protein with molecular weight of t Daltons.

31 FDA: Functional Mixed Model
Let $Y_i(t)$ be SELDI spectrum i:
$$\log_2\{Y_i(t)\} = B_0(t) + \sum_{j=1}^{4} X_{ij} B_j(t) + \sum_{k=1}^{16} Z_{ik} U_k(t) + E_i(t)$$
$X_{i1} = 1$ for lung, −1 for brain. $X_{i2} = 1$ for A375P, −1 for PC3MM. $X_{i3} = X_{i1} X_{i2}$. $X_{i4} = 1$ for low laser intensity, −1 for high.
$B_0(t)$ = overall mean spectrum; $B_1(t)$ = organ main effect function; $B_2(t)$ = cell-line main effect function; $B_3(t)$ = organ × cell-line interaction function; $B_4(t)$ = laser intensity effect function.
$Z_{ik} = 1$ if spectrum i is from mouse k (k = 1, …, 16); $U_k(t)$ is the random effect function for mouse k.

32 Functional Mixed Model (FMM)
Idea: relate a functional response to a set of scalar predictors through functional coefficients, while adjusting for possible correlation between functions induced by the design.
Suppose we observe a sample of N curves, $Y_i(t)$, $i = 1, \dots, N$, on a closed interval $\mathcal{T}$:
$$Y_i(t) = \sum_{j=1}^{p} X_{ij} B_j(t) + \sum_{k=1}^{m} Z_{ik} U_k(t) + E_i(t), \qquad U_k(t) \sim GP(0, Q), \qquad E_i(t) \sim GP(0, S)$$
$B_j(t)$ summarizes the partial effect of $X_j$ on Y(t).
$Q(t_1, t_2)$ and $S(t_1, t_2)$ are covariance surfaces on $\mathcal{T} \times \mathcal{T}$ describing the form of the function-to-function deviations.

33 Wavelet-Based Functional Mixed Models
We would like to fit functional mixed models to data like those in our motivating examples (high dimensional, spiky).
We need a way to build flexibility into Q and S while remaining parsimonious enough to fit to large data sets.
We need a way to accomplish adaptive regularization for $B_j(t)$ and $U_k(t)$.
We need a way to compute this model that is efficient enough to apply to extremely large data sets (100s to 1000s of curves on grids of 10,000 or more).
We want to be able to perform inference, not just estimation.
Key: use wavelet space instead of data space to do our modeling.

34 Wavelet-Based Functional Mixed Models
Idea: represent functions by their wavelet expansion. Basis function approach: $y(t) = \sum_{j,k} d_{jk} \psi_{jk}(t)$.
Benefits of using wavelet bases:
1. Compact support allows efficient representations of local features.
2. Whitening property allows parsimonious yet flexible representations of Q and S.
3. Decomposes the function in both frequency and time domains, which enables a mechanism for adaptive regularization of functions.
4. Orthonormal transformation has a linear representation, and its special structure allows fast calculation of coefficients.

35 WFMM: General Approach
1. Project observed functions Y into wavelet space.
2. Fit FMM in wavelet space (use MCMC to get posterior samples).
3. Project wavelet-space estimates (posterior samples) back to data space.

36 WFMM: General Approach
1. Project observed functions Y into wavelet space.
2. Fit FMM in wavelet space (use MCMC to get posterior samples).
3. Project wavelet-space estimates (posterior samples) back to data space.

37 WFMM: Projecting to Wavelet Space
1. Project observed functions Y to wavelet space.
After choosing (1) a wavelet basis, (2) a boundary correction method, and (3) the number of decomposition levels J, apply the DWT to the rows of Y to get the wavelet coefficients corresponding to each observed function:
$$\underset{N \times T}{D} = \underset{N \times T}{Y}\,\underset{T \times T}{W'}$$
This projects the observed curves into the space spanned by the wavelet bases (rotate your coordinate axes according to the wavelet bases).

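38 WFMM: Projecting to Wavelet Space
The matrix D that results contains the observed wavelet coefficients for our data set:
$$\underset{N \times T}{D} = \begin{pmatrix} c_1 & d_{1,1} & d_{1,2} & \cdots & d_{1,J} \\ c_2 & d_{2,1} & d_{2,2} & \cdots & d_{2,J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_N & d_{N,1} & d_{N,2} & \cdots & d_{N,J} \end{pmatrix}$$
Here $c_i$ holds the scaling coefficients of curve i and $d_{i,j}$ holds its $K_j$ wavelet coefficients at level j.
With $T = 2^J$, Haar wavelets, and full decomposition, we have $K_j = 2^{j-1}$. Otherwise this is not quite true because of adjustments for $T \ne 2^J$ and boundary correction for other wavelet bases with wider support.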
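The coefficient counts for the Haar case can be checked with a small pure-Python transform. This is a sketch of the textbook Haar pyramid algorithm applied to one row of Y, not the talk's software; the function name and the example signal are assumptions, and no boundary correction is needed for Haar.

```python
import math


def haar_dwt(y):
    """Full Haar decomposition of a length-2^J signal.

    Returns [c, d_1, d_2, ..., d_J]: the scaling coefficient followed by the
    detail coefficients ordered from the coarsest level (1 coefficient) to the
    finest level (T/2 coefficients).
    """
    T = len(y)
    assert T > 0 and T & (T - 1) == 0, "length must be a power of 2"
    approx = list(y)
    details = []  # collected finest level first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    coeffs = [approx]                  # scaling coefficient c
    coeffs.extend(reversed(details))   # d_1 (coarsest) ... d_J (finest)
    return coeffs


# Hypothetical curve sampled on a grid of T = 8 = 2^3 points.
y = [4.0, 2.0, 5.0, 5.0, 1.0, 1.0, 3.0, 7.0]
blocks = haar_dwt(y)
sizes = [len(b) for b in blocks]  # scaling block, then K_j = 2^(j-1) for j = 1..3
energy_data = sum(v * v for v in y)
energy_wavelet = sum(v * v for b in blocks for v in b)
```

Because the transform is orthonormal, the sum of squares is the same in data space and wavelet space, which is the "rotation of coordinate axes" described on the previous slide.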

39 WFMM: General Approach
1. Project observed functions Y into wavelet space.
2. Fit FMM in wavelet space (use MCMC to get posterior samples).
3. Project wavelet-space estimates (posterior samples) back to data space.

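40 WFMM: Wavelet Space Model
$$\underset{N \times T}{Y} = \underset{N \times p}{X}\,\underset{p \times T}{B} + \underset{N \times m}{Z}\,\underset{m \times T}{U} + \underset{N \times T}{E}$$
Wavelet representations: $Y = DW$, $B = B^*W$, $U = U^*W$, $E = E^*W$.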

41 WFMM: Wavelet Space Model
Substituting the wavelet representations:
$$DW = X B^* W + Z U^* W + E^* W$$
Wavelet representations: $Y = DW$, $B = B^*W$, $U = U^*W$, $E = E^*W$.

42 WFMM: Wavelet Space Model
Right-multiplying by $W'$:
$$DWW' = X B^* WW' + Z U^* WW' + E^* WW'$$
Wavelet representations: $Y = DW$, $B = B^*W$, $U = U^*W$, $E = E^*W$, with $WW' = I$.

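43 WFMM: Wavelet Space Model
$$\underset{N \times T}{D} = \underset{N \times p}{X}\,\underset{p \times T}{B^*} + \underset{N \times m}{Z}\,\underset{m \times T}{U^*} + \underset{N \times T}{E^*}$$
Wavelet representations: $YW' = D$, $BW' = B^*$, $UW' = U^*$, $EW' = E^*$, with $WW' = I$.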

44 Wavelet Space FMM
$$\underset{N \times T}{D} = \underset{N \times p}{X}\,\underset{p \times T}{B^*} + \underset{N \times m}{Z}\,\underset{m \times T}{U^*} + \underset{N \times T}{E^*}, \qquad U^*_k \sim MVN(0, Q^*), \qquad E^*_i \sim MVN(0, S^*)$$
D: empirical wavelet coefficients for the observed curves. Row i contains the wavelet coefficients for observed curve i. Each column is double-indexed by wavelet scale j and location k.
$B^* = BW'$ and $U^* = UW'$: rows contain the wavelet coefficients for the fixed and random effect functions, respectively.
$E^* = EW'$ is the matrix of wavelet-space residuals.

45 WFMM: Covariance Assumptions
We assume the between-wavelet covariance matrices $Q^*$ and $S^*$ are diagonal ($Q^* = \mathrm{diag}\{q_{jk}\}$, $S^* = \mathrm{diag}\{s_{jk}\}$).
Wavelet coefficients within a given function are modeled as independent. This is heuristically justified by the whitening property of the DWT and is a common working assumption in wavelet regression settings.
It is parsimonious in wavelet space (T parameters), yet leads to a flexible class of nonstationary covariance structures in data space.
Key: we allow variances to vary by scale j and location k.
Other alternatives are possible (parent-sib correlation, block thresholding), but add more computational complexity.
We have found independence to be very flexible: it limits the form of the covariance matrices, but leaves enough parameters and degrees of freedom to capture the key nonstationary features of these matrices that are typical of our data.

46 WFMM: Covariance Assumptions
Whitening: the autocorrelation of wavelet coefficients tends to die away rapidly across k; the $d_{jk}$ are approximately uncorrelated (Johnstone and Silverman 1997).

48 WFMM: Covariance Assumptions
Simulated example.
True mean: line plus peak.
True variance: increasing in t, with extra variance at the peak.
True autocorrelation: strong autocorrelation (0.9) at left, weak autocorrelation (0.…) at right, extra at the peak.

49 WFMM: Covariance Assumptions
Independence in wavelet space accommodates varying degrees of autocorrelation in data space.
Allowing variance components to vary across scale j and location k accommodates nonstationarities.
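A tiny numerical illustration of this point, assuming a 4-point grid, the orthonormal Haar basis, and made-up wavelet-space variances: a diagonal covariance in wavelet space maps to a data-space covariance $S = W' S^* W$ whose variances and autocorrelations differ across the grid.

```python
import math

R2 = 1.0 / math.sqrt(2.0)
# Orthonormal Haar matrix for T = 4. Rows are basis functions, so a curve is
# y = d W and its coefficients are d = y W'.
W = [
    [0.5, 0.5, 0.5, 0.5],    # scaling function
    [0.5, 0.5, -0.5, -0.5],  # coarsest wavelet
    [R2, -R2, 0.0, 0.0],     # fine wavelet, left half of the grid
    [0.0, 0.0, R2, -R2],     # fine wavelet, right half of the grid
]


def data_space_cov(s_star):
    """S = W' diag(s_star) W for independent wavelet coefficients."""
    T = len(W)
    return [
        [sum(W[k][t] * s_star[k] * W[k][u] for k in range(T)) for u in range(T)]
        for t in range(T)
    ]


def corr(S, t, u):
    """Correlation between grid points t and u implied by covariance S."""
    return S[t][u] / math.sqrt(S[t][t] * S[u][u])


# Hypothetical variances: large for the left fine-scale coefficient, small for the right.
S = data_space_cov([1.0, 1.0, 4.0, 0.25])
var_left, var_right = S[0][0], S[2][2]
corr_left, corr_right = corr(S, 0, 1), corr(S, 2, 3)
```

If all four wavelet-space variances were equal, S would be a multiple of the identity; letting them vary by location produces different variances and different neighbor correlations on the left and right of the grid.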

50 WFMM: Covariance Assumptions
Most wavelet regression methods (Fan and Lin 1998; Morris et al. 2003; Abramovich and Angelini 2007; Antoniadis and Sapatinas 2007) only index variances by scale j, but not location k.
This is not flexible enough to capture nonstationary covariance features.
It is an unnecessary restriction in the multiple function case, since replicate functions allow estimation of a separate variance component for each (j,k).

51 WFMM: Wavelet-Space Model
Key benefit of the independence assumption: with $Q^*$, $S^*$ diagonal, the columns of the WFMM are now unlinked and can be modeled separately. We have a set of T regular linear mixed models instead of one HUGE functional mixed model. This is the key to computational feasibility (run time AND memory management).
$$\underset{N \times 1}{d_{jk}} = \underset{N \times p}{X}\,\underset{p \times 1}{B^*_{jk}} + \underset{N \times m}{Z}\,\underset{m \times 1}{u^*_{jk}} + \underset{N \times 1}{e^*_{jk}}, \qquad u^*_{jk} \sim N(0, q_{jk}), \qquad e^*_{jk} \sim N(0, s_{jk})$$

52 WFMM: Prior Assumptions
Another benefit of wavelet-space modeling is that it provides a natural mechanism for inducing adaptive regularization of our estimates of the $B_a(t)$.
To induce nonlinear shrinkage, we assume a spike-slab prior on $B^*_{ajk}$:
$$B^*_{ajk} = \gamma_{ajk}\, N(0, \tau_{aj}) + (1 - \gamma_{ajk})\, \delta_0, \qquad \gamma_{ajk} \sim \mathrm{Bernoulli}(\pi_{aj})$$
This nonlinearly shrinks $B^*_{ajk}$ towards 0, leading to adaptively regularized estimates of $B_a(t)$.
$\tau_{aj}$ and $\pi_{aj}$ are regularization parameters.

53 WFMM: Prior Assumptions
How to choose the smoothing parameters? Not intuitive.
Antoniadis, Sapatinas, and Silverman (1998) suggest a parameterization in which the $\pi_j$ decrease exponentially in j, and relate them theoretically to the underlying Besov space.
Clyde and George (2000) estimated these from the data using local MLE, in empirical Bayes fashion. We follow this approach.
It involves an iterative (EM) algorithm to estimate the proportion of large wavelet coefficients at each level j ($\pi_{aj}$) and their associated variances ($\tau_{aj}$), with a lower bound on $\tau_{aj}$ to avoid excessive linear shrinkage.
This is similar in principle to estimating smoothing parameters from the data in nonparametric regression.

54 WFMM: Prior Assumptions
To complete the specification of the Bayesian model, we need priors on the variance components.
When P = R = I, we only need to worry about $q_{jk}$ and $s_{jk}$.
We use diffuse conjugate Inverse Gamma priors: center on the starting value estimates (MLEs), and set the scale parameter such that the prior has 1/1000 of the information in the data.
These empirical Bayes priors are calculated automatically by the code by default, which is important because it would not be feasible to set them by hand: there are (H+1)T variance components.

55 WFMM: Model Fitting
MCMC to obtain posterior samples of model quantities. Work with the marginal likelihood, with $U^*$ integrated out. Let Ω be a vector containing ALL covariance parameters (i.e. P, R, $Q^*$, $S^*$).
MCMC steps:
1. Sample from $f(B^* \mid D, \Omega)$: mixture of normals and point masses at 0 for each a, j, k.
2. Sample from $f(\Omega \mid D, B^*)$: Metropolis-Hastings steps for each j, k.
3. If desired, sample from $f(U^* \mid D, B^*, \Omega)$: multivariate normals.

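56 WFMM: Model Fitting
Updating fixed effects $B^*$: Gibbs step (conjugate spike-slab mixture) for each $a = 1, \dots, p$; $j = 1, \dots, J$; $k = 1, \dots, K_j$:
$$B^*_{ajk} \mid \cdot \;\sim\; \alpha_{ajk}\, N\!\left(\hat{B}_{ajk,MLE}\, LS_{ajk},\; V_{ajk}\, LS_{ajk}\right) + (1 - \alpha_{ajk})\, I_0$$
$$\hat{B}_{ajk,MLE} = (X_a' \Sigma_{jk}^{-1} X_a)^{-1} X_a' \Sigma_{jk}^{-1} \left(d_{jk} - X_{(-a)} B^*_{(-a)jk}\right), \qquad V_{ajk} = \mathrm{Var}(\hat{B}_{ajk,MLE}) = (X_a' \Sigma_{jk}^{-1} X_a)^{-1}$$
$$\Sigma_{jk} = ZPZ'\, q_{jk} + R\, s_{jk}$$
Linear shrinkage: $LS_{ajk} = T_{ajk} / (T_{ajk} + 1)$, with $T_{ajk} = \tau_{aj} / V_{ajk}$.
Nonlinear shrinkage: $\alpha_{ajk} = \Pr(\gamma_{ajk} = 1 \mid d_{jk}, \cdot)$.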

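57 WFMM: Model Fitting
$$\alpha_{ajk} = \Pr(\gamma_{ajk} = 1 \mid d_{jk}, \cdot) = \frac{O_{ajk}}{O_{ajk} + 1}, \qquad O_{ajk} = \text{posterior odds}$$
$$O_{ajk} = \underbrace{\frac{\pi_{aj}}{1 - \pi_{aj}}}_{\text{prior odds}} \times \underbrace{(1 + T_{ajk})^{-1/2} \exp\!\left\{ \frac{\zeta_{ajk}^2}{2} \cdot \frac{T_{ajk}}{T_{ajk} + 1} \right\}}_{\text{Bayes factor}}$$
$$\zeta_{ajk} = \hat{B}_{ajk,MLE} / \sqrt{V_{ajk}}, \qquad T_{ajk} = \tau_{aj} / V_{ajk}$$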
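The posterior inclusion probability and the resulting shrinkage can be sketched for a single coefficient as follows. This is an illustrative calculation under the spike-slab prior with a normal slab, not the talk's code; the function name and all numeric inputs are assumptions. The odds are computed on the log scale to avoid overflow for very large coefficients.

```python
import math


def spike_slab_update(b_mle, V, tau, pi):
    """Posterior inclusion probability and posterior mean for one coefficient.

    b_mle: unshrunken estimate, V: its variance, tau: slab variance,
    pi: prior probability that the coefficient is nonzero.
    """
    T = tau / V
    ls = T / (T + 1.0)                       # linear shrinkage factor
    zeta = b_mle / math.sqrt(V)              # standardized estimate
    log_odds = (
        math.log(pi) - math.log(1.0 - pi)    # prior odds
        - 0.5 * math.log(1.0 + T)            # Bayes factor, first term
        + 0.5 * zeta * zeta * ls             # Bayes factor, exponent
    )
    if log_odds >= 0.0:
        alpha = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        alpha = e / (1.0 + e)
    post_mean = alpha * ls * b_mle           # nonlinear times linear shrinkage
    return alpha, ls, post_mean


# Hypothetical inputs: one small and one large coefficient, same prior settings.
alpha_small, _, mean_small = spike_slab_update(0.1, V=1.0, tau=10.0, pi=0.1)
alpha_large, _, mean_large = spike_slab_update(5.0, V=1.0, tau=10.0, pi=0.1)
```

The small coefficient is shrunk almost entirely to zero, while the large one is left close to its linear-shrinkage value, which is the adaptive behavior the prior is meant to induce.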

58 to 60 Wavelets: Wavelet Regression
[Three figure slides, one per setting of (σ, τ, π), plotting the shrinkage estimators against $d_{jk}$: $\hat\theta_{jk,linear} = d_{jk}\, SF_j$, $\hat\theta_{jk,nonlinear} = d_{jk}\, P_{jk}$, and $\hat\theta_{jk,shrinkage} = d_{jk}\, SF_j\, P_{jk}$.]

61 WFMM: Model Fitting
Updating the variance components Ω.
We could use a Gibbs step for $q_{jk}$ and $s_{jk}$: $f(\Omega \mid D, B^*, U^*)$.
Problem: as $q_{jk} \to 0$, $U^*_{jk} \to 0$, and the sampler gets stuck in a local mode ("black hole phenomenon").
Solution: marginalize the model by integrating out $U^*$, then update $f(\Omega \mid D, B^*)$, then $f(U^* \mid D, B^*, \Omega)$. This is equivalent to updating $(\Omega, U^*)$ simultaneously, since $f(U^*, \Omega \mid D, B^*) = f(\Omega \mid D, B^*)\, f(U^* \mid D, B^*, \Omega)$.
Another advantage: if inference on the random effects is not needed, then there is no need to update them!
Disadvantage: we cannot update $q_{jk}$ and $s_{jk}$ with simple Gibbs steps, since $f(\Omega \mid D, B^*)$ is not a known distributional form; instead we must use a Metropolis-Hastings step.

62 WFMM: Model Fitting
Metropolis-Hastings for $q_{jk}$, $s_{jk}$.
Use a random-walk proposal: the proposed value for the new MCMC iteration, $q_{jk}^{(1)}$, is centered at the value at the previous iteration, $q_{jk}^{(0)}$, as $N(q_{jk}^{(0)}, \text{propvar}_{jk})$.
The key is an appropriate choice of $\text{propvar}_{jk}$: if too big, the sampler won't move; if too small, it will not explore the posterior (gets stuck in local modes).
This is typically found by trial and error, which is not feasible here!
We estimate $\text{propvar}_{jk}$ as $1.5\,\mathrm{var}(\hat{q}_{jk,MLE})$, which should be slightly more than the variance of the true complete conditional distribution; this works well.
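A generic random-walk Metropolis sampler for a single variance component can be sketched as below. The target here is a deliberately simplified toy model, not the WFMM marginal likelihood: observations d_i ~ N(0, q + s) with s known and an Inverse Gamma prior on q. The function names, prior parameters, data, and proposal variance are all assumptions made for illustration.

```python
import math
import random


def log_post(q, d, s, a=2.0, b=1.0):
    """Log posterior (up to a constant) of q in the toy model d_i ~ N(0, q + s),
    with an Inverse-Gamma(a, b) prior on q. Returns -inf for q <= 0."""
    if q <= 0.0:
        return float("-inf")
    v = q + s
    loglik = -0.5 * len(d) * math.log(v) - 0.5 * sum(x * x for x in d) / v
    logprior = -(a + 1.0) * math.log(q) - b / q
    return loglik + logprior


def rw_metropolis(d, s, q0, propvar, n_iter, seed=0):
    """Random-walk Metropolis: propose N(q_old, propvar), accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    q, lp = q0, log_post(q0, d, s)
    sd = math.sqrt(propvar)
    draws, accepted = [], 0
    for _ in range(n_iter):
        q_new = rng.gauss(q, sd)
        lp_new = log_post(q_new, d, s)
        # Negative proposals have log posterior -inf and are always rejected.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            q, lp = q_new, lp_new
            accepted += 1
        draws.append(q)
    return draws, accepted / n_iter


# Hypothetical data simulated with true q = 2 and known s = 1.
data_rng = random.Random(1)
d = [data_rng.gauss(0.0, math.sqrt(3.0)) for _ in range(50)]
draws, acc_rate = rw_metropolis(d, s=1.0, q0=1.0, propvar=1.0, n_iter=2000)
```

Rerunning with a much larger or much smaller `propvar` and watching `acc_rate` is a simple way to see the trade-off described on the slide.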

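63 WFMM: Model Fitting
Updating the random effects $U^*$: Gibbs step, Normal.
$$f(u^*_{jk} \mid d_{jk}, B^*_{jk}, \Omega) = \mathrm{Normal}(M_{jk}, V_{jk})$$
$$V_{jk} = \left\{ \Psi_{jk}^{-1} + (q_{jk} P)^{-1} \right\}^{-1}, \qquad \Psi_{jk} = \left\{ Z'(s_{jk} R)^{-1} Z \right\}^{-1}$$
$$M_{jk} = V_{jk} \Psi_{jk}^{-1}\, \hat{u}_{jk,MLE} \;\; \text{(linear shrinkage)}, \qquad \hat{u}_{jk,MLE} = \Psi_{jk} Z'(s_{jk} R)^{-1} (d_{jk} - X B^*_{jk})$$
When P = R = I, this step is very quick and trivial.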

64 WFMM: General Approach
1. Project observed functions Y into wavelet space.
2. Fit FMM in wavelet space (use MCMC to get posterior samples).
3. Project wavelet-space estimates (posterior samples) back to data space.

65 WFMM: Back to Data Space
3. Project wavelet-space estimates (posterior samples) back to data space.
Apply the IDWT to posterior samples of $B^*$ to get posterior samples of the fixed effect functions $B_j(t)$ for $j = 1, \dots, p$, on the grid t: $B = B^*W$.
Posterior samples of $U_k(t)$, Q, and S are also available, if desired.

66 WFMM: Back to Data Space
For each fixed effect function $a = 1, \dots, p$ and MCMC sample $g = 1, \dots, G$, construct:
$$B_a^{*(g)} = \left[ B_{a,1,1}^{*(g)}, \dots, B_{a,j,k}^{*(g)}, \dots, B_{a,J,K_J}^{*(g)} \right]$$
Then apply the pyramid-based IDWT to this vector to obtain a posterior sample of $B_a^{(g)}(t)$ on the grid t, since $B_a = B_a^* W$.
In the same way, posterior samples of U(t) can be obtained by applying the IDWT to $U^{*(g)}$, if desired.
You can also compute posterior samples for any contrast involving the fixed effect functions, CB.
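The back-transformation step can be sketched with a pure-Python Haar IDWT applied to each wavelet-space sample. The Haar basis, the function names, and the two "posterior samples" below are assumptions for illustration; the talk's software may use other wavelet bases and boundary handling.

```python
import math


def haar_dwt(y):
    """Flat Haar coefficients [c, d_1, d_2 (2 values), ..., d_J (T/2 values)]."""
    approx, details = list(y), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2.0) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return approx + details


def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild the curve level by level (pyramid algorithm)."""
    approx, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, detail):
            nxt.append((a + d) / math.sqrt(2.0))
            nxt.append((a - d) / math.sqrt(2.0))
        approx = nxt
    return approx


# Two hypothetical wavelet-space posterior samples of one effect function (T = 4).
samples_wavelet = [[2.0, 1.0, 0.0, 0.5], [4.0, -1.0, 0.0, 1.5]]
samples_data = [haar_idwt(s) for s in samples_wavelet]  # one curve per MCMC sample
mean_curve = [sum(col) / len(col) for col in zip(*samples_data)]

# Round trip on an arbitrary curve, to check the two transforms are inverses.
y = [4.0, 2.0, 5.0, 5.0, 1.0, 1.0, 3.0, 7.0]
y_back = haar_idwt(haar_dwt(y))
```

Because the IDWT is linear, averaging the back-transformed samples gives the same curve as back-transforming the averaged coefficients, so posterior means can be computed in either space.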

67 WFMM: Back to Data Space
We can also obtain posterior samples of Q or S from the wavelet-space versions $Q^{*(g)} = \mathrm{diag}\{q_{jk}^{(g)}\}$ and $S^{*(g)} = \mathrm{diag}\{s_{jk}^{(g)}\}$ using the results of Vannucci and Corradi (1999), based on the fact that $Q = W' Q^* W$.
This can be done efficiently, in $O(T^2)$, by applying the 2-d IDWT to $Q^{*(g)}$ and $S^{*(g)}$.
This gives us estimates of the within-curve covariance surfaces. Their diagonals give us estimates of the variance functions.

68 WFMM: Inference
Given posterior samples of all model quantities, we can perform any desired Bayesian inference or prediction:
1. Pointwise posterior credible intervals for functional effects or contrasts.
2. Posterior probabilities for pointwise hypotheses of interest.
3. Bayes factors for functional testing.
4. Can account for multiple testing in identifying significant regions of curves by controlling the expected Bayesian FDR.
5. Can compute posterior predictive distributions, which can be used for model checking or classification.

69 WFMM: Inference
Pointwise inference. Given posterior samples of $B_a(t)$, we can construct posterior mean (shrinkage) estimates for $B_a(t)$, or for any contrast of fixed effect functions CB.
We can compute quantiles of the posterior distribution of $B_a(t)$ for each t, which can be used for pointwise credible intervals:
$$\Pr\{ Q_{a,\alpha/2}(t) < B_a(t) < Q_{a,1-\alpha/2}(t) \mid D \} = 1 - \alpha$$
We can compute the probability of having a certain minimum effect size: $p_a(t) = \Pr\{ |B_a(t)| > \delta \mid D \}$, e.g. on the $\log_2$ scale, δ = 1 corresponds to 2-fold.
This takes both statistical and clinical significance into account.
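These pointwise summaries are simple column-wise operations on the matrix of posterior samples. The sketch below assumes samples are stored as a list of curves (one per MCMC draw); the function names, the interpolation rule used for quantiles, and the synthetic samples are illustrative assumptions.

```python
def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def pointwise_summary(samples, alpha=0.05, delta=1.0):
    """samples[g][l] = g-th posterior draw of B_a at grid point t_l.

    Returns, for each grid point, a tuple
    (posterior mean, lower limit, upper limit, Pr(|B_a(t_l)| > delta)).
    """
    G, T = len(samples), len(samples[0])
    out = []
    for l in range(T):
        col = sorted(s[l] for s in samples)
        mean = sum(col) / G
        lower = quantile(col, alpha / 2.0)
        upper = quantile(col, 1.0 - alpha / 2.0)
        prob = sum(1 for v in col if abs(v) > delta) / G
        out.append((mean, lower, upper, prob))
    return out


# Synthetic "posterior samples" on a 2-point grid: values 0..1 and 2..3.
samples = [[g / 100.0, 2.0 + g / 100.0] for g in range(101)]
summary = pointwise_summary(samples, alpha=0.05, delta=1.0)
```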

70 WFMM: Properties
Draws of spectra from the posterior predictive distribution yield data that look like real MALDI data (3rd column), indicating reasonable model fit.

71 WFMM: Properties
Posterior samples/estimates of the fixed effect functions $B_a(t)$ are adaptively regularized as a result of the shrinkage prior applied to the wavelet coefficients.
Able to preserve dominant spikes in mean curves, if present.
[Figure: (a) School E, log(met) versus time of day.]

73 WFMM: Properties
Posterior samples/estimates of the random effect functions $U_k(t)$ also appear to be adaptively regularized.
Able to preserve spikes in random effect functions as well.
This is important for estimation of the random effect functions AND for valid inference on the fixed effect functions.

74 WFMM: Properties
How can the random effects be adaptively regularized when they have a Gaussian prior (with linear shrinkage), $U^*_{jk} \sim N(0, q_{jk})$?
Note that each wavelet coefficient has its own variance component $q_{jk}$, which determines the amount of shrinkage from $\hat{u}_{jk,MLE}$ towards 0 (the smaller q is relative to s, the more shrinkage towards 0).
Since the random effect coefficients at (j,k) corresponding to strong signals in the random effect functions will tend to be larger, $q_{jk}$ will also tend to be larger, leading to less shrinkage. Unimportant (j,k) will tend to have small $q_{jk}$, and thus more shrinkage.
Although linear for each wavelet coefficient, the shrinkage is nonlinear when viewed for the entire set of wavelet coefficients together.
This same dynamic allows very smooth random effect estimates when the data support this notion.

75 WFMM: Properties
While adaptive to irregularity, this framework can also yield relatively smooth effect functions when the data support smooth representations.

76 WFMM: Inference
The multiple testing problem is inherent in pointwise inference: T positions, T tests.
One solution: perform a functional test of $H_0: B_a(t) \equiv 0$. This is a popular approach for some researchers (FANOVA). It is not as interesting to me: the key is for which t $B_a(t)$ differs from zero, and by how much. We can do functional testing in the WFMM using Bayes factors.
Another solution: apply some sort of multiple testing adjustment to the inference. Here we will look at applying the False Discovery Rate (FDR) ideas popular in microarrays to this pointwise functional setting.

77 WFMM: Inference
Adjusting for multiple testing.
Classic approach: Bonferroni. With T tests, use α/T as the significance level. This controls the experiment-wise error rate, Pr(at least one false positive) < α.
This is too conservative for some fields (e.g. microarrays), so an alternative criterion was devised.
False Discovery Rate (FDR), Benjamini and Hochberg (1995): controls the proportion of discoveries that are false to be no more than α.
             Decision H0   Decision H1
Truth H0          A             C
Truth H1          B             D
$$FDR = \frac{C}{C + D}$$

78 WFMM: Inference
Many procedures have been developed for controlling FDR: Benjamini and Hochberg (1995), Yekutieli and Benjamini (1999), Benjamini and Liu (1999), Storey (2002), Storey (2003), Genovese and Wasserman (2002), Ishwaran and Rao (2003), Pounds and Morris (2003), Efron (2004), Newton (2004), Pounds and Cheng (2004).
Most methods follow one of three approaches:
1. Fit a mixture model on the test statistics (known distribution under $H_0$).
2. Fit a mixture model on the p-values {U(0,1) under $H_0$}.
3. Fit a Bayesian mixture model, with a point mass prior on $H_0$.
Methods can either set a limit on the global FDR (α), or on the local FDR for each statistic i (Storey 2003): $q_i = \min \Pr(H_0 \text{ true for gene } i \mid \text{gene } i \text{ in RR})$.
Note: the p-value is $\Pr(\text{gene } i \text{ in RR} \mid H_0 \text{ true for statistic } i)$.

79 WFMM: Inference
Global: α = C/(C+D). Local: q(pv) = c/(c+d).
[Figure: regions A, B, C, D and local slices c, d plotted against the p-value pv.]

80 WFMM: Inference
Bayesian FDR from posterior probabilities.
Posterior probabilities of effect sizes: $p_i = \Pr\{ |B_i| < \delta \mid Y \}$, $i = 1, \dots, T$.
If we define a discovery to be an effect size of at least δ ($H_{0i}: |B_i| < \delta$), then $p_i$ is a Bayesian local FDR.
To control the global expected Bayesian FDR:
1. Sort the $p_i$ in ascending order: $\{p_{(i)}, i = 1, \dots, T\}$.
2. Identify the cutpoint $\phi_\alpha$ on the posterior probabilities that controls the expected Bayesian FDR to be α: $\phi_\alpha = p_{(\lambda)}$, where $\lambda = \max\{ i : \frac{1}{i} \sum_{l=1}^{i} p_{(l)} \le \alpha \}$.
3. Flag the set of statistics $\{ i : p_i \le \phi_\alpha \}$ as significant. (According to the model, we expect only a proportion α of these to be false positives.)
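The three steps above can be sketched directly. The function name and the example probabilities are assumptions; the input is the vector of posterior null probabilities $p_i$ as defined on this slide.

```python
def bayesian_fdr_cutoff(p_null, alpha):
    """Cutpoint controlling the expected Bayesian FDR at level alpha.

    p_null[i] = Pr(|B_i| < delta | Y), the Bayesian local FDR at location i.
    Returns (phi, flagged): the cutpoint and the indices flagged as significant,
    or (None, []) when even the smallest p_null exceeds alpha.
    """
    order = sorted(p_null)
    running, lam = 0.0, 0
    for i, p in enumerate(order, start=1):
        running += p
        if running / i <= alpha:  # average local FDR among the i smallest
            lam = i
    if lam == 0:
        return None, []
    phi = order[lam - 1]
    flagged = [i for i, p in enumerate(p_null) if p <= phi]
    return phi, flagged


# Hypothetical posterior null probabilities at five grid locations.
p_null = [0.01, 0.90, 0.02, 0.30, 0.05]
phi, flagged = bayesian_fdr_cutoff(p_null, alpha=0.10)
avg_fdr = sum(p_null[i] for i in flagged) / len(flagged)
```

Note that a location with a fairly large individual probability (0.30 here) can still be flagged, because the criterion controls the average over the flagged set rather than each location separately.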

81 WFMM: Inference
Notes.
This approach takes both statistical and practical significance into account when declaring differences significant.
Given the choice of cutpoint on the posterior probabilities used to flag significant regions, we can also estimate the false negative rate (FNR), sensitivity, and specificity for declaring flagged regions significantly different.

82 WFMM: Inference
FDR for pointwise functional inference. Substituting Lebesgue measure for counting measure, we can apply FDR ideas to pointwise inference.
From the MCMC samples, we have the quantities $p_{al} = \Pr\{ |B_a(t_l)| > \delta \mid Y \}$ for $l = 1, \dots, T$.
Using $H_{0t}: |B_a(t)| < \delta$, $1 - p_{al}$ is a Bayesian local FDR for fixed effect function a at position $t_l$ within the curve.
As described above, we can find a cutpoint $\phi_\alpha$ on these local FDR values that controls the global average Bayesian FDR below α:
$$\alpha > \frac{\left| \{ t : t \text{ flagged falsely}, |B_a(t)| < \delta \} \right|}{\left| \{ t : t \text{ flagged as "significant"} \} \right|}$$

83 Applications: SELDI Organ-by-Cell-Line
Let $Y_i(t)$ be SELDI spectrum i:
$$\log_2\{Y_i(t)\} = \sum_{j=1}^{5} X_{ij} B_j(t) + \sum_{k=1}^{16} Z_{ik} U_k(t) + E_i(t)$$
$X_{i1} = 1$ if lung/A375P, 0 otherwise. $X_{i2} = 1$ if brain/A375P, 0 otherwise. $X_{i3} = 1$ if lung/PC3MM, 0 otherwise. $X_{i4} = 1$ if brain/PC3MM, 0 otherwise. $X_{i5} = 1$ if high laser intensity, −1 if low.
$B_j(t)$ = overall mean spectrum for treatment group j = 1, 2, 3, 4. $B_5(t)$ = laser intensity effect function.
$Z_{ik} = 1$ if spectrum i is from mouse k (k = 1, …, 16); $U_k(t)$ is the random effect function for mouse k.

87 Applications: SELDI Organ-by-Cell-Line
Using α = 0.05 and δ = 1 (2-fold expression on the log2 scale), we flag a number of spectral regions.

89 Applications: SELDI Organ-by-Cell-Line
3900 D (~00-fold) (CGRP-II): dilates blood vessels in brain.
760 D (~5-fold) (neurogranin): active in synaptic modeling in brain (not detected as peak).

91 Applications: SELDI Organ-by-Cell-Line
Inclusion of the nonparametric functional laser intensity effect is able to adjust for systematic differences in the x and y axes between laser intensity scans.

92 Conclusion
Presented a unified modeling approach for FDA.
Adaptive enough to handle irregularities in both mean structures and random effects (covariances).
The method is based on mixed models and is FLEXIBLE: it accommodates a wide range of experimental designs and addresses a large number of research questions.
Posterior samples allow Bayesian inference and prediction: posterior credible intervals, pointwise or joint; predictive distributions for future sampled curves; predictive probabilities for classification of new curves. Bayesian functional inference can be done via Bayes factors.
Since a unified modeling approach is used, all sources of variability in the model are propagated throughout inference.

93 Conclusion
The approach is Bayesian. The only informative priors to elicit are the regularization parameters, which can be estimated from the data using empirical Bayes.
Developed general-use code that is reasonably fast and straightforward to use: the minimum information to specify is the Y, X, and Z matrices.
Can deal with missing data, i.e. partially observed functions.
The method has been generalized to higher dimensional functions, e.g. image data and space/time data.
The Gaussian/independence assumptions can be relaxed to robustify the modeling.

94 Acknowledgements
Work presented here is from 3 papers:
Wavelet-Based Functional Mixed Models (2006). Jeffrey S. Morris and Raymond J. Carroll, JRSS-B, 68():
Using Wavelet-Based Functional Mixed Models to Characterize Population Heterogeneity in Accelerometer Profiles: A Case Study (2006). Jeffrey S. Morris, Cassandra Arroyo, Brent Coull, Louise Ryan, Richard Herrick, and Steve Gortmaker, JASA, 0(4):
Bayesian Analysis of Mass Spectrometry Proteomics Data using Wavelet Based Functional Mixed Models (2007). Jeffrey S. Morris, Philip J. Brown, Richard Herrick, Keith A. Baggerly, and Kevin R. Coombes, Biometrics, online.
Supported by NIH Grant R0 CA07304.
Computer code/papers on web at

95 Acknowledgements
Statistical collaborators: Raymond J. Carroll, Philip J. Brown, Louise M. Ryan, Keith A. Baggerly, Kevin R. Coombes, Marina Vannucci, Brent A. Coull, Elizabeth J. Malloy, Cassandra Arroyo, Veera Baladandayuthapani, Richard C. Herrick (computing).
Biomedical collaborators: Joanne Lupton, Meeyoung Hong, Nancy Turner, Steve Gortmaker, Stanley Hamilton, James Abbruzzesse, Ryuji Kobayashi, John Koomen, Nancy Shih, Howard Gutstein, Josh Fidler, Donghui Li.
