Bayesian Hierarchical Self Modeling Warping Regression

Donatello Telesca (1,3), Lurdes Y.T. Inoue (2,3)

Technical Report No. 510
Department of Statistics, University of Washington

Author's Footnote
1 University of Washington, Department of Statistics. Corresponding author (telesca@stat.washington.edu).
2 University of Washington, Department of Biostatistics.
3 Fred Hutchinson Cancer Research Center.

January 25,

Abstract

Functional data often exhibit a common shape but also variations in amplitude and phase across curves. The analysis often proceeds by synchronizing the data through curve registration. In this paper we propose a Bayesian hierarchical model for curve registration. Our hierarchical model provides a formal account of amplitude and phase variability while borrowing strength from the data across curves in the estimation of the model parameters. We discuss extensions of the model that use penalized B-splines to represent the shape and time transformation functions and that allow random image sets in the time transformation. We apply our model to simulated data as well as to two data sets. In particular, we illustrate the use of our model in a non-standard analysis aimed at investigating regulatory networks in time-course microarray data.

Keywords: Bayesian Hierarchical Model, Self Modeling Regression, Curve Registration, Splines, Markov Chain Monte Carlo (MCMC), Time Course Microarray Experiments.

1 Introduction

Functional data analysis is concerned with the analysis of random curves, known as functional data, which often exhibit a common pattern but also variations in amplitude and phase across curves. The monographs by Ramsay and Silverman (1997, 2002) provide a broad range of examples of functional data as well as methods for analysis.

In this paper we analyze two sets of functional data shown in Figure 1. Panel (a) shows a sample of height velocity curves for 39 boys estimated from individual growth curves. We observe an overall deceleration trend in growth from infancy to adulthood with a few acceleration-deceleration pulses in velocity. In particular, the most prominent velocity pulse corresponds to the pubertal spurt. The goal of the analysis lies in modeling both the amplitude and timing of features in the individual curves to estimate the mean (velocity) curve.

Panel (b) shows a sample of 5 time-course mRNA gene expression measurements in the yeast cell cycle experiment described by Spellman et al. (1998). Time-course gene expression microarray experiments are thought to provide insights into the functional state of an organism by mapping the activity of the entire genome as a biological process that unfolds over time. One of the goals of analysis is to learn about network relationships between genes. In this sense, we are specifically interested in modeling the timing of individual expression profile features (such as maxima, minima, plateaus, inflections), as those may be suggestive of gene regulatory relationships of activation, inhibition and co-expression.

An important aspect of functional data is that time variation explains a large portion of the variability in the data. For example, even though children experience a similar sequence of hormonal events affecting growth, such events do not occur at the same rate or time in all children. Such time variation is accounted for with curve registration methods for functional data.

Specifically, curve registration, also known as curve alignment in biology or time warping in the engineering literature, refers to a class of methods which consist of finding, for each observed curve, a warping function which synchronizes all curves. Oftentimes, after aligning curves, one estimates an average curve, a process called structural averaging by Kneip and Gasser (1988, 1992).

Several time warping methods have been devised to date. In the engineering literature, Sakoe and Chiba (1978) developed a registration technique called dynamic time warping to align two curves with different dynamics and applied it to speech analysis and recognition. Under this method, dynamic programming is used to optimize the alignment between two curves through the minimization of a cost function. Wang and Gasser (1997, 1999) introduced the technique to the statistical literature and provided large sample properties of the time transformation estimators. In particular, they proposed an alternative cost function which utilized amplitude and derivatives of the curves, and extended dynamic time warping to align more than two curves. Central to their development, however, was the assumption that the curves were smooth.

Gasser and Kneip (1995) proposed the landmark registration method, which consists of identifying the timing of certain features (landmarks) in the curves, which are then aligned so that they occur at the same transformed times. Alternatively, Silverman (1995) developed a method for curve registration which did not require landmarks. Optimizing a fitting criterion, Silverman estimated time shift transformations. The method was later extended by Ramsay and Li (1998) to a continuous monotone transformation family and by Kneip et al. (2000), who proposed curve registration with locally estimated monotone time transformations.

More recently, some authors have explored curve registration using SElf MOdeling Regression (SEMOR) methods. In their classical version, SEMOR methods by Lawton, Sylvester and Maggio (1972) and Kneip and Gasser (1988) are semi-parametric models in which the subject-specific regression function is a parametric transformation of a common smooth regression function.

Specifically, let $Y_i(t)$ denote the observed curve for subject $i$ over the sampling times $t = (t_1, \ldots, t_n)'$. The general SEMOR model can be expressed as

$$Y_i(t) = m_i(t) + e_i, \quad \text{with } m_i(t) = g(m, \theta_i)(t),$$

where $m$ is the common shape function, $g$ is the transformation that generates the individual regression functions and $e_i$ is the vector of random errors. The name self modeling comes from the fact that in SEMOR both the shape function $m$ and the model parameters $\theta$ are estimated from the data.

A common class of SEMOR models, with shape-invariant transformations of both the x and y axes, has

$$m_i(t) = \alpha_i + \beta_i\, m(\gamma_i t + \delta_i)$$

and is referred to as the Shape Invariant Model (SIM) (Lawton, Sylvester and Maggio, 1972; Kneip and Gasser, 1988). Ronn (2001), considering a version of the above SIM in which $m_i(t) = m(t + \delta_i)$ and the shifts $\delta_i$ are random effects, proposed nonparametric maximum likelihood registration of random time-shifted curves. Brumback and Lindstrom (2004) built on the SIM, replacing the linear time transformation function to allow for more flexible, monotone random time transformations. Specifically, in their formulation, curve-specific amplitude parameters as well as curve-specific time transformations are assumed random. Moreover, the time transformation and the common shape function are modeled with B-splines. With a similar idea towards developing flexible warping functions, Gervini and Gasser (2004) proposed a self-modeling warping function with the components of the warping function estimated from the data using B-spline basis functions. However, in their formulation, the time transformations are not random. Liu and Müller (2004) proposed time synchronization of random processes where the observed random curves are assumed to be generated by a bivariate stochastic process consisting of a stochastic monotone time transformation function and an unrestricted random amplitude function.

In this paper we investigate curve registration from a Bayesian perspective and propose a Bayesian Hierarchical Self Modeling Warping Regression Model (BHSMWR).
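As an illustration of the Shape Invariant Model discussed above, the following minimal sketch applies the transformation $m_i(t) = \alpha_i + \beta_i\, m(\gamma_i t + \delta_i)$ to a reference shape. The bump-shaped function `m` and all parameter values are assumptions made for this example only; they are not taken from the report.

```python
import numpy as np

def sim_curve(m, t, alpha, beta, gamma, delta):
    """Shape invariant model: alpha + beta * m(gamma * t + delta)."""
    t = np.asarray(t, dtype=float)
    return alpha + beta * m(gamma * t + delta)

# Hypothetical common shape: a single bump centred at 5.
def m(s):
    return np.exp(-0.5 * (s - 5.0) ** 2)

t = np.linspace(0.0, 10.0, 101)

# Reference curve (identity transformation) and a transformed curve that is
# raised by 0.5, doubled in amplitude and delayed by 2 time units.
reference = sim_curve(m, t, alpha=0.0, beta=1.0, gamma=1.0, delta=0.0)
shifted = sim_curve(m, t, alpha=0.5, beta=2.0, gamma=1.0, delta=-2.0)

peak_ref = t[np.argmax(reference)]      # timing of the feature in the reference
peak_shifted = t[np.argmax(shifted)]    # timing of the same feature after the shift
```

Both curves share one shape; only the level, amplitude and timing of the peak differ, which is the kind of variation registration is meant to separate.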
While our model builds on earlier work (Ronn, 2001; Brumback and Lindstrom, 2004; Gervini and Gasser, 2004), to our knowledge it is the first to address curve registration from a Bayesian standpoint, with the hierarchical model providing a formal account of amplitude and phase variability while borrowing strength from the data across curves in the estimation of the model parameters. We also provide extensions of the model: first, utilizing penalized B-splines in the representation of the shape and time transformation functions and, second, extending the family of time transformation functions to allow random image sets. We use Markov chain Monte Carlo (MCMC) to explore the posterior distribution of the model parameters.

This paper is organized as follows. In Section 2 we present our model. Specifically, in Section 2.1 we present our hierarchical model. We discuss flexible representations of the shape and time transformation functions using B-splines in Section 2.2, and using penalized B-splines in Section 2.3. In Section 2.4 we discuss an extension of the model that allows random image sets for the time transformation function. In Section 3 we illustrate our methods. First, we apply our model to simulated data in Section 3.1 and later to the height velocity data in Section 3.2 and to time-course microarray data in Section 3.3. In particular, in the latter example we introduce network inference from BHSMWR models. Finally, in Section 4 we provide a discussion.

2 Model Formulation

2.1 Hierarchical Model

Consider a sample of curves $y_1(t), \ldots, y_N(t)$. More specifically, let $y_{ij}$ denote the $j$th observed level of the $i$th curve at time $t_{ij}$, with $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, n_i$. To simplify notation, we assume that all curves are observed in the same design interval $T = [t_1, t_n]$, therefore defining the trajectory for curve $i$ as a vector $y_i(t) = (y_{i1}, \ldots, y_{in})'$ where $t = (t_1, \ldots, t_n)'$. We introduce the following three-stage hierarchical model.

Stage One: Each observed curve is modeled as

$$y_i(t) = c_i 1_n + a_i\, m(u_i(t, \phi_i), \beta) + \epsilon_i, \quad i = 1, \ldots, N, \qquad (1)$$

where $\epsilon_i \overset{iid}{\sim} N(0, \sigma_\epsilon^2 I_n)$ is a vector of independent random errors, normally distributed with null mean vector and common variance $\sigma_\epsilon^2$. In the above, $m(\cdot, \cdot)$ denotes a common shape function generating the individual curves and $u_i(\cdot, \cdot)$ denotes the curve-specific time transformation function.
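The Stage One model in equation (1) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the shape function is the one used in the simulation study of Section 3.1, while the quadratic warp `warp` and the values of `c_i`, `a_i` and `sigma_eps` are hypothetical choices made only for this example.

```python
import numpy as np

def shape(s):
    # Shape function used in the simulation study of Section 3.1.
    return np.cos(s / 4.0) + np.sin(s) / 4.0

def stage_one_mean(t, c_i, a_i, warp):
    """Mean of curve i in equation (1): c_i * 1_n + a_i * m(u_i(t))."""
    return c_i + a_i * shape(warp(t))

def stage_one_loglik(y, mean, sigma_eps):
    """Log density of y under N(mean, sigma_eps^2 * I_n)."""
    n = y.size
    resid = y - mean
    return (-0.5 * n * np.log(2.0 * np.pi * sigma_eps**2)
            - 0.5 * np.sum(resid**2) / sigma_eps**2)

t = np.linspace(0.0, 30.0, 50)

# Hypothetical warp, chosen only for illustration: monotone on [0, 30],
# with u(0) = 0 and u(30) = 30.
def warp(s):
    return s + 0.1 * s * (30.0 - s) / 30.0

rng = np.random.default_rng(0)
mean_i = stage_one_mean(t, c_i=0.5, a_i=1.2, warp=warp)
y_i = mean_i + rng.normal(0.0, 0.3, size=t.size)     # one simulated noisy curve
loglik = stage_one_loglik(y_i, mean_i, sigma_eps=0.3)
```

In the full model the warp and the shape function are not fixed as they are here; both are represented with basis expansions whose coefficients are estimated.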

The common shape function can be modeled as a linear combination of an appropriate set of basis functions and a set of basis coefficients $\beta$. Additional discussion on the representation of the shape function is deferred to Sections 2.2 and 2.3.

Stage Two: Given a common shape function $m(\cdot, \cdot)$, individual curves may exhibit different scales and levels of response. We assume that $c_i \sim N(c_0, \sigma_c^2)$ and $a_i \sim N(a_0, \sigma_a^2)\, I\{a_i > 0\}$, thus defining curve-specific random linear transformations. We note that the above assumption of strictly positive amplitude can be relaxed. For example, in our application to time-course expression data, the amplitude parameters are allowed to vary over the entire real line, since expression levels for genes with opposite regulation could have anti-symmetric profiles.

The curve-specific random time transformation functions $u_i(t, \phi_i)$ characterize the timing features of each curve. We start by considering $u_i$ as a smooth map defined over the design interval $T$, subject to the monotonicity and image constraints:

$$t_1 < \cdots < u_i(t_j, \phi_i) < u_i(t_k, \phi_i) < \cdots < t_n, \quad \forall i \in \{1, \ldots, N\},\ j < k; \qquad (2)$$

$$u_i(t_1, \phi_i) = t_1, \quad u_i(t_n, \phi_i) = t_n, \quad \forall i \in \{1, \ldots, N\}; \qquad (3)$$

where (2) restricts the time transformation functions to maintain the original ordering of the sampling times (no time reversion) and allows for a 1-to-1 correspondence between physical times and transformed times, while constraint (3) corresponds to the assumption that the stochastic time transformation happens between fixed starting and termination time points, coinciding with the boundaries of the sampling interval $T$. The last assumption may not be adequate in some cases, for example, when some trajectories exhibit a simple linear shift. We discuss relaxing assumption (3) in Section 2.4.

The time transformation function $u_i(t, \phi_i)$ may be modeled as a linear combination of an appropriate set of basis functions and a set of individual basis coefficients $\phi_i$.
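The Stage Two ingredients can be illustrated with a short sketch: a check of constraints (2) and (3) on a grid of warped times, and a draw of positive amplitudes from the truncated normal $N(a_0, \sigma_a^2) I\{a_i > 0\}$. Rejection sampling is an assumption of this sketch, chosen for simplicity; the function names and the numerical values are hypothetical.

```python
import numpy as np

def satisfies_constraints(t, u, tol=1e-10):
    """Check constraints (2) and (3) for warped times u = u_i(t) on the grid t."""
    t = np.asarray(t, dtype=float)
    u = np.asarray(u, dtype=float)
    monotone = np.all(np.diff(u) > 0)                                # (2): ordering preserved
    image = abs(u[0] - t[0]) <= tol and abs(u[-1] - t[-1]) <= tol    # (3): fixed end points
    return bool(monotone and image)

def sample_positive_amplitude(rng, a0, sigma_a, size):
    """Draw a_i ~ N(a0, sigma_a^2) I{a_i > 0} by rejection sampling."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(a0, sigma_a, size=size)
        out = np.concatenate([out, draw[draw > 0]])   # keep only positive draws
    return out[:size]

t = np.linspace(0.0, 30.0, 50)
identity = t.copy()               # the identity warp satisfies (2) and (3)
shifted = t + 2.0                 # a pure shift violates the image constraint (3)
reversed_time = t[::-1].copy()    # time reversal violates monotonicity (2)

rng = np.random.default_rng(1)
amplitudes = sample_positive_amplitude(rng, a0=1.0, sigma_a=1.0, size=30)
```

The rejected pure shift is exactly the case that motivates relaxing constraint (3) in Section 2.4.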
Additional discussion on the representation of the time transformation function is deferred to Sections 2.2 and 2.3. We assume that the time transformation function coefficients have a multivariate normal distribution $\phi_i \overset{iid}{\sim} N(\Upsilon, \Sigma_\phi)$, where $\Upsilon$ is the vector associated with the identity time transformation function, that is, $u_i(t, \Upsilon) = t$ (see Appendix A.1 for more details).

Stage Three: We assume that $a_0 \sim N(1, \sigma_{a_0}^2)$ and $c_0 \sim N(0, \sigma_{c_0}^2)$. Moreover, for the precision parameters,

$$1/\sigma_a^2 \sim \text{Gamma}(a_a, b_a), \quad 1/\sigma_c^2 \sim \text{Gamma}(a_c, b_c), \quad 1/\sigma_\epsilon^2 \sim \text{Gamma}(a_\epsilon, b_\epsilon).$$

[In our development, $X \sim \text{Gamma}(a, b)$ is parametrized so that $E[X] = a/b$.] Additionally, we assume that the shape function coefficient vector $\beta$ has a multivariate normal distribution $\beta \sim N(0, \Sigma_\beta)$. The precision matrix for the shape coefficients is specified as $\Sigma_\beta^{-1} = \frac{1}{\lambda}\Omega$, where $1/\lambda \sim \text{Gamma}(a_\lambda, b_\lambda)$. Similarly, the precision matrix for the warping parameters is specified as $\Sigma_\phi^{-1} = \frac{1}{\sigma_\phi^2} P$, with $1/\sigma_\phi^2 \sim \text{Gamma}(a_\phi, b_\phi)$. We discuss choices of $\Omega$ and $P$ in Section 2.3 under penalized B-spline basis functions.

2.2 Fixed knots B-spline implementation

In this Section we discuss the specification of the shape function $m(\cdot, \cdot)$ and the time transformation functions $u_i(\cdot, \cdot)$ using B-spline basis functions (de Boor, 1978). B-spline basis functions have previously been used to represent the common shape function and/or the individual time transformation functions (Brumback and Lindstrom, 2004; Gervini and Gasser, 2004) because of their flexibility.

Specifically, to represent the common shape function we select a set of knots $(\kappa_1, \kappa_2, \ldots, \kappa_p)$ partitioning the sampling interval $T$ into $p + 1$ subintervals. Using piecewise cubic polynomials and given the set of interior knots denoted $\{\kappa\}_p$, we define $B(u_i(t, \phi_i))$ as an $n \times K$ design matrix of B-spline basis functions, where $K = p + 4$. In this framework, we then define $m(u_i(t, \phi_i), \beta) = B(u_i(t, \phi_i))\beta$, where $\beta$ is the $K$-dimensional vector of basis coefficients defining the common shape function.

Similarly, given a set of interior knots $\{\kappa\}_h$, we may represent the individual time transformation functions as $u_i(t, \phi_i) = B(t)\phi_i$, where $B(t)$ is an $n \times Q$ design matrix of B-spline basis functions, $Q = h + 4$, and $\phi_i$ is the $Q$-dimensional vector of basis coefficients. Since the time transformation function needs to respect the monotonicity and boundary conditions (2) and (3), we impose the following constraints on the warping coefficients $\phi_i$:

$$t_1 < \phi_{i2} < \phi_{i3} < \cdots < t_{n_i}, \quad \forall i \in \{1, \ldots, N\} \quad \text{(monotonicity)}, \qquad (4)$$

$$\phi_{i1} = t_1, \quad \phi_{iQ} = t_{n_i}, \quad \forall i \in \{1, \ldots, N\} \quad \text{(image)}. \qquad (5)$$

Equation (4) provides a sufficient condition for monotonicity, derived from basic properties of B-spline derivatives (de Boor, 1978).

The above implementation requires the specification of the number of interior knots as well as the location of the knots for both the common shape function $m(\cdot, \cdot)$ and the individual time transformation functions $u_i(\cdot, \cdot)$. This is a model selection problem which is beyond the scope of this paper. We note, however, that this problem can be addressed by optimizing an information criterion (for example, AIC or BIC) or by using predictive measures (for example, CPO or Bayes factors).

2.3 Penalized regression splines implementation

As pointed out earlier, B-splines offer a flexible modeling tool. An issue for their implementation arises with the choice of the positions and number of interior knots. We propose an alternative which relies on penalized regression splines (Eilers and Marx, 1996; Ruppert, Wand and Carroll, 2003). Specifically, under this formulation, a relatively large number of equidistant knots is selected and a penalty, dependent on a smoothing parameter $\lambda$, is placed on coefficients of adjacent B-splines. In a frequentist framework the choice of $\lambda$ is usually made in the model selection stage and is based on cross-validation. We take the Bayesian approach and include the smoothing parameter explicitly in the model.

Consider the problem of estimating the common shape function $m(\cdot, \cdot)$.
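The B-spline representation of a time transformation in Section 2.2 can be sketched as follows. The code builds a cubic B-spline design matrix with the Cox-de Boor recursion and evaluates $u_i(t, \phi_i) = B(t)\phi_i$ for coefficients satisfying (4) and (5). The knot locations and the coefficient values are hypothetical, and the helper `bspline_basis` is written for this illustration rather than taken from the report.

```python
import numpy as np

def bspline_basis(x, interior_knots, lower, upper, degree=3):
    """Design matrix of clamped B-spline basis functions (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    knots = np.concatenate([np.repeat(lower, degree + 1),
                            np.asarray(interior_knots, dtype=float),
                            np.repeat(upper, degree + 1)])
    n_basis = len(knots) - degree - 1
    # Degree-0 basis: indicators of the half-open knot spans.
    B = np.zeros((x.size, len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Close the last non-empty span on the right so that x == upper is covered.
    B[x == upper, n_basis - 1] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(knots) - d - 1))
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

t = np.linspace(0.0, 30.0, 50)
B = bspline_basis(t, interior_knots=[7.5, 15.0, 22.5], lower=0.0, upper=30.0)

# Q = h + 4 = 7 coefficients; the values are hypothetical but satisfy (4) and (5):
# strictly increasing, first equal to t_1 = 0 and last equal to t_n = 30.
phi = np.array([0.0, 1.0, 4.0, 12.0, 20.0, 27.0, 30.0])
u = B @ phi     # warped times u_i(t, phi_i) on the sampling grid
```

With a clamped knot vector the first and last coefficients are the values of the spline at the interval boundaries, which is why (5) fixes the image, and strictly increasing coefficients give a strictly increasing warp.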
Given a relatively large number of equidistant interior knots $\{\kappa\}_p$, we represent the common shape function as a linear combination of B-spline basis functions, $m(\cdot, \beta) = B(\cdot)\beta$. Following Lang and Brezger (2004), we place a first-order random walk shrinkage prior on the shape coefficients $\beta$, so that

$$\beta_k = \beta_{k-1} + e_k, \quad e_k \sim N(0, \lambda). \qquad (6)$$

We assume $\beta_0 = 0$. It can be shown that, conditional on $\lambda$, $\beta$ has a multivariate normal distribution with null mean vector and precision matrix $\Omega/\lambda$. Under the above first-order random walk, $\Omega$ is a banded precision penalization matrix

$$\Omega = \begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}. \qquad (7)$$

Note that the random walk variance $\lambda$ can be interpreted as the smoothing parameter. Following Lang and Brezger (2004), we place a diffuse conjugate inverse gamma hyperprior on the variance, that is, $\lambda \sim IG(a_\lambda, b_\lambda)$.

A similar approach may be adopted to model the time transformation functions $u_i(t, \phi_i)$, so that, for all $i = 1, \ldots, N$,

$$(\phi_{ik} - \Upsilon_k) = (\phi_{i(k-1)} - \Upsilon_{k-1}) + \eta_k, \quad \eta_k \sim N(0, \sigma_\phi^2). \qquad (8)$$

Assuming that $(\phi_{i0} - \Upsilon_0) = 0$, it can be shown that $\phi_i \sim N(\Upsilon, \sigma_\phi^2 P^{-1})$, where $P$ is a precision penalization matrix as in (7) and $\sigma_\phi^2$ is the smoothing parameter associated with the transformation functions $u_i(\cdot, \cdot)$.

2.4 Stochastic time model

As we discussed in Section 2.1, the assumptions imposed by equation (3) imply that the stochastic time transformation happens between fixed starting and termination time points, coinciding with the boundaries of the sampling interval $T$. In the case of time-course microarray data, for example, such a restriction may not be adequate. For instance, consider two expression profile signals such that $E(Y_1(t)) = m_1(t)$ and $E(Y_2(t)) = m_1(t + \delta_2)$, differentiated by a simple linear offset. Such an offset cannot be represented by a time transformation with fixed image $T$.

In order to account for such cases we model the time transformation functions as monotone maps defined over the sampling interval $T$ with values in a random interval $(T + \delta_i) = [t_1 + \delta_i, t_n + \delta_i]$, so that the time length of the sampling experiment is fixed and equal to $(t_n - t_1)$. As we are modeling the common shape function $m(\cdot, \cdot)$ in a semi-parametric fashion, we assume the individual shifts $\delta_i$ are defined on a compact domain, so that $m(\cdot, \cdot)$ is defined on the expanded interval $T_\Delta = [t_1 - \Delta, t_n + \Delta]$.

The shape function $m(\cdot, \cdot)$ and the individual time transformation functions $u_i(\cdot, \cdot)$ can again be modeled as linear combinations of (penalized) B-spline basis functions. The basis defining $m(\cdot, \cdot)$ now needs to span a functional space over the expanded domain $T_\Delta$. In order for the individual time transformation functions to take values over random intervals of the type $(T + \delta_i)$, we need to free one of the terminal warping coefficients. If we choose to free $\phi_{i1}$ (operationally choosing $\delta_i = \phi_{i1}$), we may proceed as in Section 2.2, modifying the set of constraints (4) and (5) so that

$$(t_1 - \Delta) < \phi_{i1} < \cdots < \phi_{iq} < \phi_{i(q+1)} < \cdots < \phi_{iQ} < (t_{n_i} + \Delta), \quad \forall i \in \{1, \ldots, N\}; \qquad (9)$$

$$\phi_{i1} \in [(t_1 - \Delta), (t_1 + \Delta)], \quad \phi_{iQ} = t_{n_i} + \phi_{i1}, \quad \forall i \in \{1, \ldots, N\}. \qquad (10)$$

The ordering in (9) guarantees monotonicity of the time transformation functions and the constraints in (10) allow for individual random sampling domains of fixed length $(t_n - t_1)$. Equivalently, one may choose to free the last warping coefficient $\phi_{iQ}$, rearranging (9) and (10) accordingly.

2.5 Posterior Estimation

The full parameter vector is $\theta = (c, a, \beta, \phi, c_0, a_0, \sigma_\epsilon^2, \sigma_c^2, \sigma_a^2, \lambda, \sigma_\phi^2)'$, where $c = (c_1, \ldots, c_N)'$, $a = (a_1, \ldots, a_N)'$ and $\phi = (\phi_1, \ldots, \phi_N)'$ is an $N \times Q$ matrix of individual warping parameters.
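The random image constraints of Section 2.4 can be illustrated with a small check on a vector of warping coefficients. The sketch below reads the shift as $\delta_i = \phi_{i1} - t_1$, which coincides with $\delta_i = \phi_{i1}$ when $t_1 = 0$; this reading, the function names and the numerical values are assumptions made for the example. It also uses the fact that B-spline basis functions sum to one, so adding a constant to every coefficient shifts the whole warp by that constant.

```python
import numpy as np

def in_random_image_space(phi, t1, tn, delta_max, tol=1e-10):
    """Check coefficients against the ordering and image constraints (9)-(10),
    reading the shift as delta_i = phi[0] - t1 (an assumption of this sketch)."""
    phi = np.asarray(phi, dtype=float)
    ordered = np.all(np.diff(phi) > 0)                       # ordering in (9)
    inside = (phi[0] >= t1 - delta_max - tol) and (phi[-1] <= tn + delta_max + tol)
    first_ok = abs(phi[0] - t1) <= delta_max + tol           # range of phi_i1 in (10)
    fixed_length = abs((phi[-1] - phi[0]) - (tn - t1)) <= tol  # image of length t_n - t_1
    return bool(ordered and inside and first_ok and fixed_length)

def shift_coefficients(phi_fixed, delta):
    """Shift every coefficient by delta, which shifts the warp u(t) by delta."""
    return np.asarray(phi_fixed, dtype=float) + delta

# Hypothetical fixed-image coefficients on T = [0, 30], as in (4)-(5).
phi_fixed = np.array([0.0, 1.0, 4.0, 12.0, 20.0, 27.0, 30.0])

ok_small_shift = in_random_image_space(shift_coefficients(phi_fixed, 3.0), 0.0, 30.0, 5.0)
ok_large_shift = in_random_image_space(shift_coefficients(phi_fixed, 6.0), 0.0, 30.0, 5.0)
```

A shift of 3 stays within a maximum expansion of 5, while a shift of 6 leaves the expanded domain and is rejected.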
We fully specify the Bayesian model with priors on the parameter vector $\theta$ as specified in Section 2.1. Given the observation matrix $Y = (Y_1, \ldots, Y_N)'$, the posterior distribution has the form

$$\begin{aligned} f(\theta \mid Y) &= f(c, a, \beta, \phi, c_0, a_0, \sigma_\epsilon^2, \sigma_c^2, \sigma_a^2, \lambda, \sigma_\phi^2 \mid Y) \\ &\propto f(Y \mid c, a, \beta, \phi, \sigma_\epsilon^2)\, f(c, a \mid c_0, a_0, \sigma_c^2, \sigma_a^2)\, f(\beta \mid \lambda)\, f(\phi \mid \sigma_\phi^2) \\ &\quad \times f(c_0 \mid \sigma_{c_0}^2)\, f(a_0 \mid \sigma_{a_0}^2)\, f(\sigma_\epsilon^2 \mid a_\epsilon, b_\epsilon)\, f(\sigma_c^2 \mid a_c, b_c)\, f(\sigma_a^2 \mid a_a, b_a)\, f(\lambda \mid a_\lambda, b_\lambda)\, f(\sigma_\phi^2 \mid a_\phi, b_\phi). \end{aligned} \qquad (11)$$

The joint posterior density is analytically intractable, and so we implemented a Markov chain Monte Carlo algorithm to sample from the posterior distribution. Most of the full conditionals are available in closed form, except for those of the time transformation parameters $\phi$. We use Gibbs sampling whenever full conditionals are available and adaptive Metropolis-Hastings to sample the time transformation parameters. The derived full conditional posterior distributions are provided in Appendix A.2.

3 Applications

3.1 Simulation Study

To assess estimation via BHSMWR we simulated 30 curves (see Figure 2, panels (a) and (b)) with the common shape function $m(t) = \cos(t/4) + \sin(t)/4$ evaluated at 50 equally spaced time points in the design interval $T = [0, 30]$. Each simulated curve is of the form $y_{ij} = c_i + a_i\, m(u_i(t_j, \phi_i)) + \epsilon_{ij}$, where we chose $c_i \overset{iid}{\sim} N(0, 1)$ and $a_i \overset{iid}{\sim} N(1, 1)\, I\{a_i > 0\}$, $i = 1, \ldots, 30$. The time transformation functions $u_i(t, \phi_i)$ were generated from a linear combination of B-spline basis functions over the sampling interval $T$. In particular, we defined $u_i(t, \phi_i) = B(t)\phi_i$, with $B(t) = \{g_1, \ldots, g_5\}$, corresponding to a single interior knot placed at $t = 15$, and $\phi_i \sim N_5(\Upsilon, 2)$ defined over the constrained space given by equations (9) and (10).

We fitted the BHSMWR model to a set of noisy curves where $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma_\epsilon^2)$ with $\sigma_\epsilon = 0.3$. Relatively diffuse Gamma(0.1, 1) priors are considered for the amplitude, scale and shape precision parameters. In order to estimate the common shape function, we placed 39 equally spaced interior knots between $-4$ and 34 with a maximum expansion constraint $\Delta = 5$.
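As a sketch of one of the closed-form Gibbs steps, consider the error precision $1/\sigma_\epsilon^2$. With a Gamma($a_\epsilon$, $b_\epsilon$) prior (rate parametrization, so that the mean is $a_\epsilon/b_\epsilon$) and the Gaussian likelihood of Stage One, standard conjugacy gives a Gamma full conditional with shape $a_\epsilon + M/2$ and rate $b_\epsilon + SSE/2$, where $M$ is the total number of observations and $SSE$ the residual sum of squares. The code below assumes this textbook update rather than reproducing Appendix A.2, and the residuals are simulated stand-ins for $y_{ij} - c_i - a_i m(u_i(t_j, \phi_i))$.

```python
import numpy as np

def draw_error_precision(rng, residuals, a_eps, b_eps):
    """One Gibbs draw of tau = 1 / sigma_eps^2 given all residuals.

    Prior: tau ~ Gamma(a_eps, rate=b_eps); likelihood: residuals iid N(0, 1/tau).
    Full conditional: Gamma(a_eps + M/2, rate=b_eps + SSE/2).
    """
    residuals = np.asarray(residuals, dtype=float)
    shape = a_eps + 0.5 * residuals.size
    rate = b_eps + 0.5 * np.sum(residuals**2)
    # NumPy parametrizes the gamma distribution by shape and scale = 1 / rate.
    return rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(2)
true_sigma = 0.3
# Stand-in residuals for N = 30 curves observed at n = 50 time points.
residuals = rng.normal(0.0, true_sigma, size=(30, 50))

# Repeated draws with the residuals held fixed; in a full sampler the residuals
# would be recomputed after every update of the other parameters.
draws = np.array([draw_error_precision(rng, residuals, 0.1, 1.0) for _ in range(2000)])
sigma_draws = 1.0 / np.sqrt(draws)
```

The draws of $\sigma_\epsilon$ concentrate near the value used to generate the residuals, as expected with 1,500 observations and a diffuse prior.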
In order to estimate the individual time transformation functions we partitioned the interval $T$ into four subintervals, placing three interior knots at $\kappa = (7.5, 10, 22.5)$. As the warping parameters are constrained with maximum support of length $(t_n - t_1 + \Delta) = 35$, we consider a Gamma(0.5, rate = 20) prior for the warping precision. Our inferences are based on 15,000 samples from the posterior distribution obtained after discarding the initial 50,000 MCMC iterations for burn-in.

Figure 2 also shows some results from our alignment. Panel (c) shows 30 simulated trajectories superimposed with a cross-sectional mean. The cross-sectional mean curve does not resemble any of the individual curves, as phase variation is not taken into account. Panel (d) shows the registered trajectories superimposed with the registered posterior median. Accounting for phase variability, we are now able to recover subtle features of the originating signal. Panels (e) and (f) indicate the quality of signal recovery in a sample of curves. Panel (e) shows that the estimated posterior median signals (full lines) are almost indistinguishable from the true signals (dashed lines). A similar comment applies to the time transformation functions shown in panel (f), where the posterior median time transformation functions closely overlap the corresponding true time transformations.

To assess sensitivity of the results to our prior choices we re-estimated the model considering a two-fold increase in the prior variances. The resulting estimates did not change substantially and are thus omitted from the manuscript. We also assessed estimation with noisier curves by doubling the error variance. We could still successfully estimate the model parameters. For example, we found that the true time transformation functions were still contained within the 90% posterior credible bands (figure omitted).

3.2 Registering height acceleration functions

The Berkeley Growth Study (Tuddenham and Snyder, 1954) recorded the height of 39 boys at 31 time points between the ages of 2 and 18 years.
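The effect of ignoring phase variation on a cross-sectional mean can be seen in a toy example. The sketch below averages copies of a single bump that differ only by a time shift, and compares the result with the average after the shifts are undone. The bump shape and the shifts are hypothetical, and the "registration" here uses the known shifts rather than estimated ones.

```python
import numpy as np

def bump(s, centre):
    # Hypothetical feature: a unit-height bump centred at `centre`.
    return np.exp(-0.5 * (s - centre) ** 2)

t = np.linspace(0.0, 20.0, 401)
shifts = np.linspace(-3.0, 3.0, 13)     # hypothetical phase shifts, one per curve

# Unregistered curves: the same bump occurring at different times.
curves = np.array([bump(t, 10.0 + d) for d in shifts])
cross_sectional_mean = curves.mean(axis=0)

# Registered curves: each curve evaluated on its own shifted clock t + d,
# which moves every bump back to the common location 10.
registered = np.array([bump(t + d, 10.0 + d) for d in shifts])
registered_mean = registered.mean(axis=0)

peak_unregistered = cross_sectional_mean.max()
peak_registered = registered_mean.max()
```

The peak of the cross-sectional mean is much lower and broader than the peak of any individual curve, while the mean of the registered curves retains the full amplitude.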
The goal of the analysis lies in modeling both the amplitude and timing of features in the individual curves to estimate the mean velocity curve. Panel (a) of Figure 3 shows growth velocity curves superimposed with the cross-sectional mean. As pointed out by Ramsay and Li (1998) and Gervini and Gasser (2004), the cross-sectional mean systematically underestimates the amplitude of local maxima and overestimates the amplitude of local minima, since it does not account for phase variation.

Using our BHSMWR model, in order to estimate the common shape function, we place 27 equally spaced interior knots between $-3$ and 23 and consider a maximum expansion constraint $\Delta = 7$. We model the individual time transformation functions by partitioning the interval $T = [2, 18]$ into five subintervals, placing four interior knots at $\kappa = (5.2, 8.2, 11.6, 14.8)$. We consider a Gamma(0.5, rate = 20) prior for the warping precision and place Gamma(0.1, 1) priors on the amplitude, scale and shape precision parameters. Our inferences are based on 15,000 samples from the posterior distribution obtained after discarding the initial 50,000 MCMC iterations for burn-in.

Figure 3, panels (b) through (d), shows the results from our analysis. Panel (b) shows the estimated median time transformation functions, while panel (c) shows the registered curves superimposed with the registered posterior median shape function. The estimated time transformation functions show that curve registrations occur over monotone transformations combined with linear offsets. Panel (d) compares the unregistered cross-sectional mean, in gray, to the registered median shape function. The median function estimated via BHSMWR captures midgrowth velocity spurts that are lost in the cross-sectional mean.

To assess sensitivity of the results to our prior choices, we also considered a two-fold increase in the prior variances. The resulting estimates did not differ substantially from those previously reported in Figure 3 and are, therefore, omitted.

Finally, we also compared fitting the data with our model that allows for random image sets as opposed to a model with a fixed image set. A comparison of the median posterior shape functions for the two models (Figure 3, panel (e)) shows that an image-restricted model identifies only one midgrowth spurt (as inferred in Gervini and Gasser, 2005), while a random image model highlights the presence of more details in the growth velocity of a typical boy. We computed the Conditional Predictive Ordinate (CPO) (Gelfand, Dey and Chang, 2002). The generally higher CPOs under our random image model indicate that it provides a better curve fit (Figure 3, panel (f)).

3.3 Analysis of yeast cell cycle gene expression data

Spellman et al. (1998) conducted a series of microarray experiments to create a comprehensive catalog of yeast genes whose transcription levels vary periodically within the cell cycle. In our analysis, we consider a sample of 100 differentially expressed genes from yeast cultures synchronized via α-factor arrest. mRNA expression measurements were collected every seven minutes for 126 minutes, for a total of 18 measurements per gene.

Motivated by earlier work on the existence of time-delayed relationships as seen, for example, between transcription factors and their targets (see, for example, Qian et al., 2001; Yu et al., 2003), we use curve registration to investigate time-varying network relationships between genes. This is a non-standard application of curve registration.

The BHSMWR model presented earlier is now extended. Here, the individual amplitude parameters $a_i$ are defined over the entire real line $\mathbb{R}$ to allow for inverse regulatory relationships between genes. Since timing of expression features may be indicative of network relationships, inferences about gene-to-gene interactions may be based on similarities in their time transformation functions. Global or local (time-specific) measures of similarity may be based on distance measures $D(u_i(t), u_j(t))$ between individual time transformation functions $u_i(t)$ and $u_j(t)$. Information about the order of influence, e.g. gene $i$ affecting gene $j$ ($i \to j$), can be assessed with $\text{sign}(u_i(t) - u_j(t))$, where a positive sign indicates that $i \to j$ while a negative sign indicates that $j \to i$.

For identifiability purposes, we consider the two-fold amplified cross-sectional mean as the reference curve towards which we align all other expression profiles.
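Given posterior draws of the time transformation functions, the similarity and ordering summaries described above can be computed pair by pair. The following sketch takes the summed absolute difference over the sampling times as one possible choice of the distance $D$, and averages both the distance and $\text{sign}(u_i(t) - u_j(t))$ over draws. The array layout, the function names and the simulated draws are assumptions made for this illustration.

```python
import numpy as np

def expected_warping_distance(u_draws):
    """Posterior mean of D_ij = sum_t |u_i(t) - u_j(t)| for all pairs (i, j).

    u_draws has shape (S, N, n): S posterior draws, N curves, n time points.
    Returns an (N, N) symmetric matrix with zeros on the diagonal.
    """
    diff = u_draws[:, :, None, :] - u_draws[:, None, :, :]    # shape (S, N, N, n)
    return np.abs(diff).sum(axis=-1).mean(axis=0)

def expected_sign(u_draws):
    """Posterior mean of sign(u_i(t) - u_j(t)) at each time, shape (N, N, n)."""
    diff = u_draws[:, :, None, :] - u_draws[:, None, :, :]
    return np.sign(diff).mean(axis=0)

# Stand-in posterior draws for three hypothetical warps on a 19-point grid:
# curve 1 is close to curve 0, curve 2 is offset by two time units.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 18.0, 19)
base = np.stack([t, t + 0.1, t + 2.0])
u_draws = base[None, :, :] + rng.normal(0.0, 0.05, size=(200, 3, t.size))

D = expected_warping_distance(u_draws)
S = expected_sign(u_draws)
```

Small entries of `D` flag pairs of curves with similar timing, and entries of `S` near +1 or -1 indicate a consistent ordering between two warps at a given time.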
After re scaling the sampling times into units of seven minutes each, we estimate the common shape function placing 27 equally spaced interior knots between 4 and 22 15

16 and considering a maximum expansion constraint = 6, corresponding to a maximum offset of 42 minutes. In order to estimate the individual time transformation functions we partition the interval T = [0, 18] into four subintervals, placing three interior knots at κ = (4.5, 9.0, 13.5). We place a Gamma(0.2, rate = 2) prior on the scale and amplitude precisions and a Gamma(0.5, rate = 20) prior on the precision of the warping parameters. A Gamma(0.1, 1) prior was chosen for the shape precision parameter. We base our inference on 15, 000 samples from the posterior distribution obtained after discarding the initial 50, 000 MCMC iterations for burn in. Panel (a) of Figure 4 shows the the estimated posterior densities of the amplitude parameters. Most of these distributions are pulled away from 0, confirming earlier reports in the literature that these genes are differentially expressed in Spellmann s experiment. Panel (b) shows the posterior median time transformation functions for 100 genes. As all expression profiles are defined as random transformations of a common shape function, these transformation functions are informative about similarities between expression profiles. Up to scale, the closer these functions are, the more similar the fitted expression profiles are. Panel (c) shows a random sample of gene expression profiles with associated posterior median fit with corresponding 90% credible bands. After selecting the top 50 genes with highest absolute median amplitude, panel (d) shows the expected posterior absolute differences E(D ij (u i, u j ) y) between each pair of genes with D ij (u i, u j ) = t u i (t) u j (t), for all pairs i j. The closer the time transformation functions the darker is the area in the gene gene association matrix. In order to show how similarities may help us identify interacting genes we consider gene YPR119W which, in this experiment, seems to exhibit a fairly distinct expression profile. 
Looking at the row or column values for this gene in the gene gene association matrix represented in panel (e) of Figure 4, we find that the best three matches with the gene of interest are genes: YGR108W, YER001W and YBL032W. Panel (f) shows the expression profile for our gene of interest together with its three best matches. We notice how the model identifies relationships between genes which exhibit similar timing in their profile features, relating genes which appear to be coexpressed as well 16

17 as genes which seem to exhibit patterns of inhibition. In particular, the interaction between YPR119W and YGR108W can be explained by the fact that both are cell cycle regulators involved with a common complex. The interactions with YER001W and BYL032W have not been reported in the literature. In Table 1 we report the top 15 relationships with lowest expected posterior warping distance. For all known genes the inferred relationships have biological support as they share similar cellular functions and/or have been reported in the literature to interact with each other. 4 Discussion In this paper we proposed a Bayesian hierarchical model for curve registration. Our hierarchical model provides a formal account of amplitude and phase variability while borrowing strength from the data across curves in the estimation of the model parameters. As opposed to most methods to curve registration, our method does not require preliminary smoothing of the data and provides a natural framework for assessing uncertainty in the estimated time transformation and shape functions. We introduced some extensions. We proposed a new family of monotone time transformation functions which allows for random image sets and thus greater flexibility in modeling functional data. Moreover, using penalized B splines, we discussed smoothing via shrinkage of the shape and warping coefficients. Our simulation study indicates that our model can adequately capture true signals and time transformation functions even in the presence of noisy data. In the application to the height velocity data set we showed that our registered posterior median shape function captured features in velocity that were lost with the cross sectional mean and that the curves had linear offsets. In the microarray data set we showed how BHSMWR models could be applied to a non standard problem of investigating network relationships. 
Motivated by biological work indicating that time-delayed relationships are possible, we used the time transformation functions to infer relationships between genes. The advantage of this approach over traditional methods of network analysis (see, for example, Friedman et al., 2000; Inoue et al., 2006) is that the complexity of the problem increases only linearly. Moreover, while traditional methods can only capture average relationships, using the time transformation functions to infer networks also makes it possible to investigate time-varying networks.

We note that our BHSMWR model can also be used to identify genes with differential temporal patterns of expression. This could be assessed using measures derived from the posterior distribution of the individual profile amplitude parameters $a_i$. We also note that standard procedures that control the posterior expected Bayesian FDR, as well as decision-theoretic techniques, may be used to address the issue of multiple comparisons when testing for differential expression and network relationships. This is a subject for further investigation which is beyond the scope of this paper.

Although flexible, our model has some limitations. Like other curve registration methods, our BHSMWR model is not adequate for analyzing functional data observed at very few time points. Moreover, the shrinkage procedure outlined in this paper may be inappropriate when the underlying shape function is highly oscillating. A possible solution, offering flexible shrinkage, is to replace the global variance (smoothing) parameter for the shape coefficients with a set of local smoothing parameters. Finally, since the method relies on estimating a common shape function, we also acknowledge that it has limitations when the signal is overcome by noise.

Appendix

Appendix A.1: B-spline Identity Coefficients

In this section we describe the structure of the identity coefficients when representing a smooth function $f(t)$ as a linear combination of B-spline basis functions $B(t)$ of order $r$, evaluated at $t$ over the domain $T = [0, I]$. Given a set of $h$ distinct interior knots $\{\kappa_j\}_{j=1}^{h}$ partitioning the domain $T$ into $(h + 1)$ subintervals, the complete set of knots defining $B(t)$ is

$$\nu = (\nu_1, \ldots, \nu_{h+2r}) = (0, \ldots, 0, \kappa_1, \ldots, \kappa_h, I, \ldots, I).$$
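We define a set of coefficients $\Upsilon = (\upsilon_1, \ldots, \upsilon_{h+r})'$ as the identity coefficients if $f(t) = B(t)\Upsilon = t$. The coefficients $\Upsilon$ are found by setting the derivative of $B(t)\Upsilon$ with respect to $t$ equal to 1 for all $t \in [0, I]$. Given the order $r$ and the interior knots $\{\kappa_j\}_{j=1}^{h}$, $\Upsilon$ can be obtained using the recursion

$$\upsilon_{q+1} = \frac{\nu_{q+r} - \nu_{q+1}}{r - 1} + \upsilon_q, \qquad q = 1, \ldots, h + r - 1, \qquad \upsilon_1 = 0.$$

Appendix A.2: Full Conditionals

In this section we provide the full conditional posterior distributions for the model parameters. Starred quantities denote updated (posterior) parameters.

A2.1 Full conditional distribution for the common shape parameters $\beta$:

$$(\beta \mid Y, \theta_{-\beta}) \sim N(m_\beta, V_\beta),$$

where $V_\beta^{-1} = \left[\Sigma_\beta^{-1} + \frac{1}{\sigma_\epsilon^2} X'X\right]$ and $m_\beta = V_\beta \left[\frac{1}{\sigma_\epsilon^2} X'(Y - C)\right]$, with

$$C = (c_1 1_{n_1}', \ldots, c_N 1_{n_N}')' \quad \text{and} \quad X = \begin{pmatrix} a_1 B(u_1(t, \phi_1)) \\ \vdots \\ a_N B(u_N(t, \phi_N)) \end{pmatrix}.$$

A2.2 Full conditional distribution for the population linear scale and amplitude transformation parameters $a_0$ and $c_0$:

$$(a_0 \mid Y, \theta_{-a_0}) \sim N(a_0^{*}, \sigma_{a_0}^{*2})\, I\{a_0 \geq 0\},$$

where $\sigma_{a_0}^{*2} = (1/\sigma_{a_0}^2 + N/\sigma_a^2)^{-1}$ and $a_0^{*} = (\sigma_{a_0}^{*2}/\sigma_a^2) \sum_{i=1}^{N} a_i$;

$$(c_0 \mid Y, \theta_{-c_0}) \sim N(c_0^{*}, \sigma_{c_0}^{*2}),$$

where $\sigma_{c_0}^{*2} = (1/\sigma_{c_0}^2 + N/\sigma_c^2)^{-1}$ and $c_0^{*} = (\sigma_{c_0}^{*2}/\sigma_c^2) \sum_{i=1}^{N} c_i$.

A2.3 Full conditional distribution for the individual linear scale and amplitude transformation parameters $(c_i, a_i)$:

$$(c_i, a_i \mid Y, \theta_{-(c_i, a_i)}) \sim N(m_l, \Sigma_l),$$

where $\Sigma_l^{-1} = \left[\Sigma_{c,a}^{-1} + \frac{1}{\sigma_\epsilon^2} W'W\right]$ and $m_l = \Sigma_l \left[\Sigma_{c,a}^{-1} (c_0, a_0)' + \frac{1}{\sigma_\epsilon^2} W'Y_i\right]$, with $\Sigma_{c,a} = \mathrm{Diag}(\sigma_c^2, \sigma_a^2)$ and $W = [\,1_{n_i} \;\; B(u_i(t, \phi_i))\beta\,]$.

A2.4 Full conditional distribution for the error variance parameter $\sigma_\epsilon^2$:

$$(1/\sigma_\epsilon^2 \mid Y, \theta_{-\sigma_\epsilon^2}) \sim \mathrm{Gamma}(a_\epsilon^{*}, b_\epsilon^{*}),$$

where $a_\epsilon^{*} = a_\epsilon + \left(\sum_{i=1}^{N} n_i\right)/2$ and $b_\epsilon^{*} = b_\epsilon + \frac{1}{2} \sum_{i=1}^{N} (Y_i - \mu_i)'(Y_i - \mu_i)$, with $\mu_i = c_i 1_{n_i} + a_i B(u_i(t, \phi_i))\beta$.

A2.5 Full conditional distributions for other variance parameters:

$(1/\sigma_c^2 \mid Y, \theta_{-\sigma_c^2}) \sim \mathrm{Gamma}(a_c^{*}, b_c^{*})$, where $a_c^{*} = a_c + N/2$ and $b_c^{*} = b_c + \frac{1}{2} \sum_{i=1}^{N} (c_i - c_0)^2$;

$(1/\sigma_a^2 \mid Y, \theta_{-\sigma_a^2}) \sim \mathrm{Gamma}(a_a^{*}, b_a^{*})$, where $a_a^{*} = a_a + N/2$ and $b_a^{*} = b_a + \frac{1}{2} \sum_{i=1}^{N} (a_i - a_0)^2$;

$(1/\lambda \mid Y, \theta_{-\lambda}) \sim \mathrm{Gamma}(a_\lambda^{*}, b_\lambda^{*})$, where $a_\lambda^{*} = a_\lambda + K/2$ and $b_\lambda^{*} = b_\lambda + \frac{1}{2} \beta'\Omega\beta$;

$(1/\sigma_\phi^2 \mid Y, \theta_{-\sigma_\phi^2}) \sim \mathrm{Gamma}(a_\phi^{*}, b_\phi^{*})$, where $a_\phi^{*} = a_\phi + QN/2$ and $b_\phi^{*} = b_\phi + \frac{1}{2} \sum_{i=1}^{N} (\phi_i - \Upsilon)'P(\phi_i - \Upsilon)$.

Acknowledgments

L. Inoue acknowledges partial support by grant P50 CA from the National Cancer Institute and from the Career Development Funding from the Department of Biostatistics, University of Washington.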
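The identity-coefficient recursion of Appendix A.1 can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the cubic order and the knot locations are illustrative assumptions. It builds the knot vector with $r$ repeated boundary knots, applies the recursion starting from $\upsilon_1 = 0$, and evaluates the resulting spline with a plain Cox-de Boor recursion to confirm that $B(t)\Upsilon$ reproduces $t$. Each coefficient produced this way equals the average of $r - 1$ consecutive knots, which is what the recursion accumulates step by step.

```python
def full_knots(interior, upper, order):
    """Knot vector (nu_1, ..., nu_{h+2r}): `order` zeros, interior knots, `order` copies of `upper`."""
    return [0.0] * order + [float(k) for k in interior] + [float(upper)] * order


def identity_coefficients(knots, order):
    """Recursion of Appendix A.1: u_1 = 0, u_{q+1} = u_q + (nu_{q+r} - nu_{q+1}) / (r - 1)."""
    n_basis = len(knots) - order  # h + r basis functions
    ups = [0.0]
    for q in range(1, n_basis):  # 1-based q = 1, ..., h + r - 1
        # 1-based nu_{q+r} and nu_{q+1} are knots[q + order - 1] and knots[q] in 0-based indexing
        ups.append(ups[-1] + (knots[q + order - 1] - knots[q]) / (order - 1))
    return ups


def bspline_basis(t, knots, order):
    """Values at t of all B-spline basis functions of the given order (Cox-de Boor recursion)."""
    m = len(knots) - 1
    # Order-1 (piecewise constant) functions; the last non-empty interval is closed on the right
    last = max(j for j in range(m) if knots[j] < knots[j + 1])
    b = [1.0 if (knots[j] <= t < knots[j + 1]) or (j == last and t == knots[j + 1]) else 0.0
         for j in range(m)]
    for k in range(2, order + 1):
        nxt = []
        for j in range(m - k + 1):
            left = knots[j + k - 1] - knots[j]
            right = knots[j + k] - knots[j + 1]
            term = 0.0
            if left > 0:  # skip terms with zero-width support (repeated knots)
                term += (t - knots[j]) / left * b[j]
            if right > 0:
                term += (knots[j + k] - t) / right * b[j + 1]
            nxt.append(term)
        b = nxt
    return b


def spline_value(t, coefs, knots, order):
    """Evaluate the spline B(t) * coefs at a single point t."""
    return sum(c * v for c, v in zip(coefs, bspline_basis(t, knots, order)))


# Illustrative setting (an assumption): cubic splines (order 4) on [0, 3] with interior knots 1 and 2
ORDER = 4
KNOTS = full_knots([1.0, 2.0], 3.0, ORDER)
UPSILON = identity_coefficients(KNOTS, ORDER)

for t in (0.0, 0.4, 1.0, 2.5, 3.0):
    # The second column should match the first up to rounding error
    print(t, spline_value(t, UPSILON, KNOTS, ORDER))
```

The basis evaluation is written out by hand only to keep the sketch free of dependencies; a numerical library's B-spline routine could be substituted for `bspline_basis`.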

References

1. Bar-Joseph, Z. (2004). Analyzing time series gene expression data. Bioinformatics, 20.
2. Brumback, L. C. and Lindstrom, M. J. (2004). Self modeling with flexible, random time transformations. Biometrics, 60.
3. Capra, W. B. and Müller, H. (1997). An accelerated-time model for response curves. Journal of the American Statistical Association, 92.
4. de Boor, C. (1978). A Practical Guide to Splines. Berlin: Springer-Verlag.
5. Barry, D. (1986). Nonparametric Bayesian Regression. The Annals of Statistics, 14.
6. Eilers, P. and Marx, B. (1996). Flexible Smoothing using B-splines and Penalized Likelihood (with comments and rejoinder). Statistical Science, 11.
7. Friedman, N., Linial, M., Nachman, I. and Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7.
8. Gasser, T. and Kneip, A. (1995). Searching for structure in curve samples. Journal of the American Statistical Association, 90.
9. Gelfand, A. E., Dey, D. K. and Chang, H. (1992). Model determination using predictive distributions with implementation via sampling based methods (with discussion). In Bayesian Statistics 4, J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith (eds). Oxford: Oxford University Press.
10. Gervini, D. and Gasser, T. (2004). Self-modelling warping functions. Journal of the Royal Statistical Society, Series B: Methodological, 66.
11. Inoue, L. Y. T., Neira, M., Nelson, C., Gleave, M. and Etzioni, R. (2006). Cluster-based network model. Biostatistics (in press).
12. Kneip, A. and Gasser, T. (1988). Convergence and consistency results for self-modeling nonlinear regression. The Annals of Statistics, 16.
13. Kneip, A. and Gasser, T. (1992). Statistical tools to analyze data representing a sample of curves. The Annals of Statistics, 20.
14. Kneip, A., Li, X., MacGibbon, K. B. and Ramsay, J. O. (2000). Curve registration by local regression. The Canadian Journal of Statistics, 28 (1).
15. Lang, S. and Brezger, A. (2004). Bayesian P-Splines. Journal of Computational and Graphical Statistics, 13.
16. Lawton, W. H., Sylvestre, E. A. and Maggio, M. S. (1972). Self modeling nonlinear regression. Technometrics, 14.
17. Lindley, D. V. and Smith, A. F. M. (1972). Bayesian estimates for the linear model (with discussion). Journal of the Royal Statistical Society, Series B: Methodological, 34.
18. Liu, X. and Müller, H. (2004). Functional averaging and synchronization for time-warped random curves. Journal of the American Statistical Association, 99.
19. Newton, M. A., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5.
20. Qian, J., Dolled-Filhart, M., Lin, J., Yu, H. and Gerstein, M. (2001). Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. Journal of Molecular Biology, 314.
21. Ramsay, J. O. and Silverman, B. W. (2002). Applied functional data analysis: methods and case studies. Berlin: Springer-Verlag.
22. Ramsay, J. O. and Silverman, B. W. (1997). Functional data analysis. Berlin: Springer-Verlag.
23. Ramsay, J. O. and Li, X. (1998). Curve Registration. Journal of the Royal Statistical Society, Series B: Methodological, 60.
24. Ronn, B. (2001). Nonparametric Maximum Likelihood Estimation for Shifted Curves. Journal of the Royal Statistical Society, Series B: Methodological, 63.
25. Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press.
26. Sakoe, H. and Chiba, S. (1978). Dynamic programming optimization for spoken word recognition. IEEE Trans. Acoustic, Speech and Signal Processing, Vol. ASSP-26, 1.
27. Silverman, B. W. (1995). Incorporating parametric effects into functional principal components analysis. Journal of the Royal Statistical Society, Series B: Methodological, 57.
28. Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9 (12).
29. Tuddenham, R. D. and Snyder, M. M. (1954). Physical growth of California boys and girls from birth to eighteen years. University of California Publications in Child Development, 1.
30. Wahba, G. (1978). Improper Priors, Spline Smoothing, and the Problem of Guarding Against Model Errors in Regression. Journal of the Royal Statistical Society, Series B: Methodological, 40.
31. Wahba, G. (1983). Bayesian Confidence Intervals for the Cross-Validated Smoothing Spline. Journal of the Royal Statistical Society, Series B: Methodological, 45.
32. Wang, K. and Gasser, T. (1997). Alignment of curves by dynamic time warping. The Annals of Statistics, 25.
33. Wang, K. and Gasser, T. (1998). Asymptotic and bootstrap confidence bounds for the structural average of curves. The Annals of Statistics, 26 (3).
34. Wang, K. and Gasser, T. (1999). Synchronizing sample curves nonparametrically. The Annals of Statistics, 27.
35. Yu, H., Luscombe, N. M., Qian, J. and Gerstein, M. (2003). Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends in Genetics, 19 (8).

Figure 1: Examples of Functional Data. Panel (a): Individual growth velocity curves for 39 boys. Panel (b): Yeast cell cycle mRNA expression measurements (Spellman et al., 1998) for a sample of 5 genes, where each time unit corresponds to 7 minutes. Axes (vertical vs. horizontal): (a) growth velocity vs. physical age; (b) log expression vs. time.

Figure 2: Simulation study. Panel (a): Generating monotone transformation functions $u_i(t)$. Panel (b): 30 sample curves $m \circ u_i(t)$ generated by applying the common shape function to the monotone time transformations (from panel (a)) plus random linear scale and amplitude transformations. Panel (c): 30 simulated curves, each obtained by adding random noise to the reference signals in panel (b), superimposed with a cross-sectional mean. Panel (d): 30 registered sample curves superimposed with the posterior median registered shape function. Panel (e): Three simulated trajectories with estimated posterior median warped signals (with corresponding 90% credible bands in dashed gray) and true generating signals. Panel (f): Three posterior median estimated warping functions (with corresponding 90% credible bands in dashed gray) and corresponding true signals. Axes (vertical vs. horizontal): (a) and (f) stochastic time vs. physical time; (b), (c) and (e) function value vs. physical time; (d) function value vs. warped time.

Figure 3: Height velocity data analysis. Panel (a): Individual unregistered growth velocity curves for 39 boys, with cross-sectional mean in gray. Panel (b): Estimated median warping functions. Panel (c): 39 registered growth velocity curves with associated median registered shape function. Panel (d): Cross-sectional (dashed gray) and registered mean growth velocity (solid black) for 39 boys with 90% credible bands around the median registered shape function. Panel (e): Registered posterior median growth velocities (along with the corresponding 90% credible bands) for a model with fixed time transformation image (dashed) and a model with random time transformation image (solid). Panel (f): Log CPO for a model with fixed time transformation image (M1) vs. Log CPO for a model with random time transformation image (M0). Axes (vertical vs. horizontal): (a), (d) and (e) growth velocity vs. physical age; (b) stochastic time vs. physical time; (c) growth velocity vs. warped time; (f) M0 Log CPO vs. M1 Log CPO.

Figure 4: Cell Cycle data analysis. Panel (a): Posterior distributions for 100 individual amplitude parameters $a_i$. Panel (b): Posterior median time transformation functions for 100 genes as related to the amplified structural mean. Panel (c): Posterior median expression profile $(c_i + a_i\, m \circ u_i(t))$ and 90% C.I. for a random sample of three genes. Panel (d): Posterior expected average absolute distance between each pair of the top 50 genes (darker areas indicate closer time transformation distances). Panel (e): Gene YPR119W (dark solid + bullets) with its three best matches: YGR108W, YER001W and YBL032W (dark solid trajectories). Panel (f): Time transformation function for gene YPR119W (dark solid) with time transformation functions for the three best matching genes (gray solid) and for the three worst matches (gray dashed). Axes (vertical vs. horizontal): (a) density vs. amplitude; (b) and (f) stochastic time vs. physical time; (c) and (e) log expression vs. time; (d) gene vs. gene.

Table 1: List of Top Gene Profile Similarities (Lowest posterior expected warping distance).

| Gene 1, ORF (Gene) | Gene 2, ORF (Gene) | Relationship | Notes |
|---|---|---|---|
| YGR109C (CLB6) | YKL164C (PIR1) | Inhibition | No known interactions. |
| YJL159W (HSP150) | YKL163W (PIR3) | Coexpression | Both members of the PIR family with similar functions. |
| YDR225W (HTA1) | YBL003C (HTA2), YNL030W (HHF2), YBR010W (HHT1), YNL031C (HHT2), YDR224C (HTB1) | Coexpression | All histones, associated to cell cycle and DNA processing. All present evidence of physical interactions from mass spectrometry studies. |
| YBR010W (HHT1) | YNL030W (HHF2) | Coexpression | Both histones with similar functions. Physical interactions were observed in gel retardation experiments. |
| YPR156C (TPO3) | YNL058C | Coexpression | YNL058C has no known function. |
| YKR013W (PRY2) | YPL163C (SVS1) | Coexpression | PRY2 has no known function. |
| YNL030W (HHF2) | YNL031C (HHT2) | Coexpression | Physically interacting histones. |
| YKL164C (PIR1) | YKL096W (CWP1) | Inhibition | Both genes play a role in the biogenesis of cellular components. |
| YDR224C (HTB1) | YPL127C (HHO1) | Coexpression | Histones with similar functions. |
| YPR119W (CLB2) | YGR108W (CLB1) | Coexpression | Similar cellular functions. |
| YKL185W (ASH1) | YDR055W (PST1) | Coexpression | Similar cellular functions. |

[Description obtained from
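The ranking in Table 1 is based on a posterior expected distance between time transformation functions. The sketch below shows one way such a distance matrix might be computed from MCMC output. It rests on several assumptions that are not taken from the paper: each posterior draw is stored as a list of warping functions evaluated on a shared time grid, the distance is the absolute difference averaged over grid points and then over draws, and the function names, curve labels and toy numbers are purely hypothetical.

```python
from itertools import combinations


def expected_warping_distance(draws):
    """Average over draws of the mean absolute difference between warping functions.

    draws[s][i][k] is assumed to hold u_i(t_k) at posterior draw s.
    """
    n_draws = len(draws)
    n_curves = len(draws[0])
    n_times = len(draws[0][0])
    dist = [[0.0] * n_curves for _ in range(n_curves)]
    for i, j in combinations(range(n_curves), 2):
        total = 0.0
        for draw in draws:
            # Mean absolute difference over the time grid within one draw
            total += sum(abs(a - b) for a, b in zip(draw[i], draw[j])) / n_times
        dist[i][j] = dist[j][i] = total / n_draws  # the matrix is symmetric
    return dist


def closest_pairs(dist, labels, top=3):
    """Pairs of curves sorted by increasing expected distance."""
    pairs = [(dist[i][j], labels[i], labels[j])
             for i, j in combinations(range(len(labels)), 2)]
    return sorted(pairs)[:top]


# Toy input (hypothetical): 2 posterior draws, 3 curves, 3 time points
LABELS = ["curve_1", "curve_2", "curve_3"]
DRAWS = [
    [[0.0, 1.0, 2.0], [0.0, 1.2, 2.0], [0.0, 0.5, 2.0]],
    [[0.0, 1.0, 2.0], [0.0, 1.1, 2.0], [0.0, 0.6, 2.0]],
]
DIST = expected_warping_distance(DRAWS)

for d, first, second in closest_pairs(DIST, LABELS):
    print(first, second, round(d, 3))
```

Averaging within each draw before averaging across draws keeps the computation aligned with a posterior expectation of the distance, rather than a distance between posterior summaries; whether this matches the exact definition used for Table 1 is an assumption.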
