Dynamic Inverse Prediction and Sensitivity Analysis With High-Dimensional Responses: Application to Climate-Change Vulnerability of Biodiversity

Supplementary materials for this article are available at /s

James S. CLARK, David M. BELL, Matthew KWIT, Amanda POWELL, and Kai ZHU

Sensitivity analysis (SA) of environmental models is inefficient when there are large numbers of inputs and outputs and interactions cannot be directly linked to input variables. Traditional SA is based on coefficients relating the importance of an input to an output response, generating as many as one coefficient for each combination of model input and output. In many environmental models multiple outputs are part of an integrated response that should be considered synthetically, rather than by separate coefficients for each output. For example, there may be interactions between output variables that cannot be defined by standard interaction terms for input variables. We describe dynamic inverse prediction (DIP), a synthetic approach to SA that quantifies how inputs affect the combined (multivariate) output. We distinguish input interactions (specified as a traditional product of input variables) from output interactions (relationships between outputs not directly linked to inputs). Both contribute to traditional SA coefficients and DIP in ways that permit interpretation of unexpected model results. An application of broad and timely interest, anticipating effects of climate change on biodiversity, illustrates how DIP helps to quantify the important input variables and the role of interactions. Climate affects individual trees in competition with neighboring trees, but interest lies at the scale of species and landscapes. Responses of individuals to climate and competition for resources involve a number of output variables, such as birth rates, growth, and mortality.
They are all components of individual health, and they interact in ways that cannot be linked to observed inputs, through allocation constraints. We show how prior dependence is introduced to aid interpretation of inputs in the context of ecological resource modeling. We further demonstrate that a new approach to multiplicity (multiple-testing) correction can be implemented in such models to filter through the large number of input combinations. DIP provides a synthetic index of important inputs, including climate vulnerability in the context of competition for light and soil moisture, based on the full (multivariate) response. By aggregating in specific ways (over individuals, years, and other input variables) we provide ways to summarize and rank species in terms of their vulnerability to climate change. This article has supplementary material online.

James S. Clark is Professor (jimclark@duke.edu), David M. Bell is Graduate Student, Matthew Kwit is Graduate Student, Amanda Powell is Graduate Student, and Kai Zhu is Graduate Student, Nicholas School of the Environment, Department of Biology, and Department of Statistical Science, Duke University, Durham, NC 27708, USA.

International Biometric Society. Journal of Agricultural, Biological, and Environmental Statistics, Volume 18, Number 3, Pages DOI: /s

DYNAMIC INVERSE PREDICTION AND SENSITIVITY ANALYSIS 377

Key Words: Biodiversity; Climate change; Forest dynamics; Hierarchical models; Interactions; Model selection; Multiple testing; Risk analysis.

1. INTRODUCTION

Sensitivity analysis (SA) is an important element of environmental modeling. SA is used to quantify how uncertainty in parameter estimates affects uncertainty in predictions (Fieberg and Jenkins 2005), to identify the parameters that most influence predictions (de Kroon, van Groenendael, and Ehrlén 2000), and to determine the input variables and feedbacks that have large effects on response variables (Schwarz 2011). Complexity can preclude effective SA for at least two reasons. First, a large number of sensitivity coefficients would be needed to fully explore a model with multiple outputs. SA of $q \in \{1,\dots,Q\}$ input variables that influence each of $r \in \{1,\dots,R\}$ response variables could require a minimum of $Q \times R$ coefficients for all input-output combinations. Second, when models include feedbacks and interactions that are themselves of interest, SA can be especially complex. For example, in climate models absorption of solar radiation increases cloudiness, which reflects incoming radiation back to space (Schwarz 2011). This feedback results from an interaction between an input and the current system state. In models of climate and biodiversity, species abundances change due to feedbacks between individual organisms; thus, SA could be profitable at this individual scale. Quantifying the variability in sensitivities to climate among individuals and over time could shed light on how environmental fluctuations impact population health. However, when SA varies between individuals and within individuals over time, complexity can be daunting. Hierarchical modeling allows inference on such high dimensional problems (Wikle et al. 2001; Gelfand et al. 2005; Cressie et al.
2009), but it introduces another challenge of how to effectively summarize so many relationships, including interactions. Here we introduce a new approach to evaluate inputs and outputs in large environmental models, based on dynamic prediction of a multivariate response vector. Advantages include inference at the appropriate scale, but in a simpler and more synthetic way than traditional SA. Our approach accommodates not only interactions that can be directly parameterized as combinations of input variables, indicated by the index $qq'$, but also those that arise internally, where one response variable $r$ depends on another response variable $r'$ in ways that cannot be parameterized through inputs. We apply it to one of the most extensive, long-term forest data sets that includes both experimental manipulation and natural variation for 30,000 individual trees tracked from 10 to 20 years for >300,000 tree-years. The data set is unique in three ways: (1) it provides annual resolution, (2) it tracks multiple demographic response variables, and (3) it includes experimental manipulation. This data set contrasts with previously published forest plot data that are purely observational, that record only tree size as a response, that lack manipulation or monitoring of important

378 J.S. CLARK ET AL.

covariates, and/or that provide temporal resolution too coarse (e.g., five-yr frequency) to allow effective climate analysis. Our analysis includes three elements. First, we motivate SA at the scale of individual organisms, aided by the concepts of input and output interactions. Second, we introduce dynamic inverse prediction (DIP), an integrative approach to SA, having the advantages of being more synthetic and less complex. Finally, we show that an innovation for variable selection involving multiple comparisons in high dimensional models (Scott and Berger 2010) applies well in this setting. The example of biodiversity and climate change is used to illustrate throughout.

2. BACKGROUND MOTIVATION

Biodiversity response to climate change provides a challenging and important application of our approach. Climate changes occurring now pose fundamental questions about future biodiversity (e.g., Hillyer and Silman 2010; Ettinger, Ford, and HilleRisLambers 2011; Clark et al. 2011b; Zhu, Woodall, and Clark 2012). How can models allow for the combination of health responses to a combination of resources and climate? Can models help identify the species that will be strongly affected by climate change in the context of competition for resources? Can they help us anticipate where species will find refuge, given that the best sites will experience the strongest competition? Effective biodiversity risk assessment should consider the dynamic multivariate and interactive effects of climate variation at the scales at which processes occur: individuals responding to seasonal variation. Current models used to predict biodiversity responses to climate change primarily rely on spatial correlations between regional abundance of species and regional climate. However, climate does not affect species; rather, weather affects individual trees. Climate changes, the aggregate of changes in weather, include increased growing season length and summer drought.
The effects of these changes depend on competition, predation, and disease for individuals of different species and sizes. Water use and light interception by competing neighbors and topographic variation in moisture at fine spatial scales are not resolved in current models. Furthermore, the effects of input variables interact: moisture is best exploited by individuals with access to high light, light capture depends on temperature, and so forth. Outputs may also be complex. The response, individual health, is multivariate, involving not only growth, but also fecundity and survival (Welp, Randerson, and Liu 2007; Souza et al. 2008; Granier et al. 2008; Valladares et al. 2008; Valladares and Pearcy 2002; Clark et al. 2011a, 2011b). In biodiversity studies a response vector could be as large as all of the species in a region that depend on environmental variation, all of the individuals in a population as they jointly respond to the environment and one another, or all of the variables measured on an individual that help describe its response to weather. In each of these cases interest may focus on the full (multivariate) response as opposed to the individual elements of a response vector.

In our application responses occur at the individual scale, but, as is typical in many such studies, interest focuses on aggregate quantities, such as population distribution and abundance in relation to climate. In other words, the processes that control patterns at the level of interest (species and climate) operate at the scale of individual responses to weather. Our approach is motivated by the need to infer how individuals of different species differ in their relationships with climate, building from long-term health of individuals exposed to natural and experimental variation in risk factors. We describe advantages to a new perspective on the dynamic inverse problem of an organism's multivariate response.

MOTIVATION FOR DYNAMIC INVERSE PREDICTION

Two problems that arise in sensitivity analysis of environmental models are (i) the effect of input variable $q$ on output variable $r$ is a single coefficient extracted from a large fitted model, and (ii) there are too many combinations to interpret and no obvious way to rank them in terms of importance. Consider output vector $\mathbf{y}$ with elements $\{y_r : r = 1,\dots,R\}$ and input vector $\mathbf{x}$ with elements $\{x_q : q = 1,\dots,Q\}$. Sensitivity $s(r,q) = dy_r/dx_q$ could be as simple as a regression coefficient. But there are $Q \times R$ of them, so which do we focus on? Is $q$ important if $s(r,q) > s(r',q')$? That depends on one's view of the importance of $r$ vs. $r'$. Which is more diagnostic under the range of conditions that can occur? If we view the components of $\mathbf{y}$ as part of an integrated response it may be difficult to weight their contributions. How do we reduce the dimensionality of this sensitivity analysis? If we invert the problem to a predictive distribution $x_q \mid \mathbf{y}, \dots$, then both problems are addressed. Based on a fitted model the full response vector $\mathbf{y}$ assigns a score to each $x_q$, weighted by the full fitted model.
Unlike traditional SA, which relies on coefficients for each combination of input variable $q$ and output or response variable $r$, dynamic inverse prediction (DIP) is synthetic: an input variable $q$ is evaluated based on its effect on the full response vector $\mathbf{y}$, rather than each response variable individually. For the application considered here, the response vector for individual $i$ in year $t+1$, $\mathbf{y}_{i,t+1}$, contains fecundity, represented here by index $r$, and growth, represented by index $r'$. The influence of input variables in vector $\mathbf{x}_{i,t}$ can be most directly assessed from the capacity of the individual $i$ in year $t+1$ to predict these inputs, based on the full response in the length-$R$ vector $\mathbf{y}_{i,t+1}$. DIP reduces the analysis from $Q \times R$ sensitivity coefficients to no more than $Q$ predictive distributions; DIP is not only synthetic, it is less complex. It is dynamic in this application, because we are concerned with prediction over time; the prediction changes with changing input variables.

To illustrate, consider a simulated example for the linear model, omitting time $t$ for the moment, $\mathbf{y}_i \sim N(\mathbf{x}_i \mathbf{A}, \Sigma)$, with length-$R$ observation vectors $\mathbf{y}_i$, $i = 1,\dots,n$, length-$Q$ input vectors $\mathbf{x}_i$, and parameter values in the $Q \times R$ matrix $\mathbf{A}$. We defer discussion of prior distributions on $\mathbf{A}$ and $\Sigma$ to Section 4.4, but note here that there will be an additional prior density for input variables, $x_i^{(q)} \sim N(\bar{x}^{(q)}, 10)$. We specify a prior density centered on the mean values of input variables. The prior for input variables is needed to ensure that the predictive distribution of input variables is proper. Large error is introduced in covariance matrix $\Sigma$, which has diagonal elements selected at random from the interval (10, 100), much wider than the range of $\mathbf{x}_i \mathbf{A}$. Correlations that determine off-diagonal elements are in the

range $(-0.2, 0.2)$. The parameter matrix $\mathbf{A}$ has covariance $\mathbf{H} = (\mathbf{X}^T \mathbf{X})^{-1}$, where $\mathbf{X}$ is the $n \times Q$ design matrix having rows $\mathbf{x}_i$. There is a simulated data set of sample size $n = 30$ with $Q = 6$ input variables in $\mathbf{x}$ and $R = 50$ response variables in vector $\mathbf{y}_i$. Main effects represent four of the input variables, the remaining two being an intercept and an interaction term.

Figure 1. At left are predictive means and 95% intervals for simulated data with four main effects in a design matrix that also includes an intercept in position 1 and an interaction in position 6 ($x_4 \times x_5$, not shown). There are $R = 50$ response variables and $n = 30$ observations. The 1:1 line of agreement and horizontal line at the prior mean are shown. The 300 coefficients in $\mathbf{A}$ for $Q = 6$ inputs are shown at lower left. Input 2 is especially informative, all values being far from zero. Inputs 3 and 4 are uninformative, having values close to zero. Input 4 interacts with input 5, which is more informative than 4. At right are histograms of scores for each observation (Equation (19)) with mean scores shown for each panel. At lower right are predictions of $\mathbf{y}$ (95% intervals, with summaries for 10 bins).

Before considering the inverse prediction of input variables, we show the standard predictive intervals for vectors $\mathbf{y}_i$ at lower right in Figure 1. These predictions marginalize over the posterior distribution of $\mathbf{A}$, with additional stochasticity induced by the likelihood,

$$p(\mathbf{y}_i \mid \mathbf{X}, \mathbf{Y}, \Sigma) = \int N(\mathbf{y}_i \mid \mathbf{x}_i \mathbf{A}, \Sigma)\, N\big(\mathrm{vec}(\mathbf{A}) \mid \mathrm{vec}\big((\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}\big), \mathbf{H}\big)\, d\mathbf{A}.$$

The notation $N(\mathbf{y}_i \mid \mathbf{x}_i \mathbf{A}, \Sigma)$ indicates that the vector $\mathbf{y}_i$ is distributed as a normal distribution with arguments being the mean vector and covariance matrix, and $\mathbf{Y}$ is the $n \times R$ response matrix. The $n \times R = 1500$ predictive distributions for $\mathbf{Y}$ in Figure 1 (lower right) emphasize the large errors in $\Sigma$. There is also a predictive distribution for input variable $q$ in observation $i$,

$$p\big(x_i^{(q)} \mid \mathbf{X}, \mathbf{Y}, \Sigma\big) \propto \int p\big(\mathbf{y}_i \mid x_i^{(q)}, \mathbf{A}, \Sigma\big)\, N\big(\mathrm{vec}(\mathbf{A}) \mid \mathrm{vec}\big((\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}\big), \mathbf{H}\big)\, p\big(x_i^{(q)}\big)\, d\mathbf{A}.$$

Figure 1 shows 95% predictive intervals for input variable $q$ in each observation $i$. Input variables are identified with the notation $\mathbf{x}_{\cdot,q}$ for all $n$ observations in the $q$th column of $\mathbf{X}$. In this example input $\mathbf{x}_{\cdot,2}$ has disproportionately large effects on all responses; this can be seen as large sensitivity coefficients at lower left. The predictive intervals for $\mathbf{x}_{\cdot,2}$ are narrow (top left) and predictive scores high (upper right; we discuss the intervals and scores in Section 4.2). Input $\mathbf{x}_{\cdot,3}$ has average effects that are lower than other inputs by a factor of 0.1. Consequently, predictive intervals are broad, many spanning the prior mean of 0, and scores are low. Input $\mathbf{x}_{\cdot,4}$ has average effects of the same magnitude as $\mathbf{x}_{\cdot,3}$, but it interacts with $\mathbf{x}_{\cdot,5}$, which has a larger effect than $\mathbf{x}_{\cdot,3}$. The predictions for $\mathbf{x}_{\cdot,4}$ and $\mathbf{x}_{\cdot,5}$ are intermediate between important $\mathbf{x}_{\cdot,2}$ and unimportant $\mathbf{x}_{\cdot,3}$. The predictive intervals of main effects in $\mathbf{X}$ (left side of Figure 1) and mean scores (right side) summarize input effects in a way that could not be achieved with $Q \times R = 300$ sensitivity coefficients for a traditional SA. The 300 responses in matrix $\mathbf{Y}$ are weighted by the model itself, and they incorporate effects of interactions. The full model is engaged in these prediction scores, rather than individual coefficients extracted from it.
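The simulated example can be approximated in a few lines. The following is a minimal sketch, not the authors' procedure: it assumes a diagonal $\Sigma$ (the off-diagonal correlations are ignored), plugs in a least-squares estimate of $\mathbf{A}$ instead of integrating over its posterior, and handles only inputs that do not appear in an interaction. The function name `inverse_predict` and all numerical settings other than $n$, $Q$, $R$, the prior variance of 10, and the (10, 100) range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, Q, R = 30, 6, 50

# Design: intercept, four main effects (inputs 2..5), interaction x4 * x5.
main = rng.normal(size=(n, 4))
X = np.column_stack([np.ones(n), main, main[:, 2] * main[:, 3]])

# Hypothetical coefficients: weak by default, input 2 strong on every response.
A = rng.normal(scale=0.1, size=(Q, R))
A[1] = 5.0 + rng.normal(size=R)
sigma2 = rng.uniform(10, 100, size=R)          # assumed diagonal of Sigma
Y = X @ A + rng.normal(size=(n, R)) * np.sqrt(sigma2)

# Plug-in estimate of A (assumption: posterior uncertainty in A is ignored).
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

def inverse_predict(q, prior_var=10.0):
    """Normal predictive mean and variance of column q of X given Y.

    Valid only for an input q with no interaction terms in the design.
    """
    others = [k for k in range(Q) if k != q]
    resid = Y - X[:, others] @ A_hat[others]   # response left for input q to explain
    B = A_hat[q]
    precision = B @ (B / sigma2) + 1.0 / prior_var
    mean = (resid @ (B / sigma2) + X[:, q].mean() / prior_var) / precision
    return mean, 1.0 / precision

mean2, var2 = inverse_predict(1)   # informative input 2
mean3, var3 = inverse_predict(2)   # uninformative input 3
```

Under these assumptions the predictive variance for input 2 is much smaller than for input 3, which mirrors the narrow versus broad intervals described for Figure 1. Because the plug-in step omits uncertainty in $\mathbf{A}$, the intervals for weak inputs are narrower here than a full posterior treatment would give.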
Even in this example of a poorly fitting model with 300 poorly estimated regression coefficients, DIP identifies input variables that are important in terms of their impact on the full response vectors. In many studies a number of attributes in $\mathbf{Y}$ might be measured simply because they happen to be observable. Their diagnostic value might be unknown. The importance of a response variable may differ from one observation to the next. Inverse prediction provides an efficient means for extracting the important input variables, even in cases where many of the response variables contain limited information.

Simulation studies further show that predictions of input variables are insensitive to collinearity in $\mathbf{X}$. Figure 2 shows 95% of prediction scores for experiments where $n = 1000$, $Q = 6$, $R = 50$, $\Sigma = \mathrm{diag}(1, R)$, and $\mathbf{A}$ is constructed as in the previous example. A random $\mathbf{X}$ is generated as before but now with systematic correlation introduced between $\mathbf{x}_{\cdot,2}$ (informative) and $\mathbf{x}_{\cdot,3}$ (non-informative) ranging from 0 to . The ranges of prediction scores result from random $\mathbf{X}$ and the posterior distribution of $\mathbf{A}$. Scores for informative $\mathbf{x}_{\cdot,2}$ are uniformly high and those for $\mathbf{x}_{\cdot,3}$ uniformly low, despite a range of collinearity between these two variables. Prediction scores for the remaining two main effects are likewise unaffected by collinearity.

DIP builds on a tradition of predicting observations based on a fitted model, typically used to check or rank models (Gelfand and Ghosh 1998) or to evaluate their predictive

capacity (Gneiting and Raftery 2007). Rather than ask how well the model predicts the data, inverse prediction focuses on how well an observation, in our application an individual tree in a given year, predicts environmental inputs based on all input variables and the fitted model. In the application that follows a fitted state-space model for an imputed response vector $\mathbf{y}_{ij,t+1}$ is inverted to predict input variables in the vector $\mathbf{x}_{ij,t}$ (Section 4.2). In a state-space model the response vector $\mathbf{y}_{ij,t}$ could be treated as a latent state that is not directly observed, but rather is imputed based on observations, contained in a vector $\mathbf{z}_{ij,t}$. If the predictive mean is biased or the predictive variance large, then the variable has limited impact on $\mathbf{y}_{ij,t}$. Prediction scores can be aggregated to obtain mean scores for individuals, years, and species. The approach can be viewed as an application of in-sample prediction, applied and interpreted in the context of individual health.

Figure 2. Ninety-five percent of prediction scores for main effects in $\mathbf{X}$ where the correlation between $\mathbf{x}_{\cdot,2}$ and $\mathbf{x}_{\cdot,3}$ ranges from 0 to

INTERACTIONS IN THE MODEL

In principle DIP could be applied to many environmental modeling challenges. Our application to biodiversity and climate change poses several considerations related to interactions, prior specification, and variable selection. Input interactions are defined here as those that can be estimated as traditional interaction terms for combinations of input variables. Input interactions are positive or negative and determine whether changes in resources and temperature tend to amplify or buffer one another. Traditionally, interactive effects of resources have been termed complementary or antagonistic (Tilman 1980; Huisman and Weissing 2001; Revilla and Weissing 2008; Hall 2009). We use the terms amplifying and buffering.
A positive interaction is one where the effect of an input variable is greatest when another is abundant (Figure 3a). On the other hand, many hypothesized effects of climate change are consistent with the notion of buffering, which can be viewed as a negative interaction (Figure 3b). For example, Frelich and Reich (2010) hypothesize that moist locations will provide refuges as aridity increases in the future, a negative interaction between moisture change over time (increasing aridity) and spatial variation in moisture status. In other words, reduced moisture supply during drought has greatest impact on sites where moisture is already low (Figure 3b). This would occur if individuals on wet sites are effectively buffered from drought, responding less than those on dry sites. Competition can change these expectations; a positive interaction (Figure 3a) could result if leaf area and transpiration demand increase to fully exploit the greater moisture supply on wet sites, making them more vulnerable to drought. For resources and climate, results like Figure 3c may violate prior assumptions. To avoid outcomes like Figure 3c it is not enough to impose a prior on main effects. We specify a prior that can be used where the sign of the full effect is known (positive or negative), but the sign of the interaction is not (Section 4.4).

Figure 3. Three interaction examples from regression having positive main effects for both variables. The response surfaces grade from low (red) to high (yellow). Big arrows indicate values of variable 1 where response to variable 2 is large, and vice versa. Resource models are typically constrained to positive interactions between resource variables (a), but acknowledge negative interactions (b). Regression allows both, but is also capable of unrealistic behavior (c). Both (b) and (c) have negative interaction terms.

Our application of the term input interactions distinguishes this interaction between input variables from those that arise internally and/or cannot be attributed to combinations of input variables. Output interactions are defined here to be relationships between response variables that cannot be directly linked to input variables. Unlike input interactions, which can be specified explicitly as relationships between input variables, there will be relationships between response variables that arise from feedbacks that are not observed. We use the term output interaction to refer to relationships between outputs that cannot be specified as traditional (input) interaction terms.
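The amplifying and buffering input interactions described above can be illustrated with a two-variable regression, $y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2$, in which the full effect of $x_1$ is $b_1 + b_{12} x_2$. The sketch below is illustrative only. It assumes that the unrealistic behavior of Figure 3c corresponds to a full effect that changes sign within the observed range of the other input; the function names and coefficient values are hypothetical.

```python
def full_effect(b_main, b_inter, other):
    """Full effect of one input, b_main + b_inter * other, at a value of the other input."""
    return b_main + b_inter * other

def classify_interaction(b1, b2, b12, x1_range, x2_range):
    """Label the interaction in y = b0 + b1*x1 + b2*x2 + b12*x1*x2, assuming b1, b2 > 0."""
    # Full effects are linear in the other input, so extremes occur at range endpoints.
    effects = [full_effect(b1, b12, x2) for x2 in x2_range]
    effects += [full_effect(b2, b12, x1) for x1 in x1_range]
    if min(effects) < 0:
        return "sign reversal"      # assumed analogue of Figure 3c
    if b12 > 0:
        return "amplifying"         # Figure 3a
    if b12 < 0:
        return "buffering"          # Figure 3b
    return "additive"

amplify = classify_interaction(1.0, 1.0, 0.5, (0.0, 2.0), (0.0, 2.0))
buffer_ = classify_interaction(1.0, 1.0, -0.3, (0.0, 2.0), (0.0, 2.0))
reverse = classify_interaction(1.0, 1.0, -0.8, (0.0, 2.0), (0.0, 2.0))
```

The third case shows why a prior on main effects alone is not enough: both main effects are positive, yet the negative interaction is large enough to reverse the full effect at high values of the other input.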
In our application output interactions occur within organisms, due to allocation constraints. If allocation to fecundity comes at the cost of reduced allocation to growth, there can be a relationship (interaction) between fecundity and growth that cannot be fitted to a specific combination of input variables. Output interactions can lead to surprises if not properly identified (Section 4.3). Our approach helps to accommodate and clarify contributions from input and output interactions. We discuss how these two types of interactions are identified and how they affect sensitivity analysis in a later section.

VARIABLE SELECTION

The unusual size of the data set and the number of input combinations present a model selection challenge. When many species each have different geographic distributions there are different numbers of input variables to consider for each species. Seasonal winter and spring temperatures ($w$), summer drought ($m$), local moisture status ($M$), tree size ($D$), previous growth rate ($d$), and availability of light ($C$) affect the demographic rates of 40 dominant species over 20 yr. We entertain up to 1062 main effects and two-way interactions, but depending on geographic distribution, some species cannot include all of them.
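When the number of candidate inputs differs among species, a model prior can penalize each added variable more heavily where more candidates are on offer. The sketch below shows one well-known form of such a prior, obtained by placing a uniform prior on a common inclusion probability, so that a specific model with $k$ of $m$ candidate variables has prior probability $1 / \big((m+1)\binom{m}{k}\big)$. Whether the analysis here uses exactly this form is an assumption; the function names and the values of $m$ are hypothetical.

```python
from math import comb, exp, log

def log_model_prior(k, m):
    """Log prior probability of one specific model with k of m candidate variables.

    Assumes a Uniform(0, 1) prior on a shared inclusion probability.
    """
    return -log(m + 1) - log(comb(m, k))

def log_penalty_for_next_variable(k, m):
    """Change in log prior when a model of size k gains one more variable."""
    return log_model_prior(k + 1, m) - log_model_prior(k, m)

# The prior sums to one over all 2**m models (checked here for m = 10).
total = sum(comb(10, k) * exp(log_model_prior(k, 10)) for k in range(11))

# The first added variable is penalized more when more candidates are available.
penalty_few = log_penalty_for_next_variable(0, 10)
penalty_many = log_penalty_for_next_variable(0, 1000)
```

The penalty for the first variable equals $\log(1/m)$, so it grows in magnitude with the number of candidates, which is the sense in which such a prior adjusts for multiplicity.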

Multiplicity describes the fact that the number of selected variables scales with the number of input variables that are considered. Size of the selected model can be corrected for the fact that the number of variables considered is not constant (Scott and Berger 2010). In our analysis variable selection is based on the marginal likelihood, which penalizes large models, and a model prior to address the multiplicity posed by variable combinations (Scott and Berger 2010) that differ among species. Model fitting and selection tools help to filter through many potential variables to identify those of consequence (Section 4.5). In the sections that follow we summarize the data sets and the model (details are in Clark et al. 2010). We then introduce DIP and its relation to SA, followed by application.

3. DATA SETS

Data come from 20-yr census plots located in mixed temperate forests from mid-elevation Piedmont to northern hardwoods of the southern Appalachians of North Carolina. Individuals of all tree species are tracked over time as they respond to spatiotemporal variation in climate and local competition for light and moisture (Section 4.1). For a given species there are $i = 1,\dots,n_j$ individuals on plots $j = 1,\dots,12$, modeled over $t = 1,\dots,T$ years. Response variables are demographic rates, including diameter growth $d_{ij,t}$ and fecundity potential $f_{ij,t}$, informed by observations from tree censuses, tree increment cores, remote sensing, and seed traps, using field methods detailed in Clark et al. (2010). Tree-year observations taken during censuses include tree diameter, survival status, crown class, and reproductive status. Censuses are conducted at 2 to 4 year intervals. Additional observations of growth are obtained from increment cores, which provide annual growth data. Remote sensing is used to quantify exposed canopy area (ECA) as an index of light availability.
Seed-year observations come from seed traps, collected two to five times annually. Data submodels are detailed individually for annual fecundity, growth, and mortality in Clark et al. (2010). Input variables are restricted to climate, competition, and resources known to affect demographic responses (Table 1). Plots were selected to provide a range of climate variation (Piedmont to mountains). Tree canopies were manipulated (pulling down large trees) to provide a full range of light values (Cooper-Ellis et al. 1999; Dietze and Clark 2008; Clark et al. 2010). Exposed canopy cover $C_{ij,t}$ is an index of light availability and ranges from 0 (completely shaded by neighbors) to >100 m$^2$. Summer drought is summarized by the Palmer Drought Severity Index (PDSI) $m_{j,t}$ for June through September for site $j$ in year $t$. PDSI expresses the departure of a given year from the long-term moisture availability for the site, in this case since 1930 (Figure 4b). Spatial variation in moisture availability $M_{ij}$ is taken as the product of annual average precipitation (mm) at site $j$ and the topographic convergence index (Beven and Kirkby 1979) for the location of tree $ij$ (Figure 4c). $M_{ij}$ varies among the 12 stands due to variation in precipitation and within sites due to topography. Thus, $M_{ij}$ represents spatial variation and $m_{j,t}$ represents temporal variation, that is, how the drought index for a given growing season departs from the site average. In addition to climate variables and light, the model includes tree diameter $D_{ij,t}$ and previous growth rate $d_{ij,t-1}$, both of which can explain growth and fecundity (Clark et al. 2010).

Table 1. Hypothesized direct effects and interactions by demographic response variable. Priors are listed for the growth response ln(d ij,t 1 ) and then the fecundity potential ln(f ij,t 1 ).

Input covariate | Reference | Summary rationale | Prior distribution (growth; fecundity)
Minimal model
Intercepts | species | No prior knowledge | NI
Canopy area ln(C ij,t 1 ) | tree-year | Light is a limiting resource for which plants compete | A qr > 0; A qr > 0
Additional main effects
Diameter ln(D ij,t 1 ) | tree-year | Fecundity potential can increase allometrically | NI; A qr > 0.5
Large diameter effect ln 2 (D ij,t 1 ) | tree-year | Physiological function may decline, but not improve with old age | A qr < 0; A qr < 0
Previous year growth ln(d ij,t 1 ) | tree-year | Fecundity may depend on previous growth, beyond effects explained by past climate inputs | A qr = 0
Winter temperature deviation w j,t 1 | site-year | Years with warm winters, long growing seasons increase carbon gain | A qr > 0; A qr > 0
Summer (Jun, Jul, Aug, Sep) drought deviation m j,t | site-year | Drought years decrease carbon gain | A qr > 0; A qr > 0
Average winter (Jan, Feb, Mar) temperature W j | site | Sites with warm winters, long growing seasons increase carbon gain | A qr > 0; A qr > 0
Average moisture index M ij | tree | Moist sites support carbon gain | A qr > 0; A qr > 0
Interactions
Light by winter temperature C ij,t 1 w j,t 1 | tree-year
Light by summer drought C ij,t 1 m j,t | tree-year
Light by ave winter temperature C ij,t 1 W j | site-year
Light by ave moisture C ij,t 1 M ij | tree-year
Winter temperature by summer drought w j,t 1 m j,t | site-year
Summer drought by ave moisture m j,t M ij | tree-year | | NI

Like moisture, temperature enters as a site effect $W_j$ and a site-year effect $w_{j,t}$. Winter and spring temperatures control bud break and leaf and fruit set, and can have a large impact on tree carbon balance. We use the annual temperature for January through March for site $j$ in year $t$.
The site effect is taken to be the average winter/spring temperature $W_j$, and the site-year effect is the departure from that average, $w_{j,t}$ (Figure 4a). The ranges of input variables in this study are relevant for 21st century climate change predictions. They span the southeastern Piedmont to northern hardwoods in spatial variation. Variation in temperature among sites and over time spans 2 to 5 °C, similar to the temperature increases predicted for 21st century climate change. Variation in summer PDSI over the 20-yr study period spans the interval $(-4, 4)$, i.e. several

severe droughts to some of the wettest years for this climate. This large temporal variation was experienced by most, but not all, species. The range of local (spatial) moisture values was limited for species that were restricted to one or a few sites, but broad for many. Canopy removal experiments provided a full range of canopy area-tree size combinations, to ensure that effects of both variables could be estimated (Clark et al. 2010). Among the input variables included in this analysis, pairwise correlations were low, mostly less than 0.2 in absolute value (Supplement).

Figure 4. Climate related input variables. (a) Winter/spring temperature has a spatial component $W_j$ (time-averaged, among sites $j$) and a temporal component $w_{j,t}$ (within $j$, over time). Summer moisture has a temporal component $m_{j,t}$, the Palmer Drought Severity Index (b), and a spatial component, the moisture index $M_{ij}$ (c). The map in (c) shows $M_{ij}$ values for tree locations from moist (blue) to dry (yellow).

MODEL SUMMARY

Multivariate responses of individuals to multiple inputs are modeled in a state-space framework. The model includes uncertainty in the process, variation among individuals, and observation models. There is process error within and between individuals and over years. Here we summarize the model from Clark et al. (2010), which provides additional detail on prior specification, algorithm development, and diagnostics.

MODEL DEVELOPMENT

The process model tracks changing states of individuals each year as they grow, reach reproductive maturity, produce seed, and die. There are direct responses of growth and fecundity to fluctuating inputs, as well as interactions that result from tradeoffs in allocation between growth and reproduction (Knops, Koenig, and Carmen 2007; Mund et al. 2010; Sánchez-Humanes, Sork, and Espelta 2011); the latter emerge as output interactions, not directly linked to input variables. Maturation is a partially hidden Markov process, where an individual $ij$ can change from the immature $F_{ij,t} = 0$ to the mature $F_{ij,t+1} = 1$ state as it increases in size, depending on access to resources and climate inputs (Table 1). The process equation for the bivariate growth-fecundity response for a mature individual ($F_{ij,t} = 1$) has response vector

$$\mathbf{y}_{ij,t} = [\ln d_{ij,t} \;\; \ln f_{ij,t}], \qquad (4.1)$$

which includes fecundity $f_{ij,t}$ (potential seed productivity) and the diameter growth increment $d_{ij,t}$ (cm), which determines change in diameter $D_{ij,t+1} = D_{ij,t} + d_{ij,t}$. As mentioned previously, data models link this process equation to observations, which consist of increment cores, tree diameter measurements, observations of maturation status, and seeds collected in traps (Clark et al. 2010).
Our interest here is in the latent vector $\mathbf{y}_{ij,t}$. The process equation is

$$(\mathbf{y}_{ij,t+1} \mid F_{ij,t+1} = 1) \sim N_2(\mathbf{x}_{ij,t}\mathbf{A} + \boldsymbol{\alpha}_{ij} + \kappa_t, \boldsymbol{\Sigma}), \qquad \boldsymbol{\alpha}_{ij} \sim N_2(\mathbf{0}, \mathbf{W}) \tag{4.2}$$

where $\mathbf{x}_{ij,t}$ is a 1 by $Q$ vector of inputs (main effects and interactions), $\mathbf{A}$ is a $Q$ by $R$ matrix of fitted parameters for $R = 2$ response variables, $\boldsymbol{\alpha}_{ij}$ is the random effect associated with individual $ij$, $\mathbf{W}$ is the 2 by 2 covariance matrix for random effects, $\kappa_t$ is a fixed year effect, and $\boldsymbol{\Sigma}$ is a 2 by 2 covariance matrix for process error,

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2_r & \sigma_{rr'} \\ \sigma_{r'r} & \sigma^2_{r'} \end{bmatrix}. \tag{4.3}$$

In this case of $R = 2$ the covariance is the scalar quantity $\sigma_{rr'} = \sigma_{r'r}$. Equation (4.2) is the process equation of a state-space model that applies to individuals that are reproductively mature. For immature individuals, $F_{ij,t} = 0$, $y_{ij,t}$ is a scalar quantity for growth (i.e., there is no fecundity), and Equation (4.2) is univariate. Note that a given predictor $q \in \{1,\ldots,Q\}$ enters the model in three ways, for adult growth and fecundity (Equation (4.1)) and for juvenile growth. Data models for observations, prior distributions, algorithm development for MCMC, and diagnostics are detailed in Clark et al. (2010).
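As an illustration of the process equation (4.2), the following is a minimal sketch of a single draw of the bivariate response for one mature individual. All numerical values, the function name `simulate_response`, and the treatment of the year effect as a length-R vector are assumptions made for the example; they are not taken from the fitted model.

```python
import numpy as np

def simulate_response(x, A, alpha, kappa, Sigma, rng):
    """Draw one response vector y ~ N_R(x A + alpha + kappa, Sigma)."""
    mean = x @ A + alpha + kappa          # length-R mean vector
    return rng.multivariate_normal(mean, Sigma)

rng = np.random.default_rng(1)
R = 2                                      # responses: ln growth, ln fecundity

# Hypothetical design row: intercept, two inputs, and their interaction
x = np.array([1.0, 0.4, 0.7, 0.4 * 0.7])

# Hypothetical Q x R coefficient matrix (one column per response)
A = np.array([[0.1, 1.0],
              [0.5, 0.8],
              [0.3, 0.6],
              [0.2, -0.4]])

W = 0.05 * np.eye(R)                       # random-effect covariance (assumed)
Sigma = np.array([[0.20, -0.10],           # process-error covariance (assumed);
                  [-0.10, 0.50]])          # negative off-diagonal = tradeoff

alpha = rng.multivariate_normal(np.zeros(R), W)   # individual random effect
kappa = np.zeros(R)                                # year effect, set to zero here

y_next = simulate_response(x, A, alpha, kappa, Sigma, rng)
```

The negative off-diagonal element of the assumed process-error covariance plays the role of an output interaction: it is not attached to any column of the design row.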

4.2. IMPLEMENTATION OF DYNAMIC INVERSE PREDICTION

DIP is derived directly from the fitted model. We wish to evaluate the importance of an input variable $q$, based on the full response vector, incorporating all interactions. Let $q'$ represent variable(s) that interact with $q$, and $-q$ represent variables that do not. Consider a predictive distribution for $x_{ij,t(q)}$, i.e., the $q$th element of $\mathbf{x}_{ij,t}$. To ensure a proper distribution of predicted $x_{ij,t(q)}$, we have the likelihood, posterior for parameters $\theta$, and prior for $x_{ij,t(q)}$,

$$p(x_{ij,t(q)} \mid \mathbf{X}, \mathbf{Y}) \propto \int p(\mathbf{y}_{ij,t+1} \mid x_{ij,t(q)}, \mathbf{x}_{ij,t(-q,q')}, \theta)\, p(\theta \mid \mathbf{X}, \mathbf{Y})\, p(x_{ij,t(q)})\, d\theta = p(\mathbf{y}_{ij,t+1} \mid x_{ij,t(q)}, \mathbf{x}_{ij,t(-q,q')})\, p(x_{ij,t(q)}).$$

Equation (4.2) is the likelihood, which can be reorganized this way

$$\mathbf{y}_{ij,t+1} \sim N_2(\mathbf{x}_{ij,t(-q)}\mathbf{A}_{-q} + x_{ij,t(q)}\mathbf{B}_{ij,t(q)} + \boldsymbol{\alpha}_{ij} + \kappa_t, \boldsymbol{\Sigma}) \tag{4.4}$$

where $\mathbf{B}_{ij,t(q)}$ is the length-$R$ vector for the effect of $q$ (variable of interest; see below), and $\mathbf{A}_{-q}$ is the matrix excluding rows $q$ and $qq'$ (interactions between $q$ and $q'$). The first term in the mean vector includes terms not involving $q$, neither as main effects nor as interactions. The second term is a length-$R$ vector that includes the main effect of $q$ and interactions,

$$\mathbf{B}_{ij,t(q)} = \mathbf{A}_q + \sum_{p \in q'} x_{ij,t(p)}\mathbf{A}_{qp}, \tag{4.5}$$

where $\mathbf{A}_{qp}$ is a row of $\mathbf{A}$ corresponding to an interaction with $q$. For example, suppose $\mathbf{x} = [x_1, x_2, x_3, x_4, x_3 x_4]$, where the design includes an interaction $x_3 x_4$, and subscripts for individual, location, and year are omitted for simplicity. If the input variable of interest is $q = 3$, then $q' = 4$, and $-q = [1, 2, 4]$. The first term of Equation (4.4) is

$$[x_1 \;\; x_2 \;\; x_4] \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{41} & A_{42} \end{bmatrix},$$

and the second term is $x_3[A_{31} \;\; A_{32}] + x_3 x_4 [A_{5,1} \;\; A_{5,2}]$. The subscript 5 in matrix $\mathbf{A}$ corresponds to the row containing the interaction between inputs 3 and 4, i.e., $qq'$. The prior on an input $x_q$ is $x_q \sim N(a_q, b_q)$. Now including subscripts, there is a predictive density

$$PD_{ij,t(q)} = N(\hat{x}_{ij,t(q)}, V_{ij,t(q)}) \tag{4.6}$$

with predictive mean and variance

$$\hat{x}_{ij,t(q)} = V_{ij,t}\left[\mathbf{B}_{ij,t}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_{ij,t+1} - \mathbf{x}_{ij,t(-q)}\mathbf{A}_{-q} - \boldsymbol{\alpha}_{ij} - \kappa_t) + a_q/b_q\right],$$

$$V_{ij,t} = \left(\mathbf{B}_{ij,t(q)}\boldsymbol{\Sigma}^{-1}\mathbf{B}^T_{ij,t(q)} + b_q^{-1}\right)^{-1}.$$

This predictive distribution incorporates interactions caused by input variables in Equation (4.5) and additional (output) interactions in the response vector absorbed by $\boldsymbol{\Sigma}$. This standard approach to predicting missing $x$ is used here as the basis for evaluating the role of $q$ when the output $\mathbf{y}$ is a vector.

4.3. HOW INTERACTIONS AFFECT DIP VS. SENSITIVITY

Here we compare the simplicity of the foregoing predictive distributions in DIP with a more traditional sensitivity analysis, where we incorporate input and output interactions. In SA there is a sensitivity coefficient for each input-output combination $q \to r$. In the absence of input interactions the sensitivity of response variable $r$ (e.g., growth or fecundity) to input variable $q$ (e.g., temperature, moisture, light availability) is simply

$$s_{(q \to r)} = \frac{dy_{ij,t+1(r)}}{dx_{ij,t(q)}} = \frac{\partial y_{ij,t+1(r)}}{\partial \tilde{x}_{ij,t(q)}}\frac{\partial \tilde{x}_{ij,t(q)}}{\partial x_{ij,t(q)}} = \frac{A_{q,r}}{x^D_q} \tag{4.7}$$

where $A_{q,r}$ is the coefficient for the $r$th response to input variable $q$. In our model this is the proportionate (log) response of growth or fecundity (Equation (4.1)) standardized by the range of variation $x^D_q$ to allow comparison of effects across different input variables, $\tilde{x}_{ij,t(q)} = (x_{ij,t(q)} - \min(x_q))/x^D_q$, where $x^D_q = \max(x_q) - \min(x_q)$. There is no $ij,t$ subscript on the sensitivity coefficient of Equation (4.7), because there are no interactions: the response to input variable $q$ does not depend on the level of other inputs that an individual experiences that might interact with $q$. This standard approach to sensitivity has two disadvantages: (i) we have many coefficients, and (ii) a given coefficient $A_{q,r}$ has to summarize the entire effect; by contrast, Equation (4.6) engages the entire fitted model, including all sources of uncertainty.
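To make the predictive mean and variance that accompany Equation (4.6) concrete, here is a minimal sketch for a single individual and year. The function name `inverse_predict`, the noise-free residual, and all numbers are hypothetical; the prior parameter `b_q` is treated as a variance, which is how it enters the predictive variance.

```python
import numpy as np

def inverse_predict(resid, B, Sigma, a_q, b_q):
    """Return (x_hat, V) for input q.

    resid    : response minus all mean terms that do not involve q (length R)
    B        : effect of q on each response, main effect plus interactions (length R)
    Sigma    : R x R process-error covariance
    a_q, b_q : prior mean and prior variance of x_q
    """
    Sinv_B = np.linalg.solve(Sigma, B)           # Sigma^{-1} B^T
    V = 1.0 / (B @ Sinv_B + 1.0 / b_q)           # predictive variance
    x_hat = V * (resid @ Sinv_B + a_q / b_q)     # predictive mean
    return x_hat, V

Sigma = 0.1 * np.eye(2)      # assumed process-error covariance
a_q, b_q = 0.5, 1.0          # assumed prior on the rescaled input
x_true = 0.8                 # hypothetical true input value

# An input with large effects on both responses is recovered closely
B_strong = np.array([1.0, 0.5])
x_hat_strong, V_strong = inverse_predict(x_true * B_strong, B_strong, Sigma, a_q, b_q)

# An input with no effect on either response returns the prior
B_none = np.zeros(2)
x_hat_none, V_none = inverse_predict(x_true * B_none, B_none, Sigma, a_q, b_q)
```

In this toy setting the influential input is predicted near its true value with small variance, while the input with no effect returns the prior mean and variance unchanged, which is the behavior DIP uses to rank inputs.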
We pursue this sensitivity approach further to show that it can expose different types of interactions. When there are interactions among inputs and outputs (allocation within individuals), sensitivity varies among individuals and years,

$$s_{ij,t(qq', r' \to r)} = \left(\frac{\partial y_{ij,t+1(r)}}{\partial \tilde{x}_{ij,t(q)}} + \frac{\partial y_{ij,t+1(r)}}{\partial y_{ij,t+1(r')}}\frac{\partial y_{ij,t+1(r')}}{\partial \tilde{x}_{ij,t(q)}}\right)\frac{\partial \tilde{x}_{ij,t(q)}}{\partial x_{ij,t(q)}} = \frac{1}{x^D_q}\,[1 \;\; g_{ij,t(rr')}]\,\mathbf{B}_{ij,t(q)} \tag{4.8}$$

where $g_{ij,t(rr')}$ is the (output) dependence of response $y_{ij,t+1(r)}$ on $y_{ij,t+1(r')}$,

$$g_{ij,t(rr')} = \frac{\partial y_{ij,t(r)}}{\partial y_{ij,t(r')}} = \Sigma_{rr'}\Sigma^{-1}_{r'r'},$$

and $\mathbf{B}_{ij,t(q)}$ is given by Equation (4.5). $g_{rr'}$ is the dependence of $r$ on $r'$ that is not tied to inputs. Input interactions (between input variables $q$ and $q'$) enter through $\mathbf{B}_{ij,t(q)}$. Input and output interactions interact in the second term of Equation (4.8). The sensitivity subscript in Equation (4.8) now includes not only $ij,t$, but also $r'$ and $q'$, the interactions. In terms of input and output interactions Equation (4.8) can be interpreted this way:

$$s_{ij,t(qq', r' \to r)} \propto \underbrace{(q \to r)}_{\text{direct effect}} + \underbrace{((qq') \to r)_{ij,t}}_{\text{input interaction}} + \underbrace{(r' \to r)_{ij,t}}_{\text{output interaction}} \times \underbrace{((qq') \to r')_{ij,t}}_{\text{input/output interaction}}. \tag{4.9}$$

The first term is the direct effect of Equation (4.7). Input interactions $(qq')$ comprise the second term, those explained by combinations of input variables. Output interactions $(rr')$ modify the effects of inputs, due to allocation constraints between different response variables. There are many such coefficients for individuals $ij$, time $t$, and interactions $r'$ and $q'$. There is no obvious way to simplify all of these coefficients into a meaningful summary.

DIP incorporates output interactions more compactly than traditional SA. For a response vector of length $R = 2$ having covariance matrix

$$\boldsymbol{\Sigma} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$$

the predictive variance for an input variable $x_q$ in Equation (4.6) is

$$V = \frac{1 - \rho^2}{B_1^2 - 2\rho B_1 B_2 + B_2^2}$$

where $B_1$ and $B_2$ are the two elements of vector $\mathbf{B}$ from Equation (4.5). Of course, an important input has large values in vector $\mathbf{B}$ and thus contributes to a small predictive variance. If the input variable affects both outputs in the same direction ($B_1$ and $B_2$ have the same sign), then an amplifying output interaction ($\rho > 0$) increases the predictive variance. In both SA and DIP output interactions contained in $\boldsymbol{\Sigma}$ are distinct from input interactions, contained in $\mathbf{A}$. In DIP output interactions have little effect on the mean prediction, having more impact on the predictive variance. In SA, the output interactions play a different role, even leading to surprises. Consider an input variable $q$ that has a positive effect on overall health, such as a limiting resource. We expect that coefficients in row $q$ of $\mathbf{A}$ are greater than zero.
However, allocation tradeoffs result in negative elements of $\boldsymbol{\Sigma}$. A response variable $r$ can have a negative sensitivity coefficient $s_{(q \to r)}$ for individuals having a strong positive response for variable $r'$ due to negative covariance $\sigma_{rr'} < 0$ and negative output interaction $g_{rr'}$. For example, trees partition stored reserves between growth and reproduction (Granier et al. 2008; Knops, Koenig, and Carmen 2007; Mund et al. 2010). Despite fluctuations in conditions that benefit overall health, a negative correlation between growth and fecundity can result in responses that appear paradoxical. When negative covariance arises due to feedbacks that are not captured by input variables, sensitivities can be negative or positive, depending on values of other variables involved in the interaction $q'$ and the responses to them ($r'$). In DIP such negative correlations between response variables tend to decrease the predictive variance if the input variable affects responses in the same direction.
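The two roles of output interactions described above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the effect vector and covariance are invented, the rescaling by the input range is left as a parameter set to 1, and the predictive variance assumes unit process variances and a flat prior on the input.

```python
import numpy as np

def output_dependence(Sigma, r, r_prime):
    """Dependence of response r on response r' implied by the covariance."""
    return Sigma[r, r_prime] / Sigma[r_prime, r_prime]

def sensitivity(B, Sigma, r, r_prime, x_range=1.0):
    """Direct effect on r plus the effect routed through r', rescaled by the input range."""
    g = output_dependence(Sigma, r, r_prime)
    return (B[r] + g * B[r_prime]) / x_range

def dip_variance(B, rho):
    """Predictive variance for R = 2 with unit variances and a flat prior on the input."""
    return (1.0 - rho**2) / (B[0]**2 - 2.0 * rho * B[0] * B[1] + B[1]**2)

# Input benefits both responses, but fecundity (index 1) responds much more
# strongly and is negatively correlated with growth (index 0).
B = np.array([0.2, 1.5])
Sigma = np.array([[1.0, -0.6],
                  [-0.6, 1.0]])

s_growth = sensitivity(B, Sigma, r=0, r_prime=1)      # 0.2 + (-0.6)(1.5) = -0.7
s_fecundity = sensitivity(B, Sigma, r=1, r_prime=0)   # 1.5 + (-0.6)(0.2) = 1.38

# Same-sign effects: negative correlation shrinks, positive correlation inflates
B_same = np.array([1.0, 1.0])
V_neg, V_zero, V_pos = (dip_variance(B_same, rho) for rho in (-0.5, 0.0, 0.5))
```

With these invented values the growth sensitivity is negative even though both elements of the effect vector are positive, and the predictive variance rises from 0.25 to 0.75 as the correlation moves from -0.5 to 0.5.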

In summary, SA and DIP respond to combinations of input and output variables in different ways. DIP is synthetic for the effects of an input variable on the full multivariate output. DIP depends implicitly on output interactions, but, because it does not quantify each input-response separately, it does not identify when output interactions lead to surprises. SA is more complex, but it can be used to identify when output interactions could be the cause of surprising results. Finally, implementation requires a prior specification for the coefficients in $\mathbf{A}$ to address issues summarized in Figure 3.

4.4. PRIOR SPECIFICATION ON INPUTS WHEN THERE ARE OUTPUT INTERACTIONS

Consider the common situation where prior knowledge indicates that input interactions could be either amplifying or buffering (Figures 3a or 3b, but not 3c). Prior knowledge is limited to the full effect of an input variable $q$ on response variable $r$, $dy_{ij,t(r)}/dx_{ij,t-1(q)}$. In the absence of interactions, parameters can be flat over some positive or negative interval, e.g., truncated at zero. When input and output interactions can both occur, a suitable prior on $\mathbf{A}$ is specified conditionally. The interpretation of interactions as amplifying or buffering requires that variables that interact with $q$ do not change sign over the range of $x_q$. This requirement is achieved with a rescaling of input variables and the prior specification that follows. We use flat priors on $\mathbf{A}$, truncated at limits that are either defined by prior information (0 for main effects or non-zero for interactions in the next section) or set to positive or negative values near the extreme estimates obtained without truncation (Mitchell and Beauchamp 1988). All priors are proper, and none are defined by arbitrary and unrealistically large truncation values. For those truncated at zero we use model selection to determine whether or not an input variable is important (Section 4.5).
Of course, not all elements of $\mathbf{A}$ need to be constrained to positive or negative values, only those where prior knowledge suggests it. The limits affect the height of the prior and thus the marginal likelihood used in model selection. Ignoring for the moment individual and year effects, the expected response vector is $E[y_{ij,t+1(r)}] = \mathbf{x}_{ij,t}\mathbf{A}_r$, where $y_{ij,t(r)}$ is the $r$th response variable in vector $\mathbf{y}$, $\mathbf{A}_r$ is the $r$th column of $\mathbf{A}$, and

$$\mathbf{x}_{ij,t}\mathbf{A}_r = \cdots + x_{ij,t(q)}A_{q,r} + x_{ij,t(q')}A_{q',r} + x_{ij,t(q)}x_{ij,t(q')}A_{qq',r} + \cdots$$

includes terms for direct effects of inputs $q$ and $q'$ and their interaction $(qq')$. Further assume that a positive relationship is specified (through the prior distribution) between input variable $q$ and response variable $r$, i.e., $A_{q,r} > 0$. This prior belief is only ensured if $x_{q'} = 0$. But most variables are not measured on scales where 0 has any particular significance. The prior belief applies to the derivative

$$B_{ij,t(q,r)} = \frac{dy_{ij,t+1(r)}}{dx_{ij,t(q)}} = A_{q,r} + x_{ij,t(q')}A_{qq',r}$$

(see Equation (4.9)). The interaction coefficient $A_{qq',r}$ can be positive or negative, but it should not be so negative that it violates prior belief that $dy_r/dx_q > 0$. To impose the prior belief that $x_q$ has a positive effect on response variable $r$ we specify a prior on $A_{qq',r}$ conditional on main effects in $\mathbf{A}$,

$$p(A_{qq',r}, A_{q,r}, A_{q',r}) = p(A_{qq',r} \mid A_{q,r}, A_{q',r})\, p(A_{q,r}, A_{q',r}).$$

For a specific case where there is prior belief that both $q$ and $q'$ have positive effect, the conditional interaction is

$$(A_{q,r}, A_{q',r}) > 0, \qquad A_{qq',r} \mid (A_{q,r}, A_{q',r}) > \max(-A_{q,r}, -A_{q',r}). \tag{4.10}$$

Computationally, this dependence is imposed at the proposal stage of a Metropolis step.

4.5. VARIABLE SELECTION

Although we limit consideration to only those variables known to have important impacts on tree health, the number of potential main effects and interactions is large and different for species having different geographic distributions. The minimal model includes only intercepts and exposed canopy area $C_{ij,t}$ (Table 1), the latter because all species are limited by light when growing in shaded understories. There are $m = 1,\ldots,M$ additional predictors in Table 1, where $M = 13$. The eight main effects in Table 1 contribute up to $2^8 = 256$ combinations. A two-way interaction can be considered only if both main effects are included in the model. We focus on a subset of potential interactions, those most likely to have importance for climate change vulnerability and its interaction with competition (Table 1). The model space is large (1062 variable combinations), given that fitting is done using Metropolis within Gibbs (George and McCulloch 1993, 1997). Here we describe an implementation of Scott and Berger's (2010) technique for variable selection. Models are evaluated based on the posterior model probability, derived from the prior for model $k$, $p(k)$, and marginal likelihood, $p(\mathbf{z} \mid k)$,

$$p(k(m) \mid \mathbf{z}) \propto p(\mathbf{z} \mid k(m))\, p(k(m)). \tag{4.11}$$

There are $m$ predictors in model $k(m)$, excluding the minimal model, which contains only intercept and canopy area. As mentioned above, $\mathbf{z}$ are observations, and data models are detailed in previous publications.
Equation (4.11) is the basis for many model choice criteria, including posterior odds and Bayes factors (Clyde and George 2004; Clyde and Ghosh 2010) and model averaging (Hoeting et al. 1999). Following Scott and Berger (2010) we apply fully Bayesian model choice. This approach has the advantage that it automatically adjusts for multiple comparisons, without imposing ad hoc penalties. The model prior

$$p(k(m)) = p^m (1 - p)^{M - m} \tag{4.12}$$

(George and McCulloch 1997), when $p$ is treated as an unknown, provides a multiplicity correction. The marginal likelihood from Equation (4.11) is

$$p(\mathbf{z} \mid k(m)) = \int p(\mathbf{z} \mid \theta_k, k(m))\, p(\theta_k)\, d\theta_k, \tag{4.13}$$

the integrand containing likelihood and prior. With a flat $\text{beta}(p \mid 1, 1)$ prior the posterior model probability is

$$p(k(m) \mid \mathbf{z}) \propto \frac{1}{M + 1}\binom{M}{m}^{-1} p(\mathbf{z} \mid k(m)), \tag{4.14a}$$

i.e., a mixture. Large models are penalized in two ways. The first penalty comes from the fact that each additional parameter adds a dimension requiring integration in the marginal likelihood (Equation (4.13)). This penalty is roughly multiplicative. Interactions bear this burden thrice, once for each main effect, then again for the interaction term. However, this is a size penalty, but not a multiplicity correction, because it does not depend on the number of models considered (Scott and Berger 2010). Instead multiplicity is handled through the prior.

In our application, model selection focuses on matrix $\mathbf{A}$. The marginal likelihood is approximated using the approach of Chib (1995). Inverting Bayes Theorem allows evaluation of

$$p(\mathbf{z} \mid k(m)) = \frac{p(\mathbf{z} \mid \mathbf{A}^*, k(m))\, p(\mathbf{A}^*)}{p(\mathbf{A}^* \mid \mathbf{z}, k(m))}$$

where $\mathbf{A}^*$ is a value of $\mathbf{A}$ having posterior density simulated by Metropolis-within-Gibbs. Results did not depend on the particular parameter matrix $\mathbf{A}^*$ chosen to evaluate it. The posterior inclusion probability for a given variable is

$$p(\mathbf{A}_q \neq 0) = \sum_{\{k\}} p(k(m) \mid \mathbf{z})\, I(q \in k(m)) \tag{4.14b}$$

where $\{k\}$ is the set of all models and $I(\cdot)$ is the indicator function, being equal to 1 if its argument is true and 0 otherwise. We monitored posterior model probabilities within the MCMC to progressively pare down the variable space and arrive at a maximum posterior model probability (Equations (4.14a), (4.14b)). The Metropolis-within-Gibbs algorithm described in Clark et al. (2010) is implemented such that the minimal model and an alternative model, proposed from a uniform distribution, are evaluated based on the log posterior probabilities. The more probable model is selected as the current model. Each model builds up a history of model probabilities.
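The multiplicity-corrected model prior and the posterior inclusion probability of Equations (4.14a) and (4.14b) can be sketched as follows. This is a toy enumeration over a handful of models, not the progressive Metropolis-within-Gibbs search described in the text; the function names, the predictor labels, and the log marginal likelihood values are hypothetical.

```python
import math
import numpy as np

def log_model_prior(m, M):
    """log p(k(m)) after integrating p out under beta(1, 1): 1 / ((M + 1) * C(M, m))."""
    return -math.log(M + 1) - math.log(math.comb(M, m))

def posterior_model_probs(log_marglik, sizes, M):
    """Normalized posterior model probabilities from log marginal likelihoods."""
    log_post = np.array([lm + log_model_prior(m, M)
                         for lm, m in zip(log_marglik, sizes)])
    log_post -= log_post.max()            # stabilize before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

def inclusion_probability(probs, models, q):
    """Sum of posterior probabilities over models that contain predictor q."""
    return sum(p for p, k in zip(probs, models) if q in k)

# Toy model space: M = 2 optional predictors beyond the minimal model
M = 2
models = [frozenset(), frozenset({"w"}), frozenset({"m"}), frozenset({"w", "m"})]
sizes = [len(k) for k in models]

# With equal (hypothetical) marginal likelihoods only the prior matters
log_marglik = [0.0, 0.0, 0.0, 0.0]
probs = posterior_model_probs(log_marglik, sizes, M)
p_include_w = inclusion_probability(probs, models, "w")
```

With equal marginal likelihoods the prior spreads probability evenly across model sizes rather than across individual models, which is how the correction scales with the number of candidate predictors.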
A combination of a large number of proposals and low posterior probability is cause for deletion of a model. In other words, model selection is progressive, with especially poor models eliminated first and surviving models requiring more proposals before rejection. Runs of 100,000 Gibbs steps were implemented consecutively, each time reintroducing the best of the models previously rejected. Finally, the last model is fitted alone.

4.6. AGGREGATING THE SELECTED MODEL

Results from the selected model (Section 4.5) are aggregated over individuals, time, or both to obtain summaries of DIP and SA. For DIP, there is a prediction for $x_q$ from every individual in every year, because individuals track an input variable differently at different locations, at different times, and over different ranges of input variables. Summaries are available by aggregating over individuals, years, or both. We apply Gneiting and Raftery's (2007) scoring rule to each individual and year,

$$S_{ij,t(q)} = -\frac{(x_{ij,t(q)} - \hat{x}_{ij,t(q)})^2}{V_{ij,t(q)}} - \ln V_{ij,t(q)}. \tag{4.15a}$$

The score rewards predictions close to the truth and with low predictive variance (second term). The predictive variance in the first term penalizes overconfident predictions. Aggregation over time provides an average score for an individual, identifying which individuals are responding most and least to $q$,

$$S_{ij(q)} = \frac{1}{T - 1}\sum_{t=2}^{T} S_{ij,t(q)}. \tag{4.15b}$$

Aggregation over individuals provides population level results for each year, which can help identify combinations of year $t$ and location $j$ that pose large vulnerabilities,

$$S_{j,t(q)} = \frac{1}{n_j}\sum_{i=1}^{n_j} S_{ij,t(q)}. \tag{4.15c}$$

Aggregation over individuals, locations, and years provides the overall population prediction that can be compared among species,

$$S_{(q)} = \frac{1}{J(T - 1)}\sum_{j=1}^{J}\sum_{t=2}^{T} S_{j,t(q)}. \tag{4.15d}$$

5. APPLICATION

Advantages of the dynamic inverse approach as a component of SA are demonstrated here with example species and summarized with patterns for all species. We begin by considering the size of models selected and the interpretation of interactions. We then show DIP at different levels of aggregation, to evaluate synthetic responses to climate of individuals over time, and we use SA to illuminate the role of output interactions.

5.1. SELECTED MODELS

Variable selection yielded models of intermediate size (Table 2). In some cases few variables were selected due to a limited distribution for a species, in terms of abundance (number of trees) or the range of variation in covariate space. Species that occur on a subset of plots had fewer variables under consideration than species that occur on all plots. For example, average winter temperature $W_j$ is a plot level variable.
Species that occur on only a few plots were not tested for effects of average winter temperature. Rare species (acronyms are divi, ilde, mafr, qust, ulru, and ulun) did not have interactions in the final model. Although a selected model could be small due to limited distribution of a species, it was not the case that large models were selected for the abundant and widespread species (e.g.,

Table 2. Predictors by species. Any combination of d, f, j, or + indicates inclusion in the model. d, f, or j indicates that the 90% credible interval for adult growth, fecundity, or juvenile growth, respectively, does not include zero, either positive (+) or negative (−). […] indicates that the 99% credible interval does not include zero. X indicates inclusion, but the 90% credible interval includes zero.

spec C D D² d w m W M C×w C×m C×W C×M w×m M×m
acru +d + f + j +d X X+ f + j +d
acsa +d + f + j
acpe +d + f + j +d X +d + f + j
acba +d + f + j
amar +d + f + j +d X + f + j X + f +d
beal +d + f + j X + f +d X + f + j X + f + j +d + f
bele +d + f + j +d X + f + j X + f +d f + j
caca +d + f + j +d X + f + j +d
cagl +d + f + j +d X + f + j X + f X + f + j +d + j
caov +d + f + j +d f X+ f +d + j
cato +d + f + j X + f d f +d f X+ f X X+ f +d + j
caun +d + f + j +d + fx+ f + j X + f X + f +d + f +d + j
ceca +d + f + j +d X + f X + f +d f + j
cofl +d + f + j +d f X + f +d f + j
divi +d + f + j
fagr +d + f + j +d + f X + f X+ f + j +d
fram +d + f + j +d + f X+ f X + f + j +d f + j +d j +d f + j
ilop +d + f + j
ilde +d + f + j
juvi +d + f + j +d X + f + j +d + j
list +d + f + j X X X X X X X X X
litu +d + f + j +d + f X + f X + f + j +d + j +d + j
maac +d + f + j
mafr +d + j
nysy +d + f + j +d f X+ f + j +d + j
oxar +d + f + j +d + f X + f X + f X X +d + j +d + f + j
piri +d + f + j +d + f X+ f X + f +d + j
pist +d + f + j X +d + f +d +d + j

Table 2. (Continued.)

spec C D D² d w m W M C×w C×m C×W C×M w×m M×m
pita +d + f + j +d f X + f + j X + f + j X + f + j +d f + j +d f + j
piec +d + f + j +d f X + f + j X + f + j +d f + j
pivi +d + f + j +d f X+ f + j X + f +d + j
qual +d + f + j +d + f X+ f +d + j
quco +d + f + j +d + f X+ f +d + j
qufa +d + f + j +d + f + j +d + f + j X
quma +d + f + j
quph +d + f + j +d X +d + j
qupr +d + f + j
quru +d + f + j +d + f X+ f + j +d + f + j
qust +d + f + j
quve +d + f + j +d + f X+ f + j +d + j
quun +d + f + j X X+ f +d + f +d + j
rops +d + f + j +d X+ f + j X + f X + f + j +d + j
saal +d + f + j +d X+ f + j X X +d + f + j
tiam +d + f + j +d X + f + j +d
tsca +d + j X + f +d + f X + f X + f X + f + j +d + f + j
ulal +d + f + j +d f X+ j X + f + j +d
ulam +d + f + j +d X+ j X + f + j +d + f
ulru +j X X X X X X X
ulun +d + j

acru, cofl, litu, qual, quru). Limited model size results from the fact that many species were insensitive to different input variables and not from limited sample size (Section 5.3). As reported in Clark et al. (2010) correlations between input variables tended to be low. Correlations > 0.5 occurred for only six of the input pairs reported in the Supplement. In fact, few input correlations exceeded 0.2. Even for the highest correlations, variance inflation factors (VIFs) are well below values typically taken as diagnostic of problems, e.g., 5 to 10. VIFs are presented with parameter correlations in the Supplement.

5.2. INTERACTION TERMS

Where retained, interactions were positive more often than not (Table 2), indicating the importance of interactions that tend to amplify the effects of one another. Consider a species with an especially large number of interactions in the selected model, Pinus taeda. We contrast P. taeda with another species that is abundant and widespread, yet has only weak climate effects and weak interactions, Liquidambar styraciflua. The selected model for P. taeda included $\mathbf{x}_{ij,t} = (1, C_{ij,t}, d_{ij,t}, w_{j,t}, m_{j,t}, M_{ij}, C_{ij,t}w_{j,t}, C_{ij,t}m_{j,t})$. Main effects for exposed canopy ($C$) and climate variation ($w$, $m$) were important for all demographic rates, including juvenile and adult growth and fecundity (inputs are summarized in Figure 4). Both $C \times w$ and $C \times m$ interactions were positive for both juvenile and adult growth, and negative for fecundity. Thus, individual growth responded strongly to interannual variation in moisture and winter temperature, but especially so for juveniles exposed to high light levels (Figure 5). This positive interaction describes how the response to climate is amplified by light availability; juveniles shaded in the understory showed lower response than those exposed to high light.
Growth responses to both variables were greatest for canopy individuals with access to full sunlight. The negative interaction between these same variables for fecundity indicates a buffering effect; fecundity responses to climate variation were largest for individuals with limited light access. Although the selected model for Liquidambar styraciflua included climate variables, weak effects are indicated by the fact that 90% credible intervals included 0 (Table 2).

Figure 5. Examples of amplifying positive interactions (growth) and buffering negative interactions (fecundity) for Pinus taeda between light availability, winter temperature, and summer drought. Plots show input interactions, i.e., just the first term of Equation (4.8). In all panels, contours increase from low at lower left to high at upper right (see Figure 3).

5.3. DYNAMIC INVERSE PREDICTION AND SENSITIVITY

The different responses of Pinus taeda and Liquidambar styraciflua individuals are particularly evident from DIP. Predictive distributions for $w$ and $m$ (Equation (4.6)) for randomly selected individuals of Pinus taeda and Liquidambar styraciflua are shown as predictive means and 95% predictive intervals (red lines) with true covariate values (black lines) in Figure 6. Individuals of both species closely track the limiting resource, light. However, these species contrast in their responses to drought and winter temperatures. Pinus taeda tracks both variables closely: the demographic health of Pinus taeda individuals is controlled by both variables, across the population, in all years, particularly with increasing droughts since […]. The predictive intervals for individual Liquidambar styraciflua tree years simply recover the prior, with a mean of 0.5 for both variables. The fact that the limiting resource, light, has large impact across individuals and years for both species supports the notion that these different responses to climate represent important species level differences. The predictive scores for the entire populations by year are shown below each predictive distribution in Figure 6. For Liquidambar, scores are below the scale for climate variables, but they are high for light (bottom right panel). Failure to predict climate variables does not result from limited data. Liquidambar is one of the most abundant species in the data set. Liquidambar does not predict the climate variables, because climate variables do not control the multivariate responses for this species.
For Pinus taeda a negative output interaction (Equation (4.8)) between growth and fecundity explains surprising negative sensitivities (Equation (4.9)) of growth to winter temperature for reproductive adults and positive sensitivities for juveniles (Figure 7). The overall effect of winter temperature is positive, with a long growing season increasing opportunity for photosynthesis. The strong fecundity response (large positive sensitivity in Figure 7b) and negative output interaction (tradeoff between fecundity and growth) explain the negative response of diameter growth, which is especially strong at low light levels (Figure 7a). The fact that juveniles do not reproduce explains the positive growth sensitivities and the increasing response with understory light (Figure 7d). Aggregation of DIP scores to the population scale (Equation (4.15c)) shows differences between species and the importance of input variables. Most species closely track light availability, shown as mean scores near zero or above (Figure 8a), but other variables affect species differently. Pinus taeda is ranked low for $w$ (Figure 8b) and $m$ (Figure 8d), respectively, showing that the tight tracking for individuals in Figure 6 applies to the population as a whole. Liquidambar styraciflua has low rank for both variables, consistent with patterns observed for individuals. Neither species responds strongly to spatial variables $M$ and $W$.
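The scoring rule of Equation (4.15a) and the aggregations of Equations (4.15b) to (4.15d) can be sketched on simulated predictions. The array layout (sites by individuals by years), the equal number of individuals per site, and all simulated values are assumptions made for the example.

```python
import numpy as np

def score(x, x_hat, V):
    """Score for a Gaussian prediction: squared error scaled by V, minus log V."""
    return -(x - x_hat) ** 2 / V - np.log(V)

rng = np.random.default_rng(0)
J, n, n_years = 2, 4, 5       # sites, individuals per site, years t = 2..T

# Hypothetical true inputs on a 0-1 scale and predictions that track them
x = rng.uniform(0.0, 1.0, size=(J, n, n_years))
x_hat = x + rng.normal(0.0, 0.05, size=(J, n, n_years))
V = np.full((J, n, n_years), 0.05 ** 2)

S = score(x, x_hat, V)        # one score per individual and year
S_ij = S.mean(axis=2)         # average over years for each individual
S_jt = S.mean(axis=1)         # average over individuals for each site-year
S_q = S_jt.mean()             # overall score for the input

# Reference points: an uninformative, a sharp-and-correct, and an
# overconfident-and-wrong prediction
s_vague = score(0.5, 0.5, 1.0)
s_sharp = score(0.5, 0.5, 0.01)
s_overconfident = score(1.0, 0.5, 0.01)
```

A sharp and accurate prediction scores above zero, a prediction that simply matches a unit-variance prior at its mean scores zero, and an overconfident miss is penalized heavily.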

Figure 6. Individual predictive distributions (predictive mean and 95% interval) for randomly selected individuals over time (red lines) compared with actual input values (black). Below each individual prediction are the aggregate scores (Equations (4.15a), (4.15b), (4.15c), (4.15d)) for the entire population for each year (aggregate mean and 95% of individual predictions). Individuals are Pinus taeda at left and Liquidambar styraciflua at right. The prior mean is 0.5.

Figure 7. Sensitivity of growth and fecundity to temperatures in the months of January, February, and March, $w_{j,t}$, conditioned on light level $C_{ij,t}$, compared for juveniles and adults of Pinus taeda. Each point is a sensitivity estimate for an individual and year (Equation (4.8)).

6. DISCUSSION

Sensitivity analysis (SA) has been one of the most popular tools to emerge in population studies over the last decade. It is now routinely applied to projection matrices, which are used to extrapolate demographic rates fitted to individual organisms (de Kroon, van Groenendael, and Ehrlén 2000; Caswell 2000). Sensitivity coefficients provide detail, a coefficient for each input variable. However, many environmental models have multiple outputs. In these cases, dynamic inverse prediction (DIP) can complement sensitivity analysis. The individual's prediction of climate and resources spatially and over time, based on the fitted model, provides a direct measure of importance for the variables that affect its multivariate response vector (Figure 6). DIP can be implemented at the scale where climate affects populations (individuals over short intervals), it is synthetic, and it is readily aggregated to explore population, year, and conditional responses. It integrates such concepts as effect size, goodness of fit, and variable interactions, and it does so in a way that has intuitive interpretation: individuals not closely tracking climate variation in space and time are unlikely to show large responses to near-term variation. By contrast, species having a large number of their individuals closely tracking contemporary variation are clearly sensitive to it. The aggregation of DIP by year, site, and individual provides a basis for anticipating which species will respond in what way to each combination of climate variables, depending on the local competitive context. Evaluating the impacts of climate change could begin with aggregate species differences (Figure 8) to identify those most likely to respond on average to near-term changes in climate. Magnitude of the response can be gauged relative to the risks individuals face constantly, competition for light (Figure 8a).
The approach can be implemented where the number of responses is large; we have applied it to the 100 species on plots across eastern North America in the USFS FIA database (Clark, Gelfand, and Zhu, in preparation). Limitations of DIP, as applied here, include the need for spatiotemporal data on inputs and responses, but alternative approaches to understanding climate sensitivity have limitations of their own. Species distribution models (SDMs) provide only calibrated regressions of spatial patterns in species abundance and climate, rather than dynamic responses to changing risk exposure. Climate dependence is weak in SDMs (Canham and Thomas 2010) because spatial patterns of abundance represent climate effects indirectly, due to competition, and highly aggregated climate data (over years and geographic areas) do not reflect weather experienced by the individual. Disaggregation to the individual scale can reveal large differences between species in terms of the distribution of responses among individuals, depending on their local competitive settings.

Figure 8. Prediction scores for five predictors, with species ranked from low to high.
