Multivariate analyses in microbial ecology

Size: px
Start display at page:

Download "Multivariate analyses in microbial ecology"

Transcription

1 MINIREVIEW Multivariate analyses in microbial ecology Alban Ramette Microbial habitat grou, Max Planck Institute for Marine Microbiology, Bremen, Germany OnlineOen: This article is available free online at Corresondence: Alban Ramette, Microbial habitat grou, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, Bremen, Germany. Tel.: ; fax: ; Received 17 January 27; revised 18 July 27; acceted 2 July 27. DOI:1.1111/j x Editor: Ian Head Abstract Environmental microbiology is undergoing a dramatic revolution due to the increasing accumulation of biological information and contextual environmental arameters. This will not only enable a better identification of diversity atterns, but will also shed more light on the associated environmental conditions, satial locations, and seasonal fluctuations, which could exlain such atterns. Comlex ecological questions may now be addressed using multivariate statistical analyses, which reresent a vast otential of techniques that are still underexloited. Here, well-established exloratory and hyothesis-driven aroaches are reviewed, so as to foster their addition to the microbial ecologist toolbox. Because such tools aim at reducing data set comlexity, at identifying major atterns and utative causal factors, they will certainly find many alications in microbial ecology. Keywords ordination; multivariate; modeling; statistics; gradient. Introduction Microbial ecology is undergoing a rofound change because structure function relationshis between communities and their environment are starting to be investigated at the field, regional, and even continental scales (e.g. Hughes Martiny et al., 26; Ramette & Tiedje, 27a, b). Because DNA sequences are being accumulated at an unrecedented rate due to high-throughut technologies such as yrosequencing (Edwards et al., 26a, b), single-cell genome sequencing (Zhang et al., 26), or metagenomics (Venter et al., 24; Field et al., 26; Gill et al., 26), future challenges will very likely consist of interreting the observed diversity atterns as a function of contextual environmental arameters. This would hel answer fundamental questions in microbial ecology such as whether microbial diversity resonds qualitatively and quantitatively to the same factors as macroorganism diversity (Horner-Devine et al., 24; van der Gast et al., 25; Green & Bohannan, 26; Hughes Martiny et al., 26). Most obstacles encountered by microbial ecologists when they try to summarize and further exlore large data sets concern the choice of the adequate numerical tools to further evaluate the data statistically and visually. Such tools, which have been develoed by community ecologists to work on distribution and diversity atterns of lants and animals, could be readily alied in microbial ecology. Although multivariate analyses of community diversity atterns are well described in the literature, microbial ecologists have used multivariate analyses either rarely or mostly for exloratory uroses. A brief survey of the literature confirms this trend (Table 1; Fig. 1). Table 1 indicates that bacterial studies rank third after lant and fish studies for their use of multivariate analyses. Comlex data sets are mostly exlored via rincial comonent analysis, or cluster analysis, and hyothesis-driven techniques such as redundancy analysis, canonical corresondence analysis (CCA), or Mantel tests are more rarely used (Fig. 1). Axis 1 (horizontal) clearly differentiates microscoic (bacteria, microorganisms, fungi) from macroscoic (fish, bird, lant, insect) life, and this may be related to the use of more exloratory methods (e.g. cluster analysis, PCA) in the first grou. It is imortant to state that the figures resented in Table 1 and Fig. 1 have to be taken with caution because many articles do not include a descrition of statistical aroaches in their titles or abstracts, and so Journal comilation c 27 Federation of Euroean Microbiological Societies

2 2 A. Ramette Table 1. Usage (%) of multivariate methods in different fields Exloratory analysis Hyothesis-driven analysis Keywords w Cluster PCA MDS PCoA CCA RDA MANOVA Mantel ANOSIM CVA Total number z Bacter Microb Plant Fung Fish Bird Insect A literature search was erformed with the Thomson ISI research tool with the following arameters (Doc tye, all document tyes; language, all languages; databases, SCI-EXPANDED, SSCI, A&HCI; Timesan, 19 26) on December 13, 26 in the titles and abstracts of the articles only. w Asterisks were laced at the end of each keyword to accommodate for variations. Each keyword was additionally combined with the following technical designations: cluster, cluster analysis; PCA, rincial comonent analysis; MDS, multidimensional scaling; PcoA, rincial coordinate analysis; CCA, canonical corresondence analysis; RDA, redundancy analysis; Mantel, Mantel test, or CVA, canonical variate analysis. z Total number refers to the total number of ublications identified by each keyword and all its combinations. The ordination based on corresondence analysis of the raw number is deicted in Fig MANOVA Mantel ANOSIM MDS Fish CCA BirdRDA Plant Insect PCoA PCA CVA cluster Fung Bacteria the table is certainly biased and incomlete. However, the oint of the table was both to identify some general trends in the literature and to give one examle of the usefulness of multivariate analysis to analyze a data table. Microb Fig. 1. Corresondence analysis of method usage in various scientific fields. In this symmetrical scaling of CA scores, the first two axes exlained 47.3% and 35.8% of the total inertia of Table 1, resectively. The gray areas were drawn to facilitate the interretation. Comlete row names (scientific fields; full circles) and column names (methods; white triangles) are given in Table 1. Methods (triangles) located close to each other corresond to methods often occurring together in studies. The distance between a scientific field oint and a method oint aroximates the robability of method usage in the field. 1. This review aims at resenting some common multivariate techniques in order to foster their integration into the microbial ecologist s toolbox. Indeed, it is no longer ossible to gain a full understanding of Ecology and Systematics without some knowledge of multivariate analysis. Or, contrariwise, misunderstanding of the methods can inhibit advancement of the science (James & McCulloch, 199). Such a review is ambitious because it tries to rovide a few guidelines for a very vast disciline that is still under develoment. For this reason, it cannot be exhaustive and does not retend to offer in-deth coverage of all selected toics. The review is largely insired by descritions, comments, and suggestions originating from multile, highly recommended sources (ter Braak & Prentice, 1988; James & McCulloch, 199; Legendre & Legendre, 1998; Les & Smilauer, 1999; ter Braak & Smilauer, 22; Palmer, 26), where detailed information about each technique can be obtained. In the first art, data tye and rearation are reviewed as a necessary basis for subsequent multivariate analyses. Second, common multivariate methods (i.e. cluster analysis, rincial comonent analysis, corresondence analysis, multidimensional scaling) and a few statistical methods to test for significant differences between grous or clusters are described, focusing on the methods main objectives, alications, and limitations. Beyond the mere identification of diversity atterns, microbial ecologists may wish to correlate or exlain those atterns using measured environmental arameters, and this aroach is addressed in the third art. Secial emhasis is laced on a few methods that have roven useful in ecological studies, namely redundancy analysis, CCA, linear discriminant analysis, as well as variation artitioning. The final art rovides ractical considerations to hel researchers avoid itfalls and choose the most aroriate methods. Journal comilation c 27 Federation of Euroean Microbiological Societies

3 Multivariate analyses in microbial ecology 3 Data tyes and data rearation Data sets The initial multivariate data set may consist of a table of objects (e.g. samles, sites, time eriods) in rows and measured variables for those objects in columns. This table structure is the standard used in the resent review. When the latter variables are biological taxa, the columns will simly be designated as secies thereafter. It is critical to clearly identify what corresonds to objects and variables in the data set. Indeed, objects in one study may be secies or oerational taxonomic units (OTU) for which catabolic rofiles, gene resence or olymorhism, etc. are measured. In another study where samles from different sites are comared based on, for instance, community fingerrinting techniques, objects can now be samles and secies variables. This distinction is imortant because rocedures that analyze relationshis among objects or among variables are different. Objects are defined a riori by the samling strategy before making observations and variable measurements. Besides, most multivariate analyses assume indeendence between objects (or samles), i.e. observations made on an object are not a riori deendent on those made on another object. Variables, however, can be found to be intercorrelated to various degrees, but this is not necessarily known in advance. Initial data sets can also consist of distance matrices where airwise dissimilarities between objects are calculated. The original table of raw data is not always available, e.g. for DNA DNA hybridization values, hylogenetic distances, and thus secific multivariate techniques have to be considered to deal with data matrices. Data transformations In multivariate data tables, measured variables can be binary, quantitative, qualitative, rank-ordered, classes, frequencies, or even a mixture of those tyes. If variables do not have a uniform scale (e.g. environmental arameters measured in different units or scales) or an adequate format, variables have to be transformed before erforming further analyses. Each qualitative variable has to be recoded as a set of numerical variables that relace it in the numerical calculations. One way to do so is to create a series of dummy variables that corresond to all the states of the qualitative variable. For instance, if the variable season has to be recoded, four associated variables will be constructed, and for each object the value 1 will be given to the corresonding season when it occurs, and for the three other seasons when it is absent. Many statistical ackages automatically erform this recoding. Standardization rovides dimensionless variables and removes the undue influence of magnitude differences between scales or units. A common rocedure is to aly the z-score transformation to the values of each variable. For each variable, it consists of (1) comuting the difference between the original value and the mean of the variable (i.e. centering) and of (2) dividing this difference by the SD of the variable. Normalizing transformations aim at correcting the distribution shaes of certain variables, which deart from normality. One thus tries to obtain a homogeneous variances for variables, conditions under which multivariate rocedures often erform better. Different mathematical transformations can be used to normalize the x values of a variable: for instance, the arcsin ( x) transformation can be alied to ercentages or roortions, log(x1c) to variables dearting strongly from a normal distribution, and (x1c) to less roblematic cases, with c being a constant that is added to avoid mathematically undefined comutations. The c constant is generally chosen so that the smallest nonzero value is obtained by comuting x1c in the former functions. The constant should also be of the same order of magnitude as the variable (Legendre & Legendre, 1998). To make community comosition (either resence absence or abundance) data containing many zeros suitable for analysis by linear methods such as rincial comonent analysis (PCA) or canonical redundancy analysis (RDA), the Hellinger transformation [Eq. (1)] is one of five transformations that give good results (Legendre & Gallagher, 21). The chord transformation is a useful transformation that also gives less weight to rare secies in the secies table [Eq. (2)]. The transformations are given by rffiffiffiffiffiffi yij ¼ y ij ð1þ y iþ y ij yij ¼ sffiffiffiffiffiffiffiffiffiffi P yij 2 j¼1 ð2þ where y ij is the original secies value for site i and secies j, y i1 reresents the sum of all secies values for site i (i.e. sum er row), is the number of secies in the table (number of columns), and y ijreresents the resulting, transformed secies value (Legendre & Gallagher, 21). These transformations are articularly recommended when rare secies are not truly rare, i.e. when they mostly occur because the samling was erformed blindly, as generally done in soil or marine microbial ecology. Further data transformations can be found in Sokal & Rohlf (1995) and Legendre & Legendre (1998). The way to deal with missing data is a disciline on its own (Legendre & Legendre, 1998). Briefly, one can either delete rows or columns containing the missing value(s), or try to relace the missing values by mathematical estimates inferred from values obtained from other objects in the data Journal comilation c 27 Federation of Euroean Microbiological Societies

4 4 A. Ramette set. In the latter case, it is still difficult to rovide ecologically meaningful exlanations for these estimates. In any case, the secific handling of missing data should be reorted by the investigator. When dealing with matrices, it is ossible to change a similarity matrix (S) into a dissimilarity matrix (D) by alying the following transformations: D =1 S, D = (1 S), or D = (1 S 2 ). To normalize any D matrix to the interval [ 1], one can comute D/D max, or (D D min )/(D max D min ), where D max and D min reresent the highest and lowest values of D, resectively (Legendre & Legendre, 1998). Exloratory analyses Visualization and exloration of comlex data sets The basic aim of ordination and cluster analysis is to reresent the (dis)similarity between objects (e.g. samles, sites) based on values of multile variables (columns) associated with them, so that similar objects are deicted near from each other and dissimilar objects are found further aart from each other. Exloratory multivariate analyses are thus useful to reveal atterns in large data sets, but they do not directly exlain why those atterns exist. This latter oint is addressed in the third art of the review. Cluster analysis and association coefficients Cluster analysis encomasses several multivariate techniques that are used to grou objects into categories based on their dissimilarities. The aim is both to minimize withingrou variation and maximize between-grou variation in order to reveal well-defined categories of objects, and therefore reduce the dimensionality of the data set to a few grous of rows (James & McCulloch, 199; Legendre & Legendre, 1998). This aroach is thus generally recommended when distinct discontinuities instead of continuous differences (i.e. gradients) are exected between samles (objects) because cluster analysis mostly aims at reresenting artitions in a data set (Legendre & Legendre, 1998). Because distance matrices that are based on differences in DNA or amino acid sequences are commonly used to describe microbial diversity, cluster analysis has become very oular in microbial ecology (Table 1; Fig. 1). This is not surrising because the grouing of organisms based on their henotyic or genotyic similarities in order to infer their taxonomic ositioning is generally and historically based on cluster analysis (or at least based on a tree-like reresentation) and, as such, is central to biology and evolution (Avise, 26). Tyical microbial ecology questions that are addressed by cluster analysis are whether the clustering atterns of molecular sequences reflect samle origin or samling time in order to reveal secific biogeograhical or temoral atterns, resectively (Whitaker et al., 23; Acinas et al., 24). Those factors are generally hyothesized to be of a discontinuous nature, but the rationale of generally reresenting molecular differences as discontinuous clusters in microbial ecology and microbial genomic studies has only started to be questioned (Konstantinidis et al., 26). Another common alication consists of sorting out clones from environmental samles based on secific criteria (e.g. genetic or henotyic markers) because clones or variants are exected to form tight clusters around their arental strains and to be more distinct from other lineages (Acinas et al., 24). In microarray data analysis, cluster analysis has heled identify common exression atterns of grous of genes, which may shed light on functionally related genes or athways (Eisen et al., 1998). Cluster analysis of a data table roceeds in two stes. First, a relevant association coefficient has to be chosen to measure the association (similarity or dissimilarity) among objects or among variables. Second, the calculated association matrix is reresented as a horizontal tree (hierarchical clustering) or as distinct grous of objects (k-means clustering), based on secific rules to aggregate objects. For ecologists, the ower of cluster analysis derives from the existence of different tyes of (dis)similarity coefficients. The choice of aroriate and ecologically meaningful association coefficients is articularly imortant because it directly affects the values that are subsequently used for the categorization of objects. The analysis of similarities among objects (rows) is designated as Q mode analysis, whereas when relationshis among variables (columns) are the focus of the study, this is referred to as R mode analysis (Legendre & Legendre, 1998). Noticeably, the two modes of analysis do not generally use the same association coefficients. Although it is not ossible to give a full review of all association coefficients here, it is useful to known that, for comaring objects (rows) based on their column attributes in Q mode analysis, coefficients may be chosen as a function of data tye (quantitative, qualitative, ordinal, or mixed data, normalized data, resenceabsence), imortance given to rare secies, weight given to each object, and calculation of associated robability levels. For comaring objects in a samle-by-environment table (e.g. water, soil chemistry), selection of aroriate coefficients generally deends on data tye and unit homogeneity of the measured variables. In R mode analyses, in addition to the reviously cited criteria, the choice of a coefficient may also deend on how the variables are related to each other (e.g. linearly, monotonically, qualitatively, ordered), and on how secies absence is handled in the calculations. In most ecological studies, the absence of a secies at two sites being Journal comilation c 27 Federation of Euroean Microbiological Societies

5 Multivariate analyses in microbial ecology 5 comared is not considered as a measure of similarity between those sites. Indeed, a simultaneous secies absence at two sites may be due to different reasons, e.g. the sites offer different hysical chemical conditions and the secies cannot exist under both conditions, and so there is no straightforward conclusion about site similarity that can be drawn in this case. Asymmetric coefficients are coefficients that do not take into account cases of double absences of secies ( double zeros ) in the calculation of airwise similarities among sites. Moreover, in microbial ecology where environmental communities are generally far from being exhaustively samled, a double absence of an OTU has to be regarded more as a lack of information rather than a sign of common structure among samles, and asymmetric coefficients such as Jaccard (191) or Srensen (1948) should be referred. More details about the calculation of association coefficients and their aroriateness can be found, for instance, in Chater 7 of (Legendre & Legendre, 1998). When an association matrix is calculated, the relationshis between objects or variables can be reresented following secific aggregation rules. Three general aroaches are commonly used: hierarchical clustering, k-means artitioning, and two-way joining. In hierarchical clustering, a linkage rule to form clusters and the numbers of clusters that best suit the data have to be determined a riori. Clusters, which are nested rather than mutually exclusive here, are either formed by rogressively agglomerating objects from high to low similarity cutoff values (forward clustering), or using the converse strategy, i.e. grouing all cases together and rogressing from low to high cutoff values in order to merge objects and clusters (backward clustering). These two strategies do not necessarily yield the same clusters. The merging of clusters is visualized using a tree format (generally horizontal) and is successful when well-defined clusters are identified in the data set (Sneath & Sokal, 1973). Common linkage rules are, e.g. nearest neighbor (the distance between two clusters is the distance between their closest neighboring oints), furthest neighbor (the distance between two clusters is the distance between their two furthest objects), and the widely used unweighted airgrou method using averages (UPGMA; Sneath & Sokal, 1973), where the distance between two clusters is the average distance between all intercluster airs. When within-cluster homogeneity is desired, Ward s method, which merges clusters only if they increase the within-cluster variation the least, is recommended (Legendre & Legendre, 1998). Finally, equal weight can also be given to clusters that are exected to be of different sizes using the weighted arithmetic average clustering (WPGMA), which consists of giving less weight to the original similarities of the largest grous (Legendre & Legendre, 1998). In k-means clustering, objects are assigned to k clusters (k being defined in advance), based on their nearest Euclidean distance to the mean of the clusters. The mean of the cluster is iteratively recalculated until no more assignments are made and cluster means fall below a redefined cut-off value or until the iteration limit is reached. Different means for each cluster are ideally obtained for each dimension used in the analysis, as indicated by high F-values from the resective analyses of variance. Unlike hierarchical clustering, k-means clustering does not require rior comutation of dissimilarity matrix among objects and is therefore more adated to large data sets (e.g. few thousand objects) where comuting ower is an issue. However, the method is quite sensitive to outliers, which are usually removed before erforming the analyses (Legendre & Legendre, 1998). Two-ste cluster analysis may be useful to grou objects into clusters when one or more of the variables are categorical (not interval or dichotomous). Objects are first groued based on the categories, which are themselves hierarchically clustered as single cases. Because neither a roximity table nor iterative stes are required, the method is articularly suited for the analysis of very large data sets (Eisen et al., 1998). Princial comonent analysis (PCA) PCA has been alied to numerous henotyic and genotyic (e.g. fingerrinting atterns) data sets, and it is one of the most oular exloratory analyses (Table 1), erhas because the technique is generally the first multivariate aroach to be exlained in most data analysis manuals. However, this choice may not always be justified in ecology and recommendations for aroriate alications are rovided at the end of this section and in the Practical considerations art of the resent review. Examles of use in microbial ecology concern the identification of atterns of microbial community change over seasons or geograhic areas (e.g. Merrill & Halverson, 22), or as those atterns relate to different lant comartments at different lant develomental stages (Mougel et al., 26), or the reduction of the comlexity of data sets involving hydrochemistry data, bacterial, and archeal community rofiles in order to visualize and interret comlex multivariate data sets onto two-dimensional geograhic mas of contaminated sites (Mouser et al., 25). The PCA rocedure basically calculates new synthetic variables (rincial comonents), which are linear combinations of the original variables (for instance, the secies of a samle-by-secies table), and that account for as much of the variance of the original data as ossible (Hotelling, 1933). The aim is to reresent the objects (rows) and variables (columns) of the data set in a new system of coordinates (generally on two or three axes or dimensions) Journal comilation c 27 Federation of Euroean Microbiological Societies

6 6 A. Ramette where the maximum amount of variation from the original data set can be deicted. In ractice, PCA is either erformed on a variance covariance matrix or on a correlation matrix. The first aroach is followed when the same units or data tyes are used (e.g. abundance of different secies). The aim is then to reserve and to reresent the relative ositions of the objects and the magnitude of variation between variables in the reduced sace. PCA on a correlation matrix is rather used when descritor variables are measured in different units or on different scales (e.g. different environmental arameters) or when the aim is to dislay the correlations among (standardized) descritor variables. The two aroaches lead to different rincial comonents and different distances between rojected objects in the ordination; hence, the interretation of the relationshis must be made with care (Table 2). Indeed, for correlation matrices, variables are first standardized (i.e. they become indeendent of their original scales), and so distances between objects are also indeendent from the scales of the original variables. All variables thus contribute to the same extent to the ordination of objects, regardless of their original variance. PCA results are generally dislayed as a bilot (Jolicoeur & Mosimann, 196), where the axes corresond to the new system of coordinates, and both samles (dots) and taxa (arrows) are reresented (Fig. 1a). The direction of a secies arrow indicates the greatest change in abundance, whereas its length may be related to a rate of change. Deending on whether a distance or a correlation bilot is chosen, different interretations can be made from the ordination diagram (Table 2). The interretation of the relationshis between samles and secies differs and is directly affected by the scaling chosen, i.e. whether the analysis mainly focuses on intersamle relationshis (scaling 1) or intersecies correlations (scaling 2). For instance, in scaling 1, the distances between objects are an aroximation of their Euclidean distances in the multidimensional sace, but this aroximation is not valid if scaling 2 is chosen (Table 2). Projecting an object at a right angle on a secies arrow in the ordination diagram aroximates the osition of the Table 2. Interretation of ordination diagrams Linear methods (PCA, RDA) PCA, RDA RDA Scaling 1 Scaling 2 Samles Secies ENV NENV Focus on samle (rows) distance Focus on secies (columns) correlation Euclidean distances among samles Linear correlations among secies Marginal effects of ENV on ordination scores Correlations among ENV Euclidean distance between samle classes Abundance values in secies data Values of ENV in the samles Membershi of samles in the classes Linear correlations between secies and ENV Mean secies abundance within classes of nominal ENV Average of ENV within classes Unimodal methods (CA, CCA) CA, CCA CCA Focus on samle (rows) distance and Hill s scaling Focus on secies (columns) distances Turnover distances among samles w 2 distances between samles - w 2 distances among secies distributions Marginal effects of ENV Correlations among ENV Turnover distances between samle classes w 2 distances between samle classes Relative abundances of the secies table Relative abundances of the secies table Values of ENV in the samles Membershi of samles in the classes Weighted averages the secies otima in resect to articular ENV Relative total abundances in the samle classes ENV averages within samle classes The interretation of ordination diagrams deends on the focus of the study, because samle scores are rescaled as a function of the scaling choice. Aroximate relationshis between and among the different elements reresented in bilots and trilots as secies (reresented as dots or arrows), samles (dots), environmental variables (ENV; arrows), and nominal (qualitative) environmental variables (NENV; dots). A meaningless interretation ( ) haens when the suggested comarison is not otimal because of inaroriate scaling of the ordination scores. Adated from ter Braak (1994); Les & Smilauer (1999); ter Braak & Smilauer (22). Journal comilation c 27 Federation of Euroean Microbiological Societies

7 Multivariate analyses in microbial ecology 7 object along that secies descritor. The length of the secies descritor indicates its contribution to the formation of the ordination sace. For correlation bilots, the length of the orthogonal rojection of a secies arrow on the axes aroximates its SD on the resective axes. Angles between secies arrows reflect their correlations, e.g. utative interactions between secies (scaling 2). An erroneous interretation of the bilot would be to use the roximity of an object oint and the ti of a secies arrow to deduce a relationshi between them. Indeed, only right-angle rojections of samles onto secies arrows are correct to derive aroximated secies abundance in the samles. PCA should generally be used when the objects (sites or samles) cover very short gradients, i.e. when the same secies are mostly identified everywhere in the study area (i.e., when samles mostly differ in secies abundances), and when secies linearly resond to environmental gradients. Because those conditions are often not met in ecological studies, other multivariate aroaches have been rogressively referred over PCA (as also suggested by Table 1) such as corresondence analysis or multidimensional scaling. PCA is successful when most of the variance is accounted for by the largest (generally the first two or three) comonents. The amount of variance accounted for by each rincial comonent is given by its eigenvalue. The mathematical descrition of eigenvalue calculation stes goes beyond the aim of the resent review but can be found in most linear algebra manuals. Eigenvalues derived from a PCA are generally considered to be significant when their values are larger than the average of all eigenvalues (Legendre & Legendre, 1998). The cumulative ercentage of variance accounted for by the largest comonents indicates how much roortion of the total variance is deicted by the actual ordination. High absolute correlation values between the synthetic variables (rincial comonents) and the original variables are useful to identify which variables mainly contribute to the variation in the data set, and this is referred to as the loading of the variables on a given axis. However, because the synthetic and original variables are linearly correlated (i.e. they are not indeendent), standard tests to determine the statistical significance of the correlations between them cannot be used. Princial coordinate analysis (PCoA) The technique is more rarely used by microbial ecologists (Table 1), desite its usefulness at reducing and reresenting atterns resent in distance matrices dislaying dissimilarities among objects (Gower, 1966). Its objectives are very similar to those of PCA in that it uses a linear (Euclidean) maing of the distance or dissimilarities between objects onto the ordination sace (i.e. rojection in a Cartesian sace), and the algorithm attemts to exlain most of the variance in the original data set. In microbial ecology, PCoA has been used, for instance, to test whether virulence rofiles (i.e. resence or absence of secific genes) arising from athogenic strains could differentiate either healthy or contaminated hosts (Chaman et al., 26), or to determine whether PCoA axes could searate grous of Stahylococcus aureus isolates into bovine and human hosts when genetic relationshis among them had been established by random amlified olymorhic DNA-PCR analysis (Reinoso et al., 24). As oosed to PCA, PCoA works with any dissimilarity measure and so secific association coefficients that better deal with the roblem of the resence of many double zeros in data sets can be surmounted. Moreover, PCoA does not rovide a direct link between the comonents and the original variables and so the interretation of variable contribution may be more difficult. This is because PCoA comonents, instead of being linear combinations of the original variables as in PCA, are comlex functions of the original variables deending on the selected dissimilarity measure. Besides, the non-euclidean nature of some distance measures does not allow for a full reresentation of the extracted variation into a Euclidean ordination sace. In that case, the non-euclidean variation cannot be reresented and the ercent of total variance cannot be comuted with exactness. The choice of the dissimilarity measure is thus of great imortance, and subsequent transformation of the data to correct for negative eigenvalues is sometimes necessary (see Legendre & Legendre, 1998, section for how to correct for such negative eigenvalues). Objects are reresented as oints in the ordination sace. Eigenvalues are also used here to measure how much variance is accounted for by the largest synthetic variables on each PCoA synthetic axis. Although there is no direct, linear relationshi between the comonents and the original variables, it is still ossible to correlate object scores on the main axis (or axes) with the original variables to assess their contribution to the ordination. Corresondence analysis (CA) A basic question that ecologists may want to address when facing a multidimensional table of sites (or samles) by secies is whether certain secies occur at secific sites, as a measure of their ecological references. CA has generally been used in microbial ecology to determine whether atterns in microbial OTU distribution could reflect differentiation in community comosition as a function of seasons, geograhic origin, or habitat structure (Olaade et al., 25; Edwards et al., 26a, b; Kent et al., 27). The overall aim of the method is to comare the corresondence between samles and secies from a table of counted data (or any dimensionally homogenous table) and to reresent Journal comilation c 27 Federation of Euroean Microbiological Societies

8 8 A. Ramette Axis II (PCA) (a) Axis II (CA) (b) Axis I (PCA) Axis I (CA) Fig. 2. Ordination diagrams in two dimensions. (a) In a PCA bilot reresentation, samles are reresented by dots and secies by arrows. The arrows oint in the direction of maximal variation in the secies abundances, and their lengths are roortional to their maximal rate of change. Long arrows corresond to secies contributing more to the data set variation. Right-angle rojection of a samle dot on a secies arrow gives aroximate secies abundance in the samle. (b) In a CA joint lot reresentation focusing on secies distance, both samles and secies are deicted as dots. Secies dots corresond to the center of gravity (inertia) of the samles where they mostly occur. Distances between samle and secies oints give an indication of the robability of secies comosition in samles (see Table 2 for more details about diagram interretation). it in a reduced ordination sace (Hill, 1974). Noticeably, instead of maximizing the amount of variance exlained by the ordination, CA maximizes the corresondence between secies scores and samle scores. Several algorithms exist and the most commonly described one is recirocal averaging, which consists of (1) assigning arbitrary numbers to all secies in the table (these are the initial secies scores), (2) for each samle, a samle score is then determined as a weighted average of all secies scores (this thus takes into account the abundance of each secies at the site and the reviously determined secies scores), (3) for each secies, a new secies score is then calculated as the weighted average of all the samle scores, (4) both secies scores and samle scores are standardized again to obtain a mean of zero and a SD of one, and (5) stes two to four are reeated until secies and site scores converge towards stable solutions in successive iterations (Hill, 1974). The overall table variance (inertia) based on w 2 distances is decomosed into successive comonents that are uncorrelated to each other, as in the PCA or PCoA rocedures. For each axis, the overall corresondence between secies scores and samle scores is summarized by an eigenvalue, and the latter is thus equivalent to a correlation coefficient between secies scores and samle scores (Gauch, 1982). The technique is oular among ecologists because CA is articularly recommended when secies dislay unimodal (bell shaed or Gaussian) relationshis with environmental gradients (ter Braak, 1985), as it haens when a secies favors secific values of a given environmental variable, which is revealed by a eak of abundance or resence when the otimal conditions are met (this can be visualized by lotting secies abundance against the environmental arameter). The unimodal model that suorts the concet of ecological niches has also been shown to be of the right order of comlexity for the ordination of most ecological data (ter Braak & Prentice, 1988). Although examles of unimodal distributions along variables or environmental gradients exist with macroorganisms (ter Braak, 1985), the shae of the distribution of the abundance of microbial secies along environmental arameters or gradients has not been extensively investigated (but see Ramette & Tiedje, 27a, b). This may arise from the fact that, in microbial surveys, environmental samling is mostly erformed blindly in relation to environmental heterogeneity, and the abundance of target secies is generally determined without systematically analyzing associated environmental arameters. Finally, another imortant feature of CA for microbial ecologists is that the recirocal averaging algorithm disregards secies double absences because the relationshis between rows and columns of the table are quantified using the w 2 coefficient that excludes double absences (Legendre & Legendre, 1998). Both samles and taxa are often jointly deicted in the ordination sace (i.e. joint lot; Fig. 2b), where the center of inertia (centroid) of their scores corresonds to the zero for all axes. Deending on the choice of the scaling tye, either the ordination of rows (samles) or the columns (secies) is meaningful, and can be interreted as an aroximation of the w 2 distances between samles or secies, resectively (see Table 2 for more details about interretation). Samle oints that are close to each other are similar with regard to the attern of relative frequencies across secies. It is imortant to remember that in such joint lots, either distances between samle oints or distances between secies oints can be interreted, but not the distances between samle and secies oints. Indeed, these distances are not simle Euclidean distances comuted from the relative row or column frequencies, but rather they are Journal comilation c 27 Federation of Euroean Microbiological Societies

9 Multivariate analyses in microbial ecology 9 weighted distances. The roximity between samle and secies oints in the lot can thus be understood as a robability of secies occurrence or of a high abundance in the samles in the vicinity of a secies oint. In scaling 2 (i.e. focus on secies), secies oints found at the center of the ordination sace should be carefully checked with the raw data to clarify whether the secies ordination really corresonds to the otimal abundance or occurrence of the secies, or whether the secies is just badly reresented by the main axes, as it is the case when other axes are more aroriate to reresent the secies. Rare secies contribute little to the total table inertia (i.e. they only lay a minor role in the overall table variance) and are hence ositioned at the edges of the lot, next to the site(s) where they occur. In general, only the secies oints found away from the ordination center and not close to the edges of the ordination have more chances to be related to the ordination axes, i.e. to contribute to the overall variance (Legendre & Legendre, 1998). When the secies comosition of the sites rogressively changes along the environmental gradient, samle ositions may aear in the ordination lot as nonlinear configurations called arch (Gauch, 1982) (or horseshoe in the case of PCA), which may imair further ecological interretation. In CA, the arch effect may be mathematically roduced as a side-effect of the CA rocedure that tries to obtain axes that both maximally searate secies and that are uncorrelated to each other (ter Braak, 1987): when the first axis suffices to correctly order the sites and secies, a second axis (uncorrelated with the former) can be obtained by folding the first axis in the middle and bringing its extremities together, thus resulting in an arch configuration. Further axes can be obtained by further dividing and folding the first axis into segments (Legendre & Legendre, 1998). To remove the arch effect in CA, a mathematical rocedure, detrending, is used to flatten the distribution of the sites along the first CA axis without changing their ordination on that axis. The aroach is then designated as detrended corresondence analysis (DCA). The review of different detrending algorithms such as using segments or olynomials goes beyond the scoe of this review, but more information can be obtained in (ter Braak & Prentice, 1988; Legendre & Legendre, 1998). Some authors have also argued that the arch effect may not be an artifact but an exected feature of the analysis, esecially when secies turnover is high along environmental gradients (James & McCulloch, 199). In that case, if the samles are meaningfully ositioned along the arch, the ordination should be acceted as a valid result. Nonmetric multidimensional scaling (NMDS) NMDS is generally efficient at identifying underlying gradients and at reresenting relationshis based on various tyes of distance measures. Not surrisingly, NMDS has found an increasing number of alications in microbial ecology (Table 1). The technique has been generally alied to identify atterns among multile samles that were subjected to molecular fingerrinting techniques. For instance, NMDS was used to analyze and to comare the reroducibility of various fingerrinting techniques such as ribosomal internal sacer analysis (RISA), terminal fragment length olymorhism (T-RFLP), and denaturing gradient gel electrohoresis (DGGE) between different laboratories when alied to samles chosen from a salinity gradient (Casamayor et al., 22). NMDS was also used to comare diversity atterns of microbial communities (as determined by length heterogeneity-pcr) from samles undergoing different land management ractices (Mills et al., 26). Another examle is the analysis of the bacteriolankton communities of four shallow eutrohic lakes that differed in nutrient load and food web structure using DGGE rofiling, so as to determine the secificity of community signatures in each lake (Van der Gucht et al., 25). The NMDS algorithm ranks distances between objects, and uses these ranks to ma the objects nonlinearly onto a simlified, two-dimensional ordination sace so as to reserve their ranked differences, and not the original distances (Sheard, 1966). The rocedure works as follows: the objects are first laced randomly in the ordination sace (the desired number of dimensions has to be defined a riori), and their distances in this initial configuration are comared by monotonic regression with the distances in the original data matrix based on a stress function (values between and 1). The latter indicates how different the ranks on the ordination configuration are from the ranks in the original distance matrix. Several iterations of the NMDS rocedure are generally imlemented so as to obtain the lowest stress value ossible (i.e. the best goodness of fit) based on different random initial ositions of the objects in the ordination sace. For samle-by-secies tables, simulations have shown that before alying NMDS, a standardization of each secies by its maximum abundance, followed by the comutation of distances between samles based on the Steinhaus or Kulczinski similarity coefficients yielded informative ordination results (Legendre & Legendre, 1998,. 449). In NMDS ordination, the roximity between objects corresonds to their similarity, but the ordination distances do not corresond to the original distances among objects. Because NMDS reserves the order of objects, NMDS ordination axes can be freely rescaled, rotated, or inverted, as needed for a better visualization or interretation. Because of the iterative rocedure, NMDS is more comuter intensive than eigenanalyses such as PCoA, PCA, or CA. However, constant imrovement in comuting ower makes this limitation less of a roblem for small- to medium-sized matrices. Journal comilation c 27 Federation of Euroean Microbiological Societies

10 1 A. Ramette Testing for significant differences between grous In addition to reresenting objects in an ordination lot or as clusters of similar objects, another objective may be to test whether differences between grous of objects (rows) in a multivariate table are significantly different based on the set of their attributes (columns), i.e. to test whether similarities within grous are higher than those between grous. Here, nonarametric multivariate ANOVA (NPMANOVA) and analysis of similarities (ANOSIM), which are commonly found in standard statistical ackages, are briefly reviewed. It is also ossible to use canonical analyses ( Testing for significant differences between grous ) to test for significant differences between grous of objects. These statistical tests, however, must not be used to assess the statistical difference among grous that were derived from a revious cluster analysis on the same variables because, under those conditions, the two aroaches would not be indeendent from each other. Indeed, the grous derived from cluster analysis (which are themselves made to fit the data) would then be used for testing the null hyothesis that there is no difference among the grous. This hyothesis would then not be indeendent of the data used to test it, and would nearly always roduce significant differences between the grous even if it is not the case (Legendre & Legendre, 1998). NPMANOVA The method can be used to test for significant differences between the means of two or more grous of multivariate, quantitative data (Anderson, 21). The null hyothesis of equality of means is tested based on Wilks L (lambda) statistic, which relaces the F-test normally used in univariate ANOVA. When only two grous are comared, Hotelling s T 2 test is more aroriate. The latter test can also be used, as a ost hoc test, to assess the significance of airwise comarisons statistically between grous, following an overall significant Wilks test. Significance is generally comuted by ermutation of grou membershi, with several thousand relicates, alleviating concerns about multinormality of the data. Because multile airwise comarisons are made, the significance level of the airwise Hotelling s tests needs, however, to be corrected. With the Bonferroni correction, for instance, the P-value usually chosen for significant differences between grous (i.e..5) is relaced by a smaller P-value calculated by dividing the original P-value by the total number of airwise comarisons that are erformed. For instance, for 1 airwise comarisons, the corrected P-value becomes.5. This correction is often judged to be rather conservative as it leads to significance for fewer airwise comarisons (Legendre & Legendre, 1998). ANOSIM This nonarametric rocedure tests for significant difference between two or more grous, based on any distance measure (Clarke, 1993). It comares the ranks of distances between grous with ranks of distances within grous. The means of those two tyes of ranks are comared, and the resulting R test statistic measures whether searation of community structure is found (R = 1), or whether no searation occurs (R = ). R values 4.75 are commonly interreted as well searated, R 4.5 as searated, but overlaing, and R o.25 as barely searable (Clarke & Gorley, 21). The test makes fewer assumtions than MANOVA because it is based on the ranks of distances, and it is often used for samle-by-secies tables, where grous of samles are comared. All grous should have comarable within-grou disersion to avoid finding falsely significant results (Legendre & Legendre, 1998). Alications in microbial ecology include testing for satial differences, temoral changes, or environmental imacts on microbial assemblages. For instance, Kent et al. (27) determined whether bacterial communities from the same lake were more similar in comosition to each other than to communities in different lakes. The bacterial comosition and diversity of samles from different geograhic origins, habitats, and avian hosts were also comared using ANOSIM based on a length heterogeneity (LH)-PCR (Bisson et al., 27). Another examle is the alication of ANOSIM to terminal restriction fragment length olymorhism (T-RFLP)-generated data to determine the imact of B and NaCl on soil microbial community structure in the wheat rhizoshere (Nelson & Mele, 27). Environmental interretation Exloratory analyses may reveal the existence of clusters or grous of objects in a data set. When a sulementary table or matrix of environmental variables is available for those objects, it is then ossible to examine whether the observed atterns are related to environmental gradients. Tyical objectives may be, for instance, to reveal the existence of a relationshi between community structure and habitat heterogeneity, between community structure and satial distance, or to identify the main variables affecting bacterial communities when a large set of environmental variables has been conjointly collected. The significance of the relationshis between secies atterns and environmental variables can generally be assessed by ermutation techniques such as Monte Carlo ermutation tests, which infer statistical roerties from the data themselves. The order of data (generally the rows of one matrix) is ermuted and the relationshis between the observed atterns and environmental variables can be Journal comilation c 27 Federation of Euroean Microbiological Societies

11 Multivariate analyses in microbial ecology 11 assessed for randomness. This aroach is articularly suitable when variables do not follow a normal distribution (which is often the case with environmental or ecological data), as generally required by traditional statistical rocedures (Legendre & Legendre, 1998). Indirect gradient analyses Ordination axes or clusters can be interreted based on additional environmental variables (i.e. variables not used in the ordination or cluster analysis) that rovide ecological knowledge about the studied sites or secies ecological characteristics. When using exloratory ordination aroaches on a samle-by-secies table, samles are dislayed along the axes of main variation in secies comosition. These axes are thus constructed without reference to environmental characteristics, but they can be hyothesized to reresent underlying environmental gradients (e.g. environmental arameters, satial or temoral variables, chemical gradients), which need to be subsequently identified. Such an aroach is designated as indirect, because synthetic variables (i.e. the axes) are first constructed and thereafter related to environmental variation. For instance, the scores of the objects on PCA or CA main comonents (axes) can be further related by standard statistical rocedures (e.g. ANOVA, regression analysis) to environmental variables. Likewise, in PCoA or NMDS, it is ossible to statistically comare the ranks obtained by the objects on each axis and the ranks of those objects on additional environmental variables, using Searman s rank correlation coefficients (Legendre & Legendre, 1998). A convenient method of interretation is to reresent the additional environmental variables as fitted arrows directly on the ordination diagram. These variables are added to the existing ordination by linear regression of their values onto the existing ordination axes. This rocedure is imlemented in various statistical ackages (e.g. CANOCO, R). Hence, it is ossible to assess the direction and magnitude of the most raid change in the environmental variables and to determine whether they corresond to the observed atterns among objects (Oksanen, 27). In cluster analysis, the magnitude of the absolute correlation value between an ordered clustering solution and environmental variables may also rovide clues about utative environmental causes for the observed discontinuities in the data set. Another convenient way of dislaying additional information to hel interret the ordination is to use site symbols whose sizes are roortional to the values of the additional variable. Hence, one can visually assess whether the ordination of objects (samles, sites) matches secific trends in the additional variable. This strategy was, for instance, used on NMDS ordination lots inferred from DGGE rofiles on which the values of five additional environmental variables were individually maed as roortional circles in order to identify the main environmental factors related to the bacterial community structure in four freshwater lakes (Van der Gucht et al., 25). Direct gradient analyses (constrained analyses) In constrained (canonical) ordination analyses, only the variation in the secies table that can be exlained by the environmental variables is dislayed and analyzed, and not all the variation in the secies table. Gradients are suosed to be known and reresented by the measured variables or their combinations, while secies abundance or occurrence is considered to be a resonse to those gradients. Constrained ordinations are mostly based on multivariate linear models relating rincial axes to the observed environmental variables, and the different techniques deend on data tyes (matrix or table), and on the hyothesis underlying secies distribution in the gradients (i.e. linear or unimodal). Their aim is to find the best mathematical relationshis between secies comosition and the measured environmental variables, and to assess whether, statistically, such a relationshi could have been roduced due to chance alone using ermutation tests. The resulting ordination diagrams dislay samles, secies, and environmental variables so that fitted secies samles and secies environment relationshis can be derived as easily as ossible from angles between arrows or distances between oints and arrows (Table 2). Redundancy analysis (RDA) In microbial ecology, RDA has been alied, for instance, to test whether the occurrence of biocontrol bacteria with secific carbon source utilization rofiles was related to their origin from different root samles (Folman et al., 23), to determine which environmental factors were the most significant to exlain variation in microbial community comosition in undisturbed native rairies and croed agricultural field (McKinley et al., 25), to examine the effects of samling locations (longitude, latitude, altitude) on genetic diversity of lant athogenic bacteria (Kolliker et al., 26), or to assess the influence of season, farm management, and soil chemical, hysical, and biological roerties on nitrogen fluxes and bacterial community structure (Cookson et al., 26). This method can be considered as an extension of PCA in which the main axes (comonents) are constrained to be linear combinations of the environmental variables (Rao, 1964). Two tables are then necessary: one for the secies data ( deendent variables) and one for the environmental variables ( indeendent variables). Multile linear regressions are used to exlain variation between indeendent Journal comilation c 27 Federation of Euroean Microbiological Societies

12 12 A. Ramette and deendent variables, and these calculations are erformed within the iterative rocedure to find the best ordination of the objects. The interest of such an aroach is to reresent not only the main atterns of secies variation as much as they can be exlained by the measured environmental variables but also to dislay correlation coefficients between each secies and each environmental variable in the data set. When the data set consists of a matrix of distances between objects, distance-based RDA (db-rda; Legendre & Anderson, 1999) can be alied to determine how well additional environmental arameters can exlain the variation among objects in the matrix. The technique first alies a PCoA on the distance matrix to convert it back to a rectangular table containing rows of objects by columns of PCoA coordinates. Those new, uncorrelated coordinates thus corresond to synthetic secies variables that are then related to additional environmental arameters using a classical RDA. For instance, db-rda was successfully used to determine how the variation in matrices of genomic distances among environmental strains could be exlained by factors such as soil arameters, host lant secies, and satial scale, each factor being taken alone or in combination (Ramette & Tiedje, 27b). Most software oututs rovide the total variation in secies comosition as exlained by the environmental axes, the cumulative ercentage of variance of the secies environment relationshi, and the overall statistical significance of the relationshis between the secies and environmental tables. RDA can be reresented by a trilot of samles (dots), secies (arrows), and environmental variables (arrows for quantitative variables and dots for each level of qualitative or nominal variables), or by any combinations thereof (i.e. bilots) (ter Braak, 1994). Deending on the scaling chosen, i.e. whether the analysis mainly focuses on intersamle relationshis or intersecies correlations, the interretation of the relationshis between samles, secies, and environmental variables differs (Table 2). Canonical corresondence analysis (CCA) The aroach is very similar to that of RDA, excet that CCA is based on unimodal secies environment relationshis whereas RDA is based on linear models (ter Braak, 1986). CCA can be considered as the constrained form of CA in which the axes are linear combinations of the environmental variables. CCA uses the unimodal model to model secies resonse to the environmental variation as a mathematical simlification to enable the estimation of a large number of arameters and the identification of a small number of ordination axes. This secies model seems, however, to be robust even when some secies dislay bimodal resonses, unequal ranges, or unequal maxima along environmental gradients, and the technique is thus considered to be the method of choice by many ecologists (ter Braak & Smilauer, 22). It is therefore articularly adated for the environmental interretation of tables of abundance and occurrence of secies, and accommodates well the absence of secies at certain sites in the data set. CCA is sensitive to rare secies that occur in secies-oor samles, and down-weighting of such secies hel reduce the roblem (Legendre & Legendre, 1998). Software oututs are very similar to those of RDA and as for RDA, trilot and bilot reresentations and interretation deend on the choice of the scaling tye (Table 2). The same interretation of the relationshis between samle and secies oints is found in CA and CCA. Right-angle rojection of these oints on the environmental arrows leads to the correct aroximation of the ranking of the oints along environmental variables. CCA has been used in an increasing number of ublications dealing with microbial assemblages in marine and soil ecosystems. Tyical questions that are addressed concern the identification of environmental factors that influence the diversity of bacterial assemblages among large sets of candidate environmental arameters measured for the same samles, when the diversity is determined by cultureindeendent, genetic fingerrinting techniques such as automated ribosomal intergenic sacer analysis (ARISA) (Yannarell & Trilett, 25), DGGE (Salles et al., 24; Sa et al., 27), or T-RFLP (Córdova-Kreylos et al., 26; Klaus et al., 27). Another interest in the technique comes from the ossibility of determining the secific secies or OTUs that resond to articular environmental variables, and as such that can be identified as candidate indicator secies. Those secies can then be subjected to further exeriments so as to confirm their status of indicator secies. For instance, the relationshis, as determined by CCA, between bacterial community comosition and 11 environmental variables for 3 lakes in Wisconsin, revealed that atterns in bacterial communities were best exlained by regional- and landscae-level factors, as well as by secific seasons, H, and water clarity (Yannarell & Trilett, 25). CCA was also successfully used to demonstrate that former land use management affected the comosition of the targeted soil microbial community (Burkholderia) to a larger extent than did lant secies (Salles et al., 24). Another interesting examle in the marine ecosystem is the study of the interactions between various abiotic arameters and hytolankton community data (biotic arameter) to exlain bacteriolankton dynamics in the North Sea and the subsequent identification of the bacterial hylotyes resonding more secifically to the factors (Sa et al., 27). Another examle of using CCA to identify some microbial communities as ollution indicators can be found in (Córdova-Kreylos et al., 26). Journal comilation c 27 Federation of Euroean Microbiological Societies

13 Multivariate analyses in microbial ecology 13 (a) Fig. 3. Partitioning biological variation into the effects of two factors. The large rectangle reresents the total variation in the biological data table, which is artitioned among two sets of exlanatory variables (a, b). Fraction 4 shows the unexlained art of the biological variation. Fractions 1 and 3 are obtained by artial constrained ordination or artial regression, and can be tested for significance. For instance, fraction 1 corresonds to the amount of biological variation that can be exclusively exlained by (a) effects when (b) effects are taken into consideration (i.e., when b is considered as a covariable). Fraction 2 [i.e., variation indifferently attributed to (a) and (b) or a covariation of (a) and (b)] is obtained by subtracting fractions 1 and 3 from the total exlained variance, and cannot be tested for statistical significance. Partial ordination, variation artitioning When the effects of a articular environmental variable need to be tested after elimination of ossible effects due to other (environmental) variables, artial ordination may be used (e.g. artial CCA, artial RDA). Such an aroach is also referred to as artialling out or controlling for the effects of secific variables, which are secified as covariables in the constrained analysis. For instance, in a study dealing with the effects of environmental and ollutant variables on microbial communities, Córdova-Kreylos et al. (26) observed that variation in microbial communities was more due to satial variation than to ollutants. The use of artial CCA to account for satial variation in the biological data set revealed that metals had a greater effect on microbial community comosition than organic ollutants. This idea of controlling for the effects of secific variables can be extended to evaluate the effects of all the different sets (factors) of environmental variables resent in a study so as to determine the relative contribution (amount of variation exlained) and significance of each variable set on the total biological variance. The so-called variation artitioning rocedure (Borcard et al., 1992) artitions the total variance of the secies table into the resective contribution of each set of environmental variables and into their covariations using both standard and artial constrained ordinations (Fig. 3). Two methods have traditionally been used to artition the variation of community comosition data, i.e. canonical artitioning and regression on distance matrices based on Mantel tests (Legendre & Legendre, 1998). The canonical aroach has been shown to be more aroriate to artition the b diversity correctly among sites and to test (b) 4 hyotheses about the origin and maintenance of its variation (Legendre et al., 25). Alications of variation artitioning in microbial ecology include, for instance, the study by Ramette & Tiedje (27b), which alied the technique in the context of RDA to disentangle the effects of sace, environmental soil arameters, and lant secies on Burkholderia community abundance and diversity. By quantifying the amount of biological variation that is left unexlained when all environmental variables had been considered, the study suggested that much less of the biological variation could be redicted at the intrasecific level comared with higher taxonomic levels. Another interesting examle is the study of seasonal changes in bacterial community comosition in shallow eutrohic lakes, in which to-down regulation (grazers) of bacterial community comosition was examined after accounting for bottom-u regulation (resources) (Muylaert et al., 22). Linear discriminant analysis (LDA) When grous or clusters of objects have been obtained by exloratory analyses for instance, LDA can be used to identify linear combinations of additional environmental variables that best discriminate those grous. In that resect, LDA can be seen as an extension of MANOVA for two or more grous, in which environmental variables that secifically exlain the grouing of objects are identified. Another alication consists of assigning new objects to reviously defined grous for rediction or classification uroses based on the calculated discriminant function. For instance, Fuhrman et al. (26) used the technique to evidence the existence of reeatable temoral atterns in the community comosition of marine bacteriolankton over 4.5 years. The technique is mostly recommended for multinormal data for which attribute data are linearly related and for which variances and covariances of the variables are good summary statistics. A visual reresentation of LDA can be erformed, and in the resulting ordination, the axes are then the discriminant functions. The distances between objects, which corresond to Mahalanobis distances that take into account the correlations among descritors (Mahalanobis, 1936), are indeendent of the scale of measurement of the various descritors and are mostly used to comare grous of sites or objects with each other (Legendre & Legendre, 1998). Selection of variables in regression models In the revious constrained methods where linear combinations of environmental (exlanatory) variables are used, the inclusion of too many exlanatory variables to describe secies distribution may lead to difficult ecological interretations and to lower redictability of the models, due to Journal comilation c 27 Federation of Euroean Microbiological Societies

2. Sample representativeness. That means some type of probability/random sampling.

2. Sample representativeness. That means some type of probability/random sampling. 1 Neuendorf Cluster Analysis Assumes: 1. Actually, any level of measurement (nominal, ordinal, interval/ratio) is accetable for certain tyes of clustering. The tyical methods, though, require metric (I/R)

More information

Ecological Resemblance. Ecological Resemblance. Modes of Analysis. - Outline - Welcome to Paradise

Ecological Resemblance. Ecological Resemblance. Modes of Analysis. - Outline - Welcome to Paradise Ecological Resemblance - Outline - Ecological Resemblance Mode of analysis Analytical saces Association Coefficients Q-mode similarity coefficients Symmetrical binary coefficients Asymmetrical binary coefficients

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

Plotting the Wilson distribution

Plotting the Wilson distribution , Survey of English Usage, University College London Setember 018 1 1. Introduction We have discussed the Wilson score interval at length elsewhere (Wallis 013a, b). Given an observed Binomial roortion

More information

One-way ANOVA Inference for one-way ANOVA

One-way ANOVA Inference for one-way ANOVA One-way ANOVA Inference for one-way ANOVA IPS Chater 12.1 2009 W.H. Freeman and Comany Objectives (IPS Chater 12.1) Inference for one-way ANOVA Comaring means The two-samle t statistic An overview of ANOVA

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation

More information

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &

More information

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit Chater 5 Statistical Inference 69 CHAPTER 5 STATISTICAL INFERENCE.0 Hyothesis Testing.0 Decision Errors 3.0 How a Hyothesis is Tested 4.0 Test for Goodness of Fit 5.0 Inferences about Two Means It ain't

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test) Chater 225 Tests for Two Proortions in a Stratified Design (Cochran/Mantel-Haenszel Test) Introduction In a stratified design, the subects are selected from two or more strata which are formed from imortant

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points. Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the

More information

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

Metrics Performance Evaluation: Application to Face Recognition

Metrics Performance Evaluation: Application to Face Recognition Metrics Performance Evaluation: Alication to Face Recognition Naser Zaeri, Abeer AlSadeq, and Abdallah Cherri Electrical Engineering Det., Kuwait University, P.O. Box 5969, Safat 6, Kuwait {zaery, abeer,

More information

Using Factor Analysis to Study the Effecting Factor on Traffic Accidents

Using Factor Analysis to Study the Effecting Factor on Traffic Accidents Using Factor Analysis to Study the Effecting Factor on Traffic Accidents Abstract Layla A. Ahmed Deartment of Mathematics, College of Education, University of Garmian, Kurdistan Region Iraq This aer is

More information

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling Scaling Multile Point Statistics or Non-Stationary Geostatistical Modeling Julián M. Ortiz, Steven Lyster and Clayton V. Deutsch Centre or Comutational Geostatistics Deartment o Civil & Environmental Engineering

More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

7.2 Inference for comparing means of two populations where the samples are independent

7.2 Inference for comparing means of two populations where the samples are independent Objectives 7.2 Inference for comaring means of two oulations where the samles are indeendent Two-samle t significance test (we give three examles) Two-samle t confidence interval htt://onlinestatbook.com/2/tests_of_means/difference_means.ht

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

Research of power plant parameter based on the Principal Component Analysis method

Research of power plant parameter based on the Principal Component Analysis method Research of ower lant arameter based on the Princial Comonent Analysis method Yang Yang *a, Di Zhang b a b School of Engineering, Bohai University, Liaoning Jinzhou, 3; Liaoning Datang international Jinzhou

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Numerous alications in statistics, articularly in the fitting of linear models. Notation and conventions: Elements of a matrix A are denoted by a ij, where i indexes the rows and

More information

MULTIVARIATE STATISTICAL PROCESS OF HOTELLING S T CONTROL CHARTS PROCEDURES WITH INDUSTRIAL APPLICATION

MULTIVARIATE STATISTICAL PROCESS OF HOTELLING S T CONTROL CHARTS PROCEDURES WITH INDUSTRIAL APPLICATION Journal of Statistics: Advances in heory and Alications Volume 8, Number, 07, Pages -44 Available at htt://scientificadvances.co.in DOI: htt://dx.doi.org/0.864/jsata_700868 MULIVARIAE SAISICAL PROCESS

More information

Notes on Instrumental Variables Methods

Notes on Instrumental Variables Methods Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of

More information

The European Commission s science and knowledge service. Joint Research Centre

The European Commission s science and knowledge service. Joint Research Centre The Euroean Commission s science and knowledge service Joint Research Centre Ste 7: Statistical coherence (II) PCA, Exloratory Factor Analysis, Cronbach alha Hedvig Norlén COIN 2017-15th JRC Annual Training

More information

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming Maximum Entroy and the Stress Distribution in Soft Disk Packings Above Jamming Yegang Wu and S. Teitel Deartment of Physics and Astronomy, University of ochester, ochester, New York 467, USA (Dated: August

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

CMSC 425: Lecture 4 Geometry and Geometric Programming

CMSC 425: Lecture 4 Geometry and Geometric Programming CMSC 425: Lecture 4 Geometry and Geometric Programming Geometry for Game Programming and Grahics: For the next few lectures, we will discuss some of the basic elements of geometry. There are many areas

More information

Principal Components Analysis and Unsupervised Hebbian Learning

Principal Components Analysis and Unsupervised Hebbian Learning Princial Comonents Analysis and Unsuervised Hebbian Learning Robert Jacobs Deartment of Brain & Cognitive Sciences University of Rochester Rochester, NY 1467, USA August 8, 008 Reference: Much of the material

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

An Improved Calibration Method for a Chopped Pyrgeometer

An Improved Calibration Method for a Chopped Pyrgeometer 96 JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY VOLUME 17 An Imroved Calibration Method for a Choed Pyrgeometer FRIEDRICH FERGG OtoLab, Ingenieurbüro, Munich, Germany PETER WENDLING Deutsches Forschungszentrum

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

John Weatherwax. Analysis of Parallel Depth First Search Algorithms

John Weatherwax. Analysis of Parallel Depth First Search Algorithms Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel

More information

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection

More information

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Ketan N. Patel, Igor L. Markov and John P. Hayes University of Michigan, Ann Arbor 48109-2122 {knatel,imarkov,jhayes}@eecs.umich.edu

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

Model checking, verification of CTL. One must verify or expel... doubts, and convert them into the certainty of YES [Thomas Carlyle]

Model checking, verification of CTL. One must verify or expel... doubts, and convert them into the certainty of YES [Thomas Carlyle] Chater 5 Model checking, verification of CTL One must verify or exel... doubts, and convert them into the certainty of YES or NO. [Thomas Carlyle] 5. The verification setting Page 66 We introduce linear

More information

Estimation of Separable Representations in Psychophysical Experiments

Estimation of Separable Representations in Psychophysical Experiments Estimation of Searable Reresentations in Psychohysical Exeriments Michele Bernasconi (mbernasconi@eco.uninsubria.it) Christine Choirat (cchoirat@eco.uninsubria.it) Raffaello Seri (rseri@eco.uninsubria.it)

More information

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations Preconditioning techniques for Newton s method for the incomressible Navier Stokes equations H. C. ELMAN 1, D. LOGHIN 2 and A. J. WATHEN 3 1 Deartment of Comuter Science, University of Maryland, College

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

Ecologically meaningful transformations for ordination of species data

Ecologically meaningful transformations for ordination of species data Oecologia (001) 19:71 80 OI 10.1007/s00440100716 Pierre Legendre Eugene. Gallagher Ecologicall meaningful transformations for ordination of secies data Received: 5 Setember 000 / Acceted: 17 March 001

More information

Completely Randomized Design

Completely Randomized Design CHAPTER 4 Comletely Randomized Design 4.1 Descrition of the Design Chaters 1 to 3 introduced some basic concets and statistical tools that are used in exerimental design. In this and the following chaters,

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

Measuring center and spread for density curves. Calculating probabilities using the standard Normal Table (CIS Chapter 8, p 105 mainly p114)

Measuring center and spread for density curves. Calculating probabilities using the standard Normal Table (CIS Chapter 8, p 105 mainly p114) Objectives 1.3 Density curves and Normal distributions Density curves Measuring center and sread for density curves Normal distributions The 68-95-99.7 (Emirical) rule Standardizing observations Calculating

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Monte Carlo Studies. Monte Carlo Studies. Sampling Distribution

Monte Carlo Studies. Monte Carlo Studies. Sampling Distribution Monte Carlo Studies Do not let yourself be intimidated by the material in this lecture This lecture involves more theory but is meant to imrove your understanding of: Samling distributions and tests of

More information

Session 5: Review of Classical Astrodynamics

Session 5: Review of Classical Astrodynamics Session 5: Review of Classical Astrodynamics In revious lectures we described in detail the rocess to find the otimal secific imulse for a articular situation. Among the mission requirements that serve

More information

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete

More information

Characteristics of Beam-Based Flexure Modules

Characteristics of Beam-Based Flexure Modules Shorya Awtar e-mail: shorya@mit.edu Alexander H. Slocum e-mail: slocum@mit.edu Precision Engineering Research Grou, Massachusetts Institute of Technology, Cambridge, MA 039 Edi Sevincer Omega Advanced

More information

8.7 Associated and Non-associated Flow Rules

8.7 Associated and Non-associated Flow Rules 8.7 Associated and Non-associated Flow Rules Recall the Levy-Mises flow rule, Eqn. 8.4., d ds (8.7.) The lastic multilier can be determined from the hardening rule. Given the hardening rule one can more

More information

8 STOCHASTIC PROCESSES

8 STOCHASTIC PROCESSES 8 STOCHASTIC PROCESSES The word stochastic is derived from the Greek στoχαστικoς, meaning to aim at a target. Stochastic rocesses involve state which changes in a random way. A Markov rocess is a articular

More information

A STUDY ON THE UTILIZATION OF COMPATIBILITY METRIC IN THE AHP: APPLYING TO SOFTWARE PROCESS ASSESSMENTS

A STUDY ON THE UTILIZATION OF COMPATIBILITY METRIC IN THE AHP: APPLYING TO SOFTWARE PROCESS ASSESSMENTS ISAHP 2005, Honolulu, Hawaii, July 8-10, 2003 A SUDY ON HE UILIZAION OF COMPAIBILIY MERIC IN HE AHP: APPLYING O SOFWARE PROCESS ASSESSMENS Min-Suk Yoon Yosu National University San 96-1 Dundeok-dong Yeosu

More information

Uniform Law on the Unit Sphere of a Banach Space

Uniform Law on the Unit Sphere of a Banach Space Uniform Law on the Unit Shere of a Banach Sace by Bernard Beauzamy Société de Calcul Mathématique SA Faubourg Saint Honoré 75008 Paris France Setember 008 Abstract We investigate the construction of a

More information

Adaptive estimation with change detection for streaming data

Adaptive estimation with change detection for streaming data Adative estimation with change detection for streaming data A thesis resented for the degree of Doctor of Philosohy of the University of London and the Diloma of Imerial College by Dean Adam Bodenham Deartment

More information

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Haiing Lu, K.N. Plataniotis and A.N. Venetsanooulos The Edward S. Rogers Sr. Deartment of

More information

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi LOGISTIC REGRESSION VINAANAND KANDALA M.Sc. (Agricultural Statistics), Roll No. 444 I.A.S.R.I, Library Avenue, New Delhi- Chairerson: Dr. Ranjana Agarwal Abstract: Logistic regression is widely used when

More information

Modelling 'Animal Spirits' and Network Effects in Macroeconomics and Financial Markets Thomas Lux

Modelling 'Animal Spirits' and Network Effects in Macroeconomics and Financial Markets Thomas Lux Modelling 'Animal Sirits' and Network Effects in Macroeconomics and Financial Markets Kiel Institute for the World Economy & Banco de Esaña Chair of Comutational Economics University of Castellón GSDP

More information

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations PINAR KORKMAZ, BILGE E. S. AKGUL and KRISHNA V. PALEM Georgia Institute of

More information

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES Journal of Sound and Vibration (998) 22(5), 78 85 VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES Acoustics and Dynamics Laboratory, Deartment of Mechanical Engineering, The

More information

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm Gabriel Noriega, José Restreo, Víctor Guzmán, Maribel Giménez and José Aller Universidad Simón Bolívar Valle de Sartenejas,

More information

arxiv:cond-mat/ v2 25 Sep 2002

arxiv:cond-mat/ v2 25 Sep 2002 Energy fluctuations at the multicritical oint in two-dimensional sin glasses arxiv:cond-mat/0207694 v2 25 Se 2002 1. Introduction Hidetoshi Nishimori, Cyril Falvo and Yukiyasu Ozeki Deartment of Physics,

More information

Pairwise active appearance model and its application to echocardiography tracking

Pairwise active appearance model and its application to echocardiography tracking Pairwise active aearance model and its alication to echocardiograhy tracking S. Kevin Zhou 1, J. Shao 2, B. Georgescu 1, and D. Comaniciu 1 1 Integrated Data Systems, Siemens Cororate Research, Inc., Princeton,

More information

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process P. Mantalos a1, K. Mattheou b, A. Karagrigoriou b a.deartment of Statistics University of Lund

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

Outline. Markov Chains and Markov Models. Outline. Markov Chains. Markov Chains Definitions Huizhen Yu

Outline. Markov Chains and Markov Models. Outline. Markov Chains. Markov Chains Definitions Huizhen Yu and Markov Models Huizhen Yu janey.yu@cs.helsinki.fi Det. Comuter Science, Univ. of Helsinki Some Proerties of Probabilistic Models, Sring, 200 Huizhen Yu (U.H.) and Markov Models Jan. 2 / 32 Huizhen Yu

More information

PERFORMANCE BASED DESIGN SYSTEM FOR CONCRETE MIXTURE WITH MULTI-OPTIMIZING GENETIC ALGORITHM

PERFORMANCE BASED DESIGN SYSTEM FOR CONCRETE MIXTURE WITH MULTI-OPTIMIZING GENETIC ALGORITHM PERFORMANCE BASED DESIGN SYSTEM FOR CONCRETE MIXTURE WITH MULTI-OPTIMIZING GENETIC ALGORITHM Takafumi Noguchi 1, Iei Maruyama 1 and Manabu Kanematsu 1 1 Deartment of Architecture, University of Tokyo,

More information

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation 6.3 Modeling and Estimation of Full-Chi Leaage Current Considering Within-Die Correlation Khaled R. eloue, Navid Azizi, Farid N. Najm Deartment of ECE, University of Toronto,Toronto, Ontario, Canada {haled,nazizi,najm}@eecg.utoronto.ca

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

Unsupervised Hyperspectral Image Analysis Using Independent Component Analysis (ICA)

Unsupervised Hyperspectral Image Analysis Using Independent Component Analysis (ICA) Unsuervised Hyersectral Image Analysis Using Indeendent Comonent Analysis (ICA) Shao-Shan Chiang Chein-I Chang Irving W. Ginsberg Remote Sensing Signal and Image Processing Laboratory Deartment of Comuter

More information

A Social Welfare Optimal Sequential Allocation Procedure

A Social Welfare Optimal Sequential Allocation Procedure A Social Welfare Otimal Sequential Allocation Procedure Thomas Kalinowsi Universität Rostoc, Germany Nina Narodytsa and Toby Walsh NICTA and UNSW, Australia May 2, 201 Abstract We consider a simle sequential

More information

Supplementary Materials for Robust Estimation of the False Discovery Rate

Supplementary Materials for Robust Estimation of the False Discovery Rate Sulementary Materials for Robust Estimation of the False Discovery Rate Stan Pounds and Cheng Cheng This sulemental contains roofs regarding theoretical roerties of the roosed method (Section S1), rovides

More information

Re-entry Protocols for Seismically Active Mines Using Statistical Analysis of Aftershock Sequences

Re-entry Protocols for Seismically Active Mines Using Statistical Analysis of Aftershock Sequences Re-entry Protocols for Seismically Active Mines Using Statistical Analysis of Aftershock Sequences J.A. Vallejos & S.M. McKinnon Queen s University, Kingston, ON, Canada ABSTRACT: Re-entry rotocols are

More information

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling Journal of Modern Alied Statistical Methods Volume 15 Issue Article 14 11-1-016 An Imroved Generalized Estimation Procedure of Current Poulation Mean in Two-Occasion Successive Samling G. N. Singh Indian

More information

SIMULATION OF DIFFUSION PROCESSES IN LABYRINTHIC DOMAINS BY USING CELLULAR AUTOMATA

SIMULATION OF DIFFUSION PROCESSES IN LABYRINTHIC DOMAINS BY USING CELLULAR AUTOMATA SIMULATION OF DIFFUSION PROCESSES IN LABYRINTHIC DOMAINS BY USING CELLULAR AUTOMATA Udo Buschmann and Thorsten Rankel and Wolfgang Wiechert Deartment of Simulation University of Siegen Paul-Bonatz-Str.

More information

Frequency-Weighted Robust Fault Reconstruction Using a Sliding Mode Observer

Frequency-Weighted Robust Fault Reconstruction Using a Sliding Mode Observer Frequency-Weighted Robust Fault Reconstruction Using a Sliding Mode Observer C.P. an + F. Crusca # M. Aldeen * + School of Engineering, Monash University Malaysia, 2 Jalan Kolej, Bandar Sunway, 4650 Petaling,

More information

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2 STA 25: Statistics Notes 7. Bayesian Aroach to Statistics Book chaters: 7.2 1 From calibrating a rocedure to quantifying uncertainty We saw that the central idea of classical testing is to rovide a rigorous

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Churilova Maria Saint-Petersburg State Polytechnical University Department of Applied Mathematics

Churilova Maria Saint-Petersburg State Polytechnical University Department of Applied Mathematics Churilova Maria Saint-Petersburg State Polytechnical University Deartment of Alied Mathematics Technology of EHIS (staming) alied to roduction of automotive arts The roblem described in this reort originated

More information

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition TNN-2007-P-0332.R1 1 Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Haiing Lu, K.N. Plataniotis and A.N. Venetsanooulos The Edward S. Rogers

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

PER-PATCH METRIC LEARNING FOR ROBUST IMAGE MATCHING. Sezer Karaoglu, Ivo Everts, Jan C. van Gemert, and Theo Gevers

PER-PATCH METRIC LEARNING FOR ROBUST IMAGE MATCHING. Sezer Karaoglu, Ivo Everts, Jan C. van Gemert, and Theo Gevers PER-PATCH METRIC LEARNING FOR ROBUST IMAGE MATCHING Sezer Karaoglu, Ivo Everts, Jan C. van Gemert, and Theo Gevers Intelligent Systems Lab, Amsterdam, University of Amsterdam, 1098 XH Amsterdam, The Netherlands

More information

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment

More information

Pulse Propagation in Optical Fibers using the Moment Method

Pulse Propagation in Optical Fibers using the Moment Method Pulse Proagation in Otical Fibers using the Moment Method Bruno Miguel Viçoso Gonçalves das Mercês, Instituto Suerior Técnico Abstract The scoe of this aer is to use the semianalytic technique of the Moment

More information

Period-two cycles in a feedforward layered neural network model with symmetric sequence processing

Period-two cycles in a feedforward layered neural network model with symmetric sequence processing PHYSICAL REVIEW E 75, 4197 27 Period-two cycles in a feedforward layered neural network model with symmetric sequence rocessing F. L. Metz and W. K. Theumann Instituto de Física, Universidade Federal do

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES CONTROL PROBLEMS

ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES CONTROL PROBLEMS Electronic Transactions on Numerical Analysis. Volume 44,. 53 72, 25. Coyright c 25,. ISSN 68 963. ETNA ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES

More information

Chapter 1 Fundamentals

Chapter 1 Fundamentals Chater Fundamentals. Overview of Thermodynamics Industrial Revolution brought in large scale automation of many tedious tasks which were earlier being erformed through manual or animal labour. Inventors

More information