Comparisons between Q-R-mode methods in the study of distribution of recent facies in the sea floors of the Cadiz bay

Gonzalez-Caballero, J.L.* & Gutierrez-Mas, J.M.**
* Dpto. de Matemáticas, Universidad de Cádiz

Abstract

In this work we compare the outcomes obtained from three multivariate statistical methods: the Q-mode factor analysis, the technique of biplot representation and the correspondence analysis, applied to the study of the recent sediments of the sea floor of the Cadiz bay. The results obtained with these methods have allowed discrimination between types of sediments through the establishment of different mineralogical associations defined as facies, as well as the elaboration of models of sedimentary processes, aspects of great interest for knowledge of the behaviour of the coastal marine environment.

1 Introduction

The Singular Value Decomposition (SVD) of a matrix X (n,p), with n objects and p variables, allows us to write X using the principal directions obtained in the R^p and R^n spaces, where the row and column vectors of the X matrix can be represented, respectively. When sediments of the sea floor are extracted and properties of mineralogical composition or of grain size distribution are studied, the collected data are of compositional type, that is to say, the values of the variables for each sample add up to a constant quantity (generally 100). Reyment and Jöreskog [7] suggest, among other methods, the use of the
Applied Sciences and the Environment

Q-mode factor analysis, or 'inverted' factor analysis, proposed by Imbrie and Purdy [6]. The fact that the Q-mode factor analysis for an X matrix is a type of factor analysis of a similarity Q-matrix among the rows of X, of the type XX', allows us to express it in terms of the SVD of X, obtaining the factor loadings from a scaling of the eigenvectors of XX' and the factor scores from the eigenvectors of X'X. But the SVD is the foundation of many reduction and data representation techniques, among them the technique of biplot representation and the correspondence analysis, both also suggested by Reyment and Jöreskog [7] for the treatment of this type of geological data, under the denomination of Q-R-mode methods. In this communication we compare the outcomes obtained from the Q-mode factor analysis with those of the other Q-R-mode methods, in the study of the distribution of recent sediments in different sectors of the sea floors of the Cadiz bay, in order to check their effectiveness in the description and classification of the sediments.

2 The Q-mode factor analysis, the biplot and the correspondence analysis

The SVD of an X (n,p) matrix of rank r (≤ p ≤ n) allows us to decompose X as

X = V Λ U'    (1)

where Λ = diag(λ1, ..., λr) is a diagonal matrix with λ1², ..., λr² the positive eigenvalues of X'X, U = [u1, ..., ur] the matrix of orthonormal eigenvectors of X'X and V = [v1, ..., vr] the matrix of orthonormal eigenvectors of XX' corresponding to the eigenvalues λ1², ..., λr². Their statistical importance is due to Eckart and Young [1] and Householder and Young [5], who showed that if λ1 ≥ λ2 ≥ ... ≥ λr, the best fit, in the least-squares sense, of the X matrix by one of rank q (< r) is given by the (n,p) matrix

X(q) = V(q) Λ(q) U(q)'    (2)

taking the q first eigenvalues and eigenvectors; an absolute measure of the goodness of this fit can be defined by its proximity to 1.
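As an illustration (not part of the original paper), the decomposition (1) and the rank-q least-squares fit (2) can be sketched in a few lines of Python; the matrix is random and q = 2 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((35, 7))  # e.g. n = 35 objects, p = 7 variables

# SVD: X = V @ diag(lam) @ U', singular values returned in decreasing order
V, lam, Ut = np.linalg.svd(X, full_matrices=False)

q = 2
Xq = V[:, :q] @ np.diag(lam[:q]) @ Ut[:q, :]  # best rank-q least-squares fit of X

# goodness of fit: share of the total sum of squares retained by the fit,
# close to 1 when the first q singular values dominate
fit = (lam[:q] ** 2).sum() / (lam ** 2).sum()
```

The residual sum of squares of the fit equals the sum of the discarded squared singular values, which is what makes X(q) optimal in the least-squares sense.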
2.1 The Q-mode factor analysis of Imbrie and Purdy

The Q-factor model supposes that each sediment can be expressed, approximately, as a linear combination of q (< min(n,p)) factors (pattern sediments), so that

X = A F' + E    (3)

where A, F and E represent, respectively, the factor loadings matrix, the factor scores matrix and the residual matrix of the part not explained by the model. The decomposition (1) allows us to obtain the model (3) without more than taking A = V(q) Λ(q) and F = U(q). Although such a decomposition also allows other alternatives, the previous one is usually taken because the factor loadings are directly comparable, the factors satisfying F'F = I_q. Imbrie and Purdy [6] proposed to apply the model (3) to analyse geological problems involving compositional data, defining the 'index of proportional similarity' between two sediments as the cosine of the angle between their row vectors; that is, applying the model (3) to the matrix W = D^(-1/2) X, where D is the (n,n) diagonal matrix of the row sums of squares of X. On the other hand, the same analytic criteria that are used in the general factor model to carry out rotations, orthogonal or oblique, inspired by the Thurstone criteria of simple structure, can also be used in the Q-factor model to obtain a simpler factor structure.

2.2 The biplot representation model

The biplot is a representation model, also based on the decomposition (1), that was introduced by Gabriel [2]. Given a data matrix X (n,p), the biplot provides a combined plot, exact or approximate according to the rank of X, of the n objects and the p variables in two dimensions. For that, from the decomposition (1), we can define the matrices G = V Λ^α and H = U Λ^(1-α), 0 ≤ α ≤ 1, with λ1^α, ..., λr^α the elements of Λ^α and similarly for Λ^(1-α), which allow us to express X as

X = G H'    (6)
the g_i and h_j vectors, with r components, being formed by the rows of G and H respectively. The least-squares properties of the decomposition (1) allow us to obtain an approximate representation of X in a plane, taking the first two components of g_i and h_j, denominated the biplot of X. For α = 1 one has G = V(2) Λ(2) and H' = U(2)', denominated the principal component biplot. It verifies that H'H = I_2 (notice the equivalence of G and H with A and F for q = 2 in the Q-factor model), and also XX' = GG', that is to say, the relationships among the rows of X with respect to the euclidean metric can be represented by those of the g_i vectors with the same metric.

2.3 The correspondence analysis

The technique denominated correspondence analysis (Greenacre [3]) is a procedure that allows one to obtain a particular graphic representation of the rows and columns of a non-negative data matrix. Consider an X (n,p) matrix of N observations arranged in a two-way contingency table, where the n rows and the p columns represent, respectively, the n and p categories of two discrete variables, with x_ij denoting the number of observations which take the ith category (i = 1,...,n) for the first variable and the jth category (j = 1,...,p) for the second variable. The procedure is based on obtaining first a matrix R = P - fc', where P = (1/N)X, f = P 1_p, c = P' 1_n, and 1_p, 1_n are vectors of p and n elements respectively with all elements unity.
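The construction of the R matrix just described can be sketched as follows (illustrative Python; the contingency table is invented, not the paper's data):

```python
import numpy as np

# hypothetical 3x3 two-way contingency table (counts of N observations)
X = np.array([[20., 10., 5.],
              [10., 30., 15.],
              [5., 10., 25.]])
N = X.sum()

P = X / N                      # correspondence matrix P = (1/N) X
f = P @ np.ones(P.shape[1])    # row masses, f = P 1_p
c = P.T @ np.ones(P.shape[0])  # column masses, c = P' 1_n

R = P - np.outer(f, c)         # residuals from the independence model
```

By construction every row and every column of R sums to zero: R measures only departures from independence, which is what the subsequent rescaling and SVD decompose.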
In fact, the R matrix is the residual matrix obtained when the independence model is fitted to P. If we define the (n,n) diagonal matrix D_f, whose elements are those of the f vector, and the (p,p) diagonal matrix D_c, with elements those of the c vector, the transformation Z = D_f^(-1/2) R D_c^(-1/2) allows us to rescale in different ways the rows and the columns of R, according to the inverse of the square root of the row and column totals, respectively, which homogenizes the scale of rows and columns of R. Again, the SVD given in (1) of the Z matrix, taking the first two singular values and associated eigenvectors, allows us to obtain coordinates in the euclidean plane for the rows and columns of R, which are rescaled versions of the principal components of Z, given by:

F = D_f^(-1/2) V(2) Λ(2),   C = D_c^(-1/2) U(2) Λ(2)    (7)

The expressions in (7), as well as the relationship that exists
among F and C, allow us to interpret in the plot that proximity among the points that represent the rows of X reveals the same behaviour in relation to the columns, and reciprocally for the points that represent the columns. Also, the proximity of column points to a row point reveals the influence of the former on the latter. Lastly, the proximity of a point (row or column) to a certain axis expresses its contribution to the definition of that axis, being greater the further the point lies from the centre of the representation.

3 Analysis and discussion of recent sediments of the Cadiz bay

With the purpose of comparing the results obtained with each one of the three procedures of section 2, a sampling of sediments obtained at 35 stations on the sea floor of the Cadiz bay has been analysed. Two types of properties have been measured in these samples: on the one hand the composition, through compositional variables such as the content in Quartz (QU), Feldspars (FE), Phyllosilicates (PH) and Carbonates (CA), whose sum of values for each sediment is 100%; on the other hand the granulometric nature, that is to say, the grain size distribution, measured by the size fraction Gravel (GR), Sand (SA) and Mud (MU) content, whose sum is also 100%. The relationships we find among the variables of both characteristics will be those which best define sediment types (facies) on the sea floors of the Cadiz bay. From a descriptive analysis of the variables measured, we can deduce that Quartz is the most abundant component, together with Sand, constituting the most permanent and characteristic properties of the sampling carried out. The results of the Q-mode factor model described in section 2 for the W matrix give the factor loadings and factor scores shown, respectively, in Tables 1 and 2, for the initial factors and for the factors rotated by the varimax procedure.
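A hedged sketch of the computation behind such tables, on synthetic compositional data standing in for the real samples (varimax rotation omitted; only the initial solution is shown):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((35, 7))
X = 100.0 * X / X.sum(axis=1, keepdims=True)   # compositional rows summing to 100

# W = D^(-1/2) X with D the diagonal matrix of row sums of squares,
# so each row of W has unit length (cos-theta similarity scaling)
W = X / np.sqrt((X ** 2).sum(axis=1, keepdims=True))

V, lam, Ut = np.linalg.svd(W, full_matrices=False)
q = 2
A = V[:, :q] * lam[:q]   # factor loadings, A = V(q) Lambda(q)
F = Ut[:q].T             # factor scores, with F'F = I_q

explained = (lam[:q] ** 2) / (lam ** 2).sum()  # variability explained per factor
```

Because the rows of W all lie in the positive orthant with unit length, the first factor typically absorbs most of the variability, mirroring the dominant Factor 1 reported for the Cadiz bay samples.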
On analysing the initial factor scores, we can make out two sedimentary facies, one as the majority (Factor 1), which explains 96% of the variability and is predominant in the area considering the high loadings that this factor has in all the sediments. This facies can be described as bioclastic quartziferous sand and represents the dominance of traction transport, as bottom load and associated with the
highest energy processes that take place on the sea floor. The other facies (Factor 2) can be described as quartziferous bioclastic mud, only predominant in some sectors, and represents the dominance of suspension-load transport, which requires less energy.

Table 1: Factor loadings for the initial solution (left) and the rotated orthogonal solution (right) from Q-mode factor analysis.

In Figure 1 it can be appreciated how almost all the sediments lie in a range of high factor loadings (between 0.8 and 1) for Factor 1, while for Factor 2 a great part of them take negative factor loadings (between -0.2 and 0), others take low positive values (between 0 and 0.3) and only a few take relatively high positive
values (between 0.3 and 0.6). The analysis results indicate the coexistence of two processes that have given rise to the formation of these sediments. The dominant process is tractive in character and, associated with the high energy of the sea floor environment, is the one that has given rise to the bioclastic quartziferous sands. Later, the sediments deposited as a consequence of this process were reworked in a lower-energy environment, calmer, deeper and farther from the coast, giving rise to the precipitation of fine sizes (Mud) from suspension.

Table 2: Factor scores for the initial solution (left) and rotated orthogonal solution (right) from Q-mode factor analysis.

Figure 1: Initial factor loadings.

This last process altered the properties of the pre-existing sediments, especially the grain size distribution, and its influence varies from some areas to others, depending on the geographical situation of the sampling stations in relation to the transport agent. Thus, the negative factor loadings for Factor 2 correspond to the areas little or not at all affected by this process, while the growing positive values indicate the areas of greater influence.
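The two-factor structure described above can also be read from a principal component biplot of the kind defined in section 2.2: with α = 1, the first two columns of G = V Λ give the sediment (row) markers and those of H = U give the variable (column) markers. A minimal sketch, on invented row-normalised data rather than the paper's W matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.random((35, 7))
W = W / np.sqrt((W ** 2).sum(axis=1, keepdims=True))  # row-normalised data matrix

V, lam, Ut = np.linalg.svd(W, full_matrices=False)

G = V[:, :2] * lam[:2]  # row (sediment) markers, G = V(2) Lambda(2)
H = Ut[:2].T            # column (variable) markers, H = U(2)

# the inner products G @ H.T reproduce the best rank-2 fit of W,
# so rows and columns can be interpreted jointly in one display
W2 = G @ H.T
```

Plotting the rows of G and H on the same pair of axes yields a display of the Figure 2 type, where row markers coincide with the factor loadings and column markers with the factor scores of the two-factor Q-mode solution.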
As for the analysis of the factor scores and loadings obtained after rotating orthogonally with the varimax procedure, the results show how the facies are clarified even further when the explained variability is distributed as 61% for Factor 1 against 39% for Factor 2.

Figure 2: Principal components biplot (Axis 1: 96.09%, Axis 2: 3.91%).

Figure 3: Plot of rows and columns with correspondence analysis.

The results analysed with the Q-factor model can be seen with greater clarity in Figure 2, obtained by representing the principal components biplot for the same W matrix. In it, besides appreciating how both the factor loadings and the factor scores of the two-factor solutions can be obtained without more than rotating the axes orthogonally, we can observe the almost complete dominance of the facies quartziferous sand, with
prevalence of the variable Sand as the main descriptor property, followed by Quartz, in the samples in the lower part of Figure 2. Those are very little affected by the reworking process. On the other hand, the facies quartziferous bioclastic mud is represented in the upper area of Figure 2. Lastly, the position of the variables PH, GR and FE so near the origin indicates the little influence they have in the definition of the sedimentary facies. Figure 3 represents the results obtained from the correspondence analysis. These are globally the same, although they can be refined further. It is worth noticing in the first place that in this graph the axes and the coordinates of the sediments and the variables do not indicate the same thing as in the biplot representation: they represent the plot in two dimensions of the deviations of the rows (sediments) and the columns (variables) from the independence model between both. In Figure 3 there can be appreciated, in the lower left area, the proximity of most of the samples to the variables Sand and Quartz, closely related, which define the predominant facies, and the slipping toward the left of the remaining samples toward the proximity of the variable Mud, which defines the second facies. The opposition of SA and QU against MU is what defines the first axis and, therefore, where the differences among the sediments along it lie. But also, the proximity of some of them to the variables PH, CA or FE indicates the relative importance of these variables in their composition. So we think that the diversity among them is better explained than in Figures 1 and 2. This diversity is present also in the second axis, where all the variables except GR have a similar influence in its definition.
The contrast of all of them with the variable GR, a component scarcely present in the sediments, is what marks the differences observed on this second axis.

4 Conclusions

Three models of multivariate analysis have been used to analyse the facies of the same sample of sediments taken from the sea floor of the Cadiz bay. Basically, the results obtained with them are the same and in consonance with that exposed in Gutierrez-Mas [4], although complementary conclusions can be extracted.
These methods have allowed us to define the sedimentary facies present in the sea floors of the Cadiz bay, and to differentiate the processes that have given rise to the sediments. The Q-factor model and the biplot obtain the same results, because both techniques are based on the SVD of the same data matrix; therefore, to interpret the factor loadings and scores, the biplot can be used, since it gives a very clear graphic idea in the definition of the facies by treating sediments and variables jointly. On the other hand, the correspondence analysis also provides a combined plot, where the two-dimensional fit that is made is of an independence model between the rows (sediments) and columns (variables) of the data matrix. In this plot, besides the predominant associations of variables that define the facies, the importance that the other, less predominant, variables have in each one of the sediments can be analysed.

References

[1] Eckart, C. and Young, G., The approximation of one matrix by another of lower rank, Psychometrika, 1, pp. 211-218, 1936.
[2] Gabriel, K.R., The biplot graphic display of matrices with application to principal components analysis, Biometrika, 58, pp. 453-467, 1971.
[3] Greenacre, M.J., Theory and Applications of Correspondence Analysis, Academic Press, London, 1984.
[4] Gutierrez-Mas, J.M., Thesis, Univ. of Cadiz, 364 pp.
[5] Householder, A.S. and Young, G., Matrix approximation and latent roots, Am. Math. Monthly, 45, pp. 165-171, 1938.
[6] Imbrie, J. and Purdy, E., Classification of modern Bahamian carbonate sediments, Mem. Amer. Assoc. Petrol. Geol., 1, pp. 253-272, 1962.
[7] Reyment, R.A. and Jöreskog, K.G., Applied Factor Analysis in the Natural Sciences, Second edition, Cambridge University Press, 1993.
More informationPrincipal Component Analysis
Principal Component Analysis Giorgos Korfiatis Alfa-Informatica University of Groningen Seminar in Statistics and Methodology, 2007 What Is PCA? Dimensionality reduction technique Aim: Extract relevant
More informationCorrespondence Analysis & Related Methods
Correspondence Analysis & Related Methods Michael Greenacre SESSION 9: CA applied to rankings, preferences & paired comparisons Correspondence analysis (CA) can also be applied to other types of data:
More informationFundamentals of Engineering Analysis (650163)
Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is
More informationGEOL Lab 9 (Carbonate Sedimentary Rocks in Hand Sample and Thin Section)
GEOL 333 - Lab 9 (Carbonate Sedimentary Rocks in Hand Sample and Thin Section) Sedimentary Rock Classification - As we learned last week, sedimentary rock, which forms by accumulation and lithification
More information7. Symmetric Matrices and Quadratic Forms
Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value
More informationBASIC NOTIONS. x + y = 1 3, 3x 5y + z = A + 3B,C + 2D, DC are not defined. A + C =
CHAPTER I BASIC NOTIONS (a) 8666 and 8833 (b) a =6,a =4 will work in the first case, but there are no possible such weightings to produce the second case, since Student and Student 3 have to end up with
More informationSingular Value Decomposition
Singular Value Decomposition CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Singular Value Decomposition 1 / 35 Understanding
More informationMATH 3321 Sample Questions for Exam 3. 3y y, C = Perform the indicated operations, if possible: (a) AC (b) AB (c) B + AC (d) CBA
MATH 33 Sample Questions for Exam 3. Find x and y so that x 4 3 5x 3y + y = 5 5. x = 3/7, y = 49/7. Let A = 3 4, B = 3 5, C = 3 Perform the indicated operations, if possible: a AC b AB c B + AC d CBA AB
More informationClusters. Unsupervised Learning. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved
Clusters Unsupervised Learning Luc Anselin http://spatial.uchicago.edu 1 curse of dimensionality principal components multidimensional scaling classical clustering methods 2 Curse of Dimensionality 3 Curse
More informationRoundoff Error. Monday, August 29, 11
Roundoff Error A round-off error (rounding error), is the difference between the calculated approximation of a number and its exact mathematical value. Numerical analysis specifically tries to estimate
More informationMatrix Factorizations
1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular
More informationLECTURE NOTE #10 PROF. ALAN YUILLE
LECTURE NOTE #10 PROF. ALAN YUILLE 1. Principle Component Analysis (PCA) One way to deal with the curse of dimensionality is to project data down onto a space of low dimensions, see figure (1). Figure
More information.. CSC 566 Advanced Data Mining Alexander Dekhtyar..
.. CSC 566 Advanced Data Mining Alexander Dekhtyar.. Information Retrieval Latent Semantic Indexing Preliminaries Vector Space Representation of Documents: TF-IDF Documents. A single text document is a
More informationUNIT 4 SEDIMENTARY ROCKS
UNIT 4 SEDIMENTARY ROCKS WHAT ARE SEDIMENTS Sediments are loose Earth materials (unconsolidated materials) such as sand which are transported by the action of water, wind, glacial ice and gravity. These
More informationA Multivariate Perspective
A Multivariate Perspective on the Analysis of Categorical Data Rebecca Zwick Educational Testing Service Ellijot M. Cramer University of North Carolina at Chapel Hill Psychological research often involves
More informationFactor Analysis of Data Matrices
Factor Analysis of Data Matrices PAUL HORST University of Washington HOLT, RINEHART AND WINSTON, INC. New York Chicago San Francisco Toronto London Contents Preface PART I. Introductory Background 1. The
More informationInformation Retrieval
Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices
More informationLAB 2 IDENTIFYING MATERIALS FOR MAKING SOILS: ROCK AND PARENT MATERIALS
LAB 2 IDENTIFYING MATERIALS FOR MAKING SOILS: ROCK AND PARENT MATERIALS Learning outcomes The student is able to: 1. understand and identify rocks 2. understand and identify parent materials 3. recognize
More informationNotes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.
Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where
More informationLecture 5 Singular value decomposition
Lecture 5 Singular value decomposition Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University, tieli@pku.edu.cn
More informationMATH 312 Section 8.3: Non-homogeneous Systems
MATH 32 Section 8.3: Non-homogeneous Systems Prof. Jonathan Duncan Walla Walla College Spring Quarter, 2007 Outline Undetermined Coefficients 2 Variation of Parameter 3 Conclusions Undetermined Coefficients
More informationSingular Value Decomposition. 1 Singular Value Decomposition and the Four Fundamental Subspaces
Singular Value Decomposition This handout is a review of some basic concepts in linear algebra For a detailed introduction, consult a linear algebra text Linear lgebra and its pplications by Gilbert Strang
More informationCorrespondence Analysis of Longitudinal Data
Correspondence Analysis of Longitudinal Data Mark de Rooij* LEIDEN UNIVERSITY, LEIDEN, NETHERLANDS Peter van der G. M. Heijden UTRECHT UNIVERSITY, UTRECHT, NETHERLANDS *Corresponding author (rooijm@fsw.leidenuniv.nl)
More informationLinear Methods in Data Mining
Why Methods? linear methods are well understood, simple and elegant; algorithms based on linear methods are widespread: data mining, computer vision, graphics, pattern recognition; excellent general software
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1
Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of
More informationDimension Reduction Techniques. Presented by Jie (Jerry) Yu
Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage
More informationPart I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes
Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with
More information(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =
. (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)
More informationMoore Penrose inverses and commuting elements of C -algebras
Moore Penrose inverses and commuting elements of C -algebras Julio Benítez Abstract Let a be an element of a C -algebra A satisfying aa = a a, where a is the Moore Penrose inverse of a and let b A. We
More informationLinear Algebra (Review) Volker Tresp 2018
Linear Algebra (Review) Volker Tresp 2018 1 Vectors k, M, N are scalars A one-dimensional array c is a column vector. Thus in two dimensions, ( ) c1 c = c 2 c i is the i-th component of c c T = (c 1, c
More informationProblem # Max points possible Actual score Total 120
FINAL EXAMINATION - MATH 2121, FALL 2017. Name: ID#: Email: Lecture & Tutorial: Problem # Max points possible Actual score 1 15 2 15 3 10 4 15 5 15 6 15 7 10 8 10 9 15 Total 120 You have 180 minutes to
More informationDeep Learning Book Notes Chapter 2: Linear Algebra
Deep Learning Book Notes Chapter 2: Linear Algebra Compiled By: Abhinaba Bala, Dakshit Agrawal, Mohit Jain Section 2.1: Scalars, Vectors, Matrices and Tensors Scalar Single Number Lowercase names in italic
More informationIntroduction to multivariate analysis Outline
Introduction to multivariate analysis Outline Why do a multivariate analysis Ordination, classification, model fitting Principal component analysis Discriminant analysis, quickly Species presence/absence
More informationGY 402: Sedimentary Petrology
UNIVERSITY OF SOUTH ALABAMA GY 402: Sedimentary Petrology Lecture 13: Immature Siliciclastic Sedimentary Environments Alluvial Fans, Braided Streams Instructor: Dr. Douglas W. Haywick Last Time Immature
More informationThe SVD-Fundamental Theorem of Linear Algebra
Nonlinear Analysis: Modelling and Control, 2006, Vol. 11, No. 2, 123 136 The SVD-Fundamental Theorem of Linear Algebra A. G. Akritas 1, G. I. Malaschonok 2, P. S. Vigklas 1 1 Department of Computer and
More informationMath Camp Notes: Linear Algebra II
Math Camp Notes: Linear Algebra II Eigenvalues Let A be a square matrix. An eigenvalue is a number λ which when subtracted from the diagonal elements of the matrix A creates a singular matrix. In other
More informationLecture Outline Wednesday - Friday February 14-16, 2018
Lecture Outline Wednesday - Friday February 14-16, 2018 Quiz 2 scheduled for Friday Feb 23 (Interlude B, Chapters 6,7) Questions? Chapter 6 Pages of the Past: Sedimentary Rocks Key Points for today Be
More informationFinal Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson
Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Name: TA Name and section: NO CALCULATORS, SHOW ALL WORK, NO OTHER PAPERS ON DESK. There is very little actual work to be done on this exam if
More informationPrincipal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17
Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 17 Outline Filters and Rotations Generating co-varying random fields Translating co-varying fields into
More informationFocus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.
Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,
More informationChapter 11 Canonical analysis
Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform
More informationLecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis
Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially
More informationLinear Algebra Review
January 29, 2013 Table of contents Metrics Metric Given a space X, then d : X X R + 0 and z in X if: d(x, y) = 0 is equivalent to x = y d(x, y) = d(y, x) d(x, y) d(x, z) + d(z, y) is a metric is for all
More information