CORRESPONDENCE ANALYSIS

INTUITIVE THEORETICAL PRESENTATION
BASIC RATIONALE
DATA PREPARATION
INITIAL TRANSFORMATION OF THE INPUT MATRIX INTO PROFILES
DEFINITION OF GEOMETRIC CONCEPTS (MASS, DISTANCE AND CENTROID)
CONSTRUCTION OF THE PRINCIPAL AXES OF INERTIA AND PROJECTION PLOTS
GRAPHICAL INTERPRETATION OF OUTPUTS
SUPPLEMENTARY PROJECTIONS
QUALITATIVE REGRESSION
ARCHETYPAL DISCRIMINATION
CASE STUDIES
APPENDIX A Air Quality in Gdansk
APPENDIX B Reservoir Quality Zones in an Oil Field
APPENDIX C Climatology of Porto urban area
APPENDIX D Risk Assessment of mine tailings dam breakage
APPENDIX E Index of Quality in Natural Stones

BASIC RATIONALE

Correspondence Analysis (CA) is a geometric data analysis methodology whose main goal is to represent tabular data graphically, thereby facilitating the interpretation of the numeric table. The basic idea behind this methodology, developed in the 1960s by the French mathematician Jean-Paul Benzécri, is that any matrix of positive numbers viewed as some form of concatenation of contingency tables [1] can be summarized into a series of 2D graphs that plot data with respect to two perpendicular coordinate axes, calibrated for a common scale. Avoiding as much as possible unnecessary a priori assumptions, the interpretation of such graphs allows for detecting and evaluating the pattern of relationships between rows and columns of the input table [2].

When the input table is taken as an n × p matrix, its geometric representation consists of depicting the elements of the table as points in a certain geometric space. Depending on how the matrix is viewed, it may be converted into a cloud of n rows in the $\mathbb{R}^p$ space, or into a cloud of p columns in the $\mathbb{R}^n$ space. The very backbone of the CA algorithm ensures that these two representations are equivalent, allowing for the joint interpretation of rows and columns in the same plot.

The foremost outputs yielded by CA are standard Cartesian graphs showing the simultaneous projection of the labels that represent the input matrix rows and columns [3] onto the axes that convey the maximum inertia [4] associated with the initial cloud of points (each row being considered as a point in the column space, and conversely). These graphs are ranked by descending order of importance (quantified by the fraction of the total cloud inertia conveyed by the axes extracted from the cloud), and only the most significant plots are retained for interpretation [5], which by the same token reduces the dimensionality of the input matrix.

This process of discarding the less important graphs, which produces the reduction of the initial cloud dimensionality, is illustrated in a simple artificial example outlined in Fig. A-1, in which the cloud of rows (individuals) can be seen as an ellipsoid initially given in the 3D space of columns (properties). By minimizing the distance from the cloud of individuals to three orthogonal directions in space, the CA algorithm finds the three principal axes of the ellipsoid (AXIS 1, 2 and 3 in Fig. A-1) and projects the points of the cloud onto the two planes defined by AXIS 1 vs. AXIS 2, and by AXIS 1 vs. AXIS 3 [(1) and (2) in Fig. A-1].

Fig. A-1 Rotation of an ellipsoidal cloud of points to the most favorable position, where the two principal planes [(1) and (2)] of the ellipsoid summarize in 2D the geometry of the cloud in an optimal way

[1] In a contingency table, the intersection of any row and column gives the number of occurrences that share the characteristics that are common to that row and column.
[2] In simple terms, the interpretation procedure consists of finding out if there is attraction, repulsion, or indifference between the relevant elements of a given data set. This entails looking for similarities and differences from column to column, from row to row, and between columns and rows.
[3] The rows of the input matrix are usually denoted individuals and the columns, variables (or attributes, or properties, or observations).
[4] The inertia of a point belonging to a cloud located in space is given by the product of its mass by its squared distance to the centroid of the entire cloud. In statistical language, the inertia is the analogue of the variance, since the mass can be seen as the relative frequency of a distribution of points, the geometric centroid being calculated as the weighted average of the set of coordinates defining the entire cloud.
[5] Such an interpretation becomes easier since the dimensionality of the initial table is reduced when it is converted into a small number of graphs.

The initial ellipsoidal cloud, formerly given in the reference frame of the columns (whose axes are denoted prop. 1, 2 and 3 in Fig. A-1) that describe the individuals (rows) contained in the input matrix, is rotated in an optimal way to a new reference frame, defined by the principal axes of the ellipsoid. The geometric meaning of the importance of each axis, assessed by the quantities denoted eigenvalues and represented in Fig. A-1 by $\lambda_1$, $\lambda_2$, $\lambda_3$, is now clear: each eigenvalue is a measure of the length of the spread of the projections onto the corresponding principal axis, which is obviously related to the variability of the distribution of the cloud of points around such axis. If one decides to disregard Axis 3 (because $\lambda_3$ is considered negligible with respect to $\lambda_1$ and $\lambda_2$), plot (1) of Fig. A-1 is the representation of the initial cloud that allows the required dimensionality reduction (from 3D to 2D), with a minimum loss of information [quantified by the ratio $\lambda_3 / (\lambda_1 + \lambda_2 + \lambda_3)$].

As opposed to classical multivariate statistics, the significance of an axis is not judged on the grounds of any hypothesis testing procedure based on unverifiable assumptions, but depends only on its explicative power in the context of the problem to be approached by the CA modeling methodology. Hence, CA calls for a symbiotic effort performed jointly by the data analyst and the expert in the scientific domain to which the data refer, the latter being responsible for interpretation, according to the rules made available by the former.

The rules offered by the data analyst stem directly from the CA algorithm, as it was put forward by Benzécri and his followers. The first question to be handled before applying the algorithm is to ensure that the input matrix may be considered as a concatenation of contingency tables cross-tabulating two qualitative variables [6]. Then, the graphs produced by the CA program should be scrutinized in order to retain for interpretation a small number of them, able to explain a reasonable fraction of the total cloud inertia. These graphs are interpreted not only in terms of the pattern emerging from row and column projections, but also by determining the quality of each point's representation with respect to an axis. This quality is measured by the absolute contribution of the point's projection to the inertia associated with the axes, given by the fraction of this inertia assigned to each particular point. Also, the relative contribution, evaluated by the angle between the vector representing the point in the original cloud and its projection onto the axes, is an additional measure of the point's representation quality.

Once a first interpretation scheme has been outlined, it is in general necessary to modify the coding of the variables (and sometimes of the individuals) in order to improve results. Such an improvement by interactive coding is considered satisfactory when the model emerging from CA outputs is accepted both by the data analyst and the data expert. This model ranges from a simple discourse about the meaning of the axes to a complex set of quantified relationships between data sets. In any case, the model should follow the data, not the reverse, according to Benzécri's dictum. Furthermore, in line with the CA paradigm, the model obtained through the methodology is not validated by any statistical test of hypothesis, but by its contribution to valuable and helpful insights on the issue addressed by the expert of the scientific domain to which the data belong.

[6] In particular, this requirement implies that sums along rows and columns are allowed by the nature of the data, producing values that make sense in the context of the problem to be approached by the methodology.

DATA PREPARATION

In a variety of environmental studies based on empirical observations, it is very common to arrange the results of such observations in the form of data tables, symbolically represented in Fig. A-2, in which the values of observed variables (j = 1, ..., p) are recorded for each one of the physical units (i = 1, ..., n) where such variables were captured (denoted individuals).

Fig. A-2 Generic table recording results of p observations in n individuals [K(i, j) is a numeric value providing the result of observation j for the individual i]

The table depicted in Fig. A-2 may be viewed as an n × p matrix, whose n rows are the individuals and whose p columns are the attributes observed in such individuals. Such attributes may be expressed by real numbers, which give the value of some measure of a quantitative continuous variable, or by integers denoting the modality (or category, or class) of a qualitative variable.

Historically, Correspondence Analysis was developed by Jean-Paul Benzécri for the case of two qualitative variables cross-tabulated in a contingency table (Fig. A-3). In subsequent developments, an appropriate concatenation of such contingency tables expressing the cross-tabulation of two variables is the compulsory input for CA, entailing the transformation of the most common data model of Fig. A-2 into a set of two-way contingency tables, arranged as blocks of a new matrix.

Fig. A-3 Contingency table

In the two-way contingency table of Fig. A-3, two qualitative variables VAR 1 and VAR 2 are put in correspondence through the absolute frequencies K(i, j) of co-occurrences of modalities i and j. The sum along columns gives the total number K(i) of occurrences of VAR 2 modalities and, along lines, the total number K(j) of occurrences of VAR 1 modalities. K is the global number of occurrences, which is the sum of the absolute frequencies given in K(i) or K(j), which can be viewed as histograms (expressed in absolute frequencies) for VAR 2 and VAR 1, respectively.
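The margins K(i), K(j) and the grand total K can be sketched in a few lines of Python. This is only an illustration: the 3 × 2 table below is invented, and the variable names (`row_totals`, `col_totals`, `grand_total`) are assumptions that do not appear in the original notation.

```python
# Hypothetical 3 x 2 contingency table K(i, j); the counts are invented
# for illustration only.
K = [
    [20, 10],
    [15, 25],
    [5, 25],
]

# K(i): total of row i, obtained by summing across its columns.
row_totals = [sum(row) for row in K]

# K(j): total of column j, obtained by summing down its rows.
col_totals = [sum(col) for col in zip(*K)]

# K: global number of occurrences; both margins must add up to it.
grand_total = sum(row_totals)
assert grand_total == sum(col_totals)

print(row_totals, col_totals, grand_total)
```

Both margins summing to the same K is the property the text relies on when it treats K(i) and K(j) as two histograms of the same set of occurrences.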

In order to put the model depicted in Fig. A-2 into a format suited for some CA applications, the first step is to transform it into a complete disjunctive matrix, denoted D and shown symbolically in Fig. A-4. This matrix displays as rows the n individuals, and as columns the p modalities of a set of q qualitative variables. For each individual, the value 1 is assigned to the modality that occurs in that individual (and 0 to the others), for the entire set of the q qualitative variables.

Fig. A-4 Complete disjunctive matrix D

It is worth noting that the complete disjunctive matrix is already a juxtaposition of contingency tables. In fact, each block referring to a given variable is the contingency table crossing the individuals with the absolute frequency of the modalities into which such variable is deployed (in this particular case, those frequencies take only two values: one, if the modality occurs for a certain individual, and zero otherwise).

Taking into account that j stands for a modality in Fig. A-3, and for a qualitative variable (deployed into several modalities) in Fig. A-4, the representations given in Figures A-3 and A-4 can be related. In fact, it is clear that K(i) = q, because the sum of one line along the columns of Fig. A-4 equals the number of variables (since each variable takes the value 1 in exactly one modality, the others being given the value 0). Moreover, K = nq, because n is the total absolute frequency of all modalities for each one of the q histograms calculated when the columns of Fig. A-4 are summed along lines.
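As a rough sketch of the coding described above, the following Python builds a complete disjunctive matrix from a handful of categorical records and checks the two identities K(i) = q and K = nq. The records, the variable names "wind" and "rain", and the helper `disjunctive_matrix` are hypothetical and serve only as an illustration.

```python
def disjunctive_matrix(records, variables):
    """Build the complete disjunctive matrix D from categorical records.

    Returns the column labels (variable, modality) and the 0/1 matrix D.
    """
    columns = []
    for var in variables:
        # One column per modality actually observed for this variable.
        for modality in sorted({rec[var] for rec in records}):
            columns.append((var, modality))
    D = [[1 if rec[var] == modality else 0 for (var, modality) in columns]
         for rec in records]
    return columns, D


# Hypothetical data: n = 4 individuals described by q = 2 qualitative variables.
records = [
    {"wind": "N", "rain": "dry"},
    {"wind": "S", "rain": "wet"},
    {"wind": "N", "rain": "wet"},
    {"wind": "W", "rain": "dry"},
]
variables = ["wind", "rain"]

columns, D = disjunctive_matrix(records, variables)
n, q = len(records), len(variables)

# Each row contains exactly one 1 per variable, so K(i) = q ...
assert all(sum(row) == q for row in D)
# ... and the grand total is K = n * q.
assert sum(map(sum, D)) == n * q

for row in D:
    print(row)
```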

Hence, any table following the generic model of Fig. A-2 can be submitted to the CA algorithm, provided that it is previously transformed into a complete disjunctive matrix D. If such a table contains quantitative variables, these should be split into meaningful classes by inspection of their empirical distributions, and any real number K(i, j) appearing in Fig. A-2 should be substituted by an array containing the value one in the class where K(i, j) falls, and zero in all the other classes. As a result, the quantitative variable is converted into a set of categories (APPENDIX A).

Instead of using the complete disjunctive matrix D as input, it is in general advisable to transform D into the Burt matrix B (Fig. A-5) by multiplying the transpose of D ($D^T$) by D itself ($B = D^T D$).

Fig. A-5 Burt matrix (q is the number of qualitative variables, whose total number of modalities is p, and $A_{jj'}$ stands for the contingency table cross-tabulating j by j')
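The two operations of this passage, coding a quantitative value into classes and forming B = DᵀD, can be sketched as follows. The cut points, the small matrix D and the function names `one_hot_class` and `burt_matrix` are illustrative assumptions; in practice the classes should come from inspecting the empirical distribution, as the text recommends.

```python
from bisect import bisect_right


def one_hot_class(value, cut_points):
    """Replace a real number by a 0/1 array with a single 1 in its class.

    cut_points are ascending class boundaries; a value equal to a cut
    point is assigned to the upper class.
    """
    k = bisect_right(cut_points, value)
    return [1 if c == k else 0 for c in range(len(cut_points) + 1)]


def burt_matrix(D):
    """Burt matrix B = D^T D of a complete disjunctive matrix D."""
    p = len(D[0])
    return [[sum(row[a] * row[b] for row in D) for b in range(p)]
            for a in range(p)]


# A value of 12.5 with hypothetical cut points 10 and 20 falls in the middle class.
print(one_hot_class(12.5, [10, 20]))

# Hypothetical disjunctive matrix: 4 individuals, a 3-modality variable
# followed by a 2-modality variable.
D = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0],
]
B = burt_matrix(D)
for row in B:
    print(row)
```

In the printed B, the diagonal blocks hold the modality counts of each variable and the off-diagonal blocks hold the cross-tabulation of the two variables.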

Furthermore, the Burt matrix [7], apart from being the input for the CA algorithm, contains all the information required by the classical treatment of questionnaires [8], when the aim of the survey performed through such questionnaires is considered purely descriptive. In fact, its diagonal blocks give the histogram of modalities for each question, and the non-diagonal blocks contain the cross-tabulations of all pairs of questions.

In summary, the basic data preparation required to apply the CA algorithm is to transform the available raw data into an input matrix that can be viewed as a juxtaposition or as a concatenation of contingency tables [9].

[7] This matrix consists of all two-way cross-tabulations of a set of categorical variables, including the cross-tabulation of each variable with itself. It is the analogue, for qualitative variables, of the variance-covariance matrix for factorial methods based on quantitative variables. In fact, the diagonal blocks of the Burt matrix are the analogue of variances, and the non-diagonal ones of covariances.
[8] Conversely, by viewing each qualitative variable of any input table as a question with response categories, it may as well be stated that we end up with a quasi-universal coding format: the questionnaire.
[9] Before running any CA program, the data to be inputted should be scrutinized to ensure that the resulting format can be considered as a contingency table, or more commonly as a juxtaposition or as a concatenation of such tables (the above described complete disjunctive matrix and Burt matrix being the most common particular instances of juxtaposition and concatenation, respectively).

INITIAL TRANSFORMATION OF THE INPUT MATRIX INTO PROFILES

Any n × p matrix to be inputted to the CA algorithm can be depicted in geometric terms as a cloud of n rows in the column space $\mathbb{R}^p$ or as a cloud of p columns in the row space $\mathbb{R}^n$. Given the symmetric character of contingency tables, these two views are equivalent in semantic terms, since it makes no difference whether the matrix shown in Fig. A-3 or its transpose is used. Hence, the transformations performed on rows have their counterpart in columns by substituting the index i by j, and conversely. Bearing this argument in mind, and for the sake of parsimony, most formulae of this text are expressed in terms of rows (whenever necessary, the corresponding formulae for columns are derived from the former by changing i into j).

Each point of the cloud representing the rows of the input matrix illustrated in Fig. A-3 is a vector in the $\mathbb{R}^p$ space, whose coordinates define the row profile $f_j^i$, given by:

$$f_j^i = \frac{f_{ij}}{f_i} = \frac{K(i,j)/K}{K(i)/K} = \frac{K(i,j)}{K(i)},$$

where

$$f_{ij} = \frac{K(i,j)}{K}, \qquad f_i = \frac{K(i)}{K}, \qquad K(i) = \sum_{j=1}^{p} K(i,j), \qquad K = \sum_{i=1}^{n}\sum_{j=1}^{p} K(i,j).$$

The profile of row i, $f_j^i = f_{ij}/f_i$, expresses, in a standard way, how the individual i is described in terms of the available set of properties. Since K(i, j) is expressed in absolute frequencies in a contingency table [10], the coordinates of a row i correspond to its relative frequencies, calculated for the row total K(i). In an input table like the one given in Fig. A-3, two individuals assigned to rows i and i' have a similar profile when the values of their original properties K(i, j) and K(i', j) are roughly proportional. Moreover, the profile of a given individual i along columns j adds up to 1, as shown below:

$$\sum_{j=1}^{p} f_j^i = \sum_{j=1}^{p} \frac{f_{ij}}{f_i} = \sum_{j=1}^{p} \frac{K(i,j)}{K(i)} = \frac{K(i)}{K(i)} = 1$$

Since the coordinates of individuals meet the above relationship, the individuals can be represented in a space of dimension p − 1. This is a specific advantage of CA with regard to the main objective of data analysis methods that aim to facilitate interpretation through dimensionality reduction. In contrast to other factorial methods (like Principal Components Analysis), this feature of CA stems from the fact that any contingency table contains additional information which is not used in the PCA input table: the total number of occurrences, which may be derived from the n × p data by

$$K = \sum_{i=1}^{n}\sum_{j=1}^{p} K(i,j).$$

An example of the dimensionality shrinkage caused by the fact that profiles add up to 1 is shown in Fig. A-6: all individuals i described by 3 properties lie ab initio on the plane E, given by $x_3 = 1 - x_2 - x_1$ (just by effect of the initial transformation, the dimensionality of the space was reduced from 3 to 2).

Fig. A-6 Plane containing all individuals i characterized by 3 properties

For a 2D case, when the input matrix contains only two columns j and j', the profiles of the rows i can be represented graphically in a scatterplot, as shown in Fig. A-7.

Fig. A-7 Geometric representation of rows in an n × 2 contingency table ($X = f_j^i = f_{ij}/f_i$ and $Y = f_{j'}^i = f_{ij'}/f_i$)

[10] It is worth noting that the notion of contingency table can be generalized beyond the case of counts that give rise to absolute frequencies. In fact, CA can be properly applied to any table of homogeneous positive numbers for which it makes sense to express them in relative amounts. Hence, the sum along columns and rows should be allowed by the nature of the data (in particular, a common unit should be given to all elements of the matrix), and a reliable sense should be conferred to that sum [denoted K(i) and K(j) in Fig. A-3]. Also, each element of a row (or a column) divided by K(i) [or K(j)] should generate a meaningful ratio. Moreover, adding K(i) along lines and K(j) along columns, the same meaningful amount K should be obtained.
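The transformation into row profiles can be illustrated with a short Python sketch. The table is invented, and the names `masses` and `profiles` are assumptions standing for $f_i$ and $f_j^i$; the last row is deliberately proportional to the third one to show that proportional rows share the same profile.

```python
# Hypothetical 4 x 2 contingency table K(i, j).
K = [
    [20, 10],
    [15, 25],
    [5, 25],
    [10, 50],  # proportional to the previous row
]

grand_total = sum(map(sum, K))                       # K
row_totals = [sum(row) for row in K]                 # K(i)
masses = [k_i / grand_total for k_i in row_totals]   # f_i = K(i) / K

# Row profiles f_j^i = K(i, j) / K(i).
profiles = [[k_ij / k_i for k_ij in row] for row, k_i in zip(K, row_totals)]

for f_i, profile in zip(masses, profiles):
    # Every profile adds up to 1, so the cloud lies in a (p - 1)-dimensional space.
    assert abs(sum(profile) - 1.0) < 1e-12
    print(round(f_i, 4), [round(x, 4) for x in profile])
```

With p = 2 columns, each profile is fully determined by its first coordinate, which is the one-dimensional analogue of the plane E of Fig. A-6.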

DEFINITION OF GEOMETRIC CONCEPTS (MASS, DISTANCE AND CENTROID)

Once the cloud of points has been placed in space by its coordinates (the above defined profiles), a mass should be assigned to each point, in order to account for its significance in terms of the number of cases reported in the contingency table that such a point embodies. For the sake of comparison, it is reasonable that the mass of each point should be defined by $f_i = K(i)/K$, a measure that accounts for its relative magnitude with respect to the bulk of the entire cloud. Hence, ceteris paribus, the bigger the mass of a point, the greater its contribution to the attraction of a principal axis to its neighborhood.

Another feature needed to delineate a geometric representation problem is a distance that measures how near (or far) the points of the cloud are from one another. In a contingency table, the usual Euclidean distance is not appropriate to account for such geometric inter-relations, which are to be mediated through the directions of spread of the cloud representing the input table, i.e., its principal axes. In fact, the Euclidean distance treats all coordinates equally, whereas what is needed is to compensate the discrepancy between frequencies via a weighting procedure. This procedure ensures that options that occur less frequently contribute more to the inter-profile distance, while those that occur more frequently contribute less. According to Benzécri, the $\chi^2$ distance $d^2(i, i')$, as defined below, is well suited to overcome the drawbacks of the usual Euclidean distance for the case of contingency tables:

$$d^2(i, i') = \sum_{j=1}^{p} \frac{1}{f_j}\left(\frac{f_{ij}}{f_i} - \frac{f_{i'j}}{f_{i'}}\right)^2 = \sum_{j=1}^{p} \frac{1}{f_j}\left(f_j^i - f_j^{i'}\right)^2$$

A key concept in the geometric representation of the cloud of individuals is its centroid. This is a point in space which is not necessarily located in the geometric centre of the cloud, but that accounts for the mass assigned to each row of the input matrix. The coordinates $g_j$ of the centroid $G_I$ of the n individuals in the $\mathbb{R}^p$ space are defined as:

$$g_j = \sum_{i=1}^{n} \underbrace{f_i}_{\text{mass}}\;\underbrace{f_j^i}_{\text{coordinates}} = \sum_{i=1}^{n} f_i \frac{f_{ij}}{f_i} = \sum_{i=1}^{n} f_{ij} = \sum_{i=1}^{n} \frac{K(i,j)}{K} = \frac{K(j)}{K} = f_j$$

It is worth noting that each coordinate of the row centroid corresponds to the relative frequency of the column to which such coordinate refers. This is a consequence of the complete symmetry between rows and columns, stemming from the specific arrangement of contingency tables (in Fig. A-3, it is indifferent to organize the modalities of VAR 1 as rows or as columns, the same obviously applying to VAR 2).

CONSTRUCTION OF THE PRINCIPAL AXES OF INERTIA AND PROJECTION PLOTS

The basic trait of CA as a factorial technique is that the cloud of points does not stretch equally in every direction [11]. Hence, in order to reduce the dimensionality of the space where the input table is to be interpreted, it is required to find the directions of maximum spread of the cloud, i.e., the principal axes of inertia of the set of points representing the matrix rows. These axes are obtained by a procedure that involves the eigenvalue decomposition [12] of the inertia matrix obtained from the input table by calculating its moments and products of inertia. This procedure, analogous to the classical multivariate least squares orthogonal distance fit, can be viewed in intuitive terms as follows:

1. Take the centroid of the cloud.
2. From this point, move a straight line in all directions, sweeping the entire space where the cloud is positioned.
3. For each direction, calculate the sum of the squared distances from each point of the cloud to the sweeping straight line.
4. Select the direction that minimizes the above defined sum of squares and calculate, for such direction, the sum of squared projections of every cloud point onto the straight line.
5. Take the vector representing the above defined direction as the first axis of inertia of the cloud and the above defined sum of squared projections as a measure of the importance of such an axis [13] (the former is the first eigenvector of the inertia matrix, according to B-3, and the latter is its first eigenvalue).
6. Take a new straight line, lying in the plane orthogonal to the first one, and repeat the algorithm, finding the second axis and its importance. Once this axis has been obtained, iterate the procedure until p − 1 axes are extracted (if p < n).

Through this modus operandi, a set of p − 1 principal axes, sorted by descending order of importance, is produced.

At this stage, the cloud of points representing the rows of the input matrix can be projected onto the previously obtained axes. With $u_{\alpha j}$ denoting the coordinate j of axis α, the projection $\psi_{i\alpha}$ of an individual i onto such axis is given by the scalar product of two vectors, one giving the axis direction, and the other the position of the individual in the $\mathbb{R}^p$ space (corrected to transform the $\chi^2$ distance into the usual Euclidean distance required for Cartesian graphs), as follows:

$$\psi_{i\alpha} = \sum_{j=1}^{p} \frac{f_{ij}}{f_i \sqrt{f_j}}\, u_{\alpha j}$$

As a consequence of the symmetry of the input table, the columns of the data matrix can also be projected onto the same axes, applying the transition formulae given below:

$$\varphi_{j\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_{i=1}^{n} \frac{f_{ij}}{f_j}\, \psi_{i\alpha}$$

$$\psi_{i\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_{j=1}^{p} \frac{f_{ij}}{f_i}\, \varphi_{j\alpha}$$

where $\varphi_{j\alpha}$ is the projection of column j onto axis α, whose eigenvalue is $\lambda_\alpha$.

Hence, at this phase, two sets of tables are produced, exhibiting the coordinates of the projections of rows and columns onto the same axes. Now, selecting a small [14] number of axes, those can be put into graphical form, displaying the projections of rows and columns onto the principal planes defined by these axes, as illustrated in Fig. A-8 for a particular point.

Fig. A-8 Projection of a point of the cloud onto a principal plane

The above-mentioned principal planes, which are maps characterized by a metric leading to the same distance scale in all directions, are constructed by crossing Axis 1 with all the other selected axes in Cartesian graphs. These maps represent successive sections of the original cloud of points, sorted by descending order of importance. Moreover, the selected sections are optimal, in the sense that they minimize the loss of information when the cloud is substituted by such an array of plots. If they lend themselves to interpretation, this set of plots shows the original constellations of points in $\mathbb{R}^p$ and $\mathbb{R}^n$ in a useful form, reducing the dimensionality of the problem in the most favorable way.

[11] If the cloud could be assimilated to a hyper-sphere, no direction would differ from the others as far as the spread of points is concerned. This would indicate that there is no affinity between rows and columns of the input matrix, and consequently CA would yield a number of equivalent axes, whose putative interpretation is pointless.
[12] The eigenvalue decomposition of a symmetric matrix is optimal in terms of least squares.
[13] The importance of an axis is a measure of conformity of the initial cloud to its projection onto the axis. The more important an axis is, the less deformed the cloud is when it is reduced to its projections onto such an axis.
[14] The number of axes to be selected depends on a trade-off between their importance, given by the fraction of inertia conveyed by their eigenvalues, and the context where the problem at hand is situated, requiring that all selected axes are interpretable.
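The geometric ingredients defined in the previous section (masses, χ² distance, centroid) and the total inertia of the cloud can be checked numerically with a small Python sketch. The table is invented, and the function and variable names are assumptions. The total inertia is computed here as the mass-weighted sum of squared χ² distances to the centroid, following the definition of inertia given in footnote 4.

```python
def chi2_distance_sq(profile_a, profile_b, col_masses):
    """Squared chi-square distance between two row profiles."""
    return sum((a - b) ** 2 / f_j
               for a, b, f_j in zip(profile_a, profile_b, col_masses))


# Hypothetical 3 x 2 contingency table K(i, j).
K = [
    [20, 10],
    [15, 25],
    [5, 25],
]
total = sum(map(sum, K))
row_masses = [sum(row) / total for row in K]          # f_i
col_masses = [sum(col) / total for col in zip(*K)]    # f_j
profiles = [[k / sum(row) for k in row] for row in K]  # f_j^i

# Centroid: mass-weighted average of the row profiles.
centroid = [sum(f_i * prof[j] for f_i, prof in zip(row_masses, profiles))
            for j in range(len(col_masses))]

# Total inertia: sum over rows of mass times squared distance to the centroid.
total_inertia = sum(f_i * chi2_distance_sq(prof, centroid, col_masses)
                    for f_i, prof in zip(row_masses, profiles))

print(centroid, col_masses, total_inertia)
```

The printed centroid coincides with the column masses $f_j$, which is the identity $g_j = f_j$ derived above; the total inertia is the quantity that the eigenvalues of the principal axes share out among themselves.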

Fig. A-9 summarizes the CA algorithm, illustrating how the input table is converted into a graphical output. It is worth noting the crucial role played by the transition formulae, which make it possible to encapsulate the $\mathbb{R}^p$ and $\mathbb{R}^n$ analyses into a single final plot. Such a final plot is the most economical (and informative) synthetic representation of the input table.

Fig. A-9 Diagram symbolizing the CA algorithm
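The chain from input table to coordinates can be sketched in plain Python for the first principal axis only. This is a minimal illustration under several assumptions, not the program used by the author: the leading eigenvector is obtained by power iteration (a stand-in for a full eigenvalue decomposition), the trivial axis associated with the centroid is removed explicitly, the table is invented and assumed to carry some row-column association, and the names `first_axis`, `psi` and `phi` are hypothetical.

```python
import math


def first_axis(K, iterations=500):
    """Return (eigenvalue, row coordinates, column coordinates) of Axis 1."""
    n, p = len(K), len(K[0])
    total = float(sum(map(sum, K)))
    f = [[K[i][j] / total for j in range(p)] for i in range(n)]   # f_ij
    fi = [sum(row) for row in f]                                  # row masses
    fj = [sum(f[i][j] for i in range(n)) for j in range(p)]       # column masses

    # Symmetric p x p matrix built from x_ij = f_ij / sqrt(f_i f_j), minus the
    # trivial component (eigenvalue 1, eigenvector sqrt(f_j)) that corresponds
    # to the centroid of the cloud.
    S = [[sum(f[i][a] * f[i][b] / fi[i] for i in range(n))
          / math.sqrt(fj[a] * fj[b]) - math.sqrt(fj[a] * fj[b])
          for b in range(p)] for a in range(p)]

    # Power iteration: repeated multiplication by S converges to the
    # eigenvector with the largest eigenvalue.
    u = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(S[a][b] * u[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    lam = sum(u[a] * S[a][b] * u[b] for a in range(p) for b in range(p))

    # Projection of the rows, then of the columns via the transition formula.
    psi = [sum(f[i][j] / (fi[i] * math.sqrt(fj[j])) * u[j] for j in range(p))
           for i in range(n)]
    phi = [sum(f[i][j] / fj[j] * psi[i] for i in range(n)) / math.sqrt(lam)
           for j in range(p)]
    return lam, psi, phi


# Hypothetical 3 x 3 contingency table with a clear diagonal association.
K = [
    [30, 10, 5],
    [10, 30, 10],
    [5, 10, 30],
]
lam, psi, phi = first_axis(K)
print(lam, psi, phi)
```

A quick consistency check is that the mass-weighted mean of the row coordinates is zero and their mass-weighted sum of squares equals the eigenvalue; a production analysis would normally rely on a numerical library to extract all axes at once.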

GRAPHICAL INTERPRETATION OF OUTPUTS

Given a set of graphs where Axis 1 is combined with all the other pre-selected axes, the interpretation route starts with the most important plot, defined by the first two axes (those that exhibit the largest eigenvalues). Examining this graph (in conjunction with the others), the data analyst (together with the data expert) must give a physical meaning to all axes, producing a discourse that explains their role in terms of similarity (and/or opposition) between properties and/or individuals (columns and/or rows of the input matrix). Indeed, the whole process of interpretation relies on the meaning of the axes, not on the proximity (or separation) between projections onto the planes (above all, if such projections are in the vicinity of the graph's origin). The fact that a certain projection of a row i lies close to a projection of a column j does not imply that i is associated with j. No direct row-column distance interpretation is allowed, due to the scaling procedure underlying the method of projection of both individuals and properties onto the same plane. Hence, the joint interpretation of row and column points must be performed with respect to the principal axes of the map.

Therefore, the understanding of such axes is the first task to be carried out when interpreting the graphical outputs provided by the CA algorithm (bottom of Fig. A-9). To accomplish this task, it is necessary to choose which properties and/or individuals are associated with each axis. This requirement entails the choice of a threshold in a certain measure of association, denoted Absolute Contribution of individual i to Axis α, and given by [15]:

$$Ca_{i\alpha} = \frac{f_i\, \psi_{i\alpha}^2}{\lambda_\alpha}$$

where $\psi_{i\alpha}$ is the projection of individual i onto Axis α, and $\lambda_\alpha$ is the eigenvalue assigned to Axis α.

[15] As noted before, individuals and properties are interchangeable in the CA algorithm (hence, the same formula applies to properties, substituting i by j).

This measure of association expresses (in %, once multiplied by 100) the fraction of the total inertia of Axis α that is conveyed by individual i. Should the individuals be randomly dispersed around the axis, the Absolute Contribution assigned to each of them would be 100/n. Then, a natural criterion to spot which individuals can be used to interpret an axis is to impose a threshold on the Absolute Contribution of the set of n individuals, retaining only the subset in which each individual contributes to the axis in a ratio bigger than 100/n. The retained individuals are those whose distances to the axis, when combined with their masses, are the smallest, which entails that they generally project far from the graph's origin.

Once the individuals (and/or properties) exceeding the above defined threshold (and/or 100/p) have been selected, their projections onto the graph, which generally lie on the outer parts of the map, close to its edges or periphery, allow the axis to be interpreted in terms of vicinity/separation between individuals/properties [16]. The above described procedure is repeated for all pre-selected axes [17], until a coherent interpretation of the relevant graphs is reached for the entire set of individuals/properties, as illustrated in Fig. A-10.

The left part of Fig. A-10 shows some usual configurations obtained when a cloud of points is projected onto the plane crossing Axis 1 and Axis 2 provided by the CA algorithm. The right part of Fig. A-10 represents the input matrix, after being re-arranged in ascending order according to the projections of its rows onto Axis 1 (the first row of the re-arranged matrix exhibits the minimum projection onto Axis 1).

[16] At this stage, it must be stressed once more that CA deals with profiles, which means that one does not interpret the raw frequencies that are given in the input table, but rather their values relative to the sum of the respective row or column. In effect, comparing individuals or properties always means comparing their profiles.
[17] It is not unusual that the number of pre-selected axes lies above (or below) the minimum number of axes required for interpretation, in which case the interpretation requirement prevails. It may also happen that a certain plane, although important as far as the eigenvalues of its axes are concerned, does not add any relevant improvement to the interpretation process. In this case, such a plane should be disregarded in favor of a less important one, which could be more informative for interpretation purposes. An axis should not be ignored merely because it has a relatively small eigenvalue (often such an axis helps to make a strong point about the data). As a general rule based on our experience, it is very exceptional that CA-based case studies need more than 3 or 4 axes to reach a coherent interpretation of large data tables.
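The 100/n criterion can be sketched as follows. The masses and Axis 1 coordinates below are invented, the function name `absolute_contributions` is an assumption, and the eigenvalue is taken as the mass-weighted sum of squared projections, which is how the importance of an axis is described earlier in the text.

```python
def absolute_contributions(masses, coords, eigenvalue):
    """Absolute contributions (in %) of each point to one axis."""
    return [100.0 * f_i * c * c / eigenvalue for f_i, c in zip(masses, coords)]


# Hypothetical masses and projections of n = 4 individuals onto one axis.
masses = [0.1, 0.4, 0.3, 0.2]
coords = [0.9, -0.1, 0.05, -0.325]

# Assumption: the eigenvalue equals the mass-weighted sum of squared projections.
eigenvalue = sum(f_i * c * c for f_i, c in zip(masses, coords))

contributions = absolute_contributions(masses, coords, eigenvalue)
threshold = 100.0 / len(masses)

# Individuals retained for interpreting the axis: those above 100 / n.
retained = [i for i, ca in enumerate(contributions) if ca > threshold]

print([round(ca, 2) for ca in contributions], threshold, retained)
```

By construction the contributions add up to 100, so the threshold 100/n is simply the share each individual would receive if all of them contributed equally.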

19 Fig. A-10 Typical conigurations o the projections onto plan 1,2 o a cloud submitted to CA, and corresponding input matrices re-arranged according to the projections o their rows onto Axis 1 Assuming that all elements projected onto the plans given in Fig. A-10 are relevant or the Axes 1 and/or 2 (in the previously described sense that the given threshold or their Absolute Contributions is exceeded), Fig. A-10 shows the equivalence between the graphical and the tabular orm o the input matrices (which is obviously an important ingredient to improve the interpretation endeavor, since results can be matched with original data). Analyzing case (1) o Fig. A-10, it is apparent that Axis 1 separates clearly two areas, both in graphical and tabular terms. Those are denoted A and B in the projections plot, and the interpretation o Axis 1 is made on the grounds o its ability to break away the two groups o elements A and B, each one o which exhibits a strong inner homogeneity (elements belonging to group A or B are similar, inside each group). For this case, Axis 2 has no relevance, or interpretation purposes (this argument should be checked by inspection o Absolute Contributions to Axis 2). Regarding the tabular orm represented in the right part o Fig. A-10, the act is that the input matrix ater being re- 19

20 arranged according to its projections onto Axis 1 shows two diagonal blocks containing the elements displayed in the graph as clusters A and B. These blocks exhibit values o the absolute requencies (or a matrix standing or a two-way contingency table) that are the highest, whereas the non-diagonal blocks contain the smallest values (denoted 0 in Fig. A-10). Regarding case (2) o Fig. A-10, three groups o individuals and/or properties emerge. Axis 1 separates projections A rom C, and Axis 2 dierentiates (A+C) rom B, assuming that all elements projected in the graph meet the condition that their Absolute Contribution exceeds 100 n and/or 100 p or Axis 1 and 2. When the input matrix is rearranged in the same way as in case (1), three diagonal blocks are obtained (ollowing the sequence driven by Axis 1). These blocks are bordered by quasi-null elements, as depicted in the right part o Fig. A-10 (2). The case denoted (3) in Fig. A-10 is very common in CA outputs, specially or ordinal variables (i.e., qualitative variables whose modalities are sequenced). Such a pattern similar to a parabolic crescent is known as the Guttman eect, and requires only one coordinate to identiy the sequence o individuals along Axis 1. The data matrix, ater being sorted according to the previously described procedure, exhibits a diagonal structure, where central elements present much higher values than non-diagonal ones. The structures recognized in Fig. A-10, being a sound basis or the descriptive interpretation o data without calling or any statistical hypothesis, are in addition the grounds where a preliminary modeling design can be preormed, providing some explicative power to CA methodology per se. 
In fact, groups of individuals obtained in cases (1) and (2) may be the basis for establishing an empirical typology, which is in general more fruitful than those produced by most clustering algorithms, since the groups produced by CA are explained by the properties that are connected with the Axes responsible for the formation of groups (APPENDIX B). Moreover, in case (3), scaling indices holding a certain significance in terms of properties may be produced, sorting quantitatively the sequence of individuals by means of a single meaningful real number (which is the Axis 1 coordinate), related to the modalities of variables that drive the spread along Axis 1 (APPENDIX B).
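The re-arrangement of the input matrix described for Fig. A-10 can be sketched in a few lines. This is only an illustrative sketch: the function name `reorder_by_axis1`, the 4 x 4 table and the Axis 1 coordinates are all hypothetical (they are not taken from Fig. A-10), and the coordinates are assumed to come from a CA run performed elsewhere.

```python
def reorder_by_axis1(matrix, row_coords, col_coords):
    """Sort rows and columns of `matrix` by increasing Axis 1 coordinate."""
    row_order = sorted(range(len(row_coords)), key=lambda i: row_coords[i])
    col_order = sorted(range(len(col_coords)), key=lambda j: col_coords[j])
    sorted_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
    return sorted_matrix, row_order, col_order


# Hypothetical contingency table and hypothetical Axis 1 coordinates
table = [
    [9, 0, 8, 0],
    [0, 7, 0, 9],
    [8, 0, 9, 1],
    [0, 8, 0, 7],
]
row_axis1 = [-1.0, 0.9, -0.8, 1.1]
col_axis1 = [-0.9, 1.0, -1.1, 0.8]

sorted_table, row_order, col_order = reorder_by_axis1(table, row_axis1, col_axis1)
for row in sorted_table:
    print(row)  # the two diagonal blocks of case (1) become visible
```

With these made-up numbers the sorted table shows two diagonal blocks of high values bordered by quasi-null entries, which is the tabular counterpart of two clusters separated by Axis 1.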

In most cases, the interpretation process does not take the holistic flavor that is patent in Fig. A-10, due to the nature of the data (which do not allow a global reading of the graphical outputs based only on Axis 1). Generally, several axes (but no more than three or four) are needed, and the interpretation is made on the grounds of such axes, related through the application of the threshold criterion to the individuals and/or properties with which they are associated. Also, in addition to the context provided by the data expert, the format of the input matrix must be taken into account in the interpretation process. In fact, even though complete disjunctive and Burt matrices give rise to a similar pattern when their graphical outputs are compared, the eigenvalues of their axes differ. Moreover, the total number of axes provided by CA depends on the data matrix format: for a contingency table, it amounts to p-1 if p<n (or to n-1 if n<p); for a juxtaposition of q contingency tables (in particular, for a complete disjunctive matrix), it amounts to p-q (assuming n>p); for a Burt matrix, it amounts to p-q (where p is the dimension of the square matrix containing q blocks). Furthermore, in contrast to other classical factorial methods like Principal Components Analysis, it should be emphasized that CA captures non-linear relationships. This foremost feature of CA recommends applying such a methodology even in some cases holding only purely quantitative variables, after splitting them into classes (APPENDIX C, which describes a case study where circular quantitative attributes occur).

SUPPLEMENTARY PROJECTIONS

In some instances, the input matrix to be submitted to the CA algorithm is heterogeneous, i.e., blocks of a different kind may be acknowledged in the data table, both for individuals and variables. In this case, it may be fruitful for the sake of interpretation to split the matrix into a series of homogeneous blocks, whose relationships to one another are to be found out.
To this end, CA offers a specific procedure, denoted Supplementary Projection, which permits interpreting, in graphical form, how those blocks relate. The Supplementary Projection procedure consists of selecting a block of homogeneous individuals and/or properties, designated Principal or Active, and applying the CA algorithm using only this block to produce the axes, which are interpreted as described above in terms of the rows and columns of the principal block. After that, individuals and/or properties belonging to the other blocks (denoted "illustrative") are projected as supplementary elements onto the axes derived from the active block, according to the following rationale.

Given the profile $f^{+}_{ij}/f^{+}_{i}$ of a supplementary individual, its projection onto the axes provided by the eigenvalue decomposition of the inertia matrix corresponding to the principal block is written as:

$$f^{\prime +}_{i\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_{j=1}^{p} \frac{f^{+}_{ij}}{f^{+}_{i}} \, f'_{j\alpha}$$

where $f'_{j\alpha}$ are the projections of the principal matrix columns onto the axes $\alpha$ of the same matrix (the eigenvalues of which are denoted $\lambda_\alpha$). Obviously, the same applies mutatis mutandis to the projection of supplementary properties, as follows:

$$f^{\prime +}_{j\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_{i=1}^{n} \frac{f^{+}_{ij}}{f^{+}_{j}} \, f'_{i\alpha}$$

By means of the above formulae for supplementary projections, all blocks of the initial matrix that were not considered as Principal are related to the axes provided by the latter.

Now, the problem arises of how to judge the strength of the relationship between a given supplementary element and the axes produced by the CA algorithm when applied to the principal matrix. A natural way of achieving this goal is to measure the angle between the location of the supplementary element in the original space and all axes derived from the principal matrix. This measure is denoted Relative Contribution, given by:

$$C^{r}_{\alpha}(i) = \frac{(f'_{i\alpha})^{2}}{\rho^{2}} = \cos^{2}\beta$$

where $\rho^{2} = \sum_{\alpha} (f'_{i\alpha})^{2}$ is the squared distance from the supplementary individual i to the centroid of the cloud representing the principal matrix, and $\beta$ is the angle between the vector representing i and axis $\alpha$. The corresponding formula for the Relative Contribution of an axis to a supplementary variable j is obviously

$$C^{r}_{\alpha}(j) = \frac{(f'_{j\alpha})^{2}}{\rho^{2}}, \quad \text{where } \rho^{2} = \sum_{\alpha} (f'_{j\alpha})^{2}$$

At this phase, given a set of previously interpreted axes derived from the principal matrix, it is required to identify the axis to which a certain supplementary element relates the most, in order to put such an element in correspondence with the individuals and/or properties that are responsible for the emergence of that axis. The axis being sought is obviously the one which contributes the most to the element under scrutiny (should the element lie on the axis, a Relative Contribution of 1 is obtained). In addition to allowing the selection of the axis with which a certain element is associated, the value of the Relative Contribution in the interval [0,1] also accounts for the strength of such an association, playing a role analogous to that of the correlation coefficient in classical regression. The greater the Relative Contribution of an axis to an individual (or property), the closer that element is to the axis (in particular, if the Relative Contribution of an axis to a certain element is zero, this indicates that such an element is orthogonal to that axis).

The Supplementary Projection procedure, although not an exclusive feature of the CA algorithm, is undoubtedly its most powerful modeling tool, allowing one to outline strategies for coping with problems of enhanced questionnaire handling, diachronic studies, spatial comparisons, and other issues involving relationships between dissimilar blocks of the input matrix.
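As a rough numerical illustration of the supplementary projection of an individual and of its Relative Contributions, the following sketch may help. The function names, the counts, the column coordinates and the eigenvalues are hypothetical; the column coordinates and eigenvalues are assumed to come from a CA of the principal block computed elsewhere, and the list of axes passed in is assumed to be the complete set of axes, so that the sum of squared coordinates is the squared distance to the centroid.

```python
import math


def supplementary_row_coords(counts, col_coords, eigenvalues):
    """Project a supplementary row onto the axes of the principal block.

    counts      : raw values of the supplementary row (length p)
    col_coords  : col_coords[j][a] = projection of principal column j on axis a
    eigenvalues : eigenvalues[a] = eigenvalue of axis a (principal block)
    """
    total = sum(counts)
    profile = [value / total for value in counts]
    return [
        sum(profile[j] * col_coords[j][a] for j in range(len(counts))) / math.sqrt(lam)
        for a, lam in enumerate(eigenvalues)
    ]


def relative_contributions(coords):
    """Squared cosines: squared coordinate over squared distance to the centroid."""
    rho2 = sum(f * f for f in coords)
    return [f * f / rho2 for f in coords]


# Hypothetical principal-block results (3 columns, 2 axes)
col_coords = [[0.5, 0.2], [-0.5, 0.4], [0.0, -0.6]]
eigenvalues = [0.25, 0.16]

coords = supplementary_row_coords([2, 1, 1], col_coords, eigenvalues)
contribs = relative_contributions(coords)
print(coords)    # approximately [0.25, 0.125]
print(contribs)  # approximately [0.8, 0.2]
```

In this made-up case the supplementary individual is mostly associated with the first axis (Relative Contribution of about 0.8), so it would be interpreted together with the elements that explain that axis.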

Moreover, new developments on CA applications were put forward by our applied research, on the grounds of supplementary projections. These new developments (qualitative regression and archetypal discrimination) are addressed in the subsequent sections.

QUALITATIVE REGRESSION

When searching for a relationship between two sets of qualitative variables observed in an array of individuals, no classical regression may be applied, since the values representing the individuals' attributes are not real numbers, but codes indicating the modalities of the qualitative variables included in the available empirical data. CA is a valuable tool to address this modeling problem, provided that advantage is taken of the supplementary projection of one set of variables onto the other. Given an array of empirical cases contained in a database where the two sets of variables are known for the same individuals, the problem to be approached by the proposed methodology can be summarized in the following steps.

The first step aims to extract from the database a set of q variables (denoted "predictors") that are observable in a new case where the other set of variables (denoted "dependent") is to be predicted. The predictors are then arranged in a complete disjunctive matrix A, containing n rows (the individuals) by p columns (the total number of modalities for the q predictors observed in the n individuals).

The second step consists of selecting, from the database, a new set of qualitative variables to be predicted on the grounds of the first set. These dependent variables are arranged under the same format as A, giving rise to a matrix B (n × p') that contains, for the same set of n individuals, the p' modalities of the relevant attributes to be predicted in new cases, where only predictors are recorded.

The third step seeks to establish some sort of relationship between B and A.
Obviously, this cannot be achieved by means of an equation of the type Y = f(x_1, ..., x_q) (as is usual in ordinary regression), since all variables are qualitative. But a specific kind of graphical relationship between B- and A-type variables can be obtained if B is projected onto the factorial axes resulting from the eigenvalue decomposition of A. This relationship is mediated by the factorial axes, which play the role of a transfer function between B and A (summarizing A-type variables in quantitative coordinates, which are linked, through the same metric, with the corresponding B-type variables' coordinates).

The fourth step consists of using the relationship given by the previous procedure to forecast the modalities where B-type attributes fall, for a new matrix C (n × p) containing only the predictors codified under the same format as A and B, and referring to the selected n individuals where the relevant attributes are to be predicted.

It must be stressed that, as far as prediction is concerned, the third step of the above described methodology is the crucial point to be dealt with. Such a point, which permits obtaining a satisfactory relationship between matrices B and A, is addressed by using CA as a qualitative regression tool. This calls for the concepts of "supplementary projection" and "relative contribution": the former places B-type variables onto the factorial axes provided by matrix A, and the latter measures the quality of the relationship (the analogue of the correlation coefficient in ordinary regression).

Adjusting the transition formulae to the case of complete disjunctive matrices, the supplementary projection of modality j' of matrix B onto the axis α provided by CA of matrix A is given by:

$$f^{+}_{j'\alpha} = \frac{1}{n_{j'}\sqrt{\lambda_\alpha}} \sum_{i=1}^{n} \delta_{ij'} \, f_{i\alpha}$$

where,
$n_{j'}$ is the sum of column j' in matrix B, representing the total number of occurrences of that modality of the supplementary variables
$\lambda_\alpha$ is the α-th eigenvalue provided by CA of matrix A
$\delta_{ij'}$ equals 1 if modality j' occurs in row i, and 0 otherwise
$f_{i\alpha}$ is the projection of row i onto the axis associated with the α-th eigenvalue provided by CA of matrix A

It is worth noting that, as expected, the eigenvalues and row projections entering the above equation depend only on the eigenvalue decomposition of matrix A, so that all modalities of the variables to be predicted are given as a function of the predictors' modalities (summarized in their projections onto the axes α emerging from matrix A).

Now, to choose which axes are relevant to the relationship between B and A, the relative contributions of all axes to the dependent variables are scrutinized. The closer to 1 the relative contribution of the axis α to a given modality is, the more that modality is associated with axis α, which in turn relates to a subset of predictors, interpreted in terms of the CA algorithm. This interpretation is performed on the grounds of the CA algorithm referring only to matrix A, by applying the inertia criterion: a given axis is explained by the combination of predictors that exceeds the proportion of the total inertia that would be assigned to these predictors for a hypothetical uniform distribution. By applying a maximization criterion to the relative contributions, the axes that best explain the link between predictors and dependent variables are selected, associating one set of variables with the other. Also, the rows of matrix A representing the individuals in the empirical database can be projected onto the same axes, as usual in CA.

As expected from the specific nature of the problem, no equation relating B- to A-type variables is obtained. However, the projections of each modality of B-type variables onto the relevant axes are given by the above supplementary projection expression, which provides the values of $f^{+}_{j'\alpha}$. Therefore, since the same axes are related to A-type variables through their coordinates, the qualitative regression is performed in graphical terms, mediated by the axes.
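The projection of a B-type modality and the maximization criterion on relative contributions might be sketched as follows. The function names, the row coordinates, the eigenvalues and the indicator column are hypothetical, and the row coordinates and eigenvalues are assumed to have been obtained from a CA of matrix A computed elsewhere. Axis indices in the code start at 0, so index 0 stands for Axis 1.

```python
import math


def project_modality(indicator, row_coords, eigenvalues):
    """Supplementary projection of one modality of B.

    indicator[i]     : 1 if the modality occurs in row i, else 0
    row_coords[i][a] : projection of row i of A on axis a
    """
    n_j = sum(indicator)  # number of occurrences of the modality
    return [
        sum(d * row_coords[i][a] for i, d in enumerate(indicator))
        / (n_j * math.sqrt(lam))
        for a, lam in enumerate(eigenvalues)
    ]


def best_axis(coords):
    """Axis with the largest relative contribution, and that contribution."""
    rho2 = sum(f * f for f in coords)
    contributions = [f * f / rho2 for f in coords]
    axis = max(range(len(coords)), key=lambda a: contributions[a])
    return axis, contributions[axis]


# Hypothetical CA results for matrix A (4 individuals, 2 axes)
row_coords = [[0.6, 0.1], [0.4, -0.1], [-0.5, 0.3], [-0.5, -0.3]]
eigenvalues = [0.25, 0.04]

modality_coords = project_modality([1, 1, 0, 0], row_coords, eigenvalues)
axis, contribution = best_axis(modality_coords)
print(modality_coords, axis, contribution)
```

Here the hypothetical modality occurs only in the first two individuals, whose Axis 2 coordinates cancel out, so the modality lies on Axis 1 and its relative contribution to that axis is 1.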
Now, the n new cases containing only A-type variables, arranged under the complete disjunctive format in the matrix C (n × p), are projected as supplementary individuals onto the previously obtained axes, according to the following equation:

$$f^{+}_{i'\alpha} = \frac{1}{q\sqrt{\lambda_\alpha}} \sum_{j=1}^{p} \delta_{i'j} \, f_{j\alpha}$$

where,
q is the number of A-type variables
$\lambda_\alpha$ is the α-th eigenvalue provided by matrix A
$\delta_{i'j}$ equals 1 if modality j occurs in row i', and 0 otherwise
$f_{j\alpha}$ is the projection of column j onto the axis associated with the α-th eigenvalue provided by matrix A

Fig. A-11 summarizes the entire procedure, emphasizing how supplementary projection is the key ingredient to achieve the qualitative regression, both in unveiling the relationship between the two sets of variables and in forecasting dependent variables for new cases. An example of application of this procedure to the assessment of the risk of mine tailings dam breakage is provided in APPENDIX D, illustrating the proposed modeling methodology.

Fig. A-11 Outline of the procedure to use CA as a qualitative regression tool
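For a new case coded in complete disjunctive form, the projection formula reduces to averaging the coordinates of the modalities that occur in the case and rescaling by the square root of the eigenvalue. A minimal sketch, in which the function name, the column coordinates and the eigenvalues are hypothetical and assumed to come from a CA of matrix A computed elsewhere:

```python
import math


def project_new_case(active_columns, col_coords, eigenvalues, q):
    """Supplementary projection of a new individual from matrix C.

    active_columns   : indices j of the modalities present in the new case
                       (one per predictor, so len(active_columns) == q)
    col_coords[j][a] : projection of column j of A on axis a
    """
    return [
        sum(col_coords[j][a] for j in active_columns) / (q * math.sqrt(lam))
        for a, lam in enumerate(eigenvalues)
    ]


# Hypothetical CA results for A: q = 2 predictors with 2 modalities each
col_coords = [[0.5, 0.2], [-0.5, -0.2], [0.3, -0.4], [-0.3, 0.4]]
eigenvalues = [0.25, 0.16]

# New case showing modality 0 of the first predictor and modality 2 of the second
new_case = project_new_case([0, 2], col_coords, eigenvalues, q=2)
print(new_case)  # approximately [0.8, -0.25]
```

The resulting coordinates place the new case in the same plot as the B-type modalities, which is where the forecast is read.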

ARCHETYPAL DISCRIMINATION

CA can be used as a modeling technique to classify a set of individuals sharing the same qualitative attributes 18 in reference to a scale defined by two extreme poles or archetypes. The procedure to achieve such a goal is outlined below.

Given an empirical data set of n individuals where q attributes were observed, these are arranged under the form of a complete disjunctive matrix of p columns (denoted R) containing the relevant modalities of each variable, whose total number is p. Scrutinizing these modalities, two abstract vectors are constructed by the data expert: the first corresponds to the GOOD pole (archetype 1) and is obtained by selecting, for each variable of the real data set arranged as R, the most favorable modality with respect to a certain criterion; in contrast, the second, denoted the BAD pole (archetype 0), is obtained by selecting the most unfavorable modalities with respect to the same criterion. These two vectors are put under the form of a 2 x p complete disjunctive matrix A, containing the same modalities as the real data set.

When matrix A is submitted to the CA algorithm, a single Axis is obtained, inasmuch as the input matrix contains only two rows. This Axis, where the modalities' projections are also displayed, can be viewed as a scale whose extremes are the GOOD and BAD poles (archetypes 1 and 0). Then, when the empirical data set R (n x p) is projected in supplementary terms onto the single Axis provided by CA of the abstract archetype matrix A, the real individuals are characterized by a quantitative variable which is their coordinate on the Axis. This coordinate measures the extent to which each individual resembles the predefined extremes, and can therefore be used as its degree of "goodness".
Consequently, the set of all individuals' projections can be sorted accordingly, providing in general two distributions represented by histograms (one corresponding to individuals more similar to archetype 1, and the other to individuals more similar to archetype 0 19 ).

18 As usual, those attributes may contain some quantitative variables that were previously split into classes.
19 In most cases, the matrix R is divided by a priori knowledge into two different blocks, each of which corresponds to real individuals assigned beforehand to a given archetype. In this instance, it is known a priori that the individuals belong in fact to two different groups, and their projections are contained in distinct histograms.

The proposed methodology is illustrated for a generic case in the diagram shown in Fig. A-12.

Fig. A-12 Archetypal discrimination symbolic description
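Because the archetype matrix has only two rows of equal mass, its single axis can be worked out in closed form, which gives a compact way to sketch the "goodness" score. The derivation used below (column coordinate equal to (good - bad)/(good + bad), eigenvalue equal to the squared chi-square distance of a pole to the centroid) follows from the standard CA transition formulae. The function names and the example modalities are hypothetical, and treating a modality that appears in neither archetype as contributing 0 is an assumption of this sketch, not something stated in the text.

```python
import math


def archetype_axis(good, bad):
    """Closed-form CA of the 2 x p archetype matrix.

    good, bad : 0/1 lists of length p (one 1 per variable in each list).
    Returns (eigenvalue, column coordinates); GOOD is the positive pole.
    """
    q = sum(good)
    if sum(bad) != q:
        raise ValueError("both archetypes must select one modality per variable")
    eigenvalue = 0.0
    col_coords = []
    for g, b in zip(good, bad):
        if g + b == 0:
            col_coords.append(0.0)  # ASSUMPTION: unused modality counts as neutral
            continue
        column_mass = (g + b) / (2 * q)
        eigenvalue += ((g - b) / q) ** 2 / (4 * column_mass)
        col_coords.append((g - b) / (g + b))
    return eigenvalue, col_coords


def goodness(row, col_coords, eigenvalue):
    """Supplementary projection of one individual of R onto the single axis."""
    q = sum(row)
    return sum(d * c for d, c in zip(row, col_coords)) / (q * math.sqrt(eigenvalue))


# Hypothetical example: 3 variables with 3, 2 and 2 modalities (p = 7)
good = [1, 0, 0, 1, 0, 1, 0]
bad = [0, 0, 1, 0, 1, 0, 1]
eigenvalue, col_coords = archetype_axis(good, bad)

individual = [0, 1, 0, 1, 0, 1, 0]  # intermediate on variable 1, good on the others
score = goodness(individual, col_coords, eigenvalue)
print(round(score, 3))
```

Under these assumptions the score ranges from -1 (the BAD pole itself) to +1 (the GOOD pole itself), and the hypothetical individual above scores about 0.667.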

In order to accomplish all the objectives of a comprehensive Discrimination Analysis, it is required to allocate an "anonymous" individual 20 to one of the groups related to each archetype. Consequently, the problem of the overlapping zone represented at the bottom of Fig. A-12 must be addressed. To this end, a boundary dividing the Axis provided by CA into two zones in a clear-cut way must be found. The procedure to search for such a boundary by an optimal method consists of simulating different positions for the boundary in the overlapping zone, until the fraction of misclassified cases reaches its minimum. When this optimal position is established, any unknown case ("anonymous" individual) can be allocated to Group I (individuals similar to archetype 1) or to Group II (individuals similar to archetype 0), depending on its supplementary projection onto Axis 1 in relation to the optimal boundary location, as illustrated in Fig. A-13.

Fig. A-13 Allocation of unknown cases by establishing an optimal boundary

APPENDIX E outlines a case study aiming at establishing an index of quality in natural stones, based on archetypal discrimination.

20 This is a case where the a priori belonging is unknown.
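The boundary search can be sketched as a simple scan over candidate positions. This sketch makes two assumptions that are not in the text: candidate boundaries are taken at the midpoints between consecutive projections (over the whole axis, which includes the overlapping zone), and Group I is assumed to lie on the side of larger coordinates. The function name and the scores are hypothetical.

```python
def optimal_boundary(group1_scores, group2_scores):
    """Scan candidate cuts and keep the one with the fewest misclassified cases.

    group1_scores : Axis 1 projections of individuals known to belong to Group I
    group2_scores : Axis 1 projections of individuals known to belong to Group II
    Returns (boundary, fraction of misclassified cases).
    """
    points = sorted(set(group1_scores) | set(group2_scores))
    cuts = [(a + b) / 2 for a, b in zip(points, points[1:])]
    total = len(group1_scores) + len(group2_scores)
    best_cut, best_errors = None, None
    for cut in cuts:
        errors = sum(s < cut for s in group1_scores)    # Group I falling below
        errors += sum(s >= cut for s in group2_scores)  # Group II falling above
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors / total


def allocate(score, boundary):
    """Allocate an anonymous individual from its supplementary projection."""
    return "Group I" if score >= boundary else "Group II"


# Hypothetical projections with an overlapping zone between 0.1 and 0.2
group1 = [0.9, 0.7, 0.4, 0.1]
group2 = [-0.8, -0.5, 0.2, -0.1]
boundary, error_rate = optimal_boundary(group1, group2)
print(boundary, error_rate, allocate(0.35, boundary))
```

When several cuts give the same minimum, this sketch keeps the first one encountered; another tie-breaking rule could be equally reasonable.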

APPENDIX A
AIR QUALITY IN GDANSK

The air quality in the city of Gdansk is monitored by measuring the concentration of a set of five pollutants: Nitrogen Dioxide (labeled as NO2), Sulfur Dioxide (labeled as SO2), Particulate Matter (labeled as PM), Carbon Monoxide (labeled as CO), and Ozone (labeled as Oz). For the year 2010, the average monthly concentrations (expressed in micrograms per cubic meter) are given in Table I.

Table I

The aim of the study is to find the pattern of association between extreme values of the pollutants' concentration (those that exceed the allowed limits provided below in Table II) and the months of the year.

Table II

In order to apply CA to this case study, Table I was put under a complete disjunctive format by handling the information contained in Table II according to the following procedure: if the value of concentration for a given pollutant is lower than the prescribed limit given in Table II, code 1 is assigned to the column labeled -, and code 0 is assigned to the column labeled +. In the cases for which the observed concentration exceeds the limit, code 1 is assigned to the column labeled +, and code 0 is assigned to the column labeled -. As a result of this procedure, the raw data are transformed into the complete disjunctive matrix given in Table III.

Table III
Columns: NO2- NO2+ SO2- SO2+ PM- PM+ CO- CO+ Oz- Oz+
Rows: Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec
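The coding rule that turns a row of Table I into a row of Table III can be sketched as follows. The concentrations and limits used here are hypothetical placeholders (the values of Tables I and II are not reproduced), the function name is illustrative, and assigning a concentration exactly equal to the limit to the "-" column is an assumption of this sketch.

```python
POLLUTANTS = ["NO2", "SO2", "PM", "CO", "Oz"]


def disjunctive_row(concentrations, limits):
    """Code one month as a complete disjunctive row (two columns per pollutant)."""
    row = {}
    for name in POLLUTANTS:
        exceeded = concentrations[name] > limits[name]
        row[name + "-"] = 0 if exceeded else 1
        row[name + "+"] = 1 if exceeded else 0
    return row


# Hypothetical limits and one hypothetical month (not the values of Tables I and II)
limits = {"NO2": 40, "SO2": 20, "PM": 40, "CO": 1000, "Oz": 60}
month = {"NO2": 25, "SO2": 31, "PM": 55, "CO": 1200, "Oz": 30}

coded = disjunctive_row(month, limits)
print(coded)
```

Each pollutant contributes exactly one 1 to the row, so every row of the coded matrix sums to the number of pollutants (5), as required by the complete disjunctive format.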

When the CA algorithm is applied to Table III, a set of 5 axes is obtained. Table IV gives the eigenvalues of such axes, as well as the % of inertia conveyed by each of them and the accumulated inertia.

Table IV
Columns: EIGENVALUE, % INERTIA, % ACCUM
Rows: AXIS 1 to AXIS 5

By inspection of Table IV, a preliminary conclusion may be offered: it seems that the problem can be approached using only axes 1 and 2, which account for 75% of the total inertia of the cloud. In fact, if these two axes explain all relevant properties (which are the cases where the allowed limits of pollutants are exceeded), it is not necessary to scrutinize the remaining axes. In order to ensure that the plane 1,2 is sufficient to display the relationships between the relevant variables, it is required to examine Table V, where the absolute contributions of the set of properties to axes 1 and 2 are given (the values that surpass the practical threshold of 100/p=10 are printed in bold for axes 1 and 2, indicating that a significant connection may be established between a given modality and one of these axes).

Table V
ABSOLUTE CONTRIBUTIONS
Columns: AXIS 1 AXIS 2 AXIS 3 AXIS 4 AXIS 5
Rows: NO2- NO2+ SO2- SO2+ PM- PM+ CO- CO+ Oz- Oz+
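The bookkeeping behind Tables IV and V (share of inertia per axis, accumulated inertia, and the practical threshold for absolute contributions) is easy to sketch. The eigenvalues below are hypothetical: they were merely chosen so that the first two axes carry 55% and 20% of the inertia, the shares quoted in the text; the actual values of Table IV are not reproduced. The function names are illustrative.

```python
def inertia_table(eigenvalues):
    """Return (axis, eigenvalue, % inertia, % accumulated) for each axis."""
    total = sum(eigenvalues)
    rows, accumulated = [], 0.0
    for axis, lam in enumerate(eigenvalues, start=1):
        share = 100 * lam / total
        accumulated += share
        rows.append((axis, lam, share, accumulated))
    return rows


def contribution_threshold(n_elements):
    """Practical threshold: contribution under a uniform distribution."""
    return 100 / n_elements


def number_of_axes_disjunctive(p, q):
    """Number of axes for a complete disjunctive matrix (p modalities, q variables)."""
    return p - q


eigenvalues = [0.55, 0.20, 0.12, 0.08, 0.05]  # hypothetical values
for axis, lam, share, accumulated in inertia_table(eigenvalues):
    print(f"AXIS {axis}: {lam:.2f}  {share:5.1f}%  {accumulated:5.1f}%")

print(number_of_axes_disjunctive(p=10, q=5))    # 5 axes, as for Table III
print(contribution_threshold(10))               # threshold for the 10 modalities
print(round(contribution_threshold(12), 2))     # threshold for the 12 months
```

The same threshold function covers both the modalities (100/10) and the individuals (100/12) examined in this appendix.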

Since all modalities for which the allowed limit in the pollutant concentration is exceeded can be connected to axis 1 (SO2+, PM+, CO+, Oz+) or axis 2 (NO2+), the other axes can be discarded, and the interpretation of all relevant variables can be performed in the plane 1,2, depicted in Fig. 1.

Fig. 1 Projection of variables' modalities onto plane 1,2

The interpretation of Fig. 1 is to be done exclusively on the grounds of the modalities linked to axes 1 and 2 (this linkage is symbolized by arrows pointing to one of the axes). Regarding axis 1, a group of 3 points projects onto its left part (SO2+, PM+, CO+), as opposed to Oz+, which projects onto its right part. This means that there is a similarity between the profiles of modalities SO2+, PM+ and CO+ along time, and that this cluster is opposed to modality Oz+. As far as axis 2 is concerned, it separates the two modalities of NO2. Moreover, since the two axes are orthogonal, the pattern disclosed by axis 1 is unrelated to axis 2.

Hence, the conclusion can be drawn that the first two axes are sufficient to reveal the association/opposition pattern of all relevant modalities (those that indicate an excess of pollutant concentration over the allowed limit). Given that the fraction of inertia conveyed by such axes is significant (75 %), it was decided to disregard the remaining axes. It is now required to select, from the set of individuals (months), those whose contribution to axes 1 and 2 exceeds the practical threshold of 100/12=8.33. This is performed by examining Table VI, where the link between a given month and one of the axes 1 and 2 is symbolized by the respective absolute contribution printed in bold.

Table VI
ABSOLUTE CONTRIBUTIONS
Rows: Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec

The inspection of Table VI indicates that the individuals linked to axis 1 are JAN, FEB, JUN and DEC, and those linked to axis 2 are MAR, JUN and SEPT. These months, retained for interpretation in conjunction with the conclusions drawn from Fig. 1, are highlighted in Fig. 2 by the arrows pointing their projections to the axis they are linked to.

It is worth noting that months APR, AUG and JUL project onto the same point, and hence their absolute contributions to Axis 1 are summed up, amounting to 3 x (which exceeds the practical threshold of 100/12=8.33; these months are therefore retained for interpretation). As far as JUNE is concerned, even though its absolute contribution to Axis 2 is greater than to Axis 1, it was decided to assign this individual to the latter axis (Axis 1), since its explanatory power is greater than that of the former (55% vs. 20 %). All the other months do not intervene in the pattern revealed by Fig. 1, and are disregarded in this analysis.

Fig. 2 Projection of individuals onto plane 1,2

The joint interpretation of Fig. 1 and Fig. 2 leads to the pattern of associations/oppositions represented in the symbolic diagram of Fig. 3.

Fig. 3 Diagram showing associations/oppositions between pollutants and months

Fig. 3 indicates that concentrations of pollutants CO, SO2 and PM that exceed the allowed limits occur mainly in the months of Jan, Feb and Dec. This association correlates negatively with the months of April, June, July and August, linked to extreme values of Ozone. A weaker (20 vs. 50%) opposition is disclosed by Axis 2, detaching concentrations of NO2 above and below the allowed limit (the former being linked to Sept and the latter to March).

SOURCE: Project for the course Natural Resources Management and Planning ( )

APPENDIX B
RESERVOIR QUALITY ZONES IN AN OIL FIELD

In this example CA is applied to properties captured in a Middle East oil field through a set of exploration wells. The aim of the study is to model the reservoir's internal architecture in terms of homogeneous zones, as far as oil quality is concerned. Based on such Reservoir Quality Zones, production planning can be achieved by maximizing recovery.

The available data, consisting of quantitative and qualitative variables that characterize the reservoir, were arranged under a common format by the construction of the Complete Disjunctive Matrix given in symbolic form in Fig. D-3.1, where the variables referring to the 172 wells through which the oil field was sampled were split into two sets: the first set, denoted principal, contains the leading properties of the reservoir, in terms of their capacity to give rise to contrasting zones inside the geological formation in which the oil was trapped; the second set, denoted supplementary, contains ancillary parameters that facilitate the interpretation of the former, for the purpose of obtaining homogeneous Reservoir Quality Zones. For each variable, meaningful classes or categories were established by the reservoir engineer who needs to disclose the oil field's internal architecture, for production planning purposes.

Fig. D-3.1 Data model

When the matrix depicted in Fig. D-3.1 is submitted to the CA algorithm, the graph given in Fig. D-3.2 is obtained for the plane defined by Axes 1 and 2 (representing 85 % of the initial cloud inertia).

Fig. D-3.2 Projection of individuals and variables onto the principal plane produced by CA (labels of the variables' categories are given in Fig. D-3.1)

The interpretation of Fig. D-3.2, depicting a perfect Guttman effect, leads to the creation of 7 groups of wells, sequenced according to their projection onto Axis 1. Such a sequence displays a degradation in oil quality (from negative to positive coordinates): in the left part of the Axis, the groups have the highest worth (and such worth decreases when the wells' coordinate increases); in the right part of Axis 1, wells reveal a poorer oil quality (the greater a well's coordinate on Axis 1, the smaller the oil worth). In the region of the plot defined by small Axis 1 coordinates (and by large negative Axis 2 coordinates 21 ), GROUP 4 is projected, associated with intermediate values of Elevation and Water Saturation. This particular group makes the transition between pure oil and clean water [the former referring to the highest negative coordinates on Axis 1 (associated with low values of Elevation and Water Saturation), and the latter projecting near its positive edge, associated with high values of the same variables].

Regarding the columns that were projected as supplementary properties, they show a pattern similar to that of the principal parameters (in the case that they display an ordinal character); regarding nominal variables (such as facies, presence/absence of limestone, clay, and dolomite), their modalities that project onto the negative part of Axis 1 indicate the presence of high quality oil, and the reverse holds for modalities projecting onto its positive part.

As a consequence of the Guttman effect revealed by Fig. D-3.2, it is acknowledged that each group of wells (G1 to G7) can be identified exclusively by its projection onto Axis 1. Limits on such projections are given in Fig. D-3.3, which is a kind of symbolic U-shaped histogram, where the classes to which groups belong are given by appropriate intervals of the Axis 1 coordinate (derived from Fig. D-3.2). As previously discussed, the oil quality (assessed by the oil saturation of the rock) increases when the coordinate on Axis 1 decreases.

Fig. D-3.3 Symbolic histogram providing the definition of groups of wells according to their coordinate on CA Axis 1

The zones of the reservoir (groups of wells) obtained by the aforementioned procedure are shown in Fig. D-3.4, now in a horizontal cross-section across the geographical space where the oil field is located. By inspection of Fig. D-3.4, a core of high quality oil is spotted in the centre of the reservoir, and the degradation noticed above in the CA plot is now portrayed in physical terms. This can also be seen in the vertical cross-section of Fig. D-3.5, referring this time to the position of groups with regard to elevation Z (the lower the elevation below surface, the higher the oil content).

21 Axis 2 does not discriminate oil from water, being interpreted as revealing the opposition between oil and water vs. mixed groups.

Fig. D-3.4 Geographical representation of Groups (labeled 1 to 7) obtained by CA (horizontal cross-section)

Fig. D-3.5 Geographical representation of Groups (labeled G1 to G7) obtained by CA (vertical cross-section)

SOURCE: Pereira, H.G., Silva, A.C., Soares, A., Ribeiro, L., Carvalho, J. (1990) Improving reservoir description by using geostatistical and multivariate data analysis techniques, Math. Geol., Vol. 22, No. 8, pp

APPENDIX C
CLIMATOLOGY OF PORTO URBAN AREA

In a weather station located in Porto, records of a small set of climatologic variables are available for the period 1998/2000. In addition to the time of day when each measure was taken, the variables that come into play here are temperature and wind direction/speed.

In order to submit this data set to CA, it is required to codify all variables under a common scheme, denoted by Benzécri as a complete disjunctive matrix. To this end, since all variables are quantitative, it is necessary to categorize them into meaningful intervals, whose limits were proposed iteratively by a climatology expert. For this specific case, particular attention was paid to the variables time of day and wind direction, as a consequence of their different character with respect to the remaining parameters. Indeed, while the latter (temperature and wind velocity) are expressed by usual real numbers, the metric associated with the former is by no means linear (and regular arithmetic does not hold). Such a metric, which is an acknowledged property of directional or circular variables, does not permit calculating the Euclidean distance needed to apply the most common method of data analysis for quantitative variables: Principal Components Analysis (PCA).

Hence, in this case, the use of CA was driven by the data set characteristics, not for the reason that the available variables are qualitative, but because their heterogeneity (even within the quantitative realm) could not be managed through a simpler technique like PCA, which assumes linearity. After a series of trials, the modalities of the available variables were established as shown below.

VARIABLE time of day
CODE H1 H2 H3
Limits (h) 0/8 8/16 16/24

VARIABLE temperature
CODE T1 T2 T3 T4 T5
Limits (ºC) 0/10 10/15 15/20 20/25 >25

VARIABLE wind direction
CODE D1 D2 D3 D4 D5
Limits (º) 0/60 60/ / / /360

VARIABLE wind speed
CODE V1 V2 V3 V4 V5
Limits (m/s) 0.0/ / / /7.0 >7.0

Fig. D-5.1 Codes and limits for the available variables

Histograms of the available variables, expressed in absolute frequency of occurrences, are given in Fig. D-5.2 to D-5.5, in accordance with the classes previously established.

Fig. D-5.2 Variable time of day

Fig. D-5.3 Variable temperature

Fig. D-5.4 Variable wind direction

Fig. D-5.5 Variable wind speed

The empirical data table was then converted into a complete disjunctive matrix containing rows and 18 columns (the total number of modalities for the available set of variables, codified according to Fig. D-5.1). When this matrix was submitted to the CA algorithm, two principal planes were obtained, explaining 40% of the total inertia of the cloud. Although this fraction of inertia may seem small, the fact is that all variables can be interpreted on the grounds of their projections onto such planes 22. Thus, the climatologic interpretation is based solely on the projections of the variables' modalities onto the planes defined by Axes 1,2 and 1,3, depicted in the plots of Fig. D-5.6 and D-5.7, respectively.

22 This is a feature that occurs in most cases of applying CA to a complete disjunctive matrix.
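The splitting of quantitative variables into classes, followed by complete disjunctive coding, can be sketched for the two variables whose limits are fully legible in Fig. D-5.1 (time of day and temperature). The function names and the example records are hypothetical, and sending a value that falls exactly on a limit to the upper class is an assumption of this sketch.

```python
from bisect import bisect_right

HOUR_LIMITS = [8, 16]                  # H1: 0/8, H2: 8/16, H3: 16/24
TEMPERATURE_LIMITS = [10, 15, 20, 25]  # T1: 0/10 ... T4: 20/25, T5: >25


def class_index(value, limits):
    """0-based index of the class in which `value` falls."""
    return bisect_right(limits, value)


def disjunctive_block(value, limits):
    """0/1 block with a single 1 in the column of the class of `value`."""
    k = class_index(value, limits)
    return [1 if c == k else 0 for c in range(len(limits) + 1)]


def code_record(hour, temperature):
    """Columns H1..H3 followed by T1..T5 for one record."""
    return disjunctive_block(hour, HOUR_LIMITS) + disjunctive_block(
        temperature, TEMPERATURE_LIMITS
    )


# Hypothetical records: (hour of day, temperature in ºC)
print(code_record(3, 7.5))   # H1 and T1
print(code_record(17, 22))   # H3 and T4
print(code_record(9, 30))    # H2 and T5
```

Wind direction and wind speed would be coded with the same two functions once their class limits are supplied.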

Fig. D-5.6 Projection of variables' modalities onto plane 1,2

Fig. D-5.7 Projection of variables' modalities onto plane 1,3

In order to interpret the above figures, it is required to put the histogram referring to wind direction (Fig. D-5.4) in geographical terms. This is shown in Fig. D-5.8, where classes D1 to D5 are shown with respect to the Wind Rose prevailing in the Porto Region.

Fig. D-5.8 Wind direction circular histogram for the Porto region

Regarding Fig. D-5.6, the interpretation of Axis 1 is straightforward: it shows an increase in temperature, from left to right. A clear connection is perceived between the extreme low category of temperature (T1) and the night measures of H1, associated with East winds D2. This association is explained by the local scale phenomenon known as land breeze, occurring at night (when onshore temperatures are lower than offshore ones). Focusing now on the right side of Axis 1, the extreme high categories of temperature (T4 and T5) are related to the intermediate wind speed (denoted V3) and NW winds D4. This association is explained by a global scale synoptic phenomenon driven by the interaction between the Azores anticyclone and the Iberian depression, which causes a clockwise wind circulation reaching Porto from NW (D4) and occurring mainly on hot Summer days (T4 and T5). As far as Axis 2 is concerned, it can be remarked that it opposes Northern wind directions D1+D5 to both extremes of temperature (T1 and T5).

Fig. D-5.7 brings a new insight stemming from the interpretation of Axis 3. This Axis opposes wind speeds V4/V5 to wind directions D1+D5, suggesting that strong winds seldom blow from Northern directions. The conclusions drawn from this analysis are of two kinds: on the one hand, some of them, being trivial and/or expectable, merely serve to validate the methodology for the benefit of a skeptical climatologist; on the other, those conclusions that convey any sort of novelty may be useful as conditional evidence to be scrutinized by the (now less skeptical) expert in the field to which the data refer.

SOURCE: Góis, J., Pereira, H.G., Salgueiro, R. (2010) Geostatistics applied to City of Porto urban climatology, in geoENV VII, Atkinson & Lloyd (Eds.) Springer, p

APPENDIX D RISK ASSESSMENT OF MINE TAILINGS POND DAM BREAKAGE

In the Mediterranean region, ancient mining operations have generated huge amounts of sludge, dumped into tailing ponds that are in general sustained by precarious dams, constructed from locally obtained fills. An inventory of such tailing ponds was performed in the scope of an EU-funded research project, aiming at developing a decision-support system to prevent environmental disasters following the accidental breakage of this type of dam. Since only modest physical modeling experience is available to address this issue (as opposed to water reservoir dams), it was decided to adopt a stochastic approach to assess the risk of breakage on the grounds of a historical data base where 55 cases of disasters were recorded, together with dam characteristics and harmful consequences.

The first step of the proposed approach was to extract from the database an assemblage of attributes (denoted predictors) that characterize the dam conditions prior to the disaster, and another array of attributes that embody the damage resulting from the disaster. Given that the available information on prior conditions and damage contains important qualitative features (described by nominal variables like the nature of the dam, the country where the pond is located, the failure type, etc.), no classical regression can be performed, and the conditions are met to apply CA-based qualitative regression to the above outlined training set. Fig. D-6.1 gives the graphical output produced by applying CA to the predictor matrix, arranged in a complete disjunctive format (where quantitative variables were split into classes).

Fig. D-6.1 Projection of predictors' modalities onto Axes 1 and 2 resulting from CA

The interpretation of Fig. D-6.1 is made on the grounds of the ability of Axis 1 to separate small dams in environmentally regulated countries (negative semi-axis) from big, inactive and ring type dams located in environmentally unregulated countries (positive semi-axis), showing in addition a clear increasing sequence (from left to right) for the relevant ordinal variables like dam height and storage volume. Now, when attributes linked to dam breakage conditions are projected as supplementary variables, Fig. D-6.2 is obtained, displaying graphically how predictors are associated with those variables. It is obvious that the only relevant mediator between the two sets of variables to be put into relationship is Axis 1, even though it conveys only 35% of the inertia of the cloud. Moreover, the strength of such a relationship can be evaluated quantitatively by the Relative Contributions of Axis 1 to each modality of the supplementary attributes, as given in Table D-6.I.

Fig. D-6.2 Supplementary projection of variables linked to the disaster conditions

Table D-6.I Relative contributions of Axis 1 to supplementary variables' modalities

Negative semi-axis: Sludge Volume Released <50000m3 Mix Type of Sequentially Raised Tailing Dam (0.75) No fatalities (0.55) Failure Type: Hole (0.20) Failure Type: Overtopping/Overflow (0.16) Downstream Type of Sequentially Raised Tailing Dam (0.12) (0.62)

Positive semi-axis: Upstream Type of Sequentially Raised Tailing Dam Sludge Linear Distance Traveled >12000m Sludge Volume Released >300000m3 > 10 fatalities Sludge Linear Distance Traveled between 800 and 12000m 1-10 fatalities (0.63) (0.51) (0.47) (0.40) (0.15) (0.11)
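The supplementary projection and the relative contributions mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the small count tables `N` (active predictor modalities) and `G` (supplementary damage modalities) are hypothetical, the function names are illustrative, and the relative contribution of an axis is taken as the squared coordinate on that axis divided by the squared chi-square distance of the point to the centroid.

```python
import numpy as np

def ca_rows(N):
    """CA of the active table N: row masses, row standard coordinates, singular values."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > 1e-12                         # drop null axes
    Phi = U[:, keep] / np.sqrt(r)[:, None]    # row standard coordinates
    return r, Phi, sv[keep]

def supplementary_columns(G, r, Phi):
    """Project the columns of G as supplementary points.

    Returns their principal coordinates and the relative contribution
    of each axis to each supplementary column.
    """
    profiles = G / G.sum(axis=0)              # column profiles over the rows
    coords = profiles.T @ Phi                 # transition formula
    d2 = (((profiles - r[:, None]) ** 2) / r[:, None]).sum(axis=0)
    rel = coords ** 2 / d2[:, None]
    return coords, rel

# Hypothetical active table: 6 cases (rows) by 4 predictor modalities.
N = np.array([[12, 3, 1, 4],
              [9, 5, 2, 4],
              [4, 8, 6, 2],
              [2, 7, 9, 2],
              [1, 2, 10, 7],
              [3, 1, 6, 10]], dtype=float)

# Hypothetical supplementary table: the same 6 cases by 2 damage modalities.
G = np.array([[8, 1], [6, 2], [3, 3], [2, 5], [1, 7], [1, 9]], dtype=float)

r, Phi, sv = ca_rows(N)
coords, rel = supplementary_columns(G, r, Phi)

print("coordinates on Axis 1:", coords[:, 0])
print("relative contributions of Axis 1:", rel[:, 0])
```

A relative contribution close to 1 means the modality is almost entirely explained by that axis, which is how the values of Table D-6.I are read.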

It is clear that Axis 1 resulting from the CA of the predictors matrix can be viewed as a scale of RISK. In fact, the modalities of the attributes that characterize the type of failure leading to low damage are projected onto the negative semi-axis, and the reverse occurs for the positive semi-axis, where severe damage variable classes are ranked. Hence, when the test sites displayed in Fig. D-6.3, whose risk of failure is to be assessed, are projected onto Axis 1, a framework of prevention priorities (and effort) can be established on the grounds of Fig. D-6.4, where the training set of historical disasters is also projected as reference numbers, the meaning of which is disclosed in Table D-6.II.

Fig. D-6.3 Location of test sites
Fig. D Supplementary projection of test sites and cases drawn from the Historical Data Base onto an empirical scale of risk provided by CA

Table D-6.II List of the 55 cases that compose the Historical Data Base (columns: Ref. number, Name, Country, Year of the incident).

1 Los Frailes, Spain; El Cobre Old Dam, Chile; Aitik, Sweden; Fort Meade, USA; Baia Borsa, Romania; Harmony, South Africa; Baia Mare, Romania; Hokkaido, Japan; Sgurigrad, Bulgaria; Itabirito, Brazil; Maritsa Istok 1, Bulgaria; Jinduicheng, China; Stava, Italy; La Patagua New Dam, Chile; Balka Chuficheva, Russia; Los Maquis, Chile; Zletovo, Macedonia (Yugoslavia); Mike Horse, USA; Maggie Pie, United Kingdom; Mochikoshi No.1, Japan, 1978; 11 Bilbao, Spain; Montcoal No.7, Raleigh County, USA, 1987; 13 Derbyshire, United Kingdom; Olinghouse, USA; Madjarevo, Bulgaria; Omai, Guyana; Middle Arm, Tasmania, Unknown; 70 Placer, Surigao del Norte, Philippines; Partizansk, Primorski Krai, Russia; Riverview, USA; Huelva, Spain; Sipalay, Philippines; Amatista, Nazca, Peru, 1994 or; Stancil, USA; Arcturus, Zimbabwe; Sullivan mine, Canada; Bafokeng, South Africa; Tennessee Consolidated No.1, USA, 1988; 26 Bellavista, Chile; (unidentified), SW USA; Buffalo Creek, USA; Veta de Agua No.1, Chile; Cerro Negro, Chile; Barahona, Chile, Unknown; 30 Cerro Negro No.4, Chile; Bonsal, USA, Unknown; 31 Cerro Negro No.3, Chile; Mochikoshi n2, Japan, Unknown; 32 Church Rock, USA; Phelps-Dodge, USA, Unknown; 33 Deneen Mica, USA; Silver King, USA; (unidentified), East Texas, USA; Unidentified, USA, Unknown; 35 El Cobre New Dam, Chile

SOURCE: Salgueiro, A.R., Pereira, H.G., Rico, M.T., Benito, G., Díez-Herrero, A. (2008) Application of Correspondence Analysis in the assessment of mine tailings dam breakage risk in the Mediterranean Region, Risk Analysis, Vol. 28, No 1, pp

APPENDIX E INDEX OF QUALITY IN NATURAL STONES

A major problem that arises in natural stone exploitation planning is the lack of an objective criterion for optimizing the economic recovery of the material to be produced, under environmental constraints. In fact, and in contrast with mineral commodities mining operations where grade is the decisive control variable, in the case of natural stone extraction there is no single parameter driving the demand, which depends decisively on a myriad of factors, ranging from physical to aesthetical features, strongly associated with the specific application foreseen for the material to be extracted. But the current practice, insomuch as it focuses on a blind ad-hoc supply of the most accessible blocks, does not take demand requirements into account. This situation leads to serious shortcomings, namely the huge deposition of waste blocks in the vicinity of the quarry, with the associated landscape recovery costs, and the production of material which is worthless for a given conjuncture in the building industry, with the associated storage costs.

In order to address this issue under a demand-driven approach that minimizes environmental damage and stocks awaiting a hypothetical removal, a fresh viewpoint is put forward: production planning is organized in such a way that the blocks to be extracted in a certain conjuncture meet the requirements of the downstream industries entailed by that conjuncture, the remaining material being left in situ. Obviously, this procedure must comply with the geotechnical and geological constraints that are found in the available natural sources, i.e., a compromise should be reached between the characteristics of the quarry and the demand requirements for each application of the blocks to be extracted.

The first step to be undertaken in the above outlined procedure is to identify the set of the natural stone's attributes required for a certain application that can be captured in the faces of the quarry, prior to extraction. These attributes are then encapsulated into a single index of quality according to Archetypal Discrimination, as illustrated below, and production planning is performed by maximizing such an index.

The case study reported here refers to a marble quarry located in the Estremoz anticlinorium (see Fig. D-8.1), for which a panel of specialists (architects, builders and geologists) defined the modalities of the observable attributes that were considered as the archetypes of the GOOD and BAD material, for a certain application. Table D-8.I gives the matrix containing these two vectors, as far as fractures are concerned. The meaning of the sequence of modalities for each attribute is illustrated in Fig. D-8.2.

Fig. D-8.1 Location of the Quarry
Table D-8.I Archetype definition
Fig. D-8.2 Modalities of the archetypes' attributes

The following steps consisted of capturing empirical data in the faces of the quarry, according to the format driven by the above given archetypes. To this end, the available vertical faces were swept by a moving window, and a photograph was taken for each support, corresponding to the field where fractures were digitized and typified in terms of their attributes, according to the scheme outlined in Fig. D-8.3.

Fig. D-8.3 Scheme of data capture and their digital characterization

Then, Archetypal Discrimination was applied by projecting the empirical supports as supplementary individuals onto the Axis obtained by CA applied to the archetype matrix of Table D-8.I. The coordinates of these supports on the above mentioned Axis represent the quality index of the material to be extracted, for the a priori defined use. Hence, the blocks of the quarry can now be classified into meaningful classes for the required application, as exemplified in Fig. D-8.4, and the demand-driven production planning can be performed on the grounds of the closeness of each block to the GOOD pole.
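The projection step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the original study: the archetype matrix `A` and the fracture counts `X` are hypothetical (three attributes with three modalities each), the function names are illustrative, and modalities that appear in neither archetype are given a neutral coordinate of 0, which is a choice made only for this sketch.

```python
import numpy as np

def archetype_axis(A):
    """CA of a 2-row archetype matrix A (row 0 = GOOD, row 1 = BAD).

    Returns the column standard coordinates on the single axis, oriented
    so that GOOD lies on the positive side. Modalities absent from both
    archetypes get coordinate 0 (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=float)
    active = A.sum(axis=0) > 0
    P = A[:, active] / A.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    gamma = np.zeros(A.shape[1])
    gamma[active] = Vt[0] / np.sqrt(c)        # first (and only) axis
    if (A[0] / A[0].sum()) @ gamma < 0:       # orient GOOD positively
        gamma = -gamma
    return gamma

def quality_index(X, gamma):
    """Coordinates of supports projected as supplementary rows."""
    X = np.asarray(X, dtype=float)
    profiles = X / X.sum(axis=1, keepdims=True)
    return profiles @ gamma

# Hypothetical archetypes in disjunctive form (3 attributes x 3 modalities).
A = [[1, 0, 0, 1, 0, 0, 1, 0, 0],   # GOOD
     [0, 0, 1, 0, 0, 1, 0, 0, 1]]   # BAD

# Hypothetical fracture counts per modality for three supports.
X = [[4, 1, 0, 3, 2, 0, 5, 0, 0],
     [0, 1, 4, 0, 2, 3, 1, 1, 3],
     [2, 2, 1, 1, 3, 1, 2, 1, 2]]

gamma = archetype_axis(A)
index = quality_index(X, gamma)
print("quality index per support:", index)
```

In this sketch the two archetypes project at the two ends of the axis, so the index of a support reads as its position between the BAD and GOOD poles, and blocks can be ranked or grouped into classes by that value.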

Fig. D-8.4 Classification of blocks in a quarry for a given application

SOURCE: Pereira, H.G., Brito, M.G., Albuquerque, T., Ribeiro, J. (1993) Geostatistical estimation of a summary recovery index for marble quarries, Proceedings Geostatistics Tróia 92, Vol. 2, p


Lab on Taylor Polynomials. This Lab is accompanied by an Answer Sheet that you are to complete and turn in to your instructor. Lab on Taylor Polynomials This Lab is accompanied by an Answer Sheet that you are to complete and turn in to your instructor. In this Lab we will approimate complicated unctions by simple unctions. The

More information

Telescoping Decomposition Method for Solving First Order Nonlinear Differential Equations

Telescoping Decomposition Method for Solving First Order Nonlinear Differential Equations Telescoping Decomposition Method or Solving First Order Nonlinear Dierential Equations 1 Mohammed Al-Reai 2 Maysem Abu-Dalu 3 Ahmed Al-Rawashdeh Abstract The Telescoping Decomposition Method TDM is a new

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

Unsupervised Learning: Dimensionality Reduction

Unsupervised Learning: Dimensionality Reduction Unsupervised Learning: Dimensionality Reduction CMPSCI 689 Fall 2015 Sridhar Mahadevan Lecture 3 Outline In this lecture, we set about to solve the problem posed in the previous lecture Given a dataset,

More information

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R,

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R, Principal Component Analysis (PCA) PCA is a widely used statistical tool for dimension reduction. The objective of PCA is to find common factors, the so called principal components, in form of linear combinations

More information
