Principal Component Analysis, an Aid to Interpretation of Data. A Case Study of Oil Palm (Elaeis guineensis Jacq.)
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 4(2): Scholarlink Research Institute Journals, 2013 (ISSN: ) jeteas.scholarlinkresearch.org
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 4(1):73-76 (ISSN: )

Ekezie Dan Dan
Department of Statistics, Imo State University, PMB 2000, Owerri, Nigeria.

Abstract
Principal Component Analysis provides an objective way of finding indices so that the variation in the data can be accounted for as concisely as possible. It may well turn out that two or three Principal Components provide a good summary of all the original variables. Consideration of the values of the Principal Components instead of the values of the original variables may then make it much easier to understand what the data have to say. In short, Principal Component Analysis is a means of simplifying data by reducing the number of variables. Although Principal Component Analysis has been well described in a number of texts, the emphasis of the descriptions has been on the underlying theory of the methods and on the methods of computation. Only a limited number of practicable examples of the technique have been published in sufficient detail to enable the reader to gain any facility in the interpretation of the results of the analysis, and the necessary background knowledge to the problems is not usually available. In this paper, a case study of the application of Principal Component Analysis to a practical problem is presented, and it is suggested that there is a need for the extensive application of the existing methods of multivariate analysis over a wide range of problems and subjects, especially in agriculture, in order to test the practical value of the techniques.

Keywords: multivariate analysis, principal component analysis, agriculture, oil palm (Elaeis guineensis Jacq.)
INTRODUCTION
Principal Component Analysis is a descriptive procedure for analyzing relationships that may exist in a set of quantitative variables. It is designed to reduce the number of variables that need to be considered to a small number of indices (called the principal components) that are linear combinations of the original variables. Usually the technique is not utilized as an end in itself but as a method for illustrating, modeling, and combining variables for further analysis.

For example, much of the variation in the body measurements of oil palm progeny (EWS) shown in Appendix 1 will be related to the general size of the palms, and the total

$$I_1 = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 \qquad (1)$$

will measure this quite well. This accounts for one dimension in the data. Another index is

$$I_2 = X_1 + X_2 + X_3 + X_4 + X_5 - X_6 - X_7 \qquad (2)$$

which is a contrast between the first five measurements and the last two. This reflects another dimension in the data.

Principal Component Analysis provides an objective way of finding indices of this type so that the variation in the data can be accounted for as concisely as possible. It may well turn out that two or three Principal Components provide a good summary of all the original variables. Consideration of the values of the Principal Components instead of the values of the original variables may then make it much easier to understand what the data have to say. In short, Principal Component Analysis is a means of simplifying data by reducing the number of variables.

Although principal components analysis, and indeed most other multivariate techniques, have been well described in a number of texts, the emphasis of these descriptions has been on the underlying theory of the methods and on the methods of computation.
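To make the idea of an index as a linear combination concrete, the short sketch below evaluates the total and the contrast described above for a single palm. The function name `linear_index` and the measurement values are hypothetical, chosen only for illustration; they are not taken from the NIFOR data.

```python
# Weights for the two indices described in the text.
# X1..X7 are the seven biometric measurements on one palm.
SIZE_WEIGHTS = [1, 1, 1, 1, 1, 1, 1]        # plain total of all seven
CONTRAST_WEIGHTS = [1, 1, 1, 1, 1, -1, -1]  # first five minus the last two


def linear_index(weights, measurements):
    """Return the weighted sum of one palm's measurements."""
    if len(weights) != len(measurements):
        raise ValueError("weights and measurements must have the same length")
    return sum(w * x for w, x in zip(weights, measurements))


# Hypothetical values for a single palm (illustration only, not NIFOR data).
palm = [12, 1, 30, 3, 4, 60, 45]

print(linear_index(SIZE_WEIGHTS, palm))      # total of the seven measurements
print(linear_index(CONTRAST_WEIGHTS, palm))  # contrast index
```

Principal component analysis differs from these hand-picked indices only in that the weights are derived from the data rather than chosen in advance.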
Only a limited number of practicable examples of the technique have been published in sufficient detail to enable the reader to gain any facility in the interpretation of the results of the analysis, and the necessary background knowledge to the problems is not usually available. The interpretation of the results of the analysis is therefore left to the reader and no clear advice is given as to how this may be done. One of the aims of this paper, therefore, is to suggest that there is a need for the extensive application of the present methods of multivariate analysis, including principal components analysis, over a wide range of problems and subjects, in order to test the practical value of the techniques. The theory of multivariate methods has far out-run the practical application of the techniques, with the result that only a handful of examples are available in the published literature. Some of this lack of practical examples has stemmed from difficulties in computation, but with greater encouragement and better guidance as to the interpretation of the results, more research workers might be prepared to attempt multivariate analysis of their data.

This paper presents a case study in the application of principal components analysis, taken from the Ph.D. dissertation of the author on Multivariate Analysis of Nursery and Yield Characters of the Oil Palm (Elaeis guineensis Jacq.) Data. The experiment for this study was carried out at NIFOR (Nigerian Institute for Oil Palm Research), Benin City. NIFOR is one of the research institutes under the Federal Ministry of Science and Technology, with a research mandate for the palms sub-sector. Its major activities are centered on the oil palm, coconut, raphia, date and ornamental palms.

The layout adopted for this experiment is unreplicated blocks of trays of progenies. A total of 620 palms were randomly selected from a total of 1470 oil palm seedlings. The selection of the planted seedlings was by stratified sampling with optimum allocation based on height variation within trays, which form the strata.

The progenies are: 1. 1st Grade EWS (Extension Works Seeds) T X 6.37D (Wet Heat Treated) D X 3.365D D X D D X D

Records obtained: leaf count, height measurements, flowering observations, yield bunches.

The object is to choose whole trays of seedlings planted in the pre-nursery from one or two crosses and to follow the testing of them through to yielding in the field (Field 33). Not less than 14 trays were planted out. Every stand in each tray was marked and measurements were made at monthly intervals in the pre-nursery and nursery. Normal leaf and flowering observations started immediately after planting in the field. Up to the end of the nursery stage, the identity of each palm was known, so that a true cross section of each population could be planted into the field. The seedlings were transplanted from the pre-nursery trays to beds in the nursery in early April, keeping each seedling in the same position relative to its neighbours.
Monthly height measurements and identification of the youngest fully open leaf continued. Height measurements and leaf counts of the seedlings in the main nursery continued until their planting into the field.

Principal Component Analysis
The basic technique of principal components analysis is well described by Kendall (1957), Seal (1964), Quenouille (1962) and many others. In order to define precisely the technique as it has been employed in the case study described in this paper, however, the following stages are distinguished:
(a) choice of the variables to be included in the analysis;
(b) construction of the basic data matrix;
(c) transformation of the basic data, if required;
(d) calculation of the dispersion or correlation matrix;
(e) calculation of eigenvalues and eigenvectors of the dispersion or correlation matrix;
(f) examination and interpretation of the eigenvalues;
(g) interpretation of the eigenvectors;
(h) calculation of the transformed values;
(i) plotting or further analysis of transformed values.

A fuller commentary on these separate stages is given by Jeffers (1964), but there are several points which are worth stressing before beginning the account of this case study. First, there are two major decisions to be taken during the course of the analysis. At stage (c), the data may be analysed without transformation, or they may be transformed, usually into the logarithms of the original values. Some exponents of principal component analysis advocate the transformation of all data, partly so as to satisfy the assumptions that may be made in the appeal to probability distributions and partly because they wish to consider hypotheses about ratios of the basic variables rather than about linear functions of the variables. The second decision is concerned with the choice between finding the eigenvalues and eigenvectors of the dispersion matrix or of the correlation matrix. This choice depends on whether or not the scale of the original observations is important in the interpretation of the results.
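The choice between the dispersion matrix and the correlation matrix can be illustrated with a small, dependency-free sketch. The function names and the two-variable data set below are assumptions made for illustration only. The example shows that changing the unit of one variable (metres to centimetres) alters the dispersion matrix but leaves the correlation matrix unchanged, which is why the correlation matrix is preferred when the scales of the original observations are not meaningful.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample dispersion (covariance) matrix, using the divisor n - 1."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def correlation_matrix(rows):
    """Correlation matrix: the dispersion matrix rescaled to unit variances."""
    cov = covariance_matrix(rows)
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


# Hypothetical data: (height in metres, yield in kg) for four palms.
rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
# The same palms with height recorded in centimetres instead.
rows_cm = [[100.0 * h, y] for h, y in rows]

print(covariance_matrix(rows)[0][0], covariance_matrix(rows_cm)[0][0])
print(correlation_matrix(rows)[0][1], correlation_matrix(rows_cm)[0][1])
```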
It is also worth stressing here that, in principal components analysis, the principal diagonal of the correlation matrix is never replaced by a vector of communalities, as in factor analysis.

KEY:
DURA (D): shell (endocarp); the main variety of oil palm fruit found in the groups. It has a large nut with a thick shell and thin mesocarp.
PISIFERA (P): a small fruit with no shell.
TENERA (T): the hybrid obtained by crossing Dura and Pisifera. It has a thick mesocarp containing much more oil and fat (chemically saturated oil) than either of its parents. The Tenera nut is small and easily shelled to release the palm kernel. The Tenera palm kernel is smaller than the Dura kernel, although the Tenera bunch is much larger than the Dura bunch. In all, the Tenera is a much better variety for industrial and economic purposes.
Practical Value of Principal Components Analysis
The practical objectives of the use of principal components analysis may be summarized by the following list:
the examination of the correlations between the variables of a selected set, as has been done in Table 1;
the reduction of the basic dimensions of the variability in the measured set to the smallest number of meaningful dimensions, as has been done in Table 2;
the elimination of variables which contribute relatively little extra information;
the examination of the grouping of individuals in n-dimensional space;
determination of the objective weighting of measured variables in the construction of meaningful indices;
the allocation of individuals to previously demarcated groups;
the recognition of misidentified individuals;
orthogonalization of regression calculations.

Not all of these objectives will be of equal importance in any given study and some will be entirely absent. Nevertheless, the method provides one solution to such problems and is easy to apply, with a minimum of assumptions, if an electronic digital computer is available.

Physical Characteristics of Oil Palm (Elaeis guineensis Jacq.) (Appendix 1)
In the oil palm data collection, the following biometric characteristics were recorded:
X1: leaf count in the nursery, which was dropped because its values were all equal
X2: height of the oil palm seedlings in the nursery
X3: leaf count in the field
X4: height of the oil palm seedlings in the field
X5: canopy spread (measured in metres)
X6: sex ratio (measured in percentage)
X7: average yield of oil palm (measured in kilograms, kg).

The extracts of the correlation matrix among the different biometric characters from the computer printouts are presented in Table 1 (see Appendix). The eigenvalues (latent roots) and eigenvectors were obtained from the correlation matrix by solving the characteristic equation

$$|A - \lambda I| = 0 \qquad (3)$$

where A is the correlation matrix, λ is an eigenvalue and I is the identity matrix.
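As a rough illustration of how the latent roots and vectors of a symmetric correlation matrix can be found numerically, the sketch below uses the classical cyclic Jacobi rotation method in plain Python. This is not the program used in the study, which is not described in the text; the function name `jacobi_eigen`, the tolerance and the 2 x 2 example matrix are assumptions made for illustration.

```python
from math import atan2, cos, sin


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix by Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue, where
    vectors[i] is the unit eigenvector belonging to values[i].
    """
    n = len(matrix)
    a = [row[:] for row in matrix]  # working copy, driven towards diagonal form
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = cos(theta), sin(theta)
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors


# Hypothetical 2 x 2 correlation matrix with r = 0.8; the characteristic
# equation (1 - lam)^2 - 0.8^2 = 0 has roots 1 + 0.8 and 1 - 0.8.
values, vectors = jacobi_eigen([[1.0, 0.8], [0.8, 1.0]])
print(values)
print(vectors)
```

For a correlation matrix the latent roots always sum to the number of variables, which gives a quick check on any computed solution.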
The eigenvalues (latent roots) and eigenvectors, together with the percentage of variability accounted for by each component, are given in Table 2.

Table 2. Eigenvectors for Components
Rows: height in the nursery; height in the field; leaf count in the field; canopy spread; sex ratio; latent root; percentage of variability; cumulative percentage of variability.

One important problem in the application of principal components is to decide on the number of components which have any practical significance. Bartlett (1950) has suggested an appropriate test for arriving at this decision, but this test will not be used in the present case. Instead we shall adopt the arbitrary rule of thumb which Jeffers (1966) has utilized in his two case studies in the application of principal component analysis. This rule of thumb suggests that we consider only those components which have eigenvalues of 1.00 or greater as having any practical significance. Using this method, we see from the computer printout of the principal components analysis and the extracted values in Table 2 that only the first two components might be of any practical significance. The first component accounts for 78.15 per cent of the total variability in the oil palm progeny. The second component accounts for a further 17.89 per cent. However, if we take in the third component, which accounts for 3.44 per cent of the total variability, we find that the first three components account for 99.48 per cent of the total variability for the oil palm progeny (see Table 2). We may therefore neglect the remaining two components as being of not much practical significance. We therefore reduce the number of biometric characters from seven to three and proceed to apply Discriminatory and Canonical Analysis to the data.
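The arithmetic behind the percentage of variability and the rule of thumb can be sketched as follows. The percentages 78.15, 17.89 and 3.44 are those reported in the text; the list of latent roots used to demonstrate the rule is hypothetical, since the roots themselves are not reproduced here, and the function names are illustrative only.

```python
from itertools import accumulate


def percent_variability(eigenvalues):
    """Percentage of total variability accounted for by each latent root."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]


def retained_by_unit_rule(eigenvalues, threshold=1.0):
    """Latent roots kept by the rule of thumb (eigenvalue >= 1.00)."""
    return [lam for lam in eigenvalues if lam >= threshold]


# Percentages reported in the text for the first three components.
reported = [78.15, 17.89, 3.44]
cumulative = list(accumulate(reported))
print([round(c, 2) for c in cumulative])

# Hypothetical latent roots (illustration only, not the values in Table 2).
roots = [2.5, 1.2, 0.8, 0.5]
print([round(p, 2) for p in percent_variability(roots)])
print(retained_by_unit_rule(roots))
```

The rule of thumb is a convention rather than a formal test, so it is sensible to read it alongside the cumulative percentages, as is done in the text.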
Practical Implication of the Findings
In view of the strong inter-correlation between the variables, a smaller number of variables may be sufficient to distinguish between the measured characters of the oil palm. In the present case, we had only seven variables and the analysis has suggested that two were sufficient to work with. The application of the analysis has suggested how the time and labour spent in measuring a large number of variables may be saved by measuring only a few of the variables, which give the same information as the totality of the variables.

The analysis therefore suggests that there are probably two (or three) major components of the physical variables, the first three accounting for about 99.48 per cent of the total variability of the oil palm. This gives some clear indications as to the nature of the differences between the characters (parameters) of the oil palm being considered. The fact that, by definition, these components are mutually independent greatly simplifies the interpretation of the variability measured by the physical variables, and focuses the attention of the researcher on the basic dimensions of which his variables are only first approximations.

The principal component analysis of the correlation matrix confirms that very few components of variation have been included in the data, the first two components together accounting for about 96 per cent (78.15 + 17.89) of the total variability of the oil palm. Only these two components are likely to have any practical significance. The remaining variation described by the measured variables is relatively unimportant. In further work, only one of the main group of variables, for example height of the oil palm seedlings in the nursery, together with height in the field, need be retained, although new variables believed to be uncorrelated with those included in this study should be sought.

Summary of the Principal Component Analysis (PCA)
The practical objectives of the use of principal component analysis in this study may be summarized by the following points:
the examination of the correlations between the variables of a selected set;
the reduction of the basic dimensions of the variability in the measured set to the smallest number of meaningful dimensions;
the elimination of variables which contribute relatively little extra information;
orthogonalization of regression calculations.
CONCLUSION
Principal Component Analysis has enabled us to concentrate attention on the basic dimensions of the variability of the physical properties of the oil palm progeny, and to use this information to determine the relative importance of these dimensions in predicting the comprehensive yielding capacity of the improved varieties of the oil palm. Principal Component Analysis has also given us clear guidance as to the selection of the necessary variables in further work on the oil palm. In selection studies especially, there would be little point in including more than two of the measurements of the physical properties, and these two variables should include height of the oil palm seedlings in the nursery and in the field. This ensures that no time is wasted by continuing to measure variables which contribute relatively little to the study.

RECOMMENDATION
The practical implications would seem to be relatively simple. There is a need for the technique to be more widely applied and, perhaps even more important, for the results of these applications to be more widely reported, within the contexts of their original problems, so that the value of the technique can be assessed in practice.

REFERENCES
BARTLETT, M.S. (1950). Test of significance in factor analysis. Brit. J. Psych.
EKEZIE, D.D. (2011). Multivariate Analysis of Nursery and Yield Characters of the Oil Palm (Elaeis guineensis Jacq.) Data. Unpublished Ph.D. Dissertation.
JEFFERS, J.N.R. (1966). Principal component analysis in taxonomic research (Forestry Commission Statistics Section Paper No. 83). (1966) Correspondence. Statistician, 15.
KENDALL, M.G. (1957). A Course in Multivariate Analysis. London: Griffin.
QUENOUILLE, M.H. (1962). Associated Measurements. London: Butterworths.
SEAL, H. (1964). Multivariate Statistical Analysis for Biologists. London: Methuen.
APPENDIX 1
BIOMETRIC MEASUREMENTS ON THE OIL PALM (Elaeis guineensis Jacq.)
Columns: X1 leaf count (nursery); X2 height (m) (nursery); X3 leaf count (field); X4 height (m) (field); X5 canopy spread (metres); X6 sex ratio (%); X7 yield (4 yrs). Summary rows: TOTAL, MEAN, RANGE.
Source: Ekezie (2011) Ph.D Dissertation-extracted from EXP , N.I.F.O.R. Benin City, Nigeria.

Table 1. Correlation Matrix
Variables (rows and columns): height in nursery; leaf count in field; height in field; canopy spread; sex ratio; yield.
Surviving entries: 0.315* 0.417** 0.443** * 0.506** * 0.549** ** 0.506** 0.549** ** * 0.417** 0.443**
* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
More informationWhile entry is at the discretion of the centre, candidates would normally be expected to have attained one of the following, or equivalent:
National Unit specification: general information Unit code: H1JB 11 Superclass: SE Publication date: May 2012 Source: Scottish Qualifications Authority Version: 01 Summary This Unit is designed to meet
More informationDimension Reduction (PCA, ICA, CCA, FLD,
Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationData Mining Lecture 4: Covariance, EVD, PCA & SVD
Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The
More informationDrift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares
Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares R Gutierrez-Osuna Computer Science Department, Wright State University, Dayton, OH 45435,
More informationPRINCIPAL COMPONENTS ANALYSIS
121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves
More informationIB Questionbank Mathematical Studies 3rd edition. Grouped discrete. 184 min 183 marks
IB Questionbank Mathematical Studies 3rd edition Grouped discrete 184 min 183 marks 1. The weights in kg, of 80 adult males, were collected and are summarized in the box and whisker plot shown below. Write
More informationAppendix B: Skills Handbook
Appendix B: Skills Handbook Effective communication is an important part of science. To avoid confusion when measuring and doing mathematical calculations, there are accepted conventions and practices
More informationThe aim of this section is to introduce the numerical, graphical and listing facilities of the graphic display calculator (GDC).
Syllabus content Topic 1 Introduction to the graphic display calculator The aim of this section is to introduce the numerical, graphical and listing facilities of the graphic display calculator (GDC).
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 8 MATRICES III Rank of a matrix 2 General systems of linear equations 3 Eigenvalues and eigenvectors Rank of a matrix
More informationTechniques and Applications of Multivariate Analysis
Techniques and Applications of Multivariate Analysis Department of Statistics Professor Yong-Seok Choi E-mail: yschoi@pusan.ac.kr Home : yschoi.pusan.ac.kr Contents Multivariate Statistics (I) in Spring
More informationPrincipal Component Analysis. Applied Multivariate Statistics Spring 2012
Principal Component Analysis Applied Multivariate Statistics Spring 2012 Overview Intuition Four definitions Practical examples Mathematical example Case study 2 PCA: Goals Goal 1: Dimension reduction
More informationDevelopment of Agrometeorological Models for Estimation of Cotton Yield
DOI: 10.5958/2349-4433.2015.00006.9 Development of Agrometeorological Models for Estimation of Cotton Yield K K Gill and Kavita Bhatt School of Climate Change and Agricultural Meteorology Punjab Agricultural
More informationDeveloping Rainfall Intensity Duration Frequency Models for Calabar City, South-South, Nigeria.
American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-02, Issue-06, pp-19-24 www.ajer.org Research Paper Open Access Developing Rainfall Intensity Duration Frequency
More informationIllinois State Water Survey at the University of Illinois Urbana, Illinois
Illinois State Water Survey at the University of Illinois Urbana, Illinois "A Study of Crop-Hail Insurance Records for Northeastern Colorado with Respect to the Design of the National Hail Experiment"
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationPOPULATION AND SAMPLE
1 POPULATION AND SAMPLE Population. A population refers to any collection of specified group of human beings or of non-human entities such as objects, educational institutions, time units, geographical
More informationProbability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur
Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation
More informationESCONDIDO UNION HIGH SCHOOL DISTRICT COURSE OF STUDY OUTLINE AND INSTRUCTIONAL OBJECTIVES
ESCONDIDO UNION HIGH SCHOOL DISTRICT COURSE OF STUDY OUTLINE AND INSTRUCTIONAL OBJECTIVES COURSE TITLE: Algebra II A/B COURSE NUMBERS: (P) 7241 / 2381 (H) 3902 / 3903 (Basic) 0336 / 0337 (SE) 5685/5686
More information12.12 MODEL BUILDING, AND THE EFFECTS OF MULTICOLLINEARITY (OPTIONAL)
12.12 Model Building, and the Effects of Multicollinearity (Optional) 1 Although Excel and MegaStat are emphasized in Business Statistics in Practice, Second Canadian Edition, some examples in the additional
More informationMachine Learning 2nd Edition
INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010
More informationTESTING FOR CO-INTEGRATION
Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power
More informationØ Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.
Statistical Tools in Evaluation HPS 41 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific number
More information, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1
Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression
More informationDover- Sherborn High School Mathematics Curriculum Probability and Statistics
Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and
More informationDoubled haploid ramets via embryogenesis of haploid tissue cultures
Doubled haploid ramets via embryogenesis of haploid tissue cultures Harry E. Iswandar 1, J. M. Dunwell 2, Brian P. Forster 3, Stephen P. C. Nelson 1,4 and Peter D. S. Caligari,3,4,5 ABSTRACT Tissue culture
More informationGenetic Divergence Studies for the Quantitative Traits of Paddy under Coastal Saline Ecosystem
J. Indian Soc. Coastal Agric. Res. 34(): 50-54 (016) Genetic Divergence Studies for the Quantitative Traits of Paddy under Coastal Saline Ecosystem T. ANURADHA* Agricultural Research Station, Machilipatnam
More informationto be tested with great accuracy. The contrast between this state
STATISTICAL MODELS IN BIOMETRICAL GENETICS J. A. NELDER National Vegetable Research Station, Wellesbourne, Warwick Received I.X.52 I. INTRODUCTION THE statistical models belonging to the analysis of discontinuous
More informationMULTIVARIATE TIME SERIES ANALYSIS AN ADAPTATION OF BOX-JENKINS METHODOLOGY Joseph N Ladalla University of Illinois at Springfield, Springfield, IL
MULTIVARIATE TIME SERIES ANALYSIS AN ADAPTATION OF BOX-JENKINS METHODOLOGY Joseph N Ladalla University of Illinois at Springfield, Springfield, IL KEYWORDS: Multivariate time series, Box-Jenkins ARIMA
More informationPOST GRADUATE DIPLOMA IN APPLIED STATISTICS (PGDAST) Term-End Examination June, 2016 MST-005 : STATISTICAL TECHNIQUES
No. of Printed Pages : 6 I MS T-005 1 00226 POST GRADUATE DIPLOMA IN APPLIED STATISTICS (PGDAST) Term-End Examination June, 2016 MST-005 : STATISTICAL TECHNIQUES Time : 3 hours Maximum Marks : 50 Note
More informationDesign and Construction of a Conical Screen Centrifugal Filter for Groundnut Oil Slurry
Leonardo Electronic Journal of Practices and Technologies ISSN 1583-1078 Issue 9, July-December 006 p. 91-98 Design and Construction of a Conical Screen Centrifugal Filter for Groundnut Oil Slurry Abdulkadir
More informationMaximum variance formulation
12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal
More informationAlternative Growth Goals for Students Attending Alternative Education Campuses
Alternative Growth Goals for Students Attending Alternative Education Campuses AN ANALYSIS OF NWEA S MAP ASSESSMENT: TECHNICAL REPORT Jody L. Ernst, Ph.D. Director of Research & Evaluation Colorado League
More informationCanonical Correlation & Principle Components Analysis
Canonical Correlation & Principle Components Analysis Aaron French Canonical Correlation Canonical Correlation is used to analyze correlation between two sets of variables when there is one set of IVs
More informationNONPARAMETRIC TESTS. LALMOHAN BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-12
NONPARAMETRIC TESTS LALMOHAN BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-1 lmb@iasri.res.in 1. Introduction Testing (usually called hypothesis testing ) play a major
More informationInferential Statistics
Inferential Statistics Part 1 Sampling Distributions, Point Estimates & Confidence Intervals Inferential statistics are used to draw inferences (make conclusions/judgements) about a population from a sample.
More informationGEOG 4110/5100 Advanced Remote Sensing Lecture 15
GEOG 4110/5100 Advanced Remote Sensing Lecture 15 Principal Component Analysis Relevant reading: Richards. Chapters 6.3* http://www.ce.yildiz.edu.tr/personal/songul/file/1097/principal_components.pdf *For
More informationMultivariate Statistics (I) 2. Principal Component Analysis (PCA)
Multivariate Statistics (I) 2. Principal Component Analysis (PCA) 2.1 Comprehension of PCA 2.2 Concepts of PCs 2.3 Algebraic derivation of PCs 2.4 Selection and goodness-of-fit of PCs 2.5 Algebraic derivation
More informationGENETIC DIVERGENCE OF A COLLECTION OF SPONGE GOURD (Luffa cylindrica L.)
GENETIC DIVERGENCE OF A COLLECTION OF SPONGE GOURD (Luffa cylindrica L.) M. A. Gaffar 1, M. S. Hossain 2, Shamsunnaher 3 1 Agriculture Extension Officer, Department of Agricultural Extension, Narayanganj
More informationB. Weaver (18-Oct-2001) Factor analysis Chapter 7: Factor Analysis
B Weaver (18-Oct-2001) Factor analysis 1 Chapter 7: Factor Analysis 71 Introduction Factor analysis (FA) was developed by C Spearman It is a technique for examining the interrelationships in a set of variables
More informationMeasurement 4: Scientific Notation
Q Skills Review The Decimal System Measurement 4: Scientific Notation Dr. C. Stewart We are so very familiar with our decimal notation for writing numbers that we usually take it for granted and do not
More informationOr, in terms of basic measurement theory, we could model it as:
1 Neuendorf Factor Analysis Assumptions: 1. Metric (interval/ratio) data 2. Linearity (in relationships among the variables--factors are linear constructions of the set of variables; the critical source
More informationThe National Spatial Strategy
Purpose of this Consultation Paper This paper seeks the views of a wide range of bodies, interests and members of the public on the issues which the National Spatial Strategy should address. These views
More informationPrincipal Component Analysis
Principal Component Analysis Anders Øland David Christiansen 1 Introduction Principal Component Analysis, or PCA, is a commonly used multi-purpose technique in data analysis. It can be used for feature
More informationPhotographs to Maps Using Aerial Photographs to Create Land Cover Maps
Aerial photographs are an important source of information for maps, especially land cover and land use maps. Using ArcView, a map composed of points, lines, and areas (vector data) can be constructed from
More informationThe Governance of Land Use
The planning system The Governance of Land Use United Kingdom Levels of government and their responsibilities The United Kingdom is a unitary state with three devolved governments in Northern Ireland,
More informationCHAPTER 4 CRITICAL GROWTH SEASONS AND THE CRITICAL INFLOW PERIOD. The numbers of trawl and by bag seine samples collected by year over the study
CHAPTER 4 CRITICAL GROWTH SEASONS AND THE CRITICAL INFLOW PERIOD The numbers of trawl and by bag seine samples collected by year over the study period are shown in table 4. Over the 18-year study period,
More information