RV COEFFICIENT & PRINCIPAL COMPONENT ANALYSIS OF INSTRUMENTAL VARIABLES USING SAS/IML SOFTWARE
Pascal SCHLICH

INTRODUCTION

When p numerical variables have been recorded on n samples, it is now quite usual to perform a principal component analysis (PCA) (Morrison 1976) on the data set, called Xnp. This method gives a representation of the samples in a low-dimensional space spanned by the first principal components (PCs). The PCs are obtained as linear combinations of the p variables, and the interpretation of a principal component amounts to the comparison of the p coefficients of the associated linear combination. A few problems appear when p is large (for instance, greater than 50). On the one hand, the memory size of some microcomputers may be too small, or the computing time may be prohibitive. On the other hand, and this is the main problem, the interpretation of a linear combination of so many variables would certainly be tiresome and not very convincing: it is often difficult to decide which correlations between variables and PCs are relevant. It would therefore be of great interest to know the relevant variables in advance and to perform the PCA with these variables only.

Let Y be another data set containing q numerical variables recorded on the same n samples. The comparison of the two sample configurations can be studied through the two PCAs; a numerical coefficient is needed to quantify this comparison. Now assume that Y has to be fitted by X. Some variables in X may be unnecessary for obtaining principal sample plots as close as possible to those of the PCA of Y. The problem is then to select a subset of r variables from X, called X(r), giving a PCA interpretation similar to that of Y. Here also, a coefficient is needed to measure the similarity between two sets of variables, X(r) and Y. Assuming X = Y, this last question reduces to the first one, about the selection of a subset of sufficient variables in a single PCA.
The RV coefficient (Escoufier 1973) brings useful answers to these questions. It is based on a presentation of PCA currently used in France (Cailliez and Pagès 1976), in which an important point is the choice of the metric. The RV coefficient can be used to select variables and metric simultaneously, in order to reduce the number of variables in a single PCA, or to establish links between two PCAs on the same samples (Escoufier and Robert 1979; Bonifas et al. 1984). This last method is called the principal component analysis of instrumental variables (PCAIV). It is a non-symmetric conjoint analysis including the selection of variables from one data set to fit another one. Applications of these methods in food science have been presented by Schlich et al. (1987). The aim of this communication is to announce a program for RV selection of variables written in SAS/IML. The first part of this communication gives the broad outlines of the RV coefficient and of PCAIV. The second one interprets a pedagogic example built from artificial data.
METHODS

PCA

Xnp is a data set containing p numerical variables recorded on n samples. Dnn is a diagonal matrix of sample weights; usually each sample gets the same weight 1/n. Ppp is a symmetric matrix defining the metric used to measure distances between samples. The PCA of the statistical study (X, P, D) leads to the computation of the eigensystem of the n*n matrix product WX = XPX'D, or equivalently of the p*p dual matrix product X'DXP (where ' means matrix transposition). When P is the identity, this last matrix product becomes the covariance matrix.

RV coefficient

Let (Ynq, Qqq, Dnn) be another statistical study on the same n samples, and let WY be the matrix product YQY'D. The RV coefficient between the two statistical studies is defined by:

    RV(WX, WY) = trace(WX.WY) / [trace(WX.WX) * trace(WY.WY)]^(1/2)

The RV coefficient appears as a generalized coefficient of correlation between the two PCAs. It varies between 0 and 1. When RV equals 0, the matrix of correlations between X and Y is null. When RV equals 1, all distances between samples are proportional in the two PCAs. Therefore, the closer RV is to 1, the nearer the two PCAs. In the most usual cases, in which the dual product reduces to the covariance or correlation matrix:

    RV = [ Sum(i=1..p) Sum(k=1..q) c2(Xi,Yk) ] / [ Sum(i,j=1..p) c2(Xi,Xj) * Sum(k,l=1..q) c2(Yk,Yl) ]^(1/2)

where c is a covariance or correlation coefficient and c2 its square. In the particular case in which q = 1, if R is the classical multiple correlation coefficient, then RV = R2/p^(1/2). This identity points out two peculiarities of the coefficient. First, RV appears as the squared cosine of the angle between the Y vector and the X space, inversely weighted by the dimension of this space. Second, the magnitude of an RV value is comparable to that of a squared correlation. Practically, an RV value of 0.95 leads to equivalent interpretations of the two PCAs.

Choosing variables in PCA

Let Y(r) be a subset of r variables from Y.
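As an illustration only (the paper's program is written in SAS/IML), the RV coefficient above can be sketched in Python/NumPy; the function name rv_coefficient is ours, and centering the columns corresponds to uniform sample weights 1/n with an identity metric:

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two data sets observed on the same n samples.

    X is n x p, Y is n x q. Columns are centered, which corresponds to
    uniform sample weights 1/n and an identity metric on the variables.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Wx = X @ X.T          # n x n matrix of the first statistical study
    Wy = Y @ Y.T          # n x n matrix of the second statistical study
    num = np.trace(Wx @ Wy)
    den = np.sqrt(np.trace(Wx @ Wx) * np.trace(Wy @ Wy))
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(26, 5))
print(rv_coefficient(X, X))        # identical studies: RV = 1 up to rounding
print(rv_coefficient(X, 3.0 * X))  # RV is invariant to a global rescaling
```

Note that RV compares the sample configurations, not the variables themselves, which is why a global rescaling of one data set leaves it unchanged.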
If RV(WY(r), WY) is close to 1, then the PCA of the reduced data set Y(r) can replace the PCA of the whole data set Y without disturbing the sample locations on the principal plots. The variables composing Y(r) are selected using a forward algorithm (Do Chi 1979). Three options are available to associate a metric Q(r) with the data set Y(r):
- Q(r) is the identity matrix, and each selected variable gets the same weight;
- Q(r) is a diagonal matrix containing the variable weights;
- Q(r) is a symmetric matrix containing a general metric.
For the last two options, the metric is constructed by the program iteratively and simultaneously with the choice of variables.
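The forward algorithm for the simplest of the three options (identity metric) can be sketched as follows; this is a hedged illustration in Python/NumPy, not the SAS/IML program, and the names rv and forward_rv_selection are ours:

```python
import numpy as np

def rv(Wa, Wb):
    """RV coefficient between two n x n study matrices."""
    return np.trace(Wa @ Wb) / np.sqrt(np.trace(Wa @ Wa) * np.trace(Wb @ Wb))

def forward_rv_selection(Y, r_max, rv_max=1.0):
    """Greedy forward selection with the identity metric: at each step, add
    the variable whose inclusion maximizes RV between the reduced study
    Y(r) and the whole study Y; stop at r_max variables or at rv_max."""
    Yc = Y - Y.mean(axis=0)
    Wy = Yc @ Yc.T
    selected, rv_path = [], []
    remaining = list(range(Y.shape[1]))
    while remaining and len(selected) < r_max:
        # evaluate every remaining candidate and keep the best one
        scores = []
        for j in remaining:
            Yr = Yc[:, selected + [j]]
            scores.append(rv(Yr @ Yr.T, Wy))
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
        rv_path.append(max(scores))
        if rv_path[-1] >= rv_max:
            break
    return selected, rv_path
```

With r_max equal to the total number of variables, the last RV value of the path is 1, since Y(p) then contains every variable of Y.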
Figure 1 shows and explains the input window of the program.

PCAIV

PCAIV applies to two different data sets, X and Y. The X(r) variables are selected from X to fit Y, using the third option for Q(r) described above. PCAIV is equivalent to the PCA of the q linear combinations of the r X(r) variables obtained by orthogonal projection of the q Y variables onto the space spanned by the X(r) variables. In fact, PCAIV looks like a multiple linear regression with several dependent variables Y, instead of only one, and one common subset of independent variables X(r). After the X(r) selection, the user has to compute the PCAIV by running the procedure PRINCOMP (with the COV option) on the output data set DTS3 (Figure 1). The principal components of the PCAIV are then interpreted through the correlations computed between both the X(r) and Y variables and those components.

EXAMPLE WITH SIMULATED DATA

The data set ps.dts16 contains 26 observations, called A, B, ..., Z, and 16 variables, called X1, ..., X16. It has been simulated to have the particular matrix of correlations shown in Table 1. More precisely, the variables X1, X10 and X15 were first simulated independently. Second, the variables X2, X3, ..., X9 were simulated to be well correlated with X1, with alternating plus and minus signs. The variables X11, X12, X13 and X14 were simulated in the same way from X10. Finally, the variable X16 was simulated to be well correlated with X15. Of course, the PCA of this data set exhibits three significant dimensions (Table 1): the first is highly correlated with the first 9 variables, the second with the 5 variables X10, ..., X14, and the third with X15 and X16. Table 2 shows the RV selection with and without weights on the selected variables. In both cases, the first two selected variables are X1 and X10, each the head of one variable group in the simulation process. The third group is represented by the variable X16 in both selections. Figure 2 shows where and how the RV selects variables.
In fact, only 6 variables are sufficient to get a first sample plot very similar to that of the whole PCA (Figure 3). Allowing weights on the variables, only 3 variables (one per dimension) are necessary to get an RV value greater than 0.95; the computed weights are then similar to those of the PCs. Table 3 gives and describes an example of PCAIV. It demonstrates clearly that PCAIV is able to discover hidden linear combinations of variables.

CONCLUSION

The RV coefficient is a useful tool to compare and to summarize subsets of variables recorded on the same samples. More generally, the RV coefficient can be considered as a unifying tool for linear multivariate statistical methods (Robert and Escoufier 1976). It can also be used for the classification of variables (Schlich 1989) or for a three-way data analysis called STATIS (Lavit 1988; Traissac 1990).
    RVSELECT =====================================================
    Command ===>
                 RV SELECTION: From X to fit Y

    DTS1 =          First input data set
    Y    = _NUM_    Names of the Y variables (_NUM_ for all numerical variables)
    OPT  = COR      COV for covariance / COR for correlation

    DTS2 =          Second input data set
    X    =          Names of the candidate variables
    XSEL =          Names of variables imposed into the selection
    OPT  = COR      COV for covariance / COR for correlation
    METR = ID       ID for identity / WG for weighted / IV for PCAIV

    RVMX = 1.00     Maximum value for the RV coefficient
    PMAX =          Maximum number of selected variables
    DTS3 =          Output data set of the selected variables transformed
                    by the metric, for the WG and IV options

To select a subset of variables from a single data set, to fit this whole data set, define DTS1 = DTS2; the XSEL variables will be selected first. OPT=COR means that each variable is divided by its standard deviation. RVMX and PMAX stop the selection at a given step. DTS3 is the output data set containing the selected variables transformed by the metric, for the IV and WG values of the METR option. When METR=IV, the PCA of this data set is the PCAIV; in this case, as for METR=WG, PRINCOMP must be performed on DTS3 using its COV option.

Figure 1: Input window of the program RVSELECT, with comments
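The PCAIV computation that the window leaves to the user (orthogonal projection of the Y variables onto the space spanned by the selected X(r) variables, then a covariance PCA of the projections, which is the role PROC PRINCOMP with its COV option plays on DTS3) can be sketched in Python/NumPy. This is an illustration under our own naming (pcaiv), not the SAS/IML program:

```python
import numpy as np

def pcaiv(Xr, Y, n_components=2):
    """PCAIV sketch: project Y onto the column space of the selected X(r)
    variables, then run a covariance PCA of the fitted values."""
    Xr = Xr - Xr.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # least-squares coefficients: Y_hat = Xr (Xr'Xr)^-1 Xr' Y
    coef, *_ = np.linalg.lstsq(Xr, Y, rcond=None)
    Y_hat = Xr @ coef
    # eigen-decomposition of the covariance matrix of the fitted values
    cov = np.cov(Y_hat, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_components]
    return Y_hat @ eigvec[:, order], eigval[order]

# When Y is exactly a linear combination of X(r), the projection recovers Y
rng = np.random.default_rng(2)
X = rng.normal(size=(26, 3))
Y = X @ rng.uniform(-1.0, 1.0, size=(3, 5))
scores, eigenvalues = pcaiv(X, Y)
```

The returned scores are the principal components of the PCAIV; their correlations with the X(r) and Y variables serve for interpretation, as described above.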
SIMULATED EXAMPLE OF SELECTION OF VARIABLES IN A SINGLE DATA SET

Table 1: the correlation matrix that the data set ps.dts16 (26 observations, 16 variables) was simulated to match, the eigenvalues of the PCA of the whole data set, and, for each PC, the variables highly correlated with it (X1-X9 with PC1, X10-X14 with PC2, X15 and X16 with PC3).

Table 2: RV selection from X to fit Y, with Y = X = ps.dts16 (COR option), under the identity metric (ID) and under the weighted metric (WG). The 16 variables had 3 relevant PCs of unequal weights. With the ID metric, the RV selects 6 variables, X1, X10, X3, X16, X5 and X11: 3 for PC1, 2 for PC2 and 1 for PC3. With the WG metric, the RV selects 3 variables, X1, X10 and X16, with 3 weights similar to the PC weights.
CORRELATIONS BETWEEN THE PCs OF THE WHOLE PCA AND THE VARIABLES

Figure 2: the RV selection (METR=ID) chooses variables (highlighted in the plot) among the different significant directions, proportionally to the number of variables highly correlated with each of them.

SUPERIMPOSED FIRST PRINCIPAL SAMPLE PLOTS OF THE WHOLE AND THE REDUCED PCA (METR=ID)

Figure 3: capital letters denote samples coming from the whole PCA; lower-case letters denote samples coming from the reduced PCA of the variables X1, X10, X3, X16, X5 and X11. Due to the good RV value (0.955), the sample locations are very similar in the two PCAs: the two interpretations are equivalent!
SIMULATED EXAMPLE OF PCAIV

Assume the variables Y1, Y2, Y3, Y4 and Y5 (data set ps.dts5) are simulated linear combinations of the variables X1, X9, X10, X14 and X16 from the data set ps.dts16, with coefficients uniformly and randomly chosen between -1 and +1. Is the RV able to discover this structure when all 16 X variables are submitted to the selection?

RUNNING THE THREE OPTIONS FOR THE METRIC

The RV selection is run from X (the 16 variables of ps.dts16, COR option) to fit Y (the 5 variables of ps.dts5, COR option):
- with the ID metric, the selected variables are X9, X16, X14, X2 and X15: with no choice for the metric, the RV fails to select the right variables;
- with the WG metric, the selected variables are X9, X16, X14, X13 and X11: using a metric of weights, the result is a little better, but 2 selected variables are still wrong;
- with the IV metric, the selected variables are X9, X16, X10, X14 and X1: using a general metric, the RV selects the right variables and discovers the structure.

Table 3
REFERENCES

Bernard-Do Chi, C. Choix de variables en analyse de données. Thèse, Université des Sciences et Techniques du Languedoc, 1979.

Bonifas, L.; Escoufier, Y.; Gonzales, P.L.; Sabatier, R. Choix de variables en analyse en composantes principales. Rev. Stat. Appl. 1984.

Cailliez, F.; Pagès, J.P. Introduction à l'analyse des données. SMASH: Paris, 1976.

Escoufier, Y. Echantillonnage dans une population de variables aléatoires réelles. Publ. Inst. Stat. Univ. Paris 1970.

Escoufier, Y. Le traitement des variables vectorielles. Biometrics 1973, 29, 751-760.

Escoufier, Y.; Robert, P. Choosing variables and metrics by optimizing the RV coefficient. In Optimizing Methods in Statistics; Rustagi, J.S., Ed.; Academic Press: New York, 1979.

Lavit, Ch. Analyse conjointe de tableaux quantitatifs. Masson: Paris, 1988.

Morrison, D.F. Multivariate Statistical Methods, 2nd ed.; McGraw-Hill: New York, 1976.

Robert, P.; Escoufier, Y. A unifying tool for linear multivariate statistical methods: the RV coefficient. Appl. Stat. 1976, 25, 257-265.

Schlich, P.; Issanchou, S.; Guichard, E.; Etievant, P.; Adda, J. RV coefficient: a new approach to select variables in PCA and to get correlations between sensory and instrumental data. In Flavour Science and Technology; Martens, M., Dalen, G.A., Russwurm, H. Jr., Eds.; Wiley: Chichester, 1987.

Schlich, P.; Guichard, E. Selection and classification of volatile compounds of apricot using the RV coefficient. J. Agric. Food Chem. 1989.

Traissac, P. Exploratory data analysis of a cube of data by the ACT (STATIS method) using SAS/IML and SAS/GRAPH software. In SEUGI '90 Proceedings (in press), 1990.

Address of the author: INRA, Laboratoire de Recherches sur les Arômes, 17 rue Sully, DIJON Cedex, FRANCE
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon
More informationCHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION
59 CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 4. INTRODUCTION Weighted average-based fusion algorithms are one of the widely used fusion methods for multi-sensor data integration. These methods
More informationPLS discriminant analysis for functional data
PLS discriminant analysis for functional data 1 Dept. de Statistique CERIM - Faculté de Médecine Université de Lille 2, 5945 Lille Cedex, France (e-mail: cpreda@univ-lille2.fr) 2 Chaire de Statistique
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationMachine Learning for Software Engineering
Machine Learning for Software Engineering Dimensionality Reduction Prof. Dr.-Ing. Norbert Siegmund Intelligent Software Systems 1 2 Exam Info Scheduled for Tuesday 25 th of July 11-13h (same time as the
More informationOverview of clustering analysis. Yuehua Cui
Overview of clustering analysis Yuehua Cui Email: cuiy@msu.edu http://www.stt.msu.edu/~cui A data set with clear cluster structure How would you design an algorithm for finding the three clusters in this
More informationMethods for territorial intelligence.
Methods for territorial intelligence. Serge Ormaux To cite this version: Serge Ormaux. Methods for territorial intelligence.. In International Conference of Territorial Intelligence, Sep 2006, Alba Iulia,
More informationFrank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 147 le-tex
Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c08 2013/9/9 page 147 le-tex 8.3 Principal Component Analysis (PCA) 147 Figure 8.1 Principal and independent components
More informationReview (Probability & Linear Algebra)
Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint
More informationPERFORMANCE OF THE EM ALGORITHM ON THE IDENTIFICATION OF A MIXTURE OF WAT- SON DISTRIBUTIONS DEFINED ON THE HYPER- SPHERE
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 111 130 PERFORMANCE OF THE EM ALGORITHM ON THE IDENTIFICATION OF A MIXTURE OF WAT- SON DISTRIBUTIONS DEFINED ON THE HYPER- SPHERE Authors: Adelaide
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,
More informationDISTATIS: The Analysis of Multiple Distance Matrices
DISTATIS: The Analysis of Multiple Distance Matrices Hervé Abdi The University of Texas at Dallas Alice J O Toole The University of Texas at Dallas Dominique Valentin Université de Bourgogne Betty Edelman
More informationCHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum
CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum 1997 19 CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION 3.0. Introduction
More informationPrincipal Component Analysis CS498
Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy
More informationSection 4.5 Eigenvalues of Symmetric Tridiagonal Matrices
Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Key Terms Symmetric matrix Tridiagonal matrix Orthogonal matrix QR-factorization Rotation matrices (plane rotations) Eigenvalues We will now complete
More informationBiplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint
Biplots in Practice MICHAEL GREENACRE Proessor o Statistics at the Pompeu Fabra University Chapter 6 Oprint Principal Component Analysis Biplots First published: September 010 ISBN: 978-84-93846-8-6 Supporting
More informationMATH 829: Introduction to Data Mining and Analysis Principal component analysis
1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional
More informationDimensionality Reduction Techniques (DRT)
Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,
More informationIntelligent Data Analysis. Principal Component Analysis. School of Computer Science University of Birmingham
Intelligent Data Analysis Principal Component Analysis Peter Tiňo School of Computer Science University of Birmingham Discovering low-dimensional spatial layout in higher dimensional spaces - 1-D/3-D example
More informationIndependent Component Analysis and Its Application on Accelerator Physics
Independent Component Analysis and Its Application on Accelerator Physics Xiaoying Pang LA-UR-12-20069 ICA and PCA Similarities: Blind source separation method (BSS) no model Observed signals are linear
More informationRECENT DEVELOPMENTS IN VARIANCE COMPONENT ESTIMATION
Libraries Conference on Applied Statistics in Agriculture 1989-1st Annual Conference Proceedings RECENT DEVELOPMENTS IN VARIANCE COMPONENT ESTIMATION R. R. Hocking Follow this and additional works at:
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationAlignment and Analysis of Proteomics Data using Square Root Slope Function Framework
Alignment and Analysis of Proteomics Data using Square Root Slope Function Framework J. Derek Tucker 1 1 Department of Statistics Florida State University Tallahassee, FL 32306 CTW: Statistics of Warpings
More informationMS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II
MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.
More information7. Variable extraction and dimensionality reduction
7. Variable extraction and dimensionality reduction The goal of the variable selection in the preceding chapter was to find least useful variables so that it would be possible to reduce the dimensionality
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationAuffray and Enjalbert 441 MODAL THEOREM PROVING : EQUATIONAL VIEWPOINT
MODAL THEOREM PROVING : EQUATIONAL VIEWPOINT Yves AUFFRAY Societe des avions M. Dassault 78 quai Marcel Dassault 92210 Saint-Cloud - France Patrice ENJALBERT * Laboratoire d'informatique University de
More informationUnsupervised Learning: Dimensionality Reduction
Unsupervised Learning: Dimensionality Reduction CMPSCI 689 Fall 2015 Sridhar Mahadevan Lecture 3 Outline In this lecture, we set about to solve the problem posed in the previous lecture Given a dataset,
More informationPLS classification of functional data
Computational Statistics (27) 22:223 235 DOI 1.17/s18-7-41-4 ORIGINAL PAPER PLS classification of functional data Cristian Preda Gilbert Saporta Caroline Lévéder Published online: 23 February 27 Springer-Verlag
More informationComments on the method of harmonic balance
Comments on the method of harmonic balance Ronald Mickens To cite this version: Ronald Mickens. Comments on the method of harmonic balance. Journal of Sound and Vibration, Elsevier, 1984, 94 (3), pp.456-460.
More informationShort Answer Questions: Answer on your separate blank paper. Points are given in parentheses.
ISQS 6348 Final exam solutions. Name: Open book and notes, but no electronic devices. Answer short answer questions on separate blank paper. Answer multiple choice on this exam sheet. Put your name on
More information4 Linear Algebra Review
Linear Algebra Review For this topic we quickly review many key aspects of linear algebra that will be necessary for the remainder of the text 1 Vectors and Matrices For the context of data analysis, the
More informationTHIS PAPER SHOWS how a single method
A Single Matrix Method for Several Problems By Alvin C. Egbert Matrix algebra has become a familiar research tool in recent years, but the teaching and learning problem is still formidable for many individuals.
More informationPrincipal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude
Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude Addy M. Boĺıvar Cimé Centro de Investigación en Matemáticas A.C. May 1, 2010
More information