INTRODUCTION TO MULTIVARIATE ANALYSIS (INTRODUCCIÓ A L'ANÀLISI MULTIVARIANT). Advanced Biomedical Statistics (Estadística Biomèdica Avançada). Ricardo Gonzalo Sanz, 13/07/2015
1 INTRODUCTION TO MULTIVARIATE ANALYSIS. Advanced Biomedical Statistics. Ricardo Gonzalo Sanz, ricardo.gonzalo@vhir.org, 13/07/2015
2 1. Introduction to Multivariate Analysis 2. Summary Statistics for Multivariate Data 3. Inference with Multivariate data 4. Principal Components analysis
4 1. Introduction to multivariate analysis. Difficult with only one variable... In real life most phenomena are complex and can rarely be described using a single variable. Examples: socioeconomic surveys, clinical studies, economic indices.
5 1. Introduction to multivariate analysis. Some examples...
Coronary heart study with 7 measured variables: arterial tension, age, weight, body surface, years suffering HT, pulse, stress.
Nutritional study with data on 29 fast-food products: price, weight, calories, protein, fat, saturated fat, sodium, iron, calcium, vitamin A, vitamin C, food type.
Risk prediction models for prostate cancer: race, age, sex, genetics, body mass index, family history of cancer, history of tobacco use, use of aspirin and nonsteroidal anti-inflammatory drugs (NSAIDs), physical activity, use of hormone replacement therapy, reproductive factors, history of cancer screening, and dietary factors.
6 1. Introduction to multivariate analysis. Some examples... Gene expression analysis with high-throughput techniques (microarrays, RNA-Seq, ...).
7 1. Introduction to multivariate analysis. Nowadays it is very common to hear about MVA. [Figure: newspaper clipping, SARA H. ASENADOR, DIARIO EXPANSIÓN, 27/06/2015]
8 1. Introduction to multivariate analysis. Difficult with only one variable... All these are examples of multidimensional data, which require multivariate statistical techniques. Multivariate data arise when researchers measure several variables on each unit in their sample. The majority of data sets collected by researchers in all disciplines are multivariate. In some cases it may make sense to isolate each variable and study it separately, but in the main it does not: only the simultaneous study of the variables will uncover the patterns in the data.
10 1. Introduction to multivariate analysis. How the data look. [Figure: two data matrices with observations (n) as rows and variables (K) as columns, one labelled "Univariate statistics" and one labelled "Multivariate statistics"; cases K>n and K<n]
11 1. Introduction to multivariate analysis. How the data look. [Figure: data matrix for multivariate statistics, observations (n) by variables (K); cases K>n and K<n]
12 1. Introduction to multivariate analysis. How the data look: the Huswif dataset. Columns: Couple, Hage, Hheight, Wage, Wheight, Hagefm.
# The observations are 10 married couples
# Hage: the husband's age (in years).
# Hheight: the husband's height (in mm).
# Wage: the wife's age (in years).
# Wheight: the wife's height (in mm).
# Hagefm: the husband's age (in years) at first marriage.
13 1. Introduction to multivariate analysis. How the data look: the Huswif dataset.
14 1. Introduction to multivariate analysis. How it can be studied. One approach is to group the techniques according to the goal of the analysis:
1. The goal is to model the relation between one or more independent (explanatory) variables and one or more dependent variables: Multiple Regression, Factor Analysis, Discriminant Analysis, ...
2. The goal is to model the relation within a group of variables where none of them has special relevance: Principal Components Analysis, Cluster Analysis, MDS, ...
17 2. Summary Statistics for Multivariate Data. Numeric summaries:
1. Summaries for each of the variables separately: means, variances.
2. Summaries of the relationships between the variables: covariances, correlations, distances.
18 2. Summary Statistics for Multivariate Data Mean: (huswif data set)
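19 2. Summary Statistics for Multivariate Data. Mean (huswif data set): table of mean, sd and n for Hage, Hagefm, Hheight, Wage and Wheight (values not preserved in this transcript). Variance: a measure of the spread of the values of a variable.
> apply(huswif,2,var)
(output: one variance each for Hage, Hheight, Wage, Wheight and Hagefm; values not preserved)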
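The per-variable summaries above can be sketched as follows. The huswif values are not reproduced in this transcript, so this sketch assumes the four numeric columns of R's built-in `iris` data as a stand-in; the object names are illustrative only.

```r
# Stand-in data (assumption): four numeric columns of the built-in iris data,
# used because the huswif values are not available here.
x <- iris[, 1:4]

col_means <- colMeans(x)        # mean of each variable
col_vars  <- apply(x, 2, var)   # sample variance of each variable (divisor n - 1)
col_sds   <- sqrt(col_vars)     # standard deviation of each variable

# One row per variable, one column per summary
print(round(cbind(mean = col_means, sd = col_sds, var = col_vars), 3))
```

With the real data, replacing `x` by `huswif` should give the table shown on the slide.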
20 2. Summary Statistics for Multivariate Data. Covariance: a measure of how two variables change together in the dataset. In the covariance matrix, the diagonal entries are the variances and the off-diagonal entries are the covariances.
> var(huswif)
(output: 5 x 5 covariance matrix for Hage, Hheight, Wage, Wheight and Hagefm; values not preserved)
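A minimal sketch of the structure of the covariance matrix, again assuming the built-in `iris` measurements as stand-in data (the huswif values are not available here). It checks that the diagonal holds the variances and that the matrix can be rebuilt from the centred data.

```r
# Stand-in data (assumption): numeric columns of the built-in iris data
x <- iris[, 1:4]

S <- var(x)   # covariance matrix; for a data frame this is the same as cov(x)

# The diagonal of S holds the variances of the individual variables
diag_ok <- isTRUE(all.equal(diag(S), apply(x, 2, var)))

# Manual reconstruction: S = Xc' Xc / (n - 1), with Xc the column-centred data
Xc <- scale(x, center = TRUE, scale = FALSE)
S_manual <- crossprod(Xc) / (nrow(x) - 1)

print(round(S, 3))
print(diag_ok)
print(isTRUE(all.equal(S, S_manual, check.attributes = FALSE)))
```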
21 2. Summary Statistics for Multivariate Data. Correlation: a measure of the strength and direction of the linear relationship between two variables. Values range from -1 to +1. It tells us whether two variables are linearly related; a correlation close to zero indicates no linear relationship, which does not by itself imply independence.
> cor(huswif)
(output: 5 x 5 correlation matrix for Hage, Hheight, Wage, Wheight and Hagefm; values not preserved)
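As a hedged illustration of how correlation relates to covariance, the sketch below (built-in `iris` data assumed as a stand-in for huswif) obtains the correlation matrix in three equivalent ways. The last lines use a made-up pair of variables, `u` and `v`, to show that a correlation near zero only rules out a linear relationship.

```r
# Stand-in data (assumption): numeric columns of the built-in iris data
x <- iris[, 1:4]

R             <- cor(x)            # correlation matrix
R_from_cov    <- cov2cor(cov(x))   # rescale the covariance matrix
R_from_scaled <- cov(scale(x))     # covariance of the standardized variables

print(round(R, 2))

# Hypothetical example: v depends completely on u, yet the linear
# correlation is (numerically) zero because the relation is not linear.
u <- seq(-1, 1, length.out = 101)
v <- u^2
print(cor(u, v))
```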
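22 2. Summary Statistics for Multivariate Data. Distances: the most common measure of distance is the Euclidean distance. For two observations i and j measured on K variables: $d_{ij} = \sqrt{\sum_{k=1}^{K} (x_{ik} - x_{jk})^2}$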
23 2. Summary Statistics for Multivariate Data Distances: > dist(scale(huswif))
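A small sketch of what `dist(scale(...))` computes, assuming the built-in `iris` data as a stand-in for huswif. The data are standardized first so that variables measured in different units contribute comparably; the distance between two observations is then the square root of the sum of squared differences.

```r
# Stand-in data (assumption): numeric columns of the built-in iris data
z <- scale(iris[, 1:4])      # standardize all variables first

d <- dist(z[1:5, ])          # Euclidean distances among the first 5 observations
print(round(d, 3))

# The same distance between observations 1 and 2, computed by hand
d12_manual <- sqrt(sum((z[1, ] - z[2, ])^2))
d12_dist   <- as.matrix(d)[1, 2]
print(c(manual = d12_manual, dist = d12_dist))
```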
24 2. Summary Statistics for Multivariate Data. Distances:
> plot(hclust(dist(scale(huswif))))
[Figure: cluster dendrogram of dist(scale(huswif)), hclust method "complete"; vertical axis: Height]
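The dendrogram call can be unpacked step by step. The following is a sketch on stand-in data (built-in `iris`, an assumption); the choice of `k = 3` groups is arbitrary and only for illustration.

```r
# Stand-in data (assumption): numeric columns of the built-in iris data
z  <- scale(iris[, 1:4])                      # standardize
hc <- hclust(dist(z), method = "complete")    # complete linkage is the default

# Cut the tree into a chosen number of groups (k = 3 is an arbitrary choice)
groups <- cutree(hc, k = 3)
print(table(groups))

# Draw the dendrogram only in an interactive session
if (interactive()) plot(hc, labels = FALSE, main = "Cluster Dendrogram")
```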
25 2. Summary Statistics for Multivariate Data Graphical summaries: Scatterplot
26 2. Summary Statistics for Multivariate Data. Graphical summaries: Boxplot
boxplot(scale(huswif), col="red")
27 2. Summary Statistics for Multivariate Data. Graphical summaries: Star plot
stars(huswif, full=TRUE, scale=TRUE, labels=c(1:10), key.loc=c(8,2), main="Star plot of Huswif dataset", draw.segments=TRUE)
Each star represents one couple in the dataset; each ray of the star is proportional to the value of one variable.
[Figure: star plot of Huswif dataset; key with rays Hheight, Hage, Wage, Wheight, Hagefm]
28 2. Summary Statistics for Multivariate Data. Graphical summaries: Biplot [Figure: biplot; visible labels: fuel, gear ratio, size]
29 2. Summary Statistics for Multivariate Data. Exercise. Description: Largemouth bass were studied in 53 different Florida lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March. The pH level and the amounts of chlorophyll, calcium and alkalinity were measured in each sample. The average of the August and March values was used in the analysis. Next, a sample of fish was taken from each lake, with sample sizes ranging from 4 to 44 fish. The age of each fish and the mercury concentration in its muscle tissue were measured. (Note: since fish absorb mercury over time, older fish will tend to have higher concentrations.) Thus, to make a fair comparison of the fish in different lakes, the investigators used a regression estimate of the expected mercury concentration in a three-year-old fish as the standardized value for each lake. Finally, in 10 of the 53 lakes the age of the individual fish could not be determined, and the average mercury concentration of the sampled fish was used instead of the standardized value. Dataset: Mercury.txt
30 2. Summary Statistics for Multivariate Data. Exercise. Variable names:
ID: ID number
Lake: name of the lake
Alkalinity: alkalinity (mg/l as calcium carbonate)
ph: pH
Calcium: calcium (mg/l)
Chlorophyll: chlorophyll (mg/l)
Avg_Mercury: average mercury concentration (parts per million) in the muscle tissue of the fish sampled from that lake
No.samples: how many fish were sampled from the lake
min: minimum mercury concentration among the sampled fish
max: maximum mercury concentration among the sampled fish
3_yr_Standard_mercury: regression estimate of the mercury concentration in a 3-year-old fish from the lake (or = Avg_Mercury when age data were not available)
age_data: indicator of the availability of age data on the fish sampled
31 2. Summary Statistics for Multivariate Data Exercise
32 2. Summary Statistics for Multivariate Data. Exercise. Numerical summary (mean, sd, IQR, quantiles 0%, 25%, 50%, 75%, 100%, and n) for age_data, Alkalinity, Avg_Mercury, Calcium, Chlorophyll, max, min, No.samples and ph (values not preserved in this transcript).
> apply(mercurio[,c(3:10)],2,var)
(output: variances of Alkalinity, ph, Calcium, Chlorophyll, Avg_Mercury, No.samples, min and max; values not preserved)
33 2. Summary Statistics for Multivariate Data
> var(mercurio[,c(3:12)])
(output: 10 x 10 covariance matrix for Alkalinity, ph, Calcium, Chlorophyll, Avg_Mercury, No.samples, min, max, X3_yr_Standard_Mercury and age_data; values not preserved)
34 2. Summary Statistics for Multivariate Data
> cor(mercurio[,c(3:12)])
(output: 10 x 10 correlation matrix for the same variables; values not preserved)
35 2. Summary Statistics for Multivariate Data
36 2. Summary Statistics for Multivariate Data boxplot(scale(mercurio[,c(3:12)]))
37 2. Summary Statistics for Multivariate Data > dist(scale(mercurio[,c(3:12)]))
38 2. Summary Statistics for Multivariate Data > plot(hclust(dist(scale(mercurio[,c(3:12)]))),labels=mercurio[,2])
40 3. Inference with Multivariate data. Hotelling and MANOVA tests. Hotelling's T² test: the multivariate analogue of the familiar Student's t test from univariate analysis. It tests for differences between the (multivariate) mean vectors of two populations. MANOVA: the multivariate analogue of the univariate ANOVA. It compares the means of several variables across different populations.
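As an illustrative sketch (not taken from the slides), the two-sample Hotelling's T² statistic can be computed by hand and converted to an F statistic, and a MANOVA can be fitted with `manova()` from base R. The built-in `iris` data, the two chosen variables and the species groups are assumptions made only for this example.

```r
# Two-sample Hotelling's T^2 by hand; groups and variables are illustrative choices
g1 <- as.matrix(iris[iris$Species == "setosa",     1:2])
g2 <- as.matrix(iris[iris$Species == "versicolor", 1:2])
n1 <- nrow(g1); n2 <- nrow(g2); p <- ncol(g1)

d        <- colMeans(g1) - colMeans(g2)   # difference of the mean vectors
S_pooled <- ((n1 - 1) * cov(g1) + (n2 - 1) * cov(g2)) / (n1 + n2 - 2)

T2      <- as.numeric(t(d) %*% solve(S_pooled * (1 / n1 + 1 / n2)) %*% d)
F_stat  <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
p_value <- pf(F_stat, p, n1 + n2 - p - 1, lower.tail = FALSE)
print(c(T2 = T2, F = F_stat, p.value = p_value))

# MANOVA: the same two variables compared across all three species
fit <- manova(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
print(summary(fit, test = "Wilks"))
```

The F conversion assumes multivariate normality and a common covariance matrix in both groups.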
42 4. Principal Component Analysis. Definition. [Figure: PCA plot of teams] PC1 = 37%: teams towards the top of the graph typically concede more shots and win more aerial duels, while further down teams attempt more short passes with greater accuracy. PC2 = 18%: teams further to the right of the graph attempt more tackles, interceptions and dribbles.
43 4. Principal Component Analysis. Definition. Given a KxN data matrix containing K (correlated) measurements on N samples (objects/individuals), PCA decomposes the data matrix into K new components that:
account for different sources of variability in the data;
are uncorrelated, that is, each component accounts for a different source of variability;
have decreasing explanatory ability: each component explains more than the next;
allow for a lower-dimensional representation of the data in terms of scores on the principal components;
provide an overview of the dominant patterns and major trends in the data (visualize clusters, identify outliers).
44 4. Principal Component Analysis. How does PCA work? We have a data frame of absorbance values for 30 retention times and 28 wavelengths in an HPLC experiment. A principal component analysis consists of a repetitive process of using linear regression to find a new set of axes that are better aligned with the data. This is accomplished by fitting a straight line to the 30 data points, with the resulting linear regression model giving a new axis that best explains the data. This axis represents some unknown factor that has the power to explain a significant portion of the variation in the data. Next, the 30 data points are projected onto the 27-dimensional surface that is perpendicular to the regression line, and the process of regression and projection continues until there is a complete set of 28 new axes, each representing an unknown factor of lesser importance than those preceding it.
45 4. Principal Component Analysis. How does PCA work? Because the data are correlated, it is difficult to separate each source of variability. If K were much larger, it would be even more difficult.
46 4. Principal Component Analysis. How does PCA work? Transform the data: center each variable by subtracting its mean, and scale each variable by dividing by its SD. All variables are now comparable: mean = 0, SD = 1.
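The transformation described above corresponds to what `scale()` does by default. A minimal sketch, assuming the built-in `iris` measurements as example data:

```r
# Example data (assumption): numeric columns of the built-in iris data
x <- iris[, 1:4]

z <- scale(x)   # center each column by its mean, then divide by its SD

# The same transformation written out explicitly
z_manual <- sweep(sweep(as.matrix(x), 2, colMeans(x), "-"),
                  2, apply(x, 2, sd), "/")

print(round(colMeans(z), 10))   # all means are 0 (up to rounding)
print(apply(z, 2, sd))          # all SDs are 1
```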
47 4. Principal Component Analysis. How does PCA work? First principal component: a linear combination of all the original variables that lies along the direction of highest variability in the data and therefore explains the maximum amount of variation in the data.
48 4. Principal Component Analysis. How does PCA work? Second principal component: a linear combination of all the original variables that lies along the next direction of highest variability in the data, orthogonal to the first PC, and explains the maximum amount of the remaining variation. Successive PCs describe decreasing amounts of the remaining variation.
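One common way to obtain these directions, shown here as a sketch rather than as the method used in the slides, is the eigendecomposition of the correlation matrix: the eigenvectors give the orthogonal directions and the eigenvalues give the variance along each. The built-in `iris` data are assumed as example data.

```r
# Example data (assumption): numeric columns of the built-in iris data
x <- iris[, 1:4]

e <- eigen(cor(x))      # eigenvectors = PC directions, eigenvalues = PC variances
print(round(e$values, 4))   # returned in decreasing order

# The directions are mutually orthogonal unit vectors: V'V = I
print(round(crossprod(e$vectors), 10))

# prcomp() on standardized data gives the same variances
pca <- prcomp(x, center = TRUE, scale. = TRUE)
print(round(pca$sdev^2, 4))
```

For standardized data the eigenvalues add up to the number of variables, since each variable contributes variance 1.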
49 4. Principal Component Analysis. How does PCA work? PCA provides a new set of coordinates for the observations. Original coordinates: the values of the variables. New coordinates: the values of the PCs, called scores. Scores are the new coordinates in the orthogonal system defined by the PCs. [Figure axes: X1, X2]
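To make the idea of scores concrete, the sketch below (built-in `iris` data assumed) projects the standardized data onto the PC directions and compares the result with the scores returned by `prcomp()`. It also checks that the scores on different PCs are uncorrelated.

```r
# Example data (assumption): numeric columns of the built-in iris data
x   <- iris[, 1:4]
pca <- prcomp(x, center = TRUE, scale. = TRUE)

z <- scale(x)                       # original coordinates, standardized
scores_manual <- z %*% pca$rotation # new coordinates: projection onto the PCs

print(head(round(pca$x, 3)))        # scores as returned by prcomp()
print(isTRUE(all.equal(scores_manual, pca$x, check.attributes = FALSE)))

# Scores on different PCs are uncorrelated
print(round(cor(pca$x), 10))
```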
50 4. Principal Component Analysis. How does PCA work? PCs have been derived so that they are orthogonal and each PC explains the maximum amount of the remaining variation in the data. This means that it is not necessary to use all PCs to visualize the data in this new coordinate system. The first few PCs will often explain a high percentage of the variability; usually only the first 2 or 3 are needed. This should always be checked!
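Checking how many PCs are needed can be sketched as follows, again on the built-in `iris` data (an assumption). The 80% threshold is an arbitrary illustrative choice, not a rule from the slides.

```r
# Example data (assumption): numeric columns of the built-in iris data
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

prop <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per PC
cum  <- cumsum(prop)                   # cumulative proportion
print(round(rbind(proportion = prop, cumulative = cum), 3))

# Smallest number of PCs reaching an (arbitrary) 80% threshold
k <- which(cum >= 0.80)[1]
print(k)

# A scree plot shows the same information graphically
if (interactive()) screeplot(pca, type = "lines")
```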
51 4. Principal Component Analysis. How does PCA work? PCs can be interpreted by looking at which of the original variables contribute most to their variability. The more strongly a variable is correlated with a PC, the greater its influence. Size of the contribution of each variable: loadings. Loadings are the cosines of the angles between the variables and the PCs.
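A note on terminology, offered as a hedged aside: in R, `prcomp()` and `princomp()` report unit-length eigenvector coefficients (the `rotation` or `loadings` component). For standardized data, the correlation between a variable and a PC equals that coefficient multiplied by the PC's standard deviation. The sketch below checks this on the built-in `iris` data (an assumption).

```r
# Example data (assumption): numeric columns of the built-in iris data
x   <- iris[, 1:4]
pca <- prcomp(x, center = TRUE, scale. = TRUE)
z   <- scale(x)

# Correlations between the original variables and the PC scores
cor_var_pc <- cor(z, pca$x)

# The same values from the eigenvector coefficients: v_jk * sd(PC_k)
scaled_rotation <- sweep(pca$rotation, 2, pca$sdev, "*")

print(round(cor_var_pc, 3))
print(isTRUE(all.equal(cor_var_pc, scaled_rotation, check.attributes = FALSE)))
```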
52 4. Principal Component Analysis. How does PCA work? Summary: PCA performs a transformation into a new set of orthogonal coordinates with decreasing ability (most, 1st PC, to least, last PC) to explain the observed variability. A PCA provides:
the % of variance explained by each PC;
loadings: correlations between PCs and variables, used to (try to) interpret what the PCs mean;
scores: values of the observations in the PC system of coordinates, used to plot the observations in reduced dimension.
54 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
55 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Summary: table of mean, sd, IQR, quantiles (0%, 25%, 50%, 75%, 100%) and n for AnysHT, Edat, Estrés, Pes, Pols and PressioArt (values not preserved in this transcript).
56 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
57 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
58 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Component loadings: table of Comp.1 to Comp.7 for AnysHT, Edat, Estrés, Pes, Pols, PressioArt and SupCorp (values not preserved in this transcript). Loadings: correlations between PCs and variables. Use these to (try to) interpret what the PCs mean. They serve as a guide to quantify how important a given variable is in a component.
59 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Component variances and importance of components: standard deviation, proportion of variance and cumulative proportion for Comp.1 to Comp.7 (values not preserved in this transcript). Each component, in order, explains more variability than the one that follows it. The analysis provides: the component variances; the percentage of variability explained by each component; the scree plot, to guide the decision on how many components should be retained in order to provide a good explanation of the data in reduced dimension.
60 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
61 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). PC values can be used to plot the data. The plot can be used as a guide to interpret the main sources of variability. In Rcmdr, if the option "Add principal components to dataset" has been selected, the plot is made as usual by selecting these new variables.
62 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
63 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
64 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)
65 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt)
# variable names as they appear in the loadings table; R is case-sensitive
.PC <- princomp(~AnysHT+Edat+Estrés+Pes+Pols+PressioArt+SupCorp, cor=TRUE, data=coronari)
# text() adds labels to a scores plot that is already open
text(.PC$scores[,1], .PC$scores[,2], coronari[,7])
66 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt)
biplot(.PC)
67 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Compute the correlation matrix of the combined dataset. Look at the correlations between the PCs and the original variables.
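Since RiscCoronari.txt is not available here, the following sketch runs the same kind of `princomp()` workflow on the built-in `iris` data (an assumption; the object name `pc` is illustrative). With `cor = TRUE` the analysis is based on the correlation matrix, so the component variances are its eigenvalues.

```r
# Stand-in data (assumption): built-in iris data instead of RiscCoronari.txt
pc <- princomp(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = iris, cor = TRUE)

print(summary(pc))              # standard deviation and proportion of variance
print(unclass(pc$loadings))     # coefficients of each variable in each component
print(head(round(pc$scores[, 1:2], 3)))

# Plot the observations on the first two PCs, labelled by a column of the data
if (interactive()) {
  plot(pc$scores[, 1], pc$scores[, 2], type = "n",
       xlab = "Comp.1", ylab = "Comp.2")
  text(pc$scores[, 1], pc$scores[, 2], labels = iris$Species, cex = 0.6)
  biplot(pc)
}
```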
68 4. Principal Component Analysis. Exercise. Data: Obreros.csv
69 4. Principal Component Analysis RESULTS
70 4. Principal Component Analysis [Figure: scree plot of .PC; vertical axis: Variances; horizontal axis: Comp.1 to Comp.7]
71 4. Principal Component Analysis Principal components interpretation (loadings) PC in the dataset:
72 4. Principal Component Analysis Correlation between PC and variables
73 4. Principal Component Analysis. CONCLUSIONS. PC1 separates the families by number of children and by economic factors. PC2 separates CA families and other families with few children from the rest.
74 4. Principal Component Analysis plot(hclust(dist(scale(obreros[,c(2:7)]))),labels=obreros[,1])
75 4. Principal Component Analysis
biplot(.PC)
text(.PC$scores[,1], .PC$scores[,2], obreros[,1])
Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of
More informationAnnouncements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)
Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,
More informationUnconstrained Ordination
Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)
More informationMultivariate calibration
Multivariate calibration What is calibration? Problems with traditional calibration - selectivity - precision 35 - diagnosis Multivariate calibration - many signals - multivariate space How to do it? observed
More informationPractice Questions for Exam 1
Practice Questions for Exam 1 1. A used car lot evaluates their cars on a number of features as they arrive in the lot in order to determine their worth. Among the features looked at are miles per gallon
More information7 Principal Components and Factor Analysis
7 Principal Components and actor nalysis 7.1 Principal Components a oal. Relationships between two variables can be graphically well captured in a meaningful way. or three variables this is also possible,
More informationSTAT 730 Chapter 1 Background
STAT 730 Chapter 1 Background Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 27 Logistics Course notes hopefully posted evening before lecture,
More informationStatistical Analysis. G562 Geometric Morphometrics PC 2 PC 2 PC 3 PC 2 PC 1. Department of Geological Sciences Indiana University
PC 2 PC 2 G562 Geometric Morphometrics Statistical Analysis PC 2 PC 1 PC 3 Basic components of GMM Procrustes Whenever shapes are analyzed together, they must be superimposed together This aligns shapes
More informationAlgebra, Functions, and Data Analysis Vocabulary Cards
Algebra, Functions, and Data Analysis Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers
More informationBivariate data analysis
Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green
More informationAn Introduction to Multivariate Methods
Chapter 12 An Introduction to Multivariate Methods Multivariate statistical methods are used to display, analyze, and describe data on two or more features or variables simultaneously. I will discuss multivariate
More informationQuantitative Understanding in Biology Short Course Session 9 Principal Components Analysis
Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis Jinhyun Ju Jason Banfelder Luce Skrabanek June 21st, 218 1 Preface For the last session in this course, we ll
More informationAlgebra II Vocabulary Cards
Algebra II Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers Complex Number (examples)
More informationHandout #8: Matrix Framework for Simple Linear Regression
Handout #8: Matrix Framework for Simple Linear Regression Example 8.1: Consider again the Wendy s subset of the Nutrition dataset that was initially presented in Handout #7. Assume the following structure
More informationPCA Advanced Examples & Applications
PCA Advanced Examples & Applications Objectives: Showcase advanced PCA analysis: - Addressing the assumptions - Improving the signal / decreasing the noise Principal Components (PCA) Paper II Example:
More informationChapter 12 Summarizing Bivariate Data Linear Regression and Correlation
Chapter 1 Summarizing Bivariate Data Linear Regression and Correlation This chapter introduces an important method for making inferences about a linear correlation (or relationship) between two variables,
More informationApplication of mathematical, statistical, graphical or symbolic methods to maximize chemical information.
Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information. -However, this definition can be expanded to include: biology (biometrics), environmental science
More informationMultivariate Analysis of Variance
Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)
More informationExperimental design. Matti Hotokka Department of Physical Chemistry Åbo Akademi University
Experimental design Matti Hotokka Department of Physical Chemistry Åbo Akademi University Contents Elementary concepts Regression Validation Hypotesis testing ANOVA PCA, PCR, PLS Clusters, SIMCA Design
More informationLooking at Data Relationships. 2.1 Scatterplots W. H. Freeman and Company
Looking at Data Relationships 2.1 Scatterplots 2012 W. H. Freeman and Company Here, we have two quantitative variables for each of 16 students. 1) How many beers they drank, and 2) Their blood alcohol
More informationMultivariate data analysis (MVA) - Introduction
Multivariate data analysis (MVA) - Introduction Introduction t 2 Univariate/Multivariate Latent variables Projections t 1 PCA Examples 10/9/2012 MVA intro 2008 H. Antti 1 Chemical and Biological data are
More informationPre-Calculus Multiple Choice Questions - Chapter S8
1 If every man married a women who was exactly 3 years younger than he, what would be the correlation between the ages of married men and women? a Somewhat negative b 0 c Somewhat positive d Nearly 1 e
More informationMultivariate and Multivariable Regression. Stella Babalola Johns Hopkins University
Multivariate and Multivariable Regression Stella Babalola Johns Hopkins University Session Objectives At the end of the session, participants will be able to: Explain the difference between multivariable
More informationExploratory Factor Analysis and Canonical Correlation
Exploratory Factor Analysis and Canonical Correlation 3 Dec 2010 CPSY 501 Dr. Sean Ho Trinity Western University Please download: SAQ.sav Outline for today Factor analysis Latent variables Correlation
More informationAP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation
Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationsphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19
additive tree structure, 10-28 ADDTREE, 10-51, 10-53 EXTREE, 10-31 four point condition, 10-29 ADDTREE, 10-28, 10-51, 10-53 adjusted R 2, 8-7 ALSCAL, 10-49 ANCOVA, 9-1 assumptions, 9-5 example, 9-7 MANOVA
More informationStatistics Toolbox 6. Apply statistical algorithms and probability models
Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of
More informationSCHOOL OF MATHEMATICS AND STATISTICS
Data provided: Graph Paper MAS6011 SCHOOL OF MATHEMATICS AND STATISTICS Dependent Data Spring Semester 2016 2017 3 hours Marks will be awarded for your best five answers. RESTRICTED OPEN BOOK EXAMINATION
More informationNew Interpretation of Principal Components Analysis
Zeszyty Naukowe WWSI, No 16, Vol 11, 2017, pp 43-65 New Interpretation of Principal Components Analysis Zenon Gniazdowski * Warsaw School of Computer Science Abstract A new look on the principal component
More informationSTATISTICS 479 Exam II (100 points)
Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the
More information1 Interpretation. Contents. Biplots, revisited. Biplots, revisited 2. Biplots, revisited 1
Biplots, revisited 1 Biplots, revisited 2 1 Interpretation Biplots, revisited Biplots show the following quantities of a data matrix in one display: Slide 1 Ulrich Kohler kohler@wz-berlin.de Slide 3 the
More informationLatent Variable Methods Course
Latent Variable Methods Course Learning from data Instructor: Kevin Dunn kevin.dunn@connectmv.com http://connectmv.com Kevin Dunn, ConnectMV, Inc. 2011 Revision: 269:35e2 compiled on 15-12-2011 ConnectMV,
More informationFreeman (2005) - Graphic Techniques for Exploring Social Network Data
Freeman (2005) - Graphic Techniques for Exploring Social Network Data The analysis of social network data has two main goals: 1. Identify cohesive groups 2. Identify social positions Moreno (1932) was
More informationUNIT 12 ~ More About Regression
***SECTION 15.1*** The Regression Model When a scatterplot shows a relationship between a variable x and a y, we can use the fitted to the data to predict y for a given value of x. Now we want to do tests
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 2017-2018 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI
More informationRevision: Chapter 1-6. Applied Multivariate Statistics Spring 2012
Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing
More informationClusters. Unsupervised Learning. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved
Clusters Unsupervised Learning Luc Anselin http://spatial.uchicago.edu 1 curse of dimensionality principal components multidimensional scaling classical clustering methods 2 Curse of Dimensionality 3 Curse
More information11 Correlation and Regression
Chapter 11 Correlation and Regression August 21, 2017 1 11 Correlation and Regression When comparing two variables, sometimes one variable (the explanatory variable) can be used to help predict the value
More information22 Approximations - the method of least squares (1)
22 Approximations - the method of least squares () Suppose that for some y, the equation Ax = y has no solutions It may happpen that this is an important problem and we can t just forget about it If we
More informationAlgebra I Vocabulary Cards
Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression
More informationWolfgang Karl Härdle Leopold Simar. Applied Multivariate. Statistical Analysis. Fourth Edition. ö Springer
Wolfgang Karl Härdle Leopold Simar Applied Multivariate Statistical Analysis Fourth Edition ö Springer Contents Part I Descriptive Techniques 1 Comparison of Batches 3 1.1 Boxplots 4 1.2 Histograms 11
More informationSimple Linear Regression
Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)
More informationCHAPTER 10. Regression and Correlation
CHAPTER 10 Regression and Correlation In this Chapter we assess the strength of the linear relationship between two continuous variables. If a significant linear relationship is found, the next step would
More informationG E INTERACTION USING JMP: AN OVERVIEW
G E INTERACTION USING JMP: AN OVERVIEW Sukanta Dash I.A.S.R.I., Library Avenue, New Delhi-110012 sukanta@iasri.res.in 1. Introduction Genotype Environment interaction (G E) is a common phenomenon in agricultural
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationLecture 4: Principal Component Analysis and Linear Dimension Reduction
Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More information