INTRODUCTION TO MULTIVARIATE ANALYSIS. Advanced Biomedical Statistics. Ricardo Gonzalo Sanz 13/07/2015

1 INTRODUCTION TO MULTIVARIATE ANALYSIS. Advanced Biomedical Statistics. Ricardo Gonzalo Sanz ricardo.gonzalo@vhir.org 13/07/2015

2 1. Introduction to Multivariate Analysis 2. Summary Statistics for Multivariate Data 3. Inference with Multivariate data 4. Principal Components analysis

4 1. Introduction to multivariate analysis. Difficult with only one variable... In real life most phenomena are complex and can rarely be described using a single variable: socio-economic surveys, clinical studies, economic indices.

5 1. Introduction to multivariate analysis. Some examples... A coronary heart study measured 7 variables: arterial tension, age, weight, body surface, years suffering hypertension (HT), pulse and stress. Nutritional study: data on 29 fast-food products: price, weight, calories, protein, fat, saturated fat, sodium, iron, calcium, vitamin A, vitamin C and food type. Risk prediction models for prostate cancer: race, age, sex, genetics, body mass index, family history of cancer, history of tobacco use, use of aspirin and nonsteroidal anti-inflammatory drugs (NSAIDs), physical activity, use of hormone replacement therapy, reproductive factors, history of cancer screening, and dietary factors.

6 1. Introduction to multivariate analysis. Some examples... Gene expression analysis with high-throughput techniques (microarrays, RNA-Seq, ...).

7 1. Introduction to multivariate analysis. Nowadays it is very common to hear about MVA. [Newspaper article: Sara H. Asenador, Diario Expansión, 27/06/2015.]

8 1. Introduction to multivariate analysis. Difficult with only one variable... All these are examples of multidimensional data, which require multivariate statistical techniques to deal with them. Multivariate data arise when researchers measure several variables on each unit in their sample. The majority of data sets collected by researchers in all disciplines are multivariate. In some cases it may make sense to isolate each variable and study it separately, but in most cases it does not: only the simultaneous study of the variables will uncover the patterns in the data.

10 1. Introduction to multivariate analysis. How the data looks: [Figure: data matrices of n observations by K variables for univariate and for multivariate statistics; multivariate cases K > n and K < n.]

11 1. Introduction to multivariate analysis. How the data looks: [Figure: multivariate data matrix of n observations by K variables; cases K > n and K < n.]

12 1. Introduction to multivariate analysis. How the data looks: Huswif dataset (columns Couple, Hage, Hheight, Wage, Wheight, Hagefm).
# The observations are 10 married couples
# Hage: the husband's age (in years).
# Hheight: the husband's height (in mm).
# Wage: the wife's age (in years).
# Wheight: the wife's height (in mm).
# Hagefm: the husband's age (in years) at first marriage.

13 1. Introduction to multivariate analysis How the data looks: Huswif dataset.

14 1. Introduction to multivariate analysis. How can it be studied? One approach is to group the techniques according to the goal: 1. Model the relationship between one or more independent (explanatory) variables and one or more dependent variables: multiple regression, factor analysis, discriminant analysis, ... 2. Model the relationships within a group of variables where none of them has special relevance: principal components analysis, cluster analysis, multidimensional scaling (MDS), ...

17 2. Summary Statistics for Multivariate Data. Numeric summaries: 1. Summaries for each of the variables separately: means, variances. 2. Summaries of the relationships between the variables: covariances, correlations, distances.

18 2. Summary Statistics for Multivariate Data Mean: (huswif data set)

19 2. Summary Statistics for Multivariate Data. Mean (huswif dataset): [Table: mean, sd and n of Hage, Hagefm, Hheight, Wage and Wheight.] Variance: a measure of the spread of a variable's values. > apply(huswif, 2, var) [Output: variances of Hage, Hheight, Wage, Wheight and Hagefm.]

20 2. Summary Statistics for Multivariate Data. Covariance: a measure of how two variables change together in the dataset. > var(huswif) [Output: 5 x 5 covariance matrix of Hage, Hheight, Wage, Wheight and Hagefm; the diagonal holds the variances and the off-diagonal entries the covariances.]
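
A minimal R sketch of what var() computes, assuming a small made-up matrix in place of huswif (the values are illustrative only): centre each column on its mean and take $S = \frac{1}{n-1} X_c^T X_c$. The diagonal of S should match apply(X, 2, var) and the off-diagonal entries are the covariances.

```r
# Made-up 5 x 3 data matrix (illustrative values, not the huswif data)
X <- matrix(c(25, 1700, 23,
              31, 1750, 29,
              40, 1680, 38,
              52, 1810, 49,
              60, 1730, 57),
            nrow = 5, byrow = TRUE,
            dimnames = list(NULL, c("Hage", "Hheight", "Wage")))

n    <- nrow(X)
xbar <- colMeans(X)              # mean vector: one mean per variable
Xc   <- sweep(X, 2, xbar)        # centre each column (subtract its mean)
S    <- crossprod(Xc) / (n - 1)  # t(Xc) %*% Xc / (n - 1)

print(round(S, 1))
# Diagonal entries are the variances, off-diagonal entries the covariances
all.equal(S, var(X))                                   # TRUE
all.equal(unname(diag(S)), unname(apply(X, 2, var)))   # TRUE
```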

21 2. Summary Statistics for Multivariate Data. Correlation: a measure of the strength and direction of the linear relationship between two variables. It tells us whether two variables are linearly related; values range from -1 to +1. A correlation close to 0 means there is no linear relationship, which does not by itself imply that the variables are independent. > cor(huswif) [Output: 5 x 5 correlation matrix of Hage, Hheight, Wage, Wheight and Hagefm.]
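
A short R sketch, on made-up values, of two points from this slide: a correlation matrix is the covariance matrix rescaled by the standard deviations, and a correlation of 0 does not imply independence (here y = x^2 is fully determined by x, yet cor(x, y) is 0).

```r
# Illustrative values (not from the slides): x symmetric around 0 and y = x^2
x <- c(-3, -2, -1, 0, 1, 2, 3)
y <- x^2                        # y is completely determined by x ...
r_xy <- cor(x, y)               # ... yet their linear correlation is 0
r_xy

# A correlation matrix is the covariance matrix rescaled by the standard deviations:
# R = D^(-1/2) S D^(-1/2), where D holds the variances on its diagonal
M <- cbind(x = x, y = y, z = c(2, 1, 4, 3, 6, 5, 8))
S <- var(M)
D_inv_sqrt <- diag(1 / sqrt(diag(S)))
R <- D_inv_sqrt %*% S %*% D_inv_sqrt
dimnames(R) <- dimnames(S)
all.equal(R, cor(M))            # TRUE
all.equal(R, cov2cor(S))        # TRUE (stats::cov2cor does the same rescaling)
```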

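22 2. Summary Statistics for Multivariate Data. Distances: the most common measure of distance is the Euclidean distance between observations $i$ and $j$ measured on $K$ variables: $d(x_i, x_j) = \sqrt{\sum_{k=1}^{K} (x_{ik} - x_{jk})^2}$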

23 2. Summary Statistics for Multivariate Data Distances: > dist(scale(huswif))
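
A sketch checking the Euclidean distance formula against dist(), assuming a tiny made-up matrix; euclid() is a hypothetical helper written only for this illustration.

```r
# Made-up 3 x 2 data matrix: 3 observations measured on 2 variables
X <- matrix(c(1, 2,
              4, 6,
              0, 0), nrow = 3, byrow = TRUE)

# Euclidean distance between two observations a and b
euclid <- function(a, b) sqrt(sum((a - b)^2))

d12 <- euclid(X[1, ], X[2, ])   # sqrt(3^2 + 4^2) = 5
D   <- as.matrix(dist(X))       # dist() uses method = "euclidean" by default
D[1, 2]                         # 5, same as d12

# When variables have very different units, standardize first, as in dist(scale(huswif))
Dz <- dist(scale(X))
```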

24 2. Summary Statistics for Multivariate Data. Distances: > plot(hclust(dist(scale(huswif)))) [Figure: cluster dendrogram of dist(scale(huswif)), hclust (*, "complete"); y-axis: Height.]

25 2. Summary Statistics for Multivariate Data Graphical summaries: Scatterplot

26 2. Summary Statistics for Multivariate Data. Graphical summaries: boxplot. > boxplot(scale(huswif), col="red")

27 2. Summary Statistics for Multivariate Data. Graphical summaries: star plot. > stars(huswif, full=TRUE, scale=TRUE, labels=c(1:10), key.loc=c(8,2), main="Star plot of Huswif dataset", draw.segments=TRUE) Each star represents one couple of the dataset; each ray in the star is proportional to one variable (Hage, Hheight, Wage, Wheight, Hagefm). [Figure: star plot of Huswif dataset.]

28 2. Summary Statistics for Multivariate Data. Graphical summaries: biplot. [Figure: biplot; labelled directions include fuel, gear ratio and size.]

29 2. Summary Statistics for Multivariate Data. Exercise. Description: largemouth bass were studied in 53 different Florida lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March. The pH level and the amounts of chlorophyll, calcium and alkalinity were measured in each sample. The average of the August and March values was used in the analysis. Next, a sample of fish was taken from each lake, with sample sizes ranging from 4 to 44 fish. The age of each fish and the mercury concentration in its muscle tissue were measured. (Note: since fish absorb mercury over time, older fish tend to have higher concentrations.) Thus, to make a fair comparison of the fish in different lakes, the investigators used a regression estimate of the expected mercury concentration in a three-year-old fish as the standardized value for each lake. Finally, in 10 of the 53 lakes the age of the individual fish could not be determined, and the average mercury concentration of the sampled fish was used instead of the standardized value. Dataset: Mercury.txt

30 2. Summary Statistics for Multivariate Data. Exercise. Variable names:
ID: ID number
Lake: name of the lake
Alkalinity: alkalinity (mg/l as calcium carbonate)
ph: pH
Calcium: calcium (mg/l)
Chlorophyll: chlorophyll (mg/l)
Avg_Mercury: average mercury concentration (parts per million) in the muscle tissue of the fish sampled from that lake
No.samples: how many fish were sampled from the lake
min: minimum mercury concentration amongst the sampled fish
max: maximum mercury concentration amongst the sampled fish
3_yr_Standard_mercury: regression estimate of the mercury concentration in a 3-year-old fish from the lake (or = Avg_Mercury when age data was not available)
age_data: indicator of the availability of age data on the fish sampled

31 2. Summary Statistics for Multivariate Data Exercise

32 2. Summary Statistics for Multivariate Data. Exercise. [Table: mean, sd, IQR, quantiles (0%, 25%, 50%, 75%, 100%) and n for age_data, Alkalinity, Avg_Mercury, Calcium, Chlorophyll, max, min, No.samples and ph.] > apply(mercurio[,c(3:10)], 2, var) [Output: variances of Alkalinity, ph, Calcium, Chlorophyll, Avg_Mercury, No.samples, min and max.]

33 2. Summary Statistics for Multivariate Data. > var(mercurio[,c(3:12)]) [Output: 10 x 10 covariance matrix of Alkalinity, ph, Calcium, Chlorophyll, Avg_Mercury, No.samples, min, max, X3_yr_Standard_Mercury and age_data.]

34 2. Summary Statistics for Multivariate Data. > cor(mercurio[,c(3:12)]) [Output: 10 x 10 correlation matrix of Alkalinity, ph, Calcium, Chlorophyll, Avg_Mercury, No.samples, min, max, X3_yr_Standard_Mercury and age_data.]

35 2. Summary Statistics for Multivariate Data

36 2. Summary Statistics for Multivariate Data boxplot(scale(mercurio[,c(3:12)]))

37 2. Summary Statistics for Multivariate Data > dist(scale(mercurio[,c(3:12)]))

38 2. Summary Statistics for Multivariate Data > plot(hclust(dist(scale(mercurio[,c(3:12)]))),labels=mercurio[,2])

40 3. Inference with Multivariate data. Hotelling and MANOVA tests. Hotelling's T² test: the multivariate analogue of the familiar Student's t test from univariate analysis. It tests for differences between the (multivariate) mean vectors of two populations. MANOVA: the multivariate analogue of univariate ANOVA. It compares the mean vectors of several variables across several populations.
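
A minimal sketch of the two-sample Hotelling's T² test in base R, assuming multivariate normal data and a common covariance matrix in both populations. hotelling_t2() is a hypothetical helper (not a function from an R package) and the simulated groups are illustrative. It uses the pooled covariance $S_p$, the statistic $T^2 = \frac{n_1 n_2}{n_1+n_2}(\bar{x}_1-\bar{x}_2)^T S_p^{-1}(\bar{x}_1-\bar{x}_2)$, and its conversion to an F statistic with $p$ and $n_1+n_2-p-1$ degrees of freedom.

```r
# Two-sample Hotelling's T^2 test (sketch; assumes equal covariance matrices)
hotelling_t2 <- function(X1, X2) {
  X1 <- as.matrix(X1); X2 <- as.matrix(X2)
  n1 <- nrow(X1); n2 <- nrow(X2); p <- ncol(X1)
  d  <- colMeans(X1) - colMeans(X2)                                 # difference of mean vectors
  Sp <- ((n1 - 1) * var(X1) + (n2 - 1) * var(X2)) / (n1 + n2 - 2)   # pooled covariance
  T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp, d))
  df1 <- p; df2 <- n1 + n2 - p - 1
  Fstat <- df2 / (p * (n1 + n2 - 2)) * T2                           # T^2 rescaled to an F statistic
  list(T2 = T2, F = Fstat, df = c(df1, df2),
       p.value = pf(Fstat, df1, df2, lower.tail = FALSE))
}

# Illustrative simulated data (not from the slides): 3 variables, two groups
set.seed(1)
g1 <- matrix(rnorm(20 * 3), ncol = 3)
g2 <- matrix(rnorm(25 * 3, mean = 0.5), ncol = 3)
res <- hotelling_t2(g1, g2)
print(res)
```

With a single variable (p = 1) the statistic reduces to the square of the pooled two-sample t statistic, which gives a quick sanity check. For more than two groups, base R's manova() followed by summary() (for example with test = "Pillai" or "Wilks") runs a MANOVA.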

42 4. Principal Component Analysis. Definition. PC1 = 37%: teams towards the top of the graph typically concede more shots and win more aerial duels, while, as you move down, teams attempt more short passes with greater accuracy. PC2 = 18%: teams further to the right of the graph attempt more tackles, interceptions and dribbles.

43 4. Principal Component Analysis. Definition. Given a KxN data matrix containing K (correlated) measurements on N samples (objects/individuals), PCA decomposes the data matrix into K new components that: account for different sources of variability in the data; are uncorrelated, that is, each component accounts for a different source of variability; have decreasing explanatory ability, each component explaining more than the following one; allow for a lower-dimensional representation of the data in terms of scores on the principal components; and provide an overview of the dominant patterns and major trends in the data (visualize clusters, identify outliers).

44 4. Principal Component Analysis. How does PCA work? We have a data frame of absorbance values for 30 retention times and 28 wavelengths in an HPLC. A principal component analysis can be seen as a repetitive process of line fitting that finds a new set of axes better aligned with the data. Each axis represents some unknown factor that has the power to explain a significant portion of the variation in the data. This is accomplished by fitting a straight line to the 30 data points (each a point in the 28-dimensional space of the wavelengths); unlike ordinary least-squares regression, the fit minimizes the perpendicular distances from the points to the line, and the resulting line is the new axis that best explains the data. Next, the 30 data points are projected onto the 27-dimensional subspace that is perpendicular to this line, and the process of fitting and projection continues until there is a complete set of 28 new axes, each representing an unknown factor of lesser importance than those preceding it.
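
The "fit an axis, project, repeat" process can be sketched in R as deflation: at each step, power iteration finds the direction of largest remaining variance (the line minimizing perpendicular distances), and the data are then projected onto the subspace orthogonal to it. This is an illustrative sketch on the built-in USArrests data (4 variables instead of 28 wavelengths); pca_deflation() is a hypothetical helper, and R's prcomp() computes the same components through the singular value decomposition instead.

```r
pca_deflation <- function(X, n_comp = ncol(X), iters = 1000) {
  Xc   <- scale(X, center = TRUE, scale = FALSE)  # centre each variable
  axes <- matrix(0, ncol(X), n_comp)              # one unit-length axis per column
  vars <- numeric(n_comp)                         # variance along each axis
  for (j in seq_len(n_comp)) {
    S <- crossprod(Xc) / (nrow(Xc) - 1)           # covariance of the current (deflated) data
    v <- rep(1, ncol(X)) / sqrt(ncol(X))          # arbitrary unit starting direction
    for (it in seq_len(iters)) {                  # power iteration: converges to the
      v <- drop(S %*% v)                          # direction of largest remaining variance
      v <- v / sqrt(sum(v^2))
    }
    axes[, j] <- v
    vars[j]   <- sum(v * drop(S %*% v))           # variance of the data projected on v
    Xc <- Xc - (Xc %*% v) %*% t(v)                # project onto the subspace orthogonal to v
  }
  list(axes = axes, variances = vars)
}

res <- pca_deflation(scale(USArrests))            # built-in data, standardized
round(res$variances, 3)                           # decreasing variances, one per axis
```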

45 4. Principal Component Analysis. How does PCA work? Because the data are correlated, it is difficult to separate each source of variability. If K were much higher it would be even more difficult.

46 4. Principal Component Analysis. How does PCA work? Transform the data: center each variable by subtracting its mean; scale each variable by dividing by its SD. All variables are now comparable: mean = 0, SD = 1.
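
A small R sketch of this transformation on the built-in USArrests data (used here only as a stand-in for the course data): subtract the column means, divide by the column SDs, and compare the result with scale().

```r
X <- as.matrix(USArrests)                  # built-in data, used only as an example
Z <- sweep(X, 2, colMeans(X))              # centre: subtract each column mean
Z <- sweep(Z, 2, apply(X, 2, sd), "/")     # scale: divide by each column SD

round(colMeans(Z), 10)                     # means are 0
apply(Z, 2, sd)                            # SDs are 1
all.equal(Z, scale(X), check.attributes = FALSE)   # TRUE: same as scale()
```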

47 4. Principal Component Analysis. How does PCA work? First principal component: a linear combination of all the original variables that goes along the direction of highest variability in the data, i.e. it explains the maximum amount of variation in the data.

48 4. Principal Component Analysis. How does PCA work? Second principal component: a linear combination of all the original variables that goes along the next direction of highest variability in the data, orthogonal to the first PC; it explains the maximum amount of the remaining variation in the data. Successive PCs describe decreasing amounts of the remaining variation.
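
A sketch, assuming standardized data, of the PCs as linear combinations: the coefficients are the eigenvectors of the correlation matrix and the PC variances are its eigenvalues, in decreasing order. The built-in USArrests data stand in for the course data.

```r
Z <- scale(USArrests)                  # standardized variables (mean 0, SD 1)
e <- eigen(cor(USArrests))             # eigen-decomposition of the correlation matrix
V <- e$vectors                         # column j = coefficients of PC j
lambda <- e$values                     # variance of each PC, largest first

# PC j for each observation: Z[, 1] * V[1, j] + ... + Z[, K] * V[K, j]
scores <- Z %*% V

round(lambda, 3)                       # decreasing variances
round(crossprod(V), 10)                # identity matrix: orthogonal, unit-length directions
round(cor(scores), 10)                 # identity matrix: the PCs are uncorrelated
```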

49 4. Principal Component Analysis. How does PCA work? PCA provides a new set of coordinates for the observations. Original coordinates: values of the variables. New coordinates: values of the PCs (scores). Scores are the new coordinates in the orthogonal system defined by the PCs. [Figure: observations in the original axes X1, X2 and in the PC axes.]
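
A sketch with stats::prcomp on the built-in USArrests data (a stand-in): the scores in pc$x are the standardized data multiplied by the matrix of PC directions, and when all PCs are kept the change of coordinates is a rotation, so distances between observations are unchanged.

```r
Z  <- scale(USArrests)
pc <- prcomp(USArrests, scale. = TRUE)   # stats::prcomp (SVD-based PCA)
scores <- pc$x                           # new coordinates of each observation (one column per PC)
head(round(scores[, 1:2], 2))            # first two coordinates of the first states

# The scores are the standardized data expressed in the PC axes
all.equal(scores, Z %*% pc$rotation, check.attributes = FALSE)   # TRUE

# Keeping all PCs is a rigid rotation: distances between observations do not change
all.equal(as.vector(dist(scores)), as.vector(dist(Z)))           # TRUE
```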

50 4. Principal Component Analysis. How does PCA work? PCs have been derived so that they are orthogonal and each PC explains the maximum amount of the remaining variation in the data. This means that it is not necessary to use all PCs to visualize the data in this new coordinate system: the first PCs (usually only the first 2 or 3) will often explain a high percentage of the variability. This should always be checked!
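
A sketch of the "should always be checked" step: the proportion and cumulative proportion of variance from prcomp on USArrests (a stand-in). The 80% cut-off is a hypothetical rule of thumb chosen for the example, not a rule from the slides.

```r
pc     <- prcomp(USArrests, scale. = TRUE)
var_pc <- pc$sdev^2                  # variance of each PC (eigenvalues of the correlation matrix)
prop   <- var_pc / sum(var_pc)       # proportion of variance explained
cum    <- cumsum(prop)               # cumulative proportion
round(rbind(prop, cum), 3)

# Hypothetical rule of thumb: keep the smallest number of PCs that reaches 80%
k <- which(cum >= 0.80)[1]
k                                    # 2 for USArrests
```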

51 4. Principal Component Analysis. How does PCA work? PCs can be interpreted by looking at which of the original variables contribute most to their variability. The more strongly a variable is correlated with a PC, the greater its influence on it. The size of the contribution of each variable is given by the loadings: the loadings are the cosines of the angles between the variable axes and the PCs.
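
A sketch on USArrests (a stand-in) relating loadings to cosines and to correlations: each column of pc$rotation has unit length, so its entries are direction cosines, and for standardized data the correlation between variable k and PC j equals loading[k, j] times the SD of PC j.

```r
X  <- as.matrix(USArrests)
pc <- prcomp(X, scale. = TRUE)
V  <- pc$rotation                     # loadings: one column per PC

# Each column has unit length, so its entries are direction cosines
colSums(V^2)                          # all equal to 1

# Correlation between original variable k and PC j ...
R_var_pc <- cor(X, pc$x)
# ... equals loading[k, j] * SD of PC j
all.equal(R_var_pc, sweep(V, 2, pc$sdev, "*"), check.attributes = FALSE)   # TRUE
```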

52 4. Principal Component Analysis. How does PCA work? Summary: PCA performs a transformation into a new set of orthogonal coordinates with decreasing ability (most: 1st PC; least: last PC) to explain the observed variability. The PCA provides: the % of variance explained by each PC; loadings, closely related to the correlations between PCs and variables (for standardized data, correlation = loading × SD of the PC), used to (try to) interpret what the PCs mean; and scores, the values of the observations in the PC system of coordinates, used to plot the observations in reduced dimension.

54 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

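55 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Summary: [Table: mean, sd, IQR, quantiles (0%, 25%, 50%, 75%, 100%) and n for AnysHT (years suffering hypertension), Edat (age), Estrés (stress), Pes (weight), Pols (pulse) and PressioArt (arterial tension).]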

56 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

57 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

58 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Component loadings: [Table: loadings of AnysHT, Edat, Estrés, Pes, Pols, PressioArt and SupCorp on Comp.1 to Comp.7.] Loadings: the coefficients of each variable in each PC, closely related to the correlations between PCs and variables. Use them to (try to) interpret what the PCs mean; they serve as a guide to quantify how important a given variable is in a component.

59 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Component variances: [Output: variances of Comp.1 to Comp.7.] Importance of components: [Table: standard deviation, proportion of variance and cumulative proportion for Comp.1 to Comp.7.] Each component (in order) explains more variability than the one that follows it. The analysis provides: the component variances; the percentage of variability explained by each component; and the scree plot, to guide the decision on how many components should be retained in order to provide a good explanation of the data in reduced dimension.
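
A sketch reproducing the "Importance of components" rows from princomp(..., cor = TRUE), using the built-in USArrests data in place of RiscCoronari.txt. With cor = TRUE the component variances are the eigenvalues of the correlation matrix, so they add up to the number of variables.

```r
.PC  <- princomp(USArrests, cor = TRUE)    # same call pattern as the slides, different data
sdev <- .PC$sdev
importance <- rbind(`Standard deviation`     = sdev,
                    `Proportion of Variance` = sdev^2 / sum(sdev^2),
                    `Cumulative Proportion`  = cumsum(sdev^2) / sum(sdev^2))
round(importance, 4)

sum(sdev^2)                                # 4 = number of variables (cor = TRUE)
screeplot(.PC, type = "lines")             # scree plot of the component variances
```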

60 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

61 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). PC values (scores) can be used to plot the data. The plot can be used as a guide to interpret the main sources of variability. In R Commander (Rcmdr), if the option "Add principal components to data set" has been selected, the plot is made as usual by selecting these new variables.

62 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

63 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

64 4. Principal Component Analysis Example: Coronary Risk data (RiscCoronari.txt)

65 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt).
> .PC <- princomp(~AnysHT+Edat+Estrés+Pes+Pols+PressioArt+SupCorp, cor=TRUE, data=coronari)
> plot(.PC$scores[,1], .PC$scores[,2], type="n", xlab="Comp.1", ylab="Comp.2")
> text(.PC$scores[,1], .PC$scores[,2], coronari[,7])

66 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). > biplot(.PC)

67 4. Principal Component Analysis. Example: Coronary Risk data (RiscCoronari.txt). Compute the correlation matrix of the combined dataset (original variables plus PC scores). Look at the correlations between the PCs and the original variables.
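
A sketch of this step on the built-in USArrests data (a stand-in for coronari): bind the PC scores to the original variables, compute the correlation matrix of the combined data, and keep the block of variable-by-PC correlations.

```r
.PC <- princomp(USArrests, cor = TRUE)
combined <- cbind(USArrests, .PC$scores)       # original variables + PC scores
R <- cor(combined)

# Block of correlations: rows = original variables, columns = PCs
R_var_pc <- R[colnames(USArrests), colnames(.PC$scores)]
round(R_var_pc, 3)
```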

68 4. Principal Component Analysis. Exercise. Data: Obreros.csv

69 4. Principal Component Analysis RESULTS

70 4. Principal Component Analysis. [Figure: scree plot of .PC; y-axis: Variances; x-axis: Comp.1 to Comp.7.]

71 4. Principal Component Analysis. Interpretation of the principal components (loadings) in the dataset:

72 4. Principal Component Analysis Correlation between PC and variables

73 4. Principal Component Analysis. CONCLUSIONS. PC1 separates the families by number of children and economic factors. PC2 separates the CA families and other families with few children from the rest.

74 4. Principal Component Analysis plot(hclust(dist(scale(obreros[,c(2:7)]))),labels=obreros[,1])

75 4. Principal Component Analysis. > biplot(.PC, xlabs=obreros[,1])


More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

Basics of Multivariate Modelling and Data Analysis

Basics of Multivariate Modelling and Data Analysis Basics of Multivariate Modelling and Data Analysis Kurt-Erik Häggblom 6. Principal component analysis (PCA) 6.1 Overview 6.2 Essentials of PCA 6.3 Numerical calculation of PCs 6.4 Effects of data preprocessing

More information

1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables

1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables 1 A factor can be considered to be an underlying latent variable: (a) on which people differ (b) that is explained by unknown variables (c) that cannot be defined (d) that is influenced by observed variables

More information

Multivariate Fundamentals: Rotation. Exploratory Factor Analysis

Multivariate Fundamentals: Rotation. Exploratory Factor Analysis Multivariate Fundamentals: Rotation Exploratory Factor Analysis PCA Analysis A Review Precipitation Temperature Ecosystems PCA Analysis with Spatial Data Proportion of variance explained Comp.1 + Comp.2

More information

Karhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering

Karhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform KLT JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform Has many names cited in literature: Karhunen-Loève Transform (KLT); Karhunen-Loève Decomposition

More information

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS PRINCIPAL COMPONENTS ANALYSIS Iris Data Let s find Principal Components using the iris dataset. This is a well known dataset, often used to demonstrate the effect of clustering algorithms. It contains

More information

Algebra II Vocabulary Cards

Algebra II Vocabulary Cards Algebra II Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers Complex Number (examples)

More information

Multivariate analysis of genetic data: an introduction

Multivariate analysis of genetic data: an introduction Multivariate analysis of genetic data: an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London XXIV Simposio Internacional De Estadística Bogotá, 25th July

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014 Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of

More information

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

Multivariate calibration

Multivariate calibration Multivariate calibration What is calibration? Problems with traditional calibration - selectivity - precision 35 - diagnosis Multivariate calibration - many signals - multivariate space How to do it? observed

More information

Practice Questions for Exam 1

Practice Questions for Exam 1 Practice Questions for Exam 1 1. A used car lot evaluates their cars on a number of features as they arrive in the lot in order to determine their worth. Among the features looked at are miles per gallon

More information

7 Principal Components and Factor Analysis

7 Principal Components and Factor Analysis 7 Principal Components and actor nalysis 7.1 Principal Components a oal. Relationships between two variables can be graphically well captured in a meaningful way. or three variables this is also possible,

More information

STAT 730 Chapter 1 Background

STAT 730 Chapter 1 Background STAT 730 Chapter 1 Background Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 27 Logistics Course notes hopefully posted evening before lecture,

More information

Statistical Analysis. G562 Geometric Morphometrics PC 2 PC 2 PC 3 PC 2 PC 1. Department of Geological Sciences Indiana University

Statistical Analysis. G562 Geometric Morphometrics PC 2 PC 2 PC 3 PC 2 PC 1. Department of Geological Sciences Indiana University PC 2 PC 2 G562 Geometric Morphometrics Statistical Analysis PC 2 PC 1 PC 3 Basic components of GMM Procrustes Whenever shapes are analyzed together, they must be superimposed together This aligns shapes

More information

Algebra, Functions, and Data Analysis Vocabulary Cards

Algebra, Functions, and Data Analysis Vocabulary Cards Algebra, Functions, and Data Analysis Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers

More information

Bivariate data analysis

Bivariate data analysis Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green

More information

An Introduction to Multivariate Methods

An Introduction to Multivariate Methods Chapter 12 An Introduction to Multivariate Methods Multivariate statistical methods are used to display, analyze, and describe data on two or more features or variables simultaneously. I will discuss multivariate

More information

Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis

Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis Jinhyun Ju Jason Banfelder Luce Skrabanek June 21st, 218 1 Preface For the last session in this course, we ll

More information

Algebra II Vocabulary Cards

Algebra II Vocabulary Cards Algebra II Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers Complex Number (examples)

More information

Handout #8: Matrix Framework for Simple Linear Regression

Handout #8: Matrix Framework for Simple Linear Regression Handout #8: Matrix Framework for Simple Linear Regression Example 8.1: Consider again the Wendy s subset of the Nutrition dataset that was initially presented in Handout #7. Assume the following structure

More information

PCA Advanced Examples & Applications

PCA Advanced Examples & Applications PCA Advanced Examples & Applications Objectives: Showcase advanced PCA analysis: - Addressing the assumptions - Improving the signal / decreasing the noise Principal Components (PCA) Paper II Example:

More information

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation Chapter 1 Summarizing Bivariate Data Linear Regression and Correlation This chapter introduces an important method for making inferences about a linear correlation (or relationship) between two variables,

More information

Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information.

Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information. Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information. -However, this definition can be expanded to include: biology (biometrics), environmental science

More information

Multivariate Analysis of Variance

Multivariate Analysis of Variance Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)

More information

Experimental design. Matti Hotokka Department of Physical Chemistry Åbo Akademi University

Experimental design. Matti Hotokka Department of Physical Chemistry Åbo Akademi University Experimental design Matti Hotokka Department of Physical Chemistry Åbo Akademi University Contents Elementary concepts Regression Validation Hypotesis testing ANOVA PCA, PCR, PLS Clusters, SIMCA Design

More information

Looking at Data Relationships. 2.1 Scatterplots W. H. Freeman and Company

Looking at Data Relationships. 2.1 Scatterplots W. H. Freeman and Company Looking at Data Relationships 2.1 Scatterplots 2012 W. H. Freeman and Company Here, we have two quantitative variables for each of 16 students. 1) How many beers they drank, and 2) Their blood alcohol

More information

Multivariate data analysis (MVA) - Introduction

Multivariate data analysis (MVA) - Introduction Multivariate data analysis (MVA) - Introduction Introduction t 2 Univariate/Multivariate Latent variables Projections t 1 PCA Examples 10/9/2012 MVA intro 2008 H. Antti 1 Chemical and Biological data are

More information

Pre-Calculus Multiple Choice Questions - Chapter S8

Pre-Calculus Multiple Choice Questions - Chapter S8 1 If every man married a women who was exactly 3 years younger than he, what would be the correlation between the ages of married men and women? a Somewhat negative b 0 c Somewhat positive d Nearly 1 e

More information

Multivariate and Multivariable Regression. Stella Babalola Johns Hopkins University

Multivariate and Multivariable Regression. Stella Babalola Johns Hopkins University Multivariate and Multivariable Regression Stella Babalola Johns Hopkins University Session Objectives At the end of the session, participants will be able to: Explain the difference between multivariable

More information

Exploratory Factor Analysis and Canonical Correlation

Exploratory Factor Analysis and Canonical Correlation Exploratory Factor Analysis and Canonical Correlation 3 Dec 2010 CPSY 501 Dr. Sean Ho Trinity Western University Please download: SAQ.sav Outline for today Factor analysis Latent variables Correlation

More information

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19 additive tree structure, 10-28 ADDTREE, 10-51, 10-53 EXTREE, 10-31 four point condition, 10-29 ADDTREE, 10-28, 10-51, 10-53 adjusted R 2, 8-7 ALSCAL, 10-49 ANCOVA, 9-1 assumptions, 9-5 example, 9-7 MANOVA

More information

Statistics Toolbox 6. Apply statistical algorithms and probability models

Statistics Toolbox 6. Apply statistical algorithms and probability models Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS Data provided: Graph Paper MAS6011 SCHOOL OF MATHEMATICS AND STATISTICS Dependent Data Spring Semester 2016 2017 3 hours Marks will be awarded for your best five answers. RESTRICTED OPEN BOOK EXAMINATION

More information

New Interpretation of Principal Components Analysis

New Interpretation of Principal Components Analysis Zeszyty Naukowe WWSI, No 16, Vol 11, 2017, pp 43-65 New Interpretation of Principal Components Analysis Zenon Gniazdowski * Warsaw School of Computer Science Abstract A new look on the principal component

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

1 Interpretation. Contents. Biplots, revisited. Biplots, revisited 2. Biplots, revisited 1

1 Interpretation. Contents. Biplots, revisited. Biplots, revisited 2. Biplots, revisited 1 Biplots, revisited 1 Biplots, revisited 2 1 Interpretation Biplots, revisited Biplots show the following quantities of a data matrix in one display: Slide 1 Ulrich Kohler kohler@wz-berlin.de Slide 3 the

More information

Latent Variable Methods Course

Latent Variable Methods Course Latent Variable Methods Course Learning from data Instructor: Kevin Dunn kevin.dunn@connectmv.com http://connectmv.com Kevin Dunn, ConnectMV, Inc. 2011 Revision: 269:35e2 compiled on 15-12-2011 ConnectMV,

More information

Freeman (2005) - Graphic Techniques for Exploring Social Network Data

Freeman (2005) - Graphic Techniques for Exploring Social Network Data Freeman (2005) - Graphic Techniques for Exploring Social Network Data The analysis of social network data has two main goals: 1. Identify cohesive groups 2. Identify social positions Moreno (1932) was

More information

UNIT 12 ~ More About Regression

UNIT 12 ~ More About Regression ***SECTION 15.1*** The Regression Model When a scatterplot shows a relationship between a variable x and a y, we can use the fitted to the data to predict y for a given value of x. Now we want to do tests

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,

More information

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 2017-2018 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI

More information

Revision: Chapter 1-6. Applied Multivariate Statistics Spring 2012

Revision: Chapter 1-6. Applied Multivariate Statistics Spring 2012 Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing

More information

Clusters. Unsupervised Learning. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Clusters. Unsupervised Learning. Luc Anselin.   Copyright 2017 by Luc Anselin, All Rights Reserved Clusters Unsupervised Learning Luc Anselin http://spatial.uchicago.edu 1 curse of dimensionality principal components multidimensional scaling classical clustering methods 2 Curse of Dimensionality 3 Curse

More information

11 Correlation and Regression

11 Correlation and Regression Chapter 11 Correlation and Regression August 21, 2017 1 11 Correlation and Regression When comparing two variables, sometimes one variable (the explanatory variable) can be used to help predict the value

More information

22 Approximations - the method of least squares (1)

22 Approximations - the method of least squares (1) 22 Approximations - the method of least squares () Suppose that for some y, the equation Ax = y has no solutions It may happpen that this is an important problem and we can t just forget about it If we

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

More information

Wolfgang Karl Härdle Leopold Simar. Applied Multivariate. Statistical Analysis. Fourth Edition. ö Springer

Wolfgang Karl Härdle Leopold Simar. Applied Multivariate. Statistical Analysis. Fourth Edition. ö Springer Wolfgang Karl Härdle Leopold Simar Applied Multivariate Statistical Analysis Fourth Edition ö Springer Contents Part I Descriptive Techniques 1 Comparison of Batches 3 1.1 Boxplots 4 1.2 Histograms 11

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

CHAPTER 10. Regression and Correlation

CHAPTER 10. Regression and Correlation CHAPTER 10 Regression and Correlation In this Chapter we assess the strength of the linear relationship between two continuous variables. If a significant linear relationship is found, the next step would

More information

G E INTERACTION USING JMP: AN OVERVIEW

G E INTERACTION USING JMP: AN OVERVIEW G E INTERACTION USING JMP: AN OVERVIEW Sukanta Dash I.A.S.R.I., Library Avenue, New Delhi-110012 sukanta@iasri.res.in 1. Introduction Genotype Environment interaction (G E) is a common phenomenon in agricultural

More information

Assignment 3. Introduction to Machine Learning Prof. B. Ravindran

Assignment 3. Introduction to Machine Learning Prof. B. Ravindran Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively

More information

Lecture 4: Principal Component Analysis and Linear Dimension Reduction

Lecture 4: Principal Component Analysis and Linear Dimension Reduction Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information