Analyse des Correspondances Multiples (ACM): Multiple Correspondence Analysis (MCA). Hervé Abdi.
1 Analyse des Correspondances Multiples (ACM): Multiple Correspondence Analysis (MCA). Hervé Abdi, herve@utdallas.edu
2 Some more examples of simple CA. What to do with more than 2 nominal variables? Answer: Multiple Correspondence Analysis. Back to punctuation... Another look at the individual / patterns. MCA for 2 nominal variables and CA! The eigenvalue problem. Correction formula... More than 2 variables. MCA and ordinal data (ranks).
3 Science Diplomas in the US from 1966 to 2000
4 Table 1: Correspondence Analysis. All Diplomas in the US (Bachelor, Master, Ph.D.; Men and Women). The Data. Columns: Engineer, Physics, Earth, Math, Biology, Psychology, Social, NonSc.
5 Figure 1: Diplomas in the US: Correspondence Analysis. Points: PHYSICS, SOCIAL SCIENCES, EARTH, Non Sciences, BIOLOGY, PSYCHOLOGY, ENGINEERING, MATHEMATICS. First dimension: λ = .0044, τ = 48%.
6 Figure 2: Diplomas in the US: Correspondence Analysis (with frills).
7 Figure: Diplomas in the US: Correspondence Analysis, with the disciplines plotted together with MEN, WOMEN, and year points.
8 Figure 4: Diplomas in the US: Correspondence Analysis. By Diploma (Green: Bachelor. Red: Master. Gold: Ph.D.).
9 Table 2: Correspondence Analysis. Diploma Global Analysis. J-set (Disciplines). The Projections, Contributions, Squared distance to barycenter, and Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual. Rows: Engineer, Physics, Earth, Math, Biology, Psychology, Social, NonSc.
10 Table 3: Correspondence Analysis. Diploma Global Analysis. I-set (Disciplines). The Projections, Contributions, Squared distance to barycenter, and Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual.
11 The Causes of Death in the USA in 00. Table 4: The Causes of Death in the USA (00). Rows: Septicemia, Hepatitis, HIV, Cancer, Tumor, Diabetes, Parkinson/Alzheimer, Heart, Pneumonia, Lower Respiratory, Liver, Kidney, Unknown, Other Diseases, Accident Car, Accident NonCar, Other Accidents, Suicide, Homicide.
12 First have a look at the data (log transformed). Figure 5: Causes of Death in the US: A picture of the data (log transformed).
13 Correspondence analysis: The Scree.
14 Explained Variance per Dimension. Figure 6: Causes of Death in the US: The Scree (How many dimensions?). Axes: Dimensions, Percentage of Explained Variance.
15 A (symmetric) plot. Figure 7: Causes of Death in the US: Correspondence Analysis. Symmetric plot. Points: the causes of death (Hepatitis, Homicide, Accident Car, Accident NonCar, HIV, Suicide, Liver, Cancer, Diabetes, Lower Respiratory, Septicemia, Kidney, Other Accidents, Other Diseases, Pneumonia, Unknown, Parkinson/Alzheimer) and the age groups (up to 85+).
16 A (symmetric) plot. Figure 8: Causes of Death in the US: Correspondence Analysis. Symmetric plot with frills. There are three periods in life (or death!).
17 An asymmetric plot from R. Figure 9: Causes of Death in the US: Correspondence Analysis. Asymmetric plot (From R). There are three periods in life (or death!).
18 An asymmetric plot (with frills). Figure 10: Causes of Death in the US: Correspondence Analysis. Asymmetric plot with frills (From R). There are three periods in life (or death!).
19 Table 5: Correspondence Analysis. Causes of Death in the US. Global Analysis. J-set (Disciplines). The Projections, Contributions, Squared Distances to Barycenter, and (squared) Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual.
20 Table 6: Correspondence Analysis. Causes of Death in the US. Global Analysis. I-set (Causes). The Projections, Contributions, Squared distance to barycenter, and Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual. Rows: Septicemia, Hepatitis, HIV, Cancer, Tumor, Diabetes, Parkinson/Alzheimer, Heart, Pneumonia, Lower Respiratory, Liver, Kidney, Unknown, Other Diseases, Car Accidents, Non-car Accidents, Other Accidents, Suicide, Homicide.
22 Table 7: Multiple Correspondence Analysis, Punctuation example. The table of patterns. Columns: the punctuation marks (. , !?...) and the authors (R, C, H, Z, P, G). Rows: individual marks, from the first mark (Period, Rousseau) through later marks (such as Comma, Rousseau) to the last mark (Other, Giraudoux).
23 Table 8: Multiple Correspondence Analysis, Punctuation example. The table of patterns. Columns: . , !?... R C H Z P G. Rows (patterns): P-R, P-C, P-H, P-Z, P-P, P-G, C-R, C-C, C-H, C-Z, C-P, C-G, O-R, O-C, O-H, O-Z, O-P, O-G.
24 The magic of the distributional equivalence! Table 9: Multiple Correspondence Analysis, Punctuation example. The table of patterns. Columns: . , !?... R C H Z P G. Rows (patterns): P-R, P-C, P-H, P-Z, P-P, P-G, C-R, C-C, C-H, C-Z, C-P, C-G, O-R, O-C, O-H, O-Z, O-P, O-G.
25 Table 10: Multiple Correspondence Analysis. Punctuation example. The Projections, Contributions, Squared Distances to Barycenter, and (Squared) Cosines. Columns: $F_1$, $F_2$, $Ctr_1$, $Ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual. Rows: PERIOD, COMMA, OTHER, Rousseau, Chateaubriand, Hugo, Zola, Proust, Giraudoux.
26 MCA Punctuation! Figure 11: Multiple Correspondence Analysis: Punctuation. Points: PERIOD, COMMA, OTHER, Rousseau, Chateaubriand, Hugo, Zola, Proust, Giraudoux.
27 MCA Punctuation! With Frills. Figure 12: Multiple Correspondence Analysis: Punctuation.
28 Compare with CA Punctuation! Figure 13: Punctuation: Correspondence Analysis. Points: PERIOD, COMMA, OTHER, Rousseau, Chateaubriand, Hugo, Zola, Proust, Giraudoux.
29 MCA Punctuation! With Frills. Figure 14: Punctuation: Correspondence Analysis.
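30 The Eigenvalue Problem: Fictitious Eigenvalues (the hypercube problem). Compare the eigenvalues and percentages of inertia from CA ($\lambda_1, \lambda_2$; $\tau_1, \tau_2$) with those from MCA (${}_M\lambda_1, \dots, {}_M\lambda_7$; $\tau_1, \dots, \tau_7$). The factors are the same but not the eigenvalues! In MCA, the ${}_M\lambda \le \frac{1}{K}$ code the hypercube!
31 Correcting for the Eigenvalue Problem: keep only the ${}_M\lambda > \frac{1}{2}$ and compute $\lambda = 4\left({}_M\lambda - \frac{1}{2}\right)^2$. Example: $\lambda_1 = 4\left({}_M\lambda_1 - \frac{1}{2}\right)^2 = 4\,(0.0667)^2 = .0178$.
32 And Vice-Versa: ${}_M\lambda = \frac{1}{2}\left(\sqrt{\lambda} + 1\right)$.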
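The two-variable correction and its inverse can be checked numerically. The sketch below assumes the formulas $\lambda = 4({}_M\lambda - \frac{1}{2})^2$ and ${}_M\lambda = \frac{1}{2}(\sqrt{\lambda} + 1)$; the function names are hypothetical, and the input 0.5667 is an assumed MCA eigenvalue chosen so that ${}_M\lambda - \frac{1}{2} = 0.0667$, as in the slide's example.

```python
import math


def ca_from_mca_eigenvalue(mca_eig):
    """Map an indicator-matrix (MCA) eigenvalue to the CA eigenvalue when K = 2.

    Eigenvalues at or below 1/2 are treated as coding artifacts and set to 0.
    """
    if mca_eig <= 0.5:
        return 0.0
    return 4.0 * (mca_eig - 0.5) ** 2


def mca_from_ca_eigenvalue(ca_eig):
    """Inverse mapping: recover the MCA eigenvalue from the CA eigenvalue (K = 2)."""
    return 0.5 * (math.sqrt(ca_eig) + 1.0)


mca_eig = 0.5667                      # assumed value: 0.5 + 0.0667
ca_eig = ca_from_mca_eigenvalue(mca_eig)
round_trip = mca_from_ca_eigenvalue(ca_eig)

print(f"CA eigenvalue:  {ca_eig:.4f}")
print(f"MCA eigenvalue: {round_trip:.4f}")
```

The round trip returns the starting value, which is only true for eigenvalues above 1/2; anything at or below the threshold is discarded by the forward mapping and cannot be recovered.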
33 The Burt Table. Table 11: Multiple Correspondence Analysis, Punctuation example. The Burt Table. Rows and columns: . , !?... R C H Z P G.
34 The Burt Table. Table 12: Multiple Correspondence Analysis, Punctuation example. The Burt Table. Rows and columns: . , !?... R C H Z P G.
35 Add something on the Burt Table and the average inertia (From Greenacre 2007).
36 Table 13: Multiple Correspondence Analysis. Burt Table Analysis. Punctuation example. The Projections, Contributions, Squared distance to barycenter, and Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual. Rows: . , !?... R C H Z P G.
37 Compare the eigenvalues and percentages of inertia from CA ($\lambda_1, \lambda_2$; $\tau_1, \tau_2$) with those from MCA (${}_M\lambda_1, \dots, {}_M\lambda_7$; $\tau_1, \dots, \tau_7$) and with those from the Burt table (${}_B\lambda_1, \dots, {}_B\lambda_7$; $\tau_1, \dots, \tau_7$): ${}_B\lambda = {}_M\lambda^2$ and ${}_M\lambda = {}_B\lambda^{\frac{1}{2}}$.
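The relation between the Burt-table eigenvalues and the indicator-matrix eigenvalues (the former are the squares of the latter) can be verified on a toy data set. This is a minimal NumPy sketch: the helper names and the small categorical data set are hypothetical, and the CA is computed from the singular value decomposition of the standardized residuals rather than with any particular package.

```python
import numpy as np


def ca_eigenvalues(table):
    """Eigenvalues of the correspondence analysis of a non-negative table."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2


def indicator_matrix(codes):
    """One 0/1 column per level of each categorical variable."""
    codes = np.asarray(codes)
    blocks = [(codes[:, [k]] == np.unique(codes[:, k])).astype(float)
              for k in range(codes.shape[1])]
    return np.hstack(blocks)


# Hypothetical data: 8 observations, K = 3 variables with 2, 3, and 2 levels.
codes = [[0, 0, 0], [0, 1, 1], [1, 2, 0], [1, 0, 1],
         [0, 1, 0], [1, 2, 1], [0, 2, 0], [1, 1, 1]]
Z = indicator_matrix(codes)
K, J = 3, Z.shape[1]
burt = Z.T @ Z                            # Burt table

n_dims = J - K                            # at most J - K non-trivial dimensions
ind_eigs = ca_eigenvalues(Z)[:n_dims]
burt_eigs = ca_eigenvalues(burt)[:n_dims]

print(np.round(ind_eigs, 4))
print(np.round(burt_eigs, 4))
print(np.allclose(burt_eigs, ind_eigs ** 2))
```

The same toy example also shows that the total inertia of the indicator matrix depends only on the coding: it equals (J - K) / K whatever the data.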
38 Table 14: Comparison between three ways of analyzing the same contingency table: correspondence analysis, multiple correspondence analysis, and Burt table analysis. Punctuation example. Factor scores $F_1$ and $F_2$ are reported for CA, MCA, Burt, and Normalized, for the categories . , !?... R C H Z P G. The last two rows are normalized to 1. They can be obtained by normalization of any of the analyses. The normalization is obtained by dividing each entry of a column by the square root of its eigenvalue.
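The normalization step can be sketched in NumPy as follows. The contingency table is hypothetical, and the script reads "normalized to 1" as meaning that the mass-weighted sum of squares of each factor equals 1 after division by the square root of the eigenvalue; that reading is an assumption, not something the slide states.

```python
import numpy as np

# Hypothetical 3 x 3 contingency table (not the punctuation data).
table = np.array([[20.0, 5.0, 10.0],
                  [8.0, 15.0, 12.0],
                  [4.0, 9.0, 17.0]])

P = table / table.sum()
r = P.sum(axis=1)                          # row masses
c = P.sum(axis=0)                          # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, delta, Vt = np.linalg.svd(S, full_matrices=False)

keep = delta > 1e-12                       # drop the trivial (null) dimension
eigenvalues = delta[keep] ** 2
F = (U[:, keep] * delta[keep]) / np.sqrt(r)[:, None]   # row factor scores

# Divide each factor (column of F) by the square root of its eigenvalue.
F_normalized = F / np.sqrt(eigenvalues)

weighted_ss = (r[:, None] * F ** 2).sum(axis=0)
weighted_ss_normalized = (r[:, None] * F_normalized ** 2).sum(axis=0)
print(np.round(weighted_ss, 6), np.round(eigenvalues, 6))
print(np.round(weighted_ss_normalized, 6))
```

Before normalization the weighted sum of squares of a factor equals its eigenvalue, which differs between CA, MCA, and Burt analyses; after normalization that scale difference is removed, which is why the normalized scores can be obtained from any of the three.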
39 A Small Example: Effect of Oak Species on Pinot Noirs. Aged 3 Pinot with Oak Type I and 6 with Oak Type II. Three Assessors choose their own variables. Answers coded as binary or ternary (0/1).
40 Table 1: Data for the barrel-aged red burgundy wines example. "Oak Type" is an illustrative (supplementary) variable. The wine W? is an unknown wine treated as a supplementary observation. Columns: Wine; Oak Type; Expert 1 (fruity, woody, coffee); Expert 2 (red fruit, roasted, vanillin, woody); Expert 3 (fruity, butter, woody). Rows: W1 to W6 and W?. Figure 15: Wines. The Data. Source: H. Abdi & D. Valentin, Multiple Correspondence Analysis.
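41 Corrections for the eigenvalues for MCA. First: Benzécri. Keep the eigenvalues larger than $\frac{1}{K}$. $\lambda_\ell$: eigenvalues from the analysis of the indicator matrix. ${}_c\lambda_\ell$: corrected eigenvalues.

$${}_c\lambda_\ell = \begin{cases} \left[\left(\dfrac{K}{K-1}\right)\left(\lambda_\ell - \dfrac{1}{K}\right)\right]^2 & \text{if } \lambda_\ell > \dfrac{1}{K} \\[2ex] 0 & \text{if } \lambda_\ell \le \dfrac{1}{K} \end{cases} \qquad (1)$$

Over-correct (but not much)!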
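Benzécri's correction is a one-line transformation of each eigenvalue. In the sketch below the function name is hypothetical, and the four input eigenvalues with K = 10 are assumed illustrative values (the transcript's table entries were lost); they were chosen so that the first corrected eigenvalue matches the λ = .7004 shown on the wine map.

```python
def benzecri_correction(eigenvalues, n_variables):
    """Apply Benzecri's correction to indicator-matrix eigenvalues.

    Eigenvalues at or below 1/K are set to 0; the others are rescaled as
    [(K / (K - 1)) * (eigenvalue - 1/K)] ** 2.
    """
    K = n_variables
    threshold = 1.0 / K
    scale = K / (K - 1)
    return [(scale * (lam - threshold)) ** 2 if lam > threshold else 0.0
            for lam in eigenvalues]


# Assumed illustrative eigenvalues for an analysis with K = 10 variables.
indicator_eigenvalues = [0.8532, 0.2000, 0.1151, 0.0317]
corrected = benzecri_correction(indicator_eigenvalues, n_variables=10)

for raw, cor in zip(indicator_eigenvalues, corrected):
    print(f"{raw:.4f} -> {cor:.4f}")
```

Because the correction subtracts 1/K before squaring, the gap between the first eigenvalue and the rest widens sharply, which is what makes the corrected percentages of inertia so much larger than the raw ones.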
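42 Corrections for the eigenvalues for MCA. Greenacre: keep Benzécri's correction, but correct the inertia. $\lambda_\ell$: eigenvalues from the indicator matrix. ${}_c\lambda_\ell$: corrected eigenvalues. Use the average inertia, denoted $\bar{\mathcal{I}}$:

$$\bar{\mathcal{I}} = \frac{K}{K-1}\left(\sum_\ell \lambda_\ell^2 - \frac{J-K}{K^2}\right) \qquad (2)$$

Percentage of inertia:

$$\tau_c = \frac{{}_c\lambda}{\bar{\mathcal{I}}} \quad \text{instead of} \quad \frac{{}_c\lambda}{\sum_\ell {}_c\lambda_\ell}. \qquad (3)$$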
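Greenacre's variant keeps Benzécri's corrected eigenvalues and changes only the denominator used for the percentages. The sketch below uses hypothetical function names; the eigenvalues, K = 10, and J = 22 are assumed illustrative inputs, chosen to be consistent with the average inertia of .7358 and the 95% first-dimension share quoted elsewhere in the slides.

```python
def benzecri_correction(eigenvalues, n_variables):
    """Benzecri's corrected eigenvalues (0 for eigenvalues at or below 1/K)."""
    K = n_variables
    return [((K / (K - 1)) * (lam - 1.0 / K)) ** 2 if lam > 1.0 / K else 0.0
            for lam in eigenvalues]


def greenacre_percentages(eigenvalues, n_variables, n_columns):
    """Return corrected eigenvalues, average inertia, and both sets of percentages."""
    K, J = n_variables, n_columns
    corrected = benzecri_correction(eigenvalues, K)
    # Average inertia: based on the squared indicator-matrix eigenvalues.
    average_inertia = (K / (K - 1)) * (sum(lam ** 2 for lam in eigenvalues)
                                       - (J - K) / K ** 2)
    tau_greenacre = [cor / average_inertia for cor in corrected]
    tau_benzecri = [cor / sum(corrected) for cor in corrected]
    return corrected, average_inertia, tau_greenacre, tau_benzecri


# Assumed illustrative inputs: K = 10 variables coded with J = 22 columns.
eigs = [0.8532, 0.2000, 0.1151, 0.0317]
corrected, avg_inertia, tau_g, tau_b = greenacre_percentages(eigs, 10, 22)

print(f"average inertia: {avg_inertia:.4f}")
print("Greenacre:", [f"{100 * t:.0f}%" for t in tau_g])
print("Benzecri: ", [f"{100 * t:.0f}%" for t in tau_b])
```

With these inputs the Greenacre percentages do not sum to 100%, which is the point of the correction: dividing by the sum of the corrected eigenvalues gives a more optimistic first-dimension share than dividing by the average inertia.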
43 Table 15: Eigenvalues, corrected eigenvalues, proportion of explained inertia, and corrected proportion of explained inertia. The eigenvalues of the Burt matrix are equal to the squared eigenvalues of the indicator matrix. The corrected eigenvalues for Benzécri and Greenacre are the same, but the proportions of explained variance differ. Eigenvalues are denoted by λ, proportions of explained inertia by τ (note that the average inertia used to compute Greenacre's correction is equal to $\bar{\mathcal{I}} = .7358$). Columns: Factor; Indicator Matrix (${}_I\lambda$, $\tau_I$); Burt Matrix (${}_B\lambda$, $\tau_B$); Benzécri Correction (${}_Z\lambda$, $\tau_Z$); Greenacre Correction (${}_c\lambda$, $\tau_c$).
44 Table 16: Factor scores, squared cosines, and contributions for the observations (I-set). The eigenvalues and proportions of explained inertia are corrected using the Benzécri/Greenacre formula. Contributions corresponding to negative scores are in italic. The mystery wine (Wine ?) is a supplementary observation. Only the first two factors are reported. Columns: $F$, ${}_c\lambda$, $\%_c$, then Wine 1 to Wine 6 and Wine ?. Row blocks: Factor Scores, Squared Cosines, Contributions.
45 Figure 16: Wines. Map of the wines. Points: wines 1 to 6 and the mystery wine (?). First dimension: λ = .7004, τ = 95%.
46 Figure 17: Wines. Map of the Variables (Assessors' Scales). The J set: columns (i.e., adjectives). Source: H. Abdi & D. Valentin, Multiple Correspondence Analysis.
47 TAKS in TEXAS. A Bigggggggggggg Analysis: Effect of Correction on Big Data. Example: Math Test: 5th Grade (all of DISD). (44 Questions, 4 answers each = 7 Columns).
48 Figure: MCA TAKS Grade 5. Both axes are labeled Principal Component Number.
49 Correcting for the Eigenvalue Problem, with K variables: keep only the ${}_M\lambda > \frac{1}{K}$ and compute $\lambda = \left[\left(\frac{K}{K-1}\right)\left({}_M\lambda - \frac{1}{K}\right)\right]^2$. Example: TAKS, with ${}_M\lambda_1 = .1884$: $\lambda_1 = \left[\left(\frac{44}{43}\right)\left(.1884 - \frac{1}{44}\right)\right]^2 = .0287$. (Goes from 7% to 96%!)
50 Ordinal Data: A Beer example. Five judges. Each judge rates five beers on a 6-point rating scale (0 to 5).
51 Table 17: Correspondence Analysis. Scoring Beers. The Original data. Columns (judges): Cain, Sims, Guy, Kinchen, Schalbs. Rows (beers): Miller, RollRock, Lowenbrau, Coors, Budweiser.
52 Thermometer Coding: Use both sides of the scale. First idea: use the ratings. Not a very good idea! Use Thermometer Coding.
53 Thermometer Coding: The Leverage Principle.
54 Table 18: Correspondence Analysis. Scoring Beers. The data with thermometer coding. Columns (judges): Cain, Sims, Guy, Kinchen, Schalbs. Rows (beers): Miller, RollRock, Lowenbrau, Coors, Budweiser.
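"Using both sides of the scale" can be sketched as follows. The script assumes that thermometer coding here means doubling each judge's column into a "+" pole (the rating itself) and a "-" pole (the scale maximum minus the rating), which matches the + and - rater points on the maps but is not spelled out in the transcript. The ratings are hypothetical, since the table values were lost.

```python
import numpy as np

judges = ["Cain", "Sims", "Guy", "Kinchen", "Schalbs"]
beers = ["Miller", "RollRock", "Lowenbrau", "Coors", "Budweiser"]
scale_max = 5                              # 6-point scale: 0 to 5

# Hypothetical ratings (rows: beers, columns: judges), NOT the slide's data.
ratings = np.array([[1, 0, 2, 1, 0],
                    [3, 2, 3, 4, 2],
                    [5, 5, 4, 5, 4],
                    [0, 1, 0, 1, 1],
                    [2, 3, 2, 2, 3]])

# Each judge gets two columns: the rating ("+") and its complement ("-").
doubled = np.empty((len(beers), 2 * len(judges)), dtype=int)
doubled[:, 0::2] = ratings
doubled[:, 1::2] = scale_max - ratings
columns = [f"{judge}{pole}" for judge in judges for pole in ("+", "-")]

print(columns)
print(doubled)
print(doubled.sum(axis=1))                 # identical totals for every beer
```

Under this assumed coding every beer has the same row total, so all beers receive the same mass in the correspondence analysis, whereas analyzing the raw ratings would weight a beer by how much the judges liked it.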
55 Figure 18: Ranking Beers: Correspondence Analysis. The Ugly plot: no frill at all! (Don't show that except to your close friends.) Plot title: Beer Ranks I and J. Both axes are labeled Principal Component Number. Points: the raters, each with a - and a + pole (Guy, Cain, Sims, Schalbs, Kinchen), and the beers (Coors, Miller, Budweiser, RollRock, Lowenbrau).
56 Figure 19: Ranking Beers: Correspondence Analysis (no frill, but prettier). The Beer Plot. Points: Coors, Miller, Budweiser, RollRock, Lowenbrau.
57 Figure 20: Ranking Beers: Correspondence Analysis. Beers and Raters (+ and -).
58 Figure 21: Ranking Beers: Correspondence Analysis. Beers and Raters (+ and -). Raters are lines.
59 Figure 22: Ranking Beers: Correspondence Analysis. Projecting a Beer on a Rater.
60 Figure 23: Ranking Beers: Correspondence Analysis. Projecting a Beer on another Rater.
61 Figure 24: Ranking Beers: Correspondence Analysis. More projections.
62 Figure 25: Ranking Beers: Correspondence Analysis. Even more projections.
63 Table 19: Correspondence Analysis. Beers. J-set (Raters). The Projections, Contributions, Squared distance to barycenter, and (Squared) Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual. Rows: two per rater (Cain, Sims, Guy, Kinchen, Schalbs).
64 Table 20: Correspondence Analysis. Beers: The I-set (Beers). The Projections, Contributions, Squared distance to barycenter, and Cosines. Columns: $F_1$, $F_2$, $ctr_1$, $ctr_2$, $m$, $d^2$, $\cos^2_1$, $\cos^2_2$, Qual. Rows: Miller, RollRock, Lowenbrau, Coors, Budweiser.
Using Modeling to Demonstrate Self-Assembly in Nanotechnology Imagine building a device that is small enough to fit on a contact lens. It has an antennae and a translucent screen across the pupil of the
More informationPrincipal Component Analysis for Mixed Quantitative and Qualitative Data
Principal Component Analysis for Mixed Quantitative and Qualitative Data Susana Agudelo-Jaramillo Manuela Ochoa-Muñoz Tutor: Francisco Iván Zuluaga-Díaz EAFIT University Medelĺın-Colombia Research Practise
More informationMeasures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz
Measures of Central Tendency and their dispersion and applications Acknowledgement: Dr Muslima Ejaz LEARNING OBJECTIVES: Compute and distinguish between the uses of measures of central tendency: mean,
More informationA Weighted Score Derived from a Multiple Correspondence
Int Statistical Inst: Proc 58th World Statistical Congress 0 Dublin (Session CPS06) p4 A Weighted Score Derived from a Multiple Correspondence Analysis Solution de Souza Márcio L M Universidade Federal
More informationCh. 3 Review - LSRL AP Stats
Ch. 3 Review - LSRL AP Stats Multiple Choice Identify the choice that best completes the statement or answers the question. Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable lumber
More informationElimination Method Streamlined
Elimination Method Streamlined There is a more streamlined version of elimination method where we do not have to write all of the steps in such an elaborate way. We use matrices. The system of n equations
More informationMALLOY PSYCH 3000 MEAN & VARIANCE PAGE 1 STATISTICS MEASURES OF CENTRAL TENDENCY. In an experiment, these are applied to the dependent variable (DV)
MALLOY PSYCH 3000 MEAN & VARIANCE PAGE 1 STATISTICS Descriptive statistics Inferential statistics MEASURES OF CENTRAL TENDENCY In an experiment, these are applied to the dependent variable (DV) E.g., MEASURES
More informationLinear Algebra Practice Problems
Math 7, Professor Ramras Linear Algebra Practice Problems () Consider the following system of linear equations in the variables x, y, and z, in which the constants a and b are real numbers. x y + z = a
More informationFinding Relationships Among Variables
Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis
More informationCorrespondence Analysis
Correspondence Analysis Julie Josse, François Husson, Sébastien Lê Applied Mathematics Department, Agrocampus Ouest user-2008 Dortmund, August 11th 2008 1 / 22 History Theoretical principles: Fisher (1940)
More informationMath 10 Chapter 6.10: Solving Application Problems Objectives: Exponential growth/growth models Using logarithms to solve
Math 10 Chapter 6.10: Solving Application Problems Objectives: Exponential growth/growth models Using logarithms to solve Exponential Growth Models Steps for Solving Application Problems: 1. Read, throw
More informationLESSON 35: EIGENVALUES AND EIGENVECTORS APRIL 21, (1) We might also write v as v. Both notations refer to a vector.
LESSON 5: EIGENVALUES AND EIGENVECTORS APRIL 2, 27 In this contet, a vector is a column matri E Note 2 v 2, v 4 5 6 () We might also write v as v Both notations refer to a vector (2) A vector can be man
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationH.Algebra 2 Summer Review Packet
H.Algebra Summer Review Packet 1 Correlation of Algebra Summer Packet with Algebra 1 Objectives A. Simplifing Polnomial Epressions Objectives: The student will be able to: Use the commutative, associative,
More informationDescriptive Statistics and Visualizing Data in STATA
Descriptive Statistics and Visualizing Data in STATA BIOS 514/517 R. Y. Coley Week of October 7, 2013 Log Files, Getting Data in STATA Log files save your commands cd /home/students/rycoley/bios514-517
More information2 Descriptive Statistics
2 Descriptive Statistics Reading: SW Chapter 2, Sections 1-6 A natural first step towards answering a research question is for the experimenter to design a study or experiment to collect data from the
More informationChapter 19. Agreement and the kappa statistic
19. Agreement Chapter 19 Agreement and the kappa statistic Besides the 2 2contingency table for unmatched data and the 2 2table for matched data, there is a third common occurrence of data appearing summarised
More informationData Mining 4. Cluster Analysis
Data Mining 4. Cluster Analysis 4.2 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables
More informationCalculus with business applications, Lehigh U, Lecture 05 notes Summer
Calculus with business applications, Lehigh U, Lecture 0 notes Summer 0 Trigonometric functions. Trigonometric functions often arise in physical applications with periodic motion. They do not arise often
More informationAlgebra 1. Standard Linear Functions. Categories Graphs Tables Equations Context. Summative Assessment Date: Friday, September 14 th.
Algebra 1 Standard Linear Functions Categories Graphs Tables Equations Contet Summative Assessment Date: Friday, September 14 th Page 1 Page 2 Page 3 Linear Functions DAY 1 Notesheet Topic Increasing and
More informationOnline supplement. Absolute Value of Lung Function (FEV 1 or FVC) Explains the Sex Difference in. Breathlessness in the General Population
Online supplement Absolute Value of Lung Function (FEV 1 or FVC) Explains the Sex Difference in Breathlessness in the General Population Table S1. Comparison between patients who were excluded or included
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More informationModeling Spatial Relationships using Regression Analysis
Esri International User Conference San Diego, CA Technical Workshops July 2011 Modeling Spatial Relationships using Regression Analysis Lauren M. Scott, PhD Lauren Rosenshein, MS Mark V. Janikas, PhD Answering
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More informationClass Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement
Class Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement Introduction to Structural Equation Modeling Lecture #1 January 11, 2012 ERSH 8750: Lecture 1 Today s Class Introduction
More informationChapter 4: Factor Analysis
Chapter 4: Factor Analysis In many studies, we may not be able to measure directly the variables of interest. We can merely collect data on other variables which may be related to the variables of interest.
More informationvalue mean standard deviation
Mr. Murphy AP Statistics 2.4 The Empirical Rule and z - Scores HW Pg. 208 #4.45 (a) - (c), 4.46, 4.51, 4.52, 4.73 Objectives: 1. Calculate a z score. 2. Apply the Empirical Rule when appropriate. 3. Calculate
More informationCorrespondence Analysis of Longitudinal Data
Correspondence Analysis of Longitudinal Data Mark de Rooij* LEIDEN UNIVERSITY, LEIDEN, NETHERLANDS Peter van der G. M. Heijden UTRECHT UNIVERSITY, UTRECHT, NETHERLANDS *Corresponding author (rooijm@fsw.leidenuniv.nl)
More informationMATH 446/546 Test 2 Fall 2014
MATH 446/546 Test 2 Fall 204 Note the problems are separated into two sections a set for all students and an additional set for those taking the course at the 546 level. Please read and follow all of these
More information# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance
Practice Final Exam Statistical Methods and Models - Math 410, Fall 2011 December 4, 2011 You may use a calculator, and you may bring in one sheet (8.5 by 11 or A4) of notes. Otherwise closed book. The
More informationSpring /11/2014
MA 138 Calculus 2 for the Life Sciences SECOND MIDTERM Spring 2014 3/11/2014 Name: Sect. #: Answer all of the following questions. Use the backs of the question papers for scratch paper. No books or notes
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More information