Perceptual Mapping Based on Three-Way Binary Data
1 Perceptual Mapping Based on Three-Way Binary Data B. Kemal Büyükkurt, Concordia University, Montreal, Canada & Pieter M. Kroonenberg, Leiden University, The Netherlands
2 Perceptual Mapping - 1
Perceptual mapping: a graphical display summarizing consumers' perceptions of multi-attribute objects.
Example: displaying brands in a product class together with their attributes, e.g. brands for treating stomach problems.
Brunswik's (1955) Lens Model: the theoretical foundation for understanding the importance of perceptions in consumer purchases: Perceptions → Preferences → Choice.
3 Perceptual Mapping - 2
Goals of perceptual mapping:
- Aid for strategic marketing decisions
- Summarizing the nature and degree of competition among a set of brands via key product attributes
Common application areas:
- Product positioning
- Identification of market gaps for new product development
4 Perceptual Mapping - 3
Basic data: brands are scored on a number of attributes by several individuals; the scores are averaged over the individuals. Result: a Brands × Attributes matrix.
Common data analysis techniques:
- Correspondence analysis
- Principal component analysis, multidimensional scaling
- Discriminant analysis
- Factor analysis
5 Perceptual Mapping - 4
Basic data: a doctor thinks a brand possesses an attribute => score = 1; if not: score = 0.
Three-way binary data: Brands × Attributes × Doctors.
Why average over doctors? Different doctors may be sensitive to different attributes.
Three-way data analysis techniques:
- Three-mode binary hierarchical cluster analysis
- Three-mode principal component analysis (numerical)
6 The Binary Data Cube
Mode A: Objects (Brands), i = 1,...,I
Mode B: Variables (Attributes), j = 1,...,J
Mode C: Subjects (Doctors), k = 1,...,K
[Figure: the three-way data cube with its fibers and slices.]
7 Stacked Two-Way Data
Columns: Attributes 1 through J.
Rows: Brands 1 through I, stacked per doctor: Doctor 1 (k = 1), Doctor 2 (k = 2), ..., Doctor K (k = K).
8 HICLAS3: Algebraic Representation (Tucker3-HICLAS)
The HICLAS3 model (uses Boolean algebra):

$$\hat{x}_{ijk} = m_{ijk} = \bigoplus_{p=1}^{P}\bigoplus_{q=1}^{Q}\bigoplus_{r=1}^{R} \tilde{a}_{ip}\,\tilde{b}_{jq}\,\tilde{c}_{kr}\,\tilde{g}_{pqr}$$

m_ijk = 1 iff ã_ip = 1 and b̃_jq = 1 and c̃_kr = 1 and g̃_pqr = 1 for at least one combination of p, q, and r.
ã_ip, b̃_jq, c̃_kr: elements of the binary component matrices A, B, and C, respectively (brands, attributes, doctors).
g̃_pqr: element of the P × Q × R three-way binary core array G; indicates the links between the binary components of the three modes.
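A minimal NumPy sketch of this Boolean structural image (the small binary matrices below are made up for illustration, not taken from the paper's data):

```python
import numpy as np

def hiclas3_image(A, B, C, G):
    """Boolean Tucker3 structural image: m[i,j,k] = 1 iff a[i,p] = b[j,q]
    = c[k,r] = g[p,q,r] = 1 for at least one combination (p, q, r)."""
    # An ordinary sum over all component triples followed by thresholding
    # realises the Boolean "at least one combination" rule.
    M = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)
    return (M > 0).astype(int)

# Made-up binary component matrices: 2 brands, 2 attributes, 1 doctor.
A = np.array([[1, 0], [1, 1]])
B = np.array([[1, 0], [0, 1]])
C = np.array([[1, 0]])
G = np.zeros((2, 2, 2), dtype=int)
G[0, 0, 0] = 1            # link the first components of all three modes
M = hiclas3_image(A, B, C, G)
print(M[:, :, 0])         # both brands get attribute 1, neither gets attribute 2
```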
9 HICLAS3 Pictorial Representation
[Figure: the binary component matrices A (brands a1, a2), B (attributes b1, b2), and C (doctors c1, c2) linked through the core array G. In the example a single entry m_ijk = 1 arises because one combination of components has a, b, c, and g all equal to 1, while all other seven combinations contain a zero.]
10 Three-Mode Component Analysis
Tucker3 model (numerical):

$$\hat{x}_{ijk} = m_{ijk} = \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R} a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr}$$

i = 1,...,I (brands); j = 1,...,J (attributes); k = 1,...,K (doctors); m_ijk is the model matrix or structural image.
a_ip, b_jq, c_kr: elements of the loading matrices A, B, and C, respectively (brands, attributes, doctors).
g_pqr: element of the P × Q × R three-way core array G; indicates the strength of the link between the components of the three modes.
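The triple sum can be evaluated in one call with NumPy's einsum; the loadings below are random stand-ins whose dimensions match the gastro-intestinal data used later (7 brands, 8 attributes, 283 doctors, a (3,3,2) model), but whose values are not:

```python
import numpy as np

I, J, K = 7, 8, 283      # brands, attributes, doctors
P, Q, R = 3, 3, 2        # numbers of components

rng = np.random.default_rng(0)
A = rng.standard_normal((I, P))      # brand loadings
B = rng.standard_normal((J, Q))      # attribute loadings
C = rng.standard_normal((K, R))      # doctor loadings
G = rng.standard_normal((P, Q, R))   # core array

# x_hat[i,j,k] = sum_{p,q,r} a[i,p] * b[j,q] * c[k,r] * g[p,q,r]
X_hat = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

# Spot-check one entry against the explicit triple sum.
i, j, k = 2, 5, 100
explicit = sum(A[i, p] * B[j, q] * C[k, r] * G[p, q, r]
               for p in range(P) for q in range(Q) for r in range(R))
assert np.isclose(X_hat[i, j, k], explicit)
```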
11 Three-Mode Binary Analysis in Action Perceptions of Medical Doctors w.r.t. Gastro-Intestinal Drugs
12 Perceptions of Medical Doctors Gastro-Intestinal Drugs Tagamet Zantac Pepcid Axid Sulcrate Cytotec Losec
13 Attributes
Adjectives [binary answers: no (0) or yes (1)]
- Relieves pain (RelPain)
- Does not have serious side effects (NoSideEf)
- Relatively safe w.r.t. potential interactions with other drugs (Safe)
- Flexible in terms of dosage (FlexDose)
- Not too costly for the patient (LowCost)
- Relieves symptoms (RelSymptoms)
- Promotes healing (Heals)
- Prophylactic (Prophylactic)
14 Data: Brands × Attributes × Doctors (7 × 8 × 283)
Mode A: Objects (Brands), i = 1,...,7
Mode B: Variables (Attributes), j = 1,...,8
Mode C: Subjects (Doctors), k = 1,...,283
15 Perceptions of Medical Doctors
Central questions:
- What is the position of the brands w.r.t. each other?
- Which attributes are related to this positioning?
- Do doctors differ in their perceptions of which brands have which attributes?
16 HiClas3 Model
Tucker3 hierarchical classes model.
Basic elements:
- Binary components for all three modes (doctors, brands, and attributes)
- Plus linkage information about the components
Basic literature: papers by Ceulemans and Van Mechelen in Psychometrika (Catholic University of Leuven, Belgium).
17 HiClas3: Choosing a Model
[Scree plot: number of discrepancies versus degrees of freedom for Brands × Attributes × Doctors models of complexity (1,1,1) through (3,3,3).]
Chosen model complexity: (3,3,2) (Brands = 3 components; Attributes = 3; Doctors = 2).
Discrepancy: the data have a 1 where the model matrix has a 0, and vice versa.
18 Binary Component Matrices (brands; attributes)
Brand components: B1 = Cytoprotective agent; B2 = Tagamet (oldest brand); B3 = Histamines (H-2 blockers).
[Table: discrepancies, fit, and binary memberships of B1-B3 for Sulcrate, Cytotec, Zantac, Pepcid, Axid, Losec, and Tagamet.]
Attribute components: A1 = Primary medical; A2 = Use in practice; A3 = Secondary medical.
[Table: discrepancies, fit, and binary memberships of A1-A3 for Relieves Pain, Relieves Symptoms, Promotes Healing, No Side Effects, Relatively Safe, Flexible Dose, Prophylactic, and Low Cost.]
Low Cost had no relations with the other attributes.
19 Binary Component Matrices (doctors)
[Table: binary memberships of the doctor components MD1 and MD2, with the frequency and proportion of each of the four doctor types; average sd = .09.]
Doctor Type 4 (0,0) has no links with the other doctors.
20 Binary Core Array
[Table: 3 × 3 binary core slices for the doctor components Dr2 (1 0) and Dr3 (0 1), linking the brand components B1 (Cytoprotective), B2 (Tagamet), B3 (Histamines) to the attribute components A1 (Primary medical), A2 (Practice), A3 (Secondary medical). A 1 indicates that a link exists between components of the three modes.]
Dr1 = Dr2 + Dr3.
21 Doctor Type 2 (n = 69)
[Graph: Sulcrate; Zantac, Axid, Pepcid, Losec; Cytotec; and Tagamet linked to the attribute sets: No side effects, Safe; Prophylactic; Flexible dose; Low cost; Relieves pain and symptoms, Promotes healing.]
22 Doctor Type 3 (n = 98)
[Graph: the same brands and attribute sets, with the links as perceived by Doctor Type 3.]
23 Doctor Type 1 (n = 70)
[Graph: the same brands and attribute sets, with the links as perceived by Doctor Type 1.]
24 Doctor Types
[Graph: combined display of the links for Dr1, Dr2, and Dr3 between the brands and the attribute sets.]
25 Characterisation of Doctor Types

Brand      Dr2 (n=69)     Dr3 (n=98)          Dr1 (n=70)
Zantac     a0, a1         a0, a1, a2, a3      a0, a1, a2, a3
Axid       a0, a1         a0, a1, a2, a3      a0, a1, a2, a3
Pepcid     a0, a1         a0, a1, a2, a3      a0, a1, a2, a3
Losec      a0, a1         a0, a1, a2, a3      a0, a1, a2, a3
Cytotec    a0, a1         a0, a1, a3          a0, a1, XX, a3
Sulcrate   a0, a1         a0, a1, a2, a3      a0, a1, a2, a3
Tagamet    a0, a1, a2     a0, a1, a2          a0, a1, a2, XX

a0 = {Relieves Pain, Relieves Symptoms, Promotes Healing}; a1 = {Prophylactic}; a2 = {Flexible Dosage}; a3 = {No Side Effects, Safe}.
Doctor Type 4 has no links; Low Cost has no links.
26 Further Considerations
No information on Low Cost (more complex HiClas models can model a separate component for Low Cost).
Tagamet is relatively inexpensive, while the others are not. Don't the doctors see this? HiClas3 suggests they do not.
27 Proportions of Ones across Doctors
[Table: proportion of ones per brand (Tagamet, Zantac, PepCid, Axid, Losec, Sulcrate, Cytotec) for each attribute (RelievesPain, RelievesSymptoms, PromotesHealing, NoSideEffects, RelativelySafe, FlexibleDose, Prophylactic, LowCost).]
28 Further Considerations
Surprise: Tagamet is the only relatively inexpensive brand.
Possible reason: doctors from all groups say Tagamet is not expensive; the attribute is thus unrelated to the present groups.
Possible solution: more groups for the attributes (we are working on this).
Question: is there other variability not present in the HiClas solution?
29 Further Analyses
Treat the binary data as numerical and analyse them with Tucker3. Handle the data such that the emphasis is on:
- relative differences between brands
- relative differences between attributes
30 Three-Mode Component Analysis
Concentrate on the consensus and the individual differences between doctors in the relationships between brands and attributes. Absolute differences between brands and between attributes are ignored.
[Figure: example profiles over objects A-E with different means M, illustrating that level differences are ignored.]
31 Component Scores
[Plots of the subject component scores from the Tucker3 (3,3,2) solution: component 1 reflects the consensus among the doctors; component 2 reflects the individual differences between the doctors.]
32 Joint Biplot (Consensus among doctors - Mean)
[Joint biplot, first versus second component, showing the mean of each brand and each attribute: brands Tagamet, Losec, Pepcid, Axid, Zantac, Cytotec, Sulcrate and attributes LowCost, RelPain, RelSymptoms, Heals, NoSideEff, FlexDose, Safe, Prophylactic.]
33 Joint Biplot (Individual differences between doctors - Deviations from mean)
[Joint biplot of the deviations from the mean, displaying the same brands and attributes.]
34 Conclusions - HiClas Model
Given that the data are binary, the binary hierarchical classes model is an obvious analysis method and has a relatively straightforward interpretation.
Effective graphics are available to display the results.
Many components might be necessary to model all the systematic variability present.
35 Conclusions - 2: Tucker3 Model
By using a numerical model, the variance can be portrayed in a different and also insightful manner.
Differential weighting may simplify the model description.
Enlightening graphics are available (joint biplots), but it requires some training to understand them.
36 Conclusions - 3 Substantive conclusions concern the perceptual mappings of the brands with respect to the attributes as seen by the doctors. The main patterns have been discussed during the presentation and will not be repeated.
37 Thank You
38 Tucker3 Model in Matrix Notation

$$\mathbf{X}_A = \mathbf{A}\,\mathbf{G}_A\,(\mathbf{B}' \otimes \mathbf{C}') + \mathbf{E}$$

A: the I × P loading matrix for brands
B: the J × Q loading matrix for attributes
C: the K × R loading matrix for subjects
G: the P × Q × R core array with the links between the components (matricized as G_A, alongside the matricized data array X_A)
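As a sketch, this matricized form can be checked numerically against the element-wise Tucker3 definition (random matrices of hypothetical sizes; the first mode is unfolded in row-major order):

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, P, Q, R = 4, 3, 5, 2, 2, 2
A = rng.standard_normal((I, P))
B = rng.standard_normal((J, Q))
C = rng.standard_normal((K, R))
G = rng.standard_normal((P, Q, R))

# Element-wise Tucker3 definition.
X = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

# Matricized form: X_A = A G_A (B' kron C'), with X and G both
# unfolded along the first (brand) mode in row-major order.
X_A = A @ G.reshape(P, Q * R) @ np.kron(B.T, C.T)
assert np.allclose(X_A, X.reshape(I, J * K))
```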
39 PARAFAC/CANDECOMP Model
(Harshman 1970, 1976; Harshman and Lundy 1984, 1994; Carroll and Chang 1970)
Based on the principle of Parallel Proportional Profiles (Cattell 1944).

$$\hat{x}_{ijk} = m_{ijk} = \sum_{s=1}^{S} g_{sss}\, a_{is}\, b_{js}\, c_{ks}$$

m_ijk is the model matrix or structural image.
A is the I × S loading matrix for brands; B is the J × S loading matrix for attributes; C is the K × S loading matrix for subjects.
G is the S × S × S superdiagonal core array: exclusive links between the components s of the three modes.
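PARAFAC can thus be viewed as a Tucker3 model whose core is superdiagonal; a short numerical check of that equivalence with made-up loadings:

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, S = 4, 3, 5, 2
A = rng.standard_normal((I, S))
B = rng.standard_normal((J, S))
C = rng.standard_normal((K, S))
g = rng.standard_normal(S)               # superdiagonal entries g_sss

# PARAFAC: x_hat[i,j,k] = sum_s g_sss * a[i,s] * b[j,s] * c[k,s]
X_parafac = np.einsum('is,js,ks,s->ijk', A, B, C, g)

# The same model written as Tucker3 with an S x S x S superdiagonal core.
G = np.zeros((S, S, S))
G[np.arange(S), np.arange(S), np.arange(S)] = g
X_tucker = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)
assert np.allclose(X_parafac, X_tucker)
```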
40 Three-Mode Component Analysis: Model Comparison
[Table: for each fitted model (TUCKALS3, TUCKALS2, and TRILIN/PARAFAC solutions of varying complexity): the numbers of components for modes A, B, and C, the standardized fitted SS, the number of parameters, and the standardized fit per thousand parameters.]
Computation of the number of parameters (component matrices plus core, minus transformational freedom):
TUCKALS2: I*P + J*Q + P*Q*K - P**2 - Q**2
TUCKALS3: I*P + J*Q + K*R + P*Q*R - P**2 - Q**2 - R**2
PARAFAC:  I*S + J*S + K*S + S - S - S - S
41 Varimax Rotation: Deciding on Weights
[Table: varimax values and relative weights for A, B, C, and the core, for the unrotated and the rotated solutions.]
42 Joint Biplot for Brands and Attributes
First versus third component, for the second component of the doctors.
[Biplot displaying the brands (Cytotc, Sulcrt, Losec, Axid, PepCid, Zantac, Tagamt) and the attributes (PromHl, RlvPn, RlvSym, RelSaf, NoSiEf, NotCst, FlxDoz, Prophy).]
43 Components for Brands and Attributes
[Table: components for the brands and the attributes (Inexp, NoSiEf, Safe, Prophy, RelPain, FlexDo, RelSym, Heals): unrotated (orthonormal), after varimax rotation of the core matrix, and after joint varimax rotation of the components and the core.]
44 Core Array
[Table: frontal slices 1 and 2 of the core array (components for brands × components for attributes): unrotated, after varimax rotation of the core only, and after joint varimax rotation of the components and the core.]
45 Assessment of Goodness of Model Fit
Kroonenberg and De Leeuw (1980) and Kroonenberg (1983) show that
SS(Residual) = SS(Total) - SS(Fit)
SS Accounted For = SS(Fit) / SS(Total)
Also, as shown by Ten Berge, De Leeuw, and Kroonenberg (1987), when the ALS algorithm has converged,
SS(Residual_m) = SS(Total_m) - SS(Fit_m)
where m stands for any level of any mode of the data matrix. Using this last relationship, the relative fit of the individual levels of a mode can be established, and it can be determined whether a given level fits the model well or badly.
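A sketch of this partitioning, with a rank-2 least-squares approximation standing in for a converged ALS solution (random data; at convergence the residuals are orthogonal to the fit, so the per-level identity gives the relative fit of each doctor):

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 7, 8, 20
X = rng.standard_normal((I * J, K))      # data unfolded over the doctor mode

# Least-squares rank-2 approximation via the SVD (a stand-in for a
# converged ALS fit; the residuals are then orthogonal to the fit).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = (U[:, :2] * s[:2]) @ Vt[:2]
R = X - X_hat

# Overall: SS(Total) = SS(Fit) + SS(Residual).
assert np.isclose((X**2).sum(), (X_hat**2).sum() + (R**2).sum())

# Per level m (here: per doctor k) the same partition holds, which
# yields the relative fit of each individual doctor.
assert np.allclose((X**2).sum(axis=0), (X_hat**2).sum(axis=0) + (R**2).sum(axis=0))
rel_fit = 1 - (R**2).sum(axis=0) / (X**2).sum(axis=0)
```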
46 Model Selection Tucker3 Model
[Three-mode scree plot: deviance (SS(Residual)) versus the sum of the numbers of components S = P + Q + R, for models 1x1x1 through 3x3x3.]
47 Tucker3 Solutions
[Table: raw and standardized sums of squares: SS(Total), estimated SS(Fit) for A, B, and C, SS(Fit), and SS(Residual).]
DF = number of data points (minus the loss of information due to preprocessing or missing data) minus the number of independent parameters.
Number of independent parameters = (I*P) + (J*Q) + (K*R) + (P*Q*R) - P**2 - Q**2 - R**2,
with I, J, K the numbers of levels of the 1st, 2nd, and 3rd modes, respectively, and P, Q, R the numbers of components of the 1st, 2nd, and 3rd modes, respectively.
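For illustration, the parameter count for the (3,3,2) solution on the 7 × 8 × 283 data works out as follows (helper name hypothetical):

```python
def tucker3_n_parameters(I, J, K, P, Q, R):
    """Independent parameters of a Tucker3 model: the component matrices
    plus the core, minus the rotational (transformational) freedom."""
    return I*P + J*Q + K*R + P*Q*R - P**2 - Q**2 - R**2

# 7 brands x 8 attributes x 283 doctors, with 3, 3, and 2 components:
print(tucker3_n_parameters(7, 8, 283, 3, 3, 2))   # 607
```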
48 Relating Subject Components to External Variables
Y: number of years of experience as a medical doctor (standardized)
X1: first component score for the subjects mode; X2: second component score for the subjects mode
Linear model: Y = X1 B1 + X2 B2
Estimates:
B1 = -1.054, std. error = 0.997, t = -1.058, p-value = 0.29
B2 = 1.350, std. error = 0.997, t = 1.345, p-value = 0.18
R² = 0.01, F-value = 1.477, df = (2, 282), p-value = 0.23
Conclusion: the subject components are not related to the number of years of experience as a medical doctor.
49 Relating Residuals to External Variables
Y: number of years of experience as a medical doctor (standardized)
X: sum of squares of the residuals for each subject
Linear regression: Y = B X
Estimated B = , R² = , F-value = 0.22, d.f. = (1, 282), p-value =
Conclusion: the residuals are not related to the number of years of experience.
50 Result HiClas3 Model (2D 3A 3B)
[Graph: Sulcrate; Zantac, Axid, Pepcid, Losec; Cytotec; and Tagamet linked via the doctor types (D1-D4) to the attribute sets: No side effects, Safe; Prophylactic; Flexible dose; Relieves pain and symptoms, Promotes healing; Low cost.]
D1 (1 1): n = 70; D2 (1 0): n = 69; D3 (0 1): n = 98; D4 (0 0): n = 46.
51 HICLAS3: Example (Tucker3-HICLAS)
[Example based on the Leuven group: anger-provoking stimuli (ignored in restaurant; disconnecting operator; closing store; missing page in book; accused by instructor; people tell lies about you; persistently contradicted; unfairly blamed for error) crossed with responses (grimace; hands tremble; perspire; want to strike; turn away; lose patience; feel irritated; curse; become enraged; become tense; heart beats faster) for subject classes P1-P3, with binary response profiles.]
52 HiClas3 Three-Mode Scree Plot
[Scree plot: number of discrepancies versus the sum of the numbers of components S = P + Q + R, for Doctors × Attributes × Brands models (1,1,1) through (3,3,3).]
53 Preprocessing: Double Centring (in three-mode component analysis)
Double centring: doctors may not use the attributes uniformly across the brands and across the attributes.
Double centring: scores in deviations from the brand means and the attribute means; the origin becomes the zero point for both the brands and the attributes of each subject's scores.

$$\tilde{x}_{ijk} = x_{ijk} - \bar{x}_{.jk} - \bar{x}_{i.k} + \bar{x}_{..k}$$

(The J × K matrix of means over the brands and the I × K matrix of means over the attributes are removed.)
54 Preprocessing: Double Centring
Three-way factorial design without replication (1 observation per cell); dependent variable: brand possesses attribute (score = 1).

$$x_{ijk} = \mu + a_i + b_j + c_k + ac_{ik} + bc_{jk} + ab_{ij} + abc_{ijk}$$

After double centring only the last two terms remain:

$$\tilde{x}_{ijk} = ab_{ij} + abc_{ijk}$$

These are analysed with three-mode PCA:
ab_ij = consensus of the doctors about the attributes of the brands
abc_ijk = differences between the doctors about the attributes of the brands
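The double centring can be sketched as follows: after it, every frontal (doctor) slice has zero brand means and zero attribute means, leaving only the ab and abc terms (made-up binary data):

```python
import numpy as np

def double_centre(X):
    """Remove brand means and attribute means within each doctor's slice:
    x~[i,j,k] = x[i,j,k] - x_bar[.jk] - x_bar[i.k] + x_bar[..k]."""
    brand_means = X.mean(axis=0, keepdims=True)       # x_bar[.jk], mean over brands
    attr_means = X.mean(axis=1, keepdims=True)        # x_bar[i.k], mean over attributes
    grand_means = X.mean(axis=(0, 1), keepdims=True)  # x_bar[..k], per-slice grand mean
    return X - brand_means - attr_means + grand_means

rng = np.random.default_rng(4)
X = rng.integers(0, 2, size=(7, 8, 10)).astype(float)  # brands x attributes x doctors
Xc = double_centre(X)
assert np.allclose(Xc.mean(axis=0), 0)   # brand means gone in every slice
assert np.allclose(Xc.mean(axis=1), 0)   # attribute means gone in every slice
```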
55 Model Selection Tucker3 Model
[Plot: deviance (SS(Residual)) versus degrees of freedom for models 1x1x1 through 3x3x3.]
Model complexity: (Docs = 2; Attr = 3; Brands = 3) or (Docs = 3; Attr = 3; Brands = 3).
56 Joint Biplot (Consensus)
[Joint biplot, first versus third component, displaying the brands (Tagamet, Cytotec, Losec, Axid, Sulcrate, Pepcid, Zantac) and the attributes (LowCost, Heals, RelSymptoms, RelPain, Prophylactic, FlexDose, NoSideEff, Safe).]
57 Joint Biplot (Consensus)
[Two joint biplot panels (second versus first component and third versus first component), displaying the same brands and attributes.]
58 Conclusions: Where to Go from Here
Doctors with irregular patterns combined with a low number of ones were excluded from the HiClas analysis, yet these doctors were scattered all over the plot of the doctor components. Thus the Tucker analysis picked up some information that was not available to the HiClas analysis; similarly for the LowCost attribute.
Is the numerical information, such as the variance, somewhere to be found in the HiClas results, and if so, can it be used?
Construct exactly fitting hierarchical classes models and run a Tucker3 analysis on them.
Construct doctors/attributes/brands artificially according to a specific pattern and include them in the analysis to facilitate interpretation.
Sort out the mathematics of the comparison between the models.
59 HiClas3 Three-Mode Deviance Plot
[Plot: number of discrepancies versus degrees of freedom for Doctors × Attributes × Brands models (1,1,1) through (3,3,3).]
Model complexity: (Docs = 2; Attr = 3; Brands = 3) or (Docs = 3; Attr = 3; Brands = 3).
60 Component Scores
[Plot: Tucker3 (3,3,2) subject components 1 (consensus among the doctors) and 2 (individual differences between the doctors), with markers for the HiClas3 (3,3,2) doctor types: 11 = Dr1, 10 = Dr2, 01 = Dr3, 00 = Dr4.]
61 Gastro SVD Biplot (means across doctors - attributes centred)
[Two panels: first versus second component and first versus third component, displaying the brands (Losec, Zantac, PepCid, Axid, Tagamet, Sulcrt, Cytotc) and the attributes (NoSiEf, RelSaf, RlvPn, RlvSym, PromHl, FlxDoz, NotCst, Prophy).]
More informationDesign and Analysis of Experiments
Design and Analysis of Experiments Part VIII: Plackett-Burman, 3 k, Mixed Level, Nested, and Split-Plot Designs Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Plackett-Burman
More information. a m1 a mn. a 1 a 2 a = a n
Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by
More information19. Blocking & confounding
146 19. Blocking & confounding Importance of blocking to control nuisance factors - day of week, batch of raw material, etc. Complete Blocks. This is the easy case. Suppose we run a 2 2 factorial experiment,
More informationInter Item Correlation Matrix (R )
7 1. I have the ability to influence my child s well-being. 2. Whether my child avoids injury is just a matter of luck. 3. Luck plays a big part in determining how healthy my child is. 4. I can do a lot
More informationG E INTERACTION USING JMP: AN OVERVIEW
G E INTERACTION USING JMP: AN OVERVIEW Sukanta Dash I.A.S.R.I., Library Avenue, New Delhi-110012 sukanta@iasri.res.in 1. Introduction Genotype Environment interaction (G E) is a common phenomenon in agricultural
More informationWhat If There Are More Than. Two Factor Levels?
What If There Are More Than Chapter 3 Two Factor Levels? Comparing more that two factor levels the analysis of variance ANOVA decomposition of total variability Statistical testing & analysis Checking
More informationMEMORIAL UNIVERSITY OF NEWFOUNDLAND DEPARTMENT OF MATHEMATICS AND STATISTICS FINAL EXAM - STATISTICS FALL 1999
MEMORIAL UNIVERSITY OF NEWFOUNDLAND DEPARTMENT OF MATHEMATICS AND STATISTICS FINAL EXAM - STATISTICS 350 - FALL 1999 Instructor: A. Oyet Date: December 16, 1999 Name(Surname First): Student Number INSTRUCTIONS
More informationRelated Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM
Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru
More information1 Vectors and Tensors
PART 1: MATHEMATICAL PRELIMINARIES 1 Vectors and Tensors This chapter and the next are concerned with establishing some basic properties of vectors and tensors in real spaces. The first of these is specifically
More informationPart I. Linear Discriminant Analysis. Discriminant analysis. Discriminant analysis
Week 5 Based in part on slides from textbook, slides of Susan Holmes Part I Linear Discriminant Analysis October 29, 2012 1 / 1 2 / 1 Nearest centroid rule Suppose we break down our data matrix as by the
More informationOrthogonal contrasts for a 2x2 factorial design Example p130
Week 9: Orthogonal comparisons for a 2x2 factorial design. The general two-factor factorial arrangement. Interaction and additivity. ANOVA summary table, tests, CIs. Planned/post-hoc comparisons for the
More informationWindow-based Tensor Analysis on High-dimensional and Multi-aspect Streams
Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationContents. TAMS38 - Lecture 8 2 k p fractional factorial design. Lecturer: Zhenxia Liu. Example 0 - continued 4. Example 0 - Glazing ceramic 3
Contents TAMS38 - Lecture 8 2 k p fractional factorial design Lecturer: Zhenxia Liu Department of Mathematics - Mathematical Statistics Example 0 2 k factorial design with blocking Example 1 2 k p fractional
More informationReference: Chapter 13 of Montgomery (8e)
Reference: Chapter 1 of Montgomery (8e) Maghsoodloo 89 Factorial Experiments with Random Factors So far emphasis has been placed on factorial experiments where all factors are at a, b, c,... fixed levels
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationSTATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours
Instructions: STATS216v Introduction to Statistical Learning Stanford University, Summer 2017 Remember the university honor code. Midterm Exam (Solutions) Duration: 1 hours Write your name and SUNet ID
More informationIntroduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs
Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique
More informationGROUP THEORY PRIMER. New terms: tensor, rank k tensor, Young tableau, Young diagram, hook, hook length, factors over hooks rule
GROUP THEORY PRIMER New terms: tensor, rank k tensor, Young tableau, Young diagram, hook, hook length, factors over hooks rule 1. Tensor methods for su(n) To study some aspects of representations of a
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper
Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions
More informationAn Introduction to Multivariate Methods
Chapter 12 An Introduction to Multivariate Methods Multivariate statistical methods are used to display, analyze, and describe data on two or more features or variables simultaneously. I will discuss multivariate
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationCanonical Correlation & Principle Components Analysis
Canonical Correlation & Principle Components Analysis Aaron French Canonical Correlation Canonical Correlation is used to analyze correlation between two sets of variables when there is one set of IVs
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 3 for Applied Multivariate Analysis Outline 1 Reprise-Vectors, vector lengths and the angle between them 2 3 Partial correlation
More informationsociology 362 regression
sociology 36 regression Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,
More informationIE 361 Module 21. Design and Analysis of Experiments: Part 2
IE 361 Module 21 Design and Analysis of Experiments: Part 2 Prof.s Stephen B. Vardeman and Max D. Morris Reading: Section 6.2, Statistical Quality Assurance Methods for Engineers 1 In this module we begin
More informationMultidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals
Multidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals Shizuhiko Nishisato Abstract The basic premise of dual scaling/correspondence analysis lies in the simultaneous
More informationUCLA STAT 233 Statistical Methods in Biomedical Imaging
UCLA STAT 233 Statistical Methods in Biomedical Imaging Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology University of California, Los Angeles, Spring 2004 http://www.stat.ucla.edu/~dinov/
More informationNew Frequency Weighting of Hand-Arm Vibration
Industrial Health 2005, 43, 509 515 Original Article New Frequency Weighting of Hand-Arm Vibration Yoshio TOMINAGA Institute for Science of Labour, Sugao, Miyamae-ku, Kawasaki 216-8501, Japan Received
More informationNesting and Mixed Effects: Part I. Lukas Meier, Seminar für Statistik
Nesting and Mixed Effects: Part I Lukas Meier, Seminar für Statistik Where do we stand? So far: Fixed effects Random effects Both in the factorial context Now: Nested factor structure Mixed models: a combination
More informationNoise & Data Reduction
Noise & Data Reduction Paired Sample t Test Data Transformation - Overview From Covariance Matrix to PCA and Dimension Reduction Fourier Analysis - Spectrum Dimension Reduction 1 Remember: Central Limit
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April
More informationContingency Table Analysis via Matrix Factorization
Contingency Table Analysis via Matrix Factorization Kumer Pial Das 1, Jay Powell 2, Myron Katzoff 3, S. Stanley Young 4 1 Department of Mathematics,Lamar University, TX 2 Better Schooling Systems, Pittsburgh,
More informationDimensionality of Hierarchical
Dimensionality of Hierarchical and Proximal Data Structures David J. Krus and Patricia H. Krus Arizona State University The coefficient of correlation is a fairly general measure which subsumes other,
More informationChapter 8. Inferences Based on a Two Samples Confidence Intervals and Tests of Hypothesis
Chapter 8 Inferences Based on a Two Samples Confidence Intervals and Tests of Hypothesis Copyright 2018, 2014, and 2011 Pearson Education, Inc. Slide - 1 Content 1. Identifying the Target Parameter 2.
More informationRevision: Chapter 1-6. Applied Multivariate Statistics Spring 2012
Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing
More informationPrincipal Component Analysis
Principal Component Analysis Giorgos Korfiatis Alfa-Informatica University of Groningen Seminar in Statistics and Methodology, 2007 What Is PCA? Dimensionality reduction technique Aim: Extract relevant
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationAnalysis of Variance (ANOVA)
Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA
More informationGaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford
Gaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford Carol Hsin Abstract The objective of this project is to return expected electricity
More informationRegression: Lecture 2
Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and
More informationOptimal Selection of Blocked Two-Level. Fractional Factorial Designs
Applied Mathematical Sciences, Vol. 1, 2007, no. 22, 1069-1082 Optimal Selection of Blocked Two-Level Fractional Factorial Designs Weiming Ke Department of Mathematics and Statistics South Dakota State
More informationRegression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.
Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would
More information