Louis Roussos Sports Data
1 Louis Roussos Sports Data
Rank the sports you most like to participate in: 1 = favorite, 7 = least favorite. There are n = 130 rank vectors.
    > sportsranks
    Baseball Football Basketball Tennis Cycling Swimming Jogging
    [...]
2 K-means in R
Set the number of clusters, K, with the centers argument. The nstart argument is the number of times the algorithm is run, each time using a different random starting set of means.
    > kmeans(sportsranks,centers=2,nstart=10)
    K-means clustering with 2 clusters of sizes 62, 68
    Cluster means:
      Baseball Football Basketball Tennis Cycling Swimming Jogging
      [...]
    Clustering vector:
      [...]
    Within cluster sum of squares by cluster:
    [1] [...]
    Available components:
    [1] cluster centers withinss size
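As a minimal sketch of the call above, the following runs kmeans on made-up data. The sportsranks data are not reproduced here, so fake_ranks is a hypothetical stand-in (130 random rankings of the seven sports), and the resulting clusters carry no meaning; the point is only to show which components the fitted object exposes.

```r
# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
sports <- c("Baseball", "Football", "Basketball", "Tennis",
            "Cycling", "Swimming", "Jogging")
fake_ranks <- t(replicate(130, sample(7)))   # each row is a permutation of 1..7
colnames(fake_ranks) <- sports

# K-means with K = 2, restarted from 10 random sets of means; the best run is kept.
km <- kmeans(fake_ranks, centers = 2, nstart = 10)

km$size            # number of observations in each cluster
km$centers         # 2 x 7 matrix of cluster means
km$withinss        # within-cluster sum of squares, one value per cluster
head(km$cluster)   # cluster labels of the first few observations
```

With the real data, km$size would presumably reproduce the cluster sizes 62 and 68 reported on the slide, up to the labeling of the clusters.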
3 Getting clusterings for K = 2, ..., 10
    kms <- vector("list",10)   # kms[[k]] will hold the clustering with k clusters
    for(k in 2:10) {
      kms[[k]] <- kmeans(sportsranks,centers=k,nstart=10)
    }
4 Cluster means, K = 1, ..., 4
[Tables of cluster means, one row per group, for K = 1, 2, 3, and 4, with columns BaseB, FootB, BsktB, Ten, Cyc, Swim, Jog: [...]]
Recall that 1 = favorite, so a lower mean rank indicates a stronger preference.
K = 2: Group 1 likes swimming and cycling, while group 2 likes the team sports: baseball, football, and basketball.
K = 3: Group 1 appears to be about the same as the team-sports group from K = 2, while groups 2 and 3 both like swimming and cycling. The difference is that group 3 does not like jogging, while group 2 does.
K = 4: The team-sports group has split into one that likes tennis (group 3) and one that doesn't (group 2).
5 Plotting two clusters
The idea is to project the observations onto the subspace (which is just a line) that goes through the two clusters' mean vectors. The vector
$$z = \frac{\mu_1 - \mu_2}{\lVert \mu_1 - \mu_2 \rVert}$$
is the unit vector pointing from $\mu_2$ to $\mu_1$. Then, using $z$ as an axis, the projections of the observations onto $z$ have coordinates
$$w_i = x_i z, \quad i = 1, \ldots, N.$$
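The projection can be sketched in a few lines of R. This is an illustration under stated assumptions, not the slides' own code: x is a hypothetical stand-in for sportsranks (random rankings), and the names mu1, mu2, z, and w simply mirror the symbols in the formulas.

```r
# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
x <- t(replicate(130, sample(7)))

km  <- kmeans(x, centers = 2, nstart = 10)
mu1 <- km$centers[1, ]
mu2 <- km$centers[2, ]

# Unit vector pointing from mu2 to mu1.
z <- (mu1 - mu2) / sqrt(sum((mu1 - mu2)^2))

# Coordinates of the observations along z (one number per observation).
w <- as.vector(x %*% z)

# The two clusters should separate along this axis.
tapply(w, km$cluster, mean)

# A histogram of w, as on the next slide, could then be drawn with hist(w).
```

By construction the mean of w in cluster 1 exceeds the mean in cluster 2 by exactly the distance between the two cluster means.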
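6 The histogram, K=2
[Figure: histogram of the projections W (horizontal axis) against Frequency (vertical axis) for K = 2, with labels for the sports Basketball, Jogging, Football, Swimming, Baseball, Cycling, and Tennis.]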
7 Plot for K=3
If K = 3, then the three means lie in a plane, hence we would like to project the observations onto that plane. One approach is to use principal components on the means. Let
$$Z = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}.$$
We apply the spectral decomposition to the sample covariance matrix of $Z$:
$$\frac{1}{3} Z' H_3 Z = G L G', \qquad (1)$$
where $H_3$ is the $3 \times 3$ centering matrix, $G$ is orthogonal, and $L$ is diagonal. The diagonals of $L$ here are 11.77, 4.07, and five zeros. We then rotate the data and the means using $G$:
$$W = XG \quad \text{and} \quad W^{(\text{means})} = ZG.$$
Only the first two columns in each matrix are relevant.
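A minimal sketch of this rotation in R follows. It assumes that $H_3 = I_3 - \tfrac{1}{3} 1 1'$ is the usual centering matrix and uses a hypothetical stand-in x for the sportsranks data, so the eigenvalues will not match the 11.77 and 4.07 quoted above.

```r
# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
x <- t(replicate(130, sample(7)))

km <- kmeans(x, centers = 3, nstart = 10)
Z  <- km$centers                        # 3 x 7 matrix whose rows are the means

H3 <- diag(3) - matrix(1/3, 3, 3)       # 3 x 3 centering matrix (assumed form)
S  <- t(Z) %*% H3 %*% Z / 3             # sample covariance matrix of the means

eg <- eigen(S, symmetric = TRUE)        # spectral decomposition S = G L G'
G  <- eg$vectors
round(eg$values, 4)                     # two positive values, then five zeros

W      <- x %*% G                       # rotated observations
Wmeans <- Z %*% G                       # rotated means

# Only the first two columns matter for the plot, for example:
if (interactive()) {
  plot(W[, 1:2], col = km$cluster, xlab = "Var 1", ylab = "Var 2")
  points(Wmeans[, 1:2], pch = 19, cex = 2)
}
```

Three points always lie in a plane, which is why at most two eigenvalues can be nonzero.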
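8 The plot, K=3
[Figure: the observations for K = 3 plotted on Var 1 (horizontal axis) and Var 2 (vertical axis), with labels for the sports Jogging, Baseball, Football, Tennis, Basketball, Cycling, and Swimming.]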
9 The sums of squares $SS_K$
$$SS_K = \mathrm{obj}(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \sum_{\{i \,\mid\, y_i = k\}} \lVert x_i - \mu_k \rVert^2.$$
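The sum of squares can be checked numerically. The sketch below computes it directly from the formula and compares it with the tot.withinss component that kmeans returns; x is again a hypothetical stand-in for the sportsranks data.

```r
# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
x <- t(replicate(130, sample(7)))

km <- kmeans(x, centers = 3, nstart = 10)

# Row i of this matrix is the mean of the cluster that observation i belongs to.
assigned_means <- km$centers[km$cluster, ]

# Sum over all observations of the squared distance to the assigned mean.
ss <- sum((x - assigned_means)^2)

all.equal(ss, km$tot.withinss)   # the two computations agree
```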
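10 The reduction of sums of squares
[Figure: the proportional reduction $1 - SS_K / SS_{K-1}$ (vertical axis, labeled 1-SS[k]/SS[k-1]) plotted against K (horizontal axis).]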
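One way the plotted quantity could be computed is sketched below. It assumes that $SS_1$ is the total sum of squares about the overall mean, which is what a single cluster gives, and it uses a hypothetical stand-in x for the sportsranks data.

```r
# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
x <- t(replicate(130, sample(7)))

ss <- numeric(10)
ss[1] <- sum(scale(x, scale = FALSE)^2)        # K = 1: total sum of squares
for (k in 2:10) {
  ss[k] <- kmeans(x, centers = k, nstart = 10)$tot.withinss
}

# Proportional reduction when going from K - 1 to K clusters, K = 2, ..., 10.
reduction <- 1 - ss[2:10] / ss[1:9]
names(reduction) <- 2:10
round(reduction, 3)

if (interactive()) {
  plot(2:10, reduction, type = "l", xlab = "K", ylab = "1-SS[k]/SS[k-1]")
}
```

A large reduction at some K followed by small ones would suggest stopping at that K.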
11 Silhouettes in R
The function silhouette.km finds the silhouettes for a given clustering, then sort.silhouette orders them, first by cluster number, then by value. To plot the silhouettes for K = 2, ..., 10:
    sil.ave <- NULL   # To collect the silhouettes' means for each K
    par(mfrow=c(3,3))
    for(k in 2:10) {
      sil <- silhouette.km(sportsranks,kms[[k]]$centers)
      sil.ave <- c(sil.ave,mean(sil))
      ssil <- sort.silhouette(sil,kms[[k]]$cluster)
      plot(ssil,type="h",xlab="Observations",ylab="Silhouettes")
      title(paste("K =",k))
    }
The sil.ave calculated above can then be used to obtain the plot of averages:
    plot(2:10,sil.ave,type="l",xlab="K",ylab="Average silhouette width")
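The functions silhouette.km and sort.silhouette do not appear to be part of base R, so they are presumably supplied with the course. As a stand-in, the sketch below uses silhouette from the cluster package, which computes the standard silhouette based on average pairwise dissimilarities. That definition may differ from the one silhouette.km implements, so the numbers need not agree with the slides. The data x are a hypothetical substitute for sportsranks.

```r
library(cluster)   # recommended package distributed with R

# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
x <- t(replicate(130, sample(7)))
d <- dist(x)       # Euclidean distances between observations

# Average silhouette width for K = 2, ..., 10.
sil.ave <- sapply(2:10, function(k) {
  cl  <- kmeans(x, centers = k, nstart = 10)$cluster
  sil <- silhouette(cl, d)            # columns: cluster, neighbor, sil_width
  mean(sil[, "sil_width"])
})
names(sil.ave) <- 2:10
round(sil.ave, 3)

if (interactive()) {
  plot(2:10, sil.ave, type = "l", xlab = "K", ylab = "Average silhouette width")
}
```

On random rankings the averages will be small for every K, since there is no real cluster structure to find.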
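12 Plotting the silhouettes
[Figure: sorted silhouette plots in panels, one per value of K starting at K = 2, each annotated with its average silhouette width; one panel shows Ave = 0.534.]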
13 Plotting the silhouette averages
[Figure: average silhouette width (vertical axis) plotted against K (horizontal axis).]
K = 2 seems like a good choice.
14 Model-based clustering: car data
The data consist of size measurements on 111 automobiles. The variables include length, wheelbase, width, height, front and rear head room, front leg room, rear seating, front and rear shoulder room, and luggage area. The data are in the file cars. The variables have been normalized to have medians of 0 and median absolute deviations (MAD) of 0.6745 (the MAD for a N(0, 1)).
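The normalization described above can be sketched as follows. The raw car measurements are not available here, so raw is a made-up matrix, and the helper normalize_mad is a hypothetical name. The target value is qnorm(0.75), approximately 0.6745, which is the MAD of a standard normal. Note that R's mad function multiplies by 1.4826 by default, so constant = 1 is needed to get the plain median absolute deviation.

```r
# Made-up measurements standing in for the raw car data (111 cars, 3 variables).
set.seed(1)
raw <- matrix(rnorm(111 * 3, mean = 50, sd = 10), ncol = 3)

target <- qnorm(0.75)   # MAD of a N(0, 1), about 0.6745

# Hypothetical helper: center at the median, then rescale so the MAD equals target.
normalize_mad <- function(v) {
  target * (v - median(v)) / mad(v, constant = 1)
}
normed <- apply(raw, 2, normalize_mad)

apply(normed, 2, median)              # all 0
apply(normed, 2, mad, constant = 1)   # all about 0.6745
```

Medians and MADs are used in place of means and standard deviations presumably because they are less sensitive to unusual vehicles.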
15 R for model-based clustering
The R function we use is Mclust, in the package mclust. The basic command is simple:
    mcars <- Mclust(cars)
There are many options for plotting in the package. To see a plot of the BICs, use
    plot(mcars,cars,what="BIC")
You have to click on the graphics window, or hit Enter, to reveal the plot. Note that the BICs in this function are actually the negatives of the usual BICs, so we want to maximize them.
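The sign convention can be illustrated without mclust. The sketch below fits a single normal distribution to made-up data with lm and compares the usual BIC, which is minimized, with its negative, which is the quantity to maximize under the convention described above. The variable names are illustrative only.

```r
# Made-up sample; a one-component normal model has 2 parameters (mean, variance).
set.seed(1)
y   <- rnorm(111)
fit <- lm(y ~ 1)

ll   <- as.numeric(logLik(fit))
npar <- attr(logLik(fit), "df")
n    <- length(y)

usual_bic   <- -2 * ll + npar * log(n)   # smaller is better
flipped_bic <-  2 * ll - npar * log(n)   # larger is better

all.equal(usual_bic, BIC(fit))           # matches R's BIC()
all.equal(flipped_bic, -BIC(fit))        # the flipped version is its negative
```

Either convention ranks the models identically; only the direction of "best" changes.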
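16 Plotting the BICs
[Figure: BIC (vertical axis) against number of components (horizontal axis), with one curve for each model: EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV, VVV.]
K = 2 with the VVV model is best.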
17 What is VVV?
To find the name of the best model:
    > mcars
    best model: ellipsoidal, unconstrained with 2 components
That K = 2 is easy to see. The assumptions on the covariance matrices are "ellipsoidal," which means they have no special structure, and "unconstrained," which means they are not assumed equal for the two groups, so that $\Sigma_1 \neq \Sigma_2$ is allowed. To plot variable 1 (length) versus variable 4 (height), use
    plot(mcars,cars,what="classification",dimens=c(1,4))
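18 Plotting the clusters
[Figure: the two clusters plotted on principal-component axes (labeled PC1 and PC), with the variables Height, FrtLegRoom, Length, Width, Luggage, and RearHd marked.]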
19 The cars in group 2
[Table with columns Rear Head, Rear Seating, Rear Shoulder, and Luggage, one row per car: [...]]
Chevrolet Corvette
Honda Civic CRX
Mazda MX5 Miata
Mazda RX
Nissan 300ZX
Chevrolet Astro
Chevrolet Lumina APV
Dodge Caravan
Dodge Grand Caravan
Ford Aerostar
Mazda MPV
Mitsubishi Wagon
Nissan Axxess
Nissan Van
Volkswagen Vanagon
20 Just group 1
Redo the analysis on just the group 1 automobiles:
    cars1 <- cars[mcars$classification==1,]
    mcars1 <- Mclust(cars1)
    mcars1
    best model: ellipsoidal multivariate normal with 1 components
The best is one big cluster.
21 The models in mclust
Code   Description                                          $\Sigma_k$
EII    spherical, equal volume                              $\sigma^2 I_p$
VII    spherical, unequal volume                            $\sigma_k^2 I_p$
EEI    diagonal, equal volume and shape                     $\Lambda$
VEI    diagonal, varying volume, equal shape                $c_k \Delta$
EVI    diagonal, equal volume, varying shape                $c \Delta_k$
VVI    diagonal, varying volume and shape                   $\Lambda_k$
EEE    ellipsoidal, equal volume, shape, and orientation    $\Sigma$
EEV    ellipsoidal, equal volume and equal shape            $\Gamma_k \Lambda \Gamma_k'$
VEV    ellipsoidal, equal shape                             $c_k \Gamma_k \Delta \Gamma_k'$
VVV    ellipsoidal, varying volume, shape, and orientation  arbitrary
Here, the $\Lambda$'s are diagonal matrices with positive diagonals, the $\Delta$'s are diagonal matrices with positive diagonals whose product is 1, the $\Gamma$'s are orthogonal matrices, the $\Sigma$'s are arbitrary nonnegative definite symmetric matrices, and the $c$'s are positive scalars. A subscript $k$ on an element means the groups can have different values for that element. No subscript means that element is the same for each group.
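To make the decomposition concrete, the sketch below builds two covariance matrices of the form $c_k \Gamma_k \Delta \Gamma_k'$, that is, a common shape with group-specific volume and orientation, as in the VEV row. All numerical values and the helper rot_z are made up for illustration.

```r
p <- 3

# Shape: diagonal, positive entries whose product is 1.
Delta <- diag(c(4, 1, 0.25))

# Hypothetical helper: an orthogonal matrix (rotation about the third axis).
rot_z <- function(theta) {
  matrix(c( cos(theta), sin(theta), 0,
           -sin(theta), cos(theta), 0,
            0,          0,          1), nrow = 3)
}
Gamma1 <- rot_z(pi / 6)   # orientation of group 1
Gamma2 <- rot_z(pi / 3)   # orientation of group 2
c1 <- 1                   # volume of group 1
c2 <- 5                   # volume of group 2

Sigma1 <- c1 * Gamma1 %*% Delta %*% t(Gamma1)
Sigma2 <- c2 * Gamma2 %*% Delta %*% t(Gamma2)

# Because det(Delta) = 1, the determinant depends only on the volume: c_k^p.
c(det(Sigma1), c1^p)
c(det(Sigma2), c2^p)

# Dividing out the volume recovers the shared shape as the eigenvalues.
eigen(Sigma2 / c2, symmetric = TRUE)$values
```

Setting Gamma1 equal to Gamma2 and c1 equal to c2 would give a common covariance matrix, as in the EEE row.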
22 Hierarchical clustering of the sports
    plclust(hclust(dist(t(sportsranks))))
[Figure: complete-linkage dendrogram of the seven sports, with Height on the vertical axis and leaves labeled Baseball, Football, Basketball, Jogging, Tennis, Cycling, Swimming.]
23 Hierarchical clustering of the individuals
    par(mfrow=c(2,1))
    dxs <- dist(sportsranks)   # Gets Euclidean distances
    lbl <- rep("",130)         # Prefer no labels for the individuals
    plclust(hclust(dxs),xlab="Complete linkage",sub="",labels=lbl)
    plclust(hclust(dxs,method="single"),xlab="Single linkage",sub="",labels=lbl)
[Figure: two dendrograms of the 130 individuals, each with Height on the vertical axis: complete linkage (top) and single linkage (bottom).]
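The plclust function is no longer available in current versions of R; calling plot on an hclust object draws the same kind of dendrogram. The sketch below shows both clusterings from the last two slides with that substitution, plus cutree to extract group labels, which the slides do not use. The data x are a hypothetical stand-in for sportsranks.

```r
# Hypothetical stand-in for sportsranks: 130 random rankings of 7 sports.
set.seed(1)
x <- t(replicate(130, sample(7)))
colnames(x) <- c("Baseball", "Football", "Basketball", "Tennis",
                 "Cycling", "Swimming", "Jogging")

# Clustering the sports: transpose so that each sport is one "observation".
hc_sports <- hclust(dist(t(x)))                   # complete linkage is the default

# Clustering the individuals, with complete and single linkage.
dxs         <- dist(x)                            # Euclidean distances
hc_complete <- hclust(dxs)
hc_single   <- hclust(dxs, method = "single")

# Cutting the complete-linkage tree into two groups of individuals.
groups <- cutree(hc_complete, k = 2)
table(groups)

if (interactive()) {
  par(mfrow = c(2, 1))
  plot(hc_complete, labels = FALSE, xlab = "Complete linkage", sub = "")
  plot(hc_single,   labels = FALSE, xlab = "Single linkage",   sub = "")
}
```

Passing labels = FALSE to plot suppresses the leaf labels, which plays the role of the vector of empty labels used with plclust.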