Week 2. Part I: Other datatypes, preprocessing. Based in part on slides from textbook, slides of Susan Holmes
Week 2. Based in part on slides from textbook, slides of Susan Holmes. October 3, 2012.

Part I: Other datatypes, preprocessing

Other datatypes: Document data

You might start with a collection of $n$ documents (e.g. all of Shakespeare's works in some digital format; all of Wikileaks U.S. Embassy Cables). This is not a data matrix...

Given $p$ terms of interest, {Al Qaeda, Iran, Iraq, etc.}, one can form a term-document matrix filled with counts.

[Table: term-document counts with columns Document ID, Al Qaeda, Iran, Iraq; entries not recoverable]
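As a rough illustration of the term-document matrix described above, here is a minimal sketch. The corpus, the function name `term_document_matrix`, and the whitespace tokenizer are all assumptions made for the example; multi-word terms such as "Al Qaeda" would need a different tokenizer.

```python
from collections import Counter

def term_document_matrix(documents, terms):
    """Return one row of term counts per document (rows: documents, columns: terms)."""
    matrix = []
    for text in documents:
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        counts = Counter(text.lower().split())
        matrix.append([counts[term.lower()] for term in terms])
    return matrix

# Hypothetical toy corpus, not the documents mentioned in the slides.
docs = [
    "iran and iraq share a border",
    "talks with iran resumed while iran denied the report",
    "no relevant terms here",
]
terms = ["iran", "iraq"]
tdm = term_document_matrix(docs, terms)
```

Each row of `tdm` plays the role of one case and each column one feature, which is what lets document collections be treated like an ordinary data matrix.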
Other datatypes: Transaction data

[Slide: Transaction data; content not recoverable]

Time series

Imagine recording the minute-by-minute prices of all stocks in the S&P 500 for the last 200 days of trading. The data can be represented by a matrix, BUT there is definite structure across the rows of this matrix. They are not unrelated cases like they might be in other applications.

[Slide: Social media data; content not recoverable]

[Slide: Graph data; content not recoverable]
Other data types: Graph data

Nodes on the graph might be Facebook users, or public pages. Weights on the edges could be the number of messages sent in a prespecified period. If you take weekly intervals, this leads to a sequence of graphs $G_i$ = communication over the $i$-th week. How this graph changes is of obvious interest... Even the structure of just one graph is of interest; we'll come back to this when we talk about spectral methods...

Data quality

Some issues to keep in mind:
- Is the data experimental or observational?
- If observational, what do we know about the data generating mechanism? For example, although the S&P 500 example can be represented as a data matrix, there is clearly structure across rows.

General quality issues:
- How much of the data is missing?
- Is missingness informative?
- Is it very noisy?
- Are there outliers?
- Are there a large number of duplicates?

Preprocessing: General procedures

- Aggregation: combining features into a new feature. Example: pooling county-level data to state-level data.
- Discretization: breaking up a continuous variable (or a set of continuous variables) into an ordinal (or nominal) discrete variable.
- Transformation: simple transformation of a feature (log or exp) or mapping to a new space (Fourier transform / power spectrum, wavelet transform).

[Figure: A continuous variable that could be discretized]
[Figure: Discretization by fixed width]

[Figure: Discretization by fixed quartile]

[Figure: Discretization by clustering]

[Figure: Variable transformation: bacterial decay]
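The first two discretization schemes named on these slides can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy values, and the tie-handling (ranks are split evenly, so tied values may land in different bins) are not from the slides, and `fixed_width_bins` assumes the values are not all equal.

```python
def fixed_width_bins(values, n_bins):
    """Equal-width bins over [min, max]; returns a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum falls on the right edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def quantile_bins(values, n_bins):
    """Equal-frequency bins: rank the values, then split the ranks evenly."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels

# Hypothetical skewed sample: four small values and four large ones.
values = [1, 2, 3, 4, 50, 90, 95, 101]
by_width = fixed_width_bins(values, 2)
by_quantile = quantile_bins(values, 2)
```

On skewed data the two schemes disagree: fixed-width bins put the value 50 with the small values, while equal-frequency bins force four values into each bin.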
[Figure: Variable transformation: bacterial decay]

[Figure: BMW daily returns (fecofin package)]

[Figure: ACF of BMW daily returns]

[Figure: Discrete wavelet transform of BMW daily returns]
Part II: Dimension reduction, PCA & eigenanalysis

Combinations of features

Given a data matrix $X_{n \times p}$ with $p$ fairly large, it can be difficult to visualize structure. It is often useful to look at linear combinations of the features. Each $\beta \in \mathbb{R}^p$ determines a linear rule

$$f_\beta(x) = x^T \beta.$$

Evaluating this on each row $X_i$ of $X$ yields a vector $(f_\beta(X_i))_{1 \le i \le n} = X\beta$.

Dimension reduction

By choosing $\beta$ appropriately, we may find interesting new features. Suppose we take $k$ (much smaller than $p$) vectors $\beta$, which we write as a matrix $B_{p \times k}$. The new data matrix $XB$ has fewer dimensions than $X$. This is dimension reduction...

Principal Components

This method of dimension reduction seeks features that maximize sample variance... Specifically, for $\beta \in \mathbb{R}^p$, define the sample variance of $V_\beta = X\beta \in \mathbb{R}^n$:

$$\mathrm{Var}(V_\beta) = \frac{1}{n-1} \sum_{i=1}^n (V_{\beta,i} - \bar{V}_\beta)^2, \qquad \bar{V}_\beta = \frac{1}{n} \sum_{i=1}^n V_{\beta,i}.$$

Note that $\mathrm{Var}(V_{C\beta}) = C^2 \, \mathrm{Var}(V_\beta)$, so we usually standardize

$$\|\beta\|_2^2 = \sum_{i=1}^p \beta_i^2 = 1.$$
Principal Components

Define the $n \times n$ matrix

$$H = I_{n \times n} - \frac{1}{n} \mathbf{1}\mathbf{1}^T.$$

This matrix removes means: $(Hv)_i = v_i - \bar{v}$. It is also a projection matrix:

$$H^T = H, \qquad H^2 = H.$$

Principal Components with Matrices

With this matrix,

$$\mathrm{Var}(V_\beta) = \frac{1}{n-1} \beta^T X^T H X \beta.$$

So, maximizing sample variance with $\|\beta\|_2 = 1$ is

$$\underset{\|\beta\|_2 = 1}{\text{maximize}} \; \beta^T (X^T H X) \beta.$$

This boils down to an eigenvalue problem...

Eigenanalysis

The matrix $X^T H X$ is symmetric, so it can be written as

$$X^T H X = V D V^T$$

where $D_{k \times k} = \mathrm{diag}(d_1, \dots, d_k)$, $\mathrm{rank}(X^T H X) = k$, and $V_{p \times k}$ has orthonormal columns, i.e. $V^T V = I_{k \times k}$. We always have $d_i \ge 0$ and we take $d_1 \ge d_2 \ge \dots \ge d_k$.

Eigenanalysis & PCA

Suppose now that $\beta = \sum_{j=1}^k a_j v_j$ with $v_j$ the columns of $V$. Then

$$\|\beta\|_2^2 = \sum_{i=1}^k a_i^2, \qquad \beta^T (X^T H X) \beta = \sum_{i=1}^k a_i^2 d_i.$$

Choosing $a_1 = 1$, $a_j = 0$ for $j \ge 2$ maximizes this quantity.
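The two identities used on these slides follow in a line or two from the properties of $H$. The derivation below is a sketch that writes $\beta = Va$ with $a = (a_1, \dots, a_k)$, the same restriction to the span of $V$ that the slide makes.

```latex
\begin{align*}
\mathrm{Var}(V_\beta)
  &= \frac{1}{n-1}\,\|H X \beta\|_2^2
   = \frac{1}{n-1}\,\beta^T X^T H^T H X \beta
   = \frac{1}{n-1}\,\beta^T X^T H X \beta
   && \text{(using } H^T = H,\ H^2 = H\text{)} \\
\beta^T (X^T H X)\, \beta
  &= a^T V^T (V D V^T) V a
   = a^T D a
   = \sum_{j=1}^k a_j^2 d_j
   \;\le\; d_1 \sum_{j=1}^k a_j^2
   = d_1
   && \text{(using } V^T V = I,\ \|\beta\|_2 = 1\text{)}
\end{align*}
```

The bound is attained at $a_1 = 1$, $a_j = 0$ for $j \ge 2$, which is why the first column of $V$ is the maximizer.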
Eigenanalysis & PCA

Therefore, $\beta_1 = v_1$, the first column of $V$, solves

$$\underset{\|\beta\|_2 = 1}{\text{maximize}} \; \beta^T (X^T H X) \beta.$$

This yields scores $H X \beta_1 \in \mathbb{R}^n$. These are the 1st principal component scores.

Higher order components

Having found the direction with maximal sample variance, we might look for the second most variable direction by solving

$$\underset{\|\beta\|_2 = 1,\; \beta^T v_1 = 0}{\text{maximize}} \; \beta^T (X^T H X) \beta.$$

Note that we restricted our search so we would not just recover $v_1$ again. It is not hard to see that if $\beta = \sum_{j=1}^k a_j v_j$, this is solved by taking $a_2 = 1$, $a_j = 0$ for $j \ne 2$.

[Figure: Olympic data]

Higher order components

In matrix terms, all the principal component scores are $(H X V)_{n \times k}$ and the loadings are the columns of $V$. This information can be summarized in a biplot. The loadings describe how each feature contributes to each principal component score.
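The eigenanalysis route to PCA can be sketched with NumPy as follows. This is an illustration, not the course's code: the function name `pca` and the random data are assumptions, and unlike the slides it keeps all $p$ eigenvectors instead of only the $k = \mathrm{rank}(X^T H X)$ with nonzero eigenvalues.

```python
import numpy as np

def pca(X):
    """Return eigenvalues d, loadings V, and scores H X V of X^T H X."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H = I - (1/n) 1 1^T
    HX = H @ X
    d, V = np.linalg.eigh(X.T @ HX)       # eigh returns eigenvalues in ascending order
    order = np.argsort(d)[::-1]           # reorder so that d_1 >= d_2 >= ...
    d, V = d[order], V[:, order]
    return d, V, HX @ V

# Hypothetical data: three features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.diag([3.0, 1.0, 0.2])
d, V, scores = pca(X)
```

The sample variance of the first column of `scores` equals `d[0] / (n - 1)`, matching the formula $\mathrm{Var}(V_\beta) = \frac{1}{n-1}\beta^T X^T H X \beta$ at $\beta = v_1$. Building the full $n \times n$ matrix `H` is only for fidelity to the slides; subtracting column means gives the same `HX`.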
[Figure: Olympic data: screeplot]

The importance of scale

The PCA scores are not invariant to scaling of the features: for $Q_{p \times p}$ diagonal, the PCA scores of $XQ$ are not the same as those of $X$. It is common to convert all variables to the same scale before applying PCA.

Define the scalings to be the sample standard deviation of each feature. In matrix form, let

$$S^2 = \frac{1}{n-1} \mathrm{diag}(X^T H X).$$

Define $\tilde{X} = H X S^{-1}$. The normalized PCA loadings are given by an eigenanalysis of $\tilde{X}^T \tilde{X}$.

[Figure: Olympic data]

[Figure: Olympic data: screeplot]
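A short sketch of the standardization $\tilde{X} = H X S^{-1}$, showing that it removes the dependence on feature scale. The function name `standardize`, the data, and the rescaling matrix `Q` are assumptions for the example, and `Q` is taken to have positive diagonal entries.

```python
import numpy as np

def standardize(X):
    """Center each column and divide by its sample standard deviation."""
    centered = X - X.mean(axis=0)             # same result as H X
    return centered / X.std(axis=0, ddof=1)   # same as right-multiplying by S^{-1}

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Q = np.diag([1.0, 100.0, 0.01])               # hypothetical change of units per feature

X_tilde = standardize(X)
XQ_tilde = standardize(X @ Q)
```

Here `X_tilde` and `XQ_tilde` agree up to rounding, so a PCA of the standardized data does not change when a feature is re-expressed in different units, whereas a PCA of `X @ Q` itself would be dominated by the second column.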
PCA and the SVD

The singular value decomposition of a matrix tells us we can write

$$\tilde{X} = U \Delta V^T$$

with $\Delta_{k \times k} = \mathrm{diag}(\delta_1, \dots, \delta_k)$, $\delta_j \ge 0$, $k = \mathrm{rank}(\tilde{X})$, and $U^T U = V^T V = I_{k \times k}$.

Recall that the scores were

$$\tilde{X} V = (U \Delta V^T) V = U \Delta.$$

Also,

$$\tilde{X}^T \tilde{X} = V \Delta^2 V^T,$$

so $D = \Delta^2$.

PCA and the SVD

Recall $\tilde{X} = H X S^{-1}$. We saw that the normalized PCA loadings are given by an eigenanalysis of $\tilde{X}^T \tilde{X}$. It turns out that, via the SVD, the normalized PCA scores are given by an eigenanalysis of $\tilde{X} \tilde{X}^T$, because

$$(\tilde{X} V)_{n \times k} = U \Delta$$

where the columns of $U$ are eigenvectors of

$$\tilde{X} \tilde{X}^T = U \Delta^2 U^T.$$

Another characterization of SVD

Given a data matrix $X$ (or its scaled, centered version $\tilde{X}$) we might try solving

$$\underset{Z : \mathrm{rank}(Z) = k}{\text{minimize}} \; \|X - Z\|_F^2$$

where $F$ stands for the Frobenius norm on matrices,

$$\|A\|_F^2 = \sum_{i=1}^n \sum_{j=1}^p A_{ij}^2.$$

Another characterization of SVD

It can be proven that

$$\hat{Z} = S_{n \times k} (V^T)_{k \times p}$$

where $S_{n \times k}$ is the matrix of the first $k$ PCA scores (not the scaling matrix $S$ used earlier) and the columns of $V$ are the first $k$ PCA loadings. This approximation is related to the screeplot: the height of each bar describes the additional drop in Frobenius norm as the rank of the approximation increases.
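The relations between the SVD, the PCA scores, and the best rank-$k$ approximation can be checked numerically. The sketch below uses hypothetical random data and variable names of my own choosing; it relies only on `numpy.linalg.svd`, `numpy.linalg.eigvalsh`, and `numpy.linalg.norm`.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
Xt = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # scaled, centered X-tilde

# Thin SVD: Xt = U diag(delta) V^T, singular values in descending order.
U, delta, Vt = np.linalg.svd(Xt, full_matrices=False)
scores = Xt @ Vt.T                                   # should equal U * delta

# Eigenvalues of Xt^T Xt, largest first; these should equal delta**2.
d = np.linalg.eigvalsh(Xt.T @ Xt)[::-1]

# Rank-k approximation built from the first k scores and loadings.
k = 2
Z = scores[:, :k] @ Vt[:k, :]
err = np.linalg.norm(Xt - Z, "fro") ** 2             # squared Frobenius error
```

The squared error `err` equals the sum of the discarded `delta[k:] ** 2`, which is the quantity the screeplot bars track as the rank increases.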
[Figure: Olympic data: screeplot]

Other types of dimension reduction

Instead of maximizing sample variance, we might try maximizing some other quantity... Independent Component Analysis tries to maximize the non-gaussianity of $V_\beta$. In practice, it uses skewness, kurtosis or other moments to quantify non-gaussianity.

These are both unsupervised approaches. Often, these are combined with supervised approaches into an algorithm like:
- Feature creation: build some set of features using PCA.
- Validate: use the derived features to see if they are helpful in the supervised problem.

[Figure: Olympic data: 1st component vs. total score]

Part III: Distances and similarities
[Figure: Olympic data]

Similarities

Start with $X$, which we assume is centered and standardized. The PCA loadings were given by eigenvectors of the correlation matrix, which is a measure of similarity.

The first 2 (or any 2) PCA scores yield an $n \times 2$ matrix that can be visualized as a scatter plot. Similarly, the first 2 (or any 2) PCA loadings yield a $p \times 2$ matrix that can be visualized as a scatter plot.

Similarities

The PCA loadings were given by eigenvectors of the correlation matrix, which is a measure of similarity. The visualization of the cases based on PCA scores was determined by the eigenvectors of $\tilde{X} \tilde{X}^T$, an $n \times n$ matrix. To see this, remember that

$$\tilde{X} = U \Delta V^T$$

so

$$\tilde{X} \tilde{X}^T = U \Delta^2 U^T, \qquad \tilde{X}^T \tilde{X} = V \Delta^2 V^T.$$

Similarities

The matrix $\tilde{X} \tilde{X}^T$ is a measure of similarity between cases. The matrix $\tilde{X}^T \tilde{X}$ is a measure of similarity between features. Structure in the two similarity matrices yields insight into the set of cases, or the set of features...
Distances

Distances are inversely related to similarities. If $A$ and $B$ are similar, then $d(A, B)$ should be small, i.e. they should be near. If $A$ and $B$ are distant, then they should not be similar.

For a data matrix, there is a natural distance between cases:

$$d(X_i, X_k)^2 = \|X_i - X_k\|_2^2 = \sum_{j=1}^p |X_{ij} - X_{kj}|^2 = (X X^T)_{ii} - 2 (X X^T)_{ik} + (X X^T)_{kk}.$$

Distances

This suggests a natural transformation between a similarity matrix $S$ and a distance matrix $D$:

$$D_{ik} = (S_{ii} - 2 S_{ik} + S_{kk})^{1/2}.$$

The reverse transformation is not so obvious. Some suggestions from your book:

$$S_{ik} = -D_{ik}, \qquad S_{ik} = e^{-D_{ik}}, \qquad S_{ik} = \frac{1}{1 + D_{ik}}.$$

Distances

A distance (or a metric) on a set $S$ is a function $d : S \times S \to [0, +\infty)$ that satisfies
- $d(x, x) = 0$; $d(x, y) = 0 \iff x = y$
- $d(x, y) = d(y, x)$
- $d(x, y) \le d(x, z) + d(z, y)$

If $d(x, y) = 0$ for some $x \ne y$ then $d$ is a pseudo-metric.

Similarities

A similarity on a set $S$ is a function $s : S \times S \to \mathbb{R}$ and should satisfy
- $s(x, x) \ge s(x, y)$ for all $x \ne y$
- $s(x, y) = s(y, x)$

By adding a constant, we can often assume that $s(x, y) \ge 0$.
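The transformation $D_{ik} = (S_{ii} - 2 S_{ik} + S_{kk})^{1/2}$ can be sketched in plain Python. The helper names `gram` and `similarity_to_distance` and the three toy points are assumptions for the example; with $S = X X^T$ the result is the ordinary Euclidean distance between rows.

```python
import math

def gram(X):
    """Similarity matrix S = X X^T for a list of row vectors."""
    return [[sum(a * b for a, b in zip(xi, xk)) for xk in X] for xi in X]

def similarity_to_distance(S):
    """Apply D_ik = sqrt(S_ii - 2 S_ik + S_kk) entrywise."""
    n = len(S)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return [
        [math.sqrt(max(S[i][i] - 2 * S[i][k] + S[k][k], 0.0)) for k in range(n)]
        for i in range(n)
    ]

# Hypothetical cases: three collinear points in the plane.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = similarity_to_distance(gram(X))
```

For these points `D[0][1]` is the familiar 3-4-5 distance, and the matrix is symmetric with a zero diagonal, as the metric axioms require.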
Examples: nominal data

The simplest example for nominal data is just the discrete metric

$$d(x, y) = \begin{cases} 0 & x = y \\ 1 & \text{otherwise.} \end{cases}$$

The corresponding similarity would be

$$s(x, y) = \begin{cases} 1 & x = y \\ 0 & \text{otherwise} \end{cases} \;=\; 1 - d(x, y).$$

Examples: ordinal data

If $S$ is ordered, we can think of $S$ as (or identify $S$ with) a subset of the non-negative integers. If $|S| = m$ then a natural distance is

$$d(x, y) = \frac{|x - y|}{m - 1} \le 1.$$

The corresponding similarity would be $s(x, y) = 1 - d(x, y)$.

Examples: vectors of continuous data

If $S = \mathbb{R}^k$ there are lots of distances determined by norms. The Minkowski $p$ or $\ell_p$ norm, for $p \ge 1$:

$$d(x, y) = \|x - y\|_p = \left( \sum_{i=1}^k |x_i - y_i|^p \right)^{1/p}.$$

Examples:
- $p = 2$: the usual Euclidean distance, $\ell_2$
- $p = 1$: $d(x, y) = \sum_{i=1}^k |x_i - y_i|$, the taxicab distance, $\ell_1$
- $p = \infty$: $d(x, y) = \max_{1 \le i \le k} |x_i - y_i|$, the sup norm, $\ell_\infty$

Examples: vectors of continuous data

If $0 < p < 1$ this is not a norm, but $\sum_{i=1}^k |x_i - y_i|^p$ (without the $1/p$ power) still defines a metric. The preceding transformations can be used to construct similarities.

Examples: binary vectors

If $S = \{0, 1\}^k \subset \mathbb{R}^k$ the vectors can be thought of as vectors of bits. The $\ell_1$ norm counts the number of mismatched bits. This is known as Hamming distance.
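A minimal sketch of the distances listed above, in plain Python. The function names and sample vectors are illustrative assumptions; `minkowski` is only meant for $p \ge 1$ (including $p = \infty$), and `ordinal_distance` assumes the levels are coded as integers $0, \dots, m-1$.

```python
def minkowski(x, y, p):
    """l_p distance for p >= 1; p = float('inf') gives the sup norm."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def hamming(x, y):
    """Number of mismatched positions; equals the l_1 distance on bit vectors."""
    return sum(a != b for a, b in zip(x, y))

def ordinal_distance(x, y, m):
    """|x - y| / (m - 1) for levels coded 0..m-1, so the result lies in [0, 1]."""
    return abs(x - y) / (m - 1)

x, y = [0, 0], [3, 4]
taxicab = minkowski(x, y, 1)
euclid = minkowski(x, y, 2)
sup = minkowski(x, y, float("inf"))

bits_a, bits_b = [1, 0, 1, 1], [1, 1, 0, 1]
mismatches = hamming(bits_a, bits_b)
```

On the same pair of points the three norms give 7, 5 and 4, which illustrates that $\ell_p$ distances shrink as $p$ grows.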
Example: Mahalanobis distance

Given $\Sigma_{k \times k}$ that is positive definite, we define Mahalanobis distance on $\mathbb{R}^k$ by

$$d_\Sigma(x, y) = \left( (x - y)^T \Sigma^{-1} (x - y) \right)^{1/2}.$$

This is the usual Euclidean distance, with a change of basis given by a rotation and stretching of the axes. If $\Sigma$ is only non-negative definite, then we can replace $\Sigma^{-1}$ with $\Sigma^{\dagger}$, its pseudo-inverse. This yields a pseudo-metric because it fails the test

$$d(x, y) = 0 \iff x = y.$$

Example: similarities for binary vectors

Given binary vectors $x, y \in \{0, 1\}^k$ we can summarize their agreement in a $2 \times 2$ table:

            x = 0      x = 1      total
y = 0       f_{00}     f_{01}     f_{0.}
y = 1       f_{10}     f_{11}     f_{1.}
total       f_{.0}     f_{.1}     k

Example: similarities for binary vectors

We define the simple matching coefficient (SMC) similarity by

$$\mathrm{SMC}(x, y) = \frac{f_{00} + f_{11}}{f_{00} + f_{01} + f_{10} + f_{11}} = \frac{f_{00} + f_{11}}{k} = 1 - \frac{\|x - y\|_1}{k}.$$

The Jaccard coefficient ignores entries where $x_i = y_i = 0$:

$$J(x, y) = \frac{f_{11}}{f_{01} + f_{10} + f_{11}}.$$

Example: cosine similarity & correlation

Given vectors $x, y \in \mathbb{R}^k$ the cosine similarity is defined as

$$-1 \le \cos(x, y) = \frac{\langle x, y \rangle}{\|x\|_2 \|y\|_2} \le 1.$$

The correlation between two vectors is defined as

$$\mathrm{cor}(x, y) = \frac{\langle x - \bar{x} \mathbf{1},\; y - \bar{y} \mathbf{1} \rangle}{\|x - \bar{x} \mathbf{1}\|_2 \, \|y - \bar{y} \mathbf{1}\|_2}.$$
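The binary and vector similarities above can be sketched in plain Python. Function names and example vectors are assumptions made for illustration; `jaccard` is undefined (division by zero) when both vectors are all zeros, and `cosine` when either vector is zero.

```python
import math

def smc(x, y):
    """Simple matching coefficient: (f00 + f11) / k."""
    matches = sum(a == b for a, b in zip(x, y))
    return matches / len(x)

def jaccard(x, y):
    """f11 / (f01 + f10 + f11): positions where both are 0 are ignored."""
    f11 = sum(a == 1 and b == 1 for a, b in zip(x, y))
    mismatches = sum(a != b for a, b in zip(x, y))   # f01 + f10
    return f11 / (f11 + mismatches)

def cosine(x, y):
    """<x, y> / (||x||_2 ||y||_2)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation(x, y):
    """Cosine similarity of the mean-centered vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

x = [1, 1, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0]
```

For these sparse vectors the SMC is inflated by the shared zeros (4 matches out of 6), while the Jaccard coefficient counts only the single shared 1 against the two mismatches.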
Example: correlation

An alternative, perhaps more familiar definition:

$$\mathrm{cor}(x, y) = \frac{S_{xy}}{S_x S_y}$$

$$S_{xy} = \frac{1}{k-1} \sum_{i=1}^k (x_i - \bar{x})(y_i - \bar{y}), \qquad S_x^2 = \frac{1}{k-1} \sum_{i=1}^k (x_i - \bar{x})^2, \qquad S_y^2 = \frac{1}{k-1} \sum_{i=1}^k (y_i - \bar{y})^2.$$

Correlation & PCA

The matrix $\frac{1}{n-1} \tilde{X}^T \tilde{X}$ was actually the matrix of pair-wise correlations of the features. Why? How?

$$\left( \frac{1}{n-1} \tilde{X}^T \tilde{X} \right)_{ij} = \frac{1}{n-1} \left( D^{-1/2} X^T H X D^{-1/2} \right)_{ij}$$

The diagonal entries of $D$ are the sample variances of each feature (so $D = S^2$ in the notation of the scaling slide). The inner matrix multiplication computes the pair-wise dot-products of the columns of $HX$:

$$(X^T H X)_{ij} = \sum_{k=1}^n (X_{ki} - \bar{X}_i)(X_{kj} - \bar{X}_j).$$

[Figure: High positive correlation]

[Figure: High negative correlation]
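The claim that $\frac{1}{n-1} \tilde{X}^T \tilde{X}$ is the correlation matrix is easy to check numerically. The sketch below uses hypothetical random data with deliberately correlated columns and compares against `numpy.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
base = rng.normal(size=(n, 1))
# Hypothetical features: two noisy copies of `base` plus one unrelated column.
X = np.hstack([
    base + 0.1 * rng.normal(size=(n, 1)),
    -base + 0.5 * rng.normal(size=(n, 1)),
    rng.normal(size=(n, 1)),
])

# X-tilde = H X S^{-1}: center each column, divide by its sample std deviation.
Xt = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = Xt.T @ Xt / (n - 1)

R_reference = np.corrcoef(X, rowvar=False)   # columns are the variables
```

`R` has ones on its diagonal and matches `R_reference` up to rounding, so an eigenanalysis of $\tilde{X}^T \tilde{X}$ is an eigenanalysis of the correlation matrix (scaled by $n - 1$).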
[Figure: No correlation]

[Figure: Small positive]

[Figure: Small negative]

Combining similarities

In a given data set, each case may have many attributes or features. Example: see the health data set for HW 1.

To compute similarities of cases, we must pool similarities across features. In a data set with $M$ different features, we write $x_i = (x_{i1}, \dots, x_{iM})$, with each $x_{im} \in S_m$.
Combining similarities

Given similarities $s_m$ on each $S_m$ we can define an overall similarity between cases $x_i$ and $x_j$ by

$$s(x_i, x_j) = \sum_{m=1}^M w_m \, s_m(x_{im}, x_{jm})$$

with optional weights $w_m$ for each feature.

Your book modifies this to deal with asymmetric attributes, i.e. attributes for which Jaccard similarity might be used.
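The weighted sum above can be sketched as a small function that takes one similarity per feature. Everything named here (`combined_similarity`, the two per-feature similarities, the example cases and weights) is a hypothetical illustration; it implements only the plain weighted sum, not the book's modification for asymmetric attributes.

```python
def nominal_similarity(a, b):
    """1 if the two nominal values agree, 0 otherwise."""
    return 1.0 if a == b else 0.0

def make_ordinal_similarity(m):
    """Similarity 1 - |a - b| / (m - 1) for levels coded 0..m-1."""
    return lambda a, b: 1.0 - abs(a - b) / (m - 1)

def combined_similarity(xi, xj, similarities, weights=None):
    """Weighted sum of per-feature similarities s_m(x_im, x_jm)."""
    if weights is None:
        weights = [1.0] * len(similarities)   # unweighted sum by default
    return sum(
        w * s(a, b) for w, s, a, b in zip(weights, similarities, xi, xj)
    )

# Hypothetical cases with a nominal feature and a 5-level ordinal feature.
case_i = ("red", 4)
case_j = ("red", 2)
sims = [nominal_similarity, make_ordinal_similarity(5)]
total = combined_similarity(case_i, case_j, sims, weights=[0.5, 0.5])
```

With weights that sum to one, the combined similarity stays in $[0, 1]$ whenever each per-feature similarity does.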
Statistics 202: Data Mining. © Jonathan Taylor. Week 2, October 3, 2012.
Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia
More informationDecember 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis
.. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make
More informationFoundations of Computer Vision
Foundations of Computer Vision Wesley. E. Snyder North Carolina State University Hairong Qi University of Tennessee, Knoxville Last Edited February 8, 2017 1 3.2. A BRIEF REVIEW OF LINEAR ALGEBRA Apply
More informationNumerical Linear Algebra
Numerical Linear Algebra The two principal problems in linear algebra are: Linear system Given an n n matrix A and an n-vector b, determine x IR n such that A x = b Eigenvalue problem Given an n n matrix
More informationDimensionality Reduction
Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball
More informationLecture 6: Methods for high-dimensional problems
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationDimensionality Reduction
394 Chapter 11 Dimensionality Reduction There are many sources of data that can be viewed as a large matrix. We saw in Chapter 5 how the Web can be represented as a transition matrix. In Chapter 9, the
More information4 Linear Algebra Review
Linear Algebra Review For this topic we quickly review many key aspects of linear algebra that will be necessary for the remainder of the text 1 Vectors and Matrices For the context of data analysis, the
More informationChapter 7: Symmetric Matrices and Quadratic Forms
Chapter 7: Symmetric Matrices and Quadratic Forms (Last Updated: December, 06) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved
More informationSingular Value Decomposition
Singular Value Decomposition (Com S 477/577 Notes Yan-Bin Jia Sep, 7 Introduction Now comes a highlight of linear algebra. Any real m n matrix can be factored as A = UΣV T where U is an m m orthogonal
More informationFunctional Analysis Review
Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationMatrix Factorizations
1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular
More informationLecture 8: Linear Algebra Background
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected
More informationLeast Squares Optimization
Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques. Broadly, these techniques can be used in data analysis and visualization
More informationNotes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.
Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where
More informationDependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.
MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method
CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method Tim Roughgarden & Gregory Valiant April 15, 015 This lecture began with an extended recap of Lecture 7. Recall that
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationImage Registration Lecture 2: Vectors and Matrices
Image Registration Lecture 2: Vectors and Matrices Prof. Charlene Tsai Lecture Overview Vectors Matrices Basics Orthogonal matrices Singular Value Decomposition (SVD) 2 1 Preliminary Comments Some of this
More informationFinal Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson
Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Name: TA Name and section: NO CALCULATORS, SHOW ALL WORK, NO OTHER PAPERS ON DESK. There is very little actual work to be done on this exam if
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationSTATISTICAL LEARNING SYSTEMS
STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis
More information3.1. The probabilistic view of the principal component analysis.
301 Chapter 3 Principal Components and Statistical Factor Models This chapter of introduces the principal component analysis (PCA), briefly reviews statistical factor models PCA is among the most popular
More informationMathematics for Graphics and Vision
Mathematics for Graphics and Vision Steven Mills March 3, 06 Contents Introduction 5 Scalars 6. Visualising Scalars........................ 6. Operations on Scalars...................... 6.3 A Note on
More informationPrincipal components analysis COMS 4771
Principal components analysis COMS 4771 1. Representation learning Useful representations of data Representation learning: Given: raw feature vectors x 1, x 2,..., x n R d. Goal: learn a useful feature
More information