CS 340 Lec. 6: Linear Dimensionality Reduction

1 CS 340 Lec. 6: Linear Dimensionality Reduction. AD, January 2011.

2 Linear Dimensionality Reduction: Introduction & Motivation; Brief Review of Linear Algebra; Principal Component Analysis; Applications; Singular Value Decomposition.

3 Lots of High-Dimensional Data. Lots of high-dimensional data: face images, documents (e.g. news articles), gene expression data, MEG readings.

4 Motivation. Why do dimensionality reduction? Computational: compress data for time/space efficiency. Statistical: fewer dimensions give better generalization. Visualization: understand the structure of the data. Anomaly detection: describe normal data, detect outliers. Dimensionality reduction in this course: linear methods (this week), clustering, feature selection.

5 Types of Problems. Supervised learning (classification, regression). Applications: face recognition, gene expression prediction. Techniques: k-NN, SVM, least squares (+ dimensionality reduction preprocessing). Structure discovery: find an alternative representation $z$ of the data $x$. Applications: visualization. Techniques: clustering, linear dimensionality reduction. Density estimation $p(x)$: model the data. Applications: anomaly detection, language modeling. Techniques: clustering, linear dimensionality reduction.

6 What is the true dimensionality of these data?

7 What is the true dimensionality of this data? Figure: simulated data in three classes, near the surface of a half-sphere.

8 Linear Dimensionality Reduction. Which line is best? Which line should I pick?

9 Linear Dimensionality Reduction. Basic idea of linear dimensionality reduction: represent each face, a high-dimensional vector $x \in \mathbb{R}^{361}$, by a lower-dimensional vector, say $z \in \mathbb{R}^{10}$, via $z = U^\top x$. How do we choose $U$?

10 Review of Linear Algebra. A scalar $x = 3.4$ is $1 \times 1$; a column vector $x = (x_1, \ldots, x_d)^\top$ is $d \times 1$; a matrix $A = (a_{i,j})$ is $d \times p$; a matrix $B = (b_{i,j})$ is $p \times n$. Transposition: $x^\top = x$ for a scalar, $x^\top = (x_1 \cdots x_d)$ for a vector, and $(A^\top)_{i,j} = a_{j,i}$. Quantities whose inner dimensions match may be multiplied by summing over this index; the outer dimensions give the dimensions of the answer: $(Ax)_i = \sum_{j=1}^{p} a_{i,j} x_j$ and $(AB)_{i,j} = \sum_{k=1}^{p} a_{i,k} b_{k,j}$. Hence $x^\top x$ is a scalar, $x x^\top$ is $d \times d$, $Ax$ is $d \times 1$, $AB$ is $d \times n$, and $x^\top A x$ is a scalar.

11 Review of Linear Algebra. Simple and valid manipulations: $(AB)C = A(BC)$, $A(B+C) = AB + AC$, $(A+B)^\top = A^\top + B^\top$, $(AB)^\top = B^\top A^\top$. Consider a square matrix $A$; then $u$ is an eigenvector of $A$ and $\lambda$ is its associated eigenvalue iff $Au = \lambda u$. If the matrix is diagonalizable, $AU = UD$, i.e. $A = U D U^{-1}$. Q: Prove that $A^k = U D^k U^{-1}$. Why is this expression useful?
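
To make the last identity concrete, here is a small numerical check (not part of the original slides) of $A^k = U D^k U^{-1}$; the matrix and the power $k$ are arbitrary choices:

```python
import numpy as np

# Numerical check that A^k = U D^k U^{-1} for a diagonalizable square matrix A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # a generic matrix is diagonalizable

eigvals, U = np.linalg.eig(A)            # columns of U are eigenvectors of A

k = 5
A_k_direct = np.linalg.matrix_power(A, k)
A_k_eig = U @ np.diag(eigvals**k) @ np.linalg.inv(U)

# Useful because D^k is just an elementwise power of the diagonal entries,
# so repeated powers of A are cheap once the eigendecomposition is known.
print(np.allclose(A_k_direct, A_k_eig))  # True up to floating-point rounding
```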

12 Review of Linear Algebra: PD Matrices. A real-valued square matrix $A$ is called (semi-)positive definite if $x^\top A x \geq 0$ for all $x$. Q: Prove that for any matrix $M$, the matrix $M^\top M$ is (semi-)positive definite. Q: Prove that a positive definite matrix only admits positive eigenvalues.

13 Review of Linear Algebra: Inner Product. Let $u, v \in \mathbb{R}^d$; then the inner product of $u$ and $v$ is the scalar $u^\top v = v^\top u = \sum_{i=1}^{d} u_i v_i$. The (Euclidean) length/norm of a vector $u$ is written $\|u\|$ and is defined as the square root of the inner product of the vector with itself, $\|u\| = \sqrt{u^\top u} = \sqrt{\sum_{i=1}^{d} u_i^2}$. If the angle between vectors $u$ and $v$ is $\theta$, then $\cos(\theta) = \frac{u^\top v}{\|u\|\,\|v\|}$.

14 Approximating High-dimensional Vectors. We are given $N$ data points $\{x_i\}_{i=1}^{N}$ where $x_i \in \mathbb{R}^d$, and we want to approximate them by $\{\hat{x}_i\}_{i=1}^{N}$ using $\hat{x}_i = \sum_{j=1}^{k} z_{j,i} w_j$, where $z_{j,i} \in \mathbb{R}$ and $\{w_j\}_{j=1}^{k}$ are $\mathbb{R}^d$-valued basis vectors. This can be rewritten as $\hat{x}_i = W z_i$, where $W = (w_1 \cdots w_k)$ is a $d \times k$ matrix and $z_i = (z_{1,i} \cdots z_{k,i})^\top$ is a $k \times 1$ vector.

15 Approximating High-dimensional Vectors. An even more compact notation is $\hat{X} = W Z$, where $\hat{X}$ is $d \times N$, $W$ is $d \times k$ and $Z$ is $k \times N$, with $\hat{X} = (\hat{x}_1 \cdots \hat{x}_N)$ and $Z = (z_1 \cdots z_N)$. We can gain very significantly in terms of storage if $k \ll d$, as we only need to store $W$ (size $d \times k$) and $Z$ (size $k \times N$) to compute $\hat{X}$, instead of storing $X$ (size $d \times N$). Example: for $d = 1000$, $k = 10$ and $N = 10^6$, we have $dN / (dk + kN) \approx 100$.

16 Approximating High-dimensional Vectors. How should we select $W$ and $Z$ to ensure $\hat{X} \approx X$? We introduce the reconstruction error $X - \hat{X}$ and propose to minimize the square of its Frobenius norm, $J(W, Z) = \frac{1}{N}\|X - \hat{X}\|_F^2 = \frac{1}{N}\sum_{i=1}^{N}\|x_i - \hat{x}_i\|^2 = \frac{1}{N}\sum_{j=1}^{d}\sum_{i=1}^{N}(x_{j,i} - \hat{x}_{j,i})^2$, subject to $W$ being an orthonormal matrix, i.e. $w_i^\top w_j = 1$ if $i = j$ and $0$ otherwise, equivalently $W^\top W = I_k$. Q: What is the minimum total squared reconstruction error for $k = d$? What about $k > d$?

17 Preliminaries: Normalization and Centering of the Data. It is standard to normalize and center the data beforehand. This ensures that PCA finds the interesting directions of variation, not the ones which just happen to be large because of the units of measurement that are used. Hence, in practice, if the original data are $\{x_i\}$, we compute $m_j = \frac{1}{N}\sum_{i=1}^{N} x_{j,i}$ and $\sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N}(x_{j,i} - m_j)^2$, and we set $\tilde{x}_i = (\tilde{x}_{1,i} \cdots \tilde{x}_{d,i})^\top$ where $\tilde{x}_{j,i} = \frac{x_{j,i} - m_j}{\sigma_j}$.
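
A minimal NumPy sketch of this preprocessing step (not from the slides), assuming the data are stored as a $d \times N$ array with one column per data point:

```python
import numpy as np

def standardize(X):
    """Center and normalize each feature of X (shape d x N, one column per point)."""
    m = X.mean(axis=1, keepdims=True)       # per-feature means m_j
    sigma = X.std(axis=1, keepdims=True)    # per-feature standard deviations sigma_j
    # Constant features (sigma_j = 0) would need special handling in practice.
    return (X - m) / sigma, m, sigma
```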

18 Finding the first principal component. Consider first the case $k = 1$; then we want to minimize $J(w_1, z^1) = \frac{1}{N}\sum_{i=1}^{N}\|x_i - z_{1,i} w_1\|^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^\top x_i - 2 z_{1,i} x_i^\top w_1 + z_{1,i}^2 w_1^\top w_1\right) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^\top x_i - 2 z_{1,i} x_i^\top w_1 + z_{1,i}^2\right)$ subject to $w_1^\top w_1 = 1$, with $z^1 = (z_{1,1}\ z_{1,2} \cdots z_{1,N})$. Taking the derivative w.r.t. $z_{1,i}$ and setting it equal to zero, $\frac{\partial J(w_1, z^1)}{\partial z_{1,i}} = -2 x_i^\top w_1 + 2 z_{1,i} = 0 \Rightarrow z_{1,i} = x_i^\top w_1$. Optimal reconstruction weights are obtained by orthogonally projecting the data onto the first principal direction $w_1$.

19 Finding the first principal component. Minimizing $J(w_1)$ is thus equivalent to maximizing $\frac{1}{N}\sum_{i=1}^{N} z_{1,i}^2 = \frac{1}{N}\sum_{i=1}^{N}(w_1^\top x_i)^2 = w_1^\top \underbrace{\left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i^\top\right)}_{\Sigma} w_1$ s.t. $\|w_1\| = 1$. Assume the data have been centered, so that $\sum_{i=1}^{N} x_i = 0$; then $\frac{1}{N}\sum_{i=1}^{N}(w_1^\top x_i)^2 \approx \mathbb{E}\left[(w_1^\top x)^2\right] = \mathrm{Var}(w_1^\top x)$; i.e., we seek the $w_1$ maximizing the variance of the projected data.

20 Note additionally that we have $\Sigma = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^\top \approx \mathbb{E}\left[x x^\top\right] = \mathrm{Cov}(x)$. That is, $\Sigma$ is an estimate of the covariance/correlation matrix of the data.

21 Finding the first principal component. Minimizing $J(w_1)$ is equivalent to maximizing $w_1^\top \Sigma w_1$ s.t. $\|w_1\| = 1$. Proposition: the vector $w_1^{\mathrm{opt}}$ minimizing $J(w_1)$ is the eigenvector (selected such that $\|w_1\| = 1$) associated to the largest eigenvalue of $\Sigma$. Proof: $\Sigma$ is a symmetric matrix, so it is diagonalizable by an orthonormal matrix $U$; i.e. $\Sigma U = U D$, hence $\Sigma = U D U^\top$ with $D$ diagonal. Without loss of generality, we pick $D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$ where $\sigma_1^2 \geq \sigma_2^2 \geq \cdots \geq \sigma_d^2$.

22 Finding the first principal component. It follows that $\arg\max_{w_1 : \|w_1\| = 1} w_1^\top \Sigma w_1 = \arg\max_{w_1 : \|w_1\| = 1} (U^\top w_1)^\top D (U^\top w_1) = \arg\max_{y = U^\top w_1 : \|y\| = 1} y^\top D y = \arg\max_{y = U^\top w_1 : \|y\| = 1} \sum_{i=1}^{d} \sigma_i^2 y_i^2$, so $y^{\mathrm{opt}} = (1\ 0 \cdots 0)^\top$ and $w_1^{\mathrm{opt}} = U y^{\mathrm{opt}} = u_1$.
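
As a hedged numerical sketch of this result (not from the slides), the first principal direction can be obtained from an eigendecomposition of the empirical covariance; the data matrix is assumed centered, with one column per data point:

```python
import numpy as np

def first_principal_component(X):
    """Top eigenvector of Sigma = (1/N) X X^T for centered X of shape (d, N)."""
    N = X.shape[1]
    Sigma = (X @ X.T) / N
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # symmetric matrix: ascending eigenvalues
    w1 = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    z1 = w1 @ X                                # optimal scores z_{1,i} = w1^T x_i
    return w1, z1
```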

23 PCA Example. $\hat{x}_i = \sum_{j=1}^{k} z_{ij} v_j$. We have $d = 2$ and $k = 1$. Circles are the original data points, crosses are the reconstructions. The red star is the data mean. (Figure panels: loadings (basis) and scores.)

24 The second principal component. We want to minimize $J(w_1, z^1, w_2, z^2) = \frac{1}{N}\sum_{i=1}^{N}\|x_i - z_{1,i} w_1 - z_{2,i} w_2\|^2$ s.t. $\|w_1\| = \|w_2\| = 1$ and $w_1^\top w_2 = 0$. Optimizing w.r.t. $w_1, z^1$ gives the same results as before. If we optimize w.r.t. $z_{2,i}$, we find $\frac{\partial J(w_1^{\mathrm{opt}}, z^{\mathrm{opt},1}, w_2, z^2)}{\partial z_{2,i}} = -2 x_i^\top w_2 + 2 z_{2,i} = 0 \Rightarrow z_{2,i} = x_i^\top w_2$. Similarly, it can be proved that $w_2^{\mathrm{opt}}$ is the eigenvector (selected such that $\|w_2\| = 1$) associated to the second largest eigenvalue of $\Sigma$.

25 General Case. Compute the eigendecomposition of $\Sigma = \frac{1}{N} X X^\top = U D U^\top$ with $\sigma_1^2 \geq \sigma_2^2 \geq \cdots \geq \sigma_d^2$, and keep only the $k$ associated eigenvectors $U_k = (u_1 \cdots u_k)$. The estimate is given by $\hat{X} = U_k \underbrace{(U_k^\top X)}_{Z^{\mathrm{opt}}} = \sum_{j=1}^{k} u_j \underbrace{(u_j^\top X)}_{\text{loadings}}$. It can additionally be shown that (for $k < d$) $\frac{1}{N}\|X - \hat{X}\|_F^2 = \sum_{j=k+1}^{d} \sigma_j^2$.
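
A hedged sketch of the general case (not from the slides), again assuming centered data of shape $d \times N$; the final comment checks the stated error identity:

```python
import numpy as np

def pca_reconstruct(X, k):
    """Rank-k PCA approximation of centered X (shape d x N)."""
    N = X.shape[1]
    Sigma = (X @ X.T) / N
    eigvals, U = np.linalg.eigh(Sigma)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]  # now sorted in decreasing order
    U_k = U[:, :k]                            # top-k eigenvectors u_1, ..., u_k
    Z = U_k.T @ X                             # scores, shape (k, N)
    return U_k @ Z, eigvals                   # X_hat and all eigenvalues

# Check of (1/N) ||X - X_hat||_F^2 = sum_{j>k} sigma_j^2, e.g.:
#   X_hat, eigvals = pca_reconstruct(X, k)
#   assert np.isclose(((X - X_hat) ** 2).sum() / X.shape[1], eigvals[k:].sum())
```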

26 Important Practical Remark. If you have centered and normalized the data beforehand, don't forget to correct for it later on! Suppose you have considered $\tilde{X} = \Phi^{-1}(X - \mu)$ (with $\mu$ the vector of means and $\Phi = \mathrm{diag}(\sigma_1, \ldots, \sigma_d)$), so that $X = \mu + \Phi\tilde{X}$; then the reconstruction will be $\hat{X} = \mu + \Phi\hat{\tilde{X}}$, where $\hat{\tilde{X}}$ is the PCA approximation of $\tilde{X}$.
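
A hedged sketch of this correction (not from the slides), combining standardization, PCA, and the inverse transform; the function and variable names are illustrative:

```python
import numpy as np

def pca_in_original_units(X, k):
    """PCA approximation of X (shape d x N), mapped back to the original units."""
    mu = X.mean(axis=1, keepdims=True)
    phi = X.std(axis=1, keepdims=True)         # plays the role of Phi = diag(sigma_j)
    X_tilde = (X - mu) / phi                   # standardized data
    Sigma = (X_tilde @ X_tilde.T) / X.shape[1]
    _, U = np.linalg.eigh(Sigma)
    U_k = U[:, ::-1][:, :k]                    # top-k principal directions
    X_tilde_hat = U_k @ (U_k.T @ X_tilde)      # PCA approximation of the standardized data
    return mu + phi * X_tilde_hat              # undo the centering and normalization
```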

27 Reconstruction Error. We have $x_i - \hat{x}_i = \sum_{j=k+1}^{d} u_j (u_j^\top x_i)$. Hence it follows that $\|x_i - \hat{x}_i\|^2 = \left(\sum_{j=k+1}^{d} u_j (u_j^\top x_i)\right)^\top \left(\sum_{j=k+1}^{d} u_j (u_j^\top x_i)\right) = \sum_{j=k+1}^{d} (u_j^\top x_i)^2 = \sum_{j=k+1}^{d} u_j^\top x_i x_i^\top u_j$, as $u_j^\top u_l = 1$ if $j = l$ and $0$ if $j \neq l$.

28 Reconstruction Error. Thus we have $\frac{1}{N}\|X - \hat{X}\|_F^2 = \frac{1}{N}\sum_{i=1}^{N}\|x_i - \hat{x}_i\|^2 = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=k+1}^{d} u_j^\top x_i x_i^\top u_j = \sum_{j=k+1}^{d} u_j^\top \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i^\top\right) u_j = \sum_{j=k+1}^{d} u_j^\top \Sigma u_j = \sum_{j=k+1}^{d} \sigma_j^2$.

29 How Many Principal Components? Similar to the question of how many clusters. The magnitude of the eigenvalues indicates the fraction of variance captured. Eigenvalues on a face image dataset (plot of $\lambda_i$ versus $i$): eigenvalues typically drop off sharply, so you don't need too many components.
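
One common heuristic (a hedged sketch, not prescribed by the slides) is to keep enough components to explain a target fraction of the total variance:

```python
import numpy as np

def choose_k(eigvals_desc, target=0.95):
    """Smallest k whose leading eigenvalues explain at least `target` of the variance."""
    frac = np.cumsum(eigvals_desc) / np.sum(eigvals_desc)
    return int(np.searchsorted(frac, target) + 1)
```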

30 PCA Example: $\lambda_1 = 0.0938$, $\lambda_2 = \ldots$; an example where PCA is of interest.

31 PCA Example: a difficult example, where PCA is of no interest. PCA will make no distinction between these examples, because the structure on the left is not linear.

32 Image Compression using PCA. Given one single image, how can you use PCA to perform image compression? Many different approaches are possible. Example 1: interpret the columns of the image as different data points $x_i$. Example 2: interpret the rows of the image as different data points $x_i$. Example 3: partition the image into non-overlapping small blocks; the blocks are now the $x_i$. Note: there are better ways to compress images!
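
A hedged sketch of Example 3 (block-based PCA compression of a single grayscale image, not from the slides); the block size `b` and number of components `k` are illustrative choices:

```python
import numpy as np

def compress_image_blocks(img, b=8, k=8):
    """Rank-k PCA reconstruction of the b x b blocks of a 2-D grayscale image."""
    H, W = img.shape
    H, W = H - H % b, W - W % b                      # crop to a multiple of the block size
    blocks = (img[:H, :W]
              .reshape(H // b, b, W // b, b)
              .transpose(0, 2, 1, 3)
              .reshape(-1, b * b).T)                 # shape (b*b, number of blocks)
    mu = blocks.mean(axis=1, keepdims=True)
    Xc = blocks - mu
    Sigma = (Xc @ Xc.T) / Xc.shape[1]
    _, U = np.linalg.eigh(Sigma)
    U_k = U[:, ::-1][:, :k]                          # top-k principal directions
    blocks_hat = mu + U_k @ (U_k.T @ Xc)             # rank-k reconstruction of each block
    return (blocks_hat.T
            .reshape(H // b, W // b, b, b)
            .transpose(0, 2, 1, 3)
            .reshape(H, W))                          # reassemble the image
```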

33 Computing the Principal Components. Computing $\Sigma$ takes $O(N d^2)$ operations, and computing the eigenvectors of the $d \times d$ matrix $\Sigma$ takes $O(d^3)$ operations. This can be very prohibitive! If $d \gg N$, then we can compute the eigenvectors based on the eigenvectors of the so-called $N \times N$ Gram matrix $X^\top X$ in $O(N^3)$ instead. Assume $v_i$ is an eigenvector of $X^\top X$ with $\|v_i\| = 1$ associated to the eigenvalue $\lambda_i$; then by definition $X^\top X v_i = \lambda_i v_i$, so by multiplying both sides by $X$, $\underbrace{X X^\top}_{N\Sigma}(X v_i) = \lambda_i (X v_i)$. That is, $X v_i = \tilde{u}_i$ is an eigenvector of $\Sigma$ associated to the eigenvalue $\lambda_i / N$, and we can get a unit-norm eigenvector by selecting $u_i = \lambda_i^{-1/2}\tilde{u}_i$.
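
A hedged sketch of this trick (not from the slides), assuming centered $X$ of shape $d \times N$ with $d \gg N$:

```python
import numpy as np

def top_k_directions_via_gram(X, k):
    """Top-k eigenvectors of Sigma = (1/N) X X^T computed via the N x N Gram matrix."""
    N = X.shape[1]
    G = X.T @ X                                    # Gram matrix, only N x N
    lam, V = np.linalg.eigh(G)                     # ascending eigenvalues lambda_i
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]      # keep the k largest
    U_k = (X @ V) / np.sqrt(lam)                   # u_i = lambda_i^{-1/2} X v_i, unit norm
    return U_k, lam / N                            # eigenvalues of Sigma are lambda_i / N
```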

34 Singular Value Decomposition. Given a $d \times N$ matrix $X$, the SVD of $X$ is a factorization of the form $X = U D V^\top = \sum_{i=1}^{r} \lambda_i u_i v_i^\top$, where $r = \min(d, N)$, $U$ are the left singular vectors with $U^\top U = I_r$, and $V$ are the right singular vectors with $V^\top V = I_r$. The right singular vectors are eigenvectors of $X^\top X$: $X^\top X = (U D V^\top)^\top (U D V^\top) = V D U^\top U D V^\top = V D^2 V^\top$, so $X^\top X V = V D^2$. The left singular vectors are eigenvectors of $X X^\top$: $X X^\top = (U D V^\top)(U D V^\top)^\top = U D V^\top V D U^\top = U D^2 U^\top$, so $X X^\top U = U D^2$, and clearly $\sigma_i^2 = \lambda_i^2 / N$.

35 Singular Value Decomposition and PCA. In PCA, we have $\hat{X} = U_k (U_k^\top X)$. If we plug in the SVD decomposition, $\hat{X} = U_k (U_k^\top U) D V^\top = \sum_{i=1}^{k} \lambda_i u_i v_i^\top$; i.e., the truncated SVD yields the PCA approximation. This can be computationally beneficial.
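
A hedged sketch (not from the slides) of the truncated-SVD route to the same rank-$k$ approximation, applied to the centered data matrix:

```python
import numpy as np

def pca_via_svd(X, k):
    """Rank-k PCA approximation of centered X (shape d x N) via a truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)    # X = U diag(s) Vt
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # sum of the k leading terms
```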

36 Visualization. Visualizing data: project 256-dimensional vectors (representing 16x16 images of digits) into 2D.

37 Visualization: the vectors projected onto their $(z_1, z_2)$ coordinates.

38 Eigen-Faces. Each $x_i \in \mathbb{R}^d$ is a face image: $x_{j,i}$ is the intensity of the $j$-th pixel in image $i$, and $d$ is the number of pixels. We approximate $X$ ($d \times N$) by $U$ ($d \times k$) times $Z = (z_1 \cdots z_N)$ ($k \times N$). Idea: $z_i$ is a more meaningful representation of the $i$-th face than $x_i$. We can use $z_i$ for nearest-neighbor classification. Much faster: $O(dk + Nk)$ time instead of $O(dN)$ when $N, d \gg k$.
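
A hedged sketch (not from the slides) of nearest-neighbor matching in the score space; `U_k` and `mu` are assumed to have been learned from the training faces, and all names are illustrative:

```python
import numpy as np

def nearest_face(x_test, X_train, U_k, mu):
    """Index of the training face closest to x_test (a d-vector) in PCA score space."""
    z_train = U_k.T @ (X_train - mu)            # k x N training scores
    z_test = U_k.T @ (x_test - mu.ravel())      # k-dimensional test score
    dists = np.linalg.norm(z_train - z_test[:, None], axis=0)
    return int(np.argmin(dists))                # index of the closest training face
```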

39 Eigen-Faces with K-NN: test images and their closest match in the training set using K = 4.

40 Eigen-Faces with K-NN: test images and their closest match in the training set using K = 10.

41 Eigen-Faces with K-NN: misclassification rate vs. K (plot of the number of misclassification errors on the test set against the PCA dimensionality).

42 Latent Semantic Analysis. PCA can be used to cluster documents and carry out information retrieval by using concepts as opposed to exact word-matching. This enables us to surmount the problems of synonymy (car, auto) and polysemy (money bank, river bank). The data are given as a term-frequency (TF) matrix: $N$ is the number of documents, $d$ is the number of words in the vocabulary, and each $x_i \in \mathbb{R}^d$ is a vector of word counts, with $x_{j,i}$ = number of occurrences of word $j$ in document $i$.

43 Example. Document 1: {I, eat, chips}. Document 2: {computer, chips, chips}. Document 3: {intel, computer, chips}. With the vocabulary ordered as (I, eat, chips, computer, intel), we have $X = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$, one column of word counts per document.

44 PCA for Latent Semantic Analysis. Using PCA, we obtain $X \approx U_k Z_k$; that is, we approximate the documents by a linear combination of $k$ basis documents. How do we measure similarity between two documents $x_i$ and $x_j$? $\hat{x}_i^\top \hat{x}_j$ is probably better than $x_i^\top x_j$. Applications: information retrieval. Note: usually no computational savings; the original $x$ is already sparse.
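
A hedged sketch (not from the slides) of document similarity in the latent space; the helper name is illustrative and the raw term-frequency matrix is used directly:

```python
import numpy as np

def latent_similarity(X, k, i, j):
    """Similarity x_hat_i^T x_hat_j between documents i and j of the TF matrix X (d x N)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = np.diag(s[:k]) @ Vt[:k, :]       # k x N latent scores; x_hat_i = U_k z_i
    return float(Z[:, i] @ Z[:, j])      # equals x_hat_i^T x_hat_j since U_k is orthonormal
```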

45 Network Anomaly Detection [Lakhina, 05]. $x_{j,i}$ = amount of traffic on link $j$ in the network during time interval $i$. Model assumption: the total traffic is the sum of flows along a few paths. Apply PCA: each principal component intuitively represents a path.

46 Network Anomaly Detection. Model assumption: the total traffic is the sum of flows along a few paths. Apply PCA: each principal component intuitively represents a path. An anomaly is flagged when the traffic deviates from the first few principal components.
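
A hedged sketch of the residual idea (an illustration only, not the exact method of Lakhina et al.); `X` holds centered link traffic with one column per time interval, and `threshold` is an assumed tuning parameter:

```python
import numpy as np

def anomalous_intervals(X, k, threshold):
    """Indices of time intervals whose traffic lies far outside the top-k PCA subspace."""
    Sigma = (X @ X.T) / X.shape[1]
    _, U = np.linalg.eigh(Sigma)
    U_k = U[:, ::-1][:, :k]                       # "normal" subspace spanned by top-k components
    residual = X - U_k @ (U_k.T @ X)              # part of the traffic PCA cannot explain
    scores = np.linalg.norm(residual, axis=0)     # residual energy per time interval
    return np.where(scores > threshold)[0]
```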
