Dimension reduction methods: Algorithms and Applications Yousef Saad Department of Computer Science and Engineering University of Minnesota


1 Dimension reduction methods: Algorithms and Applications Yousef Saad Department of Computer Science and Engineering University of Minnesota University of Padua, June 7, 2018

2 Introduction, background, and motivation Common goal of data mining methods: to extract meaningful information or patterns from data. Very broad area includes: data analysis, machine learning, pattern recognition, information retrieval,... Main tools used: linear algebra; graph theory; approximation theory; optimization;... In this talk: emphasis on dimension reduction techniques and the interrelations between techniques Padua 06/07/18 p. 2

3 Introduction: a few factoids We live in an era increasingly shaped by DATA: enormous volumes of data created every year; most of the data on the internet created in just the last few years; billions of internet users; millions of Google searches worldwide per minute (5.2 B/day); 15.2 Million text messages worldwide per minute. Mixed blessing: opportunities & big challenges. The trend is re-shaping & energizing many research areas, including numerical linear algebra. Padua 06/07/18 p. 3

4 Drowning in data Dimension Reduction PLEASE! Picture modified from

5 Major tool of Data Mining: Dimension reduction Goal is not so much to reduce size (& cost) but to: Reduce noise and redundancy in data before performing a task [e.g., classification as in digit/face recognition]; Discover important features or parameters. The problem: Given $X = [x_1, \ldots, x_n] \in \mathbb{R}^{m\times n}$, find a low-dimensional representation $Y = [y_1, \ldots, y_n] \in \mathbb{R}^{d\times n}$ of $X$. Achieved by a mapping $\Phi : x \in \mathbb{R}^m \longrightarrow y \in \mathbb{R}^d$ so that $\Phi(x_i) = y_i,\ i = 1, \ldots, n$. Padua 06/07/18 p. 5

6 [Figure: the $m \times n$ data matrix $X$ with column $x_i$ mapped to the $d \times n$ reduced matrix $Y$ with column $y_i$.] $\Phi$ may be linear: $y_j = W^\top x_j\ \forall j$, i.e., $Y = W^\top X$, or nonlinear (implicit). Mapping $\Phi$ required to: Preserve proximity? Maximize variance? Preserve a certain graph? Padua 06/07/18 p. 6

7 Basics: Principal Component Analysis (PCA) In Principal Component Analysis, $W$ is computed to maximize the variance of the projected data: $$\max_{W \in \mathbb{R}^{m\times d},\ W^\top W = I}\ \sum_{i=1}^{n} \Big\| y_i - \frac{1}{n}\sum_{j=1}^{n} y_j \Big\|_2^2, \qquad y_i = W^\top x_i.$$ Leads to maximizing $\mathrm{Tr}\left[ W^\top (X - \mu e^\top)(X - \mu e^\top)^\top W \right]$, $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$. Solution: $W = \{d$ dominant eigenvectors$\}$ of the covariance matrix $\equiv$ set of left singular vectors of $\bar{X} = X - \mu e^\top$. Padua 06/07/18 p. 7

8 SVD: $\bar{X} = U\Sigma V^\top$, $U^\top U = I$, $V^\top V = I$, $\Sigma$ diagonal. Optimal $W = U_d \equiv$ matrix of the first $d$ columns of $U$. Solution $W$ also minimizes the reconstruction error: $\sum_i \| x_i - W W^\top x_i \|^2 = \sum_i \| x_i - W y_i \|^2$. In some methods recentering to zero is not done, i.e., $\bar{X}$ is replaced by $X$. Padua 06/07/18 p. 8
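A minimal NumPy sketch of the PCA projection described on the last two slides (illustrative code, not from the talk; function and variable names are my own):

```python
import numpy as np

def pca_project(X, d):
    """PCA via the SVD of the centered data matrix X (m x n, columns = samples).

    Returns W (m x d, orthonormal columns), Y = W^T (X - mu e^T), and the mean mu.
    """
    mu = X.mean(axis=1, keepdims=True)            # mu = (1/n) sum_i x_i
    Xbar = X - mu                                 # centered data
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    W = U[:, :d]                                  # d dominant left singular vectors
    Y = W.T @ Xbar                                # low-dimensional representation
    return W, Y, mu

# Example: project 500 random 20-dimensional points to 2-D
X = np.random.randn(20, 500)
W, Y, mu = pca_project(X, d=2)
```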

9 Unsupervised learning Unsupervised learning: methods do not exploit labeled data Example of digits: perform a 2-D projection Images of the same digit tend to cluster (more or less) Such 2-D representations are popular for visualization Can also try to find natural clusters in data, e.g., in materials Basic clustering technique: K-means [Figures: 2-D PCA projection of digit images; clustering of materials data into classes such as superconductors, multiferroics, superhard, catalytic, photovoltaic, ferromagnetic, and thermoelectric.] Padua 06/07/18 p. 9
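For completeness, a bare-bones sketch of the K-means (Lloyd) iteration mentioned above, operating on the columns of a reduced data matrix Y (illustrative only; the initialization and fixed iteration count are arbitrary choices):

```python
import numpy as np

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd's iteration on the columns of Y (d x n). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    centers = Y[:, rng.choice(n, size=k, replace=False)]   # random initial centroids
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        d2 = ((Y[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=1)
        # recompute centroids; keep the old centroid if a cluster becomes empty
        for j in range(k):
            if np.any(labels == j):
                centers[:, j] = Y[:, labels == j].mean(axis=1)
    return labels, centers
```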

10 Example: The Swiss Roll data set (points in 3-D) [Figure: original data in 3-D.] Padua 06/07/18 p. 10

11 2-D reductions: [Figure: four panels showing 2-D embeddings of the Swiss roll obtained by PCA, LPP, Eigenmaps, and ONPP.] Padua 06/07/18 p. 11

12 Example: Digit images (a random sample of 30) Padua 06/07/18 p. 12

13 2-D reductions: [Figure: four panels showing 2-D embeddings of the digit images obtained by PCA, LLE, Kernel PCA, and ONPP.] Padua 06/07/18 p. 13

14 DIMENSION REDUCTION EXAMPLE: INFORMATION RETRIEVAL

15 Information Retrieval: Vector Space Model Given: a collection of documents (columns of a matrix $A$) and a query vector $q$. Collection represented by an $m \times n$ term-by-document matrix with $a_{ij} = L_{ij} G_i N_j$. Queries ("pseudo-documents") $q$ are represented similarly to a column. Padua 06/07/18 p. 15

16 Vector Space Model - continued Problem: find a column of $A$ that best matches $q$. Similarity metric: cosine of the angle between a column of $A$ and $q$: $$\frac{q^\top A(:, j)}{\|q\|_2\, \|A(:, j)\|_2}$$ To rank all documents we need to compute $s = q^\top A$; $s$ = similarity (row) vector. Literal matching not very effective. Padua 06/07/18 p. 16
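A small NumPy sketch of this cosine-similarity ranking (illustrative; A and q are assumed to be already weighted as on the previous slide):

```python
import numpy as np

def rank_documents(A, q):
    """Return document indices sorted by cosine similarity with the query q."""
    # cosine of the angle between q and each column of A
    sims = (q @ A) / (np.linalg.norm(q) * np.linalg.norm(A, axis=0) + 1e-12)
    return np.argsort(-sims), sims
```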

17 Common approach: Use the SVD Need to extract intrinsic information or underlying "semantic" information LSI: replace $A$ by a low-rank approximation [from the SVD]: $A = U\Sigma V^\top \rightarrow A_k = U_k \Sigma_k V_k^\top$. $U_k$: term space, $V_k$: document space. New similarity vector: $s_k = q^\top A_k = q^\top U_k \Sigma_k V_k^\top$. Called Truncated SVD or TSVD in the context of regularization. Main issues: 1) computational cost 2) updates Padua 06/07/18 p. 17
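A corresponding sketch of LSI scoring with a rank-k truncated SVD (illustrative; np.linalg.svd stands in for whatever partial-SVD routine would be used in practice):

```python
import numpy as np

def lsi_rank(A, q, k):
    """Latent Semantic Indexing: rank documents using a rank-k truncated SVD of A.

    Returns the ranking and the similarity scores s_k = q^T U_k Sigma_k V_k^T.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # project the query once, then score against the reduced document space
    scores = (q @ Uk) * sk @ Vtk
    return np.argsort(-scores), scores
```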

18 Use of polynomial filters Idea: replace $A_k$ by $A\,\phi(A^\top A)$, where $\phi$ is a filter function. Consider the step function (Heaviside): $$\phi(x) = \begin{cases} 0, & 0 \le x \le \sigma_k^2 \\ 1, & \sigma_k^2 \le x \le \sigma_1^2 \end{cases}$$ This would yield the same result as with TSVD, but it is not easy to use this function directly. Solution: use a polynomial approximation to $\phi$... then $s^\top = q^\top A\,\phi(A^\top A)$ requires only matrix-vector products. * See: E. Kokiopoulou & YS '04 Padua 06/07/18 p. 18
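A rough sketch of the filtering idea, assuming the step function is approximated by a simple least-squares polynomial fit on a scaled interval (the degree, grid, and fitting method are illustrative choices, not the scheme of Kokiopoulou & Saad '04); note that phi(A^T A) is applied purely through matrix-vector products:

```python
import numpy as np

def filtered_scores(A, q, cutoff, degree=10):
    """Score documents with s^T = q^T A phi(A^T A), phi ~ step function at `cutoff`.

    phi is a least-squares polynomial fit of the Heaviside step on the scaled
    interval [0, 1] (t = x / sigma_1^2); it is applied through mat-vecs only,
    never forming A^T A explicitly.
    """
    smax2 = np.linalg.norm(A, 2) ** 2              # sigma_1^2 (spectral norm squared)
    t = np.linspace(0.0, 1.0, 400)
    step = (t >= cutoff / smax2).astype(float)
    coeffs = np.polynomial.polynomial.polyfit(t, step, degree)  # low -> high degree

    power = q @ A                                  # row vector q^T A
    scores = np.zeros_like(power)
    for c in coeffs:                               # accumulate c_j * q^T A (A^T A / smax2)^j
        scores += c * power
        power = (power @ A.T) @ A / smax2
    return scores
```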

19 IR: Use of the Lanczos algorithm (J. Chen, YS '09) Lanczos algorithm = projection method on the Krylov subspace $\mathrm{Span}\{v, Av, \ldots, A^{m-1}v\}$. Can get singular vectors with Lanczos, & use them in LSI. Better: use the Lanczos vectors directly for the projection. K. Blom and A. Ruhe [SIMAX, vol. 26, '05] perform a Lanczos run for each query [expensive]. Proposed: one Lanczos run with a random initial vector, then use the Lanczos vectors in place of the singular vectors. In short: results comparable to those of the SVD at a much lower cost. Padua 06/07/18 p. 19

20 Background: The Lanczos procedure Let $A$ be an $n \times n$ symmetric matrix.
ALGORITHM 1: Lanczos
1. Choose a vector $v_1$ with $\|v_1\|_2 = 1$. Set $\beta_1 \equiv 0$, $v_0 \equiv 0$.
2. For $j = 1, 2, \ldots, m$ Do:
3.   $\alpha_j := v_j^\top A v_j$
4.   $w := A v_j - \alpha_j v_j - \beta_j v_{j-1}$
5.   $\beta_{j+1} := \|w\|_2$. If $\beta_{j+1} = 0$ then Stop
6.   $v_{j+1} := w / \beta_{j+1}$
7. EndDo
Note: the scalars $\beta_{j+1}, \alpha_j$ are selected so that $v_{j+1} \perp v_j$, $v_{j+1} \perp v_{j-1}$, and $\|v_{j+1}\|_2 = 1$. Padua 06/07/18 p. 20
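The algorithm above, as a short NumPy sketch (random start vector, no reorthogonalization; illustrative only):

```python
import numpy as np

def lanczos(A, m, seed=0):
    """Basic m-step Lanczos on a symmetric matrix A.

    Returns V (n x m) with the Lanczos vectors and the tridiagonal T_m (m x m).
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    V = np.zeros((n, m))
    alphas = np.zeros(m)                  # diagonal of T_m
    betas = np.zeros(m)                   # off-diagonal entries (betas[:-1] used)
    v_prev, beta = np.zeros(n), 0.0
    for j in range(m):
        V[:, j] = v
        w = A @ v
        alphas[j] = v @ w                 # alpha_j = v_j^T A v_j
        w = w - alphas[j] * v - beta * v_prev
        beta = np.linalg.norm(w)
        betas[j] = beta
        if beta == 0.0:
            break
        v_prev, v = v, w / beta
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return V, T
```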

21 Background: Lanczos (cont.) Lanczos recurrence: $\beta_{j+1} v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1}$. Let $V_m = [v_1, v_2, \ldots, v_m]$. Then we have: $$V_m^\top A V_m = T_m = \begin{pmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \beta_{m-1} & \alpha_{m-1} & \beta_m \\ & & & \beta_m & \alpha_m \end{pmatrix}$$ In theory the $\{v_j\}$'s are orthonormal. In practice one needs to reorthogonalize. Padua 06/07/18 p. 21

22 Tests: IR Information retrieval datasets: MED (7,014 terms, 1,033 docs, 30 queries) and CRAN (3,763 terms). [Figure: preprocessing time vs. number of iterations for lanczos and tsvd, on the Med dataset (left) and the Cran dataset (right).] Padua 06/07/18 p. 22

23 Average query times [Figure: average query time vs. number of iterations for lanczos, tsvd, lanczos-rf, and tsvd-rf, on the Med dataset (left) and the Cran dataset (right).] Padua 06/07/18 p. 23

24 Average retrieval precision [Figure: average retrieval precision vs. number of iterations for lanczos, tsvd, lanczos-rf, and tsvd-rf, on the Med dataset (left) and the Cran dataset (right).] Retrieval precision comparisons. Padua 06/07/18 p. 24

25 Supervised learning We now have data that is labeled. Example: (health sciences) malignant vs. non-malignant. Example: (materials) photovoltaic, hard, conductor, ... Example: (digit recognition) digits 0, 1, ..., 9. [Figure: sketch of labeled data points a-g grouped into classes.] Padua 06/07/18 p. 25

26 Supervised learning We now have data that is labeled. Example: (health sciences) malignant vs. non-malignant. Example: (materials) photovoltaic, hard, conductor, ... Example: (digit recognition) digits 0, 1, ..., 9. [Figure: the same sketch, now with unlabeled test points '?' to be assigned to one of the classes.] Padua 06/07/18 p. 26

27 Supervised learning: classification Problem: Given labels (say "A" and "B") for each item of a given set, find a mechanism to classify an unlabelled item into either the "A" or the "B" class. Many applications. Example: distinguish SPAM from non-SPAM messages. Can be extended to more than 2 classes. Padua 06/07/18 p. 27

28 Another application: Padua 06/07/18 p. 28

29 Supervised learning: classification Best illustration: the handwritten digit recognition example. Given: a set of labeled samples (training set), and an (unlabeled) test image. Problem: find the label of the test image. [Figure: training data labeled Digit 0, Digit 1, Digit 2, ..., Digit 9, and test data labeled "Digit ??", both passed through dimension reduction.] Roughly speaking: we seek dimension reduction so that recognition is more effective in the low-dimensional space. Padua 06/07/18 p. 29

30 Basic method: K-nearest neighbors (KNN) classification Idea of a voting system: get the distances between the test sample and the training samples. Get the k nearest neighbors (here k = 8). The predominant class among these k items is assigned to the test sample. [Figure: test point '?' surrounded by its 8 nearest neighbors.] Padua 06/07/18 p. 30
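A minimal sketch of KNN classification as just described (illustrative; assumes a plain Euclidean distance):

```python
import numpy as np

def knn_classify(X_train, labels, x_test, k=8):
    """Classify x_test by majority vote among its k nearest training samples.

    X_train: m x n matrix (columns = training samples); labels: length-n array.
    """
    d = np.linalg.norm(X_train - x_test[:, None], axis=0)   # distances to all samples
    nearest = np.argsort(d)[:k]                              # indices of the k closest
    votes = np.asarray(labels)[nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]                         # predominant class
```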

31 Supervised learning: Linear classification Linear classifiers: find a hyperplane which best separates the data in classes A and B. Example of application: distinguish between SPAM and non-SPAM e-mails. [Figure: two classes separated by a linear classifier.] Note: the world is non-linear. Often this is combined with kernels, which amounts to changing the inner product. Padua 06/07/18 p. 31

32 A harder case: [Figure: 2-D data set, "Spectral Bisection (PDDP)", where the two classes are not linearly separable.] Use kernels to transform

33 Projection with Kernels [Figure: the same data transformed with a Gaussian kernel, for a given value of $\sigma^2$.] Transformed data with a Gaussian Kernel Padua 06/07/18 p. 33

34 Simple linear classifiers Let $X = [x_1, \ldots, x_n]$ be the data matrix and $L = [l_1, \ldots, l_n]$ the labels, either +1 or -1. 1st Solution: find a vector $u$ such that $u^\top x_i$ is close to $l_i$ for all $i$. Common solution: SVD to reduce the dimension of the data [e.g., to 2-D], then do the comparison in this space, e.g., A: $u^\top x_i \ge 0$, B: $u^\top x_i < 0$. [Figure; for clarity the principal axis $u$ is drawn below where it should be.] Padua 06/07/18 p. 34
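A minimal least-squares realization of this first solution (an illustrative sketch, not necessarily the procedure used in the talk): fit u from the training data and classify by the sign of u^T x.

```python
import numpy as np

def fit_linear_classifier(X, L):
    """Least-squares fit of u so that u^T x_i is close to the label l_i (+1/-1)."""
    u, *_ = np.linalg.lstsq(X.T, L, rcond=None)
    return u

def classify(u, x):
    """Assign class A if u^T x >= 0, class B otherwise."""
    return "A" if u @ x >= 0 else "B"
```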

35 Fisher's Linear Discriminant Analysis (LDA) Principle: use label information to build a good projector, i.e., one that can discriminate well between classes. Define "between scatter": a measure of how well separated two distinct classes are. Define "within scatter": a measure of how well clustered items of the same class are. Objective: make the between scatter measure large and the within scatter measure small. Idea: find a projector that maximizes the ratio of the between scatter measure over the within scatter measure. Padua 06/07/18 p. 35

36 Define: $$S_B = \sum_{k=1}^{c} n_k (\mu^{(k)} - \mu)(\mu^{(k)} - \mu)^\top, \qquad S_W = \sum_{k=1}^{c} \sum_{x_i \in X_k} (x_i - \mu^{(k)})(x_i - \mu^{(k)})^\top$$ where: $\mu$ = mean of $X$; $\mu^{(k)}$ = mean of $X_k$; $X_k$ = $k$-th class; $n_k = |X_k|$. [Figure: classes $X_1$, $X_2$, $X_3$ with their cluster centroids and the global centroid $\mu$.] Padua 06/07/18 p. 36

37 Consider the 2nd moments for a vector $a$: $$a^\top S_B a = \sum_{k=1}^{c} n_k\, |a^\top(\mu^{(k)} - \mu)|^2, \qquad a^\top S_W a = \sum_{k=1}^{c} \sum_{x_i \in X_k} |a^\top(x_i - \mu^{(k)})|^2$$ $a^\top S_B a$ = weighted variance of the projected $\mu^{(k)}$'s; $a^\top S_W a$ = weighted sum of the variances of the projected classes $X_k$'s. LDA projects the data so as to maximize the ratio of these two numbers: $$\max_a \frac{a^\top S_B a}{a^\top S_W a}$$ Optimal $a$ = eigenvector associated with the largest eigenvalue of $S_B u_i = \lambda_i S_W u_i$. Padua 06/07/18 p. 37
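A compact sketch of computing the 1-D LDA direction from these definitions (uses SciPy's generalized symmetric eigensolver; the small regularization of S_W is my own addition for numerical safety):

```python
import numpy as np
from scipy.linalg import eigh

def lda_direction(X, labels, reg=1e-8):
    """Return the vector a maximizing a^T S_B a / a^T S_W a.

    X: m x n data matrix (columns = samples); labels: length-n class labels.
    """
    labels = np.asarray(labels)
    m, n = X.shape
    mu = X.mean(axis=1)
    S_B = np.zeros((m, m))
    S_W = np.zeros((m, m))
    for k in np.unique(labels):
        Xk = X[:, labels == k]
        mu_k = Xk.mean(axis=1)
        d = mu_k - mu
        S_B += Xk.shape[1] * np.outer(d, d)       # between scatter contribution
        C = Xk - mu_k[:, None]
        S_W += C @ C.T                            # within scatter contribution
    # largest eigenpair of the pencil S_B u = lambda S_W u
    vals, vecs = eigh(S_B, S_W + reg * np.eye(m))
    return vecs[:, -1]
```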

38 LDA Extension to arbitrary dimensions Criterion: maximize the ratio of two traces: $$\frac{\mathrm{Tr}\,[U^\top S_B U]}{\mathrm{Tr}\,[U^\top S_W U]}$$ Constraint: $U^\top U = I$ (orthogonal projector). Reduced-dimension data: $Y = U^\top X$. Common viewpoint: hard to maximize, therefore an alternative: solve instead the ("easier") problem: $$\max_{U^\top S_W U = I} \mathrm{Tr}\,[U^\top S_B U]$$ Solution: largest eigenvectors of $S_B u_i = \lambda_i S_W u_i$. Padua 06/07/18 p. 38

39 LDA Extension to arbitrary dimensions (cont.) Consider the original problem: $$\max_{U \in \mathbb{R}^{n\times p},\ U^\top U = I} \frac{\mathrm{Tr}\,[U^\top A U]}{\mathrm{Tr}\,[U^\top B U]}$$ Let $A, B$ be symmetric & assume that $B$ is semi-positive definite with $\mathrm{rank}(B) > n - p$. Then $\mathrm{Tr}\,[U^\top A U]/\mathrm{Tr}\,[U^\top B U]$ has a finite maximum value $\rho_*$. The maximum is reached for a certain $U$ that is unique up to unitary transforms of the columns. Consider the function: $$f(\rho) = \max_{V^\top V = I} \mathrm{Tr}\,[V^\top (A - \rho B)V]$$ Call $V(\rho)$ the maximizer for an arbitrarily given $\rho$. Note: $V(\rho)$ = set of eigenvectors - not unique. Padua 06/07/18 p. 39

40 Define $G(\rho) \equiv A - \rho B$ and its $n$ eigenvalues: $\mu_1(\rho) \ge \mu_2(\rho) \ge \cdots \ge \mu_n(\rho)$. Clearly: $f(\rho) = \mu_1(\rho) + \mu_2(\rho) + \cdots + \mu_p(\rho)$. Can express this differently. Define the eigenprojector: $P(\rho) = V(\rho)V(\rho)^\top$. Then: $f(\rho) = \mathrm{Tr}\,[V(\rho)^\top G(\rho)V(\rho)] = \mathrm{Tr}\,[G(\rho)V(\rho)V(\rho)^\top] = \mathrm{Tr}\,[G(\rho)P(\rho)]$. Padua 06/07/18 p. 40

41 Recall [e.g., Kato '65] that: $P(\rho) = \frac{1}{2\pi i}\int_\Gamma (G(\rho) - zI)^{-1}\,dz$, where $\Gamma$ is a smooth curve enclosing the $p$ eigenvalues of interest. Hence: $f(\rho) = \frac{1}{2\pi i}\,\mathrm{Tr}\int_\Gamma G(\rho)\,(G(\rho) - zI)^{-1}\,dz = \cdots = \frac{1}{2\pi i}\,\mathrm{Tr}\int_\Gamma z\,(G(\rho) - zI)^{-1}\,dz$. With this, one can prove: 1. $f$ is a non-increasing function of $\rho$; 2. $f(\rho) = 0$ iff $\rho = \rho_*$; 3. $f'(\rho) = -\mathrm{Tr}\,[V(\rho)^\top B V(\rho)]$. Padua 06/07/18 p. 41

42 Can now use Newton's method. Careful when defining $V(\rho)$: define the eigenvectors so that the mapping $V(\rho)$ is differentiable. $$\rho_{new} = \rho + \frac{\mathrm{Tr}\,[V(\rho)^\top (A - \rho B)V(\rho)]}{\mathrm{Tr}\,[V(\rho)^\top B V(\rho)]} = \frac{\mathrm{Tr}\,[V(\rho)^\top A V(\rho)]}{\mathrm{Tr}\,[V(\rho)^\top B V(\rho)]}$$ Newton's method to find the zero of $f$ $\equiv$ a fixed-point iteration with $g(\rho) = \mathrm{Tr}\,[V(\rho)^\top A V(\rho)] / \mathrm{Tr}\,[V(\rho)^\top B V(\rho)]$. Idea: compute $V(\rho)$ by a Lanczos-type procedure. Note: standard problem - [not generalized] - inexpensive. See T. Ngo, M. Bellalij, and Y.S. for details. Padua 06/07/18 p. 42
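A small dense-matrix sketch of this fixed-point iteration (a full symmetric eigensolver stands in for the Lanczos-type procedure mentioned on the slide; the tolerance and iteration cap are illustrative):

```python
import numpy as np

def trace_ratio(A, B, p, rho0=0.0, tol=1e-10, maxit=100):
    """Fixed-point iteration rho <- Tr[V^T A V] / Tr[V^T B V], where V spans the
    p dominant eigenvectors of G(rho) = A - rho*B (dense eigh; illustrative)."""
    rho = rho0
    for _ in range(maxit):
        _, vecs = np.linalg.eigh(A - rho * B)      # eigenvalues in ascending order
        V = vecs[:, -p:]                           # p dominant eigenvectors
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
        if abs(rho_new - rho) < tol * max(1.0, abs(rho)):
            return rho_new, V
        rho = rho_new
    return rho, V
```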

43 GRAPH-BASED TECHNIQUES

44 Multilevel techniques in brief Divide and conquer paradigms as well as multilevel methods in the sense of domain decomposition Main principle: very costly to do an SVD [or Lanczos] on the whole set. Why not find a smaller set on which to do the analysis without too much loss? Tools used: graph coarsening, divide and conquer For text data we use hypergraphs Padua 06/07/18 p. 44

45 Multilevel Dimension Reduction Main Idea: coarsen for a few levels. Use the resulting data set $\hat{X}$ to find a projector $P$ from $\mathbb{R}^m$ to $\mathbb{R}^d$. $P$ can be used to project the original data or new data. [Figure: the $m \times n$ data $X$ is coarsened over several levels down to an $m \times n_r$ set $X_r$; the projector computed from $X_r$ yields the reduced data $Y_r$ ($d \times n_r$) and, applied to $X$, the reduced data $Y$ ($d \times n$).] Gain: dimension reduction is done with a much smaller set. Hope: not much loss compared to using the whole data. Padua 06/07/18 p. 45

46 Making it work: Use of Hypergraphs for sparse data [Figure. Left: a (sparse) data set of $n$ entries in $\mathbb{R}^m$ represented by a matrix $A \in \mathbb{R}^{m\times n}$, with rows a-e of nonzero entries. Right: the corresponding hypergraph $H = (V, E)$ with vertex set $V$ representing the columns of $A$, and nets such as "net e" and "net d" corresponding to rows.] Padua 06/07/18 p. 46

47 Hypergraph Coarsening uses column matching - similar to a common one used in graph partitioning. Compute the non-zero inner product $\langle a^{(i)}, a^{(j)}\rangle$ between two vertices $i$ and $j$, i.e., the $i$th and $j$th columns of $A$. Note: $\langle a^{(i)}, a^{(j)}\rangle = \|a^{(i)}\|\, \|a^{(j)}\| \cos\theta_{ij}$. Modif. 1: Parameter $0 < \epsilon < 1$. Match two vertices, i.e., columns, only if the angle between the vertices satisfies $\tan\theta_{ij} \le \epsilon$. Modif. 2: Scale coarsened columns. If $i$ and $j$ are matched and $\|a^{(i)}\|_0 \ge \|a^{(j)}\|_0$, replace $a^{(i)}$ and $a^{(j)}$ by $c^{(\ell)} = \left(\sqrt{1 + \cos^2\theta_{ij}}\,\right) a^{(i)}$. Padua 06/07/18 p. 47
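A rough sketch of one pass of this column matching, under the reading of the scaling and tie-breaking rules given above (greedy O(n^2) matching; treat the details as an assumption rather than the authors' exact algorithm):

```python
import numpy as np

def coarsen_columns(A, eps=0.5):
    """One level of column coarsening: greedily match column pairs whose angle
    satisfies tan(theta) <= eps, and replace each pair by a single scaled column."""
    m, n = A.shape
    norms = np.linalg.norm(A, axis=0)
    matched = np.zeros(n, dtype=bool)
    coarse_cols = []
    for i in range(n):
        if matched[i]:
            continue
        best_j, best_cos = -1, 0.0
        for j in range(i + 1, n):
            if matched[j]:
                continue
            c = (A[:, i] @ A[:, j]) / (norms[i] * norms[j] + 1e-12)
            if c > best_cos:
                best_cos, best_j = c, j
        # tan(theta) <= eps  <=>  cos(theta) >= 1 / sqrt(1 + eps^2)
        if best_j >= 0 and best_cos >= 1.0 / np.sqrt(1.0 + eps**2):
            # keep the column with more nonzeros, scaled as in Modif. 2
            keep = i if np.count_nonzero(A[:, i]) >= np.count_nonzero(A[:, best_j]) else best_j
            coarse_cols.append(np.sqrt(1.0 + best_cos**2) * A[:, keep])
            matched[i] = matched[best_j] = True
        else:
            coarse_cols.append(A[:, i])
            matched[i] = True
    return np.column_stack(coarse_cols)
```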

48 Call $C$ the coarsened matrix obtained from $A$ using the approach just described. Lemma: Let $C \in \mathbb{R}^{m\times c}$ be the coarsened matrix of $A$ obtained by one level of coarsening of $A \in \mathbb{R}^{m\times n}$, with columns $a^{(i)}$ and $a^{(j)}$ matched if $\tan\theta_{ij} \le \epsilon$. Then $$\left| x^\top A A^\top x - x^\top C C^\top x \right| \le 3\epsilon\, \|A\|_F^2$$ for any $x \in \mathbb{R}^m$ with $\|x\|_2 = 1$. Very simple bound for Rayleigh quotients for any $x$. Implies some bounds on singular values and norms - skipped. Padua 06/07/18 p. 48

49 Tests: Comparing singular values [Figure: leading 50 singular values and their approximations (Exact, power-it1, coarsen, coars+zs, rand) for the datasets CRANFIELD (left), MEDLINE (middle), and TIME (right).] Padua 06/07/18 p. 49

50 Low rank approximation: coarsening, random sampling, and rand+coarsening. Err1 $= \|A - H_k H_k^\top A\|_F$; Err2 $= \frac{1}{k}\sum_{i=1}^{k} |\hat\sigma_i - \sigma_i| / \sigma_i$. [Table: for each dataset (Kohonen, aft, FA, chipcool, brainpc, scfxm1-2b, thermomechtc, Webbase-1M), its size n, rank k, coarsened size c, and the errors Err1/Err2 for coarsening and for random sampling; numerical entries omitted.] Padua 06/07/18 p. 50

51 LINEAR ALGEBRA METHODS: EXAMPLES

52 Updating the SVD (E. Vecharynski and YS '13) In applications, the data matrix $X$ is often updated. Example: in Information Retrieval (IR), one can add documents, add terms, change weights, ... Problem: given the partial SVD of $X$, how to get a partial SVD of $X_{new}$? Will illustrate only with updates of the form $X_{new} = [X, D]$ (documents added in IR). Padua 06/07/18 p. 52

53 Updating the SVD: Zha-Simon algorithm Assume $A \approx U_k \Sigma_k V_k^\top$ and $A_D = [A, D]$, $D \in \mathbb{R}^{m\times p}$. Compute $D_k = (I - U_k U_k^\top)D$ and its QR factorization: $[\hat{U}_p, R] = \mathrm{qr}(D_k, 0)$, $R \in \mathbb{R}^{p\times p}$, $\hat{U}_p \in \mathbb{R}^{m\times p}$. Note: $$A_D \approx [U_k, \hat{U}_p]\, H_D \begin{bmatrix} V_k & 0 \\ 0 & I_p \end{bmatrix}^\top; \qquad H_D \equiv \begin{bmatrix} \Sigma_k & U_k^\top D \\ 0 & R \end{bmatrix}$$ Zha-Simon ('99): compute the SVD of $H_D$ & get the approximate SVD from the above equation. It turns out this is a Rayleigh-Ritz projection method for the SVD [E. Vecharynski & YS '13]. Can show optimality properties as a result. Padua 06/07/18 p. 53
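A dense NumPy sketch of one Zha-Simon update step as written above (illustrative; it returns updated rank-k factors after appending the block D):

```python
import numpy as np

def zha_simon_update(Uk, Sk, Vk, D):
    """Update a rank-k SVD A ~ Uk @ diag(Sk) @ Vk.T after appending the columns D.

    Returns rank-k factors of A_D = [A, D], obtained from the SVD of H_D.
    """
    k = Sk.size
    p = D.shape[1]
    Dk = D - Uk @ (Uk.T @ D)                  # (I - Uk Uk^T) D
    Up, R = np.linalg.qr(Dk)                  # economy QR: Up (m x p), R (p x p)
    H = np.block([[np.diag(Sk), Uk.T @ D],
                  [np.zeros((p, k)), R]])
    F, S, Gt = np.linalg.svd(H)               # SVD of the small (k+p) x (k+p) matrix
    U_new = np.hstack([Uk, Up]) @ F[:, :k]
    W = np.block([[Vk, np.zeros((Vk.shape[0], p))],
                  [np.zeros((p, k)), np.eye(p)]])
    V_new = W @ Gt.T[:, :k]
    return U_new, S[:k], V_new
```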

54 Updating the SVD When the number of updates is large this becomes costly. Idea: replace $\hat{U}_p$ by a low-dimensional approximation: use $\bar{U}$ of the form $\bar{U} = [U_k, Z_\ell]$ instead of $\bar{U} = [U_k, \hat{U}_p]$. $Z_\ell$ must capture the range of $D_k = (I - U_k U_k^\top)D$. Simplest idea: best rank-$\ell$ approximation using the SVD. Can also use Lanczos vectors from the Golub-Kahan-Lanczos algorithm. Padua 06/07/18 p. 54

55 An example LSI - with the MEDLINE collection: $m = 7{,}014$ (terms), $n = 1{,}033$ (docs), $k = 75$ (dimension), $t = 533$ (initial # docs), $n_q = 30$ (queries). Adding blocks of 25 docs at a time. The number of singular triplets of $(I - U_k U_k^\top)D$ used in the SVD projection ("SV") is 2. For the GKL approach ("GKL") 3 GKL vectors are used. These two methods are compared to Zha-Simon ("ZS"). We show average precision, then time. Padua 06/07/18 p. 55

56 [Figure: retrieval accuracy, p = 25: average precision vs. number of documents for ZS, SV, and GKL.] Experiments show: the gain in accuracy is rather consistent. Padua 06/07/18 p. 56

57 [Figure: updating time, p = 25: time (sec.) vs. number of documents for ZS, SV, and GKL.] Times can be significantly better for large sets. Padua 06/07/18 p. 57

58 Conclusion Interesting new matrix problems in areas that involve the effective mining of data. Among the most pressing issues is that of reducing computational cost - [SVD, SDP, ..., too costly]. Many online resources available. Huge potential in areas like materials science, though inertia has to be overcome. To a researcher in computational linear algebra: a big tide of change in types of problems, algorithms, frameworks, culture, ... But change should be welcome.

59 "When one door closes, another opens; but we often look so long and so regretfully upon the closed door that we do not see the one which has opened for us." - Alexander Graham Bell. In the words of Lao Tzu: "If you do not change directions, you may end up where you are heading." Thank you! Visit my web-site at Padua 06/07/18 p. 59
