Tensor Decompositions for Machine Learning. G. Roeder, University of British Columbia. UBC Machine Learning Reading Group, June.


1 Tensor Decompositions for Machine Learning. G. Roeder, Department of Computer Science, University of British Columbia. UBC Machine Learning Reading Group, June.

2 Contact information. Please inform me of any typos or errors at geoff.roeder@gmail.com

3-4 Outline: 1. Tensors. 2. Tensor Decomposition. 3. Neural Network Feature Learning.

5-7 Formal Definition: Tensors as Multi-Linear Maps. The tensor product of two vector spaces V and W over a field F is another vector space over F. It is denoted V ⊗_F W, or V ⊗ W when the field is understood. If {v_i} and {w_j} are bases for V and W, then the set {v_i ⊗ w_j} is a basis for V ⊗ W. If S : V → X and T : W → Y are linear maps, then the tensor product of S and T is the linear map S ⊗ T : V ⊗ W → X ⊗ Y defined by (S ⊗ T)(v ⊗ w) = S(v) ⊗ T(w).

8-10 A Sufficient Definition: Tensors as Multidimensional Arrays (adapted from Kolda and Bader 2009). With respect to a given basis, a tensor is a multidimensional array. E.g., a real Nth-order tensor X ∈ R^{I_1 × I_2 × ... × I_N} w.r.t. the standard basis is an N-dimensional array where X_{i_1, i_2, ..., i_N} is the element at index (i_1, i_2, ..., i_N). The order of a tensor is the number of dimensions (also called modes or indices) required to uniquely identify an element. So a scalar is a 0-mode tensor, a vector is a 1-mode tensor, and a matrix is a 2-mode tensor.
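For concreteness, a minimal NumPy sketch of the array view of a tensor (the array and its values are illustrative, not from the slides): the order corresponds to ndim, and each index ranges over one mode.

```python
import numpy as np

# A real 3rd-order tensor X in R^{3 x 4 x 2}, stored as a 3-dimensional array
X = np.arange(24, dtype=float).reshape(3, 4, 2)

print(X.ndim)      # order (number of modes): 3
print(X.shape)     # mode sizes: (3, 4, 2)
print(X[0, 2, 1])  # element at (zero-based) index (i1, i2, i3) = (0, 2, 1)
```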

11-12 Tensor Subarrays: Fibers. Fibers are the higher-order analogue of matrix rows and columns: for a 3rd-order tensor, fix all but one index. Figure: fibers of a 3-mode tensor. All fibers are treated as column vectors by convention.

13-14 Tensor Subarrays: Slices. Slices are two-dimensional sections of a tensor, defined by fixing all but two indices. Figure: slices of a 3rd-order tensor. Slices can be horizontal, lateral, or frontal, corresponding to fixing the first, second, or third index, respectively.
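A short NumPy illustration of fibers and slices, reusing the illustrative array from the previous sketch:

```python
import numpy as np

X = np.arange(24, dtype=float).reshape(3, 4, 2)

# Fibers: fix all but one index of the 3rd-order tensor X
column_fiber = X[:, 1, 0]   # mode-1 fiber X[:, j, k]
row_fiber    = X[0, :, 1]   # mode-2 fiber X[i, :, k]
tube_fiber   = X[0, 1, :]   # mode-3 fiber X[i, j, :]

# Slices: fix all but two indices
horizontal_slice = X[0, :, :]   # fix the first index
lateral_slice    = X[:, 1, :]   # fix the second index
frontal_slice    = X[:, :, 0]   # fix the third index
```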

15-17 Tensor Operations: n-Mode Matrix Product. Consider multiplying a tensor by a matrix or vector in the nth mode. The n-mode (matrix) product of a tensor X ∈ R^{I_1 × I_2 × ... × I_N} with a matrix S ∈ R^{J × I_n} is denoted X ×_n S and lives in R^{I_1 × ... × I_{n-1} × J × I_{n+1} × ... × I_N}. Elementwise, (X ×_n S)_{i_1, ..., i_{n-1}, j, i_{n+1}, ..., i_N} = Σ_{i_n = 1}^{I_n} x_{i_1, i_2, ..., i_N} s_{j, i_n}.
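A sketch of the n-mode matrix product in NumPy; the helper name mode_n_product is mine (libraries such as TensorLy ship equivalent routines), and the example shapes are made up.

```python
import numpy as np

def mode_n_product(X, S, n):
    """n-mode (matrix) product X x_n S.

    X has shape (I_1, ..., I_N); S has shape (J, I_n).
    The result has shape (I_1, ..., I_{n-1}, J, I_{n+1}, ..., I_N).
    """
    # Move mode n to the front, contract it against the columns of S, move it back.
    Xn = np.moveaxis(X, n, 0)                   # shape (I_n, ...)
    out = np.tensordot(S, Xn, axes=([1], [0]))  # shape (J, ...)
    return np.moveaxis(out, 0, n)

X = np.arange(24, dtype=float).reshape(3, 4, 2)
S = np.ones((5, 4))                # J = 5 rows, I_2 = 4 columns
Y = mode_n_product(X, S, n=1)      # contract over the 2nd mode (n = 1, zero-based)
print(Y.shape)                     # (3, 5, 2)
```

The moveaxis-plus-tensordot pattern is equivalent to unfolding the tensor along mode n, multiplying by S, and refolding.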

18-19 Tensor Operations: n-Mode Matrix Product, Example. Consider a tensor X ∈ R^{3×2×2} with frontal slices X_{:,:,1} and X_{:,:,2}, and a matrix U with three columns. Then the (1, 1, 1) element of the 1-mode matrix product of X and U is (X ×_1 U)_{1,1,1} = Σ_{i_1 = 1}^{3} x_{i_1, 1, 1} u_{1, i_1} = 22.

20-21 Tensor Operations: n-Mode Vector Product. The n-mode (vector) product of a tensor X ∈ R^{I_1 × I_2 × ... × I_N} with a vector w ∈ R^{I_n} is denoted X ×̄_n w and lives in R^{I_1 × ... × I_{n-1} × I_{n+1} × ... × I_N}. Elementwise, (X ×̄_n w)_{i_1, ..., i_{n-1}, i_{n+1}, ..., i_N} = Σ_{i_n = 1}^{I_n} x_{i_1, i_2, ..., i_N} w_{i_n}.
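The vector case can be sketched the same way; the helper name and shapes below are illustrative, not from the slides.

```python
import numpy as np

def mode_n_vector_product(X, w, n):
    """n-mode (vector) product: contract mode n of X against w, dropping that mode."""
    return np.tensordot(X, w, axes=([n], [0]))

X = np.arange(24, dtype=float).reshape(3, 4, 2)
w = np.ones(4)
print(mode_n_vector_product(X, w, n=1).shape)   # (3, 2): mode 2 has been contracted away
```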

22 Tensor Operations: n-Mode Vector Product, Example. Consider the tensor X ∈ R^{3×2×2} defined as before, where X_{1,1,1} = 1 and X_{1,2,1} = 4. Then the (1, 1) element of the 2-mode vector product of X and w = [1, 2]^T is (X ×̄_2 w)_{1,1} = Σ_{i_2 = 1}^{2} x_{1, i_2, 1} w_{i_2} = 1·1 + 4·2 = 9.

23-25 Multilinear Form of a Tensor X. The multilinear form for a tensor X ∈ R^{q_1 × q_2 × q_3} is defined as follows. Consider matrices M_i ∈ R^{q_i × p_i} for i ∈ {1, 2, 3}. Then the tensor X(M_1, M_2, M_3) ∈ R^{p_1 × p_2 × p_3} is defined as X(M_1, M_2, M_3) = Σ_{j_1 ∈ [q_1]} Σ_{j_2 ∈ [q_2]} Σ_{j_3 ∈ [q_3]} X_{j_1, j_2, j_3} M_1(j_1, :) ⊗ M_2(j_2, :) ⊗ M_3(j_3, :). A simpler case uses vectors v, w ∈ R^d with X ∈ R^{d × d × d}: X(I, v, w) := Σ_{j, l ∈ [d]} v_j w_l X(:, j, l) ∈ R^d, which is a multilinear combination of the mode-1 fibers (columns) of the tensor X.
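The multilinear form is a contraction along every mode, which np.einsum expresses directly; the shapes below are made up for illustration.

```python
import numpy as np

q1, q2, q3 = 4, 5, 6
p1, p2, p3 = 2, 3, 2

X  = np.random.randn(q1, q2, q3)
M1 = np.random.randn(q1, p1)
M2 = np.random.randn(q2, p2)
M3 = np.random.randn(q3, p3)

# X(M1, M2, M3): contract mode i of X against the rows of M_i
Y = np.einsum('abc,ai,bj,ck->ijk', X, M1, M2, M3)
print(Y.shape)   # (p1, p2, p3) = (2, 3, 2)

# The simpler case X(I, v, w) with X in R^{d x d x d}
d = 4
T = np.random.randn(d, d, d)
v, w = np.random.randn(d), np.random.randn(d)
x = np.einsum('ajl,j,l->a', T, v, w)   # a multilinear combination of mode-1 fibers
print(x.shape)   # (d,)
```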

26-27 Rank-One Tensors. An N-way tensor X is called rank one if it can be written as the tensor product of N vectors, i.e., X = a^(1) ⊗ a^(2) ⊗ ... ⊗ a^(N), (1) where ⊗ is the vector (outer) product. The simple form of (1) implies that X_{i_1, i_2, ..., i_N} = a^(1)_{i_1} a^(2)_{i_2} ... a^(N)_{i_N}. Figure: a rank-one 3-mode tensor X = a ⊗ b ⊗ c.

28-29 Rank-k Tensors. An order-3 tensor X ∈ R^{d × d × d} is said to have rank k if it can be written as the sum of k rank-1 tensors, i.e., X = Σ_{i=1}^{k} w_i a_i ⊗ b_i ⊗ c_i, where w_i ∈ R and a_i, b_i, c_i ∈ R^d. By analogy to the SVD of a matrix, M = Σ_i σ_i u_i v_i^T, this suggests finding a decomposition of an arbitrary tensor into a spectrum of rank-one components.
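A sketch of assembling a rank-k tensor from its weights and factor columns with np.einsum, using random factors for illustration.

```python
import numpy as np

d, k = 6, 3
w = np.random.randn(k)
A = np.random.randn(d, k)   # columns a_i
B = np.random.randn(d, k)   # columns b_i
C = np.random.randn(d, k)   # columns c_i

# X = sum_i w_i * a_i (x) b_i (x) c_i
X = np.einsum('i,ai,bi,ci->abc', w, A, B, C)
print(X.shape)   # (d, d, d)
```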

30 Outline: 1. Tensors. 2. Tensor Decomposition. 3. Neural Network Feature Learning.

31-33 Motivation. Consider a 3rd-order tensor of the form A = Σ_i w_i a_i ⊗ a_i ⊗ a_i. Considering A as a multilinear map, we can represent its action on lower-order input tensors (vectors and matrices) using its multilinear form: A(B, C, D) := Σ_i w_i (B^T a_i) ⊗ (C^T a_i) ⊗ (D^T a_i). Now suppose the a_i were orthonormal. Then A(I, a_1, a_1) = Σ_i w_i (a_i^T a_1)^2 a_i = w_1 a_1. This is analogous to an eigenvector of a matrix: if v is an eigenvector of M, we can write Mv = M(I, v) = λv.

34-36 Whitening the Tensor. We're extremely unlikely to encounter an empirical tensor built from orthogonal components like this, but we can learn a whitening transform using the second moment, M_2 = Σ_i w_i a_i ⊗ a_i. Whitening transforms the covariance matrix to the identity matrix; the data are thereby decorrelated with unit variance. Figure: the action of a whitening transform on data sampled from a bivariate Gaussian.

37-38 Whitening the Tensor (continued). The whitening transform is invertible so long as the empirical second moment matrix has full column rank. Given the whitening matrix W, we can whiten the empirical third moment tensor by evaluating T = M_3(W, W, W) = Σ_i w_i (W^T a_i)^{⊗3} = Σ_i w_i v_i^{⊗3}, where {v_i : i ∈ [k]} is now an orthogonal basis.
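A sketch of learning a whitening matrix from the second moment and applying it to the third moment, following the formulas above; the synthetic w_i and a_i are mine, and scaling conventions vary across references, so treat this as illustrative rather than the slides' exact recipe.

```python
import numpy as np

d, k = 8, 3
rng = np.random.default_rng(0)
w = rng.uniform(1.0, 2.0, size=k)
A = rng.standard_normal((d, k))   # columns a_i (generic, full column rank)

# Population moments for this synthetic model
M2 = np.einsum('i,ai,bi->ab', w, A, A)          # sum_i w_i a_i a_i^T
M3 = np.einsum('i,ai,bi,ci->abc', w, A, A, A)   # sum_i w_i a_i (x) a_i (x) a_i

# Whitening matrix W with W^T M2 W = I_k, from the rank-k eigendecomposition of M2
vals, vecs = np.linalg.eigh(M2)
vals, vecs = vals[-k:], vecs[:, -k:]            # keep the top-k eigenpairs
W = vecs @ np.diag(vals ** -0.5)                # shape (d, k)

# Whitened third moment T = M3(W, W, W) in R^{k x k x k}
T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)
print(np.allclose(W.T @ M2 @ W, np.eye(k)))     # True
```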

39-42 Decomposition Procedure. Start from a whitened tensor T. Then: 1. Randomly initialize v and repeatedly evaluate the update v ← T(I, v, v) / ‖T(I, v, v)‖ until convergence, obtaining an eigenvector v with eigenvalue λ = T(v, v, v). 2. Deflate: T ← T - λ v ⊗ v ⊗ v. Store the eigenvalue/eigenvector pair, then go to 1. This leads to an algorithm for recovering the columns of a parameter matrix by representing its columns as moments.
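A minimal sketch of this power iteration plus deflation loop in NumPy, assuming T is symmetric and (approximately) orthogonally decomposable; the function name tensor_power_method is mine, and practical versions add random restarts as in Anandkumar et al.

```python
import numpy as np

def tensor_power_method(T, n_components, n_iter=200, seed=None):
    """Recover (eigenvalue, eigenvector) pairs of a symmetric k x k x k tensor
    by repeated power iteration and deflation."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    k = T.shape[0]
    pairs = []
    for _ in range(n_components):
        v = rng.standard_normal(k)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            # Power update: v <- T(I, v, v) / ||T(I, v, v)||
            Tv = np.einsum('ijk,j,k->i', T, v, v)
            v = Tv / np.linalg.norm(Tv)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)   # eigenvalue T(v, v, v)
        pairs.append((lam, v))
        # Deflate: T <- T - lam * v (x) v (x) v
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)
    return pairs

# Example: build an orthogonally decomposable tensor and recover its components
k = 4
lams = np.array([5.0, 3.0, 2.0, 1.0])
V = np.linalg.qr(np.random.randn(k, k))[0]       # orthonormal columns v_i
T = np.einsum('i,ai,bi,ci->abc', lams, V, V, V)
for lam, v in tensor_power_method(T, n_components=k, seed=0):
    print(round(lam, 3))   # eigenvalues close to 5, 3, 2, 1, possibly in another order
```

Note that deflation does not guarantee the components come out in descending order of eigenvalue, which is why robust implementations use multiple random restarts per component.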

43-46 Advantages of the Method of Moments. Tensor factorization is NP-hard in general. For orthogonal tensors, however, factorization is polynomial in sample size and number of operations. Unlike the EM algorithm or variational Bayes, this method converges to the global optimum. For a more detailed analysis, and for how to frame any latent variable model using this method, see newport.eecs.uci.edu/anandkumar/mlss.html

47 Outline: 1. Tensors. 2. Tensor Decomposition. 3. Neural Network Feature Learning.

48-49 Neural Network Feature Learning. Very recent work (last arXiv datestamp: January) on finding optimal weights for a two-layer neural network, with notes on how to generalize to more complex architectures: Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods.

50-51 Neural Network Feature Learning. The target network is a label-generating model with architecture f(x) := E[ỹ | x] = A_2^T σ(A_1^T x + b_1) + b_2. We must assume the input pdf p(x) is known sufficiently well for learning (or can be estimated using unsupervised methods).

52-54 Neural Network Feature Learning: Score Functions. Key insight: there exists a transformation φ(·) of the input {(x_i, y_i)} that captures the relationship between the parameter matrices A_1 and A_2 and the input. The transformation generates feature tensors that can be factorized using the method of moments. The m-th order score function is defined as (Janzamin et al. 2014) S_m(x) := (-1)^m ∇_x^(m) p(x) / p(x), where p(x) is the pdf of the random vector x ∈ R^d and ∇_x^(m) is the m-th order derivative operator.
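For intuition, a sketch of the first three score functions in the special case where p(x) is a standard Gaussian, in which the definition above reduces to the multivariate Hermite polynomials; this special case is an added illustration, not taken from the slides.

```python
import numpy as np

def gaussian_scores(x):
    """Score functions S_1, S_2, S_3 of a standard Gaussian N(0, I), evaluated at x.

    For p(x) = N(0, I), S_m(x) = (-1)^m grad^(m) p(x) / p(x) reduces to
    S_1(x) = x, S_2(x) = x x^T - I, and
    S_3(x)_{ijk} = x_i x_j x_k - x_i d_{jk} - x_j d_{ik} - x_k d_{ij}.
    """
    d = x.shape[0]
    I = np.eye(d)
    S1 = x
    S2 = np.outer(x, x) - I
    S3 = (np.einsum('i,j,k->ijk', x, x, x)
          - np.einsum('i,jk->ijk', x, I)
          - np.einsum('j,ik->ijk', x, I)
          - np.einsum('k,ij->ijk', x, I))
    return S1, S2, S3

S1, S2, S3 = gaussian_scores(np.array([1.0, -2.0, 0.5]))
print(S3.shape)   # (3, 3, 3)
```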

55-57 Neural Network Feature Learning: Why Score Functions? The 1st-order score function is the negative gradient of the log of the input density, S_1(x) = -∇_x log p(x) = -∇_x p(x)/p(x). This encodes variations in the input distribution p(x): by looking at the gradient of the distribution, you get an idea of where large changes occur in the distribution. The correlation E[ỹ · S_3(x)] between the third-order score function S_3(x) and the output y then has a particularly useful form, because the x averages out in expectation.

58-59 Neural Network Feature Learning: The Moment Tensor. In Lemma 6 of the paper, the authors prove that the rank-1 components of the third-order tensor E[ỹ · S_3(x)] are the columns of the weight matrix A_1: E[ỹ · S_3(x)] = Σ_{j=1}^{k} λ_j (A_1)_j ⊗ (A_1)_j ⊗ (A_1)_j ∈ R^{d × d × d}. It follows that the columns of A_1 are recoverable using the method of moments, with optimal convergence guarantees.
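In practice the moment tensor is estimated from samples. A sketch of the empirical version of E[ỹ · S_3(x)], assuming standard-Gaussian inputs so the score-function sketch above applies; the synthetic network and nonlinearity are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def empirical_third_moment(X, y):
    """Estimate E[y * S_3(x)] from samples, for standard-Gaussian inputs.

    X has shape (n, d); y has shape (n,). Returns a d x d x d tensor whose rank-1
    components (per Lemma 6 of Janzamin et al.) relate to the columns of A_1.
    """
    n, d = X.shape
    I = np.eye(d)
    M3 = np.zeros((d, d, d))
    for x_i, y_i in zip(X, y):
        S3 = (np.einsum('i,j,k->ijk', x_i, x_i, x_i)
              - np.einsum('i,jk->ijk', x_i, I)
              - np.einsum('j,ik->ijk', x_i, I)
              - np.einsum('k,ij->ijk', x_i, I))
        M3 += y_i * S3
    return M3 / n

# Tiny synthetic check: labels generated by a two-layer network with Gaussian inputs
rng = np.random.default_rng(1)
n, d, k = 5000, 6, 2
A1 = rng.standard_normal((d, k))
X = rng.standard_normal((n, d))
y = np.tanh(X @ A1).sum(axis=1)      # illustrative nonlinearity and output layer
T_hat = empirical_third_moment(X, y)
print(T_hat.shape)                    # (6, 6, 6)
```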

60 Neural Network Feature Learning. The remaining steps lead to the following algorithm. Figure: the full training algorithm from Janzamin et al.

61-63 Summary of Talk. Tensorial representations of latent variable models promise to overcome shortcomings of the EM algorithm and variational Bayes. Tensor algebra involves non-trivial but conceptually straightforward operations. These methods may point to a new direction in machine learning research that gives guarantees in unsupervised learning.

64 References. Anandkumar, Anima. Tutorial on Tensor Methods for Guaranteed Learning, Part 1 (2014). Lecture slides, Machine Learning Summer School 2014, Carnegie Mellon University. http://newport.eecs.uci.edu/anandkumar/mlss.html. Janzamin, Majid, Hanie Sedghi, and Anima Anandkumar. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods. arXiv preprint, v3 [cs.LG]. Kolda, Tamara G. and Brett W. Bader. Tensor Decompositions and Applications. SIAM Review, 51(3), 2009.
