Tensor Decompositions for Machine Learning. G. Roeder. UBC Machine Learning Reading Group, University of British Columbia, June
1 Tensor Decompositions for Machine Learning. Geoff Roeder, Department of Computer Science, University of British Columbia. UBC Machine Learning Reading Group, June
2 Contact Information. Please inform me of any typos or errors at geoff.roeder@gmail.com
3 Outline: 1. Tensors 2. Tensor Decomposition 3. Neural Network Features
5 Formal Definition: Tensors as Multi-Linear Maps

The tensor product of two vector spaces $V$ and $W$ over a field $F$ is another vector space over $F$. It is denoted $V \otimes_F W$, or $V \otimes W$ when the field is understood.

If $\{v_i\}$ and $\{w_j\}$ are bases for $V$ and $W$, then the set $\{v_i \otimes w_j\}$ is a basis for $V \otimes W$.

If $S : V \to X$ and $T : W \to Y$ are linear maps, then the tensor product of $S$ and $T$ is the linear map $S \otimes T : V \otimes W \to X \otimes Y$ defined by
$$(S \otimes T)(v \otimes w) = S(v) \otimes T(w)$$
8 Sufficient Definition: Tensors as Multidimensional Arrays (adapted from Kolda and Bader 2009)

With respect to a given basis, a tensor is a multidimensional array. E.g., a real $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ w.r.t. the standard basis is an $N$-dimensional array where $\mathcal{X}_{i_1, i_2, \ldots, i_N}$ is the element at index $(i_1, i_2, \ldots, i_N)$.

The order of a tensor is the number of dimensions (also called modes or indices) required to uniquely identify an element.

So a scalar is a 0-mode tensor, a vector is a 1-mode tensor, and a matrix is a 2-mode tensor.
11 Tensor Subarrays: Fibers

Fibers are the higher-order analogue of matrix rows and columns: for a 3rd-order tensor, fix all but one index.

Figure: Fibers of a 3-mode tensor

All fibers are treated as column vectors by convention.
13 Tensor Subarrays: Slices

Slices are two-dimensional sections of a tensor, defined by fixing all but two indices.

Figure: Slices of a 3rd-order tensor

Slices can be horizontal, lateral, or frontal, corresponding to the diagrams above.
15 Tensor Operations: n-Mode Matrix Product

Consider multiplying a tensor by a matrix or vector in the $n$th mode.

The n-mode (matrix) product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $S \in \mathbb{R}^{J \times I_n}$ is denoted $\mathcal{X} \times_n S$ and lives in $\mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$.

Elementwise,
$$(\mathcal{X} \times_n S)_{i_1, \ldots, i_{n-1}, j, i_{n+1}, \ldots, i_N} = \sum_{i_n=1}^{I_n} x_{i_1, i_2, \ldots, i_N}\, s_{j, i_n}$$
18 Tensor Operations: n-Mode Matrix Product Example

Example. Consider the tensor $\mathcal{X} \in \mathbb{R}^{3 \times 2 \times 2}$ with frontal slices
$$X_{:,:,1} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}, \qquad X_{:,:,2} = \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$

Then the $(1,1,1)$ element of the 1-mode matrix product of $\mathcal{X}$ and $U$ is
$$(\mathcal{X} \times_1 U)_{1,1,1} = \sum_{i_1=1}^{I_1=3} x_{i_1,1,1}\, u_{1,i_1} = 1 \cdot 1 + 2 \cdot 3 + 3 \cdot 5 = 22$$
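As a sanity check, the 1-mode product can be computed in NumPy with `einsum`. The slice entries below follow the Kolda and Bader (2009) example, assumed to be the one used on this slide:

```python
import numpy as np

# Sanity check of the 1-mode matrix product with einsum. The slice entries
# follow the Kolda and Bader (2009) example, assumed to match this slide.
X = np.zeros((3, 2, 2))
X[:, :, 0] = [[1, 4], [2, 5], [3, 6]]     # frontal slice X_{:,:,1}
X[:, :, 1] = [[7, 10], [8, 11], [9, 12]]  # frontal slice X_{:,:,2}
U = np.array([[1, 3, 5], [2, 4, 6]])

# 1-mode matrix product: contract U's columns against mode 1 of X
Y = np.einsum('ja,abc->jbc', U, X)        # shape (2, 2, 2)
print(Y[0, 0, 0])  # 22.0
```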
20 Tensor Operations: n-Mode Vector Product

The n-mode (vector) product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a vector $w \in \mathbb{R}^{I_n}$ is denoted $\mathcal{X} \bar{\times}_n w$ and lives in $\mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N}$ (the $n$th mode is contracted away).

Elementwise,
$$(\mathcal{X} \bar{\times}_n w)_{i_1, \ldots, i_{n-1}, i_{n+1}, \ldots, i_N} = \sum_{i_n=1}^{I_n} x_{i_1, i_2, \ldots, i_N}\, w_{i_n}$$
22 Tensor Operations: n-Mode Vector Product Example

Example. Consider the tensor $\mathcal{X} \in \mathbb{R}^{3 \times 2 \times 2}$ defined as before, where $X_{1,1,1} = 1$ and $X_{1,2,1} = 4$. Then the $(1,1)$ element of the 2-mode vector product of $\mathcal{X}$ and $w = [1\ 2]^\top$ is
$$(\mathcal{X} \bar{\times}_2 w)_{1,1} = \sum_{i_2=1}^{I_2=2} x_{1,i_2,1}\, w_{i_2} = 1 \cdot 1 + 4 \cdot 2 = 9$$
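Similarly, the 2-mode vector product is a one-line `einsum` (same assumed example tensor as above):

```python
import numpy as np

# 2-mode vector product check, using the same assumed example tensor.
X = np.zeros((3, 2, 2))
X[:, :, 0] = [[1, 4], [2, 5], [3, 6]]
X[:, :, 1] = [[7, 10], [8, 11], [9, 12]]
w = np.array([1, 2])

# contract w against the second index (mode 2) of X
Y = np.einsum('abc,b->ac', X, w)          # shape (3, 2)
print(Y[0, 0])  # 9.0
```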
23 Multilinear Form of a Tensor $\mathcal{X}$

The multilinear form for a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is defined as follows. Consider matrices $M_i \in \mathbb{R}^{q_i \times p_i}$ for $i \in \{1, 2, 3\}$. Then the tensor $\mathcal{X}(M_1, M_2, M_3) \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ is defined as
$$\mathcal{X}(M_1, M_2, M_3) = \sum_{j_1 \in [q_1]} \sum_{j_2 \in [q_2]} \sum_{j_3 \in [q_3]} \mathcal{X}_{j_1, j_2, j_3}\, M_1(j_1, :) \otimes M_2(j_2, :) \otimes M_3(j_3, :)$$

A simpler case uses vectors $v, w \in \mathbb{R}^d$:
$$\mathcal{X}(I, v, w) := \sum_{j, l \in [d]} v_j w_l\, \mathcal{X}(:, j, l) \in \mathbb{R}^d$$
which is a multilinear combination of the mode-1 fibers (columns) of the tensor $\mathcal{X}$.
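For instance (my own illustration, not from the slides), $\mathcal{X}(I, v, w)$ can be checked against the explicit sum over mode-1 fibers:

```python
import numpy as np

# My own illustration: X(I, v, w) contracts v and w against modes 2 and 3,
# leaving a multilinear combination of the mode-1 fibers X(:, j, l).
d = 3
rng = np.random.default_rng(0)
X = rng.standard_normal((d, d, d))
v = rng.standard_normal(d)
w = rng.standard_normal(d)

out = np.einsum('ajl,j,l->a', X, v, w)

# explicit sum over fibers, matching the definition on the slide
ref = sum(v[j] * w[l] * X[:, j, l] for j in range(d) for l in range(d))
print(np.allclose(out, ref))  # True
```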
26 Rank-One Tensors

An $N$-way tensor $\mathcal{X}$ is called rank one if it can be written as the tensor product of $N$ vectors, i.e.,
$$\mathcal{X} = a^{(1)} \otimes a^{(2)} \otimes \cdots \otimes a^{(N)}, \qquad (1)$$
where $\otimes$ is the vector (outer) product. The simple form of (1) implies that
$$\mathcal{X}_{i_1, i_2, \ldots, i_N} = a^{(1)}_{i_1} a^{(2)}_{i_2} \cdots a^{(N)}_{i_N}$$

Figure: a rank-one 3-mode tensor $\mathcal{X} = a \otimes b \otimes c$
28 Rank-$k$ Tensors

An order-3 tensor $\mathcal{X} \in \mathbb{R}^{d \times d \times d}$ is said to have rank $k$ if it can be written as a sum of $k$ rank-one tensors (and no fewer), i.e.,
$$\mathcal{X} = \sum_{i=1}^{k} w_i\, a_i \otimes b_i \otimes c_i, \qquad w_i \in \mathbb{R},\quad a_i, b_i, c_i \in \mathbb{R}^d.$$

The analogy to the SVD, where $M = \sum_i \sigma_i\, u_i v_i^\top$, suggests decomposing an arbitrary tensor into a spectrum of rank-one components.
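A minimal sketch (assuming nothing beyond the definition above) of assembling a rank-$k$ tensor from its rank-one components:

```python
import numpy as np

# A minimal sketch of the definition above: build X = sum_i w_i a_i (x) b_i (x) c_i
# from k rank-one terms (the names d, k, A, B, C are my own).
d, k = 4, 2
rng = np.random.default_rng(1)
w = rng.standard_normal(k)
A = rng.standard_normal((d, k))
B = rng.standard_normal((d, k))
C = rng.standard_normal((d, k))

X = np.einsum('i,ai,bi,ci->abc', w, A, B, C)   # shape (d, d, d)

# one entry checked against the elementwise rank-one formula
ref = sum(w[i] * A[0, i] * B[1, i] * C[2, i] for i in range(k))
print(np.isclose(X[0, 1, 2], ref))  # True
```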
30 Outline: 1. Tensors 2. Tensor Decomposition 3. Neural Network Features
31 Motivation

Consider a 3rd-order tensor of the form $\mathcal{A} = \sum_i w_i\, a_i \otimes a_i \otimes a_i$. Viewing $\mathcal{A}$ as a multilinear map, we can represent its action on lower-order inputs (vectors and matrices) using its multilinear form:
$$\mathcal{A}(B, C, D) := \sum_i w_i\, (B^\top a_i) \otimes (C^\top a_i) \otimes (D^\top a_i).$$

Now suppose the components $a_i$ are orthonormal. Then
$$\mathcal{A}(I, a_1, a_1) = \sum_i w_i\, \langle a_i, a_1 \rangle^2\, a_i = w_1 a_1$$

This is analogous to an eigenvector of a matrix: if $v$ is an eigenvector of $M$, we can write $Mv = M(I, v) = \lambda v$.
34 Whitening the Tensor

We're extremely unlikely to encounter an empirical tensor built from orthogonal components like this. But we can learn a whitening transform using the second moment, $M_2 = \sum_i w_i\, a_i \otimes a_i = \sum_i w_i\, a_i a_i^\top$.

Whitening transforms the covariance matrix into the identity matrix; the data is thereby decorrelated, with unit variance.

Figure: the action of a whitening transform on data sampled from a bivariate Gaussian
37 Whitening the Tensor (continued)

The whitening transform is invertible so long as the empirical second moment matrix has full column rank.

Given the whitening matrix $W$, we can whiten the empirical third moment tensor by evaluating
$$T = M_3(W, W, W) = \sum_{i \in [k]} w_i\, (W^\top a_i)^{\otimes 3} = \sum_{i \in [k]} w_i\, v_i^{\otimes 3}$$
where $\{v_i\}$ is now an orthogonal basis.
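A sketch of the whitening step, under the assumption that $M_2$ has the stated low-rank form; the whitening matrix is built from the top-$k$ eigenpairs of $M_2$:

```python
import numpy as np

# A sketch (my own illustration) of learning a whitening matrix W from the
# second moment M2 = sum_i w_i a_i a_i^T, so that W^T M2 W = I_k.
rng = np.random.default_rng(2)
d, k = 5, 3
A = rng.standard_normal((d, k))        # component vectors a_i as columns
w = rng.uniform(1.0, 2.0, size=k)      # positive weights
M2 = (A * w) @ A.T

# take the top-k eigenpairs of M2 and set W = U_k diag(lam_k)^(-1/2)
lam, U = np.linalg.eigh(M2)            # eigenvalues in ascending order
lam, U = lam[-k:], U[:, -k:]
W = U / np.sqrt(lam)

print(np.allclose(W.T @ M2 @ W, np.eye(k)))  # True
```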
39 Procedure

Start from a whitened tensor $T$. Then:

1. Randomly initialize $v$. Iterate the power update
$$v \leftarrow \frac{T(I, v, v)}{\|T(I, v, v)\|}$$
until convergence, to obtain an eigenvector $v$ with eigenvalue $\lambda$.

2. Deflate: $T \leftarrow T - \lambda\, v \otimes v \otimes v$. Store the eigenvalue/eigenvector pair, then return to step 1.

This leads to an algorithm for recovering the columns of a parameter matrix by representing its columns as moments.
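The two steps above can be sketched as follows (my own minimal implementation, assuming an exactly orthogonally decomposable input tensor; the published robust tensor power method adds more careful restart and convergence criteria):

```python
import numpy as np

# A minimal sketch of the tensor power method with deflation, assuming T is
# exactly orthogonally decomposable (not the full robust algorithm).
rng = np.random.default_rng(3)
d, k = 6, 3
V = np.linalg.qr(rng.standard_normal((d, k)))[0]  # orthonormal components
true_lam = np.array([3.0, 2.0, 1.0])
T = np.einsum('i,ai,bi,ci->abc', true_lam, V, V, V)

def power_iteration(T, n_starts=10, n_iters=100):
    """Best eigenpair found over several random restarts."""
    best = None
    for _ in range(n_starts):
        v = rng.standard_normal(T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            v = np.einsum('abc,b,c->a', T, v, v)    # v <- T(I, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('abc,a,b,c->', T, v, v, v)  # eigenvalue T(v, v, v)
        if best is None or lam > best[0]:
            best = (lam, v)
    return best

recovered = []
for _ in range(k):
    lam, v = power_iteration(T)
    recovered.append((lam, v))
    T = T - lam * np.einsum('a,b,c->abc', v, v, v)  # deflate

print(sorted(round(float(lam), 6) for lam, _ in recovered))
```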
43 Advantages of the Method of Moments

Tensor factorization is NP-hard in general.

For orthogonally decomposable tensors, factorization is polynomial in sample size and number of operations.

Unlike the EM algorithm or variational Bayes, this method converges to the global optimum.

For a more detailed analysis, and for how to frame any latent variable model using this method, see newport.eecs.uci.edu/anandkumar/mlss.html
47 Outline: 1. Tensors 2. Tensor Decomposition 3. Neural Network Features
48 Neural Network Features

Very recent work (last arxiv.org datestamp: January) on finding optimal weights for a two-layer neural network, with notes on how to generalize to more complex architectures:

Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods.
50 Neural Network Features

The target network is a label-generating model with architecture
$$f(x) := E[\tilde{y} \mid x] = A_1\, \sigma(A_2\, x + b_1) + b_2$$

We must assume the input pdf $p(x)$ is known sufficiently well for learning (or can be estimated using unsupervised methods).
52 Neural Network Features

Key insight: there exists a transformation $\phi(\cdot)$ of the input $\{(x_i, y_i)\}$ that captures the relationship between the parameter matrices $A_1$ and $A_2$ and the input.

The transformation generates feature tensors that can be factorized using the method of moments.

The $m$th-order score function is defined as (Janzamin et al. 2014)
$$S_m(x) := (-1)^m\, \frac{\nabla_x^{(m)} p(x)}{p(x)}$$
where $p(x)$ is the pdf of the random vector $x \in \mathbb{R}^d$ and $\nabla_x^{(m)}$ is the $m$th-order derivative operator.
55 Neural Network Features

The 1st-order score function is the negative gradient of the log of the input density, $S_1(x) = -\nabla_x \log p(x)$.

It encodes variations in the input distribution $p(x)$: the gradient shows where large changes occur in the distribution.

The correlation $E[y\, S_3(x)]$ between the third-order score function $S_3(x)$ and the output $y$ then has a particularly useful form, because the dependence on $x$ averages out in expectation.
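As a concrete check (my own example, following the score-function definition above): for a standard Gaussian input, $\log p(x) = -\|x\|^2/2 + \text{const}$, so $S_1(x) = -\nabla_x \log p(x) = x$, which a finite-difference gradient confirms:

```python
import numpy as np

# My own check (not from the slides): for a standard Gaussian input,
# log p(x) = -||x||^2 / 2 + const, so S_1(x) = -grad log p(x) = x.
def log_p(x):
    return -0.5 * np.dot(x, x)

def score_1(x, eps=1e-5):
    """First-order score via a central finite-difference gradient."""
    grad = np.array([(log_p(x + eps * e) - log_p(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])
    return -grad

x = np.array([0.5, -1.0, 2.0])
print(np.allclose(score_1(x), x, atol=1e-6))  # True
```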
58 Neural Network Features

In Lemma 6 of the paper, the authors prove that the rank-one components of the third-order tensor $E[\tilde{y}\, S_3(x)]$ are the columns of the weight matrix $A_1$:
$$E[\tilde{y}\, S_3(x)] = \sum_{j=1}^{k} \lambda_j\, (A_1)_j \otimes (A_1)_j \otimes (A_1)_j \in \mathbb{R}^{d \times d \times d}$$

It follows that the columns of $A_1$ are recoverable using the method of moments, with optimal convergence guarantees.
60 Neural Network Features

The remaining steps lead to the full training algorithm given in the paper.
61 Summary of Talk

Tensorial representations of latent variable models promise to overcome shortcomings of the EM algorithm and variational Bayes.

Tensor algebra involves non-trivial but conceptually straightforward operations.

These methods may point to a new direction in machine learning research that gives guarantees in unsupervised learning.
64 References

Anandkumar, Anima. Tutorial on Spectral and Tensor Methods for Guaranteed Learning, Part 1 (2014). Lecture slides, Machine Learning Summer School 2014, Carnegie Mellon University. http://newport.eecs.uci.edu/anandkumar/mlss.html

Janzamin, Majid, Hanie Sedghi, and Anima Anandkumar. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods. arXiv, v3 [cs.LG].

Kolda, Tamara G. and Brett W. Bader. Tensor Decompositions and Applications. SIAM Review, 51(3), 2009.
More informationCS 143 Linear Algebra Review
CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see
More informationMatrix Factorization and Analysis
Chapter 7 Matrix Factorization and Analysis Matrix factorizations are an important part of the practice and analysis of signal processing. They are at the heart of many signal-processing algorithms. Their
More informationPrincipal Component Analysis CS498
Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy
More informationStudy Notes on Matrices & Determinants for GATE 2017
Study Notes on Matrices & Determinants for GATE 2017 Matrices and Determinates are undoubtedly one of the most scoring and high yielding topics in GATE. At least 3-4 questions are always anticipated from
More informationPreviously Monte Carlo Integration
Previously Simulation, sampling Monte Carlo Simulations Inverse cdf method Rejection sampling Today: sampling cont., Bayesian inference via sampling Eigenvalues and Eigenvectors Markov processes, PageRank
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationLinear Algebra V = T = ( 4 3 ).
Linear Algebra Vectors A column vector is a list of numbers stored vertically The dimension of a column vector is the number of values in the vector W is a -dimensional column vector and V is a 5-dimensional
More informationDeriving Principal Component Analysis (PCA)
-0 Mathematical Foundations for Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deriving Principal Component Analysis (PCA) Matt Gormley Lecture 11 Oct.
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationWindow-based Tensor Analysis on High-dimensional and Multi-aspect Streams
Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,
More informationMobile Robotics 1. A Compact Course on Linear Algebra. Giorgio Grisetti
Mobile Robotics 1 A Compact Course on Linear Algebra Giorgio Grisetti SA-1 Vectors Arrays of numbers They represent a point in a n dimensional space 2 Vectors: Scalar Product Scalar-Vector Product Changes
More informationExample Linear Algebra Competency Test
Example Linear Algebra Competency Test The 4 questions below are a combination of True or False, multiple choice, fill in the blank, and computations involving matrices and vectors. In the latter case,
More informationLecture 7: Positive Semidefinite Matrices
Lecture 7: Positive Semidefinite Matrices Rajat Mittal IIT Kanpur The main aim of this lecture note is to prepare your background for semidefinite programming. We have already seen some linear algebra.
More informationVAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where:
VAR Model (k-variate VAR(p model (in the Reduced Form: where: Y t = A + B 1 Y t-1 + B 2 Y t-2 + + B p Y t-p + ε t Y t = (y 1t, y 2t,, y kt : a (k x 1 vector of time series variables A: a (k x 1 vector
More informationMaths for Signals and Systems Linear Algebra in Engineering
Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE
More informationLinear Algebra Practice Problems
Math 7, Professor Ramras Linear Algebra Practice Problems () Consider the following system of linear equations in the variables x, y, and z, in which the constants a and b are real numbers. x y + z = a
More informationLecture II: Linear Algebra Revisited
Lecture II: Linear Algebra Revisited Overview Vector spaces, Hilbert & Banach Spaces, etrics & Norms atrices, Eigenvalues, Orthogonal Transformations, Singular Values Operators, Operator Norms, Function
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationHigh Dimensional Model Representation (HDMR) Based Folded Vector Decomposition
High Dimensional Model Representation (HDMR) Based Folded Vector Decomposition LETİSYA DİVANYAN İstanbul Technical University Informatics Institute Maslak, 34469 İstanbul TÜRKİYE (TURKEY) letisya.divanyan@be.itu.edu.tr
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationGaussian Models (9/9/13)
STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate
More informationECS130 Scientific Computing. Lecture 1: Introduction. Monday, January 7, 10:00 10:50 am
ECS130 Scientific Computing Lecture 1: Introduction Monday, January 7, 10:00 10:50 am About Course: ECS130 Scientific Computing Professor: Zhaojun Bai Webpage: http://web.cs.ucdavis.edu/~bai/ecs130/ Today
More informationFundamentals of Multilinear Subspace Learning
Chapter 3 Fundamentals of Multilinear Subspace Learning The previous chapter covered background materials on linear subspace learning. From this chapter on, we shall proceed to multiple dimensions with
More informationLecture 21: Spectral Learning for Graphical Models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation
More informationENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition
ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition Wing-Kin (Ken) Ma 2017 2018 Term 2 Department of Electronic Engineering The Chinese University
More informationMotivating the Covariance Matrix
Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role
More informationEigenvalues and Eigenvectors
5 Eigenvalues and Eigenvectors 5.2 THE CHARACTERISTIC EQUATION DETERMINANATS n n Let A be an matrix, let U be any echelon form obtained from A by row replacements and row interchanges (without scaling),
More informationPattern Recognition 2
Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error
More informationCVPR A New Tensor Algebra - Tutorial. July 26, 2017
CVPR 2017 A New Tensor Algebra - Tutorial Lior Horesh lhoresh@us.ibm.com Misha Kilmer misha.kilmer@tufts.edu July 26, 2017 Outline Motivation Background and notation New t-product and associated algebraic
More informationCOMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare
COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationDimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas
Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx
More informationEcon 204 Supplement to Section 3.6 Diagonalization and Quadratic Forms. 1 Diagonalization and Change of Basis
Econ 204 Supplement to Section 3.6 Diagonalization and Quadratic Forms De La Fuente notes that, if an n n matrix has n distinct eigenvalues, it can be diagonalized. In this supplement, we will provide
More informationMachine Learning (CSE 446): Unsupervised Learning: K-means and Principal Component Analysis
Machine Learning (CSE 446): Unsupervised Learning: K-means and Principal Component Analysis Sham M Kakade c 2019 University of Washington cse446-staff@cs.washington.edu 0 / 10 Announcements Please do Q1
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationReminders. Thought questions should be submitted on eclass. Please list the section related to the thought question
Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a
More information