IKA: Independent Kernel Approximator
- Frank Elijah Welch
- 5 years ago
1 IKA: Independent Kernel Approximator
Matteo Ronchetti, Università di Pisa
This work started as my graduation thesis in Como with Stefano Serra Capizzano. It is currently being developed with the help of Federico Poloni.
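2 Introduction
A kernel $K(x, y) : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is a bivariate function such that, given $x_1, x_2, \dots, x_n \in \mathbb{R}^d$, the Gramian matrix
$$G = \begin{pmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_n) \\ K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_n, x_1) & K(x_n, x_2) & \cdots & K(x_n, x_n) \end{pmatrix} \in \mathbb{R}^{n \times n}$$
is symmetric positive semidefinite.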
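To make the definition concrete, here is a small NumPy sketch that builds $G$ for a Gaussian (RBF) kernel $K(x, y) = \exp(-\gamma \|x - y\|^2)$, the kernel family used in the numerical results, and checks both properties. The helper `rbf_kernel`, the value `gamma=0.5` and the random data are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair (row of X, row of Y)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip small negative values produced by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # n = 50 datapoints in R^d with d = 3
G = rbf_kernel(X, X)           # Gramian matrix, shape (n, n)

is_symmetric = np.allclose(G, G.T)
min_eig = np.linalg.eigvalsh(G).min()   # >= 0 up to round-off for a PSD matrix
print(is_symmetric, min_eig)
```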
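3 Low-rank kernel approximation
We want to find matrices $B \in \mathbb{R}^{n \times c}$ and $W \in \mathbb{R}^{c \times c}$ such that $G \approx BWB^\top$. We add the following constraints:
- $B_{ij} = b_j(x_i)$: this makes it possible to approximate the kernel on new datapoints.
- $\operatorname{rk}(W) \le k$: this makes it possible to transform the approximation into the more efficient form $G \approx BWB^\top = CC^\top$, where $C \in \mathbb{R}^{n \times k}$.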
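The second constraint is what makes the factored form cheap: if $W$ is symmetric positive semidefinite with rank at most $k$, it can be written as $F^\top F$ with $F \in \mathbb{R}^{k \times c}$, and then $C = BF^\top$. A minimal sketch, assuming $W$ is symmetric PSD; the name `to_cc` and the random matrices are hypothetical.

```python
import numpy as np


def to_cc(B, W, k):
    """Return (C, F) with C = B @ F.T and C @ C.T equal to B @ W @ B.T,
    assuming W is symmetric positive semidefinite with rank <= k."""
    s, V = np.linalg.eigh(W)
    top = np.argsort(s)[::-1][:k]          # indices of the k largest eigenvalues
    s_k = np.clip(s[top], 0.0, None)       # remove tiny negative round-off
    F = (V[:, top] * np.sqrt(s_k)).T       # shape (k, c), so that W = F.T @ F
    return B @ F.T, F


rng = np.random.default_rng(1)
n, c, k = 200, 20, 5
B = rng.normal(size=(n, c))
A = rng.normal(size=(c, k))
W = A @ A.T                      # symmetric PSD with rank k

C, F = to_cc(B, W, k)
print(C.shape)                   # (200, 5)

b_new = rng.normal(size=c)       # feature vector b(x) of a new point
c_new = F @ b_new                # its k-dimensional representation
```

For a new point only $b(x) = (b_1(x), \dots, b_c(x))$ is needed: `F @ b(x)` gives its $k$-dimensional representation, and inner products of these representations reproduce $b(x)^\top W b(y)$.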
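4 Nyström method
A well-known method for low-rank kernel approximation is the Nyström method:
$$G = \begin{pmatrix} G_{11} & G_{12} \\ G_{12}^\top & G_{22} \end{pmatrix} \approx \begin{pmatrix} G_{11} \\ G_{12}^\top \end{pmatrix} G_{11}^{+} \begin{pmatrix} G_{11} & G_{12} \end{pmatrix}.$$
This can be generalized (Drineas et al. 2005) as $G \approx BWB^\top$, where
$$B = \begin{pmatrix} G_{11} \\ G_{12}^\top \end{pmatrix}, \qquad W = \Big( \operatorname*{argmin}_{\operatorname{rk}(X) \le k} \| G_{11} - X \|_F \Big)^{+},$$
i.e. $W$ is the pseudoinverse of the best rank-$k$ approximation of $G_{11}$. Notice that Nyström forces the choice $b_i(x) = K(x_i, x)$.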
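A compact NumPy sketch of the generalized Nyström approximation, assuming the first $c$ datapoints play the role of the landmarks that define $G_{11}$ (with randomly ordered data this amounts to uniform sampling). The names `nystrom` and `rbf_kernel`, the kernel bandwidth and the threshold used to discard numerically zero eigenvalues are illustrative assumptions. The sketch forms the full $G$ only to measure the error; in practice only the columns `G[:, :c]` are computed.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom(G, c, k):
    """Return (B, W) with G ~= B @ W @ B.T, where B = [G11; G12^T] = G[:, :c] and
    W is the pseudoinverse of the best rank-k approximation of G11 = G[:c, :c]."""
    B = G[:, :c]
    s, V = np.linalg.eigh(G[:c, :c])      # G11 is symmetric PSD, so eigh gives its SVD
    top = np.argsort(s)[::-1][:k]         # k largest eigenvalues
    s_k, V_k = s[top], V[:, top]
    keep = s_k > 1e-12 * s_k[0]           # drop numerically zero eigenvalues
    W = (V_k[:, keep] / s_k[keep]) @ V_k[:, keep].T
    return B, W


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
G = rbf_kernel(X, X)

B, W = nystrom(G, c=30, k=10)
G_approx = B @ W @ B.T
rel_err = np.linalg.norm(G - G_approx) / np.linalg.norm(G)   # Frobenius norm
print(f"relative Frobenius error: {rel_err:.3e}")
```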
5 Nyström features
- The computation of $W$ is efficient.
- The approximation is optimal only on the $c \times c$ block $G_{11}$.
- The forced choice $b_i(x) = K(x_i, x)$ can make online approximation expensive.
6 Ideas behind IKA
- Freely choose the functions $\{b_1(x), b_2(x), \dots, b_c(x)\}$.
- Project the eigenfunctions of the kernel onto $\operatorname{Span}(b_1(x), b_2(x), \dots, b_c(x))$.
- Computing the projected eigenfunctions reduces to solving a $c \times c$ generalized eigenvalue problem.
- Approximate the kernel using the leading projected eigenfunctions.
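7 The approximation scenario
Given the datapoints $x_1, x_2, \dots, x_n \in \mathbb{R}^d$, it is possible to define an inner product between real-valued functions $f, g : \mathbb{R}^d \to \mathbb{R}$:
$$\langle f, g \rangle = \frac{1}{n} \sum_{i=1}^{n} f(x_i)\, g(x_i).$$
The kernel $K$ defines a self-adjoint linear operator:
$$Kf(x) = \langle K(x, \cdot), f(\cdot) \rangle = \frac{1}{n} \sum_{i=1}^{n} K(x, x_i)\, f(x_i).$$
The eigenfunctions of $K$ satisfy the following properties:
$$K\phi_i(x) = \lambda_i \phi_i(x), \qquad \langle \phi_i, \phi_j \rangle = \delta_{ij}, \qquad \lambda_i \ge 0.$$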
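On the datapoints the operator acts as multiplication by $G/n$, so its eigenfunctions can be read off from the eigenvectors of $G/n$, rescaled by $\sqrt{n}$ to have unit norm under $\langle \cdot, \cdot \rangle$. The sketch below checks self-adjointness, orthonormality and the eigen-equation numerically; `inner`, `apply_K`, the RBF kernel and its bandwidth are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))   # datapoints x_1, ..., x_n
G = rbf_kernel(X, X)


def inner(f_vals, g_vals):
    """<f, g> = (1/n) * sum_i f(x_i) g(x_i); functions are given by their values at the x_i."""
    return f_vals @ g_vals / len(f_vals)


def apply_K(x_eval, f_vals):
    """(K f)(x) = (1/n) * sum_i K(x, x_i) f(x_i), evaluated at every row of x_eval."""
    return rbf_kernel(x_eval, X) @ f_vals / n


# Eigenvectors of G / n, rescaled so that <phi_i, phi_j> = delta_ij.
lam, U = np.linalg.eigh(G / n)
lam, U = lam[::-1], U[:, ::-1]   # decreasing eigenvalues
Phi = np.sqrt(n) * U             # column i holds phi_i(x_1), ..., phi_i(x_n)

f, g = rng.normal(size=n), rng.normal(size=n)
self_adjoint = np.isclose(inner(apply_K(X, f), g), inner(f, apply_K(X, g)))
orthonormal = np.allclose(Phi.T @ Phi / n, np.eye(n))
eigen_eq = np.allclose(apply_K(X, Phi[:, 0]), lam[0] * Phi[:, 0])
print(self_adjoint, orthonormal, eigen_eq)
```

Because `apply_K` accepts arbitrary evaluation points, `apply_K(x_new, Phi[:, i]) / lam[i]` extends $\phi_i$ to points outside the data, as long as $\lambda_i > 0$.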
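8 Projected eigenfunctions
Assume that $B \in \mathbb{R}^{n \times c}$ (where $B_{ij} = b_j(x_i)$) has full column rank and define
$$P = \frac{1}{n} B^\top B \in \mathbb{R}^{c \times c}, \qquad M = \frac{1}{n^2} B^\top G B \in \mathbb{R}^{c \times c}.$$
Equivalently, $P_{jl} = \langle b_j, b_l \rangle$ and $M_{jl} = \langle b_j, K b_l \rangle$, so the following is the Galerkin (Rayleigh-Ritz) projection of the eigenproblem of $K$ onto the span of the $b_j$. The projected eigenfunctions of $K$ can be found by solving the generalized eigenvalue problem
$$M\Phi = P\Phi\Lambda,$$
where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_c)$, $\Phi^\top P \Phi = I$, and
$$f_i(x) = \sum_{j=1}^{c} \Phi_{ji}\, b_j(x)$$
is the $i$-th projected eigenfunction.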
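A NumPy sketch of the projection step. The generalized problem is reduced to a standard symmetric one through the Cholesky factor $P = LL^\top$: with $\Phi = L^{-\top}Y$ it becomes $(L^{-1}ML^{-\top})Y = Y\Lambda$, and then $\Phi^\top P \Phi = Y^\top Y = I$. The Gaussian-bump basis on a $3 \times 3$ grid and the names `projected_eigenfunctions` and `f` are hypothetical choices made only for illustration; `scipy.linalg.eigh(M, P)` would be an alternative solver if SciPy is available.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def projected_eigenfunctions(B, G):
    """Solve M Phi = P Phi Lambda with P = B^T B / n and M = B^T G B / n^2.

    Returns the eigenvalues in decreasing order and Phi normalized so that
    Phi^T P Phi = I. B must have full column rank so that P is positive definite."""
    n = B.shape[0]
    P = B.T @ B / n
    M = B.T @ G @ B / n**2
    L_inv = np.linalg.inv(np.linalg.cholesky(P))   # P = L L^T
    S = L_inv @ M @ L_inv.T
    lam, Y = np.linalg.eigh((S + S.T) / 2)         # symmetrize against round-off
    order = np.argsort(lam)[::-1]
    return lam[order], L_inv.T @ Y[:, order]


# Hypothetical basis: Gaussian bumps b_j(x) = exp(-||x - z_j||^2) on a 3 x 3 grid of centers.
grid = np.array([-1.5, 0.0, 1.5])
Z = np.array([[u, v] for u in grid for v in grid])   # c = 9 centers

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
G = rbf_kernel(X, X)
B = rbf_kernel(X, Z, gamma=1.0)                       # B_ij = b_j(x_i), shape (n, c)

lam, Phi = projected_eigenfunctions(B, G)
P = B.T @ B / len(X)
M = B.T @ G @ B / len(X) ** 2


def f(i, x):
    """i-th projected eigenfunction f_i(x) = sum_j Phi[j, i] b_j(x), at the rows of x."""
    return rbf_kernel(x, Z, gamma=1.0) @ Phi[:, i]
```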
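9 Kernel approximation
The kernel is approximated using the first $k$ projected eigenfunctions (eigenvalues sorted in decreasing order):
$$K(x, y) \approx \sum_{i=1}^{k} \lambda_i f_i(x) f_i(y).$$
In matrix form this corresponds to
$$G \approx B\Phi\Lambda_k\Phi^\top B^\top = BWB^\top,$$
where $\Lambda_k = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_k, 0, \dots, 0)$.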
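A minimal sketch of the IKA weights $W = \Phi\Lambda_k\Phi^\top$ and of the resulting approximation $K(x, y) \approx b(x)^\top W b(y)$ on unseen points. It assumes a hypothetical Gaussian-bump basis on a $3 \times 3$ grid (`features`) and an RBF kernel with `gamma=0.5`; the names `ika_weights` and `features` and the choice `k = 4` are illustrative.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ika_weights(B, G, k):
    """W = Phi Lambda_k Phi^T, where (Lambda, Phi) solve M Phi = P Phi Lambda with
    P = B^T B / n and M = B^T G B / n^2, keeping only the k largest eigenvalues."""
    n = B.shape[0]
    P = B.T @ B / n
    M = B.T @ G @ B / n**2
    L_inv = np.linalg.inv(np.linalg.cholesky(P))   # P = L L^T
    S = L_inv @ M @ L_inv.T
    lam, Y = np.linalg.eigh((S + S.T) / 2)
    top = np.argsort(lam)[::-1][:k]
    Phi_k = L_inv.T @ Y[:, top]                   # leading projected eigenfunctions
    return (Phi_k * lam[top]) @ Phi_k.T           # Phi Lambda_k Phi^T


grid = np.array([-1.5, 0.0, 1.5])
Z = np.array([[u, v] for u in grid for v in grid])   # hypothetical centers, c = 9


def features(x):
    """Hypothetical basis b_j(x) = exp(-||x - z_j||^2); row i holds b(x_i)."""
    return rbf_kernel(x, Z, gamma=1.0)


rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
G = rbf_kernel(X, X)
W = ika_weights(features(X), G, k=4)

# K(x, y) ~= sum_i lambda_i f_i(x) f_i(y) = b(x)^T W b(y), also for unseen points.
X_new = rng.normal(size=(5, 2))
K_approx = features(X_new) @ W @ features(X_new).T
K_exact = rbf_kernel(X_new, X_new)
print(np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```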
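10 Optimality
The approximation computed by IKA ($W = \Phi\Lambda_k\Phi^\top$) is optimal on the whole matrix $G$ with respect to both the Frobenius and the spectral norm:
$$W = \Phi\Lambda_k\Phi^\top = \operatorname*{argmin}_{\operatorname{rk}(X) \le k} \| G - BXB^\top \|_F^2 = \operatorname*{argmin}_{\operatorname{rk}(X) \le k} \| G - BXB^\top \|_2^2.$$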
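The Frobenius part of the claim can be checked numerically. The sketch compares IKA with a closed-form minimizer obtained from a thin QR factorization $B = QR$: since $\|G - QYQ^\top\|_F^2 = \|G\|_F^2 - \|Q^\top G Q\|_F^2 + \|Q^\top G Q - Y\|_F^2$, the best rank-$k$ $Y = RXR^\top$ is the rank-$k$ truncation of $Q^\top G Q$. This closed form is a standard derivation added for comparison, not part of the slides; the basis, kernel and function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ika_weights(B, G, k):
    """IKA: W = Phi Lambda_k Phi^T from M Phi = P Phi Lambda."""
    n = B.shape[0]
    P, M = B.T @ B / n, B.T @ G @ B / n**2
    L_inv = np.linalg.inv(np.linalg.cholesky(P))   # P = L L^T
    S = L_inv @ M @ L_inv.T
    lam, Y = np.linalg.eigh((S + S.T) / 2)
    top = np.argsort(lam)[::-1][:k]
    Phi_k = L_inv.T @ Y[:, top]
    return (Phi_k * lam[top]) @ Phi_k.T


def optimal_weights(B, G, k):
    """Closed-form argmin of ||G - B X B^T||_F over rank(X) <= k, for full-column-rank B."""
    Q, R = np.linalg.qr(B)                     # thin QR: Q is n x c, R is c x c
    s, V = np.linalg.eigh(Q.T @ G @ Q)         # Q^T G Q is symmetric PSD
    top = np.argsort(s)[::-1][:k]
    T_k = (V[:, top] * s[top]) @ V[:, top].T   # best rank-k approximation of Q^T G Q
    R_inv = np.linalg.inv(R)
    return R_inv @ T_k @ R_inv.T


grid = np.array([-1.5, 0.0, 1.5])
Z = np.array([[u, v] for u in grid for v in grid])   # hypothetical centers, c = 9

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
G = rbf_kernel(X, X)
B = rbf_kernel(X, Z, gamma=1.0)   # hypothetical basis b_j(x) = exp(-||x - z_j||^2)
k = 4

W_ika = ika_weights(B, G, k)
W_opt = optimal_weights(B, G, k)


def frob_err(W):
    """Frobenius error of the approximation B W B^T on the whole G."""
    return np.linalg.norm(G - B @ W @ B.T)


print(frob_err(W_ika), frob_err(W_opt))
# Rescaling W keeps rank <= k but should only increase the error.
print(frob_err(0.9 * W_ika) > frob_err(W_ika), frob_err(1.1 * W_ika) > frob_err(W_ika))
```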
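11 Subsampling of $G$
It is possible to use $m \le n$ points to compute the approximation by setting
$$G = \begin{pmatrix} G_{11} & G_{12} \\ G_{12}^\top & G_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \qquad P = \frac{1}{m} B_1^\top B_1, \qquad M = \frac{1}{m^2} B_1^\top G_{11} B_1,$$
where $G_{11} \in \mathbb{R}^{m \times m}$ and $B_1 \in \mathbb{R}^{m \times c}$. The approximation will be optimal on $G_{11}$; the error on the other blocks will depend on the choice of $B$. Because of the optimality on $G_{11}$, it should be possible to study this error using existing results on approximate CUR matrix factorization.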
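A sketch of the subsampled variant, assuming the data are randomly ordered so that the first $m$ rows form a uniform sample: only the $m \times m$ block $G_{11}$ and the features of the points are formed. It reports the relative error on $G_{11}$, where the approximation is optimal, and on a held-out $m \times m$ diagonal block of $G_{22}$, whose error depends on the choice of $B$. The basis, kernel, sizes and function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ika_weights(B1, G11, k):
    """W = Phi Lambda_k Phi^T with P = B1^T B1 / m and M = B1^T G11 B1 / m^2."""
    m = B1.shape[0]
    P = B1.T @ B1 / m
    M = B1.T @ G11 @ B1 / m**2
    L_inv = np.linalg.inv(np.linalg.cholesky(P))   # P = L L^T
    S = L_inv @ M @ L_inv.T
    lam, Y = np.linalg.eigh((S + S.T) / 2)
    top = np.argsort(lam)[::-1][:k]
    Phi_k = L_inv.T @ Y[:, top]
    return (Phi_k * lam[top]) @ Phi_k.T


grid = np.array([-1.5, 0.0, 1.5])
Z = np.array([[u, v] for u in grid for v in grid])   # hypothetical centers, c = 9


def features(x):
    """Hypothetical basis b_j(x) = exp(-||x - z_j||^2)."""
    return rbf_kernel(x, Z, gamma=1.0)


rng = np.random.default_rng(0)
n, m, k = 2000, 200, 4
X = rng.normal(size=(n, 2))       # random order: the first m points are a uniform sample
B = features(X)                   # B = [B1; B2], shape (n, c)
G11 = rbf_kernel(X[:m], X[:m])    # the only block of G used to fit W
W = ika_weights(B[:m], G11, k)


def rel_err(rows):
    """Relative Frobenius error of B W B^T on the diagonal block of G indexed by rows."""
    G_block = rbf_kernel(X[rows], X[rows])
    B_rows = B[rows]
    return np.linalg.norm(G_block - B_rows @ W @ B_rows.T) / np.linalg.norm(G_block)


G11_err = rel_err(slice(0, m))        # IKA is optimal on this block
held_err = rel_err(slice(m, 2 * m))   # held-out block: depends on the choice of B
print(G11_err, held_err)
```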
12 Comparison with Nyström method
IKA computes the same approximation as Nyström when setting $b_i(x) = K(x_i, x)$ and $m = c$. The advantages of IKA are:
- Possibility of choosing $b_i(x) \ne K(x_i, x)$, so that online approximation can be efficient even if the evaluation of $K(x, y)$ is not.
- Possibility of setting $m > c$: IKA can "see" a bigger part of $G$ to produce a better approximation.
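The first claim can be verified with a short sketch: with $b_i(x) = K(x_i, x)$ the feature matrix is $B = (G_{11};\, G_{12}^\top)$, and with $m = c$ the block $B_1$ coincides with $G_{11}$, so $P = G_{11}^2/c$ and $M = G_{11}^3/c^2$. Assuming $G_{11}$ is invertible, the generalized problem reduces to the eigenproblem of $G_{11}$, and $\Phi\Lambda_k\Phi^\top$ equals the pseudoinverse of the best rank-$k$ approximation of $G_{11}$. The RBF kernel, the sizes and the function names are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ika_weights(B1, G11, k):
    """IKA weights W = Phi Lambda_k Phi^T with P = B1^T B1 / m and M = B1^T G11 B1 / m^2."""
    m = B1.shape[0]
    P = B1.T @ B1 / m
    M = B1.T @ G11 @ B1 / m**2
    L_inv = np.linalg.inv(np.linalg.cholesky(P))   # P = L L^T
    S = L_inv @ M @ L_inv.T
    lam, Y = np.linalg.eigh((S + S.T) / 2)
    top = np.argsort(lam)[::-1][:k]
    Phi_k = L_inv.T @ Y[:, top]
    return (Phi_k * lam[top]) @ Phi_k.T


def nystrom_weights(G11, k):
    """Pseudoinverse of the best rank-k approximation of the PSD matrix G11."""
    s, V = np.linalg.eigh(G11)
    top = np.argsort(s)[::-1][:k]
    return (V[:, top] / s[top]) @ V[:, top].T


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
c = m = 10
k = 5
B = rbf_kernel(X, X[:c])   # b_i(x) = K(x_i, x), so B = [G11; G12^T] = G[:, :c]
G11 = B[:m]                # with m = c, B1 is G11 itself

W_ika = ika_weights(B[:m], G11, k)
W_nys = nystrom_weights(G11, k)
print(np.allclose(B @ W_ika @ B.T, B @ W_nys @ B.T, atol=1e-6))
```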
13 Numerical Results
We compared the Frobenius norm error committed by IKA and Nyström when approximating an RBF kernel on 5'000 unseen points. Both methods use the same $c = 128$ functions. Let $E_N$, $E_I$, $E_O$ be the errors committed respectively by Nyström, IKA and the optimal approximation of the kernel on test data. We measure the improvement of IKA using two quantities:
$$\Delta_a = 1 - \frac{E_I}{E_N}, \qquad \Delta_r = 1 - \frac{E_I - E_O}{E_N - E_O}.$$
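The two measures follow directly from the three errors; a minimal sketch (the function name `improvement` and the sample errors are hypothetical, not results from the slides):

```python
def improvement(E_I, E_N, E_O):
    """Return (Delta_a, Delta_r) from the test errors of IKA (E_I), Nystrom (E_N)
    and the optimal approximation (E_O); assumes E_N > E_O."""
    delta_a = 1.0 - E_I / E_N                    # reduction of the error relative to Nystrom
    delta_r = 1.0 - (E_I - E_O) / (E_N - E_O)    # fraction of the gap to the optimum closed
    return delta_a, delta_r


# Hypothetical errors, chosen only to illustrate the formulas:
print(improvement(E_I=3.0, E_N=4.0, E_O=2.0))   # (0.25, 0.5)
```

$\Delta_r$ is $0$ for Nyström and $1$ for the optimal approximation, so it measures the fraction of the gap between the two that IKA closes.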
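14 Numerical Results
| Method | $m$ | Error | $\Delta_a$ | $\Delta_r$ | Time |
|---|---|---|---|---|---|
| Optimal | … | … | …% | 100.0% | -- |
| Nyström | … | … | …% | 0.0% | 9 ms |
| IKA | … | … | …% | 11.8% | 17 ms |
| IKA | … | … | …% | 36.3% | 31 ms |
| IKA | … | … | …% | 56.4% | 80 ms |
| IKA | … | … | …% | 74.4% | 248 ms |
| IKA | … | … | …% | 87.5% | 802 ms |
| IKA | … | … | …% | 95.4% | 3.2 s |
| IKA | … | … | …% | 99.2% | 13.9 s |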
15 Numerical Results
The same comparison is repeated, fixing $m = \dots$ for IKA and varying the rank of the approximation.
[Table: Nyström error, IKA error and $\Delta_a$ for each approximation rank]
Note that Nyström requires an approximation rank of …2… to match the performance of IKA at rank ….
16 Future Work
- Choose the functions $b_i(x)$ such that the method is efficient when $G$ is sparse.
- Study the choice of the functions $b_i(x)$ with respect to efficiency and approximation error.
- Generalize to approximate non-square matrices.
- Analyze performance on practical applications: kernel approximation, manifold learning, recommender systems, etc.
17 References
- Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition, Shusen Wang, Zhihua Zhang and Tong Zhang, Journal of Machine Learning Research (2016)
- On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, Petros Drineas and Michael W. Mahoney, Journal of Machine Learning Research (2005)
- Spectral Clustering and Kernel PCA are Learning Eigenfunctions, Yoshua Bengio, Pascal Vincent and Jean-François Paiement, CIRANO (2003)
- Singular value (and eigenvalue) distribution and Krylov preconditioning of sequences of sampling matrices approximating integral operators, A. S. Al Fhaid, S. Serra Capizzano, D. Sesana and M. Zaka Ullah, Numerical Linear Algebra with Applications (2014)
More information