Supplementary Material
Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
Appendix A: Proof of Proposition 1

For any $f \in \mathcal{H}_a$, we can write
$$f(\cdot) = \sum_{k=1}^m w_k^s\, s_k(\cdot) + w_k^c\, c_k(\cdot),$$
where $s_k(\cdot) = \sin(u_k^\top \cdot)$, $c_k(\cdot) = \cos(u_k^\top \cdot)$, and $w = (w_1^s, w_1^c, \ldots, w_m^s, w_m^c)^\top$ are the coefficients. Then we have
$$f(x) = w^\top z_f(x), \qquad \|f\|_{\mathcal{H}_\kappa}^2 = \int du\, |\tilde f(u)|^2 \exp\big(\sigma^2\|u\|^2/2\big) = w^\top (w \circ \gamma),$$
where the functional norm follows [2], $\tilde f$ denotes the Fourier transform of $f$, $\circ$ denotes the element-wise product, and $\gamma = (\gamma_1^s, \gamma_1^c, \ldots, \gamma_m^s, \gamma_m^c)^\top$, with $\gamma_i^s = \gamma_i^c = \exp(\sigma^2\|u_i\|^2/2)$, weights the Fourier component $u_i$. We then finish the proof by substituting the linear form of $f(x)$ and the functional norm of $f$ into equation (5) to obtain equation (6) in the paper.

Appendix B: Proof of Proposition 2

By noting that, for any $f(\cdot) = \sum_{j=1}^r w_j \hat\varphi_j(\cdot) \in \mathcal{H}_a^n$, we have
$$f(x_i) = \sum_{j=1}^r w_j \hat\varphi_j(x_i) = w^\top z_n(x_i), \qquad \|f\|^2_{\mathcal{H}_\kappa} = \|w\|_2^2,$$
where $\hat\varphi_j(x_i) = [z_n(x_i)]_j$ follows the equality in (2) in the paper, and the functional norm follows from the fact that $\hat\varphi_j$, $j = 1, \ldots, r$, are normalized orthonormal eigenfunctions. By replacing these into equation (10) in the paper, we obtain an equivalent formulation in equation (9) in the paper.
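Both propositions rest on the same reduction: a function in the approximate hypothesis space is a linear model in an explicit feature vector. The following is a minimal numerical sketch of the two feature maps (all variable names are ours; we assume a Gaussian kernel, stack the sin/cos coordinates in two blocks rather than interleaving them, and fold the usual $1/\sqrt{m}$ scaling into the features, so this illustrates the construction rather than reproducing the paper's exact notation):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m, r, sigma = 200, 3, 40, 10, 1.0
X = rng.normal(size=(N, d))

def kernel(A, B):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Random Fourier features (Appendix A): any f in the RFF space is f(x) = w^T z(x).
U = rng.normal(scale=1.0 / sigma, size=(m, d))        # Fourier components u_1..u_m
Z_rff = np.hstack([np.sin(X @ U.T), np.cos(X @ U.T)]) / np.sqrt(m)

# Nystrom features (Appendix B): f(x) = w^T z_n(x) with ||f||_{H_kappa}^2 = ||w||_2^2.
Xs = X[rng.choice(N, m, replace=False)]               # sampled points
lam, V = np.linalg.eigh(kernel(Xs, Xs))
lam, V = lam[::-1], V[:, ::-1]                        # eigenvalues in descending order
Z_nys = kernel(X, Xs) @ V[:, :r] / np.sqrt(lam[:r])   # [z_n(x_i)]_j = phi-hat_j(x_i)

# Both Gram matrices approximate the exact kernel matrix K.
K = kernel(X, X)
print(np.abs(K - Z_rff @ Z_rff.T).max(), np.abs(K - Z_nys @ Z_nys.T).max())
```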
Appendix C: Proof of Theorem 2

To prove Theorem 2, we first prove the following lemma.

Lemma 1. Define $L(f)$ as
$$L(f) = \frac{\lambda}{2}\|f\|^2_{\mathcal{H}_\kappa} + \frac{1}{N}\sum_{i=1}^N \ell(f(x_i), y_i).$$
We have
$$0 \le L(\hat f_m) - L(f^*) \le \frac{C^2}{2\lambda N}\|K - \tilde K_r\|_2,$$
where $C$ is the bound on the gradient of the loss function, i.e., $|\nabla_z \ell(z,y)| \le C$, and $\|\cdot\|_2$ stands for the spectral norm of a matrix.

Proof. Let $\ell^*(\alpha)$ denote the Fenchel conjugate of $\ell(z,y)$ in terms of $z$, i.e.,
$$\ell^*(\alpha) = \sup_z\ \alpha z - \ell(z,y), \qquad \alpha \in \Omega,$$
where $\Omega$ is the range of the mapping $\nabla_z \ell(z,y): \mathbb{R} \mapsto \mathbb{R}$. Using the conjugate of $\ell(z,y)$, we can turn
$$L(f^*) = \min_{f \in \mathcal{H}_D}\ \frac{\lambda}{2}\|f\|^2_{\mathcal{H}_\kappa} + \frac{1}{N}\sum_{i=1}^N \ell(f(x_i), y_i)$$
into the equivalent form
$$L(f^*) = \max_{\{\alpha_i \in \Omega\}}\ -\frac{1}{N}\sum_{i=1}^N \ell^*(\alpha_i) - \frac{1}{2\lambda N^2}(\alpha \circ y)^\top K(\alpha \circ y),$$
where $\alpha = (\alpha_1, \ldots, \alpha_N)^\top$. Similarly, we can cast
$$L(\hat f_m) = \min_{f \in \mathcal{H}_a^n}\ \frac{\lambda}{2}\|f\|^2_{\mathcal{H}_\kappa} + \frac{1}{N}\sum_{i=1}^N \ell(f(x_i), y_i)$$
into
$$L(\hat f_m) = \max_{\{\alpha_i \in \Omega\}}\ -\frac{1}{N}\sum_{i=1}^N \ell^*(\alpha_i) - \frac{1}{2\lambda N^2}(\alpha \circ y)^\top K_b \hat K_r^{-1} K_b^\top (\alpha \circ y).$$
Then
$$L(\hat f_m) = \max_{\{\alpha_i \in \Omega\}}\ -\frac{1}{N}\sum_{i=1}^N \ell^*(\alpha_i) - \frac{1}{2\lambda N^2}(\alpha \circ y)^\top K(\alpha \circ y) + \frac{1}{2\lambda N^2}(\alpha \circ y)^\top \big(K - K_b\hat K_r^{-1}K_b^\top\big)(\alpha \circ y)$$
$$\le \max_{\{\alpha_i \in \Omega\}}\ \Big\{-\frac{1}{N}\sum_{i=1}^N \ell^*(\alpha_i) - \frac{1}{2\lambda N^2}(\alpha \circ y)^\top K(\alpha \circ y)\Big\} + \max_{\{\alpha_i \in \Omega\}}\ \frac{1}{2\lambda N^2}(\alpha \circ y)^\top \big(K - K_b\hat K_r^{-1}K_b^\top\big)(\alpha \circ y)$$
$$\le L(f^*) + \frac{1}{2\lambda N^2}\|\alpha\|_2^2\,\big\|K - K_b\hat K_r^{-1}K_b^\top\big\|_2 \le L(f^*) + \frac{C^2}{2\lambda N}\|K - \tilde K_r\|_2,$$
where the last inequality follows from $|\alpha| \le C$ for all $\alpha \in \Omega$ (due to $|\nabla_z \ell(z,y)| \le C$) and the definition of $\tilde K_r$ in equation (7) in the paper. The lower bound $L(\hat f_m) \ge L(f^*)$ holds since the minimization defining $L(\hat f_m)$ is taken over the subset $\mathcal{H}_a^n \subseteq \mathcal{H}_D$. This completes the proof of Lemma 1.
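Lemma 1 reduces the optimization gap between the Nyström solution and the exact solution to the spectral norm $\|K - \tilde K_r\|_2$. A small sketch of how this quantity, and the right-hand side of the lemma, behave as the number of sampled points $m$ grows (Gaussian kernel with uniform sampling; the values of $C$ and $\lambda$ below are placeholders of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, r, lam_reg, C = 300, 3, 10, 1e-2, 1.0
X = rng.normal(size=(N, d))

def kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

K = kernel(X, X)
for m in (20, 40, 80, 160):
    Xs = X[rng.choice(N, m, replace=False)]
    lam, V = np.linalg.eigh(kernel(Xs, Xs))
    lam, V = lam[::-1], V[:, ::-1]
    KbVr = kernel(X, Xs) @ V[:, :r]
    Ktil_r = (KbVr / lam[:r]) @ KbVr.T               # K_b Khat_r^{-1} K_b^T
    gap = np.linalg.norm(K - Ktil_r, 2)              # spectral norm ||K - Ktil_r||_2
    print(m, gap, C ** 2 * gap / (2 * lam_reg * N))  # right-hand side of Lemma 1
```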
Proof of Theorem 2. Define the loss function
$$\tilde\ell(f(x), y) = \frac{\lambda}{2}\|f\|^2_{\mathcal{H}_\kappa} + \ell(f(x), y).$$
To simplify our notation, we define $P_N$ and $P$ as
$$P_N(\tilde\ell f) = \frac{1}{N}\sum_{i=1}^N \tilde\ell(f(x_i), y_i) = L(f), \qquad P(\tilde\ell f) = \mathrm{E}\big[\tilde\ell(f(x), y)\big] = F(f).$$
Using this notation, we have
$$\Lambda(\hat f_m) - \Lambda(f^*) = P(\tilde\ell \hat f_m - \tilde\ell f^*) = P_N(\tilde\ell \hat f_m - \tilde\ell f^*) + (P - P_N)(\tilde\ell \hat f_m - \tilde\ell f^*) \le P_N(\tilde\ell \hat f_m - \tilde\ell f^*) + \max_{(f, f') \in \mathcal{G}} (P - P_N)(\tilde\ell f - \tilde\ell f'),$$
where $\mathcal{G}$ is defined as
$$\mathcal{G} = \Big\{(f, f') \in \mathcal{H}_\kappa \times \mathcal{H}_\kappa : \|f - f'\|_{\ell_2} \le r,\ \|f - f'\|_{\mathcal{H}_\kappa} \le R \Big\},$$
where $\|f\|_{\ell_2} = \sqrt{\mathrm{E}[f^2(x)]} \le \|f\|_{\mathcal{H}_\kappa}$, and $r, R$ are given by
$$r = \|\hat f_m - f^*\|_{\ell_2}, \qquad R = 4/\sqrt{\lambda} \ge \|\hat f_m - f^*\|_{\mathcal{H}_\kappa}.$$
Using Lemma 9 from [1], we have, with a probability $1 - 2N^{-3}$, for any $\epsilon_1, \epsilon_2$ with $r\epsilon_1 \ge e_N$ and $R\epsilon_2^2 \ge e_N$,
$$\sup_{(f, f') \in \mathcal{G}} (P - P_N)(\tilde\ell f - \tilde\ell f') \le C_0 \bar L\,(r\epsilon_1 + R\epsilon_2^2 + e_N) \le C_1 \bar L\,(r\epsilon_1 + e_N),$$
where $C_0, C_1$ are numerical constants, and $\bar L$ is the upper bound on the gradient of $\tilde\ell$ for functions in $\mathcal{G}$, given by $\bar L = \sqrt{\lambda} + C$. Since $\max(\|f^*\|_{\mathcal{H}_\kappa}, \|\hat f_m\|_{\mathcal{H}_\kappa}) \le 2/\sqrt{\lambda}$ and we may choose $\epsilon_2$ with $16\epsilon_2^4 = e_N^2\lambda$, the condition $R\epsilon_2^2 = e_N$ is satisfied, and therefore, with a probability $1 - 2N^{-3}$, writing $\epsilon$ for $\epsilon_1$, we have
$$P(\tilde\ell \hat f_m - \tilde\ell f^*) \le P_N(\tilde\ell \hat f_m - \tilde\ell f^*) + C_1(\sqrt{\lambda} + C)(r\epsilon + e_N) \le \frac{C^2\|K - \tilde K_r\|_2}{2\lambda N} + C_1(\sqrt{\lambda} + C)(r\epsilon + e_N),$$
where we use the result in Lemma 1. Using the definition of $r$, and for any scalar $s$, we have
$$2rs \le \frac{\lambda}{8}\|\hat f_m - f^*\|^2_{\mathcal{H}_\kappa} + \frac{8s^2}{\lambda} \le \frac{\lambda}{4}\|\hat f_m - f_0\|^2_{\mathcal{H}_\kappa} + \frac{\lambda}{4}\|f^* - f_0\|^2_{\mathcal{H}_\kappa} + \frac{8s^2}{\lambda} \le \frac{1}{2}P(\tilde\ell \hat f_m - \tilde\ell f_0) + \frac{1}{2}P(\tilde\ell f^* - \tilde\ell f_0) + \frac{8s^2}{\lambda},$$
where $f_0$ denotes the minimizer of $F(f)$ and the last step uses the strong convexity of $\tilde\ell$. Using this bound for $2r[C_1(\sqrt{\lambda} + C)\epsilon/2]$, we have
$$P(\tilde\ell \hat f_m - \tilde\ell f^*) \le C_1(\sqrt{\lambda} + C)e_N + \frac{C^2\|K - \tilde K_r\|_2}{2\lambda N} + \frac{1}{2}P(\tilde\ell \hat f_m - \tilde\ell f_0) + \frac{1}{2}P(\tilde\ell f^* - \tilde\ell f_0) + \frac{2C_1^2(\sqrt{\lambda} + C)^2\epsilon^2}{\lambda}.$$
Using the fact $\Lambda(f) = P(\tilde\ell f - \tilde\ell f_0)$, we have
$$\frac{1}{2}\Lambda(\hat f_m) \le \frac{3}{2}\Lambda(f^*) + C_1(\sqrt{\lambda} + C)e_N + \frac{C^2\|K - \tilde K_r\|_2}{2\lambda N} + \frac{2C_1^2(\sqrt{\lambda} + C)^2\epsilon^2}{\lambda}.$$
We complete the proof by absorbing the constant terms into a numerical constant $C_2$.
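To see Lemma 1 and Theorem 2 at work, one can solve the regularized problem exactly and in the Nyström feature space and compare the objective values. Below is a sketch using the squared loss as a convenient stand-in (Lemma 1 assumes a loss with bounded gradient, so this is purely illustrative); the closed forms follow from the first-order conditions of the two problems, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

N, d, m, r, lam = 200, 3, 40, 10, 1e-1
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)
K = kernel(X, X)

# Full problem: f* minimizes lam/2 ||f||^2 + (1/N) sum_i (f(x_i) - y_i)^2.
# With f = sum_i alpha_i kappa(x_i, .), optimality gives (lam N/2 I + K) alpha = y.
alpha = np.linalg.solve(lam * N / 2 * np.eye(N) + K, y)
L_star = lam / 2 * alpha @ K @ alpha + np.mean((K @ alpha - y) ** 2)

# Nystrom problem: f_m = w^T z_n(x), with ||f_m||^2 = ||w||^2 (Appendix B).
Xs = X[rng.choice(N, m, replace=False)]
lmb, V = np.linalg.eigh(kernel(Xs, Xs))
lmb, V = lmb[::-1], V[:, ::-1]
Z = kernel(X, Xs) @ V[:, :r] / np.sqrt(lmb[:r])
w = np.linalg.solve(lam * N / 2 * np.eye(r) + Z.T @ Z, Z.T @ y)
L_m = lam / 2 * w @ w + np.mean((Z @ w - y) ** 2)

print(f"L(f_m) - L(f*) = {L_m - L_star:.2e}")  # nonnegative, shrinks as m, r grow
```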
Appendix D: Proof of Theorem 3

To prove Theorem 3, we first prove the following lemma, which relates $H_r$ and $\hat H_r$ to the matrices $K_r$ and $\tilde K_r$, respectively.

Lemma 2. Assume $\lambda_r > 0$ and $\hat\lambda_r > 0$. We have
$$[\tilde K_r]_{i,j} = \langle \kappa(x_i,\cdot), \hat H_r \kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa}, \qquad [K_r]_{i,j} = \langle \kappa(x_i,\cdot), H_r \kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa}, \qquad i, j = 1, \ldots, N.$$

Proof. By the definition of $\hat H_r$ and the expression of $\hat\varphi_j$, we have
$$\langle\kappa(x_i,\cdot), \hat H_r\kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa} = \sum_{k=1}^r \langle\kappa(x_i,\cdot), \hat\varphi_k\rangle_{\mathcal{H}_\kappa}\langle\kappa(x_j,\cdot), \hat\varphi_k\rangle_{\mathcal{H}_\kappa} = \sum_{k=1}^r \frac{1}{\hat\lambda_k}\sum_{s,t=1}^m \hat V_{s,k}\hat V_{t,k}\,\langle\kappa(x_i,\cdot), \kappa(\hat x_s,\cdot)\rangle_{\mathcal{H}_\kappa}\langle\kappa(x_j,\cdot), \kappa(\hat x_t,\cdot)\rangle_{\mathcal{H}_\kappa}$$
$$= \sum_{s,t=1}^m [K_b]_{i,s}\Big(\sum_{k=1}^r \frac{\hat V_{s,k}\hat V_{t,k}}{\hat\lambda_k}\Big)[K_b]_{j,t} = \sum_{s,t=1}^m [K_b]_{i,s}\,[\hat K_r^{-1}]_{s,t}\,[K_b]_{j,t} = [K_b\hat K_r^{-1}K_b^\top]_{i,j} = [\tilde K_r]_{i,j},$$
where in the second equality we use equation (2) in the paper for $\hat\varphi_j$, and the last line uses $\hat K_r^{-1} = \sum_{k=1}^r \hat\lambda_k^{-1}\hat v_k\hat v_k^\top$. Similarly, we have
$$\langle\kappa(x_i,\cdot), H_r\kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa} = \sum_{k=1}^r \langle\kappa(x_i,\cdot), \varphi_k\rangle_{\mathcal{H}_\kappa}\langle\kappa(x_j,\cdot), \varphi_k\rangle_{\mathcal{H}_\kappa} = \sum_{s,t=1}^N [K]_{i,s}\Big(\sum_{k=1}^r \frac{V_{s,k}V_{t,k}}{\lambda_k}\Big)[K]_{j,t} = [KK_r^{-1}K]_{i,j} = [K_r]_{i,j}.$$

Proof of Theorem 3. Define $\bar\kappa(x_a, x_i) = \langle\kappa(x_a,\cdot), \Delta H\,\kappa(x_i,\cdot)\rangle_{\mathcal{H}_\kappa}$, where $\Delta H = \hat H_r - H_r$. By Lemma 2, we have $[\tilde K_r - K_r]_{i,a} = \bar\kappa(x_a, x_i)$, and therefore
$$\|\tilde K_r - K_r\|_2^2 = \max_{\|u\|_2 \le 1}\ \sum_{i,j}u_iu_j\sum_a \bar\kappa(x_a, x_i)\bar\kappa(x_a, x_j) = \max_{\|u\|_2 \le 1}\ \sum_{i,j}u_iu_j\sum_a \langle\kappa(x_a,\cdot), \Delta H\kappa(x_i,\cdot)\rangle_{\mathcal{H}_\kappa}\langle\kappa(x_a,\cdot), \Delta H\kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa}. \quad (1)$$
Using the definition of $L_N$ in equation (11) in the paper and the reproducing property, for any $f, g \in \mathcal{H}_\kappa$, we have
$$\frac{1}{N}\sum_{i=1}^N \langle\kappa(x_i,\cdot), f\rangle_{\mathcal{H}_\kappa}\langle\kappa(x_i,\cdot), g\rangle_{\mathcal{H}_\kappa} = \frac{1}{N}\sum_{i=1}^N f(x_i)\langle\kappa(x_i,\cdot), g\rangle_{\mathcal{H}_\kappa} = \langle L_N f, g\rangle_{\mathcal{H}_\kappa}.$$
Applying this equality to $f = \Delta H\kappa(x_i,\cdot)$ and $g = \Delta H\kappa(x_j,\cdot)$, we have
$$\sum_a \langle\kappa(x_a,\cdot), \Delta H\kappa(x_i,\cdot)\rangle_{\mathcal{H}_\kappa}\langle\kappa(x_a,\cdot), \Delta H\kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa} = N\,\langle\kappa(x_i,\cdot), \Delta H L_N \Delta H\kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa}. \quad (2)$$
Using equality (2) in (1), we have
$$\|\tilde K_r - K_r\|_2^2 = N\max_{\|u\|_2 \le 1}\ \sum_{i,j}u_iu_j\,\langle\kappa(x_i,\cdot), \Delta H L_N \Delta H\kappa(x_j,\cdot)\rangle_{\mathcal{H}_\kappa}.$$
Define $f(\cdot) = \sum_i u_i\kappa(x_i,\cdot)$. We note that $\kappa(x_i,\cdot)$ can be expressed in an eigen-expansion form, i.e.,
$$\kappa(x_i,\cdot) = \sum_j \sqrt{\lambda_j}\,V_{i,j}\,\varphi_j(\cdot). \quad (3)$$
Using the relationship between $\kappa(x_i,\cdot)$ and the eigenfunctions $\varphi_j(\cdot)$ in equation (3), we have
$$f(\cdot) = \sum_{i=1}^N u_i\sum_j \sqrt{\lambda_j}\,V_{i,j}\,\varphi_j(\cdot) = \sqrt{N}\,L_N^{1/2}[g](\cdot),$$
where $g(\cdot) = \sum_{i,j}u_iV_{i,j}\varphi_j(\cdot)$. Since $\|u\|_2 \le 1$, we have $\|g\|^2_{\mathcal{H}_\kappa} = u^\top VV^\top u \le 1$, where $V = (v_1, \ldots, v_N)$. We thus have
$$\|\tilde K_r - K_r\|_2^2 \le N^2\max_{\|g\|_{\mathcal{H}_\kappa} \le 1}\ \langle L_N^{1/2}g,\ \Delta H L_N \Delta H L_N^{1/2}g\rangle_{\mathcal{H}_\kappa} = N^2\max_{\|g\|_{\mathcal{H}_\kappa} \le 1}\ \langle g,\ L_N^{1/2}\Delta H L_N \Delta H L_N^{1/2}g\rangle_{\mathcal{H}_\kappa} \le N^2\big\|L_N^{1/2}\Delta H L_N \Delta H L_N^{1/2}\big\|_2 = N^2\big\|L_N^{1/2}\Delta H L_N^{1/2}\big\|_2^2,$$
which completes the proof, since taking the square root yields $\|\tilde K_r - K_r\|_2 \le N\|L_N^{1/2}(\hat H_r - H_r)L_N^{1/2}\|_2$.
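Theorem 3 controls $\|K_r - \tilde K_r\|_2$, the distance between the best rank-$r$ approximation of $K$ and its Nyström counterpart, through the operator $L_N^{1/2}(\hat H_r - H_r)L_N^{1/2}$. A direct numerical look at the matrix-side quantity, under the same hypothetical Gaussian-kernel setup as the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(4)

def kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

N, d, r = 300, 3, 10
X = rng.normal(size=(N, d))
K = kernel(X, X)

# Best rank-r approximation K_r of K (truncated eigendecomposition).
lam, V = np.linalg.eigh(K)
lam, V = lam[::-1], V[:, ::-1]
Kr = (V[:, :r] * lam[:r]) @ V[:, :r].T

for m in (20, 40, 80, 160):
    Xs = X[rng.choice(N, m, replace=False)]
    lmb, W = np.linalg.eigh(kernel(Xs, Xs))
    lmb, W = lmb[::-1], W[:, ::-1]
    KbWr = kernel(X, Xs) @ W[:, :r]
    Ktil_r = (KbWr / lmb[:r]) @ KbWr.T         # K_b Khat_r^{-1} K_b^T
    print(m, np.linalg.norm(Kr - Ktil_r, 2))   # ||K_r - Ktil_r||_2, bounded in Theorem 3
```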
Appendix E: Proof of Theorem 4

To prove Theorem 4, we first prove the following lemma.

Lemma 3. Assume $(\lambda_r - \lambda_{r+1})/N > 3\|L_N - L_m\|_{\mathrm{HS}}$, where $\|\cdot\|_{\mathrm{HS}}$ stands for the Hilbert-Schmidt norm of an integral operator. Let $\hat\Theta = (\hat\varphi_1, \ldots, \hat\varphi_r)$, $\Phi = (\varphi_1, \ldots, \varphi_r)$, $\bar\Phi = (\varphi_{r+1}, \ldots, \varphi_N)$. Then there exists a matrix $P \in \mathbb{R}^{(N-r) \times r}$ satisfying
$$\|P\|_F \le \frac{\sqrt{2}\,\|L_N - L_m\|_{\mathrm{HS}}}{(\lambda_r - \lambda_{r+1})/N - \sqrt{2}\,\|L_N - L_m\|_{\mathrm{HS}}}$$
such that $\hat\Theta = (\Phi + \bar\Phi P)(I + P^\top P)^{-1/2}$.

Proof of Lemma 3. We need the following perturbation result from [3].

Lemma 4 (Theorem 2.7 of Chapter 6 in [3]; we simplify the statement to make it better fit our objective). Let $(\lambda_i, v_i)$, $i \in [n]$, be the eigenvalues and eigenvectors of a symmetric matrix $A \in \mathbb{R}^{n \times n}$, ranked in descending order of eigenvalues. Set $X = (v_1, \ldots, v_r)$ and $Y = (v_{r+1}, \ldots, v_n)$. Given a symmetric perturbation matrix $E$, let
$$(X, Y)^\top E\,(X, Y) = \begin{pmatrix} \hat E_{11} & \hat E_{21}^\top \\ \hat E_{21} & \hat E_{22} \end{pmatrix}.$$
Let $\|\cdot\|$ represent a consistent family of norms, and set
$$\gamma = \|\hat E_{21}\|, \qquad \delta = \lambda_r - \lambda_{r+1} - \|\hat E_{11}\| - \|\hat E_{22}\|.$$
If $\delta > 0$ and $2\gamma < \delta$, then there exists a unique matrix $P \in \mathbb{R}^{(n-r) \times r}$ satisfying $\|P\| < \frac{2\gamma}{\delta}$ such that
$$\tilde X = (X + YP)(I + P^\top P)^{-1/2}, \qquad \tilde Y = (Y - XP^\top)(I + PP^\top)^{-1/2}$$
are the eigenvectors of $A + E$.
Define the matrix $B$ as
$$B_{i,j} = \sum_{k=1}^m \frac{\hat\lambda_k}{m}\,\langle\hat\varphi_k, \varphi_i\rangle_{\mathcal{H}_\kappa}\langle\hat\varphi_k, \varphi_j\rangle_{\mathcal{H}_\kappa}.$$
Let $\hat z_i$ be the eigenvector of $B$ corresponding to the eigenvalue $\hat\lambda_i/m$. It is straightforward to show that
$$\hat z_i = \big(\langle\hat\varphi_i, \varphi_1\rangle_{\mathcal{H}_\kappa}, \ldots, \langle\hat\varphi_i, \varphi_N\rangle_{\mathcal{H}_\kappa}\big)^\top, \quad i \in [m],$$
and therefore we have
$$\hat\varphi_i = \sum_k [\hat z_i]_k\,\varphi_k, \quad i \in [m], \qquad \text{or} \qquad \hat\Theta = (\Phi, \bar\Phi)Z,$$
where $Z = (\hat z_1, \ldots, \hat z_r)$. To determine the relationship between $\{\hat\varphi_i\}_{i=1}^r$ and $\{\varphi_i\}_{i=1}^N$, we need to determine the matrix $Z$. We define the matrix $D = \mathrm{diag}(\lambda_1/N, \ldots, \lambda_N/N)$ and the matrix $E = B - D$, i.e.,
$$E_{i,j} = B_{i,j} - \frac{\lambda_i}{N}\delta_{i,j} = \langle\varphi_i, (L_m - L_N)\varphi_j\rangle_{\mathcal{H}_\kappa}.$$
Following the notation of Lemma 4, we define $X = (e_1, \ldots, e_r)$ and $Y = (e_{r+1}, \ldots, e_N)$, where $e_1, \ldots, e_N$ are the canonical bases of $\mathbb{R}^N$, which are also eigenvectors of $D$ (the matrix corresponding to $A$ in Lemma 4). Define $\gamma$ and $\delta$ as follows:
$$\gamma = \Big(\sum_{i=r+1}^N\sum_{j=1}^r E_{i,j}^2\Big)^{1/2}, \qquad \delta = \frac{\lambda_r - \lambda_{r+1}}{N} - \Big(\sum_{i,j=1}^r E_{i,j}^2\Big)^{1/2} - \Big(\sum_{i,j=r+1}^N E_{i,j}^2\Big)^{1/2}.$$
It is easy to verify that $\gamma$ and $\delta$ are defined with respect to the Frobenius norm of $\hat E$ in Lemma 4. In order to apply the result in Lemma 4, we need to show $\delta > 0$ and $2\gamma < \delta$. To this end, we provide a lower bound for $\delta$ and an upper bound for $\gamma$. Since $E$ is the matrix of $L_m - L_N$ in the orthonormal basis $\{\varphi_i\}$, so that $\|\hat E_{11}\|_F + \|\hat E_{22}\|_F \le \sqrt{2}\,\|E\|_F \le \sqrt{2}\,\|L_N - L_m\|_{\mathrm{HS}}$, we first bound $\delta$ as
$$\delta \ge \frac{\lambda_r - \lambda_{r+1}}{N} - \sqrt{2}\,\|L_N - L_m\|_{\mathrm{HS}}.$$
We then bound $\gamma$, using $2\|\hat E_{21}\|_F^2 \le \|E\|_F^2$ by the symmetry of $E$, as
$$\gamma = \Big(\sum_{i=r+1}^N\sum_{j=1}^r E_{i,j}^2\Big)^{1/2} \le \frac{1}{\sqrt{2}}\,\|L_N - L_m\|_{\mathrm{HS}}.$$
Hence, when $(\lambda_r - \lambda_{r+1})/N > 3\|L_N - L_m\|_{\mathrm{HS}}$, we have $\delta > 2\gamma > 0$, which satisfies the conditions specified in Lemma 4. Thus, according to Lemma 4, there exists a $P \in \mathbb{R}^{(N-r) \times r}$ satisfying $\|P\|_F < 2\gamma/\delta$ such that
$$Z = (\hat z_1, \ldots, \hat z_r) = (X + YP)(I + P^\top P)^{-1/2} = (X, Y)\begin{pmatrix} I_{r \times r} \\ P \end{pmatrix}(I + P^\top P)^{-1/2},$$
implying
$$\hat\Theta = (\Phi, \bar\Phi)Z = (\Phi, \bar\Phi)(X, Y)\begin{pmatrix} I_{r \times r} \\ P \end{pmatrix}(I + P^\top P)^{-1/2} = (\Phi + \bar\Phi P)(I + P^\top P)^{-1/2},$$
where we use $(X, Y) = I$. This completes the proof of Lemma 3.
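Lemma 4 is a standard invariant-subspace perturbation bound, and the matrix $P$ can be recovered numerically from the top-$r$ eigenvectors $\tilde X$ of $A + E$ via $P = (Y^\top\tilde X)(X^\top\tilde X)^{-1}$, since $\tilde X = (X + YP)(I + P^\top P)^{-1/2}$. A sketch of this check (diagonal $A$ with an explicit eigengap; all constants and names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 30, 5

# Symmetric A with an eigengap between eigenvalues r and r+1 (setting of Lemma 4).
eigs = np.concatenate([np.linspace(2.0, 1.5, r), np.linspace(0.5, 0.1, n - r)])
A = np.diag(eigs)                        # eigenvectors of A: the canonical basis
X, Y = np.eye(n)[:, :r], np.eye(n)[:, r:]

S = rng.normal(size=(n, n))
E = 1e-2 * (S + S.T) / 2                 # small symmetric perturbation

# gamma and delta of Lemma 4, with the Frobenius norm as the consistent family.
gamma = np.linalg.norm(Y.T @ E @ X, 'fro')
delta = (eigs[r - 1] - eigs[r]
         - np.linalg.norm(X.T @ E @ X, 'fro')
         - np.linalg.norm(Y.T @ E @ Y, 'fro'))

# Top-r eigenvectors of A + E, and the matrix P they induce.
Xt = np.linalg.eigh(A + E)[1][:, ::-1][:, :r]
P = (Y.T @ Xt) @ np.linalg.inv(X.T @ Xt)          # from Xt = (X + Y P)(I + P^T P)^{-1/2}
print(np.linalg.norm(P, 'fro'), 2 * gamma / delta)  # ||P||_F stays below 2*gamma/delta

# (X + Y P)(I + P^T P)^{-1/2} spans the same subspace as Xt.
w, Q = np.linalg.eigh(np.eye(r) + P.T @ P)
Xrec = (X + Y @ P) @ (Q @ np.diag(w ** -0.5) @ Q.T)
print(np.linalg.norm(Xrec @ Xrec.T - Xt @ Xt.T))    # ~ 0
```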
Proof of Theorem 4. Since
$$\big\|L_N^{1/2}(H_r - \hat H_r)L_N^{1/2}\big\|_2 = \max_{\|f\|_{\mathcal{H}_\kappa} \le 1}\ \langle f, L_N^{1/2}H_rL_N^{1/2}f\rangle_{\mathcal{H}_\kappa} - \langle f, L_N^{1/2}\hat H_rL_N^{1/2}f\rangle_{\mathcal{H}_\kappa},$$
we need to bound $\langle f, L_N^{1/2}\hat H_rL_N^{1/2}f\rangle_{\mathcal{H}_\kappa}$. To this end, we write $f(\cdot) = \sum_{i=1}^N u_i\varphi_i(\cdot)$ with $\sum_i u_i^2 \le 1$, and have
$$\langle f, L_N^{1/2}\hat H_rL_N^{1/2}f\rangle_{\mathcal{H}_\kappa} = \sum_{i,j}u_iu_j\,\langle\varphi_i, L_N^{1/2}\hat H_rL_N^{1/2}\varphi_j\rangle_{\mathcal{H}_\kappa} = \sum_{i,j}u_iu_j\,\frac{\sqrt{\lambda_i\lambda_j}}{N}\,\langle\varphi_i, \hat H_r\varphi_j\rangle_{\mathcal{H}_\kappa} = u^\top DAA^\top Du,$$
where $u = (u_1, \ldots, u_N)^\top$, $D = \mathrm{diag}(\sqrt{\lambda_1/N}, \ldots, \sqrt{\lambda_N/N})$, and $A = [\langle\varphi_i, \hat\varphi_j\rangle_{\mathcal{H}_\kappa}]_{N \times r} = (\Phi, \bar\Phi)^\top\hat\Theta$. Since $(\lambda_r - \lambda_{r+1})/N > 3\|L_N - L_m\|_{\mathrm{HS}}$, according to Lemma 3 there exists a matrix $P \in \mathbb{R}^{(N-r) \times r}$ satisfying
$$\|P\|_F \le \frac{\sqrt{2}\,\|L_N - L_m\|_{\mathrm{HS}}}{(\lambda_r - \lambda_{r+1})/N - \sqrt{2}\,\|L_N - L_m\|_{\mathrm{HS}}},$$
such that $\hat\Theta = (\Phi + \bar\Phi P)(I + P^\top P)^{-1/2}$. Using the expression of $\hat\Theta$, we compute $A$ as
$$A = (\Phi, \bar\Phi)^\top\hat\Theta = (\Phi, \bar\Phi)^\top(\Phi + \bar\Phi P)(I + P^\top P)^{-1/2} = \begin{pmatrix} I \\ P \end{pmatrix}(I + P^\top P)^{-1/2}.$$
Thus, we have
$$\langle f, L_N^{1/2}H_rL_N^{1/2}f\rangle_{\mathcal{H}_\kappa} - \langle f, L_N^{1/2}\hat H_rL_N^{1/2}f\rangle_{\mathcal{H}_\kappa} = u^\top D\left(\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} - AA^\top\right)Du = u^\top DBDu,$$
where $B$ is given by
$$B = \begin{pmatrix} I - (I + P^\top P)^{-1} & -(I + P^\top P)^{-1}P^\top \\ -P(I + P^\top P)^{-1} & -P(I + P^\top P)^{-1}P^\top \end{pmatrix} = \begin{pmatrix} P^\top(I + PP^\top)^{-1}P & -(I + P^\top P)^{-1}P^\top \\ -P(I + P^\top P)^{-1} & -P(I + P^\top P)^{-1}P^\top \end{pmatrix}.$$
Rewrite $u = (u_a, u_b)$, where $u_a \in \mathbb{R}^r$ includes the first $r$ entries of $u$ and $u_b$ includes the rest of the entries of $u$. Define $D_a = \mathrm{diag}(\sqrt{\lambda_1/N}, \ldots, \sqrt{\lambda_r/N})$ and $D_b = \mathrm{diag}(\sqrt{\lambda_{r+1}/N}, \ldots, \sqrt{\lambda_N/N})$. Let $Q = (I + P^\top P)^{-1}$. We have
$$u^\top DBDu \le \|u_a\|_2^2\,\big\|D_aP^\top(I + PP^\top)^{-1}PD_a\big\|_2 + \|u_b\|_2^2\,\big\|D_bPQP^\top D_b\big\|_2 + 2\|u_a\|_2\|u_b\|_2\,\big\|D_aQP^\top D_b\big\|_2$$
$$\le (\|u_a\|_2 + \|u_b\|_2)^2\max\Big(\frac{\lambda_1}{N}\|P\|_2^2,\ \frac{\lambda_{r+1}}{N}\|P\|_2^2,\ \frac{\sqrt{\lambda_1\lambda_{r+1}}}{N}\|P\|_2\Big) \le \frac{2\lambda_1}{N}\max\Big(\|P\|_2^2,\ \sqrt{\lambda_{r+1}/\lambda_1}\,\|P\|_2\Big),$$
where we use the facts $\|P^\top(I + PP^\top)^{-1}P\|_2 \le \|P\|_2^2$, $\|PQP^\top\|_2 \le \|P\|_2^2$, $\|QP^\top\|_2 = \|PQ\|_2 \le \|P\|_2$, $(\|u_a\|_2 + \|u_b\|_2)^2 \le 2$, and $\lambda_{r+1} \le \lambda_1$. We complete the proof by using the fact
$$\big\|L_N^{1/2}(H_r - \hat H_r)L_N^{1/2}\big\|_2 = \max_{\|f\|_{\mathcal{H}_\kappa} \le 1}\ \langle f, L_N^{1/2}(H_r - \hat H_r)L_N^{1/2}f\rangle_{\mathcal{H}_\kappa} = \max_{\|u\|_2 \le 1}\ u^\top DBDu$$
and the bound on $\|P\|_2$ in Lemma 3.

References

[1] V. Koltchinskii and M. Yuan. Sparsity in multiple kernel learning. Annals of Statistics, 38(6):3660-3695, 2010.

[2] H. Q. Minh. Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory. Constructive Approximation, 32(2):307-338, 2010.

[3] G. W. Stewart and J. Sun. Matrix Perturbation Theory. Academic Press, 1990.