Secrets of Matrix Factorization: Supplementary Document
Je Hyeong Hong
University of Cambridge

Andrew Fitzgibbon
Microsoft, Cambridge, UK

Abstract

This document consists of details to which the original submission article [1] makes reference. These include a list of symbols, some matrix identities and calculus rules, a list of VarPro derivatives, a symmetry analysis of the un-regularized VarPro Gauss-Newton matrix, and details of the mean-time-to-second-success (MTSS) computation.

| Symbol | Meaning |
| --- | --- |
| R^{p×q} | The space of real matrices of size p × q |
| S^p | The space of symmetric real matrices of size p × p |
| M | The measurement matrix ∈ R^{m×n} (may contain noise and missing data) |
| W | The weight matrix ∈ R^{m×n} |
| U | The first decomposition matrix ∈ R^{m×r} |
| V | The second decomposition matrix ∈ R^{n×r} |
| u | vec(U) ∈ R^{mr} |
| v | vec(V^T) ∈ R^{nr} |
| m | The number of rows in M |
| n | The number of columns in M |
| p | The number of visible entries in M |
| p_j | The number of visible entries in the j-th column of M |
| f | The objective ∈ R |
| ε | The vector objective ∈ R^p |
| J | Jacobian |
| H | Hessian |
| V* | arg min_V f(U, V) ∈ R^{n×r} |
| Π | The projection matrix ∈ R^{p×mn} selecting the nonzero entries of vec(W) |
| W̃ | Π diag(vec(W)) ∈ R^{p×mn} |
| m̃ | W̃ vec(M) ∈ R^p |
| Ũ | W̃ (I_n ⊗ U) ∈ R^{p×nr} |
| Ṽ | W̃ (V ⊗ I_m) ∈ R^{p×mr} |
| K_{mr} | The permutation matrix satisfying vec(U^T) = K_{mr} vec(U), ∈ R^{mr×mr} |
| K_{nr} | The permutation matrix satisfying vec(V^T) = K_{nr} vec(V), ∈ R^{nr×nr} |
| R* | W ⊙ (U V*^T − M) ∈ R^{m×n} |
| Z | (W ⊙ R*) ⊗ I_r ∈ R^{mr×nr} |

Table 1: A list of symbols used in [1].
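To make the projection notation of Table 1 concrete, the following sketch builds W̃, m̃, Ũ and Ṽ for a small problem and checks that Ũ vec(V^T) = Ṽ vec(U) = W̃ vec(U V^T), as implied by the vec identities of Section 2.1 below. This is a minimal illustration of ours, assuming NumPy; it is not part of the package accompanying [1].

```python
import numpy as np

vec = lambda A: A.reshape(-1, order="F")       # column-stacking vectorization

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
M = rng.standard_normal((m, n))
W = (rng.random((m, n)) < 0.7).astype(float)   # binary weights: 1 = visible entry
U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))

mask = vec(W) > 0                              # visible entries of vec(W)
W_tilde = np.diag(vec(W))[mask, :]             # Pi diag(vec(W)), p x mn
m_tilde = W_tilde @ vec(M)                     # p-vector of weighted visible data
U_tilde = W_tilde @ np.kron(np.eye(n), U)      # p x nr
V_tilde = W_tilde @ np.kron(V, np.eye(m))      # p x mr

# Both maps produce the same weighted, projected low-rank product.
assert np.allclose(U_tilde @ vec(V.T), W_tilde @ vec(U @ V.T))
assert np.allclose(V_tilde @ vec(U), W_tilde @ vec(U @ V.T))
eps = U_tilde @ vec(V.T) - m_tilde             # cost vector for this (U, V)
```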
1. Symbols

A list of symbols used in [1] can be found in Table 1.

2. Matrix identities and calculus rules

Some of the key matrix identities and calculus rules in [6, 5] are illustrated and extended below for convenience.

2.1. Vec and Kronecker product identities

Given that each lower-case letter in bold is an arbitrary column vector of appropriate size, we have

vec(A ⊙ B) = diag(vec A) vec(B),  (1)
vec(A X B) = (B^T ⊗ A) vec(X),  (2)
vec(A B) = (B^T ⊗ I) vec(A) = (I ⊗ A) vec(B),  (3)
(A ⊗ B)(C ⊗ D) = AC ⊗ BD,  (4)
(A ⊗ b) C = AC ⊗ b,  (5)
vec(A^T) = K_1 vec(A),  (6)
(A ⊗ B) = K_1 (B ⊗ A) K_2, and  (7)
(A ⊗ b) = K_1 (b ⊗ A),  (8)

where K_1 and K_2 are permutation matrices of appropriate sizes.

2.2. Differentiation of the µ-pseudo-inverse

Let us first define the µ-pseudo-inverse X^{µ†} := (X^T X + µI)^{-1} X^T. The derivative of the µ-pseudo-inverse is

∂[X^{µ†}] = ∂[(X^T X + µI)^{-1}] X^T + (X^T X + µI)^{-1} ∂[X^T]  (9)
= −(X^T X + µI)^{-1} ∂[X^T X + µI] (X^T X + µI)^{-1} X^T + (X^T X + µI)^{-1} ∂[X^T]  (10)
= −(X^T X + µI)^{-1} (∂[X^T] X + X^T ∂[X]) (X^T X + µI)^{-1} X^T + (X^T X + µI)^{-1} ∂[X^T]  (11)
= −(X^T X + µI)^{-1} ∂[X^T] X X^{µ†} − (X^T X + µI)^{-1} X^T ∂[X] X^{µ†} + (X^T X + µI)^{-1} ∂[X^T]  (12)
= −X^{µ†} ∂[X] X^{µ†} + (X^T X + µI)^{-1} ∂[X^T] (I − X X^{µ†}).  (13)

The derivative of the pseudo-inverse X^† := (X^T X)^{-1} X^T is then simply the above for µ = 0.

3. Derivatives for variable projection (VarPro)

A full list of derivatives for un-regularized VarPro can be found in Table 2. Derivations are illustrated in [2]. The regularized setting is also discussed in [2].

3.1. Differentiating v*(u)

From [1], we have

v*(u) := arg min_v f(u, v) = Ũ^{µ†} m̃.  (14)

Substituting (14) into the original objective yields

ε(u, v*(u)) = Ũ v* − m̃ = −(I − Ũ Ũ^{µ†}) m̃.  (15)

Taking ∂[v*] and applying (13) gives us

∂[v*] = −Ũ^{µ†} ∂[Ũ] Ũ^{µ†} m̃ + (Ũ^T Ũ + µI)^{-1} ∂[Ũ^T] (I − Ũ Ũ^{µ†}) m̃  (16)
= −Ũ^{µ†} ∂[Ũ] v* − (Ũ^T Ũ + µI)^{-1} ∂[Ũ^T] ε  / noting v*(u) = Ũ^{µ†} m̃ and (15)  (17)
= −Ũ^{µ†} Ṽ ∂u − (Ũ^T Ũ + µI)^{-1} Z^T K_{mr} ∂u,  / noting bilinearity and the results in [2]  (18)

and hence

dv*/du = −(Ũ^T Ũ + µI)^{-1} (Ũ^T Ṽ + Z^T K_{mr}).  (19)
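As a quick sanity check of (13), the sketch below compares the analytic differential of the µ-pseudo-inverse in a random direction against a central finite difference. This is our own verification code, assuming NumPy; the function names are not from [1] or [2].

```python
import numpy as np

def mu_pinv(X, mu):
    """mu-pseudo-inverse: (X^T X + mu I)^{-1} X^T."""
    return np.linalg.solve(X.T @ X + mu * np.eye(X.shape[1]), X.T)

def d_mu_pinv(X, dX, mu):
    """Differential of the mu-pseudo-inverse in direction dX, following (13)."""
    Xp = mu_pinv(X, mu)
    G = np.linalg.inv(X.T @ X + mu * np.eye(X.shape[1]))
    return -Xp @ dX @ Xp + G @ dX.T @ (np.eye(X.shape[0]) - X @ Xp)

rng = np.random.default_rng(1)
X, dX = rng.standard_normal((6, 3)), rng.standard_normal((6, 3))
mu, t = 0.1, 1e-6
fd = (mu_pinv(X + t * dX, mu) - mu_pinv(X - t * dX, mu)) / (2 * t)
print(np.max(np.abs(fd - d_mu_pinv(X, dX, mu))))   # ~1e-10, so (13) holds
```

Setting mu = 0 recovers the differential of the ordinary pseudo-inverse for full-column-rank X.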
Table 2 lists the derivatives and extensions for un-regularized variable projection on low-rank matrix factorization with missing data, without damping and without the projection constraint P. The bracketed terms are accumulated across the variants: ALS (RW3) uses none of them; a term annotated RW2 is included from the approximate Gauss-Newton method (RW2) upwards, a term annotated RW1 from the full Gauss-Newton method (RW1) upwards, and a term annotated FN in full Newton only. Throughout, K_{mr} vec(U) = vec(U^T), R* := W ⊙ (U V*^T − M) and Z := (W ⊙ R*) ⊗ I_r.

- v*(u) ∈ R^{nr}, defined as v* := arg min_v f(u, v):
  v* = Ũ^† m̃ (m̃ ∈ R^p consists of the non-zero elements of vec(M)).
- dv*/du ∈ R^{nr×mr}, defined as dv*/du := d vec(V*^T) / d vec(U):
  dv*/du = −[Ũ^† Ṽ]_{RW2} − [(Ũ^T Ũ)^{-1} Z^T K_{mr}]_{RW1}.
- Cost vector ε ∈ R^p, defined as ε := Ũ v* − m̃.
- Jacobian J ∈ R^{p×mr}, defined as J := dε/du:
  J = (I_p − [Ũ Ũ^†]_{RW2}) Ṽ − [Ũ^{†T} Z^T K_{mr}]_{RW1}.
- Cost function f ∈ R, defined as f := ‖ε‖²₂:
  f = ‖W ⊙ (U V*^T − M)‖²_F.
- Ordinary gradient g ∈ R^{mr}, defined as g := df/du:
  ½ g = Ṽ^T ε = vec((W ⊙ R*) V*).
- Projected gradient g_p ∈ R^{mr}, defined as g_p := P_p g with P_p := I_{mr} − I_r ⊗ UU^T:
  ½ g_p = ½ g.
- Reduced gradient g_r ∈ R^{mr−r²}, defined as g_r := P_r g with P_r := I_r ⊗ U_⊥^T:
  ½ g_r = ½ P_r g.
- Ordinary Hessians H_GN, H ∈ S^{mr}, defined as ½ H_GN := J^T J + λI and ½ H := ½ dg/du + λI:
  ½ H_GN = Ṽ^T (I_p − [Ũ Ũ^†]_{RW2}) Ṽ + [K_{mr}^T Z (Ũ^T Ũ)^{-1} Z^T K_{mr}]_{RW1} + λ I_{mr},
  ½ H = ½ H_GN − [2 K_{mr}^T Z (Ũ^T Ũ)^{-1} Z^T K_{mr}]_{FN} − [K_{mr}^T Z Ũ^† Ṽ + Ṽ^T Ũ^{†T} Z^T K_{mr}]_{FN} + λ I_{mr}.
- Projected Hessians H_pGN, H_p ∈ S^{mr}, defined as ½ H_pGN := P_p J^T J P_p + λI and ½ H_p := ½ P_p (dg_p/du) P_p + λI, with P_p := I_r ⊗ (I_m − UU^T):
  ½ H_pGN = ½ H_GN,
  ½ H_p = ½ H_pGN − [2 K_{mr}^T Z (Ũ^T Ũ)^{-1} Z^T K_{mr}]_{FN} − [K_{mr}^T Z Ũ^† Ṽ P_p + P_p Ṽ^T Ũ^{†T} Z^T K_{mr}]_{FN} + α (I_r ⊗ UU^T) + λ I_{mr}.
- Reduced Hessian H_r ∈ S^{mr−r²}, defined as ½ H_r := ½ P_r H_p P_r^T with P_r := I_r ⊗ U_⊥^T:
  ½ H_r = ½ P_r H_p P_r^T = ½ P_r H P_r^T = ½ P_r (dg/du) P_r^T + λI.
- Ordinary update for U: U ← U − unvec(H^{-1} g).
- Projected update (Grassmann manifold retraction): U ← qf(U − unvec(H_p^{-1} g_p)).
- Reduced update (Grassmann manifold retraction): U ← qf(U − unvec(P_r^T H_r^{-1} g_r)).

Here qf(·) denotes the Q factor of the thin QR decomposition and unvec(·) reshapes an mr-vector back into an m × r matrix.

Table 2: List of derivatives and extensions for un-regularized variable projection on low-rank matrix factorization with missing data, including the terms corresponding to Ruhe and Wedin's algorithms.
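The RW2 rows of Table 2 translate almost line by line into code. The sketch below is our own dense toy implementation of one damped approximate Gauss-Newton (RW2) step, assuming NumPy, a binary weight matrix and a full-column-rank Ũ; it is illustrative only and is not the Ceres implementation used for the experiments in [1].

```python
import numpy as np

def varpro_rw2_step(U, W, M, lam=1e-4):
    """One damped approximate Gauss-Newton (RW2) step on U, following Table 2.
    Dense in the p x mn projection, so only suitable for tiny problems."""
    m, n = M.shape
    r = U.shape[1]
    vec = lambda A: A.reshape(-1, order="F")
    mask = vec(W) > 0
    Wt = np.diag(vec(W))[mask, :]                         # W_tilde, p x mn
    mt = Wt @ vec(M)                                      # m_tilde
    Ut = Wt @ np.kron(np.eye(n), U)                       # U_tilde, p x nr
    v = np.linalg.lstsq(Ut, mt, rcond=None)[0]            # v* = U_tilde^+ m_tilde
    Vs = v.reshape(r, n, order="F").T                     # unvec to V*, n x r
    Vt = Wt @ np.kron(Vs, np.eye(m))                      # V_tilde, p x mr
    eps = Ut @ v - mt                                     # cost vector
    J = Vt - Ut @ np.linalg.lstsq(Ut, Vt, rcond=None)[0]  # (I - U_t U_t^+) V_tilde
    g_half = Vt.T @ eps                                   # (1/2) g = V_tilde^T eps
    H_half = J.T @ J + lam * np.eye(m * r)                # (1/2) H_GN, damped
    dU = np.linalg.solve(H_half, g_half).reshape(m, r, order="F")
    return U - dU, Vs                                     # U <- U - unvec(H^{-1} g)

# Example usage on a small random problem.
rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
M = rng.standard_normal((m, n))
W = (rng.random((m, n)) < 0.8).astype(float)
U, V_star = varpro_rw2_step(rng.standard_normal((m, r)), W, M)
```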
4. Additional symmetry of the Gauss-Newton matrix for un-regularized VarPro

To observe the additional symmetry, we first need to know how the un-regularized VarPro problem can be reformulated in terms of column-wise derivatives. Below is a summary of the relevant derivatives from [2].

4.1. Summary of column-wise derivatives for VarPro

The un-regularized objective f(u, v*(u)) can be defined as the sum of the squared norms of the columns of R*(u, v*(u)), i.e.

f(u, v*(u)) := Σ_{j=1}^{n} ‖W̃_j (U v*_j − m_j)‖²₂,  (20)

where v*_j is the j-th row of V*(U) written as a column vector, m_j is the j-th column of M, and W̃_j consists of the non-zero rows of diag(vec(w_j)), where w_j ∈ R^m is the j-th column of W. Hence, W̃_j is a p_j × m weight-projection matrix, where p_j is the number of non-missing entries in column j. Now define the following:

Ũ_j := W̃_j U,  (21)
m̃_j := W̃_j m_j, and  (22)
ε_j := Ũ_j v*_j − m̃_j.  (23)

Hence,

v* := arg min_v Σ_{j=1}^{n} ‖Ũ_j v_j − m̃_j‖²₂,  (24)

which shows that each row of V* is independent of the other rows, so that

v*_j = arg min_{v_j} ‖Ũ_j v_j − m̃_j‖²₂ = Ũ_j^† m̃_j.  (25)

4.1.1. Some definitions

We define

Ṽ_j := W̃_j (v*_j^T ⊗ I_m) ∈ R^{p_j × mr}, and  (26)
Z_j := (w_j ⊙ r_j) ⊗ I_r ∈ R^{mr × r},  (27)

where r_j is the j-th column of R*.

4.1.2. Column-wise Jacobian J_j

The Jacobian matrix is also independent for each column. From [2], the Jacobian matrix for column j is

J_j = (I − Ũ_j Ũ_j^†) Ṽ_j − Ũ_j^{†T} Z_j^T K_{mr}  (28)
= (I − Ũ_j Ũ_j^†) Ṽ_j − Ũ_j ((Ũ_j^T Ũ_j)^{-1} ⊗ (w_j ⊙ r_j)^T).  (29)

4.1.3. The column-wise form of the Hessian H

From [2], the column-wise form of H is

½ H = Σ_{j=1}^{n} [ Ṽ_j^T (I − [Ũ_j Ũ_j^†]_{RW2}) Ṽ_j + [K_{mr}^T Z_j (Ũ_j^T Ũ_j)^{-1} Z_j^T K_{mr}]_{RW1} − [2 K_{mr}^T Z_j (Ũ_j^T Ũ_j)^{-1} Z_j^T K_{mr}]_{FN} − [Ṽ_j^T Ũ_j^{†T} Z_j^T K_{mr} + K_{mr}^T Z_j Ũ_j^† Ṽ_j]_{FN} ] + λI.  (30)

All instances of Z_j^T K_{mr} can be replaced with I_r ⊗ (w_j ⊙ r_j)^T using (8).
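The row-wise independence in (24)–(25) is easy to confirm numerically: solving for each v*_j from the visible entries of column j alone reproduces the joint solution v* = Ũ^† m̃ of Section 3. The check below is our own sketch, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 8, 5, 2
M = rng.standard_normal((m, n))
W = (rng.random((m, n)) < 0.9).astype(float)
U = rng.standard_normal((m, r))

# Column-wise solves: v*_j = U_tilde_j^+ m_tilde_j, one small p_j x r problem each.
V_star = np.empty((n, r))
for j in range(n):
    rows = W[:, j] > 0                        # visible entries of column j
    Uj = W[rows, j][:, None] * U[rows, :]     # U_tilde_j = W_tilde_j U
    mj = W[rows, j] * M[rows, j]              # m_tilde_j
    V_star[j] = np.linalg.lstsq(Uj, mj, rcond=None)[0]

# The joint solve from Section 3 gives the same vec(V*^T).
vec = lambda A: A.reshape(-1, order="F")
mask = vec(W) > 0
Wt = np.diag(vec(W))[mask, :]
v = np.linalg.lstsq(Wt @ np.kron(np.eye(n), U), Wt @ vec(M), rcond=None)[0]
assert np.allclose(vec(V_star.T), v)
```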
4.2. Symmetry analysis

As derived in [2], the Gauss-Newton matrix for un-regularized VarPro is identical to its projection onto the tangent space of U. The column-wise representation of this is

½ H_GN = Σ_{j=1}^{n} [ Ṽ_j^T (I − [Ũ_j Ũ_j^†]_{RW2}) Ṽ_j + [K_{mr}^T Z_j (Ũ_j^T Ũ_j)^{-1} Z_j^T K_{mr}]_{RW1} ] + λI.  (31)

Simplifying the first term yields

Ṽ_j^T Ṽ_j = (v*_j ⊗ I_m) W̃_j^T W̃_j (v*_j^T ⊗ I_m)  (32)
= (v*_j ⊗ W̃_j^T)(v*_j^T ⊗ W̃_j) = v*_j v*_j^T ⊗ W̃_j^T W̃_j,  / noting (5)  (33)

which is the Kronecker product of two symmetric matrices. Now defining Ũ_jQ := qf(Ũ_j) gives Ũ_j Ũ_j^† = Ũ_jQ Ũ_jQ^T. Hence, simplifying the second term yields

Ṽ_j^T (Ũ_j Ũ_j^†) Ṽ_j = (v*_j ⊗ W̃_j^T)(Ũ_jQ Ũ_jQ^T)(v*_j^T ⊗ W̃_j)  (34)
= v*_j v*_j^T ⊗ W̃_j^T Ũ_jQ Ũ_jQ^T W̃_j.  / noting (5)  (35)

Given the thin QR decomposition Ũ_j = Ũ_jQ Ũ_jR, we can transform (Ũ_j^T Ũ_j)^{-1} to Ũ_jR^{-1} Ũ_jR^{-T}. The last term then becomes

K_{mr}^T Z_j (Ũ_j^T Ũ_j)^{-1} Z_j^T K_{mr} = (I_r ⊗ (w_j ⊙ r_j)) Ũ_jR^{-1} Ũ_jR^{-T} (I_r ⊗ (w_j ⊙ r_j)^T)  / noting (8)  (36)
= Ũ_jR^{-1} Ũ_jR^{-T} ⊗ (w_j ⊙ r_j)(w_j ⊙ r_j)^T.  / noting (5)  (37)

Hence, all the terms inside the square brackets in (31) can be represented as Kronecker products of two symmetric matrices, which have an additional internal symmetry that can be exploited. This means that computing the Gauss-Newton matrix for un-regularized VarPro can be sped up by a factor of 4 at most.

5. Computing mean time to second success (MTSS)

As mentioned in [1], computing MTSS is based on the assumption that there exists a known best optimum on a dataset, to which thousands or millions of past runs have converged. Given such a dataset, we first draw a set of random starting points, usually between 20 and 100, from an isotropic normal distribution (i.e. U = randn(m, r) with some predefined seed). For each sample of starting points, we run each algorithm and observe two things: 1. whether the algorithm has successfully converged to the best-known optimum, and 2. the elapsed time. Since the samples of starting points have been drawn in a pseudo-random manner with a monotonically-increasing seed number, we can estimate how long it would have taken each algorithm to observe two successes starting from each sample number. The average of these times yields the MTSS. An example is included in the next section to demonstrate this procedure more clearly.

5.1. An example illustration

Table 3 shows some experimental data produced by an algorithm on a dataset whose best-known optimum is 1.225. For each sample of starting points, the final cost and the elapsed time have been recorded. We first determine which samples have yielded a success, which is defined as convergence to the best-known optimum (1.225). Then, for each sample number, we can calculate up to which sample number we would have had to run to obtain two successes. E.g., when starting from sample number 5, we would have needed to run the algorithm up to sample number 7. This allows us to estimate the times required for two successes starting from each sample number (see Table 3). MTSS is simply the average of these times, 432.4/7 = 61.8.
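The computation just described fits in a few lines. The sketch below is ours, assuming NumPy; the success pattern mirrors Table 3 (successes at samples 2, 4, 6, 7 and 9), while the per-run times are made up for illustration.

```python
import numpy as np

def mtss(success, elapsed):
    """Mean time to second success: starting from each sample, accumulate
    elapsed times over consecutive runs until two successes are seen, then
    average over all starting samples that reach two successes."""
    times = []
    for i in range(len(success)):
        need, t = 2, 0.0
        for ok, dt in zip(success[i:], elapsed[i:]):
            t += dt
            need -= int(ok)
            if need == 0:
                times.append(t)
                break
    return float(np.mean(times))

success = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]           # S at samples 2, 4, 6, 7, 9
elapsed = [31.0, 12.5, 44.0, 20.0, 35.5, 28.0, 15.0, 52.0, 18.5, 40.0]
print(mtss(success, elapsed))                       # average over the 7 valid starts
```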
| Sample no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Final cost | … | … | … | … | … | … | … | … | … | … |
| Elapsed time (s) | … | … | … | … | … | … | … | … | … | … |
| Success |  | S |  | S |  | S | S |  | S |  |
| Sample numbers of next 2 successes | 2 & 4 | 2 & 4 | 4 & 6 | 4 & 6 | 6 & 7 | 6 & 7 | 7 & 9 | N/A | N/A | N/A |
| Time required for 2 successes (s) | … | … | … | … | … | … | … | N/A | N/A | N/A |

Table 3: The above example data is used to compute the MTSS of an algorithm. We assume that each sample starts from randomly-drawn initial points and that the best-known optimum for this dataset is 1.225, found after millions of trials. In this case, MTSS is the average of the bottom-row values, which is 61.8 s.

6. Results

The supplementary package includes a set of csv files that were used to generate the results figures in [1]. They can be found in the <main>/results directory.

6.1. Limitations

Some results are incomplete for the following reasons:

1. RTRMC [3] fails to run on the FAC and Scu datasets, and has also failed on several occasions when running on other datasets. The suspected cause of such failures is the algorithm's use of Cholesky decomposition, which cannot factorize matrices that are not positive definite.
2. The original Damped Wiberg code (TO DW) [7] fails to run on the FAC and Scu datasets. The way in which the code implements QR decomposition requires datasets to be trimmed in advance so that each column of the measurement matrix has at least as many visible entries as the estimated rank of the factorization model.
3. CSF [4] runs extremely slowly on the UGb dataset.
4. Our Ceres-implemented algorithms are not yet able to take in sparse measurement matrices, which is required to handle large datasets such as NET.

References

[1] Authors. Secrets of matrix factorization: Approximations, numerics and manifold optimization. 2015.
[2] Authors. Secrets of matrix factorization: Further derivations and comparisons. Technical report, further.pdf, 2015.
[3] N. Boumal and P.-A. Absil. RTRMC: A Riemannian trust-region method for low-rank matrix completion. In Advances in Neural Information Processing Systems 24 (NIPS 2011), pages 406–414, 2011.
[4] P. F. Gotardo and A. M. Martinez. Computing smooth time trajectories for camera and deformable shape in structure from motion with occlusion. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(10):2051–2065, Oct. 2011.
[5] J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, 3rd edition, 2007.
[6] T. P. Minka. Old and new matrix algebra useful for statistics. Technical report, Microsoft Research, 2000.
[7] T. Okatani, T. Yoshida, and K. Deguchi. Efficient algorithm for low-rank matrix factorization with missing components and performance comparison of latest algorithms. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), pages 842–849, 2011.