矩阵论 (Matrix Theory), 马锦华 (Ma Jinhua), School of Data and Computer Science (数据科学与计算机学院), Sun Yat-sen University (中山大学)


1 矩阵论 (Matrix Theory), 马锦华 (Ma Jinhua), School of Data and Computer Science, Sun Yat-sen University

2 About The Instructor
Education
- Bachelor's and master's degrees in mathematics, SYSU
- PhD in computer science, HKBU
Research experience
- Postdoc at Rutgers
- Postdoc at Johns Hopkins
- Postdoc at HKBU
Research interests
- Machine learning: feature fusion, transfer learning, etc.
- Computer vision: intelligent video surveillance, etc.
- Medical data analysis: diagnosis and prediction models, etc.

3 About This Course
Instructor's contact
- Office: Room 529C, 5th floor, Supercomputing Center (超算中心)
- Email: majh8@mail.sysu.edu.cn
Teaching assistant
- Sun Shuting (孙淑婷), sunsht3@mail2.sysu.edu.cn
Lecture hours and venue
- Monday, periods 3-4, Room C104
- Thursday, periods 5-6, Room C304
Prerequisites
- Linear Algebra, Advanced Mathematics

4 About This Course
Textbook
- 矩阵论简明教程 (A Concise Course in Matrix Theory), Xu Zhong, Zhang Kaiyuan, et al., 3rd edition, Science Press (科学出版社)
Recommended readings
- Roger A. Horn and Charles R. Johnson, Matrix Analysis (Second Edition), Cambridge University Press, 2013
- Gene H. Golub and Charles F. Van Loan, Matrix Computations (Fourth Edition), Johns Hopkins University Press, 2013
Assessment
- 6 assignments (60%), final exam (40%), attendance

5 About This Course
Course contents
- Similarity transformations of matrices (矩阵的相似变换)
- Norm theory (范数理论)
- Matrix analysis (矩阵分析)
- Matrix decompositions (矩阵分解)
- Estimation and representation of eigenvalues (特征值的估计与表示)
- Generalized inverse matrices (广义逆矩阵)
- Linear spaces and linear transformations (线性空间与线性变换)
Applications
- Machine learning, computer vision, systems and control, ...
Slides
- Available on the course website

6 A Glimpse of Topics
Source of the overview slides below: W.-K. Ma, ENGG5781 Matrix Analysis and Computations, CUHK.

7 Eigenvalue Problem
Problem: given $A \in \mathbb{R}^{n \times n}$, find a nonzero $v \in \mathbb{R}^n$ such that $Av = \lambda v$ for some scalar $\lambda$.
Eigendecomposition: let $A \in \mathbb{R}^{n \times n}$ be symmetric, i.e., $a_{ij} = a_{ji}$ for all $i, j$. Then $A$ admits a decomposition $A = Q \Lambda Q^T$, where $Q \in \mathbb{R}^{n \times n}$ is orthogonal, i.e., $Q^T Q = I$, and $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$.
- widely used, both as an analysis tool and as a computational tool
- no closed form in general; can be numerically computed
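A minimal numerical sketch of the eigendecomposition above (added here for illustration; assumes NumPy is available, and the test matrix is random):

    import numpy as np

    # Build a small random symmetric matrix A = (B + B^T) / 2.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2

    # eigh is specialized for symmetric matrices: it returns eigenvalues in
    # ascending order and orthonormal eigenvectors as the columns of Q.
    lam, Q = np.linalg.eigh(A)

    # Verify A = Q Diag(lam) Q^T and Q^T Q = I numerically.
    print(np.allclose(A, Q @ np.diag(lam) @ Q.T))   # True
    print(np.allclose(Q.T @ Q, np.eye(5)))          # True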

8 Application Example: PageRank
PageRank is an algorithm used by Google to rank the pages of a search result.
- the idea is to use counts of links between pages to determine page importance.
Source: Wikipedia.

9 One-Page Explanation of How PageRank Works
Model: $v_i = \sum_{j \in L_i} \frac{v_j}{c_j}$, for $i = 1, \ldots, n$,
where $c_j$ is the number of outgoing links from page $j$; $L_i$ is the set of pages with a link to page $i$; $v_i$ is the importance score of page $i$.
- as a toy example with four pages, the model can be written compactly as $v = Av$, where the $(i, j)$ entry of $A$ equals $1/c_j$ if page $j$ links to page $i$ and $0$ otherwise.
- finding $v$ is an eigenvalue problem, with $n$ of the order of millions!
- further reading: [Bryan-Tanya2006]
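A small sketch of how the score vector can be computed in practice (my own illustration, not part of the slides; the 4-page link matrix below is made up): for very large n one typically uses power iteration, repeatedly applying A, rather than a full eigendecomposition.

    import numpy as np

    # Column-stochastic link matrix of a toy 4-page web: column j spreads
    # page j's score equally over the pages it links to (entries 1/c_j).
    A = np.array([[0,   0,   1/2, 0  ],
                  [1/3, 0,   0,   1/2],
                  [1/3, 1/2, 0,   1/2],
                  [1/3, 1/2, 1/2, 0  ]])

    v = np.full(4, 1/4)        # start from uniform scores
    for _ in range(100):       # power iteration: v <- A v, renormalized
        v = A @ v
        v /= v.sum()

    print(v)                   # approximate importance scores (sum to 1)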

10 Least Squares (LS)
Problem: given $A \in \mathbb{R}^{m \times n}$ and $y \in \mathbb{R}^m$, solve $\min_{x \in \mathbb{R}^n} \|y - Ax\|_2^2$, where $\|\cdot\|_2$ is the Euclidean norm, i.e., $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$.
- widely used in science, engineering, and mathematics
- assuming a tall and full-rank $A$, the LS solution is uniquely given by $x_{LS} = (A^T A)^{-1} A^T y$.
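A quick numerical check of the closed-form solution (illustrative only; random data): the normal-equations formula agrees with a library least-squares solver when A is tall and full rank.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 3))   # tall matrix, full rank with probability 1
    y = rng.standard_normal(50)

    # Closed-form LS solution via the normal equations (A^T A) x = A^T y.
    x_normal = np.linalg.solve(A.T @ A, A.T @ y)

    # Library LS solver (numerically preferable in practice).
    x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

    print(np.allclose(x_normal, x_lstsq))   # True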

11 Application Example: Linear Prediction (LP)
Let $y[0], y[1], \ldots$ be a time series.
Model (autoregressive (AR) model): $y[n] = a_1 y[n-1] + a_2 y[n-2] + \cdots + a_L y[n-L] + w[n]$, $n = 0, 1, 2, \ldots$, for some coefficients $\{a_i\}_{i=1}^L$, where $w[n]$ is noise or modeling error.
Problem: estimate $\{a_i\}_{i=1}^L$ from $\{y[n]\}$; can be formulated as LS (see the sketch below).
Applications: time-series prediction, speech analysis and coding, spectral estimation, ...
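One way the LS formulation can be set up (a sketch under my own choice of indexing; the AR(2) test signal is synthetic): stack the model equations for n = L, ..., N-1 into a linear system and solve for the coefficient vector by least squares.

    import numpy as np

    def ar_coefficients(y, L):
        """Estimate AR(L) coefficients from a 1-D series y by least squares."""
        N = len(y)
        # Row for time n holds [y[n-1], ..., y[n-L]]; the target is y[n].
        H = np.array([[y[n - l] for l in range(1, L + 1)] for n in range(L, N)])
        b = y[L:N]
        a, *_ = np.linalg.lstsq(H, b, rcond=None)
        return a

    # Toy check: data generated from a known AR(2) model plus small noise.
    rng = np.random.default_rng(2)
    y = np.zeros(500)
    y[:2] = rng.standard_normal(2)
    for n in range(2, 500):
        y[n] = 0.6 * y[n - 1] - 0.3 * y[n - 2] + 0.05 * rng.standard_normal()

    print(ar_coefficients(y, L=2))   # close to [0.6, -0.3]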

12 A Toy Demo: Predicting the Hang Seng Index
[Figure: Hang Seng Index versus day over a certain time period.]
- blue: the Hang Seng Index during that period.
- red: training phase; the line is $\sum_{l=1}^L a_l y[n-l]$; $a$ is obtained by LS; $L = 10$.
- green: prediction phase; the line is $\hat{y}[n] = \sum_{l=1}^L a_l \hat{y}[n-l]$, using the same $a$ as in the training phase.

13 A Real Example: Real-Time Prediction of Flu Activity
Tracking influenza outbreaks with ARGO, a model combining the AR model and Google search data.
Source: [Yang-Santillana-Kou2015] (PNAS).

14 Low-Rank Matrix Approximation
Problem: given $Y \in \mathbb{R}^{m \times n}$ and an integer $r < \min\{m, n\}$, find a pair $(A, B) \in \mathbb{R}^{m \times r} \times \mathbb{R}^{r \times n}$ such that either $Y = AB$ or $Y \approx AB$.
Formulation: $\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \|Y - AB\|_F^2$, where $\|\cdot\|_F$ is the Frobenius, or matrix Euclidean, norm.
Applications: dimensionality reduction, extracting meaningful features from data, low-rank modeling, ...

15 Application Example: Image Compression
Let $Y \in \mathbb{R}^{m \times n}$ be an image; store the low-rank factor pair $(A, B)$ instead of $Y$.
[Figure: (a) original image; (b) truncated SVD, k = 5; (c) truncated SVD, k = 10; (d) truncated SVD, k = 20.]
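A sketch of the storage saving (illustrative; the image is replaced by a random matrix of a made-up size): keeping a rank-k factor pair (A, B) costs k(m + n) numbers instead of mn.

    import numpy as np

    def truncated_svd_factors(Y, k):
        """Return (A, B) with shapes (m, k) and (k, n) such that Y is approx. A @ B."""
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        A = U[:, :k] * s[:k]      # absorb the singular values into the left factor
        B = Vt[:k, :]
        return A, B

    Y = np.random.default_rng(3).standard_normal((512, 768))  # stand-in for an image
    for k in (5, 10, 20):
        A, B = truncated_svd_factors(Y, k)
        print(k, round((A.size + B.size) / Y.size, 3))  # fraction of original storage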

16 Application Example: Principal Component Analysis (PCA)
Aim: given a set of data points $\{y_1, y_2, \ldots, y_n\} \subset \mathbb{R}^m$ and an integer $k < \min\{m, n\}$, find a low-dimensional representation $y_i = Q c_i + \mu + e_i$, $i = 1, \ldots, n$, where $Q \in \mathbb{R}^{m \times k}$ is a basis, the $c_i$'s are coefficients, $\mu$ is a base (mean) vector, and the $e_i$'s are errors.
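A compact sketch of how such a representation is commonly computed (my own illustration, on random data): center the data, take the top-k left singular vectors as Q, and project to obtain the coefficients.

    import numpy as np

    def pca(Y, k):
        """Y has shape (m, n), one data point per column; returns (Q, C, mu)."""
        mu = Y.mean(axis=1, keepdims=True)     # base (mean) vector
        U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
        Q = U[:, :k]                           # top-k principal directions
        C = Q.T @ (Y - mu)                     # coefficients c_i, one per column
        return Q, C, mu

    Y = np.random.default_rng(4).standard_normal((100, 400))
    Q, C, mu = pca(Y, k=10)
    Y_hat = Q @ C + mu                         # rank-k reconstruction of the data
    print(np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))   # relative error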

17 Toy Demo: Dimensionality Reduction of a Face Image Dataset
A face image dataset: image size 112 x 92, number of face images = 400. Each $x_i$ is the vectorization of one face image, leading to $m = 112 \times 92 = 10304$, $n = 400$.

18 Toy Demo: Dimensionality Reduction of a Face Image Dataset
[Figure: the mean face; the 1st, 2nd, and 3rd principal left singular vectors; the 400th left singular vector; and an energy-versus-rank plot showing the energy concentration in the leading components.]

19 Singular Value Decomposition (SVD)
SVD: any $Y \in \mathbb{R}^{m \times n}$ can be decomposed as $Y = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal, and $\Sigma \in \mathbb{R}^{m \times n}$ takes a diagonal form.
- also a widely used analysis and computational tool; can be numerically computed
- the SVD can be used to solve the low-rank matrix approximation problem $\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \|Y - AB\|_F^2$ (see the sketch below).
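A brief numerical check (not part of the slides) that the truncated SVD solves the Frobenius-norm problem above: the error of the best rank-r approximation equals the root-sum-of-squares of the discarded singular values (the Eckart-Young theorem).

    import numpy as np

    rng = np.random.default_rng(5)
    Y = rng.standard_normal((30, 20))
    r = 5

    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]        # best rank-r approximation

    err = np.linalg.norm(Y - Y_r, 'fro')
    print(np.isclose(err, np.sqrt(np.sum(s[r:] ** 2))))   # True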

20 Why Is Matrix Analysis and Computations Important?
- as noted, areas such as signal processing, image processing, machine learning, optimization, computer vision, control, communications, ..., use matrix operations extensively
- it helps you build the foundations for understanding hot topics such as sparse recovery, structured low-rank matrix approximation, and matrix completion.

21 The Sparse Recovery Problem
Problem: given measurements $y \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times n}$ with $m < n$, find a sparsest $x \in \mathbb{R}^n$ (a vector with few nonzero entries) such that $y = Ax$.
- by sparsest, we mean that $x$ should have as many zero elements as possible.
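One standard way to attack this (a sketch; the slides do not fix an algorithm here): relax the problem to minimizing the l1 norm subject to y = Ax (basis pursuit), which becomes a linear program after splitting x into nonnegative parts. The problem sizes below are toy values, and SciPy is assumed to be available.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(6)
    m, n, k = 20, 50, 3
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    y = A @ x_true

    # Basis pursuit: min ||x||_1 s.t. Ax = y, as an LP in z = [x+; x-] >= 0,
    # with x = x+ - x- and objective sum(x+) + sum(x-).
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=[(0, None)] * (2 * n))
    x_hat = res.x[:n] - res.x[n:]

    print(np.allclose(x_hat, x_true, atol=1e-4))   # typically True in this regime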

22 Application: Magnetic Resonance Imaging (MRI)
Problem: MRI image reconstruction.
[Figure: (a) the original test image; (b) the sampling region in the frequency domain, where Fourier coefficients are sampled along 22 approximately radial lines.]
Source: [Candès-Romberg-Tao2006]

23 Application: Magnetic Resonance Imaging (MRI)
Problem: MRI image reconstruction.
[Figure: (c) the reconstruction obtained by setting the unobserved Fourier coefficients to zero; (d) the reconstruction obtained by a sparse recovery solution.]
Source: [Candès-Romberg-Tao2006]

24 Application: Low-Rank Matrix Completion
Recommendation systems: in 2009, Netflix awarded $1 million to a team that performed best in recommending new movies to users based on their previous preferences.
Let $Z$ be a preference matrix, where $z_{ij}$ records how user $i$ likes movie $j$; for example (rows are users, columns are movies, ? marks a missing entry):
Z =
  2 3 1 ? ? 5
  5 1 ? 4 2 ?
  ? ? ? 3 1 ?
  ? ? 3 ? 1 5
- some entries $z_{ij}$ are missing, since no one watches all movies.
- $Z$ is assumed to be of low rank; research shows that only a few factors affect users' preferences.
Aim: guess the unknown $z_{ij}$'s from the known ones.

25 Low-Rank Matrix Completion
The 2009 Netflix Grand Prize winners used low-rank matrix approximations [Koren-Bell-Volinsky2009].
Formulation (oversimplified): $\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \sum_{(i,j) \in \Omega} (z_{ij} - [AB]_{ij})^2$, where $\Omega$ is an index set that indicates the known entries of $Z$.
- cannot be solved by the SVD
- in the recommendation-system application, it is a large-scale problem
- alternating LS may be used (see the sketch below)
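A bare-bones alternating-LS sketch (my own illustration of the idea, not the prize winners' algorithm; no regularization, synthetic rank-2 data): fix B and solve a small LS problem for each row of A over that row's observed entries, then swap roles.

    import numpy as np

    def als_complete(Z, mask, r, n_iters=50):
        """Complete Z (m x n), observed where mask is True, by rank-r alternating LS."""
        m, n = Z.shape
        rng = np.random.default_rng(7)
        A = rng.standard_normal((m, r))
        B = rng.standard_normal((r, n))
        for _ in range(n_iters):
            for i in range(m):                 # update row i of A
                obs = mask[i]
                A[i] = np.linalg.lstsq(B[:, obs].T, Z[i, obs], rcond=None)[0]
            for j in range(n):                 # update column j of B
                obs = mask[:, j]
                B[:, j] = np.linalg.lstsq(A[obs], Z[obs, j], rcond=None)[0]
        return A @ B

    # Toy test: a rank-2 matrix with roughly half of its entries observed.
    rng = np.random.default_rng(8)
    Z_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
    mask = rng.random((30, 20)) < 0.5
    Z_hat = als_complete(Z_true, mask, r=2)
    print(np.linalg.norm((Z_hat - Z_true)[~mask]) / np.linalg.norm(Z_true[~mask]))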

26 Toy Demonstration of Low-Rank Matrix Completion
[Figure. Left: an incomplete image with 40% of its pixels missing. Right: the low-rank matrix completion result, with r = 120.]

27 Nonnegative Matrix Factorization (NMF)
Aim: we want the factors to be nonnegative.
Formulation: $\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \|Y - AB\|_F^2$ subject to $A \geq 0$, $B \geq 0$, where $X \geq 0$ means that $x_{ij} \geq 0$ for all $i, j$.
- arguably a topic in optimization, but worth noticing
- found (in empirical studies) to be able to extract meaningful features
- numerous applications, e.g., in machine learning, signal processing, and remote sensing
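A compact sketch of the Lee-Seung multiplicative updates used in the toy demonstration later in these slides (illustrative parameter choices; a small epsilon guards against division by zero):

    import numpy as np

    def nmf(Y, r, n_iters=500, eps=1e-9):
        """Lee-Seung multiplicative updates for min ||Y - A B||_F^2 with A, B >= 0."""
        m, n = Y.shape
        rng = np.random.default_rng(9)
        A = rng.random((m, r))
        B = rng.random((r, n))
        for _ in range(n_iters):
            B *= (A.T @ Y) / (A.T @ A @ B + eps)   # update B; stays nonnegative
            A *= (Y @ B.T) / (A @ B @ B.T + eps)   # update A; stays nonnegative
        return A, B

    Y = np.random.default_rng(10).random((40, 60))   # nonnegative toy data
    A, B = nmf(Y, r=5)
    print(np.linalg.norm(Y - A @ B) / np.linalg.norm(Y))   # relative fit error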

28 NMF Examples
Image processing: NMF applied to a database of 2,429 face images with r = 49 basis images learns a parts-based representation. The basis elements are non-global and contain several versions of mouths, noses, and other facial parts, in different locations or forms, and the variability of a whole face is generated by combining these different parts; the basis elements extract facial features such as eyes, noses, and lips.
Source: [Lee-Seung1999] (Nature).

29 Text Mining
- basis elements allow us to recover different topics
- weights allow us to assign each text to its corresponding topics

30 Toy Demonstration of NMF
A face image dataset of n face images, each of size 101 x 101; each $x_i$ is the vectorization of one face image, leading to $m = 101 \times 101 = 10201$.

31 Toy Demonstration of NMF: NMF-Extracted Features
NMF settings: r = 49, Lee-Seung multiplicative update with 5000 iterations.

32 Toy Demonstration of NMF: Comparison with PCA
[Figure: the mean face; the 1st, 2nd, and 3rd principal left singular vectors; the last principal left singular vector; and an energy-versus-rank plot showing the energy concentration.]

33 References
[Yang-Santillana-Kou2015] S. Yang, M. Santillana, and S. C. Kou, "Accurate estimation of influenza epidemics using Google search data via ARGO," Proceedings of the National Academy of Sciences, vol. 112, no. 47, pp. 14473-14478, 2015.
[Bryan-Tanya2006] K. Bryan and T. Leise, "The $25,000,000,000 eigenvector: The linear algebra behind Google," SIAM Review, vol. 48, no. 3, pp. 569-581, 2006.
[Candès-Romberg-Tao2006] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489-509, 2006.
[Koren-Bell-Volinsky2009] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," IEEE Computer, vol. 42, no. 8, pp. 30-37, 2009.
[Lee-Seung1999] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.
