Volodymyr Kuleshov · Arun Tejasvi Chaganty · Percy Liang. May 11, 2015
Transcription
1 Tensor Factorization via Matrix Factorization. Volodymyr Kuleshov · Arun Tejasvi Chaganty · Percy Liang. Stanford University. May 11, 2015.
2-6 Introduction: tensor factorization. An application: community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013). [Figure build-up: a network over nodes a, b, c, d with unknown community memberships; the memberships are recovered via a tensor decomposition.]
7 Introduction: tensor factorization. Applications of tensor factorization:
- Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)
- Parsing (Cohen, Satta, and Collins 2013)
- Knowledge base completion (Chang et al.; Singh, Rocktäschel, and Riedel 2015)
- Topic modelling (Anandkumar, Foster, et al.)
- Crowdsourcing (Zhang et al.)
- Mixture models (Anandkumar, Ge, Hsu, S. M. Kakade, et al.)
- Bottlenecked models (Chaganty and Liang 2014)
- ...
8-11 Introduction: tensor factorization. What is tensor (CP) factorization?
- Tensor analogue of the matrix eigendecomposition. Matrix case: $M = \sum_{i=1}^k \pi_i \, u_i u_i^\top$; tensor case: $T = \sum_{i=1}^k \pi_i \, u_i \otimes u_i \otimes u_i + R$.
- Goal: given $T$ with noise $R$, recover the factors $u_i$ (a code sketch of this setup follows).
- [Figure: rank-one expansions of an orthogonal and a non-orthogonal tensor.]
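To make the setup concrete, here is a minimal NumPy sketch (our illustration, not the authors' code; the function name and parameter choices are ours) that builds a synthetic orthogonal CP tensor of exactly this form:

```python
import numpy as np

def random_orthogonal_cp_tensor(d, k, noise=0.01, seed=0):
    """Return T = sum_i pi_i * u_i (x) u_i (x) u_i + R with orthonormal u_i."""
    rng = np.random.default_rng(seed)
    # Orthonormal factors u_1..u_k: first k columns of a random orthogonal matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    U = Q[:, :k]
    pi = rng.uniform(1.0, 2.0, size=k)             # positive weights pi_i
    T = np.einsum('r,ir,jr,kr->ijk', pi, U, U, U)  # sum_i pi_i u_i^{(x)3}
    R = noise * rng.standard_normal((d, d, d))     # noise tensor R
    return T + R, U, pi
```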
12-14 Introduction: tensor factorization. Existing tensor factorization algorithms:
- Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013): analogue of the matrix power method; sensitive to noise; restricted to orthogonal tensors.
- Alternating least squares (Comon, Luciani, and Almeida 2009; Anandkumar, Ge, and Janzamin 2014): sensitive to initialization.
- Our approach: reduce to existing fast and robust matrix algorithms.
15 Outline: Introduction: tensor factorization · Orthogonal tensor factorization · Projections · Non-orthogonal tensor factorization · Related work · Empirical results · Conclusions.
16-19 Orthogonal tensor factorization. Tensor factorization via a single matrix factorization:
- $T = \pi_1 u_1^{\otimes 3} + \pi_2 u_2^{\otimes 3} + \pi_3 u_3^{\otimes 3} + R$; absorbing the weights, $T = u_1^{\otimes 3} + u_2^{\otimes 3} + u_3^{\otimes 3}$.
- Project along a vector $w$: $T(I, I, w) = (w^\top u_1)\, u_1 u_1^\top + (w^\top u_2)\, u_2 u_2^\top + (w^\top u_3)\, u_3 u_3^\top$.
- Proposal: run an eigendecomposition on the projected matrix.
- Return: the recovered eigenvectors $u_i$ (see the sketch after this list).
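A sketch of this single-projection reduction, reusing the tensor builder above (`project` is our name for the contraction $T(I, I, w)$; all of this is illustrative, not the authors' code):

```python
def project(T, w):
    """T(I, I, w)[a, b] = sum_c T[a, b, c] * w[c]."""
    return np.einsum('abc,c->ab', T, w)

T, U_true, pi = random_orthogonal_cp_tensor(d=10, k=4)
w = np.random.default_rng(1).standard_normal(10)
M = project(T, w)                     # = sum_i pi_i (w . u_i) u_i u_i^T + noise
M = (M + M.T) / 2                     # symmetrize away noise asymmetry
eigvals, eigvecs = np.linalg.eigh(M)  # eigenvectors estimate the factors u_i
```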
20-22 Orthogonal tensor factorization. Sensitivity of the single matrix projection:
- Problem: the eigendecomposition is very sensitive to the eigengap: error in factors $\propto 1 / \min(\text{difference in eigenvalues})$.
- Intuition: if two eigenvalues are equal, the corresponding eigenvectors are arbitrary (a numeric illustration follows this list). [Figure: a matrix split into rank-one terms.]
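A toy numeric illustration of this sensitivity (our example, with assumed magnitudes): when the top two eigenvalues nearly coincide, a perturbation far smaller than the eigenvalues themselves still mixes the corresponding eigenvectors.

```python
rng = np.random.default_rng(2)
D = np.diag([1.0, 1.0 + 1e-6, 3.0])   # near-degenerate leading pair
E = 1e-4 * rng.standard_normal((3, 3))
E = (E + E.T) / 2                      # small symmetric perturbation
_, V0 = np.linalg.eigh(D)
_, V1 = np.linalg.eigh(D + E)
# Eigenvectors move by O(perturbation / eigengap): the off-diagonal mass
# in the top-left 2x2 block shows the first two eigenvectors mixing.
print(np.round(np.abs(V0.T @ V1), 3))
```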
23-34 Orthogonal tensor factorization. Sensitivity analysis (contd.) (Cardoso 1994):
- Single matrix factorization: error in factors $\propto 1 / \min(\text{difference in eigenvalues})$.
- Simultaneous matrix factorization: error in factors $\propto 1 / \min(\text{average difference in eigenvalues})$. Every coordinate pair needs only one good projection (with a large eigengap).
35-37 Orthogonal tensor factorization. Reduction to simultaneous diagonalization:
- $M_1 := T(I, I, w_1) = (w_1^\top u_1)\, u_1 u_1^\top + (w_1^\top u_2)\, u_2 u_2^\top + (w_1^\top u_3)\, u_3 u_3^\top$
- $M_\ell := T(I, I, w_\ell) = (w_\ell^\top u_1)\, u_1 u_1^\top + (w_\ell^\top u_2)\, u_2 u_2^\top + (w_\ell^\top u_3)\, u_3 u_3^\top$
- The projections share factors, so they can be simultaneously diagonalized.
38-42 Orthogonal tensor factorization. Simultaneous diagonalization algorithm:
- Algorithm: simultaneously diagonalize the projected matrices:
  $U = \arg\min_{U : U U^\top = I} \sum_{\ell=1}^L \mathrm{off}(U^\top M_\ell U)$, where $\mathrm{off}(A) = \sum_{i \ne j} A_{ij}^2$.
- Optimize using Jacobi angles (Cardoso and Souloumiac 1996); a re-implementation sketch follows this list.
- Widely used in the ICA community.
- Empirically, it appears to converge globally for generic inputs.
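For concreteness, here is a compact sketch of Jacobi-angles joint diagonalization for real symmetric matrices, using the closed-form Givens rotation of Cardoso and Souloumiac (1996). This is our illustrative re-implementation, not the authors' released code:

```python
def off(A):
    """Sum of squared off-diagonal entries."""
    return np.sum(A ** 2) - np.sum(np.diag(A) ** 2)

def joint_diagonalize(Ms, sweeps=100, tol=1e-12):
    """Return orthogonal U with U.T @ M @ U nearly diagonal for every M in Ms."""
    Ms = [M.copy() for M in Ms]
    d = Ms[0].shape[0]
    U = np.eye(d)
    for _ in range(sweeps):
        changed = False
        for p in range(d - 1):
            for q in range(p + 1, d):
                # Closed-form angle minimizing joint off-diagonal mass for (p, q).
                h = np.array([[M[p, p] - M[q, q], M[p, q] + M[q, p]] for M in Ms])
                G = h.T @ h
                _, evecs = np.linalg.eigh(G)
                x, y = evecs[:, -1]                # principal eigenvector of G
                if x < 0:
                    x, y = -x, -y
                r = np.hypot(x, y)
                if r < 1e-15:
                    continue
                c = np.sqrt((x + r) / (2 * r))     # cos(theta)
                s = y / np.sqrt(2 * r * (x + r))   # sin(theta)
                if abs(s) > tol:
                    changed = True
                    for M in Ms:  # rotate rows then columns p, q: M <- R M R^T
                        M[[p, q], :] = np.array([c * M[p, :] + s * M[q, :],
                                                 -s * M[p, :] + c * M[q, :]])
                        M[:, [p, q]] = np.array([c * M[:, p] + s * M[:, q],
                                                 -s * M[:, p] + c * M[:, q]]).T
                    U[:, [p, q]] = np.array([c * U[:, p] + s * U[:, q],
                                             -s * U[:, p] + c * U[:, q]]).T
        if not changed:
            break
    return U
```

On matrices that share exact eigenvectors, the sweeps drive every $\mathrm{off}(U^\top M_\ell U)$ to zero; with noise they settle at a joint approximate diagonalizer.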
44-45 Orthogonal tensor factorization: Projections. Oracle and random projections:
- Hypothetically: oracle projections along the factor directions work well.
- Practically: use projections along random directions (a pipeline sketch follows this list).
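Putting the pieces together (reusing the helpers defined above; the choice of roughly $k \log k$ projections follows the theorem on the next slide), a sketch of the full random-projection pipeline:

```python
d, k = 10, 4
T, U_true, pi = random_orthogonal_cp_tensor(d, k)
rng = np.random.default_rng(3)
L = int(np.ceil(k * np.log(k))) + 1
ws = rng.standard_normal((L, d))
ws /= np.linalg.norm(ws, axis=1, keepdims=True)  # random unit-sphere directions
Ms = []
for w in ws:
    M = project(T, w)                            # M_l = T(I, I, w_l)
    Ms.append((M + M.T) / 2)
U_hat = joint_diagonalize(Ms)
# Up to sign and permutation, k columns of U_hat align with the true factors:
print(np.round(np.abs(U_hat.T @ U_true), 2))
```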
46-49 Orthogonal tensor factorization: Projections. Results: orthogonal tensor decomposition, $T = \sum_{i=1}^k \pi_i u_i^{\otimes 3} + R$.
Theorem (Random projections). Pick $L = \Omega(k \log k)$ projections uniformly at random from the unit sphere. Then, with high probability,
$$\text{error in factors} \le O\left( \frac{\|\pi\|_1 \, \pi_{\max}}{\pi_{\min}^2} \left( 1 + \sqrt{\frac{d}{L}} \right) \right),$$
where the leading factor is the oracle error and the $\sqrt{d/L}$ term is the concentration penalty for using random rather than oracle projections.
50 Orthogonal tensor factorization: Projections. Empirical: random vs. oracle projections. [Figure: error vs. number of projections in the orthogonal case, comparing random and oracle projections ($d = k = 10$, noise level $0.05$).]
52-53 Non-orthogonal tensor factorization. Non-orthogonal simultaneous diagonalization:
- $M_1 := T(I, I, w_1) = (w_1^\top u_1)\, u_1 u_1^\top + (w_1^\top u_2)\, u_2 u_2^\top + (w_1^\top u_3)\, u_3 u_3^\top$
- $M_\ell := T(I, I, w_\ell) = (w_\ell^\top u_1)\, u_1 u_1^\top + (w_\ell^\top u_2)\, u_2 u_2^\top + (w_\ell^\top u_3)\, u_3 u_3^\top$
- There is no unique non-orthogonal factorization of a single matrix.
- Two or more matrices do have a unique non-orthogonal factorization (see the illustration after this list).
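To see why two matrices suffice (a standard observation; the toy code is ours): if $M_1 = U D_1 U^\top$ and $M_2 = U D_2 U^\top$ share non-orthogonal factors $U$, then $M_1 M_2^{-1} = U (D_1 D_2^{-1}) U^{-1}$, so an ordinary eigendecomposition of $M_1 M_2^{-1}$ recovers the columns of $U$:

```python
rng = np.random.default_rng(4)
k = 4
Uf = rng.standard_normal((k, k))   # non-orthogonal factors (full rank w.h.p.)
D1 = np.diag(rng.uniform(1.0, 2.0, k))
D2 = np.diag(rng.uniform(1.0, 2.0, k))
M1, M2 = Uf @ D1 @ Uf.T, Uf @ D2 @ Uf.T
evals, V = np.linalg.eig(M1 @ np.linalg.inv(M2))
# Columns of V match columns of Uf up to scale, sign, and permutation.
```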
54-57 Non-orthogonal tensor factorization. Non-orthogonal simultaneous diagonalization:
- Algorithm: simultaneously diagonalize the projected matrices:
  $U = \arg\min_{U} \sum_{\ell=1}^L \mathrm{off}(U^{-1} M_\ell U^{-\top})$, where $\mathrm{off}(A) = \sum_{i \ne j} A_{ij}^2$.
- The matrices $U$ are not constrained to be orthogonal (a sketch of the criterion follows this list).
- Optimize using the QRJ1D algorithm (Afsari 2006): only guaranteed to converge locally, but more stable than ALS in practice.
- Sensitivity analysis due to Afsari 2008.
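A sketch of just the criterion from the slide (our code; the QRJ1D optimizer itself is considerably more involved):

```python
def nonorthogonal_objective(U, Ms):
    """Joint off-diagonal mass of inv(U) @ M @ inv(U).T over all M in Ms."""
    Uinv = np.linalg.inv(U)
    return sum(off(Uinv @ M @ Uinv.T) for M in Ms)

# For the exact two-matrix example above, the true factors achieve zero:
print(nonorthogonal_objective(Uf, [M1, M2]))  # ~ 0 up to round-off
```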
58-60 Non-orthogonal tensor factorization. Results: non-orthogonal tensor decomposition.
Theorem (Random projections). Pick $L = \Omega(k \log k)$ projections uniformly at random from the unit sphere. Then, with high probability,
$$\text{error in factors} \le O\left( \frac{\|U\|_2^2}{(1 - \mu)^2} \cdot \frac{\|\pi\|_1 \, \pi_{\max}}{\pi_{\min}^2} \left( 1 + \sqrt{\frac{d}{L}} \right) \right),$$
where $U = [u_1 \dots u_k]$, $\mu = \max_{i \ne j} u_i^\top u_j$, and the leading factor $\|U\|_2^2 / (1 - \mu)^2$ measures non-orthogonality; the remainder matches the orthogonal bound.
62-64 Related work. Notes and related work:
- Orthogonal tensor methods can factorize non-orthogonal tensors via a whitening transformation (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013), but the whitening step is itself a major source of error (Souloumiac 2009).
- Simultaneous diagonalization for tensors was proposed by De Lathauwer; it relies on computing the SVD of a $d^4 \times k^2$ matrix.
- Simultaneous diagonalization over multiple projections is mentioned in Anandkumar, Ge, Hsu, S. M. Kakade, et al., but no analysis is presented.
66 Empirical results. Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013). [Figure: the community-detection setup from the introduction: a network over nodes a, b, c, d and its tensor decomposition.]
67-69 Empirical results. Community detection. [Figure: error vs. recall on the community-detection task, comparing the tensor power method (TPM) and orthogonal joint diagonalization (OJD).]
70 Empirical results. Crowdsourcing (Zhang et al.). [Figure: crowdsourcing setup; workers a, b, c give Y/N labels to items.]
71-74 Empirical results. Crowdsourcing (Zhang et al.). [Figure: label-prediction accuracy on the web, rte, birds, and dogs datasets, comparing TPM, ALS, OJD, NOJD, and MV+EM.]
75-79 Conclusions:
- Reduce tensor problems to matrix ones with random projections.
- Empirically competitive with the state of the art, with support for non-orthogonal, asymmetric tensors of arbitrary order.
- Open question: is the Jacobi angles algorithm for orthogonal simultaneous diagonalization generically globally convergent?
- Github:
- Codalab:
- Thanks! Questions?