Volodymyr Kuleshov · Arun Tejasvi Chaganty · Percy Liang. May 11, 2015


1 Tensor Factorization via Matrix Factorization. Volodymyr Kuleshov · Arun Tejasvi Chaganty · Percy Liang. Stanford University. May 11, 2015.

2-4 Introduction: tensor factorization. An application: community detection. [Figure: a friendship graph over individuals a, b, c, d]

5 Introduction: tensor factorization. An application: community detection. [Figure: the same graph with unknown ('?') community memberships for a, b, c, d]

6 Introduction: tensor factorization. An application: community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013). [Figure: a moment tensor built from the graph, written as a factorization into community components]

7 Introduction: tensor factorization. Applications of tensor factorization:
- Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)
- Parsing (Cohen, Satta, and Collins 2013)
- Knowledge base completion (Chang et al.; Singh, Rocktäschel, and Riedel 2015)
- Topic modelling (Anandkumar, Foster, et al.)
- Crowdsourcing (Zhang et al.)
- Mixture models (Anandkumar, Ge, Hsu, S. M. Kakade, et al.)
- Bottlenecked models (Chaganty and Liang 2014)
- ...

8-11 Introduction: tensor factorization. What is tensor (CP) factorization?
- Tensor analogue of matrix eigen-decomposition. For a matrix: M = Σ_{i=1}^k π_i u_i u_iᵀ.
- For a third-order tensor: T = Σ_{i=1}^k π_i u_i ⊗ u_i ⊗ u_i + R.
- Goal: given T with noise R, recover the factors u_i.
[Figures: a matrix and a tensor each written as a sum of k rank-one terms; orthogonal vs. non-orthogonal factors]
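To make the setup concrete, here is a minimal NumPy sketch (not from the talk) that builds a synthetic orthogonal rank-k tensor T = Σ_i π_i u_i ⊗ u_i ⊗ u_i plus Gaussian noise; the function name make_orthogonal_tensor and all parameter choices are illustrative assumptions.

```python
import numpy as np

def make_orthogonal_tensor(d=10, k=5, noise=0.05, seed=0):
    """Build T = sum_i pi_i * u_i (x) u_i (x) u_i + noise, with orthonormal u_i."""
    rng = np.random.default_rng(seed)
    # Orthonormal factors: k columns of a random orthogonal basis.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))
    pi = rng.uniform(1.0, 2.0, size=k)            # positive weights pi_i
    # Sum of k rank-one terms pi_i * u_i (x) u_i (x) u_i.
    T = np.einsum('i,ai,bi,ci->abc', pi, U, U, U)
    T += noise * rng.standard_normal((d, d, d))   # additive noise tensor R
    return T, pi, U

T, pi, U = make_orthogonal_tensor()
print(T.shape)   # (10, 10, 10)
```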

12-14 Introduction: tensor factorization. Existing tensor factorization algorithms:
- Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013): analogue of the matrix power method; sensitive to noise; restricted to orthogonal tensors.
- Alternating least squares (Comon, Luciani, and Almeida 2009; Anandkumar, Ge, and Janzamin 2014): sensitive to initialization.
Our approach: reduce to existing fast and robust matrix algorithms.

15 Outline (section: Orthogonal tensor factorization): Introduction: tensor factorization · Orthogonal tensor factorization · Projections · Non-orthogonal tensor factorization · Related work · Empirical results · Conclusions

16-19 Orthogonal tensor factorization. Tensor factorization via a single matrix factorization:
- Example with k = 3: T = π_1 u_1 ⊗ u_1 ⊗ u_1 + π_2 u_2 ⊗ u_2 ⊗ u_2 + π_3 u_3 ⊗ u_3 ⊗ u_3 + R.
- Project T along a vector w: T(I, I, w) = π_1 (w · u_1) u_1 u_1ᵀ + π_2 (w · u_2) u_2 u_2ᵀ + π_3 (w · u_3) u_3 u_3ᵀ.
- Proposal: run an eigen-decomposition on the projected matrix (see the sketch below).
- Return: the recovered eigenvectors u_i.
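A hedged sketch of this single-projection idea, reusing make_orthogonal_tensor from the earlier snippet (names and parameters are illustrative): project the tensor along one random direction w and read factor estimates off an eigendecomposition of the resulting matrix. When the coefficients π_i (w · u_i) are well separated and the noise is small, the top eigenvectors match the u_i up to sign.

```python
import numpy as np

T, pi, U = make_orthogonal_tensor(d=10, k=5, noise=0.01, seed=0)
d, k = U.shape
rng = np.random.default_rng(1)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)

# M = T(I, I, w): contract the third mode of T with w,
# so M = sum_i pi_i (w . u_i) u_i u_i^T plus projected noise.
M = np.einsum('abc,c->ab', T, w)

# Eigen-decomposition of the (symmetrized) projected matrix.
eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)
# Keep the k eigenvectors with largest |eigenvalue| as factor estimates.
idx = np.argsort(-np.abs(eigvals))[:k]
U_hat = eigvecs[:, idx]

# Compare with the true factors: entries near 1 mark recovered u_i
# (up to sign and permutation).
print(np.round(np.abs(U_hat.T @ U), 2))
```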

20-22 Orthogonal tensor factorization. Sensitivity of a single matrix projection:
- Problem: eigendecomposition is very sensitive to the eigengap: error in factors ∝ 1 / min(difference in eigenvalues).
- Intuition: if two eigenvalues are equal, the corresponding eigenvectors are arbitrary.
[Figure: a projected matrix written as a sum of rank-one terms]
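A small numerical illustration of this eigengap sensitivity (my own example, not from the talk): a fixed tiny symmetric perturbation barely moves the top eigenvector when the gap is large, but rotates it almost arbitrarily once two eigenvalues nearly coincide.

```python
import numpy as np

rng = np.random.default_rng(2)

def top_eigvec(A):
    """Eigenvector associated with the largest eigenvalue of a symmetric matrix."""
    _, vecs = np.linalg.eigh(A)
    return vecs[:, -1]

E = 1e-3 * rng.standard_normal((3, 3))
E = (E + E.T) / 2                                  # small symmetric perturbation

for gap in [1.0, 1e-2, 1e-4]:
    A = np.diag([2.0, 2.0 - gap, 1.0])             # eigengap between top two = gap
    v, v_pert = top_eigvec(A), top_eigvec(A + E)
    err = np.linalg.norm(v - np.sign(v @ v_pert) * v_pert)
    print(f"eigengap {gap:8.0e}  ->  eigenvector error {err:.3f}")
```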

23-29 Orthogonal tensor factorization. Sensitivity analysis (contd.):
- Single matrix factorization: error in factors ∝ 1 / min(difference in eigenvalues).

30-34 Orthogonal tensor factorization. Sensitivity analysis (contd.) (Cardoso 1994):
- Single matrix factorization: error in factors ∝ 1 / min(difference in eigenvalues).
- Every coordinate pair needs one good projection (with a large eigengap).
- Simultaneous matrix factorization: error in factors ∝ 1 / min(average difference in eigenvalues).
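To see numerically why averaging over projections helps, here is an illustrative computation (my own, under the assumption that the eigenvalues of the projection along w are λ_i(w) = π_i (w · u_i)): the worst pairwise gap within a single random projection is typically much smaller than the worst pairwise gap after averaging over many projections.

```python
import numpy as np

rng = np.random.default_rng(3)
d = k = 8
U, _ = np.linalg.qr(rng.standard_normal((d, k)))     # orthonormal factors
pi = rng.uniform(1.0, 2.0, size=k)

def pairwise_gaps(lam):
    """|lambda_i - lambda_j| over all pairs i < j."""
    return np.abs(lam[:, None] - lam[None, :])[np.triu_indices(len(lam), 1)]

W = rng.standard_normal((25, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)        # 25 random unit projections
lams = (W @ U) * pi                                   # lams[l, i] = pi_i (w_l . u_i)

single_min = np.array([pairwise_gaps(lam).min() for lam in lams])
avg_min = np.mean([pairwise_gaps(lam) for lam in lams], axis=0).min()

print(f"median single-projection min gap: {np.median(single_min):.4f}")
print(f"min of per-pair average gap     : {avg_min:.4f}")
```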

35-37 Orthogonal tensor factorization. Reduction to simultaneous diagonalization:
- Projecting along different vectors w_1, ..., w_L gives matrices M_l = T(I, I, w_l) = Σ_i π_i (w_l · u_i) u_i u_iᵀ.
- The projections share the same factors u_i, so they can be simultaneously diagonalized.

38-42 Orthogonal tensor factorization. Simultaneous diagonalization algorithm:
- Algorithm: simultaneously diagonalize the projected matrices,
  U = argmin_{U : U Uᵀ = I} Σ_{l=1}^L off(Uᵀ M_l U),   where off(A) = Σ_{i≠j} A_ij².
- Optimize using Jacobi angles (Cardoso and Souloumiac 1996); a sketch follows below.
- Widely used in the ICA community.
- Empirically, it appears to converge globally for generic inputs.
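Below is a compact NumPy sketch of orthogonal joint diagonalization by Givens (Jacobi) rotations in the spirit of Cardoso and Souloumiac (1996). It is my own illustrative implementation, not the authors' released code; the function name jacobi_joint_diag and the sweep/tolerance defaults are assumptions.

```python
import numpy as np

def jacobi_joint_diag(mats, n_sweeps=50, tol=1e-12):
    """Find an orthogonal V (approximately) minimizing sum_l off(V^T M_l V)
    by sweeping Givens rotations over coordinate pairs (Jacobi angles).
    mats: array-like of shape (L, d, d), each matrix symmetric."""
    A = np.array(mats, dtype=float).copy()
    L, d, _ = A.shape
    V = np.eye(d)
    for _ in range(n_sweeps):
        rotated = False
        for p in range(d - 1):
            for q in range(p + 1, d):
                # For each matrix, the 2-vector h_l = (a_pp - a_qq, 2 a_pq)
                # determines how a Givens rotation changes the (p, q) entry.
                h = np.stack([A[:, p, p] - A[:, q, q],
                              A[:, p, q] + A[:, q, p]], axis=1)      # (L, 2)
                G = h.T @ h
                # The dominant eigenvector of G encodes (cos 2t, sin 2t) of the
                # angle t minimizing the summed squared off-diagonal entries.
                _, evecs = np.linalg.eigh(G)
                x, y = evecs[:, -1]
                if x < 0:
                    x, y = -x, -y                                    # inner rotation
                r = np.hypot(x, y)
                if r < tol:
                    continue
                c = np.sqrt((x + r) / (2 * r))
                s = y / (2 * r * c)
                if abs(s) < tol:
                    continue
                rotated = True
                R = np.eye(d)
                R[p, p] = R[q, q] = c
                R[p, q], R[q, p] = -s, s
                A = R.T @ A @ R                                      # batched over l
                V = V @ R
        if not rotated:
            break
    return V, A   # columns of V: recovered factors; diag of A[l]: eigenvalues
```

The appeal of the Jacobi-angles update is that each coordinate pair gets a closed-form optimal rotation from the dominant eigenvector of a 2x2 matrix built from all projected matrices, so one sweep over the pairs is cheap even for many projections.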

43 Outline (section: Orthogonal tensor factorization, Projections): Introduction: tensor factorization · Orthogonal tensor factorization · Projections · Non-orthogonal tensor factorization · Related work · Empirical results · Conclusions

44-45 Orthogonal tensor factorization, Projections. Oracle and random projections:
- Hypothetically: oracle projections along the factors themselves work well.
- Practically: use projections along random directions.
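Putting the pieces together, a hedged end-to-end sketch (reusing make_orthogonal_tensor and jacobi_joint_diag from the earlier snippets; the choice L ≈ k log k follows the slides, everything else is illustrative): draw random unit projection directions, project the tensor, jointly diagonalize, and check the recovered factors against the true ones.

```python
import numpy as np

rng = np.random.default_rng(4)
T, pi, U = make_orthogonal_tensor(d=10, k=5, noise=0.01, seed=4)
d, k = U.shape

# L = O(k log k) random unit projection directions.
L = int(np.ceil(k * np.log(k))) + 1
W = rng.standard_normal((L, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Projected matrices M_l = T(I, I, w_l), symmetrized against noise.
mats = np.einsum('abc,lc->lab', T, W)
mats = (mats + mats.transpose(0, 2, 1)) / 2

V, D = jacobi_joint_diag(mats)

# Each true factor should align with some column of V (up to sign).
match = np.abs(V.T @ U)
print(np.round(match.max(axis=0), 3))   # entries close to 1 indicate recovery
```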

46-49 Orthogonal tensor factorization, Projections. Results: orthogonal tensor decomposition, T = Σ_{i=1}^k π_i u_i ⊗ u_i ⊗ u_i + R.
Theorem (random projections). Pick L = Ω(k log k) projections randomly from the unit sphere. Then, with high probability,
  error in factors ≤ O( (‖π‖_1 π_max / π_min²) · ε d · (1 + √(d/L)) ),
where ε is the noise magnitude; the leading part is the oracle error, and the √(d/L) part is the concentration term paid for using random rather than oracle projections.

50 Orthogonal tensor factorization, Projections. Empirical: random vs. oracle projections. [Figure: error vs. number of projections in the orthogonal case. Caption: Comparing random vs. oracle projections (d = k = 10, noise = 0.05)]

51 Outline (section: Non-orthogonal tensor factorization): Introduction: tensor factorization · Orthogonal tensor factorization · Projections · Non-orthogonal tensor factorization · Related work · Empirical results · Conclusions

52-53 Non-orthogonal tensor factorization. Non-orthogonal simultaneous diagonalization:
- The projected matrices M_l = T(I, I, w_l) = Σ_i π_i (w_l · u_i) u_i u_iᵀ now have non-orthogonal factors u_i.
- A single matrix has no unique non-orthogonal factorization.
- Two or more such matrices have a unique non-orthogonal factorization (illustrated in the sketch below).
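The uniqueness claim for two noise-free projections can be illustrated with a classical Jennrich-style computation. This is my own minimal sketch in the square case d = k, not the QRJ1D solver used in the talk: the eigenvectors of M_1 M_2⁻¹ are exactly the non-orthogonal factors.

```python
import numpy as np

rng = np.random.default_rng(5)
d = k = 5

# Non-orthogonal, well-conditioned factors and weights (square case d = k).
U = rng.standard_normal((d, k)) + 3 * np.eye(d)
pi = rng.uniform(1.0, 2.0, size=k)
T = np.einsum('i,ai,bi,ci->abc', pi, U, U, U)       # noise-free tensor

# Two projections: M_l = U diag(pi_i (w_l . u_i)) U^T share the same U.
w1, w2 = rng.standard_normal((2, d))
M1 = np.einsum('abc,c->ab', T, w1)
M2 = np.einsum('abc,c->ab', T, w2)

# Eigenvectors of M1 M2^{-1} = U D1 D2^{-1} U^{-1} recover the columns of U.
evals, evecs = np.linalg.eig(M1 @ np.linalg.inv(M2))
U_hat = np.real(evecs)

# Compare up to sign, scale, and permutation via normalized correlations.
Uh = U_hat / np.linalg.norm(U_hat, axis=0)
Un = U / np.linalg.norm(U, axis=0)
print(np.round(np.abs(Uh.T @ Un).max(axis=0), 3))   # entries near 1: recovered
```

With noise, or with d > k, one would instead use all L projections and a non-orthogonal joint diagonalizer, which is the route the talk takes next.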

54-57 Non-orthogonal tensor factorization. Non-orthogonal simultaneous diagonalization:
- Algorithm: simultaneously diagonalize the projected matrices,
  U = argmin_U Σ_{l=1}^L off(U⁻¹ M_l U⁻ᵀ),   where off(A) = Σ_{i≠j} A_ij².
- The matrices U are not constrained to be orthogonal.
- Optimize using the QRJ1D algorithm (Afsari 2006): only guaranteed to converge locally, but more stable than ALS in practice.
- Sensitivity analysis due to Afsari 2008.

58-60 Non-orthogonal tensor factorization. Results: non-orthogonal tensor decomposition.
Theorem (random projections). Pick L = Ω(k log k) projections randomly from the unit sphere. Then, with high probability,
  error in factors ≤ O( (‖U‖₂² / (1 − µ²)) · (‖π‖_1 π_max / π_min²) · ε d · (1 + √(d/L)) ),
where U = [u_1 ... u_k], µ = max_{i≠j} u_i · u_j, and ε is the noise magnitude; the factor ‖U‖₂² / (1 − µ²) measures non-orthogonality, and the remaining factor matches the orthogonal-case bound.

61 Outline (section: Related work): Introduction: tensor factorization · Orthogonal tensor factorization · Projections · Non-orthogonal tensor factorization · Related work · Empirical results · Conclusions

62-64 Related work. Notes and related work:
- Orthogonal tensor methods can factorize non-orthogonal tensors after a whitening transformation (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013), but whitening is itself a major source of error (Souloumiac 2009).
- Simultaneous diagonalization for tensors was proposed by De Lathauwer; it relies on computing the SVD of a d⁴ × k² matrix.
- Simultaneous diagonalization of multiple projections is mentioned in Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013, but no analysis is presented.

65 Outline (section: Empirical results): Introduction: tensor factorization · Orthogonal tensor factorization · Projections · Non-orthogonal tensor factorization · Related work · Empirical results · Conclusions

66 Empirical results. Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013). [Figure: a moment tensor built from the friendship graph and its factorization into community components]

67-69 Empirical results. Community detection. [Figure: error vs. recall curves comparing the tensor power method (TPM) and orthogonal joint diagonalization (OJD)]

70 Empirical results. Crowdsourcing (Zhang et al.). [Figure: workers a, b, c providing yes/no (Y/N) labels for items]

71-74 Empirical results. Crowdsourcing (Zhang et al.). [Table: accuracy of TPM, ALS, OJD, NOJD, and MV+EM on the web, rte, birds, and dogs datasets]

75-79 Conclusions:
- Reduce tensor problems to matrix ones with random projections.
- Empirically competitive with the state of the art, with support for non-orthogonal, asymmetric tensors of arbitrary order.
- Open question: is the Jacobi angles algorithm for orthogonal simultaneous diagonalization generically globally convergent?
- Code is available on Github and Codalab.
- Thanks! Questions?
