TENSOR DECOMPOSITIONS AND THEIR APPLICATIONS


1 TENSOR DECOMPOSITIONS AND THEIR APPLICATIONS ANKUR MOITRA MASSACHUSETTS INSTITUTE OF TECHNOLOGY

2 SPEARMAN'S HYPOTHESIS Charles Spearman (1904): There are two types of intelligence, eductive and reproductive

3 SPEARMAN'S HYPOTHESIS Charles Spearman (1904): There are two types of intelligence, eductive and reproductive. eductive (adj): the ability to make sense out of complexity; reproductive (adj): the ability to store and reproduce information

4 SPEARMAN'S HYPOTHESIS Charles Spearman (1904): There are two types of intelligence, eductive and reproductive. To test this theory, he invented Factor Analysis: M (students (1000) × tests (10)) = A B^T. eductive (adj): the ability to make sense out of complexity; reproductive (adj): the ability to store and reproduce information

5 SPEARMAN'S HYPOTHESIS Charles Spearman (1904): There are two types of intelligence, eductive and reproductive. To test this theory, he invented Factor Analysis: M (students (1000) × tests (10)) = A B^T, with inner dimension (2). eductive (adj): the ability to make sense out of complexity; reproductive (adj): the ability to store and reproduce information

6 Given: M = Σ_i a_i b_i^T = A B^T (the correct factors)

7 Given: M = Σ_i a_i b_i^T = A B^T (the correct factors). When can we recover the factors a_i and b_i uniquely?

8 Given: M = Σ_i a_i b_i^T = A B^T = (AR)(R^{-1} B^T) (the correct factors vs. an alternative factorization). When can we recover the factors a_i and b_i uniquely?

9 Given: M = Σ_i a_i b_i^T = A B^T = (AR)(R^{-1} B^T) (the correct factors vs. an alternative factorization). When can we recover the factors a_i and b_i uniquely? Claim: The factors {a_i} and {b_i} are not determined uniquely unless we impose additional conditions on them

10 Given: M = Σ_i a_i b_i^T = A B^T = (AR)(R^{-1} B^T) (the correct factors vs. an alternative factorization). When can we recover the factors a_i and b_i uniquely? Claim: The factors {a_i} and {b_i} are not determined uniquely unless we impose additional conditions on them, e.g. if {a_i} and {b_i} are orthogonal, or rank(M) = 1

11 Given: M = Σ_i a_i b_i^T = A B^T = (AR)(R^{-1} B^T) (the correct factors vs. an alternative factorization). When can we recover the factors a_i and b_i uniquely? Claim: The factors {a_i} and {b_i} are not determined uniquely unless we impose additional conditions on them, e.g. if {a_i} and {b_i} are orthogonal, or rank(M) = 1. This is called the rotation problem; it is a major issue in factor analysis and motivates the study of tensor methods
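
The rotation problem is easy to see numerically. Below is a minimal sketch in Python (assuming numpy is available; the dimensions and the particular rotation R are made up for illustration): any invertible R turns one factorization of M into a different one with exactly the same product.

    import numpy as np

    # The rotation problem: (A, B) and (A R, B R^{-T}) give exactly the same M = A B^T,
    # so the factors cannot be identified from M alone. Sizes here are illustrative.
    rng = np.random.default_rng(0)
    n, r = 10, 2
    A = rng.standard_normal((n, r))
    B = rng.standard_normal((n, r))
    M = A @ B.T

    theta = 0.7                                  # any invertible R works; here a rotation
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    A_alt = A @ R
    B_alt = B @ np.linalg.inv(R).T               # chosen so that A_alt @ B_alt.T == M

    print(np.allclose(M, A_alt @ B_alt.T))       # True: an alternative factorization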

12 OUTLINE The focus of this tutorial is on Algorithms/Applications/Models for tensor decompositions. Part I: Algorithms (The Rotation Problem; Jennrich's Algorithm). Part II: Applications (Phylogenetic Reconstruction; Pure Topic Models). Part III: Smoothed Analysis (Overcomplete Problems; Kruskal Rank and the Khatri-Rao Product)

13 MATRIX DECOMPOSITIONS M = a_1 b_1^T + a_2 b_2^T + ... + a_R b_R^T

14 MATRIX DECOMPOSITIONS M = a_1 b_1^T + a_2 b_2^T + ... + a_R b_R^T; TENSOR DECOMPOSITIONS T = a_1 ⊗ b_1 ⊗ c_1 + ... + a_R ⊗ b_R ⊗ c_R, where the (i, j, k) entry of x ⊗ y ⊗ z is x(i) y(j) z(k)
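
To make the definition concrete, here is a small sketch (Python with numpy; the sizes are arbitrary) that builds T = Σ_r a_r ⊗ b_r ⊗ c_r and spot-checks the (i, j, k)-entry formula above.

    import numpy as np

    # Build T = sum_r a_r ⊗ b_r ⊗ c_r and check T[i, j, k] = sum_r a_r(i) b_r(j) c_r(k).
    rng = np.random.default_rng(1)
    n, R = 5, 3
    A = rng.standard_normal((n, R))              # column r is a_r
    B = rng.standard_normal((n, R))
    C = rng.standard_normal((n, R))

    T = np.einsum('ir,jr,kr->ijk', A, B, C)

    i, j, k = 0, 1, 2
    assert np.isclose(T[i, j, k], sum(A[i, r] * B[j, r] * C[k, r] for r in range(R)))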

15 When are tensor decompositions unique?

16 When are tensor decompositions unique? Theorem [Jennrich 1970]: Suppose {a_i} and {b_i} are linearly independent and no pair of vectors in {c_i} is a scalar multiple of each other

17 When are tensor decompositions unique? Theorem [Jennrich 1970]: Suppose {a_i} and {b_i} are linearly independent and no pair of vectors in {c_i} is a scalar multiple of each other. Then T = a_1 ⊗ b_1 ⊗ c_1 + ... + a_R ⊗ b_R ⊗ c_R is unique up to permuting the rank-one terms and rescaling the factors.

18 When are tensor decompositions unique? Theorem [Jennrich 1970]: Suppose {a_i} and {b_i} are linearly independent and no pair of vectors in {c_i} is a scalar multiple of each other. Then T = a_1 ⊗ b_1 ⊗ c_1 + ... + a_R ⊗ b_R ⊗ c_R is unique up to permuting the rank-one terms and rescaling the factors. Equivalently, the rank-one factors are unique

19 When are tensor decompositions unique? Theorem [Jennrich 1970]: Suppose {a_i} and {b_i} are linearly independent and no pair of vectors in {c_i} is a scalar multiple of each other. Then T = a_1 ⊗ b_1 ⊗ c_1 + ... + a_R ⊗ b_R ⊗ c_R is unique up to permuting the rank-one terms and rescaling the factors. Equivalently, the rank-one factors are unique. There is a simple algorithm to compute the factors too!

20 JENNRICH'S ALGORITHM Compute T(·, ·, x), i.e. add up the matrix slices weighted by x: Σ_i x_i T_i

21 JENNRICH'S ALGORITHM Compute T(·, ·, x), i.e. add up the matrix slices weighted by x: Σ_i x_i T_i. If T = a ⊗ b ⊗ c then T(·, ·, x) = ⟨c, x⟩ a ⊗ b

22 JENNRICH'S ALGORITHM Compute T(·, ·, x), i.e. add up the matrix slices weighted by x: Σ_i x_i T_i

23 JENNRICH'S ALGORITHM Compute T(·, ·, x) = Σ_i ⟨c_i, x⟩ a_i ⊗ b_i, i.e. add up the matrix slices weighted by x: Σ_i x_i T_i

24 JENNRICH'S ALGORITHM Compute T(·, ·, x) = Σ_i ⟨c_i, x⟩ a_i ⊗ b_i, i.e. add up the matrix slices weighted by x: Σ_i x_i T_i (x is chosen uniformly at random from S^{n-1})

25 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T where D_x = diag(⟨c_i, x⟩), i.e. add up the matrix slices weighted by x: Σ_i x_i T_i (x is chosen uniformly at random from S^{n-1})

26 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T

27 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T

28 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1}

29 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1} = A D_x B^T (B^T)^{-1} D_y^{-1} A^{-1}

30 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1} = A D_x D_y^{-1} A^{-1}

31 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1} = A D_x D_y^{-1} A^{-1}. Claim: whp (over x, y) the eigenvalues are distinct, so the eigendecomposition is unique and recovers the a_i's

32 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1}

33 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1}. Diagonalize T(·, ·, y) T(·, ·, x)^{-1}

34 JENNRICH'S ALGORITHM Compute T(·, ·, x) = A D_x B^T. Compute T(·, ·, y) = A D_y B^T. Diagonalize T(·, ·, x) T(·, ·, y)^{-1}. Diagonalize T(·, ·, y) T(·, ·, x)^{-1}. Match up the factors (their eigenvalues are reciprocals) and find {c_i} by solving a linear system.
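
Putting the steps above together, here is a minimal sketch of Jennrich's algorithm in Python (assuming numpy; the test tensor, dimensions, and variable names are illustrative). One small deviation from the slides: the second eigendecomposition is applied to the transposed slice products so that its eigenvectors give the b_i directly; the pairing still uses the fact that the eigenvalues on the two sides are reciprocals of each other.

    import numpy as np

    rng = np.random.default_rng(2)
    n, R = 8, 5
    A, B, C = (rng.standard_normal((n, R)) for _ in range(3))
    T = np.einsum('ir,jr,kr->ijk', A, B, C)          # T = sum_r a_r ⊗ b_r ⊗ c_r

    x, y = rng.standard_normal(n), rng.standard_normal(n)
    Tx = np.einsum('ijk,k->ij', T, x)                # = A diag(<c_r, x>) B^T
    Ty = np.einsum('ijk,k->ij', T, y)                # = A diag(<c_r, y>) B^T

    # Diagonalize Tx Ty^+ (eigenvectors ~ a_r) and Ty^T (Tx^T)^+ (eigenvectors ~ b_r).
    lam, U = np.linalg.eig(Tx @ np.linalg.pinv(Ty))
    mu, V = np.linalg.eig(Ty.T @ np.linalg.pinv(Tx.T))

    # Keep the R eigenpairs with the largest eigenvalues and pair up the two sides:
    # matching eigenvalues are (approximately) reciprocals of each other.
    ia = np.argsort(-np.abs(lam))[:R]
    ib = np.argsort(-np.abs(mu))[:R]
    A_hat = np.real(U[:, ia])
    order = [min(ib, key=lambda j: abs(mu[j] - 1.0 / lam[i])) for i in ia]
    B_hat = np.real(V[:, order])

    # Recover the c_r by a least-squares solve of T = sum_r a_r ⊗ b_r ⊗ c_r.
    KR = np.einsum('ir,jr->ijr', A_hat, B_hat).reshape(n * n, R)
    C_hat = np.linalg.lstsq(KR, T.reshape(n * n, n), rcond=None)[0].T

    T_hat = np.einsum('ir,jr,kr->ijk', A_hat, B_hat, C_hat)
    print(np.allclose(T, T_hat))                     # True whp over the choice of x, y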

35 Given: M = Σ_i a_i b_i^T. When can we recover the factors a_i and b_i uniquely? This is only possible if {a_i} and {b_i} are orthonormal, or rank(M) = 1

36 Given: M = Σ_i a_i b_i^T. When can we recover the factors a_i and b_i uniquely? This is only possible if {a_i} and {b_i} are orthonormal, or rank(M) = 1. Given: T = Σ_i a_i ⊗ b_i ⊗ c_i. When can we recover the factors a_i, b_i and c_i uniquely?

37 Given: M = Σ_i a_i b_i^T. When can we recover the factors a_i and b_i uniquely? This is only possible if {a_i} and {b_i} are orthonormal, or rank(M) = 1. Given: T = Σ_i a_i ⊗ b_i ⊗ c_i. When can we recover the factors a_i, b_i and c_i uniquely? Jennrich: If {a_i} and {b_i} are full rank and no pair in {c_i} are scalar multiples of each other

38 OUTLINE The focus of this tutorial is on Algorithms/Applications/Models for tensor decompositions. Part I: Algorithms (The Rotation Problem; Jennrich's Algorithm). Part II: Applications (Phylogenetic Reconstruction; Pure Topic Models). Part III: Smoothed Analysis (Overcomplete Problems; Kruskal Rank and the Khatri-Rao Product)

39 PHYLOGENETIC RECONSTRUCTION (tree figure: internal nodes x, y, z are extinct, leaves a, b, c, d are extant) Tree of Life

40 PHYLOGENETIC RECONSTRUCTION (tree figure: internal nodes x, y, z are extinct, leaves a, b, c, d are extant)

41 PHYLOGENETIC RECONSTRUCTION root: π : Σ → R_+, the initial distribution (Σ = alphabet; internal nodes extinct, leaves extant)

42 PHYLOGENETIC RECONSTRUCTION root: π : Σ → R_+, the initial distribution; each edge, e.g. (z, b), carries a conditional distribution R_{z,b} (Σ = alphabet; internal nodes extinct, leaves extant)

43 PHYLOGENETIC RECONSTRUCTION root: π : Σ → R_+, the initial distribution; each edge carries a conditional distribution R_{z,b}, etc. (Σ = alphabet). In each sample, we observe a symbol (from Σ) at each extant node, where we sample from π for the root and propagate it using R_{x,y}, etc.

44 HIDDEN MARKOV MODELS (figure: hidden chain x → y → z with observed symbols a, b, c)

45 HIDDEN MARKOV MODELS π : Σ_s → R_+, the initial distribution (hidden chain x → y → z with observed symbols a, b, c)

46 HIDDEN MARKOV MODELS π : Σ_s → R_+, the initial distribution; observation matrices O_{x,a}, etc.; transition matrices R_{x,y}, etc. (hidden chain x → y → z with observed symbols a, b, c)

47 HIDDEN MARKOV MODELS π : Σ_s → R_+, the initial distribution; observation matrices O_{x,a}, etc.; transition matrices R_{x,y}, etc. In each sample, we observe a symbol (from Σ_o) at each observed node, where we sample from π (over Σ_s) for the start and propagate it using R_{x,y}, etc.
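
As a sketch of the generative process just described (Python with numpy; the alphabet sizes, chain length, and matrix names are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    S, O, steps = 4, 6, 3                        # hidden states, output symbols, chain length
    pi = rng.dirichlet(np.ones(S))               # initial distribution over hidden states
    R = rng.dirichlet(np.ones(S), size=S)        # transitions: R[s, s'] = Pr[next = s' | cur = s]
    Obs = rng.dirichlet(np.ones(O), size=S)      # observations: Obs[s, o] = Pr[obs = o | state = s]

    def sample():
        # Propagate the hidden state along the chain, emitting one symbol per step.
        obs, state = [], rng.choice(S, p=pi)
        for _ in range(steps):
            obs.append(int(rng.choice(O, p=Obs[state])))
            state = rng.choice(S, p=R[state])
        return obs

    print(sample())                              # one random sample of observed symbols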

48 Question: Can we reconstruct just the topology from random samples?

49 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily

50 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily. [Steel, 1994]: The following is a distance function on the edges: d_{x,y} = −ln det(P_{x,y}) + ½ Σ_{σ ∈ Σ} ln π_{x,σ} + ½ Σ_{σ ∈ Σ} ln π_{y,σ}, where P_{x,y} is the joint distribution

51 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily. [Steel, 1994]: The following is a distance function on the edges: d_{x,y} = −ln det(P_{x,y}) + ½ Σ_{σ ∈ Σ} ln π_{x,σ} + ½ Σ_{σ ∈ Σ} ln π_{y,σ}, where P_{x,y} is the joint distribution, and the distance between leaves is the sum of the distances on the path in the tree

52 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily. [Steel, 1994]: The following is a distance function on the edges: d_{x,y} = −ln det(P_{x,y}) + ½ Σ_{σ ∈ Σ} ln π_{x,σ} + ½ Σ_{σ ∈ Σ} ln π_{y,σ}, where P_{x,y} is the joint distribution, and the distance between leaves is the sum of the distances on the path in the tree (It's not even obvious it's nonnegative!)
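
A small sketch of Steel's distance as written above (Python with numpy; here P_xy is the |Σ| × |Σ| joint distribution of the symbols at x and y, and its row and column sums play the role of the marginals π_x and π_y):

    import numpy as np

    def steel_distance(P_xy):
        # P_xy: joint distribution of the symbols at nodes x and y (|Σ| x |Σ|).
        pi_x = P_xy.sum(axis=1)                  # marginal distribution at x
        pi_y = P_xy.sum(axis=0)                  # marginal distribution at y
        return (-np.log(abs(np.linalg.det(P_xy)))
                + 0.5 * np.log(pi_x).sum()
                + 0.5 * np.log(pi_y).sum())

    P = np.array([[0.4, 0.1],
                  [0.1, 0.4]])                   # toy 2-symbol joint distribution
    print(steel_distance(P))                     # ≈ 0.51 for this toy example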

53 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily

54 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily. [Erdős, Steel, Székely, Warnow, 1997]: Used Steel's distance function and quartet tests (choosing among the three possible topologies on each quartet of leaves a, b, c, d) to reconstruct the topology

55 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily. [Erdős, Steel, Székely, Warnow, 1997]: Used Steel's distance function and quartet tests (choosing among the three possible topologies on each quartet of leaves a, b, c, d) to reconstruct the topology, from polynomially many samples

56 Question: Can we reconstruct just the topology from random samples? Usually, we assume the transition matrices (R_{x,y}, etc.) are full rank so that we can re-root the tree arbitrarily. [Erdős, Steel, Székely, Warnow, 1997]: Used Steel's distance function and quartet tests (choosing among the three possible topologies on each quartet of leaves a, b, c, d) to reconstruct the topology, from polynomially many samples. For many problems (e.g. HMMs) finding the transition matrices is the main issue

57 [Chang, 1996]: The model is identifiable (if the R's are full rank)

58 [Chang, 1996]: The model is identifiable (if the R's are full rank) (tree figure with edge matrix R_{z,b})

59 [Chang, 1996]: The model is identifiable (if the R's are full rank) (tree figure with edge matrix R_{z,b})

60 [Chang, 1996]: The model is identifiable (if the R's are full rank) (figure: zoom in on a hidden node z with three observed leaves a, b, c)

61 [Chang, 1996]: The model is identifiable (if the R's are full rank) (figure: zoom in on a hidden node z with three observed leaves a, b, c). Joint distribution over (a, b, c): Σ_σ Pr[z = σ] Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

62 [Chang, 1996]: The model is identifiable (if the R's are full rank) (figure: zoom in on a hidden node z with three observed leaves a, b, c). Joint distribution over (a, b, c): Σ_σ Pr[z = σ] Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ], where the conditional distributions Pr[b | z = σ] are the columns of R_{z,b}
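
The rank-one structure of this joint distribution is easy to write down explicitly. A small sketch (Python with numpy; the alphabet sizes and the random conditional distributions are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    S, O = 4, 6                                  # hidden alphabet size, observed alphabet size
    p_z = rng.dirichlet(np.ones(S))              # Pr[z = sigma]
    P_a = rng.dirichlet(np.ones(O), size=S)      # P_a[sigma] = Pr[a | z = sigma]
    P_b = rng.dirichlet(np.ones(O), size=S)
    P_c = rng.dirichlet(np.ones(O), size=S)

    # T[i, j, k] = sum_sigma Pr[z=sigma] Pr[a=i|z=sigma] Pr[b=j|z=sigma] Pr[c=k|z=sigma]
    T = np.einsum('s,si,sj,sk->ijk', p_z, P_a, P_b, P_c)
    assert np.isclose(T.sum(), 1.0)              # it is a probability distribution over (a, b, c)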

63 [Mossel, Roch, 2006]: There is an algorithm to PAC learn a phylogenetic tree or an HMM (if its transition/output matrices are full rank) from polynomially many samples

64 [Mossel, Roch, 2006]: There is an algorithm to PAC learn a phylogenetic tree or an HMM (if its transition/output matrices are full rank) from polynomially many samples. Question: Is the full-rank assumption necessary?

65 [Mossel, Roch, 2006]: There is an algorithm to PAC learn a phylogenetic tree or an HMM (if its transition/output matrices are full rank) from polynomially many samples. Question: Is the full-rank assumption necessary? [Mossel, Roch, 2006]: It is as hard as noisy-parity to learn the parameters of a general HMM

66 [Mossel, Roch, 2006]: There is an algorithm to PAC learn a phylogenetic tree or an HMM (if its transition/output matrices are full rank) from polynomially many samples. Question: Is the full-rank assumption necessary? [Mossel, Roch, 2006]: It is as hard as noisy-parity to learn the parameters of a general HMM. Noisy-parity is an infamous problem in learning, where O(n) samples suffice but the best algorithms run in time 2^{n/log(n)}, due to [Blum, Kalai, Wasserman, 2003]

67 [Mossel, Roch, 2006]: There is an algorithm to PAC learn a phylogenetic tree or an HMM (if its transition/output matrices are full rank) from polynomially many samples. Question: Is the full-rank assumption necessary? [Mossel, Roch, 2006]: It is as hard as noisy-parity to learn the parameters of a general HMM. Noisy-parity is an infamous problem in learning, where O(n) samples suffice but the best algorithms run in time 2^{n/log(n)}, due to [Blum, Kalai, Wasserman, 2003]. (It's now used as a hard problem to build cryptosystems!)

68 THE POWER OF CONDITIONAL INDEPENDENCE [Phylogenetic Trees/HMMs]: (joint distribution on leaves a, b, c) Σ_σ Pr[z = σ] Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

69 PURE TOPIC MODELS (figure: topic matrix A, words (m) × topics (r)) Each topic is a distribution on words

70 PURE TOPIC MODELS (figure: topic matrix A, words (m) × topics (r)) Each topic is a distribution on words. Each document is about only one topic (stochastically generated)

71 PURE TOPIC MODELS (figure: topic matrix A, words (m) × topics (r)) Each topic is a distribution on words. Each document is about only one topic (stochastically generated). For each document, we sample L words from its topic's distribution

72–78 PURE TOPIC MODELS (figure, built up over several slides: the word-by-document matrix M factors as M = A W)

79 PURE TOPIC MODELS (figure: M = A W) [Anandkumar, Hsu, Kakade, 2012]: Algorithm for learning pure topic models from polynomially many samples (if A is full rank)

80 PURE TOPIC MODELS (figure: M = A W) [Anandkumar, Hsu, Kakade, 2012]: Algorithm for learning pure topic models from polynomially many samples (if A is full rank). Question: Where can we find three conditionally independent random variables?

81 PURE TOPIC MODELS (figure: M = A W) [Anandkumar, Hsu, Kakade, 2012]: Algorithm for learning pure topic models from polynomially many samples (if A is full rank)

82 PURE TOPIC MODELS (figure: M = A W) [Anandkumar, Hsu, Kakade, 2012]: Algorithm for learning pure topic models from polynomially many samples (if A is full rank). The first, second and third words are independent conditioned on the topic t (and are random samples from A_t)
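
A small sketch of this conditional-independence structure (Python with numpy; the vocabulary size, number of topics, and number of documents are made up for illustration): the empirical joint distribution of the first three words converges to Σ_j Pr[topic = j] A_j ⊗ A_j ⊗ A_j.

    import numpy as np

    rng = np.random.default_rng(5)
    m, r, docs = 10, 3, 100_000                  # words, topics, number of documents
    w = rng.dirichlet(np.ones(r))                # Pr[topic = j]
    A = rng.dirichlet(np.ones(m), size=r).T      # column A_j is topic j's word distribution

    # Empirical joint distribution of the first three words of each document.
    M3 = np.zeros((m, m, m))
    topics = rng.choice(r, size=docs, p=w)       # each document is about one topic
    for t in range(r):
        words = rng.choice(m, size=(int(np.sum(topics == t)), 3), p=A[:, t])
        np.add.at(M3, (words[:, 0], words[:, 1], words[:, 2]), 1.0 / docs)

    exact = np.einsum('j,ij,kj,lj->ikl', w, A, A, A)
    print(np.abs(M3 - exact).max())              # small, and shrinking as docs grows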

83 THE POWER OF CONDITIONAL INDEPENDENCE [Phylogenetic Trees/HMMs]: (joint distribution on leaves a, b, c) Σ_σ Pr[z = σ] Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

84 THE POWER OF CONDITIONAL INDEPENDENCE [Phylogenetic Trees/HMMs]: (joint distribution on leaves a, b, c) Σ_σ Pr[z = σ] Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]. [Pure Topic Models/LDA]: (joint distribution on the first three words) Σ_j Pr[topic = j] A_j ⊗ A_j ⊗ A_j

85 THE POWER OF CONDITIONAL INDEPENDENCE [Phylogenetic Trees/HMMs]: (joint distribution on leaves a, b, c) Σ_σ Pr[z = σ] Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]. [Pure Topic Models/LDA]: (joint distribution on the first three words) Σ_j Pr[topic = j] A_j ⊗ A_j ⊗ A_j. [Community Detection]: (counting stars) Σ_j Pr[C_x = j] (C_A Π)_j ⊗ (C_B Π)_j ⊗ (C_C Π)_j

86 OUTLINE The focus of this tutorial is on Algorithms/Applications/Models for tensor decompositions. Part I: Algorithms (The Rotation Problem; Jennrich's Algorithm). Part II: Applications (Phylogenetic Reconstruction; Pure Topic Models). Part III: Smoothed Analysis (Overcomplete Problems; Kruskal Rank and the Khatri-Rao Product)

87 So far, Jennrich's algorithm has been the key, but it has a crucial limitation.

88 So far, Jennrich's algorithm has been the key, but it has a crucial limitation. Let T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i, where the {a_i} are n-dimensional vectors

89 So far, Jennrich's algorithm has been the key, but it has a crucial limitation. Let T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i, where the {a_i} are n-dimensional vectors. Question: What if R is much larger than n?

90 So far, Jennrich's algorithm has been the key, but it has a crucial limitation. Let T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i, where the {a_i} are n-dimensional vectors. Question: What if R is much larger than n? This is called the overcomplete case, e.g. the number of factors is much larger than the number of observations

91 So far, Jennrich's algorithm has been the key, but it has a crucial limitation. Let T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i, where the {a_i} are n-dimensional vectors. Question: What if R is much larger than n? This is called the overcomplete case, e.g. the number of factors is much larger than the number of observations. In such cases, why stop at third-order tensors?

92 Consider a sixth-order tensor T: T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i

93 Consider a sixth-order tensor T: T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i. Question: Can we find its factors, even if R is much larger than n?

94 Consider a sixth-order tensor T: T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i. Question: Can we find its factors, even if R is much larger than n? Let's flatten it: flat(T) = Σ_{i=1}^{R} b_i ⊗ b_i ⊗ b_i, where b_i = a_i ⊗_KR a_i (the Khatri-Rao product) is the n^2-dimensional vector whose (j, k)-th entry is the product of the j-th and k-th entries of a_i

95 Consider a sixth-order tensor T: T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i. Question: Can we find its factors, even if R is much larger than n? Let's flatten it by rearranging its entries into a third-order tensor: flat(T) = Σ_{i=1}^{R} b_i ⊗ b_i ⊗ b_i, where b_i = a_i ⊗_KR a_i (the Khatri-Rao product) is the n^2-dimensional vector whose (j, k)-th entry is the product of the j-th and k-th entries of a_i
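
A small sketch of the flattening step (Python with numpy; n and R are illustrative): grouping the six modes into three pairs turns the sixth-order tensor into a third-order tensor whose factors are the Khatri-Rao squares b_i = a_i ⊗_KR a_i.

    import numpy as np

    rng = np.random.default_rng(6)
    n, R = 4, 6
    A = rng.standard_normal((n, R))

    T6 = np.einsum('ar,br,cr,dr,er,fr->abcdef', *([A] * 6))   # sixth-order, factors a_r
    flatT = T6.reshape(n * n, n * n, n * n)                   # pair up the six modes

    B = np.einsum('jr,kr->jkr', A, A).reshape(n * n, R)       # columns b_r = a_r ⊗_KR a_r
    assert np.allclose(flatT, np.einsum('ir,jr,kr->ijk', B, B, B))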

96 Question: Can we apply Jennrich's Algorithm to flat(T)?

97 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent?

98 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #1: Let {a_i} be all (n choose 2) vectors with exactly two ones

99 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #1: Let {a_i} be all (n choose 2) vectors with exactly two ones. Then {b_i} are vectorizations of the matrices a_i a_i^T

100 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #1: Let {a_i} be all (n choose 2) vectors with exactly two ones. Then {b_i} are vectorizations of the matrices a_i a_i^T, each non-zero only in the entries indexed by the two ones of a_i

101 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #1: Let {a_i} be all (n choose 2) vectors with exactly two ones. Then {b_i} are vectorizations of the matrices a_i a_i^T, each non-zero only in the entries indexed by the two ones of a_i, and they are linearly independent

102 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent?

103 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #2: Let {a_1, ..., a_n} and {a_{n+1}, ..., a_{2n}} be two random orthonormal bases

104 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #2: Let {a_1, ..., a_n} and {a_{n+1}, ..., a_{2n}} be two random orthonormal bases. Then there is a linear dependence with 2n terms:

105 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #2: Let {a_1, ..., a_n} and {a_{n+1}, ..., a_{2n}} be two random orthonormal bases. Then there is a linear dependence with 2n terms: Σ_{i=1}^{n} a_i ⊗_KR a_i − Σ_{i=n+1}^{2n} a_i ⊗_KR a_i = 0

106 Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent? Example #2: Let {a_1, ..., a_n} and {a_{n+1}, ..., a_{2n}} be two random orthonormal bases. Then there is a linear dependence with 2n terms: Σ_{i=1}^{n} a_i ⊗_KR a_i − Σ_{i=n+1}^{2n} a_i ⊗_KR a_i = 0 (as matrices, both sums equal the identity)
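
This dependence is easy to verify numerically. A small sketch (Python with numpy; n is illustrative): for an orthonormal basis, Σ_i a_i ⊗_KR a_i is the vectorized identity, so the two sums agree.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 5
    Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))    # first orthonormal basis
    Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))    # second orthonormal basis

    def kr_squares(Q):
        # Columns are a_i ⊗_KR a_i, i.e. the vectorizations of a_i a_i^T.
        return np.einsum('ji,ki->jki', Q, Q).reshape(n * n, n)

    # Each sum equals the vectorization of sum_i a_i a_i^T = I, so they cancel.
    diff = kr_squares(Q1).sum(axis=1) - kr_squares(Q2).sum(axis=1)
    print(np.allclose(diff, 0))                          # True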

107 THE KRUSKAL RANK

108 THE KRUSKAL RANK Definition: The Kruskal rank (k-rank) of {b_i} is the largest k s.t. every set of k vectors is linearly independent

109 THE KRUSKAL RANK Definition: The Kruskal rank (k-rank) of {b_i} is the largest k s.t. every set of k vectors is linearly independent. For b_i = a_i ⊗_KR a_i with k-rank({a_i}) = n:

110 THE KRUSKAL RANK Definition: The Kruskal rank (k-rank) of {b_i} is the largest k s.t. every set of k vectors is linearly independent. For b_i = a_i ⊗_KR a_i with k-rank({a_i}) = n: Example #1: k-rank({b_i}) = R = (n choose 2)

111 THE KRUSKAL RANK Definition: The Kruskal rank (k-rank) of {b_i} is the largest k s.t. every set of k vectors is linearly independent. For b_i = a_i ⊗_KR a_i with k-rank({a_i}) = n: Example #1: k-rank({b_i}) = R = (n choose 2). Example #2: k-rank({b_i}) = 2n − 1

112 THE KRUSKAL RANK Definition: The Kruskal rank (k-rank) of {b_i} is the largest k s.t. every set of k vectors is linearly independent. For b_i = a_i ⊗_KR a_i with k-rank({a_i}) = n: Example #1: k-rank({b_i}) = R = (n choose 2). Example #2: k-rank({b_i}) = 2n − 1. The Kruskal rank always adds under the Khatri-Rao product, but sometimes it multiplies, and that can allow us to handle R >> n
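
A brute-force sketch of the (non-robust) Kruskal rank (Python with numpy; exponential in the number of vectors, so only for tiny illustrative examples):

    import numpy as np
    from itertools import combinations

    def kruskal_rank(vectors, tol=1e-10):
        # Largest k such that every set of k of the vectors is linearly independent.
        vectors = [np.asarray(v, dtype=float) for v in vectors]
        for k in range(len(vectors), 0, -1):
            if all(np.linalg.matrix_rank(np.column_stack(S), tol=tol) == k
                   for S in combinations(vectors, k)):
                return k
        return 0

    e1, e2, e3 = np.eye(3)
    print(kruskal_rank([e1, e2, e3]))            # 3: every subset is independent
    print(kruskal_rank([e1, e2, e1 + e2]))       # 2: the three together are dependent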

113 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product

114 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Proof: The set of {a_i} where b_i = a_i ⊗_KR a_i and det({b_i}) = 0 is measure zero

115 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Proof: The set of {a_i} where b_i = a_i ⊗_KR a_i and det({b_i}) = 0 is measure zero. But this yields a very weak bound on the condition number of {b_i}

116 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Proof: The set of {a_i} where b_i = a_i ⊗_KR a_i and det({b_i}) = 0 is measure zero. But this yields a very weak bound on the condition number of {b_i}, which is what we need to apply it in learning/statistics, where we only have an estimate of T

117 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product

118 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Definition: The robust Kruskal rank (k-rank_γ) of {b_i} is the largest k s.t. every set of k vectors has condition number at most O(γ)

119 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Definition: The robust Kruskal rank (k-rank_γ) of {b_i} is the largest k s.t. every set of k vectors has condition number at most O(γ). [Bhaskara, Charikar, Vijayaraghavan, 2013]: The robust Kruskal rank always adds under the Khatri-Rao product

120 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Definition: The robust Kruskal rank (k-rank_γ) of {b_i} is the largest k s.t. every set of k vectors has condition number at most O(γ). [Bhaskara, Charikar, Vijayaraghavan, 2013]: The robust Kruskal rank always adds under the Khatri-Rao product. [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed

121 [Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product. Definition: The robust Kruskal rank (k-rank_γ) of {b_i} is the largest k s.t. every set of k vectors has condition number at most O(γ). [Bhaskara, Charikar, Vijayaraghavan, 2013]: The robust Kruskal rank always adds under the Khatri-Rao product. [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ)
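
A quick numerical sanity check of this statement (Python with numpy; the base vectors, n, and the perturbation size ε are made up for illustration): form R = n²/2 Khatri-Rao squares of ε-perturbed vectors and look at the condition number of the resulting n² × R matrix.

    import numpy as np

    rng = np.random.default_rng(8)
    n, eps = 10, 0.1
    R = n * n // 2
    A = rng.standard_normal((n, R))                      # arbitrary base vectors
    A_pert = A + eps * rng.standard_normal((n, R))       # their epsilon-perturbations

    B = np.einsum('jr,kr->jkr', A_pert, A_pert).reshape(n * n, R)   # b_r = a_r ⊗_KR a_r
    s = np.linalg.svd(B, compute_uv=False)
    print(s[0] / s[-1])                                  # condition number of the R columns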

122 [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ)

123 [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ). Hence we can apply Jennrich's Algorithm to flat(T) with R >> n

124 [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ). Hence we can apply Jennrich's Algorithm to flat(T) with R >> n. Note: These bounds are easy to prove with inverse polynomial failure probability, but then γ depends on δ

125 [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ). Hence we can apply Jennrich's Algorithm to flat(T) with R >> n. Note: These bounds are easy to prove with inverse polynomial failure probability, but then γ depends on δ. This can be extended to any constant-order Khatri-Rao product

126 [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ). Hence we can apply Jennrich's Algorithm to flat(T) with R >> n

127 [Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n^2/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ). Hence we can apply Jennrich's Algorithm to flat(T) with R >> n. Sample application: Algorithm for learning mixtures of n^{O(1)} spherical Gaussians in R^n, if their means are ε-perturbed. This was also obtained independently by [Anderson, Belkin, Goyal, Rademacher, Voss, 2014]

128 Any Questions? Summary: Tensor decompositions are unique under much more general conditions, compared to matrix decompositions. Jennrich's Algorithm (rediscovered many times!), and its many applications in learning/statistics. Introduced new models to study overcomplete problems (R >> n). Are there algorithms for order-k tensors that work with R = n^{0.51 k}?
