TENSOR DECOMPOSITIONS AND THEIR APPLICATIONS
Ankur Moitra, Massachusetts Institute of Technology
SPEARMAN'S HYPOTHESIS

Charles Spearman (1904): There are two types of intelligence, eductive and reproductive.

eductive (adj): the ability to make sense out of complexity
reproductive (adj): the ability to store and reproduce information

To test this theory, he invented Factor Analysis: M = A B^T, where M is a students (1000) × tests (10) matrix of scores and the inner dimension is 2.
Given: M = Σ_i a_i b_i^T = A B^T (the correct factors) = (A R)(R^{-1} B^T) (an alternative factorization)

When can we recover the factors a_i and b_i uniquely?

Claim: The factors {a_i} and {b_i} are not determined uniquely unless we impose additional conditions on them, e.g. if {a_i} and {b_i} are orthogonal, or rank(M) = 1.

This is called the rotation problem. It is a major issue in factor analysis, and it motivates the study of tensor methods.
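A quick numerical illustration of the rotation problem (a NumPy sketch, not from the slides; the specific matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2

A = rng.standard_normal((n, r))
B = rng.standard_normal((n, r))
M = A @ B.T

# Any invertible R gives an alternative factorization with the same product
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A2 = A @ R
B2 = B @ np.linalg.inv(R).T        # so that A2 @ B2.T = A R R^{-1} B^T = M

assert np.allclose(A2 @ B2.T, M)   # same matrix...
assert not np.allclose(A2, A)      # ...but genuinely different factors
```

So observing M alone cannot distinguish (A, B) from (A R, B R^{-T}).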
OUTLINE

The focus of this tutorial is on Algorithms/Applications/Models for tensor decompositions.

Part I: Algorithms (The Rotation Problem; Jennrich's Algorithm)
Part II: Applications (Phylogenetic Reconstruction; Pure Topic Models)
Part III: Smoothed Analysis (Overcomplete Problems; Kruskal Rank and the Khatri-Rao Product)
MATRIX DECOMPOSITIONS

M = a_1 ⊗ b_1 + a_2 ⊗ b_2 + ... + a_R ⊗ b_R

TENSOR DECOMPOSITIONS

T = a_1 ⊗ b_1 ⊗ c_1 + ... + a_R ⊗ b_R ⊗ c_R

The (i, j, k) entry of x ⊗ y ⊗ z is x(i) y(j) z(k).
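The entry-wise definition of ⊗ can be checked in a couple of lines (illustrative NumPy, not part of the slides):

```python
import numpy as np

x = np.array([1., 2.])
y = np.array([3., 4., 5.])
z = np.array([6., 7.])

# (i, j, k) entry of x ⊗ y ⊗ z is x(i) * y(j) * z(k)
T = np.einsum('i,j,k->ijk', x, y, z)

assert T.shape == (2, 3, 2)
assert T[1, 0, 1] == x[1] * y[0] * z[1]   # 2 * 3 * 7 = 42
```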
When are tensor decompositions unique?

Theorem [Jennrich 1970]: Suppose {a_i} and {b_i} are linearly independent and no pair of vectors in {c_i} is a scalar multiple of each other. Then T = a_1 ⊗ b_1 ⊗ c_1 + ... + a_R ⊗ b_R ⊗ c_R is unique up to permuting the rank-one terms and rescaling the factors. Equivalently, the rank-one factors are unique.

There is a simple algorithm to compute the factors too!
JENNRICH'S ALGORITHM

Compute T(·, ·, x), i.e. add up the matrix slices: Σ_i x_i T_i, where x is chosen uniformly at random from S^{n-1}. If T = a ⊗ b ⊗ c then T(·, ·, x) = ⟨c, x⟩ a b^T, so in general

T(·, ·, x) = Σ_i ⟨c_i, x⟩ a_i b_i^T = A D_x B^T, where D_x = Diag(⟨c_i, x⟩)

The algorithm:
- Compute T(·, ·, x) = A D_x B^T
- Compute T(·, ·, y) = A D_y B^T
- Diagonalize T(·, ·, x) T(·, ·, y)^{-1} = A D_x B^T (B^T)^{-1} D_y^{-1} A^{-1} = A D_x D_y^{-1} A^{-1}

Claim: with high probability (over x, y) the eigenvalues are distinct, so the eigendecomposition is unique and recovers the a_i's.

- Diagonalize T(·, ·, y) T(·, ·, x)^{-1} to recover the b_i's
- Match up the factors (their eigenvalues are reciprocals) and find {c_i} by solving a linear system
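The whole procedure is a few lines of NumPy. The sketch below (mine, not the speaker's code) handles the square case R = n with generic Gaussian factors, recovering the a_i's up to order and scale:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # square case: R = n factors in n dimensions

# Ground-truth factors; Gaussian matrices satisfy Jennrich's
# conditions with probability 1
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

# T = sum_i a_i ⊗ b_i ⊗ c_i
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Random contractions: T(·,·,x) = A D_x B^T with D_x = Diag(<c_i, x>)
x = rng.standard_normal(n)
y = rng.standard_normal(n)
Tx = np.einsum('ijk,k->ij', T, x)
Ty = np.einsum('ijk,k->ij', T, y)

# Diagonalize Tx Ty^{-1} = A D_x D_y^{-1} A^{-1}: eigenvectors are the a_i
eigvals, eigvecs = np.linalg.eig(Tx @ np.linalg.inv(Ty))
A_hat = np.real(eigvecs)  # eigenvalues <c_i,x>/<c_i,y> are real, distinct whp

# Each recovered column is parallel to some true a_i (up to order and scale)
cos = np.abs((A / np.linalg.norm(A, axis=0)).T @ A_hat)
assert np.allclose(cos.max(axis=0), 1.0, atol=1e-5)
```

Recovering the b_i's from T(·, ·, y) T(·, ·, x)^{-1} and then the c_i's by solving a linear system proceeds the same way.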
Given: M = Σ_i a_i ⊗ b_i. When can we recover the factors a_i and b_i uniquely? This is only possible if {a_i} and {b_i} are orthonormal, or rank(M) = 1.

Given: T = Σ_i a_i ⊗ b_i ⊗ c_i. When can we recover the factors a_i, b_i and c_i uniquely? Jennrich: if {a_i} and {b_i} are full rank and no pair in {c_i} are scalar multiples of each other.
OUTLINE (recap). Next up is Part II: Applications (Phylogenetic Reconstruction; Pure Topic Models).
PHYLOGENETIC RECONSTRUCTION

[figure: the Tree of Life, with internal (extinct) nodes x, y, z and leaf (extant) nodes a, b, c, d]

The model: Σ is an alphabet; the root has an initial distribution π : Σ → R_+; each edge, e.g. (z, b), has a conditional distribution R_{z,b}. In each sample, we observe a symbol (from Σ) at each extant node, where we sample from π at the root and propagate it down the tree using R_{x,y}, etc.
HIDDEN MARKOV MODELS

[figure: hidden chain x → y → z with observations a, b, c]

π : Σ_s → R_+ is the initial distribution, R_{x,y}, etc. are the transition matrices (over hidden states Σ_s), and O_{x,a}, etc. are the observation matrices. In each sample, we observe a symbol (from Σ_o) at each observed node, where we sample from π for the start and propagate using R_{x,y}, etc.
Question: Can we reconstruct just the topology from random samples?

Usually, we assume R_{x,y}, etc. are full rank so that we can re-root the tree arbitrarily.

[Steel, 1994]: The following is a distance function on the edges:

d_{x,y} = -ln |det(P_{x,y})| + ½ Σ_{σ in Σ} ln π_{x,σ} + ½ Σ_{σ in Σ} ln π_{y,σ}

where P_{x,y} is the joint distribution, and the distance between leaves is the sum of distances on the path in the tree. (It's not even obvious it's nonnegative!)

[Erdos, Steel, Szekely, Warnow, 1997]: Used Steel's distance function and quartet tests (which of the splits ab|cd, ac|bd, ad|bc is correct?) to reconstruct the topology from polynomially many samples.

For many problems (e.g. HMMs) finding the transition matrices is the main issue.
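The additivity claim is easy to check numerically on a three-node chain x → y → z. This sketch (my own; the formula is implemented as written above, and the random parameters are arbitrary) builds the joint distributions and verifies d_{x,z} = d_{x,y} + d_{y,z}:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4  # alphabet size

def rand_stochastic(k):
    # Rows are conditional distributions Pr[child = σ' | parent = σ]
    M = rng.random((k, k)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

pi_x = rng.random(k) + 0.1; pi_x /= pi_x.sum()
R_xy, R_yz = rand_stochastic(k), rand_stochastic(k)
pi_y = R_xy.T @ pi_x          # propagate the marginal down the chain
pi_z = R_yz.T @ pi_y

def joint(pi_parent, R):
    # P(σ, σ') = π(σ) R(σ' | σ)
    return np.diag(pi_parent) @ R

def steel_dist(P, pi_a, pi_b):
    return (-np.log(abs(np.linalg.det(P)))
            + 0.5 * np.log(pi_a).sum() + 0.5 * np.log(pi_b).sum())

d_xy = steel_dist(joint(pi_x, R_xy), pi_x, pi_y)
d_yz = steel_dist(joint(pi_y, R_yz), pi_y, pi_z)
d_xz = steel_dist(joint(pi_x, R_xy @ R_yz), pi_x, pi_z)
assert abs(d_xz - (d_xy + d_yz)) < 1e-9
```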
[Chang, 1996]: The model is identifiable (if the R's are full rank).

[figure: re-rooting the tree at a hidden node z, which has leaves a, b, c hanging off it]

Joint distribution over (a, b, c):

Σ_σ Pr[z = σ] · Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

where the vectors Pr[b | z = σ] are the columns of R_{z,b}.
[Mossel, Roch, 2006]: There is an algorithm to PAC learn a phylogenetic tree or an HMM (if its transition/output matrices are full rank) from polynomially many samples.

Question: Is the full-rank assumption necessary?

[Mossel, Roch, 2006]: It is as hard as noisy-parity to learn the parameters of a general HMM.

Noisy-parity is an infamous problem in learning, where O(n) samples suffice but the best known algorithms, due to [Blum, Kalai, Wasserman, 2003], run in time 2^{n/log(n)}. (It's now used as a hard problem to build cryptosystems!)
THE POWER OF CONDITIONAL INDEPENDENCE

[Phylogenetic Trees/HMMs]: (joint distribution on leaves a, b, c)

Σ_σ Pr[z = σ] · Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]
PURE TOPIC MODELS

Each topic is a distribution on words; there are r topics over a vocabulary of m words. Each document is about only one topic (stochastically generated), and from each document we sample L words from its topic's distribution.
PURE TOPIC MODELS

[figure: term-by-document matrix M = A W, where the columns of A are the r topic distributions and each column of W selects the single topic of a document]
[Anandkumar, Hsu, Kakade, 2012]: Algorithm for learning pure topic models from polynomially many samples (if A is full rank).

Question: Where can we find three conditionally independent random variables?

The first, second and third words are independent conditioned on the topic t (and are random samples from A_t).
THE POWER OF CONDITIONAL INDEPENDENCE

[Phylogenetic Trees/HMMs]: (joint distribution on leaves a, b, c)

Σ_σ Pr[z = σ] · Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

[Pure Topic Models/LDA]: (joint distribution on first three words)

Σ_j Pr[topic = j] · A_j ⊗ A_j ⊗ A_j

[Community Detection]: (counting stars)

Σ_j Pr[C_x = j] · (C_A Π)_j ⊗ (C_B Π)_j ⊗ (C_C Π)_j
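For the topic-model case, the moment tensor can be formed directly from the parameters (a NumPy sketch; A, w and the sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 5, 3   # vocabulary size, number of topics

# Columns of A are topic distributions over words; w is the topic prior
A = rng.random((m, r)); A /= A.sum(axis=0, keepdims=True)
w = rng.random(r); w /= w.sum()

# Joint distribution of the first three words: sum_j Pr[topic=j] A_j ⊗ A_j ⊗ A_j
T = np.einsum('j,aj,bj,cj->abc', w, A, A, A)

assert np.isclose(T.sum(), 1.0)   # a genuine probability tensor
# Entry check: Pr[w1=0, w2=1, w3=2] = sum_j w_j A[0,j] A[1,j] A[2,j]
assert np.isclose(T[0, 1, 2], sum(w[j] * A[0, j] * A[1, j] * A[2, j]
                                  for j in range(r)))
```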
OUTLINE (recap). Next up is Part III: Smoothed Analysis (Overcomplete Problems; Kruskal Rank and the Khatri-Rao Product).
So far, Jennrich's algorithm has been the key, but it has a crucial limitation. Let

T = Σ_{i=1}^R a_i ⊗ a_i ⊗ a_i

where the {a_i} are n-dimensional vectors.

Question: What if R is much larger than n? This is called the overcomplete case, e.g. the number of factors is much larger than the number of observations. In such cases, why stop at third-order tensors?
Consider a sixth-order tensor T:

T = Σ_{i=1}^R a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i

Question: Can we find its factors, even if R is much larger than n?

Let's flatten it by rearranging its entries into a third-order tensor:

flat(T) = Σ_{i=1}^R b_i ⊗ b_i ⊗ b_i, where b_i = a_i ⊗_KR a_i

Here ⊗_KR is the Khatri-Rao product: b_i is the n²-dimensional vector whose (j, k)-th entry is the product of the j-th and k-th entries of a_i.
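Flattening is literally a reshape. This sketch (illustrative, tiny sizes of my choosing) checks that reshaping the sixth-order tensor gives exactly the third-order tensor with Khatri-Rao factors:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 3, 4

A = rng.standard_normal((n, R))

# Sixth-order tensor T = sum_i a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i
T6 = np.einsum('ur,vr,wr,xr,yr,zr->uvwxyz', A, A, A, A, A, A)

# Khatri-Rao factors b_i: n^2-dim vectors with entries a_i(j) a_i(k)
B = np.stack([np.outer(A[:, i], A[:, i]).ravel() for i in range(R)], axis=1)

# Flattening: group the six modes into three pairs -> third-order (n^2)^3 tensor
flatT = T6.reshape(n * n, n * n, n * n)

assert np.allclose(flatT, np.einsum('ur,vr,wr->uvw', B, B, B))
```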
Question: Can we apply Jennrich's Algorithm to flat(T)? When are the new factors b_i = a_i ⊗_KR a_i linearly independent?

Example #1: Let {a_i} be all (n choose 2) vectors with exactly two ones. Then each b_i is the vectorization of a matrix that is non-zero only on the 2 × 2 block indexed by the positions of the two ones, so the {b_i} are linearly independent.
When are the new factors b_i = a_i ⊗_KR a_i linearly independent?

Example #2: Let {a_1, ..., a_n} and {a_{n+1}, ..., a_{2n}} be two random orthonormal bases. Then there is a linear dependence with 2n terms:

Σ_{i=1}^n a_i ⊗_KR a_i − Σ_{i=n+1}^{2n} a_i ⊗_KR a_i = 0

(as matrices, both sums equal the identity)
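This dependence is easy to verify numerically (a sketch with two seeded random orthonormal bases):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5

# Two random orthonormal bases via QR of Gaussian matrices
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

# As matrices, sum_i u_i u_i^T = I = sum_i v_i v_i^T ...
assert np.allclose(U @ U.T, np.eye(n))
assert np.allclose(V @ V.T, np.eye(n))

# ... so the 2n Khatri-Rao vectors u_i ⊗_KR u_i, v_i ⊗_KR v_i are dependent
B = np.stack([np.outer(U[:, i], U[:, i]).ravel() for i in range(n)]
             + [np.outer(V[:, i], V[:, i]).ravel() for i in range(n)], axis=1)
coeffs = np.concatenate([np.ones(n), -np.ones(n)])
assert np.allclose(B @ coeffs, 0)
```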
THE KRUSKAL RANK

Definition: The Kruskal rank (k-rank) of {b_i} is the largest k such that every set of k vectors is linearly independent.

Take b_i = a_i ⊗_KR a_i with k-rank({a_i}) = n:

Example #1: k-rank({b_i}) = R = (n choose 2)
Example #2: k-rank({b_i}) = 2n − 1

The Kruskal rank always adds under the Khatri-Rao product, but sometimes it multiplies, and that can allow us to handle R >> n.
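For tiny examples the Kruskal rank can be computed by brute force over all subsets (exponential in general; this `kruskal_rank` helper is mine, purely for illustration):

```python
import numpy as np
from itertools import combinations

def kruskal_rank(vectors):
    """Largest k such that every set of k of the given vectors is
    linearly independent (brute force; exponential in general)."""
    V = np.column_stack(vectors)
    n, R = V.shape
    k = 0
    for size in range(1, min(n, R) + 1):
        if all(np.linalg.matrix_rank(V[:, list(S)]) == size
               for S in combinations(range(R), size)):
            k = size
        else:
            break  # a dependent set of this size rules out larger sizes too
    return k

# Example #2 in miniature: two orthonormal bases of R^n give
# k-rank 2n - 1 after the Khatri-Rao product
n = 3
rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = [np.outer(c, c).ravel() for c in np.column_stack([U, V]).T]
assert kruskal_rank(B) == 2 * n - 1
```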
[Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product.

Proof: The set of {a_i} where b_i = a_i ⊗_KR a_i and det({b_i}) = 0 has measure zero.

But this yields a very weak bound on the condition number of {b_i}, which is what we need to apply it in learning/statistics, where we only have an estimate of T.
Definition: The robust Kruskal rank (k-rank_γ) of {b_i} is the largest k such that every set of k vectors has condition number at most O(γ).

[Bhaskara, Charikar, Vijayaraghavan, 2013]: The robust Kruskal rank always adds under the Khatri-Rao product.

[Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then k-rank_γ({b_i}) = R for R = n²/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ).
Hence we can apply Jennrich's Algorithm to flat(T) with R >> n.

Note: These bounds are easy to prove with inverse-polynomial failure probability, but then γ depends on δ. The result can be extended to any constant-order Khatri-Rao product.

Sample application: an algorithm for learning mixtures of n^{O(1)} spherical Gaussians in R^n, if their means are ε-perturbed. This was also obtained independently by [Anderson, Belkin, Goyal, Rademacher, Voss, 2014].
Any Questions?

Summary:
- Tensor decompositions are unique under much more general conditions than matrix decompositions
- Jennrich's Algorithm (rediscovered many times!) and its many applications in learning/statistics
- New models to study overcomplete problems (R >> n)
- Open: are there algorithms for order-k tensors that work with R = n^{0.51 k}?
Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Anima Anandkumar U.C. Irvine Joint work with Furong Huang, U.N. Niranjan, Daniel Hsu and Sham Kakade. High-Dimensional Graphical Modeling
More informationDonald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016
Optimization for Tensor Models Donald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016 1 Tensors Matrix Tensor: higher-order matrix
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory
More informationNotes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.
Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where
More informationMath 1553, Introduction to Linear Algebra
Learning goals articulate what students are expected to be able to do in a course that can be measured. This course has course-level learning goals that pertain to the entire course, and section-level
More informationCS 143 Linear Algebra Review
CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see
More informationGuaranteed Learning of Latent Variable Models through Spectral and Tensor Methods
Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods Anima Anandkumar U.C. Irvine Application 1: Clustering Basic operation of grouping data points. Hypothesis: each data point
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationEfficient Spectral Methods for Learning Mixture Models!
Efficient Spectral Methods for Learning Mixture Models Qingqing Huang 2016 February Laboratory for Information & Decision Systems Based on joint works with Munther Dahleh, Rong Ge, Sham Kakade, Greg Valiant.
More information1 Last time: least-squares problems
MATH Linear algebra (Fall 07) Lecture Last time: least-squares problems Definition. If A is an m n matrix and b R m, then a least-squares solution to the linear system Ax = b is a vector x R n such that
More informationLearning mixtures of spherical Gaussians: moment methods and spectral decompositions
Learning mixtures of spherical Gaussians: moment methods and spectral decompositions Daniel Hsu and Sham M. Kakade Microsoft Research New England October 8, 01 Abstract This work provides a computationally
More informationLearning convex bodies is hard
Learning convex bodies is hard Navin Goyal Microsoft Research India navingo@microsoft.com Luis Rademacher Georgia Tech lrademac@cc.gatech.edu Abstract We show that learning a convex body in R d, given
More informationLearning Nonsingular Phylogenies and Hidden Markov Models
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2006 Learning Nonsingular Phylogenies and Hidden Markov Models Elchanan Mossel University of Pennsylvania Sébastien
More informationContents. Preface for the Instructor. Preface for the Student. xvii. Acknowledgments. 1 Vector Spaces 1 1.A R n and C n 2
Contents Preface for the Instructor xi Preface for the Student xv Acknowledgments xvii 1 Vector Spaces 1 1.A R n and C n 2 Complex Numbers 2 Lists 5 F n 6 Digression on Fields 10 Exercises 1.A 11 1.B Definition
More informationarxiv: v4 [cs.lg] 13 Nov 2014
Journal of Machine Learning Research 15 (2014) 2773-2832 Submitted 2/13; Revised 3/14; Published 8/14 Tensor Decompositions for Learning Latent Variable Models arxiv:1210.7559v4 [cs.lg] 13 Nov 2014 Animashree
More informationTherefore, A and B have the same characteristic polynomial and hence, the same eigenvalues.
Similar Matrices and Diagonalization Page 1 Theorem If A and B are n n matrices, which are similar, then they have the same characteristic equation and hence the same eigenvalues. Proof Let A and B be
More informationMinimal Realization Problems for Hidden Markov Models
Minimal Realization Problems for Hidden Markov Models Qingqing Huang, Rong Ge, Sham Kakade, Munther Dahleh 1 arxiv:14113698v2 [cslg] 14 Dec 2015 Abstract This paper addresses two fundamental problems in
More informationBEYOND MATRIX COMPLETION
BEYOND MATRIX COMPLETION ANKUR MOITRA MASSACHUSETTS INSTITUTE OF TECHNOLOGY Based on joint work with Boaz Barak (MSR) Part I: RecommendaIon systems and parially observed matrices THE NETFLIX PROBLEM movies
More informationLecture 9 : Identifiability of Markov Models
Lecture 9 : Identifiability of Markov Models MATH285K - Spring 2010 Lecturer: Sebastien Roch References: [SS03, Chapter 8]. Previous class THM 9.1 (Uniqueness of tree metric representation) Let δ be a
More informationLinear Algebra (Review) Volker Tresp 2017
Linear Algebra (Review) Volker Tresp 2017 1 Vectors k is a scalar (a number) c is a column vector. Thus in two dimensions, c = ( c1 c 2 ) (Advanced: More precisely, a vector is defined in a vector space.
More informationSmoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons
Smoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons Nima Anari Computer Science Stanford University anari@cs.stanford.edu Constantinos Daskalakis EECS MIT costis@csail.mit.edu
More information1 Introduction The j-state General Markov Model of Evolution was proposed by Steel in 1994 [14]. The model is concerned with the evolution of strings
Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model Mary Cryan Leslie Ann Goldberg Paul W. Goldberg. July 20, 1998 Abstract The j-state General Markov Model of evolution
More informationMidterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015
Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 The test lasts 1 hour and 15 minutes. No documents are allowed. The use of a calculator, cell phone or other equivalent electronic
More information[3] (b) Find a reduced row-echelon matrix row-equivalent to ,1 2 2
MATH Key for sample nal exam, August 998 []. (a) Dene the term \reduced row-echelon matrix". A matrix is reduced row-echelon if the following conditions are satised. every zero row lies below every nonzero
More informationMATH 31 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL
MATH 3 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL MAIN TOPICS FOR THE FINAL EXAM:. Vectors. Dot product. Cross product. Geometric applications. 2. Row reduction. Null space, column space, row space, left
More informationSolutions to Final Practice Problems Written by Victoria Kala Last updated 12/5/2015
Solutions to Final Practice Problems Written by Victoria Kala vtkala@math.ucsb.edu Last updated /5/05 Answers This page contains answers only. See the following pages for detailed solutions. (. (a x. See
More informationRobustness Meets Algorithms
Robustness Meets Algorithms Ankur Moitra (MIT) ICML 2017 Tutorial, August 6 th CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately
More informationCS4670: Computer Vision Kavita Bala. Lecture 7: Harris Corner Detec=on
CS4670: Computer Vision Kavita Bala Lecture 7: Harris Corner Detec=on Announcements HW 1 will be out soon Sign up for demo slots for PA 1 Remember that both partners have to be there We will ask you to
More informationRobust Statistics, Revisited
Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown
More informationUnit 5. Matrix diagonaliza1on
Unit 5. Matrix diagonaliza1on Linear Algebra and Op1miza1on Msc Bioinforma1cs for Health Sciences Eduardo Eyras Pompeu Fabra University 218-219 hlp://comprna.upf.edu/courses/master_mat/ We have seen before
More informationMA 265 FINAL EXAM Fall 2012
MA 265 FINAL EXAM Fall 22 NAME: INSTRUCTOR S NAME:. There are a total of 25 problems. You should show work on the exam sheet, and pencil in the correct answer on the scantron. 2. No books, notes, or calculators
More informationProblem Set (T) If A is an m n matrix, B is an n p matrix and D is a p s matrix, then show
MTH 0: Linear Algebra Department of Mathematics and Statistics Indian Institute of Technology - Kanpur Problem Set Problems marked (T) are for discussions in Tutorial sessions (T) If A is an m n matrix,
More informationTensors: a geometric view Open lecture November 24, 2014 Simons Institute, Berkeley. Giorgio Ottaviani, Università di Firenze
Open lecture November 24, 2014 Simons Institute, Berkeley Plan of the talk Introduction to Tensors A few relevant applications Tensors as multidimensional matrices A (complex) a b c tensor is an element
More informationAnkur Moitra. c Draft date March 30, Algorithmic Aspects of Machine Learning
Algorithmic Aspects of Machine Learning Ankur Moitra c Draft date March 30, 2014 Algorithmic Aspects of Machine Learning 2015 by Ankur Moitra. Note: These are unpolished, incomplete course notes. Developed
More informationMATH 304 Linear Algebra Lecture 34: Review for Test 2.
MATH 304 Linear Algebra Lecture 34: Review for Test 2. Topics for Test 2 Linear transformations (Leon 4.1 4.3) Matrix transformations Matrix of a linear mapping Similar matrices Orthogonality (Leon 5.1
More informationMATH 320: PRACTICE PROBLEMS FOR THE FINAL AND SOLUTIONS
MATH 320: PRACTICE PROBLEMS FOR THE FINAL AND SOLUTIONS There will be eight problems on the final. The following are sample problems. Problem 1. Let F be the vector space of all real valued functions on
More informationLecture 4. CP and KSVD Representations. Charles F. Van Loan
Structured Matrix Computations from Structured Tensors Lecture 4. CP and KSVD Representations Charles F. Van Loan Cornell University CIME-EMS Summer School June 22-26, 2015 Cetraro, Italy Structured Matrix
More informationChapter Two Elements of Linear Algebra
Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to
More informationAlgebraic Statistics Tutorial I
Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, 2012 1 / 34 Introduction to Algebraic Geometry Let R[p] =
More informationFinal Review Written by Victoria Kala SH 6432u Office Hours R 12:30 1:30pm Last Updated 11/30/2015
Final Review Written by Victoria Kala vtkala@mathucsbedu SH 6432u Office Hours R 12:30 1:30pm Last Updated 11/30/2015 Summary This review contains notes on sections 44 47, 51 53, 61, 62, 65 For your final,
More informationLecture 6. Numerical methods. Approximation of functions
Lecture 6 Numerical methods Approximation of functions Lecture 6 OUTLINE 1. Approximation and interpolation 2. Least-square method basis functions design matrix residual weighted least squares normal equation
More informationLearning Mixtures of Spherical Gaussians: Moment Methods and Spectral Decompositions
Learning Mixtures of Spherical Gaussians: Moment Methods and Spectral Decompositions [Extended Abstract] ABSTRACT Daniel Hsu Microsoft Research New England dahsu@microsoft.com This work provides a computationally
More informationA Polynomial Time Algorithm for Lossy Population Recovery
A Polynomial Time Algorithm for Lossy Population Recovery Ankur Moitra Massachusetts Institute of Technology joint work with Mike Saks A Story A Story A Story A Story A Story Can you reconstruct a description
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationProblem # Max points possible Actual score Total 120
FINAL EXAMINATION - MATH 2121, FALL 2017. Name: ID#: Email: Lecture & Tutorial: Problem # Max points possible Actual score 1 15 2 15 3 10 4 15 5 15 6 15 7 10 8 10 9 15 Total 120 You have 180 minutes to
More informationLinear Algebra review Powers of a diagonalizable matrix Spectral decomposition
Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Prof. Tesler Math 283 Fall 2018 Also see the separate version of this with Matlab and R commands. Prof. Tesler Diagonalizing
More informationKey words. computational learning theory, evolutionary trees, PAC-learning, learning of distributions,
EVOLUTIONARY TREES CAN BE LEARNED IN POLYNOMIAL TIME IN THE TWO-STATE GENERAL MARKOV MODEL MARY CRYAN, LESLIE ANN GOLDBERG, AND PAUL W. GOLDBERG. Abstract. The j-state General Markov Model of evolution
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationOn Finding Planted Cliques and Solving Random Linear Equa9ons. Math Colloquium. Lev Reyzin UIC
On Finding Planted Cliques and Solving Random Linear Equa9ons Math Colloquium Lev Reyzin UIC 1 random graphs 2 Erdős- Rényi Random Graphs G(n,p) generates graph G on n ver9ces by including each possible
More informationRecall the convention that, for us, all vectors are column vectors.
Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists
More informationPreliminary Linear Algebra 1. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 100
Preliminary Linear Algebra 1 Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 100 Notation for all there exists such that therefore because end of proof (QED) Copyright c 2012
More information1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det
What is the determinant of the following matrix? 3 4 3 4 3 4 4 3 A 0 B 8 C 55 D 0 E 60 If det a a a 3 b b b 3 c c c 3 = 4, then det a a 4a 3 a b b 4b 3 b c c c 3 c = A 8 B 6 C 4 D E 3 Let A be an n n matrix
More informationDisentangling Gaussians
Disentangling Gaussians Ankur Moitra, MIT November 6th, 2014 Dean s Breakfast Algorithmic Aspects of Machine Learning 2015 by Ankur Moitra. Note: These are unpolished, incomplete course notes. Developed
More informationGuaranteed Learning of Latent Variable Models through Tensor Methods
Guaranteed Learning of Latent Variable Models through Tensor Methods Furong Huang University of Maryland furongh@cs.umd.edu ACM SIGMETRICS Tutorial 2018 1/75 Tutorial Topic Learning algorithms for latent
More informationElementary linear algebra
Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The
More informationCAREER: Algorithmic Aspects of Machine Learning
Proposal Summary Over the past few decades an uncomfortable truth has set in that worst case analysis is not the right framework for studying machine learning: every model that is interesting enough to
More informationLinear Algebra review Powers of a diagonalizable matrix Spectral decomposition
Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Prof. Tesler Math 283 Fall 2016 Also see the separate version of this with Matlab and R commands. Prof. Tesler Diagonalizing
More information3 (Maths) Linear Algebra
3 (Maths) Linear Algebra References: Simon and Blume, chapters 6 to 11, 16 and 23; Pemberton and Rau, chapters 11 to 13 and 25; Sundaram, sections 1.3 and 1.5. The methods and concepts of linear algebra
More informationBayesian networks Lecture 18. David Sontag New York University
Bayesian networks Lecture 18 David Sontag New York University Outline for today Modeling sequen&al data (e.g., =me series, speech processing) using hidden Markov models (HMMs) Bayesian networks Independence
More informationSpectral Learning of Mixture of Hidden Markov Models
Spectral Learning of Mixture of Hidden Markov Models Y Cem Sübakan, Johannes raa, Paris Smaragdis,, Department of Computer Science, University of Illinois at Urbana-Champaign Department of Electrical and
More informationThe University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational
More informationANSWERS. E k E 2 E 1 A = B
MATH 7- Final Exam Spring ANSWERS Essay Questions points Define an Elementary Matrix Display the fundamental matrix multiply equation which summarizes a sequence of swap, combination and multiply operations,
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationCompressed Sensing and Linear Codes over Real Numbers
Compressed Sensing and Linear Codes over Real Numbers Henry D. Pfister (joint with Fan Zhang) Texas A&M University College Station Information Theory and Applications Workshop UC San Diego January 31st,
More informationTensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia
Network Feature s Decompositions for Machine Learning 1 1 Department of Computer Science University of British Columbia UBC Machine Learning Group, June 15 2016 1/30 Contact information Network Feature
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationLinear Algebra: Matrix Eigenvalue Problems
CHAPTER8 Linear Algebra: Matrix Eigenvalue Problems Chapter 8 p1 A matrix eigenvalue problem considers the vector equation (1) Ax = λx. 8.0 Linear Algebra: Matrix Eigenvalue Problems Here A is a given
More informationSum-of-Squares and Spectral Algorithms
Sum-of-Squares and Spectral Algorithms Tselil Schramm June 23, 2017 Workshop on SoS @ STOC 2017 Spectral algorithms as a tool for analyzing SoS. SoS Semidefinite Programs Spectral Algorithms SoS suggests
More informationA Review of Linear Algebra
A Review of Linear Algebra Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.edu These slides are a supplement to the book Numerical Methods with Matlab: Implementations
More informationIntroduction to Numerical Linear Algebra II
Introduction to Numerical Linear Algebra II Petros Drineas These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA 1 / 49 Overview We will cover this material in
More informationCS 10: Problem solving via Object Oriented Programming Winter 2017
CS 10: Problem solving via Object Oriented Programming Winter 2017 Tim Pierson 260 (255) Sudikoff Day 20 PaGern RecogniIon Agenda 1. PaGern matching vs. recogniion 2. From Finite Automata to Hidden Markov
More informationMTH 2032 SemesterII
MTH 202 SemesterII 2010-11 Linear Algebra Worked Examples Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education December 28, 2011 ii Contents Table of Contents
More information