Exact tensor completion with sum-of-squares


Proceedings of Machine Learning Research vol 65:1–54, 2017. 30th Annual Conference on Learning Theory.

Exact tensor completion with sum-of-squares

Aaron Potechin, Institute for Advanced Study, Princeton
David Steurer, Cornell University and Institute for Advanced Study, Princeton

Editors: Satyen Kale, Ohad Shamir and Kamalika Chaudhuri

Abstract
We obtain the first polynomial-time algorithm for exact tensor completion that improves over the bound implied by reduction to matrix completion. The algorithm recovers an unknown 3-tensor with r incoherent, orthogonal components in ℝⁿ from r·Õ(n^{1.5}) randomly observed entries of the tensor. This bound improves over the previous best one of r·Õ(n²) by reduction to exact matrix completion. Our bound also matches the best known results for the easier problem of approximate tensor completion (Barak and Moitra, 2016). Our algorithm and analysis extends seminal results for exact matrix completion (Candès and Recht, 2009) to the tensor setting via the sum-of-squares method. The main technical challenge is to show that a small number of randomly chosen monomials are enough to construct a degree-3 polynomial with precisely planted orthogonal global optima over the sphere, and that this fact can be certified within the sum-of-squares proof system.
Keywords: tensor completion, sum-of-squares method, semidefinite programming, exact recovery, matrix polynomials, matrix norm bounds

1. Introduction

A basic task in machine learning and signal processing is to infer missing data from a small number of observations about the data. An important example is matrix completion, which asks to recover an unknown low-rank matrix from a small number of observed entries. This problem has many interesting applications; one of the prominent original motivations was the Netflix Prize that sought improved algorithms for predicting user ratings for movies from a small number of user-provided ratings.
After an extensive research effort (Candès and Recht, 2009; Candès and Tao, 2010; Keshavan et al., 2009; Srebro and Shraibman, 2005), efficient algorithms with almost optimal, provable recovery guarantees have been obtained: in order to efficiently recover an unknown incoherent n-by-n matrix of rank r it is enough to observe r·Õ(n) random entries of the matrix (Gross, 2011; Recht, 2011). One of the remaining challenges is to obtain algorithms for the more general and much less understood tensor completion problem, where the observations do not just consist of pairwise correlations but also higher-order ones.

Algorithms and analyses for matrix and tensor completion come in three flavors:

1. algorithms analyzed by statistical learning tools like Rademacher complexity (Srebro and Shraibman, 2005; Barak and Moitra, 2016),
2. iterative algorithms like alternating minimization (Jain et al., 2013; Hardt, 2014; Hardt and Wootters, 2014),

3. algorithms analyzed by constructing dual certificates for convex programming relaxations (Candès and Recht, 2009; Gross, 2011; Recht, 2011).

While each of these flavors has different benefits, typically only algorithms of the third flavor achieve exact recovery. (The only exceptions to this rule we are aware of are a recent fast algorithm for matrix completion (Jain and Netrapalli, 2015) and a recent analysis (Ge et al., 2016) showing that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima, and thus stochastic gradient descent and other popular optimization procedures can solve positive semidefinite matrix completion with arbitrary initialization.) For all other algorithms, the analysis exhibits a trade-off between reconstruction error and the required number of observations (even when there is no noise in the input).¹

In this work, we obtain the first algorithm for exact tensor completion that improves over the bounds implied by reduction to exact matrix completion. The algorithm recovers an unknown 3-tensor with r incoherent, orthogonal components in ℝⁿ from r·Õ(n^{1.5}) randomly observed entries of the tensor. The previous best bound for exact recovery is r·Õ(n²), which is implied by reduction to exact matrix completion. (The reduction views a 3-tensor on ℝⁿ as an n-by-n² matrix. We can recover rank-r matrices of this shape from r·Õ(n²) samples, which is best possible.) Our bound also matches the best known results for the easier problem of approximate tensor completion (Jain and Oh, 2014; Bhojanapalli and Sanghavi, 2015; Barak and Moitra, 2016) (the results of the last work also apply to a wider range of tensors and do not require orthogonality).

A problem similar to matrix and tensor completion is matrix and tensor sensing. The goal is to recover an unknown low-rank matrix or tensor from a small number of linear measurements.
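The flattening reduction mentioned above can be sketched in a few lines; a minimal illustration, assuming numpy (the dimensions and random components here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3

# Orthonormal components {u_i}, {v_i}, {w_i} (columns of U, V, W).
U, V, W = (np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3))

# X = sum_i u_i (x) v_i (x) w_i, an orthogonal 3-tensor of rank r.
X = np.einsum('ai,bi,ci->abc', U, V, W)

# The reduction flattens X into an n-by-n^2 matrix of rank at most r;
# matrix completion for this shape needs ~ r * n^2 observations, versus
# r * n^1.5 (up to log factors) for the tensor algorithm.
M = np.reshape(X, (n, n * n))
print(np.linalg.matrix_rank(M))  # -> 3
```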
An interesting phenomenon is that for carefully designed measurements (which actually happen to be rank 1) it is possible to efficiently recover a 3-tensor of rank r with just O(r²n) measurements (Forbes and Shpilka, 2012), which is better than the best bounds for tensor completion when r ≤ n^{0.5}. We conjecture that for tensor completion from random entries the bound we obtain is, up to logarithmic factors, best possible among polynomial-time algorithms.

Sum-of-squares method. Our algorithm is based on sum-of-squares (Shor, 1987; Parrilo, 2000; Lasserre, 2000/01), a very general and powerful meta-algorithm studied extensively in many scientific communities (see for example the survey Barak and Steurer, 2014). In theoretical computer science, the main research focus has been on the capabilities of sum-of-squares for approximation problems (Barak et al., 2012), especially in the context of Khot's Unique Games Conjecture (Khot, 2002). More recently, sum-of-squares emerged as a general approach to inference problems that arise in machine learning and have defied other algorithmic techniques. This approach has led to improved algorithms for tensor decomposition (Barak et al., 2015; Ge and Ma, 2015; Hopkins et al., 2016; Ma et al., 2016), dictionary learning (Barak et al., 2015; Hazan and Ma, 2016), tensor principal component analysis (Hopkins et al., 2015; Raghavendra et al., 2016; Bhattiprolu et al., 2016), and planted sparse vectors (Barak et al., 2014; Hopkins et al., 2016). An exciting direction is also to understand limitations of sum-of-squares for inference problems on concrete input distributions (Ma and Wigderson, 2015; Hopkins et al., 2015; Barak et al., 2016).

1. We remark that this trade-off is a property of the analysis and not necessarily the algorithm. For example, some algorithms of the first flavor are based on the same convex programming relaxations as exact recovery algorithms.
Also for iterative algorithms, the trade-off between reconstruction error and number of samples comes from the requirement of the analysis that each iteration uses fresh samples. For these iterative algorithms, the number of samples depends only logarithmically on the desired accuracy, which means that these analyses imply exact recovery if the bit complexity of the entries is small.

An appealing feature of the sum-of-squares method is that its capabilities and limitations can be understood through the lens of a simple but surprisingly powerful and intuitive restricted proof system called the sum-of-squares or Positivstellensatz system (Grigoriev and Vorobjov, 2001; Grigoriev, 2001a,b). A conceptual contribution of this work is to show that seminal results for inference problems like compressed sensing and matrix completion have natural interpretations as identifiability proofs in this system. Furthermore, we show that this interpretation is helpful in order to analyze more challenging inference problems like tensor completion. A promising future direction is to find more examples of inference problems where this lens on inference algorithms and identifiability proofs yields stronger provable guarantees.

A technical contribution of our work is that we develop techniques in order to show that sum-of-squares achieves exact recovery. Most previous works only showed that sum-of-squares gives approximate solutions, which in some cases can be turned into exact solutions by invoking algorithms with local convergence guarantees (Ge and Ma, 2015; Barak et al., 2014) or solving successive sum-of-squares relaxations (Ma et al., 2016).

1.1. Results

We say that a vector v ∈ ℝⁿ is μ-incoherent with respect to the coordinate basis e₁, …, eₙ if for every index i ∈ [n],

⟨e_i, v⟩² ≤ (μ/n)·‖v‖².  (1.1)

We say that a 3-tensor X ∈ ℝ^{n×n×n} is orthogonal of rank r if there are orthogonal vectors {u_i}_{i∈[r]}, {v_i}_{i∈[r]}, {w_i}_{i∈[r]} ⊆ ℝⁿ such that X = Σ_{i=1}^r u_i ⊗ v_i ⊗ w_i. We say that such a 3-tensor X is μ-incoherent if all of the vectors u_i, v_i, w_i are μ-incoherent.

Theorem 1 (Main) There exists a polynomial-time algorithm that, given at least μ^{O(1)}·r·Õ(n^{1.5}) random entries of an unknown orthogonal μ-incoherent 3-tensor X ∈ ℝ^{n×n×n} of rank r, outputs all entries of X with probability at least 1 − n^{−ω(1)}.
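The incoherence parameter in Eq. (1.1) is easy to compute for a given vector; a small numerical sketch, assuming numpy (the two example vectors are illustrative):

```python
import numpy as np

def incoherence(v):
    """Smallest mu such that <e_i, v>^2 <= (mu/n) * ||v||^2 for all i."""
    n = len(v)
    return n * np.max(v**2) / np.dot(v, v)

n = 100
flat = np.ones(n) / np.sqrt(n)       # maximally spread out: mu = 1
spike = np.zeros(n); spike[0] = 1.0  # a coordinate vector: mu = n
print(incoherence(flat), incoherence(spike))  # -> 1.0 100.0
```

Components with small μ are those whose mass is spread over many coordinates, so that randomly observed entries of the tensor actually carry information about them.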
We note that the analysis also shows that the algorithm is robust to an inverse polynomial amount of noise in the input (resulting in an inverse polynomial amount of error in the output). We remark that the running time of the algorithm depends polynomially on the bit complexity of X.

2. Techniques

Let {u_i}_{i∈[r]}, {v_i}_{i∈[r]}, {w_i}_{i∈[r]} be three orthonormal sets in ℝⁿ. Consider a 3-tensor X ∈ ℝ^{n×n×n} of the form X = Σ_{i=1}^r λ_i·u_i ⊗ v_i ⊗ w_i with λ₁, …, λ_r ≥ 0. Let Ω ⊆ [n]³ be a subset of the entries of X. Our goal is to efficiently reconstruct the unknown tensor X from its restriction X_Ω to the entries in Ω.

Ignoring computational efficiency, we first ask if this task is information-theoretically possible. More concretely, for a given set of observations X_Ω, how can we rule out that there exists another rank-r orthogonal 3-tensor X̂ ≠ X that would give rise to the same observations X̂_Ω = X_Ω?²

2. We emphasize that we ask here about the uniqueness of X for a fixed set of entries Ω. This question differs from asking about the uniqueness for a random set of entries, which could be answered by suitably counting the number of low-rank 3-tensors.

A priori it is not clear how an answer to this information-theoretic question could be related to the goal of obtaining an efficient algorithm. However, it turns out that the sum-of-squares framework allows us to systematically translate a uniqueness proof to an algorithm that efficiently finds the solution. (In addition, this solution also comes with a short certificate for uniqueness.³)

Uniqueness proof. Let Ω ⊆ [n]³ be a set of entries and let X = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i be a 3-tensor with λ₁, …, λ_r ≥ 0. It turns out that the following two conditions are enough to imply that X_Ω uniquely determines X. The first condition is that the vectors {(u_i ⊗ v_i ⊗ w_i)_Ω} are linearly independent. The second condition is that there exists a 3-linear form T on ℝⁿ with the following properties:

1. in the monomial basis, T is supported on Ω, so that T(x, y, z) = Σ_{(i,j,k)∈Ω} T_{ijk} x_i y_j z_k,

2. evaluated over unit vectors, the 3-form T is exactly maximized at the points (u_i, v_i, w_i), so that T(u₁, v₁, w₁) = ⋯ = T(u_r, v_r, w_r) = 1 and T(x, y, z) < 1 for all unit vectors (x, y, z) ∉ {(u_i, v_i, w_i) | i ∈ [r]}.

We show that the two deterministic conditions above are satisfied with high probability if the vectors {u_i}, {v_i}, {w_i} are incoherent and Ω is a random set of entries of size at least r·Õ(n^{1.5}).

Let us sketch the proof that such a 3-linear form T indeed implies uniqueness. Concretely, we claim that if we let X̂ be a 3-tensor of the form Σ_{i=1}^r λ̂_i û_i ⊗ v̂_i ⊗ ŵ_i for λ̂₁, …, λ̂_r ≥ 0 and unit vectors {û_i}, {v̂_i}, {ŵ_i} with X̂_Ω = X_Ω that minimizes Σ_i λ̂_i, then X̂ = X must hold. We identify T with an element of ℝ^{n⊗n⊗n} (the coefficient tensor of T in the monomial basis). Let X̂ be as before. We are to show that X̂ = X. On the one hand, using that T(x, y, z) ≤ 1 for all unit vectors x, y, z,

⟨T, X̂⟩ = Σ_{i=1}^r λ̂_i·T(û_i, v̂_i, ŵ_i) ≤ Σ_{i=1}^r λ̂_i.

At the same time, using that T is supported on Ω and the fact that X̂_Ω = X_Ω,

⟨T, X̂⟩ = ⟨T, X⟩ = Σ_{i=1}^r λ_i·T(u_i, v_i, w_i) = Σ_{i=1}^r λ_i.
Since X̂ minimizes Σ_i λ̂_i, equality has to hold in the previous inequality. It follows that every point (û_i, v̂_i, ŵ_i) is equal to one of the points (u_j, v_j, w_j), because T is uniquely maximized at the points {(u_i, v_i, w_i) | i ∈ [r]}. Since we assumed that {(u_i ⊗ v_i ⊗ w_i)_Ω} are linearly independent, we can conclude that X̂ = X.

When we show that such a 3-linear form T exists, we will actually show something stronger, namely that the second property is not only true but also has a short certificate in the form of a degree-4 sum-of-squares proof, which we describe next. This certificate also enables us to efficiently recover the missing tensor entries.

3. This certificate is closely related to certificates in the form of dual solutions for convex programming relaxations that are used in the compressed sensing and matrix completion literature.
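The step ⟨T, X̂⟩ = ⟨T, X⟩ in the argument above uses only that T vanishes outside Ω and that the two tensors agree on Ω. This can be checked numerically; a toy sketch with random data, assuming numpy (no attempt is made here to construct an actual maximizing form T):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = rng.standard_normal((n, n, n))          # stand-in for the unknown tensor
Xhat = X + rng.standard_normal((n, n, n))   # a competing tensor ...
mask = rng.random((n, n, n)) < 0.3          # ... observed entries Omega
Xhat[mask] = X[mask]                        # ... that agrees with X on Omega

T = rng.standard_normal((n, n, n)) * mask   # any 3-form supported on Omega

# T vanishes off Omega and Xhat matches X on Omega, so <T, Xhat> = <T, X>.
print(np.allclose((T * Xhat).sum(), (T * X).sum()))  # -> True
```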

Uniqueness proof in the sum-of-squares system. A degree-4 sos certificate for the second property of T is an (n + n²)-by-(n + n²) positive-semidefinite matrix M (acting as a linear operator on ℝⁿ ⊕ (ℝⁿ ⊗ ℝⁿ)) that represents the polynomial ‖x‖² + ‖y‖²‖z‖² − 2T(x, y, z), i.e.,

⟨(x, y ⊗ z), M(x, y ⊗ z)⟩ = ‖x‖² + ‖y‖²‖z‖² − 2T(x, y, z).  (2.1)

Furthermore, we require that the kernel of M is precisely the span of the vectors {(u_i, v_i ⊗ w_i) | i ∈ [r]}. Let us see that this matrix M certifies that T has the property that over unit vectors it is exactly maximized at the desired points (u_i, v_i, w_i). Let u, v, w be unit vectors such that (u, v, w) is not a multiple of one of the vectors (u_i, v_i, w_i). Then by orthogonality, both (u, v ⊗ w) and (−u, v ⊗ w) have non-zero projection on the orthogonal complement of the kernel of M. Therefore, the bounds

0 < ⟨(u, v ⊗ w), M(u, v ⊗ w)⟩ = 2 − 2T(u, v, w) and 0 < ⟨(−u, v ⊗ w), M(−u, v ⊗ w)⟩ = 2 + 2T(u, v, w)

together give the desired conclusion that |T(u, v, w)| < 1.

Reconstruction algorithm based on the sum-of-squares system. The existence of a positive semidefinite matrix M as above not only means that reconstruction of X from X_Ω is possible information-theoretically but also efficiently. The sum-of-squares algorithm allows us to efficiently search over low-degree moments of objects called pseudo-distributions that generalize probability distributions over real vector spaces. Every pseudo-distribution μ defines pseudo-expectation values Ẽ_μ f for all low-degree polynomial functions f(x, y, z), which behave in many ways like expectation values under an actual probability distribution. In order to reconstruct X from the observations X_Ω, we use the sum-of-squares algorithm to efficiently find a pseudo-distribution μ that satisfies⁴

Ẽ_{(x,y,z)} ‖x‖² + ‖y‖²‖z‖² ≤ 2,  (2.2)

( Ẽ_{(x,y,z)} x ⊗ y ⊗ z )_Ω = X_Ω.  (2.3)

Note that the distribution over the vectors (u_i, v_i, w_i) with probabilities λ_i satisfies the above conditions.
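For the simplest case r = 1 with u₁ = v₁ = w₁ = e₁ and T(x, y, z) = x₁y₁z₁, a certificate matrix M as in Eq. (2.1) can be written down explicitly and its claimed properties verified numerically. A sketch assuming numpy; the construction below is only for this toy instance:

```python
import numpy as np

n = 4
e1 = np.zeros(n); e1[0] = 1.0
C = np.outer(e1, np.kron(e1, e1))  # n x n^2 block pairing x_1 with (y(x)z)_11

# M acts on (x, y(x)z) and represents ||x||^2 + ||y||^2 ||z||^2 - 2 x_1 y_1 z_1.
M = np.block([[np.eye(n), -C], [-C.T, np.eye(n * n)]])

rng = np.random.default_rng(2)
x, y, z = rng.standard_normal((3, n))
v = np.concatenate([x, np.kron(y, z)])
lhs = v @ M @ v
rhs = x @ x + (y @ y) * (z @ z) - 2 * x[0] * y[0] * z[0]
print(np.isclose(lhs, rhs))        # -> True   (identity (2.1))

eigs = np.linalg.eigvalsh(M)
print(eigs.min() >= -1e-9)         # -> True   (M is positive semidefinite)
print(int(np.sum(eigs < 1e-9)))    # -> 1      (kernel = span{(e1, e1(x)e1)})
```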
Our previous discussion about uniqueness shows that the existence of a positive semidefinite matrix M as above implies that no other distribution satisfies the above conditions. It turns out that the matrix M implies that this uniqueness holds even among pseudo-distributions, in the sense that any pseudo-distribution that satisfies Eqs. (2.2) and (2.3) must satisfy Ẽ_{(x,y,z)} x ⊗ y ⊗ z = X, which means that the reconstruction is successful.⁵

When do such uniqueness certificates exist? The above discussion shows that in order to achieve reconstruction it is enough to show that uniqueness certificates of the form above exist. We show that these certificates exist with high probability if we choose Ω to be a large enough random subset of entries (under suitable assumptions on X). Our existence proof is based on a randomized procedure to construct such a certificate, heavily inspired by similar constructions for matrix completion (Gross, 2011; Recht, 2011). (We note that this construction uses the unknown tensor X and is therefore not constructive in the context of the recovery problem.)

4. The viewpoint in terms of pseudo-distributions is useful to see how the previous uniqueness proof relates to the algorithm. We can also describe the solutions to the constraints Eqs. (2.2) and (2.3) in terms of linearly constrained positive semidefinite matrices. See the alternative description of Algorithm 2.
5. The matrix M can also be viewed as a solution to the dual of the convex optimization problem of finding a pseudo-distribution that satisfies conditions Eqs. (2.2) and (2.3).

Before describing the construction, we make the requirements on the 3-linear form T more concrete. We identify T with the linear operator from ℝⁿ ⊗ ℝⁿ to ℝⁿ such that T(x, y, z) = ⟨x, T(y ⊗ z)⟩. Furthermore, let T_a be the linear operators on ℝⁿ such that T(x, y, z) = Σ_{a=1}^n x_a·⟨y, T_a z⟩. Then, the following conditions on T imply the existence of a uniqueness certificate M (which also means that recovery succeeds):

1. every unknown entry (i, j, k) ∉ Ω satisfies ⟨e_i, T(e_j ⊗ e_k)⟩ = 0,
2. every index i ∈ [r] satisfies u_i = T(v_i ⊗ w_i),
3. the n²-by-n² matrix Σ_{a=1}^n (vec T_a)(vec T_a)ᵀ − Σ_{i=1}^r (v_i ⊗ w_i)(v_i ⊗ w_i)ᵀ has spectral norm at most 0.01, where vec T_a ∈ ℝ^{n²} denotes the flattening of the slice T_a.

We note that the uniqueness certificates for matrix completion (Gross, 2011; Recht, 2011) have similar requirements. The key difference is that we need to control the spectral norm of an operator that depends quadratically on the constructed object T (as opposed to a linear dependence in the matrix completion case). Combined with the fact that the construction of T is iterative (about log n steps), the spectral norm bound unfortunately requires significant technical work. In particular, we cannot apply general matrix concentration inequalities and instead apply the trace moment method. (See Section 5.) We also note that the fact that the above requirements allow us to construct the certificate M is not immediate and requires some new ideas about matrix representations of polynomials, which might be useful elsewhere. (See Appendix A.) Finally, we note that the transformation applied to T in order to obtain the matrix for the third condition above appears in many works about 3-tensors (Hopkins et al., 2015; Barak and Moitra, 2016), with the earliest appearance in a work on refutation algorithms for random 3-SAT instances (see Feige and Ofek, 2007).

The iterative construction of the linear operator T exactly follows the recipe from matrix completion (Gross, 2011; Recht, 2011). Let R_Ω be the projection operator onto the linear space of operators T that satisfy the first requirement.
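Before continuing with the construction, the identity behind the third condition can be sanity-checked for the fully observed planted tensor, where the difference is exactly zero; a sketch assuming numpy and row-major flattening of the slices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2
U, V, W = (np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3))
T = np.einsum('ai,bi,ci->abc', U, V, W)  # planted tensor, no entries removed

flat = T.reshape(n, n * n)               # row a is the flattened slice T_a
lhs = flat.T @ flat                      # sum_a (vec T_a)(vec T_a)^T

VW = np.stack([np.kron(V[:, i], W[:, i]) for i in range(r)])
rhs = VW.T @ VW                          # sum_i (v_i (x) w_i)(v_i (x) w_i)^T

# Exact agreement here relies on the orthonormality of {u_i}; the hard part
# of the analysis is keeping the spectral norm of this difference small once
# T is supported only on a random subset Omega of the entries.
print(np.linalg.norm(lhs - rhs, 2))      # ~ 0 (up to floating-point error)
```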
Let P_T be the (affine) projection operator onto the affine linear space of operators T that satisfy the second requirement. We start with T⁽⁰⁾ = X. (At this point we satisfy the second condition. Also, the matrix in the third condition is 0.) In order to enforce the first condition, we apply the operator R_Ω. After this projection, the second condition is most likely no longer satisfied. To enforce the second condition, we apply the affine linear operator P_T and obtain T⁽¹⁾ = P_T R_Ω (X). The idea is to iterate this construction and show that after a logarithmic number of iterations both the first and second condition are satisfied up to an inverse polynomially small error (which we can correct in a direct way). The main challenge is to show that the iterates obtained in this way satisfy the desired spectral norm bound. (We note that for technical reasons the construction uses fresh randomness Ω for each iteration, like in the matrix completion case (Recht, 2011; Gross, 2011). Since the number of iterations is logarithmic, the total number of required observations remains the same up to a logarithmic factor.)

3. Preliminaries

Unless explicitly stated otherwise, O(·)-notation hides absolute multiplicative constants. Concretely, every occurrence of O(x) is a placeholder for some function f(x) that satisfies ∀x: |f(x)| ≤ C·x for some absolute constant C > 0. Similarly, Ω(x) is a placeholder for a function g(x) that satisfies ∀x: g(x) ≥ x/C for some absolute constant C > 0.

Our algorithm is based on a generalization of probability distributions over ℝⁿ. To define this generalization, we use the following notation for the formal expectation of a function f on ℝⁿ with respect to a finitely-supported function μ: ℝⁿ → ℝ,

Ẽ_μ f = Σ_{x ∈ supp(μ)} μ(x)·f(x).

A degree-d pseudo-distribution over ℝⁿ is a finitely-supported function μ: ℝⁿ → ℝ such that Ẽ_μ 1 = 1 and Ẽ_μ f² ≥ 0 for every polynomial f of degree at most d/2. A key algorithmic property of pseudo-distributions is that their low-degree moments have an efficient separation oracle. Concretely, the set of degree-d moments Ẽ_μ (1, x)^{⊗d} such that μ is a degree-d pseudo-distribution over ℝⁿ has an n^{O(d)}-time separation oracle. Therefore, standard convex optimization methods allow us to efficiently optimize linear functions over low-degree moments of pseudo-distributions (even subject to additional convex constraints that have efficient separation oracles) up to arbitrary numerical accuracy.

4. Tensor completion algorithm

In this section, we show that the following algorithm for tensor completion succeeds in recovering the unknown tensor from partial observations, assuming the existence of a particular linear operator T. We will state conditions on the unknown tensor that imply that such a linear operator exists with high probability if the observed entries are chosen at random. We use essentially the same convex relaxation as in (Barak and Moitra, 2016), but our analysis differs significantly.

Algorithm 2 (Exact tensor completion based on degree-4 sum-of-squares)
Input: locations Ω ⊆ [n]³ and partial observations X_Ω of an unknown 3-tensor X ∈ ℝ^{n×n×n}.
Operation: Find a degree-4 pseudo-distribution μ on ℝⁿ × ℝⁿ × ℝⁿ such that the third moment matches the observations, ( Ẽ_{(x,y,z)} x ⊗ y ⊗ z )_Ω = X_Ω, so as to minimize Ẽ_{(x,y,z)} ‖x‖² + ‖y‖²‖z‖². Output the 3-tensor Ẽ_{(x,y,z)} x ⊗ y ⊗ z ∈ ℝ^{n×n×n}.
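The condition Ẽ_μ f² ≥ 0 for all low-degree f is a positive-semidefiniteness condition on the moment matrix, which is what makes pseudo-distributions amenable to convex optimization. A minimal numerical sketch for degree 2, assuming numpy (here the moments come from an actual distribution, which is a special case of a pseudo-distribution):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
pts = rng.standard_normal((50, n))    # support of an actual distribution
weights = np.full(50, 1 / 50)

# Monomial vector (1, x); the matrix E (1,x)(1,x)^T collects all moments
# of degree at most 2.
mono = np.hstack([np.ones((50, 1)), pts])
moment = (mono * weights[:, None]).T @ mono

# E f^2 >= 0 for every affine f = <c, (1,x)> is exactly PSD-ness of moment.
c = rng.standard_normal(n + 1)
print(c @ moment @ c >= 0)                         # -> True
print(np.linalg.eigvalsh(moment).min() >= -1e-9)   # -> True
```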
Alternative description: Output a minimum-trace, positive semidefinite matrix Y acting on ℝⁿ ⊕ (ℝⁿ ⊗ ℝⁿ) with blocks Y_{1,1}, Y_{1,2} and Y_{2,2} such that (Y_{1,2})_Ω = X_Ω matches the observations, and Y_{2,2} satisfies the additional symmetry constraints that each entry ⟨e_j ⊗ e_k, Y_{2,2}(e_{j′} ⊗ e_{k′})⟩ only depends on the index sets {j, j′}, {k, k′}.

Let {u_i}, {v_i}, {w_i} be three orthonormal sets in ℝⁿ, each of cardinality r. We reason about the recovery guarantees of the algorithm in terms of the following notion of certificate.

Definition 3 We say that a linear operator T from ℝⁿ ⊗ ℝⁿ to ℝⁿ is a degree-4 certificate for Ω and orthonormal sets {u_i}, {v_i}, {w_i} ⊆ ℝⁿ if the following conditions are satisfied:

1. the vectors {(u_i ⊗ v_j ⊗ w_k)_Ω | (i, j, k) ∈ S} are linearly independent, where S ⊆ [n]³ is the set of triples with at least two identical indices from [r],

2. every entry (a, b, c) ∉ Ω satisfies ⟨e_a, T(e_b ⊗ e_c)⟩ = 0,

3. if we view T as a 3-tensor in (ℝⁿ)^{⊗3} whose (a, b, c) entry is ⟨e_a, T(e_b ⊗ e_c)⟩, every index i ∈ [r] satisfies (u_i ⊗ v_i ⊗ Id)T = w_i, (u_i ⊗ Id ⊗ w_i)T = v_i, and (Id ⊗ v_i ⊗ w_i)T = u_i,

4. the following matrix has spectral norm at most 0.01:

Σ_{a=1}^n (vec T_a)(vec T_a)ᵀ − Σ_{i=1}^r (v_i ⊗ w_i)(v_i ⊗ w_i)ᵀ,

where {T_a} are the matrices such that T(x, y, z) = Σ_{a=1}^n x_a·⟨y, T_a z⟩ and vec T_a ∈ ℝ^{n²} denotes the flattening of T_a.

In Section 4.4, we prove that the existence of such certificates implies that the above algorithm successfully recovers the unknown tensor, as formalized by the following theorem.

Theorem 4 Let X ∈ ℝ^{n×n×n} be any 3-tensor of the form Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i for λ₁, …, λ_r ∈ ℝ₊. Let Ω ⊆ [n]³ be a subset of indices. Suppose there exists a degree-4 certificate in the sense of Definition 3. Then, given the observations X_Ω, the above algorithm recovers the unknown tensor X exactly.

In Section 4.5, we show that degree-4 certificates are likely to exist when Ω is a random set of appropriate size.

Theorem 5 Let {u_i}, {v_i}, {w_i} be three orthonormal sets of μ-incoherent vectors in ℝⁿ, each of cardinality r. Let Ω ⊆ [n]³ be a random set of tensor entries of cardinality m = r·n^{1.5}·(μ·log n)^C for an absolute constant C ≥ 1. Then, with probability 1 − n^{−ω(1)}, there exists a linear operator T that satisfies the requirements of Definition 3.

Taken together, the two theorems above imply our main result, Theorem 1.

4.1. Simpler proofs via higher-degree sum-of-squares

Unfortunately, the proof of Theorem 5 requires extremely technical spectral norm bounds for random matrices. It turns out that less technical norm bounds suffice if we use degree-6 sum-of-squares relaxations. For this more powerful algorithm, weaker certificates are enough to ensure exact recovery, and the proof that these weaker certificates exist with high probability is considerably easier than the proof that degree-4 certificates exist with high probability. In the following we describe this weaker notion of certificate and state its properties.
In the subsequent sections we prove that the properties of these certificates are enough to imply our main result, Theorem 1.

Algorithm 6 (Exact tensor completion based on higher-degree sum-of-squares)
Input: locations Ω ⊆ [n]³ and partial observations X_Ω of an unknown 3-tensor X ∈ ℝ^{n×n×n}.
Operation: Find a degree-6 pseudo-distribution μ on ℝⁿ × ℝⁿ × ℝⁿ so as to minimize Ẽ_{(x,y,z)} ½(‖x‖² + ‖z‖²) subject to the following constraints:

( Ẽ_{(x,y,z)} x ⊗ y ⊗ z )_Ω = X_Ω,  (4.1)

Ẽ_{(x,y,z)} (‖y‖² − 1)·p(x, y, z) = 0 for all p ∈ ℝ[x, y, z] of degree at most 4.  (4.2)

Output the 3-tensor Ẽ_{(x,y,z)} x ⊗ y ⊗ z ∈ ℝ^{n×n×n}.

Let {u_i}, {v_i}, {w_i} be three orthonormal sets in ℝⁿ, each of cardinality r. We reason about the recovery guarantees of the above algorithm in terms of the following notion of certificate. The main difference to degree-4 certificates (Definition 3) is that the spectral norm condition is replaced by a condition in terms of sum-of-squares representations.

Definition 7 We say that a 3-tensor T ∈ (ℝⁿ)^{⊗3} is a higher-degree certificate for Ω and orthonormal sets {u_i}, {v_i}, {w_i} ⊆ ℝⁿ if the following conditions are satisfied:

1. the vectors {(u_i ⊗ v_i ⊗ w_i)_Ω}_{i∈[r]} are linearly independent,
2. every entry (a, b, c) ∉ Ω satisfies ⟨T, e_a ⊗ e_b ⊗ e_c⟩ = 0,
3. every index i ∈ [r] satisfies (u_i ⊗ v_i ⊗ Id)T = w_i, (u_i ⊗ Id ⊗ w_i)T = v_i, and (Id ⊗ v_i ⊗ w_i)T = u_i,
4. the following degree-4 polynomials in ℝ[x, y, z] are sums of squares:

‖x‖² + ‖y‖²‖z‖² − (1/ε)·⟨T′, x ⊗ y ⊗ z⟩,  (4.3)
‖y‖² + ‖x‖²‖z‖² − (1/ε)·⟨T′, x ⊗ y ⊗ z⟩,  (4.4)
‖z‖² + ‖x‖²‖y‖² − (1/ε)·⟨T′, x ⊗ y ⊗ z⟩,  (4.5)

where T′ = T − Σ_{i=1}^r u_i ⊗ v_i ⊗ w_i and ε > 0 is an absolute constant (say ε = 10⁻⁶).

In the following sections, we prove that higher-degree certificates imply that Algorithm 6 successfully recovers the desired tensor, and that they exist with high probability for random Ω of appropriate size.

4.2. Higher-degree certificates imply exact recovery

Let {u_i}, {v_i}, {w_i} be orthonormal bases in ℝⁿ. We say that a degree-ℓ pseudo-distribution μ(x, y, z) satisfies the constraint {‖y‖² = 1}, denoted μ ⊨ {‖y‖² = 1}, if Ẽ_{(x,y,z)} p(x, y, z)·(1 − ‖y‖²) = 0 for all polynomials p ∈ ℝ[x, y, z] of degree at most ℓ − 2.

We are to show that a higher-degree certificate in the sense of Definition 7 implies that Algorithm 6 reconstructs the partially observed tensor exactly. A key step of this proof is the following lemma about expectation values of higher-degree pseudo-distributions.
Lemma 8 Let T ∈ (ℝⁿ)^{⊗3} be a higher-degree certificate as in Definition 7 for the set Ω ⊆ [n]³ and the vectors {u_i}_{i∈[r]}, {v_i}_{i∈[r]}, {w_i}_{i∈[r]}. Then, every degree-6 pseudo-distribution μ(x, y, z) with μ ⊨ {‖y‖² = 1} satisfies

Ẽ_{(x,y,z)} T(x, y, z) ≤ Ẽ_{(x,y,z)} ½(‖x‖² + ‖z‖²)
  − (1/100)·Ẽ_{(x,y,z)} [ Σ_{i=r+1}^n (⟨u_i, x⟩² + ⟨w_i, z⟩²) + Σ_{i=1}^n Σ_{j∈[n]∖{i}} ⟨v_i, y⟩²·(⟨u_j, x⟩² + ⟨w_j, z⟩²) ].  (4.6)

To prove this lemma it will be useful to introduce the sum-of-squares proof system. Before doing that, let us observe that the lemma indeed allows us to prove that Algorithm 6 works.

Theorem 9 (Higher-degree certificates imply exact recovery) Suppose there exists a higher-degree certificate T in the sense of Definition 7 for the set Ω ⊆ [n]³ and the vectors {u_i}_{i∈[r]}, {v_i}_{i∈[r]}, {w_i}_{i∈[r]}. Then, Algorithm 6 recovers the partially observed tensor exactly. In other words, if X = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i with λ₁, …, λ_r ≥ 0 and μ(x, y, z) is a degree-6 pseudo-distribution with μ ⊨ {‖y‖² = 1} that minimizes Ẽ_{(x,y,z)} ½(‖x‖² + ‖z‖²) subject to ( Ẽ_{(x,y,z)} x ⊗ y ⊗ z )_Ω = X_Ω, then Ẽ_{(x,y,z)} x ⊗ y ⊗ z = X.

Proof Consider the distribution over vectors (x, y, z) such that ((nλ_i)^{1/2}·u_i, v_i, (nλ_i)^{1/2}·w_i) has probability 1/n for each i ∈ [n] (where we set λ_i = 0 for i > r and extend {u_i}, {v_i}, {w_i} to orthonormal bases). By construction, Ẽ x ⊗ y ⊗ z = X. We have

Ẽ_{(x,y,z)} T(x, y, z) = Σ_{i=1}^r λ_i·T(u_i, v_i, w_i) = Σ_{i=1}^r λ_i = Ẽ_{(x,y,z)} ½(‖x‖² + ‖z‖²).

By Lemma 8 and the optimality of μ, it follows that

(1/100)·Ẽ_{(x,y,z)} [ Σ_{i=r+1}^n (⟨u_i, x⟩² + ⟨w_i, z⟩²) + Σ_{i=1}^n Σ_{j∈[n]∖{i}} ⟨v_i, y⟩²·(⟨u_j, x⟩² + ⟨w_j, z⟩²) ] = 0.

Since the summands on the left-hand side are squares, it follows that each summand has pseudo-expectation 0. It follows that Ẽ ⟨u_i, x⟩² = Ẽ ⟨w_i, z⟩² = 0 for all i > r, and Ẽ ⟨v_i, y⟩²⟨u_j, x⟩² = Ẽ ⟨v_i, y⟩²⟨w_j, z⟩² = 0 for all i ≠ j. By the Cauchy–Schwarz inequality for pseudo-expectations, it follows that Ẽ_{(x,y,z)} ⟨x ⊗ y ⊗ z, u_i ⊗ v_j ⊗ w_k⟩ = 0 unless i = j = k ∈ [r]. Consequently, Ẽ_{(x,y,z)} x ⊗ y ⊗ z is a linear combination of the vectors {u_i ⊗ v_i ⊗ w_i | i ∈ [r]}. Finally, the linear independence of the vectors {(u_i ⊗ v_i ⊗ w_i)_Ω | i ∈ [r]} implies that Ẽ x ⊗ y ⊗ z = X, as desired.

It remains to prove Lemma 8. Here it is convenient to use formal notation for sum-of-squares proofs. We will work with polynomials in ℝ[x, y, z] and the polynomial equation A = {‖y‖² = 1}.
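The feasibility claims about the planted distribution in the proof above are mechanical to verify; a numerical sketch, assuming numpy (the λ values are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 5, 2
# Full orthonormal bases; only the first r directions carry weight.
U, V, W = (np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(3))
lam = np.zeros(n); lam[:2] = [0.7, 0.3]
X = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# Atom i is ((n*lam_i)^(1/2) u_i, v_i, (n*lam_i)^(1/2) w_i), probability 1/n.
s = np.sqrt(n * lam)
third = sum((1 / n) * np.einsum('a,b,c->abc', s[i] * U[:, i], V[:, i], s[i] * W[:, i])
            for i in range(n))
obj = sum((1 / n) * 0.5 * (s[i]**2 * U[:, i] @ U[:, i] + s[i]**2 * W[:, i] @ W[:, i])
          for i in range(n))

print(np.allclose(third, X))       # -> True: third moments match X
print(np.isclose(obj, lam.sum()))  # -> True: objective equals sum_i lam_i
```

Each atom also has ‖y‖² = 1, so the distribution satisfies the constraint {‖y‖² = 1} exactly.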
For p ∈ ℝ[x, y, z], we say that there exists a degree-ℓ SOS proof that A implies p ≥ 0, denoted A ⊢_ℓ p ≥ 0, if there exists a polynomial q ∈ ℝ[x, y, z] of degree at most ℓ − 2 such that p + q·(1 − ‖y‖²) is a sum of squares of polynomials. This notion of proof allows us to reason about pseudo-distributions. In particular, if A ⊢_ℓ p ≥ 0, then every degree-ℓ pseudo-distribution μ with μ ⊨ A satisfies Ẽ_μ p ≥ 0.

We will change coordinates such that u_i = v_i = w_i = e_i is the i-th coordinate vector for every i ∈ [n]. Then, the conditions on T in Definition 7 imply that

⟨T, x ⊗ y ⊗ z⟩ = Σ_{i=1}^r x_i y_i z_i + T′(x, y, z),  (4.7)

where T′ is a 3-linear form with the property that T′(x, x, x) does not contain squares (i.e., it is multilinear). Furthermore, the conditions imply the following SOS proofs for T′:

1. ⊢₄ T′(x, y, z) ≤ ε·(‖x‖² + ‖y‖²‖z‖²),
2. ⊢₄ T′(x, y, z) ≤ ε·(‖y‖² + ‖x‖²‖z‖²),
3. ⊢₄ T′(x, y, z) ≤ ε·(‖z‖² + ‖x‖²‖y‖²).

The following lemma gives an upper bound on one of the parts in Eq. (4.7).

Lemma 10 For A = {‖y‖² = 1}, the following inequality has a degree-6 sum-of-squares proof:

A ⊢₆ Σ_{i=1}^r x_i y_i z_i ≤ ½(‖x‖² + ‖z‖²) − ¼·Σ_{i=r+1}^n (x_i² + z_i²) − ⅛·Σ_{i≠j} y_i²(x_j² + z_j²) − (1/16)·Σ_{i≠j} y_i² y_j² (‖x‖² + ‖z‖²).  (4.8)

Proof We bound the left-hand side in the lemma as follows,

A ⊢₆ Σ_{i=1}^r x_i y_i z_i ≤ ½·Σ_{i=1}^r (x_i² + y_i² z_i²)  (4.9)
  ≤ ½·( ‖x‖² − Σ_{i>r} x_i² + Σ_{i=1}^n y_i² z_i² ).  (4.10)

We can further bound Σ_i y_i² z_i² as follows,

A ⊢₆ Σ_{i=1}^n y_i² z_i² = ( Σ_{i=1}^n y_i² )·( Σ_{i=1}^n z_i² ) − Σ_{i≠j} y_i² z_j²  (4.11)
  = Σ_{i=1}^n z_i² − Σ_{i≠j} y_i² z_j².  (4.12)

We can prove a different bound on Σ_i y_i² z_i² as follows,

A ⊢₆ Σ_{i=1}^n y_i² z_i² ≤ ½·( ‖z‖² + Σ_{i=1}^n y_i⁴ z_i² )  (4.13)
  ≤ ½·( ‖z‖² + ( Σ_{i=1}^n y_i⁴ )·‖z‖² )  (4.14)
  = ½·( ‖z‖² + ( ( Σ_{i=1}^n y_i² )² − Σ_{i≠j} y_i² y_j² )·‖z‖² )  (4.15)
  = ‖z‖² − ½·Σ_{i≠j} y_i² y_j² ‖z‖².  (4.16)

By combining these three bounds, using (4.10) together with the average of (4.12) and (4.16), we obtain the inequality

A ⊢₆ Σ_{i=1}^r x_i y_i z_i ≤ ½‖x‖² + ½‖z‖² − ½·Σ_{i>r} x_i² − ¼·Σ_{i≠j} y_i² z_j² − ⅛·Σ_{i≠j} y_i² y_j² ‖z‖².

By symmetry between z and x, the same inequality holds with x and z exchanged. Combining these symmetric inequalities, we obtain the desired inequality

A ⊢₆ Σ_{i=1}^r x_i y_i z_i ≤ ½(‖x‖² + ‖z‖²) − ¼·Σ_{i=r+1}^n (x_i² + z_i²)
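Two elementary facts carry the proof above: the termwise AM–GM step in (4.9) and the exact rearrangement (4.11)–(4.12) under ‖y‖² = 1. Both can be checked numerically; a sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 7
y = rng.standard_normal(n); y /= np.linalg.norm(y)  # enforce ||y||^2 = 1
z = rng.standard_normal(n)

# (4.11)-(4.12): sum_i y_i^2 z_i^2 = ||z||^2 - sum_{i != j} y_i^2 z_j^2.
diag = np.sum(y**2 * z**2)
off = np.sum(np.outer(y**2, z**2)) - diag           # sum over i != j
print(np.isclose(diag, z @ z - off))                # -> True

# (4.9), termwise AM-GM: x_i y_i z_i <= (x_i^2 + y_i^2 z_i^2) / 2.
x = rng.standard_normal(n)
print(np.all(x * y * z <= (x**2 + y**2 * z**2) / 2 + 1e-12))  # -> True
```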

− ⅛·Σ_{i≠j} y_i²(x_j² + z_j²) − (1/16)·Σ_{i≠j} y_i² y_j² (‖x‖² + ‖z‖²).  (4.17)

It remains to bound the second part in Eq. (4.7), which the following lemma achieves.

Lemma 11 A ⊢₆ T′(x, y, z) ≤ (3ε/2)·Σ_{i=1}^n Σ_{j≠i} ( y_i²(x_j² + z_j²) + y_i² y_j² (‖x‖² + ‖z‖²) ).

Proof It is enough to show the following inequality for all i ∈ [n],

A ⊢₆ y_i²·T′(x, y, z) ≤ (3ε/2)·Σ_{j≠i} ( y_i²(x_j² + z_j²) + y_i² y_j² (‖x‖² + ‖z‖²) );

summing these inequalities over i ∈ [n] and using ‖y‖² = 1 then yields the lemma. By symmetry it suffices to consider the case i = 1. Let x′ = x − x₁e₁, y′ = y − y₁e₁, and z′ = z − z₁e₁. We observe that

T′(x, y, z) = T′(x₁e₁ + x′, y₁e₁ + y′, z₁e₁ + z′) = T′(x₁e₁, y′, z′) + T′(x′, y₁e₁, z′) + T′(x′, y′, z₁e₁) + T′(x′, y′, z′),

where all terms with e₁ in two or more arguments vanish because T′ is multilinear. We now apply the following inequalities:

1. A ⊢₄ T′(x₁e₁, y′, z′) ≤ (ε/2)·( x₁²‖y′‖² + ‖z′‖² ) ≤ (ε/2)·Σ_{j≠1} ( y_j²‖x‖² + z_j² ),
2. A ⊢₄ T′(x′, y₁e₁, z′) ≤ (ε/2)·( ‖x′‖² y₁² + ‖z′‖² ) ≤ (ε/2)·Σ_{j≠1} ( x_j² + z_j² ),
3. A ⊢₄ T′(x′, y′, z₁e₁) ≤ (ε/2)·( z₁²‖y′‖² + ‖x′‖² ) ≤ (ε/2)·Σ_{j≠1} ( y_j²‖z‖² + x_j² ),
4. A ⊢₄ T′(x′, y′, z′) ≤ (ε/2)·( ‖x′‖²‖y′‖² + ‖z′‖² ) ≤ (ε/2)·Σ_{j≠1} ( x_j² + z_j² ).

Multiplying each bound by y₁² and adding them yields the claimed inequality for i = 1.

We can now prove Lemma 8.

Proof [Proof of Lemma 8] Taken together, Lemmas 10 and 11 imply

A ⊢₆ ⟨T, x ⊗ y ⊗ z⟩ ≤ ½(‖x‖² + ‖z‖²) − (¼ − O(ε))·Σ_{i=r+1}^n (x_i² + z_i²) − (⅛ − O(ε))·Σ_{i≠j} y_i²(x_j² + z_j²) − (1/16 − O(ε))·Σ_{i≠j} y_i² y_j² (‖x‖² + ‖z‖²),  (4.18)

where the absolute constant hidden by the O(·) notation is at most 10. Therefore, since ε ≤ 10⁻⁶ as assumed in Definition 7, we get an SOS proof of the inequality

A ⊢₆ ⟨T, x ⊗ y ⊗ z⟩ ≤ ½(‖x‖² + ‖z‖²) − (1/8)·Σ_{i=r+1}^n (x_i² + z_i²)

$$-\ \tfrac{1}{16}\sum_{i\neq j}\Big(y_i^2(x_j^2+z_j^2) + y_i^2 y_j^2(\|x\|^2+\|z\|^2)\Big). \tag{4.19}$$
This SOS proof implies that every degree-6 pseudo-distribution $\mu(x,y,z)$ with $\mu \models \mathcal{A}$ satisfies the desired inequality,
$$\tilde{\mathbb{E}}_{\mu(x,y,z)}\langle T,\, x\otimes y\otimes z\rangle \le 1 - \tfrac{1}{2}\tilde{\mathbb{E}}_{\mu}\|x-z\|^2 - \tfrac{1}{8}\tilde{\mathbb{E}}_{\mu}\sum_{i=r+1}^n (x_i^2+z_i^2) - \tfrac{1}{16}\tilde{\mathbb{E}}_{\mu}\sum_{i\neq j}\Big(y_i^2(x_j^2+z_j^2) + y_i^2 y_j^2(\|x\|^2+\|z\|^2)\Big). \tag{4.20}$$

4.3. Constructing the certificate T

In this section we give a procedure for constructing the certificate T. This construction is directly inspired by the construction of the dual certificate in (Gross, 2011; Recht, 2011), sometimes called quantum golfing. We will then prove that T satisfies all of the conditions for a higher-degree certificate of Ω. In Section 4.5 we will show that T also satisfies the conditions for a degree-4 certificate for Ω.

Let $\{u_i\}, \{v_i\}, \{w_i\} \subseteq \mathbb{R}^n$ be three orthonormal bases, with all vectors $\mu$-incoherent. Let $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i$. Let $\Omega \subseteq [n]^3$ be chosen at random such that each element is included independently with probability $m/n^3$ (so that $|\Omega|$ is tightly concentrated around $m$). Let $P$ be the projector onto the span of the vectors $u_i \otimes v_j \otimes w_k$ such that an index in $[r]$ appears at least twice in $(i,j,k)$ (i.e., at least one of the conditions $i = j \in [r]$, $i = k \in [r]$, $j = k \in [r]$ is satisfied). Let $R_\Omega$ be the linear operator on $\mathbb{R}^{n\times n\times n}$ that sets all entries outside of Ω to 0 (so that $(R_\Omega[T])_\Omega = R_\Omega[T]$) and is scaled such that $\mathbb{E}\,R_\Omega = \mathrm{Id}$. Let $\bar R_\Omega$ be $R_\Omega - \mathrm{Id}$.

Our goal is to construct $T \in \mathbb{R}^{n\times n\times n}$ such that $P[T] = X$, $(T)_\Omega = T$, and the spectral norm condition in Definition 3 is satisfied. The idea for constructing T is to start with $T = X$. Then, move to the closest point $T'$ that satisfies $(T')_\Omega = T'$. Then, move to the closest point $T''$ that satisfies $P[T''] = X$, and repeat. To implement this strategy, we define
$$T^{(k)} = \sum_{j=0}^{k-1} (-1)^j\, R_{\Omega_{j+1}} (P \bar R_{\Omega_j}) \cdots (P \bar R_{\Omega_1})[X], \tag{4.21}$$
where $\Omega_1, \ldots, \Omega_k$ are i.i.d. samples from the same distribution as Ω. By induction, we can show the following lemma about linear constraints that the constructed tensors $T^{(k)}$ satisfy.
Lemma 12 For every $k \ge 1$, the tensor $T^{(k)}$ satisfies $(T^{(k)})_\Omega = T^{(k)}$ and
$$P[T^{(k)}] + (-1)^k\, P (P \bar R_{\Omega_k}) \cdots (P \bar R_{\Omega_1})[X] = X.$$
Here, $P(P \bar R_{\Omega_k}) \cdots (P \bar R_{\Omega_1})[X]$ is an error term that decreases geometrically. In the parameter regime of Theorem 5, the norm of this term is $n^{-\omega(1)}$ for some $k = (\log n)^{O(1)}$.
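The shape of the recursion in Eq. (4.21) and the exact linear identity of Lemma 12 can be illustrated numerically. The following sketch is a toy of our own devising, run in the simpler matrix-completion setting: the projector $P$ is replaced by the tangent-space projector of a planted rank-$r$ matrix, and all parameter choices are arbitrary. The invariant $P[T] + N_k = X$ holds exactly at every round, while the norm of the uncorrected residual $N_k$ decays geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k, p = 40, 2, 8, 0.3  # size, rank, golfing rounds, sampling rate (toy choices)

# Planted low-rank matrix standing in for X (a 2-d analogue of the 3-tensor).
U = np.linalg.qr(rng.normal(size=(n, r)))[0]
V = np.linalg.qr(rng.normal(size=(n, r)))[0]
X = U @ V.T

def P(M):
    """Analogue of the projector P: here, the tangent space of X."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV

def R(M):
    """Sampling operator on a fresh random support, rescaled so E[R] = Id."""
    mask = rng.random(M.shape) < p
    return (mask / p) * M

# Quantum golfing: T accumulates sampled corrections, N is the residual error.
T = np.zeros((n, n))
N = X.copy()            # N_0 = X lies in the range of P
errors = []
for _ in range(k):
    S = R(N)            # correction supported on the fresh sample
    T += S
    N = P(N - S)        # N_{j+1} = P (Id - R_{Omega_{j+1}}) [N_j]
    errors.append(np.linalg.norm(N))

# Exact linear identity (the analogue of Lemma 12): P[T] + N_k = X.
identity_gap = np.linalg.norm(P(T) + N - X)
```

Note that `T` is supported on the union of the sampled entries by construction, which is the matrix analogue of the constraint $(T^{(k)})_\Omega = T^{(k)}$.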

The following lemma shows that it is possible to correct such small errors. (This lemma also implies that the linear independence condition in Definition 3 is satisfied with high probability. Therefore, we can ignore this condition in the following.)

Lemma 13 Suppose $m \ge rn \cdot (\log n)^{O(1)}$. Then, with probability $1 - n^{-\omega(1)}$ over the choice of Ω, the following holds: For every $E \in \mathbb{R}^{n\times n\times n}$ with $P[E] = E$, there exists $Y$ with $(Y)_\Omega = Y$ such that $P[Y] = E$ and $\|Y\|_F \le O(1)\cdot\|E\|_F$.

Proof Let $S \subseteq [n]^3$ be such that $P$ is the projector onto the span of the vectors $u_i \otimes v_j \otimes w_k$ with $(i,j,k) \in S$. By construction of $P$ we have $|S| \le 3rn$. In order to show the conclusion of the lemma it is enough to show that the vectors $(u_i \otimes v_j \otimes w_k)_\Omega$ with $(i,j,k) \in S$ are well-conditioned in the sense that the ratio of the largest and smallest singular value is $O(1)$. This fact follows from standard matrix concentration inequalities. See Lemma 15.

The main technical challenge is to show that the construction satisfies the condition that the following degree-4 polynomials are sums of squares (where $\hat T = T - X$):
$$\|x\|^2 + \|y\|^2\|z\|^2 - \tfrac{1}{\varepsilon}\langle \hat T,\, x\otimes y\otimes z\rangle, \tag{4.22}$$
$$\|y\|^2 + \|x\|^2\|z\|^2 - \tfrac{1}{\varepsilon}\langle \hat T,\, x\otimes y\otimes z\rangle, \tag{4.23}$$
$$\|z\|^2 + \|x\|^2\|y\|^2 - \tfrac{1}{\varepsilon}\langle \hat T,\, x\otimes y\otimes z\rangle. \tag{4.24}$$
We show how to prove the first statement; the other statements can be proved with symmetrical arguments. To prove the first statement, we decompose $\hat T$ into pieces of the form $(\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$, $P'(\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$ (where $P'$ is a part of $P$), or $E$. For each piece $A$, we prove a norm bound $\big\|\sum_a A_a A_a^\top\big\| \le B$. Since $\sum_a A_a A_a^\top$ represents the same polynomial as $\mathcal{A}^\top\mathcal{A}$ (writing $\mathcal{A}$ for the linear map with $\langle A,\, x\otimes y\otimes z\rangle = x^\top \mathcal{A}(y\otimes z)$), this proves that $B\|y\|^2\|z\|^2 - (y\otimes z)^\top \mathcal{A}^\top\mathcal{A}\,(y\otimes z)$ is a degree-4 sum of squares. Now note that
$$(y\otimes z)^\top \mathcal{A}^\top\mathcal{A}\,(y\otimes z) - \sqrt{B}\, x^\top \mathcal{A}(y\otimes z) - \sqrt{B}\,(y\otimes z)^\top \mathcal{A}^\top x + B\|x\|^2$$
is also a sum of squares. Combining these equations and scaling, we have that $\|x\|^2 + \|y\|^2\|z\|^2 - \tfrac{2}{\sqrt{B}}\, x^\top \mathcal{A}(y\otimes z)$ is a degree-4 sum of squares. Thus, it is sufficient to prove norm bounds on $\sum_a A_a A_a^\top$. We have an appropriate bound in the case when $A = E$ because $E$ has very small Frobenius norm.
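The scaling step above can be sanity-checked numerically. In this illustrative sketch (random data of our choosing, not a piece arising from the construction), $B$ is the norm of the Gram representation of a random tensor $A$, and the resulting inequality $\|x\|^2 + \|y\|^2\|z\|^2 \ge \tfrac{2}{\sqrt{B}}\, x^\top \mathcal{A}(y\otimes z)$ is verified at random points — a spot check of the polynomial inequality, not an SOS proof:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.normal(size=(n, n, n))      # a stand-in tensor piece
Aflat = A.reshape(n, n * n)         # x^T Aflat (y ⊗ z) = <A, x ⊗ y ⊗ z>

# B bounds the Gram representation: sum_a vec(A_a) vec(A_a)^T = Aflat^T Aflat,
# whose spectral norm is the squared spectral norm of the flattening.
B = np.linalg.norm(Aflat, ord=2) ** 2

# Check ||x||^2 + ||y||^2 ||z||^2 - (2/sqrt(B)) x^T Aflat (y ⊗ z) >= 0.
worst = np.inf
for _ in range(200):
    x, y, z = rng.normal(size=(3, n))
    val = (x @ x + (y @ y) * (z @ z)
           - (2 / np.sqrt(B)) * (x @ Aflat @ np.kron(y, z)))
    worst = min(worst, val)
```

The nonnegativity is guaranteed by Cauchy–Schwarz, since $x^\top \mathcal{A}(y\otimes z) \le \sqrt{B}\,\|x\|\,\|y\|\,\|z\|$.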
For the cases when $A = (\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$ or $A = P'(\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$, we use the following theorem.

Theorem 14 Let $A = (\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$ or $P'(\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$, where $P'$ is a part of $P$. There is an absolute constant $C$ such that for any $\alpha > 1$ and $\beta > 0$,
$$\Pr\Big[\Big\|\sum_a A_a A_a^\top\Big\| > \alpha^{-(l+1)}\Big] < n^{-\beta}$$
as long as $m > C\alpha\beta^3 \sqrt{r}\, n^{1.5}\log n$ and $m > C\alpha\beta^2\, r n \log n$.

Proof This theorem follows directly from combining Proposition 29, Theorem 33, Theorem 42, and Theorem …

Final correction of error terms

In this section, we prove a spectral norm bound that allows us to correct error terms that are left at the end of the construction. The proof uses the by now standard matrix Bernstein concentration inequality. Similar proofs appear in the matrix completion literature (Gross, 2011; Recht, 2011).

Let $\{u_i\}, \{v_i\}, \{w_i\} \subseteq \mathbb{R}^n$ be three orthonormal bases, with all vectors $\mu$-incoherent. Let $\Omega \subseteq [n]^3$ be $m$ entries sampled uniformly at random with replacement. (This sampling model is different from what is used in the rest of the proof. However, it is well known that the models are equivalent in terms of the final recovery problem.)

Lemma 15 Let $S \subseteq [n]^3$. Suppose $m = |S|\,(\log n)^C$ for an absolute constant $C \ge 1$. Then with probability $1 - n^{-\omega(1)}$ over the choice of Ω, the vectors $(u_i \otimes v_j \otimes w_k)_\Omega$ for $(i,j,k) \in S$ are well-conditioned in the sense that the ratio between the largest and smallest singular value is at most 1.1.

Proof For $s = (i,j,k) \in S$, let $y_s = u_i \otimes v_j \otimes w_k$. Let $\Omega = \{\omega_1, \ldots, \omega_m\}$, where $\omega_1, \ldots, \omega_m \in [n]^3$ are sampled uniformly at random with replacement. Let $A$ be the $S$-by-$S$ Gram matrix of the vectors $(y_s)_\Omega$. Then, $A$ is the sum of identically distributed rank-1 matrices $A_i$, $A = \sum_{i=1}^m A_i$ with $(A_i)_{s,s'} = (y_s)_{\omega_i}(y_{s'})_{\omega_i}$. Each $A_i$ has expectation $\mathbb{E}\,A_i = n^{-3}\,\mathrm{Id}$ and spectral norm at most $\mu^6 |S|/n^3$. Standard matrix concentration inequalities (Tropp, 2012) show that $m \ge O(\mu^6 |S| \log n)$ is enough to ensure that the sum is spectrally close to its expectation $(m/n^3)\,\mathrm{Id}$ in the sense that $0.99\,A \preceq (m/n^3)\,\mathrm{Id} \preceq 1.1\,A$.

4.4. Degree-4 certificates imply exact recovery

In this section we prove Theorem 4. We need the following technical lemma, which we prove in Appendix A.

Lemma 16 Let $R$ be a self-adjoint linear operator on $\mathbb{R}^n \otimes \mathbb{R}^n$. Suppose
$$\langle v_j \otimes w_k,\, R(v_i \otimes w_i)\rangle = 0$$
for all indices $i, j, k \in [r]$ such that $i \in \{j, k\}$.
Then, there exists a self-adjoint linear operator $R'$ on $\mathbb{R}^n \otimes \mathbb{R}^n$ such that $R'(v_i \otimes w_i) = 0$ for all $i \in [r]$, the spectral norm of $R'$ satisfies $\|R'\| \le 10\|R\|$, and $R'$ represents the same polynomial in $\mathbb{R}[y, z]$,
$$\langle y \otimes z,\, R'(y \otimes z)\rangle = \langle y \otimes z,\, R(y \otimes z)\rangle.$$

We can now prove that certificates in the sense of Definition 3 imply that our algorithm successfully recovers the unknown tensor.

Proof [Proof of Theorem 4] Let $T$ be a certificate in the sense of Definition 3. Our goal is to construct a positive semidefinite matrix $M$ on $\mathbb{R}^n \oplus (\mathbb{R}^n \otimes \mathbb{R}^n)$ that represents the following polynomial,
$$\langle (x,\, y\otimes z),\, M (x,\, y\otimes z)\rangle = \|x\|^2 + \|y\|^2\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle.$$

Let $T_a$ be matrices such that $\langle x,\, T(y\otimes z)\rangle = \sum_a x_a\, T_a(y, z)$. Since
$$\|x\|^2 + \sum_{a=1}^n T_a(y,z)^2 - 2\langle x,\, T(y\otimes z)\rangle = \sum_{a=1}^n \big(x_a - T_a(y,z)\big)^2$$
is a sum of squares of polynomials, it will be enough to find a positive semidefinite matrix that represents the polynomial $\|y\|^2\|z\|^2 - \sum_{a=1}^n T_a(y,z)^2$. (This step is a polynomial version of the Schur complement condition for positive semidefiniteness.)

Let $R$ be the following linear operator (viewing each $T_a$ as a vector in $\mathbb{R}^n \otimes \mathbb{R}^n$),
$$R = \sum_{a=1}^n T_a T_a^\top - \sum_{i=1}^r (v_i \otimes w_i)(v_i \otimes w_i)^\top.$$

Lemma 17 $R$ satisfies the requirement of Lemma 16.

Proof Consider $\langle v_j \otimes w_k,\, R(v_j \otimes w_j)\rangle$. Since $v_j$ is repeated, the value of this expression will be the same if we replace $R$ by an $R_2$ which represents the same polynomial. Thus, we can replace $R$ by
$$R_2 = \sum_{a=1}^n T_a T_a^\top - \sum_{i=1}^r (v_i \otimes w_i)(v_i \otimes w_i)^\top = T^\top T - \sum_{i=1}^r (v_i \otimes w_i)(v_i \otimes w_i)^\top.$$
We now observe that
$$\langle v_j \otimes w_k,\, R_2(v_j \otimes w_j)\rangle = \langle v_j \otimes w_k,\, T^\top(u_j) - v_j \otimes w_j\rangle = 0.$$
By a symmetrical proof, $\langle v_j \otimes w_k,\, R(v_k \otimes w_k)\rangle = 0$ as well.

By Lemma 16, there exists a self-adjoint linear operator $R'$ that represents the same polynomial as $R$, has spectral norm $\|R'\| \le 10\|R\| \le 0.1$, and sends all vectors $v_i \otimes w_i$ to 0. Since $R'$ sends all vectors $v_i \otimes w_i$ to 0 and $\|R'\| \le 0.1$, the following matrix
$$R'' = \sum_{i=1}^r (v_i \otimes w_i)(v_i \otimes w_i)^\top + R'$$
has $r$ eigenvalues of value 1 (corresponding to the space spanned by the $v_i \otimes w_i$) and all other eigenvalues are at most 0.1 (because the non-zero eigenvalues of $R'$ have eigenvectors orthogonal to all $v_i \otimes w_i$). At the same time, $R''$ represents the following polynomial,
$$\langle y \otimes z,\, R''(y \otimes z)\rangle = \sum_{a=1}^n T_a(y,z)^2.$$
Let $P$ be a positive semidefinite matrix that represents the polynomial $\|x\|^2 + \sum_{a=1}^n T_a(y,z)^2 - 2\langle x,\, T(y\otimes z)\rangle$ (such a matrix exists because the polynomial is a sum of squares). We choose $M$ as follows,
$$M = \begin{pmatrix} \mathrm{Id} & -T \\ -T^\top & T^\top T \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \mathrm{Id} - R'' \end{pmatrix}.$$
Since $R'' \preceq \mathrm{Id}$, this matrix is positive semidefinite. Also, $M$ represents $\|x\|^2 + \|y\|^2\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle$. Since $u_i = T(v_i \otimes w_i)$ for all $i \in [r]$ and the kernel of $\mathrm{Id} - R''$ only contains the span of the $v_i \otimes w_i$, the kernel of $M$ is exactly the span of the vectors $(u_i,\, v_i \otimes w_i)$.
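In the idealized case where the certificate acts exactly as $T(v_i \otimes w_i) = u_i$ and the correction $R'$ vanishes, the construction of $M$ can be checked numerically. The following toy sketch (a random orthonormal instance of our choosing, not a certificate produced by the actual construction) verifies that $M$ is positive semidefinite and that its kernel is exactly the $r$-dimensional span of the vectors $(u_i,\, v_i \otimes w_i)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 8, 3

# Random orthonormal families u_i, v_i, w_i.
U = np.linalg.qr(rng.normal(size=(n, r)))[0]
V = np.linalg.qr(rng.normal(size=(n, r)))[0]
W = np.linalg.qr(rng.normal(size=(n, r)))[0]

# Columns are the vectors v_i ⊗ w_i (orthonormal, since V and W are).
VW = np.stack([np.kron(V[:, i], W[:, i]) for i in range(r)], axis=1)
T = U @ VW.T                    # ideal certificate: T (v_i ⊗ w_i) = u_i

Rpp = VW @ VW.T                 # R'' with R' = 0: projector onto span(v_i ⊗ w_i)
Id_n, Id_nn = np.eye(n), np.eye(n * n)

# M = [[Id, -T], [-T^T, T^T T]] + [[0, 0], [0, Id - R'']]
M = np.block([[Id_n, -T],
              [-T.T, T.T @ T + Id_nn - Rpp]])

eigvals = np.linalg.eigvalsh(M)
kernel_dim = int(np.sum(eigvals < 1e-8))

# The claimed kernel vectors (u_i, v_i ⊗ w_i) are indeed annihilated.
K = np.vstack([U, VW])          # columns are (u_i, v_i ⊗ w_i)
kernel_residual = np.linalg.norm(M @ K)
```

The check that `kernel_dim == r` reflects the argument above: $M$ vanishes exactly on the span of the $(u_i, v_i \otimes w_i)$ and is bounded away from zero on its orthogonal complement.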
Next, we show that the above matrix $M$ implies that Algorithm 2 recovers the unknown tensor $X$. Recall that the algorithm, on input $X_\Omega$, finds a pseudo-distribution $\mu(x,y,z)$ so as to minimize $\tilde{\mathbb{E}}_\mu\big[\|x\|^2 + \|y\|^2\|z\|^2\big]$ such that $\tilde{\mathbb{E}}_\mu[x \otimes y \otimes z]_\Omega = X_\Omega$. Since everything is scale invariant, we may assume that $X = \sum_{i=1}^r \lambda_i\, u_i \otimes v_i \otimes w_i$ for $\lambda_1, \ldots, \lambda_r \ge 0$ and $\sum_i \lambda_i = 1$. Then, a valid pseudo-distribution would be the probability distribution over $(u_1, v_1, w_1), \ldots, (u_r, v_r, w_r)$ with probabilities

$\lambda_1, \ldots, \lambda_r$. Let $\mu$ be the pseudo-distribution computed by the algorithm. By optimality of $\mu$, we know that the objective value satisfies
$$\tilde{\mathbb{E}}_\mu\big[\|x\|^2 + \|y\|^2\|z\|^2\big] \le \sum_i \lambda_i\big(\|u_i\|^2 + \|v_i\|^2\|w_i\|^2\big) = 2.$$
Then, if we let $Y = \tilde{\mathbb{E}}_\mu\big[(x,\, y\otimes z)(x,\, y\otimes z)^\top\big]$,
$$0 \le \langle M, Y\rangle = \tilde{\mathbb{E}}_\mu\big[\|x\|^2 + \|y\|^2\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle\big] \le 2 - 2\,\tilde{\mathbb{E}}_\mu\langle x,\, T(y\otimes z)\rangle = 2 - 2\sum_i \lambda_i \langle u_i,\, T(v_i \otimes w_i)\rangle = 0.$$
The first step uses that $M$ and $Y$ are psd. The second step uses that $M$ represents the polynomial $\|x\|^2 + \|y\|^2\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle$. The third step uses that $\mu$ minimizes the objective function. The fourth step uses that the entries of $T$ are 0 outside of Ω and that $\mu$ matches the observations, $\tilde{\mathbb{E}}_\mu[x \otimes y \otimes z]_\Omega = X_\Omega$. The last step uses that $u_i = T(v_i \otimes w_i)$ for all $i \in [r]$.

We conclude that $\langle M, Y\rangle = 0$, which means that the range of $Y$ is contained in the kernel of $M$. Therefore, $Y = \sum_{i,j=1}^r \gamma_{i,j}\,(u_i,\, v_i \otimes w_i)(u_j,\, v_j \otimes w_j)^\top$ for scalars $\{\gamma_{i,j}\}$. We claim that the multipliers must satisfy $\gamma_{i,i} = \lambda_i$ and $\gamma_{i,j} = 0$ for all $i \neq j \in [r]$. Indeed, since $\mu$ matches the observations in Ω,
$$0 = \sum_{i,j=1}^r \big(\lambda_i \delta_{ij} - \gamma_{i,j}\big)\,\big(u_i \otimes v_j \otimes w_j\big)_\Omega.$$
Since the vectors $(u_i \otimes v_j \otimes w_j)_\Omega$ are linearly independent, we conclude that $\gamma_{i,j} = \lambda_i \delta_{ij}$ as desired. (This linear independence was one of the requirements of the certificate in Definition 3.)

4.5. Degree-4 certificates exist with high probability

In this section we show that our certificate $T$ in fact satisfies the conditions for a degree-4 certificate, proving Theorem 5. We use the same construction as in Section 4.3. The main remaining technical challenge for Theorem 5 is to show that the construction satisfies the spectral norm condition of Definition 3. This spectral norm bound follows from the following theorem, which we give a proof sketch for in Appendix B.

Theorem 18 Let $A = (\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$ or $P'(\bar R_{\Omega_l} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$, and let $B = (\bar R_{\Omega_{l'}} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$ or $P'(\bar R_{\Omega_{l'}} P) \cdots (\bar R_{\Omega_1} P)\bar R_{\Omega_0}[X]$. There is an absolute constant $C$ such that for any $\alpha > 1$ and $\beta > 0$,
$$\Pr\Big[\Big\|\sum_a A_a B_a^\top\Big\| > \alpha^{-(l+l'+2)}\Big] < n^{-\beta}$$
as long as $m > C\alpha\beta^3 \sqrt{r}\, n^{1.5}\log n$ and $m > C\alpha\beta^2\, r n \log n$.
Remark 19 If it were true in general that
$$\Big\|\sum_a A_a B_a^\top\Big\| \le \Big\|\sum_a A_a A_a^\top\Big\| \cdot \Big\|\sum_a B_a B_a^\top\Big\|,$$
then it would be sufficient to use Theorem 14 and we would not need to prove Theorem 18. Unfortunately, this is not true in general.
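A minimal concrete counterexample (our illustration, not from the paper): a single small slice already violates the hoped-for inequality, because the left-hand side scales linearly under rescaling of the slices while the right-hand side scales quadratically.

```python
import numpy as np

# One slice pair suffices: A_1 = B_1 = 0.01 * Id.
A1 = B1 = 0.01 * np.eye(2)

lhs = np.linalg.norm(A1 @ B1.T, ord=2)        # ||sum_a A_a B_a^T|| ≈ 1e-4
rhs = (np.linalg.norm(A1 @ A1.T, ord=2)
       * np.linalg.norm(B1 @ B1.T, ord=2))    # ≈ 1e-4 * 1e-4 = 1e-8

# The hoped-for inequality lhs <= rhs fails badly here.
```

By contrast, the square-root version $\|\sum_a A_a B_a^\top\| \le \|\sum_a A_a A_a^\top\|^{1/2}\,\|\sum_a B_a B_a^\top\|^{1/2}$ is scale-consistent; the difficulty addressed by Theorem 18 is precisely that the product (without square roots) is the bound the geometric decay argument needs.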

That said, it may be possible to show that even if we do not know directly that $\|\sum_a A_a B_a^\top\|$ is small, since $\|\sum_a A_a A_a^\top\|$ and $\|\sum_a B_a B_a^\top\|$ are both small there must be some alternative matrix representation of $\sum_a A_a B_a^\top$ which has small norm, and this is sufficient. We leave it as an open problem whether this can be done.

We now have all ingredients to prove Theorem 5.

Proof [Proof of Theorem 5] Let $k = (\log n)^C$ for some absolute constant $C \ge 1$. Let $E = (-1)^k P(P \bar R_{\Omega_k}) \cdots (P \bar R_{\Omega_1})[X]$. By Lemma 13 there exists $Y$ with $(Y)_\Omega = Y$ and $P[Y] = E$ such that $\|Y\|_F \le O(1)\cdot\|E\|_F$. We let $T = T^{(k)} + Y$. This tensor satisfies the desired linear constraints $(T)_\Omega = T$ and $P[T] = X$. Since $E$ has the form of the matrices in Theorem 18, the bound in Theorem 18 implies $\|E\|_F \le 2^{-k}\, n^{10} \le n^{-C+10}$. (Here, we use that the norm in the conclusion of Theorem 18 is within a factor of $n^{10}$ of the Frobenius norm.)

We are to prove that the following matrix has spectral norm bounded by 0.01,
$$\sum_{a=1}^n (T)_a (T)_a^\top - \sum_{a=1}^n X_a X_a^\top.$$
We expand the sum according to the definition of $T^{(k)}$ in Eq. (4.21). Then, most terms that appear in the expansion have the form as in Theorem 18. Since those terms decrease geometrically, we can bound their total contribution by a small constant with probability $1 - n^{-\omega(1)}$. The terms that involve the error correction $Y$ are smaller, because $Y$ has polynomially small norm $\|Y\|_F \le n^{-C+10}$. The only remaining terms are cross terms between $X$ and a tensor of the form as in Theorem 18. We can bound the total contribution of these terms by a small constant as well, using Theorem 84.

5. Matrix norm bound techniques

In this section, we describe the techniques that we will use to prove probabilistic norm bounds on matrices of the form $Y = \sum_a (\bar R_\Omega A)_a (\bar R_\Omega A)_a^\top$. We will prove these norm bounds using the trace moment method, which obtains probabilistic bounds on the norm of a matrix $Y$ from bounds on the expected value of $\operatorname{tr}\big((YY^\top)^q\big)$ for sufficiently large $q$.
This will require analyzing $\operatorname{tr}\big((YY^\top)^q\big)$, which will take the form of a sum of products, where the terms in the product are either entries of $A$ or terms of the form $\bar R_\Omega(a,b,c)$, where $\bar R_\Omega(a,b,c) = \frac{n^3}{m} - 1$ if $(a,b,c) \in \Omega$ and $-1$ otherwise. To analyze $\operatorname{tr}\big((YY^\top)^q\big)$, we will group products together which have the same expected behavior on the $\bar R_\Omega(a,b,c)$ terms, forming smaller sums of products. For each of these sums, we can then use the same bound on the expected behavior of the $\bar R_\Omega(a,b,c)$ terms for each product in the sum. This allows us to move this bound outside of the sum, leaving us with a sum of products of entries of $A$. We will then bound the value of these sums by carefully choosing the order in which we sum over the indices.

In the remainder of this section and in the next two sections, we allow for our tensors to have asymmetric dimensions. We account for this with the following definitions.

Definition 20 We define $n_1$ to be the dimension of the $u$ vectors, $n_2$ to be the dimension of the $v$ vectors, and $n_3$ to be the dimension of the $w$ vectors. We define $n_{\max} = \max\{n_1, n_2, n_3\}$.

5.1. The trace moment method

We use the trace moment method through the following proposition and corollary.

Proposition 21 For any random matrix $Y$, for any integer $q \ge 1$ and any $\varepsilon > 0$,
$$\Pr\bigg[\|Y\| > \Big(\tfrac{1}{\varepsilon}\,\mathbb{E}\operatorname{tr}\big((YY^\top)^q\big)\Big)^{1/2q}\bigg] < \varepsilon.$$

Proof By Markov's inequality, for all integers $q \ge 1$ and all $\varepsilon > 0$,
$$\Pr\Big[\operatorname{tr}\big((YY^\top)^q\big) > \tfrac{1}{\varepsilon}\,\mathbb{E}\operatorname{tr}\big((YY^\top)^q\big)\Big] < \varepsilon.$$
The result now follows from the observation that if $\|Y\| > \big(\tfrac{1}{\varepsilon}\mathbb{E}[\operatorname{tr}((YY^\top)^q)]\big)^{1/2q}$ then $\operatorname{tr}\big((YY^\top)^q\big) \ge \|Y\|^{2q} > \tfrac{1}{\varepsilon}\mathbb{E}\big[\operatorname{tr}((YY^\top)^q)\big]$.

Corollary 22 For a given $p \ge 1$, $r \ge 0$, $n > 0$, and $B > 0$, for a random matrix $Y$, if $\mathbb{E}\operatorname{tr}\big((YY^\top)^q\big) \le (q^p B)^{2q}\, n^r$ for all integers $q > 1$, then for all $\beta > 0$,
$$\Pr\bigg[\|Y\| > B e^p \Big(\frac{(r+\beta)}{2p}\ln n + 1\Big)^p\bigg] < n^{-\beta}.$$

Proof We take $\varepsilon = n^{-\beta}$ and we choose $q$ to minimize
$$\Big(\tfrac{1}{\varepsilon}\,\mathbb{E}\operatorname{tr}\big((YY^\top)^q\big)\Big)^{1/2q} \le q^p B\, n^{\frac{r+\beta}{2q}}.$$
Setting the derivative of this expression to 0, we obtain that $\big(\tfrac{p}{q} - \tfrac{(r+\beta)\ln n}{2q^2}\big)\, q^p B\, n^{\frac{r+\beta}{2q}} = 0$, so we want $q = \tfrac{r+\beta}{2p}\ln n$. However, $q$ must be an integer, so we instead take $q = \big\lceil \tfrac{r+\beta}{2p}\ln n \big\rceil$. With this $q$, we have that
$$q^p B\, n^{\frac{r+\beta}{2q}} \le B\Big(\frac{r+\beta}{2p}\ln n + 1\Big)^p e^p.$$
Applying Proposition 21 with this $q$, we obtain that
$$\Pr\bigg[\|Y\| > B e^p\Big(\frac{(r+\beta)}{2p}\ln n + 1\Big)^p\bigg] \le \Pr\bigg[\|Y\| > \Big(\tfrac{1}{\varepsilon}\,\mathbb{E}\operatorname{tr}\big((YY^\top)^q\big)\Big)^{1/2q}\bigg] < n^{-\beta}.$$

5.2. Partitioning by intersection pattern

As discussed at the beginning of the section, $\mathbb{E}\operatorname{tr}\big((YY^\top)^q\big)$ will be a sum of products, where part of these products will be of the form $\prod_{i=1}^{2q'} \bar R_\Omega(a_i, b_i, c_i)$. Here, $q'$ may or may not be equal to $q$; in fact we will often have $q' = 2q$ because each $Y$ will contribute two terms of the form $\bar R_\Omega(a,b,c)$ to the product. To handle this part of the product, we partition the terms of our sum based on the intersection pattern of which triples $(a_i, b_i, c_i)$ are equal to each other. Fixing an intersection pattern determines the expected value of $\prod_{i=1}^{2q'} \bar R_\Omega(a_i, b_i, c_i)$.
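The deterministic relationship behind Proposition 21 — the quantity $\operatorname{tr}\big((YY^\top)^q\big)^{1/2q}$ always upper-bounds $\|Y\|$, and exceeds it by at most the dimension factor $n^{1/2q}$ — can be checked directly. The following sketch (an illustration with an arbitrary random matrix, evaluating the trace power via the eigenvalues of $YY^\top$) confirms both sides:

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 50, 6
Y = rng.normal(size=(n, n))

spec_norm = np.linalg.norm(Y, ord=2)

# tr((Y Y^T)^q) = sum_i lambda_i^q over the eigenvalues lambda_i of Y Y^T.
lam = np.linalg.eigvalsh(Y @ Y.T)
trace_moment = np.sum(lam ** q) ** (1.0 / (2 * q))

# Since ||Y||^{2q} <= tr((YY^T)^q) <= n ||Y||^{2q}, we have
#   ||Y|| <= trace_moment <= n^{1/2q} ||Y||.
```

As $q$ grows, the factor $n^{1/2q}$ tends to 1, which is why high trace moments pin down the spectral norm; Corollary 22 simply balances this gain against the growth of the moment bound $(q^p B)^{2q} n^r$.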

Definition 23 We define an intersection pattern to be a set of equalities and inequalities satisfying the following conditions:
1. All of the equalities and inequalities are of the form $(a_{i_1}, b_{i_1}, c_{i_1}) = (a_{i_2}, b_{i_2}, c_{i_2})$ or $(a_{i_1}, b_{i_1}, c_{i_1}) \neq (a_{i_2}, b_{i_2}, c_{i_2})$, respectively.
2. For every $i_1, i_2$, either $(a_{i_1}, b_{i_1}, c_{i_1}) = (a_{i_2}, b_{i_2}, c_{i_2})$ is in the intersection pattern or $(a_{i_1}, b_{i_1}, c_{i_1}) \neq (a_{i_2}, b_{i_2}, c_{i_2})$ is in the intersection pattern.
3. All of the equalities and inequalities are consistent with each other, i.e. there exist values of $(a_1, b_1, c_1), \ldots, (a_{2q'}, b_{2q'}, c_{2q'})$ satisfying all of the equalities and inequalities in the intersection pattern.

Proposition 24 For a given $(a,b,c)$,
1. $\mathbb{E}\,\bar R_\Omega(a,b,c) = 0$.
2. For all $k > 1$, $\mathbb{E}\big[\bar R_\Omega(a,b,c)^k\big] \le \big(\tfrac{n_1 n_2 n_3}{m}\big)^{k-1}$.

Corollary 25 For a given intersection pattern, if there is any triple $(a,b,c)$ which appears exactly once, then $\mathbb{E}\big[\prod_{i=1}^{2q'} \bar R_\Omega(a_i, b_i, c_i)\big] = 0$. Otherwise, letting $z$ be the number of distinct triples,
$$\mathbb{E}\Big[\prod_{i=1}^{2q'} \bar R_\Omega(a_i, b_i, c_i)\Big] \le \Big(\frac{n_1 n_2 n_3}{m}\Big)^{2q' - z}.$$

Proof For a given intersection pattern, let $(a_{i_1}, b_{i_1}, c_{i_1}), \ldots, (a_{i_z}, b_{i_z}, c_{i_z})$ be the distinct triples and let $c_j$ be the number of times the triple $(a_{i_j}, b_{i_j}, c_{i_j})$ appears. We have that
$$\mathbb{E}\prod_{i=1}^{2q'} \bar R_\Omega(a_i, b_i, c_i) = \prod_{j=1}^z \mathbb{E}\big[\bar R_\Omega(a_{i_j}, b_{i_j}, c_{i_j})^{c_j}\big].$$
If $c_j = 1$ for any $j$ then this expression is 0. Otherwise,
$$\prod_{j=1}^z \mathbb{E}\big[\bar R_\Omega(a_{i_j}, b_{i_j}, c_{i_j})^{c_j}\big] \le \prod_{j=1}^z \Big(\frac{n_1 n_2 n_3}{m}\Big)^{c_j - 1} = \Big(\frac{n_1 n_2 n_3}{m}\Big)^{\sum_{j=1}^z c_j - z} = \Big(\frac{n_1 n_2 n_3}{m}\Big)^{2q' - z}.$$

5.3. Bounding sums of products of tensor entries

In this subsection, we describe how to bound the sum of products of tensor entries we obtain for a given intersection pattern after moving our bound on the expected value of the $\bar R_\Omega(a,b,c)$ terms outside the sum. We represent such a product with a hypergraph as follows.
Definition 26 Given a set of distinct indices and a set of tensor entries on those indices, let $H$ be the hypergraph with one vertex for each distinct index and one hyperedge for each tensor entry, where the hyperedge consists of all indices contained in the tensor entry. If the tensor entry appears to the $p$th power, we take this hyperedge with multiplicity $p$.
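The moment bounds in Proposition 24 are elementary to verify exactly, since $\bar R_\Omega(a,b,c)$ takes only two values: $\tfrac{N}{m} - 1$ with probability $\tfrac{m}{N}$ and $-1$ otherwise, where $N = n_1 n_2 n_3$. The following sketch (with toy parameter values of our choosing) computes the moments over this two-point distribution and checks them against the stated bounds:

```python
n1 = n2 = n3 = 10        # toy dimensions (here equal)
N = n1 * n2 * n3         # total number of tensor entries
m = 50                   # expected number of observations

p = m / N                # inclusion probability of a single entry

def moment(k):
    """E[Rbar^k] over the two-point distribution of Rbar_Omega(a, b, c)."""
    return p * (N / m - 1) ** k + (1 - p) * (-1) ** k

# Proposition 24: mean zero, and k-th moments bounded by (N/m)^(k-1).
mean = moment(1)
bounds_ok = all(moment(k) <= (N / m) ** (k - 1) for k in range(2, 8))
```

The mean-zero property is what makes single-occurrence triples vanish in Corollary 25, and the $(N/m)^{k-1}$ moment growth is what each repeated triple contributes after the factorization over distinct triples.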


More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Support Vector Machines. Goals for the lecture

Support Vector Machines. Goals for the lecture Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed

More information

Reed-Muller codes for random erasures and errors

Reed-Muller codes for random erasures and errors Reed-Muller codes for rando erasures and errors Eanuel Abbe Air Shpilka Avi Wigderson Abstract This paper studies the paraeters for which Reed-Muller (RM) codes over GF (2) can correct rando erasures and

More information

ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE

ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE CHRISTOPHER J. HILLAR Abstract. A long-standing conjecture asserts that the polynoial p(t = Tr(A + tb ] has nonnegative coefficients whenever is

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements 1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer

More information

The Hilbert Schmidt version of the commutator theorem for zero trace matrices

The Hilbert Schmidt version of the commutator theorem for zero trace matrices The Hilbert Schidt version of the coutator theore for zero trace atrices Oer Angel Gideon Schechtan March 205 Abstract Let A be a coplex atrix with zero trace. Then there are atrices B and C such that

More information

M ath. Res. Lett. 15 (2008), no. 2, c International Press 2008 SUM-PRODUCT ESTIMATES VIA DIRECTED EXPANDERS. Van H. Vu. 1.

M ath. Res. Lett. 15 (2008), no. 2, c International Press 2008 SUM-PRODUCT ESTIMATES VIA DIRECTED EXPANDERS. Van H. Vu. 1. M ath. Res. Lett. 15 (2008), no. 2, 375 388 c International Press 2008 SUM-PRODUCT ESTIMATES VIA DIRECTED EXPANDERS Van H. Vu Abstract. Let F q be a finite field of order q and P be a polynoial in F q[x

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

G G G G G. Spec k G. G Spec k G G. G G m G. G Spec k. Spec k

G G G G G. Spec k G. G Spec k G G. G G m G. G Spec k. Spec k 12 VICTORIA HOSKINS 3. Algebraic group actions and quotients In this section we consider group actions on algebraic varieties and also describe what type of quotients we would like to have for such group

More information

Learnability and Stability in the General Learning Setting

Learnability and Stability in the General Learning Setting Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu

More information

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Tail estimates for norms of sums of log-concave random vectors

Tail estimates for norms of sums of log-concave random vectors Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

a a a a a a a m a b a b

a a a a a a a m a b a b Algebra / Trig Final Exa Study Guide (Fall Seester) Moncada/Dunphy Inforation About the Final Exa The final exa is cuulative, covering Appendix A (A.1-A.5) and Chapter 1. All probles will be ultiple choice

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

1 Rademacher Complexity Bounds

1 Rademacher Complexity Bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Max Goer March 07, 2013 1 Radeacher Coplexity Bounds Recall the following theore fro last lecture: Theore 1. With probability

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

arxiv: v1 [math.na] 10 Oct 2016

arxiv: v1 [math.na] 10 Oct 2016 GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization

Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization Robust Principal Coponent Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optiization John Wright, Arvind Ganesh, Shankar Rao, and Yi Ma Departent of Electrical Engineering University

More information

A1. Find all ordered pairs (a, b) of positive integers for which 1 a + 1 b = 3

A1. Find all ordered pairs (a, b) of positive integers for which 1 a + 1 b = 3 A. Find all ordered pairs a, b) of positive integers for which a + b = 3 08. Answer. The six ordered pairs are 009, 08), 08, 009), 009 337, 674) = 35043, 674), 009 346, 673) = 3584, 673), 674, 009 337)

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

A new type of lower bound for the largest eigenvalue of a symmetric matrix

A new type of lower bound for the largest eigenvalue of a symmetric matrix Linear Algebra and its Applications 47 7 9 9 www.elsevier.co/locate/laa A new type of lower bound for the largest eigenvalue of a syetric atrix Piet Van Mieghe Delft University of Technology, P.O. Box

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Reed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product.

Reed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product. Coding Theory Massoud Malek Reed-Muller Codes An iportant class of linear block codes rich in algebraic and geoetric structure is the class of Reed-Muller codes, which includes the Extended Haing code.

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison yströ Method vs : A Theoretical and Epirical Coparison Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou Machine Learning Lab, GE Global Research, San Raon, CA 94583 Michigan State University,

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

time time δ jobs jobs

time time δ jobs jobs Approxiating Total Flow Tie on Parallel Machines Stefano Leonardi Danny Raz y Abstract We consider the proble of optiizing the total ow tie of a strea of jobs that are released over tie in a ultiprocessor

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization Structured signal recovery fro quadratic easureents: Breaking saple coplexity barriers via nonconvex optiization Mahdi Soltanolkotabi Ming Hsieh Departent of Electrical Engineering University of Southern

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information