On Tensor Completion via Nuclear Norm Minimization

Size: px
Start display at page:

Download "On Tensor Completion via Nuclear Norm Minimization"

Transcription

1 On Tensor Completion via Nuclear Norm Minimization Ming Yuan and Cun-Hui Zhang University of Wisconsin-Madison and Rutgers University April 21, 2015) Abstract Many problems can be formulated as recovering a low-rank tensor. Although an increasingly common task, tensor recovery remains a challenging problem because of the delicacy associated with the decomposition of higher order tensors. To overcome these difficulties, existing approaches often proceed by unfolding tensors into matrices and then apply techniques for matrix completion. We show here that such matricization fails to exploit the tensor structure and may lead to suboptimal procedure. More specifically, we investigate a convex optimization approach to tensor completion by directly minimizing a tensor nuclear norm and prove that this leads to an improved sample size requirement. To establish our results, we develop a series of algebraic and probabilistic techniques such as characterization of subdifferetial for tensor nuclear norm and concentration inequalities for tensor martingales, which may be of independent interests and could be useful in other tensor related problems. Department of Statistics, University of Wisconsin-Madison, Madison, WI The research of Ming Yuan was supported in part by NSF Career Award DMS and FRG Grant DMS , and NIH Grant 1U54AI Department of Statistics and Biostatistics, Rutgers University, Piscataway, New Jersey The research of Cun-Hui Zhang was supported in part by NSF Grants DMS and DMS

2 1 Introduction Let T R d 1 d 2 d N be an Nth order tensor, and Ω be a randomly sampled subset of [d 1 ] [d N ] where [d] = {1, 2,..., d. The goal of tensor completion is to recover T when observing only entries T ω) for ω Ω. In particular, we are interested in the case when the dimensions d 1,..., d N are large. Such a problem arises naturally in many applications. Examples include hyper-spectral image analysis Li and Li, 2010), multi-energy computed tomography Semerci et al., 2013), radar signal processing Sidiropoulos and Nion, 2010), audio classification Mesgarani, Slaney and Shamma, 2006) and text mining Cohen and Collins, 2012) among numerous others. Common to these and many other problems, the tensor T can oftentimes be identified with a certain low-rank structure. The low-rankness entails reduction in degrees of freedom, and as a result, it is possible to recover T exactly even when the sample size Ω is much smaller than the total number, d 1 d 2 d N, of entries in T. In particular, when N = 2, this becomes the so-called matrix completion problem which has received considerable amount of attention in recent years. See, e.g., Candès and Recht 2008), Candès and Tao 2009), Recht 2010), and Gross 2011) among many others. An especially attractive approach is through nuclear norm minimization: min X subject to Xω) = T ω) ω Ω, X R d 1 d 2 where the nuclear norm of a matrix is given by X = min{d 1,d 2 k=1 σ k X), and σ k ) stands for the kth largest singular value of a matrix. Denote by T the solution to the aforementioned nuclear norm minimization problem. As shown, for example, by Gross 2011), if an unknown d 1 d 2 matrix T of rank r is of low coherence with respect to the canonical basis, then it can be perfectly reconstructed by T with high probability whenever Ω Cd 1 + d 2 )r log 2 d 1 + d 2 ), where C is a numerical constant. In other words, perfect recovery of a matrix is possible with observations from a very small fraction of entries in T. In many practical situations, we need to consider higher order tensors. The seemingly innocent task of generalizing these ideas from matrices to higher order tensor completion problems, however, turns out to be rather subtle, as basic notion such as rank, or singular value decomposition, becomes ambiguous for higher order tensors e.g., Kolda and Bader, 2009; Hillar and Lim, 2013). A common strategy to overcome the challenges in dealing with high order tensors is to unfold them to matrices, and then resort to the usual nuclear 2

3 norm minimization heuristics for matrices. To fix ideas, we shall focus on third order tensors N = 3) in the rest of the paper although our techniques can be readily used to treat higher order tensor. Following the matricization approach, T can be reconstructed by the solution of the following convex program: min { X 1) + X 2) + X 3) subject to Xω) = T ω) ω Ω, X R d 1 d 2 d 3 where X j) is a d j k j d k) matrix whose columns are the mode-j fibers of X. e.g., Liu et al. 2009), Signoretto, Lathauwer and Suykens 2010), Gandy et al. 2011), Tomioka, Hayashi and Kashima 2010), and Tomioka et al. 2011). In the light of existing results on matrix completion, with this approach, T can be reconstructed perfectly with high probability provided that Ω Cd 1 d 2 r 3 + d 1 r 2 d 3 + r 1 d 2 d 3 ) log 2 d 1 + d 2 + d 3 ) uniformly sampled entries are observed, where r j is the rank of X j) and C is a numerical constant. See, e.g., Mu et al. 2013). It is of great interests to investigate if this sample size requirement can be improved by avoiding matricization of tensors. We show here that the answer indeed is affirmative and a more direct nuclear norm minimization formulation requires a smaller sample size to recover T. More specifically, write, for two tensors X, Y R d 1 d 2 d 3, as their inner product. Define X = X, Y = ω [d 1 ] [d 2 ] [d 3 ] Xω)Y ω) max X, u 1 u 2 u 3, u j R dj : u 1 = u 2 = u 3 =1 where, with slight abuse of notation, also stands for the usual Euclidean norm for a vector, and for vectors u j = u j 1,..., u j d j ), u 1 u 2 u 3 = u 1 au 2 bu 3 c) 1ad1,1bd 2,1cd 3. It is clear that the defined above for tensors is a norm and can be viewed as an extension of the usual matrix spectral norm. Appealing to the duality between the spectral norm and nuclear norm in the matrix case, we now consider the following nuclear norm for tensors: X = max Y, X. Y R d 1 d 2 d 3 : Y 1 See, 3

4 It is clear that is also a norm. We then consider reconstructing T via the solution to the following convex program: min X subject to Xω) = T ω) ω Ω. X R d 1 d 2 d 3 We show that the sample size requirement for perfect recovery of a tensor with low coherence using this approach is Ω C r 2 d 1 + d 2 + d 3 ) + rd 1 d 2 d 3 ) polylog d 1 + d 2 + d 3 ), where r = r 1 r 2 d 3 + r 1 r 3 d 2 + r 2 r 3 d 1 )/d 1 + d 2 + d 3 ), polylogx) is a certain polynomial function of logx), and C is a numerical constant. In particular, when considering nearly) cubic tensors with d 1, d 2 and d 3 approximately equal to a common d, then this sample size requirement is essentially of the order r 1/2 d log d) 3/2. In the case when the tensor dimension d is large while the rank r is relatively small, this can be a drastic improvement over the existing results based on matricizing tensors where the sample size requirement is rd log d) 2. The high-level strategy to the investigation of the proposed nuclear norm minimization approach for tensors is similar, in a sense, to the treatment of matrix completion. Yet the analysis for tensors is much more delicate and poses significant new challenges because many of the well-established tools for matrices, either algebraic such as characterization of the subdifferential of the nuclear norm, or probabilistic such as concentration inequalities for martingales, do not exist for tensors. Some of these disparities can be bridged and we develop various tools to do so. Others are due to fundamental differences between matrices and higher order tensors, and we devise new strategies to overcome them. The tools and techniques we developed may be of independent interests and can be useful in dealing with other problems for tensors. The rest of the paper is organized as follows. We first describe some basic properties of tensors and their nuclear norm necessary for our analysis in Section 2. Section 3 discusses the main architect of our analysis. The main probabilistic tools we use are concentration bounds for the sum of random tensors. Because the tensor spectral norm does not have the interpretation as an operator norm of a linear mapping between Hilbert spaces, the usual matrix Bernstein inequality cannot be directly applied. It turns out that different strategies are required for tensors of low rank and tensors with sparse support, and these results are presented in Sections 4 and 5 respectively. We conclude the paper with a few remarks in Section 6. 4

5 2 Tensor We first collect some useful algebraic facts for tensors essential to our later analysis. Recall that the inner product between two third order tensors X, Y R d 1 d 2 d 3 d 1 d 2 d 3 X, Y = Xa, b, c)y a, b, c), a=1 b=1 c=1 is given by and X HS = X, X 1/2 is the usual Hilbert-Schmidt norm of X. Another tensor norm of interest is the entrywise l norm, or tensor max norm: X max = max Xω). ω [d 1 ] [d 2 ] [d 3 ] It is clear that for any the third order tensor X R d 1 d 2 d 3, X max X X HS X, and X 2 HS X X. We shall also encounter linear maps defined on tensors. Let R : R d 1 d 2 d 3 R d 1 d 2 d 3 be a linear map. We define the induced operator norm of R under tensor Hilbert-Schmidt norm as { R = max RX HS : X R d 1 d 2 d 3, X HS Decomposition and Projection Consider the following tensor decomposition of X into rank-one tensors: X = [A, B, C] := r a k b k c k, 1) k=1 where a k s, b k s and c k s are the column vectors of matrices A, B and C respectively. Such a decomposition in general is not unique see, e.g., Kruskal, 1989). However, the linear spaces spanned by columns of A, B and C respectively are uniquely defined. More specifically, write X, b, c) = X1, b, c),..., Xd 1, b, c)), that is the mode-1 fiber of X. Define Xa,, c) and Xa, b, ) in a similar fashion. Let L 1 X) = l.s.{x, b, c) : 1 b d 2, 1 c d 3 ; L 2 X) = l.s.{xa,, c) : 1 a d 1, 1 c d 3 ; L 3 X) = l.s.{xa, b, ) : 1 a d 1, 1 b d 2, where l.s. represents the linear space spanned by a collection of vectors of conformable dimension. Then it is clear that the linear space spanned by the column vectors of A is 5

6 L 1 X), and similar statements hold true for the column vectors of B and C. In the case of matrices, both marginal linear spaces, L 1 and L 2 are necessarily of the same dimension as they are spanned by the respective singular vectors. For higher order tensors, however, this is typically not true. We shall denote by r j X) the dimension of L j X) for j = 1, 2 and 3, which are often referred to the Tucker ranks of X. Another useful notion of tensor rank for our purposes is rx) = r 1 X)r 2 X)d 3 + r 1 X)r 3 X)d 2 + r 2 X)r 3 X)d 1 ) /d. where d = d 1 + d 2 + d 3, which can also be viewed as a generalization of the matrix rank to tensors. It is well known that the smallest value for r in the rank-one decomposition 1) is in [rx), r 2 X)]. See, e.g., Kolda and Bader 2009). Let M be a matrix of size d 0 d 1. Marginal multiplication of M and a tensor X in the first coordinate yields a tensor of size d 0 d 2 d 3 : d 1 M 1 X)a, b, c) = M aa Xa, b, c). It is easy to see that if X = [A, B, C], then M 1 X = [MA, B, C]. Marginal multiplications 2 and 3 between a matrix of conformable size and X can be similarly defined. Let P be arbitrary projection from R d 1 to a linear subspace of R d 1. It is clear from the definition of marginal multiplications, [P A, B, C] is also uniquely defined for tensor X = [A, B, C], that is, [P A, B, C] does not depend on the particular decomposition of A, B, C. Now let P j be arbitrary projection from R d j to a linear subspace of R d j. Define a tensor projection P 1 P 2 P 3 on X = [A, B, C] as P 1 P 2 P 3 )X = [P 1 A, P 2 B, P 3 C]. We note that there is no ambiguity in defining P 1 P 2 P 3 )X because of the uniqueness of marginal projections. Recall that L j X) is the linear space spanned by the mode-j fibers of X. Let P j X be the projection from R d j to L j X), and P j be the projection to its orthogonal complement X in R d j. The following tensor projections will be used extensively in our analysis: Q 0 X = P 1 X P 2 X P 3 X, Q 0 = P 1 P 2 P 3, X X X X Q 1 X = P 1 P 2 X X P 3 X, Q 1 = P 1 X X P 2 P 3, X X Q 2 X = P 1 X P 2 P 3 X X, Q 2 = P 1 P 2 X X X P 3, X Q 3 X = P 1 X P 2 X P 3, X Q 3 = P 1 P 2 P 3 X X X X, Q X = Q 0 X + Q 1 X + Q 2 X + Q 3 X, Q X = Q 0 + Q 1 + Q 2 + Q 3. X X X X a =1 6

7 2.2 Subdifferential of Tensor Nuclear Norm One of the main technical tools in analyzing the nuclear norm minimization is the characterization of the subdifferntial of the nuclear norm. Such results are well known in the case of matrices. In particular, let M = UDV be the singular value decomposition of a matrix M, then the subdifferential of the nuclear norm at M is given by M) = {UV + W : U W = W V = 0, and W 1, where with slight abuse of notion, and are the nuclear and spectral norms of matrices. See, e.g., Watson 1992). In other words, for any other matrix Y of conformable dimensions, Y M + UV + W, Y M if and only if U W = W V = 0 and W 1. Characterizing the subdifferential of the nuclear norm for higher order tensors is more subtle due to the lack of corresponding spectral decomposition. A straightforward generalization of the above characterization may suggest that X) be identified with {W + W : W = Q X W, W 1, for some W in the range of Q 0 X. It turns out that this in general is not true. As a simple counterexample, let and Y = 1i,j,k2 X = e 1 e 1 e 1, e i e j e k = e 1 + e 2 ) e 1 + e 2 ) e 1 + e 2 ), where d 1 = d 2 = d 3 = 2 and e i s are the canonical basis of an Euclidean space. It is clear that X = 1 and Y = 2 2. Take W = U/ U where U = e 1 e 2 e 2 + e 2 e 1 e 2 + e 2 e 2 e 1. 2) As we shall show in the proof of Lemma 1 below, U = 2/ 3. It is clear that W = Q X W and W 1. Yet, X + Y X, W + W = 1 + Y X, W = /2 > 2 2 = Y, for any W such that W = Q 0 X W. Fortunately, for our purposes, the following relaxed characterization is sufficient. 7

8 Lemma 1 For any third order tensor X R d 1 d 2 d 3, there exists a W R d 1 d 2 d 3 such that W = Q 0 XW, W = 1 and X = W, X. Furthermore, for any Y R d 1 d 2 d 3 and W R d 1 d 2 d 3 obeying W 1/2, Y X + W + Q X W, Y X. Proof of Lemma 1. If W 1, then max Q 0 XW, u 1 u 2 u 3 max W, P 1 Xu 1 ) P 2 Xu 2 ) P 3 Xu 3 ) 1. u j =1 u j =1 It follows that X = max W, X = max Q 0 XW, X W =1 W =1 is attained with a certain W satisfying W = Q 0 X W = 1. Now consider a tensor W satisfying W + Q X W 1. Because Q X X = 0, it follows from the definition of the tensor nuclear norm that W + Q X W, Y X Y W, X = Y X. It remains to prove that W 1/2 implies W + Q X W 1. Recall that W = Q 0 X W, W 1/2, and u j = 1. Then W + Q X W, u 1 u 2 u 3 Q 0 Xu 1 u 2 u 3 ) Q X u 1 u 2 u 3 ) 3 1 a 2 j + 1 ) a 1 a 2 + a 1 a 3 1 a 22 + a 2 a 3 1 a j=1 where a j = P j X u j 2, for j = 1, 2, 3. Let x = a 1 a 2 and y = 1 a 2 1)1 a 2 2). 8

9 We have ) 2 a 1 1 a 22 + a 2 1 a 2 1 = a 2 11 a 2 2) + a 2 21 a 2 1) + 2xy = a a 2 2 2a 2 1a xy = 1 y x) 2. It follows that for any value of a 3 0, 1), W + Q X W, u 1 u 2 u 3 y 1 a ) x + a 3 1 y x) 2. 2 This function of x, y) is increasing in the smaller of x and y. For x < y, the maximum of x 2 given y 2 is attained when a 1 = a 2 by simple calculation with the Lagrange multiplier. Similarly, for y < x, the maximum of y 2 given x 2 is attained when a 1 = a 2. Thus, setting a 1 = a 2 = a, we find { W + Q X W, u 1 u 2 u 3 max 1 a 2 ) a 3,a 1 a a 2 + 2a 3 a 1 a 2 ). The above maximum is attained when a 3 = a. Because 1 a 2 + a 2 /2 1, we have which completes the proof of the lemma. W + Q X W, u 1 u 2 u 3 1, The norm of U defined in 2) can be computed using a similar argument: ) U = max a 1 a 2 1 a 23 + a 1 a 3 1 a 22 + a 2 a 3 1 a 2 1 0a 1 a 2 a 3 1 ) = max x 1 a a 3 1 y x) 2 x,y,a 3 ) = max a a 1 a a 3 1 a 2, a,a 3 = max a a a 2 ), a which yields U = 2/ 3 with a 2 = 2/3. Note that Lemma 1 gives only sufficient conditions of the subgradient of tensor nuclear norm. Equivalently it states that X) { W + Q X W : W 1/2. We note also that the constant 1/2 may be further improved. No attempt has been made here to sharpen the constant as it already suffices for our analysis. 9

10 2.3 Coherence A central concept to matrix completion is coherence. dimensional linear subspace U of R k is defined as Recall that the coherence of an r µu) = k r max P Ue i 2 = max 1ik P U e i 2 1ik k 1 k P Ue i, 2 where P U is the orthogonal projection onto U and e i s are the canonical basis for R k. See, e.g., Candès and Recht 2008). We shall define the coherence of a tensor X R d 1 d 2 d 3 µx) = max{µl 1 X)), µl 2 X)), µl 3 X)). It is clear that µx) 1, since µu) is the ratio of the l and length-normalized l 2 norms of a vector. Lemma 2 Let X R d 1 d 2 d 3 be a third order tensor. Then max a,b,c Q Xe a e b e c ) 2 HS r2 X)d d 1 d 2 d 3 µ 2 X). as Proof of Lemma 2. Recall that Q X = Q 0 X + Q1 X + Q2 X + Q3 X. Therefore, Q X e a e b e c ) 2 = = 3 Q j X e a e b e c ), Q k Xe a e b e c ) j,k=0 3 Q j X e a e b e c ) 2 j=0 = P 1 Xe a 2 P 2 Xe b 2 P 3 Xe c 2 + P 1 X e a 2 P 2 Xe b 2 P 3 Xe c 2 For brevity, write r j = r j X), and µ = µx). Then + P 1 Xe a 2 P 2 X e b 2 P 3 Xe c 2 + P 1 Xe a 2 P 2 Xe b 2 P 3 X e c 2. P 1 Xe a 2 P 2 Xe b 2 P 3 Xe c 2 + P 1 X e a 2 P 2 Xe b 2 P 3 Xe c 2 r 2r 3 µ 2 d 2 d 3 ; P 1 Xe a 2 P 2 Xe b 2 P 3 Xe c 2 + P 1 Xe a 2 P 2 X e b 2 P 3 Xe c 2 r 1r 3 µ 2 d 1 d 3 ; P 1 Xe a 2 P 2 Xe b 2 P 3 Xe c 2 + P 1 Xe a 2 P 2 Xe b 2 P 3 X e c 2 r 1r 2 µ 2 d 1 d 2. As a result, for any a, b, c) [d 1 ] [d 2 ] [d 3 ], Q X e a e b e c ) 2 µ2 r 1 r 2 d 3 + r 1 d 2 r 3 + d 1 r 2 r 3 ) d 1 d 2 d 3, 10

11 which implies the desired statement. Another measure of coherence for a tensor X is αx) := d 1 d 2 d 3 /rx) W max where W is such that W = Q 0 X W, W = 1 and X, W = X as described in Lemma 1. The quantity αx) is related to µx) defined earlier and the spikiness Lemma 3 Let X R d 1 d 2 d 3 αx) = d 1 d 2 d 3 W max / W HS. be a third order tensor. Assume without loss of generality that r 1 X) r 2 X) r 3 X). Then, { α 2 X) min r 1 X)r 2 X)r 3 X)µ 3 X)/rX), r 1 X)r 2 X) α 2 X)/rX). Moreover, if X admits a bi-orthogonal eigentensor decomposition r λ iu i v i w i ) with λ i 0 and u i u j = v i v j w i w j = I{i = j for 1 i, j r, then r 1 X) = r 3 X) = rx) = r, X 1) = X, and αx) = αx) 1. Proof of Lemma 3. Due to the conditions W = Q 0 XW and W = 1, W 2 max = max Xe a ) P 2 Xe b ) P 3 Xe c ) 2 a,b,c max Xe a 2 P 2 Xe b 2 P 3 Xe c 2 a,b,c r 1 X)r 2 X)r 3 X)µ 3 X)/d 1 d 2 d 3 ), which yields the upper bound for αx) in terms of µx). Because W is in the range of Q 0 X, L jw ) L j X). Therefore, r 1 W ) r 1 X). Recall that W 1) is a d 1 d 2 d 3 ) matrix whose columns are the mode-1 fibers of W. Applying singular value decomposition to W 1) suggests that there are orthornomal vectors {u 1,..., u r1 in R d 1 and matrices M 1,..., M r1 R d 2 d 3 such that M j, M k = 0 if j k, and W = r 1 X) k=1 u k M k. It is clear that M k W = 1, and rankm k ) r 2 X). Therefore, W 2 HS r 1 X) k=1 M k 2 HS r 1 X)r 2 X). 11

12 This gives the upper bound for αx) in terms of αx). It remains to consider the case of X = r λ iu i v i w i ). Obviously, by triangular inequality, r r X u i v i w i = w i. On the other hand, let Because W max a,b,c: a, b, c 1 W = r u i v i w i / w i ). ) a u i tracecb v i w i / w i ) a bc HS 1, we find which implies that W is dual to X and X W, X = r w i, X = r w i, where the rightmost hand side also equals to X 1) and X 2). The last statement now follows from the fact that W 2 HS = r. As in the matrix case, exact recovery with observations on a small fraction of the entries is only possible for tensors with low coherence. In particular, we consider in this article the recovery of a tensor T obeying µt ) µ 0 and αt ) α 0 for some µ 0, α Exact Tensor Recovery We are now in position to study the nuclear norm minimization for tensor completion. Let T be the solution to min X subject to P Ω X = P Ω T, 3) X R d 1 d 2 d 3 where P Ω : R d 1 d 2 d 3 R d 1 d 2 d 3 such that { Xi, j, k) if i, j, k) Ω P Ω X)i, j, k) = 0 otherwise. 12

13 Assume that Ω is a uniformly sampled subset of [d 1 ] [d 2 ] [d 3 ]. The goal is to determine what the necessary sample size is for successful reconstruction of T using T with high probability. In particular, we show that that with high probability, exact recovery can be achieved with nuclear norm minimization 3) if ) Ω α 0 rd1 d 2 d 3 + µ 2 0r 2 d polylogd), where d = d 1 + d 2 + d 3. More specifically, we have Theorem 1 Assume that µt ) µ 0, αt ) α 0, and rt ) = r. Let Ω be a uniformly sampled subset of [d 1 ] [d 2 ] [d 3 ] and T be the solution to 3). For β > 0, define q 1 = β + log d ) 2 α 2 0 r log d, q 2 = 1 + β)log d)µ 2 0r 2. Let n = Ω. Suppose that for a sufficiently large numerical constant c 0, [ ] n c 0 δ2 1 q11 + β)δ1 1 d 1 d 2 d 3 + q1d 1+δ 1 + q2d 1+δ 2 4) with certain {δ 1, δ 2 [1/ log d, 1/2] and β > 0. Then, P { T T d β. 5) In particular, for δ 1 = δ 2 = log d) 1, 4) can be written as [ n C µ0,α 0,β log d) 3 rd 1 d 2 d 3 + { rlog d) 3 + r 2 log d) ] d with a constant C µ0,α 0,β depending on {µ 0, α 0, β only. For d 1 d 2 d 3 and fixed {α 0, µ 0, δ 1, δ 2, β, the sample size requirement 4) becomes n rd log d) 3/2, provided max{rlog d) 3 d 2δ 1, r 3 d 2δ 2 /log d) = Od). The high level idea of our strategy is similar to the matrix case exact recovery of T is implied by the existence of a dual certificate G supported on Ω, that is P Ω G = G, such that Q T G = W and Q T G < 1/2. See, e.g., Gross 2011) and Recht 2011). 3.1 Recovery with a Dual Certificate Write T = T +. Then, P Ω = 0 and T + T. 13

14 Recall that, by Lemma 1, there exists a W obeying W = Q 0 T W and W = 1 such that T + T + W + Q T W, for any W obeying W 1/2. Assume that a tensor G supported on Ω, that is P Ω G = G, such that Q T G = W, and Q T G < 1/2. When Q T > 0, W + Q T W, = W + Q T W G, Take W = U/2 where We find that Q T > 0 implies = W Q T G, + W, Q T Q T G, Q T > W, Q T 1 2 Q T U = arg max X, Q T. X: X 1 T + T W + Q T W, > 0, which contradicts with fact that T minimizes the nuclear norm. Thus, Q T = 0, which then implies Q T Q Ω Q T = Q T Q Ω = 0. When Q T Q Ω Q T is invertible in the range of Q T, we also have Q T = 0 and T = T. With this in mind, it then suffices to seek such a dual certificate. In fact, it turns out that finding an approximate dual certificate is actually enough for our purposes. Lemma 4 Assume that { n inf P Ω Q T X HS : Q T X HS = 1. 6) 2d 1 d 2 d 3 If there exists a tensor G supported on Ω such that Q T G W HS < 1 n and max G, Q 4 2d 1 d 2 d T X 1/4, 7) 3 Q T X =1 then T = T. Proof of Lemma 4. Write T = T +, then P Ω = 0 and T + T. Recall that, by Lemma 1, there exists a W obeying W = Q 0 T W and W = 1 such that for any W obeying W 1/2, T + T + W + Q T W,. 14

15 Since G, = P Ω G, = G, P Ω = 0 and Q T W = W, 0 W + Q T W, = W + Q T W G, = Q T W Q T G, + W, Q T G, Q T W Q T G HS Q T HS + W, Q T 1 4 Q T. In particular, taking W satisfying W = 1/2 and W, Q T = Q T /2, we find 1 4 Q T W Q T G HS Q T HS. Recall that P Ω = P Ω Q T + P Ω Q T = 0. Thus, in view of the condition on P Ω, Q T HS 2d1 d 2 d 3 /n P ΩQ T HS = P Ω Q T HS Q T HS Q T. 8) Consequently, Since 1 4 Q T 2d 1 d 2 d 3 /n W Q T G HS Q T. 2d1 d 2 d 3 /n W Q T G HS < 1/4, we have Q T = 0. Together with 8), we conclude that = 0, or equivalently T = T. Equation 6) indicates the invertibility of P Ω when restricted to the range of Q T. We argue first that this is true for incoherent tensors. To this end, we prove that ) Q T d 1 d 2 d 3 /n)p Ω I Q T 1/2 with high probability. This implies that as an operator in the range of Q T, the spectral norm of d 1 d 2 d 3 /n)q T P Ω Q T is contained in [1/2, 3/2]. Consequently, 6) holds because for any X R d 1 d 2 d 3, d 1 d 2 d 3 /n) P Ω Q T X 2 HS = Recall that d = d 1 + d 2 + d 3. We have Q T X, d 1 d 2 d 3 /n)q T P Ω Q T X 1 2 Q T X 2 HS. Lemma 5 Assume µt ) µ 0, rt ) = r, and Ω is uniformly sampled from [d 1 ] [d 2 ] [d 3 ] without replacement. Then, for any τ > 0, P { QT d1 d 2 d 3 /n)p Ω I ) Q T τ 2r 2 d exp In particular, taking τ = 1/2 in Lemma 5 yields { P 6) holds 1 2r 2 d exp 15 3 n )). 32 µ 2 0r 2 d τ 2 /2 n )) τ/3 µ 2 0r 2 d

16 3.2 Constructing a Dual Certificate We now show that the approximate dual certificate as required by Lemma 4 can indeed be constructed. We use a strategy similar to the golfing scheme for the matrix case see, e.g., Gross, 2011). Recall that Ω is a uniformly sampled subset of size n = Ω from [d 1 ] [d 2 ] [d 3 ]. The main idea is to construct an approximate dual certificate supported on a subset of Ω, and we do so by first constructing a random sequence with replacement from Ω. More specifically, we start by sampling a 1, b 1, c 1 ) uniformly from Ω. We then sequentially sample a i, b i, c i ) i = 2,..., n) uniformly from the set of unique past observations, denoted by S i 1, with probability S i 1 /d 1 d 2 d 3 ; and uniformly from Ω\S i 1 with probability 1 S i 1 /d 1 d 2 d 3. It is worth noting that, in general, there are replicates in the sequence {a j, b j, c j ) : 1 j i and the set S i consists of unique observations from the sequence so in general S i < i. It is not hard to see that the sequence a 1, b 1, c 1 ),..., a n, b n, c n ) forms an independent and uniformly distributed with replacement) sequence on [d 1 ] [d 2 ] [d 3 ]. We now divide the sequence {a i, b i, c i ) : 1 i n into n 2 subsequences of length n 1 : Ω k = {a i, b i, c i ) : k 1)n 1 < i kn 1, for k = 1, 2,..., n 2, where n 1 n 2 n. Recall that W is such that W = Q 0 T W, W = 1, and T = T, W. For brevity, write P a,b,c) : R d 1 d 2 d 3 R d 1 d 2 d 3 as a linear operator that zeroes out all but the a, b, c) entry of a tensor. Let R k = I 1 kn 1 n 1 i=k 1)n 1 +1 with I being the identity operator on tensors and define G k = d 1 d 2 d 3 )P ai,b i,c i ) k ) I Rl QT R l 1 Q T Q T R 1 Q T W, G = G n2. l=1 Since a i, b i, c i ) Ω, P Ω I R k ) = I R k, so that P Ω G = G. It follows from the definition of G k that Q T G k = k Q T Q T R l Q T )Q T R l 1 Q T ) Q T R 1 Q T W ) l=1 = W Q T R k Q T ) Q T R 1 Q T )W and k G k, Q T X = R l Q T R l 1 Q T ) Q T R 1 Q T )W, Q T X. l=1 16

17 Thus, condition 7) holds if Q T R n2 Q T ) Q T R 1 Q T )W HS < 1 n 9) 4 2d 1 d 2 d 3 and n 2 R l Q T R l 1 Q T ) Q T R 1 Q T )W < 1/4. 10) l=1 3.3 Verifying Conditions for Dual Certificate We now prove that 9) and 10) hold with high probability for the approximate dual certificate constructed above. For this purpose, we need large deviation bounds for the average of certain iid tensors under the spectral and maximum norms. Lemma 6 Let {a i, b i, c i ) be an independently and uniformly sampled sequence from [d 1 ] [d 2 ] [d 3 ]. Assume that µt ) µ 0 and rt ) = r. Then, for any fixed k = 1, 2,..., n 2, and for all τ > 0, { QT P R k Q T τ 2r 2 d exp τ 2 /2 n1 )), 11) 1 + 2τ/3 µ 2 0r 2 d and max X max=1 QT P{ R k Q T X τ max 2d 1 d 2 d 3 exp τ 2 /2 n1 )). 12) 1 + 2τ/3 µ 2 0r 2 d Because d 1 d 2 d 3 ) 1/2 W HS W max W 1, Equation 9) holds if max 1ln2 Q T R l Q T τ and n 2 1 log τ log 32d1 d 2 d 3 n 1/2 ). 13) Thus, an application of 11) now gives the following bound: { P 9) holds 1 P{ Q T R n2 Q T ) Q T R 1 Q T ) τ n 2 1 P{ max Q T R l Q T τ 1ln 2 1 2n 2 r 2 d exp τ 2 /2 n1 )) τ/3 µ 2 0r 2 d 17

18 Now consider Equation 10). Let W l = Q T R l Q T W l 1 for l 1 with W 0 = W. Observe that 10) does not hold with at most probability { n 2 P R l W l 1 1/4 l=1 P{ R 1 W 0 1/8 l=2 + P{ W max 1 W max/2 { n 2 +P R l W l 1 1/8, W max 1 < W max /2 { R1 P W 0 1/8 W + P{ max 1 W max/2 R2 +P{ W 1 1/16, W max 1 < W max/2 W +P{ max 2 W max/4, W max 1 < W max/2 { n 2 1 +P R l W l 1 16, W max 2 < W max /4 2 1 l=1 n 2 + l=1 l=3 { QT P R l Q T W max l 1 W max/2 l, W max l 1 W max/2 l 1 { Rl P W l l, W max l 1 W max/2 l 1. Since {R l, W l are i.i.d., 12) with X = W l 1 / W l 1 max implies { P 10) holds QT 1 n 2 max P{ R 1 Q T X > 1 R1 + P{ X > X:X=Q T X max 2 X max 1 1 2n 2 d 1 d 2 d 3 exp 3/32)n1 ) µ 2 0r 2 d n 2 max P X:X=Q T X X max W max { R1 X 1 ) 8 W max > 1 8 The last term on the right hand side can be bounded using the following result.. Lemma 7 Assume that αt ) α 0, rt ) = r and q 1 = β + log d ) 2 α 2 0 r log d. There exists a numerical constant c 1 > 0 such that for any constants β > 0 and 1/log d) δ 1 < 1, ] n 1 c 1 [q1d 1+δ 1 + q11 + β)δ1 1 d 1 d 2 d 3 14) implies { R1 max P X 1 d β 1, 15) X:X=Q T X 8 X max W max 18

19 where W is in the range of Q 0 T such that W = 1 and T, W = T. 3.4 Proof of Theorem 1 Since 7) is a consequence of 9) and 10), it follows from Lemmas 4, 5, 6 and 7 that for τ 0, 1/2] and n n 1 n 2 satisfying conditions 13) and 14), P { T T 2r 2 d exp n 2 d 1 d 2 d 3 exp n )) µ 2 0r 2 d 3/32)n1 µ 2 0r 2 d + 2n 2 r 2 d exp ) + n 2 d β 1. τ 2 /2 n1 )) 1 + 2τ/3 µ 2 0r 2 d We now prove Theorem 1 by setting τ = d δ 2/2 /2, so that condition 13) can be written as n 2 c 2 /δ 2. Assume without loss of generality n 2 d/2 because large c 0 forces large d. For sufficiently large c 2, the right-hand side of the above inequality is no greater than d β when n 1 c 21 + β)log d)µ 2 0r 2 d/4τ 2 ) = c 2q 2d 1+δ 2 holds as well as 14). Thus, 4) implies 5) for sufficiently large c 0. 4 Concentration Inequalities for Low Rank Tensors We now prove Lemmas 5 and 6, both involving tensors of low rank. We note that Lemma 5 concerns the concentration inequality for the sum of a sequence of dependent tensors whereas in Lemma 6, we are interested in a sequence of iid tensors. 4.1 Proof of Lemma 5 We first consider Lemma 5. Let a k, b k, c k ) be sequentially uniformly sampled from Ω without replacement, S k = {a j, b j, c j ) : j k, and m k = d 1 d 2 d 3 k. Given S k, the conditional expectation of P ak+1,b k+1,c k+1 ) is [ E P ak+1,b k+1,c k+1 ) ] S k = P S c k m k. For k = 1,..., n, define martingale differences D k = d 1 d 2 d 3 m n /m k )Q T P ak,b k,c k ) P S c k 1 /m k 1 )Q T. Because P S c 0 = I and S n = Ω, we have Q T P Ω Q T /m n = D n d 1 d 2 d 3 m n + Q T P S c n 1 /m n 1 )Q T /m n + Q T P Sn 1 Q T /m n 19

20 = = D n + Q T 1/m n 1/m n 1 ) + Q T P Sn 1 Q T /m n 1 d 1 d 2 d 3 m n D k + Q T 1/m n 1/m 0 ). d 1 d 2 d 3 m n k=1 Since 1/m n 1/m 0 = n/d 1 d 2 d 3 m n ), it follows that Q T d 1 d 2 d 3 /n)p Ω Q T Q T = 1 n D k. Now an application of the matrix martingale Bernstein inequality see, e.g., Tropp, 2011) gives { 1 P n k=1 k=1 n 2 τ 2 /2 ) D k > τ 2 rankq T ) exp, σ 2 + nτm/3 where M is a constant upper bound of D k and σ 2 is a constant upper bound of E [ ] D k D k S k 1. k=1 Note that D k are random self-adjoint operators. Recall that Q T can be decomposed as a sum of orthogonal projections Q T = Q 0 T + Q 1 T ) + Q 2 T + Q 3 T = I P 2 T P 3 T + P 1 T P 2 P 3 T T + P 1 T P 2 T P 3. T The rank of Q T, or equivalently the dimension of its range, is given by d 1 r 2 r 3 + d 2 r 2 )r 1 r 3 + d 3 r 3 )r 1 r 2 r 2 d. Hereafter, we shall write r j for r j T ), µ for µt ), and r for rt ) for brevity when no confusion occurs. Since E [ ] D Sk 1 k = 0, the total variation is bounded by max Q T X HS =1 max Q T X HS =1 k=1 k=1 k=1 d 1 d 2 d 3 m n /m k ) [ ] Sk 1 E D k X, D k X ) 2E [ QT ) ] 2X, Sk 1 d 1 d 2 d 3 m n /m k ) P ak,bk,ck)q T X ) 2m 1 max k 1 Q T X HS =1 a,b,c QT P a,b,c) Q T ) 2X, X. Since m n m k and n k=1 m n/m k )/m k 1 = n/d 1 d 2 d 3, [ ] Sk 1 max E D k X, D k X nd 1 d 2 d 3 max QT P a,b,c) Q T. Q T X HS =1 a,b,c k=1 20

21 It then follows that max QT P a,b,c) Q T = max a,b,c Q T X HS =1 = max Q T X HS =1 µ2 r 2 d d 1 d 2 d 3. Consequently, we may take σ 2 = nµ 2 0r 2 d. Similarly, M max k Q T P a,b,c) Q T X, Q T X Q T e a e b e c, Q T X d 1 d 2 d 3 m n /m k )2 max QT P a,b,c) Q T 2µ 2 r 2 d. a,b,c Inserting the expression and bounds for rankq T ), σ 2 and M into the Bernstein inequality, we find { 1 P n k=1 τ D k > τ 2r 2 2 /2 n )) d) exp 1 + 2τ/3 µ 2 r 2, d which completes the proof because µt ) µ 0 and rt ) = r Proof of Lemma 6. In proving Lemma 6, we consider first 12). Let X be a tensor with X max 1. Similar to before, write D i = d 1 d 2 d 3 Q T P ai,b i,c i ) Q T for i = 1,..., n 1. Again, we shall also write µ for µt ), and r for rt ) for brevity. Observe that for each point a, b, c) [d 1 ] [d 2 ] [d 3 ], = 1 e a e b e c, D i X d 1 d 2 d 3 Q T e a e b e c ), Q T e ak e bk e ck ) Xa k, b k, c k ) Q T e a e b e c ), Q T X /d 1 d 2 d 3 ) 2 max Q T e a e b e c ) 2 HS X max a,b,c 2µ 2 r 2 d/d 1 d 2 d 3 ). Since the variance of a variable is no greater than the second moment, 1 E ea e b e c, Q T D i X ) 2 d 1 d 2 d 3 E QT e a e b e c ), Q T e a e b e c ) Xa k, b k, c k ) 2 1 QT e a e b e c, Q T e a e b e c ) 2 d 1 d 2 d 3 a,b,c 1 = Q T e a e b e c ) 2 HS d 1 d 2 d 3 21

22 µ 2 r 2 d/d 1 d 2 d 3 ) 2. Since e a e b e c, Q T D i X are iid random variables, the Bernstein inequality yields { 1 P n 1 n 1 e a e b e c, Q T D i X > τ This yields 12) by the union bound. 2 exp n 1 τ) 2 /2 ) n 1 µ 2 r 2 d + n 1 τ2µ 2 r 2. d/3 The proof of 11) is similar, but the matrix Bernstein inequality is used. R d 1 d 2 d 3 We equip with the Hilbert-Schmidt norm so that it can be viewed as the Euclidean space. As linear maps in this Euclidean space, the operators D i are just random matrices. Since the projection P a,b,c) : X e a e b e c, X e a e b e c is of rank 1, Q T P a,b,c) Q T = Q T e a e b e c ) 2 HS µ 2 r 2 d/d 1 d 2 d 3 ). It follows that D i 2µ 2 r 2 d. Moreover, D i is a self-adjoint operator and its covariance operator is bounded by max E QT ) 2X, Di X 2 HS d 1 d 2 d 3 ) 2 max E P ak,bk,ck)q T X X HS =1 X HS =1 d 1 d 2 d 3 Q T e a e b e c ) 2 HS ea e b e c, X 2 a,b,c d 1 d 2 d 3 max Q T e a e b e c ) 2 HS a,b,c = µ 2 r 2 d Consequently, by the matrix Bernstein inequality Tropp, 2011), { 1 P n 1 n 1 τ 2 /2 n1 )) D i > τ 2 rankq T ) exp 1 + 2τ/3 µ 2 r 2. d This completes the proof due to the fact that rankq T ) r 2 d. 5 Concentration Inequalities for Sparse Tensors We now derive probabilistic bounds for R l X when a i, b i, c i )s are iid vectors uniformly sampled from [d 1 ] [d 2 ] [d 3 ] and X = Q T X with small X max. 5.1 Symmetrization We are interested in bounding max X U η) { d 1 d 2 d 3 P n P ai,b i,c i )X X t, 22

23 e.g. with n, η, t) replaced by n 1, α 0 r) d1 d 2 d 3, 1/8) in the proof of Lemma 7, where Our first step is symmetrization. U η) = {X : Q T X = X, X max η/ d 1 d 2 d 3. Lemma 8 Let ϵ i s be a Rademacher sequence, that is a sequence of i.i.d. ϵ i with P{ϵ i = 1 = P{ϵ i = 1 = 1/2. Then max X U η) 4 max X U η) { d 1 d 2 d 3 P n { d 1 d 2 d 3 P n P ai,b i,c i )X X t ϵ i P ai,b i,c i )X t/2 + 4 exp nt 2 /2 η 2 + 2ηt d 1 d 2 d 3 /3 ). Proof of Lemma 8. argument, we get max X U η) 4 max X U η) + 2 max { d 1 d 2 d 3 P n { d 1 d 2 d 3 P n Following a now standard Giné-Zinn type of symmetrization P ai,b i,c i )X X t ϵ i P ai,b i,c i )X t/2 max X U η) u = v = w =1 { P u v w, d 1d 2 d 3 n P ai,b i,c i )X X > t. See, e.g., Giné and Zinn 1984). It remains to bound the second quantity on the right-hand side. To this end, denote by ξ i = u v w, d 1 d 2 d 3 P ai,b i,c i )X X. For u = v = w = 1 and X U η), ξ i are iid variables with Eξ i = 0, ξ i 2d 1 d 2 d 3 X max 2η d 1 d 2 d 3 and Eξ 2 i d 1 d 2 d 3 ) X 2 max η 2. Thus, the statement follows from the Bernstein inequality. In the light of Lemma 8, it suffices to consider bounding { d 1 d 2 d 3 max P ϵ i P X U η) ai,b n i,c i )X t/2. To this end, we use a thinning method to control the spectral norm of tensors. 23

24 5.2 Thinning of the spectral norm of tensors Recall that the spectral norm of a tensor Z R d 1 d 2 d 3 is defined as Z = max u R d 1,v R d 2,w R d 3 u v w 1 u v w, Z. We first use a thinning method to discretize maximization in the unit ball in R d j to the problem involving only vectors taking values 0 or ±2 l/2, l m j := log 2 d j, that is, binary digitalized vectors that belong to B mj,d j = {0, ±1, ±2 1/2,..., ±2 m j/2 d {u R d j : u 1. 16) Lemma 9 For any tensor Z R d 1 d 2 d 3, Z 8 max { u v w, Z : u B m1,d1, v B m2,d2, w B m3,d3, where m j := log 2 d j, j = 1, 2, 3. Proof of Lemma 9. Denote by C m,d = min a =1 max u a, u B m,d which bounds the effect of discretization. Let X be a linear mapping from R d to a linear space equipped with a seminorm. Then, Xu can be written as the maximum of ϕxu) over linear functionals ϕ ) of unit dual norm. Since max u 1 u a = 1 for a = 1, it follows from the definition of C m,d that max u a a C 1 m,d u 1 max u a/ a ) = C 1 u B m,d m,d max u a u B m,d for every a R d with a > 0. Consequently, for any positive integer m, max Xu = u 1 max a:a v=ϕxv) v u 1 max a u C 1 An application of the above inequality to each coordinate yields m,d max Xu. u B m,d Z C 1 m 1,d 1 C 1 m 2,d 2 C 1 m 3,d 3 max u v w, Z. u B m1,d 1,v B m2,d 2,w B m3,d 3 It remains to show that C mj,d j 1/2. To this end, we prove a stronger result that for any m and d, C 1 m,d 2 + 2d 1)/2 m 1). 24

25 Consider first a continuous version of C m,d : C m,d = min max a =1 u B m,d a u. where B m,d = {t : t2 [0, 1] \ 0, 2 m ) d {u : u 1. Without loss of generality, we confine the calculation to nonnegative ordered a = a 1,..., a d ) satisfying 0 a 1... a d and a = 1. Let { j 1 k = max j : 2 m a 2 j + a 2 i 1 and v = a ii{i > k) d 1 {1. k a2 i 1/2 Because 2 m vk+1 2 = 2m a 2 k+1 /1 k a2 i ) 1, we have v B m,d. By the definition of k, there exists x 2 a 2 k satisfying 2 m 1)x 2 + k a 2 i = 1. It follows that k a 2 i = k a2 i 2 m 1)x 2 + k a2 i kx 2 2 m 1)x 2 + kx d m + d 2. Because a v = 1 k a2 i ) 1/2 for this specific v B m,d, we get C m,d min 1 a 2 =1 k a 2 i ) 1/2 1 d 1 ) 1/2 2 m 1 ) 1/2. = 2 m + d 2 2 m + d 2 Now because every v B m,d with nonnegative components matches a u B m,d with we find C m,d C m,d / 2. Consequently, sgnv i ) 2u i v i sgnv i )u i, 1/C m,d 2/C m,d 2{1 + d 1)/2 m 1) 1/2. It follows from Lemma 9 that the spectrum norm Z is of the same order as the maximum of u v w, Z over u B m1,d 1, v B m2,d 2 and w B m3,d 3. We will further decompose such tensors u v w according to the absolute value of their entries and bound the entropy of the components in this decomposition. 25

26 5.3 Spectral norm of tensors with sparse support Denote by D j a digitalization operator such that D j X) will zero out all entries of X whose absolute value is not 2 j/2, that is D j X) = I { e a e b e c, X = 2 j/2 e a e b e c, X e a e b e c. 17) a,b,c With this notation, it is clear that for u B m1,d 1, v B m2,d 2 and w B m3,d 3, u v w, X = m 1 +m 2 +m 3 j=0 D j u v w), X. The possible choice of D j u v w) in the above expression may be further reduced if X is sparse. More specifically, denote by suppx) = {ω [d 1 ] [d 2 ] [d 3 ] : Xω) 0. Define the maximum aspect ratio of suppx) as νsuppx) = max l=1,2,3 max i k :k l {i l : i 1, i 2, i 3 ) suppx). 18) In other words, the quantity νsuppx) is the maximum l 0 norm of the fibers of the third-order tensor. We observe first that, if suppx) is a uniformly sampled subset of [d 1 ] [d 2 ] [d 3 ], then it necessarily has a small aspect ratio. Lemma 10 Let Ω be a uniformly sampled subset of [d 1 ] [d 2 ] [d 3 ] without replacement. Let d = d 1 + d 2 + d 3, p = maxd 1, d 2, d 3 )/d 1 d 2 d 3 ), and ν 1 = d δ 1 enp ) {3 + β)/δ 1 with a certain δ 1 [1/ log d, 1]. Then, P {ν Ω ν 1 d β 1 /3. Proof of Lemma 10. Let p 1 = d 1 /d 1 d 2 d 3 ), t = logν 1 /np )) 1, and N i2 i 3 = {i l : i 1, i 2, i 3 ) Ω. Because N i2 i 3 follows the Hypergeometricd 1 d 2 d 3, d 1, n) distribution, its moment generating function is no greater than that of Binomialn, p 1 ). Due to p 1 p, P{N i2 i 3 ν 1 exp tν 1 + np e t 1) ) exp ν 1 logν 1 /enp ))). 26

27 The condition on ν 1 implies ν 1 logν 1 /enp )) 3 + β) log d. By the union bound, P{max i 2,i 3 N i2 i 3 ν 1 d 2 d 3 d 3 β. By symmetry, the same tail probability bound also holds for max i1,i 3 {i 2 : i 1, i 2, i 3 ) Ω and max i1,i 2 {i 3 : i 1, i 2, i 3 ) Ω, so that P{ν Ω ν 1 d 1 d 2 + d 1 d 3 + d 2 d 3 )d 3 β. The conclusion follows from d 1 d 2 + d 1 d 3 + d 2 d 3 d 2 /3. We are now in position to further reduce the set of maximization in defining the spectrum norm of sparse tensors. To this end, denote for a block A B C [d 1 ] [d 2 ] [d 3 ], ha B C) = min { ν : A ν B C, B ν A C, C ν A B. 19) It is clear that for any block A B C, there exists à A, B B and C C such that hã B C) ν Ω and A B C) Ω = à B C) Ω. For u B m1,d 1, v B m2,d 2 and w B m3,d 3, let A i1 = {a : u 2 a = 2 i 1, B i2 = {b : v 2 b = 2 i 2 and C i3 = {c : wc 2 = 2 i 3, and define D j u v w) = i 1,i 2,i 3 ):i 1 +i 2 +i 3 =j PÃj,i1 B j,i2 C j,i3 D j u v w) 20) where Ãj,i 1 A j,i1, B j,i2 B j,i2 and C j,i3 C j,i3 satisfying hãj,i 1 B j,i2 C j,i3 ) ν Ω and A i1 B i2 C i3 ) Ω = Ãj,i 1 B j,i2 C j,i3 ) Ω. Because P Ω D j u v w) is supported in i1,i 2,i 3 ):i 1 +i 2 +i 3 =ja i1 B i2 C i3 ) Ω, we have P Ω Dj u v w) = P Ω D j u v w). This observation, together with Lemma 9, leads to the following characterization of the spectral norm of a tensor support on a set with bounded aspect ratio. Lemma 11 Let m l = log 2 d l for l = 1, 2, 3, and D j ) and D j ) be as in 17) and 20) respectively. Define { BΩ,m = Dj u 1 u 2 u 3 ) + 0jm m <jm D j u 1 u 2 u 3 ) : u l B ml,d l and B ν,m = νω νb Ω,m. Let X R d 1 d 2 d 3 be a tensor with suppx) Ω. For any 0 m m 1 + m 2 + m 3 and ν ν Ω, X 8 max Y B Ω,m Y, X 8 max Y, X. Y Bν,m, 27

28 5.4 Entropy bounds Essential to our argument are entropic bounds related to Bν,m. It is clear that ) dj #{D j u) : u 1 2 2k d j exp2 k d 2 k j )log logd j /2 k )) + )), d j so that by 16) m j B mj,d j k=0 dj 2 k d j ) 2 2k d j exp d j l=1 Consequently, due to d = d 1 + d 2 + d 3 and /4, ) 2 l log log2 l )) exp4.78 d j ) 3 B ν,m B mj,d j e 21/4)d. 21) We derive tighter entropy bounds for slices of Bν,m by considering D ν,j,k = {D j Y ) : Y B ν,m, D j Y ) 2HS 2 k j. j=1 Here and in the sequel, we suppress the dependence of D on quantities such as m, m 1, m 2, m 3 for brevity, when no confusion occurs. Lemma 12 Let Lx, y) = max{1, logey/x) and ν 1. For all 0 k j m, log D ν,j,k 21/4)Jν, j, k), 22) where Jν, j, k) = j + 2) ν2 k 1 L ν2 k 1, j + 2)d ). Proof of Lemma 12. We first bound the entropy of a single block. Let { D block) ν,l = sgnu a )sgnv b )sgnw c )I{a, b, c) A B C : ha B C) ν, A B C = l. By the constraints on the size and aspect ratio of the block, By dividing D block) ν,l D block) ν,l max A 2, B 2, C 2 ) ν A B C νl. into subsets according to l 1, l 2, l 3 ) = A, B, C ), we find l 1 l 2 l 3 =l,maxl 1,l 2,l 3 ) νl 28 2 l 1+l 2 +l 3 d1 l 1 ) ) d2 d3 l 2 l 3 )

29 By the Stirling formula, for i = 1, 2, 3, [ ) log 2πli 2 l di ] i l i L l i, 2d ) νl L ) νl, 2d. l i We note that kk + 1)/2 q k ) is no greater than 2.66, 1.16 and 1 respectively for q = 2, q = 3 and q 5. Let l = m j=1 qk j j with distinct prime factors q j. We get {l 1, l 2, l 3 ) : l 1 l 2 l 3 = l = m ) kj + 1 m q k j j 2 j=1 j=1 πl 1/2 3 2πli. It follows that D block) ν,l exp 3 νll νl, 2d ) ). 23) Due to the constraint i 1 + i 2 + i 3 = j in defining Bν,m, for any Y Bν,m, D j Y ) is composed of at most i = ) j+2 2 blocks. Since the sum of the sizes of the blocks is bounded by 2 k, 23) yields D ν,j,k i D block) νl i l l i 2 k i exp 3 νl i L νli, 2d )) l l i 2 k i 2 k ) i max exp 3 νl i L νli, 2d )). l l i 2 k It follows from the definition of Lx, y) and the Cauchy-Schwarz inequality that i li L νli, 2d ) = i li L ν2k, 2d ) )) + log 2k /l i 2 k i L ν2k, 2d ) + i i 2 k L ν2k, 2d ) + log i )), li /2 k log 2k /l i )) where the last inequality above follows from the fact that subject to u 1, u 2 0 and u 2 1 +u 2 2 2c 2 2, the maximum of u 1 logu 1 ) u 2 logu 2 ) is attained at u 1 = u 2 = c. Consequently, since i ) j+2 2, log D ν,j,k i log 2 k) + 3 ν2k i ν2 k L, 2d j+2 2 ) ). 29

30 We note that j k, 2 k k 8/3, ν 1 and xlx, y) is increasing in x, so that j+2 ) ) i ν2 k L ν2k, 2d 2 8/3L i log k j+2 ) ) 8/2, 2d 2 k) 2 8 loged) i log 2 i log 8. Moreover, because 2 m +3/2 8 3 j=1 2d j) 82d/3) 3 d 3, we get 2i j + 1)j + 2) j + 3/2 m + 3/2 3/ log 2) log d, so that the right-hand side of the above inequality is no smaller than 4/ log 8)log 2)/3 = 4/9. It follows that j+2 ) log D ν,j,k 3 + 9/4) i ν2 L ) k ν2k, 2d. This yields 22) due to i ) j By 21), the entropy bound in Lemma 12 is useful only when 0 k j m, where { m = min x : x m or Jν 1, x, x) d. 24) 5.5 Probability bounds We are now ready to derive a useful upper bound for { d 1 d 2 d 3 max P ϵ i P X U η) ai,b n i,c i )X t. Let X U η). For brevity, write Z i = d 1 d 2 d 3 ϵ i P ai,b i,c i )X and Z = d 1d 2 d 3 n ϵ i P ai,b i,c i )X = 1 n Z i. Let Ω = {a i, b i, c i ) : i n. In the light of Lemma 10, we shall proceed conditional on the event that ν Ω ν 1 in this subsection. In this event, Lemma 11 yields Z 8 max Y, Z 25) Y Bν 1,m ) = 8 max Y Bν 1,m 0jm D j Y ), Z + S Y ), Z where m is as in 24) with the given ν 1 and S Y ) = j>m D j Y ). Let Y Bν 1,m and Y j = D j Y ). Recall that for Y j 0, 2 j Y j 2 HS 1, so that Y j j k=0 D ν 1,j,k. To bound the first term on the right-hand side of 25), consider { P max Y j, Z tm + 2) 1/2 Y j HS Y j D ν1,j,k\d ν1,j,k 1 30,

31 with 0 k j m. Because Z i max d 1 d 2 d 3 X max η d 1 d 2 d 3, for any Y j D ν1,j,k, Y j, Z i 2 j/2 η d 1 d 2 d 3, E Y j, Z i 2 η 2 Y j 2 HS η 2 2 k j. Let h 0 u) = 1 + u) log1 + u) u. By Bennet s inequality, P { Y j, Z t2 k j 1)/2 m + 2) 1/2 exp nη2 2 k j ) t2 k j 1)/2 m + 2) 1/2 )2 j/2 η )) d 1 d 2 d 3 ) h 2 j η 2 0 d 1 d 2 d 3 η 2 2 k j = exp n2k t )) d 1 d 2 d 3 h 0 d 1 d 2 d 3 η. m + 2)2 k+1 Recall that D ν,j,k = { Y j = D j Y ) : Y B ν,m, Y j 2 HS 2k j. By Lemma 12, { P max Y j, Z tm + 2) 1/2 Y j HS Y j D ν1,j,k\d ν1,j,k 1 D ν1,j,k max P { Y j, Z t2 k j 1)/2 m + 2) 1/2 26) Y j D ν1,j,k exp 21/4)Jν 1, j, k) n2k t )) d 1 d 2 d 3 h 0 d 1 d 2 d 3 η. m + 2)2 k+1 Let L k = 1 logedm + 2)/ ν 1 2 k 1 ). By the definition of Jν, j, k) in Lemma 12, Jν 1, j, k) Jν 1, m, k) = m + 2) 2 k 1 ν 1 L k Jν 1, m, m ) = d. Let x 1 and t 1 be a constant satisfying nt 1 24ηm + 2) 3/2 ν 1 d 1 d 2 d 3, nxt 2 1h 0 1) 12η 2 m + 2) 2 edl 0. 27) We prove that for all x 1 and 0 k j m ) n2 k xt 1 d1 d 2 d 3 h 0 d 1 d 2 d 3 η 6xJν 1, m, k) 6xJν 1, j, k). 28) m + 2)2 k+1 Consider the following three cases: xt 1 d1 d 2 d 3 Case 1: η m + 2)2 k+1 Case 2: 1 < xt 1 d1 d 2 d 3 η m + 2)2 k+1 Case 3: xt 1 d1 d 2 d 3 η 1. m + 2)2k+1 31 [ { max 1, edm ] 1/2 + 2), ν1 2 k 1 [ edm + 2) ν1 2 k 1 ] 1/2,

32 Case 1: Due to h 0 u) u/2) log1 + u) for u 1 and the lower bound for nt 1, ) ) n2 k xt 1 d1 d 2 d 3 h 0 d 1 d 2 d 3 η n2k xt 1 d1 d 2 d 3 m + 2)2 k+1 d 1 d 2 d 3 η L k m + 2)2 k+1 4 Case 2: Due to m + 2) 2 k 1 ν 1 d, we have 1 xt 1 em + 2) d1 d 2 d 3 η m + 2)2 k+1 6xm + 2) ν 1 2 k 1 L k. [ edm + 2) ν1 2 k 1 ] 1 = xt 1 ν1 2 k 1 ηd em + 2)2 k+1. Thus, due to h 0 u) uh 0 1) for u 1, we have ) n2 k xt 1 d1 d 2 d 3 h 0 d 1 d 2 d 3 η xt 1 ν1 2 k 1 ) n 2 m + 2)2 k+1 ηd k 1 xt 1 h 0 1) em + 2)2 k+1 η m + 2 = nx2 t 2 1h 0 1) ν 1 2 k 1 2η 2 m + 2) ed. Because nxt 2 1h 0 1) 12η 2 m + 2) 2 edl 0 and L 0 L k, it follows that ) n2 k xt 1 d1 d 2 d 3 h 0 d 1 d 2 d 3 η 6xm + 2) ν 1 2 k 1 L k. m + 2)2 k+1 Case 3: Due to h 0 u) u 2 h 0 1) for 0 u 1 and nxt 2 1h 0 1) 12η 2 m + 2)d, we have ) n2 k xt 1 d1 d 2 d 3 h 0 d 1 d 2 d 3 η nx2 t 2 1h 0 1) m + 2)2 k+1 2η 2 m + 2) 6xd 6xJν 1, m, k). Thus, 28) holds in all three cases. It follows from 26) and 28) that for t 1 satisfying 27) and all x 1 { P max Y j, Z xt 1 Y j HS exp 6x 21/4)Jν 1, m, k)). Y j D ν1,j,k\d ν1,j,k 1 m + 2 We note that Jν 1, m, k) J1, m, 0) m + 2) logedm + 2)) by the monotonicity of x logy/x) for x [1, y] and ν 1 2 k 1 1. Summing over 0 k j m, we find by the union bound that { P max m + 2 0jm Y Bν 1,m :D jy ) 0 2 D j Y ), Z D j Y ) HS max ) {edm + 2) 6x 21/4)m +2) For the second term on the right-hand side of 25), we have xt 1 m + 2) 1/2 S Y ), Z i 2 m /2 η d 1 d 2 d 3, E S Y ), Z i ) 2 η 2 S Y ) 2 HS. 32

33 As in the proof of 26) and 28), log Bν 1,m 5d = 5Jν 1, m, m ) implies { S Y ), Z xt 1 P max Y Bν 1,m :S Y ) 0 S Y ) HS m + 2) 1/2 m m ){edm + 2) 6x 21/4)m +2) By Cauchy Schwartz inequality, for any Y B ν 1,m, S Y ) HS + 0jm D j Y ) HS m + 2) 1/2 S Y ) 2 HS + 0jm D j Y ) 2 HS ) 1/2 1. Thus, by 25), for all t 1 satisfying 27) and x 1, { P max Y, Z xt 1 Y Bν 1,m { ) m m m {edm + 2) 6x 21/4)m +2). 29) 2 We now have the following probabilistic bound via 29) and Lemma 10. Lemma 13 Let ν 1 be as in Lemma 10, x 1, t 1 as in 27) and m in 24). Then, { d 1 d 2 d 3 max P ϵ i P X U η) ai,b n i,c i )X xt 1 { ) m m m {edm + 2) 6x 21/4)m +2) + d β 1 / Proof of Lemma 7 We are now in position to prove Lemma 7. Let Ω 1 = {a i, b i, c i ), i n 1. By the definition of the tensor coherence αx) and the conditions on αt ) and rt ), we have W max η/ d 1 d 2 d 3 with η = α 0 r) d1 d 2 d 3, so that in the light of Lemmas 8, 10 and 13, { d 1 d 2 d 3 n 1 max P P X:X=Q T X ai,b n i,c i )X X X max W max ) n 1 1/16) 2 /2 4 exp η 2 + 2/3)η1/16) { ) d 1 d 2 d 3 m m m {edm + 2) 6x 21/4)m +2) + d β 1 /3 2 with t = 1/8 in Lemma 8 and the ν 1 in Lemma 10, provided that xt 1 1/16 and n 1 t 1 24ηm + 2) 3/2 ν 1 d 1 d 2 d 3, n 1 xt 2 1h 0 1) 12η 2 m + 2) 2 edl )

34 Thus, the right-hand side of 30) is no greater than d β 1 for certain x > 1 and t 1 satisfying these conditions when n 1 c β ) η ) m + 2) m ν 1 d 1 d 2 d 3 + η 2 m + 2) 2 L 0 d +c 11 + β)log d) η 2 + η ) d 1 d 2 d 3 for a sufficiently large constant c 1. Because 2 m 1 Jν 1, m, m ) = d, it suffices to have [ β ) { n 1 c 1 + log d η log d)ν 1 d 1 d 2 d 3 + η 2 log d) d β)log d)η ] d 1 d 2 d 3. When the sample size is n 1, ν 1 = d δ 1 en 1 maxd 1, d 2, d 3 )/d 1 d 2 d 3 )) {3 + β)/δ 1 in Lemma 10 with δ 1 [1/ log d, 1]. When ν 1 = d δ 1 en 1 maxd 1, d 2, d 3 )/d 1 d 2 d 3 ), n 1 x ν 1 d 1 d 2 d 3 iff n 1 x 2 d δ 1 e maxd 1, d 2, d 3 ). Thus, it suffices to have [ β ) 2η n 1 c 1 + log d 2 log d)d 1+δ 1 + β + log d ) ] η log d)3 + β)δ1 1 d 1 d 2 d 3 [ β ) + log d η 2 log d) 2 d β)log d)η ] d 1 d 2 d 3. +c 1 Due to 1 + β)log d) 1 + β + log d, the quantities in the second line in the above inequality is absorbed into those in the first line. Consequently, with η = α 0 r, the stated sample size is sufficient. 6 Concluding Remarks In this paper, we study the performance of nuclear norm minimization in recovering a large tensor with low Tucker ranks. Our results demonstrate the benefits of not treating tensors as matrices despite its popularity. Throughout the paper, we have focused primarily on third order tensors. In principle, our technique can also be used to treat higher order tensors although the analysis is much more tedious and the results quickly become hard to describe. Here we outline a considerably simpler strategy which yields similar sample size requirement as the vanilla nuclear norm minimization. The goal is to illustrate some unique and interesting phenomena associated with higher order tensors. The idea is similar to matricization instead of unfolding a Nth order tensor into a matrix, we unfold it into a cubic or nearly cubic third order tensor. To fix ideas, we shall restrict our attention to hyper cubic Nth order tensors with d 1 =... = d N =: d and r 1 T ), r 2 T ),..., r N T ) are bounded from above by a constant. The discussion can be straightforwardly extended to more general situations. In this case, the resulting third order tensor will have dimensions either d N/3 or d N/3 +1, and Tucker ranks again bounded. 34

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Non-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression

Non-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression Non-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression Han Chen, Garvesh Raskutti and Ming Yuan University of Wisconsin-Madison Morgridge Institute for Research and Department

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Rank Determination for Low-Rank Data Completion

Rank Determination for Low-Rank Data Completion Journal of Machine Learning Research 18 017) 1-9 Submitted 7/17; Revised 8/17; Published 9/17 Rank Determination for Low-Rank Data Completion Morteza Ashraphijuo Columbia University New York, NY 1007,

More information

SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE

SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE BETSY STOVALL Abstract. This result sharpens the bilinear to linear deduction of Lee and Vargas for extension estimates on the hyperbolic paraboloid

More information

Exercise Solutions to Functional Analysis
