Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality

Proceedings of Machine Learning Research vol 65:1–40, 2017

Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality

Song Mei, Institute for Computational and Mathematical Engineering, Stanford University — SONGMEI@STANFORD.EDU
Theodor Misiakiewicz, Département de Physique, École Normale Supérieure — THEODOR.MISIAKIEWICZ@ENS.FR
Andrea Montanari, Department of Electrical Engineering and Department of Statistics, Stanford University — MONTANARI@STANFORD.EDU
Roberto I. Oliveira, Instituto Nacional de Matemática Pura e Aplicada (IMPA) — RIMFO@IMPA.BR

Abstract. A number of statistical estimation problems can be addressed by semidefinite programs (SDP). While SDPs are solvable in polynomial time using interior point methods, in practice generic SDP solvers do not scale well to high-dimensional problems. In order to cope with this problem, Burer and Monteiro proposed a non-convex rank-constrained formulation, which has good performance in practice but is still poorly understood theoretically. In this paper we study the rank-constrained version of SDPs arising in MaxCut and in Z_2 and SO(d) synchronization problems. We establish a Grothendieck-type inequality that proves that all the local maxima and dangerous saddle points are within a small multiplicative gap from the global maximum. We use this structural information to prove that SDPs can be solved within a known accuracy, by applying the Riemannian trust-region method to this non-convex problem, while constraining the rank to be of order one. For the MaxCut problem, our inequality implies that any local maximizer of the rank-constrained SDP provides a (1 − 1/(k − 1)) approximation of the MaxCut, when the rank is fixed to k. We then apply our results to data matrices generated according to the Gaussian Z_2 synchronization problem, and to the two-groups stochastic block model with large bounded degree. We prove that the error achieved by local maximizers undergoes a phase transition at the same threshold as for information-theoretically optimal methods.

Keywords: Semidefinite programming, non-convex optimization, MaxCut, group synchronization, Grothendieck inequality.

1. Introduction

A successful approach to statistical estimation and statistical learning suggests to estimate the object of interest by solving an optimization problem, for instance motivated by maximum likelihood, or empirical risk minimization. In modern applications, the unknown object is often combinatorial, e.g. a sparse vector in high-dimensional regression or a partition in clustering. In these cases, the resulting optimization problem is computationally intractable, and convex relaxations have been a method of choice for obtaining tractable and yet statistically efficient estimators.

© 2017 S. Mei, T. Misiakiewicz, A. Montanari & R.I. Oliveira.

In this paper we consider the following specific semidefinite program

maximize ⟨A, X⟩ subject to X_ii = 1, i ∈ [n], X ⪰ 0,   (MC-SDP)

as well as some of its generalizations. This SDP famously arises as a convex relaxation of the MaxCut problem¹, whereby the matrix A is the opposite of the adjacency matrix of the graph to be cut. In a seminal paper, Goemans and Williamson (Goemans and Williamson, 1995) proved that this SDP provides a 0.878 approximation of the combinatorial problem. Under the unique games conjecture, this approximation factor is optimal for polynomial time algorithms (Khot et al., 2007).

More recently, SDPs of this form (see below for generalizations) have been studied in the context of group synchronization and community detection problems. An incomplete list of references includes Singer (2011), Singer and Shkolnisky (2011), Bandeira et al. (2014), Guédon and Vershynin (2016), Montanari and Sen (2016), Javanmard et al. (2016), Hajek et al. (2016), Abbe et al. (2016). In community detection, we try to partition the vertices of a graph into tightly connected communities under a statistical model for the edges. Synchronization aims at estimating n elements g_1, ..., g_n in a group G from pairwise noisy measurements of the group differences g_i^{-1} g_j. Examples include Z_2 synchronization, in which G = Z_2 = ({+1, −1}, ·) (the group with elements {+1, −1} and the usual multiplication), angular synchronization, in which G = U(1) (the multiplicative group of complex numbers of modulus one), and SO(d) synchronization, in which we need to estimate n rotations R_1, ..., R_n from the special orthogonal group SO(d). In this paper, we will focus on Z_2 synchronization and SO(d) synchronization.

Although SDPs can be solved to arbitrary precision in polynomial time (Nesterov, 2013), generic solvers do not scale well to large instances. In order to address the scalability problem, Burer and Monteiro (2003) proposed to reduce the problem dimensions by imposing the rank constraint rank(X) ≤ k. This constraint can be enforced by setting X = σσᵀ, where σ ∈ R^{n×k}. In the case of (MC-SDP), we obtain the following non-convex problem, with decision variable σ:

maximize ⟨σ, Aσ⟩ subject to σ = [σ_1, ..., σ_n]ᵀ ∈ R^{n×k}, ‖σ_i‖_2 = 1, i ∈ [n].   (k-Ncvx-MC-SDP)

Provided k(k + 1) ≥ 2n, the solution of (MC-SDP) corresponds to the global maximum of (k-Ncvx-MC-SDP) (Barvinok, 1995; Pataki, 1998; Burer and Monteiro, 2003). Recently, Boumal et al. (2016) proved that, as long as k(k + 1) ≥ 2n, for almost all matrices A the problem (k-Ncvx-MC-SDP) has a unique local maximum, which is also the global maximum. That paper proposed to use the Riemannian trust-region method to solve the non-convex SDP problem, and provided computational complexity guarantees on the resulting algorithm.

While the theory of Boumal et al. (2016) suggests the choice k = O(√n), it has been observed empirically that setting k = O(1) yields excellent solutions and scales well to large applications (Javanmard et al., 2016). In order to explain this phenomenon, Bandeira et al. (2016) considered the Z_2 synchronization problem with k = 2, and established theoretical guarantees for the local maxima, provided the noise level is small enough.

1. In the MaxCut problem, we are given a graph G = (V, E) and want to partition the vertices in two sets so as to maximize the number of edges across the partition.
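To make the Burer–Monteiro reparametrization above concrete, here is a minimal Python sketch, not taken from the paper: it builds a random feasible point of the rank-k constraint set (unit-norm rows) and evaluates the objective f(σ) = ⟨σ, Aσ⟩ = ⟨A, σσᵀ⟩; all function names are ours.

```python
import numpy as np

def random_feasible_point(n, k, rng=None):
    """Sample sigma in M_k: an n x k matrix with unit-norm rows."""
    rng = np.random.default_rng(rng)
    sigma = rng.standard_normal((n, k))
    return sigma / np.linalg.norm(sigma, axis=1, keepdims=True)

def objective(A, sigma):
    """f(sigma) = <sigma, A sigma> = <A, sigma sigma^T>, the rank-k objective."""
    return np.sum(sigma * (A @ sigma))

# toy usage: any feasible rank-k point gives a lower bound on SDP(A)
n, k = 200, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
sigma = random_feasible_point(n, k, rng)
print(objective(A, sigma))
```

Note that only the n × k factor σ is ever stored; the n × n matrix X = σσᵀ is never formed, which is the source of the scalability advantage.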

A different point of view was taken in a recent unpublished technical note (Montanari, 2016), which proposed a Grothendieck-type inequality for the local maxima of (k-Ncvx-MC-SDP). In this paper we continue and develop the preliminary work of Montanari (2016), to obtain explicit computational guarantees for the non-convex approach with rank constraint k = O(1).

As mentioned above, we extend our analysis beyond the MaxCut-type problem (k-Ncvx-MC-SDP) to treat an optimization problem motivated by SO(d) synchronization. SO(d) synchronization (with d = 3) has applications to computer vision (Arie-Nachimson et al., 2012) and cryo-electron microscopy (cryo-EM) (Singer and Shkolnisky, 2011). A natural SDP relaxation of the maximum likelihood estimator is given by the Orthogonal-Cut SDP problem

maximize ⟨A, X⟩ subject to X_ii = I_d, i ∈ [m], X ⪰ 0,   (OC-SDP)

with decision variable X. Here A, X ∈ R^{n×n} are matrices with blocks denoted by (A_ij)_{i,j ≤ m}, (X_ij)_{i,j ≤ m}, where n = md and A_ij, X_ij ∈ R^{d×d}. This semidefinite program is also known as the Orthogonal-Cut SDP. In the context of SO(d) synchronization, A_ij ∈ R^{d×d} is a noisy measurement of the pairwise group difference R_i^{-1} R_j, where R_i ∈ SO(d). By imposing the rank constraint rank(X) ≤ k, we obtain a non-convex analogue of (OC-SDP), namely:

maximize ⟨σ, Aσ⟩ subject to σ = [σ_1, ..., σ_m]ᵀ ∈ R^{n×k}, σ_iᵀσ_i = I_d, i ∈ [m].   (k-Ncvx-OC-SDP)

Here the decision variables are matrices σ_i ∈ R^{k×d}. According to the result in Burer and Monteiro (2003), as long as k(k + 1) ≥ d(d + 1)m, the global maximum of the problem (k-Ncvx-OC-SDP) coincides with the maximum of the problem (OC-SDP). As proved in Boumal et al. (2016), with the same value of k, for almost all matrices A the non-convex problem has no local maximum other than the global maximum. Boumal (2015) proposed to choose the rank k adaptively: if k is not large enough, increase k to find a better solution. However, none of these works considers k = O(1), which is the focus of the present paper (under the assumption that d is of order one as well).

1.1. Our contributions

A main result of our paper is a Grothendieck-type inequality that generalizes and strengthens the preliminary technical result of Montanari (2016). Namely, we prove that for any ε-approximate concave point σ of the rank-k non-convex SDP (k-Ncvx-MC-SDP), we have

SDP(A) ≥ f(σ) ≥ SDP(A) − (SDP(A) + SDP(−A))/(k − 1) − (n/2) ε,   (1)

where SDP(A) denotes the maximum value of the problem (MC-SDP) and f(σ) is the objective function in (k-Ncvx-MC-SDP). An ε-approximate concave point is a point at which the eigenvalues of the Hessian of f(·) are upper bounded by ε (see Definition 1 below for a formal statement).

Surprisingly, this result connects a second order local property, namely the largest local curvature of the cost function, to its global position. In particular, all the local maxima (corresponding to ε = 0) are within a 1/(k − 1)-gap of the SDP value. Namely, for any local maximizer σ, we have

f(σ) ≥ SDP(A) − (SDP(A) + SDP(−A))/(k − 1).   (2)

All the points outside this gap, with an nε/2-margin, have a direction of positive curvature of size at least ε. Figure 1 illustrates the landscape of the rank-k non-convex MaxCut SDP problem (k-Ncvx-MC-SDP).

We show that this structure implies global convergence rates for approximately solving (k-Ncvx-MC-SDP). We study the Riemannian trust-region method in Theorem 3. In particular, we show that this algorithm with any initialization returns a (1 − O(1/k)) approximation of the MaxCut of a random d-regular graph in O(nk²) iterations, cf. Theorem 5.

Figure 1: The landscape of the rank-k non-convex SDP. All local optimizers lie within Gap = (SDP(A) + SDP(−A))/(k − 1) of the global optimizer at SDP(A); a saddle point with curvature at most ε lies above SDP(A) − nε/2 − (SDP(A) + SDP(−A))/(k − 1); the objective values range down to −SDP(−A).

In the case of Z_2 synchronization, we show that for any signal-to-noise ratio λ > 1, all the local maxima of the rank-k non-convex SDP correlate non-trivially with the ground truth when k ≥ k(λ) = O(1) (Theorem 6). Furthermore, Theorem 7 provides a lower bound on the correlation between local maxima and the ground truth that converges to one when λ goes to infinity. These results improve over the earlier ones of Bandeira et al. (2016) and Boumal et al. (2016), by establishing the tight phase transition location and the correct qualitative behavior. We extend these results to the two-groups symmetric Stochastic Block Model.

For SO(d) synchronization, we consider the problem (k-Ncvx-OC-SDP) and generalize our main Grothendieck-type inequality to this case, cf. Theorem 9. Namely, for any ε-approximate

concave point σ of the rank-k non-convex Orthogonal-Cut SDP (k-Ncvx-OC-SDP), we have

f(σ) ≥ SDP_o(A) − (SDP_o(A) + SDP_o(−A))/(k_d − 1) − (n/2) ε,   (3)

where k_d = 2k/(d + 1), SDP_o(A) denotes the maximum value of the problem (OC-SDP), and f(σ) is the objective function in (k-Ncvx-OC-SDP). We expect that the statistical analysis of local maxima, as well as the analysis of optimization algorithms, should extend to this case as well, but we leave this to future work.

1.2. Notations

Given a matrix A = (A_ij) ∈ R^{m×n}, we write ‖A‖_∞ = max_{i ∈ [m]} Σ_{j=1}^n |A_ij| for its ℓ_∞ operator norm, ‖A‖_op or ‖A‖_2 for its ℓ_2 operator norm (largest singular value), and ‖A‖_F = (Σ_{i=1}^m Σ_{j=1}^n A_ij²)^{1/2} for its Frobenius norm. For two matrices A, B ∈ R^{m×n}, we write ⟨A, B⟩ = Tr(AᵀB) for the inner product associated to the Frobenius norm, ⟨A, A⟩ = ‖A‖_F². In particular, for two vectors u, v ∈ R^n, ⟨u, v⟩ is the inner product of u and v associated to the Euclidean norm on R^n. We denote by ddiag(B) the matrix obtained from B by setting to zero all entries outside the diagonal.

Given a real symmetric matrix A ∈ R^{n×n}, we write SDP(A) for the value of the SDP problem (MC-SDP). That is,

SDP(A) = max{ ⟨A, X⟩ : X ⪰ 0, X_ii = 1, i ∈ [n] }.   (4)

Optimization is performed over the convex set of positive semidefinite matrices with diagonal entries equal to one, also known as the elliptope. We write Rg(A) = SDP(A) + SDP(−A) for the length of the range of the SDP with data A (noticing that, for every matrix X in the elliptope, −SDP(−A) ≤ ⟨A, X⟩ ≤ SDP(A)).

For the rank-k non-convex SDP problem (k-Ncvx-MC-SDP), we define the manifold M_k as

M_k = { σ ∈ R^{n×k} : σ = (σ_1, σ_2, ..., σ_n)ᵀ, ‖σ_i‖_2 = 1 } = S^{k−1} × S^{k−1} × ... × S^{k−1} (n times),   (5)

where S^{k−1} ≡ {x ∈ R^k : ‖x‖_2 = 1} is the unit sphere in R^k. Given a real symmetric matrix A ∈ R^{n×n} and σ ∈ M_k, we write f(σ) = ⟨σ, Aσ⟩ for the objective function of the rank-k non-convex SDP (k-Ncvx-MC-SDP).

Our optimization algorithm makes use of the Riemannian gradient and the Hessian of the function f. We anticipate their formulas here, deferring further details to Section 3.1. Defining Λ = ddiag(Aσσᵀ), the gradient is given by

grad f(σ) = 2(A − Λ)σ.   (6)

The Hessian is uniquely defined by the following identity, holding for all u, v in the tangent space T_σ M_k:

⟨v, Hess f(σ)[u]⟩ = 2⟨v, (A − Λ)u⟩.   (7)

2. Main results

First we define the notion of an approximate concave point of a function f on a manifold M.
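Formulas (6) and (7) translate directly into code. The following minimal sketch (our own, not from the paper) computes f, the Riemannian gradient, and the Hessian quadratic form for tangent directions u, v; the diagonal of Λ is computed without forming the n × n matrix Aσσᵀ.

```python
import numpy as np

def lambda_diag(A, sigma):
    """Diagonal of Lambda = ddiag(A sigma sigma^T), i.e. lambda_i = <(A sigma)_i, sigma_i>."""
    return np.sum((A @ sigma) * sigma, axis=1)

def f_value(A, sigma):
    """Objective f(sigma) = <sigma, A sigma>."""
    return np.sum(sigma * (A @ sigma))

def grad_f(A, sigma):
    """Riemannian gradient, Eq. (6): grad f(sigma) = 2 (A - Lambda) sigma."""
    lam = lambda_diag(A, sigma)
    return 2.0 * (A @ sigma - lam[:, None] * sigma)

def hess_quadratic_form(A, sigma, u, v):
    """<v, Hess f(sigma)[u]> = 2 <v, (A - Lambda) u>, Eq. (7), for tangent u, v."""
    lam = lambda_diag(A, sigma)
    return 2.0 * np.sum(v * (A @ u - lam[:, None] * u))
```

These helpers are reused in the algorithm sketches later in this section.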

Definition 1 [Approximate concave point] Let f be a twice differentiable function on a Riemannian manifold M. We say σ ∈ M is an ε-approximate concave point of f on M if σ satisfies

⟨u, Hess f(σ)[u]⟩ ≤ ε ⟨u, u⟩ for all u ∈ T_σ M,

where Hess f(σ) denotes the Riemannian (intrinsic) Hessian of f at the point σ, T_σ M is the tangent space, and ⟨·, ·⟩ is the scalar product on T_σ M.

Note that an approximate concave point may not be a stationary point, or may not even be an approximate stationary point. Both local maximizers and saddle points whose largest Hessian eigenvalue is close to zero are approximate concave points.

The classical Grothendieck inequality relates the global maximum of a non-convex optimization problem to the maximum of its SDP relaxation (Grothendieck, 1996; Khot and Naor, 2012). Our main tool is instead an inequality that applies to all approximate concave points of the non-convex problem.

Theorem 2 For any ε-approximate concave point σ ∈ M_k of the rank-k non-convex problem (k-Ncvx-MC-SDP), we have

f(σ) ≥ SDP(A) − (SDP(A) + SDP(−A))/(k − 1) − (n/2) ε.   (8)

2.1. Fast Riemannian trust-region algorithm

We can use the structural information in Theorem 2 to develop an algorithm that approximately solves the problem (k-Ncvx-MC-SDP), and hence the MaxCut SDP (MC-SDP). The algorithm we propose is a variant of the Riemannian trust-region algorithm.

The Riemannian trust-region algorithm (RTR) (Absil et al., 2007) is a generalization of the trust-region algorithm to manifolds. To maximize the objective function f on the manifold M, RTR proceeds as follows: at each step, we find a direction ξ ∈ T_σ M that maximizes the quadratic approximation of f over a ball of small radius η_σ,

ξ* ∈ arg max { f(σ) + ⟨grad f(σ), ξ⟩ + (1/2)⟨ξ, Hess f(σ)[ξ]⟩ : ξ ∈ T_σ M, ‖ξ‖ ≤ η_σ },   (RTR-update)

where grad f(σ) is the manifold gradient of f, and the radius η_σ is chosen to ensure that the higher order terms remain small. The next iterate σ_new = P_M(σ + ξ*) is obtained by projecting σ + ξ* back onto the manifold.

Solving the trust-region problem (RTR-update) exactly is computationally expensive. In order to obtain a faster algorithm, we adopt two variants of the RTR algorithm. First, if the gradient of f at the current estimate σ_t is sufficiently large, we only use gradient information to determine the new direction: we call this a gradient-step; if the gradient is small (i.e. we are at an approximately stationary point), we only try to maximize the Hessian contribution: we call this an eigen-step. Second, in an eigen-step, we only approximately maximize the Hessian contribution. Let us emphasize that these two variants are commonly used and we do not claim they are novel.

For the non-convex MaxCut SDP problem (k-Ncvx-MC-SDP), we describe the algorithm concretely as follows. In each step, we first find a direction u_t using the direction-finding routine outlined below.²

2. Throughout the paper, points σ ∈ M_k and vectors u ∈ T_σ M_k are represented by matrices σ, u ∈ R^{n×k}, and hence the norm on T_σ M_k is identified with the Frobenius norm ‖u‖_F.

DIRECTION-FINDING ALGORITHM
Input: current position σ_t; parameter µ_G;
Output: search direction u_t with ‖u_t‖_F = 1;
1: Compute ‖grad f(σ_t)‖_F;
2: If ‖grad f(σ_t)‖_F > µ_G
3:   Return u_t = grad f(σ_t)/‖grad f(σ_t)‖_F;
4: Else
5:   Use the power method to construct a direction u_t ∈ T_{σ_t} M_k such that ‖u_t‖_F = 1, ⟨u_t, Hess f(σ_t)[u_t]⟩ ≥ λ_max(Hess f(σ_t))/2, and ⟨u_t, grad f(σ_t)⟩ ≥ 0; Return u_t.
6: End

Given this direction u_t, we update our current estimate by σ_{t+1} = P_{M_k}(σ_t + η_t u_t), with η_t an appropriately chosen step size. We consider two specific implementations for the parameter µ_G and the choice of step size:

(a) Take µ_G = ∞, which means that only eigen-steps are used. In this implementation, we take the step size η_H^t = ⟨u_t, Hess f(σ_t)[u_t]⟩/(100 ‖A‖_∞).

(b) Take µ_G = ‖A‖_2. When ‖grad f(σ_t)‖_F > µ_G, we choose the step size η_G^t = µ_G/(20 ‖A‖_∞). When ‖grad f(σ_t)‖_F ≤ µ_G, we choose the step size η_H^t = min{ λ_H^t/(26 ‖A‖_∞), λ_H^t/(2 ‖A‖_2) }, where λ_H^t = ⟨u_t, Hess f(σ_t)[u_t]⟩.

In each eigen-step, we need to compute a direction u ∈ T_σ M_k such that ‖u‖_F = 1 and ⟨u, Hess f(σ)[u]⟩ ≥ λ_max(Hess f(σ))/2. This can be done using the following power method. (Note that the condition ⟨u_t, grad f(σ_t)⟩ ≥ 0 can always be ensured by replacing u_t by −u_t if necessary.)

POWER METHOD
Input: σ, Hess f(σ); parameters N_H, µ_H;
Output: u ∈ T_σ M_k such that ‖u‖_F = 1 and ⟨u, Hess f(σ)[u]⟩ ≥ λ_max(Hess f(σ))/2;
1: Sample u_0 uniformly at random on T_σ M_k with ‖u_0‖_F = 1;
2: For i = 1, ..., N_H
3:   u_i ← Hess f(σ)[u_{i−1}] + µ_H u_{i−1};
4:   u_i ← u_i/‖u_i‖_F;
5: End
6: Return u_{N_H}.

The shifting parameter µ_H can be chosen as 4‖A‖_∞, which is an upper bound on ‖Hess f(σ)‖_op. We take the parameter N_H = C‖A‖_∞ log n/λ_max(Hess f(σ)), with a large absolute constant C. In practice, when choosing the parameter N_H we do not know λ_max(Hess f(σ)) for each σ, but we can replace it by a lower bound, or estimate it using some heuristics. It is a classical result that, with high probability, the power method with this number of iterations finds a solution u_t with the required curvature (Kuczyński and Woźniakowski, 1992).
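The two routines above admit a short Python transcription, given here as a sketch under our reading of the damaged text (the shift µ_H = 4‖A‖_∞ is from the paper; function names and the iteration budget are ours). By identity (7), the Hessian acts on a tangent vector, up to tangent projection, as u ↦ P(2(A − Λ)u), which is all the power method needs.

```python
import numpy as np

def tangent_project(sigma, u):
    """Project u onto T_sigma M_k: remove the radial component of each row."""
    return u - np.sum(u * sigma, axis=1, keepdims=True) * sigma

def hess_apply(A, sigma, u):
    """Tangent-space action of the Riemannian Hessian: u -> P(2 (A - Lambda) u)."""
    lam = np.sum((A @ sigma) * sigma, axis=1)          # diagonal of Lambda = ddiag(A sigma sigma^T)
    return tangent_project(sigma, 2.0 * (A @ u - lam[:, None] * u))

def power_method(A, sigma, n_iter, mu_H, rng=None):
    """Approximate top eigenvector of Hess f(sigma) on T_sigma M_k via shifted power iterations."""
    rng = np.random.default_rng(rng)
    u = tangent_project(sigma, rng.standard_normal(sigma.shape))
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = hess_apply(A, sigma, u) + mu_H * u
        u /= np.linalg.norm(u)
    return u

def find_direction(A, sigma, mu_G, n_iter_power, rng=None):
    """Gradient direction if the gradient is large, otherwise an approximate eigen-step direction."""
    lam = np.sum((A @ sigma) * sigma, axis=1)
    grad = 2.0 * (A @ sigma - lam[:, None] * sigma)    # Riemannian gradient, Eq. (6)
    gnorm = np.linalg.norm(grad)
    if gnorm > mu_G:
        return grad / gnorm, "gradient"
    mu_H = 4.0 * np.max(np.sum(np.abs(A), axis=1))     # shift: upper bound on ||Hess f||_op
    u = power_method(A, sigma, n_iter_power, mu_H, rng)
    if np.sum(u * grad) < 0:                           # enforce <u_t, grad f(sigma_t)> >= 0
        u = -u
    return u, "eigen"
```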

Theorem 3 There exists a universal constant c such that, for any matrix A and any ε > 0, the Fast Riemannian Trust-Region method with step sizes as described above, initialized with any σ_0 ∈ M_k, returns a point σ* ∈ M_k with

f(σ*) ≥ SDP(A) − (SDP(A) + SDP(−A))/(k − 1) − (n/2) ε,   (9)

within the following number of steps for each implementation:

(a) Taking µ_G = ∞ (i.e. only eigen-steps are used), it is sufficient to run T_H ≤ c n ‖A‖_∞²/ε² steps.

(b) Taking µ_G = ‖A‖_2, it is sufficient to run T = T_H + T_G steps, of which T_H ≤ c n max(‖A‖_2²/ε², ‖A‖_∞/ε) are eigen-steps and T_G ≤ c Rg(A)‖A‖_∞/‖A‖_2² are gradient-steps.

The gap Rg(A)/(k − 1) = (SDP(A) + SDP(−A))/(k − 1) in Eq. (9) is due to the fact that Theorem 2 does not rule out the presence of local maxima within an interval Rg(A)/(k − 1) from the global maximum. It is therefore natural to set ε = 2Rg(A)/(n(k − 1)), to obtain the following corollary.

Corollary 4 There exists a universal constant c such that, for any matrix A, the Fast Riemannian Trust-Region method with step sizes as described above, initialized with any σ_0 ∈ M_k, returns a point σ* ∈ M_k with

f(σ*) ≥ SDP(A) − 2(SDP(A) + SDP(−A))/(k − 1),   (10)

within the following number of steps for each implementation:

(a) Taking µ_G = ∞, it is sufficient to run T_H ≤ c n k²(n ‖A‖_∞/Rg(A))² eigen-steps.

(b) Taking µ_G = ‖A‖_2, it is sufficient to run T = T_H + T_G steps, of which T_H ≤ c n max(n²k²‖A‖_2²/Rg(A)², nk‖A‖_∞/Rg(A)) are eigen-steps and T_G ≤ c Rg(A)‖A‖_∞/‖A‖_2² are gradient-steps.

In order to develop some intuition about these complexity bounds, let us consider two specific examples. Consider the problem of finding the minimum bisection of a random d-regular graph G, with adjacency matrix A_G. A natural SDP relaxation is given by the SDP (MC-SDP) with A = A_G − E A_G = A_G − (d/n)11ᵀ, the centered adjacency matrix. For this choice of A, we have ‖A‖_∞ ≤ 2d, ‖A‖_2 = 2√d(1 + o_n(1)) (Friedman, 2003), SDP(A) = 2n√d + o(n) and SDP(−A) = 2n√d + o(n) (Montanari and Sen, 2016), with high probability.

Using implementation (a) (only eigen-steps), the bound on the number of iterations in Corollary 4 scales as T_H = O(ndk²). In implementation (b), we choose µ_G = Θ(√d), and the numbers of gradient-steps and eigen-steps scale respectively as T_G = O(n√d) and T_H = O(nk max(k, √d)). In terms of floating point operations, in each gradient-step the computation of the gradient costs O(ndk) operations; in each eigen-step, each iteration of the power method costs O(ndk) operations and the number of iterations per power method scales as O(k√d log n). Implementation (b) presents a better scaling. The total number of floating point operations to find a (1 − O(1/k)) approximate solution of the minimum bisection SDP of a random d-regular graph is (with high probability) upper bounded by O(n²k³d^{3/2} max(k, √d) log n).
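For completeness, here is a sketch of the outer loop in implementation (b), reusing find_direction and hess_apply from the sketch above. The step-size constants (20, 26, 2) follow our reconstruction of the damaged text and may not match the original exactly; this is an illustration, not the paper's reference implementation.

```python
import numpy as np

def retract(x):
    """Project back onto M_k by normalizing each row."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def fast_rtr(A, sigma0, n_steps, n_iter_power=100, rng=None):
    """Fast Riemannian trust-region variant, implementation (b): gradient-steps when the
    gradient is large, approximate eigen-steps otherwise."""
    A_inf = np.max(np.sum(np.abs(A), axis=1))           # ||A||_infty
    A_2 = np.linalg.norm(A, 2)                           # ||A||_2 (largest singular value)
    mu_G = A_2
    sigma = sigma0
    for _ in range(n_steps):
        u, kind = find_direction(A, sigma, mu_G, n_iter_power, rng)
        if kind == "gradient":
            eta = mu_G / (20.0 * A_inf)
        else:
            lam_H = np.sum(u * hess_apply(A, sigma, u))   # <u, Hess f(sigma)[u]>
            if lam_H <= 0:                                # approximately concave point: stop
                break
            eta = min(lam_H / (26.0 * A_inf), lam_H / (2.0 * A_2))
        sigma = retract(sigma + eta * u)
    return sigma
```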

the minimum bisection SDP of a random d-regular graph is (with high probability) upper bounded as stated above.

As a second example, consider the MaxCut problem for a d-regular graph G, with adjacency matrix A_G. This can be addressed by considering the SDP (MC-SDP) with A = −A_G, and the corresponding non-convex version (k-Ncvx-MC-SDP). As shown in the next section, finding a 2Rg(A)/(n(k − 1))-approximate concave point of (k-Ncvx-MC-SDP) yields a (1 − O(1/k)) approximation of the MaxCut of G. For this choice of A, we have ‖A‖_∞ = d, ‖A‖_2 = d, and Rg(A) = Θ(nd). Therefore, in implementation (a), where all the steps are eigen-steps, the number of iterations given by Corollary 4 scales as T_H = O(nk²). In implementation (b), we choose µ_G = Θ(d), and the numbers of gradient-steps and eigen-steps scale respectively as T_G = O(n) and T_H = O(nk²). In terms of floating point operations, the computational costs of one gradient-step and of one eigen-step power iteration are the same (both O(ndk)), as in the minimum bisection example. The number of iterations in the power method scales as O(k log n). Therefore, the two approaches are equivalent. The total number of floating point operations to find a (1 − O(1/k)) approximate solution of the MaxCut of a d-regular graph is upper bounded by O(n²dk⁴ log n).

Let us emphasize that the complexity bound in Theorem 3 is not superior to the ones available for some alternative approaches. There is a vast literature studying fast SDP solvers (Arora et al., 2005; Arora and Kale, 2007; Steurer, 2010; Garber and Hazan, 2011). In particular, Arora and Kale (2007) and Steurer (2010) give nearly linear-time algorithms to approximate (MC-SDP). These algorithms are different from the one studied here, and rely on the multiplicative weight update method (Arora et al., 2012). Using sketching techniques, their complexity can be further reduced (Garber and Hazan, 2011). However, in practice, the Burer–Monteiro approach studied here is extremely simple and scales well to large instances (Burer and Monteiro, 2003; Javanmard et al., 2016). Empirically, it appears to have better complexity than what is guaranteed by our theorem. It would be interesting to compare the multiplicative weight update method and the non-convex approach both theoretically and experimentally.

2.2. Application to MaxCut

Let A_G ∈ R^{n×n} denote the weighted adjacency matrix of a non-negative weighted graph G. The MaxCut of G is given by the following integer program:

MaxCut(G) = max_{x_i ∈ {−1,+1}} (1/4) Σ_{i,j} A_{G,ij} (1 − x_i x_j).   (11)

We consider the following semidefinite programming relaxation:

SDPCut(G) = max_{X ⪰ 0, X_ii = 1} (1/4) Σ_{i,j} A_{G,ij} (1 − X_ij).   (12)

Denote by X* the solution of this SDP. Goemans and Williamson (Goemans and Williamson, 1995) proposed a celebrated rounding scheme using this X*, which is guaranteed to find an α*-approximate solution to the MaxCut problem (11), where α* ≜ min_{θ∈[0,π]} 2θ/(π(1 − cos θ)), α* > 0.878.
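The random-hyperplane rounding just described applies verbatim to any row-normalized factor, including a rank-k point σ of the non-convex problem introduced next. The sketch below (our own, hedged as an illustration) draws a Gaussian direction, signs the projections, and keeps the best of several trials.

```python
import numpy as np

def cut_value(A_G, x):
    """Weight of the cut induced by labels x in {-1,+1}^n: (1/4) sum_ij A_ij (1 - x_i x_j)."""
    return 0.25 * np.sum(A_G * (1.0 - np.outer(x, x)))

def round_hyperplane(sigma, A_G, n_trials=50, rng=None):
    """Goemans-Williamson style rounding of a row-normalized factor sigma (n x k):
    sample u ~ N(0, I_k), set x_i = sign(<sigma_i, u>), keep the best cut found."""
    rng = np.random.default_rng(rng)
    best_x, best_val = None, -np.inf
    for _ in range(n_trials):
        u = rng.standard_normal(sigma.shape[1])
        x = np.sign(sigma @ u)
        x[x == 0] = 1.0
        val = cut_value(A_G, x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```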

The corresponding rank-k non-convex formulation is given by

max { (1/4) Σ_{i,j} A_{G,ij} (1 − ⟨σ_i, σ_j⟩) : σ_i ∈ S^{k−1}, i ∈ [n] }.   (13)

Applying Theorem 2, we obtain the following result.

Theorem 5 For any k ≥ 3, if σ is a local maximizer of the rank-k non-convex SDP problem (13), then using σ we can find an α*(1 − 1/(k − 1))-approximate solution of the MaxCut problem (11). If σ is a 2Rg(A_G)/(n(k − 1))-approximate concave point, then using σ we can find an α*(1 − 2/(k − 1))-approximate solution of the MaxCut problem. The proof is deferred to Section 5.1.

2.3. Z_2 synchronization

Recall the definition of the Gaussian Orthogonal Ensemble: we write W ~ GOE(n) if W ∈ R^{n×n} is symmetric with (W_ij)_{i ≤ j} independent, distributed as W_ii ~ N(0, 2/n) and W_ij ~ N(0, 1/n) for i < j. In the Z_2 synchronization problem, we are required to estimate the vector u ∈ {±1}^n from noisy pairwise measurements

A(λ) = (λ/n) uuᵀ + W_n,   (14)

where W_n ~ GOE(n) and λ is a signal-to-noise ratio. The random matrix model (14) is also known as the spiked model (Johnstone, 2001) or deformed Wigner matrix, and has attracted significant attention across statistics and probability theory (Baik et al., 2005). The Maximum Likelihood Estimator for recovering the labels u ∈ {±1}^n is given by

û_ML(A) = arg max_{x ∈ {±1}^n} ⟨x, Ax⟩.

A natural SDP relaxation of this optimization problem is given once more by (MC-SDP). It is known that Z_2 synchronization undergoes a phase transition at λ_c = 1. For λ ≤ 1, no statistical estimator û(A) achieves scalar product |⟨û(A), u⟩|/n bounded away from 0 as n → ∞. For λ > 1, there exists an estimator with |⟨û(A), u⟩|/n bounded away from 0 ("better than random guessing") (Korada and Macris, 2009; Deshpande et al., 2015). Further, for λ < 1 it is not possible³ to distinguish whether A is drawn from the spiked model or A ~ GOE(n) with probability of error converging to 0 as n → ∞. This is instead possible for λ > 1.

It was proved in Montanari and Sen (2016) that the SDP relaxation (MC-SDP) with a suitable rounding scheme achieves the information-theoretic threshold λ_c = 1 for this problem. In this paper, we prove a similar result for the non-convex problem (k-Ncvx-MC-SDP). Namely, we show that for any signal-to-noise ratio λ > 1 there exists a sufficiently large k such that every local maximizer has a non-trivial correlation with the ground truth. Below we denote by Cr(n, k, A) the set of local maximizers of problem (k-Ncvx-MC-SDP).

3. To the best of our knowledge, a formal proof of this statement has not been published. However, a proof can be obtained by the techniques of Mossel et al. (2015).
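For experiments with model (14), one needs to sample a GOE matrix and form the spiked observation. A minimal generator, in our own notation, is the following sketch.

```python
import numpy as np

def goe(n, rng=None):
    """Sample W ~ GOE(n): symmetric, W_ii ~ N(0, 2/n), W_ij ~ N(0, 1/n) for i < j."""
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((n, n)) / np.sqrt(n)
    W = (W + W.T) / np.sqrt(2.0)                      # off-diagonal variance 1/n
    np.fill_diagonal(W, rng.standard_normal(n) * np.sqrt(2.0 / n))
    return W

def z2_instance(n, lam, rng=None):
    """Z_2 synchronization instance A(lambda) = (lambda/n) u u^T + W_n, Eq. (14)."""
    rng = np.random.default_rng(rng)
    u = rng.choice([-1.0, 1.0], size=n)
    A = (lam / n) * np.outer(u, u) + goe(n, rng)
    return A, u
```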

Theorem 6 For any λ > 1, there exists k(λ) > 0 such that, for any k ≥ k(λ), with high probability any local maximizer σ of the rank-k non-convex SDP problem (k-Ncvx-MC-SDP) has non-vanishing correlation with the ground truth parameter. Explicitly, there exists ε = ε(λ) > 0 such that

lim_{n→∞} P( inf_{σ ∈ Cr(n,k,A)} (1/n²)‖σᵀu‖_2² ≥ ε ) = 1.   (15)

The proof of this theorem is deferred to Section 5.2. Note that this guarantee is weaker than the one of Montanari and Sen (2016), which also presents an explicit rounding scheme to obtain an estimator û ∈ {+1, −1}^n. However, we expect that the techniques of Montanari and Sen (2016) should be generalizable to the present setting. A simple rounding scheme takes the sign of the principal left singular vector of σ. We will use this estimator in our numerical experiments in Section 4.

This theorem can be compared with the one of Bandeira et al. (2016), which uses k = 2 but requires λ > 8. As a side result, which improves over Bandeira et al. (2016) for k = 2, we obtain the following lower bound on the correlation for any k ≥ 2.

Theorem 7 For any k ≥ 2, the following holds almost surely:

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀu‖_2² ≥ 1 − min( 16/λ, 1/k + 4/λ ).   (16)

The proof is deferred to Section 5.3. Our lower bound converges to 1 at large λ, which is the qualitatively correct behavior.

2.4. Stochastic block model

The planted partition problem (two-groups symmetric stochastic block model) is another well-studied statistical estimation problem that can be reduced to (MC-SDP) (Montanari and Sen, 2016). We write G ~ G(n, p, q) if G is a graph over n vertices generated as follows (for simplicity of notation, we assume n even). Let u ∈ {±1}^n be a vector of labels that is uniformly random subject to uᵀ1 = 0. Conditional on this partition, edges are drawn independently with

P((i, j) ∈ E | u) = p if u_i = u_j, and q if u_i ≠ u_j.

We consider the case p = a/n and q = b/n with a, b = O(1) and a > b, and denote by d = (a + b)/2 the average degree. A phase transition occurs as the following signal-to-noise parameter increases:

λ(a, b) ≜ (a − b)/√(2(a + b)).

For λ > 1 there exists an efficient estimator that correlates with the true labels with high probability (Massoulié, 2014; Mossel et al., 2013), whereas no such estimator exists below this threshold, regardless of its computational complexity (Mossel et al., 2015). The Maximum Likelihood Estimator of the vertex labels is given by

û_ML(G) = arg max { ⟨x, A_G x⟩ : x ∈ {+1, −1}^n, ⟨x, 1⟩ = 0 },   (SBM-MLE)

where A_G is the adjacency matrix of the graph G.
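For completeness, here is a sketch (ours, for illustration only) that samples a two-groups SBM instance and forms the centered, rescaled adjacency matrix used in the relaxation described next, A = (A_G − (d/n)11ᵀ)/√d with d = (a + b)/2.

```python
import numpy as np

def sbm_instance(n, a, b, rng=None):
    """Two-groups symmetric SBM G(n, a/n, b/n): balanced random labels u, then
    independent edges with probability a/n within groups and b/n across groups."""
    rng = np.random.default_rng(rng)
    u = np.ones(n)
    u[rng.permutation(n)[: n // 2]] = -1.0
    p = np.where(np.outer(u, u) > 0, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A_G = (upper | upper.T).astype(float)
    return A_G, u

def centered_scaled_adjacency(A_G, a, b):
    """A = (A_G - (d/n) 11^T) / sqrt(d), d = (a+b)/2: the relaxation matrix used in the text."""
    n = A_G.shape[0]
    d = (a + b) / 2.0
    return (A_G - d / n) / np.sqrt(d)
```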

This optimization problem can again be attacked using the relaxation (MC-SDP), where A = Ā_G ≜ (A_G − (d/n)11ᵀ)/√d is the centered and rescaled adjacency matrix. In order to emphasize the relationship between this problem and Z_2 synchronization, we rewrite

Ā_G = (λ/n) uuᵀ + E,

where Eᵀ = E has zero mean and the entries (E_ij)_{i<j} are independent with distribution

E_ij = (1 − p_ij)/√d with probability p_ij, and E_ij = −p_ij/√d with probability 1 − p_ij,

where p_ij = a/n for u_i = u_j and p_ij = b/n for u_i ≠ u_j. In analogy with Theorem 6, we have the following result on the rank-constrained approach to the two-groups stochastic block model.

Theorem 8 Consider the rank-k non-convex SDP (k-Ncvx-MC-SDP) with A = Ā_G the centered, scaled adjacency matrix of a graph G ~ G(n, a/n, b/n). For any λ = λ(a, b) > 1, there exist an average degree d*(λ) and a rank k*(λ) such that, for any d ≥ d*(λ) and k ≥ k*(λ), with high probability any local maximizer σ has non-vanishing correlation with the true labels. Explicitly, there exists ε = ε(λ) > 0 such that

lim_{n→∞} P( inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀu‖_2² ≥ ε ) = 1.   (17)

The proof of this theorem can be found in Section 5.4. As mentioned above, efficient algorithms that estimate the hidden partition better than random guessing for λ > 1 and any d > 1 have been developed, among others, in Massoulié (2014); Mossel et al. (2013). However, we expect the optimization approach (k-Ncvx-MC-SDP) to share some of the robustness properties of semidefinite programming (Moitra et al., 2016), while scaling well to large instances.

2.5. SO(d) synchronization

In SO(d) synchronization we would like to estimate m matrices R_1, ..., R_m in the special orthogonal group SO(d) = {R ∈ R^{d×d} : RᵀR = I_d, det(R) = 1}, from noisy measurements of the pairwise group differences

A_ij = R_i^{-1} R_j + W_ij, for each pair (i, j) ∈ [m] × [m].

Here A_ij ∈ R^{d×d} is a measurement and W_ij ∈ R^{d×d} is noise. Letting n = md, we denote by A ∈ R^{n×n} the observation matrix with (i, j)-th block A_ij. The Maximum Likelihood Estimator for recovering the group elements R_i ∈ SO(d) solves a problem of the form

max_{σ_1, ..., σ_m ∈ SO(d)} Σ_{i,j=1}^m ⟨σ_i, A_ij σ_j⟩,

which can be relaxed to the Orthogonal-Cut SDP (OC-SDP). The non-convex rank-constrained approach fixes k > d and solves the problem (k-Ncvx-OC-SDP). This is a smooth optimization problem with objective function f(σ) = ⟨σ, Aσ⟩ over the manifold M_{o,d,k} = O(d, k)^m, where O(d, k) = {σ ∈ R^{k×d} : σᵀσ = I_d} is the set of k × d orthogonal matrices. We also denote the maximum value of the SDP (OC-SDP) by

SDP_o(A) = max{ ⟨A, X⟩ : X ⪰ 0, X_ii = I_d, i ∈ [m] }.   (18)

In analogy with the MaxCut SDP, we obtain the following Grothendieck-type inequality.
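Feasible points of (k-Ncvx-OC-SDP) are blocks σ_i ∈ R^{k×d} with σ_iᵀσ_i = I_d. The following sketch (ours; a standard polar-factor construction, not a procedure from the paper) produces such blocks, and can equally serve as a retraction after a gradient update.

```python
import numpy as np

def project_to_stiefel(M):
    """Nearest k x d matrix with orthonormal columns (M^T M = I_d), via the polar factor."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def random_ocsdp_point(m, d, k, rng=None):
    """A feasible point for (k-Ncvx-OC-SDP): m blocks sigma_i in R^{k x d} with sigma_i^T sigma_i = I_d."""
    rng = np.random.default_rng(rng)
    return [project_to_stiefel(rng.standard_normal((k, d))) for _ in range(m)]
```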

Theorem 9 For any ε-approximate concave point σ ∈ M_{o,d,k} of the rank-k non-convex Orthogonal-Cut SDP problem (k-Ncvx-OC-SDP), we have

f(σ) ≥ SDP_o(A) − (SDP_o(A) + SDP_o(−A))/(k_d − 1) − (n/2) ε,   (19)

where k_d = 2k/(d + 1).

The proof of this theorem is a generalization of the proof of Theorem 2, and is deferred to Section 5.5.

3. Proof of Theorem 2

In this section we present the proof of Theorem 2, while deferring the other proofs to Section 5. Notice that the present proof is simpler and provides a tighter bound than the one of Montanari (2016). Before passing to the actual proof, we make a few remarks about the geometry of optimization on M_k.

3.1. Geometry of the manifold M_k

The set M_k as defined in (5) is a smooth submanifold of R^{n×k}. We endow M_k with the Riemannian geometry induced by the Euclidean space R^{n×k}. At any point σ ∈ M_k, the tangent space is obtained by taking the differential of the equality constraints:

T_σ M_k = { u ∈ R^{n×k} : u = (u_1, u_2, ..., u_n)ᵀ, ⟨u_i, σ_i⟩ = 0, i ∈ [n] }.

In words, T_σ M_k is the set of matrices u ∈ R^{n×k} such that each row u_i of u is orthogonal to the corresponding row σ_i of σ. Equivalently, T_σ M_k is the direct product of the tangent spaces of the n unit spheres S^{k−1} ⊆ R^k at σ_1, ..., σ_n.

Let P be the orthogonal projection operator from R^{n×k} onto T_σ M_k. We have

P(u) = (P_1(u_1), ..., P_n(u_n))ᵀ = (u_1 − ⟨σ_1, u_1⟩σ_1, ..., u_n − ⟨σ_n, u_n⟩σ_n)ᵀ = u − ddiag(uσᵀ)σ,

where we denoted by ddiag : R^{n×n} → R^{n×n} the operator on the matrix space that sets all off-diagonal entries to zero.

In problem (k-Ncvx-MC-SDP), we consider the cost function f(σ) = ⟨σ, Aσ⟩ on the submanifold M_k. At σ ∈ M_k, we denote by ∇f(σ) and grad f(σ) respectively the Euclidean gradient in R^{n×k} and the Riemannian gradient of f. The former is ∇f(σ) = 2Aσ, and the latter is the projection of the former onto the tangent space:

grad f(σ) = P(∇f(σ)) = 2(A − ddiag(Aσσᵀ))σ.

We will write Λ = Λ(σ) = ddiag(Aσσᵀ), and often drop the dependence on σ for simplicity. At σ ∈ M_k, let ∇²f(σ) and Hess f(σ) be respectively the Euclidean and the Riemannian Hessian of f.
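The identity P(u) = u − ddiag(uσᵀ)σ and the relation grad f = P(∇f) can be checked numerically. The short sketch below (our own verification script; names are ours) asserts both facts on a random instance.

```python
import numpy as np

def ddiag(M):
    """Keep only the diagonal of M."""
    return np.diag(np.diag(M))

def proj_tangent(sigma, u):
    """P(u) = u - ddiag(u sigma^T) sigma: row-wise projection onto T_sigma M_k."""
    return u - ddiag(u @ sigma.T) @ sigma

# quick check: grad f = P(Euclidean gradient), and P(u) is tangent
rng = np.random.default_rng(1)
n, k = 50, 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
sigma = rng.standard_normal((n, k))
sigma /= np.linalg.norm(sigma, axis=1, keepdims=True)
euclid_grad = 2 * A @ sigma
riem_grad = 2 * (A - ddiag(A @ sigma @ sigma.T)) @ sigma
assert np.allclose(proj_tangent(sigma, euclid_grad), riem_grad)
u = proj_tangent(sigma, rng.standard_normal((n, k)))
assert np.allclose(np.sum(u * sigma, axis=1), 0.0)
```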

The Riemannian Hessian is a symmetric operator on the tangent space and is given by projecting the directional derivative of the gradient vector field (we use D to denote the directional derivative): for u ∈ T_σ M_k,

Hess f(σ)[u] = P(D grad f(σ)[u]) = P[ 2(A − Λ)u − 2 ddiag(Aσuᵀ + Auσᵀ)σ ].

In particular, we will use the following identity: for u, v ∈ T_σ M_k,

⟨v, Hess f(σ)[u]⟩ = 2⟨v, (A − Λ)u⟩,   (20)

where we used that the projection operator P is self-adjoint and that ⟨v_i, σ_i⟩ = 0 by definition of the tangent space. We observe that the Riemannian Hessian has a similar interpretation as in Euclidean geometry, namely it provides a second order approximation of the function f in a neighborhood of σ.

3.2. Proof of Theorem 2

Let σ be an ε-approximate concave point of f on M_k. Using the definition and Equation (20), we have, for Λ = ddiag(Aσσᵀ),

⟨u, (Λ − A)u⟩ ≥ −(ε/2)⟨u, u⟩, for all u ∈ T_σ M_k.   (21)

Let V = [v_1, ..., v_n]ᵀ ∈ R^{n×n} be such that X* = VVᵀ is an optimal solution of the (MC-SDP) problem. Let G ∈ R^{k×n} be a random matrix with independent entries G_ij ~ N(0, 1/k), and denote by P_i = I_k − σ_iσ_iᵀ ∈ R^{k×k} the projection onto the subspace orthogonal to σ_i in R^k. We use G to obtain a random projection W = [P_1 G v_1, ..., P_n G v_n]ᵀ ∈ T_σ M_k. From (21), we have

E⟨W, (Λ − A)W⟩ ≥ −(ε/2) E⟨W, W⟩,

where the expectation is taken over the random matrix G.

The left hand side of the last inequality gives

E⟨W, (Λ − A)W⟩
= E Σ_{i,j} (Λ − A)_ij ⟨P_i G v_i, P_j G v_j⟩
= E Σ_{i,j} (Λ − A)_ij ⟨P_i G Σ_{s=1}^n v_{is} e_s, P_j G Σ_{t=1}^n v_{jt} e_t⟩
= Σ_{i,j} (Λ − A)_ij Σ_{s,t=1}^n v_{is} v_{jt} E[⟨P_i G e_s, P_j G e_t⟩]
= Σ_{i,j} (Λ − A)_ij Σ_{s,t=1}^n v_{is} v_{jt} δ_{st} (1/k) Tr(P_i P_j)
= (1/k) Σ_{i,j} (Λ − A)_ij ⟨v_i, v_j⟩ Tr( I_k − σ_iσ_iᵀ − σ_jσ_jᵀ + σ_iσ_iᵀσ_jσ_jᵀ )
= Σ_{i,j} (Λ − A)_ij ⟨v_i, v_j⟩ ( 1 − 2/k + ⟨σ_i, σ_j⟩²/k )
= (1 − 1/k) Tr(Λ) − (1 − 2/k) SDP(A) − (1/k) Σ_{i,j} A_ij ⟨v_i, v_j⟩⟨σ_i, σ_j⟩²,

where in the last step we used that Λ is diagonal, ⟨v_i, v_i⟩ = ⟨σ_i, σ_i⟩ = 1, and ⟨A, X*⟩ = SDP(A). The right hand side satisfies

E⟨W, W⟩ = Σ_{i=1}^n E⟨P_i G v_i, P_i G v_i⟩ = Σ_{i=1}^n ( 1 − 2/k + ⟨σ_i, σ_i⟩²/k ) = (1 − 1/k) n.

Note that Tr(Λ) = f(σ). Crucially, if we let X̃_ij = ⟨v_i, v_j⟩⟨σ_i, σ_j⟩², we have X̃_ii = 1 and X̃ ⪰ 0. Thus −SDP(−A) ≤ ⟨A, X̃⟩. Therefore,

(1 − 1/k) f(σ) ≥ (1 − 1/k) SDP(A) − (1/k)(SDP(A) + SDP(−A)) − (ε n/2)(1 − 1/k).

Rearranging the terms gives the conclusion.

4. Numerical illustration

In this section we carry out some numerical experiments to illustrate our results. We also find interesting phenomena which are not captured by our analysis. Although Theorem 3 provides a complexity bound for the Riemannian trust-region method (RTR), we observe that (projected) gradient ascent also converges very fast. That is, gradient ascent rapidly increases the objective function, is not trapped at saddle points, and converges to a local maximizer eventually. In Figure 2, we take A ~ GOE(1000) and use projected gradient ascent to solve the optimization problem (k-Ncvx-MC-SDP) with a random initialization and a fixed step size. Figure 2a shows that the objective function increases rapidly and converges within a small interval from the local maximum, which is upper bounded by the value SDP(A).
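The projected gradient ascent used in these experiments is easy to reproduce. The sketch below is ours: the fixed step size 1/‖A‖_2 is a heuristic choice, not the value used for the figures in the paper.

```python
import numpy as np

def projected_gradient_ascent(A, k, n_steps=500, step=None, rng=None):
    """Projected gradient ascent for the rank-k problem: move along the Riemannian
    gradient, then renormalize the rows (projection back onto M_k)."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2)              # heuristic fixed step size
    sigma = rng.standard_normal((n, k))
    sigma /= np.linalg.norm(sigma, axis=1, keepdims=True)
    history = []
    for _ in range(n_steps):
        lam = np.sum((A @ sigma) * sigma, axis=1)
        grad = 2.0 * (A @ sigma - lam[:, None] * sigma)   # Riemannian gradient
        sigma = sigma + step * grad
        sigma /= np.linalg.norm(sigma, axis=1, keepdims=True)
        history.append(np.sum(sigma * (A @ sigma)))        # f(sigma) along the trajectory
    return sigma, history
```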

Figure 2: Projected gradient ascent algorithm to optimize (MC-SDP) with A ~ GOE(1000), for k = 5, 10, 15: (a) f(σ) as a function of the iteration number, for a single realization of the trajectory; (b) ‖grad f(σ)‖_F as a function of the iteration number.

Also, the gap between the value obtained by this procedure and the value SDP(A) decreases rapidly with k. Figure 2b shows that the Riemannian gradient decreases very rapidly, but presents some non-monotonicity. We believe these bumps occur when the iterates are close to saddle points.

Figure 3: Geometric properties of the rank-k non-convex SDP, where A ~ GOE(1000). (a) −λ_max(Hess f(σ)) versus (SDP(A) − f(σ))/n². (b) SDP(A) − f(σ*) for different k, where σ* ∈ M_k is a local maximizer, together with the reference curve Rg(A)/(5(k − 1)).

In Figure 3, we examine some geometric properties of the rank-k non-convex SDP. As above, we explore the landscape of this problem by projected gradient ascent. In Figure 3a, we plot the curvature −λ_max(Hess f(σ)) versus the gap from the SDP value, (SDP(A) − f(σ))/n², along the iterations. When f(σ) is far from SDP(A), there is a linear relationship between these two quantities, which is consistent with Theorem 2. In Figure 3b, we plot the gap between SDP(A) and f(σ*) for a local maximizer σ* ∈ M_k produced by projected gradient ascent, for different values of k.

These data are averaged over 10 realizations of the random matrix A. This gap converges to zero as k gets large, and is upper bounded by the curve Rg(A)/(5(k − 1)). This coincides with Theorem 2, which predicts that this gap must be smaller than Rg(A)/(k − 1). Note, however, that in this case Theorem 2 is overly pessimistic, and the gap appears to decrease very rapidly with k.

Figure 4: Cut value found by rounding a local maximizer of the rank-k non-convex SDP, for Erdős–Rényi random graphs with n = 1000 and average degree d = 50. Data are averaged over 10 realizations.

Now we turn to study the MaxCut problem. Note that Theorem 5 gives a guarantee on the approximation ratio for the cut induced by any local maximizer of the rank-k non-convex SDP (k-Ncvx-MC-SDP). In Figure 4, we take the graph to be an Erdős–Rényi graph with n = 1000 and average degree d = 50. We plot the cut value found by rounding the maximizer of the rank-k non-convex SDP, for k from 2 to 10, and also for k = n, which corresponds to (MC-SDP). Surprisingly, the cut value found by solving the rank-k non-convex problem is typically bigger than the cut value found by solving the original SDP. This provides a further reason to adopt the non-convex approach (k-Ncvx-MC-SDP): it appears to provide a significantly tighter relaxation for random instances.

In order to study Z_2 synchronization, we consider the matrix A = (λ/n)uuᵀ + W_n, where W_n ~ GOE(n), for n = 1000. Figure 5a shows the correlation ‖σᵀu‖_2²/n² of a local maximizer σ ∈ M_k produced by projected gradient ascent with the ground truth u. In Figure 5b we construct label estimates û(A) = sign(v_1(σ)), where v_1(σ) is the principal left singular vector of σ ∈ R^{n×k}. We plot the correlation (⟨û, u⟩/n)² as a function of λ. In both cases, results are averaged over 10 realizations of the matrix A. Surprisingly, the resulting correlation is strongly concentrated, despite the fact that gradient ascent converges to a random local maximum σ ∈ M_k.

Finally, we turn to the SO(3) synchronization problem, and study the local maximizers of the Orthogonal-Cut SDP (OC-SDP). We sample a matrix A ~ GOE(300), and find a local maximum of the rank-k non-convex Orthogonal-Cut SDP (k-Ncvx-OC-SDP). In Figure 6 we plot the gap between SDP_o(A) and f(σ*) for a local maximizer σ* produced by projected gradient ascent, for different k. This gap converges to zero as k gets larger, and is upper bounded by Rg(A)/(20(k_d − 1)). This is in agreement with Theorem 9, which predicts that the gap is smaller than Rg(A)/(k_d − 1).
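The rounding used for Figure 5b — take the sign of the principal left singular vector of σ — is a one-liner. Here is a self-contained sketch (our names), together with the squared overlap used as the correlation measure.

```python
import numpy as np

def z2_estimate(sigma):
    """Round a local maximizer sigma (n x k) to labels: sign of the principal
    left singular vector of sigma, as described in the text."""
    U, _, _ = np.linalg.svd(sigma, full_matrices=False)
    u_hat = np.sign(U[:, 0])
    u_hat[u_hat == 0] = 1.0
    return u_hat

def correlation(u_hat, u):
    """Squared normalized overlap (<u_hat, u>/n)^2 with the ground truth."""
    n = len(u)
    return (np.dot(u_hat, u) / n) ** 2
```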

Figure 5: Z_2 synchronization: correlation between estimator and ground truth, ‖σᵀu‖_2²/n² (panel a) and (⟨û, u⟩/n)² (panel b), versus λ, for several values of k including k = n.

Figure 6: SDP_o(A) − f(σ*) for different k, where σ* ∈ M_{o,d,k} is a local maximizer, together with the reference curve Rg(A)/(20(k_d − 1)).

5. Other proofs

5.1. Proof of Theorem 5

Note that problem (13) is equivalent to problem (k-Ncvx-MC-SDP) with matrix A = −A_G. Applying Theorem 2, and noting that the entries of A_G are non-negative, we have, for any local maximizer σ of problem (13) and any optimal solution X* of the SDP (12),

⟨σ, −A_G σ⟩ ≥ ⟨−A_G, X*⟩ − (1/(k − 1))(⟨−A_G, X*⟩ + SDP(A_G)) = ⟨−A_G, X*⟩ − (1/(k − 1))(⟨−A_G, X*⟩ + Σ_{i,j} A_{G,ij}).   (22)

Thus, we have

(1/4) Σ_{i,j} A_{G,ij}(1 − ⟨σ_i, σ_j⟩) = (1/4) Σ_{i,j} A_{G,ij} + (1/4)⟨σ, −A_G σ⟩
≥ (1/4) Σ_{i,j} A_{G,ij} + (1/4)[ ⟨−A_G, X*⟩ − (1/(k − 1))(⟨−A_G, X*⟩ + Σ_{i,j} A_{G,ij}) ]
= (1 − 1/(k − 1)) (1/4) Σ_{i,j} A_{G,ij}(1 − X*_{ij})
= (1 − 1/(k − 1)) SDPCut(G) ≥ (1 − 1/(k − 1)) MaxCut(G).   (23)

Applying the randomized rounding scheme of Goemans and Williamson (1995), we sample a vector u ~ N(0, I_k) and define v ∈ {±1}^n by v_i = sign(⟨σ_i, u⟩); then we obtain

E[ (1/4) Σ_{i,j} A_{G,ij}(1 − v_i v_j) ] ≥ α* (1/4) Σ_{i,j} A_{G,ij}(1 − ⟨σ_i, σ_j⟩) ≥ α*(1 − 1/(k − 1)) MaxCut(G).

Therefore, any local maximizer σ yields an α*(1 − 1/(k − 1))-approximate solution of the MaxCut problem. If σ is an ε = 2Rg(A_G)/(n(k − 1))-approximate concave point, using Theorem 2 and the same argument we can prove that it yields an α*(1 − 2/(k − 1))-approximate solution of the MaxCut problem.

5.2. Proof of Theorem 6

Let A(λ) = (λ/n)uuᵀ + W_n. For any local maximizer σ ∈ Cr(n, k) of the rank-k non-convex MaxCut SDP problem, Theorem 2 gives

f(σ) ≥ SDP(A(λ)) − (1/(k − 1))(SDP(A(λ)) + SDP(−A(λ))).

Therefore,

(λ/n)‖σᵀu‖_2² = f(σ) − ⟨W_n, σσᵀ⟩
≥ SDP((λ/n)uuᵀ + W_n) − (1/(k − 1))[ SDP((λ/n)uuᵀ + W_n) + SDP(−(λ/n)uuᵀ − W_n) ] − ⟨W_n, σσᵀ⟩
≥ SDP((λ/n)uuᵀ + W_n) − (1/(k − 1))[ SDP((λ/n)uuᵀ + W_n) + SDP(−W_n) ] − SDP(W_n).   (24)

Using the convergence of the SDP value proved in (Montanari and Sen, 2016, Theorem 5), for any λ > 1 there exists Δ(λ) > 0 such that, for any δ > 0, the following holds with high probability:

(1/n) SDP(±W_n) ≤ 2 + δ,   and   (1/n) SDP((λ/n)uuᵀ + W_n) ≥ 2 + Δ(λ).

Therefore, we have, with high probability,

(1/n²)‖σᵀu‖_2² ≥ (1/λ)[ (1 − 1/(k − 1))(2 + Δ(λ)) − (2 + δ)/(k − 1) − (2 + δ) ] = (1 − 1/(k − 1)) Δ(λ)/λ − (4 + δ)/(λ(k − 1)) − δ/λ.   (25)

Since Δ(λ) > 0 for λ > 1, there exists k(λ) such that the above expression is greater than ε for sufficiently small ε and δ, which concludes the proof.

5.3. Proof of Theorem 7

We decompose the proof into two parts. In part a), we prove that, almost surely,

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀu‖_2² ≥ 1 − 1/k − 4/λ,

using only the second order optimality condition. In part b), we incorporate the first order optimality condition and prove that, for λ ≥ 12k, we have almost surely

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀu‖_2² ≥ 1 − 16/λ.

Proof PART a) The proof of this part is similar to the proof of Theorem 2. We replace the matrix A by the rescaled expression A' = uuᵀ + Δ, where u ∈ {±1}^n and Δ = (n/λ)W_n. Let g ∈ R^k, g ~ N(0, (1/k)I_k), and W = [P_1 g u_1, ..., P_n g u_n]ᵀ ∈ T_σ M_k, where P_i = I_k − σ_iσ_iᵀ ∈ R^{k×k}. By the second order optimality condition, and a calculation similar to the one in the proof of Theorem 2, we have for any local maximizer σ of the rank-k non-convex SDP problem:

0 ≤ E_g⟨W, (Λ(σ) − A')W⟩ = (1 − 1/k) f(σ) − (1 − 2/k) Σ_{i,j} A'_{ij} u_i u_j − (1/k) Σ_{i,j} A'_{ij} u_i u_j ⟨σ_i, σ_j⟩².

Plugging in the expression for A', we obtain

(1 − 1/k)⟨uuᵀ + Δ, σσᵀ⟩ − (1 − 2/k)(n² + ⟨Δu, u⟩) − (1/k) Σ_{i,j}[ ⟨σ_i, σ_j⟩² + Δ_{ij} u_i u_j ⟨σ_i, σ_j⟩² ] ≥ 0.

Letting X'_{ij} = u_i u_j ⟨σ_i, σ_j⟩², this gives

(1 − 1/k)‖σᵀu‖_2² ≥ (1 − 2/k) n² + (1/k) Σ_{i,j}⟨σ_i, σ_j⟩² − (1 − 1/k)⟨Δσ, σ⟩ + (1 − 2/k)⟨Δu, u⟩ + (1/k)⟨Δ, X'⟩.

Recall that rank(σσᵀ) ≤ k and Tr(σσᵀ) = n. Thus, by Cauchy–Schwarz applied to the at most k non-zero eigenvalues, we get the lower bound

Σ_{i,j}⟨σ_i, σ_j⟩² = ‖σσᵀ‖_F² = Σ_{i=1}^n λ_i(σσᵀ)² ≥ (1/k)( Σ_{i=1}^n λ_i(σσᵀ) )² = (1/k) Tr(σσᵀ)² = n²/k.

Also note that X' is a feasible point of (MC-SDP), so that |⟨Δ, X'⟩| ≤ n‖Δ‖_op; the same bound holds for ⟨Δσ, σ⟩ and ⟨Δu, u⟩. Therefore,

(1 − 1/k)‖σᵀu‖_2² ≥ (1 − 1/k)² n² − (1 − 1/k)⟨Δσ, σ⟩ + (1 − 2/k)⟨Δu, u⟩ + (1/k)⟨Δ, X'⟩ ≥ (1 − 1/k)² n² − 2(1 − 1/k) n ‖Δ‖_op,

which implies that, almost surely,

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀu‖_2² ≥ lim_{n→∞} ( 1 − 1/k − (2/λ)‖W_n‖_op ) = 1 − 1/k − 4/λ,   (26)

where we used the fact that for a GOE matrix W_n we have lim_{n→∞}‖W_n‖_op = 2 almost surely (Anderson et al., 2010).

PART b) In part a) we only used the second order optimality condition. In this part of the proof, we also incorporate the first order optimality condition. Note that for λ < 12k the bound in part a) is better, so in this part we only consider the case λ ≥ 12k. Without loss of generality, let u = 1, the vector with all entries equal to one. Let σ ∈ R^{n×k} be a local maximizer of the rank-k non-convex SDP problem. We remark that the cost function is invariant under right rotations of σ. We can therefore assume that σ = (v_1, ..., v_k), where v_i ∈ R^n and ⟨v_i, v_j⟩ = 0 for i ≠ j (take the SVD decomposition σ = UΣVᵀ and consider UΣ). Let X = σσᵀ and A(λ) = (λ/n)11ᵀ + W_n. For simplicity, we will sometimes omit the dependence on λ and write A = A(λ). We decompose the proof into the following steps.

Step 1: Upper bound on ⟨1, v_j⟩²/n² for j = 2, ..., k, using the first order optimality condition. The first order optimality condition gives Aσ = ddiag(Aσσᵀ)σ, which implies that (Av_i) ∘ v_j = (Av_j) ∘ v_i for any i ≠ j, where u ∘ v denotes the entry-wise product of u and v. Replacing A by its expression gives

((λ/n)11ᵀ + W_n)v_i ∘ v_j = ((λ/n)11ᵀ + W_n)v_j ∘ v_i,

which implies

⟨1, v_i⟩ v_j − ⟨1, v_j⟩ v_i = (n/λ)[ (W_n v_j) ∘ v_i − (W_n v_i) ∘ v_j ].

Taking the norm of this expression and recalling that ⟨v_i, v_j⟩ = 0, we obtain

⟨1, v_i⟩²‖v_j‖_2² + ⟨1, v_j⟩²‖v_i‖_2² ≤ (n²/λ²)[ ‖(W_n v_i) ∘ v_j‖_2 + ‖(W_n v_j) ∘ v_i‖_2 ]².   (27)

Notice that ‖v_j‖_∞ ≤ 1 for all j ∈ [k], hence

⟨1, v_i⟩²‖v_j‖_2² + ⟨1, v_j⟩²‖v_i‖_2² ≤ (n²/λ²)[ ‖W_n v_i‖_2 + ‖W_n v_j‖_2 ]² ≤ (n²/λ²)‖W_n‖_op²[ ‖v_i‖_2 + ‖v_j‖_2 ]² ≤ (2n²/λ²)‖W_n‖_op²( ‖v_i‖_2² + ‖v_j‖_2² ).   (28)

Without loss of generality, assume that ‖v_1‖_2 ≥ ‖v_j‖_2 for j ≥ 2, which implies

⟨1, v_j⟩²‖v_1‖_2² ≤ (4n²/λ²)‖W_n‖_op²‖v_1‖_2², for j ≥ 2.

We deduce the following upper bound, holding almost surely for j = 2, ..., k:

lim sup_{n→∞} sup_{σ ∈ Cr(n,k)} (1/n²)⟨1, v_j⟩² ≤ 16/λ²,   (29)

where we use the fact that for a GOE matrix W_n we have lim_{n→∞}‖W_n‖_op = 2 almost surely.

Step 2: Lower bound on ⟨1, v_1⟩²/n². We combine equations (26) and (29) to get, almost surely,

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)⟨1, v_1⟩² = lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} [ (1/n²)‖σᵀ1‖_2² − (1/n²) Σ_{j=2}^k ⟨1, v_j⟩² ] ≥ 1 − 1/k − 4/λ − 16k/λ².

Since we assumed λ ≥ 12k and k ≥ 2, we obtain, almost surely,

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)⟨1, v_1⟩² ≥ 1/4.   (30)

The second inequality above is loose, but it is sufficient for our purposes.

Step 3: Upper bound on ‖v_a‖_2² for a ∈ {2, ..., k}. In Equation (28), take i = 1 and j = a ∈ {2, ..., k}; since ‖v_1‖_2² + ‖v_a‖_2² ≤ ‖σ‖_F² = n, we have

⟨1, v_1⟩²‖v_a‖_2² + ⟨1, v_a⟩²‖v_1‖_2² ≤ (2n²/λ²)‖W_n‖_op²( ‖v_1‖_2² + ‖v_a‖_2² ) ≤ (2n³/λ²)‖W_n‖_op².   (31)

Combining equations (31) and (30) results in the following upper bound, holding almost surely for any a ∈ {2, ..., k} when λ ≥ 12k:

lim sup_{n→∞} sup_{σ ∈ Cr(n,k)} (1/n)‖v_a‖_2² ≤ 32/λ².   (32)

Step 4: Lower bound on f(σ). By the second order optimality of σ, for any vectors {ξ_i}_{i=1}^n satisfying ⟨σ_i, ξ_i⟩ = 0 we have ⟨ξ, (Λ − A)ξ⟩ ≥ 0, where ξ = [ξ_1, ..., ξ_n]ᵀ and Λ = ddiag(Aσσᵀ). Take ξ_i = e_a − ⟨σ_i, e_a⟩σ_i, where e_a is the a-th canonical basis vector of R^k, a ∈ {2, ..., k}. Noting that σ = (σ_1, ..., σ_n)ᵀ = (v_1, ..., v_k), we have ⟨σ_i, e_a⟩ = v_{a,i}. Therefore,

⟨ξ_i, ξ_j⟩ = 1 − v_{a,i}² − v_{a,j}² + X_{ij} v_{a,i} v_{a,j}.   (33)

Using the second order stationarity condition with this choice of ξ_i, we have

0 ≤ Σ_{i,j}(Λ − A)_{ij}⟨ξ_i, ξ_j⟩ = Σ_{i,j}(Λ − A)_{ij}( 1 − v_{a,i}² − v_{a,j}² + X_{ij} v_{a,i} v_{a,j} ) = Σ_i Λ_{ii}(1 − v_{a,i}²) − Σ_{i,j} A_{ij}( 1 − 2v_{a,i}² + X_{ij} v_{a,i} v_{a,j} ),

which implies

f(σ) = Tr(Λ) ≥ Σ_i Λ_{ii} v_{a,i}² + Σ_{i,j} A_{ij}( 1 − 2v_{a,i}² + X_{ij} v_{a,i} v_{a,j} )
= ⟨1, A1⟩ + Σ_i Λ_{ii} v_{a,i}² − 2 Σ_{i,j} A_{ij} v_{a,i}² + Σ_{i,j} A_{ij} X_{ij} v_{a,i} v_{a,j}
≜ ⟨1, A1⟩ + B_1 + B_2 + B_3.   (34)

Consider the first term B_1. It is easy to see that the second order stationarity condition implies (Λ − A)_{ii} ≥ 0. Thus, we have

B_1 = Σ_i Λ_{ii} v_{a,i}² ≥ Σ_i A_{ii} v_{a,i}² = Σ_i (λ/n + W_{n,ii}) v_{a,i}² ≥ Σ_i W_{n,ii} v_{a,i}² ≥ −max_{i∈[n]} |W_{n,ii}| ‖v_a‖_2² ≥ −‖W_n‖_op ‖v_a‖_2².

Next consider the second term B_2. We have

B_2 = −2⟨1, A(v_a ∘ v_a)⟩ = −2⟨1, ((λ/n)11ᵀ + W_n)(v_a ∘ v_a)⟩ = −2λ‖v_a‖_2² − 2⟨1, W_n(v_a ∘ v_a)⟩ ≥ −2λ‖v_a‖_2² − 2√n ‖W_n‖_op ‖v_a ∘ v_a‖_2 ≥ −2λ‖v_a‖_2² − 2√n ‖W_n‖_op ‖v_a‖_2,

where the last inequality holds because |v_{a,i}| ≤ 1, so that ‖v_a ∘ v_a‖_2 ≤ ‖v_a‖_2.

Finally, consider the last term B_3:

B_3 = ⟨v_a, ( ((λ/n)11ᵀ + W_n) ∘ X ) v_a⟩ = (λ/n)⟨v_a, X v_a⟩ + ⟨v_a, (W_n ∘ X) v_a⟩ ≥ ⟨v_a, (W_n ∘ X) v_a⟩ ≥ −‖W_n ∘ X‖_op ‖v_a‖_2² ≥ −‖W_n‖_op ‖v_a‖_2²,

where the last inequality uses the fact that if X ∈ R^{n×n} lies in the elliptope, then ‖W ∘ X‖_op ≤ ‖W‖_op for any W ∈ R^{n×n}. Here is the justification of this fact. For X in the elliptope, we have X_ii = 1 and X ⪰ 0. For any Z satisfying Z ⪰ 0 and Tr(Z) ≤ 1, the matrix X ∘ Z also satisfies X ∘ Z ⪰ 0 and Tr(X ∘ Z) ≤ 1. Therefore, using the variational representation of the operator norm, we have

‖W ∘ X‖_op = max{ sup_{Z ⪰ 0, Tr(Z) ≤ 1} ⟨W ∘ X, Z⟩, sup_{Z ⪰ 0, Tr(Z) ≤ 1} ⟨−(W ∘ X), Z⟩ }
= max{ sup_{Z ⪰ 0, Tr(Z) ≤ 1} ⟨W, X ∘ Z⟩, sup_{Z ⪰ 0, Tr(Z) ≤ 1} ⟨−W, X ∘ Z⟩ }
≤ max{ sup_{Y ⪰ 0, Tr(Y) ≤ 1} ⟨W, Y⟩, sup_{Y ⪰ 0, Tr(Y) ≤ 1} ⟨−W, Y⟩ } = ‖W‖_op.

Step 5: Finish the proof. Noting that f(σ) = (λ/n)‖σᵀ1‖_2² + ⟨σ, W_n σ⟩ and ⟨1, A1⟩ = nλ + ⟨1, W_n 1⟩, we rewrite Equation (34) as

(1/n²)‖σᵀ1‖_2² ≥ 1 + (1/(λn))( ⟨1, W_n 1⟩ − ⟨σ, W_n σ⟩ ) + (1/(λn))( B_1 + B_2 + B_3 ).

Plugging in the lower bounds on B_1, B_2, B_3, we have, almost surely,

lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀ1‖_2² ≥ lim inf_{n→∞} inf_{σ ∈ Cr(n,k)} { 1 − (2/λ)‖W_n‖_op − (2/n)‖v_a‖_2² − (2/(λn))‖W_n‖_op‖v_a‖_2² − (2/(λ√n))‖W_n‖_op‖v_a‖_2 } ≥ 1 − 4/λ − 12/λ ≥ 1 − 16/λ.

Here we used Equation (32), λ ≥ 12k ≥ 24, and the fact that for a GOE matrix W_n we have lim_{n→∞}‖W_n‖_op = 2 almost surely.

5.4. Proof of Theorem 8

Proof The proof is similar to the proof of Theorem 6, with the GOE matrix W_n replaced by the noise matrix E. Applying Theorem 2 with the matrix Ā_G, similarly to Equation (24), we have

(λ/n)‖σᵀu‖_2² ≥ (1 − 1/(k − 1)) SDP(Ā_G) − (1/(k − 1)) SDP(−E) − SDP(E).   (35)

According to (Montanari and Sen, 2016, Theorem 8), the gap between the SDPs with the two different noise matrices is bounded, with high probability, by a function of the average degree d:

| (1/n) SDP(Ā_G) − (1/n) SDP(A(λ)) | < C log d / d^{1/10}   and   | (1/n) SDP(±E) − (1/n) SDP(±W_n) | < C log d / d^{1/10},

where A(λ) = (λ/n)uuᵀ + W_n corresponds to the Z_2 synchronization model and C = C(λ) is a function of λ, bounded for any fixed λ. According to (Montanari and Sen, 2016, Theorem 5), for any δ > 0 and λ > 1, there exists Δ(λ) > 0 such that, with high probability,

(1/n) SDP(±W_n) ≤ 2 + δ,   and   (1/n) SDP((λ/n)uuᵀ + W_n) ≥ 2 + Δ(λ).   (36)

Combining the above results, we have for any δ > 0, with high probability,

inf_{σ ∈ Cr(n,k)} (1/n²)‖σᵀu‖_2² ≥ (1 − 1/(k − 1)) Δ(λ)/λ − (4 + δ)/(λ(k − 1)) − δ/λ − (2C(λ)/λ) log d / d^{1/10}.

For a sufficiently small ε > 0, taking δ sufficiently small, and then taking d and k sufficiently large, the above expression will be greater than ε, which concludes the proof.

5.5. Proof of Theorem 9

We decompose the proof into three parts. In the first part, we do the calculation for a general non-convex problem. In the second part, we focus on the non-convex problem (k-Ncvx-OC-SDP). In the third part, we prove a claim made in the second part.

Proof PART 1 First, let us consider a general SDP problem. Given a symmetric matrix A ∈ R^{n×n}, symmetric matrices B_1, B_2, ..., B_s ∈ R^{n×n}, and real numbers c_1, ..., c_s ∈ R, we consider the following SDP:

max_{X ∈ R^{n×n}} ⟨A, X⟩ subject to ⟨B_i, X⟩ = c_i, i ∈ [s], X ⪰ 0.   (37)

arxiv: v2 [math.oc] 29 Mar 2017

arxiv: v2 [math.oc] 29 Mar 2017 Solving SDPs for synchronization and MaxCut problems via the Grothendiec inequality Song Mei Theodor Misiaiewicz Andrea Montanari Roberto I. Oliveira arxiv:703.08729v2 [math.oc] 29 Mar 207 March 3, 207

More information

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis

More information

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3 MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

SDP Relaxations for MAXCUT

SDP Relaxations for MAXCUT SDP Relaxations for MAXCUT from Random Hyperplanes to Sum-of-Squares Certificates CATS @ UMD March 3, 2017 Ahmed Abdelkader MAXCUT SDP SOS March 3, 2017 1 / 27 Overview 1 MAXCUT, Hardness and UGC 2 LP

More information

Relaxations and Randomized Methods for Nonconvex QCQPs

Relaxations and Randomized Methods for Nonconvex QCQPs Relaxations and Randomized Methods for Nonconvex QCQPs Alexandre d Aspremont, Stephen Boyd EE392o, Stanford University Autumn, 2003 Introduction While some special classes of nonconvex problems can be

More information

COMMUNITY DETECTION IN SPARSE NETWORKS VIA GROTHENDIECK S INEQUALITY

COMMUNITY DETECTION IN SPARSE NETWORKS VIA GROTHENDIECK S INEQUALITY COMMUNITY DETECTION IN SPARSE NETWORKS VIA GROTHENDIECK S INEQUALITY OLIVIER GUÉDON AND ROMAN VERSHYNIN Abstract. We present a simple and flexible method to prove consistency of semidefinite optimization

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

arxiv: v3 [cs.cv] 3 Sep 2013

arxiv: v3 [cs.cv] 3 Sep 2013 GLOBAL REGISTRATION OF MULTIPLE POINT CLOUDS USING SEMIDEFINITE PROGRAMMING KUNAL N. CHAUDHURY, YUEHAW KHOO, AND AMIT SINGER arxiv:1306.5226v3 [cs.cv] 3 Sep 2013 ABSTRACT. Consider N points in R d and

More information

CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization

CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization April 6, 2018 1 / 34 This material is covered in the textbook, Chapters 9 and 10. Some of the materials are taken from it. Some of

More information

Lift me up but not too high Fast algorithms to solve SDP s with block-diagonal constraints

Lift me up but not too high Fast algorithms to solve SDP s with block-diagonal constraints Lift me up but not too high Fast algorithms to solve SDP s with block-diagonal constraints Nicolas Boumal Université catholique de Louvain (Belgium) IDeAS seminar, May 13 th, 2014, Princeton The Riemannian

More information

arxiv: v1 [math.st] 10 Sep 2015

arxiv: v1 [math.st] 10 Sep 2015 Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees Department of Statistics Yudong Chen Martin J. Wainwright, Department of Electrical Engineering and

More information

Introduction to Semidefinite Programming I: Basic properties a

Introduction to Semidefinite Programming I: Basic properties a Introduction to Semidefinite Programming I: Basic properties and variations on the Goemans-Williamson approximation algorithm for max-cut MFO seminar on Semidefinite Programming May 30, 2010 Semidefinite

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

approximation algorithms I

approximation algorithms I SUM-OF-SQUARES method and approximation algorithms I David Steurer Cornell Cargese Workshop, 201 meta-task encoded as low-degree polynomial in R x example: f(x) = i,j n w ij x i x j 2 given: functions

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

Supplement to A Generalized Least Squares Matrix Decomposition. 1 GPMF & Smoothness: Ω-norm Penalty & Functional Data

Supplement to A Generalized Least Squares Matrix Decomposition. 1 GPMF & Smoothness: Ω-norm Penalty & Functional Data Supplement to A Generalized Least Squares Matrix Decomposition Genevera I. Allen 1, Logan Grosenic 2, & Jonathan Taylor 3 1 Department of Statistics and Electrical and Computer Engineering, Rice University

More information

Fast and Robust Phase Retrieval

Fast and Robust Phase Retrieval Fast and Robust Phase Retrieval Aditya Viswanathan aditya@math.msu.edu CCAM Lunch Seminar Purdue University April 18 2014 0 / 27 Joint work with Yang Wang Mark Iwen Research supported in part by National

More information

Convex relaxation. In example below, we have N = 6, and the cut we are considering

Convex relaxation. In example below, we have N = 6, and the cut we are considering Convex relaxation The art and science of convex relaxation revolves around taking a non-convex problem that you want to solve, and replacing it with a convex problem which you can actually solve the solution

More information

Lecture: Examples of LP, SOCP and SDP

Lecture: Examples of LP, SOCP and SDP 1/34 Lecture: Examples of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:

More information

arxiv: v3 [math.oc] 8 Jan 2019

arxiv: v3 [math.oc] 8 Jan 2019 Why Random Reshuffling Beats Stochastic Gradient Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo arxiv:1510.08560v3 [math.oc] 8 Jan 2019 January 9, 2019 Abstract We analyze the convergence rate

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016 U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest

More information

Recoverabilty Conditions for Rankings Under Partial Information

Recoverabilty Conditions for Rankings Under Partial Information Recoverabilty Conditions for Rankings Under Partial Information Srikanth Jagabathula Devavrat Shah Abstract We consider the problem of exact recovery of a function, defined on the space of permutations

More information

Lecture 8: The Goemans-Williamson MAXCUT algorithm

Lecture 8: The Goemans-Williamson MAXCUT algorithm IU Summer School Lecture 8: The Goemans-Williamson MAXCUT algorithm Lecturer: Igor Gorodezky The Goemans-Williamson algorithm is an approximation algorithm for MAX-CUT based on semidefinite programming.

More information

Convex relaxation. In example below, we have N = 6, and the cut we are considering

Convex relaxation. In example below, we have N = 6, and the cut we are considering Convex relaxation The art and science of convex relaxation revolves around taking a non-convex problem that you want to solve, and replacing it with a convex problem which you can actually solve the solution

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Chapter 26 Semidefinite Programming Zacharias Pitouras 1 Introduction LP place a good lower bound on OPT for NP-hard problems Are there other ways of doing this? Vector programs

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017 Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem

More information

Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions

Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions March 6, 2013 Contents 1 Wea second variation 2 1.1 Formulas for variation........................

More information

On the low-rank approach for semidefinite programs arising in synchronization and community detection

On the low-rank approach for semidefinite programs arising in synchronization and community detection JMLR: Workshop and Conference Proceedings vol 49:1, 016 On the low-rank approach for semidefinite programs arising in synchronization and community detection Afonso S. Bandeira Department of Mathematics,

More information

Porcupine Neural Networks: (Almost) All Local Optima are Global

Porcupine Neural Networks: (Almost) All Local Optima are Global Porcupine Neural Networs: (Almost) All Local Optima are Global Soheil Feizi, Hamid Javadi, Jesse Zhang and David Tse arxiv:1710.0196v1 [stat.ml] 5 Oct 017 Stanford University Abstract Neural networs have

More information

LIMITATION OF LEARNING RANKINGS FROM PARTIAL INFORMATION. By Srikanth Jagabathula Devavrat Shah

LIMITATION OF LEARNING RANKINGS FROM PARTIAL INFORMATION. By Srikanth Jagabathula Devavrat Shah 00 AIM Workshop on Ranking LIMITATION OF LEARNING RANKINGS FROM PARTIAL INFORMATION By Srikanth Jagabathula Devavrat Shah Interest is in recovering distribution over the space of permutations over n elements

More information

Overparametrization for Landscape Design in Non-convex Optimization

Overparametrization for Landscape Design in Non-convex Optimization Overparametrization for Landscape Design in Non-convex Optimization Jason D. Lee University of Southern California September 19, 2018 The State of Non-Convex Optimization Practical observation: Empirically,

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department

More information

How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India

How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India Chi Jin UC Berkeley Michael I. Jordan UC Berkeley Rong Ge Duke Univ. Sham M. Kakade U Washington Nonconvex optimization

More information

L26: Advanced dimensionality reduction

L26: Advanced dimensionality reduction L26: Advanced dimensionality reduction The snapshot CA approach Oriented rincipal Components Analysis Non-linear dimensionality reduction (manifold learning) ISOMA Locally Linear Embedding CSCE 666 attern

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.85J / 8.5J Advanced Algorithms Fall 008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 8.5/6.85 Advanced Algorithms

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial

More information

Proximal and First-Order Methods for Convex Optimization

Proximal and First-Order Methods for Convex Optimization Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Lecture Semidefinite Programming and Graph Partitioning

Lecture Semidefinite Programming and Graph Partitioning Approximation Algorithms and Hardness of Approximation April 16, 013 Lecture 14 Lecturer: Alantha Newman Scribes: Marwa El Halabi 1 Semidefinite Programming and Graph Partitioning In previous lectures,

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Machine Learning. Lecture 6: Support Vector Machine. Feng Li. Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Convex and Semidefinite Programming for Approximation

Convex and Semidefinite Programming for Approximation Convex and Semidefinite Programming for Approximation We have seen linear programming based methods to solve NP-hard problems. One perspective on this is that linear programming is a meta-method since

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Unique Games and Small Set Expansion

Unique Games and Small Set Expansion Proof, beliefs, and algorithms through the lens of sum-of-squares 1 Unique Games and Small Set Expansion The Unique Games Conjecture (UGC) (Khot [2002]) states that for every ɛ > 0 there is some finite

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Backward Error Estimation

Backward Error Estimation Backward Error Estimation S. Chandrasekaran E. Gomez Y. Karant K. E. Schubert Abstract Estimation of unknowns in the presence of noise and uncertainty is an active area of study, because no method handles

More information

8 Approximation Algorithms and Max-Cut

8 Approximation Algorithms and Max-Cut 8 Approximation Algorithms and Max-Cut 8. The Max-Cut problem Unless the widely believed P N P conjecture is false, there is no polynomial algorithm that can solve all instances of an NP-hard problem.

More information

Near-Optimal Algorithms for Maximum Constraint Satisfaction Problems

Near-Optimal Algorithms for Maximum Constraint Satisfaction Problems Near-Optimal Algorithms for Maximum Constraint Satisfaction Problems Moses Charikar Konstantin Makarychev Yury Makarychev Princeton University Abstract In this paper we present approximation algorithms

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

COMP 558 lecture 18 Nov. 15, 2010

COMP 558 lecture 18 Nov. 15, 2010 Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Linear Regression (continued)

Linear Regression (continued) Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

More information

Reweighted Nuclear Norm Minimization with Application to System Identification

Reweighted Nuclear Norm Minimization with Application to System Identification Reweighted Nuclear Norm Minimization with Application to System Identification Karthi Mohan and Maryam Fazel Abstract The matrix ran minimization problem consists of finding a matrix of minimum ran that

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning

Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning David Steurer Cornell Approximation Algorithms and Hardness, Banff, August 2014 for many problems (e.g., all UG-hard ones): better guarantees

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

Lebesgue Measure on R n

Lebesgue Measure on R n CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Throughout these notes we assume V, W are finite dimensional inner product spaces over C.

Throughout these notes we assume V, W are finite dimensional inner product spaces over C. Math 342 - Linear Algebra II Notes Throughout these notes we assume V, W are finite dimensional inner product spaces over C 1 Upper Triangular Representation Proposition: Let T L(V ) There exists an orthonormal

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Lectures in Discrete Differential Geometry 2 Surfaces

Lectures in Discrete Differential Geometry 2 Surfaces Lectures in Discrete Differential Geometry 2 Surfaces Etienne Vouga February 4, 24 Smooth Surfaces in R 3 In this section we will review some properties of smooth surfaces R 3. We will assume that is parameterized

More information

Low-ranksemidefiniteprogrammingforthe. MAX2SAT problem. Bosch Center for Artificial Intelligence

Low-ranksemidefiniteprogrammingforthe. MAX2SAT problem. Bosch Center for Artificial Intelligence Low-ranksemidefiniteprogrammingforthe MAX2SAT problem Po-Wei Wang J. Zico Kolter Machine Learning Department Carnegie Mellon University School of Computer Science, Carnegie Mellon University, and Bosch

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Lecture 9: Low Rank Approximation

Lecture 9: Low Rank Approximation CSE 521: Design and Analysis of Algorithms I Fall 2018 Lecture 9: Low Rank Approximation Lecturer: Shayan Oveis Gharan February 8th Scribe: Jun Qi Disclaimer: These notes have not been subjected to the

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

7 Principal Component Analysis

7 Principal Component Analysis 7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is

More information

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

1 The independent set problem

1 The independent set problem ORF 523 Lecture 11 Spring 2016, Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Tuesday, March 29, 2016 When in doubt on the accuracy of these notes, please cross chec with the instructor

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Inverse Singular Value Problems

Inverse Singular Value Problems Chapter 8 Inverse Singular Value Problems IEP versus ISVP Existence question A continuous approach An iterative method for the IEP An iterative method for the ISVP 139 140 Lecture 8 IEP versus ISVP Inverse

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

Directional Field. Xiao-Ming Fu

Directional Field. Xiao-Ming Fu Directional Field Xiao-Ming Fu Outlines Introduction Discretization Representation Objectives and Constraints Outlines Introduction Discretization Representation Objectives and Constraints Definition Spatially-varying

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information