Convergence Analysis of Gradient Descent for Eigenvector Computation
Zhiqiang Xu, Xin Cao² and Xin Gao
KAUST, Saudi Arabia; ²UNSW, Australia
zhiqiang.xu@kaust.edu.sa, xin.cao@unsw.edu.au, xin.gao@kaust.edu.sa

Abstract

We present a novel, simple and systematic convergence analysis of gradient descent for eigenvector computation. As a popular, practical, and provable approach to numerous machine learning problems, gradient descent has found successful applications to eigenvector computation as well. Surprisingly, however, it lacks a thorough theoretical analysis for the underlying geodesically non-convex problem. In this work, the convergence of the gradient descent solver for the leading eigenvector computation is shown to be at a global rate $O(\min\{(\frac{\lambda_1}{\Delta_p})^2\log\frac{1}{\epsilon},\ \frac{1}{\epsilon}\})$, where $\Delta_p \triangleq \lambda_p - \lambda_{p+1} > 0$ represents the generalized positive eigengap and always exists without loss of generality, with $\lambda_i$ being the $i$-th largest eigenvalue of the given real symmetric matrix and $p$ being the multiplicity of $\lambda_1$. The rate is linear at $O((\frac{\lambda_1}{\Delta_p})^2\log\frac{1}{\epsilon})$ if $(\frac{\lambda_1}{\Delta_p})^2 = O(1)$, and otherwise sub-linear at $O(\frac{1}{\epsilon})$. We also show that the convergence depends only logarithmically, instead of quadratically, on the initial iterate. In particular, this is the first time that linear convergence for the case where the conventionally considered eigengap $\Delta_1 = \lambda_1 - \lambda_2 = 0$ but the generalized eigengap satisfies $(\frac{\lambda_1}{\Delta_p})^2 = O(1)$, as well as the logarithmic dependence on the initial iterate, has been established for the gradient descent solver. We are also the first to leverage for the analysis the log principal angle between the iterate and the space of globally optimal solutions. Theoretical properties are verified in experiments.

1 Introduction

Eigenvector computation is a ubiquitous problem in data processing nowadays, arising in spectral clustering [Ng et al., 2002; Xu and Ke, 2016a], PageRank computation, dimensionality reduction [Fan et al., 2018], and so on.
Classic solvers from numerical algebra are power methods and Lanczos algorithms [Golub and Van Loan, 1996], based on which there has been a recent surge of interest in developing varieties of solvers [Hardt and Price, 2014; Musco and Musco, 2015; Garber et al., 2016; Balcan et al., 2016]. However, most of them are purely theoretic. (Corresponding author: zhiqiangxu200@gmail.com.) In this work, we focus on a general and practical solver [Wen and Yin, 2013], namely gradient descent from the optimization perspective, for the leading eigenvector computation:

$$\min_{x \in \mathbb{R}^n:\ \|x\|_2 = 1} f(x) \triangleq -x^\top A x, \quad (1)$$

where $A \in \mathbb{R}^{n \times n}$ and $A^\top = A$. Gradient descent is the most widely used method in optimization for machine learning [Sra et al., 2011], due to its effectiveness, simplicity, and provable theoretical guarantees. It has also found successful applications to eigenvector computation [Absil et al., 2008; Wen and Yin, 2013; Pitaval et al., 2015; Liu et al., 2016; Zhang et al., 2016; Xu and Ke, 2016b]. In particular, it was demonstrated to have comparable performance and better robustness in comparison to Lanczos algorithms in extensive experimental studies [Wen and Yin, 2013]. Despite being practical, however, its convergence analysis remains unsatisfactory thus far. Pitaval et al. proved its global convergence, but the rate was unknown [Pitaval et al., 2015]. Under a large positive eigengap assumption, a linear but local rate was achieved in [Xu and Ke, 2016b; Liu et al., 2016; Xu et al., 2017], while a gap-free rate $O(\frac{1}{\epsilon^2})$ was given by [Arora et al., 2013; Shamir, 2016a]. Very recently, the rate for the case of an arbitrary eigengap has been improved to $O(\frac{1}{\epsilon})$ [Xu and Gao, 2018]. As shown in our experimental study, however, certain cases of zero eigengap are significantly underestimated, and the quadratic dependence on the initial iterate is not true.
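As a quick numerical sanity check on Problem (1) (our own toy snippet, not from the paper): the optimum value of $f$ over the unit sphere is $-\lambda_1$, attained at a leading eigenvector.

```python
import numpy as np

# Problem (1): min_{||x||_2 = 1} f(x) = -x^T A x for symmetric A.
# The minimum is -lambda_1, attained at a leading eigenvector v_1.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = (B + B.T) / 2                      # a real symmetric matrix (toy example)
w, V = np.linalg.eigh(A)               # eigenvalues in ascending order
lam1, v1 = w[-1], V[:, -1]             # leading eigenpair
assert np.isclose(v1 @ A @ v1, lam1)   # f(v_1) = -lambda_1
x = rng.standard_normal(50)
x /= np.linalg.norm(x)                 # an arbitrary unit vector
assert x @ A @ x <= lam1 + 1e-12       # no unit vector beats v_1
```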
In this paper, we present a systematic convergence analysis of the gradient descent solver for Problem (1) to address all the issues mentioned above, from the viewpoint of Riemannian optimization. The sphere constraint in (1) actually defines a Riemannian manifold, called the sphere manifold, which is a special case of the Stiefel manifold defined by the orthogonality constraint [Absil et al., 2008]. In this sense, Problem (1) is geodesically non-convex. A gradient descent step in Riemannian optimization takes the following form:

$$x_{t+1} = R(x_t, -\alpha_{t+1}\nabla f(x_t)), \quad (2)$$

where $\nabla f(x_t)$ denotes the Riemannian gradient representing the steepest ascent direction in the equidimensional Euclidean
space tangent to the manifold at $x_t$, $\alpha_{t+1} > 0$ represents the step-size, and $R(x_t, \cdot)$ represents the retraction map from the tangent space at $x_t$ to the manifold. In particular, in order to measure the progress of the iterate $x_t$ toward one of the globally optimal solutions, we use for the analysis a novel potential function defined by the log principal angle between $x_t$ and the space of such solutions, rather than only one such solution as in previous work [Shamir, 2015]. It turns out that this potential function boasts the advantage of establishing global convergence and improving the dependence on the initial iterate. We present a novel general analysis that depends on the generalized eigengap $\Delta_p \triangleq \lambda_p - \lambda_{p+1}$, where $\lambda_i$ represents the $i$-th largest eigenvalue of the given matrix $A$ and $p$ is the multiplicity of $\lambda_1$. $\Delta_p$ always remains positive without loss of generality. This unifies all the cases of the conventionally considered eigengap $\Delta_1 = \lambda_1 - \lambda_2$, i.e., $\Delta_1 > 0$ and $\Delta_1 = 0$. When $\Delta_p$ is as small as the target precision parameter $\epsilon$, the resulting theoretic complexity is high. However, another unified analysis, regardless of the value of $\Delta_p$, which we present subsequently, shows that this can be significantly reduced. Specifically, we make the following contributions:

- A global convergence rate $O(\min\{(\frac{\lambda_1}{\Delta_p})^2\log\frac{1}{\epsilon},\ \frac{1}{\epsilon}\})$ of the Riemannian gradient descent solver is achieved for Problem (1). The rate is linear at $O((\frac{\lambda_1}{\Delta_p})^2\log\frac{1}{\epsilon})$ if $(\frac{\lambda_1}{\Delta_p})^2 = O(1)$, and otherwise sub-linear at $O(\frac{1}{\epsilon})$. This shows that even if $\Delta_1 = 0$, it is still possible to converge linearly.
- The quadratic dependence of the convergence on the initial iterate is improved to a logarithmic one.
- Theoretical properties, especially those related to $\Delta_1 = 0$, are empirically verified on synthetic and real data.

2 Related Work

Due to the space limit, we only discuss gradient based solvers and refer readers to the cited references for more of the literature. There are two categories of such solvers: projected gradient descent and Riemannian gradient descent. Arora et al.
proposed the stochastic power method without theoretical guarantees [Arora et al., 2012], which is in fact equivalent to projected stochastic gradient descent for the principal component analysis (PCA) problem. It was named after the power method because projected (deterministic) gradient descent for PCA degenerates to the power method when the step-size goes to infinity. This method was subsequently extended via convex relaxation and proved to have a global, gap-free, and sub-linear convergence rate $O(\frac{1}{\epsilon^2})$ [Arora et al., 2013]. Balsubramani et al. demonstrated that Oja's algorithm converges at a global, gap-dependent, and sub-linear rate $O(\frac{1}{\epsilon})$ via martingale analysis [Balsubramani et al., 2013]. Note that gap dependence refers to the dependence on $\Delta_1$ in most existing analyses. More recently, stochastic gradient descent (SGD) for PCA was shown to converge either at a global, gap-dependent, and sub-linear rate $O(\frac{1}{\epsilon})$ or at a global, gap-free, and sub-linear rate $O(\frac{1}{\epsilon^2})$ [Shamir, 2016a]. Shamir proposed projected SGD with variance reduction for PCA, called VR-PCA, and proved its global or local, gap-dependent, and linear rate $O(\frac{1}{\Delta_1^2}\log\frac{1}{\epsilon})$ [Shamir, 2015; 2016b]. More relevant to our work is an increasing body of Riemannian gradient descent methods. It is worth noting that general Riemannian optimization methods are applicable to Problem (1), including Riemannian gradient descent [Edelman et al., 1999; Absil et al., 2008; Wen and Yin, 2013], Riemannian SGD [Bonnabel, 2013] and Riemannian SVRG [Zhang et al., 2016]. Notably, Wen et al. demonstrated that curvilinear search, which in fact performs Riemannian gradient descent with a Cayley-transformation-based retraction, achieves comparable performance to and is more robust than Lanczos algorithms [Wen and Yin, 2013].
However, the existing analysis achieves, at most, either a global, gap-free, and sub-linear convergence to critical points [Absil et al., 2008; Wen and Yin, 2013; Bonnabel, 2013; Zhang et al., 2016] or a local, gap-dependent, and linear convergence to globally optimal solutions [Absil et al., 2008], due to the geodesic non-convexity. As we know, each eigenvector corresponds to a critical point of the objective function in Problem (1), and thus these analyses fail to meet the global optimality in theory. On the other hand, gradient descent for low-rank approximation, specifically, was proven to converge globally, but without any rate provided, in [Pitaval et al., 2015]. Xu et al. established the local, gap-dependent, and linear rate $O(\frac{1}{\Delta_1^2}\log\frac{1}{\epsilon})$ of the Riemannian SVRG solver [Xu and Ke, 2016b; Xu et al., 2017]. By proving an explicit Łojasiewicz exponent of $\frac{1}{2}$, Liu et al. demonstrated a local, gap-dependent, and linear convergence of Riemannian line-search methods for the quadratic problem under the orthogonality constraint, which includes Problem (1) as a special case [Liu et al., 2016]. In addition, despite its motivation from optimization on Riemannian quotient manifolds, Alecton, an SGD solver for streaming low-rank matrix approximation, ends up with an update like the stochastic power method, and achieves a global, gap-dependent, and sub-linear rate $O(\frac{1}{\Delta_1^2\epsilon})$ via martingale analysis [Sa et al., 2015]. Although gradient based methods for Problem (1) work well in practice [Arora et al., 2012; Wen and Yin, 2013], the related theory is far behind. Despite great interest in developing stochastic solvers, surprisingly, there does not even exist a thorough convergence analysis for the deterministic gradient descent solver such as is achieved in this work, which we believe will be an important basis for developing stochastic versions with better theoretical guarantees than the state-of-the-art.
3 Analysis of Gradient Descent

We now conduct the convergence analysis of the Riemannian gradient descent solver for Problem (1), starting by introducing the necessary notions and notations. The main results are then stated in theorems, followed by a few important supporting lemmas. The section ends with the proofs of the theorem and lemmas.

3.1 Notions and Notations

Recall that given a real symmetric matrix $A \in \mathbb{R}^{n \times n}$, its $i$-th largest eigenvalue is denoted as $\lambda_i$, i.e., $\lambda_1 \geq \cdots \geq \lambda_n$, and
the corresponding eigenvector as $v_i$. Define the eigengaps $\Delta_i \triangleq \lambda_i - \lambda_{i+1}$ and let $p$ be the multiplicity of $\lambda_1$, with the associated eigenspace spanned by $V_p \triangleq [v_1, \ldots, v_p]$. One then has $\Delta_j = 0$ for $j = 1, \ldots, p-1$ and $\Delta_p > 0$ without loss of generality, i.e., $p < n$, because $f(x)$ would be a constant function if $p = n$. Instead of focusing on a specific case of the eigengap such as $\Delta_1 > 0$, we target a general framework admitting all the cases of $\Delta_1$. To this end, we have to rely on $\Delta_p$ and $V_p$, where we follow [Xu and Gao, 2018] in terming $\Delta_p$ the generalized eigengap. That is, it suffices for us to show the convergence to one of the globally optimal solutions, i.e., a unit vector $v \in \mathrm{span}(V_p)$, instead of a specific solution, e.g., $v_1$. We thus define the following potential function:

$$\psi(x_t) \triangleq -2\log\|V_p^\top x_t\|_2, \quad (3)$$

where $\|V_p^\top x_t\|_2 = \cos\theta(x_t, V_p)$ represents the cosine of the principal angle between the current iterate $x_t$ and the space of globally optimal solutions. As $\|V_p^\top x_t\|_2 \leq 1$, it is easy to see that $\psi(x_t) \geq 0$, and $x_t$ converges to a unit vector in $\mathrm{span}(V_p)$ as $\psi(x_t)$ goes to 0. In addition, we take the normalization retraction, i.e., $R(x, \xi) = \frac{x + \xi}{\|x + \xi\|_2}$. Thus, the update in (2) can be explicitly written as

$$x_{t+1} = \frac{x_t - \alpha_{t+1}\nabla f(x_t)}{\|x_t - \alpha_{t+1}\nabla f(x_t)\|_2}, \quad (4)$$

where the Riemannian gradient for Problem (1) is $\nabla f(x) = -(I - xx^\top)Ax$.

3.2 Main Results

Theorem 1. If the initial iterate is $x_0 = \frac{y}{\|y\|_2}$, where the entries of $y$ are i.i.d. standard normal samples, i.e., $y_i \sim \mathcal{N}(0, 1)$, then the Riemannian gradient descent solver for Problem (1) converges to a certain unit vector $v \in \mathrm{span}(V_p)$ at a global rate $O(\min\{(\frac{\lambda_1}{\Delta_p})^2\log\frac{1}{\epsilon},\ \frac{1}{\epsilon}\})$ with high probability. Specifically, for any $\epsilon \in (0, 1)$:

(I) the solver with constant step-sizes $\alpha < \frac{\Delta_p}{2\lambda_1^2(1 + \alpha\Delta_p)}$ will converge, i.e., $\psi(x_T) < \epsilon$, after $T = O((\frac{\lambda_1}{\Delta_p})^2\log\frac{\psi(x_0)}{\epsilon})$ iterations, with probability at least $1 - \nu^p$, where $\nu = 0$ if $p = 1$ and otherwise $\nu \in (0, 1)$;

(II) the solver with diminishing step-sizes $\alpha_t = \frac{c}{\tau + t}$ for sufficiently large constants $c, \tau > 0$ will converge, i.e., $\psi(x_T) < \epsilon$, after $T = O(\frac{1}{\epsilon})$ iterations, with probability at least $1 - \nu^p$, where $\nu = 0$ if $p = 1$ and otherwise $\nu \in (0, 1)$.
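For intuition, the potential (3) and the explicit update (4) take only a few lines to implement. The sketch below is our own illustration, not the authors' code; the matrix, sizes, and step-size used in the usage note underneath are our own choices, with a practical step-size rather than the conservative bound of Theorem 1.

```python
import numpy as np

def psi(x, V_p):
    # Potential (3): psi(x) = -2 log ||V_p^T x||_2, where ||V_p^T x||_2 is the
    # cosine of the principal angle between x and span(V_p).
    return -2.0 * np.log(np.linalg.norm(V_p.T @ x))

def rgd(A, x, alpha, T):
    # Update (4): a Riemannian gradient step followed by the normalization
    # retraction, with grad f(x) = -(I - x x^T) A x for f(x) = -x^T A x.
    for _ in range(T):
        Ax = A @ x
        y = x + alpha * (Ax - (x @ Ax) * x)   # x_t - alpha * grad f(x_t)
        x = y / np.linalg.norm(y)
    return x
```

On a symmetric matrix with a positive (generalized) eigengap and `V_p` spanning the leading eigenspace, `psi(rgd(A, x0, alpha, T), V_p)` decreases toward 0, matching the guarantee $\psi(x_T) < \epsilon$ in Theorem 1.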
As the convergence holds with high probability for any random initial iterate, it is global by convention. In contrast, the success probability given in [Shamir, 2016a] is $\Omega(\frac{1}{n})$. To prove the theorem, we need a few important lemmas, whose proofs are deferred until after that of Theorem 1.

Lemma 2. $\lambda_1 - x^\top A x \geq \Delta_p\sin^2\theta(x, V_p)$.

Lemma 3. $\|x_t - \alpha_{t+1}\nabla f(x_t)\|_2^2 \leq 1 + \alpha_{t+1}^2(\lambda_1^2 + \lambda_2^2)\sin^2\theta(x_t, V_p)$.

Lemma 4. $x \leq \log\frac{1}{1-x} \leq \frac{x}{1-x}$ for any $x \in (0, 1)$, and $\log(1+x) \geq \frac{x}{1+x}$ for any $x > -1$.

Lemma 5. $\|V_p^\top x_0\|_2 > 0$ with probability 1 if $p = 1$, and otherwise with probability at least $1 - \nu^p$, where $0 < \nu < 1$ is a constant depending on $V_p^\top y$.

Proof of Theorem 1.

Proof. (I) We start by expanding $\psi(x_{t+1})$ using (3)-(4):

$$\psi(x_{t+1}) = -2\log\|V_p^\top x_{t+1}\|_2 = -2\log\|V_p^\top(x_t - \alpha_{t+1}\nabla f(x_t))\|_2 + 2\log\|x_t - \alpha_{t+1}\nabla f(x_t)\|_2.$$

To proceed, we need the full eigen-decomposition of $A$, i.e.,

$$A = \lambda_1 V_p V_p^\top + V_p^\perp\,\mathrm{diag}(\lambda_{p+1}, \ldots, \lambda_n)\,(V_p^\perp)^\top, \quad (5)$$

where $V_p^\perp$ represents the orthogonal complement of $V_p$. Plugging this decomposition into $V_p^\top A x_t$, one gets

$$V_p^\top\nabla f(x_t) = V_p^\top(x_t x_t^\top - I)Ax_t = (x_t^\top A x_t)V_p^\top x_t - V_p^\top A x_t = -(\lambda_1 - x_t^\top A x_t)V_p^\top x_t.$$

We now can write

$$\psi(x_{t+1}) = \psi(x_t) - 2\log(1 + \alpha_{t+1}(\lambda_1 - x_t^\top A x_t)) + 2\log\|x_t - \alpha_{t+1}\nabla f(x_t)\|_2,$$

where we have

$$-2\log(1 + \alpha_{t+1}(\lambda_1 - x_t^\top A x_t)) \leq -2\log(1 + \alpha_{t+1}\Delta_p\sin^2\theta(x_t, V_p)) \quad \text{(by Lemma 2)}$$
$$\leq -\frac{2\alpha_{t+1}\Delta_p\sin^2\theta(x_t, V_p)}{1 + \alpha_{t+1}\Delta_p\sin^2\theta(x_t, V_p)} \leq -\frac{2\alpha_{t+1}\Delta_p}{1 + \alpha_{t+1}\Delta_p}\sin^2\theta(x_t, V_p) \quad \text{(by Lemma 4)}$$

and

$$2\log\|x_t - \alpha_{t+1}\nabla f(x_t)\|_2 \leq \log(1 + \alpha_{t+1}^2(\lambda_1^2 + \lambda_2^2)\sin^2\theta(x_t, V_p)) \quad \text{(by Lemma 3)}$$
$$\leq \alpha_{t+1}^2(\lambda_1^2 + \lambda_2^2)\sin^2\theta(x_t, V_p) \leq 2\alpha_{t+1}^2\lambda_1^2\sin^2\theta(x_t, V_p). \quad \text{(by Lemma 4)}$$

One thus arrives at

$$\psi(x_{t+1}) \leq \psi(x_t) - \left(\frac{2\alpha_{t+1}\Delta_p}{1 + \alpha_{t+1}\Delta_p} - 2\alpha_{t+1}^2\lambda_1^2\right)\sin^2\theta(x_t, V_p) \quad (6)$$
$$= \left(1 - \left(\frac{2\alpha_{t+1}\Delta_p}{1 + \alpha_{t+1}\Delta_p} - 2\alpha_{t+1}^2\lambda_1^2\right)\frac{\sin^2\theta(x_t, V_p)}{\psi(x_t)}\right)\psi(x_t).$$
Note that, by Lemma 4,

$$\sin^2\theta(x_t, V_p) \leq \psi(x_t) = \log\frac{1}{1 - \sin^2\theta(x_t, V_p)} \leq \frac{\sin^2\theta(x_t, V_p)}{1 - \sin^2\theta(x_t, V_p)}, \quad \text{hence} \quad \sin^2\theta(x_t, V_p) \geq \frac{\psi(x_t)}{1 + \psi(x_t)}. \quad (7)$$

If $\frac{\Delta_p}{1 + \alpha_{t+1}\Delta_p} - 2\alpha_{t+1}\lambda_1^2 > 0$, i.e., $\alpha_{t+1} < \frac{\Delta_p}{2\lambda_1^2(1 + \alpha_{t+1}\Delta_p)}$, then $\psi(x_{t+1}) < \psi(x_t)$ and

$$\psi(x_{t+1}) \leq \left(1 - \frac{\alpha_{t+1}\Delta_p}{(1 + \alpha_{t+1}\Delta_p)(1 + \psi(x_t))}\right)\psi(x_t) \quad (8)$$

if $\alpha_{t+1} = \alpha$ is constant. We thus have

$$\psi(x_t) \leq \left(1 - \frac{\alpha\Delta_p}{(1 + \alpha\Delta_p)(1 + \psi(x_0))}\right)^t\psi(x_0) \leq \exp\left\{-\frac{t\alpha\Delta_p}{(1 + \alpha\Delta_p)(1 + \psi(x_0))}\right\}\psi(x_0) \triangleq \Xi. \quad (9)$$

If $\Xi < \epsilon$, i.e.,

$$\psi(x_0) < +\infty \quad (10)$$

and

$$T > \frac{(1 + \alpha\Delta_p)(1 + \psi(x_0))}{\alpha\Delta_p}\log\frac{\psi(x_0)}{\epsilon}, \quad (11)$$

we must have $\psi(x_T) < \epsilon$. Plugging $\alpha < \frac{\Delta_p}{2\lambda_1^2(1 + \alpha\Delta_p)}$ into $T$, we can write

$$T = O\left(\frac{1}{\alpha\Delta_p}\log\frac{\psi(x_0)}{\epsilon}\right) = O\left(\left(\frac{\lambda_1}{\Delta_p}\right)^2\log\frac{\psi(x_0)}{\epsilon}\right). \quad (12)$$

Finally, note that (10) is equivalent to $\|V_p^\top x_0\|_2 > 0$, which occurs with probability 1 if $p = 1$, and otherwise with probability at least $1 - \nu^p$ for a certain constant $\nu \in (0, 1)$, by Lemma 5.

(II) We now restart from (6). Plugging (7) into (6), we can write

$$\psi(x_{t+1}) \leq \psi(x_t) - \frac{2\alpha_{t+1}\Delta_p}{1 + \alpha_{t+1}\Delta_p}\cdot\frac{\psi(x_t)}{1 + \psi(x_t)} + 2\lambda_1^2\alpha_{t+1}^2.$$

Let $\alpha_{t+1} = \frac{c}{\tau + t}$, where $c, \tau > 0$ are constants. It is easy to see that for sufficiently small $\alpha_{t+1}$, equivalently, sufficiently large $\tau$, we have $\psi(x_{t+1}) < \psi(x_t)$. Thus, one gets

$$\psi(x_{t+1}) \leq \left(1 - \frac{2\alpha_{t+1}\Delta_p}{(1 + \frac{c\Delta_p}{\tau})(1 + \psi(x_0))}\right)\psi(x_t) + 2\lambda_1^2\alpha_{t+1}^2.$$

Denoting $a = \frac{2c\Delta_p}{(1 + \frac{c\Delta_p}{\tau})(1 + \psi(x_0))}$ and $b = 2c^2\lambda_1^2$, one can write

$$\psi(x_{t+1}) \leq \left(1 - \frac{a}{\tau + t}\right)\psi(x_t) + \frac{b}{(\tau + t)^2}.$$

As long as $\psi(x_0) < +\infty$ and $a > 1$, by Lemma D.1 in [Balsubramani et al., 2013], the recursion of the above inequality yields

$$\psi(x_t) \leq \left(\frac{\tau + 1}{t + \tau + 1}\right)^a\psi(x_0) + \frac{2^{a+1}b}{a - 1}\cdot\frac{1}{t + \tau + 1}.$$

We thus have $\psi(x_T) = O(\frac{1}{T})$, i.e., $T = O(\frac{1}{\epsilon})$. The proof is completed by noting that $\psi(x_0) < +\infty$ occurs with high probability, as before, and $a > 1$ can be guaranteed by choosing sufficiently large constants $c, \tau$.

Remark. We make a few remarks on our proof. Equation (8) shows that the convergence is at least linear, because $\psi(x_{t+1}) < \psi(x_t)$ and thus the contraction factor keeps decreasing for constant step-sizes. In Equation (9), $\alpha_{t+1}$ is not necessarily constant; in fact, the results hold for any step-size scheme as long as $\alpha_{t+1} < \frac{\Delta_p}{2\lambda_1^2(1 + \alpha_{t+1}\Delta_p)}$. This provides theoretical support for flexibly choosing step-size schemes.
Equation (11) shows that larger step-sizes within the safe range lead to faster convergence. Moreover, we can see from Equation (12) that the convergence actually depends on the relative generalized eigengap $\frac{\Delta_p}{\lambda_1}$. This seems to be shown explicitly for the first time for the considered solver. As mentioned in Section 1, the proof provides two general analysis frameworks. The generality lies in the fact that both rates hold for any value of the positive $\Delta_p$. The combination of the results from the two kinds of analysis comprehensively portrays the convergence behavior of the solver. For both kinds of analysis, the convergence has only logarithmic dependence on the initial iterate, via $\psi(x_0) = -2\log\|V_p^\top x_0\|_2$.

Proof of Lemma 2.

Proof. Plugging (5) into $x^\top A x$, we get

$$x^\top A x = \lambda_1\|V_p^\top x\|_2^2 + x^\top V_p^\perp\,\mathrm{diag}(\lambda_{p+1}, \ldots, \lambda_n)\,(V_p^\perp)^\top x \leq \lambda_1\|V_p^\top x\|_2^2 + \lambda_{p+1}\|(V_p^\perp)^\top x\|_2^2 = \lambda_1\cos^2\theta(x, V_p) + \lambda_{p+1}\sin^2\theta(x, V_p).$$

Thus, one arrives at

$$\lambda_1 - x^\top A x \geq \lambda_1 - \lambda_1\cos^2\theta(x, V_p) - \lambda_{p+1}\sin^2\theta(x, V_p) = (\lambda_1 - \lambda_{p+1})\sin^2\theta(x, V_p) = \Delta_p\sin^2\theta(x, V_p).$$
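Lemma 2 is easy to spot-check numerically; the following is our own illustrative check (the matrix, sizes, and seed are arbitrary choices, not from the paper):

```python
import numpy as np

# Spot-check of Lemma 2: lambda_1 - x^T A x >= Delta_p * sin^2 theta(x, V_p)
# for unit vectors x, where lambda_1 has multiplicity p.
rng = np.random.default_rng(1)
n, p = 30, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = np.concatenate([np.full(p, 2.0), rng.uniform(0.0, 1.0, n - p)])
A = Q @ np.diag(evals) @ Q.T
V_p = Q[:, :p]                          # leading eigenspace basis
lam1 = 2.0
gap = lam1 - evals[p:].max()            # generalized eigengap Delta_p
for _ in range(100):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    sin2 = 1.0 - np.linalg.norm(V_p.T @ x) ** 2   # sin^2 of the principal angle
    assert lam1 - x @ A @ x >= gap * sin2 - 1e-10
```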
Proof of Lemma 3.

Proof. Since $x_t^\top\nabla f(x_t) = -x_t^\top(I - x_tx_t^\top)Ax_t = 0$,

$$\|x_t - \alpha_{t+1}\nabla f(x_t)\|_2^2 = (x_t - \alpha_{t+1}\nabla f(x_t))^\top(x_t - \alpha_{t+1}\nabla f(x_t)) = 1 + \alpha_{t+1}^2\|\nabla f(x_t)\|_2^2.$$

Plugging in the full eigen-decomposition of $A$ in the form

$$A = \lambda_1 vv^\top + v^\perp\,\mathrm{diag}(\lambda_2, \ldots, \lambda_n)\,(v^\perp)^\top,$$

valid for any unit vector $v \in \mathrm{span}(V_p)$ (where, without loss of generality, $A \succeq 0$, since shifting $A$ by a multiple of $I$ leaves $\nabla f$ unchanged), we get

$$\|\nabla f(x)\|_2^2 = \|Ax\|_2^2 - (x^\top A x)^2 \leq \lambda_1^2(v^\top x)^2 + \lambda_2^2(1 - (v^\top x)^2) - \lambda_1^2(v^\top x)^4 \leq (\lambda_1^2 + \lambda_2^2)(1 - (v^\top x)^2).$$

Since the above inequality holds for any unit vector $v \in \mathrm{span}(V_p)$, we have

$$\|\nabla f(x_t)\|_2^2 \leq (\lambda_1^2 + \lambda_2^2)\min_{v \in \mathrm{span}(V_p),\ \|v\|_2 = 1}(1 - (v^\top x_t)^2).$$

By the definition of principal angles [Golub and Van Loan, 1996],

$$\|V_p^\top x\|_2^2 = \max_{v \in \mathrm{span}(V_p),\ \|v\|_2 = 1}(v^\top x)^2,$$

so one has $\|\nabla f(x_t)\|_2^2 \leq (\lambda_1^2 + \lambda_2^2)\sin^2\theta(x_t, V_p)$.

Proof of Lemma 4.

Proof. For any $x$, it holds that $1 + x \leq e^x$. Then for any $x > -1$, $\log(1 + x) \leq x$. If one lets $y = 1 + x$ in this inequality, then $\log y \leq y - 1$. Further letting $y = \frac{1}{z}$ yields $\log z \geq \frac{z - 1}{z}$. Last, setting $z = 1 + x$ gives us $\log(1 + x) \geq \frac{x}{1 + x}$.

Note that $\log\frac{1}{1 - x} = \sum_{i \geq 0}\frac{x^{i+1}}{i+1}$ for $|x| < 1$. One then can write, for $x \in (0, 1)$,

$$x \leq \log\frac{1}{1 - x} = \sum_{i \geq 0}\frac{x^{i+1}}{i+1} \leq \sum_{i \geq 0}x^{i+1} = \frac{x}{1 - x}.$$

Proof of Lemma 5.

Proof. Let $\sigma_{\min}(\cdot)$ and $\sigma_{\max}(\cdot)$ be the smallest and largest singular values of a matrix, respectively. We then have

$$\|V_p^\top x_0\|_2 = \frac{\|V_p^\top y\|_2}{\|y\|_2} \geq \frac{\sigma_{\min}(V_p^\top y)}{\sigma_{\max}(y)},$$

where $\sigma_{\max}(y) > 0$ almost surely. Since the entries of $y$ are i.i.d. samples from $\mathcal{N}(0, 1)$ and $V_p$ is orthonormal, the entries of $V_p^\top y$ are i.i.d. standard normal samples as well. If $p = 1$, by Equation (3.2) in [Rudelson and Vershynin, 2010], $\sigma_{\min}(V_p^\top y) > 0$ almost surely (i.e., with probability 1). Otherwise, by Theorem 3.3 in [Rudelson and Vershynin, 2010], we have $\sigma_{\min}(V_p^\top y) > 0$ with probability at least $1 - \nu^p$, where $0 < \nu < 1$ is a constant depending on $V_p^\top y$.

4 Experiments

Theoretical properties of the considered solver were already observed and verified empirically in, e.g., [Wen and Yin, 2013; Liu et al., 2016], albeit without thorough theoretical support. For the sake of completeness, we reproduce a few experiments on both synthetic and real data here, with new testing on matrices of zero eigengap.
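Before the detailed setup, here is a minimal zero-eigengap illustration in this spirit (our own sketch with hypothetical sizes, not the paper's experiments): give $\lambda_1$ multiplicity $p = 3$, so that $\Delta_1 = 0$ while $\Delta_p > 0$, and observe that the iterate still aligns with $\mathrm{span}(V_p)$ quickly.

```python
import numpy as np

# A = U Sigma U^T with Sigma = diag(1, 1, 1, |g_1|/n, ..., |g_{n-3}|/n):
# Delta_1 = 0 but the generalized eigengap Delta_p is large (p = 3).
rng = np.random.default_rng(0)
n, p = 200, 3
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
tail = np.abs(rng.standard_normal(n - p)) / n
A = U @ np.diag(np.concatenate([np.ones(p), tail])) @ U.T
V_p = U[:, :p]

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(200):                        # constant step-size iterations of (4)
    Ax = A @ x
    y = x + 0.5 * (Ax - (x @ Ax) * x)
    x = y / np.linalg.norm(y)

sin2_Vp = 1.0 - np.linalg.norm(V_p.T @ x) ** 2   # angle to span(V_p): -> 0
sin2_v1 = 1.0 - (U[:, 0] @ x) ** 2               # angle to v_1 alone: need not -> 0
assert sin2_Vp < 1e-8
```

Which leading eigenvector $x_t$ lands on depends on the initialization, so `sin2_v1` generally stays bounded away from 0 even as `sin2_Vp` vanishes; this is exactly why the potential must be measured against $V_p$ rather than $v_1$ in the zero-eigengap case.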
All the ground truth information, e.g., $v_1$ or $V_p$, is obtained by MATLAB's eigs function for the purpose of benchmarking. The advantage of using synthetic data mainly lies in the control of the eigengap. We set $n = 1000$ and follow [Shamir, 2015] to generate data using the full eigenvalue decomposition $A = U\Sigma U^\top$. $U$ is an orthogonal matrix and is set in the same way as $x_0$. $\Sigma = \mathrm{diag}(\Sigma_1, \Sigma_2)$, where $\Sigma_2 = \mathrm{diag}(\frac{|g_1|}{n}, \ldots, \frac{|g_{n-r}|}{n})$, with $g_i \sim \mathcal{N}(0, 1)$ and $r$ being the order of $\Sigma_1$. In addition, the following two settings are considered:

- $p = 1$: $\Sigma_1 = \mathrm{diag}(1, 1 - \eta, 1 - 1.1\eta, 1 - 1.2\eta, 1 - 1.3\eta, 1 - 1.4\eta)$, and then $\Delta_p = \eta > 0$, where $\eta \in \{0.2, 0.5, 0.8\}$;
- $p = 3$: $\Sigma_1 = \mathrm{diag}(1, 1, 1)$, and then $\Delta_p = 1 - \max_i\frac{|g_i|}{n} > 0$.

For the conventionally considered case that $\Delta_1 > 0$, two types of convergence curves, in terms of the relative function error $\frac{f(x_t) - f(v_1)}{|f(v_1)|}$ and the commonly used potential $\sin^2\theta(x_t, v_1)$, respectively, are shown in Figure 2 for different eigengap values. Global linear convergence of the solver is observed, and a larger eigengap yields faster convergence, which agrees with the theory. The convergence trends remain largely consistent between the two types of curves, which holds in the rest of the experiments as well.

[Figure 2. Synthetic data, $\Delta_1 > 0$: (a) relative function error; (b) potential $\sin^2\theta(x_t, v_1)$; curves for $\eta \in \{0.2, 0.5, 0.8\}$.]

For the case that $\Delta_1 = 0$ and $\alpha_{t+1} = \frac{c}{\tau + t}$ (see the Remark), different pairs of step-size parameters are tested in Figure 1. As described by Theorem 1, the convergence reported in Figures 1(a)-(b) is globally linear, rather than sub-linear as given by [Xu and Gao, 2018], and larger step-sizes result in faster convergence as well. As it is unknown which leading eigenvector $x_t$ will finally converge to, it is necessary to change the potential from the commonly used $\sin^2\theta(x_t, v_1)$ to $\sin^2\theta(x_t, V_p)$ in this case. As demonstrated in Figure 1(c), $\sin^2\theta(x_t, v_1)$ is unable to converge, which implies that $x_t$ has converged to another leading eigenvector in $\mathrm{span}(V_p)$.

[Figure 1. Synthetic data, $\Delta_1 = 0$: (a) relative function error; (b) potential $\sin^2\theta(x_t, V_p)$; (c) improper potential $\sin^2\theta(x_t, v_1)$; curves for three settings of the step-size parameter $c$.]

We run two implementations of the solver on a real symmetric matrix, named Schenk, which is of size 10,728 × 10,728 with 85,000 nonzero entries. Each implementation is characterized by the combination of the retraction and step-size scheme used. Curvilinear search, proposed in [Wen and Yin, 2013], uses the Cayley transform + Barzilai-Borwein step-size; the other implementation uses normalization + constant step-size. They were fed the same random initial iterate. The results are reported in Figure 3, which reflects the global linear convergence as well, and meanwhile shows that different step-size schemes matter in practice.

[Figure 3. Real data (Schenk): (a) relative function error; (b) potential $\sin^2\theta(x_t, v_1)$; Cayley + Barzilai-Borwein vs. normalization + constant step-size.]

5 Conclusion

We presented a simple yet comprehensive convergence analysis of the Riemannian gradient descent solver for the leading eigenvector computation. Two kinds of general analysis jointly established the true global rate of convergence to one of the leading eigenvectors. The generalized eigengap $\Delta_p$ eliminates the limitation of the commonly considered eigengap $\Delta_1$. When $\Delta_p$ is large, the convergence is linear, as described by the first general analysis. If it is as small as $\epsilon$, the second general analysis shows that the convergence is sub-linear at $O(\frac{1}{\epsilon})$ rather than $O(\frac{1}{\epsilon^2})$.
It was also shown that the convergence depends only logarithmically on the initial iterate. The key to these breakthroughs for the considered solver is to use for the analysis the log principal angle between the iterates and the space of the leading eigenvectors. Along this line of work, there are a few interesting future research directions. For example, it is unknown whether the results can be extended to $k > 1$ for the top-
$k$ eigenspace computation without deflation. In particular, it would be more attractive to practitioners if the analysis could be translated to the case of the stochastic gradient descent solver, which is well worth further investigation. Also, although the rate is optimal for the considered solver, it would be of great interest to develop faster solvers of this type to match the optimal rate for the problem, e.g., that of Lanczos algorithms.

Acknowledgements

This research is supported in part by funding from King Abdullah University of Science and Technology (KAUST).

References

[Absil et al., 2008] P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.

[Arora et al., 2012] Raman Arora, Andrew Cotter, Karen Livescu, and Nathan Srebro. Stochastic optimization for PCA and PLS. In 50th Annual Allerton Conference on Communication, Control, and Computing, 2012.

[Arora et al., 2013] Raman Arora, Andrew Cotter, and Nati Srebro. Stochastic optimization of PCA with capped MSG. In NIPS, 2013.

[Balcan et al., 2016] Maria-Florina Balcan, Simon Shaolei Du, Yining Wang, and Adams Wei Yu. An improved gap-dependency analysis of the noisy power method. In COLT, 2016.

[Balsubramani et al., 2013] Akshay Balsubramani, Sanjoy Dasgupta, and Yoav Freund. The fast convergence of incremental PCA. In NIPS, 2013.

[Bonnabel, 2013] Silvere Bonnabel. Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Automat. Contr., 58(9), 2013.

[Edelman et al., 1999] Alan Edelman, Tomás A. Arias, and Steven T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2), 1999.

[Fan et al., 2018] Jianqing Fan, Qiang Sun, Wen-Xin Zhou, and Ziwei Zhu. Principal component analysis for big data. arXiv preprint, 2018.

[Garber et al., 2016] Dan Garber, Elad Hazan, Chi Jin, Sham M.
Kakade, Cameron Musco, Praneeth Netrapalli, and Aaron Sidford. Faster eigenvector computation via shift-and-invert preconditioning. In ICML, 2016.

[Golub and Van Loan, 1996] Gene H. Golub and Charles F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996.

[Hardt and Price, 2014] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. In NIPS, 2014.

[Liu et al., 2016] Huikang Liu, Weijie Wu, and Anthony Man-Cho So. Quadratic optimization with orthogonality constraints: Explicit Łojasiewicz exponent and linear convergence of line-search methods. In ICML, 2016.

[Musco and Musco, 2015] Cameron Musco and Christopher Musco. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In NIPS, 2015.

[Ng et al., 2002] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.

[Pitaval et al., 2015] Renaud-Alexandre Pitaval, Wei Dai, and Olav Tirkkonen. Convergence of gradient descent for low-rank matrix approximation. IEEE Trans. Information Theory, 61(8), 2015.

[Rudelson and Vershynin, 2010] Mark Rudelson and Roman Vershynin. Non-asymptotic theory of random matrices: extreme singular values. arXiv preprint, 2010.

[Sa et al., 2015] Christopher De Sa, Christopher Re, and Kunle Olukotun. Global convergence of stochastic gradient descent for some non-convex matrix problems. In ICML, 2015.

[Shamir, 2015] Ohad Shamir. A stochastic PCA and SVD algorithm with an exponential convergence rate. In ICML, 2015.

[Shamir, 2016a] Ohad Shamir. Convergence of stochastic gradient descent for PCA. In ICML, 2016.

[Shamir, 2016b] Ohad Shamir. Fast stochastic algorithms for SVD and PCA: convergence properties and convexity. In ICML, 2016.

[Sra et al., 2011] Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright. Optimization for Machine Learning. The MIT Press, 2011.
[Wen and Yin, 2013] Zaiwen Wen and Wotao Yin. A feasible method for optimization with orthogonality constraints. Math. Program., 142(1-2), 2013.

[Xu and Gao, 2018] Zhiqiang Xu and Xin Gao. On truly block eigensolvers via Riemannian optimization. In AISTATS, 2018.

[Xu and Ke, 2016a] Zhiqiang Xu and Yiping Ke. Effective and efficient spectral clustering on text and link data. In CIKM, 2016.

[Xu and Ke, 2016b] Zhiqiang Xu and Yiping Ke. Stochastic variance reduced Riemannian eigensolver. CoRR, 2016.

[Xu et al., 2017] Zhiqiang Xu, Yiping Ke, and Xin Gao. A fast algorithm for matrix eigen-decomposition. In UAI, 2017.

[Zhang et al., 2016] Hongyi Zhang, Sashank J. Reddi, and Suvrit Sra. Riemannian SVRG: fast stochastic optimization on Riemannian manifolds. In NIPS, 2016.
More informationFast Angular Synchronization for Phase Retrieval via Incomplete Information
Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department
More informationStreaming Principal Component Analysis in Noisy Settings
Streaming Principal Component Analysis in Noisy Settings Teodor V. Marinov * 1 Poorya Mianjy * 1 Raman Arora 1 Abstract We study streaming algorithms for principal component analysis (PCA) in noisy settings.
More informationKey words. conjugate gradients, normwise backward error, incremental norm estimation.
Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322
More informationExponentially convergent stochastic k-pca without variance reduction
arxiv:1904.01750v1 [cs.lg 3 Apr 2019 Exponentially convergent stochastic k-pca without variance reduction Cheng Tang Amazon AI Abstract We present Matrix Krasulina, an algorithm for online k-pca, by generalizing
More informationSparse and low-rank decomposition for big data systems via smoothed Riemannian optimization
Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Yuanming Shi ShanghaiTech University, Shanghai, China shiym@shanghaitech.edu.cn Bamdev Mishra Amazon Development
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationA Tour of the Lanczos Algorithm and its Convergence Guarantees through the Decades
A Tour of the Lanczos Algorithm and its Convergence Guarantees through the Decades Qiaochu Yuan Department of Mathematics UC Berkeley Joint work with Prof. Ming Gu, Bo Li April 17, 2018 Qiaochu Yuan Tour
More informationSelf-Tuning Spectral Clustering
Self-Tuning Spectral Clustering Lihi Zelnik-Manor Pietro Perona Department of Electrical Engineering Department of Electrical Engineering California Institute of Technology California Institute of Technology
More informationSupplementary Materials for Riemannian Pursuit for Big Matrix Recovery
Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery Mingkui Tan, School of omputer Science, The University of Adelaide, Australia Ivor W. Tsang, IS, University of Technology Sydney,
More informationONP-MF: An Orthogonal Nonnegative Matrix Factorization Algorithm with Application to Clustering
ONP-MF: An Orthogonal Nonnegative Matrix Factorization Algorithm with Application to Clustering Filippo Pompili 1, Nicolas Gillis 2, P.-A. Absil 2,andFrançois Glineur 2,3 1- University of Perugia, Department
More informationarxiv: v1 [stat.ml] 15 Feb 2018
1 : A New Algorithm for Streaming PCA arxi:1802.054471 [stat.ml] 15 Feb 2018 Puyudi Yang, Cho-Jui Hsieh, Jane-Ling Wang Uniersity of California, Dais pydyang, chohsieh, janelwang@ucdais.edu Abstract In
More informationAnalysis of Spectral Kernel Design based Semi-supervised Learning
Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,
More informationStochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations Weiran Wang Toyota Technological Institute at Chicago * Joint work with Raman Arora (JHU), Karen Livescu and Nati Srebro (TTIC)
More informationDistributed Inexact Newton-type Pursuit for Non-convex Sparse Learning
Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology
More informationOptimization for Machine Learning (Lecture 3-B - Nonconvex)
Optimization for Machine Learning (Lecture 3-B - Nonconvex) SUVRIT SRA Massachusetts Institute of Technology MPI-IS Tübingen Machine Learning Summer School, June 2017 ml.mit.edu Nonconvex problems are
More informationStochastic Optimization
Introduction Related Work SGD Epoch-GD LM A DA NANJING UNIVERSITY Lijun Zhang Nanjing University, China May 26, 2017 Introduction Related Work SGD Epoch-GD Outline 1 Introduction 2 Related Work 3 Stochastic
More informationA DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson
204 IEEE INTERNATIONAL WORKSHOP ON ACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2 24, 204, REIS, FRANCE A DELAYED PROXIAL GRADIENT ETHOD WITH LINEAR CONVERGENCE RATE Hamid Reza Feyzmahdavian, Arda Aytekin,
More informationarxiv: v2 [math.oc] 1 Nov 2017
Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence arxiv:1710.09447v [math.oc] 1 Nov 017 Mingrui Liu, Tianbao Yang Department of Computer Science The University of
More informationTutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer
Tutorial: PART 2 Optimization for Machine Learning Elad Hazan Princeton University + help from Sanjeev Arora & Yoram Singer Agenda 1. Learning as mathematical optimization Stochastic optimization, ERM,
More informationAccelerated Stochastic Power Iteration
Peng Xu 1 Bryan He 1 Christopher De Sa Ioannis Mitliagkas 3 Christopher Ré 1 1 Stanford University Cornell University 3 University of Montréal Abstract Principal component analysis (PCA) is one of the
More informationCharacterization of Gradient Dominance and Regularity Conditions for Neural Networks
Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Yi Zhou Ohio State University Yingbin Liang Ohio State University Abstract zhou.1172@osu.edu liang.889@osu.edu The past
More informationParallel Singular Value Decomposition. Jiaxing Tan
Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector
More informationStochastic Variance Reduction for Nonconvex Optimization. Barnabás Póczos
1 Stochastic Variance Reduction for Nonconvex Optimization Barnabás Póczos Contents 2 Stochastic Variance Reduction for Nonconvex Optimization Joint work with Sashank Reddi, Ahmed Hefny, Suvrit Sra, and
More informationOrthogonal tensor decomposition
Orthogonal tensor decomposition Daniel Hsu Columbia University Largely based on 2012 arxiv report Tensor decompositions for learning latent variable models, with Anandkumar, Ge, Kakade, and Telgarsky.
More informationMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania
More informationDistance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More informationLecture: Adaptive Filtering
ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate
More informationarxiv: v1 [math.oc] 22 May 2018
On the Connection Between Sequential Quadratic Programming and Riemannian Gradient Methods Yu Bai Song Mei arxiv:1805.08756v1 [math.oc] 22 May 2018 May 23, 2018 Abstract We prove that a simple Sequential
More informationMajorization for Changes in Ritz Values and Canonical Angles Between Subspaces (Part I and Part II)
1 Majorization for Changes in Ritz Values and Canonical Angles Between Subspaces (Part I and Part II) Merico Argentati (speaker), Andrew Knyazev, Ilya Lashuk and Abram Jujunashvili Department of Mathematics
More informationIterative Methods for Solving A x = b
Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http
More informationOnline (and Distributed) Learning with Information Constraints. Ohad Shamir
Online (and Distributed) Learning with Information Constraints Ohad Shamir Weizmann Institute of Science Online Algorithms and Learning Workshop Leiden, November 2014 Ohad Shamir Learning with Information
More informationUnconstrained Optimization Models for Computing Several Extreme Eigenpairs of Real Symmetric Matrices
Unconstrained Optimization Models for Computing Several Extreme Eigenpairs of Real Symmetric Matrices Yu-Hong Dai LSEC, ICMSEC Academy of Mathematics and Systems Science Chinese Academy of Sciences Joint
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationEfficient coordinate-wise leading eigenvector computation
Journal of Machine Learning Research (07-5 Submitted ; Published Efficient coordinate-wise leading eigenvector computation Jialei Wang University of Chicago Weiran Wang Amazon Alexa Dan Garber Technion
More informationGiven the vectors u, v, w and real numbers α, β, γ. Calculate vector a, which is equal to the linear combination α u + β v + γ w.
Selected problems from the tetbook J. Neustupa, S. Kračmar: Sbírka příkladů z Matematiky I Problems in Mathematics I I. LINEAR ALGEBRA I.. Vectors, vector spaces Given the vectors u, v, w and real numbers
More informationSparse Optimization Lecture: Basic Sparse Optimization Models
Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm
More informationThe Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)
Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will
More informationContents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016
ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................
More informationLecture: Local Spectral Methods (1 of 4)
Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide
More informationarxiv: v1 [math.oc] 23 May 2017
A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method
More informationBlock Bidiagonal Decomposition and Least Squares Problems
Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition
More informationdistances between objects of different dimensions
distances between objects of different dimensions Lek-Heng Lim University of Chicago joint work with: Ke Ye (CAS) and Rodolphe Sepulchre (Cambridge) thanks: DARPA D15AP00109, NSF DMS-1209136, NSF IIS-1546413,
More informationConceptual Questions for Review
Conceptual Questions for Review Chapter 1 1.1 Which vectors are linear combinations of v = (3, 1) and w = (4, 3)? 1.2 Compare the dot product of v = (3, 1) and w = (4, 3) to the product of their lengths.
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationTighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time
Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time Zhaoran Wang Huanran Lu Han Liu Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08540 {zhaoran,huanranl,hanliu}@princeton.edu
More informationOn the Computations of Eigenvalues of the Fourth-order Sturm Liouville Problems
Int. J. Open Problems Compt. Math., Vol. 4, No. 3, September 2011 ISSN 1998-6262; Copyright c ICSRS Publication, 2011 www.i-csrs.org On the Computations of Eigenvalues of the Fourth-order Sturm Liouville
More informationSolving Large-scale Systems of Random Quadratic Equations via Stochastic Truncated Amplitude Flow
Solving Large-scale Systems of Random Quadratic Equations via Stochastic Truncated Amplitude Flow Gang Wang,, Georgios B. Giannakis, and Jie Chen Dept. of ECE and Digital Tech. Center, Univ. of Minnesota,
More informationStein s Method for Matrix Concentration
Stein s Method for Matrix Concentration Lester Mackey Collaborators: Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A. Tropp University of California, Berkeley California Institute of Technology
More informationReview of similarity transformation and Singular Value Decomposition
Review of similarity transformation and Singular Value Decomposition Nasser M Abbasi Applied Mathematics Department, California State University, Fullerton July 8 7 page compiled on June 9, 5 at 9:5pm
More informationThird-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima
Third-order Smoothness elps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima Yaodong Yu and Pan Xu and Quanquan Gu arxiv:171.06585v1 [math.oc] 18 Dec 017 Abstract We propose stochastic
More informationEIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems
EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems JAMES H. MONEY and QIANG YE UNIVERSITY OF KENTUCKY eigifp is a MATLAB program for computing a few extreme eigenvalues
More informationR-Linear Convergence of Limited Memory Steepest Descent
R-Linear Convergence of Limited Memory Steepest Descent Frank E. Curtis, Lehigh University joint work with Wei Guo, Lehigh University OP17 Vancouver, British Columbia, Canada 24 May 2017 R-Linear Convergence
More informationA projection algorithm for strictly monotone linear complementarity problems.
A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu
More informationThe nonsmooth Newton method on Riemannian manifolds
The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationSYMMETRIC MATRIX PERTURBATION FOR DIFFERENTIALLY-PRIVATE PRINCIPAL COMPONENT ANALYSIS. Hafiz Imtiaz and Anand D. Sarwate
SYMMETRIC MATRIX PERTURBATION FOR DIFFERENTIALLY-PRIVATE PRINCIPAL COMPONENT ANALYSIS Hafiz Imtiaz and Anand D. Sarwate Rutgers, The State University of New Jersey ABSTRACT Differential privacy is a strong,
More informationOn the exponential convergence of. the Kaczmarz algorithm
On the exponential convergence of the Kaczmarz algorithm Liang Dai and Thomas B. Schön Department of Information Technology, Uppsala University, arxiv:4.407v [cs.sy] 0 Mar 05 75 05 Uppsala, Sweden. E-mail:
More informationSmall sample size in high dimensional space - minimum distance based classification.
Small sample size in high dimensional space - minimum distance based classification. Ewa Skubalska-Rafaj lowicz Institute of Computer Engineering, Automatics and Robotics, Department of Electronics, Wroc
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,
More information4.2. ORTHOGONALITY 161
4.2. ORTHOGONALITY 161 Definition 4.2.9 An affine space (E, E ) is a Euclidean affine space iff its underlying vector space E is a Euclidean vector space. Given any two points a, b E, we define the distance
More informationInvariant Subspace Computation - A Geometric Approach
Invariant Subspace Computation - A Geometric Approach Pierre-Antoine Absil PhD Defense Université de Liège Thursday, 13 February 2003 1 Acknowledgements Advisor: Prof. R. Sepulchre (Université de Liège)
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,
More informationarxiv: v1 [math.na] 1 Sep 2018
On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing
More informationStochastic Quasi-Newton Methods
Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationM.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011
.S. Project Report Efficient Failure Rate Prediction for SRA Cells via Gibbs Sampling Yamei Feng /5/ Committee embers: Prof. Xin Li Prof. Ken ai Table of Contents CHAPTER INTRODUCTION...3 CHAPTER BACKGROUND...5
More informationChapter 8 Gradient Methods
Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point
More informationBarzilai-Borwein Step Size for Stochastic Gradient Descent
Barzilai-Borwein Step Size for Stochastic Gradient Descent Conghui Tan The Chinese University of Hong Kong chtan@se.cuhk.edu.hk Shiqian Ma The Chinese University of Hong Kong sqma@se.cuhk.edu.hk Yu-Hong
More informationCOR-OPT Seminar Reading List Sp 18
COR-OPT Seminar Reading List Sp 18 Damek Davis January 28, 2018 References [1] S. Tu, R. Boczar, M. Simchowitz, M. Soltanolkotabi, and B. Recht. Low-rank Solutions of Linear Matrix Equations via Procrustes
More informationarxiv: v5 [math.na] 16 Nov 2017
RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem
More informationNormalized power iterations for the computation of SVD
Normalized power iterations for the computation of SVD Per-Gunnar Martinsson Department of Applied Mathematics University of Colorado Boulder, Co. Per-gunnar.Martinsson@Colorado.edu Arthur Szlam Courant
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationTHE PERTURBATION BOUND FOR THE SPECTRAL RADIUS OF A NON-NEGATIVE TENSOR
THE PERTURBATION BOUND FOR THE SPECTRAL RADIUS OF A NON-NEGATIVE TENSOR WEN LI AND MICHAEL K. NG Abstract. In this paper, we study the perturbation bound for the spectral radius of an m th - order n-dimensional
More informationLecture 7: Weak Duality
EE 227A: Conve Optimization and Applications February 7, 2012 Lecture 7: Weak Duality Lecturer: Laurent El Ghaoui 7.1 Lagrange Dual problem 7.1.1 Primal problem In this section, we consider a possibly
More informationSparsification by Effective Resistance Sampling
Spectral raph Theory Lecture 17 Sparsification by Effective Resistance Sampling Daniel A. Spielman November 2, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened
More informationRecursive Generalized Eigendecomposition for Independent Component Analysis
Recursive Generalized Eigendecomposition for Independent Component Analysis Umut Ozertem 1, Deniz Erdogmus 1,, ian Lan 1 CSEE Department, OGI, Oregon Health & Science University, Portland, OR, USA. {ozertemu,deniz}@csee.ogi.edu
More informationComputation of the Joint Spectral Radius with Optimization Techniques
Computation of the Joint Spectral Radius with Optimization Techniques Amir Ali Ahmadi Goldstine Fellow, IBM Watson Research Center Joint work with: Raphaël Jungers (UC Louvain) Pablo A. Parrilo (MIT) R.
More information