Efficient Iterative Semi-Supervised Classification on Manifold
Mehrdad Farajtabar, Hamid R. Rabiee, Amirreza Shaban, Ali Soltani-Farani
Digital Media Lab, AICTC Research Center, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
{farajtabar, shaban, a soltani}@ce.sharif.edu, rabiee@sharif.edu

Abstract: Semi-supervised learning (SSL) has become a topic of recent research that effectively addresses the problem of limited labeled data. Many SSL methods have been developed based on the manifold assumption; among them, Local and Global Consistency (LGC) is a popular method. The problem with most of these algorithms, and in particular with LGC, is that their naive implementations do not scale well with the size of the data. Time and memory limitations are the major difficulties in large-scale problems. In this paper, we provide theoretical bounds on gradient descent, and to overcome the aforementioned problems, a new approximate Newton's method is proposed. Moreover, convergence analysis and theoretical bounds for the time complexity of the proposed method are provided. We claim that the number of iterations in the proposed methods depends logarithmically on the number of data points, which is a considerable improvement over the naive implementations. Experimental results on real world datasets confirm the superiority of the proposed methods over LGC's default iterative implementation and a state of the art factorization method.

Keywords: Semi-supervised learning, Manifold assumption, Local and global consistency, Iterative method, Convergence analysis

I. INTRODUCTION

Semi-supervised learning has become a popular approach to the problem of classification with limited labeled data in recent years [1]. To use unlabeled data effectively in the learning process, certain assumptions regarding the possible labeling functions and the underlying geometry need to hold [2].
In many real world classification problems, data points lie on a low dimensional manifold. The manifold assumption states that the labeling function varies smoothly with respect to the underlying manifold [3]. Methods utilizing the manifold assumption prove to be effective in many applications including image segmentation [4], handwritten digit recognition, and text classification [5].

Regularization is essentially the soul of semi-supervised learning based on the manifold assumption. Manifold regularization is commonly formulated as a quadratic optimization problem, $\min_x \frac{1}{2} x^T A x - b^T x$, where $A \in \mathbb{R}^{n \times n}$ and $b, x \in \mathbb{R}^n$. It is in effect equivalent to solving the system of linear equations $Ax = b$. Fortunately, A is a sparse symmetric positive definite matrix. Naive solutions to this problem require $O(n^3)$ operations to solve for x, while methods that take into account the sparse structure of A can cost much less.

Taking the inverse of A directly is an obviously bad choice for several reasons. First, taking the inverse requires $O(n^3)$ operations regardless of the sparse structure of A. Secondly, A may be nearly singular, in which case the inverse operation is numerically unstable. Lastly, the inverse of A is usually not sparse, in which case a large amount of memory is needed to store and process it.

To elaborate, note that semi-supervised learning is especially advantageous when there is a large amount of unlabeled data, which leads to better utilization of the underlying manifold structure. For example, consider the huge amount of unlabeled documents or images on the web which may be used to improve classification results. In these large-scale settings ordinary implementations are not effective, because time and memory limitations are an important concern in SSL methods with the manifold assumption [].

There are commonly two approaches to overcome this problem. First, one may reformulate the manifold regularization problem in a new form, more suitable for large-scale settings.
For example, [6] considers a linear base kernel and thus requires an inverse operation with a much smaller matrix. [7] uses a sparsified manifold regularizer with core vector machines, which were recently proposed for scaling up kernel methods to handle large-scale data. The second approach to this problem, which is the focus of this paper, relies heavily on factorization, optimization, or iterative procedures to solve the original manifold regularization formulation. In particular, iterative methods are of great interest. Label propagation (LP) [8] is an iterative algorithm for computing the harmonic solution [9], which is a variation of the manifold regularization problem. The other naturally iterative manifold regularization algorithm is local and global consistency (LGC) [10], upon which we build our work. Linear neighborhood propagation (LNP) [11] is another iterative method, which differs from other manifold learning methods mostly in the way the neighborhood graph is constructed. The problem with most of these iterative methods is that, although they are claimed to converge fast, there is no analytical guarantee or proof for that claim.

In this paper we conduct a theoretical analysis of iterative methods for LGC. We apply gradient descent to LGC and derive an analytical bound for the number of iterations and its dependency on the number of data points. These bounds also hold for other manifold regularization problems such as the harmonic solution and Tikhonov regularization. We then show that LGC's iterative procedure may be improved through an approximation of the inverse Hessian and present a detailed convergence analysis. Again a theoretical bound is derived for the number of iterations. We show that these iterative implementations require $O(\log n)$ sparse matrix-vector multiplications to compute LGC's solution with sufficient accuracy. It is then proved that LGC's iterative procedure is a special case of our proposed method. Finally, the proposed methods are compared with LGC's iterative procedure and with a state of the art factorization method utilizing Cholesky decomposition.

The rest of the paper is organized as follows. In Section II some related works in the domain of optimization, factorization and iterative methods are introduced. Section III provides a basic overview of LGC and introduces the notation. Section IV provides a detailed analysis of gradient descent applied to LGC. In Section V we show how LGC's iterative procedure may be improved and derive further theoretical bounds. Section VI gives experimental results validating the derived bounds, after which the paper is concluded in Section VII.

II. RELATED WORKS

Methods such as LQ, LU, or Cholesky factorization overcome the problems of the inverse operation by factorizing A into matrices with special structure that greatly simplify computations, especially when A is sparse. In particular, Cholesky factorization best fits our problem because it makes use of the symmetry and positive definiteness of A.
It decomposes A as $P U^T U P^T$, where P is a permutation matrix and U is upper triangular with positive diagonal elements. Heuristics are used to choose a matrix P that leads to a sparse U. In some instances these heuristics fail and the resulting algorithm may not be as computationally efficient as expected [12].

Iterative methods are another well-studied approach to the problem. Two views of the problem exist. When considering the problem in its optimization form, solutions such as gradient descent, conjugate gradient, steepest descent, and quasi-Newton methods become evident. Taking the machine learning viewpoint leads to more meaningful iterative methods. Among them are LP, LNP and LGC, which were introduced in the previous section. LGC's iterative procedure is useful in many other applications, so improving and analyzing it may be helpful. For example, [13] proposed an iterative procedure based on LGC for ranking on the web and [14] used similar ideas in image retrieval. As stated before, the problem with the iterative procedures of LGC and LP is that no analysis is provided on the number of iterations needed for convergence. Moreover, no explicit stopping criterion is mentioned, which is essential for bounding the number of iterations.

Gradient descent is one of the simplest iterative solutions to any optimization problem; however, beyond this simplicity, its linear convergence rate is strongly dependent on the condition number of the Hessian [15].

Conjugate gradient is a method especially designed to solve large systems of linear equations. A set of directions that are conjugate with respect to A is chosen. In each iteration the objective function is minimized along one of the directions. Theoretically the method should converge in at most n iterations, with each iteration costing as much as a sparse matrix-vector multiplication.
While this makes conjugate gradient a suitable choice, its inherent numerical instability in finding conjugate directions could make the procedure slower than expected. [16], [] apply conjugate gradient to the harmonic solution, with both superior and inferior results to LP depending on the dataset in use.

Quasi-Newton methods exhibit super-linear convergence. At each iteration the inverse Hessian in Newton's method is replaced by an approximation. These methods will not be helpful unless the approximation is sparse; however, sparse quasi-Newton methods have an empirically lower convergence rate than low-storage quasi-Newton methods [17], so they are not helpful here. Moreover, for our problem, in which the Hessian is constant, computing an approximation of the inverse Hessian at every iteration is costly. In our proposed algorithm we avoid this cost by computing a sufficiently precise and also sparse approximation of the inverse Hessian once, at the start.

III. BASICS AND NOTATIONS

Consider the general problem of semi-supervised learning. Let $X_u = \{x_1, \ldots, x_u\}$ and $X_l = \{x_{u+1}, \ldots, x_{u+l}\}$ be the sets of unlabeled and labeled data points respectively, where $n = u + l$ is the total number of data points. Also let y be a vector of length n with $y_i = 0$ for unlabeled $x_i$, and $y_i$ equal to $-1$ or $1$, corresponding to the class labels, for the labeled data points. Our goal is to predict the labels of $X = X_u \cup X_l$ as f, where $f_i$ is the label associated with $x_i$ for $i = 1, \ldots, n$.

It is usual to construct the similarity graph of the data using methods like weighted k-NN for better performance and accuracy []. Let W be the $n \times n$ weight matrix with

$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \tag{1}$$

where $\sigma$ is the bandwidth parameter. Define the diagonal matrix D with nonzero entries $D(i, i) = \sum_{j=1}^{n} W_{ij}$. Symmetrically normalize W by $S = D^{-1/2} W D^{-1/2}$. The Laplacian matrix is

$$L = I - S. \tag{2}$$
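The graph construction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it uses dense Python lists instead of sparse matrices, assumes the Gaussian weight $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, symmetrises the k-NN graph by keeping an edge when either endpoint selects it, and the function names, the toy points, and the values of `k` and `sigma` are all hypothetical.

```python
import math

def knn_gaussian_graph(points, k, sigma):
    """Symmetric k-NN weight matrix W with Gaussian weights (dense list of lists)."""
    n = len(points)

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # indices of the k nearest neighbours of point i, excluding i itself
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: sqdist(points[i], points[j]))[:k]
        for j in nearest:
            w = math.exp(-sqdist(points[i], points[j]) / (2.0 * sigma ** 2))
            W[i][j] = w
            W[j][i] = w  # assumption: keep the edge if either endpoint selects it
    return W

def normalized_similarity(W):
    """S = D^{-1/2} W D^{-1/2}, where D holds the row sums of W on its diagonal."""
    n = len(W)
    d = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def laplacian(S):
    """L = I - S."""
    n = len(S)
    return [[(1.0 if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]

# Toy data: two small clusters in the plane (hypothetical values).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
W = knn_gaussian_graph(points, k=2, sigma=0.5)
S = normalized_similarity(W)
L = laplacian(S)
```

In a large-scale setting W, S and L would be stored in a sparse format, since each row has only on the order of k nonzero entries.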
The family of manifold regularization algorithms can be formulated as the following optimization problem:

$$\min_f \; \tfrac{1}{2} f^T Q f + \tfrac{1}{2} (f - y)^T C (f - y), \tag{3}$$

where Q is a regularization matrix (usually the Laplacian itself) and C is a diagonal matrix with $C_{ii}$ equal to the importance of the i-th node sticking to its initial value $y_i$. The first term represents the smoothness of the predicted labels with respect to the underlying manifold, and the second term is the squared error of the predicted labels compared with the initial ones, weighted by C. Choosing different Q and C leads to various manifold classification methods [5], [], [9], [3]. In LGC, $Q = L$ and $C = \mu I$. It may easily be shown that the solution is equal to:

$$f^* = (L + C)^{-1} C y = (1 - \alpha)(I - \alpha S)^{-1} y, \tag{4}$$

where $\alpha = \frac{1}{\mu + 1}$. The authors of [10] propose an iterative algorithm to compute this solution:

$$f^{t+1} = \alpha S f^t + (1 - \alpha) y. \tag{5}$$

Since $0 < \alpha < 1$ and the eigenvalues of S are in $[-1, 1]$, this iterative algorithm converges to the solution of LGC [10]. In summary, the manifold regularization problem amounts to minimizing

$$R(f) = \tfrac{1}{2} f^T L f + \tfrac{1}{2} (f - y)^T C (f - y). \tag{6}$$

Throughout the paper $R^t$ and $f^t$ denote the objective value and the point, respectively, at the t-th iteration of the algorithm, and $R^*$ and $f^*$ denote the corresponding optimal ones.

IV. ANALYSIS OF GRADIENT DESCENT

The gradient of (6) is $\nabla R = L f + C(f - y)$, which leads to the gradient descent update rule:

$$f^{t+1} = f^t - \alpha \left( L f^t + C (f^t - y) \right), \tag{7}$$

where $\alpha$ here denotes the step size. The stopping criterion is $\|\nabla R\| \le \epsilon$. Choosing $\alpha$ appropriately is essential for convergence. Following [15], applying exact line search to our problem ensures linear convergence, and at iteration t we have:

$$t \le \frac{\log \frac{R^0 - R^*}{R^t - R^*}}{\log (1/z)}, \tag{8}$$

in which z is a constant equal to $1 - \frac{\lambda_{\min}(L + C)}{\lambda_{\max}(L + C)}$. For a deeper analysis of the method we need the following lemma.

Lemma 1 [18]. If $\lambda_m$ and $\lambda_M$ are the smallest and largest eigenvalues of L respectively, then $0 = \lambda_m < \lambda_M \le 2$.

Using the above lemma and the fact that $C = \mu I$, we have $\lambda_{\min}(L + C) = \mu$ and $\lambda_{\max}(L + C) = \mu + \lambda_M \le \mu + 2$.

Lemma 2. For the convex function R of f in (6) the following hold:

$$R(f) - R^* \ge \frac{1}{2 \lambda_{\max}(\nabla^2 R)} \|\nabla R(f)\|^2, \tag{9}$$

$$R(f) - R^* \le \frac{1}{2 \lambda_{\min}(\nabla^2 R)} \|\nabla R(f)\|^2, \tag{10}$$

$$R(f) - R^* \le \frac{\lambda_{\max}(\nabla^2 R)}{2} \|f - f^*\|^2, \tag{11}$$

$$\|f - f^*\| \ge \frac{1}{\lambda_{\max}(\nabla^2 R)} \|\nabla R(f)\|. \tag{12}$$

Proof: Considering that the Hessian is a constant matrix, the proofs of (9) and (10) can be found in standard optimization texts such as [15]. For (11) we need the following inequality [15]:

$$R(h) \le R(f) + \nabla R(f)^T (h - f) + \frac{\lambda_{\max}(\nabla^2 R)}{2} \|h - f\|^2. \tag{13}$$

Replacing f with $f^*$ and h with f, and noting that $\nabla R(f^*) = 0$, we get:

$$R(f) \le R(f^*) + \frac{\lambda_{\max}(\nabla^2 R)}{2} \|f - f^*\|^2, \tag{14}$$

and the third inequality is proved. Combining this with (9) proves the fourth inequality.

Theorem 1. The maximum number of iterations for gradient descent with exact line search and fixed $\epsilon$, $\mu$ is $O(\log n)$.

Proof: Consider the iteration t just before stopping, i.e., when $\|\nabla R^t\| > \epsilon$ and $\|\nabla R^{t+1}\| \le \epsilon$. Using (9) and Lemma 1:

$$R^t - R^* \ge \frac{\|\nabla R^t\|^2}{2 \lambda_{\max}(L + C)} > \frac{\epsilon^2}{2 (\lambda_M + \mu)}. \tag{15}$$

Inserting this into (8), with $1/z = 1 + \mu / \lambda_M$, yields

$$t \le \frac{\log \frac{2 (\lambda_M + \mu)(R^0 - R^*)}{\epsilon^2}}{\log \left(1 + \frac{\mu}{\lambda_M}\right)}. \tag{16}$$

In order to find an upper bound for $R^0 - R^*$, inequality (11) is used:

$$R^0 - R^* \le \frac{\lambda_M + \mu}{2} \|f^0 - f^*\|^2 \le \frac{\lambda_M + \mu}{2} \, n, \tag{17}$$

where in the last inequality we use the fact that $f^0 = 0$ and the elements of $f^*$ are in $[-1, 1]$. Using this in (16) we reach

$$t \le \frac{\log \frac{(\lambda_M + \mu)^2 n}{\epsilon^2}}{\log \left(1 + \frac{\mu}{\lambda_M}\right)} \le \frac{\log \frac{(2 + \mu)^2 n}{\epsilon^2}}{\log \left(1 + \frac{\mu}{2}\right)}. \tag{18}$$

Each iteration of gradient descent in (7) consists of two steps. First $\alpha$ is computed, which takes a fixed number of matrix-vector multiplications. Next $L f + C(f - y)$ is computed, which costs the same. All the matrices involved are sparse, because L is constructed using k-NN and C is diagonal, so these are sparse matrix-vector multiplications. Thus the total cost of each iteration is $O(kn)$, where k is the neighborhood size used in the construction of the similarity graph. Putting these together, we arrive at an $O(kn \log n)$ time complexity for computing the solution of LGC with gradient descent, i.e., an $O(n \log n)$ rate of growth with respect to the number of data points n. This is considerably less than the $O(n^3)$ complexity of the ordinary inverse in naive implementations, or $O(n^2)$ with sparsity taken into consideration.

It is easy to show that the analysis presented above is valid for other Laplacians L and other choices of C, i.e., applying gradient descent to other manifold regularization methods, such as the harmonic solution and Tikhonov regularization, leads to the same bound.

An interesting feature of the bound derived in (18) is that it is independent of the dataset in use: replacing $\lambda_M$ with its upper bound in (18) eliminates the dependence of the bound on the data. This independence, together with the bound being sufficiently tight, makes it appropriate for data-independent practical implementation.

V. SPARSE APPROXIMATION OF NEWTON'S METHOD

Newton's update rule for our problem is

$$f^{t+1} = f^t - \alpha \left( \nabla^2 R \right)^{-1} \nabla R. \tag{19}$$

For our quadratic problem one iteration with $\alpha = 1$ is sufficient to reach the optimum point; however, we wish to find a sparse approximation of the inverse Hessian. We show that using a sparse approximation of the inverse Hessian leads to an iterative method with an acceptable convergence rate. As an interesting result, it may be seen that in a special case our method reduces to LGC. We start by approximating the inverse Hessian:

$$\left( \nabla^2 R \right)^{-1} = (L + C)^{-1} = (I - S + C)^{-1} = \left( I - (I + C)^{-1} S \right)^{-1} (I + C)^{-1} = \sum_{i=0}^{\infty} \left( (I + C)^{-1} S \right)^i (I + C)^{-1}. \tag{20}$$

The last equality holds because the eigenvalues of $(I + C)^{-1} S$ are all less than one in magnitude.
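As a sanity check on the series above, it can help to work it out in the LGC setting $C = \mu I$, where $(I + C)^{-1} S = \alpha S$ with $\alpha = 1/(1 + \mu)$. The block below is a worked special case added for illustration, not part of the original derivation; $s$ denotes an arbitrary eigenvalue of $S$ (so $|s| \le 1$), and $m$ is an illustrative cut-off for the sum.

```latex
% Each eigenvalue s of S gives the eigenvalue 1 - s + \mu of L + C.
% The full series reproduces its reciprocal; the sum cut off after m terms
% misses it by the relative factor (\alpha s)^m.
\begin{align*}
\sum_{i=0}^{\infty} (\alpha s)^i \, \alpha
  &= \frac{\alpha}{1 - \alpha s}
   = \frac{1}{(1 + \mu) - s}, \\
\sum_{i=0}^{m-1} (\alpha s)^i \, \alpha
  &= \frac{\alpha \left( 1 - (\alpha s)^m \right)}{1 - \alpha s}
   = \frac{1 - (\alpha s)^m}{(1 + \mu) - s}, \\
\left| (\alpha s)^m \right| &\le \alpha^m = (1 + \mu)^{-m}.
\end{align*}
```

Under these assumptions the relative error of a truncated sum shrinks geometrically in the number of retained terms, which is consistent with a small number of terms already giving a useful approximation.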
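Using the first m terms of the above summation leads to an approximation of the inverse Hessian:

$$\left( \nabla^2 R \right)^{-1} \approx \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i (I + C)^{-1}. \tag{21}$$

Rewriting Newton's method with the approximated inverse Hessian results in the update rule below:

$$\begin{aligned}
f^{t+1} &= f^t - \left( \nabla^2 R \right)^{-1} \left( L f^t + C (f^t - y) \right) \\
&\approx f^t - \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i (I + C)^{-1} \left( L f^t + C (f^t - y) \right) \\
&= f^t - \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i (I + C)^{-1} \left( (I + C - S) f^t - C y \right) \\
&= f^t - \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i \left( I - (I + C)^{-1} S \right) f^t + \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i (I + C)^{-1} C y \\
&= f^t - \left( I - \left( (I + C)^{-1} S \right)^m \right) f^t + \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i (I + C)^{-1} C y \\
&= \left( (I + C)^{-1} S \right)^m f^t + \sum_{i=0}^{m-1} \left( (I + C)^{-1} S \right)^i (I + C)^{-1} C y.
\end{aligned} \tag{22}$$

In summary, it can be restated as:

$$f^{t+1} = H^m f^t + g_m, \tag{23}$$

where

$$H = (I + C)^{-1} S, \tag{24}$$

$$g_m = \sum_{i=0}^{m-1} H^i (I + C)^{-1} C y. \tag{25}$$

This update rule is performed iteratively from an initial $f^0$ until the stopping criterion $\|\nabla R\| \le \epsilon$ is reached.

Theorem 2. The approximate Newton's method in (23) converges to the optimal solution of LGC.

Proof: Unfolding the update rule in (23) leads to

$$\begin{aligned}
f^t &= H^{mt} f^0 + \sum_{j=0}^{t-1} H^{mj} g_m \\
&= H^{mt} f^0 + \sum_{j=0}^{t-1} H^{mj} \sum_{i=0}^{m-1} H^i (I + C)^{-1} C y \\
&= H^{mt} f^0 + \sum_{i=0}^{mt-1} H^i (I + C)^{-1} C y.
\end{aligned} \tag{26}$$

Letting $t \to \infty$ gives the final solution. Since the magnitudes of the eigenvalues of H are less than one, $H^{mt} f^0 \to 0$, and

$$\lim_{t \to \infty} f^t = (I - H)^{-1} (I + C)^{-1} C y = (L + C)^{-1} C y, \tag{27}$$

which is equal to $f^*$ in (4).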
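The update rule $f^{t+1} = H^m f^t + g_m$ can be sketched as follows for the LGC case $C = \mu I$, where $H = \alpha S$ and $(I + C)^{-1} C y = (1 - \alpha) y$ with $\alpha = 1/(1 + \mu)$. This is a minimal sketch under stated assumptions, not the authors' code: it uses dense Python lists, applies $H^m$ as m successive matrix-vector products rather than forming $H^m$ explicitly, starts from $f^0 = 0$, and the function name, the tolerance `eps`, and the 4-node path graph used as input are all hypothetical.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def approx_newton_lgc(S, y, mu, m, eps=1e-6, max_iter=10_000):
    """Iterate f <- H^m f + g_m with H = alpha * S (assumes C = mu * I)."""
    alpha = 1.0 / (1.0 + mu)

    def apply_H(v):
        return [alpha * s for s in matvec(S, v)]

    b = [(1.0 - alpha) * yi for yi in y]      # (I + C)^{-1} C y
    # g_m = sum_{i<m} H^i b, accumulated as b + H(b + H(...))
    g = b[:]
    for _ in range(m - 1):
        g = [bi + hi for bi, hi in zip(b, apply_H(g))]

    f = [0.0] * len(y)
    for t in range(max_iter):
        # gradient of R: L f + mu (f - y), with L = I - S
        Sf = matvec(S, f)
        grad = [fi - sfi + mu * (fi - yi) for fi, sfi, yi in zip(f, Sf, y)]
        if math.sqrt(sum(gi * gi for gi in grad)) <= eps:
            return f, t
        v = f
        for _ in range(m):                    # v = H^m f
            v = apply_H(v)
        f = [vi + gi for vi, gi in zip(v, g)]
    return f, max_iter

# Normalised similarity of a 4-node path graph with unit weights (toy input).
r = 1.0 / math.sqrt(2.0)
S = [[0.0, r, 0.0, 0.0],
     [r, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, r],
     [0.0, 0.0, r, 0.0]]
y = [1.0, 0.0, 0.0, -1.0]                     # two labelled end points

f1, it1 = approx_newton_lgc(S, y, mu=0.5, m=1)
f3, it3 = approx_newton_lgc(S, y, mu=0.5, m=3)
```

Each outer iteration with a larger `m` performs more matrix-vector products, so the iteration counts `it1` and `it3` are not directly comparable in cost; the trade-off is the one between iteration count and per-iteration work.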
Theorem 3. For the approximate Newton's method in (23) the stopping criterion $\|\nabla R\| \le \epsilon$ is reached in $O(\log n)$ iterations with respect to the number of data points n.

Proof: Since $f^* = H^m f^* + g_m$,

$$f^t - f^* = H^m f^{t-1} + g_m - H^m f^* - g_m = H^m (f^{t-1} - f^*). \tag{28}$$

$H^m$ is symmetric, so $\|H^m x\| \le \lambda_{\max}(H^m) \|x\|$, and therefore

$$\|f^t - f^*\| \le \lambda_{\max}(H^m) \|f^{t-1} - f^*\| \le \lambda_{\max}(H^m)^t \|f^0 - f^*\| = \lambda_{\max}\left( (I + C)^{-1} S \right)^{mt} \|f^0 - f^*\| = \left( \frac{1}{1 + \mu} \right)^{mt} \|f^*\|, \tag{29}$$

where the last equality uses $f^0 = 0$. By rewriting the above inequality one can see that the maximum number of iterations is bounded by

$$t \le \frac{\log \frac{\|f^0 - f^*\|}{\|f^t - f^*\|}}{m \log (1 + \mu)}. \tag{30}$$

As in gradient descent, consider the iteration t just before the stopping criterion is met, i.e., when $\|\nabla R^t\| > \epsilon$ and $\|\nabla R^{t+1}\| \le \epsilon$. Using (12) we have

$$\|f^t - f^*\| \ge \frac{\|\nabla R^t\|}{\lambda_{\max}(L + C)} > \frac{\epsilon}{\lambda_M + \mu}. \tag{31}$$

The maximum number of iterations is thus bounded above by

$$t \le \frac{\log \frac{(\lambda_M + \mu) \|f^0 - f^*\|}{\epsilon}}{m \log (1 + \mu)} \le \frac{\log \frac{(2 + \mu) \|f^*\|}{\epsilon}}{m \log (1 + \mu)} \le \frac{\log \frac{(2 + \mu) \sqrt{n}}{\epsilon}}{m \log (1 + \mu)}. \tag{32}$$

As with gradient descent, an $O(\log n)$ dependency on the number of data points is derived for our approximate Newton's method. The sparsity degree of $H^m$ is $k^m$, so matrix-vector operations with this matrix cost $O(k^m n)$. As the approximation becomes more exact, $H^m$ becomes less sparse. So as m increases the number of iterations decreases, as can be seen from (32); however, the cost of each iteration grows. Empirically it is seen that m should be chosen from 1 to 3, so we can treat it as a constant and achieve an $O(k^3 n \log n)$ dependence on the number of data points for the whole algorithm. Also, since k is chosen independently of n and is usually constant, the growth of the algorithm's time complexity is $O(n \log n)$ with respect to the number of data points.

Figure 1: Demonstration of the steps taken by gradient descent and by the approximate Newton's method (two values of m) for two data points from MNIST. The algorithms start from the top left point and move to the optimal point, which is located at the bottom right.

Similar to gradient descent, the bound derived in (32) is independent of the dataset, which together with its tightness is a good feature for practical implementation.
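To get a feel for how such bounds scale, the two data-independent expressions can be evaluated directly. The sketch below assumes the bounds take the forms $\log\left((2+\mu)^2 n / \epsilon^2\right) / \log(1 + \mu/2)$ for gradient descent and $\log\left((2+\mu)\sqrt{n} / \epsilon\right) / (m \log(1+\mu))$ for the approximate Newton's method; the function names and the parameter values are hypothetical and are not the settings used in the experiments.

```python
import math

def gd_iteration_bound(n, mu, eps):
    """Assumed gradient descent bound: log((2+mu)^2 n / eps^2) / log(1 + mu/2)."""
    return math.log((2.0 + mu) ** 2 * n / eps ** 2) / math.log(1.0 + mu / 2.0)

def approx_newton_iteration_bound(n, mu, eps, m):
    """Assumed approximate Newton bound: log((2+mu) sqrt(n) / eps) / (m log(1+mu))."""
    return math.log((2.0 + mu) * math.sqrt(n) / eps) / (m * math.log(1.0 + mu))

# Hypothetical settings: the bounds grow only logarithmically in n.
for n in (10**3, 10**4, 10**5, 10**6):
    gd = gd_iteration_bound(n, mu=0.5, eps=1e-3)
    an1 = approx_newton_iteration_bound(n, mu=0.5, eps=1e-3, m=1)
    an2 = approx_newton_iteration_bound(n, mu=0.5, eps=1e-3, m=2)
    print(f"n={n:>8}  gradient descent <= {gd:6.1f}  "
          f"approx. Newton m=1 <= {an1:5.1f}  m=2 <= {an2:5.1f}")
```

Under these assumed forms, multiplying n by a constant factor adds a constant number of iterations to each bound, and doubling m halves the approximate Newton bound.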
Experiments show that the bound derived here is tighter than that for gradient descent, and of course the number of iterations for the approximate Newton's method is much less than that for gradient descent.

As a special case, we claim that for $m = 1$ the algorithm is the same as LGC's iterative procedure. Recalling that $C = \mu I$:

$$f^{t+1} = H f^t + g_1 = (I + C)^{-1} S f^t + (I + C)^{-1} C y = \frac{1}{1 + \mu} S f^t + \frac{\mu}{1 + \mu} y = \alpha S f^t + (1 - \alpha) y, \tag{33}$$

which is the same as (5).

Figure 1 shows how increasing m affects the steps taken by the optimization algorithm, in contrast to the steps taken by gradient descent, for simulations on the MNIST dataset. Gradient descent is extremely dependent on the condition number of the Hessian; for high condition numbers gradient descent usually takes a series of zigzag steps to reach the optimum point. Approximating the Newton step refines the search direction and decreases the zigzag effect. Figure 1 shows that the steps form approximately a line at m = . The Newton step for quadratic problems points in the direction of the optimal point. The trace of the approximate method with m = closely coincides with the true direction to the optimum point, indicating how well the inverse Hessian is approximated in the proposed method. This is the reason for the small number of iterations needed for convergence of the approximate method compared with that of gradient descent. The experiments validating the improvement are presented in the next section.
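For reference, the gradient descent baseline with exact line search discussed above can be sketched as follows for the LGC case $C = \mu I$. This is an illustrative sketch, not the authors' implementation: for a quadratic objective with constant Hessian $A = L + \mu I$, exact line search has the closed form step $g^T g / (g^T A g)$ for gradient $g$. The function name, the tolerance `eps`, and the 4-node path graph are hypothetical, and dense lists stand in for sparse matrices.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gradient_descent_lgc(S, y, mu, eps=1e-6, max_iter=10_000):
    """Gradient descent with exact line search on R(f) (assumes C = mu * I)."""

    def hess_vec(v):
        # (L + mu I) v = (1 + mu) v - S v, with L = I - S
        Sv = matvec(S, v)
        return [(1.0 + mu) * vi - svi for vi, svi in zip(v, Sv)]

    f = [0.0] * len(y)
    for t in range(max_iter):
        # gradient: L f + mu (f - y) = (L + mu I) f - mu y
        grad = [afi - mu * yi for afi, yi in zip(hess_vec(f), y)]
        gg = dot(grad, grad)
        if math.sqrt(gg) <= eps:
            return f, t
        step = gg / dot(grad, hess_vec(grad))   # exact line search for a quadratic
        f = [fi - step * gi for fi, gi in zip(f, grad)]
    return f, max_iter

# Normalised similarity of a 4-node path graph with unit weights (toy input).
r = 1.0 / math.sqrt(2.0)
S = [[0.0, r, 0.0, 0.0],
     [r, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, r],
     [0.0, 0.0, r, 0.0]]
y = [1.0, 0.0, 0.0, -1.0]

f, iterations = gradient_descent_lgc(S, y, mu=0.5)
```

Each iteration here needs two Hessian-vector products (one for the gradient, one for the step size), which is the line search overhead that the text contrasts with the fixed update of LGC's procedure.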
VI. EXPERIMENTS

For the experiments three real world datasets are used: MNIST for digit recognition, Covertype for forest cover prediction, and Classic for text categorization. These rather large datasets are chosen to better simulate a large-scale setting, for which naive solutions, such as the inverse operation, are not applicable in terms of memory and time.

MNIST is a collection of 6 handwritten digit samples. For classification we choose data points from digits and 8. Each is of dimension 784. No preprocessing is done on the data.

The forest Covertype dataset was collected for predicting forest cover type from cartographic variables. It includes seven classes and 58 samples of dimension 54. We randomly select samples of types and, and normalize them such that each feature is in [ ].

The Classic collection is a benchmark dataset in text mining. This dataset consists of 4 different document collections: CACM (34 documents), CISI (46 documents), CRAN (398 documents), and MED (33 documents). We try to separate the first category from the others. Terms are single words; the minimum term length is 3. A term must appear in at least 3 documents and in at most 95% of the documents. Moreover, Porter's stemming is applied during preprocessing. Features are weighted with the TF-IDF scheme and normalized to.

For all the datasets we use the same setting: adjacency matrices are constructed using 5-NN, with the bandwidth set to the mean of the standard deviation of the data. % of data points are labeled. $\mu$ is set to.5. Choosing $\epsilon$ =.5 empirically ensures convergence to the optimal solutions. The number of iterations, accuracy, and distance to optimum are reported as averages over runs for different random labelings. The algorithms are run on the datasets and the results are depicted and discussed in the following.

Figure 2 shows the number of iterations for the three iterative methods with respect to the number of data points.
The solutions of the iterative methods have almost converged to the optimum point, as depicted in Figure 3. LGC's default implementation is the worst among the three. Gradient descent is second, and our approximate Newton's method has the fastest convergence rate consistently across the three diverse datasets. Note that LGC corresponds to the approximate method with $m = 1$ and, as indicated in Figure 1, has a better direction than gradient descent, so it may be surprising that it needs more iterations than gradient descent. The key point is the line search. Although the direction proposed by gradient descent is worse than the one for LGC, exact line search causes gradient descent to reach the optimum faster. If we combine our approximate method with an exact line search we reach even fewer iterations; however, it was observed empirically that, due to the time consumed by the line search, there is no improvement in terms of running time.

Another important point about the diagrams in Figure 2 is the order of growth with respect to the number of data points, which is consistent with the logarithmic growth derived in the previous sections. This makes LGC with an iterative implementation a good choice for large-scale SSL tasks.

To illustrate how tight the bounds derived for the iterative methods are, we put the parameters into (32) and (18) to get 9, 38, and 97 for the approximate method with m = , m = , and for gradient descent respectively, which may be compared with the empirical values from the diagrams in Figure 2. Interestingly, the diagrams show that the derived bounds are quite tight regardless of the dataset.

Figure 3 shows the accuracy of the iterative methods compared with a factorization method, CHOLMOD [19], which uses Cholesky factorization to solve a system of linear equations quickly. Since computing the exact solution via the inverse is impractical, we use a factorization method to obtain the exact solution and compare it with the solutions of the iterative methods.
As seen from the diagrams, for all three datasets the solution of the iterative methods is sufficiently close to the optimal solution, with the number of iterations shown in Figure 2.

Figure 4 compares the distance to the optimum for the different methods at each iteration and shows how these methods converge to the optimum. As expected from the previous results, the approximate Newton's method with m = has the fastest convergence, while LGC is the slowest. As stated before, the superiority of gradient descent over LGC in terms of the number of iterations is due to its line search, not to the direction chosen by the method.

Figure 5 shows the time needed to compute the solution. Figure 5a compares our approximate Newton's method with CHOLMOD, which is the state of the art method for solving large systems of linear equations. The iterative methods are clearly superior to CHOLMOD. Figure 5b compares the running times of the different iterative methods. Again the proposed method with m = is the best; however, this time LGC performs better than gradient descent, because of the overhead imposed by the line search. As the number of data points gets larger, the difference between the methods becomes more evident. Time growth is of order $n \log n$, as predicted by Theorems 1 and 3.

Figure 2: Number of iterations for the three iterative methods (LGC, approximate Newton, gradient descent) with respect to the number of data. Panels: (a) MNIST, (b) Covertype, (c) Classic.

Figure 3: Accuracy of the iterative methods (LGC, approximate Newton, gradient descent) compared with CHOLMOD. Panels: (a) MNIST, (b) Covertype, (c) Classic.

Figure 4: Distance from optimum, $\|f^t - f^*\|$, for the three methods with respect to the iteration number. Panels: (a) MNIST, (b) Covertype, (c) Classic.

VII. CONCLUSION AND FUTURE WORKS

In this paper, a novel approximation to Newton's method is proposed for solving the manifold regularization problem, along with a theoretical analysis of the number of iterations. We proved that the number of iterations has a logarithmic dependence on the number of data points. We also applied gradient descent to this problem and proved that its number of iterations also grows logarithmically with the number of data points. The logarithmic dependence makes iterative methods a reasonable approach when a large amount of data is being classified. It is notable that the derived bounds are empirically tight independent of the dataset in use, which is in practice an important feature of an algorithm. We derived LGC's iterative procedure as a special case of our proposed approximate Newton's method. Our method is based upon an approximation of the inverse Hessian: the more exact the approximation is, the better the search direction. Experimental results confirm the improvement of our proposed method over LGC's iterative procedure without any loss in classification accuracy. The improvement of our approximate method over gradient descent is also shown both theoretically and empirically. A theoretical analysis of robustness against noise, incorporating a low cost line search into the proposed method, and finding lower bounds or tighter bounds on the number of iterations, to name a few, are interesting problems that remain as future work.
Figure 5: Comparison of the time needed (duration in seconds) to compute the solution for iterative methods and CHOLMOD. Panels: (a) MNIST, approximate Newton versus CHOLMOD; (b) MNIST, LGC, approximate Newton, and gradient descent.

REFERENCES

[1] X. Zhu, Semi-supervised learning with graphs, Ph.D. dissertation, Carnegie Mellon University, 2005.
[2] O. Chapelle, B. Schölkopf, and A. Zien, Semi-supervised learning. MIT Press, Cambridge, MA, 2006.
[3] M. Belkin, P. Niyogi, and V. Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research, vol. 7, 2006.
[4] O. Duchenne, J. Audibert, R. Keriven, J. Ponce, and F. Ségonne, Segmentation by transduction, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[5] M. Belkin and P. Niyogi, Using manifold structure for partially labeled classification, in NIPS, 2002.
[6] V. Sindhwani, P. Niyogi, M. Belkin, and S. Keerthi, Linear manifold regularization for large scale semi-supervised learning, in Proc. of the 22nd ICML Workshop on Learning with Partially Classified Training Data, 2005.
[7] I. Tsang and J. Kwok, Large-scale sparsified manifold regularization, Advances in Neural Information Processing Systems, vol. 19, 2007.
[8] X. Zhu and Z. Ghahramani, Learning from labeled and unlabeled data with label propagation, School Comput. Sci., Carnegie Mellon Univ., Tech. Rep. CMU-CALD-02-107, 2002.
[9] X. Zhu, Z. Ghahramani, and J. D. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in ICML, 2003.
[10] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, Learning with local and global consistency, in NIPS, 2003.
[11] F. Wang and C. Zhang, Label propagation through linear neighborhoods, in Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
[12] A. George and J. Liu, Computer solution of large sparse positive definite systems, ser. Prentice-Hall series in computational mathematics. Prentice-Hall, 1981.
[13] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf, Ranking on data manifolds, in Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference. The MIT Press, 2004.
[14] J. He, M. Li, H. Zhang, H. Tong, and C. Zhang, Manifold-ranking based image retrieval, in Proceedings of the 12th Annual ACM International Conference on Multimedia. ACM, 2004.
[15] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.
[16] A. Argyriou, Efficient approximation methods for harmonic semi-supervised learning, Master's thesis, University College London, UK, 2004.
[17] J. Nocedal and S. Wright, Numerical optimization. Springer Verlag, 1999.
[18] F. Chung, Spectral graph theory. American Mathematical Society, 1997, no. 92.
[19] Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate, ACM Trans. Math. Softw., vol. 35, October 2008.
More informationSemi-Supervised Learning with Graphs. Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University
Semi-Supervised Learning with Graphs Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning classification classifiers need labeled data to train labeled data
More informationSemi-Supervised Learning
Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human
More informationGlobal vs. Multiscale Approaches
Harmonic Analysis on Graphs Global vs. Multiscale Approaches Weizmann Institute of Science, Rehovot, Israel July 2011 Joint work with Matan Gavish (WIS/Stanford), Ronald Coifman (Yale), ICML 10' Challenge:
More informationBeyond the Point Cloud: From Transductive to Semi-Supervised Learning
Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationSemi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion
Semi-supervised ictionary Learning Based on Hilbert-Schmidt Independence Criterion Mehrdad J. Gangeh 1, Safaa M.A. Bedawi 2, Ali Ghodsi 3, and Fakhri Karray 2 1 epartments of Medical Biophysics, and Radiation
More informationLarge Scale Semi-supervised Linear SVM with Stochastic Gradient Descent
Journal of Computational Information Systems 9: 15 (2013) 6251 6258 Available at http://www.jofcis.com Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent Xin ZHOU, Conghui ZHU, Sheng
More informationFantope Regularization in Metric Learning
Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction
More informationLimits of Spectral Clustering
Limits of Spectral Clustering Ulrike von Luxburg and Olivier Bousquet Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tübingen, Germany {ulrike.luxburg,olivier.bousquet}@tuebingen.mpg.de
More information1 Graph Kernels by Spectral Transforms
Graph Kernels by Spectral Transforms Xiaojin Zhu Jaz Kandola John Lafferty Zoubin Ghahramani Many graph-based semi-supervised learning methods can be viewed as imposing smoothness conditions on the target
More informationSelf-Tuning Semantic Image Segmentation
Self-Tuning Semantic Image Segmentation Sergey Milyaev 1,2, Olga Barinova 2 1 Voronezh State University sergey.milyaev@gmail.com 2 Lomonosov Moscow State University obarinova@graphics.cs.msu.su Abstract.
More informationHou, Ch. et al. IEEE Transactions on Neural Networks March 2011
Hou, Ch. et al. IEEE Transactions on Neural Networks March 2011 Semi-supervised approach which attempts to incorporate partial information from unlabeled data points Semi-supervised approach which attempts
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationRegularization on Discrete Spaces
Regularization on Discrete Spaces Dengyong Zhou and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany {dengyong.zhou, bernhard.schoelkopf}@tuebingen.mpg.de
More informationABC-LogitBoost for Multi-Class Classification
Ping Li, Cornell University ABC-Boost BTRY 6520 Fall 2012 1 ABC-LogitBoost for Multi-Class Classification Ping Li Department of Statistical Science Cornell University 2 4 6 8 10 12 14 16 2 4 6 8 10 12
More informationMAA507, Power method, QR-method and sparse matrix representation.
,, and representation. February 11, 2014 Lecture 7: Overview, Today we will look at:.. If time: A look at representation and fill in. Why do we need numerical s? I think everyone have seen how time consuming
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationLarge-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Large-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver Yan-Ming Zhang and Xu-Yao Zhang National Laboratory
More informationA PROBABILISTIC INTERPRETATION OF SAMPLING THEORY OF GRAPH SIGNALS. Akshay Gadde and Antonio Ortega
A PROBABILISTIC INTERPRETATION OF SAMPLING THEORY OF GRAPH SIGNALS Akshay Gadde and Antonio Ortega Department of Electrical Engineering University of Southern California, Los Angeles Email: agadde@usc.edu,
More informationSemi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances
Semi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances Wee Sun Lee,2, Xinhua Zhang,2, and Yee Whye Teh Department of Computer Science, National University of Singapore. 2
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationDiscrete vs. Continuous: Two Sides of Machine Learning
Discrete vs. Continuous: Two Sides of Machine Learning Dengyong Zhou Department of Empirical Inference Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany Oct. 18,
More informationSemi-Supervised Learning with Graphs
Semi-Supervised Learning with Graphs Xiaojin (Jerry) Zhu LTI SCS CMU Thesis Committee John Lafferty (co-chair) Ronald Rosenfeld (co-chair) Zoubin Ghahramani Tommi Jaakkola 1 Semi-supervised Learning classifiers
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko Inria Lille - Nord Europe, France TA: Pierre Perrault Partially based on material by: Mikhail Belkin, Jerry Zhu, Olivier Chapelle, Branislav Kveton October 30, 2017
More informationOn Optimal Frame Conditioners
On Optimal Frame Conditioners Chae A. Clark Department of Mathematics University of Maryland, College Park Email: cclark18@math.umd.edu Kasso A. Okoudjou Department of Mathematics University of Maryland,
More information6. Iterative Methods for Linear Systems. The stepwise approach to the solution...
6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse
More informationMaximum-weighted matching strategies and the application to symmetric indefinite systems
Maximum-weighted matching strategies and the application to symmetric indefinite systems by Stefan Röllin, and Olaf Schenk 2 Technical Report CS-24-7 Department of Computer Science, University of Basel
More information12. Cholesky factorization
L. Vandenberghe ECE133A (Winter 2018) 12. Cholesky factorization positive definite matrices examples Cholesky factorization complex positive definite matrices kernel methods 12-1 Definitions a symmetric
More informationIterative Methods for Solving A x = b
Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http
More informationSummer School on Graphs in Computer Graphics, Image and Signal Analysis Bornholm, Denmark, August 2011
Summer School on Graphs in Computer Graphics, Image and Signal Analysis Bornholm, Denmark, August 2011 1 Succinct Games Describing a game in normal form entails listing all payoffs for all players and
More informationSPARSE signal representations have gained popularity in recent
6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationImproved Local Coordinate Coding using Local Tangents
Improved Local Coordinate Coding using Local Tangents Kai Yu NEC Laboratories America, 10081 N. Wolfe Road, Cupertino, CA 95129 Tong Zhang Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854
More informationLarge Scale Semi-supervised Linear SVMs. University of Chicago
Large Scale Semi-supervised Linear SVMs Vikas Sindhwani and Sathiya Keerthi University of Chicago SIGIR 2006 Semi-supervised Learning (SSL) Motivation Setting Categorize x-billion documents into commercial/non-commercial.
More informationESANN'2003 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2003, d-side publi., ISBN X, pp.
On different ensembles of kernel machines Michiko Yamana, Hiroyuki Nakahara, Massimiliano Pontil, and Shun-ichi Amari Λ Abstract. We study some ensembles of kernel machines. Each machine is first trained
More informationDistributed Inexact Newton-type Pursuit for Non-convex Sparse Learning
Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology
More informationOnline Manifold Regularization: A New Learning Setting and Empirical Study
Online Manifold Regularization: A New Learning Setting and Empirical Study Andrew B. Goldberg 1, Ming Li 2, Xiaojin Zhu 1 1 Computer Sciences, University of Wisconsin Madison, USA. {goldberg,jerryzhu}@cs.wisc.edu
More informationConvex Optimization of Graph Laplacian Eigenvalues
Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationSemi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data
Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data Boaz Nadler Dept. of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel 76 boaz.nadler@weizmann.ac.il
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationNonlinear Dimensionality Reduction. Jose A. Costa
Nonlinear Dimensionality Reduction Jose A. Costa Mathematics of Information Seminar, Dec. Motivation Many useful of signals such as: Image databases; Gene expression microarrays; Internet traffic time
More informationGraph Quality Judgement: A Large Margin Expedition
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Graph Quality Judgement: A Large Margin Expedition Yu-Feng Li Shao-Bo Wang Zhi-Hua Zhou National Key
More informationConvergence Rates for Greedy Kaczmarz Algorithms
onvergence Rates for Greedy Kaczmarz Algorithms Julie Nutini 1, Behrooz Sepehry 1, Alim Virani 1, Issam Laradji 1, Mark Schmidt 1, Hoyt Koepke 2 1 niversity of British olumbia, 2 Dato Abstract We discuss
More informationFrom graph to manifold Laplacian: The convergence rate
Appl. Comput. Harmon. Anal. 2 (2006) 28 34 www.elsevier.com/locate/acha Letter to the Editor From graph to manifold Laplacian: The convergence rate A. Singer Department of athematics, Yale University,
More informationSpectral Clustering on Handwritten Digits Database
University of Maryland-College Park Advance Scientific Computing I,II Spectral Clustering on Handwritten Digits Database Author: Danielle Middlebrooks Dmiddle1@math.umd.edu Second year AMSC Student Advisor:
More informationIntegrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction
Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Jianhui Chen, Jieping Ye Computer Science and Engineering Department Arizona State University {jianhui.chen,
More informationImproving L-BFGS Initialization for Trust-Region Methods in Deep Learning
Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning Jacob Rafati http://rafati.net jrafatiheravi@ucmerced.edu Ph.D. Candidate, Electrical Engineering and Computer Science University
More informationAdaptive Affinity Matrix for Unsupervised Metric Learning
Adaptive Affinity Matrix for Unsupervised Metric Learning Yaoyi Li, Junxuan Chen, Yiru Zhao and Hongtao Lu Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering,
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationJustin Solomon MIT, Spring 2017
Justin Solomon MIT, Spring 2017 http://pngimg.com/upload/hammer_png3886.png You can learn a lot about a shape by hitting it (lightly) with a hammer! What can you learn about its shape from vibration frequencies
More informationApproximating the Covariance Matrix with Low-rank Perturbations
Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu
More informationAn Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM
An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department
More informationNonnegative Matrix Factorization Clustering on Multiple Manifolds
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Nonnegative Matrix Factorization Clustering on Multiple Manifolds Bin Shen, Luo Si Department of Computer Science,
More informationMatrix Assembly in FEA
Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,
More informationDimension reduction methods: Algorithms and Applications Yousef Saad Department of Computer Science and Engineering University of Minnesota
Dimension reduction methods: Algorithms and Applications Yousef Saad Department of Computer Science and Engineering University of Minnesota Université du Littoral- Calais July 11, 16 First..... to the
More informationConjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.
Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear
More informationAn Empirical Study of Building Compact Ensembles
An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.
More informationSemi-Supervised Classification with Universum
Semi-Supervised Classification with Universum Dan Zhang 1, Jingdong Wang 2, Fei Wang 3, Changshui Zhang 4 1,3,4 State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory
More informationMultiple Similarities Based Kernel Subspace Learning for Image Classification
Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese
More informationNonlinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap
More informationSVRG++ with Non-uniform Sampling
SVRG++ with Non-uniform Sampling Tamás Kern András György Department of Electrical and Electronic Engineering Imperial College London, London, UK, SW7 2BT {tamas.kern15,a.gyorgy}@imperial.ac.uk Abstract
More informationCOMPARING PERFORMANCE OF NEURAL NETWORKS RECOGNIZING MACHINE GENERATED CHARACTERS
Proceedings of the First Southern Symposium on Computing The University of Southern Mississippi, December 4-5, 1998 COMPARING PERFORMANCE OF NEURAL NETWORKS RECOGNIZING MACHINE GENERATED CHARACTERS SEAN
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationAppendix to: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Appendix to: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts Hariharan Narayanan Department of Computer Science University of Chicago Chicago IL 6637 hari@cs.uchicago.edu
More informationLecture 9: Numerical Linear Algebra Primer (February 11st)
10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template
More informationAccelerating SVRG via second-order information
Accelerating via second-order information Ritesh Kolte Department of Electrical Engineering rkolte@stanford.edu Murat Erdogdu Department of Statistics erdogdu@stanford.edu Ayfer Özgür Department of Electrical
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More informationTOPOLOGY FOR GLOBAL AVERAGE CONSENSUS. Soummya Kar and José M. F. Moura
TOPOLOGY FOR GLOBAL AVERAGE CONSENSUS Soummya Kar and José M. F. Moura Department of Electrical and Computer Engineering Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail:{moura}@ece.cmu.edu)
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2019 Due: 11am Monday, February 25th, 2019 Submit scan of plots/written responses to Gradebook; submit your
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationA Study on Trust Region Update Rules in Newton Methods for Large-scale Linear Classification
JMLR: Workshop and Conference Proceedings 1 16 A Study on Trust Region Update Rules in Newton Methods for Large-scale Linear Classification Chih-Yang Hsia r04922021@ntu.edu.tw Dept. of Computer Science,
More informationLec10p1, ORF363/COS323
Lec10 Page 1 Lec10p1, ORF363/COS323 This lecture: Conjugate direction methods Conjugate directions Conjugate Gram-Schmidt The conjugate gradient (CG) algorithm Solving linear systems Leontief input-output
More informationWhen Dictionary Learning Meets Classification
When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor: Professor Yohann Tendero 1 UCLA 2 Dalhousie University 3 Harvey Mudd College August
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationImproving Semi-Supervised Target Alignment via Label-Aware Base Kernels
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Improving Semi-Supervised Target Alignment via Label-Aware Base Kernels Qiaojun Wang, Kai Zhang 2, Guofei Jiang 2, and Ivan Marsic
More informationSemi-supervised Eigenvectors for Locally-biased Learning
Semi-supervised Eigenvectors for Locally-biased Learning Toke Jansen Hansen Section for Cognitive Systems DTU Informatics Technical University of Denmark tjha@imm.dtu.dk Michael W. Mahoney Department of
More informationMulti-view Laplacian Support Vector Machines
Multi-view Laplacian Support Vector Machines Shiliang Sun Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China slsun@cs.ecnu.edu.cn Abstract. We propose a
More informationFast Nonnegative Matrix Factorization with Rank-one ADMM
Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More information