Graph-Based Semi-Supervised Learning
|
|
- Leon Dawson
- 5 years ago
- Views:
Transcription
1 Graph-Based Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Université de Montréal CIAR Workshop - April 26th, 2005
2 Graph-Based Semi-Supervised Learning Yoshua Bengio, Olivier Delalleau and Nicolas Le Roux Université de Montréal CIAR Workshop - April 26th, 2005
3 Graph-Based Semi-Supervised Learning Nicolas Le Roux, Olivier Delalleau and Yoshua Bengio Université de Montréal CIAR Workshop - April 26th, 2005
4 Graph-Based Semi-Supervised Learning Olivier Delalleau, Nicolas Le Roux and Yoshua Bengio Université de Montréal CIAR Workshop - April 26th, 2005
5 Graph-Based Semi-Supervised Learning Yoshua Bengio, Nicolas Le Roux and Olivier Delalleau Université de Montréal CIAR Workshop - April 26th, 2005
6 Graph-Based Semi-Supervised Learning Nicolas Le Roux, Yoshua Bengio and Olivier Delalleau Université de Montréal CIAR Workshop - April 26th, 2005
7 Outline 1 Semi-Supervised Setting 2 Graph Regularization and Label Propagation 3 Transduction vs. Induction 4 Curse of Dimensionality
8 Semi-Supervised Learning for Dummies Task of binary classification with labels y i { 1, 1}
9 Semi-Supervised Learning for Dummies Task of binary classification with labels y i { 1, 1} Semi-supervised = learn something about labels using both labeled and unlabeled data X = (x 1, x 2,..., x n ) Y l = (y 1, y 2,..., y l ) n = l + u
10 Semi-Supervised Learning for Dummies Task of binary classification with labels y i { 1, 1} Semi-supervised = learn something about labels using both labeled and unlabeled data X = (x 1, x 2,..., x n ) Y l = (y 1, y 2,..., y l ) n = l + u Transduction Ŷu = (ŷ l+1, ŷ l+2,..., ŷ n )
11 Semi-Supervised Learning for Dummies Task of binary classification with labels y i { 1, 1} Semi-supervised = learn something about labels using both labeled and unlabeled data X = (x 1, x 2,..., x n ) Y l = (y 1, y 2,..., y l ) n = l + u Transduction Ŷu = (ŷ l+1, ŷ l+2,..., ŷ n ) Induction ŷ : x ŷ(x)
12 The Classical Two-Moon Problem
13 The Classical Two-Moon Problem
14 Where are Manifolds and Kernels? What is a good labeling Ŷ = (Ŷl, Ŷu)?
15 Where are Manifolds and Kernels? What is a good labeling Ŷ = (Ŷl, Ŷu)? 1 one that is consistent with the given labels: Ŷl Y l
16 Where are Manifolds and Kernels? What is a good labeling Ŷ = (Ŷl, Ŷu)? 1 one that is consistent with the given labels: Ŷl Y l 2 one that is smooth on the where the data lie ( / cluster assumption): ŷ i ŷ j when x i close to x j
17 Where are Manifolds and Kernels? What is a good labeling Ŷ = (Ŷl, Ŷu)? 1 one that is consistent with the given labels: Ŷl Y l 2 one that is smooth on the manifold where the data lie (manifold / cluster assumption): ŷ i ŷ j when x i close to x j
18 Where are Manifolds and Kernels? What is a good labeling Ŷ = (Ŷl, Ŷu)? 1 one that is consistent with the given labels: Ŷl Y l 2 one that is smooth on the manifold where the data lie (manifold / cluster assumption): ŷ i ŷ j when x i close to x j Cost function C(Ŷ ) = l (ŷ i y i ) 2 + µ 2 i=1 n W ij (ŷ i ŷ j ) 2 i,j=1 with W ij = W X (x i, x j ) a positive weighting function (e.g )
19 Where are Manifolds and Kernels? What is a good labeling Ŷ = (Ŷl, Ŷu)? 1 one that is consistent with the given labels: Ŷl Y l 2 one that is smooth on the manifold where the data lie (manifold / cluster assumption): ŷ i ŷ j when x i close to x j Cost function C(Ŷ ) = l (ŷ i y i ) 2 + µ 2 i=1 n W ij (ŷ i ŷ j ) 2 i,j=1 with W ij = W X (x i, x j ) a positive weighting function (e.g kernel)
20 Graphs Would be Cool too! Nodes = points, edges (i, j) W ij > 0.
21 Graphs Would be Cool too! Nodes = points, edges (i, j) W ij > 0.
22 Regularization on Graph Graph Laplacian: L ii = j i W ij and L ij = W ij. l C(Ŷ ) = (ŷ i y i ) 2 + µ 2 i=1 n W ij (ŷ i ŷ j ) 2 i,j=1 = Ŷl Y l 2 + µŷ LŶ
23 Regularization on Graph Graph Laplacian: L ii = j i W ij and L ij = W ij. l C(Ŷ ) = (ŷ i y i ) 2 + µ 2 i=1 C(Ŷ ) is minimized when n W ij (ŷ i ŷ j ) 2 i,j=1 = Ŷl Y l 2 + µŷ LŶ (S + µl) Ŷ = Y linear system with n unknowns and equations.
24 From Matrix Inversion to Label Propagation Linear system rewrites for a labeled point ŷ i = j W ijŷj + 1 µ y i j W ij + 1 µ and for an unlabeled point ŷ i = j W ijŷj j W. ij
25 From Matrix Inversion to Label Propagation Linear system rewrites for a labeled point ŷ (t+1) i = and for an unlabeled point ŷ (t+1) i = j W ijŷ(t) j + 1 µ y i j W ij + 1 µ j W ijŷ(t) j j W. ij Jacobi or Gauss-Seidel iteration algorithm.
26 No, I didn t come up with that last week-end X. Zhu, Z. Ghahramani and J. Lafferty (2003): Semi-supervised learning using Gaussian fields and harmonic functions D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, B. Schölkopf (2004): Learning with local and global consistency M. Belkin, I. Matveeva and P. Niyogi (2004): Regularization and Semi-supervised Learning on Large Graphs
27 Remember your Physics class? Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph electric network with resistors between nodes: R ij = 1 W ij
28 Remember your Physics class? Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph electric network with resistors between nodes: Ohm s law (potential = label): R ij = 1 W ij ŷ j ŷ i = R ij I ij
29 Remember your Physics class? Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph electric network with resistors between nodes: Ohm s law (potential = label): R ij = 1 W ij ŷ j ŷ i = R ij I ij Kirchoff s law on an unlabeled node i: I ij = 0 j
30 Remember your Physics class? Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph electric network with resistors between nodes: Ohm s law (potential = label): R ij = 1 W ij ŷ j ŷ i = R ij I ij Kirchoff s law on an unlabeled node i: I ij = 0 j Same linear system as minimizing C(Ŷ ) over Ŷu only.
31 From Transduction to Induction Solving the linear system Ŷ (transduction).
32 From Transduction to Induction Solving the linear system Ŷ (transduction). From a new point x and already computed Ŷ : C(ŷ(x)) = C(Ŷ ) + µ 2 n W X (x i, x)(ŷ i ŷ(x)) 2 i=1 ŷ(x) = n i=1 W X (x i, x)ŷ i n i=1 W X (x i, x) (Induction like Parzen Windows, but using estimated labels Ŷ ).
33 Faster Training from Subset Previous algorithms are at least quadratic in n.
34 Faster Training from Subset Previous algorithms are at least quadratic in n. Induction formula could train only on subset S.
35 Faster Training from Subset Previous algorithms are at least quadratic in n. Induction formula could train only on subset S. Better: minimize the full cost over ŶS only.
36 Faster Training from Subset Previous algorithms are at least quadratic in n. Induction formula could train only on subset S. Better: minimize the full cost over ŶS only. For x i R = X \ S: ŷ i = j S W ijŷj j S W ij i.e. ŶR = W RS Ŷ S : the cost C(Ŷ ) now only depends on ŶS.
37 Nice Cost isn t it? C(ŶS) = µ ) 2 W ij (ŷi ŷ j 2 i,j R }{{} C RR + 2 µ ) 2 W ij (ŷi ŷ j 2 i R,j S }{{} C RS + µ ) 2 W ij (ŷi ŷ j 2 i,j S }{{} C SS + (ŷ i y i ) 2 i L }{{} C L
38 Let s Make it Simpler Computing C RR is quadratic in n just get rid of it. Linear system with S = m n unknowns much faster Still need to do matrix multiplications scales as O(m 2 n) This slide looked pretty empty with only three points It is important to have reading material when nobody understands your accent
39 Subset Selection 1 Random:
40 Subset Selection 1 Random: fast,
41 Subset Selection 1 Random: fast, easy,
42 Subset Selection 1 Random: fast, easy, crappy. Main problem = does not fill the space well enough bad approximation by the induction formula (some points have no near neighbors in the subset)
43 Subset Selection 1 Random: fast, easy, crappy. Main problem = does not fill the space well enough bad approximation by the induction formula (some points have no near neighbors in the subset) 2 Heuristic: greedy construction of subset by starting with the labeled points and iteratively minimizing over i j S W ij i.e. choose the point x i farthest from the current subset. (Additional tricks to eliminate outliers and sample more points near decision surface)
44 Experimental Results This is not last-minute research so we do have results!
45 Experimental Results This is not last-minute research so we do have results! Table: Comparative Classification Error (Induction) % labeled LETTERS MNIST COVTYPE 1% NoSub RandSub subonly RandSub SmartSub % NoSub RandSub subonly RandSub SmartSub % NoSub RandSub subonly RandSub SmartSub More comparisons between RandSub and SmartSub on 8 more UCI datasets SmartSub always performs better.
46 Curse of Dimensionality Labeling function: ŷ(x) = i ŷ i W X (x i, x)
47 Curse of Dimensionality Labeling function: ŷ(x) = i ŷ i W X (x i, x) Locality: far point prediction = nearest neighbor; also normal vector ŷ x (x) is approximately in the span of the nearest neighbors of x Smoothness: it does not vary much within a ball of radius small w.r.t. σ (obtained from second derivative); also form of cost the learned function varies smoothly in regions with no labeled examples Curse: one needs lots of unlabeled examples to fill the region near the decision surface, and lots of labeled examples to account for all clusters
48 Almost Real-Life Example Need unlabeled examples along the sinus decision surface, and labeled examples in each class region.
49 Conclusion
50 Conclusion Simple non-parametric setting powerful non-parametric semi-supervised algorithm
51 Conclusion Simple non-parametric setting powerful non-parametric semi-supervised algorithm Can scale to large datasets thanks to sparsity / subset selection
52 Conclusion Simple non-parametric setting powerful non-parametric semi-supervised algorithm Can scale to large datasets thanks to sparsity / subset selection Interesting links with electric networks / heat diffusion
53 Conclusion Simple non-parametric setting powerful non-parametric semi-supervised algorithm Can scale to large datasets thanks to sparsity / subset selection Interesting links with electric networks / heat diffusion Limitations of local weights: curse of dimensionality
54 Conclusion Simple non-parametric setting powerful non-parametric semi-supervised algorithm Can scale to large datasets thanks to sparsity / subset selection Interesting links with electric networks / heat diffusion Limitations of local weights: curse of dimensionality Makes it possible to be on time for lunch!!!
55 References Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Shawe-Taylor, J. and Singer, Y., editors, COLT Springer. Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics. Doyle, P. G. and Snell, J. L. (1984). Random walks and electric networks. Mathematical Association of America. Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press. Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003.
Semi-Supervised Learning
Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human
More informationSemi-Supervised Learning with Graphs. Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University
Semi-Supervised Learning with Graphs Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning classification classifiers need labeled data to train labeled data
More informationThe Curse of Dimensionality for Local Kernel Machines
The Curse of Dimensionality for Local Kernel Machines Yoshua Bengio, Olivier Delalleau & Nicolas Le Roux April 7th 2005 Yoshua Bengio, Olivier Delalleau & Nicolas Le Roux Snowbird Learning Workshop Perspective
More informationGlobal vs. Multiscale Approaches
Harmonic Analysis on Graphs Global vs. Multiscale Approaches Weizmann Institute of Science, Rehovot, Israel July 2011 Joint work with Matan Gavish (WIS/Stanford), Ronald Coifman (Yale), ICML 10' Challenge:
More informationSemi-Supervised Classification with Universum
Semi-Supervised Classification with Universum Dan Zhang 1, Jingdong Wang 2, Fei Wang 3, Changshui Zhang 4 1,3,4 State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory
More informationAnalysis of Spectral Kernel Design based Semi-supervised Learning
Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,
More informationSemi-Supervised Learning with Graphs
Semi-Supervised Learning with Graphs Xiaojin (Jerry) Zhu LTI SCS CMU Thesis Committee John Lafferty (co-chair) Ronald Rosenfeld (co-chair) Zoubin Ghahramani Tommi Jaakkola 1 Semi-supervised Learning classifiers
More informationNeural Networks, Convexity, Kernels and Curses
Neural Networks, Convexity, Kernels and Curses Yoshua Bengio Work done with Nicolas Le Roux, Olivier Delalleau and Hugo Larochelle August 26th 2005 Perspective Curse of Dimensionality Most common non-parametric
More informationThe Curse of Dimensionality for Local Kernel Machines
The Curse of Dimensionality for Local Kernel Machines Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux Dept. IRO, Université de Montréal P.O. Box 6128, Downtown Branch, Montreal, H3C 3J7, Qc, Canada {bengioy,delallea,lerouxni}@iro.umontreal.ca
More informationSemi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data
Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data Boaz Nadler Dept. of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel 76 boaz.nadler@weizmann.ac.il
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationLimits of Spectral Clustering
Limits of Spectral Clustering Ulrike von Luxburg and Olivier Bousquet Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tübingen, Germany {ulrike.luxburg,olivier.bousquet}@tuebingen.mpg.de
More informationLearning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 1/31
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Dengyong Zhou zhou@tuebingen.mpg.de Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany Learning from
More informationHYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH
HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi
More informationSemi-Supervised Learning through Principal Directions Estimation
Semi-Supervised Learning through Principal Directions Estimation Olivier Chapelle, Bernhard Schölkopf, Jason Weston Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany {first.last}@tuebingen.mpg.de
More informationClassification Semi-supervised learning based on network. Speakers: Hanwen Wang, Xinxin Huang, and Zeyu Li CS Winter
Classification Semi-supervised learning based on network Speakers: Hanwen Wang, Xinxin Huang, and Zeyu Li CS 249-2 2017 Winter Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions Xiaojin
More informationHow to learn from very few examples?
How to learn from very few examples? Dengyong Zhou Department of Empirical Inference Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany Outline Introduction Part A
More informationLarge-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Large-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver Yan-Ming Zhang and Xu-Yao Zhang National Laboratory
More informationBeyond the Point Cloud: From Transductive to Semi-Supervised Learning
Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of
More informationSemi-supervised Learning
Semi-supervised Learning Introduction Supervised learning: x r, y r R r=1 E.g.x r : image, y r : class labels Semi-supervised learning: x r, y r r=1 R, x u R+U u=r A set of unlabeled data, usually U >>
More informationA graph based approach to semi-supervised learning
A graph based approach to semi-supervised learning 1 Feb 2011 Two papers M. Belkin, P. Niyogi, and V Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples.
More informationSINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES
SINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES JIANG ZHU, SHILIANG SUN Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 20024, P. R. China E-MAIL:
More informationGraphs, Geometry and Semi-supervised Learning
Graphs, Geometry and Semi-supervised Learning Mikhail Belkin The Ohio State University, Dept of Computer Science and Engineering and Dept of Statistics Collaborators: Partha Niyogi, Vikas Sindhwani In
More informationRegularization on Discrete Spaces
Regularization on Discrete Spaces Dengyong Zhou and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany {dengyong.zhou, bernhard.schoelkopf}@tuebingen.mpg.de
More informationLearning on Graphs and Manifolds. CMPSCI 689 Sridhar Mahadevan U.Mass Amherst
Learning on Graphs and Manifolds CMPSCI 689 Sridhar Mahadevan U.Mass Amherst Outline Manifold learning is a relatively new area of machine learning (2000-now). Main idea Model the underlying geometry of
More informationCluster Kernels for Semi-Supervised Learning
Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko Inria Lille - Nord Europe, France TA: Pierre Perrault Partially based on material by: Mikhail Belkin, Jerry Zhu, Olivier Chapelle, Branislav Kveton October 30, 2017
More informationEfficient Iterative Semi-Supervised Classification on Manifold
Efficient Iterative Semi-Supervised Classification on Manifold Mehrdad Farajtabar, Hamid R. Rabiee, Amirreza Shaban, Ali Soltani-Farani Digital Media Lab, AICTC Research Center, Department of Computer
More informationImproving Semi-Supervised Target Alignment via Label-Aware Base Kernels
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Improving Semi-Supervised Target Alignment via Label-Aware Base Kernels Qiaojun Wang, Kai Zhang 2, Guofei Jiang 2, and Ivan Marsic
More informationLaplacian Eigenmaps for Dimensionality Reduction and Data Representation
Introduction and Data Representation Mikhail Belkin & Partha Niyogi Department of Electrical Engieering University of Minnesota Mar 21, 2017 1/22 Outline Introduction 1 Introduction 2 3 4 Connections to
More informationSemi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances
Semi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances Wee Sun Lee,2, Xinhua Zhang,2, and Yee Whye Teh Department of Computer Science, National University of Singapore. 2
More informationAppendix to: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Appendix to: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts Hariharan Narayanan Department of Computer Science University of Chicago Chicago IL 6637 hari@cs.uchicago.edu
More informationSemi-supervised Learning by Entropy Minimization
Semi-supervised Learning by Entropy Minimization Yves Grandvalet Heudiasyc, CNRS/UTC 60205 Compiègne cedex, France grandval@utc.fr Yoshua Bengio Dept. IRO, Université de Montréal Montreal, Qc, H3C 3J7,
More informationGraph Quality Judgement: A Large Margin Expedition
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Graph Quality Judgement: A Large Margin Expedition Yu-Feng Li Shao-Bo Wang Zhi-Hua Zhou National Key
More informationDiscrete vs. Continuous: Two Sides of Machine Learning
Discrete vs. Continuous: Two Sides of Machine Learning Dengyong Zhou Department of Empirical Inference Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany Oct. 18,
More informationWhat is semi-supervised learning?
What is semi-supervised learning? In many practical learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate text processing, video-indexing,
More informationNonlinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap
More informationSemi-Supervised Learning by Multi-Manifold Separation
Semi-Supervised Learning by Multi-Manifold Separation Xiaojin (Jerry) Zhu Department of Computer Sciences University of Wisconsin Madison Joint work with Andrew Goldberg, Zhiting Xu, Aarti Singh, and Rob
More informationMetric Embedding for Kernel Classification Rules
Metric Embedding for Kernel Classification Rules Bharath K. Sriperumbudur University of California, San Diego (Joint work with Omer Lang & Gert Lanckriet) Bharath K. Sriperumbudur (UCSD) Metric Embedding
More informationPartially labeled classification with Markov random walks
Partially labeled classification with Markov random walks Martin Szummer MIT AI Lab & CBCL Cambridge, MA 0239 szummer@ai.mit.edu Tommi Jaakkola MIT AI Lab Cambridge, MA 0239 tommi@ai.mit.edu Abstract To
More informationTHE presence of missing values in a dataset often makes
1 Efficient EM Training of Gaussian Mixtures with Missing Data Olivier Delalleau, Aaron Courville, and Yoshua Bengio arxiv:1209.0521v1 [cs.lg] 4 Sep 2012 Abstract In data-mining applications, we are frequently
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More information1 Graph Kernels by Spectral Transforms
Graph Kernels by Spectral Transforms Xiaojin Zhu Jaz Kandola John Lafferty Zoubin Ghahramani Many graph-based semi-supervised learning methods can be viewed as imposing smoothness conditions on the target
More informationLearning Eigenfunctions: Links with Spectral Clustering and Kernel PCA
Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Yoshua Bengio Pascal Vincent Jean-François Paiement University of Montreal April 2, Snowbird Learning 2003 Learning Modal Structures
More informationNonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization
Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization Kilian Q. Weinberger, Benjamin D. Packer, and Lawrence K. Saul Department of Computer and Information Science
More informationJustin Solomon MIT, Spring 2017
Justin Solomon MIT, Spring 2017 http://pngimg.com/upload/hammer_png3886.png You can learn a lot about a shape by hitting it (lightly) with a hammer! What can you learn about its shape from vibration frequencies
More informationHou, Ch. et al. IEEE Transactions on Neural Networks March 2011
Hou, Ch. et al. IEEE Transactions on Neural Networks March 2011 Semi-supervised approach which attempts to incorporate partial information from unlabeled data points Semi-supervised approach which attempts
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More informationOne-class Label Propagation Using Local Cone Based Similarity
One-class Label Propagation Using Local Based Similarity Takumi Kobayashi and Nobuyuki Otsu Abstract In this paper, we propose a novel method of label propagation for one-class learning. For binary (positive/negative)
More informationSimilarity and kernels in machine learning
1/31 Similarity and kernels in machine learning Zalán Bodó Babeş Bolyai University, Cluj-Napoca/Kolozsvár Faculty of Mathematics and Computer Science MACS 2016 Eger, Hungary 2/31 Machine learning Overview
More informationNonlinear Dimensionality Reduction. Jose A. Costa
Nonlinear Dimensionality Reduction Jose A. Costa Mathematics of Information Seminar, Dec. Motivation Many useful of signals such as: Image databases; Gene expression microarrays; Internet traffic time
More informationWeighted Nonlocal Laplacian on Interpolation from Sparse Data
Noname manuscript No. (will be inserted by the editor) Weighted Nonlocal Laplacian on Interpolation from Sparse Data Zuoqiang Shi Stanley Osher Wei Zhu Received: date / Accepted: date Abstract Inspired
More informationConvergence Rates for Greedy Kaczmarz Algorithms
onvergence Rates for Greedy Kaczmarz Algorithms Julie Nutini 1, Behrooz Sepehry 1, Alim Virani 1, Issam Laradji 1, Mark Schmidt 1, Hoyt Koepke 2 1 niversity of British olumbia, 2 Dato Abstract We discuss
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Mikhail Belkin, Jerry Zhu, Olivier Chapelle, Branislav Kveton February 10, 2015 MVA 2014/2015 Previous
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationLinear and Non-Linear Dimensionality Reduction
Linear and Non-Linear Dimensionality Reduction Alexander Schulz aschulz(at)techfak.uni-bielefeld.de University of Pisa, Pisa 4.5.215 and 7.5.215 Overview Dimensionality Reduction Motivation Linear Projections
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationImproved Local Coordinate Coding using Local Tangents
Improved Local Coordinate Coding using Local Tangents Kai Yu NEC Laboratories America, 10081 N. Wolfe Road, Cupertino, CA 95129 Tong Zhang Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal Microsoft Cambridge, U.K. July 7th, 2009, Montreal Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationMultiscale Wavelets on Trees, Graphs and High Dimensional Data
Multiscale Wavelets on Trees, Graphs and High Dimensional Data ICML 2010, Haifa Matan Gavish (Weizmann/Stanford) Boaz Nadler (Weizmann) Ronald Coifman (Yale) Boaz Nadler Ronald Coifman Motto... the relationships
More informationCSE 190: Reinforcement Learning: An Introduction. Chapter 8: Generalization and Function Approximation. Pop Quiz: What Function Are We Approximating?
CSE 190: Reinforcement Learning: An Introduction Chapter 8: Generalization and Function Approximation Objectives of this chapter: Look at how experience with a limited part of the state set be used to
More informationGeneralized Optimization Framework for Graph-based Semi-supervised Learning
Generalized Optimization Framework for Graph-based Semi-supervised Learning Konstantin Avrachenkov INRIA Sophia Antipolis, France K.Avrachenkov@sophia.inria.fr Alexey Mishenin St. Petersburg State University,
More informationStatistical Learning. Dong Liu. Dept. EEIS, USTC
Statistical Learning Dong Liu Dept. EEIS, USTC Chapter 6. Unsupervised and Semi-Supervised Learning 1. Unsupervised learning 2. k-means 3. Gaussian mixture model 4. Other approaches to clustering 5. Principle
More informationSemi-Supervised Learning Using Gaussian Fields and Harmonic Functions
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions Xiaojin hu HUXJ@CSCMUEDU oubin Ghahramani OUBIN@GASBUCLACU John Lafferty LAFFER@CSCMUEDU School of Computer Science, Carnegie Mellon
More informationSpectral Dimensionality Reduction
Spectral Dimensionality Reduction Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux Jean-François Paiement, Pascal Vincent, and Marie Ouimet Département d Informatique et Recherche Opérationnelle Centre
More informationFast Krylov Methods for N-Body Learning
Fast Krylov Methods for N-Body Learning Nando de Freitas Department of Computer Science University of British Columbia nando@cs.ubc.ca Maryam Mahdaviani Department of Computer Science University of British
More informationContribution from: Springer Verlag Berlin Heidelberg 2005 ISBN
Contribution from: Mathematical Physics Studies Vol. 7 Perspectives in Analysis Essays in Honor of Lennart Carleson s 75th Birthday Michael Benedicks, Peter W. Jones, Stanislav Smirnov (Eds.) Springer
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationTrading Convexity for Scalability
Trading Convexity for Scalability Léon Bottou leon@bottou.org Ronan Collobert, Fabian Sinz, Jason Weston ronan@collobert.com, fabee@tuebingen.mpg.de, jasonw@nec-labs.com NEC Laboratories of America A Word
More informationSemi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion
Semi-supervised ictionary Learning Based on Hilbert-Schmidt Independence Criterion Mehrdad J. Gangeh 1, Safaa M.A. Bedawi 2, Ali Ghodsi 3, and Fakhri Karray 2 1 epartments of Medical Biophysics, and Radiation
More information9 Entropy Regularization
9 Entropy Regularization Yves Grandvalet Yoshua Bengio The problem of semi-supervised induction consists in learning a decision rule from labeled and unlabeled data. This task can be undertaken by discriminative
More informationIntegrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction
Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Jianhui Chen, Jieping Ye Computer Science and Engineering Department Arizona State University {jianhui.chen,
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationThe local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance
The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance Marina Meilă University of Washington Department of Statistics Box 354322 Seattle, WA
More informationUnlabeled Data: Now It Helps, Now It Doesn t
institution-logo-filena A. Singh, R. D. Nowak, and X. Zhu. In NIPS, 2008. 1 Courant Institute, NYU April 21, 2015 Outline institution-logo-filena 1 Conflicting Views in Semi-supervised Learning The Cluster
More informationQuery Rules Study on Active Semi-Supervised Learning using Particle Competition and Cooperation
The Brazilian Conference on Intelligent Systems (BRACIS) and Encontro Nacional de Inteligência Artificial e Computacional (ENIAC) Query Rules Study on Active Semi-Supervised Learning using Particle Competition
More informationManifold Denoising. Matthias Hein Markus Maier Max Planck Institute for Biological Cybernetics Tübingen, Germany
Manifold Denoising Matthias Hein Markus Maier Max Planck Institute for Biological Cybernetics Tübingen, Germany {first.last}@tuebingen.mpg.de Abstract We consider the problem of denoising a noisily sampled
More informationCS599 Lecture 2 Function Approximation in RL
CS599 Lecture 2 Function Approximation in RL Look at how experience with a limited part of the state set be used to produce good behavior over a much larger part. Overview of function approximation (FA)
More informationCSE 190: Reinforcement Learning: An Introduction. Chapter 8: Generalization and Function Approximation. Pop Quiz: What Function Are We Approximating?
CSE 190: Reinforcement Learning: An Introduction Chapter 8: Generalization and Function Approximation Objectives of this chapter: Look at how experience with a limited part of the state set be used to
More informationFrom graph to manifold Laplacian: The convergence rate
Appl. Comput. Harmon. Anal. 2 (2006) 28 34 www.elsevier.com/locate/acha Letter to the Editor From graph to manifold Laplacian: The convergence rate A. Singer Department of athematics, Yale University,
More informationSemi-Supervised Learning in Gigantic Image Collections. Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT)
Semi-Supervised Learning in Gigantic Image Collections Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT) Gigantic Image Collections What does the world look like? High
More informationCS 730/730W/830: Intro AI
CS 730/730W/830: Intro AI 1 handout: slides Wheeler Ruml (UNH) Lecture 20, CS 730 1 / 13 Summary k-nn Wheeler Ruml (UNH) Lecture 20, CS 730 2 / 13 Summary Summary k-nn supervised learning: induction, generalization
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationMultiple Kernel Self-Organizing Maps
Multiple Kernel Self-Organizing Maps Madalina Olteanu (2), Nathalie Villa-Vialaneix (1,2), Christine Cierco-Ayrolles (1) http://www.nathalievilla.org {madalina.olteanu,nathalie.villa}@univ-paris1.fr, christine.cierco@toulouse.inra.fr
More informationSelf-Tuning Semantic Image Segmentation
Self-Tuning Semantic Image Segmentation Sergey Milyaev 1,2, Olga Barinova 2 1 Voronezh State University sergey.milyaev@gmail.com 2 Lomonosov Moscow State University obarinova@graphics.cs.msu.su Abstract.
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationIFT LAPLACIAN APPLICATIONS. Mikhail Bessmeltsev
IFT 6112 09 LAPLACIAN APPLICATIONS http://www-labs.iro.umontreal.ca/~bmpix/teaching/6112/2018/ Mikhail Bessmeltsev Rough Intuition http://pngimg.com/upload/hammer_png3886.png You can learn a lot about
More informationLarge Scale Semi-supervised Linear SVM with Stochastic Gradient Descent
Journal of Computational Information Systems 9: 15 (2013) 6251 6258 Available at http://www.jofcis.com Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent Xin ZHOU, Conghui ZHU, Sheng
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLocal Learning Projections
Mingrui Wu mingrui.wu@tuebingen.mpg.de Max Planck Institute for Biological Cybernetics, Tübingen, Germany Kai Yu kyu@sv.nec-labs.com NEC Labs America, Cupertino CA, USA Shipeng Yu shipeng.yu@siemens.com
More informationMicroarray Data Analysis: Discovery
Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationCENG 793. On Machine Learning and Optimization. Sinan Kalkan
CENG 793 On Machine Learning and Optimization Sinan Kalkan 2 Now Introduction to ML Problem definition Classes of approaches K-NN Support Vector Machines Softmax classification / logistic regression Parzen
More informationLinks between Perceptrons, MLPs and SVMs
Links between Perceptrons, MLPs and SVMs Ronan Collobert Samy Bengio IDIAP, Rue du Simplon, 19 Martigny, Switzerland Abstract We propose to study links between three important classification algorithms:
More informationManifold Coarse Graining for Online Semi-supervised Learning
for Online Semi-supervised Learning Mehrdad Farajtabar, Amirreza Shaban, Hamid R. Rabiee, Mohammad H. Rohban Digital Media Lab, Department of Computer Engineering, Sharif University of Technology, Tehran,
More informationA Scalable Kernel-Based Algorithm for Semi-Supervised Metric Learning
A Scalable Kernel-Based Algorithm for Semi-Supervised Metric Learning Dit-Yan Yeung, Hong Chang, Guang Dai Department of Computer Science and Engineering Hong Kong University of Science and Technology
More informationData-dependent representations: Laplacian Eigenmaps
Data-dependent representations: Laplacian Eigenmaps November 4, 2015 Data Organization and Manifold Learning There are many techniques for Data Organization and Manifold Learning, e.g., Principal Component
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Andreas Maletti Technische Universität Dresden Fakultät Informatik June 15, 2006 1 The Problem 2 The Basics 3 The Proposed Solution Learning by Machines Learning
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More information