Graph-Based Semi-Supervised Learning


Graph-Based Semi-Supervised Learning
Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux
Université de Montréal - CIAR Workshop, April 26th, 2005

Outline
1. Semi-Supervised Setting
2. Graph Regularization and Label Propagation
3. Transduction vs. Induction
4. Curse of Dimensionality

Semi-Supervised Learning for Dummies
Task of binary classification with labels y_i ∈ {−1, 1}.
Semi-supervised = learn something about labels using both labeled and unlabeled data:
X = (x_1, x_2, ..., x_n), Y_l = (y_1, y_2, ..., y_l), n = l + u.
Transduction: Ŷ_u = (ŷ_{l+1}, ŷ_{l+2}, ..., ŷ_n).
Induction: ŷ : x ↦ ŷ(x).

The Classical Two-Moon Problem

Where are Manifolds and Kernels?
What is a good labeling Ŷ = (Ŷ_l, Ŷ_u)?
1. one that is consistent with the given labels: Ŷ_l ≈ Y_l
2. one that is smooth on the manifold where the data lie (manifold / cluster assumption): ŷ_i ≈ ŷ_j when x_i is close to x_j
Cost function:
C(Ŷ) = Σ_{i=1}^{l} (ŷ_i − y_i)² + (µ/2) Σ_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²
with W_ij = W_X(x_i, x_j) a positive weighting function (e.g. a Gaussian kernel).
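To make the cost concrete, here is a minimal numpy sketch; the Gaussian kernel (and its width sigma) is an assumption standing in for whatever weighting function the slide leaves implicit, and labeled points are assumed to come first in the ordering:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # W_ij = exp(-||x_i - x_j||^2 / sigma^2): one common (assumed) choice of W_X
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

def cost(Y_hat, Y_l, W, mu):
    # C(Y_hat) = sum_{i<=l} (y_hat_i - y_i)^2
    #          + (mu/2) * sum_{i,j} W_ij (y_hat_i - y_hat_j)^2
    l = len(Y_l)
    fit = ((Y_hat[:l] - Y_l) ** 2).sum()
    smooth = 0.5 * mu * (W * (Y_hat[:, None] - Y_hat[None, :]) ** 2).sum()
    return fit + smooth
```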

Graphs Would be Cool too!
Nodes = points, edges: (i, j) is an edge iff W_ij > 0.

Regularization on Graph
Graph Laplacian: L_ii = Σ_{j≠i} W_ij and L_ij = −W_ij (i ≠ j).
C(Ŷ) = Σ_{i=1}^{l} (ŷ_i − y_i)² + (µ/2) Σ_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)² = ‖Ŷ_l − Y_l‖² + µ Ŷᵀ L Ŷ
C(Ŷ) is minimized when (S + µL) Ŷ = S Y, where S is the diagonal matrix with S_ii = 1 for labeled points and 0 otherwise: a linear system with n unknowns and n equations.
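A direct-solve sketch of this system, under the same assumption that the l labeled points come first; this is O(n³) as written, which motivates the label propagation and subset tricks that follow:

```python
import numpy as np

def solve_graph_regularization(W, Y_l, mu):
    # Minimize C(Y_hat) by solving (S + mu * L) Y_hat = S Y,
    # with Y zero-padded on the unlabeled points.
    n, l = W.shape[0], len(Y_l)
    L = np.diag(W.sum(axis=1)) - W        # L_ii = sum_j W_ij, L_ij = -W_ij
    S = np.zeros((n, n))
    S[:l, :l] = np.eye(l)                 # diagonal indicator of labeled points
    Y = np.concatenate([Y_l, np.zeros(n - l)])
    return np.linalg.solve(S + mu * L, S @ Y)
```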

From Matrix Inversion to Label Propagation
The linear system rewrites, for a labeled point,
ŷ_i^{(t+1)} = (Σ_j W_ij ŷ_j^{(t)} + (1/µ) y_i) / (Σ_j W_ij + 1/µ)
and for an unlabeled point,
ŷ_i^{(t+1)} = Σ_j W_ij ŷ_j^{(t)} / Σ_j W_ij.
This is a Jacobi or Gauss-Seidel iteration algorithm.
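A Jacobi-style sketch of these two updates (assumptions: W has zero diagonal, every node has at least one neighbor, labeled points come first, and a fixed iteration count stands in for a convergence test):

```python
import numpy as np

def label_propagation(W, Y_l, mu, n_iter=200):
    # Iterate the labeled and unlabeled update rules above.
    n, l = W.shape[0], len(Y_l)
    deg = W.sum(axis=1)                   # sum_j W_ij
    y = np.zeros(n)
    for _ in range(n_iter):
        prop = W @ y                      # sum_j W_ij * y_j^(t)
        y_next = prop / deg               # unlabeled update
        y_next[:l] = (prop[:l] + Y_l / mu) / (deg[:l] + 1.0 / mu)  # labeled update
        y = y_next
    return y
```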

No, I didn't come up with that last weekend
X. Zhu, Z. Ghahramani and J. Lafferty (2003): Semi-supervised learning using Gaussian fields and harmonic functions
D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, B. Schölkopf (2004): Learning with local and global consistency
M. Belkin, I. Matveeva and P. Niyogi (2004): Regularization and semi-supervised learning on large graphs

Remember your Physics class?
Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph = electric network with resistors between nodes: R_ij = 1/W_ij.
Ohm's law (potential = label): ŷ_j − ŷ_i = R_ij I_ij.
Kirchhoff's law on an unlabeled node i: Σ_j I_ij = 0.
This yields the same linear system as minimizing C(Ŷ) over Ŷ_u only.
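The analogy can be checked numerically: at every unlabeled node the net current should vanish, and for the minimizer of C over Ŷ_u this holds exactly. A small sketch reusing the solver above (rows l onward are the unlabeled nodes):

```python
import numpy as np

def max_kirchhoff_violation(W, Y_hat, l):
    # I_ij = (y_hat_j - y_hat_i) / R_ij with R_ij = 1 / W_ij;
    # sum_j I_ij should be ~0 (up to numerical error) at every unlabeled node.
    I = W * (Y_hat[None, :] - Y_hat[:, None])
    return np.abs(I[l:].sum(axis=1)).max()
```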

From Transduction to Induction
Solving the linear system → Ŷ (transduction).
For a new point x, with Ŷ already computed, minimize
C(ŷ(x)) = C(Ŷ) + (µ/2) Σ_{i=1}^{n} W_X(x_i, x) (ŷ_i − ŷ(x))²
which gives
ŷ(x) = Σ_{i=1}^{n} W_X(x_i, x) ŷ_i / Σ_{i=1}^{n} W_X(x_i, x).
(Induction like Parzen windows, but using the estimated labels Ŷ.)
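A sketch of this induction formula, again assuming the Gaussian W_X from earlier:

```python
import numpy as np

def induce(x, X, Y_hat, sigma=1.0):
    # Parzen-windows-like prediction, but averaging the *estimated* labels Y_hat.
    w = np.exp(-((X - x) ** 2).sum(axis=1) / sigma ** 2)
    return (w * Y_hat).sum() / w.sum()
```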

Faster Training from Subset
Previous algorithms are at least quadratic in n.
The induction formula could be used to train on a subset S only.
Better: minimize the full cost over Ŷ_S only.
For x_i ∈ R = X \ S:
ŷ_i = Σ_{j∈S} W_ij ŷ_j / Σ_{j∈S} W_ij
i.e. Ŷ_R = W̄_RS Ŷ_S, with W̄_RS the row-normalized weight matrix between R and S: the cost C(Ŷ) now only depends on Ŷ_S.
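In code, eliminating Ŷ_R is one normalized matrix product; W_RS holds the weights between points of R (rows) and points of S (columns), and each point of R is assumed to have nonzero weight to at least one point of S:

```python
import numpy as np

def extend_to_rest(W_RS, Y_S):
    # y_i = sum_{j in S} W_ij * y_j / sum_{j in S} W_ij, for every i in R,
    # i.e. Y_R = row-normalized(W_RS) @ Y_S.
    return (W_RS @ Y_S) / W_RS.sum(axis=1)
```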

Nice Cost, isn't it?
C(Ŷ_S) = (µ/2) Σ_{i,j∈R} W_ij (ŷ_i − ŷ_j)²   [C_RR]
       + µ Σ_{i∈R, j∈S} W_ij (ŷ_i − ŷ_j)²   [C_RS]
       + (µ/2) Σ_{i,j∈S} W_ij (ŷ_i − ŷ_j)²   [C_SS]
       + Σ_{i∈L} (ŷ_i − y_i)²   [C_L]

Let's Make it Simpler
Computing C_RR is quadratic in n → just get rid of it.
Linear system with |S| = m ≪ n unknowns → much faster.
Still need to do matrix multiplications → scales as O(m²n).
This slide looked pretty empty with only three points.
It is important to have reading material when nobody understands your accent.

Subset Selection
1. Random: fast, easy, crappy. Main problem = it does not fill the space well enough → bad approximation by the induction formula (some points have no near neighbors in the subset).
2. Heuristic: greedy construction of the subset, starting with the labeled points and iteratively minimizing Σ_{j∈S} W_ij over i ∉ S, i.e. choosing the point x_i farthest from the current subset; see the sketch below. (Additional tricks to eliminate outliers and sample more points near the decision surface.)
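A sketch of the greedy heuristic; the outlier and decision-surface refinements mentioned above are left out, and labeled_idx is assumed nonempty:

```python
import numpy as np

def greedy_subset(W, labeled_idx, m):
    # Start S with the labeled points; repeatedly add the point i not in S that
    # minimizes sum_{j in S} W_ij, i.e. the point farthest from the current S.
    S = list(labeled_idx)
    coverage = W[:, S].sum(axis=1)        # sum_{j in S} W_ij for every point i
    coverage[S] = np.inf                  # subset members are never re-selected
    while len(S) < m:
        i = int(np.argmin(coverage))      # farthest point from S
        S.append(i)
        coverage += W[:, i]               # account for the new subset member
        coverage[i] = np.inf
    return S
```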

Experimental Results
This is not last-minute research so we do have results!
Table: Comparative Classification Error (Induction) — methods NoSub, RandSub (subset only), RandSub and SmartSub on LETTERS, MNIST and COVTYPE, at several labeled fractions (1%, ...). [table entries not recoverable]
More comparisons between RandSub and SmartSub on 8 more UCI datasets → SmartSub always performs better.

Curse of Dimensionality
Labeling function: ŷ(x) = Σ_i ŷ_i W_X(x_i, x).
Locality: for a far-away point, the prediction tends to that of the nearest neighbor; also, the normal vector ∂ŷ/∂x(x) is approximately in the span of the nearest neighbors of x.
Smoothness: ŷ does not vary much within a ball of radius small w.r.t. σ (obtained from the second derivative); also, by the form of the cost, the learned function varies smoothly in regions with no labeled examples.
Curse: one needs lots of unlabeled examples to fill the region near the decision surface, and lots of labeled examples to account for all clusters.

Almost Real-Life Example
Need unlabeled examples along the sinusoidal decision surface, and labeled examples in each class region.

Conclusion
Simple non-parametric setting → powerful non-parametric semi-supervised algorithm.
Can scale to large datasets thanks to sparsity / subset selection.
Interesting links with electric networks / heat diffusion.
Limitations of local weights: curse of dimensionality.
Makes it possible to be on time for lunch!!!

References
Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Shawe-Taylor, J. and Singer, Y., editors, COLT 2004. Springer.
Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics.
Doyle, P. G. and Snell, J. L. (1984). Random walks and electric networks. Mathematical Association of America.
Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.
Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003.
