Multiple Kernel Self-Organizing Maps


1 Multiple Kernel Self-Organizing Maps. Madalina Olteanu (2), Nathalie Villa-Vialaneix (1,2), Christine Cierco-Ayrolles (1). ESANN 2013, Bruges, April 24th, 2013.

2 Outline. 1. Introduction; 2. MK-SOM; 3. Applications.

3-5 Introduction: data, notations and objectives.
Data: $D$ datasets $(x_i^d)_{i=1,\dots,n,\ d=1,\dots,D}$, all measured on the same individuals or objects $\{1,\dots,n\}$, each taking values in an arbitrary space $\mathcal{G}_d$.
Examples: the $(x_i^d)_i$ are $n$ observations of $p$ numeric variables; $n$ nodes of a graph; $n$ observations of $p$ factors; $n$ texts/labels...
Purpose: combine all the datasets to obtain a map of the individuals/objects, i.e., a self-organizing map that clusters $\{1,\dots,n\}$ using all the datasets.

6 Introduction: example [Villa-Vialaneix et al., 2013].
Data: a network with labeled nodes. Examples: gender in a social network, functional group of a gene in a gene interaction network...
Purpose: find communities, i.e., groups of nodes that are close in the graph (densely connected inside and sharing comparatively few links with the other groups), but that also have similar labels.


8 MK-SOM: kernels and multiple kernels.
What is a kernel? Given $(x_i)_i$ in $\mathcal{G}$, a matrix $(K(x_i,x_j))_{ij}$ such that $K(x_i,x_j)=K(x_j,x_i)$ and, for all $(\alpha_i)_i$, $\sum_{ij}\alpha_i\alpha_j K(x_i,x_j)\geq 0$. In this case [Aronszajn, 1950], there exist a Hilbert space $(\mathcal{H},\langle\cdot,\cdot\rangle)$ and a map $\Phi:\mathcal{G}\to\mathcal{H}$ such that $K(x_i,x_j)=\langle\Phi(x_i),\Phi(x_j)\rangle$.
Examples: nodes of a graph: heat kernel $K=e^{-\beta L}$ or $K=L^+$, where $L$ is the graph Laplacian [Kondor and Lafferty, 2002, Smola and Kondor, 2003, Fouss et al., 2007]; numerical variables: Gaussian kernel $K(x_i,x_j)=e^{-\beta\|x_i-x_j\|^2}$; texts: [Watkins, 2000]...
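To make these examples concrete, here is a minimal numpy sketch of the kernels above (the function names and the eigendecomposition route for the heat kernel are our choices, not from the slides):

```python
import numpy as np

def gaussian_kernel(X, beta=1.0):
    """Gaussian kernel K(x_i, x_j) = exp(-beta * ||x_i - x_j||^2), rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-beta * d2)

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A of an adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def heat_kernel(A, beta=1.0):
    """Heat (diffusion) kernel K = exp(-beta * L), via eigendecomposition."""
    w, V = np.linalg.eigh(laplacian(A).astype(float))
    return V @ np.diag(np.exp(-beta * w)) @ V.T

def pinv_laplacian_kernel(A):
    """K = L^+, the Moore-Penrose pseudo-inverse of the Laplacian."""
    return np.linalg.pinv(laplacian(A).astype(float))
```

Each of these matrices is symmetric and positive semi-definite by construction, so it satisfies the definition above.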

9-10 MK-SOM: kernel SOM [Mac Donald and Fyfe, 2000, Andras, 2002, Villa and Rossi, 2007].
$(x_i)_i \in \mathcal{G}$ is described by a kernel $(K(x_i,x_{i'}))_{i,i'}$. Prototypes are defined in $\mathcal{H}$, $p_j=\sum_i \gamma_{ji}\Phi(x_i)$, and distances are computed in $\mathcal{H}$:
1: Prototype initialization: randomly set $p_j^0=\sum_{i=1}^n \gamma_{ji}^0 \Phi(x_i)$ with $\gamma_{ji}^0\geq 0$ and $\sum_i \gamma_{ji}^0=1$
2: for $t=1,\dots,T$ do
3:   Randomly choose $i\in\{1,\dots,n\}$
4:   Assignment: $f^t(x_i)\leftarrow \arg\min_{j=1,\dots,M} \|\Phi(x_i)-p_j^{t-1}\|_{\mathcal{H}}^2 = \arg\min_{j=1,\dots,M} \sum_{l,l'}\gamma_{jl}^{t-1}\gamma_{jl'}^{t-1}K(x_l,x_{l'}) - 2\sum_l \gamma_{jl}^{t-1}K(x_l,x_i)$
5:   for all $j=1,\dots,M$ do (Representation)
6:     $\gamma_j^t \leftarrow \gamma_j^{t-1} + \mu^t h^t(\delta(f^t(x_i),j))\,(\mathbf{1}_i-\gamma_j^{t-1})$
7:   end for
8: end for
where $\delta$ is the distance between neurons on the grid, $h^t$ is a decreasing function (and $(h^t)_t$ shrinks with $t$), $\mu^t\sim 1/t$, and $\mathbf{1}_i$ is the indicator vector of $i$.
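A compact numpy sketch of one online step of this algorithm (the helper name and the toy setup are ours; the neighborhood matrix is assumed to hold $h(\delta(j,j'))$ and is kept fixed across iterations for brevity, whereas the slides shrink $h^t$ with $t$):

```python
import numpy as np

def ksom_step(K, gamma, i, neighborhood, mu):
    """One online kernel-SOM step: assignment then representation.

    K:            (n, n) kernel matrix
    gamma:        (M, n) prototype coefficients (rows on the simplex)
    i:            index of the randomly chosen observation
    neighborhood: (M, M) matrix of h(delta(j, j'))
    mu:           learning rate mu^t
    """
    # Assignment: squared distance in H between Phi(x_i) and each p_j.
    # The constant K(x_i, x_i) term does not change the arg min but is
    # kept so that `dist` is the actual squared distance.
    quad = np.sum((gamma @ K) * gamma, axis=1)      # gamma_j K gamma_j^T
    dist = K[i, i] - 2.0 * gamma @ K[:, i] + quad
    best = int(np.argmin(dist))
    # Representation: move every prototype towards the indicator of x_i,
    # weighted by its grid neighborhood to the winning unit.
    one_i = np.zeros(gamma.shape[1])
    one_i[i] = 1.0
    gamma += mu * neighborhood[best][:, None] * (one_i - gamma)
    return best

# Toy usage on a 3x3 map with a Gaussian kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
K = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))
gamma = rng.dirichlet(np.ones(50), size=9)          # random simplex rows
grid = np.array([(a, b) for a in range(3) for b in range(3)])
H = np.exp(-np.abs(grid[:, None] - grid[None]).sum(-1))   # h(delta(j, j'))
for t in range(1, 501):
    ksom_step(K, gamma, int(rng.integers(50)), H, mu=1.0 / t)
```

Note that the update keeps each row of gamma on the simplex as long as $\mu^t h^t \leq 1$, since the new row is a convex combination of the old row and the indicator vector.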

11-12 MK-SOM: combining kernels.
Suppose that each $(x_i^d)_i$ is described by a kernel $K^d$. The kernels can then be combined (e.g., [Rakotomamonjy et al., 2008] for SVM): $K=\sum_{d=1}^D \alpha_d K^d$, with $\alpha_d\geq 0$ and $\sum_d \alpha_d=1$.
Remark: this is also useful to integrate different types of information coming from different kernels computed on the same dataset.
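A one-function sketch of the combination (a nonnegative sum of positive semi-definite matrices is again positive semi-definite, so the result is itself a valid kernel):

```python
import numpy as np

def combine_kernels(kernels, alpha):
    """Convex combination K = sum_d alpha_d K^d of kernel matrices.

    kernels: list of D (n, n) kernel matrices
    alpha:   D nonnegative weights summing to 1
    """
    alpha = np.asarray(alpha, dtype=float)
    assert np.all(alpha >= 0.0) and np.isclose(alpha.sum(), 1.0)
    # Each K^d is PSD, so this nonnegative sum is PSD as well.
    return sum(a * Kd for a, Kd in zip(alpha, kernels))
```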

13-14 MK-SOM: multiple kernel SOM.
1: Prototype initialization: randomly set $p_j^0=\sum_{i=1}^n \gamma_{ji}^0\Phi(x_i)$ with $\gamma_{ji}^0\geq 0$ and $\sum_i\gamma_{ji}^0=1$
2: Kernel initialization: set $(\alpha_d^0)_d$ with $\alpha_d^0\geq 0$ and $\sum_d\alpha_d^0=1$ (e.g., $\alpha_d^0=1/D$); $K^0\leftarrow\sum_d \alpha_d^0 K^d$
3: for $t=1,\dots,T$ do
4:   Randomly choose $i\in\{1,\dots,n\}$
5:   Assignment: $f^t(x_i)\leftarrow\arg\min_{j=1,\dots,M}\|\Phi(x_i)-p_j^{t-1}\|^2_{\mathcal{H}(K^t)} = \arg\min_{j=1,\dots,M}\sum_{l,l'}\gamma_{jl}^{t-1}\gamma_{jl'}^{t-1}K^t(x_l,x_{l'})-2\sum_l\gamma_{jl}^{t-1}K^t(x_l,x_i)$
6:   for all $j=1,\dots,M$ do (Representation)
7:     $\gamma_j^t\leftarrow\gamma_j^{t-1}+\mu^t h^t(\delta(f^t(x_i),j))\,(\mathbf{1}_i-\gamma_j^{t-1})$
8:   end for
9:   empty step ;-)
10: end for

15-17 MK-SOM: tuning $(\alpha_d)_d$ online.
Purpose: minimize, over $(\gamma_{ji})_{ji}$ and $(\alpha_d)_d$, the energy
$$E\big((\gamma_{ji})_{ji}, (\alpha_d)_d\big) = \sum_{i=1}^n \sum_{j=1}^M h\big(\delta(f(x_i), j)\big)\, \big\|\Phi_\alpha(x_i) - p_j^\alpha(\gamma_j)\big\|_\alpha^2.$$
Kernel SOM picks an observation $x_i$ whose contribution to the energy is
$$E_{x_i} = \sum_{j=1}^M h\big(\delta(f(x_i), j)\big)\, \big\|\Phi_\alpha(x_i) - p_j^\alpha(\gamma_j)\big\|_\alpha^2. \qquad (1)$$
Idea: add a stochastic gradient descent step based on the derivative of (1):
$$\frac{\partial E_{x_i}}{\partial \alpha_d} = \sum_{j=1}^M h\big(\delta(f(x_i), j)\big) \Big[ K^d(x_i, x_i) - 2\sum_{l=1}^n \gamma_{jl} K^d(x_i, x_l) + \sum_{l,l'=1}^n \gamma_{jl}\gamma_{jl'} K^d(x_l, x_{l'}) \Big] = \sum_{j=1}^M h\big(\delta(f(x_i), j)\big)\, \big\|\Phi^d(x_i) - p_j^d(\gamma_j)\big\|^2_{\mathcal{H}(K^d)},$$
i.e., the derivative with respect to $\alpha_d$ is the contribution of $x_i$ computed with the single kernel $K^d$; a code sketch follows.
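A numpy sketch of this derivative (the helper is hypothetical; for each $d$ it is just the kernel-SOM distance computed with $K^d$ alone, weighted by the neighborhood of the winning unit):

```python
import numpy as np

def dE_dalpha(kernels, gamma, i, h_win):
    """Derivative of the instantaneous energy E_{x_i} w.r.t. each alpha_d.

    kernels: list of D (n, n) single-source kernel matrices K^d
    gamma:   (M, n) prototype coefficients
    i:       index of the current observation
    h_win:   length-M vector of h(delta(f(x_i), j))
    """
    grads = []
    for Kd in kernels:
        # ||Phi^d(x_i) - p_j^d(gamma_j)||^2 in H(K^d), for every unit j
        quad = np.sum((gamma @ Kd) * gamma, axis=1)
        dist = Kd[i, i] - 2.0 * gamma @ Kd[:, i] + quad
        grads.append(float(h_win @ dist))
    return np.array(grads)
```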

18-19 MK-SOM: adaptive multiple kernel SOM.
1: Prototype initialization: randomly set $p_j^0=\sum_{i=1}^n\gamma_{ji}^0\Phi(x_i)$ with $\gamma_{ji}^0\geq 0$ and $\sum_i\gamma_{ji}^0=1$
2: Kernel initialization: set $(\alpha_d^0)_d$ with $\alpha_d^0\geq 0$ and $\sum_d\alpha_d^0=1$ (e.g., $\alpha_d^0=1/D$); $K^0\leftarrow\sum_d\alpha_d^0 K^d$
3: for $t=1,\dots,T$ do
4:   Randomly choose $i\in\{1,\dots,n\}$
5:   Assignment: $f^t(x_i)\leftarrow\arg\min_{j=1,\dots,M}\|\Phi(x_i)-p_j^{t-1}\|^2_{\mathcal{H}(K^t)}$
6:   for all $j=1,\dots,M$ do (Representation)
7:     $\gamma_j^t\leftarrow\gamma_j^{t-1}+\mu^t h^t(\delta(f^t(x_i),j))\,(\mathbf{1}_i-\gamma_j^{t-1})$
8:   end for
9:   Kernel update: $\alpha_d^t\leftarrow\alpha_d^{t-1}-\nu^t\,\frac{\partial E_{x_i}}{\partial\alpha_d}$ and $K^t\leftarrow\sum_d\alpha_d^t K^d$
10: end for
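Putting the pieces together, a compact end-to-end sketch of the adaptive loop, under two assumptions of ours: $\alpha$ is clipped and renormalized onto the simplex after the descent step (the slides only show the gradient step), and the learning rates are illustrative:

```python
import numpy as np

def adaptive_mksom(kernels, grid_h, T=1000, seed=0):
    """Sketch of adaptive MK-SOM: assignment, representation, kernel update.

    kernels: list of D (n, n) kernel matrices K^d
    grid_h:  (M, M) neighborhood matrix h(delta(j, j'))
    """
    rng = np.random.default_rng(seed)
    D, n = len(kernels), kernels[0].shape[0]
    M = grid_h.shape[0]
    gamma = rng.dirichlet(np.ones(n), size=M)
    alpha = np.full(D, 1.0 / D)                 # alpha_d^0 = 1/D
    for t in range(1, T + 1):
        mu, nu = 1.0 / t, 0.1 / t               # illustrative rates
        K = sum(a * Kd for a, Kd in zip(alpha, kernels))
        i = int(rng.integers(n))
        # Assignment in H(K^t)
        dist = K[i, i] - 2.0 * gamma @ K[:, i] + np.sum((gamma @ K) * gamma, axis=1)
        h_win = grid_h[int(np.argmin(dist))]
        # Representation
        one_i = np.zeros(n)
        one_i[i] = 1.0
        gamma += mu * h_win[:, None] * (one_i - gamma)
        # Kernel update: descent on E_{x_i}, then back onto the simplex
        grad = np.array([float(h_win @ (Kd[i, i] - 2.0 * gamma @ Kd[:, i]
                                        + np.sum((gamma @ Kd) * gamma, axis=1)))
                         for Kd in kernels])
        alpha = np.clip(alpha - nu * grad, 0.0, None)
        alpha = alpha / alpha.sum()             # renormalize (our choice)
    return gamma, alpha
```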


21 Applications. Example 1: simulated data.
A graph with 200 nodes split into 8 groups:
graph: two Erdős-Rényi blocks, groups 1 to 4 versus groups 5 to 8, with intra-block edge probability 0.3 and inter-block edge probability 0.01;
numerical data: nodes labeled with 2-dimensional Gaussian vectors, the odd groups drawn from one Gaussian and the even groups from another;
factor with two levels: groups 1, 2, 5 and 7 take the first level, the other groups the second.
Only the knowledge of all three datasets can discriminate the 8 groups.
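A sketch of one way to generate such a dataset (the Gaussian means below are illustrative placeholders, not the values used on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
groups = np.repeat(np.arange(1, 9), 25)       # 200 nodes, 8 groups of 25

# Graph: two Erdos-Renyi blocks (groups 1-4 vs 5-8), intra-block edge
# probability 0.3, inter-block probability 0.01.
block = groups <= 4
p = np.where(block[:, None] == block[None, :], 0.3, 0.01)
A = np.triu(rng.random((200, 200)) < p, 1)
A = (A | A.T).astype(int)                     # symmetric, no self-loops

# Numerical data: one 2-d Gaussian for the odd groups, another for the
# even groups; these means are placeholders only.
means = {1: np.array([0.0, 0.0]), 0: np.array([3.0, 3.0])}
X = np.vstack([rng.normal(means[g % 2], 1.0) for g in groups])

# Factor with two levels: groups 1, 2, 5 and 7 take the first level.
factor = np.isin(groups, [1, 2, 5, 7]).astype(int)
```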

22-23 Applications: experiment.
Kernels: graph: $L^+$; numerical data: Gaussian kernel; factor: another Gaussian kernel on the disjunctive coding.
Comparison, on 100 randomly generated datasets as described previously: multiple kernel SOM with all three datasets; kernel SOM with a single dataset; kernel SOM with two datasets or all three datasets stacked into a single (Gaussian) kernel.

24 Applications: an example. [Figure: the maps obtained with MK-SOM, with all data in one kernel, with the graph only, and with the numerical variables only.]

25 Applications: numerical comparison I (over 100 simulations), with the mutual information
$$\sum_{i,j}\frac{|C_i\cap C_j'|}{200}\,\log\frac{200\,|C_i\cap C_j'|}{|C_i|\,|C_j'|}$$
(adjusted version, equal to 1 if the two partitions are identical [Danon et al., 2005]).
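A numpy sketch of this criterion, with the normalization to 1 for identical partitions computed as in [Danon et al., 2005] (our reading: $2I(a,b)/(H(a)+H(b))$):

```python
import numpy as np

def mutual_information(a, b):
    """sum_ij (n_ij / n) * log(n * n_ij / (n_i * n_j)) over the classes
    C_i of partition a and C_j of partition b, with n_ij = |C_i inter C_j|."""
    a, b = np.asarray(a), np.asarray(b)
    n, mi = len(a), 0.0
    for ca in np.unique(a):
        for cb in np.unique(b):
            n_ij = np.sum((a == ca) & (b == cb))
            if n_ij > 0:
                mi += (n_ij / n) * np.log(n * n_ij / (np.sum(a == ca) * np.sum(b == cb)))
    return mi

def normalized_mi(a, b):
    """Adjusted version: 2 I(a, b) / (H(a) + H(b)), equal to 1 when the
    two partitions are identical."""
    def entropy(x):
        p = np.unique(x, return_counts=True)[1] / len(x)
        return -np.sum(p * np.log(p))
    return 2.0 * mutual_information(a, b) / (entropy(a) + entropy(b))
```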

26 Applications: numerical comparison II (over 100 simulations), with node purity: the average percentage of the nodes of a neuron that come from the neuron's majority (original) class.
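A short sketch of this purity measure (the helper name is ours):

```python
import numpy as np

def node_purity(clusters, labels):
    """Average over neurons of the percentage of nodes belonging to the
    neuron's majority (original) class."""
    clusters, labels = np.asarray(clusters), np.asarray(labels)
    purities = []
    for c in np.unique(clusters):
        members = labels[clusters == c]
        _, counts = np.unique(members, return_counts=True)
        purities.append(counts.max() / len(members))
    return 100.0 * float(np.mean(purities))
```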

27-29 Applications: conclusion.
Summary: the method integrates information from multiple sources into a SOM (e.g., finding communities in a graph while taking labels into account); it uses multiple kernels and automatically tunes their combination; it yields communities that are relevant to all sources of information, together with a well-organized map; a similar approach has also been tested for relational SOM ([Massoni et al., 2013], for analyzing school-to-work transitions).
Possible developments: the main issue is computational time; a sparse version is currently under study.
Thank you for your attention...

30-31 References.
Andras, P. (2002). Kernel-Kohonen networks. International Journal of Neural Systems, 12.
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337-404.
Danon, L., Díaz-Guilera, A., Duch, J., and Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, P09008.
Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355-369.
Kondor, R. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning (ICML 2002).
Mac Donald, D. and Fyfe, C. (2000). The kernel self organising map. In Proceedings of the 4th International Conference on Knowledge-Based Intelligent Engineering Systems and Applied Technologies.
Massoni, S., Olteanu, M., and Villa-Vialaneix, N. (2013). Which distance use when extracting typologies in sequence analysis? An application to school to work transitions. In International Work Conference on Artificial Neural Networks (IWANN 2013), Puerto de la Cruz, Tenerife. Forthcoming.
Rakotomamonjy, A., Bach, F., Canu, S., and Grandvalet, Y. (2008). SimpleMKL. Journal of Machine Learning Research, 9:2491-2521.
Smola, A. and Kondor, R. (2003). Kernels and regularization on graphs. In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, Lecture Notes in Computer Science.
Villa, N. and Rossi, F. (2007). A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph. In 6th International Workshop on Self-Organizing Maps (WSOM 2007), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.
Villa-Vialaneix, N., Olteanu, M., and Cierco-Ayrolles, C. (2013). Carte auto-organisatrice pour graphes étiquetés. In Actes des Ateliers FGG (Fouille de Grands Graphes), colloque EGC (Extraction et Gestion de Connaissances), Toulouse, France.
Watkins, C. (2000). Dynamic alignment kernels. In Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D., editors, Advances in Large Margin Classifiers, pages 39-50, Cambridge, MA, USA. MIT Press.
