Graphs in Machine Learning
1 Graphs in Machine Learning
Michal Valko
INRIA Lille - Nord Europe, France
Partially based on material by: Mikhail Belkin, Jerry Zhu, Olivier Chapelle, Branislav Kveton
February 10, 2015, MVA 2014/2015
2 Previous Lecture
- resistive networks: recommendation score as a resistance?
- Laplacian and resistive networks: computation of effective resistance
- geometry of the data and the connectivity
- spectral clustering: connectivity vs. compactness; MinCut, RatioCut, NCut; spectral relaxations
- manifold learning
Michal Valko Graphs in Machine Learning Lecture 4-2/43
3 This Lecture
- manifold learning with Laplacian eigenmaps
- Gaussian random fields and harmonic solution
- graph-based semi-supervised learning and manifold regularization
- theory of Laplacian-based manifold methods
- transductive learning
- SSL learnability
4 Previous Lab Session
- content: graph construction; test sensitivity to the parameters σ, k, ε
- spectral clustering; spectral clustering vs. k-means; image segmentation
- short written report (graded, each lab around 5% of the grade)
- hint: order 2.1, 2.6 (find the bend), 2.2, 2.3, 2.4, 2.5
- questions / deadline: ~calandri/ta/graphs/td1_handout.pdf
5 Advanced Learning for Text and Graph Data (ALTeGraD)
- time: Wednesdays 8h30-11h30; 4 lectures and 3 labs
- place: Polytechnique / Amphi Sauvy
- lecturer 1: Michalis Vazirgiannis (Polytechnique)
- lecturer 2: Yassine Faihe (Hewlett-Packard - Vertica)
- ALTeGraD and Graphs in ML run in parallel; the two graph courses are coordinated to be complementary
- some of its graph topics are not covered in this course: ranking algorithms and measures (Kendall tau, NDCG); advanced graph generators; community mining and advanced graph clustering; graph degeneracy (k-core & extensions); privacy in graph mining
- contenus-/advanced-learning-for-text-and-graph-data-altegrad kjsp?rh=
6 PhD proposal at CMU and JIE
SYSU-CMU Joint Institute of Engineering (JIE) in Guangzhou, China: international environment, English working language.
Fully-funded PhD positions available at SYSU-CMU JIE:
- single-degree program at SYSU in Guangzhou, China
- double-degree program (selective): 2 years at CMU, Pittsburgh; rest of the time at JIE in Guangzhou, China
Fundamental research with applications in: supercomputing & big data, biomedical applications, autonomous driving, smart grids and power systems.
Contact: paweng@cmu.edu (A New Engineering School)
7 Manifold Learning: Recap
Problem: dimensionality reduction / manifold learning. Given {x_i}_{i=1}^n from R^d, find {y_i}_{i=1}^n in R^m, where m ≪ d.
What do we know about dimensionality reduction?
- representation/visualization (2D or 3D); an old example: globe to a map
- often assuming M ⊆ R^d
- feature extraction
- linear vs. nonlinear dimensionality reduction
What do we know about linear vs. nonlinear methods?
- linear: ICA, PCA, SVD, ...
- nonlinear methods often preserve only local distances
8 Manifold Learning: Linear vs. Non-linear
9 Manifold Learning: Preserving (just) local distances
d(y_i, y_j) = d(x_i, x_j) only if d(x_i, x_j) is small
min_y Σ_{ij} w_ij ‖y_i − y_j‖²
Looks familiar?
10 Manifold Learning: Laplacian Eigenmaps
Step 1: Solve the generalized eigenproblem Lf = λDf.
Step 2: Assign m new coordinates: x_i ↦ (f_2(i), ..., f_{m+1}(i)).
Note 1: we need the m smallest (non-trivial) eigenvectors.
Note 2: f_1 is useless (the constant eigenvector).
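The two steps above fit in a few lines of numpy/scipy; a minimal sketch, where the helper name and the use of `scipy.linalg.eigh` for the generalized eigenproblem are this sketch's choices, not the slides':

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, m):
    """Embed the n nodes of a graph with weight matrix W into R^m."""
    D = np.diag(W.sum(axis=1))
    L = D - W                       # unnormalized graph Laplacian
    # Step 1: generalized eigenproblem L f = lambda D f, eigenvalues ascending
    _, vecs = eigh(L, D)
    # Step 2: drop the constant eigenvector f_1 and keep the next m
    return vecs[:, 1:m + 1]
```

On a k-NN graph built from points sampled along a curve, the first returned coordinate typically recovers the ordering of the points along the curve.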
11 Manifold Learning: Laplacian Eigenmaps to 1D
Laplacian eigenmaps 1D objective:
min_f f^T L f  s.t.  f_i ∈ R,  f^T D 1 = 0,  f^T D f = 1
The meaning of the constraints is similar as for spectral clustering:
- f^T D f = 1 is for scaling
- f^T D 1 = 0 is to not get v_1
What is the solution?
12 Manifold Learning: Example
laplacian-eigenmap-~-diffusion-map-~-manifold-learning
13 Semi-supervised learning: How is it possible? This is how children learn! (hypothesis)
14 Semi-supervised learning (SSL)
SSL problem: definition. Given {x_i}_{i=1}^n from R^d and {y_i}_{i=1}^{n_l}, with n_l ≪ n, find {y_i}_{i=n_l+1}^n (transductive) or find f predicting y well beyond that (inductive).
Some facts about SSL:
- assumes that the unlabeled data is useful
- works with data geometry assumptions: cluster assumption (low-density separation), manifold assumption, smoothness assumptions, generative models, ...
- now it helps, now it does not (sic); there are provable cases when it helps
- inductive or transductive/out-of-sample extension
15 SSL: Self-Training
16 SSL: Overview: Self-Training
Input: L = {x_i, y_i}_{i=1}^{n_l} and U = {x_i}_{i=n_l+1}^n
Repeat: train f using L; apply f to (some of) U and add them to L.
What are the properties of self-training?
- it's a wrapper method
- heavily depends on the internal classifier
- some theory exists for specific classifiers
- nobody uses it anymore
- errors propagate (unless the clusters are well separated)
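The wrapper loop above can be sketched as follows; the nearest-centroid base classifier and the distance-gap confidence rule are illustrative assumptions, not the slides' prescription:

```python
import numpy as np

def self_train(X_l, y_l, X_u, n_rounds=5):
    """Self-training: fit on L, move the most confident points of U into L."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(n_rounds):
        # "train f using L": here f is a nearest-centroid classifier (classes 0/1)
        centroids = np.array([X_l[y_l == c].mean(axis=0) for c in (0, 1)])
        if len(X_u) == 0:
            break
        # "apply f to (some) U": distances to both centroids
        d = np.linalg.norm(X_u[:, None, :] - centroids[None, :, :], axis=2)
        pred = d.argmin(axis=1)
        # confidence = gap between the two distances; absorb the top half
        gap = np.abs(d[:, 0] - d[:, 1])
        take = gap >= np.median(gap)
        X_l, y_l = np.vstack([X_l, X_u[take]]), np.append(y_l, pred[take])
        X_u = X_u[~take]
    return centroids
```

With well-separated clusters the loop behaves well; with overlapping clusters early mistakes get absorbed into L and propagate, which is exactly the failure mode noted above.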
17 SSL: Self-Training: Bad Case
18 SSL: Transductive SVM: S3VM
19 SSL: Transductive SVM: Classical SVM
Linear case: f = w^T x + b; we look for (w, b).
max-margin classification:
max_{w,b} 1/‖w‖  s.t.  y_i(w^T x_i + b) ≥ 1,  i = 1, ..., n_l
max-margin classification, equivalently:
min_{w,b} ‖w‖²  s.t.  y_i(w^T x_i + b) ≥ 1,  i = 1, ..., n_l
20 SSL: Transductive SVM: Classical SVM
max-margin classification, separable case:
min_{w,b} ‖w‖²  s.t.  y_i(w^T x_i + b) ≥ 1,  i = 1, ..., n_l
max-margin classification, non-separable case:
min_{w,b} λ‖w‖² + Σ_i ξ_i  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., n_l
21 SSL: Transductive SVM: Classical SVM
max-margin classification, non-separable case:
min_{w,b} λ‖w‖² + Σ_i ξ_i  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., n_l
Unconstrained formulation:
min_{w,b} Σ_{i=1}^{n_l} max(1 − y_i(w^T x_i + b), 0) + λ‖w‖²
In general:
min_{w,b} Σ_{i=1}^{n_l} V(x_i, y_i, f(x_i)) + λ Ω(f)
22 SSL: Transductive SVM: Unlabeled Examples
min_{w,b} Σ_{i=1}^{n_l} max(1 − y_i(w^T x_i + b), 0) + λ‖w‖²
How to incorporate unlabeled examples? There are no y's for the unlabeled x.
Prediction of f for (any) x: ŷ = sgn(f(x)) = sgn(w^T x + b).
Pretending that sgn(f(x)) is the true label:
V(x, ŷ, f(x)) = max(1 − ŷ(w^T x + b), 0) = max(1 − sgn(w^T x + b)(w^T x + b), 0) = max(1 − |w^T x + b|, 0)
23 SSL: Transductive SVM: Hinge and Hat Loss
What is the difference in the objectives?
- hinge loss penalizes: the margin of being on the wrong side
- hat loss penalizes: predicting in the margin
24 SSL: Transductive SVM: S3VM
This is what we wanted!
25 SSL: Transductive SVM: Formulation
Main SVM idea stays: penalize the margin.
min_{w,b} Σ_{i=1}^{n_l} max(1 − y_i(w^T x_i + b), 0) + λ_1‖w‖² + λ_2 Σ_{i=n_l+1}^{n_l+n_u} max(1 − |w^T x_i + b|, 0)
What is the loss and what is the regularizer? Think of the unlabeled data as a regularizer for your classifier!
Practical hint: additionally enforce the class balance.
Another problem: the optimization is difficult.
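The two-part objective can be written down directly; this sketch only evaluates it for given (w, b) (the hard non-convex minimization over w and b is exactly the "optimization is difficult" part):

```python
import numpy as np

def s3vm_objective(w, b, X_l, y_l, X_u, lam1=1.0, lam2=1.0):
    """S3VM objective: hinge loss on labeled points, hat loss on unlabeled
    points, plus the margin regularizer lam1 * ||w||^2."""
    f_l = X_l @ w + b
    f_u = X_u @ w + b
    hinge = np.maximum(1 - y_l * f_l, 0.0).sum()   # wrong side of the margin
    hat = np.maximum(1 - np.abs(f_u), 0.0).sum()   # prediction inside the margin
    return hinge + lam1 * (w @ w) + lam2 * hat
```

The hat term pushes the decision boundary into low-density regions of the unlabeled data, which is why a class-balance constraint is added in practice (otherwise the trivial "everything on one side" solutions score well).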
26 SSL with Graphs: Prehistory
Blum/Chawla: Learning from Labeled and Unlabeled Data using Graph Mincuts
(*following some insights from vision research in the 1980s)
27 SSL with Graphs: MinCut
MinCut SSL: an idea similar to MinCut clustering.
Where is the link? Connected classes, not necessarily compact ones.
What is the formal statement? We look for f(x) ∈ {±1}:
cut = Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))² = Ω(f)
Why (f(x_i) − f(x_j))² and not |f(x_i) − f(x_j)|? It does not matter: on ±1 values the two differ only by a constant factor.
28 SSL with Graphs: MinCut
We look for f(x) ∈ {±1}:
Ω(f) = Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))²
Clustering was unsupervised; here we have supervised data. Recall the general objective framework:
min_f Σ_{i=1}^{n_l} V(x_i, y_i, f(x_i)) + λ Ω(f)
It would be nice if we matched the prediction on the labeled data:
V(x, y, f(x)) = (f(x) − y)²
29 SSL with Graphs: MinCut
Final objective function:
min_{f ∈ {±1}^{n_l+n_u}} Σ_{i=1}^{n_l} (f(x_i) − y_i)² + λ Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))²
This is an integer program :( Can we solve it? It is still just MinCut.
Are we happy? There are six solutions, all equivalent. We need a better way to reflect the confidence.
30 SSL with Graphs: Harmonic Functions
Zhu/Ghahramani/Lafferty: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
(*a seminal paper that convinced people to use graphs for SSL)
Idea 1: Look for a unique solution. Idea 2: Find a smooth one (the harmonic solution).
Harmonic SSL:
1) As before, we constrain f to match the supervised data: f(x_i) = y_i for i ∈ {1, ..., n_l}.
2) We enforce the solution f to be harmonic:
f(x_i) = (Σ_{j∼i} f(x_j) w_ij) / (Σ_{j∼i} w_ij)  for i ∈ {n_l+1, ..., n_l+n_u}
31 SSL with Graphs: Harmonic Functions
The harmonic solution is obtained from the MinCut one ...
min_{f ∈ {±1}^{n_l+n_u}} Σ_{i=1}^{n_l} (f(x_i) − y_i)² + λ Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))²
... if we just relax the integer constraints to be real ...
min_{f ∈ R^{n_l+n_u}} Σ_{i=1}^{n_l} (f(x_i) − y_i)² + λ Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))²
... or equivalently (note that f(x_i) = f_i) ...
min_{f ∈ R^{n_l+n_u}} Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))²  s.t.  y_i = f(x_i),  i = 1, ..., n_l
32 SSL with Graphs: Harmonic Functions
Properties of the relaxation from ±1 to R:
- there is a closed-form solution for f
- this solution is unique and globally optimal
- it is either constant or has its maximum/minimum on a boundary
- f(x_i) may not be discrete, but we can threshold it
- random walk interpretation
- electric networks interpretation
33 SSL with Graphs: Harmonic Functions
Random walk interpretation:
1) start from the vertex to label and follow the random walk with transition matrix P = D^{-1} W, i.e., P(j | i) = w_ij / Σ_k w_ik
2) finish when a labeled vertex is hit (absorbing random walk)
f_i = probability of reaching a positively labeled vertex
34 SSL with Graphs: Harmonic Functions
How to compute the HS? Option A: iteration/propagation.
Step 1: Set f(x_i) = y_i for i = 1, ..., n_l.
Step 2: Propagate iteratively (only for the unlabeled nodes):
f(x_i) ← (Σ_{j∼i} f(x_j) w_ij) / (Σ_{j∼i} w_ij)  for i ∈ {n_l+1, ..., n_l+n_u}
Properties:
- this will converge to the harmonic solution
- we can set the initial values for unlabeled nodes arbitrarily
- an interesting option for large-scale data
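Option A in a few lines of numpy; the labeled-first node ordering and the fixed iteration count are assumptions of this sketch:

```python
import numpy as np

def harmonic_propagation(W, y_l, n_iter=200):
    """Iterate toward the harmonic solution: clamp the n_l labeled nodes and
    repeatedly replace each unlabeled value by its weighted neighbor average."""
    n, n_l = W.shape[0], len(y_l)
    f = np.zeros(n)                # initial unlabeled values are arbitrary
    f[:n_l] = y_l                  # Step 1: labeled values are fixed
    deg = W.sum(axis=1)
    for _ in range(n_iter):
        # Step 2: propagate, updating the unlabeled entries only
        f[n_l:] = (W @ f)[n_l:] / deg[n_l:]
    return f
```

Each sweep costs one sparse matrix-vector product, which is why this is the interesting option for large-scale data.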
35 SSL with Graphs: Harmonic Functions
How to compute the HS? Option B: closed-form solution.
Define f = (f(x_1), ..., f(x_{n_l+n_u})) = (f_1, ..., f_{n_l+n_u}).
Ω(f) = Σ_{i,j=1}^{n_l+n_u} w_ij (f(x_i) − f(x_j))² = f^T L f
L is an (n_l + n_u) × (n_l + n_u) matrix:
L = [ L_ll  L_lu ]
    [ L_ul  L_uu ]
How to solve this constrained minimization problem? Yes, Lagrange multipliers are an option, but ...
36 SSL with Graphs: Harmonic Functions
Let us compute the harmonic solution using the harmonic property!
How did we formalize the harmonic property of a circuit? (Lf)_u = 0.
In matrix notation:
[ L_ll  L_lu ] [ f_l ]   [ 0_l ]
[ L_ul  L_uu ] [ f_u ] = [ 0_u ]
f_l is constrained to be y_l, and for f_u we get
L_ul f_l + L_uu f_u = 0_u  ⟹  f_u = L_uu^{-1}(−L_ul f_l) = L_uu^{-1}(W_ul f_l).
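Option B is a single linear solve; the labeled-first node ordering is an assumption of this sketch:

```python
import numpy as np

def harmonic_solution(W, y_l):
    """Closed-form harmonic solution f_u = L_uu^{-1} (W_ul f_l)."""
    n_l = len(y_l)
    L = np.diag(W.sum(axis=1)) - W        # graph Laplacian L = D - W
    L_uu = L[n_l:, n_l:]                  # unlabeled-unlabeled block
    W_ul = W[n_l:, :n_l]                  # unlabeled-labeled block (= -L_ul)
    return np.linalg.solve(L_uu, W_ul @ y_l)
```

`np.linalg.solve` is preferable to forming `L_uu^{-1}` explicitly; for large sparse graphs one would use a sparse solver instead.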
37 SSL with Graphs: Harmonic Functions
Can we see that this calculates the probability of a random walk?
f_u = L_uu^{-1}(−L_ul f_l) = L_uu^{-1}(W_ul f_l)
Note that P = D^{-1} W. Then, equivalently, f_u = (I − P_uu)^{-1} P_ul f_l.
Split the equation into a positive and a negative part:
f_i = [(I − P_uu)^{-1} P_ul f_l]_i = Σ_{j: y_j = +1} [(I − P_uu)^{-1} P_ul]_ij − Σ_{j: y_j = −1} [(I − P_uu)^{-1} P_ul]_ij = p_i^{(+1)} − p_i^{(−1)}
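A quick numerical check, on a small hypothetical chain graph with the labeled nodes listed first, that the Laplacian form and the absorbing-random-walk form agree:

```python
import numpy as np

# Chain 0 - 2 - 3 - 1, where nodes 0 and 1 are labeled +1 and -1.
W = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])
f_l = np.array([1., -1.])
n_l = 2
D = np.diag(W.sum(axis=1))
L = D - W
P = np.linalg.inv(D) @ W                           # P = D^{-1} W
f_u_laplacian = np.linalg.solve(L[n_l:, n_l:], W[n_l:, :n_l] @ f_l)
f_u_walk = np.linalg.solve(np.eye(2) - P[n_l:, n_l:], P[n_l:, :n_l] @ f_l)
assert np.allclose(f_u_laplacian, f_u_walk)        # both give (1/3, -1/3)
```

The identity follows by multiplying L_uu f_u = W_ul f_l on the left by D_uu^{-1}, since D_uu^{-1} L_uu = I − P_uu and D_uu^{-1} W_ul = P_ul.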
38 SSL with Graphs: Regularized Harmonic Functions
f_i = |f_i| sgn(f_i): |f_i| is the confidence, sgn(f_i) the label.
What if a nasty outlier sneaks in? The prediction for the outlier can be hyperconfident :(
How to control the confidence of the inference? We add a sink to the graph: allow the random walk to die!
sink = an artificial label node with value 0, connected to every other vertex.
What will this do to our predictions? It depends on the weight of the sink edges.
39 SSL with Graphs: Regularized Harmonic Functions
How do we compute this regularized random walk?
f_u = (L_uu + γ_g I)^{-1} (W_ul f_l)
How does γ_g influence the HS? What happens to sneaky outliers?
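The sink only changes the linear system by γ_g on the diagonal; a minimal sketch, again assuming the labeled nodes come first (the γ_g values used in the usage note are illustrative):

```python
import numpy as np

def regularized_harmonic(W, y_l, gamma_g):
    """Regularized harmonic solution f_u = (L_uu + gamma_g I)^{-1} (W_ul f_l)."""
    n_l = len(y_l)
    n_u = W.shape[0] - n_l
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(L[n_l:, n_l:] + gamma_g * np.eye(n_u),
                           W[n_l:, :n_l] @ y_l)
```

As γ_g grows, more of the walk's probability mass dies in the sink, so |f_u| shrinks toward 0: a weakly connected outlier can no longer be labeled with high confidence.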
40 SSL with Graphs: Soft Harmonic Functions
Regularized HS objective with Q = L + γ_g I:
min_{f ∈ R^{n_l+n_u}} Σ_{i=1}^{n_l} (f(x_i) − y_i)² + λ f^T Q f
What if we do not really believe that f(x_i) = y_i for all i?
f* = arg min_{f ∈ R^n} (f − y)^T C (f − y) + f^T Q f
C is diagonal with C_ii = c_l for labeled examples and c_u otherwise.
y are pseudo-targets: y_i = the true label for labeled examples, 0 otherwise.
41 SSL with Graphs: Soft Harmonic Functions
f* = arg min_{f ∈ R^n} (f − y)^T C (f − y) + f^T Q f
Closed-form soft harmonic solution: f* = (C^{-1} Q + I)^{-1} y
What are the differences between hard and soft? Not much in practice, but there are provable generalization guarantees for the soft version.
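The soft solution is again one linear solve, since setting the gradient 2C(f − y) + 2Qf to zero gives (C + Q)f = Cy, i.e., f* = (C^{-1}Q + I)^{-1} y. A minimal sketch; the c_l, c_u, γ_g defaults and the labeled-first ordering are illustrative assumptions:

```python
import numpy as np

def soft_harmonic(W, y, n_l, c_l=1.0, c_u=0.01, gamma_g=0.01):
    """Soft harmonic solution f* = (C^{-1} Q + I)^{-1} y with Q = L + gamma_g I;
    y holds true labels for the first n_l entries and 0 pseudo-targets elsewhere."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    Q = L + gamma_g * np.eye(n)
    c = np.where(np.arange(n) < n_l, c_l, c_u)     # diagonal of C
    return np.linalg.solve(np.diag(1.0 / c) @ Q + np.eye(n), y)
```

Choosing c_l ≫ c_u recovers nearly hard label constraints, while a small c_l lets the solution disagree with noisy labels.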
42 SSL with Graphs: Regularized Harmonic Functions
Larger implications of random walks: the random walk relates to the commute distance, which should satisfy (*):
(*) Vertices in the same cluster of the graph have a small commute distance, whereas two vertices in different clusters of the graph have a large commute distance.
Do we have this property for the HS? What if n → ∞?
Luxburg/Radl/Hein: Getting lost in space: Large sample analysis of the commute distance
people/luxburg/publications/luxburgradlhein2010_paperandsupplement.pdf
Solutions? 1) γ_g  2) amplified commute distance  3) L^p  4) L...
The goal of these solutions: make them remember!
43 SequeL INRIA Lille MVA 2014/2015 Michal Valko sequel.lille.inria.fr
More informationNonlinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap
More informationNon-linear Support Vector Machines
Non-linear Support Vector Machines Andrea Passerini passerini@disi.unitn.it Machine Learning Non-linear Support Vector Machines Non-linearly separable problems Hard-margin SVM can address linearly separable
More informationIntroduction to Spectral Graph Theory and Graph Clustering
Introduction to Spectral Graph Theory and Graph Clustering Chengming Jiang ECS 231 Spring 2016 University of California, Davis 1 / 40 Motivation Image partitioning in computer vision 2 / 40 Motivation
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationNonlinear Dimensionality Reduction. Jose A. Costa
Nonlinear Dimensionality Reduction Jose A. Costa Mathematics of Information Seminar, Dec. Motivation Many useful of signals such as: Image databases; Gene expression microarrays; Internet traffic time
More informationLimits of Spectral Clustering
Limits of Spectral Clustering Ulrike von Luxburg and Olivier Bousquet Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tübingen, Germany {ulrike.luxburg,olivier.bousquet}@tuebingen.mpg.de
More informationSelf-Tuning Semantic Image Segmentation
Self-Tuning Semantic Image Segmentation Sergey Milyaev 1,2, Olga Barinova 2 1 Voronezh State University sergey.milyaev@gmail.com 2 Lomonosov Moscow State University obarinova@graphics.cs.msu.su Abstract.
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationMLCC 2017 Regularization Networks I: Linear Models
MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGE-MIT-IIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationPAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning Matt Gormley Lecture 14 March 5, 2018 1 ML Big Picture Learning Paradigms:
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationKernel Methods. Barnabás Póczos
Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationFace Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi
Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold
More informationCS540 ANSWER SHEET
CS540 ANSWER SHEET Name Email 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 Final Examination CS540-1: Introduction to Artificial Intelligence Fall 2016 20 questions, 5 points
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationPhysical Metaphors for Graphs
Graphs and Networks Lecture 3 Physical Metaphors for Graphs Daniel A. Spielman October 9, 203 3. Overview We will examine physical metaphors for graphs. We begin with a spring model and then discuss resistor
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationSoft-margin SVM can address linearly separable problems with outliers
Non-linear Support Vector Machines Non-linearly separable problems Hard-margin SVM can address linearly separable problems Soft-margin SVM can address linearly separable problems with outliers Non-linearly
More informationA short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationAbout this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes
About this class Maximum margin classifiers SVMs: geometric derivation of the primal problem Statement of the dual problem The kernel trick SVMs as the solution to a regularization problem Maximizing the
More informationLearning gradients: prescriptive models
Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan
More informationSupport Vector Machine. Industrial AI Lab.
Support Vector Machine Industrial AI Lab. Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories / classes Binary: 2 different
More informationHYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH
HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi
More informationECE-271B. Nuno Vasconcelos ECE Department, UCSD
ECE-271B Statistical ti ti Learning II Nuno Vasconcelos ECE Department, UCSD The course the course is a graduate level course in statistical learning in SLI we covered the foundations of Bayesian or generative
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationData dependent operators for the spatial-spectral fusion problem
Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.
More informationUnsupervised dimensionality reduction
Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More information