Random Walks, Random Fields, and Graph Kernels
1 Random Walks, Random Fields, and Graph Kernels

John Lafferty, School of Computer Science, Carnegie Mellon University

Based on work with Avrim Blum, Zoubin Ghahramani, Risi Kondor, Mugizi Rwebangira, and Jerry Zhu
2 Outline

- Graph Kernels
- Random Fields
- Random Walks
- Continuous Fields
3 Using a Kernel

A linear decision function,
$$\hat f(x) = \sum_{i=1}^N \alpha_i y_i \langle x, x_i \rangle,$$
becomes, after substituting a kernel for the inner product,
$$\hat f(x) = \sum_{i=1}^N \alpha_i y_i K(x, x_i).$$
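To make this concrete, here is a minimal sketch (not from the slides) of the kernelized decision function, using an RBF kernel as a stand-in for $K$; the data, coefficients, and function names are illustrative only.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel; any positive semidefinite K works here."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def predict(x, support_x, support_y, alpha, kernel=rbf_kernel):
    """Kernelized decision function f(x) = sum_i alpha_i y_i K(x, x_i)."""
    return sum(a * y * kernel(x, xi)
               for a, y, xi in zip(alpha, support_y, support_x))

# Tiny usage example with made-up coefficients:
X = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
y = [+1, -1]
alpha = [0.5, 0.5]
print(np.sign(predict(np.array([0.2, 0.1]), X, y, alpha)))  # near the +1 point
```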
4 The Kernel Trick

$K(x, x')$ positive semidefinite:
$$\int_{\mathcal{X}}\!\int_{\mathcal{X}} f(x)\, f(x')\, K(x, x')\, dx\, dx' \ge 0.$$
Taking the feature space of functions $\mathcal{F} = \{\Phi(x) = K(\cdot, x),\ x \in \mathcal{X}\}$, the inner product $\langle \cdot, \cdot \rangle$ has the reproducing property $g(x) = \langle K(\cdot, x), g \rangle$, so
$$\langle \Phi(x), \Phi(x') \rangle = \langle K(\cdot, x), K(\cdot, x') \rangle = K(x, x').$$
5 Structured Data

What if data lies on a graph or other data structure?

[Figure: example structures, including a web graph with nodes Google, Cornell, CMU, foobar.com, NSF, and a parse tree (S, NP, VP, N, V, L) for a sentence such as "time flies like ..."]
6 Combinatorial Laplacian

Think of an edge $e$ as a tangent vector at $e^-$. For $f : V \to \mathbb{R}$, the differential $df : E \to \mathbb{R}$ is the 1-form
$$df(e) = f(e^+) - f(e^-).$$
Then $\Delta = d^* d$ (as a matrix) is the discrete analogue of $-\operatorname{div} \nabla$.
7 Combinatorial Laplacian: It is an averaging operator

$$\Delta f(x) = \sum_{y \sim x} w_{xy}\, (f(x) - f(y)) = d(x)\, f(x) - \sum_{y \sim x} w_{xy}\, f(y)$$
We say $f$ is harmonic if $\Delta f = 0$. Since $\langle \Delta f, g \rangle = \langle df, dg \rangle$, $\Delta$ is self-adjoint and positive.
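A quick numerical sketch of these identities (my own illustration, on an arbitrary small graph): build $\Delta = D - W$ and check the averaging form and positivity.

```python
import numpy as np

# Weighted adjacency matrix W of a small graph (symmetric, zero diagonal).
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

D = np.diag(W.sum(axis=1))   # degree matrix, d(x) = sum_y w_xy
L = D - W                    # combinatorial Laplacian, Delta = d* d

f = np.array([0.3, -1.2, 0.7, 2.0])   # arbitrary function on the vertices

# (Delta f)(x) = d(x) f(x) - sum_{y ~ x} w_xy f(y): the averaging form above.
assert np.allclose(L @ f, W.sum(axis=1) * f - W @ f)

# Delta is positive: <f, Delta f> = <df, df> >= 0 for any f.
assert f @ L @ f >= -1e-12
```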
8 Diffusion Kernels on Graphs (Kondor and L., 2002)

If $\Delta$ is the graph Laplacian then, in analogy with the continuous setting,
$$\frac{\partial K_t}{\partial t} = -\Delta K_t$$
is the heat equation on the graph. The solution $K_t = e^{-t\Delta}$ is the diffusion kernel.
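A sketch of the diffusion kernel in practice, assuming `scipy` is available: compute $K_t = e^{-t\Delta}$ with a matrix exponential and verify the heat-equation property numerically (the graph is the toy example from the previous sketch).

```python
import numpy as np
from scipy.linalg import expm

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W

t = 0.5
K_t = expm(-t * L)   # diffusion kernel K_t = exp(-t * Laplacian)

# K_t solves dK/dt = -Delta K_t; check with a central finite difference.
eps = 1e-6
dK = (expm(-(t + eps) * L) - expm(-(t - eps) * L)) / (2 * eps)
assert np.allclose(dK, -L @ K_t, atol=1e-5)

# K_t is a valid kernel: symmetric with strictly positive eigenvalues.
assert np.allclose(K_t, K_t.T) and np.all(np.linalg.eigvalsh(K_t) > 0)
```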
9 Physical Interpretation

$(\partial_t - \Delta)\, K = 0$ with initial condition $\delta_x(y)$:
$$e^{t\Delta} f(x) = \int_M K_t(x, y)\, f(y)\, dy$$
For a kernel-based classifier $\hat y(x) = \sum_i \alpha_i y_i K_t(x_i, x)$, the decision function is given by heat flow with initial condition
$$f(x) = \begin{cases} \alpha_i & x = x_i \text{ positive labeled data} \\ -\alpha_i & x = x_i \text{ negative labeled data} \\ 0 & \text{otherwise} \end{cases}$$
10 RKHS Representation

The general spectral representation of a kernel as $K(x, y) = \sum_i \lambda_i\, \phi_i(x)\, \phi_i(y)$ leads to the reproducing kernel Hilbert space with
$$\Big\langle \sum_i a_i \phi_i,\ \sum_i b_i \phi_i \Big\rangle_{H_K} = \sum_i \frac{a_i b_i}{\lambda_i}$$
For the diffusion kernel, the RKHS inner product is
$$\langle f, g \rangle_{H_K} = \sum_i e^{t\mu_i}\, \hat f_i\, \hat g_i$$
Interpretation: functions with small norm don't oscillate rapidly on the graph.
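On a finite graph the RKHS norm can also be computed as $f^\top K^{-1} f$, since $K^{-1} = e^{t\Delta}$ for the diffusion kernel. A small sketch (my own check, continuing the toy graph above) comparing that against the spectral form:

```python
import numpy as np
from scipy.linalg import expm

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
t = 0.5
K = expm(-t * L)

mu, phi = np.linalg.eigh(L)          # Laplacian eigenvalues mu_i, eigenvectors phi_i
f = np.array([0.3, -1.2, 0.7, 2.0])
f_hat = phi.T @ f                    # graph Fourier coefficients of f

# ||f||^2_{H_K} = sum_i e^{t mu_i} f_hat_i^2  equals  f^T K^{-1} f.
norm_spectral = np.sum(np.exp(t * mu) * f_hat**2)
norm_rkhs = f @ np.linalg.inv(K) @ f
assert np.isclose(norm_spectral, norm_rkhs)
```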
11 Building Up Kernels

If $K_t^{(i)}$ are kernels on $X_i$, then $K_t = \bigotimes_{i=1}^n K_t^{(i)}$ is a kernel on $X_1 \times \cdots \times X_n$.

For the hypercube:
$$K_t(x, x') \propto (\tanh t)^{d(x, x')}$$
where $d(x, x')$ is the Hamming distance. Similar kernels apply to standard categorical data.

Other graphs with explicit diffusion kernels:
- Infinite trees (Chung & Yau, 1999)
- Rooted trees
- Cycles
- Strings with wildcards
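The hypercube formula can be checked directly. The illustrative sketch below builds the $n$-cube Laplacian, computes $e^{-t\Delta}$, and compares it against the closed form $2^{-n}(1 - e^{-2t})^{d}(1 + e^{-2t})^{n-d}$, which is proportional to $(\tanh t)^{d(x,x')}$; the normalization is my own derivation from the single-edge kernel, not stated on the slide.

```python
import itertools
import numpy as np
from scipy.linalg import expm

n, t = 3, 0.7
verts = list(itertools.product([0, 1], repeat=n))
ham = lambda x, z: sum(a != b for a, b in zip(x, z))

# Adjacency of the n-cube: edges between vertices at Hamming distance 1.
A = np.array([[1. if ham(x, z) == 1 else 0. for z in verts] for x in verts])
L = np.diag(A.sum(axis=1)) - A
K = expm(-t * L)

# Closed form: the n-fold tensor power of the single-edge kernel,
# K_t(x, x') = 2^{-n} (1 - e^{-2t})^d (1 + e^{-2t})^{n-d}, d = Hamming distance.
K_closed = np.array([[2.0**-n * (1 - np.exp(-2*t))**ham(x, z)
                      * (1 + np.exp(-2*t))**(n - ham(x, z))
                      for z in verts] for x in verts])
assert np.allclose(K, K_closed)
```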
12 Results on UCI Datasets

| Data Set      | Hamming error | Hamming SV | Diffusion error | Diffusion SV | β | Improv. (error) | Improv. (SV) |
|---------------|---------------|------------|-----------------|--------------|---|-----------------|--------------|
| Breast Cancer | 7.64%         | -          | -               | -            | - | -               | 83%          |
| Hepatitis     | 7.98%         | -          | -               | -            | - | -               | 58%          |
| Income        | 9.9%          | -          | -               | -            | - | -               | 8%           |
| Mushroom      | 3.36%         | -          | -               | -            | - | -               | 70%          |
| Votes         | 4.69%         | -          | -               | -            | - | -               | 2%           |

(Cells marked "-" were lost in the transcription.)

Recent application to protein classification by Vert and Kanehisa (NIPS 2002).
13 Random Fields View of Combining Labeled/Unlabeled Data
14 Random Fields View

View each vertex $x$ as having a label $f(x) \in \{+1, -1\}$: an Ising model on the graph/lattice, with spins $f : V \to \{+1, -1\}$.

Energy:
$$H(f) = \frac{1}{2} \sum_{x \sim y} w_{xy}\, (f(x) - f(y))^2 = \sum_{x \sim y} w_{xy} - \sum_{x \sim y} w_{xy}\, f(x)\, f(y)$$
Gibbs distribution:
$$P(f) = \frac{1}{Z(\beta)}\, e^{-\beta H(f)}, \qquad \beta = \frac{1}{T}$$
Partition function:
$$Z(\beta) = \sum_f e^{-\beta H(f)}$$
15 Graph Mincuts

- Graph mincuts can be very unbalanced.
- Graph mincuts don't exploit probabilistic properties of random fields.

Idea: replace the mincut by averages under the Ising model, pinned to the labeled boundary values $f_B$:
$$\mathbb{E}_\beta[f(x)] = \frac{1}{Z(\beta)} \sum_{f :\, f|_{\partial S} = f_B} f(x)\, e^{-\beta H(f)}$$
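A brute-force illustration of these pinned averages (my own toy example; the enumeration is exponential in the number of unlabeled vertices, which is exactly why efficiency is in question below): compute $\mathbb{E}_\beta[f(x)]$ on a small graph by summing over all spin configurations that agree with the pinned boundary.

```python
import itertools
import numpy as np

# Toy graph: vertices 0..4, unit edge weights; vertices 0 and 4 are pinned.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
w = {e: 1.0 for e in edges}
pinned = {0: +1, 4: -1}                       # boundary spins f_B
free = [v for v in range(5) if v not in pinned]

def energy(f):
    """H(f) = -sum_{x~y} w_xy f(x) f(y); differs from the quadratic
    form above only by an additive constant, which cancels in averages."""
    return -sum(w[(x, y)] * f[x] * f[y] for (x, y) in edges)

def pinned_mean(beta):
    """E_beta[f(x)] over spin configurations agreeing with the boundary."""
    Z, mean = 0.0, np.zeros(5)
    for spins in itertools.product([-1, +1], repeat=len(free)):
        f = dict(pinned); f.update(zip(free, spins))
        p = np.exp(-beta * energy(f))
        Z += p
        mean += p * np.array([f[v] for v in range(5)])
    return mean / Z

for beta in [0.1, 1.0, 3.0]:                  # higher beta -> closer to mincut behavior
    print(beta, np.round(pinned_mean(beta), 3))
```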
16 Pinned Ising Model

[Figure: panels showing the pinned Ising model averages at several inverse temperatures β; the β values were lost in the transcription, apart from β = 3]
17 Not (Provably) Efficient to Approximate

Unfortunately, the analogue of the rapid-mixing result of Jerrum & Sinclair for the ferromagnetic Ising model is not known for mixed boundary conditions.

Question: Can we compute the averages using graph algorithms in the zero-temperature limit?
18 Idea: Relax to Statistical Field Theory

Euclidean field theory on the graph/lattice, with fields $f : V \to \mathbb{R}$.

Energy:
$$H(f) = \frac{1}{2} \sum_{x \sim y} w_{xy}\, (f(x) - f(y))^2$$
Gibbs distribution:
$$P(f) = \frac{1}{Z(\beta)}\, e^{-\beta H(f)}, \qquad \beta = \frac{1}{T}$$
Partition function:
$$Z(\beta) = \int e^{-\beta H(f)}\, df$$

Physical interpretation: analytic continuation to imaginary time, $t \mapsto it$; the Poincaré group becomes the Euclidean group.
19 View from Statistical Field Theory (cont.)

The most probable field is harmonic. For a weighted graph $G = (V, E)$ with edge weights $w_{xy}$ and combinatorial Laplacian $\Delta$, take a subgraph $S$ with boundary $\partial S$.

Dirichlet problem (unique solution):
$$\Delta f = 0 \ \text{on } S, \qquad f|_{\partial S} = f_B$$
20 Random Walk Solution

Perform a random walk on the unlabeled data, stopping when a labeled point is hit. What is the probability of hitting a positive labeled point before a negative labeled point? This is precisely the minimum energy (continuous) random field: label propagation. Related work by Szummer and Jaakkola (NIPS 2001).
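A minimal sketch of this harmonic/label-propagation solution on the toy graph from earlier, assuming the harmonic-function formulation $f_u = -\Delta_{uu}^{-1}\Delta_{ul} f_l$ (as in Zhu, Ghahramani, and Lafferty's Gaussian fields work); it also checks the random-walk property that the solution is the average of its neighbors.

```python
import numpy as np

# Toy graph; vertices 0 and 4 are labeled +1 and -1.
W = np.zeros((5, 5))
for (x, y) in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]:
    W[x, y] = W[y, x] = 1.0
L = np.diag(W.sum(axis=1)) - W

labeled, f_l = [0, 4], np.array([+1.0, -1.0])
unlabeled = [1, 2, 3]

# Harmonic solution: f_u = -L_uu^{-1} L_ul f_l, i.e. (Delta f)(x) = 0 off the labels.
L_uu = L[np.ix_(unlabeled, unlabeled)]
L_ul = L[np.ix_(unlabeled, labeled)]
f_u = -np.linalg.solve(L_uu, L_ul @ f_l)

# Equivalently, f_u(x) = P(hit +1 first) - P(hit -1 first) under the random
# walk with transitions P = D^{-1} W, absorbed at the labeled points.
P = W / W.sum(axis=1, keepdims=True)
f = np.zeros(5); f[labeled] = f_l; f[unlabeled] = f_u
assert np.allclose(f[unlabeled], (P @ f)[unlabeled])  # f = average of neighbors
print(np.round(f_u, 3))
```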
21 [Figure: two panels comparing the unconstrained and constrained (pinned) fields]
22 View from Statistical Field Theory

In the one-dimensional case, the low temperature limit of the average Ising model is the same as the minimum energy Euclidean field (Landau). Intuition: in one dimension the average over graph s-t mincuts is linear, and the harmonic solution is linear. Not true in general...
23 Computing the Partition Function

Let $\lambda_i$ be the spectrum of $\Delta$ with Dirichlet boundary conditions. The Gaussian integral gives
$$Z(\beta) = e^{-\beta H(f^*)} \left(\frac{2\pi}{\beta}\right)^{n/2} \left(\det \Delta\right)^{-1/2}, \qquad \det \Delta = \prod_{i=1}^n \lambda_i$$
By a generalization of the matrix-tree theorem (Chung & Langlands, 1996), $\det \Delta$ counts the rooted spanning forests, in which each tree contains exactly one boundary vertex; for the degree-normalized Laplacian this is divided by $\prod_i \deg(i)$.
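An illustrative check (my own, on the toy graph used earlier) of the forest-counting identity for the plain unweighted Dirichlet Laplacian; the degree-normalized version mentioned above differs by the $\prod_i \deg(i)$ factor.

```python
import itertools
import numpy as np

# Toy graph; "boundary" = labeled vertices {0, 4}, interior = {1, 2, 3}.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
boundary, interior = {0, 4}, [1, 2, 3]

W = np.zeros((5, 5))
for (x, y) in edges:
    W[x, y] = W[y, x] = 1.0
L = np.diag(W.sum(axis=1)) - W
L_D = L[np.ix_(interior, interior)]   # Dirichlet Laplacian

# det = product of the Dirichlet eigenvalues lambda_i.
assert np.isclose(np.linalg.det(L_D), np.prod(np.linalg.eigvalsh(L_D)))

# Matrix-tree generalization: det(L_D) counts spanning forests in which every
# tree contains exactly one boundary vertex. Brute-force check over edge subsets.
def is_rooted_forest(subset):
    parent = list(range(5))
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for (x, y) in subset:
        rx, ry = find(x), find(y)
        if rx == ry:
            return False              # contains a cycle
        parent[rx] = ry
    comps = {}
    for v in range(5):
        comps.setdefault(find(v), set()).add(v)
    return all(len(c & boundary) == 1 for c in comps.values())

count = sum(is_rooted_forest(s) for r in range(len(edges) + 1)
            for s in itertools.combinations(edges, r))
assert count == round(np.linalg.det(L_D))
print(count)
```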
24 Connection with Diffusion Kernels

Again take $\Delta$, the combinatorial Laplacian with Dirichlet boundary conditions (zero on the labeled data). For the diffusion kernel $K_t = e^{-t\Delta}$, let
$$K = \int_0^\infty K_t\, dt$$
The solution to the Dirichlet problem (label propagation, the minimum energy continuous field) is
$$f^*(x) = \sum_{z \in \text{fringe}} K(x, z)\, f_D(z)$$
25 Connection with Diffusion Kernels (cont.)

We want to solve Laplace's equation $\Delta f = g$; the solution is given in terms of $\Delta^{-1}$. A quick way to see the connection is through the spectral representation:
$$\Delta_{x,x'} = \sum_i \mu_i\, \phi_i(x)\, \phi_i(x')$$
$$K_t(x, x') = \sum_i e^{-t\mu_i}\, \phi_i(x)\, \phi_i(x')$$
$$\Delta^{-1}_{x,x'} = \sum_i \mu_i^{-1}\, \phi_i(x)\, \phi_i(x') = \int_0^\infty K_t(x, x')\, dt$$
Used by Chung and Yau (2000).
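A numerical sketch of this identity on the toy Dirichlet Laplacian from earlier, assuming `scipy`: $\Delta^{-1}$ computed spectrally agrees with $\int_0^\infty K_t\, dt$ computed by quadrature.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# Dirichlet Laplacian of the toy graph (interior vertices 1, 2, 3).
L_D = np.array([[3., -1., -1.],
                [-1., 2., -1.],
                [-1., -1., 3.]])

# Spectrally: inv(L_D) = sum_i mu_i^{-1} phi_i phi_i^T.
mu, phi = np.linalg.eigh(L_D)
K_spectral = phi @ np.diag(1.0 / mu) @ phi.T
assert np.allclose(K_spectral, np.linalg.inv(L_D))

# Integrate each entry of e^{-t L_D} over t in [0, inf); converges since mu_i > 0.
K_integral = np.array([[quad(lambda t: expm(-t * L_D)[i, j], 0, np.inf)[0]
                        for j in range(3)] for i in range(3)])
assert np.allclose(K_integral, np.linalg.inv(L_D), atol=1e-6)
```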
26 Bounds on Covering Numbers and Generalization Error, Continuous Case

Eigenvalue bounds from differential geometry (Li and Yau):
$$c_1 \left(\frac{j}{V}\right)^{2/d} \le \mu_j \le c_2 \left(\frac{j+1}{V}\right)^{2/d}$$
These give bounds on the covering numbers of the SVM hypothesis class:
$$\log \mathcal{N}(\epsilon, \mathcal{F}_R(x)) = O\!\left(\frac{V}{t^{d/2}}\, \log^{\frac{d+2}{2}}\!\left(\frac{1}{\epsilon}\right)\right)$$
27 Bounds on Generalization Error

Better bounds on generalization error are now available, based on Rademacher averages involving the trace of the kernel (Bartlett, Bousquet, & Mendelson, preprint).

Question: Can the diffusion kernel connection be exploited to get transductive generalization error bounds for the random walk approach?
28 Summary

- Random fields with discrete class labels: intractable, unstable.
- Continuous fields: tractable, with more desirable behavior for segmentation and labeling.
- Intimate connections with random walks, electric networks, graph flows, and diffusion kernels.
- Advantages/disadvantages?