Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #3 Machine Learning. Edward Chang
|
|
- Eugenia Oliver
- 5 years ago
- Views:
Transcription
1 Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval Lecture #3 Machine Learning Edward Y. Chang Edward Chang Founda'ons of LSMM 1
2 Edward Chang Foundations of LSMM 2
3 Machine Learning Approaches Introduc'on Linear Models Large D D >> N Genera've vs. Discrimina've Models Non- Linear Models 6/5/12 Ed Founda'ons of MM 3
4 Sta's'cal Learning Program the computers to learn! Computers improve performance with experience at some task Example: Task: playing checkers Performance: % games it wins Experience: expert players 6/5/12 Ed Founda'ons of MM 4
5 Sta's'cal Learning Task Ŷ = f(u) Represented by some model(s) Implies hypothesis Performance Measured by error func'ons Experience (L) Characterized by training data Algorithm (Φ) 6/5/12 Ed Founda'ons of MM 5
6 Supervised Learning X: Data U: Unlabeled pool L: Labeled pool G: Labels Regression Classifica'on Φ: Learning algorithm f = Φ(L) Ŷ = f(u) 6/5/12 Ed Founda'ons of MM 6
7 Learning Algorithms Φ Linear Model K- NN Kernel Methods Neural Networks Probabilis'c Graphic Models Decision Trees Etc. Ed Founda'ons of MM 7 6/5/12
8 Linear Model Ed Founda'ons of MM 8 6/5/12
9 Linear Model ŷ = w0 + Σ Xj wj (j = 1 to d) X is an n d matrix d: data dimension n: number of training instances ŷ = Xw L(w, S) = RSS(w) = (y Xw) T (y Xw) RSS: Residual Sum of Square L(w, S)/ w = - 2 X T y + 2X T Xw = 0 w = (X T X) - 1 X T y Ed Founda'ons of MM 9 6/5/12
10 Three Challenges D is too large Curse of dimensionality D > N Insufficient samples N is too large Later 6/5/12 Ed Founda'ons of MM 10
11 Gene Profiling Example N = 59 cases, D = 4026 genes 6/5/12 Ed Founda'ons of MM 11
12 Subset Selec'on & Shrinkage Least Square ooen suffers from large variance Shrinkage sets some coefficients to zero Algorithms Forward Stepwise Selec'on Backward Stepwise Selec'on Ridge Regression Ed Founda'ons of MM 12 6/5/12
13 Ridge Regression w = argmin w {Σ n (y i - w 0 - Σ d x ij w j ) 2 + λ Σ d w j 2 } Why would this help? Regulariza'on: remedying an ill- posed model Correlated variables Data prepara'on Normalize input Centralize input (removing w 0 ) w = (X T X+ λi d ) - 1 X T y Ed Founda'ons of MM 13 6/5/12
14 Ridge Regression Tikhonov Regulariza'on min L λ (w S) = min λ w 2 + (y i f (x i )) 2 As oppose to min (y Xw) T (y Xw) w = (X T X+ λi d ) - 1 X T y As oppose to w = (X T X) - 1 X T y w = X T α α = (G + λi n ) - 1 y (G: n n Gram matrix) Ed Founda'ons of MM 14 6/5/12
15 Regulariza'on wikipedia 6/5/12 Ed Founda'ons of MM 15
16 SVD Interpreta'on Ed Founda'ons of MM 16 6/5/12
17 PCR: Principal Component Regression SVD Discard components with smallest Eigen coefficients PCA Linear Mul'variate Regression Sum of univariate regressions Ed Founda'ons of MM 17 6/5/12
18 Limita'ons & Treatments High bias Low variance Ed Founda'ons of MM 18 6/5/12
19 Linear Model Ed Founda'ons of MM 19 6/5/12
20 Limita'ons & Treatments High bias Low variance High- dimensional or overfiyng Ridge, Subset, Lasso PCR, PLS In general, Regulariza'on Ed Founda'ons of MM 20 6/5/12
21 Genera've vs. Discrimina've Models Genera've Models Model en're distribu'on One class at a 'me Look for maximum likelihood Discrimina've Models Model class boundaries Ignore distribu'on Support Vector Machines (SVMs) Perhaps be{er for large problems! 6/5/12 Ed Founda'ons of MM 21
22 Maximum Likelihood View ŷ = w0 + Σ wj Xj (j = 1 to d) ŷ = Xw ŷ = Xw + ε ε (noise signals) are independent ε N (ŷ, 2 ) P(ŷ wx) has a normal dist. with Mean at ŷ = wx Variance 2 Ed Founda'ons of MM 22 6/5/12
23 Deriva'on P(ŷ w x) N (ŷ, 2 ) Training Given (x 1,y 1 ) (x 2,y 2 ) (x n,y n ) Infer w by training data By Bayes rule, or Maximum Likelihood Es'mate Ed Founda'ons of MM 23 6/5/12
24 Maximum Likelihood For what w is P(y 1, y 2, y n x 1, x 2, x n, w) maximized? Π P(y i wx i ) maximized? Π exp(- ½(y i - wx i / ) 2 ) maximized? Σ (- ½(y i - wx i / ) 2 maximized? Σ (y i - wx i ) 2 minimized? Ed Founda'ons of MM 24 6/5/12
25 Observa'ons RSS = MAP What if n < d? Gradient Decent (Perceptron) Converges only when instances are linearly separable, or behaves erra'cally Dual Formula'on Ed Founda'ons of MM 25 6/5/12
26 Dual View (Duality) Primal w = (X T X) - 1 X T y (d d) (d n) (n 1) Dual (if (X T X) - 1 exists) w = X T X(X T X) - 2 X T y = X T (X(X T X) - 2 X T y) = X T α w = α i x i, i = 1 to n α = X(X T X) - 2 X T y = G y G: (n d) (d d) (d n) à n n Gram matrix When n < d ((X T X) - 1 does not exists) Restrict (bias) the choice of func'ons Regulariza'on Ed Founda'ons of MM 26 6/5/12
27 Ridge Regression min L λ (w S) = min λ w 2 + (y i f (x i )) 2 As oppose to min (y Xw) T (y Xw) w = (X T X+ λi d ) - 1 X T y As oppose to w = (X T X) - 1 X T y w = X T α α = (G + λi n ) - 1 y (G: n n Gram matrix) Ed Founda'ons of MM 27 6/5/12
28 Primal vs. Dual Primal Dual Training Cost O(d 3 ) O(n 3 ) Classifica'on O(d) O(nd) 6/5/12 Ed Founda'ons of MM 28
29 Primal vs. Dual Primal Dual is the choice Training Cost O(d 3 ) O(n 3 ) Classifica'on O(d) O(nd) 6/5/12 Ed Founda'ons of MM 29
30 Models & Linearity Genera've Models Model en're distribu'on One class at a 'me Look for maximum likelihood Discrimina've Models Model class boundaries Ignore distribu'on Support Vector Machines (SVMs) Perhaps be{er for large problems! 6/5/12 Ed Founda'ons of MM 30
31 Gaussian Mixture Model 6/5/12 Ed Founda'ons of MM 31 From: h{p://neural.cs.nthu.edu.tw/jang/matlab/toolbox/dcpr/image/gmmtraindemo2dcovtype01a.gif
32 Support Vector Machine - Linear 6/5/12 Ed Founda'ons of MM 32
33 Support Vector Machine Nonlinear 6/5/12 Ed Founda'ons of MM 33
34 Decision Tree 6/5/12 Ed Founda'ons of MM 34 From: h{p://upload.wikimedia.org/wikipedia/commons/f/ff/decision_tree_model.png
35 Decision Tree Output 6/5/12 Ed Founda'ons of MM 35 h{p://prsysdesign.net/prsd/blog/dectree/dectree1.png
36 Boosted Decision Tree Mul'ple weak classifiers Strength in numbers Emphasize mistakes Put resources at hard cases Provable Strong classifier Converges 6/5/12 Ed Founda'ons of MM 36
37 AdaBoost Example 6/5/12 Ed Founda'ons of MM 37
38 Machine Learning Approaches Introduc'on Linear Models Large D D >> N Genera've vs. Discrimina've Models Non- Linear Models 6/5/12 Ed Founda'ons of MM 38
39 Classical Model N: Number of training instances N +, N - D: Dimensionality N >> D N E.g., PAC learnability N - N + 6/5/12 Ed Founda'ons of MM 39
40 Emerging MM Applica'ons N < D N + << N - Examples Informa'on Retrieval with relevance feedback Surveillance event detec'on 6/5/12 Ed Founda'ons of MM 40
41 IR à A Classifica'on Problem Ed Foundations of MM 41 6/5/12
42 Apple Search 6/5/12 Ed Foundations of MM 42
43 Relevance Feedback 6/5/12 Ed Foundations of MM 43
44 Fruit 6/5/12 Ed Foundations of MM 44
45 Text- based image search limita4ons... Ed Foundations of MM 45 6/5/12
46 VIMA Visual Search Ed Foundations of MM 46 6/5/12
47 Step #1: Solicit Labels Ed Foundations of MM 47 6/5/12
48 Step #2: Compute Boundary Ed Foundations of MM 48 6/5/12
49 Step #3: Iden'fy Useful Samples Ed Foundations of MM 49 6/5/12
50 Step #4: Solicit More Feedback Ed Foundations of MM 50 6/5/12
51 Step #5: Refine Boundary Ed Foundations of MM 51 6/5/12
52 Step #6: Ranking Ed Foundations of MM 52 6/5/12
53 Observa'ons Iden'fy good samples Collect diversified samples Is linear model sufficient? Ed Foundations of MM 53 6/5/12
54 IR à A Classifica'on Problem Ed Foundations of MM 54 6/5/12
55 Non- Linear Boundary Ed Foundations of MM 55 6/5/12
56 Separa'ng Hyperplane 6/5/12 Ed Founda'ons of MM 56
57 Separa'ng Hyperplane 6/5/12 Ed Founda'ons of MM 57
58 Maximum Margin Hyperplane 6/5/12 Ed Founda'ons of MM 58
59 Linear Model Fits All Data? 6/5/12 Ed Founda'ons of MM 59
60 Linear Model Fits All? 6/5/12 Ed Founda'ons of MM 60
61 How about Joining the Dots? ŷ(x) = 1/k Σ yi, xi Nk(x) K = 1 6/5/12 Ed Founda'ons of MM 61
62 NN with k = 1 6/5/12 Ed Founda'ons of MM 62
63 Nearest Neighbor Four Things Make a NN Memory- Based Learner A distance func'on K: number of neighbors to consider? A weighted func'on (op'onal) How to fit with the local points? 6/5/12 Ed Founda'ons of MM 63
64 Problems Fiyng Noise Jagged Boundaries 6/5/12 Ed Founda'ons of MM 64
65 Solu'ons Fiyng Noise Pick a Larger K? Jagged Boundaries Introducing Kernel as a weigh'ng func'on 6/5/12 Ed Founda'ons of MM 65
66 NN with k = 15 6/5/12 Ed Founda'ons of MM 66
67 NN 6/5/12 Ed Founda'ons of MM 67
68 Solu'ons Fiyng Noise Pick a larger K? Jagged Boundaries Introducing Kernel as a weigh'ng func'on 6/5/12 Ed Founda'ons of MM 68
69 Nearest Neighbor - > Kernel Method Four Things Make a Memory Based Learner A distance func'on K: number of neighbors to consider? All A weighted func'on: RBF kernels How to fit with the local points? Predict weights 6/5/12 Ed Founda'ons of MM 69
70 Kernel Method RBF Weighted Func'on Kernel width holds the key Use cross valida'on to find the op'mal width Fiyng with the Local Points Where NN meets Linear Model 6/5/12 Ed Founda'ons of MM 70
71 LM vs. NN Linear Model f(x) is approximated by a global linear func'on More stable, less flexible Nearest Neighbor K- NN assumes f(x) is well approximated by a locally constant func'on Less stable, more flexible Between LM and NN The other models 6/5/12 Ed Founda'ons of MM 71
72 Where Are We and Where Am I Heading To? LM and NN Kernel Method of Three Views LM view NN view Geometric view 6/5/12 Ed Founda'ons of MM 72
73 Linear Model View Y = β0 + Σ β X Separa'ng Hyperplane Max β =1 C Subject to yi f(xi) C, or yi (β0 +β xi) C 6/5/12 Ed Founda'ons of MM 73
74 Maximum Margin Hyperplane 6/5/12 Ed Founda'ons of MM 74
75 Classifier Margin Margin Defined as with of the boundary before hiyng a data object Maximum Margin Tends to minimize classifica'on variance No formal theory for this yet 6/5/12 Ed Founda'ons of MM 75
76 Separa'ng Hyperplane 6/5/12 Ed Founda'ons of MM 76
77 M s Mathema'cal Representa'on Plus- plane {x: wx+b = +1} Minus- plane {x: wx+b = - 1} w Plus- plane w(u v) = 0, if u and v on plus- plane w Minus- plane 6/5/12 Ed Founda'ons of MM 77
78 Separa'ng Hyperplane 6/5/12 Ed Founda'ons of MM 78
79 M Let x - be any point on minus- plane Let x + be the closest plus- plane- point to x - x + = x - + λw, why The line (x + x - ) minus- plane M = x + - x - 6/5/12 Ed Founda'ons of MM 79
80 M 1. wx - + b = wx + + b = 1 3. x + = x - + λw 4. M = x + - x - 5. w(x - + λw) + b = 1 (from 2 & 3) 6. wx - + b + λww = 1 7. λww = 2 6/5/12 Ed Founda'ons of MM 80
81 M 1. λww = 2 2. λ = 2/ww 3. M = x + - x - = λw = λ w = 2/ w 4. Max M Gradient decent, simulated annealing, EM, Newton s method? 6/5/12 Ed Founda'ons of MM 81
82 Max M Max M = 2/ w Min w /2 Min w 2 /2 subject to y i (x i w+b) 1 i = 1,,N Quadra'c criterion with linear inequality constraints 6/5/12 Ed Founda'ons of MM 82
83 Max M Min w 2 /2 subject to y i (x i w+b) 1 i = 1,,N Lp = min w,b w 2 /2 + Σ i=1..n α i [y i (x i w+b)- 1] w = Σ i=1..n α i y i x i 0 = Σ i=1..n α i y i 6/5/12 Ed Founda'ons of MM 83
84 Wolfe Dual Ld = Σ i=1..n α - 1/2 ΣΣ i,j=1..n α i α j y i y j x i x j Subject to α i 0 α i [y i (x i w+b)- 1] = 0 KKT condi'ons α i > 0, y i (x i w+b) = 1 (Support Vectors) α i = 0, y i (x i w+b) > 1 6/5/12 Ed Founda'ons of MM 84
85 Class Predic'on yq = w xq + b w = Σ i=1..n α i y i x i yq = sign(σ i=1..n α i y i (x i X q ) + b) 6/5/12 Ed Founda'ons of MM 85
86 Non- seperatable Classes Soo Margin Hyperplane Basis Expansion 6/5/12 Ed Founda'ons of MM 86
87 Non- separa'ng Case 6/5/12 Ed Founda'ons of MM 87
88 Soo Margin SVMs Min w 2 /2 subject to y i (x i w+b) 1 i = 1,,N Min w 2 /2 + C ε i x i w+b 1 - ε i if y i = 1 x i w+b ε i if y i = - 1 ε i 0 6/5/12 Ed Founda'ons of MM 88
89 Non- separa'ng Case 6/5/12 Ed Founda'ons of MM 89
90 Wolfe Dual Ld = Σ i=1..n α - 1/2 ΣΣ i,j=1..n α i α j y i y j x i x j Subject to C α i 0 Σ α i y i = 0 KKT condi'ons yq = sign (Σ i=1..n α i y i (x i X q ) + b) 6/5/12 Ed Founda'ons of MM 90
91 Basis Func'on 6/5/12 Ed Founda'ons of MM 91
92 Harder 1D Example 6/5/12 Ed Founda'ons of MM 92
93 Basis Func'on Φ(X) = (x, x 2 ) 6/5/12 Ed Founda'ons of MM 93
94 Harder 1D Example 6/5/12 Ed Founda'ons of MM 94
95 Some Basis Func'ons Φ(X) = Σ γ m h m (X) h m (X) R p R Common Func'ons Polynomial Radial basis func'ons Sigmoid func'ons 6/5/12 Ed Founda'ons of MM 95
96 Wolfe Dual Ld = Σ i=1..n α - 1/2 ΣΣ i,j=1..n α i α j y i y j Φ(x i )Φ (x j ) Subject to C α i 0 Σ α i y i = 0 KKT condi'ons yq = sign (Σ i=1..n α i y i (Φ(x i ) Φ(X q )) + b) K(x i, x j ) = Φ(x i ) Φ(Xj) Kernel func'on! 6/5/12 Ed Founda'ons of MM 96
97 Nearest Neighbor View Z, a set of zero mean jointly Gaussian random variables, Each Zi corresponds to one example Xi Cov (zi, zj) = K(xi, xj) y i, the lable of zi, +1 or - 1 P(yi zi) = σ(yi,zi) 6/5/12 Ed Founda'ons of MM 97
98 Training Data 6/5/12 Ed Founda'ons of MM 98
99 General Kernel Classifier [Jaakkola, etc. 99] MAP Classifica'on for x t y t = sign (Σ αi yi K(x t,x i )) K(xi, xj) = Cov (zi, zj) (some similarity func'on) Supervised Training: Compute αi Given X and y, and An error func'on such as J(α) = - ½ Σ αi αj yi yj K(xi,xj) + Σ F(αi) 6/5/12 Ed Founda'ons of MM 99
100 Leave One Out 6/5/12 Ed Founda'ons of MM 100
101 SVMs yt = sign (Σ αi yi K(xt,xi)) (yi xi) training data, αi nonnega've, and kernel K posi've definite αi is obtained by minimizing J(α) = - ½ Σ αi αj yi yj K(xi,xj) + Σ F(αi) F(αi) = αi αi 0, Σyiαi = 0 6/5/12 Ed Founda'ons of MM 101
102 SVMs 6/5/12 Ed Founda'ons of MM 102
103 Important Insight K(xi, xj) = Cov (zi, zj) To design of a kernel is to design a similarity func'on that produces a posi've definite covariance matrix on the training instances 6/5/12 Ed Founda'ons of MM 103
104 Basis Func'on Selec'on Three General Approaches Restric'on methods Limit the class of func'ons Selec'on methods Scan the dic'onary adap'vely (Boos'ng) Regulariza'on methods Use the en're dic'onary but restrict coefficients (Ridge Regression) 6/5/12 Ed Founda'ons of MM 104
105 Overfiyng? Probably Not Because N free parameters (not D) Maximizing margin 6/5/12 Ed Founda'ons of MM 105
106 Summary of ML Introduc'on Linear Models Large D D >> N Genera've vs. Discrimina've Models Nearest Neighbors Non- Linear Models Chapters 10, 11, 12: Large N 6/5/12 Ed Founda'ons of MM 106
107 Reading Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval, E. Y. Chang, Springer, 2011 Chapter #3 Query- Concept Learning Chapter #9 Imbalanced Data Learning Edward Chang Founda'ons of LSMM 107
COMP 562: Introduction to Machine Learning
COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu
More informationCSE546: SVMs, Dual Formula5on, and Kernels Winter 2012
CSE546: SVMs, Dual Formula5on, and Kernels Winter 2012 Luke ZeClemoyer Slides adapted from Carlos Guestrin Linear classifiers Which line is becer? w. = j w (j) x (j) Data Example i Pick the one with the
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationCS 6140: Machine Learning Spring What We Learned Last Week 2/26/16
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Sign
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationMachine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler
+ Machine Learning and Data Mining Bayes Classifiers Prof. Alexander Ihler A basic classifier Training data D={x (i),y (i) }, Classifier f(x ; D) Discrete feature vector x f(x ; D) is a con@ngency table
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationLMS Algorithm Summary
LMS Algorithm Summary Step size tradeoff Other Iterative Algorithms LMS algorithm with variable step size: w(k+1) = w(k) + µ(k)e(k)x(k) When step size µ(k) = µ/k algorithm converges almost surely to optimal
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationBrief Introduction to Machine Learning
Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector
More informationAn Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology. Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University
An Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University Why Do We Care? Necessity in today s labs Principled approach:
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationMachine Learning and Data Mining. Support Vector Machines. Kalev Kask
Machine Learning and Data Mining Support Vector Machines Kalev Kask Linear classifiers Which decision boundary is better? Both have zero training error (perfect training accuracy) But, one of them seems
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationSTAD68: Machine Learning
STAD68: Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 1 Evalua;on 3 Assignments worth 40%. Midterm worth 20%. Final
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationMachine Learning. Classification, Discriminative learning. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Classification, Discriminative learning Structured output, structured input, discriminative function, joint input-output features, Likelihood Maximization, Logistic regression, binary
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationFoundation of Intelligent Systems, Part I. SVM s & Kernel Methods
Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:
More informationReview: Support vector machines. Machine learning techniques and image analysis
Review: Support vector machines Review: Support vector machines Margin optimization min (w,w 0 ) 1 2 w 2 subject to y i (w 0 + w T x i ) 1 0, i = 1,..., n. Review: Support vector machines Margin optimization
More informationSupport Vector Machine
Support Vector Machine Kernel: Kernel is defined as a function returning the inner product between the images of the two arguments k(x 1, x 2 ) = ϕ(x 1 ), ϕ(x 2 ) k(x 1, x 2 ) = k(x 2, x 1 ) modularity-
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationCS 6140: Machine Learning Spring 2016
CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationCS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Assignment
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationUVA CS 4501: Machine Learning. Lecture 11b: Support Vector Machine (nonlinear) Kernel Trick and in PracCce. Dr. Yanjun Qi. University of Virginia
UVA CS 4501: Machine Learning Lecture 11b: Support Vector Machine (nonlinear) Kernel Trick and in PracCce Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationUVA CS 4501: Machine Learning. Lecture 6: Linear Regression Model with Dr. Yanjun Qi. University of Virginia
UVA CS 4501: Machine Learning Lecture 6: Linear Regression Model with Regulariza@ons Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major sec@ons of this course
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationBias/variance tradeoff, Model assessment and selec+on
Applied induc+ve learning Bias/variance tradeoff, Model assessment and selec+on Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège October 29, 2012 1 Supervised
More informationSupport Vector Machine (continued)
Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationCSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015
CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin Predic5on of con5nuous variables Billionaire says: Wait, that s not what
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationSupport Vector Machines and Kernel Methods
Support Vector Machines and Kernel Methods Geoff Gordon ggordon@cs.cmu.edu July 10, 2003 Overview Why do people care about SVMs? Classification problems SVMs often produce good results over a wide range
More informationSupport Vector Machines
Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationMachine Learning 4771
Machine Learning 477 Instructors: Adrian Weller and Ilia Vovsha Lecture 3: Support Vector Machines Dual Forms Non Separable Data Support Vector Machines (Bishop 7., Burges Tutorial) Kernels Dual Form DerivaPon
More informationSupport Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar
Data Mining Support Vector Machines Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 Support Vector Machines Find a linear hyperplane
More informationIndirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina
Indirect Rule Learning: Support Vector Machines Indirect learning: loss optimization It doesn t estimate the prediction rule f (x) directly, since most loss functions do not have explicit optimizers. Indirection
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationMachine Learning and Data Mining. Linear regression. Prof. Alexander Ihler
+ Machine Learning and Data Mining Linear regression Prof. Alexander Ihler Supervised learning Notation Features x Targets y Predictions ŷ Parameters θ Learning algorithm Program ( Learner ) Change µ Improve
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationAbout this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes
About this class Maximum margin classifiers SVMs: geometric derivation of the primal problem Statement of the dual problem The kernel trick SVMs as the solution to a regularization problem Maximizing the
More informationSupport Vector Machines
Support Vector Machines Some material on these is slides borrowed from Andrew Moore's excellent machine learning tutorials located at: http://www.cs.cmu.edu/~awm/tutorials/ Where Should We Draw the Line????
More informationFounda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #4 Similarity. Edward Chang
Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval Lecture #4 Similarity Edward Y. Chang Edward Chang Foundations of LSMM 1 Edward Chang Foundations of LSMM 2 Similar? Edward Chang
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationLearning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I
Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationSupport Vector Machine & Its Applications
Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia
More informationRegression.
Regression www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts linear regression RMSE, MAE, and R-square logistic regression convex functions and sets
More informationSlides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP
Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Predic?ve Distribu?on (1) Predict t for new values of x by integra?ng over w: where The Evidence Approxima?on (1) The
More informationMachine learning - HT Basis Expansion, Regularization, Validation
Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding
More informationOutline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space
to The The A s s in to Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 2, 2009 to The The A s s in 1 Motivation Outline 2 The Mapping the
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More information