Kernel machines and sparsity
1 Kernel machines and sparsity
July 2, 2009, ENBIS 09, Saint-Etienne
Stéphane Canu & Alain Rakotomamonjy
stephane.canu@litislab.eu
2 Roadmap
1 Introduction
  A typical learning problem
  Kernel machines: a definition
2 Tools: the functional framework
  In the beginning was the kernel
  Kernel and hypothesis set
3 Kernel machines and regularization path
  Non sparse kernel machines
  Regularization path
  Piecewise linear regularization path
  Sparse kernel machines: SVR
4 Tuning the kernel: MKL
  The multiple kernel problem
  SimpleMKL: the multiple kernel solution
5 Conclusion
3 Optical character recognition
Example (the MNIST database): data = "image-label" pairs
n = 60,000; d = 700; 10 classes
Kernel error rate = 0.56%; best error rate = 0.4%
4 Learning challenges: the size effect
[Figure: applications plotted by number of variables vs. sample size: bio computing, text, object recognition, census, MNIST, RCV1, speech translation, scene analysis, geostatistics, Lucy]
Three key issues:
1. learn any problem: functional universality
2. from data: statistical consistency
3. with large data sets: computational efficiency
Kernel machines address these three issues (up to a certain point regarding efficiency).
L. Bottou, 2006
5 Kernel machines
Definition (kernel machine): given data (x_i, y_i)_{i=1,n},
    A(x) = \psi\Big( \sum_{i=1}^n \alpha_i k(x, x_i) + \sum_{j=1}^p \beta_j q_j(x) \Big)
where \alpha and \beta are the parameters to be estimated.
Examples:
  splines:            A(x) = \sum_{i=1}^n \alpha_i (x - x_i)_+ + \beta_0 + \beta_1 x
  SVM:                A(x) = \mathrm{sign}\Big( \sum_{i \in I} \alpha_i \exp( -\|x - x_i\|^2 / b ) + \beta_0 \Big)
  exponential family: \mathrm{I\!P}(y \mid x) = \frac{1}{Z} \exp\Big( \sum_{i \in I} \alpha_i \, 1\!\mathrm{I}_{\{y = y_i\}} (x^\top x_i + b)^2 \Big)
7 In the beginning was the kernel...
Definition (kernel): a function of two variables k from X x X to IR.
Definition (positive kernel): a kernel k(s, t) on X is said to be positive if it is symmetric,
    k(s, t) = k(t, s),
and if for any finite integer n, any \{\alpha_i\}_{i=1,n} \subset IR and any \{x_i\}_{i=1,n} \subset X,
    \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j k(x_i, x_j) \ge 0.
It is strictly positive if, for \alpha \ne 0,
    \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j k(x_i, x_j) > 0.
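The positivity condition above can be checked numerically on any finite sample: the Gram matrix must be symmetric positive semi-definite. A minimal numpy sketch (the Gaussian kernel and the sample points are arbitrary choices, not taken from the slides):

```python
import numpy as np

def gram(k, xs):
    """Gram matrix K[i, j] = k(x_i, x_j) for a set of points."""
    return np.array([[k(s, t) for t in xs] for s in xs])

# Gaussian kernel, a classical positive kernel
k_gauss = lambda s, t: np.exp(-np.sum((s - t) ** 2))

rng = np.random.default_rng(0)
xs = rng.normal(size=(30, 2))
K = gram(k_gauss, xs)

# symmetry: k(s, t) = k(t, s)
assert np.allclose(K, K.T)
# positivity: the quadratic form is nonnegative for any coefficients alpha
alpha = rng.normal(size=30)
assert alpha @ K @ alpha >= -1e-10
# equivalently, all eigenvalues of the Gram matrix are >= 0 (up to round-off)
assert np.linalg.eigvalsh(K).min() > -1e-10
```

Checking the smallest eigenvalue of the Gram matrix is the practical way to detect a kernel that is not positive on a given sample.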
8 Examples of positive kernels
The linear kernel: for s, t \in IR^d, k(s, t) = s^\top t.
  symmetric: s^\top t = t^\top s
  positive:
    \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j k(x_i, x_j) = \sum_i \sum_j \alpha_i \alpha_j x_i^\top x_j = \Big( \sum_i \alpha_i x_i \Big)^\top \Big( \sum_j \alpha_j x_j \Big) = \Big\| \sum_i \alpha_i x_i \Big\|^2 \ge 0
The product kernel: k(s, t) = g(s) g(t) for some g: IR^d \to IR.
  symmetric by construction
  positive:
    \sum_i \sum_j \alpha_i \alpha_j k(x_i, x_j) = \sum_i \sum_j \alpha_i \alpha_j g(x_i) g(x_j) = \Big( \sum_i \alpha_i g(x_i) \Big)^2 \ge 0
k is positive if and only if its square root exists: k(s, t) = \langle \phi_s, \phi_t \rangle.
J.P. Vert, 2006
9 Positive definite kernel (PDK) algebra (closure properties)
If k_1(s, t) and k_2(s, t) are two positive kernels:
  PDKs form a convex cone: for a_1 \in IR^+, a_1 k_1(s, t) + k_2(s, t) is a PDK
  the product kernel k_1(s, t) k_2(s, t) is a PDK
Proof for the cone, by linearity:
    \sum_i \sum_j \alpha_i \alpha_j ( a_1 k_1(x_i, x_j) + k_2(x_i, x_j) ) = a_1 \sum_i \sum_j \alpha_i \alpha_j k_1(x_i, x_j) + \sum_i \sum_j \alpha_i \alpha_j k_2(x_i, x_j) \ge 0
Proof for the product, assuming \psi_l such that k_1(s, t) = \sum_l \psi_l(s) \psi_l(t):
    \sum_i \sum_j \alpha_i \alpha_j k_1(x_i, x_j) k_2(x_i, x_j) = \sum_l \sum_i \sum_j ( \alpha_i \psi_l(x_i) ) ( \alpha_j \psi_l(x_j) ) k_2(x_i, x_j) \ge 0
N. Cristianini and J. Shawe-Taylor, Kernel Methods for Pattern Analysis, 2004
10 Kernel engineering: building PDKs
  for any polynomial \varphi with positive coefficients, \varphi( k(s, t) ) is a PDK
  if \Psi is a function from IR^d to IR^d, k( \Psi(s), \Psi(t) ) is a PDK
  if \varphi: IR^d \to IR^+ has its minimum in 0, then k(s, t) = \varphi(s + t) - \varphi(s - t) is a PDK
  the convolution of two positive kernels is a positive kernel: K_1 \star K_2
The Gaussian kernel is a PDK:
    \exp(-\|s - t\|^2) = \exp(-\|s\|^2) \exp(-\|t\|^2) \exp(2 s^\top t)
  s^\top t is a PDK, and exp is the limit of a series with positive coefficients, so \exp(2 s^\top t) is a PDK
  \exp(-\|s\|^2) \exp(-\|t\|^2) is a PDK as a product kernel
  the product of two PDKs is a PDK
O. Catoni, master lecture, 2005
11 Some examples of PD kernels
  type        name          k(s, t)
  radial      Gaussian      exp(-r^2 / b),  r = ||s - t||
  radial      Laplacian     exp(-r / b)
  radial      rational      1 - r^2 / (r^2 + b)
  radial      loc. Gauss.   max(0, 1 - r / (3b))^d exp(-r^2 / b)
  non stat.   chi^2         exp(-r / b),  r = sum_k (s_k - t_k)^2 / (s_k + t_k)
  projective  polynomial    (s^T t)^p
  projective  affine        (s^T t + b)^p
  projective  cosine        s^T t / (||s|| ||t||)
  projective  correlation   exp( s^T t / (||s|| ||t||) - b )
Most of these kernels depend on a quantity b called the bandwidth.
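Several rows of this table can be instantiated and sanity-checked with a few lines of numpy; the sample, the bandwidth b = 2 and the degree p = 3 below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise ||s - t||^2
r = np.sqrt(sq)
G = X @ X.T                                          # pairwise s^T t

b, p = 2.0, 3
kernels = {
    "gaussian":   np.exp(-sq / b),
    "laplacian":  np.exp(-r / b),
    "polynomial": G ** p,
    "affine":     (G + b) ** p,
    "cosine":     G / np.sqrt(np.outer(np.diag(G), np.diag(G))),
}
for name, K in kernels.items():
    lam = np.linalg.eigvalsh(K)
    assert np.allclose(K, K.T), name                      # symmetric
    assert lam.min() > -1e-6 * max(1.0, lam.max()), name  # positive semi-definite
```

The elementwise power (G + b)^p is positive because the Schur (elementwise) product of positive matrices is positive, which is exactly the product-closure property of the previous slide.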
12 Kernels for objects and structures
  kernels on histograms and probability distributions: k(p, q) = \int k_i( p(x), q(x) ) \, d\mathrm{I\!P}(x)
  kernels on strings:
    spectral string kernel, using subsequences: k(s, t) = \sum_u \phi_u(s) \phi_u(t)
    similarities by alignments: k(s, t) = \sum_\pi \exp( \beta(s, t, \pi) )
  kernels on graphs:
    the pseudo-inverse of the (regularized) graph Laplacian L = D - A, where A is the adjacency matrix and D the degree matrix
    diffusion kernels: (1 / Z(b)) \exp(bL)
    subgraph kernels by convolution (using random walks)
  and kernels on heterogeneous data (images), HMMs, automata...
Shawe-Taylor & Cristianini's book, 2004; J.P. Vert, 2006
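The graph-kernel construction can be made concrete on a toy graph. Sign conventions for the diffusion kernel vary across references; the sketch below uses exp(-beta L) with L = D - A, which is always symmetric positive definite since the matrix exponential of a symmetric matrix has strictly positive eigenvalues (the graph and beta are arbitrary choices):

```python
import numpy as np

# adjacency matrix of a small path graph 0-1-2-3-4
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # graph Laplacian

beta = 0.5
lam, U = np.linalg.eigh(L)
K = U @ np.diag(np.exp(-beta * lam)) @ U.T   # diffusion kernel exp(-beta*L)

assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > 0       # strictly positive definite
assert K[0, 1] > K[0, 4]                     # similarity decays with graph distance
```

The last assertion illustrates why this is a sensible similarity on graphs: nodes one edge apart are more similar than nodes four edges apart.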
14 From kernel to functions
    H_0 = \{ f \mid m_f < \infty; \; f_j \in IR; \; t_j \in X, \; f(x) = \sum_{j=1}^{m_f} f_j k(x, t_j) \}
Define the bilinear form (with g(x) = \sum_{i=1}^{m_g} g_i k(x, s_i)):
    \forall f, g \in H_0, \quad \langle f, g \rangle_{H_0} = \sum_{j=1}^{m_f} \sum_{i=1}^{m_g} f_j g_i k(t_j, s_i)
Evaluation functional: \forall x \in X, \; f(x) = \langle f(.), k(x, .) \rangle_{H_0}
From k to H: with any positive kernel, a hypothesis set H = \bar{H}_0 can be constructed, together with its metric.
15 RKHS
Definition (reproducing kernel Hilbert space, RKHS): a Hilbert space H endowed with the inner product \langle ., . \rangle_H is said to have a reproducing kernel if there exists a positive kernel k such that
    \forall s \in X, \; k(., s) \in H  and  \forall f \in H, \; f(s) = \langle f(.), k(s, .) \rangle_H
positive kernel <=> RKHS
  any function in H is pointwise defined
  the kernel defines the inner product
  it defines the regularity (smoothness) of the hypothesis set
16 Functional differentiation in RKHS
Let J be a functional:
    J: H \to IR, \; f \mapsto J(f)
Examples: J_1(f) = \|f\|_H^2, \; J_2(f) = f(x).
Directional derivative in direction g at point f:
    dJ(f, g) = \lim_{\varepsilon \to 0} \frac{J(f + \varepsilon g) - J(f)}{\varepsilon}
Gradient \nabla_J(f):
    \nabla_J: H \to H, \; f \mapsto \nabla_J(f)  such that  dJ(f, g) = \langle \nabla_J(f), g \rangle_H
Exercise: find \nabla_{J_1}(f) and \nabla_{J_2}(f).
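One way to carry out the exercise, using only the definitions above and the reproducing property (a standard computation, included here as a sketch):

```latex
dJ_1(f, g) = \lim_{\varepsilon \to 0}
  \frac{\|f + \varepsilon g\|_H^2 - \|f\|_H^2}{\varepsilon}
  = \lim_{\varepsilon \to 0}
  \frac{2\varepsilon \langle f, g \rangle_H + \varepsilon^2 \|g\|_H^2}{\varepsilon}
  = \langle 2f, g \rangle_H
  \;\Rightarrow\; \nabla_{J_1}(f) = 2f
```

```latex
dJ_2(f, g) = \lim_{\varepsilon \to 0}
  \frac{(f + \varepsilon g)(x) - f(x)}{\varepsilon}
  = g(x) = \langle k(x, \cdot), g \rangle_H
  \;\Rightarrow\; \nabla_{J_2}(f) = k(x, \cdot)
```

The second gradient is exactly the "key property" invoked on the following slides.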
17 Other kernels (what really matters)
  finite kernels: k(s, t) = ( \phi_1(s), ..., \phi_p(s) )^\top ( \phi_1(t), ..., \phi_p(t) )
  Mercer kernels: positive on a compact set, k(s, t) = \sum_{j=1}^p \lambda_j \phi_j(s) \phi_j(t)
  positive kernels: positive semi-definite
  conditionally positive kernels (with respect to some functions p_j):
      \forall \{x_i\}_{i=1,n}, \; \forall \alpha  such that  \sum_i \alpha_i p_j(x_i) = 0, \; j = 1, ..., p:
      \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j k(x_i, x_j) \ge 0
  symmetric non positive kernels: k(s, t) = \tanh(s^\top t + \alpha_0)
  non symmetric, non positive kernels
The key property, \nabla_{J_t}(f) = k(t, .), holds.
C. Ong et al., ICML, 2004
18 Let's summarize
  positive kernels <=> RKHS = H <=> regularity \|f\|_H^2
  the key property, \nabla_{J_t}(f) = k(t, .), holds not only for positive kernels
  f(x_i) exists (pointwise defined functions)
  universal consistency in RKHS
  the Gram matrix summarizes the pairwise comparisons
20 Interpolation splines
Find f \in H such that f(x_i) = y_i, \; i = 1, ..., n.
It is an ill-posed problem.
21 Interpolation splines: minimum norm interpolation
    \min_{f \in H} \frac{1}{2} \|f\|_H^2  such that  f(x_i) = y_i, \; i = 1, ..., n
The Lagrangian (with Lagrange multipliers \alpha_i):
    L(f, \alpha) = \frac{1}{2} \|f\|^2 - \sum_{i=1}^n \alpha_i ( f(x_i) - y_i )
Optimality for f:
    \nabla_f L(f, \alpha) = 0  \Rightarrow  f(x) = \sum_{i=1}^n \alpha_i k(x_i, x)
Dual formulation (remove f from the Lagrangian):
    Q(\alpha) = -\frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j k(x_i, x_j) + \sum_{i=1}^n \alpha_i y_i
Solution of \max_{\alpha \in IR^n} Q(\alpha):  K\alpha = y
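The chain above (Lagrangian, expansion f(x) = \sum_i \alpha_i k(x_i, x), linear system K\alpha = y) can be verified numerically in a few lines; the Gaussian kernel, its bandwidth and the sample below are arbitrary illustration choices:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 10)[:, None]
y = np.sin(3 * x[:, 0])

def kmat(S, T, b=0.1):
    """Gaussian kernel matrix between two point sets."""
    d2 = ((S[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / b)

K = kmat(x, x)
alpha = np.linalg.solve(K, y)       # the dual solution of K alpha = y

f = lambda t: kmat(t, x) @ alpha    # f(t) = sum_i alpha_i k(x_i, t)
assert np.allclose(f(x), y, atol=1e-6)   # exact interpolation at the data points
```

Note that K is invertible here because the Gaussian kernel is strictly positive and the points are distinct; for near-duplicate points the system becomes badly conditioned, which is one motivation for the smoothing splines of the next slides.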
24 Representer theorem
Theorem (representer theorem): let H be a RKHS with kernel k(s, t), let l be a loss function from X to IR and \Phi a non decreasing function from IR to IR. If there exists a function f^* minimizing
    f^* = \arg\min_{f \in H} \sum_{i=1}^n l( y_i, f(x_i) ) + \Phi( \|f\|_H^2 ),
then there exists a vector \alpha \in IR^n such that
    f^*(x) = \sum_{i=1}^n \alpha_i k(x, x_i).
It can be generalized to the semi-parametric case: f^*(x) = \sum_{i=1}^n \alpha_i k(x, x_i) + \sum_{j=1}^m \beta_j \phi_j(x).
25 Smoothing splines
Introducing the error (the slack) \xi_i = f(x_i) - y_i:
    (S)  \min_{f \in H} \frac{1}{2} \|f\|_H^2 + \frac{1}{2\lambda} \sum_{i=1}^n \xi_i^2  such that  f(x_i) = y_i + \xi_i, \; i = 1, ..., n
Three equivalent definitions:
    \min_{f \in H} \sum_{i=1}^n ( f(x_i) - y_i )^2 + \frac{\lambda}{2} \|f\|_H^2
    \min_{f \in H} \frac{1}{2} \|f\|_H^2  such that  \sum_{i=1}^n ( f(x_i) - y_i )^2 \le C
    \min_{f \in H} \sum_{i=1}^n ( f(x_i) - y_i )^2  such that  \|f\|_H^2 \le C'
Using the representer theorem, (S) becomes:
    \min_{\alpha \in IR^n} \frac{1}{2} \|K\alpha - y\|^2 + \frac{\lambda}{2} \alpha^\top K \alpha  \Rightarrow  (K + \lambda I)\alpha = y
Using the expansion with an L2 penalty on \alpha instead:
    \min_{\alpha \in IR^n} \frac{1}{2} \|K\alpha - y\|^2 + \frac{\lambda}{2} \alpha^\top \alpha  \Rightarrow  \alpha = (K^\top K + \lambda I)^{-1} K^\top y
26 From 0 to interpolation
Problem:
    \min_{f \in H} \frac{1}{2} \sum_{i=1}^n ( f(x_i) - y_i )^2 + \frac{\lambda}{2} \|f\|_H^2
Solution: \alpha(\lambda) = (K + \lambda I)^{-1} y
  \lambda = 0: \alpha(0) = K^{-1} y, interpolation
  \lambda = \infty: \alpha(\infty) = 0
Regularization path: the set of solutions as a function of \lambda,
    S = \{ \alpha(\lambda) \mid \lambda \in [0, \infty) \},
also called the solution path.
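The closed form \alpha(\lambda) = (K + \lambda I)^{-1} y and the two endpoints of the path can be checked directly; the kernel, bandwidth and noisy data below are arbitrary illustration choices:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 15)[:, None]
rng = np.random.default_rng(3)
y = np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=15)

d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 0.1)

def alpha(lam):
    # smoothing-spline solution of (K + lam*I) alpha = y
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

# lam -> 0: the path starts at the interpolation solution K alpha = y
assert np.allclose(K @ alpha(1e-12), y, atol=1e-4)
# lam -> infinity: the path ends at alpha = 0
assert np.linalg.norm(alpha(1e8)) < 1e-6
# along the path, the training residual grows with lam
res = [np.linalg.norm(K @ alpha(l) - y) for l in (1e-3, 1e-1, 1e1)]
assert res[0] < res[1] < res[2]
```

Each point of this path costs one linear solve; the slides that follow are precisely about avoiding one full solve per value of \lambda.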
27 1D ridge regression in the costs domain
The loss term L as a function of the penalty term P:
    \min_{\alpha \in IR} \sum_{i=1}^n (x_i \alpha - y_i)^2 + \lambda \alpha^2
    L(\alpha) = \sum_{i=1}^n (x_i \alpha - y_i)^2, \quad P(\alpha) = \alpha^2
    L(P) = aP \pm b\sqrt{P} + c, \; with \; a, b, c \in IR
Figure: regularization path as a function of the criteria L and P.
28 How to tune the regularization parameter \lambda?
  brute force: for each \lambda_1 < \lambda_2 < ... < \lambda_K, compute \alpha_k = (K + \lambda_k I)^{-1} y, \; k = 1, ..., K:  O(K n^3)
  warm start: \alpha_k = \Phi(\alpha_{k-1}) (using l conjugate gradient iterations):  O(K l n^2)
  warm start + prediction step:  O(K l' n^2)
      \alpha_k^{(p)} = \alpha_{k-1} + \rho \nabla_\alpha ( L(\alpha_{k-1}) + \lambda_k P(\alpha_{k-1}) )  (prediction)
      \alpha_k = \Phi(\alpha_k^{(p)})  (correction step using CG)
  use only the prediction step:  \alpha_k = \alpha_{k-1} + \lambda_k \Psi(\alpha_{k-1}):  O(K n^2)
To do so, the regularization path has to be piecewise linear.
32 How to choose L and P to get a linear regularization path?
The solution path of \min_{\alpha \in IR^d} L(\alpha) + \lambda P(\alpha) is piecewise linear when one cost is piecewise quadratic and the other one is piecewise linear.
1. Piecewise linearity means:
    \lim_{\varepsilon \to 0} \frac{\alpha(\lambda + \varepsilon) - \alpha(\lambda)}{\varepsilon} = constant
2. Optimality (convex case):
    \nabla L(\alpha(\lambda)) + \lambda \nabla P(\alpha(\lambda)) = 0
    \nabla L(\alpha(\lambda + \varepsilon)) + (\lambda + \varepsilon) \nabla P(\alpha(\lambda + \varepsilon)) = 0
3. Using a Taylor expansion:
    \lim_{\varepsilon \to 0} \frac{\alpha(\lambda + \varepsilon) - \alpha(\lambda)}{\varepsilon} = -[ \nabla^2 L(\alpha(\lambda)) + \lambda \nabla^2 P(\alpha(\lambda)) ]^{-1} \nabla P(\alpha(\lambda))
This is constant when \nabla^2 L(\alpha(\lambda)) is constant and \nabla^2 P(\alpha(\lambda)) = 0.
[Rosset & Zhu, 07]
33 Standard formulation
Portfolio optimization (Markowitz, 1952), return vs. risk:
    \min_\alpha \frac{1}{2} \alpha^\top Q \alpha  with  e^\top \alpha = C
  efficiency frontier: piecewise linear (critical path algorithm)
Sensitivity analysis, standard formulation (Heller, 1954):
    \min_\alpha \frac{1}{2} \alpha^\top Q \alpha + (c + \lambda \Delta c)^\top \alpha  with  A\alpha = b + \mu \Delta b
Parametric programming (see T. Gal's book, 1968): in the general case of a parametric QP, the regularization path is piecewise linear.
PLP and multi-parametric programming.
34 Piecewise linear regularization path algorithms
  L    P    regression     classification   clustering
  L2   L1   Lasso/LARS     L1 L2 SVM        L1 PCA/SVD
  L1   L2   SVR            SVM              OC SVM
  L1   L1   L1 LAD         L1 SVM           Dantzig selector
Table: examples of piecewise linear regularization path algorithms.
Penalties P:  L_p = \sum_{j=1}^d |\beta_j|^p
Losses L:
  L_p:                      |f(x) - y|^p
  hinge:                    ( 1 - y f(x) )_+^p
  \varepsilon-insensitive:  0 if |f(x) - y| < \varepsilon, \; |f(x) - y| - \varepsilon otherwise
  Huber's loss:             |f(x) - y|^2 if |f(x) - y| < t, \; 2t|f(x) - y| - t^2 otherwise
35 The world is changing
"The Gaussian Hare and the Laplacian Tortoise: Computability of L1 vs. L2 Regression Estimators", Portnoy & Koenker, 1997
36 The consequence of having a useful regularization path
    \min_{\alpha \in IR^d} L(\alpha) + \lambda P(\alpha)  \Rightarrow  \{ \alpha(\lambda) \mid \lambda \in [0, \infty] \}
  efficient computing <=> piecewise linearity:  \alpha_{NEW} = \alpha_{OLD} + (\lambda_{NEW} - \lambda_{OLD}) u
  piecewise linearity <=> L1 criteria: either L or P is L1
  L1 => sparsity: a lot of \alpha_j = 0 (sparsity and active constraints)
Why does L1 provide sparsity?
37 Definition: strong homogeneity
Set of zero variables: I_0 = \{ j \in \{1, ..., d\} \mid \alpha_j = 0 \}.
Theorem (Nikolova, 2000):
  Regular case: if L(\alpha) + \lambda P(\alpha) is differentiable and I_0(y) \ne \emptyset, then \forall \varepsilon > 0, \exists y' \in B(y, \varepsilon) such that I_0(y') \ne I_0(y).
  Singular case: if L(\alpha) + \lambda P(\alpha) is NON differentiable and I_0(y) \ne \emptyset, then \exists \varepsilon > 0, \forall y' \in B(y, \varepsilon), I_0(y') = I_0(y).
Singular criteria => sparsity. L1 criteria are singular at 0: singularity provides sparsity.
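The dichotomy above can be seen in one dimension by comparing the minimizers of the penalized problems for the two penalties; the values lam = 1, 0.5 and 3.0 below are arbitrary illustration choices:

```python
import numpy as np

def prox_l1(y, lam):
    # argmin_a 0.5*(a - y)**2 + lam*|a|  (soft thresholding)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def prox_l2(y, lam):
    # argmin_a 0.5*(a - y)**2 + 0.5*lam*a**2  (plain shrinkage)
    return y / (1.0 + lam)

lam = 1.0
small, big = 0.5, 3.0
# the L1 penalty, non-differentiable at 0, snaps small coefficients to EXACTLY 0
assert prox_l1(small, lam) == 0.0
assert prox_l1(big, lam) == 2.0
# the differentiable L2 penalty only shrinks: it never returns exactly 0
assert prox_l2(small, lam) != 0.0
```

The kink of |.| at 0 is what makes the zero set stable under small perturbations of the data, exactly as the singular case of the theorem states.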
38 L1 splines: introducing sparsity
The L1 error:
    (S_1)  \min_{f \in H} \frac{1}{2} \|f\|_H^2 + \frac{1}{\lambda} \sum_{i=1}^n |\xi_i|  such that  f(x_i) = y_i + \xi_i, \; i = 1, ..., n
Representer theorem:  f(x) = \sum_{i=1}^n ( \alpha_i^+ - \alpha_i^- ) k(x, x_i)
The dual:
    (D_1)  \min_\alpha \frac{1}{2} (\alpha^+ - \alpha^-)^\top K (\alpha^+ - \alpha^-) - (\alpha^+ - \alpha^-)^\top y
           such that  0 \le \alpha_i^+ \le \frac{1}{\lambda}, \; 0 \le \alpha_i^- \le \frac{1}{\lambda}, \; i = 1, ..., n
A typical parametric quadratic program (pQP), but with \alpha_i \ne 0.
39 K-Lasso (kernel basis pursuit)
The kernel Lasso:
    (S_1)  \min_{\alpha \in IR^n} \frac{1}{2} \|K\alpha - y\|^2 + \lambda \sum_{i=1}^n |\alpha_i|
A typical parametric quadratic program (pQP), with \alpha_i = 0: a piecewise linear regularization path.
The dual:
    (D_1)  \min_\alpha \frac{1}{2} \|K\alpha\|^2  such that  \|K^\top (K\alpha - y)\|_\infty \le t
The K-Dantzig selector can be treated the same way.
It requires computing K^\top K; there is no more function f!
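The K-Lasso objective can also be minimized with a simple proximal-gradient (ISTA) loop, which makes the sparsity in \alpha visible. This is an illustrative sketch, not the parametric-QP path algorithm of the slides; the kernel, bandwidth and lam are arbitrary choices:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 30)[:, None]
y = np.sin(3 * x[:, 0])

d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 0.1)

lam = 0.5
eta = 1.0 / np.linalg.norm(K, 2) ** 2      # step size 1/||K||^2
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

obj = lambda a: 0.5 * np.sum((K @ a - y) ** 2) + lam * np.abs(a).sum()
alpha = np.zeros(30)
o0 = obj(alpha)
for _ in range(2000):
    grad = K.T @ (K @ alpha - y)           # gradient of the quadratic part
    alpha = soft(alpha - eta * grad, eta * lam)

assert obj(alpha) < o0                     # the objective decreased
assert np.sum(alpha == 0.0) > 0            # the L1 penalty zeroed some coefficients
```

The soft-thresholding step is the same L1 proximal operator as in the one-dimensional sketch of the sparsity discussion, applied coordinate by coordinate.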
40 Support vector regression (SVR)
Adaptation of the Lasso's dual:
    \min_\alpha \frac{1}{2} \|K\alpha\|^2  s.t.  \|K^\top (K\alpha - y)\|_\infty \le t
becomes
    \min_{f \in H} \frac{1}{2} \|f\|_H^2  s.t.  |f(x_i) - y_i| \le t, \; i = 1, ..., n
The support vector regression introduces slack variables:
    (SVR)  \min_{f \in H} \frac{1}{2} \|f\|_H^2 + C \sum_i \xi_i  such that  |f(x_i) - y_i| \le t + \xi_i, \; \xi_i \ge 0, \; i = 1, ..., n
A typical multi-parametric quadratic program (mpQP), with a piecewise linear regularization path:
    \alpha(C, t) = \alpha(C_0, t_0) + \Big( \frac{1}{C} - \frac{1}{C_0} \Big) u + \frac{1}{C_0} (t - t_0) v
A 2d Pareto front (the tube width and the regularity).
41 Support vector regression: illustration
Figure: support vector machine regression fits for a large C (left) and a small C (right).
There exist other formulations, such as LP SVR...
42 Large scale pQP
Not yet adapted to the pQP setting:
  reweighted least squares
  interior points
  projected gradient
Adapted:
  homotopy (regularization path) and other pQP methods
  active set
  decomposition
  coordinate-wise optimization (Gauss-Seidel)
  others: cutting plane, proximal...
43 Checkerboard
2 classes, 500 examples, separable.
44 A separable case
n = 5000 data points / n = 500 data points.
45 Empirical complexity
Figure: training time (cpu seconds, log scale), number of support vectors (log scale) and error rate (%, over 2000 unseen points) as a function of training size (log scale), for CVM, LibSVM and SimpleSVM, with several values of C (1, 1000, ...) and gamma (1, 0.3).
G. Loosli et al., JMLR, 2007
47 Multiple kernel
The model:
    f(x) = \sum_{i=1}^l \alpha_i K(x, x_i) + b
Given M kernel functions K_1, ..., K_M that are potentially well suited for a given problem, find a positive linear combination of these kernels such that the resulting kernel K is optimal:
    K(x, x') = \sum_{m=1}^M d_m K_m(x, x'),  with  d_m \ge 0, \; \sum_m d_m = 1
We need to learn together the kernel coefficients d_m and the SVR parameters \alpha_i, b.
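The simplex constraint (d_m >= 0, sum_m d_m = 1) guarantees that the combined K stays a positive kernel, by the convex-cone closure property seen earlier; this is easy to verify numerically (the candidate kernels and weights below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = X @ X.T

# three candidate kernels K_1, ..., K_M
Ks = [np.exp(-d2 / b) for b in (0.5, 2.0)] + [(G + 1.0) ** 2]

d = np.array([0.5, 0.3, 0.2])     # d_m >= 0, sum_m d_m = 1
K = sum(dm * Km for dm, Km in zip(d, Ks))

# a convex combination of positive kernels is a positive kernel
assert np.allclose(K, K.T)
lam = np.linalg.eigvalsh(K)
assert lam.min() > -1e-8 * lam.max()
```

In practice the K_m are often the same kernel family at several bandwidths, or kernels built on different feature groups.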
48 Multiple kernel functional learning
The problem (for given C and t):
    \min_{\{f_m\}, b, \xi, d} \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{H_m}^2 + C \sum_i \xi_i
    s.t.  \Big| \sum_m f_m(x_i) + b - y_i \Big| \le t + \xi_i, \; \xi_i \ge 0, \; \forall i
          \sum_m d_m = 1, \; d_m \ge 0, \; \forall m
Regularization formulation:
    \min_{\{f_m\}, b, d} \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{H_m}^2 + C \sum_i \max\Big( \Big| \sum_m f_m(x_i) + b - y_i \Big| - t, 0 \Big)
    s.t.  \sum_m d_m = 1, \; d_m \ge 0, \; \forall m
Equivalently:
    \min_{\{f_m\}, b, d} \sum_i \max\Big( \Big| \sum_m f_m(x_i) + b - y_i \Big| - t, 0 \Big) + \frac{1}{C} \sum_m \frac{1}{d_m} \|f_m\|_{H_m}^2 + \mu \sum_m d_m
49 Multiple kernel functional learning
The problem (for given C and t) is treated as a bi-level optimization task:
    \min_{d \in IR^M} J(d)  s.t.  \sum_m d_m = 1, \; d_m \ge 0, \; \forall m
where
    J(d) = \min_{\{f_m\}, b, \xi} \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{H_m}^2 + C \sum_i \xi_i
           s.t.  \Big| \sum_m f_m(x_i) + b - y_i \Big| \le t + \xi_i, \; \xi_i \ge 0, \; \forall i
50 Multiple kernel algorithm
Use a reduced gradient algorithm on
    \min_{d \in IR^M} J(d)  s.t.  \sum_m d_m = 1, \; d_m \ge 0, \; \forall m
SimpleMKL algorithm:
    set d_m = 1/M for m = 1, ..., M
    while stopping criterion not met do
        compute J(d) using a QP solver with K = \sum_m d_m K_m
        compute \partial J / \partial d_m, the Hessian and the descent direction D
        compute the optimal stepsize \gamma
        d <- d + \gamma D
    end while
Recent improvements reported using the Hessian.
Rakotomamonjy et al., JMLR, 2008
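The structure of this outer loop can be sketched end-to-end if two substitutions are made: the inner SVM/SVR solver is replaced by kernel ridge regression, whose objective J(d) = (1/2) y^T (K(d) + lam I)^{-1} y and gradient dJ/dd_m = -(1/2) a^T K_m a (with a = (K(d) + lam I)^{-1} y) are available in closed form, and the reduced-gradient step is replaced by a normalized multiplicative update that stays on the simplex, with backtracking on the step size. This is a structural sketch under those substitutions, not the SimpleMKL algorithm itself; data, bandwidths and lam are arbitrary:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 40)[:, None]
y = np.sin(3 * x[:, 0])
d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
Ks = [np.exp(-d2 / b) for b in (0.05, 0.5, 5.0)]   # candidate bandwidths
lam, M = 0.1, len(Ks)

def J_and_grad(d):
    # inner problem solved in closed form (kernel ridge stand-in for the SVR)
    K = sum(dm * Km for dm, Km in zip(d, Ks))
    a = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return 0.5 * y @ a, np.array([-0.5 * a @ Km @ a for Km in Ks])

d = np.full(M, 1.0 / M)                  # start from the uniform combination
J, g = J_and_grad(d)
J0 = J
for _ in range(30):
    gn = g / (np.abs(g).max() + 1e-12)   # normalized gradient, for a safe step
    step = 1.0
    while True:                          # backtracking keeps J non-increasing
        d_new = d * np.exp(-step * gn)
        d_new /= d_new.sum()             # multiplicative update stays on the simplex
        J_new, g_new = J_and_grad(d_new)
        if J_new <= J or step < 1e-8:
            break
        step /= 2.0
    d, J, g = d_new, J_new, g_new

assert abs(d.sum() - 1.0) < 1e-9 and (d >= 0).all()
assert J <= J0 + 1e-9                    # the outer objective did not increase
```

The real algorithm differs in the inner solver (an SVM/SVR QP), the descent direction (reduced gradient with the active simplex constraints) and the line search, but the alternation between "solve the inner problem for fixed d" and "move d on the simplex" is the same.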
51 Complexity
For each iteration:
  SVM training: O(n n_{sv} + n_{sv}^3). Inverting K_{sv,sv} is O(n_{sv}^3), but it might already be available as a by-product of the SVM training.
  computing H: O(M n_{sv}^2)
  finding d: O(M^3)
The number of iterations is usually less than 10. When M < n_{sv}, computing d is not more expensive than the QP.
52 Multiple kernel experiments
Figure: the four test signals (LinChirp, Wave, Blocks, Spikes).
Table: normalized mean square error (%) and number of kernels, averaged over 20 runs, for a single kernel, dilation kernels and dilation-translation kernels, on the LinChirp, Wave, Blocks and Spike data sets.
53 Conclusion
  kernels
  sparsity: L1, efficient algorithms
  some limits, and ways around them:
    instability: coupling with active set methods
    large data sets: randomization
    when to stop: derive relevant bounds
    non convexity: iterative L1
Perspectives: more algorithms, more criteria, more applications.
55 Questions?
LAR(S) (and svmpath), in R and Matlab:
  www-stat.stanford.edu/~hastie
  asi.insa-rouen.fr/~vguigue/lars.html
kernlab, SVR:
  cran.r-project.org/src/contrib/Descriptions/kernlab.html
Dantzig selector:
  asi.insa-rouen.fr/~arakotom
56 Bibliography
  Least angle regression, B. Efron et al., Annals of Statistics, vol. 32 (2), 2004.
  Piecewise linear regularized solution paths, S. Rosset and J. Zhu, Annals of Statistics, to appear, 2007.
  Local strong homogeneity of a regularized estimator, M. Nikolova, SIAM Journal on Applied Mathematics, vol. 61, no. 2, 2000.
  Two-dimensional solution path for support vector regression, G. Wang, D.-Y. Yeung and F. H. Lochovsky, ICML, 2006.
  Algorithmic linear dimension reduction in the l1 norm for sparse vectors, A. C. Gilbert, M. J. Strauss, J. A. Tropp and R. Vershynin, 2006 (submitted).
  A tutorial on support vector regression, A. Smola and B. Scholkopf, Statistics and Computing, 14(3), 2004.
  The Dantzig selector: statistical estimation when p is much smaller than n, E. Candes and T. Tao, submitted to IEEE Transactions on Information Theory.
  An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, I. Daubechies, M. Defrise and C. De Mol, Comm. Pure Appl. Math., 57.
  Pathwise coordinate optimization, J. Friedman, T. Hastie and R. Tibshirani, technical report, May 2007.
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationNon-linear Support Vector Machines
Non-linear Support Vector Machines Andrea Passerini passerini@disi.unitn.it Machine Learning Non-linear Support Vector Machines Non-linearly separable problems Hard-margin SVM can address linearly separable
More informationSupport'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan
Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationSupport Vector Machine for Classification and Regression
Support Vector Machine for Classification and Regression Ahlame Douzal AMA-LIG, Université Joseph Fourier Master 2R - MOSIG (2013) November 25, 2013 Loss function, Separating Hyperplanes, Canonical Hyperplan
More informationSupport Vector Machine & Its Applications
Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia
More information... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology
..... SPARROW SPARse approximation Weighted regression Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Université de Montréal March 12, 2012 SPARROW 1/47 .....
More informationDirect Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Classification Logistic regression models for classification problem We consider two class problem: Y {0, 1}. The Bayes rule for the classification is I(P(Y = 1 X = x) > 1/2) so
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationRegularization Paths
December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and
More informationBeyond the Point Cloud: From Transductive to Semi-Supervised Learning
Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationOslo Class 2 Tikhonov regularization and kernels
RegML2017@SIMULA Oslo Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT May 3, 2017 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n
More informationSupport Vector Machine
Support Vector Machine Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Linear Support Vector Machine Kernelized SVM Kernels 2 From ERM to RLM Empirical Risk Minimization in the binary
More informationSemi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances
Semi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances Wee Sun Lee,2, Xinhua Zhang,2, and Yee Whye Teh Department of Computer Science, National University of Singapore. 2
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationStatistical Methods for Data Mining
Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find
More informationA Magiv CV Theory for Large-Margin Classifiers
A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationSUPPORT VECTOR REGRESSION WITH A GENERALIZED QUADRATIC LOSS
SUPPORT VECTOR REGRESSION WITH A GENERALIZED QUADRATIC LOSS Filippo Portera and Alessandro Sperduti Dipartimento di Matematica Pura ed Applicata Universit a di Padova, Padova, Italy {portera,sperduti}@math.unipd.it
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More information5.6 Nonparametric Logistic Regression
5.6 onparametric Logistic Regression Dmitri Dranishnikov University of Florida Statistical Learning onparametric Logistic Regression onparametric? Doesnt mean that there are no parameters. Just means that
More informationOptimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30
Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained
More informationLecture 10: A brief introduction to Support Vector Machine
Lecture 10: A brief introduction to Support Vector Machine Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More informationRegML 2018 Class 2 Tikhonov regularization and kernels
RegML 2018 Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT June 17, 2018 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n = (x i,
More informationSUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE
SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE M. Lázaro 1, I. Santamaría 2, F. Pérez-Cruz 1, A. Artés-Rodríguez 1 1 Departamento de Teoría de la Señal y Comunicaciones
More informationMachine Learning. Regression basics. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Regression basics Linear regression, non-linear features (polynomial, RBFs, piece-wise), regularization, cross validation, Ridge/Lasso, kernel trick Marc Toussaint University of Stuttgart
More informationSupport Vector Machine Regression for Volatile Stock Market Prediction
Support Vector Machine Regression for Volatile Stock Market Prediction Haiqin Yang, Laiwan Chan, and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,
More informationLasso Regression: Regularization for feature selection
Lasso Regression: Regularization for feature selection Emily Fox University of Washington January 18, 2017 1 Feature selection task 2 1 Why might you want to perform feature selection? Efficiency: - If
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationReview: Support vector machines. Machine learning techniques and image analysis
Review: Support vector machines Review: Support vector machines Margin optimization min (w,w 0 ) 1 2 w 2 subject to y i (w 0 + w T x i ) 1 0, i = 1,..., n. Review: Support vector machines Margin optimization
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationRegularization in Reproducing Kernel Banach Spaces
.... Regularization in Reproducing Kernel Banach Spaces Guohui Song School of Mathematical and Statistical Sciences Arizona State University Comp Math Seminar, September 16, 2010 Joint work with Dr. Fred
More informationSupport Vector Machines
Support Vector Machines Tobias Pohlen Selected Topics in Human Language Technology and Pattern Recognition February 10, 2014 Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationSupport Vector Machine via Nonlinear Rescaling Method
Manuscript Click here to download Manuscript: svm-nrm_3.tex Support Vector Machine via Nonlinear Rescaling Method Roman Polyak Department of SEOR and Department of Mathematical Sciences George Mason University
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationThe Entire Regularization Path for the Support Vector Machine
The Entire Regularization Path for the Support Vector Machine Trevor Hastie Department of Statistics Stanford University Stanford, CA 905, USA hastie@stanford.edu Saharon Rosset IBM Watson Research Center
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationSupport Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature
Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature suggests the design variables should be normalized to a range of [-1,1] or [0,1].
More informationThe lasso an l 1 constraint in variable selection
The lasso algorithm is due to Tibshirani who realised the possibility for variable selection of imposing an l 1 norm bound constraint on the variables in least squares models and then tuning the model
More informationMachine Learning Techniques
Machine Learning Techniques ( 機器學習技巧 ) Lecture 6: Kernel Models for Regression Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationReproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto
Reproducing Kernel Hilbert Spaces 9.520 Class 03, 15 February 2006 Andrea Caponnetto About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More information