Dipartimento di Informatica e Scienze dell Informazione
Dipartimento di Informatica e Scienze dell'Informazione

A Statistical Learning approach to Liver Iron Overload Estimation

Luca Baldassarre 1,2, Barbara Gianesin 1, Annalisa Barla 2 and Mauro Marinelli

1 - DIFI - Università di Genova, v. Dodecaneso 33, 16146, Genova, Italy
2 - DISI - Università di Genova, v. Dodecaneso 35, 16146, Genova, Italy

Technical Report DISI-TR
DISI, Università di Genova, v. Dodecaneso 35, Genova, Italy
A Statistical Learning approach to Liver Iron Overload Estimation

Luca Baldassarre, Barbara Gianesin, Annalisa Barla and Mauro Marinelli

Abstract

In this work we present and discuss in detail a novel vector-valued regression technique: our approach allows for an all-at-once estimation of the vector components, as opposed to solving a number of independent scalar-valued regression tasks. Despite its general-purpose nature, the method has been designed to solve a delicate medical issue: a reliable and non-invasive assessment of body-iron overload. The Magnetic Iron Detector (MID) measures the magnetic track of a person, which depends on the anthropometric characteristics and the body-iron burden. We aim to provide an estimate of this signal in the absence of iron overload. We show how this question can be formulated as the estimation of a vector-valued function which encompasses the prior knowledge on the shape of the magnetic track. This is accomplished by designing an appropriate vector-valued feature map. We successfully applied the method to a dataset of 84 volunteers.

1 Introduction

Iron is essential to human life, but is toxic in excessive amounts. Several diseases are characterized by liver iron overload, such as thalassemia or hereditary hemochromatosis. The assessment of body iron excess is therefore essential for managing the therapies for these diseases. Each disease is characterized by a different mechanism of iron accumulation: for thalassemic patients iron overload is induced by periodical blood transfusions, while for hemochromatosis patients it is induced by an incorrect dietary absorption of iron. The invasive liver biopsy is still considered the best way to perform iron overload evaluation but, being a local measure, it is affected by large errors due to the heterogeneous distribution
of iron deposition in the liver.

[Figure 1.1: Volunteer and patient magnetization signals. Figure 1.2: Eddy current and magnetization signals of a volunteer.]

Recently, Marinelli and colleagues [9] have developed a room-temperature biosusceptometer, the Magnetic Iron Detector (MID), which measures the variation of a magnetic field at different positions along the axis that crosses the patient's liver. This instrument allows the non-invasive assessment of the iron overload in the whole liver. Given an estimate of the background signal of a patient, that is, the signal that would be generated in the absence of iron overload, it is possible to recover the iron burden by subtracting the estimated signal from the measured signal. The statistical model developed by Marinelli and coworkers [8] is currently used at the E.O. Ospedali Galliera Hospital in Genoa, Italy, for assessing the iron overload. The model has been trained on a dataset of 84 healthy volunteers and it estimates the ratio R(x) between the two signals shown in subfig. 1.2, of which only the magnetization signal depends on the liver iron content. The core idea behind their approach is that the magnetization signal of a well-treated patient is indistinguishable from the one generated by a healthy volunteer with the same biometric features, see subfig. 1.1. Furthermore, they assume that the ratio R(x) of the two signals, evaluated only in the range between -8 cm and 8 cm, resembles a parabola. We reformulate this problem in the context of Statistical Learning, presenting a method to transform a curve-fitting task into a vector-valued regression model. Since the measures are always taken at fixed positions along the measurement axis, they can be thought of as components of a vector and a high correlation among them can be assumed, because they approximately lie on a parabola.
In this way we avoid directly estimating the magnetization curve. Our
vector-valued regression model simultaneously estimates the five points of the background signal. The correlation between these points is introduced by an appropriate feature map, which is linear in the biometric features and quadratic in the measurement positions. As we will show in subsec. 3.2, we can compute the corresponding matrix-valued kernel function. The method described in Sec. 3.2 can be implemented by means of iterative algorithms, see [7], such as Landweber, the ν-method or the sparsity-enforcing l1l2 regularization [3; 4].

2 Liver Iron Overload Estimation

The biosusceptometer is composed of an AC magnetic source and a pickup coil which measures the electromotive force (emf) produced by the oscillation of the magnetic field flux, as shown in subfig. 1.3. A body placed between the magnet and the pickup slightly modifies the flux and therefore the measured emf: the amount of the variation depends on the magnetic properties of the body, on its geometry and on its positioning in the field. The emf produced in the pickup by the field is about 4 V and the diamagnetic signal of a body is about 4 µV. A moderate iron overload adds a paramagnetic contribution of about 0.4 µV. The symmetry of the system and the use of synchronous detection make this difficult measurement feasible [8]. A sample of susceptibility χ, placed between the magnet and one of the pickups, generates the signal:

V = \int_{Volume} \chi(\vec{r})\, g(\vec{r})\, d\vec{r}   (1)

The weight function g(\vec{r}) is reported in subfig. 1.4: all body tissues contribute to the signal generation, but the major contribution comes from those between the magnet and the pickup coil (the measurement region). The patient lies supine on a stretcher, such that in the measurement region the liver center of mass crosses the magnetic field axis. The magnetic track of the patient is the complete scan of the magnetic properties of the body section. It is composed of measurements of the magnetic signal taken 4 cm apart.
Position x = 0 cm corresponds to the center of the body; negative positions indicate the liver side, while positive positions indicate the spleen side. Subfigure 1.1 reports the magnetic tracks of a healthy volunteer and of a patient with similar anthropometric features: the liver iron overload produces an evident
variation of the signal in the left part of the track.

[Figure 1: When the patient is positioned between the magnet and the pickup, an emf is produced by the magnetization of the tissues of the patient. With this positioning, the whole liver falls in the region where the function g(\vec{r}) is greatest. 1.3: Magnet and pickup configuration. 1.4: Weight function g(\vec{r}) and its cross sections in the directions A and B.]

Due to the oscillating magnetic field, the magnetic signal generated by the human body has two independent sources: the magnetization signal, from the diamagnetic and paramagnetic properties of the tissues, and the eddy current signal, from their electrical conductivity. For each patient a double track is recorded (an example is shown in subfig. 1.2): only the magnetization signal depends on the iron overload. The aim of the present work is to find a model which best approximates the magnetization signal of a healthy volunteer (the background signal) from the volunteer's anthropometric data and eddy current signal. The iron overload can be evaluated by computing the difference between the measured magnetization signal and the estimate of the background signal. Therefore, increasing the prediction accuracy of the model increases our ability to detect slight iron overloads. However, the maximum accuracy we can obtain is limited by the measurement error, mainly due to the positioning of the patient on the stretcher. This error is about 150 nV and corresponds to an iron overload of 0.4 g.
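As a back-of-the-envelope illustration of this subtraction scheme, the sketch below converts a residual signal into grams of iron. The linear conversion factor and the function names are our assumptions, inferred only from the figures quoted in the text (150 nV of signal corresponding to 0.4 g of iron); they are not the clinical calibration.

```python
# Sketch: turning a residual magnetization signal into an iron-overload
# estimate. The conversion factor is an assumption inferred from the figures
# quoted in the text (a 150 nV error corresponds to 0.4 g of iron).

GRAMS_PER_VOLT = 0.4 / 150e-9  # ~2.7e6 g per volt of excess signal (assumed linear)

def iron_overload_grams(measured_v, background_v):
    """Iron burden estimated from the excess of the measured signal over the background."""
    return (measured_v - background_v) * GRAMS_PER_VOLT

# A paramagnetic excess of 0.4 uV, the "moderate overload" scale quoted above
overload = iron_overload_grams(4.4e-6, 4.0e-6)  # in grams
```

On these numbers a 0.4 µV excess maps to roughly 1 g of iron, which is consistent with overloads below 1 g being considered mild later in the text.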
3 Statistical Learning Approach

3.1 Non-parametric regression and regularization

Given a set of input-output examples z = {(x_1, y_1), ..., (x_n, y_n)}, with x_i ∈ X and y_i ∈ Y, the aim of statistical learning is to find the deterministic function that best represents the relationship between x and y. This function is called the regression function. To tackle this problem it is necessary to define the space of candidate functions (the hypothesis space, H) and the measure used to assess the goodness of a candidate (the loss function, V). Ideally one would like to find a function that minimizes the loss on all possible input-output pairs. Since one always deals with finite sets, one has to resort to minimizing the empirical risk, that is, the loss computed only on the examples z:

E_n(f) = \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)).

If the hypothesis space is large enough to accommodate almost any sensible function, it will always be possible to perfectly predict (or fit) the examples z, without any guarantee that this minimizer will perform well on unseen data. This problem is called overfitting. There are two alternative approaches to avoid overfitting: the first consists in restricting the hypothesis space, the second in favoring smooth functions. It is usually more straightforward to translate our prior information on the nature of the specific problem at hand into properties of the regression function than into a restriction of the hypothesis space, so we follow the second approach. Convenient hypothesis spaces are the Reproducing Kernel Hilbert Spaces (RKHS), which, for specific choices of the kernel function K, are dense in L^2(X). The norm in these spaces reflects the regularity of their elements: a smaller norm corresponds to smoother functions.
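This norm-versus-fit trade-off can be made concrete with kernel ridge regression, i.e. Tikhonov regularization with the square loss in an RKHS, whose solution is obtained by solving a linear system. The sketch below is a generic illustration on synthetic noisy-parabola data, not the paper's MID model; the kernel width and sample sizes are arbitrary choices.

```python
import numpy as np

# Generic sketch of regularization in an RKHS: with the square loss, the
# regularized solution has coefficients c solving (K + n*lam*I) c = y
# (kernel ridge regression). Synthetic noisy-parabola data mimic the setting.

def gaussian_kernel(a, b, sigma=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = x ** 2 + 0.05 * rng.standard_normal(x.size)

K = gaussian_kernel(x, x)
n = x.size

def train_error(lam):
    c = np.linalg.solve(K + n * lam * np.eye(n), y)
    return np.mean((K @ c - y) ** 2)      # empirical risk of the fitted function

# Small lam chases the noise (near-interpolation); large lam smooths the fit.
err_interp, err_smooth = train_error(1e-8), train_error(1e-1)
```

The training error grows monotonically with λ; generalization, by contrast, is best at an intermediate λ, which is why the parameter is selected by cross validation later in the paper.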
It is therefore natural to search for those functions that minimize the following functional:

\frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda \|f\|_K^2   (2)

The regularization parameter λ controls the trade-off between the two terms: as λ approaches zero we obtain the interpolating solution, while as λ grows the solution becomes smoother and smoother. This approach is called Tikhonov regularization. In an RKHS the representer theorems [5; 10]
guarantee that the solution can always be written as:

f(x) = \sum_{i=1}^{n} K(x, x_i) c_i

where the coefficients c_i depend on the data, on the loss function, on the choice of kernel and on the regularization parameter λ. Note that if Y ⊆ R then the c_i and K(x, x_i) are scalars, whereas if Y ⊆ R^d the c_i are d-dimensional vectors and K(x, x_i) is a d × d matrix. The direct approach to vector-valued regression, as in [10], is computationally expensive, since it requires inverting an nd × nd matrix. To overcome this issue, we propose an extension to the vector-valued case of iterative methods originally developed for scalar regression [12; 7]. The main idea of these techniques is to start with an approximate solution and iteratively add a correction in the direction opposite to the gradient of the empirical risk. Letting the iterations go to infinity leads to an overfitted solution, with its problems of stability and generalization. By stopping the procedure early, a regularized solution is achieved; the number of iterations m plays the role of the regularization parameter. We are also interested in studying feature selection for vector-valued functions. We implemented l1l2 regularization, a sparsification method initially proposed by [13], studied in [3] and already applied in [6]. This method iteratively minimizes the following functional, derived from (2) with the square loss and the addition of an l1 penalty term:

\frac{1}{n} \sum_{i=1}^{n} \|y_i - f(x_i)\|^2_{\mathbb{R}^d} + \lambda(1-\alpha) \|f\|^2_{l_2} + \lambda\alpha \|f\|_{l_1}.

3.2 Designing the feature map

For each person i = 1, ..., 84, we consider only the 5 measures y_ik at the positions t_k ∈ {−8, −4, 0, 4, 8} cm.
The measures can be thought of as the components of a five-dimensional vector and lie approximately on a parabola, hence we can model them as y_ik = f(x_i)_k + ε_ik, where x_i stands for the biometric data of volunteer i, ε_ik represents the noise and:

f(x_i)_k = c_0(x_i) + c_1(x_i) t_k + c_2(x_i) t_k^2 + b(t_k),   (3)

where b(t) represents a measurement offset independent of the volunteer. To avoid computing b(t_k), we choose to set the mean of the values y_ik to zero for each k.
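The per-position centering just described can be sketched as follows; synthetic data stand in for the volunteers' measurements, with shapes following the text (84 volunteers by 5 positions).

```python
import numpy as np

# Sketch of the offset removal: subtracting, for each position t_k, the mean
# of y_ik over all volunteers absorbs the volunteer-independent offset b(t_k).

rng = np.random.default_rng(0)
offsets = np.array([3.0, 1.0, 0.0, 1.0, 3.0])   # plays the role of b(t_k)
Y = rng.standard_normal((84, 5)) + offsets      # 84 volunteers x 5 positions

Y_centered = Y - Y.mean(axis=0)                 # zero mean at each position t_k
```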
In our model, we assume that the coefficients c_j depend linearly on x: c_j(x) = β_j · x, j = 0, 1, 2, and introduce the vector-valued feature map φ : X → R^{5 × 3p} (X ⊆ R^p, p = 22):

\varphi(x) = \begin{pmatrix} x & x t_1 & x t_1^2 \\ x & x t_2 & x t_2^2 \\ x & x t_3 & x t_3^2 \\ x & x t_4 & x t_4^2 \\ x & x t_5 & x t_5^2 \end{pmatrix}.   (4)

Let us define β as the vector obtained by concatenating the coefficient vectors β_j; the element β_lj is the l-th component of the coefficient vector β_j. The vector-valued estimator can then be written as a linear combination of these new features:

f(x) = \varphi(x)\beta, \qquad \beta \in \mathbb{R}^{3p}.   (5)

We decided to use the quadratic loss function V, therefore the empirical risk is:

E_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \|y_i - \varphi(x_i)\beta\|^2_{\mathbb{R}^5}.

Our aim is to compare the performance of the MID parametric model with that of the vector-valued model estimated via the Landweber, ν-method and l1l2 algorithms. These methods require the computation of the gradient of the empirical risk, which, for this specific case, is:

\nabla E_n(\beta) = -\frac{2}{n}\left(\varphi y - \varphi^T \varphi\, \beta\right)   (6)

(\varphi y)_\gamma = \sum_{i=1}^{n} \langle \varphi_\gamma(x_i), y_i \rangle_{\mathbb{R}^5}, \qquad (\varphi^T \varphi)_{\gamma \gamma'} = \sum_{i=1}^{n} \langle \varphi_\gamma(x_i), \varphi_{\gamma'}(x_i) \rangle_{\mathbb{R}^5},

where φ_γ(x) corresponds to the γ-th column of φ(x), φy ∈ R^{3p} and φ^Tφ ∈ R^{3p × 3p}. The simplest iterative method is the Landweber approach [1]. It starts with the null solution (i.e. all the coefficients β_lj equal to zero), which is updated by adding the negative of the gradient multiplied by a constant step size η:

\beta^{m+1} = \beta^m - \eta \nabla E_n(\beta^m), \qquad \beta^0 = (0, ..., 0).
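A minimal sketch of the feature map and of the Landweber iteration is given below. The dimensions follow the text (p = 22 features, 5 positions, n = 84); the rescaled positions and the random data are placeholders, not the volunteers' measurements.

```python
import numpy as np

# Feature map phi(x): a 5 x 3p matrix whose k-th row is [x, x*t_k, x*t_k^2],
# so that f(x) = phi(x) @ beta lies on a parabola in t for every input x.
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # positions rescaled from {-8,...,8} cm

def feature_map(x):
    return np.stack([np.concatenate([x, x * tk, x * tk ** 2]) for tk in t])

rng = np.random.default_rng(0)
n, p = 84, 22
X = rng.standard_normal((n, p))
Phi = np.stack([feature_map(x) for x in X])            # n x 5 x 3p
Y = Phi @ rng.standard_normal(3 * p) + 0.01 * rng.standard_normal((n, 5))

def risk(beta):                                        # empirical risk E_n(beta)
    return np.mean(np.sum((Phi @ beta - Y) ** 2, axis=1))

def grad(beta):                                        # gradient of E_n
    return (2.0 / n) * np.einsum('ikg,ik->g', Phi, Phi @ beta - Y)

# Landweber: start from zero, step against the gradient; the iteration count m
# acts as the regularization parameter (early stopping).
PtP = np.einsum('ikg,ikh->gh', Phi, Phi)
eta = 1.0 / (2.0 * np.linalg.norm(PtP, 2))             # step size as in the text
beta = np.zeros(3 * p)
for m in range(200):
    beta = beta - eta * grad(beta)
```

Each iteration lowers the empirical risk; stopping early keeps the solution regularized instead of letting it overfit.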
The number of iterations m corresponds to the inverse of the regularization parameter λ. The ν-method [7] extends Landweber by using a dynamic step size and introducing an inertial term which keeps memory of the previous update:

\beta^{m+1} = \beta^m + u(\beta^m - \beta^{m-1}) - w\eta \nabla E_n(\beta^m),

where w and u change at each iteration. It has been shown that this algorithm performs better and faster than Landweber: in fact, the number of iterations corresponds to λ^{-1/2}. l1l2 regularization iteratively minimizes the following functional:

\frac{1}{n} \sum_{i=1}^{n} \|y_i - \varphi(x_i)\beta\|^2_{\mathbb{R}^5} + \lambda(1-\alpha)\|\beta\|^2_{l_2} + \lambda\alpha\|\beta\|_{l_1}.

The l1 penalty term forces many of the coefficients β_lj to be zero; the corresponding variables can be considered irrelevant to the problem and discarded. The iterations are essentially of the Landweber type, but at each step the coefficients are soft-thresholded and shrunk:

\beta^{m+1} = H(\beta^m - \eta \nabla E_n(\beta^m), \tau)/(1 + \mu), \qquad \tau = \lambda\alpha, \quad \mu = \lambda(1-\alpha),

where H is the soft-thresholding operator, which sets to zero all coefficients within [−τ, τ] and shifts the remaining coefficients towards zero by τ. The algorithm stops according to a convergence criterion; for details see [3]. From the vector-valued feature map φ we can calculate the corresponding matrix-valued kernel. Following [2]:

(K(x, s))_{pq} = \langle \varphi_p(x), \varphi_q(s) \rangle = (x \cdot s)(1 + t_p t_q + t_p^2 t_q^2),

where φ_p(x) denotes the p-th row of φ(x). Note: we can recast the vector-valued model as a scalar one by considering, for each volunteer, five input points (x, t_k), one for each measurement position, and using the factorized scalar kernel:

K((x, t_p), (s, t_q)) = (x \cdot s)(1 + t_p t_q + t_p^2 t_q^2).
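The kernel factorization above can be checked numerically. The sketch below builds the feature map for two arbitrary synthetic inputs (placeholders for real biometric vectors) and compares φ(x)φ(s)^T with the closed form:

```python
import numpy as np

# Check that the matrix-valued kernel K(x, s) = phi(x) @ phi(s).T has entries
# (x . s)(1 + t_p t_q + t_p^2 t_q^2), as stated above.

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def feature_map(x):
    return np.stack([np.concatenate([x, x * tk, x * tk ** 2]) for tk in t])

rng = np.random.default_rng(1)
x, s = rng.standard_normal(22), rng.standard_normal(22)

K_features = feature_map(x) @ feature_map(s).T                       # 5 x 5
K_closed = (x @ s) * (1.0 + np.outer(t, t) + np.outer(t ** 2, t ** 2))
```

The agreement follows because the p-th row of φ(x) is [x, x t_p, x t_p^2], so its inner product with the q-th row of φ(s) factors into (x · s)(1 + t_p t_q + t_p^2 t_q^2).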
It is possible to extend our approach to the non-linear case by replacing the dot product with a suitable scalar kernel function. The estimator can be written as:

f(x, t) = \sum_{i=1}^{n} \sum_{k=1}^{5} K((x, t), (x_i, t_k))\, c_{ik}.

With this approach one can use standard scalar regularized regression techniques, but there are some considerations to be made. The first regards the i.i.d. hypothesis on the examples. Reformulating the problem as scalar regression, each volunteer is associated with 5 vectors composed of two parts: the first is the biometric data of the volunteer, x_i, while the second is the measurement position, t_k. Consequently, the training set has 5n elements, whose biometric and position components are not i.i.d. Furthermore, enforcing sparsity on the coefficients c_ik is very different from sparsifying the coefficients β_lj, which are directly related to the biometric features.

3.3 A naïve scalar approach

For comparison, we tested our model against a naïve approach, which consists in treating the measures at each position as independent scalar regression problems. Five scalar models are therefore trained separately and their outputs combined to recover the background signal. The prior knowledge that the magnetic signal of each subject is roughly a parabola is no longer taken into account. We implemented standard RLS regression with polynomial and Gaussian kernels, and l1l2 regularization.

3.4 Model selection and assessment

We adopt an experimental protocol in order to select the model parameters and assess the generalization capabilities of our method in an unbiased way. We perform two nested loops of K-fold Cross Validation. We recall that the estimate of the generalization error is the mean of the empirical errors on the K test sets. If K equals the total number of available data points, the method is called Leave-One-Out Cross Validation (LOO).
Higher values of K reduce the bias of the estimator, since the model is trained on more data, but increase its variance, since fewer
data are used for testing. On the other hand, more splits imply more computations, hence more time. In some cases, RLS for example [11], closed-form solutions of the Leave-One-Out error have been obtained, resulting in very fast computation. For the vector-valued model, the inner loop is a 5-fold Cross Validation and is performed to select the regularization parameter (e.g. λ, or the number of iterations m). For each value of the parameter, an estimate of the generalization error is computed; the value that minimizes the error is used for training. The outer loop is a Leave-One-Out Cross Validation evaluating the performance of the chosen model. The estimate of the generalization error is the mean of the K = n empirical errors. For the scalar RLS models, the inner loop is a LOO CV for selecting both the kernel parameter and the regularization parameter λ, exploiting the computational advantage of the closed-form solution for the LOO error for the latter. The selection of λ for the scalar l1l2 regularization method was performed by a 5-fold CV. The evaluation of the performance of these models was carried out through LOO CV, consistently with the procedure adopted for the vector-valued model.

4 Results

The data set is composed of 84 healthy volunteers, represented by the vectors of features reported in Table 1. Note that from now on we will refer to n as the number of examples in the training set within the innermost loop of CV. The features are highly inhomogeneous and can lead to numerical problems, therefore we decided to normalize our data. We set the columns of the n × p data matrix X = (x_1, ..., x_n)^T to have zero mean and fixed range, and changed the variable t from {−8, −4, 0, 4, 8} to {−1, −0.5, 0, 0.5, 1}, since it only represents a label for the components of the vector y. Thus, each element of the three-dimensional array φ(X) ∈ R^{n × 5 × 3p}, obtained by applying the feature map to the data matrix X, belongs to [−1, 1].
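One way to realize this normalization is sketched below. The particular scaling, dividing each centered column by its largest absolute deviation, is our assumption of one possible "fixed range" convention, and the data are synthetic stand-ins for the volunteers' features.

```python
import numpy as np

# Sketch of the preprocessing above: columns of the n x p data matrix are
# shifted to zero mean and scaled to lie within [-1, 1], and the position
# variable t is relabeled from {-8, ..., 8} cm to {-1, ..., 1}.

rng = np.random.default_rng(0)
X = rng.uniform(10.0, 500.0, size=(84, 22))      # raw, inhomogeneous features

X_centered = X - X.mean(axis=0)
X_norm = X_centered / np.abs(X_centered).max(axis=0)  # zero mean, values in [-1, 1]

t_scaled = np.array([-8.0, -4.0, 0.0, 4.0, 8.0]) / 8.0
```

In the test phase the same per-column shifts and scales, computed on the training data, would be reused.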
In the test phase, we apply the same normalizing factors to the test data. For model selection and assessment we used the experimental protocol outlined in subsec. 3.4: the model parameters to be selected are the number of iterations m for the Landweber and ν-method algorithms and the regularization parameter λ for the l1l2 method. In the latter case,
Table 1: Volunteer's features

Feature   Description
1         Eddy current at -12 cm
2         Eddy current at -8 cm
3         Eddy current at -4 cm
4         Eddy current at 0 cm
5         Eddy current at 4 cm
6         Eddy current at 8 cm
7         Eddy current at 12 cm
8         Thorax section area at 0 cm
9         Thorax section area at 18 cm
10        Thorax section area at -18 cm
11        Thorax height at 0 cm
12        Thorax height at 18 cm
13        Thorax height at -18 cm
14        Adam's apple position
15        Navel position
16        Age
17        Height
18        Weight
19        Thorax circumference
20        Circumference under ribs arch
21        BMI (Body Mass Index)
22        Body area
α was set to 0.9 to enforce maximum sparsity while retaining correlated features [3]. For scalar RLS regression, we also selected the kernel parameters (the degree of the polynomial or the σ of the Gaussian) alongside the regularization parameter λ. The implemented iterative algorithms require the specification of a step size η. We chose the value η = (2‖φ^Tφ‖)^{-1}, which guarantees their convergence, see [12; 3; 7]. We report the selected parameters in Table 2 for the vector-valued model and in Table 3 for the naïve approach. Note that the values correspond to the median of the parameters for each model learnt during the outer loop of LOO cross validation.

Table 2: Selected parameters

Model       Number of iterations   λ
Landweber   397                    n.a.
ν-method    68                     n.a.
l1l2

[Table 3: Selected parameters for the naïve approach: the λ and kernel parameter of the RLS Gaussian, RLS polynomial and l1l2 models at each measurement position x ∈ {−8, −4, 0, 4, 8}.]

Figure 2 shows the boxplots of the LOO error distributions for all the models tested, compared with the model in use at the E.O. Ospedali Galliera Hospital, assessed with the same validation protocol. As expected, we observe that the LOO errors show a high variance. Table 4 summarizes the statistics of each distribution. The l1l2 algorithm performs slightly better and seems more robust to outliers, both for the vector-valued model and for the naïve approach. The accuracies obtained with these methods
correspond to a precision in the iron overload estimation of about 0.8 g. An iron overload lower than 1 g is considered mild: currently no model is capable of detecting this kind of iron burden.

[Table 4: Statistical summary (first quartile, median, third quartile) of the LOO error distributions for the Landweber, ν-method, l1l2, MID, RLS Gaussian, RLS polynomial and scalar l1l2 models.]

5 Conclusions

The proposed model is a general method for approaching vector-valued regression problems. Moreover, it can also be used to estimate a curve depending on a variable that is always sampled at the same fixed values. Prior knowledge (e.g. the shape of the curve with respect to the parametrizing variable, or the correlation among the elements of the vector-valued function to be estimated) can easily be incorporated by explicitly writing the feature map or the kernel function. Our results show that the iterative algorithms can be successfully applied to the vector-valued case. They also provide an efficient alternative to the direct computation of the inverse of an nd × nd matrix. The model selection and validation protocol adopted leads to an unbiased solution, avoiding overfitting and unreliable estimates of the performance. The Marinelli group will soon start a new data acquisition campaign on volunteers: the old features will be measured more accurately and some new ones will be introduced, for example a 3D laser scan of the volunteer's thorax. The statistical methods presented here will be employed on the new dataset and will be compared against a neural network system that the Marinelli group is planning to develop.
[Figure 2: LOO error distributions (absolute value of the residue). The first three boxplots refer to the vector-valued model trained with the indicated algorithms (Landweber, ν-method, l1l2). The MID model is the one currently used for diagnosis. The last three boxplots regard the naïve scalar approach (RLS Gaussian, RLS polynomial, scalar l1l2).]

References

[1] P. Bühlmann and B. Yu. Boosting with the l2-loss: regression and classification. Journal of the American Statistical Association, 98.

[2] A. Caponnetto, C. Micchelli, M. Pontil, and Y. Ying. Universal kernels for multi-task learning. Journal of Machine Learning Research, submitted.

[3] C. De Mol, E. De Vito, and L. Rosasco. Sparse Tikhonov regularization for variable selection and learning. Technical report, DISI.

[4] C. De Mol, S. Mosci, M. Traskine, and A. Verri. A regularized method for selecting nested groups of relevant genes from microarray data. Technical report, DISI.
[5] E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri. Some properties of regularized kernel methods. Journal of Machine Learning Research, 5.

[6] A. Destrero, S. Mosci, C. De Mol, A. Verri, and F. Odone. Feature selection for high-dimensional data. Computational Management Science, to appear.

[7] L. Lo Gerfo, L. Rosasco, F. Odone, E. De Vito, and A. Verri. Spectral algorithms for supervised learning. Neural Computation, to appear.

[8] M. Marinelli, S. Cuneo, B. Gianesin, A. Lavagetto, M. Lamagna, E. Oliveri, G. Sobrero, L. Terenzani, and G. Forni. Non-invasive measurement of iron overload in the human body. IEEE Transactions on Applied Superconductivity, 16(2), June.

[9] M. Marinelli, B. Gianesin, M. Lamagna, A. Lavagetto, E. Oliveri, M. Saccone, G. Sobrero, L. Terenzani, and G. Forni. Whole liver iron overload measurement by a non-cryogenic magnetic susceptometer. In Proc. of New Frontiers in Biomagnetism, Vancouver, Canada.

[10] C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17.

[11] R. M. Rifkin and R. A. Lippert. Notes on regularized least squares. Technical report, MIT DSpace (United States).

[12] Y. Yao, L. Rosasco, and A. Caponnetto. On early stopping in gradient descent learning. Constructive Approximation, 26(2), August.

[13] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2).
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationA Magiv CV Theory for Large-Margin Classifiers
A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector
More informationLecture 4. 1 Learning Non-Linear Classifiers. 2 The Kernel Trick. CS-621 Theory Gems September 27, 2012
CS-62 Theory Gems September 27, 22 Lecture 4 Lecturer: Aleksander Mądry Scribes: Alhussein Fawzi Learning Non-Linear Classifiers In the previous lectures, we have focused on finding linear classifiers,
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationAre Loss Functions All the Same?
Are Loss Functions All the Same? L. Rosasco E. De Vito A. Caponnetto M. Piana A. Verri November 11, 2003 Abstract In this paper we investigate the impact of choosing different loss functions from the viewpoint
More informationMLCC 2017 Regularization Networks I: Linear Models
MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGE-MIT-IIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational
More information9.520 Problem Set 2. Due April 25, 2011
9.50 Problem Set Due April 5, 011 Note: there are five problems in total in this set. Problem 1 In classification problems where the data are unbalanced (there are many more examples of one class than
More informationLearning with stochastic proximal gradient
Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and
More informationSupport Vector Machines
Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationLess is More: Computational Regularization by Subsampling
Less is More: Computational Regularization by Subsampling Lorenzo Rosasco University of Genova - Istituto Italiano di Tecnologia Massachusetts Institute of Technology lcsl.mit.edu joint work with Alessandro
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationVC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms
03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationAdaptive Sampling Under Low Noise Conditions 1
Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università
More informationRelevance Vector Machines for Earthquake Response Spectra
2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationTutorial on Machine Learning for Advanced Electronics
Tutorial on Machine Learning for Advanced Electronics Maxim Raginsky March 2017 Part I (Some) Theory and Principles Machine Learning: estimation of dependencies from empirical data (V. Vapnik) enabling
More informationCS168: The Modern Algorithmic Toolbox Lecture #6: Regularization
CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationOnline Learning With Kernel
CS 446 Machine Learning Fall 2016 SEP 27, 2016 Online Learning With Kernel Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Stochastic Gradient Descent Algorithms Regularization Algorithm Issues
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationReproducing Kernel Hilbert Spaces
9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we
More informationNeural Networks. Prof. Dr. Rudolf Kruse. Computational Intelligence Group Faculty for Computer Science
Neural Networks Prof. Dr. Rudolf Kruse Computational Intelligence Group Faculty for Computer Science kruse@iws.cs.uni-magdeburg.de Rudolf Kruse Neural Networks 1 Supervised Learning / Support Vector Machines
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More informationOslo Class 2 Tikhonov regularization and kernels
RegML2017@SIMULA Oslo Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT May 3, 2017 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationBagging and Other Ensemble Methods
Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationResampling techniques for statistical modeling
Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error
More informationDELFT UNIVERSITY OF TECHNOLOGY
DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationAn introduction to Support Vector Machines
1 An introduction to Support Vector Machines Giorgio Valentini DSI - Dipartimento di Scienze dell Informazione Università degli Studi di Milano e-mail: valenti@dsi.unimi.it 2 Outline Linear classifiers
More informationNotes on Regularized Least Squares Ryan M. Rifkin and Ross A. Lippert
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2007-025 CBCL-268 May 1, 2007 Notes on Regularized Least Squares Ryan M. Rifkin and Ross A. Lippert massachusetts institute
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationMIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design
MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications Class 19: Data Representation by Design What is data representation? Let X be a data-space X M (M) F (M) X A data representation
More informationSVM optimization and Kernel methods
Announcements SVM optimization and Kernel methods w 4 is up. Due in a week. Kaggle is up 4/13/17 1 4/13/17 2 Outline Review SVM optimization Non-linear transformations in SVM Soft-margin SVM Goal: Find
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of
More informationA GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong
A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES Wei Chu, S. Sathiya Keerthi, Chong Jin Ong Control Division, Department of Mechanical Engineering, National University of Singapore 0 Kent Ridge Crescent,
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationSVMs, Duality and the Kernel Trick
SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today
More information