Kernel Methods with Imbalanced Data and Applications


1 Kernel Methods with Imbalanced Data and Applications. Theodore B. Trafalis, Laboratory of Optimization and Intelligent Systems, School of Industrial Engineering, University of Oklahoma. 1st Hellenic Forum for Science, Technology and Innovation, N.C.S.R. Demokritos, Athens, Greece, July 17-19, 2013. Dr. T. Trafalis (University of Oklahoma), Kernel Methods with Imbalanced Data, 1st Hellenic Forum for Science, 1 / 70

2 Part I: Kernel Methods for Imbalanced Data and Application to Tornado Prediction

3 Imbalanced Data: What is an Imbalanced Data Problem?
Outline:
1 Imbalanced Data: What is an Imbalanced Data Problem?; Impact of Imbalanced Data on Learning Machines; State of the Art Techniques for Imbalanced Learning.
2 Kernel Methods: General Description of Kernel Methods; Support Vector Machines Applied to Imbalanced Data.
3 Application to Tornado Prediction: Description of the Experiments; Results for the Tornado Data Set.

4 Imbalanced Data Problems and their Importance
The problem of learning from an imbalanced data set occurs when the number of samples in one class is significantly greater than that of the other class. Learning from imbalanced data is an important problem in data mining and data classification. Examples of imbalanced data sets include: fraudulent credit card transactions; telecommunication equipment failures; oil spills from satellite images; tornado, earthquake and landslide occurrences; cancer and health science data.

5 Example of the Classification of Imbalanced Data (figure)
Source: Tang et al. SVMs Modeling for Highly Imbalanced Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1).

6 Between Class and Within Class Imbalances (figure)
Source: He and Garcia. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9).

7 Imbalanced Data: Impact of Imbalanced Data on Learning Machines

8 Impact of Imbalanced Data Problems in Classification
Classifiers tend to provide an imbalanced degree of accuracy, with the majority class having close to 100 percent accuracy and the minority class having an accuracy of only 0 to 10 percent. In the tornado data set, for example, a 10 percent accuracy for the minority class suggests that 72 tornadoes would be classified as nontornadoes.

9 Illustration of Impact of Imbalanced Classification (figure)
On the left, the accuracy of the minority class is zero percent. On the right, the accuracy for the minority class is 80 percent.

10 Imbalanced Data: State of the Art Techniques for Imbalanced Learning

11 Current Approaches for Imbalanced Learning I
Algorithm level: threshold method; learning only the minority class; cost-sensitive approaches.
Data level: random under-sampling and over-sampling; informed under-sampling (EasyEnsemble, BalanceCascade); synthetic sampling with data generation (SMOTE); adaptive synthetic sampling (Borderline-SMOTE, ADASYN); sampling with data cleaning (OSS method, CNN+Tomek Links integration method, Neighborhood Cleaning Rule, SMOTE+ENN, SMOTE+Tomek); cluster-based sampling (CBO); integration of sampling and boosting (SMOTEBoost).
Kernel-based approaches: variations of Support Vector Machines (SVM).

12 Current Approaches for Imbalanced Learning II
Kernel-based approaches (continued): Kernel Logistic Regression.
Evaluation metrics: metrics used to evaluate accuracies; Receiver Operating Characteristic (ROC), Precision-Recall (PR) and cost curves; singular assessment metrics based on the confusion matrix or the multi-class cost matrix (F-measure, G-mean, etc.).
Source: He and Garcia. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9).
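To make the singular assessment metrics concrete, the following minimal sketch computes the F-measure and the G-mean from a binary confusion matrix. The counts are hypothetical and the function names are illustrative, not taken from the talk.

```python
import math


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp, fp, fn, tn):
    """G-mean: geometric mean of the accuracies on the two classes."""
    sensitivity = tp / (tp + fn)  # accuracy on the positive (minority) class
    specificity = tn / (tn + fp)  # accuracy on the negative (majority) class
    return math.sqrt(sensitivity * specificity)


# Hypothetical confusion matrix: 10 minority samples, 90 majority samples.
tp, fn, fp, tn = 8, 2, 4, 86
accuracy = (tp + tn) / (tp + fn + fp + tn)
f1 = f_measure(tp, fp, fn)
gm = g_mean(tp, fp, fn, tn)
print(f"accuracy={accuracy:.3f} F1={f1:.3f} G-mean={gm:.3f}")
```

Unlike overall accuracy, both scores drop sharply when the minority class is poorly recognized, which is why they are preferred for imbalanced problems.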

13 Problems with Imbalanced Data Prediction
Improper evaluation metrics.
Lack of data (absolute rarity): the number of observations is small in an absolute sense.
Relative lack of data (relative rarity): the observations are rare relative to other events.
Data fragmentation: absolute lack of data within a single partition.
Inappropriate inductive bias, such as an assumption of linearity.

14 Kernel Methods: General Description of Kernel Methods

15 Historical Perspective
Efficient algorithms for detecting linear relations were used in the 1950s and 1960s (perceptron algorithm). Handling nonlinear relationships was seen as a major research goal at that time, but developing nonlinear algorithms with the same efficiency and stability proved an elusive goal. In the mid 1980s, the field of pattern analysis underwent a nonlinear revolution with backpropagation neural networks (NNs) and decision trees (based on heuristics and lacking a firm theoretical foundation; local minima problems; nonconvexity). In the mid 1990s, kernel-based methods were developed; they handle nonlinear relations while retaining the guarantees and understanding that had been developed for linear algorithms.

16 Overview I
Kernel methods are a new class of machine learning algorithms which can operate on very general types of data and can detect very general types of relations (e.g., potential function method; Aizerman et al., 1964; Vapnik, 1982, 1995). Correlation, factor, cluster and discriminant analysis are some of the machine learning tasks that can be performed with kernels on data as diverse as sequences, text, images, graphs and vectors. Kernel methods also provide a natural way to merge and integrate different types of data.

17 Overview II
Kernel methods offer a modular framework. In a first step, a dataset is processed into a kernel matrix. Data can be of various types and also of heterogeneous types. In a second step, a variety of kernel algorithms can be used to analyze the data, using only the information contained in the kernel matrix.

18 Modular Framework (figure)
Source: J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis.

19 Basic Idea of Kernel Methods
Kernel methods work by embedding data in a vector space, called the feature space, using a kernel function, and then looking for linear relations in that space. Much of the geometry of the data in the embedding space (relative positions) is contained in all pair-wise inner products (information bottleneck). We can work in the feature space by specifying an inner product function $k$ between points in it. In many cases, the inner product in the embedding space (feature space) is very cheap to compute.
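As a small illustration of why the inner product can be cheap to compute, the sketch below compares a degree-2 polynomial kernel on $\mathbb{R}^2$ with the inner product of an explicit feature map into $\mathbb{R}^3$. The choice of kernel and the sample points are assumptions made for the example; they do not come from the talk.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = <x, y>^2 on R^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_features(x):
    """Explicit feature map phi: R^2 -> R^3 with <phi(x), phi(y)> = <x, y>^2."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, 4.0)
k_direct = poly2_kernel(x, y)                            # evaluated in the input space
k_feature = dot(poly2_features(x), poly2_features(y))    # evaluated in the feature space
print(k_direct, k_feature)
```

Both routes give the same number, but the kernel evaluation never builds the feature vectors, which matters when the feature space is high- or infinite-dimensional.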

20 Properties of Kernels I
Definition (Mercer kernel). Let $E$ be any set. A function $k : E \times E \to \mathbb{R}$ that is continuous, symmetric and finitely positive semi-definite is called here a Mercer kernel.
Definition (Finite positive semi-definiteness). A function $k : E \times E \to \mathbb{R}$, where $E$ is any set, is a finitely positive semi-definite kernel if
$$\sum_{i=1}^{m} \sum_{j=1}^{m} k(x_i, x_j)\,\lambda_i \lambda_j \geq 0$$
for any $m \in \mathbb{N}$, $\lambda_i \in \mathbb{R}$ and $x_i \in E$ with $i \in \{1, \dots, m\}$. It can be seen as the generalization of a positive semi-definite matrix.

21 Properties of Kernels II
Definition (RKHS). A Reproducing Kernel Hilbert Space (RKHS) $F$ is a Hilbert space of complex-valued functions on a set $E$ for which there exists a function $k : E \times E \to \mathbb{C}$ (the reproducing kernel) such that $k(\cdot, x) \in F$ for any $x \in E$ and $\langle f, k(\cdot, x) \rangle = f(x)$ for all $f \in F$ (reproducing property).
If $k$ is a symmetric positive definite kernel then, by the Moore-Aronszajn theorem, there is a unique RKHS with $k$ as the reproducing kernel. A symmetric positive definite kernel $k$ can be expressed as a dot product $k : (x, y) \mapsto \langle \phi(x), \phi(y) \rangle$, where $\phi$ is a map from $\mathbb{R}^n$ to a RKHS $H$ (kernel trick).

22 Properties of Kernels III
Properties. For any $x_1, \dots, x_l$, the $l \times l$ matrix $K$ with entries $K_{ij} = k(x_i, x_j)$ is symmetric and positive semi-definite. The matrix $K$ is called the kernel matrix. A kernel $k$ can be expressed as $k : (x, y) \mapsto \langle \phi(x), \phi(y) \rangle$, where $\phi$ is a map from $\mathbb{R}^n$ to a Hilbert space $H$ (kernel trick). The space $H$ is called the feature space.
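The kernel matrix properties can be checked numerically. The sketch below builds a Gaussian RBF kernel matrix for a few random points and evaluates the quadratic form $\sum_{i,j} K_{ij} \lambda_i \lambda_j$ for random coefficient vectors. The points, the bandwidth and the number of trials are arbitrary assumptions; a random check of this kind illustrates the property and does not prove it.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(points, kernel):
    """Kernel matrix with entries K[i][j] = k(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, lam):
    """Value of sum_i sum_j K[i][j] * lam[i] * lam[j]."""
    n = len(lam)
    return sum(K[i][j] * lam[i] * lam[j] for i in range(n) for j in range(n))


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]
K = kernel_matrix(points, gaussian_kernel)

is_symmetric = all(abs(K[i][j] - K[j][i]) < 1e-12
                   for i in range(len(K)) for j in range(len(K)))
min_form = min(quadratic_form(K, [rng.gauss(0, 1) for _ in points])
               for _ in range(200))
print("symmetric:", is_symmetric, "smallest quadratic form:", min_form)
```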

23 Properties of Kernels IV
The image of $\mathbb{R}^d$ by $\phi$ is a manifold $S$ in $H$. Kernels can be interpreted as measures of distance and measures of angles on $S$. Simple geometric relations between $S$ and hyperplanes of $H$ can give complex forms in $\mathbb{R}^d$.

24 Kernel Methods: Support Vector Machines Applied to Imbalanced Data

25 Support Vector Machines
Definition (SVM). Support Vector Machines are a family of learning algorithms that use kernel methods to solve supervised learning problems. Common supervised learning tasks concern problems of classification and regression.
SVMs work by solving Quadratic Programming problems that aim to minimize the generalization error. If we are given a set $S$ of $l$ points $x_i \in \mathbb{R}^n$, where each $x_i$ belongs to either of two classes defined by $y_i \in \{-1, +1\}$, then the objective is to find a hyperplane that divides $S$, leaving all the points of the same class on the same side while maximizing the minimum distance between either of the two classes and the hyperplane [Vapnik 1995].

26 Optimal Separating Hyperplane (figure)
Source: Microsoft Research. Vision, Graphics, and Visualization Group.

27 Dual problem in nonlinear case
The optimal hyperplane is obtained by solving the following Quadratic Programming (QP) problem:
$$\min_{\alpha \in \mathbb{R}^l} \left\{ \alpha^t K \alpha / 2 - \langle 1, \alpha \rangle : \langle \alpha, y \rangle = 0 \text{ and } 0 \leq \alpha \leq C \right\}.$$
This QP problem is the dual formulation of a QP problem that maximizes the margin of separation between the sets of points in the feature space. Given a solution $\alpha$, the optimal hyperplane is expressed as
$$\left\{ x \in \mathbb{R}^n : \sum_{i=1}^{l} \alpha_i y_i k(x_i, x) + b = 0 \right\},$$
where $b$ is computed using the complementary slackness conditions of the primal formulation.
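The hyperplane expression translates directly into a decision function. The sketch below evaluates $f(x) = \sum_i \alpha_i y_i k(x_i, x) + b$ on a hypothetical one-dimensional toy problem with a linear kernel. The multipliers and the offset were worked out by hand for this toy case (two points at 0 and 2, one per class) and are assumptions of the example, not output of a QP solver.

```python
def linear_kernel(u, v):
    """Linear kernel k(u, v) = <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def decision_function(x, train_x, train_y, alpha, b, kernel):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b; the hyperplane is f(x) = 0."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alpha, train_y, train_x)) + b


# Hypothetical toy problem: one point per class on the real line.
train_x = [(0.0,), (2.0,)]
train_y = [-1, 1]
alpha = [0.5, 0.5]   # satisfies the dual constraint sum_i alpha_i * y_i = 0
b = -1.0             # chosen so that both points lie exactly on the margin

values = [decision_function(x, train_x, train_y, alpha, b, linear_kernel)
          for x in [(0.0,), (1.0,), (2.0,)]]
print(values)
```

The sign of $f(x)$ gives the predicted class; the midpoint between the two training points lies on the separating hyperplane.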

28 Binary Classification of Imbalanced Data with SVMs
Binary classification of imbalanced data needs a rewriting of the primal SVM problem, namely:
$$\min_{w, b, \xi} \; \frac{1}{2} \langle w, w \rangle_H + C_{1} \sum_{y_i = 1} \xi_i + C_{-1} \sum_{y_i = -1} \xi_i$$
$$\text{subject to} \quad y_i \left( \langle w, \phi(x_i) \rangle_H + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i \in [1, l].$$
$C_{1}$ is the trade-off coefficient for the minority class and $C_{-1}$ is the trade-off coefficient for the majority class. For imbalanced data, we wish to have $C_{-1} < C_{1}$, i.e. the penalty for outliers in the minority class is greater than the one for the majority class. This approach is strongly related to robust classification.
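To see how class-dependent trade-off coefficients change the objective, the sketch below evaluates the primal objective for a fixed linear classifier, using the hinge slack $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$. It assumes a linear kernel, takes the positive class as the minority class, and uses made-up data; it evaluates the objective only and does not solve the optimization problem.

```python
def weighted_svm_objective(w, b, X, y, c_pos, c_neg):
    """0.5*<w, w> + c_pos * (slack of class +1) + c_neg * (slack of class -1)."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    pos_slack = 0.0
    neg_slack = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slack = max(0.0, 1.0 - margin)  # smallest slack satisfying the constraint
        if yi == 1:
            pos_slack += slack
        else:
            neg_slack += slack
    return regularizer + c_pos * pos_slack + c_neg * neg_slack


# Hypothetical data: one minority (+1) point inside the margin, three majority points.
X = [(0.5,), (-1.0,), (-2.0,), (-3.0,)]
y = [1, -1, -1, -1]
w, b = (1.0,), 0.0

equal_costs = weighted_svm_objective(w, b, X, y, c_pos=1.0, c_neg=1.0)
minority_weighted = weighted_svm_objective(w, b, X, y, c_pos=10.0, c_neg=1.0)
print(equal_costs, minority_weighted)
```

With the larger minority coefficient, the same margin violation costs ten times more, so a solver would move the hyperplane away from the minority point.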

29 Illustration of SVM Training with Imbalanced Data (figure)
Source: Tang et al. SVMs Modeling for Highly Imbalanced Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1).

30 One-Class SVM for Anomaly Detection
Anomaly detection is equivalent to building an enclosure around a cloud of points that encode non-anomalous objects, in order to separate them from outliers, which represent anomalies. This problem is known as the soft minimal hypersphere problem and, for points mapped into the feature space $H$, it is expressed as
$$\min_{c, r, \xi} \; r^2 + C \sum_{i=1}^{l} \xi_i$$
$$\text{subject to} \quad \| \phi(x_i) - c \|_H^2 \leq r^2 + \xi_i, \quad \xi_i \geq 0, \quad i \in [1, l].$$

31 Example of a Soft Minimal Enclosing Hypersphere (figure)
Sublevel sets for different values of the constant γ.

32 Drawbacks of SVMs
The soft-margin maximization paradigm minimizes the total error, which in turn introduces a bias toward the majority class.
Offline calculations: unsuitable for processing data streams.
Inadequate for large problems (except when using heuristics such as Platt's Sequential Minimal Optimization).

33 Application to Tornado Prediction: Description of the Experiments

34 Tornado Experiments I
The data were randomly divided into two sets: training/validation and independent testing. In the complete training/validation set, there are 361 cases of tornadic observations and 5048 cases of non-tornadic observations from 59 storm days. In the independent testing set, there are 360 tornadic observations and 5047 non-tornadic observations from 52 storm days. The percentage of tornadic observations in each data set is 6.7%.

35 Tornado Experiments II
Cross-validation was applied on the training/validation set with different combinations of kernel functions (linear, polynomial and Gaussian radial basis function) and parameter values. To establish confidence intervals, each classifier is tested on observations drawn randomly with replacement from the independent testing set, using bootstrap resampling (Efron and Tibshirani, 1993) with 1000 replications. The support vector solution chosen as best is the one whose classifier has the highest mean Critical Success Index (Hit/(Hit + Miss + False Alarms)) on the validation set. The best classifier uses the Gaussian radial basis function kernel with radius of [value missing]. We apply these optimal parameters to predict the outcomes of the testing set.
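The bootstrap procedure can be sketched as follows: resample the (observation, prediction) pairs with replacement, recompute the Critical Success Index on each replicate, and read a percentile interval off the sorted replicates. The labels below are synthetic, and the percentile interval is one common choice; the talk does not say which interval construction was used.

```python
import random


def csi(pairs):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    hits = sum(1 for obs, pred in pairs if obs == 1 and pred == 1)
    misses = sum(1 for obs, pred in pairs if obs == 1 and pred == 0)
    false_alarms = sum(1 for obs, pred in pairs if obs == 0 and pred == 1)
    denom = hits + misses + false_alarms
    return hits / denom if denom else 0.0


def bootstrap_interval(pairs, statistic, n_rep=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of (obs, pred) pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = sorted(
        statistic([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_rep)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_rep)
    hi_idx = int((1.0 + level) / 2.0 * n_rep) - 1
    return replicates[lo_idx], replicates[hi_idx]


# Synthetic test set: 20 hits, 15 misses, 8 false alarms, 457 correct nulls.
pairs = [(1, 1)] * 20 + [(1, 0)] * 15 + [(0, 1)] * 8 + [(0, 0)] * 457
point_estimate = csi(pairs)
lower, upper = bootstrap_interval(pairs, csi)
print(f"CSI = {point_estimate:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```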


37 Application to Tornado Prediction: Results for the Tornado Data Set

38 Results for the Tornado Data Set
Results computed from the binary confusion matrix with a 95% confidence interval.

Measure   Validation    Test
POD       57% ± 13%     57% ± 13%
FAR       18% ± 10%     31% ± 14%
CSI       50% ± 10%     45% ± 12%
Bias      69% ± 21%     83% ± 20%
HSS       62% ± 9%      60% ± 11%

POD: probability of detection (hit/(hit + miss)); FAR: false alarm ratio (false alarm/(hit + false alarm)); CSI: critical success index (hit/(hit + false alarm + miss)); Bias: (hit + false alarm)/(hit + miss); HSS: Heidke skill score.
Source: I. Adrianto, T. B. Trafalis, and V. Lakshmanan. Support vector machines for spatiotemporal tornado prediction. International Journal of General Systems, 38(7), 2009.
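For reference, the five scores can be computed from the four cells of a binary confusion matrix as in the sketch below. It uses the standard definitions POD = hits/(hits + misses) and the usual closed form of the Heidke skill score. The counts are hypothetical and are not the counts behind the table.

```python
def forecast_scores(hits, misses, false_alarms, correct_nulls):
    """Verification scores from a 2x2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_nulls
    return {
        "POD": a / (a + c),         # probability of detection
        "FAR": b / (a + b),         # false alarm ratio
        "CSI": a / (a + b + c),     # critical success index
        "Bias": (a + b) / (a + c),  # forecast events / observed events
        # Heidke skill score: improvement over random-chance agreement.
        "HSS": 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
    }


# Hypothetical counts, chosen only to exercise the formulas.
scores = forecast_scores(hits=57, misses=43, false_alarms=13, correct_nulls=1387)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

A useful consistency check is the identity 1/CSI = 1/POD + 1/(1 - FAR) - 1, which follows directly from the definitions.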

39 Part II: Dynamic Forecasting Using Kernel Methods

40 Filtering with Kernel Methods: Key Notions
Outline:
4 Filtering with Kernel Methods: Key Notions; Approach Outline; Assimilation with Kernel Methods.
5 Applications to Meteorology: Experimental Setup; Lorenz 96 Model; Quasi-Geostrophic Model.

41 Objectives of Dynamic Forecasting Using Kernel Methods
Dynamical systems: Physical systems are mathematically represented by states in some abstract space. Transitions between states are modeled using transition functions over the state space in order to simulate the system dynamics.
Objectives: To provide an alternative to Kalman filtering to predict the future states of nonlinear dynamical systems. To use machine learning techniques and kernel methods in order to build nonlinear state predictors.

42 Kalman Filtering
Definition (Kalman Filter). Given a sequence of perturbed measurements, a Kalman Filter is a process that estimates the states of a dynamical system.
We will consider only differentiable real-time nonlinear dynamical systems (to which correspond nonlinear Kalman filters). The state transition and observation models of the nonlinear dynamical system are
$$\partial_t x(t) = f(x, u, t) + w(t) \quad \text{and} \quad z(t) = h(x, t) + v(t),$$
where $x$ is the state of the system, $z$ is the observation, $f$ is the state transition function, $h$ is the observation model, $u$ is the control, and $w$ and $v$ are the (Gaussian) noise terms.

43 Example: radar tracking (figure)
From Pattern Recognition and Machine Learning by C. M. Bishop. Blue points: true positions; green points: noisy observations; red crosses: forecasts.

44 Kalman Filter and Kernel Methods: Comparison of Assumptions
Unlike the linear Kalman Filter, the nonlinear variants do not necessarily give an optimal state estimator. The filter may also diverge if the initial estimate is wrong or if the model is incorrect. For Kalman filters, the process must be Markovian, and the perturbations must be independent and follow a Gaussian distribution. Implementations must face problems related to matrix storage, matrix inversion and/or matrix factorization.
Kernel methods need no statistical assumptions on the process noise, and they work with both Markovian and non-Markovian processes. Storage and computational requirements are modest.

45 Filtering with Kernel Methods: Approach Outline

46 Assimilation and Forecasting with Kernel Methods
1. Assimilation. The assimilation step attempts to recover the unperturbed system states from the current and past observations using kernel-based regression techniques. Kernel methods remove noise from the state trajectories and update them from the previous forecasts.
2. Forecasting. The last assimilated state can be used as an initial estimate for one iteration of a nonlinear Kalman filter. A polynomial predictive analysis on the last recorded state trajectories, using a Lagrange interpolation with Chebyshev nodes, can provide reliable extrapolations. The generalization property of the SVM regression function can be used to estimate the next future state.

47 Advantages and Shortcomings of Kernel Methods
Advantages: Low memory requirements (some kernels require only $O(n)$ elements to be stored in memory). Acceptable computational complexity (of the order of $O(n^2)$; data thinning can reduce the size of the input data set). Massive parallelization (the method can be applied separately to each trajectory). No statistical assumptions and no state transition model are necessary. Can be combined with a Kalman filter if necessary.
Shortcomings: Some kernel parameters must be estimated.

48 Filtering with Kernel Methods: Assimilation with Kernel Methods

49 Interpolating States Trajectories without Model
Main idea (illustrated by a figure): Find a non-trivial function $f$ such that for every given sample $(t_i, x_i) \in \mathbb{R}^2$ we have $f(t_i) = x_i$ (or $f(t_i)$ is in an interval centered at $x_i$ and of half-width $\varepsilon \geq 0$). The interpolation function is expressed using an affine combination of kernel-based functions $k(t, \cdot)$. The positive semi-definite matrix $K$ with entries $K_{ij} = k(t_i, t_j)$ is called the kernel matrix.
Kalman filters (KFs) work differently: with KFs no state trajectory interpolation takes place, and KFs absolutely need a model.

50 Fitting Functions
The non-trivial function $f$ such that $f(t_i) = x_i$ is chosen to belong to the function class
$$F = \left\{ t \in \mathbb{R} \mapsto \sum_{i=1}^{l} \alpha_i k(t_i, t) + b \in \mathbb{R} : \alpha^t K \alpha \leq B^2 \right\}.$$
The Rademacher complexity of $F$ measures the capability of the functions of $F$ to fit random data with respect to a probability distribution generating this data. The empirical Rademacher complexity of $F$, denoted by $\hat{R}(F)$, is such that
$$\hat{R}(F) \leq 2B \sqrt{\operatorname{tr}(K)} / l + 2 |b| / \sqrt{l}.$$

51 Minimizing the Generalization and the Empirical Errors
We use $\hat{R}(F)$ to control the upper bound of the generalization error of the interpolation function. Small empirical errors and a small $\hat{R}(F)$ both decrease this bound, therefore:
We need to minimize the absolute value of $b$. Also, we have $\alpha^t K \alpha \leq \|K\| \, \|\alpha\|^2$. Thus minimizing $\alpha^t \alpha + b^2$ contributes to a smaller $\hat{R}(F)$.
The empirical error, defined by $\left( \sum_{i=1}^{l} \xi_i \right) / l$, where the $\xi_i$ are the differences between the outputs and the targets, should be minimized.
Aim: To minimize the quantity $\alpha^t \alpha + b^2 + C \xi^t \xi$ with $C > 0$.

52 Optimization Problem
Introducing tolerances $\rho_i > 0$, the empirical errors $\xi_i$ are equal to $f(t_i) - x_i$. These are the only constraints associated with the previous objective function. Hence the previous calculations lead to:
Optimization Problem for Data Assimilation (Gilbert et al., 2010)
$$\min_{(\alpha, \xi, b) \in \mathbb{R}^{2l+1}} \left\{ \alpha^t \alpha + b^2 + C \xi^t \xi : K \alpha + b 1_l - x = \xi \right\}.$$
The solution of this optimization problem is:
Analytical Solution
$$\left( K^2 + I_l / C + 1_l 1_l^t \right) d = x, \quad \alpha = K d, \quad b = 1_l^t d.$$
The solutions $\alpha$ and $b$ describe the regression function (that belongs to the function class $F$) which interpolates state trajectories during assimilation.
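The analytical solution can be exercised on a small example. The sketch below forms the matrix $K^2 + I_l/C + 1_l 1_l^t$ for a Gaussian kernel on sample times, solves for $d$ with a plain Gaussian elimination, and recovers $\alpha = Kd$ and $b = 1_l^t d$. The data, the kernel width and the value of $C$ are arbitrary assumptions, and the dense solver is a stand-in chosen for self-containment; it is not the iterative scheme used in the talk.

```python
import math


def gaussian_kernel(s, t, sigma):
    """Gaussian RBF kernel on scalar times."""
    return math.exp(-((s - t) ** 2) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit(times, x, C, sigma):
    """Return (d, alpha, b) with (K^2 + I/C + 1 1^t) d = x, alpha = K d, b = 1^t d."""
    l = len(times)
    K = [[gaussian_kernel(s, t, sigma) for t in times] for s in times]
    A = [[sum(K[i][m] * K[m][j] for m in range(l))
          + (1.0 / C if i == j else 0.0) + 1.0
          for j in range(l)] for i in range(l)]
    d = solve(A, x)
    alpha = [sum(K[i][j] * d[j] for j in range(l)) for i in range(l)]
    return d, alpha, sum(d)


def predict(t, times, alpha, b, sigma):
    """Regression function f(t) = sum_i alpha_i k(t_i, t) + b."""
    return sum(a * gaussian_kernel(ti, t, sigma) for a, ti in zip(alpha, times)) + b


# Hypothetical noisy trajectory: a sine wave with an alternating perturbation.
times = [0.1 * i for i in range(20)]
x = [math.sin(t) + 0.05 * (-1) ** i for i, t in enumerate(times)]
C, sigma = 100.0, 0.3
d, alpha, b = fit(times, x, C, sigma)
residuals = [predict(t, times, alpha, b, sigma) - xi for t, xi in zip(times, x)]
print(max(abs(r) for r in residuals))
```

At the optimum the residuals satisfy $\xi = -d/C$, which gives a direct numerical check of the solve.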

53 Computational details
To compute the solution $d$ of the linear system $\left( K^2 + I_l / C + 1_l 1_l^t \right) d = x$, we define $\sigma = 1 / \sqrt{C}$, the matrix $A = K + \sigma I_l$ ($A$ is symmetric and positive definite) and the following sequences:
$$A \tilde{u}_0 = x, \quad A u_0 = \tilde{u}_0, \quad A \tilde{u}_n = u_n, \quad A u_{n+1} = 2 \sigma (u_n - \sigma \tilde{u}_n),$$
$$A \tilde{v}_0 = 1_l, \quad A v_0 = \tilde{v}_0, \quad A \tilde{v}_n = v_n, \quad A v_{n+1} = 2 \sigma (v_n - \sigma \tilde{v}_n).$$
We set $u = \sum_{n \geq 0} u_n$ and $v = \sum_{n \geq 0} v_n$. Both series are rapidly convergent and they are truncated at a step $m > 0$ in practical problems. Once $m$ is determined, we then have
$$d = u - \frac{\langle 1_l, u \rangle}{1 + \langle 1_l, v \rangle} \, v.$$
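One way to see why these sequences yield $d$ is sketched below. This is a reconstruction from the definitions of $\sigma$ and $A$, not a derivation given on the slide; the symbols $M$ and $N$ are introduced here for the argument.

```latex
% Notation introduced for this sketch: M = K^2 + \sigma^2 I_l and N as below.
\begin{align*}
M &= K^2 + \sigma^2 I_l = A^2 - 2\sigma A + 2\sigma^2 I_l = A^2 (I_l - N),
  \qquad N = 2\sigma A^{-1}\,(I_l - \sigma A^{-1}), \\
u_0 &= A^{-2} x, \qquad u_{n+1} = N u_n
  \;\Longrightarrow\; u = \sum_{n \ge 0} N^n A^{-2} x = (I_l - N)^{-1} A^{-2} x = M^{-1} x, \\
v &= M^{-1} 1_l \quad \text{(same argument with } x \text{ replaced by } 1_l\text{)}, \\
d &= \left(M + 1_l 1_l^t\right)^{-1} x
   = u - \frac{\langle 1_l, u \rangle}{1 + \langle 1_l, v \rangle}\, v
   \quad \text{(Sherman--Morrison formula)}.
\end{align*}
% Convergence: K is positive semi-definite, so every eigenvalue a of A satisfies
% a >= sigma. With s = sigma / a in (0, 1], the matching eigenvalue of N is
% 2 s (1 - s) <= 1/2, so the series converges at least geometrically with ratio 1/2.
```

Each step of the recursion costs two linear solves with the same matrix $A$, so a single factorization of $A$ can be reused throughout.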

54 Approach Summary
We have chosen a class of functions in order to interpolate state trajectories without a model. We defined a fitness measure for this class of functions and linked it to the parameters describing a function in that class. We defined an optimization problem that aims to minimize empirical errors and maximize the fitness of the function interpolating state trajectories. An analytical solution of the optimization problem was derived and its computation was reduced to solving a sequence of linear systems.

55 Applications to Meteorology: Experimental Setup

56 Experimental Setup
Machine and software: All code was implemented in MATLAB 7.9 on a 2002 DELL Precision Workstation 530 with two 2.4 GHz Intel Xeon processors and 2 GiB of RAM. EnKF forecasts were generated with the EnKF Matlab toolbox version 0.23 by Pavel Sakov (available at Evensen's webpage at enkf.nersc.no).
Experimental models: The kernel approach was tested on the Lorenz 96 model and the Quasi-Geostrophic 1.5-layer reduced gravity model. Forecasts were obtained using a combination of polynomial predictive analysis and kernel-based extrapolation during the assimilation stage.

58 Lorenz 96 Model: Description
Introduced by Lorenz and Emanuel (1998). The state transition model represents the values of atmospheric quantities at discrete locations spaced equally on a latitude circle (1D problem). The state transition model at a location $i$ on the latitude circle is:
$\partial_t x_i = \underbrace{(x_{i+1} - x_{i-2})\, x_{i-1}}_{\text{advection}} \; \underbrace{-\, x_i}_{\text{dissipation}} \; + \underbrace{F}_{\text{external forcing}}$.
The states represent an unspecified scalar meteorological quantity, e.g. vorticity or temperature (Lorenz and Emanuel). This model was introduced in order to select which locations on a latitude circle are the most effective in improving weather assimilation and forecasts. Observations were generated with an error variance of 1. The external forcing was set to $F = 8$. The system showed chaotic behavior.
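As an illustration of the model equation, here is a minimal sketch in Python with NumPy. Only F = 8 and the observation error variance of 1 come from the slide. The number of sites (40), the time step (0.05), the fourth-order Runge-Kutta integrator, the small initial perturbation and all function names are assumptions made for the example.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Right-hand side (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F on a periodic circle."""
    # np.roll(x, -1)[i] == x[i+1], np.roll(x, 2)[i] == x[i-2], np.roll(x, 1)[i] == x[i-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One classical Runge-Kutta step (integrator choice is an assumption)."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def simulate(n_sites=40, n_steps=500, dt=0.05, F=8.0, obs_var=1.0, seed=0):
    """Return the true trajectory and noisy observations with variance obs_var."""
    rng = np.random.default_rng(seed)
    x = np.full(n_sites, F)
    x[0] += 0.01  # small perturbation away from the steady state x_i = F
    truth = np.empty((n_steps, n_sites))
    for k in range(n_steps):
        x = rk4_step(x, dt, F)
        truth[k] = x
    obs = truth + rng.normal(0.0, np.sqrt(obs_var), size=truth.shape)
    return truth, obs
```

The constant state x_i = F is a fixed point of the equation, which gives a simple sanity check of the right-hand side before running longer simulations.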

59 Lorenz 96 Model: Assimilation The following figures illustrate how kernel methods remove the observational noise and interpolate state trajectories during assimilation. The kernel used is a Gaussian RBF kernel with $\sigma = 3\,\delta t$.
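For concreteness, a Gaussian RBF kernel matrix over observation times with a width of three time steps might be built as follows. This is a sketch under assumptions: the slides do not state the kernel's normalization, so the common form exp(-(t - t')^2 / (2 sigma^2)) is assumed, and the time step value and the name `gaussian_rbf_gram` are hypothetical.

```python
import numpy as np

def gaussian_rbf_gram(times, sigma):
    """Gram matrix K[i, j] = exp(-(t_i - t_j)^2 / (2 sigma^2)) (assumed convention)."""
    diff = times[:, None] - times[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

delta_t = 0.05                      # hypothetical time step
times = delta_t * np.arange(20)     # equally spaced observation times
K = gaussian_rbf_gram(times, sigma=3.0 * delta_t)
```

With this choice, observations three time steps apart have kernel value exp(-1/2), so each state estimate draws mainly on a window of a few neighboring time steps.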

60 Lorenz 96 Model: Forecast Errors

62 Quasi-Geostrophic Model: Description It is an atmospheric dynamical model involving an approximation of actual winds. It is used in the analysis of large-scale extratropical weather systems. System states are scalar quantities representing the air flow. Horizontal winds are replaced by their geostrophic values in the horizontal acceleration terms of the momentum equations, and horizontal advection in the thermodynamic equation is approximated by geostrophic advection. Furthermore, vertical advection of momentum is neglected. It is a 2D problem: the atmosphere has a single level in the vertical and was represented by a square grid where each state is located on a node of that grid. Observations were generated with an error variance of 1.

63 Quasi-Geostrophic Model: Assimilation Example Despite the absence of a dynamical model, the kernel interpolation obtained from the noisy observations closely matches the true states.

64 Quasi-Geostrophic Model: Forecast Example Kernel and EnKF forecasts at time step 51. The EnKF forecast is 8 units away from the true state while the kernel forecast is 0.5 units away.

65 Quasi-Geostrophic Model: Forecast Errors

66 Conclusions I: What was Achieved? A viable kernel-based approach for data assimilation and forecasting has been introduced for nonlinear dynamical systems. It showed predictable performance on meteorological models, ranging from equivalent to that of EnKF with 50 ensemble members to exceeding that of EnKF with 100 ensemble members, with lower error the less chaos was present in the dynamical system. Encouraging results in removing observational noise and interpolating state trajectories were obtained. They represent an improvement over standard EnKF based on fewer than 20 ensemble members.

67 Conclusions II: Future Work We are currently applying these techniques to financial and petroleum engineering problems with the same type of multi-dimensional time series. We are developing approaches that identify independent factors influencing the shape of multi-dimensional time series in a nonlinear fashion. The same tools will be used for the prediction of rare events and their magnitude.

68 Questions?

69 For Further Reading
R. C. Gilbert, M. B. Richman, L. M. Leslie, and T. B. Trafalis. Kernel Methods for Data Driven Numerical Modeling. Monthly Weather Review, submitted.
E. N. Lorenz and K. A. Emanuel. Optimal sites for supplementary weather observations: simulation with a small model. Journal of the Atmospheric Sciences, 55(3), 1998.
P. Sarma and W. H. Chen. Generalization of the Ensemble Kalman Filter Using Kernels for Nongaussian Random Fields. In SPE Reservoir Simulation Symposium Proceedings. Society of Petroleum Engineers.
F. Steinke and B. Schölkopf. Kernels, regularization and differential equations. Pattern Recognition, 41(11).
J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, Cambridge, UK.
V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, NY, USA.


More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines Kernel Methods & Support Vector Machines Mahdi pakdaman Naeini PhD Candidate, University of Tehran Senior Researcher, TOSAN Intelligent Data Miners Outline Motivation Introduction to pattern recognition

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems

More information

A Tutorial on Support Vector Machine

A Tutorial on Support Vector Machine A Tutorial on School of Computing National University of Singapore Contents Theory on Using with Other s Contents Transforming Theory on Using with Other s What is a classifier? A function that maps instances

More information

Non-linear Support Vector Machines

Non-linear Support Vector Machines Non-linear Support Vector Machines Andrea Passerini passerini@disi.unitn.it Machine Learning Non-linear Support Vector Machines Non-linearly separable problems Hard-margin SVM can address linearly separable

More information

Unsupervised Anomaly Detection for High Dimensional Data

Unsupervised Anomaly Detection for High Dimensional Data Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation

More information

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic

More information

CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, Kernel Methods CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Review: Support vector machines. Machine learning techniques and image analysis

Review: Support vector machines. Machine learning techniques and image analysis Review: Support vector machines Review: Support vector machines Margin optimization min (w,w 0 ) 1 2 w 2 subject to y i (w 0 + w T x i ) 1 0, i = 1,..., n. Review: Support vector machines Margin optimization

More information

Discussion of Some Problems About Nonlinear Time Series Prediction Using ν-support Vector Machine

Discussion of Some Problems About Nonlinear Time Series Prediction Using ν-support Vector Machine Commun. Theor. Phys. (Beijing, China) 48 (2007) pp. 117 124 c International Academic Publishers Vol. 48, No. 1, July 15, 2007 Discussion of Some Problems About Nonlinear Time Series Prediction Using ν-support

More information

Support Vector Machine Regression for Volatile Stock Market Prediction

Support Vector Machine Regression for Volatile Stock Market Prediction Support Vector Machine Regression for Volatile Stock Market Prediction Haiqin Yang, Laiwan Chan, and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,

More information

Scale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract

Scale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses

More information

Lecture 9. Time series prediction

Lecture 9. Time series prediction Lecture 9 Time series prediction Prediction is about function fitting To predict we need to model There are a bewildering number of models for data we look at some of the major approaches in this lecture

More information

Learning Methods for Linear Detectors

Learning Methods for Linear Detectors Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2

More information

CSC 411 Lecture 17: Support Vector Machine

CSC 411 Lecture 17: Support Vector Machine CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Support Vector Machine and Neural Network Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 1 Announcements Due end of the day of this Friday (11:59pm) Reminder

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Overview Motivation

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Chapter 6: Classification

Chapter 6: Classification Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant

More information

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable. Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have

More information