Enhancing Generalization Capability of SVM Classifiers with Feature Weight Adjustment
Xizhao Wang and Qiang He
College of Mathematics and Computer Science, Hebei University, Baoding 071002, Hebei, China

Abstract. It is well recognized that support vector machines (SVMs) produce good classification performance in terms of generalization power. An SVM constructs an optimal separating hyper-plane by maximizing the margin between the two classes in a high-dimensional feature space. According to statistical learning theory, the margin scale reflects the generalization capability to a great extent: the bigger the margin, the better the generalization capability of the SVM. This paper makes an attempt to enlarge the margin between the two support vector hyper-planes by feature weight adjustment. The experiments demonstrate that the proposed technique can enhance the generalization capability of the original SVM classifiers.

1 Introduction

Statistical learning theory (SLT) [1], a theory for small-sample learning problems introduced in the 1960s by Vapnik et al., can deal with situations where the samples are limited. Most machine-learning methods follow the empirical risk minimization (ERM) induction principle [1], which is effective when samples are plentiful; however, in most real-world cases the samples are limited. It is difficult to apply expected risk minimization [2] directly to classification problems, so most expected risk minimization problems are converted into minimizing the empirical risk. Unfortunately, empirical risk minimization is not always equivalent to expected risk minimization: ERM alone cannot guarantee good generalization capability, whereas expected risk minimization can. Statistical learning theory has established a clear relationship between the expected risk and the empirical risk, and has shown that generalization can be controlled by the capacity of the learning machine. Support vector machines (SVMs) are a classification technique based on SLT [2].
Due to its extraordinary generalization ability, the SVM has become a powerful tool for solving two-class classification problems. An SVM first maps the original input space into a high-dimensional feature space through some predefined nonlinear mapping, and then constructs the optimal separating hyper-plane that maximizes the margin between the two classes in the feature space. Based on SLT, we know that the bigger the margin, the better the generalization capability of the SVM. Therefore we always want the margin to be as large as possible for two-class problems.

M.Gh. Negoita et al. (Eds.): KES 2004, LNAI 3213, pp. 1037-1043, 2004. © Springer-Verlag Berlin Heidelberg 2004
Feature weight learning, which assigns a weight to each feature to indicate its degree of importance, is an extension of feature selection. This paper adopts the feature weight learning technique introduced in [4]. We expect to enlarge the margin between the two support vector hyper-planes by using this technique, thereby improving generalization capability. The rest of this paper is organized as follows. In Section 2, we review the relationship between the margin of the two hyper-planes and the generalization capability of SVMs. In Section 3, the detailed feature weight learning technique is introduced. Two experiments, which show the effect of feature weight learning on margin enlargement, are presented in Section 4. Section 5 gives some remarks and concludes the paper.

2 Relationship Between the Margin and the Generalization Capability

Firstly, we consider the growth function of a set of indicator functions [2]:

G(l) = ln sup_{z_1,…,z_l} N(z_1, z_2, …, z_l)    (1)

where N(z_1, z_2, …, z_l) counts how many different separations of the given samples z_1, z_2, …, z_l can be produced using functions from the set of indicator functions. From SLT we know the following conclusion: any growth function either satisfies G(l) = l·ln 2 or is bounded by

G(l) < h(ln(l/h) + 1)    (2)

where h is an integer for which

G(h) = h·ln 2 and G(h+1) < (h+1)·ln 2    (3)

We now recall the definition of the VC dimension [3]: the VC dimension of a set of indicator functions Q(z, α), α ∈ Λ, equals h if the growth function is bounded by a logarithmic function with coefficient h. The VC dimension is a pivotal concept in statistical learning theory. For a two-class classification problem, the Δ-margin separating hyper-plane is defined in SLT [1]. The following theorems are valid for the set of Δ-margin separating hyper-planes [2].

Theorem 1. Let the vectors x ∈ X belong to a sphere of radius R.
Then the set of Δ-margin separating hyper-planes has VC dimension h bounded by the inequality

h ≤ min(R²/Δ², n) + 1    (4)

From inequality (4) we know that the larger the margin of the set of functions, the smaller its VC dimension.
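Inequality (4) is straightforward to evaluate numerically. The following sketch (an illustration, not from the paper; the function name and data are hypothetical) estimates R as the radius of the smallest data-centred sphere and checks that enlarging Δ cannot increase the bound on h:

```python
import numpy as np

def vc_dim_bound(X, delta):
    """Upper bound of Theorem 1: h <= min(R^2/Delta^2, n) + 1,
    where R is the radius of a sphere containing the data and
    n is the input dimension."""
    center = X.mean(axis=0)
    R = np.max(np.linalg.norm(X - center, axis=1))  # enclosing radius
    n = X.shape[1]
    return min(R**2 / delta**2, n) + 1

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
b_small = vc_dim_bound(X, delta=0.5)
b_large = vc_dim_bound(X, delta=1.0)
assert b_large <= b_small  # larger margin -> no larger VC-dimension bound
```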
Theorem 2. With probability at least 1 − η, the inequality

R(α) ≤ R_emp(α) + (Bε/2)·(1 + √(1 + 4·R_emp(α)/(Bε)))    (5)

holds true, where R(α) and R_emp(α) are the expected risk and the empirical risk functionals respectively [1], l is the size of the training set, h is the VC dimension of the set of functions, and ε is formulated in [2]. For simplicity, we write (5) as

R(α) ≤ R_emp(α) + Φ(h/l)    (6)

Φ is called the confidence interval; it is a monotonically increasing function of h and a monotonically decreasing function of l. Inequality (6) gives a bound on the generalization ability of the learning machine. Our purpose is to minimize the left side of (6), i.e., R(α); minimizing R(α) implies optimal generalization capability of the learning machine. In fact, minimizing R(α) directly is not feasible due to its integral formulation, so in practice we minimize R_emp(α) instead. Inequality (6) shows the relationship between R(α) and R_emp(α). Because of the second term on the right side of (6), it is clear that minimizing R_emp(α) cannot guarantee minimizing R(α). We therefore want the second term on the right side of (6) to be as small as possible, since a small confidence interval possibly implies a small actual risk R(α), i.e., possibly implies good generalization capability. Noting that Φ is a monotonically increasing function of h, and that h decreases as the margin Δ of the separating hyper-planes increases (see inequality (4)), enlarging Δ possibly leads to an improvement of the generalization capability of the learning machine. The traditional ERM principle only minimizes the empirical risk without considering the confidence interval, so it may fail to achieve good generalization capability. SVMs, based on SLT, follow the structural risk minimization (SRM) induction principle [1][2]: they not only minimize the empirical risk, but also keep the confidence interval small by maximizing the margin between the two classes in the high-dimensional feature space.
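For a linear SVM with decision function f(x) = ⟨w, x⟩ + b, the two support vector hyper-planes are ⟨w, x⟩ + b = ±1 and the margin between them is Δ = 2/‖w‖, so "maximize the margin" is exactly "minimize ‖w‖". As a minimal sketch (assuming scikit-learn, which the paper does not use), the margin can be read off a fitted linear SVM like this:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(+2, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C approximates a hard margin
w = clf.coef_.ravel()
margin = 2.0 / np.linalg.norm(w)  # distance between <w,x>+b = +1 and <w,x>+b = -1
assert margin > 0
```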
3 Feature-Weight Learning

Each feature is considered to have an importance degree called its feature-weight. Feature-weight assignment is an extension of feature selection: the latter allows only 0-weight or 1-weight values, while the former allows weight values anywhere in the interval [0,1]. Feature-weight learning depends on the similarity between samples. There are many ways to define the similarity measure, such as the correlation coefficient and the Euclidean distance; here the similarity measure ρ is defined as follows:
ρ_ij = 1/(1 + β·d_ij)    (7)

where β is a positive parameter. It adjusts the similarity degrees so that they are distributed around 0.5, i.e., β is required to satisfy

(2/(n(n−1))) Σ_{p<q} 1/(1 + β·d_pq) = 0.5    (8)

where d is the commonly used Euclidean distance. The weighted Euclidean distance d^(w) is defined as

d^(w)_ij = √( Σ_{k=1}^{s} w_k²·(x_ik − x_jk)² )    (9)

where w = (w_1, …, w_s) is the feature-weight vector. Its component w_k is the importance degree of the k-th feature: the larger w_k is, the more important the k-th feature. When w = (1, …, 1), the space {x : d^(w) ≤ r} is a hyper-sphere with radius r (called the original space), d^(w) coincides with d, and ρ^(w) with ρ. When w ≠ (1, …, 1), the axes are extended or shrunk in accordance with w, and the space {x : d^(w) ≤ r} becomes a hyper-ellipse, called the transformed space. The lower the value of w_k, the higher the flattening extent along the k-th axis. A good partition should have the following property: samples within one cluster are more similar and dissimilar samples are more separated, which implies that the fuzziness of the partition is low. Therefore we hope that, by adjusting w, ρ^(w) tends to one or zero according to whether ρ is greater or less than 0.5. Based on the above discussion, we learn the feature-weight values by minimizing an evaluation function E(w) [4] defined as follows:

E(w) = (2/(n(n−1))) Σ_i Σ_{j<i} [ρ^(w)_ij·(1 − ρ_ij) + ρ_ij·(1 − ρ^(w)_ij)] / 2    (10)

where n is the number of samples. From equation (10) we can see that E(w) decreases as the similarity degree ρ^(w)_ij tends to 0 (if ρ_ij < 0.5) or to 1 (if ρ_ij > 0.5). Therefore we expect that feature-weight learning by minimizing E(w) drives ρ^(w)_pq close to 0 or 1, which can obviously improve the performance of FCM. To minimize the evaluation function E(w), we use the gradient descent technique. First of all, let Δw_k be the change of w_k, computed as follows:
Δw_k = −η·∂E/∂w_k    (11)

where η is the learning rate. An appropriate value of η speeds up the convergence of the algorithm: too small an η leads to low computational efficiency, while too large an η makes the algorithm diverge. Through a one-dimensional search technique [5], η is determined by

E(w_1 − η·∂E/∂w_1, …, w_s − η·∂E/∂w_s) = min_{λ>0} E(w_1 − λ·∂E/∂w_1, …, w_s − λ·∂E/∂w_s)    (12)

The derivative of E(w) is obtained from the following:

∂E/∂w_k = (2/(n(n−1))) Σ_i Σ_{j<i} (1 − 2ρ_ij)·∂ρ^(w)_ij/∂w_k    (13)

∂ρ^(w)_ij/∂d^(w)_ij = −β/(1 + β·d^(w)_ij)²    (14)

∂d^(w)_ij/∂w_k = w_k·(x_ik − x_jk)²/d^(w)_ij    (15)

The algorithm is described briefly as follows:
(1) Initialize all weight values with 1, and solve for β from equation (8);
(2) Compute ρ^(w) by equation (7) and d^(w) by equation (9);
(3) Compute the change Δw_k = −η·∂E/∂w_k using (13)-(15);
(4) If w_k + Δw_k ≥ 0, then set w_k = w_k + Δw_k;
(5) Go to (3) until E(w) is less than a given threshold or the number of iterations reaches a user-specified limit.

4 Enlarging the Margin by Feature Weight Adjustment for Generalization Capability Improvement

In this section, for the purpose of generalization capability improvement, we experimentally demonstrate the enlargement of the margin between the two hyper-planes by the feature weight adjustment described in the previous section. We select two databases for the demonstration. The first is the well-known toy example, the Iris database from UCI, which includes 150 cases in 3 classes. Since the SVM is only for two-class classification
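Steps (1)-(5) can be rendered in code. The sketch below is a hypothetical NumPy implementation of equations (7)-(9) and (13)-(15): β is solved from (8) by bisection, and a fixed learning rate η stands in for the one-dimensional search (12).

```python
import numpy as np

def pairwise_dist(X, w):
    """Weighted Euclidean distance of eq. (9); w = ones gives the plain d."""
    diff = X[:, None, :] - X[None, :, :]                  # (n, n, s)
    return np.sqrt(((w**2) * diff**2).sum(-1))

def solve_beta(d, lo=1e-6, hi=1e6, iters=100):
    """Bisection for beta in eq. (8): mean pairwise similarity = 0.5."""
    iu = np.triu_indices_from(d, k=1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (1.0 / (1.0 + mid * d[iu])).mean() > 0.5:
            lo = mid                                      # similarities too high
        else:
            hi = mid
    return 0.5 * (lo + hi)

def learn_weights(X, eta=0.1, iters=200, tol=1e-4):
    n, s = X.shape
    w = np.ones(s)                                        # step (1)
    d0 = pairwise_dist(X, w)
    beta = solve_beta(d0)
    rho0 = 1.0 / (1.0 + beta * d0)                        # unweighted rho, eq. (7)
    iu = np.triu_indices(n, k=1)
    for _ in range(iters):
        dw = pairwise_dist(X, w)                          # step (2)
        rho = 1.0 / (1.0 + beta * dw)
        E = (rho * (1 - rho0) + rho0 * (1 - rho))[iu].mean()
        if E < tol:                                       # step (5)
            break
        # chain rule, eqs. (13)-(15)
        dE_drho = 1.0 - 2.0 * rho0                        # eq. (13), up to constants
        drho_dd = -beta / (1.0 + beta * dw)**2            # eq. (14)
        diff2 = (X[:, None, :] - X[None, :, :])**2
        with np.errstate(divide="ignore", invalid="ignore"):
            dd_dw = np.where(dw[..., None] > 0,
                             w * diff2 / dw[..., None], 0.0)  # eq. (15)
        grad = ((dE_drho * drho_dd)[..., None] * dd_dw)[iu].mean(axis=0)
        w_new = w - eta * grad                            # step (3)
        w = np.where(w_new >= 0, w_new, w)                # step (4): keep w >= 0
    return w
```

With a fixed η this loop is only a rough stand-in for the line search (12); in practice η would be tuned per iteration as in [5].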
problems, we delete the cases belonging to class one; the remaining 100 cases in two classes are used for the demonstration. The second database, Pima Indians Diabetes, is also from UCI. The characteristics of the two databases are shown in Table 1. Using the feature weight learning algorithm given at the end of Section 3, we learn the weights for the two selected databases; Table 2 shows the results. Applying an SVM Toolbox to the original data of the two selected databases, one can obtain the optimal separating hyper-planes and the corresponding margins. Due to the limit of paper length, we omit the formulation of hyper-planes and margins; for details, one can refer to [1][2]. Similarly, using the same method, one can evaluate the margins of the separating hyper-planes after the learned weights are incorporated into the original databases. Tables 3 and 4 show the size of the margin of the separating hyper-planes, the training accuracy, and the testing accuracy for the two databases, where 70% of each database is randomly selected as the training set and the remaining 30% as the testing set. It is worth noting that the experimental results depend on the parameters chosen in the SVM Toolbox. From Tables 3 and 4, one can see that the margins are indeed enlarged. However, the improvement in training and testing accuracy is not significant. We speculate that the reasons are that (1) the data is not sufficient for the Iris database and (2) the Pima database is strongly non-linearly separable. Further investigation of the generalization capability improvement is in progress.

Table 1. The characteristics of the databases

Database Name | Number of samples | Number of features | Category of features
Iris          | 100               | 4                  | Numerical
Pima          | 768               | 8                  | Numerical

Table 2. The results of feature-weight learning

Database Name | Results of feature-weight learning
Iris          | 0.0001, 0.0002, 1.0, 0.64
Pima          |

Table 3.
Margin enlargement for the Iris database

Iris database                    | Margin | Training Accuracy | Testing Accuracy
Before feature weight adjustment |        |                   |
After feature weight adjustment  |        |                   |
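The before/after comparison reported in Tables 3 and 4 can be sketched as a small pipeline: multiply each feature by its learned weight, retrain the SVM, and read off the margin 2/‖w‖ in each case. The sketch below is a hypothetical reconstruction using scikit-learn (the paper used a MATLAB SVM Toolbox) with toy data and a hand-picked stand-in weight vector; the paper's 70/30 train/test split is omitted for brevity.

```python
import numpy as np
from sklearn.svm import SVC

def linear_margin(X, y, C=1.0):
    """Train a linear SVM and return the margin 2/||w||."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return 2.0 / np.linalg.norm(clf.coef_)

# Toy stand-in: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
X = np.vstack([np.column_stack([rng.normal(-2, 0.3, 60), rng.normal(0, 2, 60)]),
               np.column_stack([rng.normal(+2, 0.3, 60), rng.normal(0, 2, 60)])])
y = np.array([0] * 60 + [1] * 60)

weights = np.array([1.0, 0.1])        # stand-in for learned feature weights
m_before = linear_margin(X, y)        # margin on the original data
m_after = linear_margin(X * weights, y)  # margin on the reweighted data
print(m_before, m_after)
```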
Table 4. Margin enlargement for the Pima database

Pima database                    | Margin | Training Accuracy | Testing Accuracy
Before feature weight adjustment |        |                   |
After feature weight adjustment  |        |                   |

5 Conclusions

According to SLT, enlarging the margin of the separating hyper-plane can enhance the generalization capability of the learning machine. This paper makes an attempt to enlarge the margin through feature weight adjustment, and initial experiments show the approach's effectiveness. We close with the following open questions: (1) Can the feature weight learning approach be mathematically proved to enlarge the margin? (2) Does the approach behave differently on linearly separable and non-linearly separable data sets? (3) When evaluating the margin of the separating hyper-plane, how should the optimal values of the parameters in the quadratic program be chosen?

Acknowledgement. This research is supported by the key project of the Chinese Educational Ministry (0306), the Natural Science Foundation of Hebei Province (60336), and the Doctoral Foundation of the Hebei Educational Committee (B20036).

References
1. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
2. Vapnik, V.N.: Statistical Learning Theory. Wiley-Interscience, New York (1998)
3. Vapnik, V.N.: An Overview of Statistical Learning Theory. IEEE Transactions on Neural Networks 10(5) (1999)
4. Basak, J., De, R.K., Pal, S.K.: Unsupervised Feature Selection Using a Neuro-Fuzzy Approach. Pattern Recognition Letters 19 (1998)
5. Rao: Optimization Theory and Applications. Wiley Eastern Limited (1985)
More informationTraining the linear classifier
215, Training the linear classifier A natural way to train the classifier is to minimize the number of classification errors on the training data, i.e. choosing w so that the training error is minimized.
More informationResearch on Evaluation Method of Organization s Performance Based on Comparative Advantage Characteristics
Vol.1, No.10, Ar 01,.67-7 Research on Evaluation Method of Organization s Performance Based on Comarative Advantage Characteristics WEN Xin 1, JIA Jianfeng and ZHAO Xi nan 3 Abstract It as under the guidance
More informationLanguage Recognition Power of Watson-Crick Automata
01 Third International Conference on Netorking and Computing Language Recognition Poer of Watson-Crick Automata - Multiheads and Sensing - Kunio Aizaa and Masayuki Aoyama Dept. of Mathematics and Computer
More informationOutline of lectures 3-6
GENOME 453 J. Felsenstein Evolutionary Genetics Autumn, 013 Population genetics Outline of lectures 3-6 1. We ant to kno hat theory says about the reproduction of genotypes in a population. This results
More informationBusiness Cycles: The Classical Approach
San Francisco State University ECON 302 Business Cycles: The Classical Approach Introduction Michael Bar Recall from the introduction that the output per capita in the U.S. is groing steady, but there
More informationRandomized Smoothing Networks
Randomized Smoothing Netorks Maurice Herlihy Computer Science Dept., Bron University, Providence, RI, USA Srikanta Tirthapura Dept. of Electrical and Computer Engg., Ioa State University, Ames, IA, USA
More informationEnergy Minimization via a Primal-Dual Algorithm for a Convex Program
Energy Minimization via a Primal-Dual Algorithm for a Convex Program Evripidis Bampis 1,,, Vincent Chau 2,, Dimitrios Letsios 1,2,, Giorgio Lucarelli 1,2,,, and Ioannis Milis 3, 1 LIP6, Université Pierre
More informationCHARACTERIZATION OF ULTRASONIC IMMERSION TRANSDUCERS
CHARACTERIZATION OF ULTRASONIC IMMERSION TRANSDUCERS INTRODUCTION David D. Bennink, Center for NDE Anna L. Pate, Engineering Science and Mechanics Ioa State University Ames, Ioa 50011 In any ultrasonic
More informationMultisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues
Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues O. L. Mangasarian and E. W. Wild Presented by: Jun Fang Multisurface Proximal Support Vector Machine Classification
More informationreader is comfortable with the meaning of expressions such as # g, f (g) >, and their
The Abelian Hidden Subgroup Problem Stephen McAdam Department of Mathematics University of Texas at Austin mcadam@math.utexas.edu Introduction: The general Hidden Subgroup Problem is as follos. Let G be
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationClassification of Text Documents and Extraction of Semantically Related Words using Hierarchical Latent Dirichlet Allocation.
Classification of Text Documents and Extraction of Semantically Related Words using Hierarchical Latent Dirichlet Allocation BY Imane Chatri A thesis submitted to the Concordia Institute for Information
More informationFORCE NATIONAL STANDARDS COMPARISON BETWEEN CENAM/MEXICO AND NIM/CHINA
FORCE NATIONAL STANDARDS COMPARISON BETWEEN CENAM/MEXICO AND NIM/CHINA Qingzhong Li *, Jorge C. Torres G.**, Daniel A. Ramírez A.** *National Institute of Metrology (NIM) Beijing, 100013, China Tel/fax:
More informationThe DMP Inverse for Rectangular Matrices
Filomat 31:19 (2017, 6015 6019 https://doi.org/10.2298/fil1719015m Published by Faculty of Sciences Mathematics, University of Niš, Serbia Available at: http://.pmf.ni.ac.rs/filomat The DMP Inverse for
More informationAnomaly Detection using Support Vector Machine
Anomaly Detection using Support Vector Machine Dharminder Kumar 1, Suman 2, Nutan 3 1 GJUS&T Hisar 2 Research Scholar, GJUS&T Hisar HCE Sonepat 3 HCE Sonepat Abstract: Support vector machine are among
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationGroup-invariant solutions of nonlinear elastodynamic problems of plates and shells *
Group-invariant solutions of nonlinear elastodynamic problems of plates and shells * V. A. Dzhupanov, V. M. Vassilev, P. A. Dzhondzhorov Institute of mechanics, Bulgarian Academy of Sciences, Acad. G.
More informationAnalytical Study of Biometrics Normalization and Fusion Techniques For Designing a Multimodal System
Volume Issue 8, November 4, ISSN No.: 348 89 3 Analytical Study of Biometrics Normalization and Fusion Techniques For Designing a Multimodal System Divya Singhal, Ajay Kumar Yadav M.Tech, EC Department,
More informationClassification of Ordinal Data Using Neural Networks
Classification of Ordinal Data Using Neural Networks Joaquim Pinto da Costa and Jaime S. Cardoso 2 Faculdade Ciências Universidade Porto, Porto, Portugal jpcosta@fc.up.pt 2 Faculdade Engenharia Universidade
More informationBivariate Uniqueness in the Logistic Recursive Distributional Equation
Bivariate Uniqueness in the Logistic Recursive Distributional Equation Antar Bandyopadhyay Technical Report # 629 University of California Department of Statistics 367 Evans Hall # 3860 Berkeley CA 94720-3860
More informationIntroduction To Resonant. Circuits. Resonance in series & parallel RLC circuits
Introduction To esonant Circuits esonance in series & parallel C circuits Basic Electrical Engineering (EE-0) esonance In Electric Circuits Any passive electric circuit ill resonate if it has an inductor
More informationGRADIENT DESCENT. CSE 559A: Computer Vision GRADIENT DESCENT GRADIENT DESCENT [0, 1] Pr(y = 1) w T x. 1 f (x; θ) = 1 f (x; θ) = exp( w T x)
0 x x x CSE 559A: Computer Vision For Binary Classification: [0, ] f (x; ) = σ( x) = exp( x) + exp( x) Output is interpreted as probability Pr(y = ) x are the log-odds. Fall 207: -R: :30-pm @ Lopata 0
More informationMachine Learning 4771
Machine Learning 477 Instructor: Tony Jebara Topic 5 Generalization Guarantees VC-Dimension Nearest Neighbor Classification (infinite VC dimension) Structural Risk Minimization Support Vector Machines
More informationAn Investigation of the use of Spatial Derivatives in Active Structural Acoustic Control
An Investigation of the use of Spatial Derivatives in Active Structural Acoustic Control Brigham Young University Abstract-- A ne parameter as recently developed by Jeffery M. Fisher (M.S.) for use in
More informationMMSE Equalizer Design
MMSE Equalizer Design Phil Schniter March 6, 2008 [k] a[m] P a [k] g[k] m[k] h[k] + ṽ[k] q[k] y [k] P y[m] For a trivial channel (i.e., h[k] = δ[k]), e kno that the use of square-root raisedcosine (SRRC)
More informationA proof of topological completeness for S4 in (0,1)
A proof of topological completeness for S4 in (,) Grigori Mints and Ting Zhang 2 Philosophy Department, Stanford University mints@csli.stanford.edu 2 Computer Science Department, Stanford University tingz@cs.stanford.edu
More informationImproving Time Efficiency of Feedforward Neural Network Learning
Improving Time Efficiency of Feedforard Neural Netork Learning by Batsukh Batbayar Bachelor of Engineering, National University of Mongolia Master of Physics, National University of Mongolia A thesis submitted
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationComputation of turbulent natural convection at vertical walls using new wall functions
Computation of turbulent natural convection at vertical alls using ne all functions M. Hölling, H. Herig Institute of Thermo-Fluid Dynamics Hamburg University of Technology Denickestraße 17, 2173 Hamburg,
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More information