Enhancing Generalization Capability of SVM Classifiers with Feature Weight Adjustment


Xizhao Wang and Qiang He
College of Mathematics and Computer Science, Hebei University, Baoding 071002, Hebei, China

Abstract. It is well recognized that support vector machines (SVMs) produce better classification performance in terms of generalization power. An SVM constructs an optimal separating hyper-plane by maximizing the margin between two classes in a high-dimensional feature space. According to statistical learning theory, the margin scale reflects the generalization capability to a great extent: the bigger the margin, the better the generalization capability of the SVM. This paper makes an attempt to enlarge the margin between the two support vector hyper-planes by feature weight adjustment. The experiments demonstrate that the proposed technique can enhance the generalization capability of the original SVM classifiers.

1 Introduction

Statistical learning theory (SLT) [1], a theory for small-sample learning problems introduced in the 1960s by Vapnik et al., deals with situations where the samples are limited. Most machine-learning methods follow the empirical risk minimization (ERM) induction principle [1], which is effective when there are enough samples. However, in most real-world cases the samples are limited. It is difficult to apply expected risk minimization [2] directly to classification problems, so most expected risk minimization problems are converted into minimizing the empirical risk. Unfortunately, empirical risk minimization is not always equivalent to expected risk minimization: ERM alone cannot guarantee a good generalization capability, whereas expected risk minimization can. Statistical learning theory establishes a clear relationship between the expected risk and the empirical risk, and shows that generalization can be controlled by the capacity of the learning machine.

Support vector machines (SVMs) are a classification technique based on SLT [2]. Due to their extraordinary generalization, SVMs have become a powerful tool for solving two-class classification problems. An SVM first maps the original input space into a high-dimensional feature space through some predefined nonlinear mapping and then constructs an optimal separating hyper-plane maximizing the margin between the two classes in the feature space. From SLT we know that the bigger the margin, the better the generalization capability of the SVM. Therefore we always want the margin to be as large as possible for two-class problems.

M. Gh. Negoita et al. (Eds.): KES 2004, LNAI 3213, Springer-Verlag Berlin Heidelberg 2004

Feature weight learning, which assigns a weight to each feature to indicate that feature's degree of importance, is an extension of feature selection. This paper adopts the feature weight learning technique introduced in [4]. We expect to enlarge the margin between the two support vector hyper-planes by using this technique, thereby improving generalization capability.

The rest of this paper is organized as follows. In Section 2 we review the relationship between the margin of the two hyper-planes and the generalization capability of SVMs. In Section 3 the detailed feature weight learning technique is introduced. Two experiments, which show the effect of feature weight learning on margin enlargement, are presented in Section 4. Section 5 gives some remarks and concludes the paper.

2 Relationship Between the Margin and the Generalization Capability

Firstly, we consider the growth function of a set of indicator functions [2]:

$$G^{\Lambda}(l) = \ln \sup_{z_1, \dots, z_l} N^{\Lambda}(z_1, z_2, \dots, z_l) \qquad (1)$$

where $N^{\Lambda}(z_1, z_2, \dots, z_l)$ evaluates how many different separations of the given sample $z_1, z_2, \dots, z_l$ can be obtained using functions from the set of indicator functions. From SLT we know the following conclusion: any growth function either satisfies $G^{\Lambda}(l) = l \ln 2$ or is bounded by

$$G^{\Lambda}(l) < h\left(\ln \frac{l}{h} + 1\right) \qquad (2)$$

where $h$ is an integer for which

$$G^{\Lambda}(h) = h \ln 2 \quad \text{and} \quad G^{\Lambda}(h+1) \neq (h+1) \ln 2. \qquad (3)$$

We now review the definition of the VC dimension [3]: the VC dimension of the set of indicator functions $Q(z, \alpha)$, $\alpha \in \Lambda$, equals $h$ if the growth function is bounded by a logarithmic function with coefficient $h$. The VC dimension is a pivotal concept in statistical learning theory.

For a two-class classification problem, the $\Delta$-margin separating hyper-plane is defined in SLT [1]. The following theorems are valid for the set of $\Delta$-margin separating hyper-planes [2].

Theorem 1. Let the vectors $x \in X$ belong to a sphere of radius $R$. Then the set of $\Delta$-margin separating hyper-planes has VC dimension $h$ bounded by the inequality

$$h \le \min\left(\left[\frac{R^2}{\Delta^2}\right],\, n\right) + 1. \qquad (4)$$

From inequality (4) we know that the larger the margin of the set of functions is, the smaller the VC dimension is.
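As a quick numerical illustration of inequality (4), the short sketch below evaluates the bound for a few margin values. It is illustrative only; the sphere radius R, the input dimension n and the margin values are made-up assumptions, not quantities from the paper.

```python
import math

def vc_dimension_bound(R, delta, n):
    """Upper bound of inequality (4): h <= min([R^2 / delta^2], n) + 1."""
    return min(math.floor(R**2 / delta**2), n) + 1

R, n = 10.0, 50  # assumed sphere radius and input-space dimension
for delta in (0.5, 1.0, 2.0, 5.0):
    print(f"margin = {delta:3.1f}  ->  VC dimension bound h <= {vc_dimension_bound(R, delta, n)}")
```

The printed bounds shrink (or stay constant) as the margin grows, which is exactly the mechanism the paper exploits.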

Theorem 2. With probability at least $1 - \eta$, the inequality

$$R(\alpha) \le R_{emp}(\alpha) + \frac{B\varepsilon}{2}\left(1 + \sqrt{1 + \frac{4 R_{emp}(\alpha)}{B\varepsilon}}\right) \qquad (5)$$

holds true, where $R(\alpha)$ and $R_{emp}(\alpha)$ are the expected risk and the empirical risk functionals respectively [1], $l$ is the size of the training set, and $h$ is the VC dimension of the set of functions. The quantity $\varepsilon$ is formulated in [2].

For simplicity, we write (5) as follows:

$$R(\alpha) \le R_{emp}(\alpha) + \Phi\left(\frac{h}{l}\right). \qquad (6)$$

$\Phi$ is called the confidence interval; it is a monotonically increasing function of $h$ and a monotonically decreasing function of $l$. Inequality (6) gives a bound on the generalization ability of a learning machine. Our purpose is to minimize the left side of (6), i.e., $R(\alpha)$, since minimizing $R(\alpha)$ implies the optimal generalization capability of the learning machine. In fact, minimizing $R(\alpha)$ directly is not feasible due to its integral formulation, so in practice we minimize $R_{emp}(\alpha)$ instead. Inequality (6) shows the relationship between $R(\alpha)$ and $R_{emp}(\alpha)$: because of the second term on the right side, minimizing $R_{emp}(\alpha)$ alone cannot guarantee the minimization of $R(\alpha)$. We therefore want the second term on the right side of (6) to be as small as possible, since a small confidence interval possibly implies a small actual risk $R(\alpha)$, i.e., a good generalization capability. Noting that $\Phi$ is a monotonically increasing function of $h$, which decreases as the margin $\Delta$ of the separating hyper-planes increases (see inequality (4)), enlarging $\Delta$ possibly leads to an improvement of the generalization capability of the learning machine.

The traditional ERM principle only minimizes the empirical risk without considering the confidence interval, so it may not achieve a good generalization capability. SVMs based on SLT follow the structural risk minimization (SRM) induction principle [1][2]: they not only minimize the empirical risk but also control the confidence interval by maximizing the margin between the two classes in the high-dimensional feature space.
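To make the monotonic behaviour of the confidence interval concrete, the sketch below evaluates one standard SLT confidence term, $\Phi = \sqrt{(h(\ln(2l/h)+1) - \ln(\eta/4))/l}$. This particular expression is taken from the general SLT literature, not from this paper (whose $\varepsilon$ is defined in [2]), and the values of h, l and η below are purely illustrative.

```python
import math

def confidence_interval(h, l, eta=0.05):
    """A standard SLT confidence term: sqrt((h*(ln(2l/h) + 1) - ln(eta/4)) / l)."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# Phi grows with the VC dimension h and shrinks with the sample size l,
# the monotonic behaviour claimed for the confidence interval in (6).
for l in (100, 1000):
    for h in (5, 20, 80):
        print(f"l = {l:4d}  h = {h:2d}  Phi = {confidence_interval(h, l):.3f}")
```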

3 Feature-Weight Learning

Each feature is considered to have an importance degree called its feature-weight. Feature-weight assignment is an extension of feature selection: the latter allows only 0-weight or 1-weight values, while the former allows weight values in the interval [0,1]. Feature-weight learning depends on the similarity between samples. There are many ways to define the similarity measure, such as the correlation coefficient and the Euclidean distance. Here the similarity measure $\rho$ is defined as follows:

$$\rho_{ij}^{(w)} = \frac{1}{1 + \beta \, d_{ij}^{(w)}} \qquad (7)$$

where $\beta$ is a positive parameter. It adjusts the similarity degrees, which lie in the interval [0,1], to be distributed around 0.5, i.e., $\beta$ is required to satisfy

$$\frac{2}{n(n-1)} \sum_{p < q} \frac{1}{1 + \beta \, d_{pq}} = 0.5 \qquad (8)$$

where $d_{pq}$ is the commonly used Euclidean distance, and $d_{ij}^{(w)}$ is the weighted Euclidean distance defined as follows:

$$d_{ij}^{(w)} = \sqrt{\sum_{k=1}^{s} w_k^2 \, (x_{ik} - x_{jk})^2} \qquad (9)$$

where $w = (w_1, \dots, w_s)$ is the feature-weight vector. Its component $w_k$ is the importance degree of the $k$-th feature: the larger $w_k$ is, the more important the $k$-th feature is. When $w = (1, \dots, 1)$, the space $\{x \mid d^{(w)}(x) \le r\}$ is a hyper-sphere with radius $r$ (called the original space), $d^{(w)}$ reduces to $d$ and $\rho^{(w)}$ reduces to $\rho$. When $w \ne (1, \dots, 1)$, the axes are extended or shrunk in accordance with $w$, so the space $\{x \mid d^{(w)}(x) \le r\}$ is a hyper-ellipse, called the transformed space. The lower the value of $w_k$ is, the higher the degree of flattening along that axis.

A good partition should have the following property: the samples within one cluster are more similar and dissimilar samples are more separated, which implies that the fuzziness of the partition is low. Therefore we hope that, by adjusting $w$, $\rho^{(w)}$ tends to one or zero when $\rho$ is greater or less than 0.5, respectively. Based on the above discussion, we learn the feature-weight values by minimizing an evaluation function $E(w)$ [4] defined as follows:

$$E(w) = \frac{2}{n(n-1)} \sum_{i} \sum_{j < i} \left[ \rho_{ij}^{(w)} \left(1 - \rho_{ij}\right) + \rho_{ij} \left(1 - \rho_{ij}^{(w)}\right) \right] \qquad (10)$$

where $n$ is the number of samples. From equation (10) we can see that $E(w)$ decreases as the similarity degree $\rho_{ij}^{(w)}$ tends to 0 or 1 when $\rho_{ij} < 0.5$ or $\rho_{ij} > 0.5$, respectively. Therefore we expect that feature-weight learning by minimizing $E(w)$ leads to $\rho^{(w)}$ values close to 0 or 1, which obviously can improve the performance of FCM.
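The following minimal sketch computes the weighted distance (9), the similarity (7) and the evaluation function (10) for a given weight vector. The tiny data matrix, the weight vectors and the fixed β are made-up assumptions for illustration; in the paper β is solved from equation (8).

```python
import numpy as np

def weighted_distances(X, w):
    """Pairwise weighted Euclidean distances of equation (9)."""
    diff = X[:, None, :] - X[None, :, :]              # shape (n, n, s)
    return np.sqrt(((w**2) * diff**2).sum(axis=2))    # shape (n, n)

def similarities(X, w, beta):
    """Pairwise similarities rho^(w) of equation (7)."""
    return 1.0 / (1.0 + beta * weighted_distances(X, w))

def evaluation(X, w, beta):
    """Evaluation function E(w) of equation (10)."""
    n = X.shape[0]
    rho = similarities(X, np.ones(X.shape[1]), beta)  # unweighted similarities rho
    rho_w = similarities(X, w, beta)                  # weighted similarities rho^(w)
    i, j = np.triu_indices(n, k=1)                    # each unordered pair counted once
    pair_terms = rho_w[i, j] * (1 - rho[i, j]) + rho[i, j] * (1 - rho_w[i, j])
    return 2.0 / (n * (n - 1)) * pair_terms.sum()

# Tiny made-up data set: two tight groups in two dimensions
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
beta = 1.0  # assumed here; in the paper beta is solved from equation (8)
print(evaluation(X, np.ones(2), beta))             # E at the initial weights (1, 1)
print(evaluation(X, np.array([0.2, 1.0]), beta))   # E after down-weighting feature 1
```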

To minimize the evaluation function $E(w)$ we use the gradient descent technique. Let $\Delta w_k$ be the change of $w_k$, computed as follows:

$$\Delta w_k = -\eta \, \frac{\partial E}{\partial w_k} \qquad (11)$$

where $\eta$ is the learning rate. An appropriate value of $\eta$ speeds up the convergence of the algorithm: too small an $\eta$ leads to low computational efficiency, while too big an $\eta$ results in divergence of the algorithm. Through a one-dimensional search technique [5], $\eta$ is determined by

$$E\!\left(w_1 - \eta \frac{\partial E}{\partial w_1}, \dots, w_s - \eta \frac{\partial E}{\partial w_s}\right) = \min_{\lambda > 0} E\!\left(w_1 - \lambda \frac{\partial E}{\partial w_1}, \dots, w_s - \lambda \frac{\partial E}{\partial w_s}\right). \qquad (12)$$

The derivatives of $E(w)$ are obtained from the following:

$$\frac{\partial E}{\partial w_k} = \frac{2}{n(n-1)} \sum_{i} \sum_{j < i} \left(1 - 2\rho_{ij}\right) \frac{\partial \rho_{ij}^{(w)}}{\partial d_{ij}^{(w)}} \, \frac{\partial d_{ij}^{(w)}}{\partial w_k} \qquad (13)$$

$$\frac{\partial \rho^{(w)}}{\partial d^{(w)}} = -\frac{\beta}{\left(1 + \beta \, d^{(w)}\right)^2} \qquad (14)$$

$$\frac{\partial d_{ij}^{(w)}}{\partial w_k} = \frac{w_k \left(x_{ik} - x_{jk}\right)^2}{d_{ij}^{(w)}} \qquad (15)$$

The algorithm is described briefly as follows:

(1) Initialize all weight values with 1, and solve $\beta$ from equation (8);
(2) Compute $\rho^{(w)}$ by equation (7) and $d^{(w)}$ by equation (9);
(3) Let $\Delta w_k$ be the change of $w_k$; compute $\Delta w_k = -\eta \, \partial E / \partial w_k$ by (13)-(15);
(4) If $w_k + \Delta w_k \ge 0$, then set $w_k = w_k + \Delta w_k$;
(5) Go to (3) until $E(w)$ is less than a given threshold or the number of iterations reaches the user-specified limit.
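Below is a compact sketch of the weight-learning loop following equations (7)-(15) and steps (1)-(5) above. It is a sketch under stated assumptions rather than the authors' implementation: the fixed learning rate (instead of the line search of (12)), the bisection used to solve (8), the stopping constants and the random test data are simplifications chosen for illustration.

```python
import numpy as np

def pairwise_dist(X, w):
    """Weighted Euclidean distances d^(w) of equation (9); w = (1,...,1) gives plain d."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(((w**2) * diff**2).sum(axis=2))

def solve_beta(X, lo=1e-6, hi=1e6, tol=1e-9):
    """Solve equation (8) for beta by bisection: mean pairwise similarity equals 0.5."""
    i, j = np.triu_indices(X.shape[0], k=1)
    dij = pairwise_dist(X, np.ones(X.shape[1]))[i, j]
    mean_sim = lambda b: np.mean(1.0 / (1.0 + b * dij))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_sim(mid) > 0.5:   # similarity still too high -> beta must grow
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def learn_feature_weights(X, eta=0.1, max_iter=200, threshold=1e-3):
    """Steps (1)-(5): gradient descent on E(w) of equation (10)."""
    n, s = X.shape
    w = np.ones(s)                                    # step (1): all weights start at 1
    beta = solve_beta(X)
    i, j = np.triu_indices(n, k=1)
    rho = 1.0 / (1.0 + beta * pairwise_dist(X, np.ones(s))[i, j])   # unweighted rho
    diff2 = (X[i] - X[j])**2                          # (x_ik - x_jk)^2 for every pair
    norm = 2.0 / (n * (n - 1))
    for _ in range(max_iter):
        dw = pairwise_dist(X, w)[i, j]                # step (2): d^(w) by equation (9)
        rho_w = 1.0 / (1.0 + beta * dw)               # rho^(w) by equation (7)
        E = norm * np.sum(rho_w * (1 - rho) + rho * (1 - rho_w))
        if E < threshold:
            break
        drho_dd = -beta / (1.0 + beta * dw)**2                      # equation (14)
        dd_dw = (w * diff2) / np.maximum(dw, 1e-12)[:, None]        # equation (15)
        grad = (norm * (1 - 2 * rho) * drho_dd)[:, None] * dd_dw    # equation (13)
        delta_w = -eta * grad.sum(axis=0)             # step (3), fixed eta instead of (12)
        w = np.where(w + delta_w >= 0, w + delta_w, w)  # step (4): keep weights non-negative
    return w

# Illustrative run on made-up data (not the paper's databases)
X = np.random.default_rng(0).random((30, 4))
print(learn_feature_weights(X))
```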

4 Enlarging the Margin by Feature Weight Adjustment for Generalization Capability Improvement

In this section we experimentally demonstrate, for the purpose of generalization capability improvement, the enlargement of the margin between the two hyper-planes by the feature weight adjustment described in the previous section. We select two databases for the demonstration. The first is the well-known toy example, the Iris database from UCI, which includes 150 cases in 3 classes. Since the SVM is only for two-class classification problems, we delete the cases belonging to class one, and the remaining 100 cases in two classes are used in the demonstration. The second, the Pima Indian Diabetes database, is also from UCI. The characteristics of the two databases are shown in Table 1.

According to the feature weight learning algorithm given at the end of Section 3, we learn the weights for the two selected databases; Table 2 shows the results. Applying the SVM Toolbox to the original data of the two databases, one can obtain the optimal separating hyper-planes and the corresponding margins. Due to the limit of paper length, we omit the formulation of the hyper-planes and margins; for details, one can refer to [1][2]. Similarly, using the same method, one can evaluate the margins of the separating hyper-planes after the learned weights are incorporated into the original databases. Tables 3 and 4 show the size of the margin of the separating hyper-planes, the training accuracy, and the testing accuracy for the two databases, where 70% of each database is randomly selected as the training set and the remaining 30% as the testing set. It is worth noting that the experimental results depend on the parameters chosen in the SVM Toolbox.

From Tables 3 and 4, one can see that the margins are indeed enlarged. However, the improvement in training and testing accuracy is not significant. We speculate that the reasons are that (1) the data is not sufficient for the Iris database and (2) the Pima database is far from linearly separable. Further investigation of the generalization capability improvement is in progress.

Table 1. The characteristics of the databases

  Database   Number of samples   Number of features   Category of features
  Iris       100                 4                    Numerical
  Pima       768                 8                    Numerical

Table 2. The results of feature-weight learning

  Database   Results of feature-weight learning
  Iris       0.000, 0.0002, 1.0, 0.64
  Pima       –

Table 3. Margin enlargement for the Iris database

  Iris database                      Margin   Training accuracy   Testing accuracy
  Before feature weight adjustment   –        –                   –
  After feature weight adjustment    –        –                   –

Table 4. Margin enlargement for the Pima database

  Pima database                      Margin   Training accuracy   Testing accuracy
  Before feature weight adjustment   –        –                   –
  After feature weight adjustment    –        –                   –
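The margin comparison can be reproduced in spirit with any linear SVM implementation. The sketch below uses scikit-learn as a stand-in for the SVM Toolbox used in the paper, so the toolbox, the parameter C and the random split are my assumptions rather than the authors' setup; the weight vector simply copies the partially reconstructed Iris row of Table 2. The geometric margin of a trained linear SVM is measured as 2/||w||, and scaling each feature by its weight corresponds to the weighted distance of equation (9).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def margin_and_accuracy(X, y, C=1.0, seed=0):
    """Train a linear SVM; report its geometric margin 2/||w|| and accuracies."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    clf = SVC(kernel="linear", C=C).fit(X_tr, y_tr)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    return margin, clf.score(X_tr, y_tr), clf.score(X_te, y_te)

# Classes two and three of Iris (class one removed, as in the paper)
iris = load_iris()
mask = iris.target != 0
X, y = iris.data[mask], iris.target[mask]

# Stand-in feature weights taken from the partially reconstructed Iris row of Table 2
w = np.array([0.000, 0.0002, 1.0, 0.64])

for name, data in (("before weighting", X), ("after weighting", X * w)):
    m, tr, te = margin_and_accuracy(data, y)
    print(f"{name:16s}  margin = {m:.4f}  train acc = {tr:.3f}  test acc = {te:.3f}")
```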

5 Conclusions

According to SLT, enlarging the margin of the separating hyper-plane can enhance the generalization capability of the learning machine. This paper makes an attempt to enlarge the margin by an approach of feature weight adjustment, and the initial experiments show the approach's effectiveness. We close with the following remarks: (1) Can the feature weight learning approach be mathematically proved to enlarge the margin? (2) Does the approach behave differently on linearly separable and non-linearly separable data sets? (3) When evaluating the margin of the separating hyper-plane, how should the optimal values of the parameters in the quadratic program be chosen?

Acknowledgement. This research is supported by the key project of the Chinese Educational Ministry (0306), the Natural Science Foundation of Hebei Province (60336), and the Doctoral Foundation of the Hebei Educational Committee (B20036).

References

1. Vapnik, V. N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
2. Vapnik, V. N.: Statistical Learning Theory. Wiley-Interscience Publication, New York (1998)
3. Vapnik, V. N.: An Overview of Statistical Learning Theory. IEEE Transactions on Neural Networks 10(5) (1999)
4. Basak, J., De, R. K., Pal, S. K.: Unsupervised Feature Selection Using a Neuro-Fuzzy Approach. Pattern Recognition Letters 19 (1998)
5. Rao, S. S.: Optimization Theory and Applications. Wiley Eastern Limited (1985)
