Sparse Representation Based Augmented Multinomial Logistic Extreme Learning Machine with Weighted Composite Features for Spectral-Spatial Classification of Hyperspectral Images

Faxian Cao 1, Zhijing Yang 1*, Jinchang Ren 2, Wing-Kuen Ling 1
1 School of Information Engineering, Guangdong University of Technology, Guangzhou, China
2 Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, G1 1XW, UK
* yzhj@gdut.edu.cn

Abstract: Although the extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is the random weights and bias of the ELM, which may lead to ill-posed problems. The second is the lack of spatial information in the classification. To tackle these two problems, this paper proposes a new framework for ELM-based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCFs) are employed to derive the optimized output weights and to extract spatial features, respectively. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using the maximum a posteriori (MAP) criterion. Second, a sparse representation with a Laplacian prior is applied to efficiently obtain a logarithmic posterior with a unique maximum, which solves the ill-posed problem of the ELM. Variable splitting and the augmented Lagrangian are subsequently used to further reduce the computational complexity of the proposed algorithm, which has proven to be an efficient way to improve the speed. Third, spatial information is extracted using the weighted composite features (WCFs) to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof. Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms the ELM and a number of state-of-the-art approaches.

Index Terms: hyperspectral image (HSI), spectral-spatial classification, extreme learning machine (ELM), maximum a posteriori (MAP), sparse representation, Laplacian prior, variable splitting, augmented Lagrangian.

I. INTRODUCTION

With the rich spectral and spatial information contained in a three-dimensional hypercube, hyperspectral images (HSIs) provide a unique way to characterize objects in geographical scenes, especially in remote sensing [1]. However, classification of high-dimensional data such as HSI is still challenging, particularly because of the unfavorable ratio between the limited number of training samples and the large number of spectral bands, i.e. the Hughes phenomenon [2]-[4]. To tackle this problem, a number of feature extraction and data classification approaches have been proposed [12]. These include singular spectrum analysis (SSA) [5]-[8], segmented auto-encoders [11], principal component analysis (PCA) and its variations [13]-[14], and spectral-spatial classification using multiple kernels and active learning [15][16], where support vector machines (SVMs) [9][10] are widely used for the data classification. Although these approaches have produced quite good results, their performance can be further improved by addressing two main difficulties: (1) Inaccurate classification under a large number of spectral bands yet limited training samples.
(2) Relatively low efficiency in processing high-dimensional HSI data.

As a single-hidden-layer feedforward neural network, the extreme learning machine (ELM) is a fast and effective machine learning method and has received wide attention due to its good performance [17]-[19]. The ELM does not need to tune the hidden-layer parameters once the number of hidden nodes is determined. In the ELM, the weight and bias vectors between the input layer and the hidden layer are randomly generated and are independent of the training samples and the specific application [1]. The ELM has achieved good performance in many applications [20]-[23] and has also been widely applied to HSI classification. For example, in [24] and [25], bilateral filtering and extended morphological profiles were used for feature extraction, respectively, with the ELM used for classification. In [26]-[28], superpixels, the watershed transform and Gabor filters were used for feature extraction, again with the ELM as the classifier. Although these ELM-based methods have achieved good performance, they simply combine the ELM with another feature extraction method and ignore an important problem of the ELM: the randomly generated input weights and biases may cause ill-posed problems. From this perspective, we propose the augmented sparse multinomial logistic ELM (ASMLELM) for HSI classification. Based on the proposed ASMLELM, we further propose weighted composite features (WCFs) to extract rich spatial information. Therefore, a novel framework, ASMLELM-WCFs, is proposed for spectral-spatial classification of HSI, and the main contributions can be highlighted as follows.

First, the ELM is represented by a maximum a posteriori (MAP) based probabilistic model, which is further expressed as a concave logarithmic likelihood function (LLF). As the LLF value can be arbitrarily large if the training data are separable, a prior/regularization term on the LLF is critical [29], for which a sparse representation of the ELM is employed. To settle the ill-posed problem of the ELM, a Laplacian prior/regularization term is applied to improve the sparsity of the learnt weights of the proposed ELM classifier. The Laplacian density is the heaviest-tailed density that is still log-concave, and its sparsity-promoting nature has been theoretically well justified [29]-[32]. This guarantees that the logarithmic posterior has a unique maximum when it is combined with a concave logarithmic likelihood [29]. As the ELM is represented by a probabilistic model under the MAP criterion, it enjoys the advantage of possessing a unique maximum. The ill-posed problem of the ELM is therefore addressed by this formulation, which we call the sparse multinomial logistic ELM (SMLELM). Recently, variable splitting and the augmented Lagrangian [33] have been proposed for speed improvement. Hence, we adopt variable splitting and the augmented Lagrangian for the proposed SMLELM to construct the augmented SMLELM (ASMLELM), which transforms the SMLELM into a new form for faster computation.

Second, by combining composite kernels (CK) [34] and weighted mean filters (WMFs) [35], the weighted composite features (WCFs) are utilized to extract spatial features and further improve the classification accuracy. Accordingly, three improved spectral-spatial classifiers are derived, based on the basic ELM, the nonlinear ELM (NLELM) and the kernel ELM (KELM), respectively.

Third, the generalization bounds of the proposed method are derived in a similar way to the margin bounds of sparse multinomial logistic regression (SMLR) [29][36][37]. These bounds provide theoretical insight into, and justification for, the proposed methods.

The rest of this paper is organized as follows. In Section II, the background of the ELM is introduced. The proposed method is detailed in Section III. Section IV reports the experimental results and benchmarks them against a few state-of-the-art approaches. Finally, conclusions are drawn in Section V.

II. THE EXTREME LEARNING MACHINE (ELM)

A. Basic Concepts of the ELM

The ELM is a generalized single-hidden-layer feedforward neural network (SLFN) [1], [17]. The weight vectors and the biases between the input layer and the hidden layer are randomly initialized, while the output weights are determined by the learning process. Once the values of the input weight/bias vectors are assigned, the hidden-layer output matrix remains unchanged during learning [1]. Let $X = (x_1, x_2, \dots, x_N) \in R^{d \times N}$ be the training data of an HSI, which has $N$ pixels, each with a $d$-dimensional feature vector. Let $Y = (y_1, y_2, \dots, y_N) \in R^{M \times N}$ be the matrix of class labels of the training samples, where $M$ is the number of classes in the data set. Given a pixel label $y_i$, if the pixel belongs to the $k$-th class, we have
$$y_{i,j} = \begin{cases} 1, & j = k, \\ 0, & \text{otherwise.} \end{cases}$$
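For readers who prefer code, a minimal NumPy illustration of this one-hot label encoding is given below (this sketch and its function name are ours, not part of the paper):

```python
import numpy as np

def one_hot_labels(labels, num_classes):
    """Build the M x N one-hot label matrix Y from integer labels in {0, ..., M-1}."""
    n = len(labels)
    Y = np.zeros((num_classes, n))
    Y[labels, np.arange(n)] = 1.0   # y_{k,i} = 1 when sample i belongs to class k
    return Y

# Example: four samples drawn from three classes
print(one_hot_labels(np.array([0, 2, 1, 2]), 3))
```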
The model of a single-hidden-layer neural network with $L$ hidden neurons and activation function $H(\cdot)$ can be expressed as follows:
$$\sum_{j=1}^{L} \beta_j H(w_j^T x_i + b_j) = y_i, \quad i = 1, 2, \dots, N \qquad (1)$$
where $\beta_j$ is the weight vector between the $j$-th hidden neuron and the output layer; $w_j$ and $b_j$ are the weight vector and bias from the inputs to the $j$-th hidden neuron, respectively; and $H(w_j^T x_i + b_j)$ is the output of the $j$-th hidden neuron for the input sample $x_i$. Equation (1) can be written in the following matrix form:
$$H^T \beta = Y^T \qquad (2)$$
where $\beta = [\beta_1, \dots, \beta_M] \in R^{L \times M}$, $H = [H(x_1), \dots, H(x_N)] \in R^{L \times N}$, and $H(x_i) = [H_1(x_i), \dots, H_L(x_i)]^T \in R^{L \times 1}$. $H$ is the hidden-layer output matrix and $\beta$ is the output weight matrix between the hidden layer and the output layer. From (2), $\beta$ can be obtained simply as
$$\beta = (H^T)^{\dagger} Y^T \qquad (3)$$
where $(\cdot)^{\dagger}$ denotes the Moore-Penrose generalized inverse of a matrix [17].

B. Constrained Optimization of the ELM

The constrained optimization of the ELM aims to achieve not only the smallest training error but also the smallest norm of the output weights [19]:
$$\min \|H^T \beta - Y^T\|^2 \quad \text{and} \quad \min \|\beta\|^2 \qquad (4)$$
According to Bartlett's neural network generalization theory [38], smaller norms of the weights tend to give better generalization performance of feedforward neural networks. According to this optimization theory, Eq. (4) can be rewritten as
$$\min_{\beta, \xi_i} L_{ELM} = \frac{1}{2}\|\beta\|_F^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \|\xi_i\|_2^2, \quad \text{s.t. } H(x_i)^T \beta = y_i^T - \xi_i^T, \quad i = 1, \dots, N \qquad (5)$$
where $\xi_i$ is the training error for the training sample $x_i$ and $C$ is the regularization parameter.
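As an informal illustration of the basic ELM in (1)-(3), the following NumPy sketch (ours, with our own function and variable names, not the authors' implementation) draws random input weights and biases, builds a sigmoid hidden layer, and solves for the output weights with the Moore-Penrose pseudoinverse:

```python
import numpy as np

def elm_train(X, Y, L, rng=None):
    """Basic ELM. X: d x N training data, Y: M x N one-hot labels, L: hidden neurons.
    Returns (W, b, beta) so that class scores are beta.T @ sigmoid(W @ x + b)."""
    rng = np.random.default_rng(0) if rng is None else rng
    d, N = X.shape
    W = rng.standard_normal((L, d))          # random input weights, never retrained
    b = rng.standard_normal((L, 1))          # random biases
    H = 1.0 / (1.0 + np.exp(-(W @ X + b)))   # L x N hidden-layer output matrix
    beta = np.linalg.pinv(H.T) @ Y.T         # Eq. (3): beta = (H^T)^dagger Y^T, L x M
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(W @ X + b)))
    return np.argmax(beta.T @ H, axis=0)     # predicted class per column of X
```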

Based on the Karush-Kuhn-Tucker (KKT) theorem [39], training the ELM is equivalent to solving the following dual optimization problem:
$$\min L_{ELM} = \frac{1}{2}\|\beta\|_F^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\|\xi_i\|_2^2 - \sum_{i=1}^{N}\sum_{j=1}^{M}\alpha_{i,j}\big(H(x_i)^T\beta_j - y_{i,j} + \xi_{i,j}\big) \qquad (6)$$
where $\beta_j$ is the $j$-th column of the matrix $\beta$ and $\alpha_{i,j}$ is a Lagrange multiplier. From the KKT conditions, we can further derive
$$\frac{\partial L_{ELM}}{\partial \beta_j} = 0 \ \Rightarrow\ \beta = H\alpha \qquad (7)$$
$$\frac{\partial L_{ELM}}{\partial \xi_i} = 0 \ \Rightarrow\ \alpha_i = C\,\xi_i, \quad i = 1, \dots, N \qquad (8)$$
$$\frac{\partial L_{ELM}}{\partial \alpha_i} = 0 \ \Rightarrow\ H(x_i)^T\beta = y_i^T - \xi_i^T, \quad i = 1, \dots, N \qquad (9)$$
where $\alpha_i = [\alpha_{i,1}, \alpha_{i,2}, \dots, \alpha_{i,M}]$ and $\alpha = [\alpha_1; \alpha_2; \dots; \alpha_N] \in R^{N \times M}$. Then it can be shown that the output weight $\beta$ is
$$\beta = H\Big(\frac{I}{C} + H^T H\Big)^{-1} Y^T \qquad (10)$$
When the activation functions of the hidden neurons are unknown, any kernel satisfying Mercer's conditions can be used:
$$\Omega_{KELM} = H^T H: \quad \Omega_{KELM}(x_i, x_j) = h(x_i)^T h(x_j) = K(x_i, x_j) \qquad (11)$$
In practice, the Gaussian kernel is a good choice:
$$K_{ELM}(x_i, x_j) = \exp\Big(-\frac{\|x_i - x_j\|^2}{2\sigma_{ELM}^2}\Big) \qquad (12)$$
Based on the above analysis, two well-known constrained optimization variants of the ELM have been proposed. One defines $\beta$ as in (10) without a kernel, namely the nonlinear ELM (NLELM). The other uses the kernel function, giving the kernel ELM (KELM):
$$\beta_{NLELM} = H\Big(\frac{I}{C} + H^T H\Big)^{-1} Y^T \qquad (13)$$
$$\beta_{KELM} = \Big(\frac{I}{C} + K\Big)^{-1} Y^T \qquad (14)$$

III. THE PROPOSED FRAMEWORK

A. Augmented Sparse Multinomial Logistic Extreme Learning Machine (ASMLELM)

The goal of a supervised learning algorithm is to design a classifier, based on a set of $N$ training samples, that is capable of distinguishing $M$ classes from an input vector of length $d$ [29]. Under the multinomial logistic regression model [40], $\beta$ in equations (3), (13) and (14) can be transformed into a new form via a probability model. If training sample $x_i$ belongs to the $j$-th class, the probability model can be written as
$$P(y_{i,j} = 1 \mid H(x_i), \beta) = \frac{\exp\big(\beta_j^T H(x_i)\big)}{\sum_{k=1}^{M}\exp\big(\beta_k^T H(x_i)\big)} \qquad (15)$$
In (3), (13), (14) and (15), $\beta$ may not be optimal due to the ill-posed problem. Therefore, it is important to find the optimal $\beta$ in order to obtain high classification accuracy.
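Before re-estimating $\beta$, the kernel variant (14) and the softmax model (15) can be spelled out in code. The following NumPy sketch is ours and purely illustrative: the kernel width and regularization values are arbitrary placeholders, not the cross-validated values used in the experiments.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between columns of A (d x n) and B (d x m), as in Eq. (12)."""
    sq = np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2.0 * A.T @ B
    return np.exp(-sq / (2.0 * sigma**2))

def kelm_train(X, Y, C=2.0**5, sigma=1.0):
    """Kernel ELM coefficients, Eq. (14): pi = (I/C + K)^(-1) Y^T."""
    K = gaussian_kernel(X, X, sigma)                        # N x N training kernel
    return np.linalg.solve(np.eye(K.shape[0]) / C + K, Y.T) # N x M coefficients

def kelm_posteriors(X_train, X_test, pi, sigma=1.0):
    """Softmax class posteriors in the spirit of Eq. (15), with kernel scores as logits."""
    scores = (gaussian_kernel(X_train, X_test, sigma).T @ pi).T  # M x N_test
    e = np.exp(scores - scores.max(axis=0, keepdims=True))       # numerically stable softmax
    return e / e.sum(axis=0, keepdims=True)
```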

To find the optimal parameter $\beta$ for the ELM, $\beta$ is re-estimated after representing the ELM by a probabilistic model. To this end, maximum likelihood (ML) estimation is introduced to the ELM. Let $\beta = [\beta_1; \beta_2; \dots; \beta_M] \in R^{(L \cdot M) \times 1}$ be a column vector with $L \cdot M$ elements. Maximization of the logarithmic likelihood can be expressed as
$$\max_{\beta}\ L(\beta) = \sum_{i=1}^{N}\Big(\sum_{j=1}^{M} y_{i,j}\,\beta_j^T H(x_i) - \log\sum_{j=1}^{M}\exp\big(\beta_j^T H(x_i)\big)\Big) \qquad (16)$$
In order to maximize $L(\beta)$, consider the second-order Taylor series of $L(\beta)$ evaluated at $\hat\beta$:
$$L(\beta) - L(\hat\beta) = (\beta - \hat\beta)^T \nabla L(\hat\beta) + \frac{1}{2}(\beta - \hat\beta)^T \nabla^2 L\big(\hat\beta + \rho(\beta - \hat\beta)\big)(\beta - \hat\beta) \ \geq\ (\beta - \hat\beta)^T \nabla L(\hat\beta) + \frac{1}{2}(\beta - \hat\beta)^T B\,(\beta - \hat\beta) \qquad (17)$$
where $\rho \in (0, 1)$ and
$$B \triangleq -\frac{1}{2}\Big[I - \frac{\mathbf{1}\mathbf{1}^T}{M}\Big] \otimes H H^T \qquad (18)$$
Here $I \in R^{M \times M}$ is the identity matrix, $\mathbf{1} = [1, 1, \dots, 1]^T$, and $\otimes$ is the Kronecker matrix product. This result has been proved in [40], [41]. The ML estimate can then be expressed as
$$\hat\beta = \arg\max_{\beta}\ \beta^T\big(\nabla L(\hat\beta) - B\hat\beta\big) + \frac{1}{2}\beta^T B\,\beta \qquad (19)$$
Hence, $\beta$ at the $(t+1)$-th iteration can be obtained by a simple update equation:
$$\beta^{t+1} = B^{-1}\big(B\beta^{t} - \nabla L(\beta^{t})\big) \qquad (20)$$
From (20), it can be seen that the update is very similar to an iteratively reweighted least squares (IRLS) algorithm [42], with the Hessian matrix of the IRLS algorithm replaced by the matrix $B$. Since $B^{-1}$ can be precomputed, this is a big advantage of the proposed approach: the Hessian in the IRLS algorithm must be re-inverted at each iteration [29], [43], so our approach is more efficient than IRLS. However, the concave LLF value can be arbitrarily large if the training data are separable. From [29], it is known that a prior on the logarithmic likelihood is crucial. In order to address the ill-posed problem of the ELM, a prior/regularization term is therefore imposed on $\beta$. Here, the Laplacian prior is used:
$$L_1(\beta) = L(\beta) - L(\hat\beta) + \log p(\beta) \qquad (21)$$
where
$$p(\beta) \propto \exp\big(-\lambda\|\beta\|_1\big) \qquad (22)$$
and $\|\beta\|_1 = \sum_l |\beta_l|$ denotes the $\ell_1$ norm, with $|\beta_l| = \sqrt{\beta_l^2}$. Consider the following inequality for $h > 0$ and $u \geq 0$: since $\sqrt{h} + u/\sqrt{h} \geq 2\sqrt{u}$,
$$\sqrt{u} \leq \frac{1}{2}\Big(\frac{u}{\sqrt{h}} + \sqrt{h}\Big) \qquad (23)$$
Therefore, for any $\hat\beta$ we have
$$\lambda\|\beta\|_1 \leq \frac{1}{2}\lambda\Big(\sum_l \frac{\beta_l^2}{|\hat\beta_l|} + \sum_l |\hat\beta_l|\Big) \qquad (24)$$
so that the surrogate
$$\beta^T\big(\nabla L(\hat\beta) - B\hat\beta\big) + \frac{1}{2}\beta^T\big(B - \lambda\Lambda\big)\beta \qquad (25)$$
can be maximized instead. Here,
$$\Lambda = \mathrm{diag}\big\{|\hat\beta_1|^{-1}, \dots, |\hat\beta_{LM}|^{-1}\big\} \qquad (26)$$
Finally, (20) becomes
$$\beta^{t+1} = \big(B - \lambda\Lambda_t\big)^{-1}\big(B\beta^{t} - \nabla L(\beta^{t})\big) \qquad (27)$$
From the above, it can be seen that the Laplacian prior/regularization term is applied to $\beta$ with $\lambda$ acting as a regularization parameter. The Laplacian prior imposed on the sparse multinomial logistic ELM (SMLELM) controls the complexity of the SMLELM classifier and improves its generalization capacity, since $p(\beta)$ in (22) forces many components of $\beta$ to be zero. Since the term $L(\beta)$ in (16) is not quadratic and $p(\beta)$ in (22) is nonsmooth, solving this $\ell_1$-regularized optimization problem directly is very difficult. The majorization-minimization method [43] has been proposed to decompose this kind of problem [29], [44]-[47]; however, its computational complexity is very large. In [48], logistic regression via variable splitting and an augmented Lagrangian (LORSAL) was used for speed improvement, and variable splitting with the augmented Lagrangian has also been used for acceleration in other work [33], [16], [48]. This algorithm has been proven to greatly reduce the computational complexity [16], [45], [49]. As the variable splitting and augmented Lagrangian approach performs well in these works, we use it to reduce the computation time of the proposed SMLELM by transforming the SMLELM into a new form. Variable splitting is a very simple procedure that consists in creating a new variable [50]. The problem defined in (21) is then equivalent to:
$$(\hat\beta, \hat v) = \arg\min_{\beta, v}\ -L(\beta) + \lambda\|v\|_1 \quad \text{s.t. } \beta = v \qquad (28)$$
This optimization problem can be solved by applying the alternating direction method of multipliers [51] (see also [52] and the references therein). The resulting neural network is called the augmented SMLELM (ASMLELM). Applying the augmented Lagrangian [50] to solve equation (28), the solution of (28) at the $(t+1)$-th iteration can be written as:
$$\beta^{t+1} = \arg\min_{\beta}\ -L(\beta) + \frac{\gamma}{2}\|\beta - v^{t} - b^{t}\|^2 \qquad (29)$$
$$v^{t+1} = \arg\min_{v}\ \lambda\|v\|_1 + \frac{\gamma}{2}\|\beta^{t+1} - v - b^{t}\|^2 \qquad (30)$$
and
$$b^{t+1} = b^{t} - \beta^{t+1} + v^{t+1} \qquad (31)$$
where $\gamma > 0$ is the weight of the augmented Lagrangian term of the ASMLELM. For any $\gamma > 0$, the sequence $\beta^{t}$ converges to a minimizer [45], [48], [50]. For convenience, $\gamma = 10\lambda$ is set.
The $v$-subproblem in (30) has the simple soft-threshold solution [51], [45]:
$$v^{t+1} = \mathrm{sign}(e)\,\max\Big(0,\ |e| - \frac{1}{10}\Big) \qquad (32)$$
where
$$e = \beta^{t+1} - b^{t} \qquad (33)$$
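The complete update cycle (29)-(33) can be sketched as follows. This is an illustrative Python fragment under our own assumptions, not the authors' implementation; in particular, the $\beta$-subproblem (29) is approximated here by a few gradient steps on a user-supplied gradient of $-L$, which is a simplification rather than the exact solver prescribed by the paper.

```python
import numpy as np

def soft_threshold(e, tau):
    """Elementwise soft-threshold operator, Eqs. (32)-(33)."""
    return np.sign(e) * np.maximum(0.0, np.abs(e) - tau)

def asmlelm_updates(grad_negL, beta0, lam, n_iter=50, gamma=None, lr=1e-3):
    """Variable-splitting / augmented-Lagrangian iterations for
    min_beta -L(beta) + lam * ||v||_1  s.t.  beta = v   (Eqs. (28)-(33)).
    grad_negL(beta) must return the gradient of -L at beta."""
    gamma = 10.0 * lam if gamma is None else gamma     # paper sets gamma = 10 * lambda
    beta = np.asarray(beta0, dtype=float).copy()
    v, b = beta.copy(), np.zeros_like(beta)
    for _ in range(n_iter):
        for _ in range(10):                            # approximate beta-update, Eq. (29)
            beta -= lr * (grad_negL(beta) + gamma * (beta - v - b))
        v = soft_threshold(beta - b, lam / gamma)      # v-update, Eqs. (30)/(32)
        b = b - beta + v                               # multiplier update, Eq. (31)
    return beta
```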

After designing the ASMLELM framework, which can be applied to the ELM, the NLELM and the KELM, three new spectral algorithms for HSI classification can be generated. They are named the ASMLELM for the basic ELM (ASMLBELM), the ASMLELM for the NLELM (ASMLNLELM) and the ASMLELM for the KELM (ASMLKELM). The pseudocode for these three methods is shown in Algorithm 1.

Algorithm 1: The ASMLELM for the basic ELM, the NLELM and the KELM
Input: the training sample pairs {x = (x_1, x_2, ..., x_n), Y = (y_1, y_2, ..., y_n)} and the parameter λ; b = 0. L: the number of nodes in the hidden layer. H(·): the sigmoid function. β: the output weight matrix of the hidden layer.
Training phase:
1: Randomly generate the input weights {w_1, ..., w_L} and the biases {b_1, ..., b_L}.
2: For each training sample x_i, calculate the hidden-layer vector H(x_i) = [H_1(w_1^T x_i + b_1), ..., H_L(w_L^T x_i + b_L)]^T ∈ R^{L×1}.
3: Calculate the output weight:
 (1) β = (H^T)^† Y^T for the ASMLBELM.
 (2) Minimize (5); here β = H(I/C + H^T H)^{-1} Y^T for the ASMLNLELM.
 (3) With the Gaussian kernel K_train(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ_ELM²)), compute π = (I/C + K_train)^{-1} Y^T for the ASMLKELM.
4: Represent the ELM by a probability model:
 (1) P(y_{i,j} = 1 | H(x_i), β) = exp(β_j^T H(x_i)) / Σ_k exp(β_k^T H(x_i)) for the ASMLBELM and the ASMLNLELM.
 (2) P(y_{i,j} = 1 | K_train(x_i), π) = exp(π_j^T K_train(x_i)) / Σ_k exp(π_k^T K_train(x_i)) for the ASMLKELM.
5: ASMLELM: the ML estimate based on the sparse representation with the Laplacian prior and the LORSAL algorithm.
 5.1: β̂ = arg max_β β^T(∇L(β̂) − Bβ̂) + (1/2)β^T(B − λΛ_t)β.
 5.2: Set t = 0.
 5.3: Repeat:
 5.4: β^{t+1} = arg min_β −L(β) + (10λ/2)‖β − v^t − b^t‖².
 5.5: v^{t+1} = arg min_v λ‖v‖_1 + (10λ/2)‖β^{t+1} − v − b^t‖².
 5.6: b^{t+1} = b^t − β^{t+1} + v^{t+1}.
 5.7: Increase t to t + 1 (if the ASMLKELM is applied, replace β by π).
 5.8: Quit when the stopping criterion is met.
Prediction phase: test samples {X = (x_1, x_2, ..., x_N), Y = (y_1, y_2, ..., y_N)}.
1: (1) Calculate the output-layer vectors H(x_i) = [H_1(w_1^T x_i + b_1), ..., H_L(w_L^T x_i + b_L)]^T, i = 1, ..., N, for the ASMLBELM and the ASMLNLELM.
 (2) Compute the test kernel K_test(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ_ELM²)) between each test sample x_i and each training sample x_j for the ASMLKELM.
2: (1) P(y_{i,j} = 1 | H(x_i), β) = exp(β_j^T H(x_i)) / Σ_k exp(β_k^T H(x_i)) for the ASMLBELM and the ASMLNLELM.
 (2) P(y_{i,j} = 1 | K_test(x_i), π) = exp(π_j^T K_test(x_i)) / Σ_k exp(π_k^T K_test(x_i)) for the ASMLKELM.

B. Weighted Composite Features Based ASMLELM (ASMLELM-WCFs)

From the above, it can be seen that the ASMLELM uses only the spectral information of the HSI data for classification. However, a pixel and the pixels in its spatial neighborhood are likely to belong to the same class [1], so the spatial information is very important and should be incorporated into the ASMLELM. Hence, the WCFs are used to perform spectral-spatial classification within the proposed ASMLELM. For a given pixel $x_i$ with coordinates $(p, q)$, the local pixel neighborhood centered at $x_i$ is
$$N(x_i) = \{x = (p', q') \mid p' \in [p - a, p + a],\ q' \in [q - a, q + a]\}, \quad a = (w_{opt} - 1)/2$$
where $w_{opt}$ is the width of the neighborhood window. Let $x_i^w$ be the spectral feature of the training sample and $x_i^s$ be the information extracted from the local spatial neighborhood of the pixel $x_i^w$, represented as
$$x_i^s = \frac{\sum_{x_k^w \in N(x_i)} v_k\, x_k^w}{\sum_{x_k^w \in N(x_i)} v_k}$$
with the weight $v_k = \exp\{-z\|x_i^w - x_{ik}^w\|^2\}$ measuring the spectral distance between the central pixel and the neighboring pixels $x_{ik} \in N(x_i)$. Following the setting in [35], we set $z = 0.2$ in this work. Then the hidden-layer output matrix defined in (2) and (13) can be expressed as
$$H = \mu H^w + (1 - \mu) H^s \qquad (34)$$
where
$$H^w = [H^w(x_1^w), \dots, H^w(x_n^w)] \in R^{L \times N} \qquad (35)$$
$$H^s = [H^s(x_1^s), \dots, H^s(x_n^s)] \in R^{L \times N} \qquad (36)$$
and $\mu$ is a combination coefficient balancing the spectral and spatial information. For the KELM defined in (14), $\beta$ becomes
$$\beta = \Big(\frac{I}{C} + K\Big)^{-1} Y^T \qquad (37)$$
where
$$K = \mu K_{H^w} + (1 - \mu) K_{H^s} \qquad (38)$$
$$K_{H^w}(x_i^w, x_j^w) = \exp\Big(-\frac{\|x_i^w - x_j^w\|^2}{2\sigma_w^2}\Big) \qquad (39)$$
$$K_{H^s}(x_i^s, x_j^s) = \exp\Big(-\frac{\|x_i^s - x_j^s\|^2}{2\sigma_s^2}\Big) \qquad (40)$$
Here, $\sigma_w$ and $\sigma_s$ control the widths of the spectral and spatial Gaussian kernels, respectively. Three new methods for spectral-spatial HSI classification can now be proposed by combining the ASMLELM and the WCFs, namely the ASMLBELM-WCFs, the ASMLNLELM-WCFs and the ASMLKELM-WCFs. The details are summarized in Algorithm 2, and a brief code sketch of the spatial feature is given below. Figure 1 shows the flowchart of the proposed ASMLELM-WCFs.
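As referenced above, here is a minimal sketch of the weighted-mean spatial feature. This is our own illustrative code, and it assumes the HSI is stored as a rows x cols x bands array, which is an assumption about data layout not specified in the paper.

```python
import numpy as np

def wcf_spatial_feature(cube, row, col, window=9, z=0.2):
    """Weighted mean filter around pixel (row, col) of an HSI cube (rows x cols x bands).
    Weights v_k = exp(-z * ||x_i - x_k||^2) favour spectrally similar neighbours."""
    a = (window - 1) // 2
    center = cube[row, col]
    r0, r1 = max(0, row - a), min(cube.shape[0], row + a + 1)
    c0, c1 = max(0, col - a), min(cube.shape[1], col + a + 1)
    patch = cube[r0:r1, c0:c1].reshape(-1, cube.shape[2])      # neighbours as rows
    v = np.exp(-z * np.sum((patch - center) ** 2, axis=1))     # spectral-distance weights
    return (v[:, None] * patch).sum(axis=0) / v.sum()          # weighted mean, x_i^s

# The spectral and spatial hidden-layer matrices are then mixed as in Eq. (34):
# H = mu * H_w + (1 - mu) * H_s, with mu set empirically to 0.1 in the paper's experiments.
```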

Algorithm 2: ASMLELM with WCFs (ASMLELM-WCFs)
Input: the spectral features {x^w = (x_1^w, x_2^w, ..., x_n^w), Y = (y_1, y_2, ..., y_n)}, the spatial features {x^s = (x_1^s, x_2^s, ..., x_n^s), Y = (y_1, y_2, ..., y_n)} and the parameters C and λ; b = 0. L: the number of nodes in the hidden layer. H(·): the sigmoid function. β: the output weight matrix of the hidden layer.
Training phase:
1: Randomly generate the input weights {w_1, ..., w_L} and the biases {b_1, ..., b_L}.
2: For each training sample x_i, calculate the hidden-layer vectors H^w(x_i^w) = [H_1(w_1^T x_i^w + b_1), ..., H_L(w_L^T x_i^w + b_L)]^T and H^s(x_i^s) = [H_1(w_1^T x_i^s + b_1), ..., H_L(w_L^T x_i^s + b_L)]^T. Here H = μH^w + (1 − μ)H^s for the ASMLBELM-WCFs, the ASMLNLELM-WCFs and the ASMLKELM-WCFs.
3: Calculate the output weight:
 (1) β = (H^T)^† Y^T for the ASMLBELM-WCFs.
 (2) Minimize (5); here β = H(I/C + H^T H)^{-1} Y^T for the ASMLNLELM-WCFs.
 (3) π = (I/C + K_train)^{-1} Y^T, with K_train = μ(H^w)^T H^w + (1 − μ)(H^s)^T H^s = μK_{H^w} + (1 − μ)K_{H^s}, for the ASMLKELM-WCFs.
4: Represent the ELM by a probability model:
 (1) P(y_{i,j} = 1 | H(x_i), β) = exp(β_j^T H(x_i)) / Σ_k exp(β_k^T H(x_i)) for the ASMLBELM-WCFs and the ASMLNLELM-WCFs.
 (2) P(y_{i,j} = 1 | K_train(x_i), π) = exp(π_j^T K_train(x_i)) / Σ_k exp(π_k^T K_train(x_i)) for the ASMLKELM-WCFs.
5: ASMLELM: the ML estimate based on the sparse representation with the Laplacian prior and the LORSAL algorithm.
 5.1: β̂ = arg max_β β^T(∇L(β̂) − Bβ̂) + (1/2)β^T(B − λΛ_t)β.
 5.2: Set t = 0.
 5.3: Repeat:
 5.4: β^{t+1} = arg min_β −L(β) + (10λ/2)‖β − v^t − b^t‖².
 5.5: v^{t+1} = arg min_v λ‖v‖_1 + (10λ/2)‖β^{t+1} − v − b^t‖².
 5.6: b^{t+1} = b^t − β^{t+1} + v^{t+1}.
 5.7: Increase t to t + 1; quit when the stopping criterion is met.
Prediction phase: the spectral features {X^w = (x_1^w, x_2^w, ..., x_N^w), Y = (y_1, y_2, ..., y_N)} and the spatial features {X^s = (x_1^s, x_2^s, ..., x_N^s), Y = (y_1, y_2, ..., y_N)}.
1: Calculate the output-layer vectors H^w(x_i^w) = [H_1(w_1^T x_i^w + b_1), ..., H_L(w_L^T x_i^w + b_L)]^T and H^s(x_i^s) = [H_1(w_1^T x_i^s + b_1), ..., H_L(w_L^T x_i^s + b_L)]^T. Here,
 (1) H = μH^w + (1 − μ)H^s for the ASMLBELM-WCFs and the ASMLNLELM-WCFs.
 (2) H = μH^w + (1 − μ)H^s and K_test = μ(H^w)^T H_test^w + (1 − μ)(H^s)^T H_test^s = μK_{H^w} + (1 − μ)K_{H^s} for the ASMLKELM-WCFs.
2: (1) P(y_{i,j} = 1 | H(x_i), β) = exp(β_j^T H(x_i)) / Σ_k exp(β_k^T H(x_i)) for the ASMLBELM-WCFs and the ASMLNLELM-WCFs.
 (2) P(y_{i,j} = 1 | K_test(x_i), π) = exp(π_j^T K_test(x_i)) / Σ_k exp(π_k^T K_test(x_i)) for the ASMLKELM-WCFs.

D. The Lower Bound of the ASMLELM

In this section, the lower bound of the proposed ASMLELM is derived. From (19), we have
$$\nabla^2 L(\beta) \succeq B \qquad (41)$$
The matrix $B$ defined in (18) is symmetric, negative definite and independent of $\beta$. $\beta$ at the $(t+1)$-th iteration is defined as
$$\beta^{t+1} = \beta^{t} - (B - \lambda\Lambda_t)^{-1}\nabla L(\beta^{t}) \qquad (42)$$
Equation (42) corresponds to maximizing
$$Q(\beta) = (\beta - \hat\beta)^T\nabla L(\hat\beta) + \frac{1}{2}(\beta - \hat\beta)^T B(\beta - \hat\beta) - \lambda\|\beta\|_1 \qquad (43)$$
From [29], (43) can be bounded from below by
$$Q_1(\beta) = (\beta - \hat\beta)^T\nabla L(\hat\beta) + \frac{1}{2}(\beta - \hat\beta)^T (B - \lambda\Lambda)(\beta - \hat\beta) \qquad (44)$$
Then we have the following two lemmas.

Lemma 1:
(a) $Q_1(\beta)$ is maximized at $\beta^{*} = \hat\beta - (B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta)$.
(b) $Q_1(\beta^{*}) = -\frac{1}{2}\nabla L(\hat\beta)^T (B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta) \geq 0$, where the inequality is strict if $\nabla L(\hat\beta) \neq 0$.
Proof:
(a) Setting $\nabla Q_1(\beta) = \nabla L(\hat\beta) + (B - \lambda\Lambda)(\beta - \hat\beta) = 0$ gives $\beta^{*} = \hat\beta - (B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta)$.
(b) $Q_1(\beta^{*}) = -\big((B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta)\big)^T \nabla L(\hat\beta) + \frac{1}{2}\big((B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta)\big)^T (B - \lambda\Lambda_t)\big((B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta)\big) = -\nabla L(\hat\beta)^T (B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta) + \frac{1}{2}\nabla L(\hat\beta)^T (B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta) = -\frac{1}{2}\nabla L(\hat\beta)^T (B - \lambda\Lambda_t)^{-1}\nabla L(\hat\beta) \geq 0$, where the inequality is strict if $\nabla L(\hat\beta) \neq 0$.

Lemma 2:
(a) Monotonicity: $L(\beta^{t+1}) \geq L(\beta^{t})$.
(b) Convergence: if $L$ is bounded, then, together with the monotonicity in (a), the sequence $\nabla L(\beta^{t})$ converges to 0.
Proof:

(a) For convenience, let $h = -(B - \lambda\Lambda_t)^{-1}\nabla L(\beta^{t})$. Then we have
$$L(\beta^{t+1}) - L(\beta^{t}) = h^T\nabla L(\beta^{t}) + \frac{1}{2}h^T\nabla^2 L(\beta^{t} + \rho h)\,h \ \geq\ h^T\nabla L(\beta^{t}) + \frac{1}{2}h^T(B - \lambda\Lambda_t)\,h \ \geq\ 0.$$
(b) Suppose that $\nabla L(\beta^{t})$ remained bounded away from zero. By part (b) of Lemma 1, the increments of the objective would then be bounded below by a positive value, which contradicts the boundedness of $L$. As a result, the sequence $\nabla L(\beta^{t})$ converges to 0.

Figure 1. The flowchart of the proposed ASMLELM-WCFs framework.

IV. EXPERIMENTS AND ANALYSIS

In this section, the performance of the proposed framework is evaluated on two well-known HSI data sets, the Indian Pines data set and the Pavia University data set, both of which are publicly available.

A. HSI Data Sets

(1) Indian Pines data set: the Indian Pines HSI data set was collected by the AVIRIS sensor in June 1992. The image scene has 145 x 145 pixels and 200 spectral bands after removing 20 water absorption bands. The wavelength ranges from 0.2 μm to 2.4 μm and the spatial resolution is 20 m per pixel. There are 16 classes in total.

(2) Pavia University data set: the Pavia University HSI data set was acquired by the ROSIS instrument over the city of Pavia, Italy. The image scene has 610 x 340 pixels and 103 spectral bands after removing 12 water absorption bands. The spatial resolution is 1.3 m per pixel. There are 9 classes in total.

B. Measurements and Parameter Setting

The parameter settings and the measurements are described below before conducting the experiments. The six proposed methods are compared with state-of-the-art methods including the SVM, the SVM with composite kernels (SVM-CK), LORSAL, KLORSAL, SMLR-SpATV, BELM, NLELM and KELM. The LIBSVM [53] software is used for the implementation of the SVM and the SVM-CK. For the kernel-based methods, i.e. the SVM, the SVM-CK, the KELM, the ASMLKELM and the ASMLKELM-WCFs, the Gaussian kernel is used. The Gaussian kernel parameter σ and the penalty parameter C are automatically tuned by three-fold cross-validation, with $C = 2^p$, $\sigma = 2^q$, $p \in \{1, 2, \dots, 15\}$ and $q \in \{-6, -5, \dots, 1\}$. The other parameters of the SVM and the SVM-CK are set the same as in [1].

The parameters of the LORSAL, the KLORSAL (kernel-based LORSAL) and the SMLR-SpATV (KLORSAL with a weighted Markov random field [54]-[55]) are chosen as in [43]. All experiments are conducted in MATLAB R2015a on a computer with a 2.9 GHz CPU and 32.0 GB of RAM.

(1) For the proposed ASMLBELM, the total number of hidden-layer neurons L and the regularization parameter λ are important parameters; they are evaluated in the next subsection.
(2) For the ASMLNLELM, the parameter C is automatically tuned by three-fold cross-validation. The effects of L and λ are evaluated in the next subsection.
(3) The important parameters of the ASMLKELM are C, σ and λ. C and σ are automatically tuned by three-fold cross-validation, and the effect of λ is evaluated in the next subsection.
(4) C, L, λ and μ are important parameters of the proposed ASMLBELM-WCFs. C is automatically tuned by three-fold cross-validation, and μ is empirically set to 0.1 in the experiments. The effects of L and λ are evaluated in the next subsection.
(5) The proposed ASMLNLELM-WCFs has the parameters L, C, λ and μ. C is automatically tuned by three-fold cross-validation, and μ is empirically set to 0.1. The effects of L and λ are evaluated in the next subsection.
(6) For the ASMLKELM-WCFs, C and σ are automatically tuned by three-fold cross-validation, μ is empirically set to 0.1, and the effect of λ is evaluated in the next subsection.

C. Discussion on Parameters

In this subsection, several important parameters of the proposed methods are evaluated and the performance is compared with that of the BELM and the NLELM. It is worth noting that the window size is set to 9 for the WCF-based methods in both Experiment 1 and Experiment 2, i.e. the width of the neighborhood window is 9. The effect of the window size on the proposed WCF-based methods is evaluated after evaluating the effects of λ and L.

Experiment 1: In Fig. 2, the effect of the parameter λ (λ = 2^a) on the proposed methods is evaluated. For convenience, the number of hidden neurons is set to L = 550 and L = 900 for the Indian Pines data set and the Pavia University data set, respectively. Figs. 2(a) and 2(b) plot the OA as a function of a with 1043 and 3921 training samples (10% and 9% of the available samples) for the Indian Pines and Pavia University data sets, respectively. From Fig. 2, it can be seen that the three spectral and the three spectral-spatial classifiers remain stable and achieve good performance as the parameter a varies. It can also be seen that the proposed spectral classifiers achieve higher accuracy when a is relatively large. Overall, the six proposed classifiers achieve good and stable performance as a varies. For the following experiments, on both the Indian Pines and Pavia University data sets, we set a to -20 for the proposed ASMLBELM-WCFs, ASMLNLELM-WCFs and ASMLKELM-WCFs, and a to -10 for the proposed ASMLBELM and ASMLNLELM. For the proposed ASMLKELM, we set a to -17 on the Indian Pines data set and -13 on the Pavia University data set.

Experiment 2: In Fig. 3, the OA is plotted as a function of the number of hidden-layer neurons L for the proposed methods, the BELM and the NLELM.
From Fig. 3, it can be seen that the ASMLBELM and the ASMLNLELM always achieve higher accuracies than the BELM and the NLELM. It can also be seen that the proposed spectral-spatial classifiers, the ASMLBELM-WCFs and the ASMLNLELM-WCFs, greatly improve on the corresponding spectral classifiers, the ASMLBELM and the ASMLNLELM, because the spatial information is taken into account. Therefore, the ASMLBELM, the ASMLNLELM, the ASMLBELM-WCFs and the ASMLNLELM-WCFs achieve better performance than the BELM and the NLELM for any value of L. For all the following experiments, unless otherwise stated, we set L to 450 for the BELM and the ASMLBELM-based methods (including the ASMLBELM and the ASMLBELM-WCFs), and to 1000 for the NLELM and the ASMLNLELM-based methods (including the ASMLNLELM and the ASMLNLELM-WCFs) on the Indian Pines data set. On the Pavia University data set, we set L to 1100 for the BELM, NLELM, ASMLBELM, ASMLNLELM, ASMLBELM-WCFs and ASMLNLELM-WCFs.

Experiment 3: In this experiment, the effect of the window width for the ASMLBELM-WCFs, the ASMLNLELM-WCFs and the ASMLKELM-WCFs is evaluated. Fig. 4 plots the OA as a function of the window size for these three spectral-spatial classifiers. From Fig. 4, it can be seen that the results of the proposed methods are not very good when the window size is too small, but become very good and stable when the window size is large, which shows the robustness of the spectral-spatial classifiers. For convenience, the window size is set to 13 for both the Indian Pines and the Pavia University data sets in the following experiments.

Fig. 2. Effect of the parameter a for (a) the Indian Pines data set and (b) the Pavia University data set.

Fig. 3. Effect of the total number of hidden-layer neurons L for (a) the Indian Pines data set and (b) the Pavia University data set.

Fig. 4. Effect of the window size for (a) the Indian Pines data set and (b) the Pavia University data set.

Table 1. Classification accuracy using 10% of the labeled samples per class for the Indian Pines data set (the best results are bolded). [Columns: class no., numbers of training and test samples, and the accuracy of each method: SVM, SVM-CK, LORSAL, KLORSAL, SMLR-SpATV, BELM, NLELM, KELM, ASMLBELM, ASMLNLELM, ASMLKELM, ASMLBELM-WCFs, ASMLNLELM-WCFs, ASMLKELM-WCFs; rows: the 16 classes followed by OA, AA and the kappa coefficient.]

Table 2. Classification accuracy using 9% of the labeled samples per class for the Pavia University data set (the best results are bolded). [Columns: class no., numbers of training and test samples, and the accuracy of each method: SVM, SVM-CK, LORSAL, KLORSAL, SMLR-SpATV, BELM, NLELM, KELM, ASMLBELM, ASMLNLELM, ASMLKELM, ASMLBELM-WCFs, ASMLNLELM-WCFs, ASMLKELM-WCFs; rows: the 9 classes followed by OA, AA and the kappa coefficient.]

D. Experimental Results and Comparisons on the Indian Pines Data Set and the Pavia University Data Set

In this subsection, the classification results are evaluated on the Indian Pines data set and the Pavia University data set. Table 1 and Table 2 list the numbers of training and testing samples for the two data sets. For the Indian Pines data set, 10% of the labeled samples are used for training and the remaining samples for testing; similarly, for the Pavia University data set, 9% of the labeled samples are used for training and the rest for testing. The classification accuracies are shown in Table 1 and Table 2. It can be seen that the ASMLBELM, the ASMLNLELM and the ASMLKELM obtain higher accuracies than the BELM, the NLELM and the KELM, respectively. The performance of these three proposed spectral classifiers improves dramatically when the spatial information (WCFs) is considered. Compared with other spectral-spatial classifiers, the ASMLBELM-WCFs, the ASMLNLELM-WCFs and the ASMLKELM-WCFs achieve higher classification accuracies than the SVM-CK and the SMLR-SpATV. Based on the above results, it can be concluded that the six proposed methods achieve very good performance. Fig. 5 and Fig. 6 show the classification maps obtained by the different methods and the ground truth for the Indian Pines and Pavia University data sets.

E. Experimental Results with Different Numbers of Training Samples

The performance of the six proposed methods is evaluated under different numbers of training samples. The number of training samples per class is chosen as 5, 10, 15, 20, 25, 30, 35 and 40; when this number exceeds half of the available samples of a class, half of the samples of that class are used instead. For the parameters of the proposed methods, the settings discussed in the previous subsections are employed.

Fig. 5. Classification maps for the Indian Pines data set with 10% training samples. (a) SVM (OA=85.71). (b) SVM-CK (OA=96.05). (c) LORSAL (OA=73.14). (d) KLORSAL (OA=82.26). (e) SMLR-SpATV (OA=97.50). (f) BELM (OA=77.16). (g) NLELM (OA=79.18). (h) KELM (OA=85.28). (i) ASMLBELM (OA=80.62). (j) ASMLNLELM (OA=80.47). (k) ASMLKELM (OA=85.93). (l) ASMLBELM-WCFs (OA=98.33). (m) ASMLNLELM-WCFs (OA=98.21). (n) ASMLKELM-WCFs (OA=98.57). (o) Ground truth.

Fig. 6. Classification maps for the Pavia University data set with 9% training samples. (a) SVM (OA=89.14). (b) SVM-CK (OA=98.93). (c) LORSAL (OA=77.63). (d) KLORSAL (OA=87.79). (e) SMLR-SpATV (OA=98.88). (f) BELM (OA=90.87). (g) NLELM (OA=90.80). (h) KELM (OA=92.82). (i) ASMLBELM (OA=92.06). (j) ASMLNLELM (OA=91.71). (k) ASMLKELM (OA=93.07). (l) ASMLBELM-WCFs (OA=99.46). (m) ASMLNLELM-WCFs (OA=99.44). (n) ASMLKELM-WCFs (OA=99.69). (o) Ground truth.

From Table 3 and Table 4, it can be seen that the classification accuracies of the three proposed spectral classifiers are better than those of the BELM, the NLELM and the KELM, respectively, and that the three spectral-spatial classification algorithms perform better still. Compared with the other classification methods listed in Table 3 and Table 4, the three spectral-spatial methods achieve higher accuracies than all the other spectral and spectral-spatial methods. An interesting observation in Table 3 and Table 4 is that the classification accuracy of the BELM decreases when the number of training samples increases in most cases; this is because the ill-posed problem is particularly serious in the BELM. Fortunately, the proposed ASMLBELM and ASMLNLELM-WCFs alleviate this problem well. Table 3 and Table 4 also report the execution time of the six proposed algorithms and the other methods when 100 training samples are used. For both the Indian Pines and Pavia University data sets, the three proposed spectral algorithms, i.e. the ASMLBELM, the ASMLNLELM and the ASMLKELM, need more time than the BELM, the NLELM and the KELM, respectively, but less time than the SVM. For the Indian Pines data set, the proposed ASMLBELM-WCFs and ASMLNLELM-WCFs need less time than the SVM-CK and the SMLR-SpATV; although the proposed ASMLKELM-WCFs needs more time than the SMLR-SpATV, it needs less time than the SVM-CK. For the Pavia University data set, the three proposed spectral-spatial algorithms need less time than the SVM-CK and the SMLR-SpATV. Based on the above analysis, the six proposed classification methods achieve very good performance, especially the three spectral-spatial algorithms.

Table 3. Classification accuracy (%) under different numbers of training samples for the Indian Pines data set (the best results are bolded). [Rows: OA, AA and kappa for 5, 10, 15, 20, 25, 30, 35 and 40 training samples per class, plus the execution time in seconds with 100 training samples; columns: SVM, SVM-CK, LORSAL, KLORSAL, SMLR-SpATV, BELM, NLELM, KELM, ASMLBELM, ASMLNLELM, ASMLKELM, ASMLBELM-WCFs, ASMLNLELM-WCFs, ASMLKELM-WCFs.]

Table 4. Classification accuracy (%) under different numbers of training samples for the Pavia University data set (the best results are bolded). [Rows: OA, AA and kappa for 5, 10, 15, 20, 25, 30, 35 and 40 training samples per class, plus the execution time in seconds with 100 training samples; columns: SVM, SVM-CK, LORSAL, KLORSAL, SMLR-SpATV, BELM, NLELM, KELM, ASMLBELM, ASMLNLELM, ASMLKELM, ASMLBELM-WCFs, ASMLNLELM-WCFs, ASMLKELM-WCFs.]

V. CONCLUSION

This paper proposes three spectral algorithms and three spectral-spatial methods, which improve the ELM for HSI classification. First, the ELM is represented by a probabilistic model via the MAP criterion and is then expressed as a concave logarithmic likelihood function whose maximum can be obtained. Second, the Laplacian prior is adopted, which improves the performance of the proposed framework. Third, the LORSAL algorithm is adopted in the proposed framework to further improve efficiency. Fourth, the spatial information is considered and a spectral-spatial framework is employed for HSI classification to further improve the classification accuracy. Finally, the lower bound of the proposed method is derived by a rigorous mathematical proof, which supports the good performance of the six proposed algorithms. For future work, we will focus on improving the computational efficiency of the proposed ASMLKELM through further sparse representation, and on further improving the classification accuracy by resorting to the extended multi-attribute profiles (EMAPs) method [56].

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China and the Training Program for Outstanding Young Teachers in Higher Education Institutions of Guangdong Province.

REFERENCES
[1] Y. Zhou, J. Peng, and C. L. P. Chen, "Extreme learning machine with composite kernels for hyperspectral image classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6.
[2] A. Plaza, J. A. Benediktsson, J. W. Boardman, et al., "Recent advances in techniques for hyperspectral image processing," Remote Sensing of Environment, vol. 113, pp. S110-S122.
[3] G. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Transactions on Information Theory, vol. 14, no. 1.
[4] J. Li, P. R. Marpu, A. Plaza, et al., "Generalized composite kernel framework for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 9.
[5] T. Qiao, J. Ren, et al., "Effective denoising and classification of hyperspectral images using curvelet transform and singular spectrum analysis," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 1.

[6] J. Zabalza, J. Ren, J. Zheng, J. Han, H. Zhao, S. Li, and S. Marshall, "Novel two dimensional singular spectrum analysis for effective feature extraction and data classification in hyperspectral imaging," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 8.
[7] T. Qiao, J. Ren, C. Craigie, Z. Zabalza, C. Maltin, and S. Marshall, "Singular spectrum analysis for improving hyperspectral imaging based beef eating quality evaluation," Computers and Electronics in Agriculture.
[8] J. Zabalza, J. Ren, Z. Wang, H. Zhao, J. Wang, and S. Marshall, "Fast implementation of singular spectrum analysis for effective feature extraction in hyperspectral imaging," IEEE Journal of Selected Topics in Earth Observation and Remote Sensing, vol. 8, no. 6.
[9] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 8.
[10] Q. He, C. Du, Q. Wang, F. Zhuang, and Z. Shi, "A parallel incremental extreme SVM classifier," Neurocomputing, vol. 74.
[11] J. Zabalza, J. Ren, J. Zheng, H. Zhao, C. Qing, Z. Yang, P. Du, and S. Marshall, "Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging," Neurocomputing, vol. 185, pp. 1-10, 2016.
[12] J. Ren, Z. Zabalza, S. Marshall, and J. Zheng, "Effective feature extraction and data reduction with hyperspectral imaging in remote sensing," IEEE Signal Processing Magazine, vol. 31, no. 4.
[13] J. Zabalza, J.-C. Ren, J. Ren, Z. Liu, and S. Marshall, "Structured covariance principal component analysis for real-time onsite feature extraction and dimensionality reduction in hyperspectral imaging," Applied Optics, vol. 53, no. 20.
[14] J. Zabalza, J. Ren, M. Yang, Y. Zhang, J. Wang, S. Marshall, and J. Han, "Novel Folded-PCA for improved feature extraction and data reduction with hyperspectral imaging and SAR in remote sensing," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 93, no. 7.
[15] L. Fang, S. Li, W. Duan, J. Ren, and J. Atli Benediktsson, "Classification of hyperspectral images by exploiting spectral-spatial information of superpixel via multiple kernels," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 12.
[16] J. Li, J. M. Bioucas-Dias, and A. Plaza, "Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 2.
[17] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1.
[18] G. B. Huang, "An insight into extreme learning machines: random neurons, random features and kernels," Cognitive Computation, vol. 6, no. 3.
[19] G. B. Huang, H. Zhou, X. Ding, et al., "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 2.
[20] Y. Wang, F. Cao, and Y. Yuan, "A study on effectiveness of extreme learning machine," Neurocomputing, vol. 74.
[21] H. J. Rong, Y. S. Ong, A. H. Tan, and Z. Zhu, "A fast pruned-extreme learning machine for classification problem," Neurocomputing, vol. 72.
[22] G. B. Huang, X. Ding, and H. Zhou, "Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74.
[23] A. Samat, P. Du, S. Liu, et al., "E2LMs: ensemble extreme learning machines for hyperspectral image classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 4.
[24] Y. Shen, J. Xu, H. Li, et al., "ELM-based spectral-spatial classification of hyperspectral images using bilateral filtering information on spectral band-subsets," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016.
[25] F. Argüello and D. B. Heras, "ELM-based spectral-spatial classification of hyperspectral images using extended morphological profiles and composite feature mappings," International Journal of Remote Sensing, vol. 36, no. 2, 2015.
[26] W. Duan, S. Li, and L. Fang, "Spectral-spatial hyperspectral image classification using superpixel and extreme learning machines," in Proc. Chinese Conference on Pattern Recognition, Springer, Berlin, Heidelberg, 2014.
[27] D. B. Heras, F. Argüello, and P. Quesada-Barriuso, "Exploring ELM-based spatial-spectral classification of hyperspectral images," International Journal of Remote Sensing, vol. 35, no. 2, 2014.
[28] C. Chen, W. Li, H. Su, et al., "Spectral-spatial classification of hyperspectral image based on kernel extreme learning machine," Remote Sensing, vol. 6, no. 6, 2014.
[29] B. Krishnapuram, L. Carin, M. A. T. Figueiredo, et al., "Sparse multinomial logistic regression: fast algorithms and generalization bounds," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6.
[30] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proceedings of the National Academy of Sciences, vol. 100, no. 5.
[31] J. Friedman, T. Hastie, S. Rosset, et al., "Discussion of boosting papers," Annals of Statistics, vol. 32.
[32] A. Y. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance," in Proceedings of the Twenty-First International Conference on Machine Learning, ACM.
[33] M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo, "Fast image recovery using variable splitting and constrained optimization," IEEE Transactions on Image Processing, vol. 19, no. 9, 2010.


More information

Spectral and Spatial Methods for the Classification of Urban Remote Sensing Data

Spectral and Spatial Methods for the Classification of Urban Remote Sensing Data Spectral and Spatial Methods for the Classification of Urban Remote Sensing Data Mathieu Fauvel gipsa-lab/dis, Grenoble Institute of Technology - INPG - FRANCE Department of Electrical and Computer Engineering,

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable

More information

Non-negative Matrix Factorization on Kernels

Non-negative Matrix Factorization on Kernels Non-negative Matrix Factorization on Kernels Daoqiang Zhang, 2, Zhi-Hua Zhou 2, and Songcan Chen Department of Computer Science and Engineering Nanjing University of Aeronautics and Astronautics, Nanjing

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Robust Multiple Estimator Systems for the Analysis of Biophysical Parameters From Remotely Sensed Data

Robust Multiple Estimator Systems for the Analysis of Biophysical Parameters From Remotely Sensed Data IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 43, NO. 1, JANUARY 2005 159 Robust Multiple Estimator Systems for the Analysis of Biophysical Parameters From Remotely Sensed Data Lorenzo Bruzzone,

More information

Classification and Support Vector Machine

Classification and Support Vector Machine Classification and Support Vector Machine Yiyong Feng and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline

More information

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Regularization in Neural Networks

Regularization in Neural Networks Regularization in Neural Networks Sargur Srihari 1 Topics in Neural Network Regularization What is regularization? Methods 1. Determining optimal number of hidden units 2. Use of regularizer in error function

More information

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

NON-FIXED AND ASYMMETRICAL MARGIN APPROACH TO STOCK MARKET PREDICTION USING SUPPORT VECTOR REGRESSION. Haiqin Yang, Irwin King and Laiwan Chan

NON-FIXED AND ASYMMETRICAL MARGIN APPROACH TO STOCK MARKET PREDICTION USING SUPPORT VECTOR REGRESSION. Haiqin Yang, Irwin King and Laiwan Chan In The Proceedings of ICONIP 2002, Singapore, 2002. NON-FIXED AND ASYMMETRICAL MARGIN APPROACH TO STOCK MARKET PREDICTION USING SUPPORT VECTOR REGRESSION Haiqin Yang, Irwin King and Laiwan Chan Department

More information

HYPERSPECTRAL imaging sensors collect hyperspectral

HYPERSPECTRAL imaging sensors collect hyperspectral 1 Hyperspectral Image Spectral-Spatial Feature Extraction via Tensor Principal Component Analysis Yuemei Ren, Liang Liao, Member, IEEE, Stephen John Maybank, Fellow, IEEE, Yanning Zhang, Senior member,

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

Links between Perceptrons, MLPs and SVMs

Links between Perceptrons, MLPs and SVMs Links between Perceptrons, MLPs and SVMs Ronan Collobert Samy Bengio IDIAP, Rue du Simplon, 19 Martigny, Switzerland Abstract We propose to study links between three important classification algorithms:

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction

More information

DATA MINING AND MACHINE LEARNING. Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane

DATA MINING AND MACHINE LEARNING. Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane DATA MINING AND MACHINE LEARNING Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Linear models for regression Regularized

More information

ESANN'2003 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2003, d-side publi., ISBN X, pp.

ESANN'2003 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2003, d-side publi., ISBN X, pp. On different ensembles of kernel machines Michiko Yamana, Hiroyuki Nakahara, Massimiliano Pontil, and Shun-ichi Amari Λ Abstract. We study some ensembles of kernel machines. Each machine is first trained

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Comparing Robustness of Pairwise and Multiclass Neural-Network Systems for Face Recognition

Comparing Robustness of Pairwise and Multiclass Neural-Network Systems for Face Recognition Comparing Robustness of Pairwise and Multiclass Neural-Network Systems for Face Recognition J. Uglov, V. Schetinin, C. Maple Computing and Information System Department, University of Bedfordshire, Luton,

More information

Constrained Optimization and Support Vector Machines

Constrained Optimization and Support Vector Machines Constrained Optimization and Support Vector Machines Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/

More information

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Machine Learning Support Vector Machines. Prof. Matteo Matteucci Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way

More information

Support Vector Machine (continued)

Support Vector Machine (continued) Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need

More information

Self-Tuning Semantic Image Segmentation

Self-Tuning Semantic Image Segmentation Self-Tuning Semantic Image Segmentation Sergey Milyaev 1,2, Olga Barinova 2 1 Voronezh State University sergey.milyaev@gmail.com 2 Lomonosov Moscow State University obarinova@graphics.cs.msu.su Abstract.

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Research on Feature Extraction Method for Handwritten Chinese Character Recognition Based on Kernel Independent Component Analysis

Research on Feature Extraction Method for Handwritten Chinese Character Recognition Based on Kernel Independent Component Analysis Research Journal of Applied Sciences, Engineering and echnology 6(7): 183-187, 013 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 013 Submitted: November 13, 01 Accepted: January 11,

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Linear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction

Linear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the

More information

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 A. Kernels 1. Let X be a finite set. Show that the kernel

More information

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Margin Maximizing Loss Functions

Margin Maximizing Loss Functions Margin Maximizing Loss Functions Saharon Rosset, Ji Zhu and Trevor Hastie Department of Statistics Stanford University Stanford, CA, 94305 saharon, jzhu, hastie@stat.stanford.edu Abstract Margin maximizing

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

An Improved 1-norm SVM for Simultaneous Classification and Variable Selection

An Improved 1-norm SVM for Simultaneous Classification and Variable Selection An Improved 1-norm SVM for Simultaneous Classification and Variable Selection Hui Zou School of Statistics University of Minnesota Minneapolis, MN 55455 hzou@stat.umn.edu Abstract We propose a novel extension

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of

More information

Support Vector Machine Regression for Volatile Stock Market Prediction

Support Vector Machine Regression for Volatile Stock Market Prediction Support Vector Machine Regression for Volatile Stock Market Prediction Haiqin Yang, Laiwan Chan, and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,

More information

Support Vector Machine via Nonlinear Rescaling Method

Support Vector Machine via Nonlinear Rescaling Method Manuscript Click here to download Manuscript: svm-nrm_3.tex Support Vector Machine via Nonlinear Rescaling Method Roman Polyak Department of SEOR and Department of Mathematical Sciences George Mason University

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Song Li 1, Peng Wang 1 and Lalit Goel 1 1 School of Electrical and Electronic Engineering Nanyang Technological University

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

Semi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion

Semi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion Semi-supervised ictionary Learning Based on Hilbert-Schmidt Independence Criterion Mehrdad J. Gangeh 1, Safaa M.A. Bedawi 2, Ali Ghodsi 3, and Fakhri Karray 2 1 epartments of Medical Biophysics, and Radiation

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

Automatic Rank Determination in Projective Nonnegative Matrix Factorization

Automatic Rank Determination in Projective Nonnegative Matrix Factorization Automatic Rank Determination in Projective Nonnegative Matrix Factorization Zhirong Yang, Zhanxing Zhu, and Erkki Oja Department of Information and Computer Science Aalto University School of Science and

More information

SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE

SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE M. Lázaro 1, I. Santamaría 2, F. Pérez-Cruz 1, A. Artés-Rodríguez 1 1 Departamento de Teoría de la Señal y Comunicaciones

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information