Radial Basis Function Network for Multi-task Learning
Xuejun Liao
Department of ECE, Duke University, Durham, NC, USA

Lawrence Carin
Department of ECE, Duke University, Durham, NC, USA

Abstract

We extend radial basis function (RBF) networks to the scenario in which multiple correlated tasks are learned simultaneously, and present the corresponding learning algorithms. We develop the algorithms for learning the network structure, in either a supervised or unsupervised manner. Training data may also be actively selected to improve the network's generalization to test data. Experimental results based on real data demonstrate the advantage of the proposed algorithms and support our conclusions.

1 Introduction

In practical applications, one is frequently confronted with situations in which multiple tasks must be solved. Often these tasks are not independent, implying that what is learned from one task is transferable to another correlated task. By making use of this transferability, each task is made easier to solve. In machine learning, the concept of explicitly exploiting the transferability of expertise between tasks, by learning the tasks simultaneously under a unified representation, is formally referred to as multi-task learning [1]. In this paper we extend radial basis function (RBF) networks [4, 5] to the scenario of multi-task learning and present the corresponding learning algorithms. Our primary interest is to learn the regression models of several data sets, where any given data set may be correlated with some other sets, but not necessarily with all of them. The advantage of multi-task learning is usually manifested when the training set of each individual task is weak, i.e., it does not generalize well to the test data. Our algorithms intend to enhance, in a mutually beneficial way, the weak training sets of multiple tasks, by learning them simultaneously.
Multi-task learning becomes superfluous when the data sets all come from the same generating distribution, since in that case we can simply take the union of them and treat the union as a single task. In the other extreme, when all the tasks are independent, there is no correlation to utilize and we learn each task separately.

The paper is organized as follows. We define the structure of the multi-task RBF network in Section 2 and present the supervised learning algorithm in Section 3. In Section 4 we show how to learn the network structure in an unsupervised manner, and based on this we demonstrate how to actively select the training data, with the goal of improving the
generalization to test data. We perform experimental studies in Section 5 and conclude the paper in Section 6.

2 Multi-Task Radial Basis Function Network

Figure 1 schematizes the radial basis function (RBF) network structure customized to multi-task learning. The network consists of an input layer, a hidden layer, and an output layer. The input layer receives a data point $x = [x_1, \dots, x_d]^T \in \mathbb{R}^d$ and submits it to the hidden layer. Each node at the hidden layer has a localized activation $\phi_n(x) = \phi(\|x - c_n\|, \sigma_n)$, $n = 1, \dots, N$, where $\|\cdot\|$ denotes the vector norm and $\phi_n(\cdot)$ is a radial basis function (RBF) localized around $c_n$, with the degree of localization parameterized by $\sigma_n$. Choosing $\phi(z, \sigma) = \exp(-z^2/\sigma^2)$ gives the Gaussian RBF. The activations of all hidden nodes are weighted and sent to the output layer. Each output node represents a unique task and has its own hidden-to-output weights. The weighted activations of the hidden nodes are summed at each output node to produce the output for the associated task. Denoting $w_k = [w_{k0}, w_{k1}, \dots, w_{kN}]^T$ as the weights connecting the hidden nodes to the $k$-th output node, the output for the $k$-th task, in response to input $x$, takes the form

$f_k(x) = w_k^T \phi(x)$   (1)

where $\phi(x) = [\phi_0(x), \phi_1(x), \dots, \phi_N(x)]^T$ is a column vector containing $N+1$ basis functions, with $\phi_0(x) \equiv 1$ a dummy basis accounting for the bias in Figure 1.

[Figure 1: schematic of the multi-task RBF network, showing the input layer (specified by the data dimensionality), the hidden layer of basis functions $\phi_1, \dots, \phi_N$ plus a bias node (to be learned by the algorithms), and the output layer of task responses $f_1(x), \dots, f_K(x)$ (specified by the number of tasks), connected by the hidden-to-output weights $w_1, \dots, w_K$.]

Figure 1: A multi-task RBF network structure. Each of the output nodes represents a unique task. Each task has its own hidden-to-output weights, but all the tasks share the same hidden nodes. The activation of hidden node $n$ is characterized by a basis function $\phi_n(x) = \phi(\|x - c_n\|, \sigma_n)$. A typical choice of $\phi$ is $\phi(z, \sigma) = \exp(-z^2/\sigma^2)$, which gives the Gaussian RBF.
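As a concrete illustration of the forward computation in (1), the sketch below evaluates a small multi-task RBF network with Gaussian basis functions. The centers, widths, and weights are made-up toy values for illustration, not quantities from the paper:

```python
import numpy as np

def rbf_features(x, centers, widths):
    """Hidden-layer activations phi(x) = [1, phi_1(x), ..., phi_N(x)]^T.

    Gaussian RBF: phi_n(x) = exp(-||x - c_n||^2 / sigma_n^2), with a
    leading constant 1 accounting for the bias."""
    acts = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)
    return np.concatenate(([1.0], acts))

def multitask_rbf_output(x, centers, widths, W):
    """Outputs f_k(x) = w_k^T phi(x) for all K tasks at once.

    W has shape (K, N+1): one weight row per task; all tasks share
    the same hidden nodes (centers, widths)."""
    return W @ rbf_features(x, centers, widths)

# toy example: d=3 inputs, N=2 hidden nodes, K=2 tasks
centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
widths = np.array([1.0, 2.0])
W = np.array([[0.5, 1.0, -1.0],   # task 1 weights (bias weight first)
              [0.0, 2.0, 0.5]])   # task 2 weights
x = np.zeros(3)
f = multitask_rbf_output(x, centers, widths, W)  # responses of both tasks
```

The point of the structure is visible in the shapes: changing `W` re-targets the tasks, while `centers` and `widths` are shared by every task.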
3 Supervised Learning

Suppose we have $K$ tasks and the data set of the $k$-th task is $D_k = \{(x_{k1}, y_{k1}), \dots, (x_{kJ_k}, y_{kJ_k})\}$, where $y_{ki}$ is the target (desired output) of $x_{ki}$. By definition, a given data point $x_{ki}$ is said to be supervised if the associated target $y_{ki}$ is provided, and unsupervised if $y_{ki}$ is not provided. The definition extends similarly to a set of data
Table 1: Learning Algorithm of the Multi-Task RBF Network

Input: $\{(x_{k1}, y_{k1}), \dots, (x_{kJ_k}, y_{kJ_k})\}_{k=1:K}$, $\phi(\cdot, \sigma)$, $\sigma$, and $\rho$; Output: $\phi(\cdot)$ and $\{w_k\}_{k=1}^K$.

1. For $m = 1{:}K$, $n = 1{:}J_m$, $k = 1{:}K$, $i = 1{:}J_k$, compute $\phi^{nm}_{ki} = \phi(\|x_{nm} - x_{ki}\|, \sigma)$;
2. Let $N = 0$, $\phi(\cdot) = 1$, $e_0 = \sum_{k=1}^K [\sum_{i=1}^{J_k} y_{ki}^2 - (J_k + \rho)^{-1} (\sum_{i=1}^{J_k} y_{ki})^2]$; for $k = 1{:}K$, compute $A_k = J_k + \rho$ and $w_k = (J_k + \rho)^{-1} \sum_{i=1}^{J_k} y_{ki}$;
3. For $m = 1{:}K$, $n = 1{:}J_m$: if $\phi^{nm}$ is not marked as "deleted", then for $k = 1{:}K$ compute $c_k = \sum_{i=1}^{J_k} \phi_{ki} \phi^{nm}_{ki}$ and $q_k = \sum_{i=1}^{J_k} (\phi^{nm}_{ki})^2 + \rho - c_k^T A_k^{-1} c_k$; if there exists $k$ such that $q_k = 0$, mark $\phi^{nm}$ as "deleted"; else compute $\delta e(\phi, \phi^{nm})$ using (5);
4. If the candidates $\{\phi^{nm}\}_{n=1:J_m,\, m=1:K}$ are all marked as "deleted", go to 10;
5. Let $(n^*, m^*) = \arg\max_{\phi^{nm}\ \text{not marked as "deleted"}} \delta e(\phi, \phi^{nm})$; mark $\phi^{n^*m^*}$ as "deleted";
6. Tune the RBF parameter: $\sigma_{N+1} = \arg\max_\sigma \delta e(\phi, \phi(\|\cdot - x_{n^*m^*}\|, \sigma))$;
7. Let $\phi_{N+1}(\cdot) = \phi(\|\cdot - x_{n^*m^*}\|, \sigma_{N+1})$; update $\phi(\cdot) \leftarrow [\phi^T(\cdot), \phi_{N+1}(\cdot)]^T$;
8. For $k = 1{:}K$, compute $A_k^{\text{new}}$ and $w_k^{\text{new}}$ respectively by (A-1) and (A-3) in the Appendix; update $A_k \leftarrow A_k^{\text{new}}$, $w_k \leftarrow w_k^{\text{new}}$;
9. Let $e_{N+1} = e_N - \delta e(\phi, \phi_{N+1})$; if the sequence $\{e_n\}_{n=0:(N+1)}$ has converged, go to 10; else update $N \leftarrow N + 1$ and go back to 3;
10. Exit and output $\phi(\cdot)$ and $\{w_k\}_{k=1}^K$.

points. We are interested in learning the functions $f_k(x)$ for the $K$ tasks, based on $\bigcup_{k=1}^K D_k$. The learning is based on minimizing the squared error

$e(\phi, w) = \sum_{k=1}^K \{ \sum_{i=1}^{J_k} (w_k^T \phi_{ki} - y_{ki})^2 + \rho \|w_k\|^2 \}$   (2)

where $\phi_{ki} = \phi(x_{ki})$ for notational simplicity. The regularization terms $\rho \|w_k\|^2$, $k = 1, \dots, K$, are used to prevent singularity of the $A_k$ matrices defined in (3), and $\rho$ is typically set to a small positive number. For fixed $\phi$, the $w_k$'s are solved by minimizing $e(\phi, w)$ with respect to $w_k$, yielding

$w_k = A_k^{-1} \sum_{i=1}^{J_k} y_{ki} \phi_{ki}$ and $A_k = \sum_{i=1}^{J_k} \phi_{ki} \phi_{ki}^T + \rho I$, $k = 1, \dots, K$   (3)

In a multi-task RBF network, the input layer and output layer are respectively specified by the data dimensionality and the number of tasks. We now discuss how to determine the hidden layer (the basis functions $\phi$). Substituting the solutions of the $w_k$'s in (3) into (2) gives

$e(\phi) = \sum_{k=1}^K \sum_{i=1}^{J_k} ( y_{ki}^2 - y_{ki} w_k^T \phi_{ki} )$   (4)

where $e(\phi)$ is a function of $\phi$ only, because the $w_k$'s are now functions of $\phi$ as given by (3). By minimizing $e(\phi)$, we can determine $\phi$.
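The closed-form solution (3) is just $K$ decoupled ridge regressions on a shared design of basis-function activations. A minimal sketch (the helper name and data layout are ours, not the paper's):

```python
import numpy as np

def fit_multitask_weights(Phi_list, y_list, rho):
    """Eq. (3): for each task k, w_k = A_k^{-1} Phi_k^T y_k with
    A_k = Phi_k^T Phi_k + rho*I, where the rows of Phi_list[k] are
    the shared basis activations phi(x_ki)^T for task k."""
    weights = []
    for Phi, y in zip(Phi_list, y_list):
        A = Phi.T @ Phi + rho * np.eye(Phi.shape[1])
        weights.append(np.linalg.solve(A, Phi.T @ y))
    return weights
```

Because the basis functions are shared, only these (N+1)-dimensional solves differ across tasks; the same $\rho$ regularizes each $A_k$ away from singularity.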
Recalling that $\phi_{ki}$ is an abbreviation of $\phi(x_{ki}) = [1, \phi_1(x_{ki}), \dots, \phi_N(x_{ki})]^T$, this amounts to determining $N$, the number of basis functions, and the functional form of each basis function $\phi_n(\cdot)$, $n = 1, \dots, N$. Consider the candidate functions $\{\phi^{nm}(x) = \phi(\|x - x_{nm}\|, \sigma) : n = 1, \dots, J_m,\ m = 1, \dots, K\}$. We learn the RBF network structure by selecting $\phi(\cdot)$ from these candidate functions such that $e(\phi)$ in (4) is minimized. The following theorem tells us how to perform the selection in a sequential way; the proof is given in the Appendix.
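A brute-force rendering of this sequential selection can help fix ideas: candidates are Gaussian RBFs centered at the training points of all tasks, and each step keeps the candidate whose inclusion most reduces the shared error. This is our own simplified sketch: it refits from scratch rather than using the rank-one update, uses one fixed width $\sigma$, and omits the "deleted" bookkeeping and $\sigma$ tuning of Table 1:

```python
import numpy as np

def gaussian_features(X, centers, sigma):
    """Rows are phi(x)^T = [1, exp(-||x - c_1||^2/sigma^2), ...]."""
    ones = np.ones((X.shape[0], 1))
    if len(centers) == 0:
        return ones
    C = np.asarray(centers)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([ones, np.exp(-d2 / sigma ** 2)])

def regularized_error(Phi_list, y_list, rho):
    """e(phi) of Eq. (4), with each w_k plugged in from Eq. (3)."""
    e = 0.0
    for Phi, y in zip(Phi_list, y_list):
        A = Phi.T @ Phi + rho * np.eye(Phi.shape[1])
        w = np.linalg.solve(A, Phi.T @ y)
        e += y @ (y - Phi @ w)
    return e

def greedy_select(X_list, y_list, sigma, rho, n_basis):
    """Greedily add the candidate center (any training point of any
    task) whose Gaussian RBF most reduces the shared-basis error."""
    pool = [x for X in X_list for x in X]
    chosen = []
    for _ in range(n_basis):
        best_e, best_c = None, None
        for c in pool:
            trial = chosen + [c]
            e = regularized_error(
                [gaussian_features(X, trial, sigma) for X in X_list],
                y_list, rho)
            if best_e is None or e < best_e:
                best_e, best_c = e, c
        chosen.append(best_c)
    return chosen
```

Each added center must lower $e(\phi)$ for the union of tasks, which is exactly what Theorem 1 below guarantees and computes cheaply.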
Theorem 1. Let $\phi(x) = [1, \phi_1(x), \dots, \phi_N(x)]^T$ and let $\phi_{N+1}(x)$ be a single basis function. Assume the $A_k$ matrices corresponding to $\phi$ and $[\phi^T, \phi_{N+1}]^T$ are all non-degenerate. Then

$\delta e(\phi, \phi_{N+1}) = e(\phi) - e([\phi^T, \phi_{N+1}]^T) = \sum_{k=1}^K ( c_k^T w_k - \sum_{i=1}^{J_k} y_{ki} \phi_{N+1,ki} )^2 q_k^{-1}$   (5)

where $\phi_{N+1,ki} = \phi_{N+1}(x_{ki})$, $w_k$ and $A_k$ are the same as in (3), and

$c_k = \sum_{i=1}^{J_k} \phi_{ki} \phi_{N+1,ki}$, $d_k = \sum_{i=1}^{J_k} (\phi_{N+1,ki})^2 + \rho$, $q_k = d_k - c_k^T A_k^{-1} c_k$   (6)

By the conditions of the theorem, $A_k^{\text{new}}$ is full rank and hence positive definite by construction. By (A-2) in the Appendix, $q_k^{-1}$ is a diagonal element of $(A_k^{\text{new}})^{-1}$; therefore $q_k^{-1}$ is positive, and by (5) $\delta e(\phi, \phi_{N+1}) > 0$, which means adding $\phi_{N+1}$ to $\phi$ generally makes the squared error decrease. The size of the decrease $\delta e(\phi, \phi_{N+1})$ depends on $\phi_{N+1}$. By sequentially selecting the basis functions that bring the maximum error reduction, we achieve the goal of minimizing $e(\phi)$. The details of the learning algorithm are summarized in Table 1.

4 Active Learning

In the previous section, the data in $D_k$ are supervised (provided with the targets). In this section, we assume the data in $D_k$ are initially unsupervised (only $x$ is available, without access to the associated $y$), and we select a subset from $D_k$ to be supervised (targets acquired) such that the resulting network generalizes well to the remaining data in $D_k$. This approach is generally known as active learning [6]. We first learn the basis functions $\phi$ from the unsupervised data, and based on $\phi$ select the data to be supervised. Both of these steps are based on the following theorem, the proof of which is given in the Appendix.

Theorem 2. Let there be $K$ tasks and let the data set of the $k$-th task be $D_k \cup \widetilde{D}_k$, where $D_k = \{(x_{ki}, y_{ki})\}_{i=1}^{J_k}$ and $\widetilde{D}_k = \{(x_{ki}, y_{ki})\}_{i=J_k+1}^{J_k + \widetilde{J}_k}$. Let there be two multi-task RBF networks, whose output nodes are characterized by $f_k(\cdot)$ and $\widetilde{f}_k(\cdot)$, respectively, for task $k = 1, \dots, K$. The two networks have the same given basis functions (hidden nodes) $\phi(\cdot) = [1, \phi_1(\cdot), \dots, \phi_N(\cdot)]^T$, but different hidden-to-output weights. The weights of $f_k(\cdot)$ are trained with $D_k \cup \widetilde{D}_k$, while the weights of $\widetilde{f}_k(\cdot)$ are trained using $\widetilde{D}_k$.
Then for $k = 1, \dots, K$, the squared errors committed on $D_k$ by $f_k(\cdot)$ and $\widetilde{f}_k(\cdot)$ are related by

$0 \le [\det \Gamma_k]^{-1} \le \lambda_{\max,k}^{-1} \le [\sum_{i=1}^{J_k} (y_{ki} - \widetilde{f}_k(x_{ki}))^2]^{-1} \sum_{i=1}^{J_k} (y_{ki} - f_k(x_{ki}))^2 \le \lambda_{\min,k}^{-1} \le 1$   (7)

where $\Gamma_k = [I + \Phi_k^T (\rho I + \widetilde{\Phi}_k \widetilde{\Phi}_k^T)^{-1} \Phi_k]^2$ with $\Phi_k = [\phi(x_{k1}), \dots, \phi(x_{kJ_k})]$ and $\widetilde{\Phi}_k = [\phi(x_{k,J_k+1}), \dots, \phi(x_{k,J_k+\widetilde{J}_k})]$, and $\lambda_{\max,k}$ and $\lambda_{\min,k}$ are respectively the largest and smallest eigenvalues of $\Gamma_k$.

Specializing Theorem 2 to the case $\widetilde{J}_k = 0$, we have

Corollary 1. Let there be $K$ tasks and let the data set of the $k$-th task be $D_k = \{(x_{ki}, y_{ki})\}_{i=1}^{J_k}$. Let the RBF network, whose output nodes are characterized by $f_k(\cdot)$ for task $k = 1, \dots, K$, have given basis functions (hidden nodes) $\phi(\cdot) = [1, \phi_1(\cdot), \dots, \phi_N(\cdot)]^T$, and let the hidden-to-output weights of task $k$ be trained with $D_k$. Then for $k = 1, \dots, K$, the squared error committed on $D_k$ by $f_k(\cdot)$ is bounded as $0 \le [\det \Gamma_k]^{-1} \le \lambda_{\max,k}^{-1} \le [\sum_{i=1}^{J_k} y_{ki}^2]^{-1} \sum_{i=1}^{J_k} (y_{ki} - f_k(x_{ki}))^2 \le \lambda_{\min,k}^{-1} \le 1$, where $\Gamma_k = (I + \rho^{-1} \Phi_k^T \Phi_k)^2$ with $\Phi_k = [\phi(x_{k1}), \dots, \phi(x_{kJ_k})]$, and $\lambda_{\max,k}$ and $\lambda_{\min,k}$ are respectively the largest and smallest eigenvalues of $\Gamma_k$.

It is evident from the properties of the matrix determinant [7] and the definition of $\Phi_k$ that $\det \Gamma_k = \{\det[\rho I + \Phi_k \Phi_k^T] / \det[\rho I]\}^2 = \{\det[\rho I + \sum_{i=1}^{J_k} \phi_{ki} \phi_{ki}^T] / \det[\rho I]\}^2$.
5 Using (3 we write succinctly detγ = deta ]det(ρi]. We are interested in selecting the basis functions φ that minimize the error, before seeing y s. By Corollary 1 and the equation detγ = deta ]det(ρi], the squared error is lower bounded by J i=1 y i det(ρi] deta ]. Instead of minimizing the error directly, we minimize its lower bound. As det(ρi] L i=1 y i does not depend on φ, this amounts to selecting φ to minimize (deta. o minimize the errors for all tass = 1,K, we select φ to minimize K =1 (deta. he selection proceeds in a sequential manner. Suppose we have selected basis functions φ = 1,φ 1,,φ N ]. he associated A matrices are A = J i=1 φ iφ i + ρi (N+1 (N+1, = 1,,K. Augmenting basis functions to φ,φ N+1 ], the A matrices change to A new = J i=1 φ i,φ N+1 i ] φ i,φ N+1 i ] + ρi (N+ (N+. Using the determinant formula of bloc matrices 7], we get K =1 (detanew = K =1 (q deta, where q is the same as in (6. As A does not depend on φ N+1, the left-hand side is minimized by maximizing K =1 q. he selection is easily implemented by maing the following two minor modifications in able 1: (a in step, compute e 0 = K =1 ln(j + ρ ; in step 3, compute δe(φ, φ nm = K =1 lnq. Employing the logarithm is for gaining additivity and it does not affect the maximization. Based on the basis functions φ determined above, we proceed to selecting data to be supervised and determining the hidden-to-output weights w from the supervised data using the equations in (3. he selection of data is based on an iterative use of the following corollary, which is a specialization of heorem and was originally given in 8]. Corollary Let there be K tass and the data set of the -th tas is D = {(x i,y i } J i=1. Let there be two RBF networs, whose output nodes are characterized by f ( and f + (, respectively, for tas = 1,...,K. he two networs have the same given basis functions φ( = 1,φ 1 (,,φ N ( ], but different hidden-to-output weights. 
The weights of $f_k(\cdot)$ are trained with $D_k$, while the weights of $f_k^+(\cdot)$ are trained using $D_k^+ = D_k \cup \{(x_{k,J_k+1}, y_{k,J_k+1})\}$. Then for $k = 1, \dots, K$, the squared errors committed on $(x_{k,J_k+1}, y_{k,J_k+1})$ by $f_k(\cdot)$ and $f_k^+(\cdot)$ are related by

$[f_k^+(x_{k,J_k+1}) - y_{k,J_k+1}]^2 = [\gamma_k(x_{k,J_k+1})]^{-1} [f_k(x_{k,J_k+1}) - y_{k,J_k+1}]^2$

where $\gamma_k(x_{k,J_k+1}) = [1 + \phi^T(x_{k,J_k+1}) A_k^{-1} \phi(x_{k,J_k+1})]^2$ and $A_k = \rho I + \sum_{i=1}^{J_k} \phi(x_{ki}) \phi^T(x_{ki})$ is the same as in (3).

Two observations are made from Corollary 2. First, if $\gamma_k(x_{k,J_k+1}) \approx 1$, seeing $y_{k,J_k+1}$ does not affect the error on $x_{k,J_k+1}$, indicating that $D_k$ already contains sufficient information about $(x_{k,J_k+1}, y_{k,J_k+1})$. Second, if $\gamma_k(x_{k,J_k+1}) \gg 1$, seeing $y_{k,J_k+1}$ greatly decreases the error on $x_{k,J_k+1}$, indicating that $x_{k,J_k+1}$ is significantly dissimilar (novel) to $D_k$ and must be supervised to reduce the error. Based on Corollary 2, the selection proceeds sequentially. Suppose we have selected data $D_k = \{(x_{ki}, y_{ki})\}_{i=1}^{J_k}$, from which we compute $A_k$. We select the next data point as $x_{k^*, J_{k^*}+1} = \arg\max_{i > J_k,\ k = 1, \dots, K} \gamma_k(x_{ki}) = \arg\max_{i > J_k,\ k = 1, \dots, K} [1 + \phi^T(x_{ki}) A_k^{-1} \phi(x_{ki})]$. After $x_{k^*, J_{k^*}+1}$ is selected, $A_{k^*}$ is updated and the next selection begins. As the iteration advances, $\gamma$ decreases until it reaches convergence. We use (3) to compute $w_k$ from the selected $x$'s and their associated targets $y$, completing the learning of the RBF network.

5 Experimental Results

In this section we compare the multi-task RBF network against single-task RBF networks via experimental studies. We consider three types of RBF networks to learn $K$ tasks, each
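Corollary 2 can be checked numerically with a few lines of linear algebra. In the sketch below (random toy features rather than the school data), retraining with one extra supervised point shrinks the residual at that point by exactly the factor $1 + \phi^T A_k^{-1} \phi$, so the squared error shrinks by $\gamma_k$:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy supervised set for one task: J=5 points, N+1=3 basis activations
Phi = rng.normal(size=(5, 3))            # rows are phi(x_ki)^T
y = rng.normal(size=5)
rho = 0.1
A = Phi.T @ Phi + rho * np.eye(3)
w = np.linalg.solve(A, Phi.T @ y)        # Eq. (3), trained on D_k

# candidate point whose target we might acquire
phi_new = rng.normal(size=3)
y_new = rng.normal()
shrink = 1.0 + phi_new @ np.linalg.solve(A, phi_new)

# retrain on D_k^+ = D_k plus the candidate
A_plus = A + np.outer(phi_new, phi_new)
w_plus = np.linalg.solve(A_plus, Phi.T @ y + y_new * phi_new)

r_before = phi_new @ w - y_new           # residual of f_k at the new point
r_after = phi_new @ w_plus - y_new       # residual of f_k^+ at the new point
# Corollary 2: r_after equals r_before / shrink, so the squared
# error drops by shrink^2 = gamma
```

The selection rule follows directly: supervising the point with the largest `shrink` yields the largest guaranteed error reduction at that point.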
with its own data set $D_k$. In the first, which we call the "one RBF network," we let the $K$ tasks share both the basis functions $\phi$ (hidden nodes) and the hidden-to-output weights $w$; thus we do not distinguish the $K$ tasks, and we design a single RBF network to learn the union of them. The second is the multi-task RBF network, where the $K$ tasks share the same $\phi$ but each has its own $w_k$. In the third, we have $K$ independent networks, each designed for a single task.

We use the school data set from the Inner London Education Authority, consisting of examination records of 15362 students from 139 secondary schools. The data are available online. This data set was originally used to study the effectiveness of schools and has recently been used to evaluate multi-task learning algorithms [2, 3]. The goal is to predict the exam scores of the students based on 9 variables: year of exam (1985, 1986, or 1987), school code (1-139), FSM (percentage of students eligible for free school meals), VR1 band (percentage of students in the school in VR band one), gender, VR band of student (3 categories), ethnic group of student (11 categories), school gender (male, female, or mixed), and school denomination (3 categories). We consider each school a task, leading to 139 tasks in total. The remaining 8 variables are used as inputs to the RBF network. Following [2, 3], we converted each categorical variable to a number of binary variables, resulting in a total of 27 input variables, i.e., $x \in \mathbb{R}^{27}$. The exam score is the target to be predicted.

The three types of RBF networks defined above are designed as follows. The multi-task RBF network is implemented with the structure shown in Figure 1 and trained with the learning algorithm in Table 1. The one RBF network is implemented as a special case of Figure 1, with a single output node, and trained using the union of the supervised data from all 139 schools. We design 139 independent RBF networks, each of which is implemented with a single output node and trained using the supervised data from a single school.
We use the Gaussian RBF $\phi_n(x) = \exp(-\|x - c_n\|^2 / \sigma_n^2)$, where the $c_n$'s are selected from the training data points and the $\sigma_n$'s are initialized to a common value and then optimized as described in Table 1. The main role of the regularization parameter $\rho$ is to prevent the $A_k$ matrices from being singular, and it does not seriously affect the results. In the results reported here, $\rho$ is set to a small positive number. Following [2, 3], we randomly take 75% of the 15362 data points as training (supervised) data and the remaining 25% as test data. The generalization performance is measured by the squared error $(f_k(x_{ki}) - y_{ki})^2$ averaged over all test data $x_{ki}$ of the tasks $k = 1, \dots, K$. We made 10 independent trials of randomly splitting the data into training and test sets; the squared error, averaged over the test data of all 139 schools and over the trials, is shown in Table 2 for the three types of RBF networks.

Table 2: Squared error averaged over the test data of all 139 schools and the 10 independent trials of randomly splitting the school data into training (75%) and test (25%) sets.

Multi-task RBF network: … ± …    Independent RBF networks: … ± …    One RBF network: … ± …

Table 2 clearly shows that the multi-task RBF network outperforms the other two types of RBF networks by a considerable margin. The one RBF network ignores the differences between the tasks, and the independent RBF networks ignore the correlations between the tasks; therefore they both perform inferiorly. The multi-task RBF network uses the shared hidden nodes (basis functions) to capture the common internal representation of the tasks, and meanwhile uses the independent hidden-to-output weights to learn the statistics specific to each task.

We now demonstrate the results of active learning. We use the method in Section 4 to actively split the data into training and test sets using a two-step procedure. First we learn the basis functions $\phi$ of the multi-task RBF network using all 15362 data points (unsupervised). Based on the $\phi$, we then select the data to be supervised and use them as training data to learn
the hidden-to-output weights $w_k$. To make the results comparable, we use the same training data to learn the other two types of RBF networks (including learning their own $\phi$ and $w$). The networks are then tested on the remaining data. Figure 2 shows the results of active learning. Each curve is the squared error averaged over the test data of all 139 schools, as a function of the number of training data. It is clear that the multi-task RBF network maintains its superior performance all the way down to 5000 training data points, whereas the independent RBF networks have their performance degraded seriously as the training data diminish. This demonstrates the increasing advantage of multi-task learning as the number of training data decreases. The one RBF network also seems insensitive to the number of training data, but it ignores the inherent dissimilarity between the tasks, which makes its performance inferior.

[Figure 2: plot of the squared error averaged over the test data versus the number of training (supervised) data, with one curve each for the multi-task RBF network, the independent RBF networks, and the one RBF network.]

Figure 2: Squared error averaged over the test data of all 139 schools, as a function of the number of training (supervised) data. The data are split into training and test sets via active learning.

6 Conclusions

We have presented the structure and learning algorithms for multi-task learning with the radial basis function (RBF) network. By letting multiple tasks share the basis functions (hidden nodes), we impose a common internal representation for correlated tasks. Exploiting the inter-task correlation yields a more compact network structure that has enhanced generalization ability. Unsupervised learning of the network structure enables us to actively split the data into training and test sets. As the data novel to the previously selected ones are selected next, what finally remain unselected, and are to be tested on, are all similar to the selected data constituting the training set. This improves the generalization of the resulting network to the test data.
These conclusions are substantiated via results on real multi-task data.

References

[1] R. Caruana (1997). Multitask learning. Machine Learning, 28:41-75.
[2] B. Bakker and T. Heskes (2003). Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83-99.
[3] T. Evgeniou, C. A. Micchelli, and M. Pontil (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615-637.
[4] M. Powell (1987). Radial basis functions for multivariable interpolation: a review. In J. C. Mason and M. G. Cox, eds., Algorithms for Approximation.
[5] S. Chen, C. F. N. Cowan, and P. M. Grant (1991). Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2(2):302-309.
[6] D. A. Cohn, Z. Ghahramani, and M. I. Jordan (1995). Active learning with statistical models. Advances in Neural Information Processing Systems 7.
[7] V. Fedorov (1972). Theory of Optimal Experiments. Academic Press.
[8] M. Stone (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B, 36:111-147.

Appendix

Proof of Theorem 1. Let $\phi^{\text{new}} = [\phi^T, \phi_{N+1}]^T$. By (3), the $A$ matrices corresponding to $\phi^{\text{new}}$ are

$A_k^{\text{new}} = \sum_{i=1}^{J_k} [\phi_{ki}^T, \phi_{N+1,ki}]^T [\phi_{ki}^T, \phi_{N+1,ki}] + \rho I_{(N+2) \times (N+2)} = \begin{bmatrix} A_k & c_k \\ c_k^T & d_k \end{bmatrix}$   (A-1)

where $c_k$ and $d_k$ are as in (6). By the conditions of the theorem, the matrices $A_k$ and $A_k^{\text{new}}$ are all non-degenerate. Using the block matrix inversion formula [7] we get

$(A_k^{\text{new}})^{-1} = \begin{bmatrix} A_k^{-1} + A_k^{-1} c_k q_k^{-1} c_k^T A_k^{-1} & -A_k^{-1} c_k q_k^{-1} \\ -q_k^{-1} c_k^T A_k^{-1} & q_k^{-1} \end{bmatrix}$   (A-2)

where $q_k$ is as in (6). By (3), the weights $w_k^{\text{new}}$ corresponding to $[\phi^T, \phi_{N+1}]^T$ are

$w_k^{\text{new}} = (A_k^{\text{new}})^{-1} \begin{bmatrix} \sum_{i=1}^{J_k} y_{ki} \phi_{ki} \\ \sum_{i=1}^{J_k} y_{ki} \phi_{N+1,ki} \end{bmatrix} = \begin{bmatrix} w_k + A_k^{-1} c_k q_k^{-1} g_k \\ -q_k^{-1} g_k \end{bmatrix}$   (A-3)

with $g_k = c_k^T w_k - \sum_{i=1}^{J_k} y_{ki} \phi_{N+1,ki}$. Hence, $(\phi^{\text{new}}_{ki})^T w_k^{\text{new}} = \phi_{ki}^T w_k + (\phi_{ki}^T A_k^{-1} c_k - \phi_{N+1,ki}) q_k^{-1} g_k$, which is put into (4) to get $e(\phi^{\text{new}}) = \sum_{k=1}^K \sum_{i=1}^{J_k} [ y_{ki}^2 - y_{ki} (\phi^{\text{new}}_{ki})^T w_k^{\text{new}} ] = \sum_{k=1}^K \sum_{i=1}^{J_k} [ y_{ki}^2 - y_{ki} \phi_{ki}^T w_k - y_{ki} (\phi_{ki}^T A_k^{-1} c_k - \phi_{N+1,ki}) q_k^{-1} g_k ] = e(\phi) - \sum_{k=1}^K ( c_k^T w_k - \sum_{i=1}^{J_k} y_{ki} \phi_{N+1,ki} )^2 q_k^{-1}$, where in arriving at the last equality we have used (3) and (4) and $g_k = c_k^T w_k - \sum_{i=1}^{J_k} y_{ki} \phi_{N+1,ki}$. The theorem is proved.

Proof of Theorem 2. The proof applies to each $k = 1, \dots, K$. For any given $k$, define $\Phi = [\phi(x_{k1}), \dots, \phi(x_{kJ_k})]$, $\widetilde{\Phi} = [\phi(x_{k,J_k+1}), \dots, \phi(x_{k,J_k+\widetilde{J}_k})]$, $y = [y_{k1}, \dots, y_{kJ_k}]^T$, $\widetilde{y} = [y_{k,J_k+1}, \dots, y_{k,J_k+\widetilde{J}_k}]^T$, $f = [f_k(x_{k1}), \dots, f_k(x_{kJ_k})]^T$, $\widetilde{f} = [\widetilde{f}_k(x_{k1}), \dots, \widetilde{f}_k(x_{kJ_k})]^T$, and $\widetilde{A} = \rho I + \widetilde{\Phi} \widetilde{\Phi}^T$.
By (1), (3), and the conditions of the theorem,

$f = \Phi^T (\widetilde{A} + \Phi \Phi^T)^{-1} (\Phi y + \widetilde{\Phi} \widetilde{y}) \stackrel{(a)}{=} (I + \Phi^T \widetilde{A}^{-1} \Phi)^{-1} \Phi^T \widetilde{A}^{-1} (\Phi y + \widetilde{\Phi} \widetilde{y}) \stackrel{(b)}{=} (I + \Phi^T \widetilde{A}^{-1} \Phi)^{-1} (\Phi^T \widetilde{A}^{-1} \Phi\, y + \widetilde{f}) = (I + \Phi^T \widetilde{A}^{-1} \Phi)^{-1} [(\Phi^T \widetilde{A}^{-1} \Phi + I - I) y + \widetilde{f}] = y + (I + \Phi^T \widetilde{A}^{-1} \Phi)^{-1} (\widetilde{f} - y)$,

where equation (a) is due to the Sherman-Morrison-Woodbury formula and equation (b) results because $\widetilde{f} = \Phi^T \widetilde{A}^{-1} \widetilde{\Phi} \widetilde{y}$. Hence, $f - y = (I + \Phi^T \widetilde{A}^{-1} \Phi)^{-1} (\widetilde{f} - y)$, which gives

$\sum_{i=1}^{J_k} (y_{ki} - f_k(x_{ki}))^2 = (f - y)^T (f - y) = (\widetilde{f} - y)^T \Gamma_k^{-1} (\widetilde{f} - y)$   (A-4)

where $\Gamma_k = [I + \Phi^T \widetilde{A}^{-1} \Phi]^2 = [I + \Phi^T (\rho I + \widetilde{\Phi} \widetilde{\Phi}^T)^{-1} \Phi]^2$. By construction, $\Gamma_k$ has all its eigenvalues no less than 1, i.e., $\Gamma_k = E^T \mathrm{diag}[\lambda_{1,k}, \dots, \lambda_{J_k,k}] E$ with $E^T E = I$ and $\lambda_{1,k}, \dots, \lambda_{J_k,k} \ge 1$, which makes the first, second, and last inequalities in (7) hold. Using this expansion of $\Gamma_k$ in (A-4) we get

$\sum_{i=1}^{J_k} (f_k(x_{ki}) - y_{ki})^2 = (\widetilde{f} - y)^T E^T \mathrm{diag}[\lambda_{1,k}^{-1}, \dots, \lambda_{J_k,k}^{-1}] E (\widetilde{f} - y) \le (\widetilde{f} - y)^T E^T [\lambda_{\min,k}^{-1} I] E (\widetilde{f} - y) = \lambda_{\min,k}^{-1} \sum_{i=1}^{J_k} (\widetilde{f}_k(x_{ki}) - y_{ki})^2$   (A-5)

where the inequality results because $\lambda_{\min,k} = \min(\lambda_{1,k}, \dots, \lambda_{J_k,k})$. From (A-5) follows the fourth inequality in (7). The third inequality in (7) can be proven in a similar way.
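The pivotal identity in this proof, $f - y = (I + \Phi^T \widetilde{A}^{-1} \Phi)^{-1} (\widetilde{f} - y)$, is easy to confirm numerically; the sketch below uses random toy matrices (dimensions chosen arbitrarily) in place of actual basis activations:

```python
import numpy as np

rng = np.random.default_rng(1)
J, J_tilde, p = 4, 6, 3                    # |D_k|, |D~_k|, N+1 (toy sizes)
Phi = rng.normal(size=(p, J))              # columns are phi(x_ki), i <= J
Phi_t = rng.normal(size=(p, J_tilde))      # columns for the D~_k points
y = rng.normal(size=J)
y_t = rng.normal(size=J_tilde)
rho = 0.5

A_t = rho * np.eye(p) + Phi_t @ Phi_t.T            # A~ = rho*I + Phi~ Phi~^T
f_t = Phi.T @ np.linalg.solve(A_t, Phi_t @ y_t)    # f~: trained on D~_k only
w_full = np.linalg.solve(A_t + Phi @ Phi.T, Phi @ y + Phi_t @ y_t)
f = Phi.T @ w_full                                 # f: trained on D_k and D~_k

M = np.eye(J) + Phi.T @ np.linalg.solve(A_t, Phi)  # I + Phi^T A~^{-1} Phi
lhs = f - y                                        # residuals on D_k
rhs = np.linalg.solve(M, f_t - y)                  # the claimed identity
```

Since `M` is symmetric with eigenvalues at least 1, the squared-error ratio of (7) falls between the reciprocals of the largest and smallest eigenvalues of `M @ M`, which can be checked on the same toy data.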
Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationSlide05 Haykin Chapter 5: Radial-Basis Function Networks
Slide5 Haykin Chapter 5: Radial-Basis Function Networks CPSC 636-6 Instructor: Yoonsuck Choe Spring Learning in MLP Supervised learning in multilayer perceptrons: Recursive technique of stochastic approximation,
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationRadial Basis Function Networks. Ravi Kaushik Project 1 CSC Neural Networks and Pattern Recognition
Radial Basis Function Networks Ravi Kaushik Project 1 CSC 84010 Neural Networks and Pattern Recognition History Radial Basis Function (RBF) emerged in late 1980 s as a variant of artificial neural network.
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationIntroduction: The Perceptron
Introduction: The Perceptron Haim Sompolinsy, MIT October 4, 203 Perceptron Architecture The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More information10-701/15-781, Machine Learning: Homework 4
10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More informationGaussian Mixture Distance for Information Retrieval
Gaussian Mixture Distance for Information Retrieval X.Q. Li and I. King fxqli, ingg@cse.cuh.edu.h Department of omputer Science & Engineering The hinese University of Hong Kong Shatin, New Territories,
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationLecture 10: Support Vector Machine and Large Margin Classifier
Lecture 10: Support Vector Machine and Large Margin Classifier Applied Multivariate Analysis Math 570, Fall 2014 Xingye Qiao Department of Mathematical Sciences Binghamton University E-mail: qiao@math.binghamton.edu
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationBasis Expansion and Nonlinear SVM. Kai Yu
Basis Expansion and Nonlinear SVM Kai Yu Linear Classifiers f(x) =w > x + b z(x) = sign(f(x)) Help to learn more general cases, e.g., nonlinear models 8/7/12 2 Nonlinear Classifiers via Basis Expansion
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationCSE 546 Final Exam, Autumn 2013
CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,
More informationMachine Learning Lecture Notes
Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective
More informationKernel Method: Data Analysis with Positive Definite Kernels
Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University
More informationAn Algorithm for Transfer Learning in a Heterogeneous Environment
An Algorithm for Transfer Learning in a Heterogeneous Environment Andreas Argyriou 1, Andreas Maurer 2, and Massimiliano Pontil 1 1 Department of Computer Science University College London Malet Place,
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationSolutions Preliminary Examination in Numerical Analysis January, 2017
Solutions Preliminary Examination in Numerical Analysis January, 07 Root Finding The roots are -,0, a) First consider x 0 > Let x n+ = + ε and x n = + δ with δ > 0 The iteration gives 0 < ε δ < 3, which
More informationLMS Algorithm Summary
LMS Algorithm Summary Step size tradeoff Other Iterative Algorithms LMS algorithm with variable step size: w(k+1) = w(k) + µ(k)e(k)x(k) When step size µ(k) = µ/k algorithm converges almost surely to optimal
More informationA Tutorial on Support Vector Machine
A Tutorial on School of Computing National University of Singapore Contents Theory on Using with Other s Contents Transforming Theory on Using with Other s What is a classifier? A function that maps instances
More informationFinal Exam. 1 True or False (15 Points)
10-606 Final Exam Submit by Oct. 16, 2017 11:59pm EST Please submit early, and update your submission if you want to make changes. Do not wait to the last minute to submit: we reserve the right not to
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 6 Jan-Willem van de Meent (credit: Yijun Zhao, Chris Bishop, Andrew Moore, Hastie et al.) Project Project Deadlines 3 Feb: Form teams of
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationAn artificial neural networks (ANNs) model is a functional abstraction of the
CHAPER 3 3. Introduction An artificial neural networs (ANNs) model is a functional abstraction of the biological neural structures of the central nervous system. hey are composed of many simple and highly
More informationRobust Fisher Discriminant Analysis
Robust Fisher Discriminant Analysis Seung-Jean Kim Alessandro Magnani Stephen P. Boyd Information Systems Laboratory Electrical Engineering Department, Stanford University Stanford, CA 94305-9510 sjkim@stanford.edu
More informationCLOSE-TO-CLEAN REGULARIZATION RELATES
Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationSINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES
SINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES JIANG ZHU, SHILIANG SUN Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 20024, P. R. China E-MAIL:
More informationWhen Fisher meets Fukunaga-Koontz: A New Look at Linear Discriminants
When Fisher meets Fukunaga-Koontz: A New Look at Linear Discriminants Sheng Zhang erence Sim School of Computing, National University of Singapore 3 Science Drive 2, Singapore 7543 {zhangshe, tsim}@comp.nus.edu.sg
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationClassifier Complexity and Support Vector Classifiers
Classifier Complexity and Support Vector Classifiers Feature 2 6 4 2 0 2 4 6 8 RBF kernel 10 10 8 6 4 2 0 2 4 6 Feature 1 David M.J. Tax Pattern Recognition Laboratory Delft University of Technology D.M.J.Tax@tudelft.nl
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationMachine Learning Lecture Notes
Machine Learning Lecture Notes Predrag Radivojac February 2, 205 Given a data set D = {(x i,y i )} n the objective is to learn the relationship between features and the target. We usually start by hypothesizing
More informationMassachusetts Institute of Technology
Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More informationHeterogeneous Multitask Learning with Joint Sparsity Constraints
Heterogeneous ultitas Learning with Joint Sparsity Constraints Xiaolin Yang Department of Statistics Carnegie ellon University Pittsburgh, PA 23 xyang@stat.cmu.edu Seyoung Kim achine Learning Department
More informationCSC Neural Networks. Perceptron Learning Rule
CSC 302 1.5 Neural Networks Perceptron Learning Rule 1 Objectives Determining the weight matrix and bias for perceptron networks with many inputs. Explaining what a learning rule is. Developing the perceptron
More informationThe Expectation-Maximization Algorithm
The Expectation-Maximization Algorithm Francisco S. Melo In these notes, we provide a brief overview of the formal aspects concerning -means, EM and their relation. We closely follow the presentation in
More informationNonlinear Statistical Learning with Truncated Gaussian Graphical Models
Nonlinear Statistical Learning with Truncated Gaussian Graphical Models Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin Department of Electrical & Computer Engineering, Duke University Presented
More informationLeast Squares SVM Regression
Least Squares SVM Regression Consider changing SVM to LS SVM by making following modifications: min (w,e) ½ w 2 + ½C Σ e(i) 2 subject to d(i) (w T Φ( x(i))+ b) = e(i), i, and C>0. Note that e(i) is error
More informationModeling human function learning with Gaussian processes
Modeling human function learning with Gaussian processes Thomas L. Griffiths Christopher G. Lucas Joseph J. Williams Department of Psychology University of California, Berkeley Berkeley, CA 94720-1650
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,
Sparse Kernel Canonical Correlation Analysis Lili Tan and Colin Fyfe 2, Λ. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. 2. School of Information and Communication
More informationMatrix Support Functional and its Applications
Matrix Support Functional and its Applications James V Burke Mathematics, University of Washington Joint work with Yuan Gao (UW) and Tim Hoheisel (McGill), CORS, Banff 2016 June 1, 2016 Connections What
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationA GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong
A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES Wei Chu, S. Sathiya Keerthi, Chong Jin Ong Control Division, Department of Mechanical Engineering, National University of Singapore 0 Kent Ridge Crescent,
More information