Neural Network Learning as an Inverse Problem

VĚRA KŮRKOVÁ, Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, Prague 8, Czech Republic.

Abstract

The capability of generalization in learning of neural networks from examples can be modelled using regularization, which has been developed as a tool for improving the stability of solutions of inverse problems. Such problems are typically described by integral operators. It is shown that learning from examples can be reformulated as an inverse problem defined by an evaluation operator. This reformulation leads to an analytical description of an optimal input/output function of a network with kernel units, which can be employed to design a learning algorithm based on a numerical solution of a system of linear equations.

Keywords: learning from data, generalization, empirical error functional, inverse problem, evaluation operator, kernel methods

1 Introduction

Today's technology provides many measuring and recording devices, which supply us with a huge amount of data. Unfortunately, such machine-made data are not comprehensible for our brains. We resemble king Midas, whose foolish wish to transform everything he touched into gold was fulfilled, which led to his starvation as his body could not be fed by gold. So we turn again to machines (computers) to help us transform raw data into patterns that our brains can digest.

As Popper [24] emphasized, no patterns can be derived solely from empirical data. Some hypotheses about patterns have to be chosen and, among patterns satisfying these hypotheses, a pattern with a good fit to the data has to be searched for. The history of science presents many examples of this approach, e.g., Kepler's assumption that planets move on the most perfect curves fitting the data collected by Tycho Brahe. Kepler found that among perfect curves, ellipses are the ones best fitting the data (better than circles, which he tried first). Also Gauss and Legendre searched for curves fitting astronomical data collected from observations of comets. They developed the least square method. In 1805, Legendre wrote: "of all the principles that can be proposed, I think that there is none more general, more exact and more easy to apply than that consisting of minimizing the sum of the squares of errors." Many researchers of later generations shared Legendre's enthusiasm and used the least square method in statistical inference, pattern recognition, function approximation, curve or surface fitting, etc. In all these techniques, minimization of the average of the squares of errors is used to find coefficients of a linear combination of some fixed family of functions. Thus the best fitting function is searched for in a linear hypothesis space.

However, the assumption that empirical data can be approximated by a linear function is rather restrictive. In particular, it is not suitable for high-dimensional data. It is known from the theory of linear approximation that the dimension of a linear space needed for approximation within accuracy ε is $O\left(\left(\frac{1}{\varepsilon}\right)^d\right)$, where d denotes the number of variables [21]. So the complexity of linear models grows exponentially with the data dimension d.

In contrast to the traditional applications of the least square method, in neurocomputing this method is applied to nonlinear hypothesis sets. The hypothesis sets are formed by linear combinations of parameterized functions, the form of which is given by the type of computational units from which a network is built. Originally the units, called perceptrons, modelled some simplified properties of neurons, but later also other types of units (radial and kernel units) with suitable mathematical properties for function approximation became popular. The back-propagation algorithm, developed by Werbos in the 1970s and reinvented by Rumelhart, Hinton and Williams in the 1980s (see [29]), calculates how the gradient of the average square error depends on the coefficients of the linear combination as well as on the inner parameters of the functions forming the linear combination. The algorithm iteratively modifies all network parameters until a sufficiently well fitting input/output function of the network is found.

Neurocomputing also brought a new terminology to data analysis: searching for parameters of input/output functions is called learning, samples of data are called training sets, and the capability to satisfactorily process new data that have not been used for learning is called generalization. The capability of generalization depends on the choice of a hypothesis set of input/output functions, where one searches for a pattern (a functional relationship) fitting the empirical data. So a restriction of the hypothesis set to only physically meaningful functions can improve generalization.

An alternative approach to modelling of generalization was proposed by Poggio and Girosi [22]. They modified the least square method by Tikhonov's regularization, which adds to the least square error a term, called a stabilizer, penalizing undesired input/output functions. Girosi, Jones and Poggio considered stabilizers penalizing high frequencies in the Fourier representation of the hypothetical solutions [11]. Girosi [10] showed that stabilizers of this type belong to a wider class formed by the squares of norms on a special type of Hilbert spaces called reproducing kernel Hilbert spaces (RKHS). Such norms can play the role of measures of various types of oscillations and thus enable modelling a variety of prior knowledge (conceptual data), which has to be added to the empirical data to guarantee a generalization capability.

RKHSs were formally defined by Aronszajn [2], but their theory includes many classical results on positive definite functions, matrices and integral operators with kernels. In data analysis, kernels were first used by Parzen [19] and Wahba [28], who applied them to data smoothing by splines. Aizerman, Braverman and Rozonoer [1] used kernels (under the name potential functions) to solve classification tasks by transforming the geometry of input spaces by embedding them into higher dimensional inner product spaces [1].
Boser, Guyon and Vapnik [5] and Cortes and Vapnik [6] further developed this method of classification into the concept of the support vector machine (a one-hidden-layer network with kernel units in the hidden layer and one threshold output unit). A variety of kernel methods and algorithms are of current interest, and their potential uses are wide ranging; see, e.g., the recent applications-oriented monographs by Schölkopf and Smola [25] and by Cristianini and Shawe-Taylor [7], a theoretical article by Cucker and Smale [8], or a brief survey by Poggio and Smale [23].

So the use of kernels in machine learning has two reasons: (1) norms on RKHSs can play the role of measures of undesirable attributes of input/output functions, the penalization of which improves generalization, and (2) kernels define embeddings of input spaces into feature spaces with geometries that allow the data to be classified to be separated linearly. In this paper, we add a third reason to these two. We show that Aronszajn's definition [2] of RKHSs (as Hilbert spaces with all evaluation functionals continuous) determines precisely the class of Hilbert spaces where solutions of the least square problem of finding input/output functions fitting empirical data (minimization of empirical error functionals) can be obtained as Moore-Penrose pseudosolutions of linear inverse problems.

Tasks of finding unknown causes (such as shapes of functions, forces or distributions) from known consequences (measured data) have been studied in applied science (such as acoustics, geophysics and computerized tomography; see, e.g., [13]) under the name of inverse problems. To solve such a problem, one needs to know how the unknown causes determine the known consequences, which can often be described in terms of operators. In problems originating from physics, such operators are typically integral ones (such as those defining the Radon or Laplace transforms [3], [9]).

In this paper, we reformulate minimization of the least square error as an inverse problem defined by an evaluation operator. A crucial property for an application of tools from the theory of inverse problems is the continuity of the operator. As the evaluation operator is defined in terms of the evaluation functionals at the sample of input data, the class of Hilbert spaces where the theory can be applied corresponds exactly to the class of reproducing kernel Hilbert spaces. Thus the reformulation of a learning problem as an inverse problem justifies the use of RKHSs as suitable hypothesis spaces. We show that this reformulation leads to an analytical description of the optimal input/output function of a network with kernel units, which can be employed to design a learning algorithm based on a numerical solution of a system of linear equations.

The paper is organized as follows. In Section 2, learning from data is reformulated as an inverse problem. Section 3 describes properties of reproducing kernel Hilbert spaces. In Sections 4 and 5, the main results on functions minimizing an empirical error functional and its regularizations are presented.

2 Learning from empirical data as an inverse problem

A standard approach to learning from data, used, e.g., in neurocomputing in the back-propagation algorithm, is based on minimization of the least square error. For a sample of input/output pairs of data (training set) $z = \{(u_i, v_i) \in \Omega \times \mathbb{R},\ i = 1, \ldots, m\}$, where $\Omega \subseteq \mathbb{R}^d$, a functional $E_z$ called the empirical error is defined for a function $f: \Omega \to \mathbb{R}$ as

$$E_z(f) = \frac{1}{m} \sum_{i=1}^{m} \left( f(u_i) - v_i \right)^2.$$

We show that minimization of this functional can be reformulated as an inverse problem. For a linear operator $A: X \to Y$ between two Hilbert spaces X, Y (in the finite-dimensional case, a matrix A), an inverse problem determined by A is to find for $g \in Y$ some $f \in X$ such that

$$A(f) = g,$$

where g is called a datum and f a solution [3].
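As a concrete illustration of the empirical error functional $E_z$ defined above, here is a minimal Python sketch; the one-dimensional sample z = (u, v) and the trial function f are hypothetical, chosen only for illustration.

```python
import numpy as np

def empirical_error(f, u, v):
    """Empirical error E_z(f) = (1/m) * sum_i (f(u_i) - v_i)^2 for a sample z = (u, v)."""
    m = len(u)
    residuals = np.array([f(ui) for ui in u]) - v
    return np.dot(residuals, residuals) / m

# Hypothetical one-dimensional training set and a trial hypothesis f.
u = np.array([0.0, 0.5, 1.0, 1.5])
v = np.array([0.1, 0.4, 1.1, 1.4])
f = lambda x: x                      # the identity function as a candidate
print(empirical_error(f, u, v))      # 0.01 for this sample
```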

When for some $g \in Y$ no solution exists, one can at least search for a pseudosolution $f^o$, for which $A(f^o)$ is a best approximation to g among elements of the range of A, i.e.,

$$\| A(f^o) - g \|_Y = \min_{f \in X} \| A(f) - g \|_Y.$$

Using standard terminology from optimization theory, for a functional $\Phi: X \to \mathbb{R}$ and $M \subseteq X$ we denote by $(M, \Phi)$ the problem of minimization of $\Phi$ over M; the set M is called a hypothesis set. An element $f^o \in M$ such that

$$\Phi(f^o) = \min_{f \in M} \Phi(f)$$

is called a solution of the problem $(M, \Phi)$, and

$$\operatorname{argmin}(M, \Phi) = \{ f^o \in M : \Phi(f^o) = \min_{f \in M} \Phi(f) \}$$

denotes the set of all solutions of $(M, \Phi)$. So a pseudosolution is a solution of the problem $(X, \| A(\cdot) - g \|_Y)$. The set $\operatorname{argmin}(X, \| A(\cdot) - g \|_Y)$ is convex and, hence, if it is nonempty, there exists a unique pseudosolution of minimal norm, called the normal pseudosolution [12]. This normal pseudosolution, denoted by $f^+$, satisfies

$$\| f^+ \|_X = \min \{ \| f^o \|_X : f^o \in \operatorname{argmin}(X, \| A(\cdot) - g \|_Y) \}.$$

When for every $g \in Y$ the set $\operatorname{argmin}(X, \| A(\cdot) - g \|_Y)$ is non-empty (and thus there exists a normal pseudosolution $f^+$), a pseudoinverse operator $A^+: Y \to X$ can be defined by setting $A^+(g) = f^+$. In 1920, Moore [18] described properties of the pseudoinverse of a matrix, which were rediscovered in 1955 by Penrose [20]. In the 1970s, the theory of Moore-Penrose pseudoinversion was extended to the infinite-dimensional case: it was shown that properties similar to those of Moore-Penrose pseudoinverses of matrices are also possessed by continuous linear operators with closed ranges [12].

We can employ the theory of inverse problems in the investigation of minimization of an empirical error functional. To reformulate minimization of such a functional as an inverse problem, for an input data vector $u = (u_1, \ldots, u_m) \in \Omega^m$ and a Hilbert space X of functions on $\Omega$, define an evaluation operator $L_u: X \to \mathbb{R}^m$ by

$$L_u(f) = \left( \frac{f(u_1)}{\sqrt{m}}, \ldots, \frac{f(u_m)}{\sqrt{m}} \right).$$
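The following is a minimal finite-dimensional sketch of the normal pseudosolution; the rank-deficient matrix A and the data g are hypothetical. The Moore-Penrose pseudoinverse yields the least-squares solution of smallest norm.

```python
import numpy as np

# Hypothetical rank-deficient matrix A and data g: A(f) = g has no exact solution.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
g = np.array([1.0, 1.0])

f_plus = np.linalg.pinv(A) @ g                 # normal pseudosolution f+ = A^+(g)
residual = np.linalg.norm(A @ f_plus - g)      # distance of g from the range of A

# Any other minimizer differs from f+ by a null-space direction and has a larger norm.
f_other = f_plus + np.array([-2.0, 1.0, 0.0])  # (-2, 1, 0) lies in the null space of A
print(abs(np.linalg.norm(A @ f_other - g) - residual) < 1e-12)   # True: same fit
print(np.linalg.norm(f_plus) < np.linalg.norm(f_other))          # True: f+ has minimal norm
```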

It is easy to check that for every f on $\Omega$,

$$E_z(f) = \frac{1}{m} \sum_{i=1}^{m} \left( f(u_i) - v_i \right)^2 = \left\| L_u(f) - \frac{v}{\sqrt{m}} \right\|_2^2. \qquad (2.1)$$

So $E_z$ can be represented as

$$E_z = \left\| L_u(\cdot) - \frac{v}{\sqrt{m}} \right\|_2^2,$$

where $\| \cdot \|_2$ denotes the $l_2$-norm on $\mathbb{R}^m$. The representation (2.1) allows one to express the problem of minimization of the empirical error functional $E_z$ as the problem of finding a pseudosolution $L_u^+ \left( \frac{v}{\sqrt{m}} \right)$ of the inverse problem given by the operator $L_u$ for the data $\frac{v}{\sqrt{m}}$. As the range of the operator $L_u$ is finite dimensional, it is closed in $\mathbb{R}^m$. Thus, to employ the extension of Moore-Penrose pseudoinversion to the domain of infinite dimensional spaces, it remains to find proper hypothesis spaces on which the operators $L_u$ are continuous for all input data vectors u.

3 Reproducing kernel Hilbert spaces

Neither the space $C(\Omega)$ of all continuous functions on $\Omega \subseteq \mathbb{R}^d$ nor the Lebesgue space $L^2(\Omega)$ of square integrable functions is suitable as a hypothesis space: the first one is not a Hilbert space and the second one is not formed by pointwise defined functions. Moreover, $L_u$ is not continuous on any subspace of $L^2(\mathbb{R}^d)$ containing a sequence of scaled Gaussians concentrating at the origin (functions proportional to $n^{d/2} e^{-(n \|x\|)^2}$): all elements of such a sequence have $L^2$-norms equal to 1, but the evaluation functional at zero maps the sequence to an unbounded sequence of real numbers and thus cannot be continuous.

Fortunately, there exists a large class of Hilbert spaces on which all evaluation functionals are continuous and, moreover, norms on such spaces can play the role of measures of various types of oscillations of input/output mappings. Spaces from this class are called reproducing kernel Hilbert spaces (RKHSs). An RKHS is a Hilbert space formed by functions on a nonempty set $\Omega$ such that for every $x \in \Omega$, the evaluation functional $F_x$, defined for any f in the Hilbert space as $F_x(f) = f(x)$, is bounded [2].

RKHSs can be characterized in terms of kernels, which are symmetric positive semidefinite functions $K: \Omega \times \Omega \to \mathbb{R}$, i.e., functions satisfying for all m, all $(w_1, \ldots, w_m) \in \mathbb{R}^m$, and all $(x_1, \ldots, x_m) \in \Omega^m$,

$$\sum_{i,j=1}^{m} w_i w_j K(x_i, x_j) \geq 0.$$

A kernel is positive definite if $\sum_{i,j=1}^{m} w_i w_j K(x_i, x_j) = 0$ for any distinct $x_1, \ldots, x_m$ implies that $w_i = 0$ for all $i = 1, \ldots, m$. In such a case, $\{ K_x : x \in \Omega \}$ is a linearly independent set. The RKHS defined by a kernel K on $\Omega \times \Omega$ is denoted by $H_K(\Omega)$. It is formed by linear combinations of functions from $\{ K_x : x \in \Omega \}$ and limits of all Cauchy sequences with respect to the norm $\| \cdot \|_K$ induced by the inner product defined on the generators by $\langle K_x, K_y \rangle_K = K(x, y)$.
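As a small numerical check of this definition, the sketch below builds the Gram matrix of the Gaussian kernel on a handful of hypothetical input points and verifies that it is symmetric with nonnegative eigenvalues, i.e., positive semidefinite.

```python
import numpy as np

def gaussian_kernel(x, y, rho=1.0):
    """Gaussian kernel G_rho(x, y) = exp(-rho * ||x - y||^2)."""
    return np.exp(-rho * np.sum((x - y) ** 2))

def gram_matrix(kernel, u):
    """Gram matrix K[u] with entries K[u]_{i,j} = K(u_i, u_j)."""
    m = len(u)
    return np.array([[kernel(u[i], u[j]) for j in range(m)] for i in range(m)])

# Hypothetical input points in R^2.
u = np.random.default_rng(0).normal(size=(5, 2))
K = gram_matrix(gaussian_kernel, u)
print(np.allclose(K, K.T))                       # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)     # eigenvalues nonnegative (up to round-off)
```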

For a positive integer m and a vector $u = (u_1, \ldots, u_m)$, $K[u]$ denotes the $m \times m$ matrix defined as $K[u]_{i,j} = K(u_i, u_j)$, which is called the Gram matrix of the kernel K with respect to the vector u. A paradigmatic example of a kernel is the Gaussian kernel

$$G_\rho(x, y) = e^{-\rho \| x - y \|^2}$$

on $\mathbb{R}^d \times \mathbb{R}^d$. For this kernel, the space $H_{G_\rho}(\mathbb{R}^d)$ contains all functions computable by radial-basis function networks with a fixed width equal to $\rho$.

4 Optimal solution of the learning task

Using tools from the theory of inverse problems, we can describe the unique function in the reproducing kernel Hilbert space given by a kernel K that minimizes an empirical error functional and has the smallest K-norm. The next theorem states that this function is the image of the vector of normalized output data $\frac{v}{\sqrt{m}}$ under the pseudoinverse operator $L_u^+$ (for the proof see [14]).

Theorem 4.1
Let $K: \Omega \times \Omega \to \mathbb{R}$ be a kernel, m be a positive integer and z = (u, v), where $u = (u_1, \ldots, u_m) \in \Omega^m$, $u_1, \ldots, u_m$ are distinct and $v = (v_1, \ldots, v_m) \in \mathbb{R}^m$; then:
(i) $L_u^+ \left( \frac{v}{\sqrt{m}} \right) \in \operatorname{argmin}(H_K(\Omega), E_z)$, and for every $f^o \in \operatorname{argmin}(H_K(\Omega), E_z)$, $\left\| L_u^+ \left( \frac{v}{\sqrt{m}} \right) \right\|_K \leq \| f^o \|_K$;
(ii) $L_u^+ \left( \frac{v}{\sqrt{m}} \right) = \sum_{i=1}^{m} c_i K_{u_i}$, where $c = (c_1, \ldots, c_m) = K[u]^+ v$.

So for every kernel K and every sample of empirical data z, there exists a function $f^+$ minimizing the empirical error functional $E_z$ over the whole RKHS defined by K. This function is formed by a linear combination of the functions $K_{u_1}, \ldots, K_{u_m}$:

$$f^+ = L_u^+ \left( \frac{v}{\sqrt{m}} \right) = \sum_{i=1}^{m} c_i K_{u_i}.$$

$f^+$ can be interpreted as an input/output function of a neural network with one hidden layer of kernel units and a single linear output unit. The coefficients of the linear combination $c = (c_1, \ldots, c_m)$ (corresponding to the so-called output weights) can be computed by applying the pseudoinverse of the Gram matrix of the kernel K with respect to the input data vector u to the output data vector v:

$$c = K[u]^+ v.$$

The capability of generalization of such a network can be studied in terms of the stability of $f^+$ against noise. Recall that the condition number of a symmetric positive definite matrix is the ratio between its largest and smallest eigenvalues. So when this ratio is small for the matrix K[u], the solution $f^+$ is robust against noise. The condition number measures the worst magnification of data errors, while a more realistic description of ill-conditioning can be derived from inspection of all the eigenvalues of K[u]. For K positive definite, the row vectors $K_{u_1}, \ldots, K_{u_m}$ of the matrix K[u] are linearly independent.
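A minimal sketch of Theorem 4.1 (ii) with a Gaussian kernel: the output weights are computed as $c = K[u]^+ v$ and the resulting network function $f^+$ fits the sample; the one-dimensional inputs u and outputs v are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, rho=1.0):
    return np.exp(-rho * np.sum((x - y) ** 2))

# Hypothetical sample z = (u, v) with distinct inputs u_i in R.
u = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
v = np.array([0.0, 0.4, 0.9, 1.0, 0.8])
m = len(u)

K_u = np.array([[gaussian_kernel(u[i], u[j]) for j in range(m)] for i in range(m)])
c = np.linalg.pinv(K_u) @ v          # output weights c = K[u]^+ v  (Theorem 4.1 (ii))

def f_plus(x):
    """f^+(x) = sum_i c_i K(u_i, x): one hidden layer of kernel units, one linear output."""
    return sum(c[i] * gaussian_kernel(u[i], x) for i in range(m))

print([round(float(f_plus(ui)), 6) for ui in u])   # reproduces v up to numerical error
print(np.linalg.cond(K_u))                         # a large condition number signals instability
```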

But when the distances between the data $u_1, \ldots, u_m$ are small, the row vectors might be nearly parallel and the small eigenvalues of K[u] might cluster near zero. In such a case, small changes of v can cause large changes of $f^+$.

5 Generalization modelled as regularization

The function $f^+ = \sum_{i=1}^{m} c_i K_{u_i}$ with $c = K[u]^+ v$ provides the best fit to the sample of data z that can be obtained using functions from the reproducing kernel Hilbert space defined by the kernel K. By choosing this RKHS as a hypothesis space, one imposes a condition on the oscillations of potential solutions. The type of such a condition can be illustrated by convolution kernels $K: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ satisfying $K(x, y) = k(x - y)$ for some $k: \mathbb{R}^d \to \mathbb{R}$ with positive Fourier transform $\tilde{k}$. For such kernels, K-norms can be expressed as high-frequency filters

$$\| f \|_K^2 = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \frac{| \tilde{f}(\omega) |^2}{\tilde{k}(\omega)} \, d\omega$$

[10]. So to keep this norm small, $\tilde{f}(\omega)$ should decrease rather quickly as $\|\omega\|$ increases.

The restriction on high frequency oscillations can be strengthened by Tikhonov's regularization with $\| \cdot \|_K^2$ as a stabilizer, i.e., by replacing minimization of $E_z$ with minimization of $E_z + \gamma \| \cdot \|_K^2$, where $\gamma > 0$ is called a regularization parameter. The next theorem describes properties of the regularized solutions $f^\gamma$, their relationship to the pseudosolution $f^+$, and the improvement of stability achievable using Tikhonov's regularization (for the proof see [14]).

Theorem 5.1
Let $K: \Omega \times \Omega \to \mathbb{R}$ be a kernel, m be a positive integer, z = (u, v), where $u = (u_1, \ldots, u_m) \in \Omega^m$, $u_1, \ldots, u_m$ are distinct, $v = (v_1, \ldots, v_m) \in \mathbb{R}^m$ and $\gamma > 0$; then:
(i) there exists a unique solution $f^\gamma$ of the problem $(H_K(\Omega), E_z + \gamma \| \cdot \|_K^2)$;
(ii) $f^\gamma = \sum_{i=1}^{m} c_i K_{u_i}$, where $c = (K[u] + \gamma m I)^{-1} v$;
(iii) when K is positive definite, then $\operatorname{cond}(K[u] + \gamma m I) = 1 + \frac{(\operatorname{cond}(K[u]) - 1)\, \lambda_{\min}}{\lambda_{\min} + \gamma m}$, where $\lambda_{\min}$ is the minimal eigenvalue of K[u].

Similarly as in the case of the function $f^+$ minimizing the empirical error, the function $f^\gamma$ minimizing the regularized empirical error is a linear combination of the functions $K_{u_1}, \ldots, K_{u_m}$ defined by the input data $u_1, \ldots, u_m$:

$$f^\gamma = \sum_{i=1}^{m} c_i^\gamma K_{u_i}.$$

But the coefficients of the linear combination are different: their vector $c^\gamma = (c_1^\gamma, \ldots, c_m^\gamma)$ is the image of the output data vector v under the inverse operator $(K[u] + \gamma m I)^{-1}$:

$$c^\gamma = (K[u] + \gamma m I)^{-1} v.$$

This formula has been derived by several authors [28], [8], [23] using Fréchet derivatives and is called the Representer Theorem. Here, we obtained it directly from properties of regularized solutions of inverse problems.
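A companion sketch of the regularized solution of Theorem 5.1 (ii), again with a Gaussian kernel; the sample z = (u, v) and the regularization parameter γ are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, rho=1.0):
    return np.exp(-rho * np.sum((x - y) ** 2))

# Hypothetical sample z = (u, v) and regularization parameter gamma.
u = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
v = np.array([0.0, 0.4, 0.9, 1.0, 0.8])
gamma = 0.1
m = len(u)

K_u = np.array([[gaussian_kernel(u[i], u[j]) for j in range(m)] for i in range(m)])
c_gamma = np.linalg.solve(K_u + gamma * m * np.eye(m), v)   # c = (K[u] + gamma*m*I)^{-1} v

def f_gamma(x):
    """Regularized solution f_gamma(x) = sum_i c_gamma_i K(u_i, x)."""
    return sum(c_gamma[i] * gaussian_kernel(u[i], x) for i in range(m))

# Regularization improves conditioning, in line with Theorem 5.1 (iii).
print(np.linalg.cond(K_u), np.linalg.cond(K_u + gamma * m * np.eye(m)))
```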

So the increase of smoothness of the regularized solution $f^\gamma$ is achieved by merely changing the coefficients of the linear combination: while in the non-regularized case the coefficients are obtained from the output data vector v using the Moore-Penrose pseudoinverse of the Gram matrix K[u], in the regularized one they are obtained using the inverse of the modified matrix $K[u] + \gamma m I$. So the regularization merely changes amplitudes, but it preserves the finite set of basis functions from which the solution is composed. These basis functions can be, for example, Gaussians with centers given by the input data.

Theorem 5.1 shows that continuous growth of the regularization parameter $\gamma$ leads from under-smoothing to over-smoothing and estimates how ill-conditioning can be improved by Tikhonov's regularization. As

$$\lim_{\gamma m \to \infty} \left( 1 + \frac{(\operatorname{cond}(K[u]) - 1)\, \lambda_{\min}}{\lambda_{\min} + \gamma m} \right) = 1,$$

the larger the product $\gamma m$, the better the improvement of stability. However, the size of the regularization parameter $\gamma$ is limited by the requirement of fitting $f^\gamma$ to the sample of empirical data z, so $\gamma$ cannot be too large. But m can be increased by choosing a larger sample of empirical data. However, for large m, the computational efficiency of iterative methods for solving the systems of linear equations $c = K[u]^+ v$ and $c = (K[u] + \gamma m I)^{-1} v$ might limit practical applications of learning algorithms based on the Representer Theorem (Theorem 5.1 (ii)); see [23] for references to such applications. In typical neural-network algorithms, networks with the number n of hidden units much smaller than the size m of the training set are used. In [16] and [17], estimates of the speed of convergence of infima of the empirical error over sets of functions computable by such networks to the minimum described in Theorem 5.1 were derived. For reasonable data sets, such convergence is rather fast.

Acknowledgements

This work was partially supported by the project 1ET of the program Information Society of the National Research Program of the Czech Republic.

References

[1] Aizerman M. A., Braverman E. M., Rozonoer L. I. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 28.
[2] Aronszajn N. (1950). Theory of reproducing kernels. Transactions of AMS 68.
[3] Bertero M. (1989). Linear inverse and ill-posed problems. Advances in Electronics and Electron Physics 75.
[4] Björck A. (1996). Numerical Methods for Least Squares Problems. SIAM.
[5] Boser B., Guyon I., Vapnik V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory (Ed. D. Haussler). ACM Press.
[6] Cortes C., Vapnik V. N. (1995). Support-vector networks. Machine Learning 20.
[7] Cristianini N., Shawe-Taylor J. (2000). An Introduction to Support Vector Machines. Cambridge: Cambridge University Press.
[8] Cucker F., Smale S. (2001). On the mathematical foundations of learning. Bulletin of the AMS 39.
[9] Engl H. W., Hanke M., Neubauer A. (2000). Regularization of Inverse Problems. Dordrecht: Kluwer.

[10] Girosi F. (1998). An equivalence between sparse approximation and support vector machines. Neural Computation 10 (AI Memo No 1606, MIT).
[11] Girosi F., Jones M., Poggio T. (1995). Regularization theory and neural network architectures. Neural Computation 7.
[12] Groetsch C. W. (1977). Generalized Inverses of Linear Operators. New York: Dekker.
[13] Hansen P. C. (1998). Rank-Deficient and Discrete Ill-Posed Problems. Philadelphia: SIAM.
[14] Kůrková V. (2004). Supervised learning as an inverse problem. Research Report ICS, Institute of Computer Science, Prague.
[15] Kůrková V. (2004). Learning from data as an inverse problem. In COMPSTAT Proceedings on Computational Statistics (J. Antoch, Ed.). Heidelberg: Physica-Verlag/Springer.
[16] Kůrková V., Sanguineti M. (2005). Error estimates for approximate optimization by the extended Ritz method. SIAM Journal on Optimization 15.
[17] Kůrková V., Sanguineti M. (2005). Learning with generalization capability by kernel methods with bounded complexity. Journal of Complexity (to appear).
[18] Moore E. H. (1920). Abstract. Bull. Amer. Math. Soc. 26.
[19] Parzen E. (1966). An approach to time series analysis. Annals of Math. Statistics 32.
[20] Penrose R. (1955). A generalized inverse for matrices. Proc. Cambridge Philos. Soc. 51.
[21] Pinkus A. (1985). n-Widths in Approximation Theory. Berlin: Springer-Verlag.
[22] Poggio T., Girosi F. (1990). Networks for approximation and learning. Proceedings of the IEEE 78.
[23] Poggio T., Smale S. (2003). The mathematics of learning: dealing with data. Notices of the AMS 50.
[24] Popper K. (1968). The Logic of Scientific Discovery. New York: Harper Torch Books.
[25] Schölkopf B., Smola A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Cambridge: MIT Press.
[26] Tikhonov A. N., Arsenin V. Y. (1977). Solutions of Ill-posed Problems. Washington, D.C.: W. H. Winston.
[27] Vapnik V. N. (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.
[28] Wahba G. (1990). Spline Models for Observational Data. Philadelphia: SIAM.
[29] Werbos P. J. (1995). Backpropagation: basics and new developments. In The Handbook of Brain Theory and Neural Networks (Ed. M. Arbib). Cambridge: MIT Press.

Received 5 January 2005.
