Radial-Basis Function Networks

Michel Verleysen

Outline
- Origin: Cover's theorem
- Interpolation problem
- Regularization theory
- Generalized RBFN
- Universal approximation
- RBFN and kernel regression
- Learning: centers (vector quantization), widths, multiplying factors
- Other forms
- Vector quantization

Origin: Cover's theorem
- Cover's theorem on the separability of patterns (1965)
- x^1, x^2, ..., x^P are assigned to two classes C_1 and C_2
- φ-separability: there exists a vector w such that
    w^T φ(x) > 0 if x ∈ C_1
    w^T φ(x) < 0 if x ∈ C_2
- Cover's theorem: with non-linear functions φ(x), and a hidden space of higher dimension than the input space, the probability of separability gets closer to 1
- Example: linear versus quadratic separating surfaces

Interpolation problem
- Given P points (x^p, t^p) with x^p ∈ R^D, t^p ∈ R, 1 ≤ p ≤ P:
  find F: R^D → R that satisfies F(x^p) = t^p for all p.
- RBF technique (Powell, 1988):
    F(x) = Σ_{p=1}^{P} w_p φ(||x − x^p||)
- as many basis functions as data points
- centers fixed at the known points x^p
- the φ(||x − x^p||) are arbitrary non-linear (radial basis) functions

Interpolation problem (matrix form)
- F(x^k) = Σ_{p=1}^{P} w_p φ(||x^k − x^p||) = t^k, i.e. Φ w = t with Φ_kl = φ(||x^k − x^l||), hence w = Φ^{-1} t
- Vital question: is Φ non-singular?

Micchelli's theorem
- If the points x^k are distinct, Φ is non-singular (regardless of the dimension of the input space)
- Valid for a large class of RBF functions, e.g.:
    φ(x, c) = (||x − c||² + l²)^{1/2}                  non-localized function
    φ(x, c) = 1 / (||x − c||² + l²)^{1/2}   (l > 0)    localized function
    φ(x, c) = exp(−||x − c||² / (2σ²))      (σ > 0)    localized function
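As a minimal sketch of this exact interpolator, assuming NumPy and the Gaussian kernel above (the function names and the single width sigma are illustrative choices, not from the slides):

```python
import numpy as np

def rbf_interpolate(X, t, sigma=1.0):
    """Exact RBF interpolation: one Gaussian centred on each data point.

    X: (P, D) data points, t: (P,) targets.  Returns w solving Phi w = t.
    """
    # Pairwise squared distances ||x^k - x^l||^2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))   # Phi_kl = phi(||x^k - x^l||)
    return np.linalg.solve(Phi, t)           # non-singular if the x^k are distinct

def rbf_predict(X_new, X, w, sigma=1.0):
    """Evaluate F(x) = sum_p w_p phi(||x - x^p||) at new points."""
    d2 = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ w
```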

Learning: ill-posed problem
- [figure: noisy samples t versus x, with several curves fitting them]
- Necessity for regularization
- Error criterion:
    E(F) = (1/P) Σ_{p=1}^{P} (t^p − F(x^p))² + λ C(w)
  (MSE term + regularization term)

Solution to the regularization problem
- Poggio & Girosi (1990): if C(w) is a (problem-dependent) linear differential operator, the solution to
    E(F) = (1/P) Σ_{p=1}^{P} (t^p − F(x^p))² + λ C(w)
  is of the following form:
    F(x) = Σ_{p=1}^{P} w_p G(x, x^p)
  where G(·,·) is a Green's function, w = (G + λI)^{-1} t and G_kl = G(x^k, x^l)

Interpolation versus regularization
- Interpolation: F(x) = Σ_{p=1}^{P} w_p φ(||x − x^p||) with w = Φ^{-1} t; exact interpolator; possible RBF: φ(x, x^p) = exp(−||x − x^p||² / (2σ²))
- Regularization: F(x) = Σ_{p=1}^{P} w_p G(x, x^p) with w = (G + λI)^{-1} t; exact interpolator; equal to the interpolation solution iff λ = 0; example of Green's function: G(x, x^p) = exp(−||x − x^p||² / (2σ²))
- One RBF / Green's function for each learning pattern!
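The regularized solution only changes the linear system to be solved; a sketch under the same assumptions as above (Gaussian Green's function, free parameters sigma and lam):

```python
import numpy as np

def rbf_regularized(X, t, sigma=1.0, lam=0.1):
    """Regularized RBF solution w = (G + lambda*I)^{-1} t; lam = 0 gives exact interpolation."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * sigma ** 2))          # G_kl = G(x^k, x^l)
    return np.linalg.solve(G + lam * np.eye(len(X)), t)
```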

Generalized RBFN (GRBFN)
- With as many radial functions as learning patterns:
  - computationally (too) intensive (the inversion of a P×P matrix grows with P³)
  - ill-conditioned matrix
  - regularization not easy (problem-specific)
- Generalized RBFN approach: typically K << P,
    F(x) = Σ_{i=1}^{K} w_i φ(||x − c_i||),   φ(||x − c_i||) = exp(−||x − c_i||² / (2σ_i²))
- Parameters: c_i, σ_i, w_i

Radial-Basis Function Networks (RBFN)
    F(x) = Σ_{i=1}^{K} w_i φ(||x − c_i||),   φ(||x − c_i||) = exp(−||x − c_i||² / (2σ_i²))
- [figure: network with inputs x_1 ... x_d (plus a bias input x_0), a first layer of radial units φ(||x − c_i||) with centers c_i and widths σ_i, and a second, linear layer with weights w_i producing F(x)]
- Possibilities:
  - several outputs (common hidden layer)
  - bias (recommended) (see extensions)

RBFN: universal approximation
- Park & Sandberg 1991: for any continuous input-output mapping function f(x) and any ε > 0, there exists a network F(x) = Σ_{i=1}^{K} w_i φ(||x − c_i||) such that L_p(f(x), F(x)) < ε (p ∈ [1, ∞))
- The theorem is stronger (radial symmetry is not needed)
- K is not specified
- Provides a theoretical basis for practical RBFN!
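A sketch of the generalized network's forward pass, with K centers, per-unit widths and an optional bias; array shapes and names are assumptions:

```python
import numpy as np

def grbfn_forward(X, centers, sigmas, w, bias=0.0):
    """F(x) = sum_i w_i exp(-||x - c_i||^2 / (2 sigma_i^2)) + bias.

    X: (N, D) inputs, centers: (K, D), sigmas: (K,), w: (K,).
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, K)
    Phi = np.exp(-d2 / (2.0 * sigmas[None, :] ** 2))
    return Phi @ w + bias
```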

RBFN and kernel regression
- Non-linear regression model: t^p = f(x^p) + ε^p = y^p + ε^p, 1 ≤ p ≤ P
- Estimation of f(x): average of t around x. More precisely:
    f(x) = E[y | x] = ∫ y f_{Y|X}(y | x) dy = ∫ y f_{X,Y}(x, y) dy / f_X(x)
- Need for estimates of f_{X,Y}(x, y) and f_X(x) → Parzen-Rosenblatt density estimator

Parzen-Rosenblatt density estimator
    f̂_X(x) = 1/(P h^d) Σ_{p=1}^{P} K((x − x^p)/h)
- with K(·) continuous, bounded, symmetric about the origin, with maximum value at 0 and with unit integral, the estimator is consistent (asymptotically unbiased)
- Estimation of f_{X,Y}(x, y):
    f̂_{X,Y}(x, y) = 1/(P h^{d+1}) Σ_{p=1}^{P} K((x − x^p)/h) K((y − y^p)/h)

RBFN and kernel regression
- Plugging the estimates into f(x) = ∫ y f_{X,Y}(x, y) dy / f_X(x) gives
    f̂(x) = Σ_{p=1}^{P} y^p K((x − x^p)/h) / Σ_{p=1}^{P} K((x − x^p)/h)
- a weighted average of the y^p
- called the Nadaraya-Watson estimator (1964)
- equivalent to a normalized RBFN in the unregularized context

RBFN versus MLP
- RBFN: single hidden layer; non-linear hidden layer, linear output layer; argument of the hidden units: Euclidean norm; universal approximation property; local approximators; split learning
- MLP: single or multiple hidden layers; non-linear hidden layer(s), linear or non-linear output layer; argument of the hidden units: scalar product; universal approximation property; global approximators; global learning
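A sketch of the Nadaraya-Watson estimator with a Gaussian kernel, i.e. the normalized-RBFN special case mentioned above; the bandwidth h is a free parameter:

```python
import numpy as np

def nadaraya_watson(X_new, X, y, h=1.0):
    """f_hat(x) = sum_p y^p K((x - x^p)/h) / sum_p K((x - x^p)/h), Gaussian K."""
    d2 = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * h ** 2))     # (N, P) kernel weights
    return (K @ y) / K.sum(axis=1)
```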

RBFN: learning strategies
    F(x) = Σ_{i=1}^{K} w_i φ(||x − c_i||),   φ(||x − c_i||) = exp(−||x − c_i||² / (2σ_i²))
- Parameters to be determined: c_i, σ_i, w_i
- Traditional learning strategy: split computation
  1. centers c_i
  2. widths σ_i
  3. weights w_i

RBFN: computation of centers
- Idea: the centers c_i must have the (density) properties of the learning points x^k → vector quantization (seen in detail hereafter)
- This phase only uses the x^k information, not the t^k

RBFN: computation of widths
- Universal approximation property: valid with identical widths
- In practice (limited learning set): variable widths σ_i
- Idea: RBFN use local clusters → choose σ_i according to the standard deviation of each cluster

RBFN: computation of weights
    F(x) = Σ_{i=1}^{K} w_i φ(||x − c_i||)
- with the centers and widths fixed, the φ(||x − c_i||) are constants: the problem becomes linear!
- the solution of the least-squares criterion E(F) = (1/P) Σ_{p=1}^{P} (t^p − F(x^p))² leads to
    w = Φ⁺ t = (Φ^T Φ)^{-1} Φ^T t,   where Φ_ki = φ(||x^k − c_i||)
- In practice: use the SVD!

RBFN: gradient descent
- 3-step method: 1. centers (unsupervised), 2. widths (unsupervised), 3. weights (supervised)
- Once c_i, σ_i, w_i have been set by the previous method, gradient descent on all parameters is possible
- Some improvement, but:
  - learning speed
  - local minima
  - risk of non-local basis functions
  - etc.
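With centers and widths frozen, the weights follow from a single linear least-squares solve; np.linalg.lstsq relies on an SVD, in line with the slide's advice. A sketch, with names and shapes as assumptions:

```python
import numpy as np

def rbfn_weights(X, t, centers, sigmas):
    """Least-squares weights w = Phi^+ t, Phi_ki = exp(-||x^k - c_i||^2 / (2 sigma_i^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * sigmas[None, :] ** 2))
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # SVD-based pseudo-inverse solution
    return w
```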

More elaborate models
- Add constant and linear terms:
    F(x) = Σ_{i=1}^{K} w_i exp(−||x − c_i||² / (2σ_i²)) + Σ_{i=1}^{D} w'_i x_i + w'_0
  a good idea (it is very difficult to approximate a constant with kernels)
- Use a normalized RBFN:
    F(x) = Σ_{i=1}^{K} w_i exp(−||x − c_i||² / (2σ_i²)) / Σ_{j=1}^{K} exp(−||x − c_j||² / (2σ_j²))
  the basis functions are bounded in [0,1] and can be interpreted as probability values (classification)

Back to the widths
- choose σ_i according to the standard deviation of the clusters
- In the literature:
  - σ = d_max / √(2K), where d_max is the maximum distance between centroids [1]
  - σ_i = (1/q) Σ_{j=1}^{q} ||c_i − c_j||, where the index j scans the q nearest centroids to c_i [2]
  - σ_i = r min_{j≠i} ||c_i − c_j||, where r is an overlap constant [3]
  - ...
[1] S. Haykin, "Neural Networks: A Comprehensive Foundation", Prentice-Hall, second edition, 1999.
[2] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units", Neural Computation, vol. 1, pp. 281-294, 1989.
[3] A. Saha and J. D. Keeler, "Algorithms for Better Representation and Faster Learning in Radial Basis Function Networks", Advances in Neural Information Processing Systems, edited by David S. Touretzky, 1989.
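The three heuristics above translate into a few lines each; a sketch assuming the centroids are already known (the rule names, the neighbour count q and the overlap constant r are illustrative choices):

```python
import numpy as np

def widths_from_centroids(centers, rule="haykin", q=2, r=1.0):
    """Width heuristics [1]-[3]; centers is a (K, D) array of centroids."""
    K = len(centers)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)  # (K, K)
    if rule == "haykin":            # [1] sigma = d_max / sqrt(2K), identical for all units
        return np.full(K, d.max() / np.sqrt(2.0 * K))
    np.fill_diagonal(d, np.inf)
    if rule == "moody":             # [2] mean distance to the q nearest centroids
        return np.sort(d, axis=1)[:, :q].mean(axis=1)
    if rule == "saha":              # [3] r times the distance to the nearest centroid
        return r * d.min(axis=1)
    raise ValueError(f"unknown rule: {rule}")
```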

Basic example
- Approximation of the constant function f(x) = 1 with a d-dimensional RBFN
- In theory: identical w_i
- Experimentally: side effects → only the middle of the domain is taken into account
- [figure: error versus width]

Basic example: errors vs. space dimension
- [figure: error-versus-width curves for several space dimensions]

Basic example: local decomposition?
- [figure: contributions of the individual basis functions to the approximation]

Multiple local minima in the error curve
- Choose the first minimum to preserve the locality of the clusters
- The first local minimum is usually less sensitive to variability

Some concluding comments
- RBFN: easy learning (compared to MLP)
  - in a cross-validation scheme: important!
- Many RBFN models
- Even more RBFN learning schemes
- Results are not very sensitive to the unsupervised part of the learning (c_i, σ_i)
- Open work for an a priori (problem-dependent) choice of the widths σ_i

Back to the centers: vector quantization
- Aim and principle - what is a vector quantizer?
- Vector vs. scalar quantization
- Lloyd's principle and algorithm
- Initialization
- Neural algorithms
  - Competitive learning
  - Frequency Sensitive Learning
  - Winner-take-all vs. winner-take-most
  - Soft Competition Scheme
  - Stochastic Relaxation Scheme
  - Neural gas

Aim of vector quantization
- To reduce the size of a database
- [figure: a table of P vectors with N features is replaced by a table of Q vectors with N features, Q < P]

Principle of vector quantization
- To project a continuous input space onto a discrete output space, while minimizing the loss of information
- [figure: P points represented by Q centroids]

Principle of vector quantization
- To define zones in the space, the set of points contained in each zone being projected onto a representative vector (centroid)
- Example: 2-dimensional spaces

What is a vector quantizer?
- A vector quantizer consists of:
  1. a codebook (set of centroids, or codewords) m = {y^j, 1 ≤ j ≤ Q}
  2. a quantization function q: q(x^i) = y^j
- Usually q is defined by the nearest-neighbour rule (according to some distance measure)
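A minimal sketch of such a quantizer, assuming the Euclidean distance and NumPy arrays:

```python
import numpy as np

def quantize(X, codebook):
    """Encoder: index of the nearest codeword; decoder: the codeword itself."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)   # (P, Q)
    idx = d2.argmin(axis=1)                                          # nearest-neighbour rule
    return idx, codebook[idx]

def distortion(X, codebook):
    """Mean quantization error E_VQ (least-square distance)."""
    _, Xq = quantize(X, codebook)
    return ((X - Xq) ** 2).sum(axis=1).mean()
```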

Distance measures
- Least-square error:
    d_2(x^i, y^j) = ||x^i − y^j||² = Σ_{k=1}^{D} (x^i_k − y^j_k)² = (x^i − y^j)^T (x^i − y^j)
- r-norm error:
    d_r(x^i, y^j) = Σ_{k=1}^{D} |x^i_k − y^j_k|^r
- Mahalanobis distance (Γ is the covariance matrix of the inputs):
    d_W(x^i, y^j) = (x^i − y^j)^T Γ^{-1} (x^i − y^j)

Vector vs. scalar quantization
- Shannon: a vector quantizer always gives better results than the product of scalar quantizers, even if the probability densities are independent
- Example #1: uniform 2-D distribution
  [figure: expected distortion E[d(x, y)] of a product of scalar quantizers compared with that of a vector quantizer on the same square]
- Example #2: uniform 2-D distribution
  [figure: the cell shapes produced by the two quantizers]
  with the same ratio of centroids per unit surface: E_VQ = 0.96 E_SQ

Lloyd's principle
- [diagram: x^i → encoder → index j → decoder → y^j]
- 3 properties:
  1. the first one gives the best encoder, once the decoder is known
  2. the second one gives the best decoder, once the encoder is known
  3. there is no point on the borders between Voronoï regions (probability = 0)
- Optimal quantizer: these properties are necessary, but not sufficient

Lloyd: property #1
- For a given decoder β, the best encoder is given by
    α(x^i) = argmin_j d(x^i, y^j),   where y^j = β(j)
- → nearest-neighbor rule!

Lloyd: property #2
- For a given encoder α, the best decoder is given by
    β(j) = argmin_y E[ d(x^i, y) | α(x^i) = j ]
- → center-of-gravity rule!

Lloyd: property #3
- The probability of finding a point x^i exactly on a border (between Voronoï regions) is zero!

Lloyd's algorithm
1. Choice of an initial codebook.
2. All points x^i are encoded; E_VQ is evaluated.
3. If E_VQ is small enough, then stop.
4. All centroids y^j are replaced by the center-of-gravity of the data x^i associated with y^j in step 2.
5. Back to step 2.
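A sketch of Lloyd's algorithm (batch k-means); the stopping rule on the decrease of E_VQ and the handling of empty cells are assumptions:

```python
import numpy as np

def lloyd(X, codebook, tol=1e-6, max_iter=100):
    """Lloyd / k-means: alternate nearest-neighbour encoding and centre-of-gravity updates."""
    codebook = codebook.copy()
    prev = np.inf
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)                           # step 2: encode all points
        err = d2[np.arange(len(X)), idx].mean()           # evaluate E_VQ
        if prev - err < tol:                              # step 3: stop when E_VQ stabilizes
            break
        prev = err
        for j in range(len(codebook)):                    # step 4: centre-of-gravity update
            members = X[idx == j]
            if len(members) > 0:                          # keep empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook
```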

Lloyd: example
- [sequence of figures on a 2-D data set with four centroids y^1 ... y^4 (x^i: data, y^j: centroids)]
- 1. Initialization of the codebook
- 2. Encoding (nearest-neighbor)
- 4. Decoding (center-of-gravity)
- 2. Encoding (nearest-neighbor): new borders
- 4. Decoding (center-of-gravity): new positions of the centroids
- 2. Encoding (nearest-neighbor): new borders
- 4. Decoding (center-of-gravity): new positions of the centroids
- 2. Encoding (nearest-neighbor): final borders (convergence)

Lloyd's algorithm: the names
- Lloyd's algorithm
- Generalized Lloyd's algorithm
- Linde-Buzo-Gray (LBG) algorithm
- K-means
- ISODATA
- All based on the same principle!

Lloyd's algorithm: properties
- The codebook is modified only after the presentation of the whole dataset
- The mean square error (E_VQ) decreases at each iteration
- The risk of getting trapped in local minima is high
- The final quantizer depends on the initial one

How to initialize Lloyd's algorithm?
1. randomly in the input space
2. the Q first data points x^i
3. Q randomly chosen data points x^i
4. "product codes": the product of scalar quantizers
5. growing initial set:
   - a first centroid y^1 is randomly chosen (in the data set)
   - a second centroid y^2 is randomly chosen (in the data set); if d(y^1, y^2) > threshold, y^2 is kept
   - a third centroid y^3 is randomly chosen (in the data set); if d(y^1, y^3) > threshold AND d(y^2, y^3) > threshold, y^3 is kept
   - ...

How to initialize Lloyd's algorithm? (continued)
6. pairwise nearest neighbor:
   - a first codebook is built with all data points x^i
   - the two centroids y^j nearest to one another are merged (center-of-gravity)
   - variant: the increase of distortion (E_VQ) is evaluated for the merge of each pair of centroids y^j; the pair giving the lowest increase is merged
7. splitting:
   - a first centroid y^1 is randomly chosen (in the data set)
   - a second centroid y^1 + ε is created; Lloyd's algorithm is applied to the new codebook
   - two new centroids are created by perturbing the two existing ones; Lloyd's algorithm is applied to this 4-centroid codebook
   - ...

Vector quantization: "neural" algorithms
- Principle: the codebook is (partly or fully) modified at each presentation of one data vector x^i
- Advantages:
  - simplicity
  - adaptive algorithm (with varying data)
  - possible parallelisation
  - speed?
  - avoids local minima?

Competitive learning
- Algorithm: for each input vector x^i, the "winner" y^k is selected:
    d(x^i, y^k) ≤ d(x^i, y^j),   ∀ j, 1 ≤ j ≤ Q
- Adaptation rule: the winner is moved towards the input vector:
    y^k(t+1) = y^k(t) + α (x^i − y^k(t))

Competitive learning: properties
- The adaptation rule is a stochastic gradient descent on
    E = ∫ ||x − y^{j(x)}||² p(x) dx
  → convergence to a (local) minimum
- Robbins-Monro conditions on α:
    Σ_{t=0}^{∞} α(t) = ∞   and   Σ_{t=0}^{∞} α²(t) < ∞
- Local minima!
- Some centroids may be "lost": a centroid that is never the winner is never updated!
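A sketch of one run of online competitive learning; the decreasing learning-rate schedule alpha0/t is an assumption chosen to satisfy the Robbins-Monro conditions:

```python
import numpy as np

def competitive_learning(X, codebook, n_epochs=10, alpha0=0.5, seed=0):
    """Winner-take-all updates: only the nearest centroid moves towards each sample."""
    codebook = codebook.copy()
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            t += 1
            alpha = alpha0 / t                                        # Robbins-Monro schedule
            k = np.argmin(((X[i] - codebook) ** 2).sum(axis=1))       # winner
            codebook[k] += alpha * (X[i] - codebook[k])
    return codebook
```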

Frequency sensitive learning
- In competitive learning, some centroids may be lost during learning → centroids that are often chosen (as winners) are penalized!
- The choice of the winner is replaced by
    u_k d(x^i, y^k) ≤ u_j d(x^i, y^j),   ∀ j, 1 ≤ j ≤ Q
  where the counters u_j are incremented each time y^j is chosen as winner (starting at 1)
- [figure: with the counters u_j taken into account, a centroid that never won the plain competition becomes a possible "winner"]
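Frequency-sensitive learning only changes how the winner is selected; a sketch of that modification (the counters start at 1, as on the slide):

```python
import numpy as np

def fsl_winner(x, codebook, u):
    """Winner according to u_k * d(x, y^k); u holds the win counts, initialised to ones."""
    d = ((x - codebook) ** 2).sum(axis=1)
    k = np.argmin(u * d)
    u[k] += 1                      # penalize centroids that win often
    return k
```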

Winner-take-all vs. winner-take-most
- Most VQ algorithms: if two centroids are close, it will be hard to separate them!
- (Competitive learning and LVQ: only one or two winners are adapted)
- Solution: adapt the winner and other centroids as well:
  - with respect to the distance between the centroid and x^i (SCS)
  - stochastically with respect to the distance (SRS)
  - with respect to the order of proximity to x^i (neural gas)
  - those in a neighborhood (on a grid) of the winner (Kohonen)
- These algorithms accelerate the VQ too!

Soft Competition Scheme (SCS)
- Adaptation rule on all centroids:
    y^k(t+1) = y^k(t) + α G(k, x^i) (x^i − y^k(t))
  with
    G(k, x^i) = exp(−||x^i − y^k(t)||² / T) / Σ_{j=1}^{Q} exp(−||x^i − y^j(t)||² / T)
  (T is made decreasing over time)

Stochastic Relaxation Scheme (SRS)
- Adaptation rule on all centroids:
    y^k(t+1) = y^k(t) + α G(k, x^i) (x^i − y^k(t))
  with
    G(k, x^i) = 1 with probability P, 0 with probability 1 − P,
    P = exp(−||x^i − y^k(t)||² / T) / Σ_{j=1}^{Q} exp(−||x^i − y^j(t)||² / T)
  (T is made decreasing over time)
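One SCS adaptation step as a sketch; the temperature T is assumed to be decreased by the caller between presentations:

```python
import numpy as np

def scs_step(x, codebook, alpha, T):
    """Soft Competition Scheme: every centroid moves, weighted by a softmax of distances."""
    d2 = ((x - codebook) ** 2).sum(axis=1)
    G = np.exp(-d2 / T)
    G /= G.sum()                                    # G(k, x) as on the slide
    codebook += alpha * G[:, None] * (x - codebook)
    return codebook
```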

Neural gas
- Principle: the centroids are ranked according to their distance to x^i:
    h(j, x^i) = c if y^j is the c-th nearest centroid to x^i
- Adaptation rule:
    y^k(t+1) = y^k(t) + α exp(−h(k, x^i)/λ) (x^i − y^k(t))

Neural gas: properties
- in the limit λ → 0: competitive learning
- partial sorting is possible
- improvement: possibility of frequency sensitive learning
- The adaptation rule is a stochastic gradient descent on
    E = (1 / (2 C(λ))) Σ_{j=1}^{Q} ∫ exp(−h(j, x)/λ) ||x − y^j(t)||² p(x) dx
  where C(λ) = Σ_{l=1}^{Q} exp(−h(l, x)/λ) is a normalization factor
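One neural-gas adaptation step as a sketch; λ is assumed to decrease over time, like T above, and the ranks are computed from zero for the nearest centroid:

```python
import numpy as np

def neural_gas_step(x, codebook, alpha, lam):
    """Every centroid moves, weighted by exp(-rank/lambda) of its distance rank."""
    d2 = ((x - codebook) ** 2).sum(axis=1)
    ranks = np.argsort(np.argsort(d2))              # h(k, x): 0 for the nearest centroid
    codebook += alpha * np.exp(-ranks / lam)[:, None] * (x - codebook)
    return codebook
```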

Sources and references (RBFN)
- Most of the basic concepts developed in these slides come from the excellent book:
  S. Haykin, "Neural Networks: A Comprehensive Foundation", Macmillan College Publishing Company, 1994.
- Some supplementary comments come from the tutorial on RBF:
  J. Ghosh & A. Nag, "An overview of Radial Basis Function Networks", in: Radial Basis Function Networks, R.J. Howlett & L.C. Jain eds., Physica-Verlag, 2001.
- The results on the RBFN basic example were generated by my colleague N. Benoudjit, and are submitted for publication.

Sources and references (VQ)
- A classical tutorial on vector quantization:
  R.M. Gray, "Vector quantization", IEEE ASSP Magazine, vol. 1, pp. 4-29, April 1984.
- Most concepts in this chapter come from the following (specialized) papers:
  - J. Makhoul, S. Roucos, H. Gish, "Vector quantization in speech coding", Proceedings of the IEEE, vol. 73, n. 11, November 1985.
  - T.M. Martinetz, S.G. Berkovich, K.J. Schulten, "Neural-gas network for vector quantization and its application to time-series prediction", IEEE Transactions on Neural Networks, vol. 4, n. 4, July 1993.
  - T. Geszti, I. Csabai, "Habituation in Learning Vector Quantization", Complex Systems, n. 6, 1992.
  - F. Poirier, A. Ferrieux, "DVQ: Dynamic Vector Quantization, an incremental LVQ", in: Artificial Neural Networks, T. Kohonen et al. eds., Elsevier, 1991.
  - P. Demartines, J. Hérault, "Representation of nonlinear data structures through a fast VQP neural network", Proc. Neuro-Nîmes, 1993.
