
Pattern Recognition, Vol. 29, No. 1.

Robustizing Robust M-Estimation Using Deterministic Annealing

S. Z. Li
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

ABSTRACT

This paper presents a modified robust M-estimator, referred to as the annealing M-estimator (AM-estimator), designed to avoid problems with the M-estimator. The AM-estimator incorporates the annealing technique into the M-estimator. It has the following advantages: it gives the global solution regardless of the initialization; it involves neither a scale estimator nor free parameters, avoiding the unreliability therein; and it needs no order statistics such as the median, hence no sorting. Experimental results show that the AM-estimator is very stable and behaves gracefully with respect to the percentage of outliers and the noise variance.

Key words: Annealing, M-estimator, pattern recognition, robust statistics.

1 Introduction

Robust statistics methods provide tools for statistics problems in which underlying assumptions are inexact. A robust procedure should be insensitive to departures from the underlying assumptions: it should perform well when the assumptions hold, and its performance should degrade gracefully as the situation departs from them. One of the primary concerns of robustness is with noise distributions. In practice it is very common to assume Gaussian noise, but this assumption is often inaccurate. A typical situation is the contaminated Gaussian, a mixture of a Gaussian and some unknown distribution.

There are various types of robust estimators, and recent years have seen increasing interest in applications of robust techniques in computer vision. Kashyap and Eom [1] develop a robust algorithm for estimating parameters in an autoregressive image model where the noise is assumed to be a mixture of a Gaussian and an outlier process. Shulman and Herve [2] propose to use Huber's robust M-estimator [3] to compute optical flow involving discontinuities. Stevenson and Delp [4] use the same estimator for curve fitting. Besl et al. [5] propose a robust M window operator to prevent smoothing across discontinuities. Haralick et al. [6], Kumar and Hanson [7] and Zhuang et al. [8] use robust estimators to find pose parameters. Jolion and Meer [9] identify clusters in feature space based on the robust minimum volume ellipsoid estimator. More recently, Boyer et al. [10] present a procedure for surface parameterization using a robust M-estimator. Other recent advances in this area can be found in the proceedings of the international workshops on robust computer vision [11, 12]. A study on Markov random field (MRF) vision models [13] points out close relationships between robust M-estimators and discontinuity-adaptive MRF priors.

This paper concerns the following computational difficulties associated with the M-estimator. First, the M-estimator is not robust to the initialization; the choice of the initial estimate has a significant influence on the quality of the M-estimate [6, 14, 8, 15]. This is a problem common to nonlinear regression procedures [16], and it arises for the following reason: the M-estimate is defined as the global minimum of a non-convex energy function, yet the gradient-based algorithms commonly used to compute it can get stuck at a non-global solution. Second, the M-estimator depends on some scale estimate, such as the median of absolute deviation (MAD); the robustness of such estimates is itself questionable and deserves a devoted study. Third, there are free parameters which, together with the scale estimate, determine the threshold for rejecting outliers and have to be chosen on some basis. Fourth, the convergence of the M-estimator is not proved in most cases and often not guaranteed. Owing to these problems, the theoretical breakdown point can hardly be achieved.

The aim of this paper is to avoid the above problems. We present a modified robust M-estimator referred to as the annealing M-estimator (AM-estimator). The AM-estimator incorporates the annealing technique [17, 18, 19] into the M-estimation process. It involves no scale estimate such as the MAD and no free parameters, which avoids the unreliability of scale estimates and the need to select parameters. No order statistics are needed, hence no sorting. Most importantly, the AM-estimate is independent of the initialization.
The AM-estimate is defined as the minimum of a global energy function parameterized by a parameter $\gamma$. The annealing is performed by continuation in $\gamma$ from a very high value down to $0^+$: the sequence of global solutions is traced for decreasing values of $\gamma$, and the final solution is obtained in the zero-parameter limit $\gamma \to 0^+$. Experimental results show that the AM-estimator is very stable and behaves gracefully with respect to the percentage of outliers and the noise variance, in contrast to the M-estimator.

The rest of the paper is organized as follows. Section 2 presents the annealing robust estimator. Section 3 presents experimental results. Section 4 concludes the paper.

2 Robust Estimators and Annealing

The AM-estimator is very similar in form to the familiar M-estimator. We first briefly describe the M-estimator, then introduce the AM-estimator and point out the differences.

2.1 The M-Estimator

The essential form of the M-estimation problem is the following. Given a set of $m$ data samples $r = \{r_i \mid 1 \le i \le m\}$, where $r_i = f + \eta_i$, the problem is to estimate the location parameter $f$ under the noise $\eta_i$. The distribution of $\eta_i$ is not assumed to be known exactly; the only underlying assumption is that $\eta_1, \ldots, \eta_m$ obey a symmetric, independent, identical distribution (symmetric i.i.d.). A robust estimator has to deal with departures from this assumption. Let the residual errors be $\eta_i = r_i - f$ ($i = 1, \ldots, m$) and the potential (penalty) function be $g(\eta_i)$. The M-estimate $f^*$ is defined as the minimum of a global energy function,

  $f^* = \arg\min_f E(f)$   (1)

where

  $E(f) = \sum_i g(r_i - f)$   (2)

To minimize the above, it is necessary to solve the equation

  $\sum_i \psi(r_i - f) = 0$   (3)

where $\psi(\cdot) = g'(\cdot)$. This is based on gradient descent. When $g(\eta_i)$ is also a function of $\eta_i^2$, $\psi(\eta_i)$ takes the form

  $\psi(\eta_i) = 2\,\eta_i\, h(\eta_i) = 2\,(r_i - f)\, h(r_i - f)$   (4)

where $h(\cdot)$ is an even function. In this case the estimate $f$ can be expressed as the following weighted sum of the data samples

  $f = \frac{\sum_i h(\eta_i)\, r_i}{\sum_i h(\eta_i)}$   (5)

where $h$ acts as the interaction (weighting) function. This is a fixed-point equation because $\eta_i = r_i - f$. The function $h$ provides adaptive interaction: data points $r_i$ with larger errors $\eta_i$ have smaller interactions $h(\eta_i)$, and those with infinitely large errors have zero interaction.

The interaction function $h$ in M-estimation is usually defined piecewise. For example, Tukey's biweight function [20] is defined as

  $h(\eta_i) = \begin{cases} \left[1 - \eta_i^2/(cS)^2\right]^2 & \text{if } \eta_i^2/(cS)^2 < 1 \\ 0 & \text{otherwise} \end{cases}$   (6)

where

  $S = \mathrm{median}_i\{\eta_i\}$   (7)

is an estimate of spread, $c$ is a constant whose value is often set to 6 or 9, and $cS$ is the scale estimate. Some use the following as the scale estimate

  $cS = c\,\mathrm{median}_i\{|\eta_i - \mathrm{median}_j\{\eta_j\}|\}$   (8)

where the constant $c$ is usually taken as $c = 1.4826$ to be consistent with the Gaussian distribution. All M-estimators involve some scale estimate. The design of scale estimates is crucial, and finding good scale estimates is a topic in robust statistics; classical scale estimates such as the median and the MAD are not very robust.
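To make the fixed-point form (5) concrete, the following minimal Python sketch runs classical M-estimation of a scalar location with Tukey's biweight weights (6)-(7). The function names, the reading of $S$ as a median of absolute residuals, and the choice $c = 6$ are illustrative assumptions rather than the exact settings used in this paper's experiments.

```python
import numpy as np

def tukey_weights(eta, c=6.0):
    """Biweight interaction h(eta) of Eq. (6) with scale cS (spread S read as median |eta_i|)."""
    S = np.median(np.abs(eta))           # spread estimate, Eq. (7)
    cS = c * S + 1e-12                   # scale estimate; small constant guards against S = 0
    u2 = (eta / cS) ** 2
    return np.where(u2 < 1.0, (1.0 - u2) ** 2, 0.0)

def m_estimate_location(r, n_iter=50, c=6.0):
    """Fixed-point iteration of Eq. (5): f <- sum(h(eta) * r) / sum(h(eta))."""
    r = np.asarray(r, dtype=float)
    f = r.mean()                         # starting point; the final M-estimate depends on it
    for _ in range(n_iter):
        w = tukey_weights(r - f, c)
        if w.sum() == 0.0:               # every point rejected: keep the current estimate
            break
        f = np.sum(w * r) / np.sum(w)
    return f
```

Because the underlying energy (2) is non-convex for this potential, the result of the iteration depends on the starting point and on the scale estimate, which is precisely the difficulty the AM-estimator is designed to remove.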

2.2 The Annealing M-Estimator

The AM-estimator has the same form as the M-estimator, and Equations (1)-(5) apply to both. The major difference is that the scale estimate in the M-estimator is replaced in the AM-estimator by a parameter $\gamma$ whose value approaches $0^+$ during the estimation process. The AM-estimate under $\gamma$ is defined by

  $f_\gamma = \frac{\sum_i h_\gamma(\eta_i)\, r_i}{\sum_i h_\gamma(\eta_i)}$   (9)

where $h_\gamma(\cdot)$ is an adaptive interaction function parameterized by $\gamma$, to be defined in the next subsection. The AM-estimate is defined in the zero-parameter limit

  $f^* = \lim_{\gamma \to 0^+} f_\gamma$   (10)

Computationally, $\gamma$ is initially set to a high enough value and is decreased toward $0^+$ (a very small number); a sequence of solutions $\{f_\gamma\}$ is generated for the decreasing $\gamma$, and $f^*$ is the last one in the sequence. This is the annealing process. Given the form of the AM-estimator, it is the interaction function $h_\gamma$ and the annealing schedule that determine the final AM-estimate. The following two subsections define the AM-estimator.

2.3 Adaptive Interaction Functions

Definition 1. An adaptive interaction function (AIF) $h_\gamma$ parameterized by $\gamma > 0$ is a function which satisfies: (i) $h_\gamma$ is continuous; (ii) $h_\gamma(\eta) = h_\gamma(-\eta)$; (iii) $h_\gamma(\eta) > 0$; (iv) $h'_\gamma(\eta) < 0$ for all $\eta > 0$; and (v) $\lim_{\eta \to \infty} |\eta\, h_\gamma(\eta)| = C < \infty$. The class of AIFs is defined as the collection of all such $h_\gamma$ and is denoted by $\mathrm{H}_\gamma$. #

The continuity in (i) means that the interaction varies continuously with the error. The evenness in (ii) makes the interaction depend only on the error magnitude, regardless of its sign. The positive definiteness in (iii) keeps the weight positive. The monotonicity in (iv) makes the interaction decrease as the error magnitude increases. In (v), $C \ge 0$ is a constant whose value is the asymptote of $|\eta\, h_\gamma(\eta)|$. To satisfy this property, it is necessary that $\lim_{\eta \to \infty} h_\gamma(\eta) = 0$. This is essential for robust estimators: it assigns zero interaction to data points with infinitely large errors.

The above characterizes the properties an AIF $h_\gamma$ should possess rather than instantiating particular forms, so the definition is rather broad. The definition of the AIF has implications beyond robust estimation: it also describes how neighboring pixels of an image should interact for discontinuity-adaptive regularization (smoothing) [21]. The AM-estimator is defined by Eqs. (9)-(10) constrained by $h_\gamma \in \mathrm{H}_\gamma$. An AIF adaptively weights the importance of data points in computing the estimate.

2.4 Adaptive Potential Functions

Definition 2. The adaptive potential function (APF) corresponding to an $h_\gamma \in \mathrm{H}_\gamma$ is defined by $g_\gamma(\eta) = \int_0^\eta 2\eta'\, h_\gamma(\eta')\, d\eta'$. #

  AIF                                              APF                                                 Band
  $h_{1\gamma}(\eta) = \exp(-\eta^2/\gamma)$        $g_{1\gamma}(\eta) = -\gamma\exp(-\eta^2/\gamma)$    $B_1 = [-\sqrt{\gamma/2},\ \sqrt{\gamma/2}]$
  $h_{2\gamma}(\eta) = 1/[1+\eta^2/\gamma]^2$       $g_{2\gamma}(\eta) = -\gamma/[1+\eta^2/\gamma]$      $B_2 = [-\sqrt{\gamma/3},\ \sqrt{\gamma/3}]$
  $h_{3\gamma}(\eta) = 1/[1+\eta^2/\gamma]$         $g_{3\gamma}(\eta) = \gamma\log(1+\eta^2/\gamma)$    $B_3 = [-\sqrt{\gamma},\ \sqrt{\gamma}]$

Table 1: Three possible choices of $h_\gamma(\eta)$, the corresponding $g_\gamma(\eta)$ and the bands.

Figure 1: The qualitative shapes of the three APFs $g_\gamma(\eta)$ (bottom) and their derivative functions $g'_\gamma(\eta) = 2\eta\, h_\gamma(\eta)$ (top).

Basically, $g_\gamma$ is $C^1$ continuous; it is even, $g_\gamma(\eta) = g_\gamma(-\eta)$; and its derivative function is odd, $g'_\gamma(\eta) = -g'_\gamma(-\eta)$. However, it is not necessary for $g_\gamma(\infty)$ to be bounded. Furthermore, $g_\gamma$ is strictly monotonically increasing in the error magnitude $|\eta|$, because $g_\gamma(\eta) = g_\gamma(|\eta|)$ and $g'_\gamma(\eta) = 2\eta\, h_\gamma(\eta) > 0$ for $\eta > 0$. This means a larger $|\eta|$ leads to a larger potential (penalty) $g_\gamma(\eta)$, which conforms to the original spirit of the quadratic potential function $g_q(\eta) = \eta^2$. Most existing potential functions in M-estimation do not have this property: there the potential does not increase once the error $|\eta|$ grows beyond a certain value.

Of the two definitions, the former is the more important for defining the AM-estimator; it is $h_\gamma \in \mathrm{H}_\gamma$ that captures the essence of the robustness of the AM-estimate to distributional outliers. It is usually unnecessary to define the AM-estimator via the potential function $g_\gamma$. Nonetheless, knowing $g_\gamma$ is helpful for studying the convexity of the corresponding AM-estimator. For a given $g_\gamma(\eta)$, there exists a region of $\eta$ within which the function is convex:

  $B_\gamma = [b_L, b_H] = \{\eta \mid g''_\gamma(\eta) \ge 0\}$   (11)

We refer to this region $B_\gamma$ as the band. The lower and upper bounds $b_L, b_H$ correspond to the two extrema of $g'_\gamma(\eta)$, which can be obtained by setting $g''_\gamma(\eta) = 0$; we have $b_L = -b_H$ when $g_\gamma$ is even. When $b_L < \eta < b_H$, $g''_\gamma(\eta) > 0$ and thus $g_\gamma(\eta)$ is strictly convex. Table 1 lists three possible choices of AIFs, the corresponding APFs and the bands. Fig. 1 shows the qualitative shapes of the three APFs $g_\gamma(\eta)$ and their derivative functions $g'_\gamma(\eta) = 2\eta\, h_\gamma(\eta)$.
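The three AIF/APF pairs and their band bounds $b_H(\gamma)$ follow directly from Table 1 and Eq. (11). The small sketch below (NumPy, with a dictionary layout chosen here purely for illustration) encodes them and checks the derivative relation $g'_\gamma(\eta) = 2\eta\, h_\gamma(\eta)$ numerically.

```python
import numpy as np

# AIF h, APF g, and upper band bound b_H(gamma) for the three choices in Table 1.
AIFS = {
    "h1": (lambda e, g: np.exp(-e**2 / g),
           lambda e, g: -g * np.exp(-e**2 / g),
           lambda g: np.sqrt(g / 2.0)),
    "h2": (lambda e, g: 1.0 / (1.0 + e**2 / g) ** 2,
           lambda e, g: -g / (1.0 + e**2 / g),
           lambda g: np.sqrt(g / 3.0)),
    "h3": (lambda e, g: 1.0 / (1.0 + e**2 / g),
           lambda e, g: g * np.log(1.0 + e**2 / g),
           lambda g: np.sqrt(g)),
}

gamma = 4.0
eta = np.linspace(-10.0, 10.0, 4001)
for name, (h, gpot, b_H) in AIFS.items():
    # Definition 2: the APF is the integral of 2*eta'*h, so its derivative must equal 2*eta*h.
    dg = np.gradient(gpot(eta, gamma), eta)
    assert np.allclose(dg, 2.0 * eta * h(eta, gamma), atol=1e-3), name
    print(name, "band half-width b_H =", b_H(gamma))   # e.g. h3: sqrt(4) = 2
```

Within the band the APF is convex, which is what the annealing procedure exploits when choosing a sufficiently large initial $\gamma$.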

AM-Estimator
Begin Algorithm
  (1) set t = 1, f^(0) = f_MSE; choose initial gamma;
  (2) do {
  (3)   t <- t + 1;
  (4)   compute errors eta_i = r_i - f^(t-1), for all i;
  (5)   compute weighted sum f^(t) = sum_i h_gamma(eta_i) r_i / sum_i h_gamma(eta_i);
  (6)   lower temperature: gamma <- lower(gamma);
  (7) } until (gamma < gamma_min or |f^(t) - f^(t-1)| < epsilon)  /* converged */
  (8) f* <- f^(t);
End Algorithm

Figure 2: The AM-estimation algorithm.

Another interesting AIF is

  $h_{4\gamma}(\eta) = \frac{1}{1 + |\eta|/\gamma}$   (12)

It allows a bounded but non-zero contribution from errors $\eta_i = r_i - f \to \infty$, with $\lim_{\eta \to \infty} \eta\, h_{4\gamma}(\eta) = \gamma$. It is attractive because $g''_{4\gamma}(\eta) = [2\eta\, h_{4\gamma}(\eta)]' > 0$ for all $\eta$ and hence leads to a strictly convex minimization. Huber's function [3]

  $g_\gamma(\eta) = \min\{\eta^2,\ \gamma^2 + 2\gamma(|\eta| - \gamma)\}$   (13)

is also a convex function. Its first derivative is $g'_\gamma(\eta) = 2\eta\, h_\gamma(\eta) = 2\eta$ for $|\eta| \le \gamma$ and $g'_\gamma(\eta) = 2\gamma\eta/|\eta|$ for other $\eta$. Hence its AIF is $h_\gamma(\eta) = 1$ for $|\eta| \le \gamma$ and $h_\gamma(\eta) = \gamma/|\eta|$ for other $\eta$.

2.5 Annealing Procedure

When $g_\gamma(\eta)$ is non-convex, the direct method using the fixed-point iteration

  $f^{(t+1)} = \frac{\sum_i h_\gamma(r_i - f^{(t)})\, r_i}{\sum_i h_\gamma(r_i - f^{(t)})}$   (14)

can get stuck at a local minimum, because this equation is derived from gradient descent. The problem is particularly serious for small $\gamma$. To solve it, we combine the annealing technique into the iteration. Annealing, stochastic [17, 18, 19] or deterministic [22, 23, 24, 25], is a continuation technique for avoiding local optima. In the AM-estimator, the annealing is performed by gradually decreasing the parameter $\gamma$ toward $0^+$ during the estimation process. This significantly improves the quality of the estimate.

Initially, the parameter is set to a sufficiently large value $\gamma^{(0)}$ such that the APF $g_\gamma(\eta)$ is strictly convex. With such a $\gamma^{(0)}$, it is easy to find the unique minimum of the global energy function $E(f_\gamma)$ using the gradient descent method, regardless of the initial value $f^{(0)}$.

This minimum is then used as the initial value for the next phase of minimization under a lower $\gamma$, to obtain the next minimum. As $\gamma$ is lowered, $g_\gamma(\eta)$ may no longer be convex and local minima may appear. However, by tracking the global minima from high $\gamma$ down to $\gamma \to 0^+$, we can approximate the global minimum $f^*$ in the limit $\gamma \to 0^+$.

The AM-estimator algorithm is summarized in C-like pseudocode in Fig. 2. Initially, $f$ is set to the MSE estimate

  $f_{\mathrm{MSE}} = \frac{1}{m} \sum_{i=1}^m r_i$   (15)

The initial $\gamma$ is chosen to satisfy

  $|\eta_i| = |r_i - f_{\mathrm{MSE}}| < b_H(\gamma), \qquad \forall i$   (16)

This guarantees $g''_\gamma(\eta_i) > 0$ and hence the convexity of $g_\gamma$ over the data. In the above, $b_H(\gamma)$ ($= -b_L(\gamma)$) is the upper bound of the band in (11). The parameter $\gamma$ is lowered according to a schedule implemented by the function lower(gamma) in line (6). Convergence is judged by the conditions in line (7), where $\gamma_{\min}$ and $\epsilon$ are small positive numbers. We found that the location estimate converges ($|f^{(t)} - f^{(t-1)}| < \epsilon$) after dozens of iterations, and there is little change for smaller $\gamma$.

3 Experimental Results

The following experiment on location estimation compares the performance of the AM-estimator with that of the M-estimator using Tukey's biweight function. Simulated data points in 2D locations are generated. The data set is a mixture of true data points and outliers. First, $m$ true data points $\{(x_i, y_i) \mid i = 1, \ldots, m\}$ are randomly generated around $f = (10, 10)$; the values of $x_i$ and $y_i$ obey an identical, independent Gaussian distribution with a fixed mean of 10 and a variance $V$. After that, a percentage of the $m$ data points are replaced by random outlier values. The outliers are uniformly distributed in a square centered at $(b, b) \ne f$. Four parameters control the data generation, with the following values:

1. the number of data points $m \in \{50, 200\}$;
2. the noise variance $V \in \{0, 2, 5, 8, 12, 17, 23, 30\}$;
3. the percentage of outliers, from 0 to 70 in steps of 5; and
4. the outlier square center parameter $b = 22.5$ or $50$.

The experiments are done with different combinations of these parameter values. The AIF is chosen to be $h_\gamma(\eta) = h_{3\gamma}(\eta) = 1/(1 + \eta^2/\gamma)$. The schedule in lower(gamma) is $\gamma^{(t)} = 100 \times 1.5^{-(t-1)}$ for $t \ge 2$, so that $\gamma \to 0^+$ as $t \to \infty$. It takes about 50 iterations to converge for each of these data sets.

Two quantities are computed as performance measures of an estimator: (1) the mean error $\bar e$ versus the percentage of outliers (PO) and (2) the mean error $\bar e$ versus the noise variance (NV) $V$. The Euclidean error is $e = \|f^* - f\| = \sqrt{(x - 10)^2 + (y - 10)^2}$, where $f^* = (x, y)$ is the estimate and $f$ is the true location.

Figs. 3 and 4 show the performance of the AM-estimator and the M-estimator, respectively. A comparison of these results demonstrates that the AM-estimator is remarkably better than the M-estimator: it not only has lower estimation errors but also behaves in a more stable and graceful manner as the percentage of outliers and the noise variance increase.
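As a concrete illustration of the procedure in Fig. 2, here is a minimal Python sketch of the AM-estimator for a scalar location, using the AIF $h_{3\gamma}$ and a geometric cooling of $\gamma$. The cooling factor, stopping thresholds and the safety margin on the initial $\gamma$ are illustrative choices, not necessarily those used in the experiments above.

```python
import numpy as np

def h3(eta, gamma):
    """AIF h_3gamma(eta) = 1 / (1 + eta^2 / gamma) from Table 1."""
    return 1.0 / (1.0 + eta**2 / gamma)

def am_estimate_location(r, cool=1.5, gamma_min=1e-8, eps=1e-8, max_iter=200):
    """AM-estimation of a scalar location following Fig. 2."""
    r = np.asarray(r, dtype=float)
    f = r.mean()                                        # f^(0) = f_MSE, Eq. (15)
    # Initial gamma from Eq. (16): all residuals inside B_3 = [-sqrt(gamma), sqrt(gamma)].
    gamma = max(1.01 * np.max(np.abs(r - f)) ** 2, 1e-12)
    for _ in range(max_iter):
        w = h3(r - f, gamma)                            # adaptive interactions
        f_new = np.sum(w * r) / np.sum(w)               # weighted-sum update, Eq. (9)
        gamma /= cool                                   # lower the "temperature" gamma
        converged = gamma < gamma_min or abs(f_new - f) < eps
        f = f_new
        if converged:
            break
    return f
```

For the 2-D data of this section the same update can be applied to vector samples $r_i$, for example by feeding the residual magnitudes $\|r_i - f\|$ to $h_\gamma$; the sketch keeps to one dimension for brevity. Because the first iteration starts in the convex regime, the outcome does not hinge on the choice of starting point, consistent with the claim that the AM-estimate is independent of the initialization.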

Figure 3: Mean error of the AM-estimate. Mean error vs. percentage of outliers with m = 50 (row 1) and m = 200 (row 2). Mean error vs. noise variance with m = 50 (row 3) and m = 200 (row 4). Outliers are uniformly distributed in a square centered at b = 22.5 (left) or b = 50 (right).

Figure 4: Mean error of the M-estimate. Mean error vs. percentage of outliers with m = 50 (row 1) and m = 200 (row 2). Mean error vs. noise variance with m = 50 (row 3) and m = 200 (row 4). Outliers are uniformly distributed in a square centered at b = 22.5 (left) or b = 50 (right).

4 Conclusion

The AM-estimator has the advantages that it dispenses with scale estimates and free parameters and finds a good approximation to the global solution. Divergence is minimal in the AM-estimator because the initial estimate for the current $\gamma$ value is the convergence point obtained with the previous $\gamma$ value. Experimental results demonstrate significant improvements over the traditional M-estimator in accuracy, stability, and breakdown point as well. Each statistic is computed from 1000 random tests, and the data sets are exactly the same for the two methods, so the comparison of the two methods is sufficiently reliable.

References

[1] R. L. Kashyap and K. N. Eom. "Robust image modeling techniques with their applications". IEEE Transactions on Acoustics, Speech and Signal Processing, 36(8):1313-1325.

[2] D. Shulman and J. Y. Herve. "Regularization of discontinuous flow fields". In Proc. Workshop on Visual Motion, pages 81-86.

[3] P. Huber. Robust Statistics. Wiley.

[4] R. Stevenson and E. Delp. "Fitting curves with discontinuities". In Proceedings of the International Workshop on Robust Computer Vision, pages 127-136, Seattle, WA, October.

[5] P. J. Besl, J. B. Birch, and L. T. Watson. "Robust window operators". In Proceedings of the Second International Conference on Computer Vision, pages 591-600, Florida, December.

[6] R. M. Haralick, H. Joo, C. N. Lee, X. Zhuang, V. G. Vaidya, and M. B. Kim. "Pose estimation from corresponding point data". IEEE Transactions on Systems, Man and Cybernetics, 19:1426-1446.

[7] R. Kumar and A. R. Hanson. "Robust estimation of camera location and orientation from noisy data having outliers". In Proc. Workshop on Interpretation of Three-Dimensional Scenes, pages 52-60.

[8] X. Zhuang, T. Wang, and P. Zhang. "A highly robust estimator through partially likelihood function modeling and its application in computer vision". IEEE Transactions on Pattern Analysis and Machine Intelligence, 14.

[9] J. M. Jolion, P. Meer, and S. Bataouche. "Robust clustering with applications in computer vision". IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:791-802.

[10] K. L. Boyer, M. J. Mirza, and G. Ganguly. "The robust sequential estimator: A general approach and its application to surface organization in range data". IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(10), October.

[11] R. M. Haralick, editor. Proceedings of the International Workshop on Robust Computer Vision, Seattle, WA, October.

[12] W. Forstner and S. Ruwiedel, editors. Robust Computer Vision - Quality of Vision Algorithms (Proceedings of the 2nd International Workshop on Robust Computer Vision, Karlsruhe, Germany, March 10-12).

[13] S. Z. Li. Markov Random Field Modeling in Computer Vision. Springer-Verlag.

[14] P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim. "Robust regression methods for computer vision: A review". International Journal of Computer Vision, 6:59-70.

[15] A. Stein and M. Werman. "Robust statistics in shape fitting". In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 540-546.

[16] Raymond H. Myers. Classical and Modern Regression with Applications. PWS-Kent Publishing Company.

[17] S. Kirkpatrick, C. D. Gellatt, and M. P. Vecchi. "Optimization by simulated annealing". Science, 220:671-680.

[18] S. Geman and D. Geman. "Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images". IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741, November.

[19] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski. "A learning algorithm for Boltzmann machines". Cognitive Science, 9:147-169.

[20] J. W. Tukey. Exploratory Data Analysis. Addison-Wesley, Reading, MA.

[21] S. Z. Li. "On discontinuity-adaptive smoothness priors in computer vision". IEEE Transactions on Pattern Analysis and Machine Intelligence, accepted for publication.

[22] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, Cambridge, MA.

[23] J. J. Hopfield. "Neurons with graded response have collective computational properties like those of two-state neurons". Proceedings of the National Academy of Sciences, USA, 81:3088-3092.

[24] C. Koch, J. Marroquin, and A. Yuille. "Analog `neuronal' networks in early vision". Proceedings of the National Academy of Sciences, USA, 83:4263-4267.

[25] A. Witkin, D. T. Terzopoulos, and M. Kass. "Signal matching through scale space". International Journal of Computer Vision, pages 133-144.

Biography

S. Z. Li received the B.Sc. degree from Hunan University, China, in 1982, the M.Sc. degree from the National University of Defense Technology, China, in 1985, and the Ph.D. degree from the University of Surrey, UK. All degrees are in EEE. He is currently a lecturer at Nanyang Technological University, Singapore. His research interests include computer vision, pattern recognition, image processing and optimization methods.

