Draft, February 1999

Novelty Detection using Extreme Value Statistics

Stephen J. Roberts
Department of Electrical & Electronic Engineering
Imperial College of Science, Technology & Medicine
London, UK
s.j.roberts@ic.ac.uk

February 23, 1999

Abstract

Extreme value theory (EVT) is a branch of statistics which concerns the distributions of data of unusually low or high value, i.e. in the tails of some distribution. These extremal points are important in many applications as they represent the outlying regions of normal events against which we may wish to define abnormal events. In the context of density modelling, novelty detection, or radial-basis function (RBF) systems, points which lie outside the range of expected extreme values may be flagged as outliers. Whilst there has been interest in the area of novelty detection, decisions as to whether a point is an outlier or not tend to be made on the basis of exceeding some (heuristic) threshold. In this paper it is shown that a more principled approach may be taken using extreme value statistics.

1 Introduction

Extreme value theory (EVT) concerns the distributions of data of abnormally low or high value in the tails of some data-generating distribution. As we observe more data points, the statistics of the largest (or smallest) values we observe change. Knowledge of these statistics may be useful in a number of areas, for example in detecting events which lie outside our expected data extremes (outlier detection), or in rejecting classification or regression on data which lies far from the expected statistics of some training data set.

Novelty, outlier or abnormality detection is a method of choice in the case where the usual assumption of supervised classification breaks down. This assumption implies that the data set contains viable information regarding all classification hypotheses. It will break down under two major conditions:

1. There are very few examples of an important class within the data set. This is often the case in medical data or in condition monitoring, when data from `normal' system states are far easier and less costly to acquire than abnormal exemplars (patients with rare conditions, for example). In a classification system it becomes increasingly difficult to make decisions into this class as the number of examples reduces. It is argued that, in such a case, it is better to formulate a model of the `normal' data and test for `novel' data against this model [1, 2].

2. The data set contains information regarding all classes in the classification set, but the classification set itself is incomplete, i.e. data from some extra class is observed at a future point. Such a classification system will classify this `novel' data erroneously into one of the classes in the classification set.

For the first of these conditions, furthermore, this paper distinguishes between two sub-cases:

1a. A data set exists which is known to be representative of `normal' data. No occurrences of `abnormalities' are believed to be present.

1b. The data set is believed to consist of a number of examples of `normal' data together with a smaller, but unknown, number of `abnormal' data.

The first of these cases is, of course, simpler. A representation of the `normal' data may have any desired complexity (one can imagine a crude representation based on a single normal distribution, or a Parzen-windows approach in which the representation is based upon each datum in the set). If the same approach is applied to case 1b, the representation runs the risk of fitting abnormal data and hence making its subsequent detection more difficult. It is desirable, therefore, to impose some form of model-complexity penalty. This scheme may be used to form representations for case 1a as well. This issue has been addressed previously in the context of novelty detection by using a cross-validation heuristic [2]. More principled methods exist, however, and these are detailed in subsection 2.3.

These generic issues of novelty detection have been dealt with previously in several ways. Typical methods develop a semi-parametric representation using, for example, a Gaussian mixture model (GMM) against which data are assessed. Some methods rely upon heuristic approaches, using the maximum (un-normalised) Gaussian response as a measure of novelty [1, 2], creating an artificial `novelty' class [3], or utilising thresholds on a density estimate of the `normal' data [4].

2 Theory

Extreme-value theory forms representations for the tails of distributions. When discussing the properties of the tails of a distribution we will, for convenience, discuss the right-hand tail.

The same theory applies, with minor modification, to the left-hand tail.

Consider a set Z_m = {z_1, z_2, ..., z_m} of m i.i.d. (independent & identically distributed) random variables z_i ∈ R drawn from a distribution function D, and define x_m = max(Z_m), i.e. the largest element observed in m samples of z_i. In pioneering work on classical extreme-value theory, Fisher & Tippett (1928) [5] (also see [6, 7, 8, 9, 10]) showed that if the distribution function over x_m, H, is to be stable in the limit m → ∞ then it must weakly converge under a positive affine transform of the form

    x_m ≐ σ_m x + μ_m                                            (1)

for appropriate norming parameters σ_m ∈ R+ (the scale parameter) and μ_m ∈ R (the location parameter). The notation ≐ denotes a (weak) convergence of distributions rather than a strict equality. Fisher & Tippett proved furthermore that the normalised form

    H(x_m ≤ x) ≐ H(σ_m^{-1}(x - μ_m))                            (2)

is the only form for the limit distribution of maxima from any D. The second key theorem in extreme-value statistics is referred to as the Fisher-Tippett theorem [5] and states that if H is a non-degenerate limit distribution for normalised maxima of the form σ_m^{-1}(x_m - μ_m), then H takes one of only three forms. Denoting the reduced variate y_m = σ_m^{-1}(x_m - μ_m), these three forms are:

    H_1(y_m)    = exp(-exp(-y_m))

    H_2(y_m; α) = 0                    if y_m ≤ 0
                  exp(-y_m^{-α})       if y_m > 0                (3)

    H_3(y_m; α) = exp(-(-y_m)^α)       if y_m ≤ 0
                  1                    if y_m > 0

for α ∈ R+. These three forms are respectively referred to as the type I or Gumbel, type II or Fréchet and type III or Weibull distributions. They may also be regarded as special cases of the generalised extreme-value (GEV) distribution,

    H_GEV(y_m; ξ) = exp{-[1 + ξ y_m]^{-1/ξ}}                     (4)

for some ξ ∈ R (the shape parameter). This remarkable result is the equivalent, for maxima of random variables, of the central limit theorem for sums of random variables. As m → ∞ the distribution of maxima from any D lies in the domain of attraction of one of the three limit forms.
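As an editorial aside (not part of the original paper), the relationship between the GEV form of Equation 4 and the Gumbel limit can be checked numerically. The short Python sketch below evaluates H_GEV for a small shape parameter ξ and compares it with H_1; the function names and values are illustrative only.

```python
import numpy as np

def gumbel_cdf(y):
    """Type I (Gumbel) limit form, H_1(y) = exp(-exp(-y))."""
    return np.exp(-np.exp(-y))

def gev_cdf(y, xi):
    """Generalised extreme-value CDF of Equation 4 for the reduced variate y.

    For xi -> 0 this converges to the Gumbel form; the support is restricted
    to 1 + xi * y > 0, with H = 0 (xi > 0) or H = 1 (xi < 0) outside it.
    """
    if abs(xi) < 1e-12:
        return gumbel_cdf(y)
    t = 1.0 + xi * np.asarray(y, dtype=float)
    inside = np.exp(-np.power(np.maximum(t, 1e-300), -1.0 / xi))
    return np.where(t > 0, inside, 0.0 if xi > 0 else 1.0)

if __name__ == "__main__":
    y = np.linspace(-2.0, 5.0, 8)
    # For small xi the GEV and Gumbel forms should agree closely.
    print(np.max(np.abs(gev_cdf(y, 1e-6) - gumbel_cdf(y))))
```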

This paper is ultimately concerned with samples assumed drawn from D = |N(0, 1)| (i.e. a one-sided normal distribution), whose maxima distribution converges to the Gumbel form [9, 6]. The other two limit forms will not be considered further; the interested reader is pointed to (e.g.) Embrechts et al. [6] for more detailed discussions.

2.1 The Gumbel distribution

The GEV distribution of Equation 4 gives the Gumbel distribution in the limit case of ξ → 0 [8, 9, 10]. The probability of observing an extremum x_m ≤ x is hence given as:

    P(x_m ≤ x | μ_m, σ_m) = exp{-exp(-y_m)}                      (5)

where, as before, y_m is the reduced variate

    y_m = σ_m^{-1}(x_m - μ_m)                                    (6)

For notational simplicity we will henceforth denote P(x_m ≤ x | μ_m, σ_m) = P(x | m). The appropriate norming parameters μ_m, σ_m require estimation, however. In classical extreme-value theory these parameters depend only on the sample size m. In [11] it is proposed that the parameters σ_m and μ_m are coupled via:

    σ_m = σ = c
    μ_m = σ ln(m) + r                                            (7)

in which c and r are constants, so that a plot of μ_m against ln(m) is a straight line. The parameter μ_m may also be experimentally determined from Monte-Carlo simulation, as it represents the expected largest element in m samples drawn from D, i.e. if {z_i} represent samples from the distribution, then:

    μ_m^{expt} = ⟨max{z_1, ..., z_m}⟩                            (8)

A plot of μ_m^{expt} against ln(m) for random variables drawn from |N(0, 1)| (i.e. a one-sided Gaussian) is shown in Figure 1; results were obtained from repeated Monte-Carlo simulations over a wide range of m. Note that the relationship is roughly linear, but that a linear approximation is not appropriate if we wish to describe the relationship over a wide range of m.
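The Monte-Carlo estimate of Equation 8 is straightforward to reproduce. The Python sketch below is illustrative and not taken from the paper; the sample sizes and number of trials are arbitrary choices. It draws samples from |N(0, 1)| and averages the observed maxima for a few values of m.

```python
import numpy as np

def mu_expt(m, n_trials=1000, rng=None):
    """Monte-Carlo estimate of the expected largest element in m samples
    drawn from the one-sided standard normal |N(0,1)| (Equation 8)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.abs(rng.standard_normal((n_trials, m)))
    return samples.max(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for m in (1, 10, 100, 1000):
        # The estimates grow roughly linearly in ln(m), as in Figure 1.
        print(m, mu_expt(m, rng=rng))
```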

Whilst maximum-likelihood estimation of the norming parameters is required for arbitrary distributions, if the analytic form of the observation distribution D is known then μ_m and σ_m may be estimated in closed form. These forms are asymptotically correct as m → ∞. As will be shown, however, they parameterise extreme-value distributions which fit experimental data well even for small m. The calculation of the norming parameters is a standard problem in EVT and the results are quoted in this paper; considerably more detail is available in [6, 8]. The asymptotic form of the location parameter for D = |N(0, 1)| is given as (following, for example, the results for the normal distribution [6, p.145]):

    μ_m = (2 ln m)^{1/2} - (ln ln m + ln 2π) / (2 (2 ln m)^{1/2})          (9)

and the scale parameter as:

    σ_m = (2 ln m)^{-1/2}                                                  (10)

Figure 2 plots the experimental (Monte-Carlo) and theoretical values of σ_m^{-1} and μ_m over a range of m. One standard deviation errors are also shown for the experimental data, estimated over the Monte-Carlo trials. Note that the fit between theoretical and experimental results is good.

If the cumulative distribution is defined via Equation 5 then the associated density function may be written as:

    p(x_m = x | μ_m, σ_m) = (1/σ_m) exp{-y_m - exp(-y_m)}                  (11)

where y_m is defined via Equation 6. Once more, for notational simplicity, p(x_m = x | μ_m, σ_m) is henceforth written simply as p(x | m). Figures 3 and 4 respectively show the experimental (Monte-Carlo) and theoretical P(x | m) and p(x | m) against x for differing values of m. Note that the original distribution was, once more, one-sided normal with zero mean and unit variance. Note also the movement and form of the extreme-value distribution with increasing m. The distributions obtained from Equations 5, 9, 10 and 11 are in good agreement with the experimental results. For the case of m = 1 in Figure 5, however, note that there is an indication that the predicted curve is an underestimate as P(x | m) → 1. It is noted that in the limit m = 1 the resultant probabilities are obtained directly from D = |N(0, 1)|.
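A minimal sketch of the closed-form parameterisation of Equations 5, 6, 9, 10 and 11 is given below (Python; written for this edit rather than taken from the paper, and using the ln 2π form of Equation 9 as reconstructed above). It returns the Gumbel probability P(x | m) and density p(x | m) for maxima of m draws from |N(0, 1)|, and can be checked against Monte-Carlo maxima as in Figures 3 and 4.

```python
import numpy as np

def norming_params(m):
    """Asymptotic location and scale parameters of Equations 9 and 10 (m > 1)."""
    lm = np.log(m)
    mu = np.sqrt(2.0 * lm) - (np.log(lm) + np.log(2.0 * np.pi)) / (2.0 * np.sqrt(2.0 * lm))
    sigma = 1.0 / np.sqrt(2.0 * lm)
    return mu, sigma

def gumbel_P(x, m):
    """Equation 5: probability that the maximum of m draws does not exceed x."""
    mu, sigma = norming_params(m)
    y = (x - mu) / sigma                      # reduced variate, Equation 6
    return np.exp(-np.exp(-y))

def gumbel_p(x, m):
    """Equation 11: the associated extreme-value density."""
    mu, sigma = norming_params(m)
    y = (x - mu) / sigma
    return np.exp(-y - np.exp(-y)) / sigma

if __name__ == "__main__":
    x = np.linspace(0.0, 6.0, 7)
    print(gumbel_P(x, m=100))
    print(gumbel_p(x, m=100))
```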

2.2 The Gaussian mixture model

The Gaussian mixture model (GMM) is widely used in data density estimation and to form the latent (hidden) space of radial-basis function networks. The choice of the Gaussian kernel is not an arbitrary one: Gaussian mixture methods and radial-basis systems using Gaussian bases both have the property of universal approximation, are analytically attractive and are hence favoured in the literature. The form of the GMM is simple, consisting of a set of k = 1, ..., K components, each of which has standard Gaussian form and is uniquely parameterised by its mean and covariance matrix. For such a mixture, Bayes' theorem implies that the data likelihood p(x) may be written as a mixture of component (conditional) likelihoods of the form

    p(x) = Σ_k π(k) p(x | k)                                     (12)

where π(k) are the component priors. If we assume that the statistics of outlying events are dominated by the component closest (in the Mahalanobis sense), then we are interested in the resultant probability of data lying interior to some distance h from the component mean. Integrating the (one-sided) density function gives

    P(h(x)) = ∫_{c_k}^{h(x)} p(w | k) dw                         (13)

where c_k is the mean of the closest component and w is a dummy variable of integration. The more complex case in which x is multidimensional is easily dealt with by evaluating the above equation using the Mahalanobis distance, defined in its standard form:

    h(x)_k = sqrt( (x - c_k)^T C_k^{-1} (x - c_k) )              (14)

where C_k is the covariance matrix and c_k is the centroid. The Mahalanobis distance, evaluated with respect to a Gaussian component, has a density function given by:

    p(h(x)) = sqrt(2/π) exp( -h(x)^2 / 2 )                       (15)

Combining Equations 13 and 15 and using the standard form for the Gaussian integral gives:

    P(h(x)) = erf( h(x) / sqrt(2) )                              (16)

where erf(·) is the error function.
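The error-function novelty measure of Equations 14 and 16 is easy to implement. The Python sketch below is an illustrative reconstruction, not code from the paper: it computes the Mahalanobis distance of a point to each mixture component and the corresponding cumulative probability. The component parameters are placeholder values.

```python
import numpy as np
from scipy.special import erf

def mahalanobis(x, centre, cov):
    """Equation 14: Mahalanobis distance of x from a Gaussian component."""
    d = x - centre
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def erf_probability(x, centres, covs):
    """Equation 16 evaluated for the closest component (in the Mahalanobis sense)."""
    h = min(mahalanobis(x, c, S) for c, S in zip(centres, covs))
    return erf(h / np.sqrt(2.0))

if __name__ == "__main__":
    # Two illustrative 2-D components with identity covariance.
    centres = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
    covs = [np.eye(2), np.eye(2)]
    print(erf_probability(np.array([3.0, 0.5]), centres, covs))
```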

To evaluate the resultant EVT probability at some point x, it is noted that the theory (and parametric form) of the EVT distribution derived for the |N(0, 1)| distribution (Equation 5) may be used directly, as h(x) is expected (with respect to a Gaussian component) to be distributed as |N(0, 1)|. Furthermore, if a total of m points are observed and used to fit the model, an estimate of the effective number of points `seen' by each component is simply m_k = m π(k). The EVT probability measure is hence written as:

    P(x | m) = P(h(x)_k | m_k)                                   (17)

where the index k denotes the fact that we deal only with the component closest (in the Mahalanobis sense) to x, as this dominates the EVT statistics. Figure 5 shows the evolution of P(x | m) with m for the case of a one-dimensional, two-component Gaussian mixture model. The components are centred at ±1, with σ_1 = 1 and σ_2 = 1/2, and have equal priors. As the number of observed data points (m) increases, the resultant extremal probabilities become less conservative. The value of m increases linearly from m = 2.
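Putting Equations 14, 9, 10, 5 and 17 together requires only a few lines of code. The self-contained Python sketch below is illustrative only (it is not the paper's implementation and uses the equation forms as reconstructed above): it evaluates the EVT novelty probability of a point under a fitted mixture, using the closest component and its effective sample count m_k = m π(k).

```python
import numpy as np

def evt_probability(x, centres, covs, priors, m):
    """Equation 17: Gumbel probability of the Mahalanobis distance to the
    closest mixture component, with effective sample size m_k = m * prior_k."""
    # Mahalanobis distance (Equation 14) to every component; the closest
    # component dominates the EVT statistics.
    h = np.array([np.sqrt((x - c) @ np.linalg.solve(S, x - c))
                  for c, S in zip(centres, covs)])
    k = int(np.argmin(h))
    m_k = max(2.0, m * priors[k])                 # effective points seen by component k
    lm = np.log(m_k)
    mu = np.sqrt(2 * lm) - (np.log(lm) + np.log(2 * np.pi)) / (2 * np.sqrt(2 * lm))  # Eq. 9
    sigma = 1.0 / np.sqrt(2 * lm)                 # Eq. 10
    y = (h[k] - mu) / sigma                       # reduced variate, Eq. 6
    return float(np.exp(-np.exp(-y)))             # Gumbel CDF, Eq. 5

if __name__ == "__main__":
    # A one-dimensional two-component mixture, loosely following Figure 5.
    centres = [np.array([-1.0]), np.array([1.0])]
    covs = [np.eye(1), np.eye(1) * 0.25]
    priors = [0.5, 0.5]
    for m in (10, 100, 1000):
        # The extremal probability at a fixed point becomes less conservative as m grows.
        print(m, evt_probability(np.array([3.0]), centres, covs, priors, m))
```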

2.3 Model fitting and complexity penalisation

All the results in this paper are obtained using the Expectation-Maximisation (EM) algorithm [12] to estimate a maximum-likelihood parameter set for the means, covariance matrices and component priors of the mixture model. Details of the standard algorithmic approach taken to implement the EM procedure may be found in [13], for example.

The likelihood function evaluated from maximum-likelihood methods, such as EM, is monotone increasing with the complexity of the model (i.e. with increasing numbers of components). If we wish, however, to form a parsimonious representation of the data set, this likelihood function is modified via a function which penalises models with larger complexity (more parameters). Such penalty measures may take several forms. One of the simplest is minimum description length (MDL) coding, first derived by Rissanen in 1978 [14]. Although other elegant paradigms exist for obtaining a penalised likelihood function, in particular Bayesian schemes, MDL is observed in practice to give results similar to those of the more (mathematically) complex Bayesian methods (see [15] for a description of the Bayesian approach and comparisons with MDL and other methods). The results in this paper were all obtained using a (log-)likelihood function penalised using MDL, of the form:

    L_pen(K) = L(K) - (1/2) N_p(K) ln N                          (18)

where L(K) is the unpenalised likelihood solution for the model with K components, N_p(K) is the number of free parameters in that model, and N is the total number of data in the data set in question. We choose, therefore, the model with the highest L_pen.

3 Results

Results in this section are from two medical data sets and a noise removal problem in image processing. The first data set comprises auto-regressive model coefficients from a study of hand tremor in patient and non-patient groups. The second data set comes from recordings of electrical activity of the brain (the electroencephalogram, or EEG) in which epileptic seizure events occur.

3.1 Tremor data set

The tremor data set consists of two-dimensional data points representing hand tremors of a set of healthy volunteers, together with a set of ten points randomly drawn from the patient population.¹ A five-component Gaussian mixture model was trained on the non-patient data using the EM algorithm. Figure 6 shows a surface mesh plot of the EVT probability for this data set. Figure 7 shows the data along with P = 0.95 boundaries from (thin line) integrating the mixture model and (thick line) the extreme-value distribution. The latter gives rise to fewer false positives and, it is argued here, a qualitatively more reasonable novelty measure. It should be noted that the use of thresholded probability measures is for presentation purposes only; novelty measures should ideally remain continuous.

¹ The final numbers of patient and non-patient data were arranged to be equal, as in the original study a supervised classification method was used [16].

3.2 Epilepsy data set

The second data set consists of a single channel of EEG from an epileptic subject. The signal was parameterised over one-second segments using information from the singular spectrum of a phase-space embedding (see [17] for details). A four-component Gaussian mixture model was adapted to the parameter space over a 1-minute section of the data in which no epileptic activity was observed. Figure 8 shows the resultant novelty probabilities from the error-function approach (upper trace) and the EVT approach (middle trace).

It is clear that both methods highlight, in particular, the four regions of epileptic activity in this record. The EVT method, however, gives a considerably more interpretable trace, as background (non-epileptic) activity has uniformly low probability. The lower plot of the figure shows the activity highlighted by the first region of novelty in the EVT trace (between 63 and 71 seconds into the recording). The characteristic 3 Hz `spike and wave' activity is very clear.

3.3 Image noise removal

The final example consists of the removal of `salt and pepper' noise from an image (an image of a house). A pixel-by-pixel EVT probability was calculated using grey-scale information from a 7×7 neighbourhood around each pixel, to which a single Gaussian was fitted. A pixel-by-pixel probability value was also calculated for the standard Gaussian-integral approach using the same information. The image was also median filtered using a 3×3 neighbourhood. Denoting by P[r, c] either the EVT or Gaussian-integral outlier probability for pixel [r, c], by M[r, c] the median-filtered response and by O[r, c] the observed (noisy) image, the filtered image F[r, c] is obtained from a convex combination of the noisy image and the median-filtered version:

    F[r, c] = (1 - P[r, c]) O[r, c] + P[r, c] M[r, c]            (19)

i.e. pixels are replaced by their median-filtered values only if it is believed that noise corruption has occurred. This is in contrast, of course, to the `traditional' median-filtering paradigm in which all pixels are replaced by the neighbourhood median value. It is noted that the latter approach gave results worse than either probabilistic approach. Furthermore, the convex combination defined in Equation 19 gives better results for both probability measures (EVT and Gaussian integral) than threshold-based pixel replacement (i.e. filtering a pixel if P[r, c] > threshold), even if the threshold value is optimised. Figure 9 shows the original image I (top left), the corrupted image O (top right), the filtered image obtained via the Gaussian-integral outlier probability (bottom left) and the filtered image obtained via the EVT outlier probability (bottom right). Also indicated are the normalised mean-square-error (NMSE) values (averaged per pixel) between I and each other image I', defined as

    NMSE = ⟨(I[r, c] - I'[r, c])^2⟩ / ⟨var(I[r, c])⟩

and summarised in the following table:
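The convex-combination filter of Equation 19 is simple to apply once a per-pixel outlier probability is available. The Python sketch below is an illustrative reading of Section 3.3 rather than the paper's code: it forms a per-pixel probability from a Gaussian fitted to each 7×7 neighbourhood using the Gumbel CDF as above, median-filters the noisy image, and blends the two. The synthetic test image and noise level are arbitrary.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def evt_pixel_probability(img, size=7):
    """Per-pixel EVT outlier probability from a Gaussian fitted to each
    size x size neighbourhood (an illustrative reading of Section 3.3)."""
    m = float(size * size)
    local_mean = uniform_filter(img, size)
    local_var = np.maximum(uniform_filter(img ** 2, size) - local_mean ** 2, 1e-8)
    h = np.abs(img - local_mean) / np.sqrt(local_var)   # one-sided Mahalanobis distance
    lm = np.log(m)
    mu = np.sqrt(2 * lm) - (np.log(lm) + np.log(2 * np.pi)) / (2 * np.sqrt(2 * lm))
    sigma = 1.0 / np.sqrt(2 * lm)
    return np.exp(-np.exp(-(h - mu) / sigma))           # Gumbel CDF, Equation 5

def evt_filter(noisy, size=7):
    """Equation 19: blend the noisy image with its median-filtered version."""
    noisy = noisy.astype(float)
    P = evt_pixel_probability(noisy, size)
    M = median_filter(noisy, size=3)
    return (1.0 - P) * noisy + P * M

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.uniform(0.3, 0.7, (64, 64))
    noisy = img.copy()
    salt = rng.random(img.shape) < 0.05                 # add salt-and-pepper corruption
    noisy[salt] = rng.choice([0.0, 1.0], size=int(salt.sum()))
    print(np.mean((evt_filter(noisy) - img) ** 2) / img.var())   # NMSE-style score
```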

    Method                       NMSE
    Median                       0.534
    Gaussian-integral filter     0.432
    EVT filter                   0.14

It is clear that a considerable reduction in reconstruction error is obtained using the EVT method.

4 Conclusions

The issue of determining whether some datum is indicative of abnormal or novel system behaviour is paramount in areas such as medical data analysis, where the acquisition of large quantities of abnormal data is difficult. Several approaches to screening data for such novel segments have been proposed previously. Extreme-value theory and the associated statistics provide, however, a complete and principled approach to the problem which avoids the need for cross-validation or the setting of (possibly arbitrary) novelty thresholds. EVT forms an elegant methodology for any problem in which the chance of finding extremes of some quantity is required. It is shown that a simple parametric model for the reduced variate may be used which enables the EVT distributions from a Gaussian mixture model to be estimated. These estimates fit the experimental data well. A variety of novelty problems are analysed using this method and the results are shown to be superior to those obtained using more conventional approaches.

5 Acknowledgements

Thanks are first due to Dr. Stephen Walker for help with asymptotic forms, and to the anonymous referees of earlier drafts of this paper whose patient comments have helped improve the work considerably. The author would also like to thank Dirk Husmeier and Peter Sykacek for helpful comments. The author is most grateful to Julia Spyers-Ashby of the Royal Hospital for Neurodisability, Putney, UK, for the tremor data. This work was researched whilst the author was on sabbatical leave from Imperial College at the Newton Institute for Mathematical Sciences in Cambridge, whose support is gratefully acknowledged.

References

[1] S. Roberts and L. Tarassenko. A Probabilistic Resource Allocating Network for Novelty Detection. Neural Computation, 6:270-284, 1994.

[2] L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection for the detection of masses in mammograms. In Proceedings of the Fourth IEE International Conference on Artificial Neural Networks, pages 442-447. IEE, June 1995.

[3] S. Roberts, L. Tarassenko, J. Pardey, and D. Siegwart. A Validation Index for Artificial Neural Networks. In Proceedings of the International Conference on Neural Networks and Expert Systems in Medicine and Healthcare, Plymouth, August 1994, pages 23-30.

[4] C.M. Bishop. Novelty Detection and Neural Network Validation. IEE Proc. Vision, Image and Signal Processing, 141:217-222, 1994.

[5] R.A. Fisher and L.H.C. Tippett. Limiting Forms of the Frequency Distributions of the Largest or Smallest Members of a Sample. Proc. Cambridge Phil. Soc., 24:180-190, 1928.

[6] P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events. Springer-Verlag, 1997.

[7] W. Weibull. A Statistical Distribution of Wide Applicability. J. Appl. Mechanics, 18:293-297, 1951.

[8] R.L. Smith. Handbook of Applicable Mathematics: Supplement, chapter 14, Extreme Value Theory, pages 437-472. John Wiley, 1990.

[9] E.J. Gumbel. Statistics of Extremes. Columbia University Press, New York, 1958.

[10] N. Balakrishnan and A.C. Cohen. Order Statistics and Inference. Academic Press, New York, 1991.

[11] S.L. Miller, W.M. Miller, and P.J. McWhorter. Extremal Dynamics: A unifying physical explanation of fractals, 1/f noise, and activated processes. J. Appl. Phys., 73(6):2617-2628, 1993.

[12] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Stat. Soc., 39(1):1-38, 1977.

[13] C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.

[14] J. Rissanen. Modelling by shortest data description. Automatica, 14:465-471, 1978.

[15] S.J. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian approaches to mixture modelling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133-1142, 1998.

[16] J.M. Spyers-Ashby. The Recording and Analysis of Tremor in Neurological Disorders. PhD thesis, Imperial College, University of London.

[17] I.A. Rezek and S.J. Roberts. Stochastic complexity measures for physiological signal analysis. IEEE Transactions on Biomedical Engineering, 44(9).

6 Figures

Figure 1: Plot of μ_m^{expt} against (log) m, evaluated over Monte-Carlo simulations. Note that whilst the relationship is roughly linear, a straight-line fit would be inappropriate if we wish to model over a large range of m.

Figure 2: Plot of experimental (solid line) and theoretical (dashed line) values of σ_m^{-1} and μ_m. One standard deviation error intervals are also shown for the experimental data (the thinner solid lines).

Figure 3: The form of the cumulative distribution P(x | m) for increasing values of m. Note that as m increases the resultant distributions become more extreme. Circles denote experimental results from Monte-Carlo simulations and the solid line the theoretical result of Equation 5.

Figure 4: The form of the density function p(x | m) for increasing values of m. Note that as m increases the resultant distributions become more extreme. Circles denote experimental results from Monte-Carlo simulations and the solid line the theoretical result of Equation 11.

Figure 5: Evolution of P(x | m) with m for a one-dimensional, two-component Gaussian mixture model. The components are centred at ±1, with σ_1 = 1 and σ_2 = 1/2, and have equal priors. As the number of observed data increases, the resultant extremal probabilities become less conservative. The value of m increases linearly from m = 2.

Figure 6: Mesh plot of the EVT probability (Equation 17) over the two-dimensional space of the hand tremor data.

Figure 7: Contour plot of the P > 0.95 boundary on the EVT probability (Equation 17) [thick curve] along with the P > 0.95 boundary from the error-function approach (Equation 16) [thin curve]. Data from the non-patient class are shown as open circles and data from the ten patients are shown as filled circles. Note the larger number of false positives using the integrated mixture approach and the one unavoidable false negative present in both approaches.

Figure 8: Novelty probability for epileptic seizure activity using the error-function method (upper plot) and EVT (middle plot). The lower trace shows the first period of activity flagged as novel by the EVT approach. All x-axis times are in seconds from the start of the recording.

Figure 9: Original image (top left), corrupted image (top right), median-filtered image (bottom left) and EVT-filtered image (bottom right). Normalised mean-square errors of distortion from the original are also given (panel NMSE values: 0.5312, 0.432 and 0.14).
