Novelty Detection using Extreme Value Statistics. Draft, London, UK, February 23, 1999.
Draft, February 1999

Novelty Detection using Extreme Value Statistics

Stephen J. Roberts
Department of Electrical & Electronic Engineering
Imperial College of Science, Technology & Medicine
London, UK
s.j.roberts@ic.ac.uk

February 23, 1999

Abstract

Extreme value theory (EVT) is a branch of statistics which concerns the distributions of data of unusually low or high value, i.e. in the tails of some distribution. These extremal points are important in many applications as they represent the outlying regions of normal events against which we may wish to define abnormal events. In the context of density modelling, novelty detection, or Radial-Basis Function (RBF) systems, points which lie outside the range of expected extreme values may be flagged as outliers. Whilst there has been interest in the area of novelty detection, decisions as to whether a point is an outlier or not tend to be made on the basis of exceeding some (heuristic) threshold. In this paper it is shown that a more principled approach may be taken using extreme value statistics.

1 Introduction

Extreme value theory (EVT) concerns the distributions of data of abnormally low or high value, in the tails of some data-generating distribution. As we observe more data points, the statistics of the largest (or smallest) values we observe change. Knowledge of these statistics may be useful in a number of areas, for example detecting events which are outside our expected data extremes for outlier detection, or for rejecting classification or regression on data which lies far from the expected statistics of some training data set. Novelty, outlier or abnormality detection is a method of choice in the case where the usual assumption of supervised classification breaks down. This assumption implies that the data set contains viable information regarding all classification hypotheses. This assumption will break down under two major conditions:

1. There are very few examples of an important class within the data set.
This is often the case in medical data or in condition monitoring, when data from `normal' system states are far easier and less costly to acquire than abnormal exemplars (patients with rare conditions, for example). In a classification system it becomes increasingly difficult to make decisions into this class as the
number of examples reduces. It is argued that, in such a case, it is better to formulate a model of the `normal' data and test for `novel' data against this model [1, 2].

2. The data set contains information regarding all classes in the classification set, but the classification set itself is incomplete, i.e. data from some extra class is observed at a future point. Such a classification system will classify this `novel' data erroneously into one of the classes in the classification set.

For the first of these conditions, furthermore, this paper distinguishes between two sub-cases:

1a. A data set exists which is known to be representative of `normal' data. No occurrences of `abnormalities' are believed to be present.

1b. The data set is believed to consist of a number of examples of `normal' data with a smaller, but unknown, number of `abnormal' data.

The first of these cases is, of course, simpler. A representation of the `normal' data may have any desired complexity (one can imagine a crude representation based on a single normal distribution, or a Parzen-windows approach in which the representation is based upon each datum in the set). If the same approach is applied to case 1b, the representation runs the risk of fitting abnormal data and hence making its subsequent detection more difficult. It is desirable, therefore, to impose some form of model-complexity penalty. This scheme may be used to form representations for case 1a as well. This issue has been addressed previously in the context of novelty detection by using a cross-validation heuristic [2]. More principled methods exist, however, and these are detailed in a later subsection.

These generic issues of novelty detection have been dealt with previously in several ways. Typical methods develop a semi-parametric representation using, for example, a Gaussian mixture model (GMM) against which data are assessed.
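As a toy illustration of the density-modelling approach discussed above (an illustrative sketch only, not the paper's implementation): fit a crude single-Gaussian model of `normal' data and flag points whose density falls below a heuristic cut-off. The threshold value here is arbitrary, exactly the kind of ad hoc choice the paper argues against.

```python
import numpy as np

rng = np.random.default_rng(0)

# 'Normal' training data: 1-D samples from N(0, 1).
train = rng.normal(0.0, 1.0, size=500)

# Crude 'normal' model: a single Gaussian fitted by moments.
mu, sigma = train.mean(), train.std()

def density(x):
    """Gaussian density of the fitted 'normal' model."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Heuristic novelty rule: flag x if its density falls below a threshold.
threshold = 0.01  # arbitrary choice -- the practice the paper criticises

def is_novel(x):
    return density(x) < threshold

print(is_novel(0.0), is_novel(5.0))
```

A point near the bulk of the data passes, a point far into the tail is flagged; but nothing principled fixes `threshold`, which motivates the EVT treatment that follows.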
Some methods rely upon heuristic approaches, using maximum (un-normalised) Gaussian response as a measure of novelty [1, 2], creating an artificial `novelty' class [3], or utilising thresholds on a density estimate of the `normal' data [4].

2 Theory

Extreme-value theory forms representations for the tails of distributions. When discussing the properties of the tails of a distribution we will, for convenience, discuss the right-hand tail. The same theory
applies, with minor modification, to the left-hand tail. Consider a set $Z_m = \{z_1, z_2, \ldots, z_m\}$ of $m$ i.i.d. (independent & identically distributed) random variables $z_i \in \mathbb{R}$ drawn from a distribution function $D$, and define $x_m = \max(Z_m)$, i.e. the largest element observed in $m$ samples of $z_i$. In pioneering work on classical extreme-value theory, Fisher & Tippett (1928) [5] (also see [6, 7, 8, 9, 10]) showed that if the distribution function over $x_m$, $H$, is to be stable in the limit $m \to \infty$ then it must weakly converge under a positive affine transform of the form

$$x_m \stackrel{d}{=} \sigma_m x + \mu_m \quad (1)$$

for appropriate norming parameters $\sigma_m \in \mathbb{R}^+$ (the scale parameter) and $\mu_m \in \mathbb{R}$ (the location parameter). The use of $\stackrel{d}{=}$ denotes a (weak) convergence of distributions rather than a strict equality. Fisher & Tippett proved furthermore that the normalised form

$$H(x_m \le x) \stackrel{d}{=} H\left(\sigma_m^{-1}(x - \mu_m)\right) \quad (2)$$

is the only form for the limit distribution of maxima from any $D$. The second key theorem in extreme-value statistics is referred to as the Fisher-Tippett theorem [5] and states that if $H$ is a non-degenerate limit distribution for normalised maxima of the form $\sigma_m^{-1}(x_m - \mu_m)$ then $H$ is of one of only three forms. Denoting $y_m = \sigma_m^{-1}(x_m - \mu_m)$ (often referred to as the reduced variate), these three forms are:

$$H_1(y_m) = \exp(-\exp(-y_m))$$
$$H_2(y_m; \alpha) = \begin{cases} 0 & \text{if } y_m \le 0 \\ \exp(-y_m^{-\alpha}) & \text{if } y_m > 0 \end{cases}$$
$$H_3(y_m; \alpha) = \begin{cases} \exp(-(-y_m)^{\alpha}) & \text{if } y_m \le 0 \\ 1 & \text{if } y_m > 0 \end{cases} \quad (3)$$

for $\alpha \in \mathbb{R}^+$. These three forms are respectively referred to as the type I or Gumbel, type II or Fréchet and type III or Weibull distributions. These three forms may also be regarded as special cases of the generalised extreme-value (GEV) distribution,

$$H_{\mathrm{GEV}}(y_m; \xi) = \exp\left\{-\left[1 + \xi y_m\right]^{-1/\xi}\right\} \quad (4)$$

for some $\xi \in \mathbb{R}$ (the shape parameter). This remarkable result is the equivalent, for maxima of random variables, to the central limit theorem for sums of random variables. As $m \to \infty$ the distribution of
maxima from any $D$ is in the domain of attraction of one of the three limit forms. This paper is ultimately concerned with samples assumed drawn from $D = |N(0,1)|$ (i.e. a one-sided normal distribution), whose maxima distribution converges to the Gumbel form [9, 6]. The other two limit forms will not be considered further. The interested reader is pointed to (e.g.) Embrechts et al. [6] for more detailed discussions.

2.1 The Gumbel distribution

The GEV distribution of Equation 4 gives the Gumbel distribution in the limit case of $\xi \to 0$ [8, 9, 10]. The probability of observing some extremum, $x_m \le x$, is hence given as:

$$P(x_m \le x \mid \mu_m, \sigma_m) = \exp\{-\exp(-y_m)\} \quad (5)$$

where, as before, $y_m$ is the reduced variate

$$y_m = \sigma_m^{-1}(x_m - \mu_m) \quad (6)$$

For notational simplicity we will henceforth denote $P(x_m \le x \mid \mu_m, \sigma_m) = P(x \mid m)$. The appropriate norming parameters, $\mu_m, \sigma_m$, require estimation, however. In classical extreme-value theory these parameters depend only on the sample size $m$. In [11] it is proposed that the parameters $\mu_m$ and $\sigma_m$ are coupled via:

$$\sigma_m = \sigma = c$$
$$\mu_m = \sigma \ln(m) + r \quad (7)$$

in which $c$ and $r$ are constants, so that a plot of $\mu_m$ against $\ln(m)$ is a straight line. The parameter $\mu_m$ may also be experimentally determined from Monte-Carlo simulation, as it represents the expected largest element in $m$ samples drawn from $D$, i.e. if $\{z_i\}$ represent samples from the distribution, then:

$$\mu_m^{\mathrm{expt}} = \langle \max\{z_1, \ldots, z_m\} \rangle \quad (8)$$

A plot of $\mu_m^{\mathrm{expt}}$ against $\ln(m)$ for random variables drawn from $|N(0,1)|$ (i.e. a one-sided Gaussian) is shown in Figure 1. Results were obtained over repeated Monte-Carlo simulations for a range of $m$. Note that the relationship is roughly linear, but that a linear approximation is not appropriate if we wish to describe the relationship over a wide range of $m$. Whilst maximum-likelihood
estimation of the norming parameters is required for arbitrary distributions, if the analytic form of the observations' distribution, $D$, is known then $\mu_m$ and $\sigma_m$ may be estimated in closed form. These forms are asymptotically correct as $m \to \infty$. As will be shown, however, they parameterise extreme-value distributions which fit experimental data well even for small $m$. The calculation of the norming parameters is a standard problem in EVT and the results are quoted in this paper. Considerably more detail is available in [6, 8]. The asymptotic form of the location parameter for $D = |N(0,1)|$ is given as (following, for example, the results for the normal distribution [6, p.145]):

$$\mu_m = (2 \ln m)^{1/2} - \frac{\ln \ln m + \ln 2\pi}{2(2 \ln m)^{1/2}} \quad (9)$$

and the scale parameter as:

$$\sigma_m = (2 \ln m)^{-1/2} \quad (10)$$

Figure 2 plots the experimental (Monte-Carlo) and theoretical values of $\sigma_m^{-1} \mu_m$ over a range of $m$. One-standard-deviation errors are also shown for the experimental data, estimated over the Monte-Carlo trials. Note that the fit between theoretical and experimental results is good. If the cumulative distribution is defined via Equation 5, then the associated density function may be written as:

$$p(x_m = x \mid \mu_m, \sigma_m) = \sigma_m^{-1} \exp\{-y_m - \exp(-y_m)\} \quad (11)$$

where $y_m$ is defined via Equation 6. Once more, for notational simplicity, $p(x_m = x \mid \mu_m, \sigma_m)$ is henceforth written simply as $p(x \mid m)$. Figures 3 and 4 respectively show the experimental (Monte-Carlo) and theoretical $P(x \mid m)$ and $p(x \mid m)$ against $x$ for differing values of $m$. Note that the original distribution was, once more, one-sided normal with zero mean and unit variance. Note also the movement and form of the extreme-value distribution with increasing $m$. The distributions obtained from Equations 5, 9, 10 and 11 are in good agreement with the experimental results. For the largest value of $m$ in Figure 5, however, note that there is an indication that the predicted curve is an underestimate as $P(x \mid m) \to 1$.
It is noted that in the limit $m = 1$ the resultant probabilities are obtained directly from $D = |N(0,1)|$.
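Equations 5, 9 and 10 are easy to check numerically. The sketch below is illustrative only (function and variable names are mine): it computes the closed-form norming parameters for $D = |N(0,1)|$ and compares the Gumbel prediction of Equation 5 against the empirical distribution of maxima obtained by Monte-Carlo simulation.

```python
import numpy as np

def norming_params(m):
    """Closed-form asymptotic norming parameters (Equations 9 and 10)."""
    root = np.sqrt(2.0 * np.log(m))
    mu = root - (np.log(np.log(m)) + np.log(2.0 * np.pi)) / (2.0 * root)
    sigma = 1.0 / root
    return mu, sigma

def gumbel_cdf(x, m):
    """P(x | m) of Equation 5, via the reduced variate of Equation 6."""
    mu, sigma = norming_params(m)
    y = (x - mu) / sigma
    return np.exp(-np.exp(-y))

# Monte-Carlo check: maxima of m draws from the one-sided normal |N(0,1)|.
rng = np.random.default_rng(1)
m, trials = 100, 5000
maxima = np.abs(rng.normal(size=(trials, m))).max(axis=1)

x = 3.0
empirical = (maxima <= x).mean()   # empirical P(x_m <= x)
predicted = gumbel_cdf(x, m)       # asymptotic Gumbel prediction
print(f"empirical={empirical:.3f}  predicted={predicted:.3f}")
```

Since the norming parameters are only asymptotically correct, the prediction and the simulation agree roughly rather than exactly at moderate $m$, consistent with the behaviour discussed around Figures 3-5.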
2.2 The Gaussian mixture model

The Gaussian mixture model (GMM) is widely used in data density estimation and to form the latent (hidden) space of radial-basis function networks. The choice of the Gaussian kernel is not an arbitrary one: Gaussian mixture methods and radial-basis systems using Gaussian bases both have the property of universal approximation, are analytically attractive, and are hence favoured in the literature. The form of the GMM is simple, consisting of a set of $k = 1 \ldots K$ components, each of which has standard Gaussian form and is uniquely parameterised by its mean and covariance matrix. For such a mixture, Bayes' theorem implies that the data likelihood, $p(x)$, may be written as a mixture of component (conditional) likelihoods of the form

$$p(x) = \sum_k \pi(k) \, p(x \mid k) \quad (12)$$

where $\pi(k)$ are the component priors. If we assume that the statistics of outlying events are dominated by the component closest (in the Mahalanobis sense), then we are interested in the resultant probability of data lying interior to some distance $h$ from the component mean. Integrating the (one-sided) density function gives

$$P(h(x)) = \int_{c_k}^{h(x)} p(w \mid k) \, dw \quad (13)$$

where $c_k$ is the mean of the closest component and $w$ is a dummy variable of integration. The more complex case in which $x$ is multidimensional is easily dealt with by evaluating the above equation using the Mahalanobis distance defined in its standard form:

$$h(x)_k = \sqrt{(x - c_k)^T C_k^{-1} (x - c_k)} \quad (14)$$

where $C_k$ is the covariance matrix and $c_k$ is the centroid. The Mahalanobis distance, evaluated with respect to a Gaussian component, has a density function given by:

$$p(h(x)) = \sqrt{\frac{2}{\pi}} \exp\left(-\frac{1}{2} h(x)^2\right) \quad (15)$$

Combining Equations 13 and 15 and using the standard form for the Gaussian integral gives:

$$P(h(x)) = \mathrm{erf}\left(\frac{h(x)}{\sqrt{2}}\right) \quad (16)$$

where erf(·) is the error function.
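A minimal numerical sketch of Equations 14-16 (the function names and the single-component test case are my own, not the paper's): compute the Mahalanobis distance of a point from a Gaussian component and map it through the error function to obtain the Gaussian-integral novelty measure.

```python
import math
import numpy as np

def mahalanobis(x, c, C):
    """Mahalanobis distance of x from a Gaussian component (Equation 14)."""
    d = x - c
    return float(np.sqrt(d @ np.linalg.inv(C) @ d))

def erf_probability(x, c, C):
    """P(h(x)) = erf(h(x) / sqrt(2)), the Gaussian-integral measure (Equation 16)."""
    h = mahalanobis(x, c, C)
    return math.erf(h / math.sqrt(2.0))

# Example: a single 2-D component with zero mean and identity covariance.
c = np.zeros(2)
C = np.eye(2)
print(erf_probability(np.array([0.0, 0.0]), c, C))  # 0.0 at the centre
print(erf_probability(np.array([3.0, 0.0]), c, C))  # close to 1 far from it
```

The measure grows from 0 at the component centre towards 1 in the tails; the EVT measure of the next paragraph replaces this integral with the Gumbel form evaluated on the same Mahalanobis statistic.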
To evaluate the resultant EVT probability at some point $x$, it is noted that the theory (and parametric form) of the EVT distribution derived for the $|N(0,1)|$ distribution (Equation 5) may be used directly, as $h(x)$ is expected (with respect to a Gaussian component) to be distributed as $|N(0,1)|$. Furthermore, if a total of $m$ points are observed and used to fit the model, then an estimate of the effective number of points `seen' by each component is simply $m_k = m\pi(k)$. The EVT probability measure is hence written as:

$$P(x \mid m) = P(h(x)_k \mid m_k) \quad (17)$$

where the index $k$ denotes the fact that we deal only with the component closest (in the Mahalanobis sense) to $x$, as this dominates the EVT statistics. Figure 5 shows the evolution of $P(x \mid m)$ with $m$ for the case of a one-dimensional, two-component Gaussian mixture model. The components are centred at $\pm 1$, with $\sigma_1 = 1$, $\sigma_2 = 1/2$, and have equal priors. As the number of observed data points ($m$) increases, so the resultant extremal probabilities become less conservative. The value of $m$ increases linearly from its smallest value of 2.

2.3 Model fitting and complexity penalisation

All the results in this paper are obtained using the Expectation-Maximisation (EM) algorithm [12] to estimate a maximum-likelihood parameter set for the means, covariance matrices and component priors in the mixture model. Details of the standard algorithmic approach taken to implement the EM procedure may be found in [13], for example. The likelihood function evaluated from maximum-likelihood methods, such as EM, is monotone increasing with the complexity of the model (i.e. increasing numbers of components). If we wish, however, to form a parsimonious representation of the data set, this likelihood function is modified via a function which penalises models with larger complexity (more parameters). Such penalty measures may take several forms. One of the simplest is that of minimum description length coding (MDL), first derived by Rissanen in 1978 [14].
Although other elegant paradigms exist for obtaining a penalised likelihood function, in particular Bayesian schemes, MDL is observed in practice to give results similar to those of the more (mathematically) complex Bayesian methods (see [15] for a description of the Bayesian approach and comparisons with MDL and other methods). The results in this paper were
all obtained using a (log) likelihood functional penalised using MDL, of the form:

$$L_{\mathrm{pen}}(K) = L(K) - \frac{1}{2} N_p(K) \ln N \quad (18)$$

where $L(K)$ is the unpenalised likelihood solution for the model with $K$ components, $N_p(K)$ is the number of free parameters in that model and $N$ is the total number of data in the data set in question. We choose, therefore, the model with the highest $L_{\mathrm{pen}}$.

3 Results

Results in this section are from two medical data sets and a noise-removal problem in image processing. The first data set presented comprises auto-regressive model coefficients from a study of hand tremor in patient and non-patient groups. The second data set comes from recordings of electrical activity of the brain (the electroencephalogram or EEG) in which epileptic seizure events occur.

3.1 Tremor data set

The tremor data set consists of 90 two-dimensional data points representing hand tremors of a set of healthy volunteers, and a set of ten points randomly drawn from the patient population.¹ A five-component Gaussian mixture model was trained on the non-patient data set using the EM algorithm. Figure 6 shows a surface mesh plot of the EVT probability for this data set. Figure 7 shows the data along with $P = 0.95$ boundaries from (thin line) integrating the mixture model and (thick line) the extreme value distribution. The latter gives rise to fewer false positives and, it is argued here, a qualitatively more reasonable novelty measure. It should be noted that the use of thresholded probability measures is for presentation purposes; novelty measures should ideally remain continuous.

3.2 Epilepsy data set

The second data set consists of a single channel of EEG from an epileptic subject. The signal was parameterised over one-second segments using information from the singular spectrum of a phase-space embedding (see [17] for details). A four-component Gaussian mixture model was adapted to the parameter space over a 1-minute section of the data in which no epileptic activity was observed.
¹ The final numbers of patient and non-patient data were arranged to be equal, as in the original study a supervised classification method was used [16].

Figure 8 shows the resultant novelty probabilities from (upper trace) the error-function approach and
(middle trace) the EVT approach. It is clear that both methods highlight, in particular, the four regions of epileptic activity in this record. The EVT method, however, gives a considerably more interpretable trace, as background (non-epileptic) activity has uniformly low probability. The lower plot of the figure shows the activity highlighted by the first region of novelty in the EVT trace (between 63 and 71 seconds into the recording). The characteristic 3 Hz `spike and wave' activity is very clear.

3.3 Image noise removal

The final example presented consists of the removal of `salt and pepper' noise from an image (an image of a house). A pixel-by-pixel EVT probability was calculated using grey-scale information from a 7 × 7 neighbourhood around each pixel, to which a single Gaussian was fitted. A pixel-by-pixel probability value was also calculated for the standard Gaussian-integral approach using the same information. The image was also median-filtered using a 3 × 3 neighbourhood. Denoting $P[r,c]$ as either the EVT or Gaussian-integral outlier probability for pixel $[r,c]$, $M[r,c]$ the median-filtered response and $O[r,c]$ the observed (noisy) image, the filtered images, $F[r,c]$, are obtained from a convex combination of the noisy image and the median-filtered version:

$$F[r,c] = (1 - P[r,c]) \, O[r,c] + P[r,c] \, M[r,c] \quad (19)$$

i.e. pixels are replaced by their median-filtered values only if it is believed that noise corruption occurs. This is in contrast, of course, to the `traditional' median-filtering paradigm in which all pixels are replaced by the neighbourhood median value. It is noted that the latter approach gave results worse than either probabilistic approach. Furthermore, the convex combination defined in Equation 19 gives better results for both probability measures (EVT and Gaussian integral) than threshold-based pixel replacement methods (i.e. filtering a pixel if $P[r,c] >$ threshold), even if the threshold value is optimised.
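Equation 19 is straightforward to implement. The sketch below is illustrative only: it uses a toy 1-D "image" and hypothetical outlier probabilities in place of the paper's EVT scores, and a 1-D median filter as a stand-in for the 3 × 3 filter used in the paper.

```python
import numpy as np

def median_filter_1d(img, width=3):
    """Simple 1-D median filter with edge replication (illustrative
    stand-in for the paper's 3x3 2-D median filter)."""
    pad = width // 2
    padded = np.pad(img, pad, mode='edge')
    return np.array([np.median(padded[i:i + width]) for i in range(len(img))])

def evt_blend(noisy, prob):
    """Equation 19: F = (1 - P) * O + P * M, pixel by pixel."""
    M = median_filter_1d(noisy)
    return (1.0 - prob) * noisy + prob * M

noisy = np.array([10.0, 11.0, 200.0, 12.0, 10.0])  # one salt-noise spike
prob = np.array([0.0, 0.0, 1.0, 0.0, 0.0])         # hypothetical outlier probabilities
print(evt_blend(noisy, prob))
```

Only the pixel believed to be corrupted is pulled towards its neighbourhood median; uncorrupted pixels pass through unchanged, which is precisely the advantage over unconditional median filtering noted above.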
Figure 9 shows the original image, $I$ (top left), the corrupted image, $O$ (top right), the filtered image obtained via the Gaussian-integral outlier probability (bottom left) and the filtered image obtained via the EVT outlier probability (bottom right). Also indicated are the normalised mean-square-error (NMSE) values (averaged per pixel) between $I$ and each other image, $\hat{I}$, defined as

$$\mathrm{NMSE} = \frac{\langle (I[r,c] - \hat{I}[r,c])^2 \rangle}{\langle \mathrm{var}(I[r,c]) \rangle}$$

and summarised in the following table:
Method                      NMSE
Median                      .534
Gaussian-integral filter    .432
EVT filter                  .14

It is clear that a considerable reduction in reconstruction error is obtained using the EVT method.

4 Conclusions

The issue of determining whether some datum is indicative of abnormal or novel system behaviour is paramount in areas such as medical data analysis, where the acquisition of large quantities of abnormal data is difficult. Several approaches to screening data for such novel segments have been proposed previously. Extreme-value theory and the associated statistics provide, however, a complete and principled approach to the problem which avoids the need for cross-validation or the setting of (possibly arbitrary) novelty thresholds. EVT forms an elegant methodology for any problem in which the chance of finding extremes of some quantity is required. It is shown that a simple parametric model for the reduced variate may be used which enables the EVT distributions from a Gaussian mixture model to be estimated. These estimates fit the experimental data well. A variety of novelty problems are analysed using this method and the results are shown to be superior to those obtained using more conventional approaches.

5 Acknowledgements

Thanks are first due to Dr. Stephen Walker for help with asymptotic forms, and to the anonymous referees of earlier drafts of this paper whose patient comments have helped improve the work considerably. The author would also like to thank Dirk Husmeier and Peter Sykacek for helpful comments. The author is most grateful to Julia Spyers-Ashby of the Royal Hospital for Neurodisability, Putney, UK, for the tremor data. This work was researched whilst the author was on sabbatical leave from Imperial College at the Newton Institute for Mathematical Sciences in Cambridge, whose support is gratefully acknowledged.
References

[1] S. Roberts and L. Tarassenko. A Probabilistic Resource Allocating Network for Novelty Detection. Neural Computation, 6:270-284, 1994.
[2] L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection for the identification of masses in mammograms. In Proceedings of the Fourth IEE International Conference on Artificial Neural Networks, pages 442-447. IEE, June 1995.
[3] S. Roberts, L. Tarassenko, J. Pardey, and D. Siegwart. A Validation Index for Artificial Neural Networks. In Proceedings of the International Conference on Neural Networks & Expert Systems in Medicine & Healthcare, Plymouth, August 1994, pages 23-30.
[4] C.M. Bishop. Novelty Detection and Neural Network Validation. IEE Proc. Vision, Image and Signal Processing, 141:217-222, 1994.
[5] R.A. Fisher and L.H.C. Tippett. Limiting Forms of the Frequency Distributions of the Largest or Smallest Members of a Sample. Proc. Cambridge Phil. Soc., 24:180-190, 1928.
[6] P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events. Springer-Verlag, 1997.
[7] W. Weibull. A Statistical Distribution of Wide Applicability. J. Appl. Mechanics, 18:293-297, 1951.
[8] R.L. Smith. Handbook of Applicable Mathematics: Supplement, chapter 14, Extreme Value Theory, pages 437-472. John Wiley, 1990.
[9] E.J. Gumbel. Statistics of Extremes. Columbia University Press, New York, 1958.
[10] N. Balakrishnan and A.C. Cohen. Order Statistics and Inference. Academic Press, New York, 1991.
[11] S.L. Miller, W.M. Miller, and P.J. McWhorter. Extremal Dynamics: A unifying physical explanation of fractals, 1/f noise, and activated processes. J. Appl. Phys., 73(6):2617-2628, 1993.
[12] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Stat. Soc. B, 39(1):1-38, 1977.
[13] C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.
[14] J. Rissanen. Modelling by shortest data description. Automatica, 14:465-471, 1978.
[15] S.J. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian approaches to mixture modelling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133-1142, 1998.
[16] J.M. Spyers-Ashby. The Recording and Analysis of Tremor in Neurological Disorders. PhD thesis, Imperial College, University of London.
[17] I.A. Rezek and S.J. Roberts. Stochastic complexity measures for physiological signal analysis. IEEE Transactions on Biomedical Engineering, 44(9).
6 Figures

Figure 1: Plot of $\mu_m^{\mathrm{expt}}$ against (log) $m$, evaluated over Monte-Carlo simulations. Note that whilst the relationship is roughly linear, a straight-line fit would be inappropriate if we wish to model over a large range of $m$.
Figure 2: Plot of experimental (solid line) and theoretical (dashed line) values of $\sigma_m^{-1} \mu_m$. One-standard-deviation error intervals are also shown for the experimental data (the thinner solid lines).

Figure 3: This plot shows the form of the cumulative distribution $P(x \mid m)$ for increasing values of $m$. Note that as $m$ increases the resultant distributions become more extreme. Circles denote experimental results from Monte-Carlo simulations and the solid line the theoretical result of Equation 5.
Figure 4: This plot shows the form of the density function $p(x \mid m)$ for increasing values of $m$. Note that as $m$ increases the resultant distributions become more extreme. Circles denote experimental results from Monte-Carlo simulations and the solid line the theoretical result of Equation 11.

Figure 5: Evolution of $P(x \mid m)$ with $m$ for a one-dimensional, two-component Gaussian mixture model. The components are centred at $\pm 1$, with $\sigma_1 = 1$, $\sigma_2 = 1/2$, and have equal priors. As the number of observed data increases, so the resultant extremal probabilities become less conservative. The value of $m$ increases linearly from its smallest value of 2.
Figure 6: Mesh plot of the EVT probability (Equation 17) over the two-dimensional space of the hand tremor data.

Figure 7: Contour plot of the $P > 0.95$ boundary on the EVT probability (Equation 17) [thick curve] along with the $P > 0.95$ boundary from the error-function approach (Equation 16) [thin curve]. Data from the non-patient class are shown as open circles (N = 90) and data from patients (N = 10) are shown as filled circles. Note the larger number of false positives using the integrated mixture approach and the one unavoidable false negative present in both approaches.
Figure 8: Novelty probability for epileptic seizure activity using the error-function method (upper plot) and EVT (middle plot). The lower trace shows the first period of activity flagged as novel by the EVT approach. All x-axis times are in seconds from the start of the recording.
Figure 9: Original image (top left), corrupted image (top right), median-filtered image (bottom left) and EVT-filtered image (bottom right). Normalised mean-square errors of distortion from the original are also given.
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationCOM336: Neural Computing
COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk
More informationConfidence Estimation Methods for Neural Networks: A Practical Comparison
, 6-8 000, Confidence Estimation Methods for : A Practical Comparison G. Papadopoulos, P.J. Edwards, A.F. Murray Department of Electronics and Electrical Engineering, University of Edinburgh Abstract.
More informationSmooth Bayesian Kernel Machines
Smooth Bayesian Kernel Machines Rutger W. ter Borg 1 and Léon J.M. Rothkrantz 2 1 Nuon NV, Applied Research & Technology Spaklerweg 20, 1096 BA Amsterdam, the Netherlands rutger@terborg.net 2 Delft University
More informationABSTRACT INTRODUCTION
ABSTRACT Presented in this paper is an approach to fault diagnosis based on a unifying review of linear Gaussian models. The unifying review draws together different algorithms such as PCA, factor analysis,
More informationIntelligent Systems Statistical Machine Learning
Intelligent Systems Statistical Machine Learning Carsten Rother, Dmitrij Schlesinger WS2014/2015, Our tasks (recap) The model: two variables are usually present: - the first one is typically discrete k
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationIntelligent Systems Statistical Machine Learning
Intelligent Systems Statistical Machine Learning Carsten Rother, Dmitrij Schlesinger WS2015/2016, Our model and tasks The model: two variables are usually present: - the first one is typically discrete
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationMIXTURE OF EXPERTS ARCHITECTURES FOR NEURAL NETWORKS AS A SPECIAL CASE OF CONDITIONAL EXPECTATION FORMULA
MIXTURE OF EXPERTS ARCHITECTURES FOR NEURAL NETWORKS AS A SPECIAL CASE OF CONDITIONAL EXPECTATION FORMULA Jiří Grim Department of Pattern Recognition Institute of Information Theory and Automation Academy
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationG. Larry Bretthorst. Washington University, Department of Chemistry. and. C. Ray Smith
in Infrared Systems and Components III, pp 93.104, Robert L. Caswell ed., SPIE Vol. 1050, 1989 Bayesian Analysis of Signals from Closely-Spaced Objects G. Larry Bretthorst Washington University, Department
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationGaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer
Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationa subset of these N input variables. A naive method is to train a new neural network on this subset to determine this performance. Instead of the comp
Input Selection with Partial Retraining Pierre van de Laar, Stan Gielen, and Tom Heskes RWCP? Novel Functions SNN?? Laboratory, Dept. of Medical Physics and Biophysics, University of Nijmegen, The Netherlands.
More informationGaussian Processes for Sequential Prediction
Gaussian Processes for Sequential Prediction Michael A. Osborne Machine Learning Research Group Department of Engineering Science University of Oxford Gaussian processes are useful for sequential data,
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationDimensional reduction of clustered data sets
Dimensional reduction of clustered data sets Guido Sanguinetti 5th February 2007 Abstract We present a novel probabilistic latent variable model to perform linear dimensional reduction on data sets which
More informationBayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine
Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationHIERARCHICAL MODELS IN EXTREME VALUE THEORY
HIERARCHICAL MODELS IN EXTREME VALUE THEORY Richard L. Smith Department of Statistics and Operations Research, University of North Carolina, Chapel Hill and Statistical and Applied Mathematical Sciences
More informationNon-independence in Statistical Tests for Discrete Cross-species Data
J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationConvergence Rate of Expectation-Maximization
Convergence Rate of Expectation-Maximiation Raunak Kumar University of British Columbia Mark Schmidt University of British Columbia Abstract raunakkumar17@outlookcom schmidtm@csubcca Expectation-maximiation
More informationPattern Recognition. Parameter Estimation of Probability Density Functions
Pattern Recognition Parameter Estimation of Probability Density Functions Classification Problem (Review) The classification problem is to assign an arbitrary feature vector x F to one of c classes. The
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationIn: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999
In: Advances in Intelligent Data Analysis (AIDA), Computational Intelligence Methods and Applications (CIMA), International Computer Science Conventions Rochester New York, 999 Feature Selection Based
More informationECE 275B Homework #2 Due Thursday MIDTERM is Scheduled for Tuesday, February 21, 2012
Reading ECE 275B Homework #2 Due Thursday 2-16-12 MIDTERM is Scheduled for Tuesday, February 21, 2012 Read and understand the Newton-Raphson and Method of Scores MLE procedures given in Kay, Example 7.11,
More informationLearning from Data Logistic Regression
Learning from Data Logistic Regression Copyright David Barber 2-24. Course lecturer: Amos Storkey a.storkey@ed.ac.uk Course page : http://www.anc.ed.ac.uk/ amos/lfd/ 2.9.8.7.6.5.4.3.2...2.3.4.5.6.7.8.9
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen
More informationEstimation of Gutenberg-Richter seismicity parameters for the Bundaberg region using piecewise extended Gumbel analysis
Estimation of Gutenberg-Richter seismicity parameters for the Bundaberg region using piecewise extended Gumbel analysis Abstract Mike Turnbull Central Queensland University The Gumbel statistics of extreme
More informationEXTREMAL QUANTILES OF MAXIMUMS FOR STATIONARY SEQUENCES WITH PSEUDO-STATIONARY TREND WITH APPLICATIONS IN ELECTRICITY CONSUMPTION ALEXANDR V.
MONTENEGRIN STATIONARY JOURNAL TREND WITH OF ECONOMICS, APPLICATIONS Vol. IN 9, ELECTRICITY No. 4 (December CONSUMPTION 2013), 53-63 53 EXTREMAL QUANTILES OF MAXIMUMS FOR STATIONARY SEQUENCES WITH PSEUDO-STATIONARY
More informationThe Minimum Message Length Principle for Inductive Inference
The Principle for Inductive Inference Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health University of Melbourne University of Helsinki, August 25,
More informationINTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP
INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology
More informationJournal of Biostatistics and Epidemiology
Journal of Biostatistics and Epidemiology Original Article Robust correlation coefficient goodness-of-fit test for the Gumbel distribution Abbas Mahdavi 1* 1 Department of Statistics, School of Mathematical
More informationBAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS
BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationMINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava
MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationECE 275B Homework #2 Due Thursday 2/12/2015. MIDTERM is Scheduled for Thursday, February 19, 2015
Reading ECE 275B Homework #2 Due Thursday 2/12/2015 MIDTERM is Scheduled for Thursday, February 19, 2015 Read and understand the Newton-Raphson and Method of Scores MLE procedures given in Kay, Example
More informationin a Rao-Blackwellised Unscented Kalman Filter
A Rao-Blacwellised Unscented Kalman Filter Mar Briers QinetiQ Ltd. Malvern Technology Centre Malvern, UK. m.briers@signal.qinetiq.com Simon R. Masell QinetiQ Ltd. Malvern Technology Centre Malvern, UK.
More informationExtreme Value Theory for Novelty Detection in Vital Sign Monitoring
Extreme Value Theory for Novelty Detection in Vital Sign Monitoring Samuel Hugueny Life Sciences Interface Doctoral Training Centre University of Oxford July 25, 2008 Abstract Studies indicate that a large
More informationThe Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space Robert Jenssen, Deniz Erdogmus 2, Jose Principe 2, Torbjørn Eltoft Department of Physics, University of Tromsø, Norway
More informationPDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a preprint version which may differ from the publisher's version. For additional information about this
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationExtreme Value Analysis and Spatial Extremes
Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models
More informationStatistical Filters for Crowd Image Analysis
Statistical Filters for Crowd Image Analysis Ákos Utasi, Ákos Kiss and Tamás Szirányi Distributed Events Analysis Research Group, Computer and Automation Research Institute H-1111 Budapest, Kende utca
More informationFundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationSmall sample size generalization
9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, Preprint Small sample size generalization Robert P.W. Duin Pattern Recognition Group, Faculty of Applied Physics Delft University
More informationStatistical inference for MEG
Statistical inference for MEG Vladimir Litvak Wellcome Trust Centre for Neuroimaging University College London, UK MEG-UK 2014 educational day Talk aims Show main ideas of common methods Explain some of
More informationA Generalization of Principal Component Analysis to the Exponential Family
A Generalization of Principal Component Analysis to the Exponential Family Michael Collins Sanjoy Dasgupta Robert E. Schapire AT&T Labs Research 8 Park Avenue, Florham Park, NJ 7932 mcollins, dasgupta,
More informationOverlapping Astronomical Sources: Utilizing Spectral Information
Overlapping Astronomical Sources: Utilizing Spectral Information David Jones Advisor: Xiao-Li Meng Collaborators: Vinay Kashyap (CfA) and David van Dyk (Imperial College) CHASC Astrostatistics Group April
More informationBayesian Decision Theory
Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More informationLecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1
Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring
More informationFeature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size
Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationStatistical Learning Reading Assignments
Statistical Learning Reading Assignments S. Gong et al. Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (Chapt. 3, hard copy). T. Evgeniou, M. Pontil, and T. Poggio, "Statistical
More informationBayesian Inference Course, WTCN, UCL, March 2013
Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information
More informationMC3: Econometric Theory and Methods. Course Notes 4
University College London Department of Economics M.Sc. in Economics MC3: Econometric Theory and Methods Course Notes 4 Notes on maximum likelihood methods Andrew Chesher 25/0/2005 Course Notes 4, Andrew
More information