Cross-validation of Nearest-neighbour Discriminant Analysis

A.P. White¹
Computer Centre, Elms Road, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom

Abstract

The SAS statistical package contains a general-purpose discriminant procedure, DISCRIM. Among the options available for this procedure are ones for performing nearest-neighbour discriminant analysis and cross-validation. Each of these works well enough when used separately but, when the two options are used together, an optimistic bias in cross-validated performance emerges. For certain parameter values, this bias can be dramatically large. The cause of the problem is analyzed mathematically for the two-class case with uniformly distributed data and demonstrated by simulation for normal data. The corresponding misbehaviour for multiple classes is also demonstrated by Monte Carlo simulation. A modification to the procedure, which would remove the bias, is proposed.

Key Words: SAS; cross-validation; nearest neighbour discriminant analysis; optimistic bias.

¹ A.P. White is also an Associate Member of the School of Mathematics and Statistics at the University of Birmingham.

1 Introduction

The general discriminant problem is one of deriving a model for classifying observations of unknown class membership by making use of a set of observations from the same population, whose class membership is known. To be more specific, let S be a set of n observations, each of the form (ω, x), where ω denotes membership of one of c classes. Each observation has measurements on m variables, giving the vector x. Let the prior probability for membership of class i be p(ω_i). Also, let the unconditional and class-specific probability densities at x be f(x) and f_i(x), respectively. Bayes' theorem then gives the posterior probability that an observation at x belongs to class i:

    p(\omega_i \mid x) = \frac{p(\omega_i) f_i(x)}{f(x)}    (1)

Because the classes are mutually exclusive and jointly exhaustive, this can be re-written as:

    p(\omega_i \mid x) = \frac{p(\omega_i) f_i(x)}{\sum_{j=1}^{c} p(\omega_j) f_j(x)}

Classification of a new observation at x is then carried out from the posterior probabilities. Thus x is predicted as belonging to class ω_j if

    p(\omega_j \mid x) = \max_i \, p(\omega_i \mid x).    (2)

Now, different approaches to discriminant analysis employ different methods of estimating the class-specific probability densities, f_i(x). In the parametric case, Fisher's linear discriminant analysis derives these estimates by assuming a multivariate normal distribution for the data, based on class-specific sample means and a pooled covariance matrix. (The quadratic version is similar but uses separate covariance matrices for each class.) The k-nearest-neighbour (k-nn) method, on the other hand, is nonparametric and makes no such distributional assumptions. In this approach, a kernel is formed in the measurement space. This kernel has the shape of a hypersphere and is centred on x. The volume, V, of the kernel is such that it is just large enough to contain k observations. Let k_i of these observations belong to class ω_i and let there be n_i observations in S belonging to class ω_i. Thus, summing over all c classes gives Σ k_i = k and Σ n_i = n.

Hand (1981) shows how posterior probabilities can be estimated in such a situation. The essence of his argument is as follows. The class-specific probability density for class ω_i at x is estimated by:

    \hat{f}_i(x) = \frac{k_i}{n_i V}    (3)

Similarly, the unconditional probability density at x is estimated by:

    \hat{f}(x) = \frac{k}{n V}    (4)

If the sample sizes, n_i, are proportional to the prior probabilities, p(ω_i), then the priors can also be estimated by:

    \hat{p}(\omega_i) = \frac{n_i}{n}    (5)

Substituting from Equations 3, 4 and 5 into 1 gives estimates for the posterior probabilities:

    \hat{p}(\omega_i \mid x) = \frac{k_i}{k}    (6)

On the other hand, if the sample sizes are not proportional to the priors, then an adjustment is required. In this case, let:

    p(\omega_i) = r_i \frac{n_i}{n}    (7)

where the various r_i are adjustment factors for the lack of proportionality. The estimates for the posterior probabilities now become:

    \hat{p}(\omega_i \mid x) = \frac{r_i k_i}{\sum_{j=1}^{c} r_j k_j}    (8)

2 Cross-validation Anomaly in SAS

The statistical package SAS contains a multi-purpose discriminant procedure, called DISCRIM. This procedure has options for k-nn discriminant analysis and for cross-validation. These options may be used in combination. However, the way in which this is implemented in SAS is responsible for a rather strange difficulty which arises under cross-validation, in the form of a parameter-dependent bias in the cross-validated error rate estimate.
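As a concrete illustration of Equations 3 to 6, the following short Python sketch estimates k-nn posterior probabilities for a single query point from a labelled sample. It is not the SAS implementation; it is a minimal sketch assuming proportional sampling (so Equation 6 applies), with hypothetical variable names.

```python
import numpy as np

def knn_posteriors(X, y, x0, k):
    """Estimate p(omega_i | x0) as k_i / k (Equation 6).

    X : (n, m) array of training measurements
    y : (n,) array of class labels
    x0: (m,) query point
    k : number of nearest neighbours defining the kernel
    """
    dists = np.linalg.norm(X - x0, axis=1)          # Euclidean distances to x0
    kernel = np.argsort(dists)[:k]                  # indices of the k nearest observations
    classes = np.unique(y)
    k_i = np.array([np.sum(y[kernel] == c) for c in classes])  # class counts in the kernel
    return classes, k_i / k                         # posterior estimates k_i / k

# Tiny usage example with made-up data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)
print(knn_posteriors(X, y, np.zeros(2), k=5))
```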

In certain circumstances, this bias can be dramatically large. This anomaly is, perhaps, best introduced by means of an example, leaving a more general treatment of the problem until later in this paper.

Suppose that cross-validation is being performed on a data set in which the measurement space consists of a single uniformly distributed variable, x, and that observations belong to one of two equiprobable classes. Suppose that x contains no information at all about class membership and that the sample sizes n_1 and n_2 are equal. Now, consider the behaviour of an algorithm, operating as previously specified, with a parameter setting of k = 2. The distribution of kernel membership over the cross-validation procedure will be very nearly binomial (k, p), where p = n_1/n. In this case, p = 1/2. (The distribution is not exactly binomial because p changes very slightly, according to the actual class membership of the observation being classified, but this small detail is unimportant here.)

The focus of interest is the consequences which follow from tied class membership in the kernels. In this example, approximately half the kernels would be expected to have one neighbour belonging to each class. Without loss of generality, consider what happens when a member of class 1 is subjected to cross-validatory classification in this situation, in order to estimate e_cv (the cross-validated error rate). From Equation (3), it can be seen that:

    \hat{f}_1(x) = \frac{1}{V(n_1 - 1)}    (9)

and also that:

    \hat{f}_2(x) = \frac{1}{V n_2}    (10)

As the prior probabilities are equal, it follows that p(ω_1|x) > p(ω_2|x) and the case will be classified correctly. Kernels with pure class membership will obviously produce classification in the expected direction. This leads to the expected value for the cross-validated error rate, E(e_cv), being only 1/4, rather than 1/2 as expected under a random assignment of observations to predicted classes.

With these parameter values, it is easy to see that when k is changed, the parity of k has a marked effect on the estimated error rate under cross-validation, because of the effect of ties in the kernel membership when k is even but not when it is odd.
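The mechanics of Equations 9 and 10 can be checked numerically. The sketch below is only an illustration, not the SAS code; it compares the leave-one-out density estimates for a tied k = 2 kernel and shows that the deleted observation's own class always receives the larger estimate when n_1 = n_2.

```python
# Leave-one-out density estimates for a tied kernel (k = 2, one neighbour per class).
# Assumes equal priors and n1 == n2; the kernel volume V cancels from the comparison.
def tied_kernel_densities(n1, n2, deleted_class):
    k1 = k2 = 1                     # one neighbour from each class in the kernel
    V = 1.0
    if deleted_class == 1:
        f1 = k1 / ((n1 - 1) * V)    # Equation 9: own class loses one from its sample size
        f2 = k2 / (n2 * V)          # Equation 10
    else:
        f1 = k1 / (n1 * V)
        f2 = k2 / ((n2 - 1) * V)
    return f1, f2

for cls in (1, 2):
    f1, f2 = tied_kernel_densities(50, 50, deleted_class=cls)
    winner = 1 if f1 > f2 else 2
    print(f"deleted class {cls}: f1={f1:.5f}, f2={f2:.5f} -> classified as {winner}")
# Both cases are classified as their own (true) class, so tied kernels never produce errors.
```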

Thus, for odd values of k, E(e_cv) = 1/2 but, for even values of k, an optimistic bias in E(e_cv) is clearly evident:

    E(e_{cv}) = \frac{1}{2}\left(1 - \binom{k}{k/2} 2^{-k}\right)    (for k even)    (11)

Another disturbing feature of this approach is the relationship that emerges between e_rs (the resubstitution error estimate) and e_cv. Under resubstitution, for the same parameter values, the effect of ties in kernel membership is different. In these circumstances, the class-specific probability densities will be exactly equal, leading to a tie in the posterior probabilities. In SAS these ties are evaluated conservatively (i.e. as classification errors). Consequently, for k > 1, the following relationship holds:

    e_{rs}^{(k+1)} = e_{cv}^{(k)}    (12)

(Of course, for k = 1, e_rs is zero, because each observation is its own nearest neighbour.) In fact, this relationship is quite general and can be shown to hold for any number of equiprobable classes. Moving from cross-validation with k nearest neighbours to resubstitution with k + 1 increases the number of neighbours of the same class as the test case by one. Because of the different consequences of having ties for the majority in kernel membership under resubstitution and cross-validation, this means that the judgement of majority membership will not differ between the two schemes.

Now, all these strange properties follow from the fact that, for tied kernel membership, the density estimates for the two classes are not equal (as might be naively expected) but are biased in favour of the class to which the deleted observation belongs. In the parametric situation, by contrast, this does not happen. Consider a particular observation being classified using Fisher's linear discriminant analysis. Suppose that, under resubstitution, the observation lies at a point exactly mid-way between the two group means (and hence the group-specific densities are equal). Under cross-validation, the group mean of the class to which the deleted case belongs will have moved slightly farther away from the observation itself (because this observation no longer makes a contribution to the computation of the mean) and hence the class-specific density estimate will be somewhat lower for the true class than for the other class.
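The size of the bias in Equation 11 is easy to tabulate. The following sketch (illustrative only) evaluates the expected cross-validated error rate for a range of even k, confirming the value of 1/4 quoted above for k = 2.

```python
from math import comb

# Expected cross-validated error rate under the tie behaviour described above,
# for two equiprobable, completely overlapping classes (Equation 11).
def expected_ecv_even(k):
    assert k % 2 == 0
    return 0.5 * (1 - comb(k, k // 2) * 2 ** (-k))

for k in (2, 4, 6, 8, 10, 12):
    print(k, round(expected_ecv_even(k), 4))
# k = 2 gives 0.25; the bias shrinks towards 1/2 as k grows,
# because exact ties become less likely.
```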

When the sample sizes are not equal, the effect of sample size on the estimates of group-specific density, and hence on the estimated posterior probabilities, is easily calculated for ties in kernel membership. The results are summarised in Table 1. It can easily be seen that, for |n_1 - n_2| ≤ 1, there is an optimistic bias in the classification behaviour under cross-validation. Outside these limits, the mean error rate has the appropriate theoretical value but the performance is markedly different for observations from the two different classes.

                 Actual class
    n_1 - n_2    1          2
    > 1          wrong      correct
      1          tie        correct
      0          correct    correct
     -1          correct    tie
    < -1         correct    wrong

Table 1: Classification behaviour under cross-validation, for ties in kernel membership, as a function of differences in sample size. See text for further explanation.

3 Differences in Class Location

For uniformly distributed data, it is a simple matter to generalise the argument just presented to the situation where there are differences in location between two equiprobable classes. Let one class have uniformly distributed data lying in the range (0, 1) and the other have uniform data in the range (s, 1 + s). Thus s is the separation distance between the class means and the classes overlap for a distance 1 - s on the data line. A Bayes' decision rule will give errors only where the classes overlap in the data space. Thus, the Bayes' error rate, e_b, is given simply by:

    e_b = \frac{1}{2}(1 - s)    (13)
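Table 1 can be reproduced directly from the density estimates of Equations 9 and 10. The short sketch below is an illustration only; it assumes equal prior probabilities, as in the example above, and classifies a deleted case from a tied kernel for a range of sample-size differences.

```python
# Reproduce Table 1: outcome for a deleted case when the kernel is tied (k1 == k2),
# assuming equal prior probabilities, so only the density estimates matter.
def tie_outcome(n1, n2, actual_class):
    # Leave-one-out class-specific densities (Equations 9 and 10; V and k_i cancel).
    f1 = 1.0 / (n1 - 1) if actual_class == 1 else 1.0 / n1
    f2 = 1.0 / (n2 - 1) if actual_class == 2 else 1.0 / n2
    if f1 == f2:
        return "tie"
    predicted = 1 if f1 > f2 else 2
    return "correct" if predicted == actual_class else "wrong"

n2 = 50
print("n1-n2   class 1    class 2")
for diff in (3, 1, 0, -1, -3):
    n1 = n2 + diff
    print(f"{diff:>5}   {tie_outcome(n1, n2, 1):<9}  {tie_outcome(n1, n2, 2)}")
```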

In this situation, the k-nn classification rule should approximate to the Bayes' rate, provided that k is small compared with the sample sizes. In SAS, under cross-validation, this is the case only for odd values of k. For even values of k, the effect of kernel ties in the overlap region produces an optimistic bias in E(e_cv). Just as before, this is clearly evident:

    E(e_{cv}^{(k)}) = e_b\left(1 - \binom{k}{k/2} 2^{-k}\right)    (for k even)    (14)

4 Unequal Class Probabilities

Any attempt to extend the argument just presented to the situation involving unequal class probabilities immediately runs into a complicating factor in the performance of the nearest-neighbour algorithm. This arises from the fact that the k-nn classification rule is optimal only for the special case of equal class probabilities. Cover & Hart (1967) proved that, for any number of classes, the single nearest-neighbour decision rule produces an error rate, e_1, which is bounded below by the Bayes' error rate and above by twice the Bayes' rate:

    e_b \le E(e_1) \le 2 e_b    (15)

This is easily illustrated with the sort of discrimination problem under consideration in this paper. For a two-class problem, with the sample sizes proportional to the prior probabilities, then provided that the data are smoothly distributed, the following error rate analysis is applicable. Within the region of overlap, for an observation in class 1, the probability of a classification error is simply the probability that its nearest neighbour in the data space is in class 2, i.e. p(ω_2). Likewise, the probability of mis-classifying a case in class 2 is p(ω_1). Without loss of generality, let p(ω_1) be the smaller of the two class probabilities, i.e. p(ω_1) ≤ p(ω_2). Taking class separation into account, weighting the classes by their prior probabilities and writing p(ω_2) as 1 - p(ω_1) gives:

    E(e_1) = 2(1 - s)\, p(\omega_1)\,(1 - p(\omega_1))    (16)

By contrast, the Bayes' error rate is given by the simple decision rule of classifying according to the most frequent class. Hence:

    e_b = (1 - s)\, p(\omega_1)    (17)
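A quick numerical check of the ratio implied by Equations 16 and 17 (an illustrative sketch only; the separation s cancels) shows how the single nearest-neighbour rule moves from the Bayes rate towards the Cover-Hart upper bound:

```python
# Ratio of the 1-nn error rate (Eq. 16) to the Bayes rate (Eq. 17); s cancels.
for p1 in (0.5, 0.3, 0.1, 0.01):
    e1_over_eb = 2 * (1 - p1)
    print(f"p(w1) = {p1:<5}  E(e1)/eb = {e1_over_eb:.2f}")
# 1.00 at p1 = 0.5, approaching the upper bound of 2 as p1 -> 0 (Equation 15).
```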

Clearly, E(e_1) = e_b when p(ω_1) = 1/2 and tends to the value 2e_b as p(ω_1) approaches zero. Thus, it is easier to deal with the general values of k without reference to the Bayes' error rate. For the general k-nn rule, within the region of overlap, the expected error rates for odd and even values of k can be obtained from binomial expansions, as shown below. To simplify the notation, let the prior probability of one of the classes be p. Then, by the same sort of argument used earlier, odd values of k give:

    e_o = p \sum_{i=0}^{(k-1)/2} \binom{k}{i} p^i (1-p)^{k-i} + (1-p) \sum_{i=(k+1)/2}^{k} \binom{k}{i} p^i (1-p)^{k-i}    (18)

For even values of k, two quantities can be defined:

    e = p \sum_{i=0}^{k/2-1} \binom{k}{i} p^i (1-p)^{k-i} + (1-p) \sum_{i=k/2+1}^{k} \binom{k}{i} p^i (1-p)^{k-i}    (19)

and

    e_e = e + \binom{k}{k/2} p^{k/2} (1-p)^{k/2}    (20)

Thus the true expected cross-validated error rates are:

    E(e_{cv}) = (1 - s)\, e_o    (for k odd)    (21)

and

    E(e_{cv}) = (1 - s)\, e_e    (for k even)    (22)

However, because of the way that kernel ties are evaluated in SAS:

    E(e_{cv}) = (1 - s)\, e    (for k even, in SAS)    (23)

Thus, for even values of k, an optimistic bias in the cross-validated error rate is noticeable as:

    (1 - s)\,(e_e - e) = (1 - s) \binom{k}{k/2} p^{k/2} (1-p)^{k/2}    (24)
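The binomial expressions in Equations 18 to 24 are straightforward to evaluate. The sketch below (illustrative only, with an arbitrary choice of prior p and separation s) computes the overlap-region error rates and the resulting bias for a range of k.

```python
from math import comb

def binom_pmf(i, k, p):
    return comb(k, i) * p**i * (1 - p)**(k - i)

def overlap_error_rates(k, p):
    """Return (e_o or e, tie term) for the overlap region, following Equations 18-20."""
    if k % 2 == 1:
        e = (p * sum(binom_pmf(i, k, p) for i in range(0, (k - 1)//2 + 1))
             + (1 - p) * sum(binom_pmf(i, k, p) for i in range((k + 1)//2, k + 1)))
        return e, 0.0
    e = (p * sum(binom_pmf(i, k, p) for i in range(0, k//2))
         + (1 - p) * sum(binom_pmf(i, k, p) for i in range(k//2 + 1, k + 1)))
    tie = binom_pmf(k//2, k, p)            # probability of a tied kernel
    return e, tie

p, s = 1/3, 0.0                            # example prior and class separation
for k in range(1, 9):
    e, tie = overlap_error_rates(k, p)
    true_rate = (1 - s) * (e + tie)        # Equation 21 or 22 (tie term is zero for odd k)
    sas_rate = (1 - s) * e                 # Equation 23 for even k
    print(f"k={k}: true={true_rate:.3f}  SAS-style={sas_rate:.3f}  bias={(1 - s) * tie:.3f}")
```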

5 The Normal Error Distribution

So far, the analysis in this paper of cross-validation of the k-nn algorithm in SAS has examined its behaviour in dealing with the uniform error distribution only. This distribution was chosen because of the simplicity that it lends to the theoretical analysis. This simplicity arises from the fact that the range boundaries allow the data space to be partitioned into regions which differ strongly in class-specific density. Also, within each region, the class-specific densities are necessarily constant. If we turn to examining behaviour with other distributions, such as the normal, then theoretical analysis becomes more difficult because of the fact that the class-specific densities change continuously with the value of x. Furthermore, the normal curve cannot be integrated analytically, which adds to the problem. For this reason, it was thought preferable to examine the behaviour of the algorithm with normally-distributed data by means of Monte Carlo simulation.

Twelve thousand observations were drawn from the normal (0, 1) distribution and two binary class membership indicators were simulated independently. One had exactly equal numbers of cases in each class, while the other was arranged to have exactly two-thirds of the cases in the most frequent class. Two situations were examined. In one, x was uncorrelated with class. In the other, a new variable, x1, was derived from x simply by adding the binary class membership indicator, thereby introducing a class separation distance of unity between the respective class means. Thus four conditions were arranged, as follows:

EQRAND: equiprobable classes and no class separation;
EQDIST: equiprobable classes and unit class separation;
NEQRAND: class membership odds of 2 : 1 and no class separation;
NEQDIST: class membership odds of 2 : 1 and unit class separation.

For each condition, the classification performance of the k-nn discriminant algorithm in SAS was estimated with the cross-validation option. Prior probabilities were estimated from the data. Each condition was tested using values of k from 1 to 12. For comparison purposes, Fisher's linear discriminant analysis was also applied to the same data.
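The Monte Carlo experiment can be approximated in a few lines of code. The sketch below is not the SAS DISCRIM procedure; it is a minimal leave-one-out re-implementation of the density and tie behaviour described above, run on a smaller sample (2000 rather than 12000 observations) for the EQRAND condition only, with hypothetical function names.

```python
import numpy as np

def sas_style_loo_knn_error(x, y, k):
    """Leave-one-out k-nn error rate mimicking the behaviour described above:
    priors fixed at n_i/n, but the density for the deleted case's own class
    uses (n_i - 1) observations, and posterior ties are counted as errors."""
    n = len(y)
    classes, counts = np.unique(y, return_counts=True)
    errors = 0
    for i in range(n):
        d = np.abs(x - x[i])
        d[i] = np.inf                                # delete the case itself
        kernel = np.argsort(d)[:k]                   # k nearest remaining observations
        scores = np.empty(len(classes))
        for j, (c, n_c) in enumerate(zip(classes, counts)):
            k_c = np.sum(y[kernel] == c)
            n_eff = n_c - 1 if c == y[i] else n_c    # deletion affects the own class only
            scores[j] = (n_c / n) * k_c / n_eff      # prior times density (V cancels)
        top = np.flatnonzero(scores == scores.max())
        if len(top) > 1 or classes[top[0]] != y[i]:
            errors += 1                              # ties in the posterior count as errors
    return errors / n

rng = np.random.default_rng(1)
n = 2000                                             # smaller than the paper's 12000, for speed
x = rng.normal(size=n)
y = np.repeat([0, 1], n // 2)                        # EQRAND: equiprobable classes, x uninformative
for k in range(1, 7):
    print(k, round(sas_style_loo_knn_error(x, y, k), 3))
# Even k give impossibly optimistic rates (about 0.25 at k = 2); odd k stay near 0.5.
```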

Table 2 (numerical entries not legible in this copy): Cross-validation estimates of error rates as a function of k and experimental condition (EQRAND, EQDIST, NEQRAND, NEQDIST), obtained in Monte Carlo simulation. The parametric cross-validation estimates for Fisher's linear discriminant analysis are also given, in addition to the Bayes error rate. See text for further explanation.

The resulting error rates are shown in Table 2. For each experimental condition, there is the same type of optimistic bias in E(e_cv), for even values of k, as was deduced analytically for the uniform error distribution. It is obvious that these estimates are impossibly optimistic because they are smaller than the Bayes' rate, most noticeably for k = 2. These results are hardly surprising, because the essence of the problem lies in the high frequency of kernel ties and the way that they are evaluated in SAS, and is not dependent on the particular within-class error distribution.

6 Extension to Multiple Classes by Monte Carlo Simulation

If more than two classes are considered, the position becomes more complex because many different possibilities for ties emerge for majority kernel membership. For example, if c = 3 and k = 6, a three-way tie is possible, as well as three possible two-way ties. Also, ties emerge even when k is not a multiple of c. For example, with k = 6 and four classes, a kernel might have two observations from each of two classes and a single observation from each of the other two classes. Thus, an exact analytic approach to examining the effect of ties for more than two classes is extremely tedious and not worth the trouble. For this reason, a simple Monte Carlo simulation was performed, as follows.

Twelve thousand observations were drawn from a uniform distribution and class membership indicator variables for 2, 3, 4, 5 and 6 classes were simulated (independently of the continuous variable), so as to have exactly equal numbers of observations in each class. Thus five data sets were generated, all sharing the same independent variable, which conveyed no information about class membership. The classification performance of k-nn discriminant analysis was estimated using the DISCRIM procedure in SAS, with the cross-validation option. Each data set was tested using values of k from 1 to 12. The resulting error rates are shown in Table 3. The following points should be noted.

1. The error rates for c = 2 are as expected from Equation 11.
2. In all cases where k = 1, the error rates approximate closely to the theoretical expected values.
3. In all situations where c > 2 and k > 1, the results show clearly that the estimated error rates are substantially lower than the corresponding theoretical expected values.
4. Resubstitution estimates of the error rates were also recorded. For k > 1, they confirmed exactly the relationship with the cross-validation estimates stated in Equation 12.
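Although the exact analysis is tedious by hand, the tie probabilities are easy to enumerate by machine. The sketch below is illustrative only; it assumes each neighbour's class is independently and uniformly distributed over the c equiprobable classes (ignoring finite-sample effects), and compares the expected error rate when a tie involving the true class is always resolved in its favour with the unbiased rate obtained by breaking such ties at random.

```python
from itertools import product
from math import factorial

def tie_bias(c, k):
    """Expected error rates for an uninformative variable and c equiprobable classes,
    assuming the k neighbour classes are i.i.d. uniform over the c classes."""
    biased = unbiased = 0.0
    for counts in product(range(k + 1), repeat=c):
        if sum(counts) != k:
            continue
        prob = factorial(k) / (c ** k)        # multinomial probability of this composition
        for m in counts:
            prob /= factorial(m)
        top = max(counts)
        winners = [j for j, m in enumerate(counts) if m == top]
        # Without loss of generality the deleted (true) class is class 0.
        if 0 in winners:
            unbiased += prob * (1 - 1 / len(winners))   # random choice among tied winners
            # biased rule: the true class always wins such a tie, so no error is added
        else:
            biased += prob
            unbiased += prob
    return biased, unbiased

for c in (2, 3, 4):
    for k in (2, 3, 6):
        b, u = tie_bias(c, k)
        print(f"c={c} k={k}: biased={b:.3f}  unbiased={u:.3f}  theoretical={1 - 1/c:.3f}")
# The unbiased rule recovers the theoretical rate 1 - 1/c; the biased rule falls below it
# whenever c > 2 and k > 1, and for even k when c = 2.
```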

Table 3 (numerical entries not legible in this copy): Cross-validation estimates of error rates as a function of k and number of classes, obtained in Monte Carlo simulation. The parametric cross-validation estimates for Fisher's linear discriminant analysis are also given, in addition to the Bayes error rate. See text for further explanation.

7 A Possible Remedy

The basis of the problem lies in peculiarities of the density estimation procedure in the k-nn algorithm under cross-validation, compounded by the high frequency of kernel ties. However, it is possible to compensate for this by making adjustments to the estimates of the prior probabilities. Hence, one solution to the problem is to estimate the prior probabilities from the data after case deletion, rather than fix them from the outset as is done conventionally.² If this course of action is taken then, under cross-validation, if the deleted case belongs to class i, the prior probability for membership of class i is then estimated by:

    \hat{p}(\omega_i) = \frac{n_i - 1}{n - 1}    (25)

² Note that this adaptation is proposed for the nonparametric algorithm only.

However, the prior probability for membership of any of the other classes, j, is given by:

    \hat{p}(\omega_j) = \frac{n_j}{n - 1}    (26)

The corresponding class-specific densities at x are estimated as:

    \hat{f}_i(x) = \frac{k_i}{V(n_i - 1)}    (27)

and

    \hat{f}_j(x) = \frac{k_j}{V n_j}    (28)

and the unconditional density is estimated by:

    \hat{f}(x) = \frac{k}{(n - 1) V}    (29)

Thus the corresponding estimated posterior probabilities become:

    \hat{p}(\omega_i \mid x) = \frac{k_i}{k}    (30)

and

    \hat{p}(\omega_j \mid x) = \frac{k_j}{k}    (31)

In these circumstances, if there is a tie for majority kernel membership involving the class to which the deleted case belongs, then there will also be a tie in the estimated posterior probabilities. It is proposed that a random classification choice is made between the tied classes in these cases. If this is done, then the resulting error rate will be essentially unbiased.

Of course, the approach just described is appropriate only when the samples have been drawn so as to be representative of the populations that they are intended to represent. If the priors are non-proportional, then this approach needs to be modified. In this situation, the priors must be specified initially by the user. To begin with, this additional information is ignored and the computation is performed as previously specified, up to the point where the posterior probabilities are estimated. However, the posterior probabilities then need to be adjusted for the lack of proportionality before the assignment to classes is made. Thus, if π_i is the user-specified prior for class ω_i, then the adjustment factor required is:

    r_i = \frac{\pi_i}{\hat{p}(\omega_i)}    (32)

The appropriately adjusted estimate of the posterior probability is then given by Equation 8.
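A minimal sketch of the proposed remedy (again not SAS code; hypothetical function names, equiprobable classes and a single uninformative variable are assumed) shows that the deletion-adjusted priors together with random tie-breaking remove the even-k bias:

```python
import numpy as np

def remedied_loo_knn_error(x, y, k, rng):
    """Leave-one-out k-nn error with priors re-estimated after case deletion
    (Equations 25-31) and ties in the posterior broken at random."""
    n = len(y)
    classes = np.unique(y)
    errors = 0
    for i in range(n):
        d = np.abs(x - x[i])
        d[i] = np.inf
        kernel = np.argsort(d)[:k]
        k_counts = np.array([np.sum(y[kernel] == c) for c in classes])
        post = k_counts / k                      # Equations 30 and 31: k_i / k for every class
        winners = np.flatnonzero(post == post.max())
        choice = classes[rng.choice(winners)]    # random choice among tied classes
        errors += (choice != y[i])
    return errors / n

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
y = np.repeat([0, 1], n // 2)                    # equiprobable classes, uninformative x
for k in range(1, 7):
    print(k, round(remedied_loo_knn_error(x, y, k, rng), 3))
# With the deletion-adjusted priors the posteriors reduce to k_i / k for every class,
# so even values of k no longer show the optimistic bias; all estimates sit near 0.5.
```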

8 Conclusion

This is an interesting example of a problem occurring in statistical software which is caused, not by a computing error, but by a mathematical one. The misbehaviour of the k-nn algorithm under cross-validation is entirely deducible from the mathematics given in the SAS manual (SAS Institute, 1989). In this respect, the nature of the problem is similar to the one reported by White & Liu (1993), in which a stepwise discriminant algorithm is improperly cross-validated. In both cases, the respective problems arose because of lack of consideration of the effects of combining techniques. In the case of SAS, the k-nn algorithm works well enough when considered in isolation and so does the cross-validation technique. The difficulty arises when the two techniques are used in combination. Apart from SAS, nonparametric discriminant techniques are not available in the commonly used statistical software with which the author is familiar. Hence, problems encountered by combining these two techniques do not seem to have been encountered elsewhere.

The solution offered in this paper keeps as close as possible to the original philosophies of both cross-validation and k-nn discriminant analysis. It involves estimating the prior probabilities from the data after the case deletion which forms part of the cross-validation procedure. The only possibly contentious aspect is the proposed use of random choice between predicted classes in the case of ties in posterior probabilities. One feature of this approach is that the procedure is non-repeatable. However, there are precedents for this type of procedure. Tocher (1950) proposed a modification to Fisher's exact probability test which utilised random choice in order to achieve specified α values for significance testing purposes. Also, the use of approximate randomization techniques for conducting significance tests has been described by Edgington (1980), Still & White (1981) and White & Still (1984, 1987). The proposal here is to make use of random choice to achieve an unbiased estimate of classification performance when kernel ties are encountered.

Acknowledgements

The author would like to thank Prof. J.B. Copas (from the Department of Statistics at the University of Warwick) and Prof. A.J. Lawrance and Dr. P. Davies (both from the School of Mathematics and Statistics at the University of Birmingham) for their helpful comments.

References

Cover, T.M. and Hart, P.E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13.

Edgington, E.S. (1980). Randomization Tests. New York: Marcel Dekker.

Hand, D.J. (1981). Discrimination and Classification. New York: John Wiley & Sons Ltd.

SAS Institute Inc. (1989). SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 1. Cary, NC, USA: SAS Institute Inc.

Still, A.W. and White, A.P. (1981). The approximate randomization test as an alternative to the F test in analysis of variance. British Journal of Mathematical and Statistical Psychology, 34.

Tocher, K.D. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika, 37.

White, A.P. and Liu, W.Z. (1993). The jackknife with a stepwise discriminant algorithm - a warning to BMDP users. Journal of Applied Statistics, 20 (1).

White, A.P. and Still, A.W. (1984). Monte Carlo analysis of variance. In Proceedings of the Sixth Symposium in Computational Statistics (Prague). Vienna: Physica-Verlag.

White, A.P. and Still, A.W. (1987). Monte Carlo randomization tests: A reply to Bradbury. British Journal of Mathematical and Statistical Psychology, 40.
