ALGEBRAIC STABILITY INDICATORS FOR RANKED LISTS IN MOLECULAR PROFILING

1 ALGEBRAIC STABILITY INDICATORS FOR RANKED LISTS IN MOLECULAR PROFILING. Giuseppe, MPBA, Fondazione Bruno Kessler, Trento, Italy. Sandpit on the Application of Machine Learning Techniques to the Analysis of Complex Biomedical Data, London, 6th July 2009.

2 THE PROBLEM The data are a matrix in M(n \times p, \mathbb{R}), with n \approx 10^3 samples and p \approx 10^5 genes / 10^6 SNPs. A result of a classification/ranking procedure is a list of features, ranked according to their relevance for the classifier. To ensure repeatability and to avoid overfitting due to information-leakage phenomena such as selection bias, the methodology has to be carefully designed.

3 THE PROBLEM Error curve for the Colon cancer dataset (left), with true labels (blue) and random labels (black).

4 THE PROBLEM Error curves for the two synthetic datasets (right): f with 1000 discriminant features (black) and f with no discriminant features (blue).

5 THE PROBLEM Result of a multi-institution analysis of papers on gene expression profiling in a leading journal: inability to reproduce the analysis in more than 50% of the cases, partial reproduction in about one third, perfect reproduction in 11%.

6 THE PROBLEM Solution: use a correctly planned Data Analysis Protocol (DAP) implementing a resampling scheme. At each run i = 1, ..., B an ordered list L_i is produced, so the output is a matrix in M(B \times d, \mathbb{N}), with B \approx 10^3. THE SET OF LISTS L = \{L_i\}_{i=1}^{B} is the set of all lists; for each list, L_i^k is its top-k list, i.e. the sublist consisting of the first k ranked elements of L_i. What information can be derived from the structure of the set of feature lists?
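
A minimal sketch of such a resampling DAP, assuming a linear SVM from scikit-learn as the ranker; the classifier, split sizes and B below are illustrative choices, not the protocol of the slides.

# Sketch of a resampling DAP: B train/test splits, each producing a ranked
# feature list from the absolute weights of a linear SVM.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import LinearSVC

def resampling_lists(X, y, B=400, random_state=0):
    """Return a (B, p) array: row i holds the feature indices of list L_i, best first."""
    lists = []
    splitter = StratifiedShuffleSplit(n_splits=B, test_size=0.3, random_state=random_state)
    for train_idx, _ in splitter.split(X, y):
        clf = LinearSVC(C=1.0, dual=False).fit(X[train_idx], y[train_idx])
        scores = np.abs(clf.coef_).ravel()      # relevance of each feature for the classifier
        lists.append(np.argsort(-scores))       # feature indices sorted by decreasing relevance
    return np.vstack(lists)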

7 PERMUTATION GROUP THEORY
GROUP A group is a set together with an associative operation which admits an identity element and such that every element has an inverse.
PERMUTATIONS A permutation on a set \Omega_n = \{s_1, s_2, \ldots, s_n\} of n elements is a bijection \tau : \Omega_n \to \Omega_n, s_i \mapsto \tau(s_i) = s_j. The number of different permutations on a set of n elements is n!.
SYMMETRIC GROUP The set of all permutations on a set \Omega_n (usually \Omega_n = \mathbb{N} \cap [1, n]), endowed with the composition \sigma\tau : \Omega_n \to \Omega_n, s_i \mapsto \sigma(\tau(s_i)) = s_k, becomes a group S_n, called the permutation group or symmetric group.

8 LIST REPRESENTATION
Example: the symmetric group S_3 consists of
\{1 \to 1, 2 \to 2, 3 \to 3\}, \{1 \to 2, 2 \to 1, 3 \to 3\}, \{1 \to 3, 2 \to 2, 3 \to 1\}, \{1 \to 1, 2 \to 3, 3 \to 2\}, \{1 \to 2, 2 \to 3, 3 \to 1\}, \{1 \to 3, 2 \to 1, 3 \to 2\}.
REPRESENTATION Labelling the n genes involved in a problem by \mathbb{N} \cap [1, n], every ranked list L_t can be identified with a permutation \tau_t \in S_n, the image \tau_t(i) of the i-th gene being its ranking inside the list L_t (e.g. gene AFFX-MurIL2_at, i = 1, has \tau_t(1) = 13347, while the last gene, i = n, has \tau_t(n) = 175). Thus \tau can be written as \tau = (\tau(1), \ldots, \tau(n)) = (13347, \ldots, 175).
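
A minimal sketch of this identification, assuming gene identifiers are strings and ranks are 1-based; the toy gene names below are illustrative, not taken from the slides.

# Sketch: turn a ranked list of gene identifiers (best first) into the
# permutation vector tau, where tau[i] is the rank of gene i.
def list_to_permutation(ranked_genes, all_genes):
    rank_of = {g: r for r, g in enumerate(ranked_genes, start=1)}
    return [rank_of[g] for g in all_genes]

# illustrative toy example with five genes labelled A..E
all_genes = ["A", "B", "C", "D", "E"]
print(list_to_permutation(["A", "C", "D", "B", "E"], all_genes))  # -> [1, 4, 2, 3, 5]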

9 METRICS ON PERMUTATION GROUPS
DEFINITION A map d defined on a permutation group S, d : S \times S \to \mathbb{R}, is a distance function or a metric if it satisfies the following axioms:
NON-NEGATIVITY d(\sigma, \tau) \ge 0 for all \sigma, \tau \in S
SEPARATION d(\sigma, \tau) = 0 \iff \sigma = \tau for all \sigma, \tau \in S
SYMMETRY d(\sigma, \tau) = d(\tau, \sigma) for all \sigma, \tau \in S
TRIANGLE INEQUALITY d(\sigma, \tau) + d(\tau, \rho) \ge d(\rho, \sigma) for all \sigma, \tau, \rho \in S
RIGHT INVARIANCE d(\sigma, \tau) = d(\sigma\rho, \tau\rho) for all \sigma, \tau, \rho \in S
Triangularity might sometimes be relaxed to less strict requirements (e.g. polygonal inequalities), which makes d a pseudometric.

10 RIGHT INVARIANCE
The fifth property, d(\sigma, \tau) = d(\sigma\rho, \tau\rho) for all \sigma, \tau, \rho \in S, is an essential requirement for distance functions on permutation group spaces.
AN EXAMPLE L_1 = \{A, C, D, B, E\} and L_2 = \{B, C, D, E, A\}, via the identification A \to 1, B \to 2, C \to 3, D \to 4, E \to 5, correspond to \tau_1 = (1, 4, 2, 3, 5) and \tau_2 = (5, 1, 2, 3, 4). Apply the permutation \rho corresponding to the relabeling A \to 2, B \to 5, C \to 4, D \to 1, E \to 3: then \tau_1\rho = (3, 1, 5, 2, 4) and \tau_2\rho = (3, 5, 4, 2, 1). Right invariance corresponds to relabeling independence.
The Euclidean distance d(\tau, \sigma) = \sqrt{\sum_{i=1}^{n} (\tau^{-1}(i) - \sigma^{-1}(i))^2} is not right-invariant.
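
A quick numerical check of this point (an illustrative sketch, with composition acting as relabeling on 0-based permutation tuples): the footrule on ranks is right-invariant, while the Euclidean distance on the inverse permutations generally is not.

# Sketch: compare a right-invariant distance (Spearman's footrule on ranks) with
# the non-invariant Euclidean distance on inverse permutations.
import math, random

def compose(a, b):                 # (a b)(i) = a(b(i)); permutations as 0-based tuples
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for i, v in enumerate(a):
        inv[v] = i
    return tuple(inv)

def footrule(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclid_on_inverse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(inverse(a), inverse(b))))

n = 6
rand_perm = lambda: tuple(random.sample(range(n), n))
s, t = rand_perm(), rand_perm()
for _ in range(5):
    r = rand_perm()
    print(footrule(s, t) == footrule(compose(s, r), compose(t, r)),             # always True
          math.isclose(euclid_on_inverse(s, t),
                       euclid_on_inverse(compose(s, r), compose(t, r))))        # usually False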

11 STATISTICAL DISTANCES
SPEARMAN'S FOOTRULE The L^1 distance between permutations: F(\sigma, \tau) = \sum_{i=1}^{n} |\sigma(i) - \tau(i)|.
SPEARMAN'S RHO The L^2 distance between permutations: R(\sigma, \tau) = \sqrt{\sum_{i=1}^{n} (\sigma(i) - \tau(i))^2}.
KENDALL'S TAU The minimum number of pairwise adjacent transpositions (swaps) needed to transform \sigma into \tau: K(\sigma, \tau) = \mathrm{Card}\{(i, j) \in \Omega_n^2 : \sigma(i) < \sigma(j) \text{ and } \tau(i) > \tau(j)\}.
All these measures are metrics.
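
A small sketch computing the three distances on rank vectors (the quadratic pair count for Kendall's tau is fine at these sizes); the example vectors are \tau_1 and \tau_2 from the relabeling example above.

# Sketch: the three classical distances between rank vectors sigma and tau.
import math
from itertools import combinations

def footrule(sigma, tau):
    return sum(abs(s - t) for s, t in zip(sigma, tau))

def spearman_rho(sigma, tau):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(sigma, tau)))

def kendall_tau(sigma, tau):
    # number of discordant pairs, i.e. pairwise adjacent transpositions needed
    return sum(1 for i, j in combinations(range(len(sigma)), 2)
               if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0)

sigma, tau = [1, 4, 2, 3, 5], [5, 1, 2, 3, 4]
print(footrule(sigma, tau), spearman_rho(sigma, tau), kendall_tau(sigma, tau))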

12 THE CANBERRA DISTANCE
A key fact about gene lists is that variations in the bottom part of the lists are much less relevant than differences in the top part, so a weight factor is needed to take care of this requirement. A natural candidate to substitute Spearman's footrule is its weighted version, the Canberra distance:
Ca(\tau, \sigma) = \sum_{i=1}^{n} \frac{|\tau(i) - \sigma(i)|}{\tau(i) + \sigma(i)}.
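
For positive rank vectors this is exactly the classical Canberra distance between numeric vectors; a minimal sketch, checked against scipy's implementation:

# Sketch: Canberra distance between two rank vectors; for positive ranks it
# coincides with scipy.spatial.distance.canberra.
from scipy.spatial.distance import canberra

def canberra_ranks(tau, sigma):
    return sum(abs(t - s) / (t + s) for t, s in zip(tau, sigma))

tau, sigma = [1, 4, 2, 3, 5], [5, 1, 2, 3, 4]
print(canberra_ranks(tau, sigma), canberra(tau, sigma))  # same value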

13 MEAN
Let
H_s = \sum_{j=1}^{s} \frac{1}{j} = \log(s) + \gamma + \frac{1}{2s} - \frac{1}{12 s^2} + \frac{\epsilon_s}{120 s^4}
be the s-th harmonic number, with a famous approximation, where 0 < \epsilon_s < 1 and \gamma is the Euler-Mascheroni constant. Many closed forms for sums and products involving H_s have been found only very recently.
THEOREM The expected value of the Canberra distance on S_p is
E\{Ca\}_{S_p} = \left(\frac{1}{2p} + 2p + 2\right) H_{2p} - \left(\frac{1}{4p} + 2p + 2\right) H_p - \left(p + \frac{3}{2}\right).
Its approximation is
\hat{E}\{Ca\}_{S_p} = (p + 1)\log(4) - (p + 2) + o(1).
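
A quick numerical sanity check of the expected value as reconstructed above (exact enumeration of S_p is feasible only for tiny p; for larger p a Monte Carlo estimate is used, exploiting right invariance to fix one list to the identity):

# Sketch: check the expected Canberra distance on S_p against brute force (tiny p)
# and against the (p+1)log(4) - (p+2) approximation (larger p, Monte Carlo).
import math, random
from itertools import permutations

def canberra(tau, sigma):
    return sum(abs(t - s) / (t + s) for t, s in zip(tau, sigma))

def H(n):
    return sum(1.0 / j for j in range(1, n + 1))

def expected_exact(p):
    return (1/(2*p) + 2*p + 2) * H(2*p) - (1/(4*p) + 2*p + 2) * H(p) - (p + 1.5)

p = 6
ident = tuple(range(1, p + 1))
brute = sum(canberra(ident, s) for s in permutations(ident)) / math.factorial(p)
print(brute, expected_exact(p))                  # should agree

p = 500
ident = tuple(range(1, p + 1))
mc = sum(canberra(ident, random.sample(ident, p)) for _ in range(200)) / 200
print(mc, expected_exact(p), (p + 1) * math.log(4) - (p + 2))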

14 VARIANCE
Let
c_p(i, j) = \frac{|i - j|}{i + j} and f_p(i) = \frac{1}{p}\sum_{j=1}^{p} c_p(i, j) = 1 + \frac{2i}{p}\left(2H_{2i} - H_i - H_{p+i} - 1\right).
THEOREM The variance cannot be written in an entirely closed form:
V\{Ca\}_{S_p} = \frac{1}{p-1}\sum_{i,j=1}^{p}\left(c_p^2(i, j) + \left(\frac{E\{Ca\}_{S_p}}{p}\right)^2 - 2\, c_p(i, j)\, f_p(j)\right)
= P(H_p, H_{2p}) plus terms involving the sums \sum_{i=1}^{p} i^2 H_{p+i} H_{2i} and \sum_{i=1}^{p} i^2 H_{p+i} H_i, which admit no closed form.
Its approximation is
\hat{V}\{Ca\}_{S_p} = \alpha p + \beta \log(p) + \delta + o(p^{-1}), with \alpha, \beta, \delta \in \mathbb{R}.

15 DISTRIBUTION
Let
d_p(i, j) = c_p(i, j) - \frac{1}{p}\sum_{l=1}^{p} c_p(i, l) - \frac{1}{p}\sum_{m=1}^{p} c_p(m, j) + \frac{1}{p^2}\sum_{l=1}^{p}\sum_{m=1}^{p} c_p(m, l).
Since
\max_{1 \le i, j \le p} d_p^2(i, j) = d_p^2(1, 1), with d_p(1, 1) = \frac{1}{p}\left(E\{Ca\}_{S_p} + 4H_{p+1} - 2p - 4\right),
then
\lim_{p \to \infty} \frac{\max_{1 \le i, j \le p} d_p^2(i, j)}{\frac{p-1}{p}\, V\{Ca\}_{S_p}} = 0,
thus by Hoeffding's theorem we have
THEOREM The distribution of the Canberra distance on S_p is asymptotically normal.

16 UPPER BOUND
The problem \rho = \arg\max_{S_p}(Ca_I) has one solution \rho_M for p even and two solutions for p odd, namely
\rho_M = \prod_{i=1}^{p/2}\left(i,\ \frac{p}{2} + i\right) for even p, and \theta together with \theta^{-1} for odd p, where \theta(i) = i + \frac{p-1}{2} for i \le \frac{p+1}{2} and \theta(i) = i - \frac{p+1}{2} otherwise,
while the corresponding maximum value is
\max(Ca) = Ca_I(\rho_M) =
2r(H_{3r} - H_r) if p = 4r
(2r+1)H_{6r+1} + rH_{3r+1} - \frac{2r+1}{2}H_{3r} - (2r+1)H_{2r+1} + \frac{1}{2}H_r if p = 4r + 1
(2r+1)(2H_{6r+3} - H_{3r+1} - 2H_{2r+1} + H_r) if p = 4r + 2
(2r+1)H_{6r+5} + \frac{1}{2}H_{3r+2} - (2r+1)H_{2r+1} - (r+1)H_{r+1} + \frac{2r+1}{2}H_r if p = 4r + 3

17 DISTANCES ON TOP-k LISTS
Since the most important genes are located in the upper part of the lists, it is also important to be able to compare those more important portions of the lists. Managing partial lists is much harder than managing global lists, since they do not necessarily involve the same elements.
TOP-k LIST A top-k list is the sublist including the elements ranking from position 1 to k of the original list, which corresponds to inducing a partial ordering on k out of n objects. This has a natural counterpart in the group-theoretical framework via the Hausdorff topology:
DISTANCE WITH LOCATION PARAMETER
Consider a global distance D and two top-k lists \tau_m, m = 1, 2, involving respectively the subsets D_{\tau_m} = \tau_m^{-1}(\mathbb{N} \cap [1, k]) of \Omega_n.
Define the global list \bar\tau_m by \bar\tau_m(i) = \tau_m(i) for i \in D_{\tau_m} and \bar\tau_m(i) = k + 1 otherwise.
Define D^{(k+1)}(\tau_1, \tau_2) = D(\bar\tau_1, \bar\tau_2).

18 CANBERRA DISTANCE WITH LOCATION PARAMETER
Ca^{(k+1)}(\tau, \sigma) = \sum_{i=1}^{p} \frac{|\min\{\tau(i), k+1\} - \min\{\sigma(i), k+1\}|}{\min\{\tau(i), k+1\} + \min\{\sigma(i), k+1\}}.
THEOREM The expected (average) value of the Canberra metric with location parameter on S_p is
E\{Ca^{(k+1)}\}_{S_p} = \frac{k}{p}\left[\left(\frac{1}{2k} + 2k + 2\right)H_{2k} - \left(\frac{1}{4k} + 2k + 2\right)H_k - \left(k + \frac{3}{2}\right)\right] + \frac{2(p-k)}{p}\left(2(k+1)(H_{2k+1} - H_{k+1}) - k\right),
which can be approximated up to terms o(1) as
\hat{E}\{Ca^{(k+1)}\}_{S_p} = \frac{(k+1)(2p-k)}{p}\log(4) - \frac{2kp + 3p - k^2 - k}{p}.
For p \approx 10^4, E\{Ca^{(k+1)}\}_{S_p} \approx \hat{E}\{Ca^{(k+1)}\}_{S_p}.
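
A sketch of the distance with location parameter, together with a Monte Carlo check of the expected value as reconstructed above (again fixing one list to the identity by right invariance):

# Sketch: Canberra distance with location parameter k between two full rank
# vectors, and a Monte Carlo check of its expected value on S_p.
import random

def ca_location(tau, sigma, k):
    t = [min(x, k + 1) for x in tau]
    s = [min(x, k + 1) for x in sigma]
    return sum(abs(a - b) / (a + b) for a, b in zip(t, s))

def H(n):
    return sum(1.0 / j for j in range(1, n + 1))

def expected_ca_location(p, k):
    full = (1/(2*k) + 2*k + 2) * H(2*k) - (1/(4*k) + 2*k + 2) * H(k) - (k + 1.5)
    tail = 2 * (k + 1) * (H(2*k + 1) - H(k + 1)) - k
    return k / p * full + 2 * (p - k) / p * tail

p, k = 1000, 50
ident = list(range(1, p + 1))
mc = sum(ca_location(ident, random.sample(ident, p), k) for _ in range(300)) / 300
print(mc, expected_ca_location(p, k))            # should be close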

19 INCLUDING MODULES
Moreover, the existence of feature modules, for instance highly correlated genes (e.g. clones) or genes belonging to the same pathway, must be taken into account. Classifiers such as SVM tend to swap the relative importance of correlated features across different ranking runs, and those swaps should be penalized less than movements between uncorrelated genes.
MODULE-AWARE DISTANCES Let M = \{g_1, \ldots, g_m\} \subseteq \Omega_p = \{1, \ldots, p\} be a module consisting of m features. For each permutation \tau \in L, let \eta_M \in S_m be the permutation ordering the module members by rank, i.e. \tau(g_{\eta_M(1)}) < \tau(g_{\eta_M(2)}) < \cdots < \tau(g_{\eta_M(m)}). Then define the new permutation \tau_M by \tau_M(g_i) = \tau(g_{\eta_M(i)}), leaving \tau_M = \tau outside M (\tau_M|_{\Omega_p \setminus M} = \tau|_{\Omega_p \setminus M}). The distance is then computed on L_M = \{\tau_M : \tau \in L\}.
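
A sketch of the module-aware transformation, taking the module members in increasing index order (0-based indices here; this is one way to realize the canonicalization described above, under which swaps inside the module no longer contribute to the distance):

# Sketch: module-aware canonicalization of a rank vector. The ranks held by the
# module's genes are re-assigned to those genes in fixed (index) order, so
# permutations differing only by swaps inside the module map to the same vector.
def module_canonical(tau, module):
    tau_m = list(tau)
    genes = sorted(module)                    # g_1 < g_2 < ... < g_m
    ranks = sorted(tau[g] for g in genes)     # the module's ranks, ascending
    for g, r in zip(genes, ranks):
        tau_m[g] = r
    return tau_m

print(module_canonical([3, 1, 5, 2, 4], {0, 2}))  # -> [3, 1, 5, 2, 4]
print(module_canonical([5, 1, 3, 2, 4], {0, 2}))  # -> [3, 1, 5, 2, 4]: within-module swap undone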

20 EXTENSIONS TO PARTIAL LISTS WITH DIFFERENT LENGTHS
Let L_1 and L_2 be two partial lists of lengths respectively l_1 \le l_2, whose elements belong to a feature set F, and let d be a distance on permutation groups. Let S_\tau denote the set of all the dual (completed) lists of a partial list L with partial permutation \tau, so that if \alpha \in S_\tau then \alpha(i) = \tau(i) for all indices i \in L. Define the distance between L_1 and L_2 as a function of the distances between the two quotient sets, i.e.
d(L_1, L_2) = f\left(\{d(\alpha, \beta) : \alpha \in S_{\tau_1}, \beta \in S_{\tau_2}\}\right) = f(d(S_{\tau_1}, S_{\tau_2})),
for f a function of the (p - l_1)!\,(p - l_2)! distances d(\alpha, \beta) such that on singletons f(\{t\}) = t.

21 EXTENSIONS TO PARTIAL LISTS WITH DIFFERENT LENGTHS
In particular, for d = Ca the Canberra distance and f the mean function:
Ca(L_1, L_2) = \frac{1}{|S_{\tau_1}|\,|S_{\tau_2}|} \sum_{\alpha \in S_{\tau_1}} \sum_{\beta \in S_{\tau_2}} Ca(\alpha, \beta)
= \Lambda \sum_{\alpha \in S_{\tau_1}} \sum_{\beta \in S_{\tau_2}} Ca(\alpha, \beta)
= \Lambda \sum_{\substack{\alpha, \beta \in S_p \\ \alpha(i) = \tau_1(i)\ \text{if}\ i \in L_1 \\ \beta(i) = \tau_2(i)\ \text{if}\ i \in L_2}} \sum_{i=1}^{p} \frac{|\alpha(i) - \beta(i)|}{\alpha(i) + \beta(i)},
where \Lambda = \frac{1}{(p - l_1)!\,(p - l_2)!}.

22 EXTENSIONS TO PARTIAL LISTS WITH DIFFERENT LENGTHS
Expanding the formula, Ca(L_1, L_2) splits into three kinds of contributions: an exact term for the features ranked in both lists,
\sum_{i \in L_1 \cap L_2} \frac{|\tau_1(i) - \tau_2(i)|}{\tau_1(i) + \tau_2(i)};
averaged terms \frac{\Gamma(l_2 + 1, p, \tau_1(i))}{p - l_2} and \frac{\Gamma(l_1 + 1, p, \tau_2(i))}{p - l_1} for the features ranked in only one of the two lists, whose sums can be written in closed form through the functions \varepsilon and \xi; and a final block, weighted by A, for the features ranked in neither list, where
\Gamma(a, b, c) = \sum_{i=a}^{b} \frac{|c - i|}{c + i}, \qquad \varepsilon_k(s) = \sum_{j=1}^{s} j\, H_{j+k}, \qquad A = \frac{|F \setminus (L_1 \cup L_2)|}{(p - l_1)(p - l_2)}.
Setting A = 0 we neglect the (preeminent) contribution of the features not included in either list. We call the formula thus obtained core, and the original one complete.
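
A sketch of the core variant under the reading above: exact terms for features ranked in both lists, position-averaged terms for features ranked in only one of them, and features absent from both ignored (A = 0). It follows the averaging definition of the previous slide and is an illustrative reconstruction, not the authors' code.

# Sketch: "core" Canberra distance between two partial lists of possibly
# different lengths over p features.
def gamma(a, b, c):
    # Gamma(a, b, c) = sum_{i=a..b} |c - i| / (c + i)
    return sum(abs(c - i) / (c + i) for i in range(a, b + 1))

def core_canberra(list1, list2, p):
    """list1, list2: feature identifiers ordered best first (ranks 1..l)."""
    r1 = {g: r for r, g in enumerate(list1, start=1)}
    r2 = {g: r for r, g in enumerate(list2, start=1)}
    l1, l2 = len(list1), len(list2)
    d = 0.0
    for g in set(r1) | set(r2):
        if g in r1 and g in r2:
            d += abs(r1[g] - r2[g]) / (r1[g] + r2[g])
        elif g in r1:                 # unranked in list2: average over ranks l2+1..p
            d += gamma(l2 + 1, p, r1[g]) / (p - l2)
        else:                         # unranked in list1: average over ranks l1+1..p
            d += gamma(l1 + 1, p, r2[g]) / (p - l1)
    return d

print(core_canberra(["a", "b", "c"], ["b", "d"], p=10))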

23 DISTANCE MATRIX AND DISTRIBUTION
Given a set of lists L = \{L_t\}_{t=1}^{b} of p genes and an integer k, the computation of all mutual top-k distances leads to the construction of a symmetric distance matrix M_k \in M(b \times b, \mathbb{R}^+). The corresponding histogram can then be built; often the histogram can be approximated by a Gaussian distribution. (Figure: histogram of mutual distances for n = 100, p = 500, b = 100, SVM-RFE, D = Ca.)
INDICATORS It is thus possible to use the mean (and the variance) of the matrix M_k as measures of the intrinsic distance of the top-k lists of the set L.

24 STABILITY
Intuitively, the more mutually different the lists, the more unstable the problem (either because of the data or because of the employed classification procedure).
STABILITY CURVE Given a set of lists L on p genes, choose a sequence K of relevant gene-subset dimensions. For each k \in K, compute the mean and the variance of M_k (possibly normalized by k). Plot those values versus the sequence K; this may be computationally heavy, depending on p and on the number of lists. (Figure: stability curves, mean versus k, for two values of p.)
The above procedure is independent of the source generating the lists and of the dimensions of the gene expression data.

25 DEFINITION
Let \mu_k = \frac{2}{B(B-1)} \sum_{1 \le i < j \le B} (M_k)_{ij} be the mean of all the B(B-1)/2 non-trivial values of the distance matrix M_k \in M(B \times B, \mathbb{R}^+).
THE MEAN LIST DISTANCE INDICATOR is the sequence I_{D\mu}(L) = \{(k, \mu_k) : 1 \le k \le p\}.
THE NO-INFORMATION CURVE is the sequence I_{D\mu}(S_p) = \{(k, E\{Ca^{(k+1)}\}_{S_p}) : 1 \le k \le p\} associated to the group S_p of all possible dual lists with p features.
An indication of the stability of a list set L is given by considering
\hat{I}_{D\mu} = \left\{\left(k, \frac{\mu_k}{E\{Ca^{(k+1)}\}_{S_p}}\right) : 1 \le k \le p\right\},
which is independent of p and k.
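
A sketch tying the pieces together: compute mu_k with the Canberra distance with location parameter over all list pairs and normalize by the expected value on S_p reconstructed earlier; values close to 1 indicate lists as mutually distant as random ones. Function and variable names are illustrative.

# Sketch: normalized stability indicator for a set of ranked lists.
# lists: (B, p) integer array, each row the feature indices of one list, best first.
import numpy as np

def H(n):
    return np.sum(1.0 / np.arange(1, n + 1))

def expected_ca_location(p, k):
    full = (1/(2*k) + 2*k + 2) * H(2*k) - (1/(4*k) + 2*k + 2) * H(k) - (k + 1.5)
    tail = 2 * (k + 1) * (H(2*k + 1) - H(k + 1)) - k
    return k / p * full + 2 * (p - k) / p * tail

def ranks_from_lists(lists):
    B, p = lists.shape
    ranks = np.empty_like(lists)
    ranks[np.arange(B)[:, None], lists] = np.arange(1, p + 1)   # rank of each feature per list
    return ranks

def stability_indicator(lists, k):
    ranks = np.minimum(ranks_from_lists(lists), k + 1).astype(float)
    B, p = ranks.shape
    mu = 0.0
    for i in range(B):
        diff = np.abs(ranks[i] - ranks[i + 1:])
        mu += np.sum(diff / (ranks[i] + ranks[i + 1:]))
    mu /= B * (B - 1) / 2
    return mu / expected_ca_location(p, k)    # ~1: as unstable as random lists

rng = np.random.default_rng(0)
random_lists = np.vstack([rng.permutation(200) for _ in range(50)])
print(stability_indicator(random_lists, k=20))   # close to 1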

26 PREDICTIVE PROFILING ON SYNTHETIC DATA
Synthetic datasets f_{X/Y}: binary labelled samples described by X discriminant features drawn from N(\mu, \sigma) (with \mu depending on the sample label) and by Y - X random features drawn from the uniform distribution U[a, b].
DAP: B = 400 LinSVM/RFE experiments, with correlation modules M1 (correlation > 0.8) and M2 (correlation > 0.7). On both datasets, fewer than five features are sufficient to reach perfect classification.
(Figure: I_{D\mu} stability indicators in profiling synthetic data, solid lines: f_{10/100}, dotted lines: f_{30/100}, and in presence of feature modules.)

27 CLASSIFIERS COMPARISON
Classifiers: Linear SVM / Terminated Ramp SVM. Feature ranking methods: (E-)RFE / 1-RFE. Datasets: microarray Breast Cancer (cases described by gene expression) and proteomic Ovarian Cancer (samples described by 123 mass-spec peaks).
(Figure: Average Test Error for Breast Cancer and Ovarian Cancer; LinearSVM/RFE (LR: dashed), LinearSVM/1RFE (L1: dotted), TRSVM/1RFE (T1: solid).)

28 CLASSIFIERS COMPARISON
Same classifiers, feature ranking methods and datasets as above.
(Figure: mean distance I_{D\mu} for Breast Cancer and Ovarian Cancer; LinearSVM/RFE (LR: dashed), LinearSVM/1RFE (L1: dotted), TRSVM/1RFE (T1: solid).)

29 CLASSIFIERS COMPARISON
Same classifiers, feature ranking methods and datasets as above.
(Figure: I_{D\mu} on the top-25 sublists for Breast Cancer and Ovarian Cancer; LinearSVM/RFE (LR: dashed), LinearSVM/1RFE (L1: dotted), TRSVM/1RFE (T1: solid).)

30 APPLICATION TO ENRICHMENT
If a library of functionally correlated gene sets (e.g. pathways) is available, the stability indicator can be used to compare the stability of the original lists with the stability of the ranked gene-set lists produced by their enrichment. An inhomogeneous set of gene lists may be shown to correspond to a much more stable list of pathways. Example on synthetic data with 100 lists of 100 genes and 124 pathways, where the lists are created by random permutations fixing the gene sets.

31 FILTER FEATURE SELECTIONS
Dataset Tib100: samples described by 100 normally distributed features with decreasing discriminant power.
Fix a number n of samples and a suitable threshold \theta for the employed filtering algorithm A.
Randomly extract from the Tib100 dataset n positive and n negative examples and compute the A statistic. The statistics considered are Fold Change (FC), Significance Analysis of Microarrays (SAM), the B statistic, the F statistic, t, moderated F and moderated t.
Consider the list of the features whose statistic value is above the threshold \theta, ranked by statistic value; repeat the above process B times.
Final output: the set of all lists L(n, A, \theta, i), where the number of samples n ranges between 5 and 45, an ad-hoc set of 100 values for \theta is chosen, and B = 100.

32 FILTER FEATURE SELECTIONS
Most researchers agree that SAM should outperform all the other methods because of its ability to control the false discovery rate.
(Figure: stability curves for the B statistic and the SAM statistic, computed with the Complete and the Core variants of the distance.)

33 ACCURACY AND STABILITY I
Although they share a common trend in many cases, accuracy and stability are independent measures. Thus we can analyze a dataset in the accuracy vs. stability space; the lower-left direction indicates better performance. This diagnostic plot allows the comparison of different datasets, different profiling methods (classifier / feature ranking algorithm) and different models.
(Figure: ATE vs. \hat{I}_{D\mu} for different profiling methods (LE, L1, T1) and cancer datasets (Breast, Ovarian); each point corresponds to a feature subset size, indicated for the extremal models.)

34 ACCURACY AND STABILITY II
US-FDA MAQC-II, on the sources of variability in microarray experiments: 6 datasets, 13 endpoints (A-M). Analysis of swapped training/validation experiments.
Performance-stability analysis for the candidate models at each endpoint (both Blind and Swap, Canberra complete distance). The MCC coordinate is the median of the MCCs for the endpoint and the same experiment, where
MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.
For each point, the rectangles are defined by 95% bootstrap Student's t confidence intervals.
Stability consistently predicts the level of MCC performance, i.e. endpoints for which lists change less (between Blind and Swap) have higher MCC scores.
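
A minimal sketch of the MCC computation from confusion-matrix counts (returning 0 when the denominator vanishes is a common convention, not stated on the slide):

# Sketch: Matthews correlation coefficient from confusion-matrix counts.
import math

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0   # 0 by convention if undefined

print(mcc(tp=40, tn=45, fp=5, fn=10))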

35 ACCURACY AND STABILITY II
US-FDA MAQC-II, on the sources of variability in microarray experiments: 6 datasets, 13 endpoints (A-M). Analysis of swapped training/validation experiments: inter-experiment list stability compared with MCC.
For each pair (edp, exp) the partial top-med(edp, exp) Borda list B(edp, exp) is used in the alternate experiment: the Swap experiment is run with the features belonging to B(edp, Blind), retrieving an MCC_Blind(Swap) value; analogously, we compute MCC_Swap(Blind) for the same endpoint. InterCC on the x axis ranks the endpoints by increasing list variability between experiments.
(Figure: predictive Borda MCC, MCC_Blind(Swap) and MCC_Swap(Blind), versus InterCC-ranked endpoints.)
Stability consistently predicts the level of MCC performance, i.e. endpoints for which lists change less (between Blind and Swap) have higher MCC scores.

36 REFERENCES
[Borda81] Borda, J. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences, Année MDCCLXXXI.
[Critchlow85] Critchlow, D. Metric methods for analyzing partially ranked data. Lecture Notes in Statistics 34, Springer-Verlag, Berlin, 1985.
[Diaconis88] Diaconis, P. Group representations in probability and statistics. Institute of Mathematical Statistics Lecture Notes - Monograph Series 11, 1988.
[Dwork01] Dwork, C., Kumar, R., Naor, M. & Sivakumar, D. Rank aggregation revisited. In WWW10, 2001.
[Fagin04] Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D. & Vee, E. Comparing and Aggregating Rankings with Ties. In Proc. ACM Symposium on Principles of Database Systems (PODS'04), 47-58, 2004.
[Saari01] Saari, D. Chaotic Elections! A Mathematician Looks at Voting. American Mathematical Society, 2001.
[Simon06] Simon, R. Development and Evaluation of Therapeutically Relevant Predictive Classifiers Using Gene Expression Profiling. J Natl Cancer Inst., 98(17), 2006.
