ALGEBRAIC STABILITY INDICATORS FOR RANKED LISTS IN MOLECULAR PROFILING
1 ALGEBRAIC STABILITY INDICATORS FOR RANKED LISTS IN MOLECULAR PROFILING. Giuseppe (MPBA, Fondazione Bruno Kessler, Trento, Italy). Sandpit on the Application of Machine Learning Techniques to the Analysis of Complex Biomedical Data, London, 6th July 2009.
2 THE PROBLEM. Data matrix $M(n \times p, \mathbb{R})$, with $n \approx 10^3$ samples and $p \approx 10^5$ genes / $10^6$ SNPs. A result of a classification/ranking procedure is a list of features, ranked according to their relevance for the classifier. To ensure repeatability and to avoid overfitting due to information-leakage phenomena such as selection bias, the methodology has to be carefully designed.
3 THE PROBLEM (cont.) Figure: error curve for the Colon cancer dataset (left), with true labels (blue) and random labels (black).
4 THE PROBLEM (cont.) Figure: error curves for two synthetic datasets (right), one with 1000 discriminant features (black) and one with no discriminant features (blue).
5 THE PROBLEM (cont.) Result of a multi-institution analysis of papers about gene expression profiling in a leading journal: inability to reproduce the analysis in more than 50% of cases, partial reproduction in 1/3, perfect reproduction in 11%.
6 THE PROBLEM. Solution: use a correctly planned Data Analysis Protocol (DAP) implementing a resampling scheme. At each run $i = 1, \ldots, B$ an ordered list $L_i$ is produced, yielding a matrix $M(B \times d, \mathbb{N})$, with $B \approx 10^3$. THE SET OF LISTS: $L = \{L_i\}_{i=1}^B$ is the set of all lists and, for each list, $L_i^k$ is its top-k list, i.e. the sublist consisting of the first $k$ ranked elements of $L_i$. What information can be derived from the structure of the feature list set?
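A minimal sketch of such a resampling DAP in Python, using scikit-learn's LinearSVC and RFE as stand-ins for the classifier/ranking stage; the dataset, split sizes and value of B are illustrative assumptions, not the original protocol:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

def ranked_lists(X, y, B=100, seed=0):
    """Run B resampled train/test splits; return one ranked feature list per run.

    X is an (n, p) numpy array, y the binary labels."""
    splitter = StratifiedShuffleSplit(n_splits=B, test_size=0.3, random_state=seed)
    lists = []
    for train_idx, _ in splitter.split(X, y):
        # Recursive Feature Elimination around a linear SVM ranks all p features.
        rfe = RFE(LinearSVC(), n_features_to_select=1).fit(X[train_idx], y[train_idx])
        # rfe.ranking_[i] is the rank of feature i (1 = most relevant).
        lists.append(rfe.ranking_.copy())
    return np.array(lists)  # shape (B, p): row t is the permutation tau_t
```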
7 PERMUTATION GROUP THEORY. GROUP: a group is a set together with an associative operation which admits an identity element and such that every element has an inverse. PERMUTATIONS: a permutation on a set $\Omega_n = \{s_1, s_2, \ldots, s_n\}$ of $n$ elements is a bijection $\tau : \Omega_n \to \Omega_n$, $s_i \mapsto \tau(s_i) = s_j$. The number of different permutations on a set of $n$ elements is $n!$. SYMMETRIC GROUP: the set of all permutations on a set $\Omega_n$ (usually $\Omega_n = \mathbb{N} \cap [1, n]$) becomes a group $S_n$, called the permutation group or symmetric group, when endowed with the operation $\sigma\tau : \Omega_n \to \Omega_n$, $s_i \mapsto \sigma(\tau(s_i)) = s_k$.
8 LIST REPRESENTATION. Example: the symmetric group $S_3$ consists of the six permutations $\{1 \to 1, 2 \to 2, 3 \to 3\}$, $\{1 \to 2, 2 \to 1, 3 \to 3\}$, $\{1 \to 3, 2 \to 2, 3 \to 1\}$, $\{1 \to 1, 2 \to 3, 3 \to 2\}$, $\{1 \to 2, 2 \to 3, 3 \to 1\}$, $\{1 \to 3, 2 \to 1, 3 \to 2\}$. REPRESENTATION: labelling the $n$ genes involved in a problem by $\mathbb{N} \cap [1, n]$, every ranked list $L_t$ can be identified with a permutation $\tau_t \in S_n$, the image $\tau_t(i)$ of the $i$-th gene being its ranking inside the list $L_t$. Example (gene $\to$ rank): AFFX-MurIL2_at $\to$ 13347, ..., gene $n$ $\to$ 175; thus $\tau$ can be written as $\tau = (\tau(1), \ldots, \tau(n)) = (13347, \ldots, 175)$.
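An illustrative sketch (not from the slides) of the identification between ranked lists and permutations, together with the composition rule from the previous slide; gene names and the example list are taken from slide 10:

```python
def list_to_perm(ranked_genes, all_genes):
    """tau[i] = rank (1-based) of gene i in the ranked list."""
    rank_of = {g: r + 1 for r, g in enumerate(ranked_genes)}
    return tuple(rank_of[g] for g in all_genes)

def compose(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i)); 1-based rank values, 0-based indexing."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

genes = ["A", "B", "C", "D", "E"]
tau1 = list_to_perm(["A", "C", "D", "B", "E"], genes)  # -> (1, 4, 2, 3, 5)
```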
9 METRICS ON PERMUTATION GROUPS. DEFINITION: a map $d : S \times S \to \mathbb{R}$ defined on a permutation group $S$ is a distance function, or a metric, if it satisfies the following axioms:
NON-NEGATIVITY: $d(\sigma, \tau) \ge 0$ for all $\sigma, \tau \in S$
SEPARATION: $d(\sigma, \tau) = 0 \iff \sigma = \tau$ for all $\sigma, \tau \in S$
SYMMETRY: $d(\sigma, \tau) = d(\tau, \sigma)$ for all $\sigma, \tau \in S$
TRIANGULARITY: $d(\sigma, \tau) + d(\tau, \rho) \ge d(\rho, \sigma)$ for all $\sigma, \tau, \rho \in S$
RIGHT INVARIANCE: $d(\sigma, \tau) = d(\sigma\rho, \tau\rho)$ for all $\sigma, \tau, \rho \in S$
Triangularity may sometimes be relaxed to less strict requirements (e.g. polygonal inequalities) that make $d$ a pseudometric.
10 RIGHT INVARIANCE. The fifth property, $d(\sigma, \tau) = d(\sigma\rho, \tau\rho)$ for all $\sigma, \tau, \rho \in S$, is an essential requirement for distance functions on permutation group spaces. AN EXAMPLE: $L_1 = \{A, C, D, B, E\}$ and $L_2 = \{B, C, D, E, A\}$, via the identification A $\to$ 1, B $\to$ 2, C $\to$ 3, D $\to$ 4, E $\to$ 5, correspond to $\tau_1 = (1, 4, 2, 3, 5)$ and $\tau_2 = (5, 1, 2, 3, 4)$. Apply the permutation $\rho$ corresponding to the relabelling A $\to$ 2, B $\to$ 5, C $\to$ 4, D $\to$ 1, E $\to$ 3. Then $\tau_1\rho = (3, 1, 5, 2, 4)$ and $\tau_2\rho = (3, 5, 4, 2, 1)$. Right invariance corresponds to relabelling independence. The Euclidean distance $d(\tau, \sigma) = \sqrt{\sum_{i=1}^n (\tau^{-1}(i) - \sigma^{-1}(i))^2}$ is not right-invariant.
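A small numeric check of this property, as a sketch under the conventions above (tau[i] = rank of feature i); the footrule is right-invariant, while the Euclidean distance on the inverse permutations generally is not:

```python
import math
import random

def footrule(s, t):
    return sum(abs(a - b) for a, b in zip(s, t))

def euclid_inv(s, t):
    """Euclidean distance on the inverse permutations (the slide's counterexample)."""
    s_inv = [0] * len(s)
    t_inv = [0] * len(t)
    for i, (a, b) in enumerate(zip(s, t)):
        s_inv[a - 1] = i + 1  # gene holding rank a
        t_inv[b - 1] = i + 1
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_inv, t_inv)))

def compose(sigma, rho):
    return tuple(sigma[rho[i] - 1] for i in range(len(rho)))

random.seed(0)
p = 6
s, t, rho = [tuple(random.sample(range(1, p + 1), p)) for _ in range(3)]
print(footrule(s, t) == footrule(compose(s, rho), compose(t, rho)))      # True
print(euclid_inv(s, t) == euclid_inv(compose(s, rho), compose(t, rho)))  # usually False
```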
11 STATISTICAL DISTANCES. SPEARMAN'S FOOTRULE: the $L_1$ distance between permutations, $F(\sigma, \tau) = \sum_{i=1}^n |\sigma(i) - \tau(i)|$. SPEARMAN'S RHO: the $L_2$ distance between permutations, $R(\sigma, \tau) = \sqrt{\sum_{i=1}^n (\sigma(i) - \tau(i))^2}$. KENDALL'S TAU: the minimum number of pairwise adjacent transpositions (swaps) needed to transform $\sigma$ into $\tau$, $K(\sigma, \tau) = \mathrm{Card}\{(i, j) \in \Omega_n^2 : \sigma(i) < \sigma(j) \text{ and } \tau(i) > \tau(j)\}$. All these measures are metrics.
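Direct transcriptions of the three definitions (an illustrative sketch, using the same rank-vector convention as above):

```python
def spearman_footrule(s, t):
    return sum(abs(a - b) for a, b in zip(s, t))

def spearman_rho(s, t):
    return sum((a - b) ** 2 for a, b in zip(s, t)) ** 0.5

def kendall_tau(s, t):
    # Count of discordant ordered pairs, exactly as in the Card{...} definition.
    n = len(s)
    return sum(1 for i in range(n) for j in range(n)
               if s[i] < s[j] and t[i] > t[j])
```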
12 THE CANBERRA DISTANCE. A key fact about gene lists is that variations in the bottom part of the lists are much less relevant than differences in the top part, so a weight factor is needed to account for this requirement. A natural candidate to replace Spearman's footrule is its weighted version, the Canberra distance: $$Ca(\tau, \sigma) = \sum_{i=1}^n \frac{|\tau(i) - \sigma(i)|}{\tau(i) + \sigma(i)}.$$
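A one-line implementation, with the weighting behaviour spelled out (illustrative sketch):

```python
def canberra(tau, sigma):
    """Canberra distance between two rank vectors: weighs top ranks more."""
    return sum(abs(a - b) / (a + b) for a, b in zip(tau, sigma))

# A swap near the top (ranks 1, 2) costs |1-2|/(1+2) = 1/3;
# the same swap near the bottom (ranks 99, 100) costs only 1/199.
```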
13 MEAN. Let $$H_s = \sum_{j=1}^s \frac{1}{j} = \log(s) + \gamma + \frac{1}{2s} - \frac{1}{12 s^2} + \frac{\epsilon_s}{120 s^4}$$ be the $s$-th harmonic number with its famous approximation, where $0 < \epsilon_s < 1$ and $\gamma$ is the Euler-Mascheroni constant. Many closed forms for sums and products involving $H_s$ have been found only very recently. THEOREM: the expected value of the Canberra distance on $S_p$ is $$E\{Ca\}_{S_p} = \left(2(p+1) + \frac{1}{2p}\right) H_{2p} - \left(2(p+1) + \frac{1}{4p}\right) H_p - \left(p + \frac{3}{2}\right).$$ Its approximation is $$\hat{E}\{Ca\}_{S_p} = (p+1)\log(4) - (p+2) + o(1).$$
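A quick Monte Carlo sanity check of the closed form as reconstructed above (an illustrative sketch; p, the number of samples and the seed are arbitrary choices). By right invariance, the expectation over pairs of uniform permutations equals the expectation against the identity:

```python
import random
from math import log

def H(n):
    """n-th harmonic number."""
    return sum(1.0 / j for j in range(1, n + 1))

def expected_canberra(p):
    """Closed form for E{Ca} on S_p, as reconstructed above."""
    return ((2 * (p + 1) + 1 / (2 * p)) * H(2 * p)
            - (2 * (p + 1) + 1 / (4 * p)) * H(p)
            - (p + 1.5))

random.seed(0)
p, B = 50, 20000
ident = list(range(1, p + 1))
mc = 0.0
for _ in range(B):
    tau = random.sample(ident, p)  # uniform random permutation of the ranks
    mc += sum(abs(a - b) / (a + b) for a, b in zip(tau, ident)) / B

print(mc, expected_canberra(p), (p + 1) * log(4) - (p + 2))  # all three close
```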
14 VARIANCE. Let $$c_p(i, j) = \frac{|i - j|}{i + j} \qquad \text{and} \qquad f_p(i) = \frac{1}{p} \sum_{j=1}^p c_p(i, j) = 1 + \frac{2i}{p}\left(2 H_{2i} - H_i - H_{p+i} - 1\right).$$ THEOREM: the variance cannot be written in an entirely closed form: $$V\{Ca\}_{S_p} = \frac{1}{p-1} \sum_{i,j=1}^p \left( c_p^2(i, j) + \frac{E\{Ca\}_{S_p}^2}{p^2} - 2\, c_p(i, j)\, f_p(j) \right),$$ which expands into a polynomial $P(H_p, H_{2p})$ plus sums of the form $\sum_{i=1}^p i^2 H_{p+i} H_{2i}$ and $\sum_{i=1}^p i^2 H_{p+i} H_i$, for which no closed form is known. Its approximation is $$\hat{V}\{Ca\}_{S_p} = \alpha p + \beta \log(p) + \delta + o(p^{-1}), \qquad \alpha, \beta, \delta \in \mathbb{R}.$$
15 DISTRIBUTION. Let $$d_p(i, j) = c_p(i, j) - \frac{1}{p} \sum_{l=1}^p c_p(i, l) - \frac{1}{p} \sum_{m=1}^p c_p(m, j) + \frac{1}{p^2} \sum_{l=1}^p \sum_{m=1}^p c_p(m, l).$$ Since $$\max_{1 \le i, j \le p} d_p^2(i, j) = d_p^2(1, 1) = \frac{1}{p^2}\left( E\{Ca\}_{S_p} - 2p - 4 + 4 H_{p+1} \right)^2,$$ then $$\lim_{p \to \infty} \frac{\max_{1 \le i, j \le p} d_p^2(i, j)}{\frac{p-1}{p} V\{Ca\}_{S_p}} = 0,$$ and thus by Hoeffding's theorem we have: THEOREM. The distribution of the Canberra distance on $S_p$ is asymptotically normal.
16 UPPER BOUND. The problem $\rho = \arg\max_{S_p}(Ca_I)$ has one solution $\rho_M$ for $p$ even and two solutions for $p$ odd, namely $$\rho_M = \begin{cases} \theta = \prod_{i=1}^{p/2} \left(i,\ \frac{p}{2} + i\right) & \text{for even } p \\ \theta \text{ and } \theta^{-1} & \text{for odd } p, \end{cases}$$ where for odd $p$ the permutation $\theta$ is the analogous shift by $\frac{p+1}{2}$, while the corresponding maximum value is $$\max(Ca) = Ca_I(\rho_M) = \begin{cases} 2r(H_{3r} - H_r) & \text{if } p = 4r \\ (2r+1) H_{6r+1} + r H_{3r+1} - \frac{2r+1}{2} H_{3r} - (2r+1) H_{2r+1} + \frac{1}{2} H_r & \text{if } p = 4r+1 \\ (2r+1)\left(2 H_{6r+3} - H_{3r+1} - 2 H_{2r+1} + H_r\right) & \text{if } p = 4r+2 \\ (2r+1) H_{6r+5} + \frac{1}{2} H_{3r+2} - (2r+1) H_{2r+1} - (r+1) H_{r+1} + \frac{2r+1}{2} H_r & \text{if } p = 4r+3 \end{cases}$$
17 DISTANCES ON TOP-k LISTS. Since the most important genes are located in the upper part of the lists, it is also important to be able to compare those more important portions of the lists. Managing partial lists is much harder than managing global lists, since they do not necessarily involve the same elements. TOP-k LIST: a top-k list is the sublist including the elements ranking from position 1 to $k$ of the original list, which corresponds to inducing a partial ordering on $k$ out of $n$ objects. This has a natural counterpart in the group-theoretical framework, via the Hausdorff topology. DISTANCE WITH LOCATION PARAMETER: consider a global distance $D$ and two top-k lists $\tau_m$, $m = 1, 2$, involving respectively the subsets $D_{\tau_m} = \tau_m^{-1}(\mathbb{N} \cap [1, k])$ of $\Omega_n$. Define the global list $$\bar{\tau}_m(i) = \begin{cases} \tau_m(i) & \text{for } i \in D_{\tau_m} \\ k+1 & \text{otherwise,} \end{cases}$$ and define $D^{(k+1)}(\tau_1, \tau_2) = D(\bar{\tau}_1, \bar{\tau}_2)$.
18 CANBERRA DISTANCE WITH LOCATION PARAMETER. $$Ca^{(k+1)}(\tau, \sigma) = \sum_{i=1}^p \frac{\left|\min\{\tau(i), k+1\} - \min\{\sigma(i), k+1\}\right|}{\min\{\tau(i), k+1\} + \min\{\sigma(i), k+1\}}.$$ THEOREM: the expected (average) value of the Canberra metric on $S_p$ is $$E\{Ca^{(k+1)}\}_{S_p} = \frac{1}{p}\left[\left(2k(k+1) + \frac{1}{2}\right) H_{2k} - \left(2k(k+1) + \frac{1}{4}\right) H_k - k\left(k + \frac{3}{2}\right) + 2(p - k)\Big(2(k+1)\left(H_{2k+1} - H_{k+1}\right) - k\Big)\right],$$ which can be approximated up to terms $o(1)$ as $$\hat{E}\{Ca^{(k+1)}\}_{S_p} = \frac{(k+1)(2p - k)}{p} \log(4) - \frac{2kp + 3p - k - k^2}{p}.$$ For $p \approx 10^4$, $E\{Ca^{(k+1)}\}_{S_p} \approx \hat{E}\{Ca^{(k+1)}\}_{S_p}$.
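The truncated distance and its expectation approximation, as reconstructed above, translate directly (an illustrative sketch; both are reused by the stability-indicator sketches further below):

```python
from math import log

def canberra_topk(tau, sigma, k):
    """Ca^(k+1): ranks beyond k are truncated to the location parameter k+1."""
    return sum(abs(min(a, k + 1) - min(b, k + 1)) /
               (min(a, k + 1) + min(b, k + 1))
               for a, b in zip(tau, sigma))

def expected_canberra_topk(p, k):
    """Approximation E-hat{Ca^(k+1)}_{S_p}, o(1) error; equals the full-list
    approximation (p+1)log(4) - (p+2) when k = p."""
    return ((k + 1) * (2 * p - k) / p) * log(4) - (2 * k * p + 3 * p - k - k * k) / p
```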
19 INCLUDING MODULES. Moreover, the existence of feature modules, for instance highly correlated genes (e.g. clones) or genes belonging to the same pathway, must be taken into account. Classifiers such as SVM tend to swap the relative importance of correlated features across different ranking runs, and those swaps should be penalized less than movements between uncorrelated genes. MODULE-AWARE DISTANCES: let $M = \{g_1, \ldots, g_m\} \subseteq \Omega_p = \{1, \ldots, p\}$ be a module consisting of $m$ features. For each permutation $\tau \in L$, define the permutation $\eta_M \in S_m$ by the property $\eta_M(i) < \eta_M(j) \iff \tau(g_{\eta_M(i)}) < \tau(g_{\eta_M(j)})$, and then define the new permutation $\tau^M$ as $\tau^M(g_i) = \tau(g_{\eta_M(i)})$, leaving $\tau^M|_{\Omega_p \setminus M} = \tau|_{\Omega_p \setminus M}$ outside of $M$. The distance is then computed on $L^M = \{\tau^M : \tau \in L\}$.
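One way to realize this canonicalization in code (a sketch, assuming 0-based feature indices; the effect is that any within-module rank swap maps two lists to the same canonical permutation, so it no longer contributes to the distance):

```python
def canonicalize_module(tau, module):
    """Reassign the ranks held by a module's features in a fixed (sorted) order.

    tau: sequence with tau[i] = rank of feature i; module: iterable of feature
    indices. Features outside the module keep their ranks unchanged."""
    tau = list(tau)
    ranks = sorted(tau[g] for g in module)
    for g, r in zip(sorted(module), ranks):
        tau[g] = r
    return tuple(tau)
```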
20 EXTENSIONS TO PARTIAL LISTS OF DIFFERENT LENGTHS. Let $L_1$ and $L_2$ be two partial lists of lengths $l_1 \le l_2$ respectively, whose elements belong to a feature set $F$, and let $d$ be a distance on permutation groups. Let $S_\tau$ be the set of all the dual (complete) lists of a partial list $L$, so that if $\alpha \in S_\tau$, then $\alpha(i) = \tau(i)$ for all indices $i \in L$. Define the distance between $L_1$ and $L_2$ as a function of the distances between the two quotient sets, i.e. $$d(L_1, L_2) = f\big(\{d(\alpha, \beta) : \alpha \in S_{\tau_1}, \beta \in S_{\tau_2}\}\big) = f\big(d(S_{\tau_1}, S_{\tau_2})\big),$$ for $f$ a function of the $(p - l_1)!\,(p - l_2)!$ distances $d(\alpha, \beta)$ such that, on singletons, $f(\{t\}) = t$.
21 EXTENSIONS TO PARTIAL LISTS OF DIFFERENT LENGTHS. In particular, for $d = Ca$ the Canberra distance and $f$ the mean function: $$Ca(L_1, L_2) = \frac{1}{|S_{\tau_1}|\,|S_{\tau_2}|} \sum_{\alpha \in S_{\tau_1}} \sum_{\beta \in S_{\tau_2}} Ca(\alpha, \beta) = \Lambda \sum_{\substack{\alpha, \beta \in S_p \\ \alpha(i) = \tau_1(i) \text{ if } i \in L_1 \\ \beta(i) = \tau_2(i) \text{ if } i \in L_2}} \sum_{i=1}^p \frac{|\alpha(i) - \beta(i)|}{\alpha(i) + \beta(i)},$$ where $$\Lambda = \frac{1}{(p - l_1)!\,(p - l_2)!}.$$
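The exact average runs over $(p - l_1)!\,(p - l_2)!$ completions, so a Monte Carlo estimate over random completions is a practical stand-in (a sketch under the assumption that a partial list of length $l$ occupies ranks $1..l$; this approximates, rather than reproduces, the closed form on the next slide):

```python
import random

def random_completion(partial, p, rng):
    """Extend a partial list (dict: feature -> rank in 1..l) to a full rank
    vector: the missing features get the ranks l+1..p in random order."""
    l = len(partial)
    missing = [f for f in range(p) if f not in partial]
    tail = rng.sample(range(l + 1, p + 1), len(missing))
    tau = [0] * p
    for f, r in partial.items():
        tau[f] = r
    for f, r in zip(missing, tail):
        tau[f] = r
    return tau

def mc_partial_canberra(L1, L2, p, n_samples=2000, seed=0):
    """Monte Carlo estimate of the mean Canberra distance over completions."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        a = random_completion(L1, p, rng)
        b = random_completion(L2, p, rng)
        acc += sum(abs(x - y) / (x + y) for x, y in zip(a, b))
    return acc / n_samples
```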
22 EXTENSIONS TO PARTIAL LISTS OF DIFFERENT LENGTHS. Expanding the formula, $Ca(L_1, L_2)$ splits into a term over the shared features, $$\sum_{i \in L_1 \cap L_2} \frac{|\tau_1(i) - \tau_2(i)|}{\tau_1(i) + \tau_2(i)},$$ plus terms $\frac{\Xi(l_2 + 1, p, \tau_1(i))}{p - l_2}$ and $\frac{\Xi(l_1 + 1, p, \tau_2(i))}{p - l_1}$ for the features appearing in only one of the two lists, plus closed-form correction terms in the functions $\varepsilon_k$ and $\xi$, plus the term $$A\,\big(2\xi(p) - 2\xi(l_2) - 2\varepsilon_{l_1}(p) + 2\varepsilon_{l_1}(l_2) - 2\varepsilon_p(p) + 2\varepsilon_p(l_2) + (p + l_1)(p - l_2) + l_2(l_2 + 1) - p(p + 1)\big),$$ where $$\Xi(a, b, c) = \sum_{i=a}^{b} \frac{|c - i|}{c + i}, \qquad \varepsilon_k(s) = \sum_{j=1}^{s} j H_{j+k}, \qquad A = \frac{|F \setminus (L_1 \cup L_2)|}{(p - l_1)(p - l_2)},$$ and $\xi$ is an analogous harmonic sum. Setting $A = 0$ neglects the (preeminent) contribution of the features not included in either list; we call the formula thus obtained the core version, and the original one the complete version.
23 DISTANCE MATRIX AND DISTRIBUTION. Given a set of lists $L = \{L_t\}_{t=1}^b$ of $p$ genes and an integer $k$, computing all mutual top-k distances leads to a symmetric distance matrix $M_k \in M(b \times b, \mathbb{R}^+)$, from which the corresponding histogram can be built. Often the histogram can be approximated by a Gaussian distribution (example shown: $n = 100$, $p = 500$, $b = 100$, SVM-RFE, $D = Ca$). INDICATORS: it is thus possible to use the mean (and the variance) of the matrix $M_k$ as measures of the intrinsic distance of the top-k lists of the set $L$.
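Building $M_k$ is a straightforward pairwise loop (a sketch reusing canberra_topk from the earlier block; quadratic in the number of lists):

```python
import numpy as np

def distance_matrix(lists, k):
    """M_k: mutual top-k Canberra distances for a set of rank vectors (B x p)."""
    B = len(lists)
    M = np.zeros((B, B))
    for i in range(B):
        for j in range(i + 1, B):
            M[i, j] = M[j, i] = canberra_topk(lists[i], lists[j], k)
    return M
```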
24 STABILITY. Intuitively, the more mutually different the lists, the more unstable the problem (either because of the data or because of the employed classification procedure). STABILITY CURVE: given a set of lists $L$ on $p$ genes, choose a sequence $K$ of relevant gene-subset dimensions. For each $k \in K$, compute the mean and the variance of $M_k$ (possibly normalized by $k$), and plot those values versus the sequence $K$; this may be computationally heavy, depending on $p$ and on the number of lists. (Figure: stability curves, mean vs. $k$, for two values of $p$, solid and dotted.) The above procedure is independent of the source generating the lists and of the dimensions of the gene expression data.
25 DEFINITION. Let $$\mu_k = \frac{2}{B(B-1)} \sum_{1 \le i < j \le B} (M_k)_{ij}$$ be the mean of all the $\frac{B(B-1)}{2}$ non-trivial values of the distance matrix $M_k \in M(B \times B, \mathbb{R}^+)$. THE MEAN LIST DISTANCE INDICATOR is the sequence $I_{D_\mu}(L) = \{(k, \mu_k) : 1 \le k \le p\}$. THE NO-INFORMATION CURVE is the sequence $I_{D_\mu}(S_p) = \{(k, E\{Ca^{(k+1)}\}_{S_p}) : 1 \le k \le p\}$ associated to the group $S_p$ of all possible dual lists with $p$ features. An indication of the stability of a list set $L$, independent of $p$ and $k$, is given by considering $$\hat{I}_{D_\mu} = \left\{\left(k, \frac{\mu_k}{E\{Ca^{(k+1)}\}_{S_p}}\right) : 1 \le k \le p\right\}.$$
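Putting the pieces together, a sketch of the normalized indicator curve (it reuses distance_matrix and expected_canberra_topk from the earlier blocks; values near 1 mean the lists are indistinguishable from random, smaller values mean a more stable problem):

```python
import numpy as np

def stability_indicator(lists, ks, p):
    """Normalized mean list distance: mu_k / E-hat{Ca^(k+1)}_{S_p} for each k."""
    curve = []
    for k in ks:
        M = distance_matrix(lists, k)
        B = M.shape[0]
        mu_k = M[np.triu_indices(B, k=1)].mean()  # mean of the B(B-1)/2 upper entries
        curve.append((k, mu_k / expected_canberra_topk(p, k)))
    return curve
```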
26 PREDICTIVE PROFILING ON SYNTHETIC DATA. Synthetic datasets $f_{X/Y}$: binary labelled samples described by $X$ discriminant features drawn from $N(\mu, \sigma)$ (with $\mu$ depending on the sample label) and by $Y - X$ random features drawn from the uniform $U[a, b]$. DAP: $B = 400$ LinSVM/RFE experiments, with correlation modules M1 (correlation > 0.8) and M2 (> 0.7). On both datasets, fewer than five features are sufficient to reach perfect classification. (Figure: stability indicators $\hat{I}_{D_\mu}$ in profiling synthetic data, solid lines $f_{10/100}$, dotted lines $f_{30/100}$, and in presence of feature modules.)
27 CLASSIFIERS COMPARISON. Classifiers: linear SVM / terminated-ramp SVM. Feature ranking methods: (E-)RFE / 1-RFE. Datasets: microarray breast cancer cases described by gene expression, and proteomic ovarian cancer samples described by 123 mass-spec peaks. (Figure: average test error for Breast Cancer and Ovarian Cancer; LinearSVM/RFE (LR: dashed), LinearSVM/1RFE (L1: dotted), TRSVM/1RFE (T1: solid).)
28 CLASSIFIERS COMPARISON (cont.) Same setting. (Figure: mean distance $I_{D_\mu}$ for Breast Cancer and Ovarian Cancer; same line coding.)
29 CLASSIFIERS COMPARISON (cont.) Same setting. (Figure: $\hat{I}_{D_\mu}$ on the top-25 lists for Breast Cancer and Ovarian Cancer; same line coding.)
30 APPLICATION TO ENRICHMENT. If a library of functionally correlated gene sets (e.g. pathways) is available, the stability indicator can be used to compare the stability of the original lists with the stability of the ranked gene sets produced by their enrichment. An inhomogeneous set of lists may be shown to correspond to a much more stable list of pathways. Example on synthetic data with 100 lists of 100 genes and 124 pathways, where the lists are created by random permutations fixing the gene sets.
31 FILTER FEATURE SELECTIONS. Dataset Tib100: samples with 100 normally distributed features of decreasing discriminant power. Fix a number $n$ of samples and a suitable threshold $\theta$ for the employed filtering algorithm $A$. Randomly extract from the Tib100 dataset $n$ positive and $n$ negative examples and compute the $A$ statistic; the statistics considered are Fold Change (FC), Significance Analysis of Microarrays (SAM), B statistic, F statistic, t, mod F and mod t. Consider the list of the features whose statistic value is above the threshold $\theta$, ranked by statistic value, and repeat the above process $B$ times. Final output: the set of all lists $L(n, A, \theta, i)$, where the number of samples $n$ ranges between 5 and 45, an ad-hoc set of 100 values for $\theta$ is chosen, and $B = 100$.
32 FILTER FEATURE SELECTIONS. Most researchers agree that SAM should outperform all the other methods because of its ability to control the false discovery rate. (Figure: complete vs. core indicators for the B statistic and the SAM statistic.)
33 ACCURACY AND STABILITY I. Although they share a common trend in many cases, accuracy and stability are independent measures, so a dataset can be analyzed in the accuracy-vs-stability space, where the lower-left direction indicates better performance. This diagnostic plot allows the comparison of different datasets, different profiling methods (classifier/feature-ranking algorithm) and different models. (Figure: ATE vs. $\hat{I}_{D_\mu}$ for different profiling methods (LE, L1, T1) and cancer datasets (Breast, Ovarian); each point corresponds to a feature-subset size, indicated for extremal models.)
34 ACCURACY AND STABILITY II. US-FDA MAQC-II, on sources of variability in microarray experiments: 6 datasets, 13 endpoints (A-M), with analysis of swapped training/validation experiments. Performance-stability analysis for the candidate models at all endpoints (both Blind and Swap): the MCC coordinate is the median of the MCCs for the endpoint in the same experiment, $$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$ and for each point the rectangles are defined by a 95% bootstrap Student's t confidence interval. (Figure: endpoints A-M in the Canberra-complete vs. MCC plane.) Stability consistently predicts the level of MCC performance, i.e. endpoints for which lists change less (between Blind and Swap) have higher MCC scores.
35 ACCURACY AND STABILITY II. US-FDA MAQC-II: 6 datasets, 13 endpoints (A-M), analysis of swapped training/validation experiments; inter-experiment list stability compared with MCC. For each pair (edp, exp), the partial top-med(edp, exp) Borda list B(edp, exp) is used in the alternate experiment: the Swap experiment is run with the features belonging to B(edp, Blind), retrieving an MCC_Blind(Swap) value; analogously, we compute MCC_Swap(Blind) for the same endpoint. InterCC on the x axis ranks endpoints by increasing list variability between experiments. (Figure: predictive Borda MCC vs. InterCC-ranked endpoints, ordered L, C, H, E, B, K, D, J, G, A, F, I, M.) Stability consistently predicts the level of MCC performance, i.e. endpoints for which lists change less (between Blind and Swap) have higher MCC scores.
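One plausible reading of such a top-k Borda aggregation, as a sketch (this reuses the k+1 location parameter as a rank truncation; it is an assumption for illustration, not the MAQC-II pipeline):

```python
from collections import defaultdict

def borda_list(lists, k):
    """Top-k Borda aggregation of a set of rank vectors.

    Each feature is scored by the sum of the (truncated) ranks it receives
    across the lists; a lower total rank means a higher consensus position."""
    score = defaultdict(float)
    for tau in lists:                  # tau[i] = rank of feature i
        for f, r in enumerate(tau):
            score[f] += min(r, k + 1)  # features outside the top-k count as k+1
    return sorted(score, key=score.get)[:k]
```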
36 REFERENCES
[Borda81] Borda, J. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences, Année MDCCLXXXI.
[Critchlow85] Critchlow, D. Metric Methods for Analyzing Partially Ranked Data. Lecture Notes in Statistics 34, Springer-Verlag, Berlin.
[Diaconis88] Diaconis, P. Group Representations in Probability and Statistics. Institute of Mathematical Statistics Lecture Notes - Monograph Series 11.
[Dwork01] Dwork, C., Kumar, R., Naor, M. & Sivakumar, D. Rank aggregation revisited. In WWW10.
[Fagin04] Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D. & Vee, E. Comparing and aggregating rankings with ties. In Proc. ACM Symposium on Principles of Database Systems (PODS'04), 47-58.
[Saari01] Saari, D. Chaotic Elections! A Mathematician Looks at Voting. American Mathematical Society.
[Simon06] Simon, R. Development and evaluation of therapeutically relevant predictive classifiers using gene expression profiling. J Natl Cancer Inst., 98(17), 2006.