Variable Selection and Weighting by Nearest Neighbor Ensembles
|
|
- Camron Gray
- 5 years ago
- Views:
Transcription
1 Variable Selection and Weighting by Nearest Neighbor Ensembles Jan Gertheiss (joint work with Gerhard Tutz) Department of Statistics University of Munich WNI 2008
2 Nearest Neighbor Methods Introduction One of the simplest and most intuitive techniques for statistical discrimination (Fix & Hodges, 1951). Nonparametric and memory based. Given data (y i, x i ) with categorial response y i {1,..., G} and (metric) p-dim. predictor x i : Place a new observation with unknown class label into the class of the observation from the training set that is closest to the new observation - with respect to covariates. Closeness, resp. distance d of two observations is derived from a specific metric in the predictor space. Given the Euclidian metric d(x i, x r ) = p j=1 x ij x rj 2.
3 Nearest Neighbor Methods Class probability estimates Use not only the first but the k nearest neighbors. The relative frequency of predictions in favor of category g among these neighbors can be seen as an estimate of the probability of category g. Estimates ˆπ ig can take values h/k, h {0,..., k}. If neighbors are weighted with respect to their distance to the observation of interest, ˆπ can in principle take all values in [0, 1].
4 Nearest Neighbor Ensembles Basic Concept Final estimation by an ensemble of single predictors: Use the ensemble formula for computing the probability that observation i falls in category g: ˆπ ig = p c j ˆπ ig(j), with c j 0 j and j j=1 c j = 1. With k nearest neighbor estimates ˆπ ig(j) based on predictor j only. Weights - or coefficients - c 1,..., c p need to be determined.
5 Nearest Neighbor Ensembles More Flexibility? Why not ˆπ ig = j c gj ˆπ ig(j), with c gj 0 g, j and j c gj = 1 g? It can be shown: Restriction c 1j =... = c Gj = c j is the only possibility to ensure that 1. ˆπ ig 0 g and 2. P g ˆπ ig = 1 for all possible future estimations {ˆπ ig(j) } with 1. ˆπ ig(j) 0 g, j and 2. P g ˆπ ig(j) = 1 j.
6 Determination of Weights Principles Given all {ˆπ ig(j) }, matrix ˆΠ with (ˆΠ) ig = ˆπ ig depends on c = (c 1,..., c p ) T. Given the training data with predictors x 1,..., x n and true class labels y = (y 1,..., y n ) T, a previously chosen loss function - or score - L(y, ˆΠ) is minimized over all possible c. Note: The categorial response y i is alternatively represented by a vector z i = (z i1,..., z ig ) T of dummy variables { 1, if yi = g z ig = 0, otherwise
7 Determination of Weights Possible loss functions Log Score L(y, ˆΠ) = i z ig log(1/ˆπ ig ) g + likelihood based Hypersensitive Inapplicable for nearest neighbor estimates. Approximate Log Score L(y, ˆΠ) = i g z ig ( (1 ˆπ ig ) (1 ˆπ ig ) 2 ) + Hypersensitivity removed Not "incentive compatible" (Selten, 1998), i.e. expected loss E(L) = G y=1 π y L(y, ˆπ y ) not minimized by ˆπ g = π g.
8 Determination of Weights Possible loss functions Quadratic Loss / Brier Score L(y, ˆΠ) = i (z ig ˆπ ig ) 2 g (introduced by Brier, 1950) + Not hypersensitive + Incentive compatible (see e.g. Selten, 1998) + Also takes into account how the estimated probabilities are distributed over the false classes.
9 Determination of Weights Practical implementation 1. For each observation i create a matrix P i of predictions: (P i ) gj = ˆπ ig(j). 2. Create a vector z = (z T 1,..., zt n ) T and a matrix P = (P T 1... PT n ) T. 3. Now the Brier Score as function of c can be written in matrix notation: L(c) = (z Pc) T (z Pc). 4. Given restrictions c j 0 j and j c j = 1, weights c j can be determined using quadratic programming methods; e.g. using the R add-on package quadprog. Given the approximate log score the weights can be determined in a similar way.
10 Variable Selection Variable Selection means setting c j = 0 for some j. Thresholding: Hard: c j = 0, if c j < t; c j = c j, otherwise; e.g. t = 0.25 max j {c j }. Soft: c j = (c j t) +. (followed by rescaling) Lasso based approximate solutions: If restrictions are replaced by j c j s, a lasso type problem (Tibshirani, 1996) arises. Lasso typical selection characteristics cause c j = 0 for some j. (followed by rescaling and c j = c + j )
11 Including Interactions Matrix P may be augmented by including interactions of predictors. Adding all predictions ˆπ ig(jl), resp. ˆπ ig(jlm) based on two or even three predictors. Feasible for small scale problems only; P has p + ( p 2) +... columns.
12 Simulation Studies I Two classification problems There are 10 independent features x j, each uniformly distributed on [0, 1]. The two class 0/1 coded response y is defined as follows (cf. Hastie et al., 2001): as an "easy" problem: y = I (x 1 > 0.5), and as a "difficult" problem: y = I (sign( 3 j=1 (x j 0.5)) > 0).
13 Simulation Studies I Reference methods Nearest neighbor methods: (3) Nearest neighbor based extended forward / backward variable selection. With tuning parameter S as the number of simple forward / backward selection steps that are checked in each iteration. Weighted (5) nearest neighbor prediction; R add-on package kknn. Some alternative classification tools: Linear discriminant analysis (LDA); R add-on package MASS. CART (Breiman et al., 1984) and Random Forests (Breiman, 2001); R add-on packages rpart, randomforest.
14 Simulation Studies I The easy problem Prediction performance on the test set in terms of the Brier Score and No. of Missclassified Observations: loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF
15 Simulation Studies I The easy problem Variable selection/weighting by nearest neighbor based (extended) forward/backward selection (left) or nearest neighbor ensembles (right): (1) approx. Log Score used Realization Weight Weight Term No. (2) Quadratic Score used Variable No. Term No.
16 Simulation Studies I The difficult problem Prediction performance on the test set in terms of the Brier Score and No. of Missclassified Observations: loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF
17 Simulation Studies I The difficult problem Variable selection/weighting by nearest neighbor based (extended) forward/backward selection (left) or nearest neighbor ensembles (right): (1) approx. Log Score used Realization Weight Weight Term No. (2) Quadratic Score used Variable No. Term No.
18 Simulation Studies II cf. Hastie & Tibshirani (1996) 1. 2 Dimensional Gaussian: Two Gaussian classes in two dimensions Dimensional Gaussian with 14 Noise: Additionally 14 independent standard normal noise variables. 3. Unstructured: 4 classes, each with 3 spherical bivariate normal subclasses; means are chosen at random. 4. Unstructured with 8 Noise: Augmented with 8 independent standard normal predictors Dimensional Spheres with 6 Noise: First 4 predictors in class 1 independent standard normal, conditioned on radius > 3; class 2 without restrictions Dimensional Spheres: All 10 predictors in class 1 conditioned on 22.4 < radius 2 < Constant Class Probabilities: Class probabilities (0.1,0.2,0.2,0.5) are independent of the predictors. 8. Friedman s example: Predictors in class 1 independent standard normal, in class 2 independent normal with mean and variance proportional to j and 1/ j respectively, j = 1,..., 10.
19 Simulation Studies II Scenario 1-4 loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF (1) 2 normals (2) 2 normals with noise loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF (3) unstructured (4) unstructured with noise
20 Simulation Studies II Scenario 5-8 loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF (5) 4D sphere in 10 dimensions (6) 10D sphere in 10 dimensions loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF (7) constant class probabilities (8) Friedman's example
21 Real World Data Glass data, R package mlbench Forecast the type of glass (6 classes) on the basis of the chemical analysis given in form of 9 metric predictors. Result 3NNE-QS / all data Weight Term No.
22 Real World Data Glass data, R package mlbench Forecast the type of glass (6 classes) on the basis of the chemical analysis given in form of 9 metric predictors. Performance / 50 random splits Loss NN FS 3NN BS 3NNE LS 3NNE QS w5nn LDA CART RF
23 Summary Nearest neighbor ensembles Nonparametric probability estimation by an ensemble, i.e. weighted average of nearest neighbor estimates. Each estimate is based on a single or a very small subset of predictors. No black box (by contrast to many other ensemble methods). Good performance for small scale problems, particularly if pure noise variables can be separated from relevant covariates. Direct application to high dimensional problems with interactions is not recommended. Given microarrays possibly useful as nonparametric gene preselection tool. May be employed for automatic choice of the most appropriate metrics or the right neighborhood. Application to regression problems is possible as well.
24 References BREIMAN, L. (2001): Random Forests, Machine Learning 45, 5 32 BREIMAN, L. et al. (1984): Classification and Regression Trees, Monterey, CA: Wadsworth. BRIER, G. W. (1950): Verification of forcasts expressed in terms of probability, Monthly Weather Review 78, 1 3. FIX, E. and J. L. HODGES (1951): Discriminatory analysis - nonparametric discrimination: consistency properties, US air force school of aviation medicine, Randolph Field Texas. GERTHEISS, J. and TUTZ, G. (2008): Feature Selection and Weighting by Nearest Neighbor Ensembles, University of Munich, Department of Statistics: Technical Report, No.33. HASTIE, T. and R. TIBSHIRANI (1996): Discriminant adaptive nearest neighbor classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 18, HASTIE, T. et al. (2001): The Elements of Statistical Learning, New York: Springer. R Development Core Team (2007): R: A Language and Environment for Statistical Computing, Vienna: R Foundation for Statistical Computing. SELTEN, R. (1998): Axiomatic characterization of the quadratic scoring rule, Experimental Economics 1, TIBSHIRANI, R. (1996): Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society B, 58,
Classification using stochastic ensembles
July 31, 2014 Topics Introduction Topics Classification Application and classfication Classification and Regression Trees Stochastic ensemble methods Our application: USAID Poverty Assessment Tools Topics
More informationA Bias Correction for the Minimum Error Rate in Cross-validation
A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.
More informationVariance Reduction and Ensemble Methods
Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis
More informationLecture 6: Methods for high-dimensional problems
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationCMSC858P Supervised Learning Methods
CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationRandom Forests for Ordinal Response Data: Prediction and Variable Selection
Silke Janitza, Gerhard Tutz, Anne-Laure Boulesteix Random Forests for Ordinal Response Data: Prediction and Variable Selection Technical Report Number 174, 2014 Department of Statistics University of Munich
More informationNearest Neighbors Methods for Support Vector Machines
Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad
More informationMaking Our Cities Safer: A Study In Neighbhorhood Crime Patterns
Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns Aly Kane alykane@stanford.edu Ariel Sagalovsky asagalov@stanford.edu Abstract Equipped with an understanding of the factors that influence
More informationJEROME H. FRIEDMAN Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA
1 SEPARATING SIGNAL FROM BACKGROUND USING ENSEMBLES OF RULES JEROME H. FRIEDMAN Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA 94305 E-mail: jhf@stanford.edu
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationLecture 4 Discriminant Analysis, k-nearest Neighbors
Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationMINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava
MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationLASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape
LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape Nikolaus Umlauf https://eeecon.uibk.ac.at/~umlauf/ Overview Joint work with Andreas Groll, Julien Hambuckers
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationRandom Forests. These notes rely heavily on Biau and Scornet (2016) as well as the other references at the end of the notes.
Random Forests One of the best known classifiers is the random forest. It is very simple and effective but there is still a large gap between theory and practice. Basically, a random forest is an average
More informationSupplementary material for Intervention in prediction measure: a new approach to assessing variable importance for random forests
Supplementary material for Intervention in prediction measure: a new approach to assessing variable importance for random forests Irene Epifanio Dept. Matemàtiques and IMAC Universitat Jaume I Castelló,
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Introduction to Classification Algorithms Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationUVA CS 4501: Machine Learning
UVA CS 4501: Machine Learning Lecture 21: Decision Tree / Random Forest / Ensemble Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major sections of this course
More informationVariance and Bias for General Loss Functions
Machine Learning, 51, 115 135, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Variance and Bias for General Loss Functions GARETH M. JAMES Marshall School of Business, University
More informationClassification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.
Classification Classification is similar to regression in that the goal is to use covariates to predict on outcome. We still have a vector of covariates X. However, the response is binary (or a few classes),
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationAdaptive Crowdsourcing via EM with Prior
Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and
More informationREGRESSION TREE CREDIBILITY MODEL
LIQUN DIAO AND CHENGGUO WENG Department of Statistics and Actuarial Science, University of Waterloo Advances in Predictive Analytics Conference, Waterloo, Ontario Dec 1, 2017 Overview Statistical }{{ Method
More informationRandom projection ensemble classification
Random projection ensemble classification Timothy I. Cannings Statistics for Big Data Workshop, Brunel Joint work with Richard Samworth Introduction to classification Observe data from two classes, pairs
More informationVariable Selection in Data Mining Project
Variable Selection Variable Selection in Data Mining Project Gilles Godbout IFT 6266 - Algorithmes d Apprentissage Session Project Dept. Informatique et Recherche Opérationnelle Université de Montréal
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationClassification via kernel regression based on univariate product density estimators
Classification via kernel regression based on univariate product density estimators Bezza Hafidi 1, Abdelkarim Merbouha 2, and Abdallah Mkhadri 1 1 Department of Mathematics, Cadi Ayyad University, BP
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationapplication in microarrays
Biostatistics Advance Access published April 7, 2006 Regularized linear discriminant analysis and its application in microarrays Yaqian Guo, Trevor Hastie and Robert Tibshirani Abstract In this paper,
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationBAGGING PREDICTORS AND RANDOM FOREST
BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationIndependent Factor Discriminant Analysis
Independent Factor Discriminant Analysis Angela Montanari, Daniela Giovanna Caló, and Cinzia Viroli Statistics Department University of Bologna, via Belle Arti 41, 40126, Bologna, Italy (e-mail: montanari@stat.unibo.it,
More informationData Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction
Data Mining 3.6 Regression Analysis Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Straight-Line Linear Regression Multiple Linear Regression Other Regression Models References Introduction
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationImportance Sampling: An Alternative View of Ensemble Learning. Jerome H. Friedman Bogdan Popescu Stanford University
Importance Sampling: An Alternative View of Ensemble Learning Jerome H. Friedman Bogdan Popescu Stanford University 1 PREDICTIVE LEARNING Given data: {z i } N 1 = {y i, x i } N 1 q(z) y = output or response
More informationInternational Journal of Pure and Applied Mathematics Volume 19 No , A NOTE ON BETWEEN-GROUP PCA
International Journal of Pure and Applied Mathematics Volume 19 No. 3 2005, 359-366 A NOTE ON BETWEEN-GROUP PCA Anne-Laure Boulesteix Department of Statistics University of Munich Akademiestrasse 1, Munich,
More informationPerformance of Cross Validation in Tree-Based Models
Performance of Cross Validation in Tree-Based Models Seoung Bum Kim, Xiaoming Huo, Kwok-Leung Tsui School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, Georgia 30332 {sbkim,xiaoming,ktsui}@isye.gatech.edu
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationRegularized Discriminant Analysis and Its Application in Microarrays
Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,
More informationMSA220 Statistical Learning for Big Data
MSA220 Statistical Learning for Big Data Lecture 4 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology More on Discriminant analysis More on Discriminant
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationApplied Multivariate and Longitudinal Data Analysis
Applied Multivariate and Longitudinal Data Analysis Discriminant analysis and classification Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 Consider the examples: An online banking service
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationRegularization in Cox Frailty Models
Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University
More informationRobust Inverse Covariance Estimation under Noisy Measurements
.. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related
More information2. (Today) Kernel methods for regression and classification ( 6.1, 6.2, 6.6)
1. Recap linear regression, model selection, coefficient shrinkage ( 3.1, 3.2, 3.3, 3.4.1,2,3) logistic regression, linear discriminant analysis (lda) ( 4.4, 4.3) regression with spline basis (ns, bs),
More informationRegularized Discriminant Analysis and Its Application in Microarray
Regularized Discriminant Analysis and Its Application in Microarray Yaqian Guo, Trevor Hastie and Robert Tibshirani May 5, 2004 Abstract In this paper, we introduce a family of some modified versions of
More informationISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification
ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationChapter 6. Ensemble Methods
Chapter 6. Ensemble Methods Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Introduction
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationProximity-Based Anomaly Detection using Sparse Structure Learning
Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /
More informationIntroduction to Signal Detection and Classification. Phani Chavali
Introduction to Signal Detection and Classification Phani Chavali Outline Detection Problem Performance Measures Receiver Operating Characteristics (ROC) F-Test - Test Linear Discriminant Analysis (LDA)
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationUnsupervised machine learning
Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels
More informationA Robust Approach to Regularized Discriminant Analysis
A Robust Approach to Regularized Discriminant Analysis Moritz Gschwandtner Department of Statistics and Probability Theory Vienna University of Technology, Austria Österreichische Statistiktage, Graz,
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationGenomics, Transcriptomics and Proteomics in Clinical Research. Statistical Learning for Analyzing Functional Genomic Data. Explanation vs.
Genomics, Transcriptomics and Proteomics in Clinical Research Statistical Learning for Analyzing Functional Genomic Data German Cancer Research Center, Heidelberg, Germany June 16, 6 Diagnostics signatures
More informationarxiv: v1 [stat.ml] 3 Mar 2019
Classification via local manifold approximation Didong Li 1 and David B Dunson 1,2 Department of Mathematics 1 and Statistical Science 2, Duke University arxiv:1903.00985v1 [stat.ml] 3 Mar 2019 Classifiers
More informationCourse in Data Science
Course in Data Science About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst. The course gives an
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationRegularization and Variable Selection via the Elastic Net
p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction
More informationENSEMBLES OF DECISION RULES
ENSEMBLES OF DECISION RULES Jerzy BŁASZCZYŃSKI, Krzysztof DEMBCZYŃSKI, Wojciech KOTŁOWSKI, Roman SŁOWIŃSKI, Marcin SZELĄG Abstract. In most approaches to ensemble methods, base classifiers are decision
More informationFeature Engineering, Model Evaluations
Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationClassification 2: Linear discriminant analysis (continued); logistic regression
Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:
More informationA Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data
A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data Yoonkyung Lee Department of Statistics The Ohio State University http://www.stat.ohio-state.edu/ yklee May 13, 2005
More information8.6 Bayesian neural networks (BNN) [Book, Sect. 6.7]
8.6 Bayesian neural networks (BNN) [Book, Sect. 6.7] While cross-validation allows one to find the weight penalty parameters which would give the model good generalization capability, the separation of
More informationMaximum Entropy Generative Models for Similarity-based Learning
Maximum Entropy Generative Models for Similarity-based Learning Maya R. Gupta Dept. of EE University of Washington Seattle, WA 98195 gupta@ee.washington.edu Luca Cazzanti Applied Physics Lab University
More informationStatistical Consulting Topics Classification and Regression Trees (CART)
Statistical Consulting Topics Classification and Regression Trees (CART) Suppose the main goal in a data analysis is the prediction of a categorical variable outcome. Such as in the examples below. Given
More information