Advances in Anomaly Detection
1 Advances in Anomaly Detection. Tom Dietterich, Alan Fern, Weng-Keen Wong, Andrew Emmott, Shubhomoy Das, Md. Amran Siddiqui, Tadesse Zemicheal
2 Outline: Introduction (three application areas); Two general approaches to anomaly detection (under-fitting, over-fitting); DARPA ADAMS Red Team results; Benchmarks for Anomaly Detection (validation, comparison study); Next Steps (anomaly explanations, ensembles)
3 Why Anomaly Detection? Data cleaning: find data points that contain errors. Science: find data points that are interesting or unusual. Security / fraud detection: find users/customers who are behaving weirdly.
4 Data Cleaning for Sensor Networks. An ideal method should produce two things given raw data. [Figure: raw air temperature (degrees Celsius) vs. day index from start of deployment for multiple sensors]
5 Data Cleaning for Sensor Networks. An ideal method should produce two things given raw data: (1) a label that marks anomalies. [Figure: air temperature (degrees Celsius) vs. day index, with anomalous readings labeled]
6 Data Cleaning for Sensor Networks. An ideal method should produce two things given raw data: (1) a label that marks anomalies; (2) an imputation of the true value (with some confidence measure). Dereszynski & Dietterich, ACM TOSN. [Figure: air temperature (degrees Celsius) vs. day index, with anomalies labeled and imputed values shown]
7 NASA: Finding Interesting Data Points. Ingest the data set and rank points by interestingness. Repeat: show the most interesting point to the scientist (Yes: interesting / No: not interesting), then build a model of the uninteresting points. The most interesting point is the most un-uninteresting point, i.e., the most extreme outlier among the uninteresting points. Example: Mars Science Laboratory ChemCam (olivine, first non-carbonate). Wagstaff, Lanza, Thompson, Dietterich, Gilmore. AAAI.
8 Security/Fraud Detection: DARPA ADAMS Program. Desktop activity data collected from ~5000 employees of a corporation using Raytheon-Oakley SureView. The CERT Red Team overlays selected employees with insider threat activity based on real scenarios. Example scenarios: Anomalous Encryption, Layoff Logic Bomb, Insider Startup, Circumventing SureView, Hiding Undue Affluence, Survivor's Burden. Team: LEIDOS (formerly SAIC); Ted Senator, PI; Rand Waltzman, PM.
9 Outline: Introduction (three application areas); Two general approaches to anomaly detection (under-fitting, over-fitting); DARPA ADAMS Red Team results; Benchmarks for Anomaly Detection (validation, comparison study); Next Steps (anomaly explanations, ensembles)
10 What is Anomaly Detection? Input: vectors $x_i \in \mathbb{R}^d$ for $i = 1, \dots, N$, assumed to be a mix of normal and anomalous data points; the anomalies are generated by some distinct process (e.g., instrument failures, fraud, intruders, etc.). Output: an anomaly score $s_i$ for each input $x_i$ such that higher scores are more anomalous and similar scores imply similar levels of anomalousness. Metrics: AUC, the probability that a randomly-chosen anomaly is ranked above a randomly-chosen normal point, and precision in the top K.
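As a quick illustration of this ranking-based AUC (a sketch added here, not part of the slides), the snippet below computes the probability that a randomly chosen anomaly out-scores a randomly chosen normal point and checks it against scikit-learn's roc_auc_score.

```python
# Illustrative sketch: AUC as the probability that a random anomaly is ranked
# above a random normal point (ties count as one half).
import numpy as np
from sklearn.metrics import roc_auc_score

def anomaly_auc(scores, labels):
    """labels: 1 = anomaly, 0 = normal; scores: higher = more anomalous."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    anom, norm = scores[labels == 1], scores[labels == 0]
    # Compare every anomaly against every normal point.
    wins = (anom[:, None] > norm[None, :]).mean()
    ties = (anom[:, None] == norm[None, :]).mean()
    return wins + 0.5 * ties

scores = [0.9, 0.2, 0.7, 0.1, 0.4]
labels = [1, 0, 1, 0, 0]
assert np.isclose(anomaly_auc(scores, labels), roc_auc_score(labels, scores))
```

Precision in the top K is simply the fraction of true anomalies among the K highest-scoring points.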
11 Two General Approaches to Anomaly Detection. Anomaly detection by under-fitting: Gaussian Mixture Model (GMM), Ensemble of Gaussian Mixture Models (EGMM). Anomaly detection by over-fitting: Isolation Forest (IFOR), Repeated Impossible Discrimination Ensemble (RIDE).
12 Anomaly Detection by Under-Fitting. Choose a class of models and fit it to the data. Let $P_\theta(x_i)$ be the probability density assigned to data point $x_i$ by the model $\theta$. Assign score $s_i = -\log P_\theta(x_i)$. Low-density points (poorly explained by the model) are the anomalies.
13 Example: Gaussian Mixture Model. $P(x) = \sum_{k=1}^{K} p_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$, illustrated with K = 3.
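To make the under-fitting recipe concrete, here is a minimal sketch (illustrative only; the synthetic data and K = 3 are assumptions) that fits a Gaussian mixture with scikit-learn and scores points by negative log density.

```python
# Under-fitting sketch: fit a K=3 Gaussian mixture and score points by
# negative log density, so poorly explained points get the highest scores.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)),      # normal data
               rng.uniform(-8, 8, (10, 2))])    # a few scattered anomalies

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
scores = -gmm.score_samples(X)                  # s_i = -log P_theta(x_i)
top = np.argsort(scores)[::-1][:10]             # indices of the 10 most anomalous points
print(top)
```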
14 Ensemble of GMMs. Train $M$ independent Gaussian Mixture Models: train model $m = 1, \dots, M$ on a bootstrap replicate of the data, varying the number of clusters $K$. Delete any model with log likelihood < 70% of the best model's. Compute the average surprise: $s_i = -\frac{1}{M} \sum_m \log P_m(x_i)$.
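A hedged sketch of the EGMM idea follows; the number of models, the choice of K values, and the library are assumptions, and the 70% log-likelihood pruning step from the slide is omitted for brevity.

```python
# EGMM-style sketch (not the authors' implementation): train M GMMs on
# bootstrap replicates with varying K, then score each point by its average
# surprise across the ensemble.
import numpy as np
from sklearn.mixture import GaussianMixture

def egmm_scores(X, n_models=15, ks=(2, 3, 4, 5), seed=0):
    """X: numpy array of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    log_densities = []
    for m in range(n_models):
        boot = rng.choice(len(X), size=len(X), replace=True)  # bootstrap replicate
        k = ks[m % len(ks)]                                   # vary the number of clusters
        gmm = GaussianMixture(n_components=k, random_state=m).fit(X[boot])
        log_densities.append(gmm.score_samples(X))            # log P_m(x_i) for all points
    # Average surprise: s_i = -(1/M) * sum_m log P_m(x_i)
    return -np.mean(log_densities, axis=0)
```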
15 DARPA ADAMS Vegas Results. Score each user and rank them all. AUC = probability that we correctly rank a randomly-chosen Red Team insert above a randomly-chosen normal user. [ROC curves for Vegas Sept 2012 (AUC = 0.970) and Vegas Oct 2012]
16 New approach: Anomaly Detection by Over-Fitting. Take the input points, randomly split them in half, and label one half 0 and the other half 1. Apply supervised learning to discriminate the 0s from the 1s (which by construction is impossible). Score each point by how far its predicted probability strays from chance: $\text{score}(x) = |0.5 - P(y = 1 \mid x)|$. Repeat the random split and the discrimination, and total the scores after 50 iterations. RIDE: Repeated Impossible Discrimination Ensemble.
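The sketch below illustrates the over-fitting idea behind RIDE under stated assumptions: the choice of classifier (a depth-limited random forest) and its settings are mine, not the authors', and the score accumulates how far each predicted probability deviates from chance.

```python
# RIDE-style sketch: repeatedly assign random 0/1 labels, train a flexible but
# capacity-limited classifier on this impossible task, and accumulate how far
# each point's predicted probability strays from 0.5. Easily isolated
# (anomalous) points end up with extreme predictions more often.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ride_scores(X, n_rounds=50, seed=0):
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(X))
    for r in range(n_rounds):
        y = rng.permutation(np.arange(len(X)) % 2)   # random half 0s, half 1s
        clf = RandomForestClassifier(n_estimators=50, max_depth=5,
                                     random_state=r).fit(X, y)
        p1 = clf.predict_proba(X)[:, 1]              # P(y = 1 | x)
        scores += np.abs(0.5 - p1)                   # deviation from chance
    return scores / n_rounds
```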
17 RIDE Vegas Results. [ROC curves for Vegas Sept and Vegas Oct]
18 Isolation Forest [Liu, Ting, Zhou, 2011]. Construct a fully random binary tree: choose attribute $j$ at random and choose a splitting threshold $\theta$ uniformly from $[\min x_j, \max x_j]$, repeating until every data point is in its own leaf. Let $d(x_i)$ be the depth of point $x_i$. Repeat 100 times and let $\bar{d}(x_i)$ be the average depth of $x_i$. Then $\text{score}(x_i) = 2^{-\bar{d}(x_i)/r(x_i)}$, where $r(x_i)$ is the expected depth. [Diagram: an example tree of splits of the form $x_j > \theta_k$ isolating a point $x_i$]
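An off-the-shelf implementation is readily available; the sketch below shows typical scikit-learn usage (not the experiments' code), flipping the sign of score_samples so that higher means more anomalous.

```python
# Isolation Forest via scikit-learn. score_samples returns the negated anomaly
# score from the original paper, so we flip the sign.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.uniform(-8, 8, (10, 2))])

iforest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = -iforest.score_samples(X)          # higher = more anomalous
ranking = np.argsort(scores)[::-1]          # most anomalous points first
```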
19 Outline: Introduction (three application areas); Two general approaches to anomaly detection (under-fitting, over-fitting); DARPA ADAMS Red Team results; Benchmarks for Anomaly Detection (validation, comparison study); Next Steps (anomaly explanations, ensembles)
20 VEGAS Results May 2013
21 VEGAS Results June 2013
22 VEGAS Results July 2013
23 Outline: Introduction (three application areas); Two general approaches to anomaly detection (under-fitting, over-fitting); DARPA ADAMS Red Team results; Benchmarks for Anomaly Detection (validation, comparison study); Next Steps (anomaly explanations, ensembles)
24 Needed: Benchmarks for Anomaly Detection Algorithms. Shared benchmark databases (e.g., the UCI Repository of Machine Learning Data Sets) have helped supervised learning make rapid progress. Anomaly detection lacks shared benchmarks: most data sets are proprietary and/or classified. Exception: the Lincoln Labs Simulated Network Intrusion data set, which is hopelessly out of date. Goal: develop a collection of benchmark data sets with known properties.
25 Benchmark Requirements. The underlying process generating the anomalies should be distinct from the process generating the normal points (anomalies are not merely outliers). We need many benchmark data sets, to prevent the research community from fixating on a small number of problems. Benchmark data sets should systematically vary a set of relevant properties.
26 Relevant Properties Point difficulty: How difficult is it to separate each individual anomaly point from the normal points? Relative frequency: How rare are the anomalies? Clusteredness: Are the anomalous points tightly clustered or widely scattered? Irrelevant features: How many features are irrelevant? 26
27 Creating an Anomaly Detection Benchmark Data Set. Select a UCI supervised learning dataset and choose one class to be the anomalies (call this class 0 and the union of the other classes class 1); this ensures that different processes generate the anomalies and the normal points. Computing point difficulty: fit a kernel logistic regression model to estimate $P(y = 1 \mid x)$, where $y$ is the class label (the "oracle" model); the difficulty of an anomaly point $x_i$ is defined as $P(y = 1 \mid x_i)$ according to the oracle. For the desired relative frequency, select points based on difficulty and clusteredness. Optionally, add irrelevant features by selecting existing features and randomly permuting their values.
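A sketch of the point-difficulty step is shown below. Note one substitution: scikit-learn has no kernel logistic regression, so an RBF kernel approximation (Nystroem) followed by ordinary logistic regression stands in for the oracle model here.

```python
# Point-difficulty sketch: an approximate kernel logistic regression "oracle"
# estimates P(y = 1 | x); the difficulty of each anomaly is how "normal" the
# oracle thinks it looks.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression

def point_difficulty(X, y):
    """X: array (n, d); y: 1 = normal ("class 1"), 0 = anomaly ("class 0")."""
    y = np.asarray(y)
    oracle = make_pipeline(
        Nystroem(kernel="rbf", n_components=100, random_state=0),
        LogisticRegression(max_iter=1000))
    oracle.fit(X, y)
    p_normal = oracle.predict_proba(X)[:, 1]   # estimate of P(y = 1 | x)
    return p_normal[y == 0]                    # difficulty of the anomaly points
```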
28 Benchmark Collection. 19 "mother" UCI data sets. Point difficulty: low (0, 0.16), medium [0.16, 0.33), high [0.33, 0.5), very high [0.5, 1). Relative frequency: 0.001, 0.005, 0.01, 0.05, 0.1. Clusteredness: 7 levels based on $\log(\sigma_n^2 / \sigma_a^2)$, the variance of the normal points divided by the variance of the anomalous points; a facility location algorithm is used to select well-spaced points, and seed-point neighbors are used to find clustered points. Irrelevant features: 4 levels based on increasing the average distance between normal points. 24,800 benchmark data sets generated.
29 Benchmarking Study. State-of-the-art methods: ocsvm: one-class SVM (Schoelkopf et al. 1999); lof: Local Outlier Factor (Breunig et al. 2000); svdd: Support Vector Data Description (Tax & Duin, 2004); if: Isolation Forest (Liu et al., 2008, 2011); scif: SciForest (Liu et al., 2010); rkde: Robust Kernel Density Estimation (Kim & Scott, 2012); egmm: ours. Analysis: measure the AUC of each method, compute the mean AUC for each method, and fit a logistic regression model: $\text{logit}(AUC) = \text{method} + \text{difficulty} + \text{frequency} + \text{clusteredness} + \text{irrelevance}$.
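The analysis model could be fit roughly as follows (a sketch only: the DataFrame and its column names are assumptions, and an ordinary least-squares fit on logit-transformed AUC stands in for whatever estimator the study actually used).

```python
# Sketch: regress logit(AUC) on the method and the benchmark factors.
# Assumes a DataFrame `results` with one row per (benchmark, method) and
# columns auc, method, difficulty, frequency, clusteredness, irrelevance.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_auc_model(results: pd.DataFrame):
    df = results.copy()
    eps = 1e-6
    auc = df["auc"].clip(eps, 1 - eps)          # keep the logit finite
    df["logit_auc"] = np.log(auc / (1 - auc))
    model = smf.ols("logit_auc ~ C(method) + C(difficulty) + C(frequency)"
                    " + C(clusteredness) + C(irrelevance)", data=df).fit()
    return model.summary()
```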
30 Benchmark Validity: Point Difficulty 30
31 Benchmark Validity: Relative Frequency 31
32 Benchmark Validity: Clusteredness 32
33 Algorithm Comparisons: Mean AUC. [Bar chart of mean AUC for if, lof, rkde, egmm, svdd, scif, ocsvm]
34 Algorithm Comparisons: Logistic Regression Results. if: Isolation Forest (Liu et al., 2011); rkde: Robust Kernel Density Estimation (Kim & Scott, 2012); egmm: ours; lof: Local Outlier Factor (Breunig et al. 2000); ocsvm: one-class SVM (Schoelkopf et al. 1999); svdd: Support Vector Data Description (Tax & Duin, 2004).
35 Sensitivity to Irrelevant Features. The performance of all methods drops with an increasing number of irrelevant features. RKDE and IFOR perform very well; OCSVM is extremely sensitive; EGMM was hurt by the largest level of irrelevance but is the top performer when there is no noise. [Chart: average AUC at irrelevant-feature levels 0-3 for egmm, if, lof, rkde, svdd, scif, ocsvm]
36 Outline: Introduction (three application areas); Two general approaches to anomaly detection (under-fitting, over-fitting); DARPA ADAMS Red Team results; Benchmarks for Anomaly Detection (validation, comparison study); Next Steps (anomaly explanations, ensembles)
37 Next Steps. Generate explanations of each anomaly for the analyst. Ensembles. Model the peer-group structure of the organization: the same user in previous days, all users in the company today, users with the same job class, users who work together, shared-printer cliques.
38 Anomaly Explanations. [Pipeline: data points → anomaly detector → outliers raised as alarms (threats & false positives) → human analyst → threats & false positives; non-outliers (non-threats & missed threats) are discarded]. Type 1 missed threats = anomaly detector false negatives; reduce these by improving the anomaly detector. Type 2 missed threats = analyst false negatives; these can occur due to information overload and time constraints. We consider reducing Type 2 misses by providing explanations: why did the detector consider an object to be an outlier? The analyst can then focus on information related to the explanation.
39 Sequential Feature Explanations. [Pipeline: outliers + explanations raised as alarms (threats & false positives) → human analyst → threats & false positives]. Goal: reduce analyst effort for correctly detecting outliers that are threats. How: provide the analyst with sequential feature explanations of outlier points. Sequential Feature Explanation (SFE): an ordering on the features of an outlier, prioritized by importance to the anomaly detector. Protocol: incrementally reveal features ordered by the SFE until the analyst can make a confident determination.
40 Typical Sequential Feature Explanation Curve Performance Metric: # of features that must be examined by the analyst in order to make a confident decision that a proposed threat (outlier) requires opening an investigation 40
41 Evaluating Explanations. Methodological problem: evaluation requires access to an analyst, but we can't run large-scale experiments with real analysts. Solution: construct simulated analysts that compute $P(\text{normal} \mid x)$. How: start with an anomaly detection benchmark constructed from a UCI supervised learning data set [Emmott et al., 2013], then learn a classifier to predict anomaly vs. normal from the labeled data ("cheating"), repeating for each subset of $K$ features. [Diagram: UCI dataset (normal points, anomaly points) → supervised learning → simulated analyst classifier $P(\text{normal} \mid x)$]
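A rough sketch of a simulated analyst is shown below; the plain random forest stands in for the regularized random forests used in the study, and retraining one classifier per feature subset is one reading of the "repeat for each subset of K features" step.

```python
# Simulated-analyst sketch: train a classifier on the labeled benchmark data
# ("cheating") restricted to a feature subset, and return P(normal | x).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def simulated_analyst(X, y_normal, feature_subset):
    """X: array (n, d); y_normal: 1 = normal, 0 = anomaly;
    feature_subset: iterable of column indices revealed to the analyst."""
    cols = list(feature_subset)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[:, cols], y_normal)                 # uses the true labels
    # Column 1 corresponds to the "normal" class when labels are {0, 1}.
    return clf.predict_proba(X[:, cols])[:, 1]
```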
42 Explanation Methods for Density-Based Anomaly Detectors. Density-based detectors rank points $x$ according to the estimated density $f(x)$. Marginal methods greedily add the features that most decrease the joint marginal $f(x_1, \dots, x_K)$. Sequential Marginal: the first feature $x_i$ minimizes $f(x_i)$, the second feature $x_j$ minimizes $f(x_i, x_j)$, and so on. Independent Marginal: order features by $f(x_i)$. Dropout methods greedily remove the features whose removal most increases the density. Sequential Dropout: the first feature $x_i$ is the one whose removal yields the largest density $f(x_{-i})$ over the remaining features, the second feature $x_j$ does the same for $f(x_{-i,-j})$, and so on. Independent Dropout: order features by $f(x_{-i})$.
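For a GMM-based detector, the Independent Marginal explanation is especially easy to sketch, because each marginal of a Gaussian mixture is itself a one-dimensional mixture. The code below (illustrative only; it assumes a full-covariance scikit-learn GaussianMixture) orders an outlier's features from lowest to highest marginal density.

```python
# Independent Marginal SFE sketch for a GMM detector: score each feature by
# its 1-D marginal density under the mixture and reveal the lowest-density
# features first.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def independent_marginal_sfe(gmm: GaussianMixture, x):
    """Return the features of outlier x ordered most-anomalous first.
    Assumes gmm was fit with covariance_type='full' (the default)."""
    n_features = len(x)
    marginal_density = np.empty(n_features)
    for i in range(n_features):
        mus = gmm.means_[:, i]
        sds = np.sqrt(gmm.covariances_[:, i, i])   # per-component variances of feature i
        marginal_density[i] = np.sum(gmm.weights_ * norm.pdf(x[i], mus, sds))
    return np.argsort(marginal_density)            # lowest marginal density first
```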
43 Empirical Demonstration. Datasets: 10,000 benchmarks derived from 7 UCI datasets. Anomaly detector: Ensemble of Gaussian Mixture Models (EGMM). Simulated analysts: Regularized Random Forests (RRFs). Evaluation metric: mean minimum feature prefix (MMFP) = the average number of features revealed before the analyst is able to make a decision (exonerate vs. open an investigation).
44 Results (EGMM + Explanation Method). [Bar chart of MMFP for IndDO, IndMarg, OptOracle, SeqDO, SeqMarg, Random]. In these domains, an oracle only needs 1-2 features. Dropout methods are often worse than marginal methods. There is often no benefit to sequential methods over independent methods. Random is always worst.
45 Results (EGMM + Explanation Method). All methods significantly beat random. Marginal methods are no worse and sometimes better than dropout. Independent marginal is nearly as good as sequential marginal.
46 KDD99 (Computer Intrusion) Results (EGMM detector). [Bar chart of MMFP with 95% confidence intervals for Independent Dropout, Sequential Dropout, Independent Marginal, Sequential Marginal]. Marginal methods are best; one feature is enough!
47 Ensemble Methods. In supervised learning, ensemble methods (bagging, random forests, boosting) have been shown to be very powerful. Can we develop general-purpose ensemble methods for anomaly detection? Our methods employ internal ensembles; can we combine heterogeneous anomaly detection algorithms into an external ensemble?
48 Comparison of Ensemble Methods. [Chart: Ensemble Comparison (MAGIC Gamma Telescope), change in logit(AUC) with respect to the gauss-model (2-component Gaussian) for gauss-model, PCA, Schubert, Schubert-info, glmnet (L1 logistic regression), and iforest (Isolation Forest, the best non-ensemble method)]
49 Ensemble Conclusions No convincing evidence that ensembles work better than simply running iforest 49
50 Concluding Remarks. Anomaly detection has received relatively little study in machine learning, statistics, and data mining. There are two main paradigms for designing algorithms: anomaly detection by under-fitting and anomaly detection by over-fitting. The over-fitting paradigm is producing interesting algorithms; they also require less modeling effort and can be very efficient. In the analyst case, simple marginal scores work very well for sequential feature explanations.
51 Questions?